
Data Engineer Resume


CA

PROFESSIONAL SUMMARY:

  • 7+ years of strong experience as a Data Analyst in Data Mining with large data sets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages such as R, Python, SAS, and MATLAB, as well as Apache Spark and Big Data technologies like Hadoop, Hive, and Pig.
  • Evaluated technology stacks for building analytics solutions on the cloud by researching and identifying the right strategies and tools for end-to-end analytics solutions, and helped design the technology roadmap for Data Ingestion, Data Lakes, Data Processing, and Visualization.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the Spark SQL sketch after this list).
  • Designed and developed real-time streaming pipelines for sourcing data from IoT devices, defining strategies for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.
  • Extensive experience in Data Analytics supporting Marketing Campaigns.
  • Good knowledge of Hadoop architecture and its ecosystem.
  • Extensive hands-on Hadoop experience in storage, writing queries, and processing and analyzing data.
  • Strong understanding of NoSQL databases like HBase.
  • Experience with Microsoft Azure data storage and Azure Data Factory, Azure Data Lake Store (ADLS), AWS S3, EC2, and Vault.
  • Experience migrating on-premises ETL processes to the cloud.
  • Experience working with Apache NiFi to ingest data into Big Data platforms from different source systems.
  • Experience creating and loading data into Hive tables with appropriate static and dynamic partitions for efficiency (see the partitioning sketch after this list).
  • Worked with various Hadoop file formats such as Parquet, ORC, and Avro.
  • Experience in Data Warehousing applications, responsible for the Extraction, Transformation, and Loading (ETL) of data from multiple sources into the Data Warehouse.
  • Experience in optimizing Hive SQL queries and Spark jobs.
  • Implemented various frameworks such as Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling with the help of technologies like Big Data, DataStage, Spark, Python, and Mainframe, along with databases like Netezza, DB2, Hive, and Snowflake.
  • Good knowledge of business process analysis and design, re-engineering, cost control, capacity planning, performance measurement, and quality.
  • Experience delivering highly complex projects using Agile and Scrum methodologies.
  • Quick learner who stays up to date with industry trends; excellent written and oral communication, analytical, and problem-solving skills; a good team player able to work independently and stay well organized.
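
A minimal PySpark sketch of the Spark SQL-in-Databricks ETL pattern referenced above; the paths, view names, and columns (events, customers, customer_id, duration) are illustrative assumptions, not actual project artifacts:

    # Read two file formats, join them, and aggregate customer usage patterns.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("usage-insights").getOrCreate()

    events = spark.read.parquet("/mnt/raw/events")                        # assumed path
    customers = spark.read.option("header", True).csv("/mnt/raw/customers.csv")

    events.createOrReplaceTempView("events")
    customers.createOrReplaceTempView("customers")

    # Spark SQL aggregation to surface usage patterns per customer
    usage = spark.sql("""
        SELECT c.customer_id,
               COUNT(*)        AS event_count,
               SUM(e.duration) AS total_duration
        FROM events e
        JOIN customers c ON e.customer_id = c.customer_id
        GROUP BY c.customer_id
    """)

    usage.write.mode("overwrite").parquet("/mnt/curated/usage_summary")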
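
The static/dynamic Hive partitioning mentioned above could look like the following Spark SQL sketch; the table, columns, partition keys (sales, region, load_date), and the staging_sales source table are assumptions for illustration:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partition-load")
             .enableHiveSupport()
             .getOrCreate())

    # Allow dynamic partition values to be derived from the data itself
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales (
            order_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (region STRING, load_date STRING)
        STORED AS PARQUET
    """)

    # Static partition for region, dynamic partition for load_date
    spark.sql("""
        INSERT OVERWRITE TABLE sales PARTITION (region = 'US', load_date)
        SELECT order_id, amount, load_date
        FROM staging_sales
    """)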

TECHNICAL SKILLS:

Programming Language: Python, SQL, PL/SQL

Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Spark, PySpark

Cloud Technologies: AWS and Azure

Visualization: Microsoft Power BI, Tableau, Python (Plotly)

IDEs/Utilities: Anaconda, Jupyter Notebook, Databricks, PL/SQL

Version Control: Git

WORK EXPERIENCE:

DATA ENGINEER

Confidential, CA.

Responsibilities:

  • Built a metadata-driven ETL framework in Azure for building out enterprise data hubs to support Customer 360 and marketing insights efforts.
  • Maintained data pipeline up-time of 99.8% while ingesting streaming and transactional data across 8 different primary data sources using Spark, Azure Synapse Analytics, ADF, and Python.
  • Automated ETL processes across billions of rows of data, which reduced manual workload by 29% monthly and built batch processes for financial reporting applications and modules using shell scripts on Oracle database, with partitions and subpartitions.
  • Ingested data from disparate sources with a combination of SQL, the Google Analytics API, and the Salesforce API in Python, creating data views for use in BI tools like Tableau.
  • Implemented data storage solutions and led the migration of legacy Data Warehouse systems from on-premises to Azure Data Lake Storage Gen2 using ADF and Apache Spark.
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
  • Operated as a part of the development team for a data migration project to build SSIS packages, providing ongoing contributions to rapid development efforts.
  • Developed custom UDFs for Pig scripts and Hive queries to implement business logic and complex analysis on the data.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (see the UDF sketch after this list).
  • Reduced the latency of Spark jobs by tuning Spark configurations and applying performance optimization techniques (a configuration sketch also follows this list).
  • Responsible for loading structured and semi-structured data into Hadoop by creating static and dynamic partitions.
  • Used Sqoop to import/export data between various RDBMSs (Teradata, Oracle) and the Hadoop cluster.
  • Worked with Avro and Parquet file formats and used various compression techniques to leverage storage in HDFS.
  • Designed and developed ETL workflows using Oozie and automated them using Autosys.
  • Communicated with project managers and analysts about data pipelines that drove efficiency KPIs up by 26%.
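
A hedged sketch of the kind of PySpark UDF referenced above; the business rule (masking an account number) and the column names are invented purely for illustration:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-example").getOrCreate()

    @F.udf(returnType=StringType())
    def mask_account(account):
        # Keep the last four characters, mask the rest
        if account is None:
            return None
        return "*" * (len(account) - 4) + account[-4:]

    df = spark.createDataFrame([("1234567890",)], ["account_no"])
    df.withColumn("masked", mask_account("account_no")).show()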
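
Illustrative Spark tuning settings of the sort referred to above; the specific values are assumptions and would be sized to the actual cluster and workload:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("tuned-job")
             .config("spark.sql.shuffle.partitions", "400")    # match shuffle width to data volume
             .config("spark.executor.memory", "8g")
             .config("spark.executor.cores", "4")
             .config("spark.sql.adaptive.enabled", "true")     # adaptive query execution
             .getOrCreate())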

DATA ENGINEER

Confidential, Palo Alto, CA

Responsibilities:

  • Developed a key data pipeline to process over 500 TB of data by consolidating data from multiple disparate sources into a single destination, enabling quick data analysis for reliable business insights.
  • Delivered 50+ Big Data requests to increase data correctness, quality, and completeness, thereby allowing decisions made from Big Data analysis to occur faster.
  • Created a PySpark framework to bring data from DB2 to Amazon S3 (see the sketch after this list).
  • Provided guidance to the development team working on PySpark as an ETL platform.
  • Collaborated with 25+ global technology professionals including product and data science teams on multiple projects including idea generation, implementation, testing, and success measurement.
  • Utilized strong managerial skills and experience to negotiate with vendors and coordinate tasks with a 25-person IT team.
  • Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
  • Implemented best practice processes and created cloud functions, applications, and databases that have improved data accuracy for decision-making by 35%.
  • Performed end-to-end architecture and implementation assessments of numerous AWS services (including Amazon Elastic MapReduce and Amazon Simple Storage Service) for 25+ clients.
  • Designed a dashboard for the systems and operations department to provide transparency of customer data to drive deliverables; the tool is used by 350+ staff members and decision-makers.
  • Trained and mentored 20+ extract, transform, and load (ETL) and report designers and developers, which resulted in staff retention of 35% in 2017.
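
A minimal sketch of the DB2-to-S3 movement described above, assuming a DB2 JDBC driver on the classpath and AWS credentials supplied by the environment; hosts, credentials, and table names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("db2-to-s3").getOrCreate()

    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:db2://db2-host:50000/SAMPLE")   # assumed host/database
              .option("driver", "com.ibm.db2.jcc.DB2Driver")
              .option("dbtable", "SCHEMA.ORDERS")                  # placeholder table
              .option("user", "db2user")
              .option("password", "********")
              .load())

    # Land the extract in S3 as Parquet, partitioned by load date
    (orders.write.mode("append")
           .partitionBy("LOAD_DATE")
           .parquet("s3a://analytics-raw/orders/"))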

DATA ARCHITECT

Confidential

Responsibilities:

  • Successfully redesigned and automated the previously manual capacity budget forecasting process, which enhanced the efficiency and accuracy of the models by over 20%.
  • Architected a transportation management OLTP system to support the logistics functionality of a standalone Java application.
  • Created SSIS Packages to migrate data to support analytics efforts.
  • Launched new UNIX shell scripts for high-level automation of data-loading processes and expanded existing ones for better performance, while executing test cases and following established procedures and methodologies, resulting in savings of 20 manual hours.
  • Executed Query optimization and PL/SQL tuning, working with the testing team to resolve bugs related to day one ETL mappings.
  • Proactively translated business requirements into creating and altering database objects, including tables, indexes, constraints, triggers, and stored procedures.
