Azure Data Engineer Resume
SUMMARY
- Over 9 years of experience as a Data Engineer, extensively designing, developing, and implementing Big Data applications on Microsoft Azure Cloud and AWS with big data technologies such as Apache Hive, Apache Spark, PySpark, and SparkSQL.
- 2X Microsoft Certified Azure Data Engineer.
- Developed ETL/ELT pipelines using big data and cloud technologies such as Apache Hive, Apache Spark, Azure Data Factory, and Azure Databricks.
- Designed fault-tolerant, scalable ELT pipelines and resolved many complex data pipeline issues.
- Worked with various big data file formats such as Apache Parquet, CSV, AVRO, and JSON while developing big data applications using Apache Hive and Apache Spark.
- Worked with Azure Synapse Analytics to develop end-to-end ETL/ELT applications.
- Optimized many Spark applications by applying performance-tuning techniques in Databricks, improving the performance of the overall pipelines.
- Implemented Slowly Changing Dimension Type 3 (SCD Type 3) logic on Delta tables in Databricks (see the sketch after this list).
- Created stored procedures to perform full and incremental loads from file-based storage systems into data warehouse tables.
- Developed data cleaning, data validation, and reconciliation scripts to validate the data before and after processing.
- Developed reusable/generic pipelines that can be used for multiple data products/LOBs.
- Designed and developed a generic notebook to validate the data between source and target across multiple stages of data processing.
- Created Databricks notebooks with Delta-format tables and implemented a lakehouse architecture.
- Created project architecture and data lineage documents for every task in the project.
- Worked with Visual Studio and Azure DevOps Repos for code commits and code migrations across dev, test, and prod environments.
- Created generic notebooks for a common set of activities to reduce code redundancy and improve code reusability.
- Worked with AWS Glue, the AWS Glue Data Catalog, Amazon Redshift, and Redshift Spectrum for developing and orchestrating ETL/ELT applications using PySpark in AWS Glue.
- Implemented audit logs for job execution in AWS Glue.
- Implemented a control flow architecture for developing secure, end-to-end big data applications using ADF V2, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage, Azure SQL DB, and Azure Key Vault.
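To illustrate the SCD Type 3 work mentioned above, here is a minimal PySpark sketch of a Type 3 merge into a Delta table on Databricks; the table and column names (dim_customer, city, previous_city) are illustrative placeholders, not the actual project schema.

```python
# Minimal SCD Type 3 sketch on a Delta table, assuming a Databricks runtime
# where the delta package is available. Table and column names are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

dim = DeltaTable.forName(spark, "dim_customer")                 # target dimension
updates = spark.read.format("delta").load("/mnt/curated/customer_updates")

(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    # Type 3 keeps one level of history: shift the old value into
    # previous_city before overwriting city with the incoming value.
    .whenMatchedUpdate(
        condition="t.city <> s.city",
        set={
            "previous_city": F.col("t.city"),
            "city": F.col("s.city"),
            "last_updated": F.current_timestamp(),
        })
    .whenNotMatchedInsert(
        values={
            "customer_id": F.col("s.customer_id"),
            "city": F.col("s.city"),
            "previous_city": F.lit(None),
            "last_updated": F.current_timestamp(),
        })
    .execute())
```

Running a merge like this after each ingest keeps both the current and the immediately previous value on every dimension row, which is the essence of Type 3 history.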
TECHNICAL SKILLS
- Azure Data Lake
- Azure Data Factory
- Azure Databricks
- Azure Synapse Analytics
- Azure SQL DB
- Apache Spark
- PySpark
- SparkSQL
- Apache Hive
- AWS Glue & Data Catalog
- AWS S3 Storage
- Azure Dedicated SQL Pools
- Azure Key Vault
- Data Modelling
- Python
- SQL
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential
Responsibilities:
- Understand and analyze ETL requirements and identify solution elements for enhancement requests and change requests.
- Predominantly worked on Azure Synapse Analytics to design end-to-end pipelines and Spark applications in notebooks, and loaded the data into dedicated SQL pools.
- Built fault-tolerant, scalable, and complex pipelines using Synapse pipelines.
- Implemented Change Data Capture (CDC) logic for incremental data loads using Azure Synapse pipeline features (see the sketch after this list).
- Created many PySpark and SparkSQL scripts in Synapse notebooks to perform data transformations per the business requirements.
- Tuned the Spark applications in Synapse notebooks, improving overall performance to 5x that of the original jobs.
- Implemented the data warehousing solution in Azure Synapse Analytics.
- Built complex ETL/ELT pipelines for data processing in the Azure cloud using Azure Data Factory V2 and Azure Synapse dedicated SQL pools.
- Delivered a complete end-to-end migration project from on-premises to the Azure cloud by developing an ELT processing framework using Azure Synapse Analytics.
- Designed and developed Change Data Capture (CDC) logic to detect and implement Incremental and Historical data loads for regular data processing.
- Recently started testing Azure's native CDC feature (in public preview) to detect and capture changes in our relational databases.
- Built pipelines to move hashed and un-hashed data from Azure Blob Storage to the data lake.
- Traced back the ETL/ELT mappings for data flows in Azure Data Factory pipelines and improved the overall performance of data processing.
- Created stored procedures to load the data into fact and dimension tables in Azure SQL Data Warehouse.
- Created complex queries based on the business requirements using PySpark/SparkSQL in Azure Synapse Spark pools.
- Served as the primary point of contact for production support during the implementation of the new framework using Azure Data Factory and Azure Databricks.
- Migrated on-premises data (SQL Server) to Azure Data Lake Storage (ADLS Gen2) using Azure Data Factory (ADF V2) and Azure Synapse pipelines.
- Created clustered and non-clustered indexes on the fact and dimension tables in the data warehouse for faster data retrieval by complex analytical and reporting queries.
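As a companion to the CDC bullet above, here is a hedged PySpark sketch of the watermark-driven incremental load pattern; the control table (etl_watermark), storage paths, and column names are assumptions for illustration only.

```python
# Hedged sketch of a watermark-driven incremental (CDC-style) load in PySpark.
# etl_watermark, the storage paths, and the column names are illustrative
# assumptions, not the project's actual objects.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Look up the last successful watermark for this source from a control table.
last_wm = (spark.table("etl_watermark")
                .filter(F.col("source_name") == "sales_orders")
                .agg(F.max("watermark_value"))
                .collect()[0][0])

# 2. Read only the rows changed since the previous run.
incremental = (spark.read
                    .parquet("abfss://raw@datalake.dfs.core.windows.net/sales_orders/")
                    .filter(F.col("modified_date") > F.lit(last_wm)))

# 3. Append the incremental slice to the curated zone for the downstream
#    load into the dedicated SQL pool.
(incremental.write.format("delta")
            .mode("append")
            .save("abfss://curated@datalake.dfs.core.windows.net/sales_orders/"))

# 4. Advance the watermark so the next run starts where this one finished
#    (assumes etl_watermark is a Delta table, which supports UPDATE).
new_wm = incremental.agg(F.max("modified_date")).collect()[0][0]
if new_wm is not None:
    spark.sql(
        f"UPDATE etl_watermark SET watermark_value = '{new_wm}' "
        f"WHERE source_name = 'sales_orders'"
    )
```

The same watermark lookup can equally be driven from an ADF or Synapse pipeline Lookup activity; the notebook form is shown here only for brevity.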
Sr. Data Engineer
Confidential
Responsibilities:
- The team's mandate was to develop an application/reporting system to track and report various aspects of the HSE measures required for audit and safety purposes.
- Ingested data from on-premises databases to the cloud and processed the data in Azure Databricks.
- Imported data from file-based systems and relational databases into Azure Data Lake Storage in standard file formats such as Apache Parquet using Azure Data Factory and Azure Databricks.
- Hands-on coding: wrote and tested the code for the ingest automation process covering full and incremental loads (see the sketch after this list).
- Designed the solution and developed the data ingestion scripts using PySpark and SparkSQL in Azure Databricks, orchestrating them with Data Factory pipelines.
- Implemented a control flow architecture to manage the data extraction from various sources based on the business requirements.
- Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using PySpark.
- Enabled monitoring and Azure Log Analytics to alert the support team on usage and statistics of the daily runs.
- Designed and developed Azure Logic Apps to trigger emails whenever a failure occurs in the Azure Data Factory pipelines.
- Took proof-of-concept project ideas from the business and developed production pipelines that deliver business value using Azure Data Factory.
- Exposed transformed data on the Azure Databricks platform in Apache Parquet and Delta file formats for efficient data storage.
- Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
- Developed Spark applications using SparkSQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
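To make the full/incremental ingest bullet above concrete, the sketch below shows one way such a parameterised Databricks ingestion notebook could look; the widget names, secret scope, JDBC details, and paths are hypothetical placeholders, not the project's actual configuration.

```python
# Illustrative sketch of a parameterised ingest notebook on Databricks, where
# dbutils is predefined. Widget names, secret scope, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

load_type = dbutils.widgets.get("load_type")          # "full" or "incremental"
table_name = dbutils.widgets.get("table_name")
target = f"abfss://raw@datalake.dfs.core.windows.net/{table_name}/"

# JDBC connection string kept in a secret scope backed by Azure Key Vault.
jdbc_url = dbutils.secrets.get("ingest-scope", "sqlserver-jdbc-url")

reader = (spark.read.format("jdbc")
               .option("url", jdbc_url)
               .option("dbtable", table_name))

if load_type == "full":
    # Full load: land a complete snapshot, overwriting the previous one.
    reader.load().write.mode("overwrite").parquet(target)
else:
    # Incremental load: push the date filter down to the source and append.
    cutoff = dbutils.widgets.get("last_run_date")
    pushdown = f"(SELECT * FROM {table_name} WHERE modified_date > '{cutoff}') src"
    reader.option("dbtable", pushdown).load().write.mode("append").parquet(target)
```

Pushing the incremental filter into the JDBC query keeps only the changed rows moving over the network, which is what makes the incremental path cheaper than a full snapshot.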
Data Modeller/Engineer
Confidential
Responsibilities:
- Worked as a Data Modeler/Engineer to generate Data Models using SQL Server and developed an OLAP system.
- Worked on Azure SQL Data Warehouse (now Azure Synapse dedicated SQL pools) to design the data warehouse hosting the required fact and dimension tables.
- Designed the SQL data warehouse following the industry best practices of the time, creating clustered and non-clustered indexes on the tables for data retrieval.
- Worked with Azure Data Factory V1 (since retired) and tested Data Factory V2 (then in public preview) to load data from sources such as on-premises SQL Server, files in Azure Blob Storage, and Azure Data Lake Storage Gen1 into the fact and dimension tables in the SQL Data Warehouse.
- Implemented a PolyBase mechanism to load data from file-based sources into the data warehouse tables, which is far more efficient than a normal bulk load.
- Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
- Involved in manipulating data to fulfill analytical and segmentation requests.
- Wrote complex SQL queries for data analysis to meet business requirements and created views on the base tables for safer access by reporting consumers.
- Used data visualization tools and techniques to share data effectively with business partners.
- Designed and implemented a data lake to consolidate data from multiple sources using Hadoop-stack technologies such as Sqoop and Hive.
- Reviewed code and system interfaces and extracts to handle the migration of data between systems/databases.
- Developed a Conceptual model using Erwin based on requirements analysis.
- Involved in designing ETL mapping documents, project architecture, and data lineage documents.
- Created Azure Data Factory (ADF) pipelines using PolyBase and Azure Blob Storage.
Data Analyst
Confidential
Responsibilities:
- Responsible for data extraction and exploratory data analysis to understand the "why" behind the data.
- Optimized data collection procedures and generated reports on a regular basis.
- Utilized SPSS statistical software to track and analyze data.
- Developed Data Flow Diagrams (DFDs), Entity Relationship Diagrams (ERDs), and web page mock-ups using MS.
- Used Microsoft Access to create forms, queries, reports, and modules.
- Extensively used SQL for accessing and manipulating database systems.
- Created SAS datasets, performed data manipulation, and developed data marts for the preparation of reports, tables, listings, and graphs.
- Used advanced Excel functions to generate spreadsheets and pivot tables.
- Designed and built statistical analysis models on large data sets.
- Worked in Data management performing Data analysis, gap analysis and data mapping.
- Supported revenue management using statistical and quantitative analysis, developed several statistical approaches and optimization models.
- Elicited requirements for agile projects by documenting User stories, Product Backlogs and Acceptance Criteria.
- Designed UML diagrams such as use cases, activity diagrams, sequence diagrams, and data flow diagrams.