
Azure Data Engineer Resume


SUMMARY

  • Over 9 years of experience as a Data Engineer designing, developing, and implementing big data applications on Microsoft Azure and AWS using technologies such as Apache Hive, Apache Spark, PySpark, and Spark SQL.
  • 2X Microsoft Certified Azure Data Engineer.
  • Developed ETL/ELT pipelines using big data and cloud technologies such as Apache Hive, Apache Spark, Azure Data Factory, and Azure Databricks.
  • Diagnosed and resolved complex data pipeline issues while building fault-tolerant, scalable ELT pipelines.
  • Worked with various big data file formats such as Apache Parquet, CSV, Avro, and JSON while developing big data applications using Apache Hive and Apache Spark.
  • Worked with Azure Synapse Analytics to develop end-to-end ETL/ELT applications.
  • Optimized Spark application code to improve overall pipeline performance by applying performance tuning techniques in Databricks.
  • Implemented Slowly Changing Dimension Type 3 (SCD Type 3) logic on Delta tables in Databricks (see the merge sketch after this summary).
  • Created stored procedures to perform full and incremental loads from file-based storage systems into data warehouse tables.
  • Developed data cleaning, data validation, and reconciliation scripts to validate the data before and after processing.
  • Developed reusable/generic pipelines that can serve multiple data products and lines of business (LOBs).
  • Designed and developed a generic notebook to validate the data between source and target across multiple stages of processing (see the validation sketch after this summary).
  • Created Databricks notebooks with Delta-format tables and implemented a lakehouse architecture.
  • Created project architecture and data lineage documents for every task in the project.
  • Worked with Visual Studio and Azure DevOps Repos to commit code and migrate it across dev, test, and prod environments.
  • Created generic notebooks for a common set of activities to reduce code redundancy and improve reusability.
  • Worked with AWS Glue, the AWS Glue Data Catalog, Amazon Redshift, and Redshift Spectrum to develop and orchestrate ETL/ELT applications using PySpark in AWS Glue.
  • Implemented audit logs for job execution in AWS Glue.
  • Implemented a control flow architecture for developing a secure, end-to-end big data application using ADF V2, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage, Azure SQL DB, and Azure Key Vault.
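
A minimal PySpark sketch of the SCD Type 3 merge pattern referenced above, assuming a hypothetical dim_customer Delta table that keeps one level of history in current_address/previous_address columns; the table, column, and path names are illustrative placeholders, not the actual project schema.

```python
# Hedged sketch: SCD Type 3 merge on a Delta table in Databricks (PySpark).
# dim_customer, customer_id, the address columns, and the staging path are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates_df = spark.read.parquet("/mnt/staging/customer_updates/")
dim = DeltaTable.forName(spark, "dim_customer")

(dim.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    # Type 3 keeps limited history in-row: shift the current value into the
    # 'previous' column instead of adding a new row (as SCD Type 2 would).
    .whenMatchedUpdate(
        condition="t.current_address <> s.address",
        set={
            "previous_address": "t.current_address",
            "current_address": "s.address",
            "last_updated": "current_timestamp()",
        },
    )
    .whenNotMatchedInsert(
        values={
            "customer_id": "s.customer_id",
            "current_address": "s.address",
            "previous_address": "NULL",
            "last_updated": "current_timestamp()",
        },
    )
    .execute())
```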
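
A minimal sketch of the generic source-to-target validation notebook mentioned above, assuming both layers are readable as Spark DataFrames; the paths are placeholders, and the hash-based comparison is a lightweight sanity check rather than a full reconciliation.

```python
# Hedged sketch: generic source-vs-target validation (row counts plus a content
# fingerprint computed over the columns common to both sides). Paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

source_df = spark.read.parquet("/mnt/raw/orders/")
target_df = spark.read.format("delta").load("/mnt/curated/orders/")

common_cols = sorted(set(source_df.columns) & set(target_df.columns))

def fingerprint(df):
    # Hash each row over the shared columns, then sum the hashes so row order
    # does not matter; collisions are possible, so treat this as a sanity check.
    return (df.select(F.hash(*[F.col(c) for c in common_cols]).alias("h"))
              .agg(F.sum("h").alias("fp"))
              .collect()[0]["fp"])

result = {
    "source_rows": source_df.count(),
    "target_rows": target_df.count(),
    "fingerprints_match": fingerprint(source_df) == fingerprint(target_df),
}
result["counts_match"] = result["source_rows"] == result["target_rows"]
print(result)
```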

TECHNICAL SKILLS

  • Azure Data Lake
  • Azure Data Factory
  • Azure Databricks
  • Azure Synapse Analytics
  • Azure SQL DB
  • Apache Spark
  • PySpark
  • SparkSQL
  • Apache Hive
  • AWS Glue & Data Catalog
  • AWS S3 Storage
  • Azure Dedicated SQL Pools
  • Azure Key Vault
  • Data Modelling
  • Python
  • SQL

PROFESSIONAL EXPERIENCE

Azure Data Engineer

Confidential

Responsibilities:

  • Understood and analyzed ETL requirements and identified solution elements for enhancement and change requests.
  • Predominantly worked on Azure Synapse Analytics to design end-to-end pipelines and notebook-based Spark applications, and loaded the data into dedicated SQL pools.
  • Built fault-tolerant, scalable, and complex pipelines using Synapse pipelines.
  • Implemented CDC (Change Data Capture) logic for incremental data loads using Azure Synapse pipeline features.
  • Created many PySpark and Spark SQL scripts in Synapse notebooks to perform data transformations per the given business requirements.
  • Tuned Spark applications in Synapse notebooks, improving overall performance roughly five-fold over the original jobs (see the tuning sketch after this list).
  • Implemented the data warehousing solution in Azure Synapse Analytics.
  • Built complex ETL/ELT pipelines for data processing in the Azure cloud using Azure Data Factory V2 and Azure Synapse dedicated SQL pools.
  • Delivered an end-to-end migration project from on-premises systems to the Azure cloud by developing an ELT processing framework on Azure Synapse Analytics.
  • Designed and developed Change Data Capture (CDC) logic to detect changes and drive incremental and historical data loads for regular processing (see the watermark sketch after this list).
  • Recently began testing Azure's native CDC capability (in public preview) to detect and capture changes in our relational databases.
  • Built pipelines to move hashed and unhashed data from Azure Blob Storage to the data lake.
  • Traced ETL/ELT mappings back through Azure Data Factory data flows and improved the overall performance of data processing.
  • Created stored procedures to load data into fact and dimension tables in Azure SQL Data Warehouse.
  • Created complex queries based on business requirements using PySpark/Spark SQL in Azure Synapse Spark pools.
  • Designated as the primary point of contact for production support of the new framework built with Azure Data Factory and Azure Databricks.
  • Migrated on-premises data (SQL Server) to Azure Data Lake Storage (ADLS Gen2) using Azure Data Factory (ADF V2) and Azure Synapse pipelines.
  • Created Clustered and non-clustered indexes on the fact and dimension tables in the data warehouse for faster retrieval of data for complex analytical and reporting queries.
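
A minimal sketch of the watermark-driven incremental-load pattern behind the CDC bullets above; the control table (etl_watermarks), staging table, modified_date column, and output path are hypothetical placeholders rather than the actual project objects.

```python
# Hedged sketch: watermark-based incremental load, one common way to implement
# the CDC-style logic described above. All table/column/path names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# 1) Read the last successful watermark for this source from a small control table.
last_wm = (spark.table("etl_watermarks")
                .where(F.col("source_name") == "sales_orders")
                .agg(F.max("watermark_value"))
                .collect()[0][0])

# 2) Pull only the rows modified after the watermark from the staging copy of the source.
incremental_df = (spark.table("staging.sales_orders")
                       .where(F.col("modified_date") > F.lit(last_wm)))

# 3) Append the delta to the curated zone (a MERGE would be used for upserts instead).
(incremental_df.write.format("delta")
               .mode("append")
               .save("/mnt/curated/sales_orders/"))

# 4) Advance the watermark to the latest modified_date just processed.
new_wm = incremental_df.agg(F.max("modified_date")).collect()[0][0]
if new_wm is not None:
    (spark.createDataFrame([("sales_orders", new_wm)],
                           ["source_name", "watermark_value"])
          .write.format("delta").mode("append").saveAsTable("etl_watermarks"))
```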
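
A short sketch of the kind of tuning levers typically behind the improvement mentioned above (broadcast joins, partition pruning, selective caching, controlled output partitioning); the table names and paths are placeholders, and the right combination always depends on the workload.

```python
# Hedged sketch: common Spark tuning levers of the kind applied above.
# Table/path names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

facts = spark.read.format("delta").load("/mnt/curated/fact_sales/")
dims = spark.read.format("delta").load("/mnt/curated/dim_product/")

# 1) Broadcast the small dimension table to avoid a shuffle-heavy sort-merge join.
joined = facts.join(F.broadcast(dims), "product_id")

# 2) Filter on the partition column early so partition pruning limits the scan.
recent = joined.where(F.col("sale_date") >= "2023-01-01")

# 3) Cache only when the same intermediate result feeds several actions,
#    as it does here (a validation count plus the write below).
recent.cache()
row_count = recent.count()

# 4) Control output parallelism and file layout instead of relying on defaults.
(recent.repartition("sale_date")
       .write.format("delta")
       .mode("overwrite")
       .partitionBy("sale_date")
       .save("/mnt/serving/fact_sales_recent/"))
```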

Sr. Data Engineer

Confidential

Responsibilities:

  • My team's charter was to develop an application/reporting system to track and report various aspects of HSE measures required for audits and safety compliance.
  • Ingested data from on-premises databases to the cloud and processed it in Azure Databricks.
  • Imported data from file-based systems and relational databases into Azure Data Lake Storage in standard file formats such as Apache Parquet using Azure Data Factory and Azure Databricks (see the ingestion sketch after this list).
  • Hands-on coding: wrote and tested the code for the ingestion automation process, covering full and incremental loads.
  • Designed the solution and developed data ingestion scripts using PySpark and Spark SQL in Azure Databricks, orchestrated through Data Factory pipelines.
  • Implemented a control flow architecture to manage the data extraction from various sources based on the business requirements.
  • Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using PySpark.
  • Enabled monitoring and Azure Log Analytics to alert the support team on usage and statistics of the daily runs.
  • Designed and developed Azure Logic Apps to trigger emails whenever a pipeline failure occurs in the Azure Data Factory pipelines.
  • Took proof-of-concept ideas from the business and developed production pipelines that deliver business value using Azure Data Factory.
  • Wrote transformed data from Azure Databricks to Apache Parquet and Delta file formats for efficient data storage.
  • Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
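
A minimal sketch of the Databricks-notebook ingestion path described in the bullets above (relational source to the data lake as Parquet, then Delta), assuming a SQL Server JDBC source and a Databricks secret scope; the server, storage account, secret scope, and table names are placeholders.

```python
# Hedged sketch: ingest a relational table into ADLS as Parquet and Delta from a
# Databricks notebook (dbutils is available there). All names in angle brackets
# and the secret scope/keys are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"

source_df = (spark.read.format("jdbc")
                  .option("url", jdbc_url)
                  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
                  .option("dbtable", "dbo.Orders")
                  .option("user", dbutils.secrets.get("ingest-scope", "sql-user"))
                  .option("password", dbutils.secrets.get("ingest-scope", "sql-password"))
                  .load())

# Land the raw copy as Parquet, then promote it to a Delta table for downstream use.
(source_df.write.mode("overwrite")
          .parquet("abfss://raw@<storage>.dfs.core.windows.net/orders/"))

(source_df.write.format("delta")
          .mode("overwrite")
          .save("abfss://curated@<storage>.dfs.core.windows.net/orders/"))
```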

Data Modeller/Engineer

Confidential

Responsibilities:

  • Worked as a Data Modeler/Engineer to generate Data Models using SQL Server and developed an OLAP system.
  • Worked on Azure SQL Data Warehouse (now Azure Synapse dedicated SQL pools) to design the data warehouse hosting the required fact and dimension tables.
  • Designed the SQL data warehouse following the industry best practices of the time, creating clustered and non-clustered indexes on the tables for faster data retrieval.
  • Worked with Azure Data Factory V1 (since retired) and tested Data Factory V2 (then in public preview) to load data from sources such as on-premises SQL Server, files in Azure Blob Storage, and Azure Data Lake Storage Gen1 into the fact and dimension tables in the SQL data warehouse.
  • Implemented PolyBase to load data from file-based sources into the data warehouse tables, which is far more efficient than a standard bulk load (see the PolyBase sketch after this list).
  • Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
  • Involved in manipulating data to fulfill analytical and segmentation requests.
  • Wrote complex SQL queries for data analysis to meet business requirements and created views over the base tables for safer access in reporting.
  • Used data visualization tools and techniques to share data effectively with business partners.
  • Designed and implemented a data lake to consolidate data from multiple sources, using Hadoop-stack technologies such as Sqoop and Hive.
  • Reviewed code and system interfaces and extracts to handle the migration of data between systems/databases.
  • Developed a Conceptual model using Erwin based on requirements analysis.
  • Involved in designing ETL mapping documents, project architecture, and data lineage documents.
  • Created Azure Data Factory (ADF) pipelines using PolyBase and Azure Blob Storage.
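
A minimal sketch of a PolyBase-style external-table load of the kind referenced above, issued here from Python via pyodbc; the external data source (AzureBlobSource), staging and fact table names, columns, and connection string are hypothetical placeholders.

```python
# Hedged sketch: PolyBase load into the warehouse via an external table over Parquet
# files. The data source, schemas, columns, and connection string are placeholders;
# the external data source and its storage credential are assumed to exist already.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<warehouse>.database.windows.net;Database=<dw>;Uid=<user>;Pwd=<password>"
)
conn.autocommit = True  # keep external-table DDL out of an explicit transaction
cursor = conn.cursor()

statements = [
    "CREATE EXTERNAL FILE FORMAT ParquetFileFormat WITH (FORMAT_TYPE = PARQUET)",
    """
    CREATE EXTERNAL TABLE ext.StageSales (
        SaleId     INT,
        ProductKey INT,
        Amount     DECIMAL(18, 2)
    )
    WITH (
        LOCATION    = '/sales/',
        DATA_SOURCE = AzureBlobSource,
        FILE_FORMAT = ParquetFileFormat
    )
    """,
    # PolyBase reads the files in parallel, which is why it beats a row-by-row bulk load.
    """
    INSERT INTO dbo.FactSales (SaleId, ProductKey, Amount)
    SELECT SaleId, ProductKey, Amount FROM ext.StageSales
    """,
]

for sql in statements:
    cursor.execute(sql)
```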

Data Analyst

Confidential

Responsibilities:

  • Responsible for data extraction and exploratory data analysis to understand the "why" behind the data.
  • Optimized data collection procedures and generated reports on a regular basis.
  • Utilized Microsoft SPSS statistical software to track and analyze data.
  • Developed Data Flow Diagrams (DFDs), Entity Relationship Diagrams (ERDs), and web page mock-ups using MS.
  • Used Microsoft Access to create forms, queries, reports, and modules.
  • Extensively used SQL for accessing and manipulating database systems.
  • Created SAS datasets, performed data manipulation, and developed data marts to prepare reports, tables, listings, and graphs.
  • Used advanced Excel functions to generate spreadsheets and pivot tables.
  • Designed and built statistical analysis models on large data sets.
  • Worked in Data management performing Data analysis, gap analysis and data mapping.
  • Supported revenue management using statistical and quantitative analysis, developed several statistical approaches and optimization models.
  • Elicited requirements for agile projects by documenting User stories, Product Backlogs and Acceptance Criteria.
  • Designed UML Diagrams such as Use Cases, Activity Diagrams, Sequence Diagrams and data flow diagrams.
