Azure Data Engineer Resume
SUMMARY
- Over 9 years of experience as a Data Engineer, extensively designing, developing, and implementing Big Data applications on Microsoft Azure Cloud and AWS with big data technologies such as Apache Hive, Apache Spark, PySpark, and SparkSQL.
- 2X Microsoft Certified Azure Data Engineer.
- Developed ETL/ELT pipelines using big data and cloud technologies such as Apache Hive, Apache Spark, Azure Data Factory, and Azure Databricks.
- Designed fault-tolerant, scalable ELT pipelines and resolved many complex data pipeline issues.
- Worked with various big data file formats such as Apache Parquet, CSV, AVRO, and JSON while developing big data applications using Apache Hive and Apache Spark.
- Worked with Azure Synapse Analytics to develop end-to-end ETL/ELT applications.
- Optimized many Spark applications by applying performance-tuning techniques in Databricks, improving the performance of the overall pipelines.
- Implemented Slowly Changing Dimension Type 3 (SCD Type 3) logic on Delta tables in Databricks (see the sketch after this list).
- Created stored procedures to perform full and incremental loads from file-based storage systems into data warehouse tables.
- Developed data cleaning, data validation, and reconciliation scripts to validate the data before and after processing.
- Developed reusable/generic pipelines that can be used for multiple data products/LOBs.
- Designed and developed a generic notebook to validate the data between source and target across multiple stages of data processing.
- Created Databricks notebooks with Delta-format tables and implemented a lakehouse architecture.
- Created project architecture and data lineage documents for every task in the project.
- Worked with Visual Studio and Azure DevOps Repos for code commits and code migrations across dev, test, and prod environments.
- Created generic notebooks for a common set of activities to reduce code redundancy and improve code reusability.
- Worked with AWS Glue, the AWS Glue Data Catalog, Amazon Redshift, and Redshift Spectrum for developing and orchestrating ETL/ELT applications using PySpark in AWS Glue.
- Implemented audit logs for job execution in AWS Glue.
- Implemented a control flow architecture for developing secure, end-to-end big data applications using ADF V2, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage, Azure SQL DB, and Azure Key Vault.
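To illustrate the SCD Type 3 work mentioned above, here is a minimal PySpark sketch of a Type 3 merge into a Delta table on Databricks; the table and column names (dim_customer, city, previous_city) are illustrative placeholders, not the actual project schema.

```python
# Minimal SCD Type 3 sketch on a Delta table, assuming a Databricks runtime
# where the delta package is available. Table and column names are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

dim = DeltaTable.forName(spark, "dim_customer")                 # target dimension
updates = spark.read.format("delta").load("/mnt/curated/customer_updates")

(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    # Type 3 keeps one level of history: shift the old value into
    # previous_city before overwriting city with the incoming value.
    .whenMatchedUpdate(
        condition="t.city <> s.city",
        set={
            "previous_city": F.col("t.city"),
            "city": F.col("s.city"),
            "last_updated": F.current_timestamp(),
        })
    .whenNotMatchedInsert(
        values={
            "customer_id": F.col("s.customer_id"),
            "city": F.col("s.city"),
            "previous_city": F.lit(None),
            "last_updated": F.current_timestamp(),
        })
    .execute())
```

Running a merge like this after each ingest keeps both the current and the immediately previous value on every dimension row, which is the essence of Type 3 history.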
TECHNICAL SKILLS
- Azure Data Lake
- Azure Data Factory
- Azure Databricks
- Azure Synapse Analytics
- Azure SQL DB
- Apache Spark
- PySpark
- SparkSQL
- Apache Hive
- AWS Glue & Data Catalog
- AWS S3 Storage
- Azure Dedicated SQL Pools
- Azure Key Vault
- Data Modelling
- Python
- SQL
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential
Responsibilities:
- Understand and analyze ETL requirements and identify solution elements for enhancement requests and change requests.
- Predominantly worked on Azure Synapse Analytics to design end-to-end pipelines and Spark applications in notebooks, and loaded the data into dedicated SQL pools.
- Built fault-tolerant, scalable, and complex pipelines using Synapse pipelines.
- Implemented Change Data Capture (CDC) logic for incremental data loads using Azure Synapse pipeline features (see the sketch after this list).
- Created many PySpark and SparkSQL scripts in Synapse notebooks to perform data transformations per the business requirements.
- Tuned the Spark applications in Synapse notebooks, improving overall performance to 5x that of the original jobs.
- Implemented the data warehousing solution in Azure Synapse Analytics.
- Built complex ETL/ELT pipelines for data processing in the Azure cloud using Azure Data Factory V2 and Azure Synapse dedicated SQL pools.
- Delivered a complete end-to-end migration project from on-premises to the Azure cloud by developing an ELT processing framework using Azure Synapse Analytics.
- Designed and developed Change Data Capture (CDC) logic to detect and implement Incremental and Historical data loads for regular data processing.
- Recently started testing Azure's native CDC feature (in public preview) to detect and capture changes in our relational databases.
- Built pipelines to move hashed and un-hashed data from Azure Blob Storage to the data lake.
- Traced back the ETL/ELT mappings for data flows in Azure Data Factory pipelines and improved the overall performance of data processing.
- Created stored procedures to load the data into fact and dimension tables in Azure SQL Data Warehouse.
- Created complex queries based on the business requirements using PySpark/SparkSQL in Azure Synapse Spark pools.
- Served as the primary point of contact for production support during the implementation of the new framework using Azure Data Factory and Azure Databricks.
- Migrated on-premises data (SQL Server) to Azure Data Lake Storage (ADLS Gen2) using Azure Data Factory (ADF V2) and Azure Synapse pipelines.
- Created clustered and non-clustered indexes on the fact and dimension tables in the data warehouse for faster data retrieval by complex analytical and reporting queries.
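As a companion to the CDC bullet above, here is a hedged PySpark sketch of the watermark-driven incremental load pattern; the control table (etl_watermark), storage paths, and column names are assumptions for illustration only.

```python
# Hedged sketch of a watermark-driven incremental (CDC-style) load in PySpark.
# etl_watermark, the storage paths, and the column names are illustrative
# assumptions, not the project's actual objects.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Look up the last successful watermark for this source from a control table.
last_wm = (spark.table("etl_watermark")
                .filter(F.col("source_name") == "sales_orders")
                .agg(F.max("watermark_value"))
                .collect()[0][0])

# 2. Read only the rows changed since the previous run.
incremental = (spark.read
                    .parquet("abfss://raw@datalake.dfs.core.windows.net/sales_orders/")
                    .filter(F.col("modified_date") > F.lit(last_wm)))

# 3. Append the incremental slice to the curated zone for the downstream
#    load into the dedicated SQL pool.
(incremental.write.format("delta")
            .mode("append")
            .save("abfss://curated@datalake.dfs.core.windows.net/sales_orders/"))

# 4. Advance the watermark so the next run starts where this one finished
#    (assumes etl_watermark is a Delta table, which supports UPDATE).
new_wm = incremental.agg(F.max("modified_date")).collect()[0][0]
if new_wm is not None:
    spark.sql(
        f"UPDATE etl_watermark SET watermark_value = '{new_wm}' "
        f"WHERE source_name = 'sales_orders'"
    )
```

The same watermark lookup can equally be driven from an ADF or Synapse pipeline Lookup activity; the notebook form is shown here only for brevity.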
Sr. Data Engineer
Confidential
Responsibilities:
- The team's mandate was to develop an application/reporting system to track and report various aspects of the HSE measures required for audit and safety purposes.
- Ingested data from on-premises databases to the cloud and processed the data in Azure Databricks.
- Imported data from file-based systems and relational databases into Azure Data Lake Storage in standard file formats such as Apache Parquet using Azure Data Factory and Azure Databricks.
- Hands-on coding: wrote and tested the code for the ingest automation process covering full and incremental loads (see the sketch after this list).
- Designed the solution and developed the data ingestion scripts using PySpark and SparkSQL in Azure Databricks, orchestrating them with Data Factory pipelines.
- Implemented a control flow architecture to manage the data extraction from various sources based on the business requirements.
- Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using PySpark.
- Enabled monitoring and Azure Log Analytics to alert the support team on usage and statistics of the daily runs.
- Designed and developed Azure Logic Apps to trigger emails whenever a failure occurs in the Azure Data Factory pipelines.
- Took proof-of-concept project ideas from the business and developed production pipelines that deliver business value using Azure Data Factory.
- Exposed transformed data on the Azure Databricks platform in Apache Parquet and Delta file formats for efficient data storage.
- Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
- Developed Spark applications using SparkSQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
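To make the full/incremental ingest bullet above concrete, the sketch below shows one way such a parameterised Databricks ingestion notebook could look; the widget names, secret scope, JDBC details, and paths are hypothetical placeholders, not the project's actual configuration.

```python
# Illustrative sketch of a parameterised ingest notebook on Databricks, where
# dbutils is predefined. Widget names, secret scope, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

load_type = dbutils.widgets.get("load_type")          # "full" or "incremental"
table_name = dbutils.widgets.get("table_name")
target = f"abfss://raw@datalake.dfs.core.windows.net/{table_name}/"

# JDBC connection string kept in a secret scope backed by Azure Key Vault.
jdbc_url = dbutils.secrets.get("ingest-scope", "sqlserver-jdbc-url")

reader = (spark.read.format("jdbc")
               .option("url", jdbc_url)
               .option("dbtable", table_name))

if load_type == "full":
    # Full load: land a complete snapshot, overwriting the previous one.
    reader.load().write.mode("overwrite").parquet(target)
else:
    # Incremental load: push the date filter down to the source and append.
    cutoff = dbutils.widgets.get("last_run_date")
    pushdown = f"(SELECT * FROM {table_name} WHERE modified_date > '{cutoff}') src"
    reader.option("dbtable", pushdown).load().write.mode("append").parquet(target)
```

Pushing the incremental filter into the JDBC query keeps only the changed rows moving over the network, which is what makes the incremental path cheaper than a full snapshot.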
Data Modeller/Engineer
Confidential
Responsibilities:
- Worked as a Data Modeler/Engineer to generate Data Models using SQL Server and developed an OLAP system.
- Worked on Azure SQL Data Warehouse (now Azure Synapse dedicated SQL pools) to design the data warehouse hosting the required fact and dimension tables.
- Designed the SQL data warehouse following the industry best practices of the time, creating clustered and non-clustered indexes on the tables for data retrieval.
- Worked with Azure Data Factory V1 (since retired) and tested Data Factory V2 (then in public preview) to load data from sources such as on-premises SQL Server, files in Azure Blob Storage, and Azure Data Lake Storage Gen1 into the fact and dimension tables in the SQL Data Warehouse.
- Implemented a PolyBase mechanism to load data from file-based sources into the data warehouse tables, which is far more efficient than a normal bulk load.
- Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
- Involved in manipulating data to fulfill analytical and segmentation requests.
- Wrote complex SQL queries for data analysis to meet business requirements and created views on the base tables for safer access by reporting consumers.
- Used data visualization tools and techniques to share data effectively with business partners.
- Designed and implemented a data lake to consolidate data from multiple sources using Hadoop-stack technologies such as Sqoop and Hive.
- Reviewed code and system interfaces and extracts to handle the migration of data between systems/databases.
- Developed a Conceptual model using Erwin based on requirements analysis.
- Involved in designing ETL mapping documents, project architecture, and data lineage documents.
- Created Azure Data Factory (ADF) pipelines using PolyBase and Azure Blob Storage.
Data Analyst
Confidential
Responsibilities:
- Responsible for data extraction and exploratory data analysis to understand the "why" behind the data.
- Optimized data collection procedures and generated reports on a regular basis.
- Utilized SPSS statistical software to track and analyze data.
- Developed Data Flow Diagrams (DFDs), Entity Relationship Diagrams (ERDs), and web page mock-ups using MS.
- Used Microsoft Access to create forms, queries, reports, and modules.
- Extensively used SQL for accessing and manipulating database systems.
- Created SAS datasets, performed data manipulation, and developed data marts for the preparation of reports, tables, listings, and graphs.
- Used advanced Excel functions to generate spreadsheets and pivot tables.
- Designed and built statistical analysis models on large data sets.
- Worked in Data management performing Data analysis, gap analysis and data mapping.
- Supported revenue management using statistical and quantitative analysis, developed several statistical approaches and optimization models.
- Elicited requirements for agile projects by documenting User stories, Product Backlogs and Acceptance Criteria.
- Designed UML diagrams such as use cases, activity diagrams, sequence diagrams, and data flow diagrams.