Azure Data Engineer Resume
Bothell, WA
SUMMARY
- Experienced IT professional with a background in design, analysis, development, testing and deployment.
- 8+ years of experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage Gen 1 & Gen 2, Azure Synapse Analytics, Azure SQL Database, Azure Event Grid, Azure EventHub, Big Data Technologies (Hive and Spark) and Azure Databricks.
- Strong skills in writing SQL queries, shell scripts and Python scripts.
- Experienced in data pre-processing and cleaning to support feature engineering, including imputation of missing values in datasets using Python.
- Worked on Data Warehouse design, implementation, and support (SQL Server, Azure SQL DB, Azure SQL Data warehouse).
- Proficient in developing strategies for Extract, Transform, Load (ETL) and ELT mechanisms.
- Hands-on experience developing Logic Apps workflows for event-based data movement, file operations on Data Lake, Blob Storage and SFTP/FTP servers, and retrieving/manipulating data in Azure SQL Server.
- Developed Spark applications using PySpark, Spark SQL and DataFrames against multiple file formats to analyse and transform data and uncover insights into customer usage patterns (see the sketch following this summary).
- Experience in developing batch, real-time and near real-time solutions on Azure Cloud using Data Factory, Event Grid, Databricks & Storage.
- Experience in implementing Lakehouse solutions on Azure Cloud using Azure Data Lake and Databricks Delta Lake.
- Experience across the SDLC: requirement gathering, design, coding, code reviews, configuration control and testing.
- Good knowledge of star and snowflake schema dimensional models.
- Experience in creating database objects such as Tables, Constraints, Indexes, Views, Indexed Views, Stored Procedures, UDFs and Triggers on Microsoft SQL Server.
- Identify, design and implement process improvements by automating manual processes, optimizing data delivery and re-designing infrastructure for greater scalability.
- Expert in developing Data Factory pipelines that are parametrized and reusable.
- Experience in design and development of enterprise ETL methodologies and solutions: data migration, data transformation, data processing and business reporting using Teradata and SQL Server.
- Eager to learn, able to adapt quickly, well organized and very reliable.
- Strong analytical, interpersonal, communication, coordination, problem solving and decision-making skills.
- Hands-on experience with Tableau Desktop versions 7/8.2, Tableau Reader and Tableau Server.
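The following is a minimal, illustrative PySpark sketch of the DataFrame / Spark SQL transformation work described in this summary; the paths, column names and aggregation are hypothetical placeholders rather than details from any specific project.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

# Read raw usage events from two formats (paths are placeholders)
json_events = spark.read.json("/data/raw/events/json/")
csv_events = spark.read.option("header", True).csv("/data/raw/events/csv/")

# Align the schemas and combine the sources
events = (json_events.select("customer_id", "event_type", "event_ts")
          .unionByName(csv_events.select("customer_id", "event_type", "event_ts")))

# Aggregate with the DataFrame API ...
daily_usage = (events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(F.count("*").alias("event_count")))

# ... or equivalently with Spark SQL
events.createOrReplaceTempView("events")
daily_usage_sql = spark.sql("""
    SELECT customer_id, to_date(event_ts) AS event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY customer_id, to_date(event_ts)
""")

daily_usage.write.mode("overwrite").parquet("/data/curated/daily_usage/")
```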
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential, Bothell, WA
Responsibilities:
- Involved in gathering business requirements and in the design, development, testing and implementation of business rules.
- Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL DB and SQL Data warehouse.
- Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs and Service Bus queues.
- Created different types of triggers to automate the pipeline in ADF.
- Created numerous pipelines in ADF v2 to ingest data from different source databases using activities such as Copy (Move & Transform), Filter, ForEach and Databricks.
- Developed Databricks PySpark scripts using DataFrames and Spark SQL to transform and load data into different targets.
- Implemented Streaming and batch data processing solutions using Azure Databricks.
- Developed a Lakehouse architecture using Databricks Delta Lake (see the sketch at the end of this role).
- Optimized Databricks jobs using Spark UI and tuned long running stages or jobs using different optimization techniques.
- Used Python in Databricks to develop custom packages and frameworks that support data processing solutions.
- Developed Stored procedures, Tables with optimal data distribution and indexing in SQL Database and Synapse.
- Orchestrated a variety of solutions in Azure Data Factory by chaining the required activities, running on different compute platforms, to fulfil the desired requirements.
- Implemented self-hosted integration runtime in ADF.
- Experienced working with different source and target systems such as Azure Data Lake Gen1/Gen2.
- Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on the clusters.
- Created Linked services to connect the external resources to ADF.
- Worked on monitoring ADF pipelines and creating incidents based on severity.
- Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster.
- Worked with team members to resolve any technical issue, Troubleshooting, Project Risk & Issue identification, and management.
- Worked with Tableau Desktop (versions 7/8.2), Tableau Reader and Tableau Server.
- Produced attractive visuals/dashboards to convey the story inside the data.
- Involved in troubleshooting performance issues associated with Tableau reports.
Environment: Azure Data Factory (ADF v2), Azure Data Lake Gen2, Blob Storage, Azure SQL Database, Azure SQL Data Warehouse (Synapse), Azure Databricks, Azure DevOps, Visual Studio, Azure CLI, Event Hubs, Event Grid, Tableau.
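Illustrative sketch of the Databricks Delta Lake (Lakehouse) pattern referenced above: an incremental batch is upserted into a curated Delta table with a MERGE. The storage account, paths and table names are hypothetical, and the target table silver.orders is assumed to already exist.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Incremental batch landed in ADLS Gen2 (account/container/path are placeholders)
updates = (spark.read.format("parquet")
           .load("abfss://raw@examplestorage.dfs.core.windows.net/orders/incremental/")
           .withColumn("order_date", F.to_date("order_ts")))

# Upsert the batch into the curated Delta table (assumed to exist as silver.orders)
target = DeltaTable.forName(spark, "silver.orders")
(target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```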
Azure Data Engineer
Confidential, Indianapolis, IN
Responsibilities:
- Created and managed Databricks clusters, notebooks and jobs, and configured autoscaling.
- Created linked services to land data from different sources through Azure Data Factory.
- Built various jobs in Data Factory using event-based, scheduled and tumbling-window triggers.
- Implemented authentication mechanism using Azure Active Directory for data access and ADF.
- Developed Jupyter notebooks to join, filter, pre-aggregate and process files stored in Azure SQL DW for file validations and analytics in Databricks.
- Developed different process Workflows to Extract, Transform and Load raw data into HDFS and then process it to Hive tables.
- Implemented complex business logic through T-SQL stored procedures, functions, views and advanced query concepts.
- Ingested streaming data from Azure Event Hubs and processed it using Spark Structured Streaming (see the sketch at the end of this role).
- Ingested data from REST APIs using Python and shell scripts.
- Maintained conceptual, logical and physical data models along with the corresponding metadata.
- Used advanced SQL to embed stored procedures into ETL PySpark scripts.
- Developed event-driven data pipelines triggered by Azure Blob Storage events using Event Grid.
- Used Logic Apps to take decision-based actions within workflows.
- Created pipelines in Azure using ADF to get the data from different source systems and transform the data by using many activities.
- Worked on Mapping Data Flow activities in Azure Data Factory.
Environment: Azure Data Factory (ADF v2), Azure Data Lake Gen2, Blob Storage, Azure SQL Database, Azure SQL Data Warehouse (Synapse), Azure Databricks, Azure DevOps, Visual Studio, Azure CLI, Event Hubs, Event Grid.
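Illustrative sketch of the Event Hubs / Spark Structured Streaming ingestion referenced above, assuming the azure-eventhubs-spark connector is installed on the Databricks cluster; the secret scope, payload schema, checkpoint path and table name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Connection string kept in a Databricks secret scope (names are placeholders);
# dbutils is available implicitly in Databricks notebooks.
conn_str = dbutils.secrets.get(scope="example-scope", key="eventhub-connection-string")
eh_conf = {
    # The connector expects an encrypted connection string
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str)
}

# Hypothetical payload schema
schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = spark.readStream.format("eventhubs").options(**eh_conf).load()

# Event Hubs delivers the payload as binary; parse the JSON body
events = (raw
    .select(F.from_json(F.col("body").cast("string"), schema).alias("e"))
    .select("e.*"))

(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/telemetry")  # placeholder path
    .outputMode("append")
    .toTable("bronze.telemetry"))
```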
Data Engineer
Confidential, Costa Mesa, CA
Responsibilities:
- Built logic to convert import files from different data sources (JSON, XML, flat files) to CSV format, thereby aiding the merging of data sources.
- Used Python with pandas and pyodbc to load flat files into SQL Server via stored procedures (see the sketch at the end of this role). Created custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs) for Hive.
- Performed ETL by creating and populating staging tables, moving data from source to staging and eventually into final/target tables for downstream use.
- Wrote pre-processing scripts in Python.
- Created high-level and detailed design documents and was involved in creating ETL functional and technical specifications.
- Ingested data from REST APIs using Python and shell scripts.
- Performed performance tuning: identified and fixed bottlenecks and tuned Hive queries.
- Worked with complex SQL views, Stored Procedures, Triggers, and packages in large databases from various servers.
- Ensured the developed solutions were formally documented and signed off by the business.
- Worked with team members to resolve any technical issue, Troubleshooting, Project Risk & Issue identification, and management.
Environment: Python, UNIX, Teradata, SQL Server, Windows, SQL, Linux, Jenkins, GitHub.
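A minimal sketch of the pandas / pyodbc flat-file load pattern mentioned above: rows are bulk-inserted into a staging table and a stored procedure merges them into the target. Server, database, file, table and procedure names are hypothetical placeholders.

```python
import pandas as pd
import pyodbc

# Connection details are placeholders
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-server;DATABASE=StagingDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.fast_executemany = True

# Read everything as strings, matching a varchar staging table, and clean lightly
df = pd.read_csv("customer_feed.csv", dtype=str)
df = df.dropna(subset=["customer_id"]).drop_duplicates(subset=["customer_id"])

# Bulk insert into the staging table
cursor.executemany(
    "INSERT INTO dbo.stg_customers (customer_id, name, city) VALUES (?, ?, ?)",
    df[["customer_id", "name", "city"]].values.tolist(),
)

# Hypothetical stored procedure that merges staging rows into the target table
cursor.execute("EXEC dbo.usp_merge_customers")
conn.commit()
conn.close()
```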
Hadoop Developer
Confidential
Responsibilities:
- Gathered data from multiple sources such as Teradata, Oracle and SQL Server using Sqoop and loaded it into HDFS.
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Responsible for cleansing and validating data.
- Responsible for writing a MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing (see the sketch at the end of this role).
- Found the right join conditions and created datasets conducive to data analysis.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Worked hands on with ETL process.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior like shopping enthusiasts, travelers, music lovers etc.
- Wrote REST Web services to expose the business methods to external services.
- Exported the patterns analyzed back into Teradata using Sqoop.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
Environment: Hadoop, MapReduce, HDFS, Hive, Flume, Sqoop, Cloudera, Oozie, UNIX.
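The field-extraction MapReduce work above was written in Java; purely as an illustration, here is a rough Hadoop Streaming equivalent in Python that keeps only the fields needed downstream. The field positions, delimiter and job invocation are hypothetical.

```python
#!/usr/bin/env python
"""Hadoop Streaming mapper: keep only the fields needed for further processing.

Illustrative invocation:
    hadoop jar hadoop-streaming.jar \
        -input /data/raw/events -output /data/filtered/events \
        -mapper mapper.py
"""
import sys

# Hypothetical positions of the columns to keep (e.g. customer_id, event_type, amount)
KEEP = [0, 3, 7]

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= max(KEEP):
        continue  # skip malformed records
    # Emit the first kept field as the key so related records co-locate at the reducer
    key = fields[KEEP[0]]
    value = "\t".join(fields[i] for i in KEEP[1:])
    print(key + "\t" + value)
```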
Hadoop Developer
Confidential
Responsibilities:
- Responsible for loading the customer's data and event logs from Oracle and Teradata databases into HDFS using Sqoop.
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence.
- Wrote Spouts and Bolts to collect real-time customer data streams from the Kafka broker, process them and store them in HBase.
- Analyzed log files and processed them through Flume.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
- Developed HQL queries to implement select, insert and update operations on the database.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
- Analyzed data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Experienced in loading and transforming large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying.
- Involved in creating Hive tables, loading with data and writing Hive queries.
- Created data models for customer data using the Cassandra Query Language (see the sketch at the end of this role).
- Ran many performance tests using the cassandra-stress tool to measure and improve the cluster's read and write performance.
- Involved in developing shell scripts to orchestrate the execution of other scripts (Pig, Hive and MapReduce) and to move data files within and outside of HDFS.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java.
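An illustrative sketch, using the DataStax Python driver, of the kind of CQL data model described above, organised around the query "fetch a customer's events, newest first"; the keyspace, table and column names are hypothetical.

```python
from datetime import datetime
from cassandra.cluster import Cluster

# Contact points are placeholders
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Keyspace and table modelled around the main read pattern
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS customer_data
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS customer_data.customer_events (
        customer_id text,
        event_ts    timestamp,
        event_type  text,
        payload     text,
        PRIMARY KEY ((customer_id), event_ts)
    ) WITH CLUSTERING ORDER BY (event_ts DESC)
""")

# Insert one event, then read the partition back newest-first
session.execute(
    "INSERT INTO customer_data.customer_events (customer_id, event_ts, event_type, payload) "
    "VALUES (%s, %s, %s, %s)",
    ("c-1001", datetime(2015, 6, 1, 10, 0), "purchase", '{"sku": "A12"}'),
)
rows = session.execute(
    "SELECT event_ts, event_type FROM customer_data.customer_events WHERE customer_id = %s",
    ("c-1001",),
)
for row in rows:
    print(row.event_ts, row.event_type)

cluster.shutdown()
```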