Azure Data Engineer Resume

SUMMARY

  • Proficient Data Engineer with 7+ years of experience in designing and implementing solutions for complex business problems involving all aspects of database management systems, large-scale data warehousing, reporting solutions, data streaming, and real-time analytics.
  • Worked in cloud environments (AWS, Azure) with experience in automating, configuring, and deploying instances in the cloud.
  • Developed an end-to-end scalable architecture to solve business problems with Azure components such as Data Lake, Key Vault, HDInsight, Azure Monitoring, Azure Synapse, Function Apps, Data Factory, and Event Hubs.
  • Implemented Azure Data Factory to meet business functional requirements by ingesting data from various source systems, including relational and unstructured sources.
  • Built pipelines in ADF using datasets, linked services, and pipeline activities to extract, load, and transform data from various sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back scenarios.
  • Migrated on-premises data to the Amazon Web Services cloud; used AWS services such as EC2 and S3 to process and store data sets, and worked with Hadoop clusters on AWS EMR (Elastic MapReduce).
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, CloudFront, Auto Scaling, CloudWatch, SNS, SQS, SES, and other services of the AWS family.
  • Designed and built a Security Framework to provide fine-grained access to AWS S3 objects using AWS Lambda and Glue.
  • Used AWS EMR to transform and move enormous amounts of data into other AWS data stores and databases such as Amazon S3 and Amazon DynamoDB.
  • Worked with AWS services such as SNS to send automated emails and messages using Boto3.
  • Designed and built ETL processes in AWS Glue to migrate Campaign data from external sources such as S3 files into AWS Redshift.
  • Orchestrated ETL with Apache Airflow DAGs using a variety of hooks, operators, and custom modules (a minimal DAG sketch follows this list).
  • Experience with PySpark and Azure Data Factory in creating, developing, and deploying high-performance ETL pipelines.
  • Used Kafka to load real-time data from multiple data sources into HDFS.
  • Experience in building Real-time Data Pipelines with Kafka Connect and Spark Streaming.
  • Contributed to the migration of objects from Teradata to Snowflake and the development of Snowpipe for continuous data loading.
  • Experience with Apache Hadoop ecosystem components such as Hadoop 2.x, MapReduce, Sqoop, Spark, Hive, Storm, YARN, and HBase.
  • Wrote multiple MapReduce programs for the extraction, transformation, and aggregation of data from more than 20 sources with multiple file formats, including XML and compressed formats.
  • Performed data analysis using the NumPy and Pandas libraries in Python to transform unstructured data into a structured format.
  • Working experience with NoSQL databases such as HBase, as well as developing real-time read/write access to large datasets using HBase.
  • Ensured that reporting requirements are considered early in projects and designed data solutions to support Business Intelligence reporting.
  • Experience using Spark Streaming, Spark SQL, and other Spark features such as accumulators, broadcast variables, and various levels of caching and optimization techniques for Spark jobs.
  • Skilled in query optimization, building data workflows with Alteryx, and creating data visualization presentations with Power BI and QuickSight.
  • Performed data modeling and dimensional modeling for OLAP (Online Analytical Processing) and ODS systems using 3NF, star, and snowflake schemas.
  • Expertise in aspects of the Agile framework, from sprint planning and retrospective analysis to work estimates.
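
A minimal Airflow DAG sketch of the kind of ETL orchestration described above, assuming Airflow 2.x; the task names, Postgres connection id, and source table are hypothetical:

    # Minimal Airflow 2.x DAG sketch: extract from a source database, then load.
    # The connection id, table, and task names are assumptions for illustration.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.postgres.hooks.postgres import PostgresHook


    def extract_orders(**context):
        # Pull one day's rows from a (hypothetical) source table via a hook.
        hook = PostgresHook(postgres_conn_id="source_db")
        rows = hook.get_records(
            "SELECT * FROM orders WHERE order_date = %s", parameters=[context["ds"]]
        )
        return len(rows)


    def load_to_warehouse(**context):
        # Placeholder for the load step; reads the extract count from XCom.
        print("rows extracted:", context["ti"].xcom_pull(task_ids="extract_orders"))


    with DAG(
        dag_id="daily_orders_etl",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
        load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)
        extract >> load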

TECHNICAL SKILLS

AWS: EC2, Amazon S3, ECS, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, CloudFront, Auto Scaling, CloudWatch, Redshift

AZURE: Azure Data Lake, Databricks, Azure Data Factory, Azure Monitoring, Active Directory, Synapse, Key Vault, Azure SQL.

Hadoop/Big Data Technologies: Hadoop, MapReduce, Oozie, Hive, Sqoop, Spark, and Cloudera Manager.

Hadoop Distribution: Hortonworks, Cloudera.

Programming & Scripting: Python, Scala, SQL, Shell Scripting.

IDE Tools: Eclipse, PyCharm, Jupyter

Monitoring and Reporting: Power BI, Tableau

Databases: Oracle, MySQL, Teradata

NoSQL Databases: HBase, DynamoDB

Development methods: Agile, Waterfall

PROFESSIONAL EXPERIENCE

Confidential

Azure Data Engineer

Responsibilities:

  • As an Azure Data Engineer, oversaw implementation of the ETL process for loading data from various sources into Databricks tables and Azure Synapse tables.
  • Knowledge of Azure cloud services (PaaS & IaaS), Storage, Data Factory, Data Lake (ADLA & ADLS), Logic Apps, Azure Monitoring, Active Directory, Synapse, Key Vault, and Azure SQL.
  • Designed & customized data models for the Data warehouse supporting data from multiple sources in real-time.
  • Involved in building the ETL architecture & Source to Target mapping to load data into the Data warehouse.
  • Designed and implemented ETL and data movement solutions using Azure Data Factory and SSIS.
  • Migrated on-premises data (Oracle, SQL Server, DB2) to Azure Data Lake Store via Azure Data Factory.
  • Extensive experience with data pre-processing and cleaning to perform feature engineering and data imputation techniques for missing values in a dataset using Python.
  • Worked on data analysis using the NumPy and Pandas libraries in Python to transform unstructured data into a structured format.
  • Created notebooks that extract raw data from a Databricks database, transform it, and then insert it into a cleansed Databricks database.
  • Designed and developed the data warehouse models by using Snowflake schema.
  • Created PowerShell scripts to automate ADF pipeline triggers using the Control-M scheduler.
  • Developed Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
  • Designed and implemented multiple dashboards for internal metrics using Azure Synapse - PowerPivot & Power Query tools.
  • Configured Spark Streaming to receive real-time data from Kafka and used backpressure to control message queuing in the topic.
  • Used Data Factory to develop pipelines and performed batch processing using Azure Batch.
  • Designed and developed pipelines to move the data from Azure blob storage/file share to SQL Data Warehouse.
  • Developed Spark applications in Databricks using PySpark and Spark SQL to perform transformations and aggregations on source data before loading it into Azure Synapse Analytics for reporting (a minimal notebook sketch follows this list).
  • Experience with SQL Server Integration Services (SSIS), SSRS, stored procedures, triggers, and T-SQL.
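
A minimal PySpark sketch of the notebook pattern described above (raw Databricks table in, aggregated cleansed table out), assuming the notebook's built-in spark session; the database, table, and column names are hypothetical:

    # Minimal Databricks notebook sketch: read a raw table, clean and aggregate it,
    # and write a cleansed table for downstream Synapse/reporting loads.
    # Database, table, and column names are assumptions for illustration.
    from pyspark.sql import functions as F

    raw = spark.table("raw_db.sales_events")  # 'spark' is the notebook's built-in session

    cleansed = (
        raw
        .filter(F.col("event_ts").isNotNull())           # drop rows with no timestamp
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "store_id")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.countDistinct("order_id").alias("order_count"),
        )
    )

    # Overwrite the cleansed table so reporting always sees the current aggregates.
    cleansed.write.mode("overwrite").saveAsTable("cleansed_db.daily_store_sales")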

Environment: Azure Data Storage, Azure Services, Azure SQL Server, Azure Data Factory, Azure SQL Data Warehouse, MySQL, ETL, Power BI, SQL Database, U-SQL, Azure Data Lake, Kafka, Azure Databricks, T-SQL, SQL Server Integration Services (SSIS).

Confidential

AWS (Amazon Web Services) Data Engineer

Responsibilities:

  • Worked on AWS data migration pipelines, creating and monitoring multiple services such as EC2, S3, AWS Lambda, Step Functions, SageMaker, and EMR for Spark and Hadoop development.
  • Designed, developed, and implemented ETL solutions on AWS and in the big data environment, and migrated existing objects from on-premises systems, Redshift, and HDFS to the S3 data lake and Snowflake.
  • Developed reusable framework to be leveraged for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects.
  • Used SQL, NumPy, Pandas, Boto3, and Hive for data analysis and model building
  • Created Python scripts in Spark for data aggregation, queries, and writing data back into an OLTP (Online Transaction Processing) system using DataFrames/SQL/Datasets and RDD/MapReduce.
  • Involved in design and analysis of the issues and providing solutions and workarounds to the users and end-clients.
  • Used Python to create multiple Spark Streaming and Spark SQL jobs on AWS.
  • Created a job monitoring tool to monitor jobs scheduled using Data Pipeline, Step Functions, and EMR.
  • Created a data pipeline utilizing processor groups and numerous processors in Apache NiFi for flat files and RDBMS sources as part of a proof of concept (POC) on Amazon EC2.
  • Extensively used AWS Athena to import structured data from S3 into other systems such as Redshift.
  • Provisioned EC2 instances as well as transient and long-running EMR (Elastic MapReduce) clusters to handle petabytes of data.
  • Configured access to RDS DB services, DynamoDB tables, and EBS volumes for inbound and outbound traffic, setting alarms for notifications or automated actions on AWS.
  • Developed a data pipeline for various events such as ingestion, aggregation, and loading of consumer response data from an AWS S3 bucket into Hive external tables in HDFS to feed Tableau dashboards.
  • Wrote scripts to collect high-frequency log data from various sources and integrate it into AWS using Kinesis, staging data in the data lake for later analysis (a minimal sketch follows this list).
  • Used Apache Airflow in conjunction with AWS to monitor multi-stage machine learning processes using Amazon SageMaker tasks.
  • Experienced in working with Spark SQL on different file formats like XML, JSON, and Parquet.
  • Implemented the data processing layer using Redshift SQL or Hive/Spark SQL in Elastic MapReduce to obtain normalized data from the raw data.
  • Used Gzip, LZO, and Snappy compression techniques to optimize Spark jobs for efficient use of HDFS.
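
A minimal Boto3 sketch of the Kinesis log-ingestion step mentioned above; the stream name, region, and log fields are hypothetical:

    # Minimal Boto3 sketch: push application log events into a Kinesis data stream.
    # Stream name, region, and log fields are assumptions for illustration.
    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")


    def send_log_event(event: dict) -> None:
        # Partition by a stable key (here a hypothetical host field) so related
        # records land on the same shard.
        kinesis.put_record(
            StreamName="app-log-stream",
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=event.get("host", "unknown"),
        )


    send_log_event({"host": "web-01", "level": "ERROR", "message": "timeout calling payments"})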

Environment: AWS services, AWS EC2, HDFS, DynamoDB, AWS Athena, cloud instances, AWS S3, Boto3, AMI, ETL, Hive, Spark API, Spark SQL, Python, Spark, Sqoop, MySQL, Linux, SNS, Snowflake.

Confidential

Data Engineer

Responsibilities:

  • Involved in the entire project life cycle, from design discussions to production deployment.
  • Good knowledge of monitoring, managing, and reviewing Hadoop clusters using Cloudera Manager.
  • Expertise in cluster analysis using various big data analytic tools such as MapReduce and Hive.
  • Supported the Hadoop Architect team in developing a Database Design in HDFS using HBase Architecture Design.
  • Used Scala to create Spark applications to ease the transition to Hadoop.
  • Developed scripts and batch jobs for scheduling various Hadoop programs and was involved in the maintenance and review of Hadoop log files.
  • Used Sqoop for data import and export between RDBMS and HDFS.
  • Extracted the required data from the server into HDFS and Bulk Loaded cleaned data into HBase.
  • Built a data pipeline for a compliance report proof-of-principle consisting of Sqoop, Hadoop (HDFS), SparkSQL, Scala, Elasticsearch, and Kibana.
  • Used an open-source Python web scraping framework to crawl and extract data from web pages, with the conversion performed using Hadoop, Hive, and MapReduce.
  • Created Airflow Scheduling scripts in Python.
  • Used Apache Spark to ingest Kafka data, loading and transforming large volumes of structured, semi-structured, and unstructured data (a minimal streaming sketch follows this list).
  • Expert in Oozie and data pipeline operational services to coordinate clusters and plan workflows.
  • Used Cloudera Manager to continuously monitor and manage the Hadoop cluster.
  • Built mappings with reusable components such as worklets and mapplets, as well as other transformations.
  • Expert in developing SSIS packages to extract, transform, and load data into heterogeneous data warehouses.
  • Used the Python subprocess module to run UNIX shell commands and extract data from the Agent.
  • Used Kibana visualizations to highlight compliance metrics in the translated report.
  • Wrote Scala user-defined functions for SQL functionality that lacked a Spark-SQL counterpart.
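
A minimal Spark Structured Streaming sketch of the Kafka ingestion described above (assumes the spark-sql-kafka package is available); broker addresses, topic, and HDFS paths are hypothetical:

    # Minimal Structured Streaming sketch: ingest a Kafka topic and land it in HDFS.
    # Brokers, topic, and output/checkpoint paths are assumptions for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_ingest").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
        .option("subscribe", "clickstream")
        .option("startingOffsets", "latest")
        .load()
        # Kafka delivers binary key/value columns; cast the value to a string payload.
        .select(F.col("value").cast("string").alias("payload"),
                F.col("timestamp").alias("event_ts"))
    )

    query = (
        events.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/clickstream/raw")
        .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
        .start()
    )
    query.awaitTermination()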

Environment: Hadoop, Cloudera Manager, HDFS, MapReduce, Hive, Spark SQL, Scala, Python, Oozie, Sqoop, ETL, SSIS, Kibana, Kafka, RDBMS, UNIX.

Confidential

Jr Hadoop Developer

Responsibilities:

  • Installed and configured the Hadoop ecosystem and Cloudera Manager using the CDH3 distribution.
  • Good knowledge and experience using Hive and MapReduce.
  • Used Sqoop to load data from DB2 to HBase for faster querying and performance optimization.
  • Developed ETL workflows in Python to process data in HDFS and HBase, orchestrated with Oozie.
  • Developed programs to manipulate data and perform CRUD operations against the database on request.
  • Developed several REST web services supporting both XML and JSON to perform tasks such as demand response management.
  • Developed SQL scripts using Spark to handle different datasets and verified their performance against MapReduce jobs.
  • Improved data quality, reliability, and efficiency of individual components and the complete system.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis (a minimal sketch follows this list).
  • Moved RDBMS data to HDFS as flat files generated by various channels for further processing.
  • Created job workflows in Oozie to automate the tasks of loading data into HDFS.
  • Handled data imports from various data sources, transformations using Hive and MapReduce, data loading into HDFS, and data extraction from Teradata into HDFS using Sqoop.
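
A minimal PySpark sketch of the partitioned Hive external-table pattern described above; the database, columns, partition value, and HDFS location are hypothetical:

    # Minimal PySpark sketch: Hive external table partitioned by load date, plus a query.
    # Database, table, columns, and HDFS location are assumptions for illustration.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_warehouse_sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    # External table over flat files landed in HDFS, partitioned by load date.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.web_logs (
            user_id STRING,
            url     STRING,
            status  INT
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///data/warehouse/web_logs'
    """)

    # Register a newly landed partition, then run an analysis query against it.
    spark.sql("ALTER TABLE analytics.web_logs "
              "ADD IF NOT EXISTS PARTITION (load_date='2023-01-01')")
    spark.sql("""
        SELECT status, COUNT(*) AS hits
        FROM analytics.web_logs
        WHERE load_date = '2023-01-01'
        GROUP BY status
    """).show()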

Environment: Scala, Spark, Spark scripts, Spark SQL, Hadoop, RDBMS, Apache Hive, Linux, HDFS, Sqoop, MCS, Hive UDFs.

Confidential

Python Developer

Responsibilities:

  • Used Agile application development techniques and participated in the Software Development Life Cycle phases of analysis, definition, design, implementation, and testing.
  • Used Python along with libraries such as Matplotlib for charts and graphs, MySQLdb for database connectivity, python-twitter, PySide, Pickle, and Pandas DataFrames.
  • Created a data-driven test automation framework for embedded software in Linux/Python and developed test cases and test plans.
  • Experience in the creation of Indexes, Stored Procedures, Constraints, Cursors, Triggers, Views, and User Defined Functions.
  • Developed complete frontend and backend modules in Python on the Django web framework and used PyUnit, the Python unit test framework, for all Python applications.
  • Wrote and executed several MySQL database queries in Python using the Python MySQL connector and the MySQLdb package.
  • Used Python scripting for database automation and remote operations on databases, particularly DynamoDB.
  • Cleaned and optimized the MySQL WordPress database.
  • Used Pandas to handle large amounts of data as DataFrames and wrote scripts to handle missing data, analyze data, and reduce dimensionality (a minimal sketch follows this list).
  • Wrote unit test cases for APIs and individual Python scripts to maintain code standards and stability, and debugged to provide error-free code.
  • Installed the necessary Python and Django packages and tools to set up the staging and test servers.
  • Used the Git version control tool to organize team development.
  • Worked on Python OOD code for quality assurance, logging, monitoring, and debugging.
  • Used the Selenium library to create a fully functional test automation process that simulated submitting different requests from multiple browsers to a web application.
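
A minimal Pandas sketch of the missing-data handling described above; the CSV path and column names are hypothetical:

    # Minimal Pandas sketch: load data, fill or drop missing values, trim columns.
    # The CSV path and column names are assumptions for illustration.
    import pandas as pd

    df = pd.read_csv("sensor_readings.csv")

    # Inspect how much data is missing per column.
    print(df.isna().sum())

    # Fill numeric gaps with each column's median, then drop rows still missing a key field.
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    df = df.dropna(subset=["device_id"])

    # Simple dimension reduction: keep only the columns needed downstream.
    df = df[["device_id", "timestamp", "temperature", "humidity"]]
    print(df.describe())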

Environment: Django, SQL, NoSQL, MongoDB, Python, REST, XML parser framework, DOM, RPM
