
Sr. Azure Data Engineer Resume


Detroit, MI

SUMMARY

  • 8+ years of experience in the software industry, including 5+ years of experience in Azure cloud services and 3+ years of experience in data warehousing.
  • Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure big data technologies (Hadoop and Apache Spark), and Databricks.
  • Experience in developing, supporting, and maintaining ETL (Extract, Transform and Load) processes using Talend Integration Suite.
  • Experience in developing complex mappings, reusable transformations, sessions, and workflows using the Informatica ETL tool to extract data from various sources and load it into targets.
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.
  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this list).
  • Used various file formats like Avro, Parquet, SequenceFile, JSON, ORC, and text for loading data, parsing, gathering, and performing transformations.
  • Good experience with the Hortonworks and Cloudera Apache Hadoop distributions.
  • Designed and created Hive external tables using shared meta-store with Static & Dynamic partitioning, bucketing, and indexing.
  • Experience with Spark, improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL, and optimizing Hive queries.
  • Familiarity with Python libraries like PySpark, NumPy, Pandas, starbase, and Matplotlib.
  • Writing complex SQL queries using joins, group by, and nested queries.
  • Solid capabilities in exploratory data analysis, statistical analysis, and visualization using Python, SQL, and Tableau.
  • Running and scheduling workflows using Oozie and ZooKeeper, identifying failures, and integrating, coordinating, and scheduling jobs.
  • In - depth understanding of Snowflake cloud technology.
  • Hands-on experience with Kafka and Flume to load log data from multiple sources directly into HDFS.
  • Widely used different features of Teradata such as BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL and DML commands, with a very good understanding of Teradata UPI and NUPI, secondary indexes, and join indexes.
  • Working experience building RESTful web services and RESTful APIs.
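
The Spark SQL in Databricks bullet above refers to a pattern like the following minimal PySpark sketch; the file paths, column names (event_ts, duration_seconds, customer_id), and the usage_events view are hypothetical placeholders, not details from a specific project.

    # Minimal sketch: read the same feed from multiple file formats, then aggregate with Spark SQL.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    # Read Parquet and JSON variants of the feed and align them by column name.
    events_parquet = spark.read.parquet("/mnt/raw/events_parquet/")
    events_json = spark.read.json("/mnt/raw/events_json/")
    events = events_parquet.unionByName(events_json, allowMissingColumns=True)

    # Basic cleansing, then a Spark SQL aggregation over customer usage.
    events.filter(F.col("event_ts").isNotNull()).createOrReplaceTempView("usage_events")
    daily_usage = spark.sql("""
        SELECT customer_id,
               to_date(event_ts)     AS usage_date,
               COUNT(*)              AS event_count,
               SUM(duration_seconds) AS total_duration
        FROM usage_events
        GROUP BY customer_id, to_date(event_ts)
    """)

    # Persist the curated aggregate for downstream analysis.
    daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage/")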

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark.

Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra, Snowflake.

Programming Languages: Python, PySpark, Shell script, Perl script, SQL, Java.

Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.

Cloud Tech: Azure and AWS

Version Control: SVN, Git, GitHub, Maven

Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, LINUX, OS

Visualization/ Reporting: Tableau, ggplot2, matplotlib

PROFESSIONAL EXPERIENCE

Confidential, Detroit, MI

Sr. Azure Data Engineer

Responsibilities:

  • Architected and implemented ETL and data movement solutions using Azure Data Factory and SSIS.
  • Understood business requirements, performed analysis, and translated them into application and operational requirements.
  • Designed one-time load strategy for moving large databases to Azure SQL DWH.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using Azure Data Factory and HDInsight.
  • Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Involved in designing and optimizing Spark SQL queries and DataFrames, importing data from data sources, performing transformations, and storing the results to an output directory in AWS S3.
  • Redesigned views in Snowflake to increase performance.
  • Created a framework for data profiling, cleansing, automatic restartability of batch pipelines, and handling rollback strategy.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Led a team of six developers to migrate the application.
  • Designed and implemented data loading and aggregation frameworks and jobs that can handle hundreds of GBs of JSON files, using Spark, Airflow, and Snowflake.
  • Experience in moving data between GCP and Azure using Azure Data Factory.
  • Implemented masking and encryption techniques to protect sensitive data.
  • Implemented SSIS IR to run SSIS packages from ADF.
  • Wrote PySpark jobs in AWS Glue to merge data from multiple tables and utilized crawlers to populate the AWS Glue Data Catalog with metadata table definitions.
  • Developed mapping document to map columns from source to target.
  • Created Azure Data Factory (ADF) pipelines using Azure Blob storage.
  • Performed ETL using Azure Databricks. Migrated on-premises Oracle ETL processes to Azure Synapse Analytics.
  • Worked on Python scripting to automate generation of scripts; performed data curation using Azure Databricks.
  • Worked with Azure Databricks, PySpark, HDInsight, Azure SQL DW, and Hive to load and transform data.
  • Implemented and developed Hive bucketing and partitioning.
  • Implemented Kafka and Spark Structured Streaming for real-time data ingestion (see the sketch after this list).
  • Used Azure Data Lake as a source and pulled data using Azure Blob storage.
  • Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
  • Worked on creating star schemas for drilling into data. Created PySpark procedures, functions, and packages to load data.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Performed data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Developed AWS CloudFormation templates, set up Auto Scaling for EC2 instances, and was involved in the automated provisioning of the AWS cloud environment using Jenkins.
  • Worked on the Snowflake environment to remove redundancy and loaded real-time data from various data sources into HDFS using Spark.
  • Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Created Databricks notebooks using SQL and Python and automated notebooks using jobs.
  • Creating Spark clusters and configuring high concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
  • Created and maintained optimal data pipeline architecture in the Microsoft Azure cloud using Data Factory and Azure Databricks.
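
As a companion to the Kafka and Spark Structured Streaming bullet above, a minimal PySpark sketch of real-time ingestion into Azure Data Lake follows; the broker address, topic name, schema, and ADLS paths are assumptions for illustration, and the spark-sql-kafka connector is assumed to be available on the cluster.

    # Minimal sketch: ingest a Kafka topic with Structured Streaming and land it in ADLS as Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-realtime-ingest").getOrCreate()

    schema = StructType([
        StructField("order_id", StringType()),
        StructField("status", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
           .option("subscribe", "orders")                      # hypothetical topic
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers bytes; cast the value to string and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
                 .select(F.from_json("json", schema).alias("o"))
                 .select("o.*"))

    # Write the parsed stream to a hypothetical ADLS location with checkpointing.
    (parsed.writeStream
           .format("parquet")
           .option("path", "abfss://curated@examplelake.dfs.core.windows.net/orders/")
           .option("checkpointLocation", "abfss://curated@examplelake.dfs.core.windows.net/_chk/orders/")
           .outputMode("append")
           .start())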

Environment: Hadoop, Hive, AWS (S3, Redshift, CFT, EMR, CloudWatch), Impala, Oracle, Snowflake, Spark, Pig, Sqoop, Oozie, MapReduce, Teradata, SQL, Abolition, Kafka, ZooKeeper, PySpark.

Confidential, San Francisco, CA

Azure Data Engineer/Databricks

Responsibilities:

  • Used the Agile methodology for data warehouse development, tracked with Kanbanize.
  • Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
  • Experience in building Power BI reports on Azure Analysis Services for better performance.
  • Working experience on the Azure Databricks cloud, organizing data into notebooks and making it easy to visualize using dashboards.
  • Used AWS Athena extensively to ingest structured data from S3 into other systems such as Redshift or to produce reports.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Storage, and BigQuery.
  • Performed ETL on data from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Performed data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Worked on managing the Azure Databricks Spark clusters through proper troubleshooting, estimation, and monitoring.
  • Strong understanding of AWS (Amazon Web Services), including S3, Amazon RDS, IAM, EC2, Redshift, and Apache Spark RDD concepts, and of developing logical data architecture with adherence to enterprise architecture.
  • Creating Reports in Looker based on Snowflake Connections.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Hands-on experience implementing Spark and Hive job performance tuning.
  • Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python.
  • Performed monitoring and management of the Hadoop cluster by using Azure HDInsight.
  • Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis (see the sketch after this list).
  • Worked with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, which included loading nested JSON-formatted data into Snowflake tables.
  • Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
  • Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
  • Wrote Python scripts to parse XML documents and load the data into a database.
  • Created data sharing between two Snowflake accounts.
  • Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
  • Created HBase tables to store data in various formats coming from different sources.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures.
  • Experience in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
  • Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
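
The partitioned Hive external table work mentioned above typically follows a pattern like the minimal sketch below, expressed here through Spark SQL with Hive support; the database, table, column names, and storage locations are hypothetical.

    # Minimal sketch: external Hive table with dynamic partitioning, loaded and queried via Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioned-tables")
             .enableHiveSupport()
             .getOrCreate())

    # Define an external, partitioned table over a shared warehouse location.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS dw.customer_txn (
            txn_id      STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS PARQUET
        LOCATION '/data/warehouse/customer_txn'
    """)

    # Allow dynamic partitions, load from a staging table, then run an analysis query.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE dw.customer_txn PARTITION (txn_date)
        SELECT txn_id, customer_id, amount, txn_date
        FROM staging.customer_txn_raw
    """)
    spark.sql("""
        SELECT txn_date, COUNT(*) AS txn_count, SUM(amount) AS total_amount
        FROM dw.customer_txn
        GROUP BY txn_date
    """).show()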

Environment: ADF, Databricks, ADL, Spark, Hive, HBase, Sqoop, Flume, Blob Storage, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, Apache Kafka, Azure, AWS, Python, Power BI, Unix, Snowflake, SQL Server.

Confidential, NC

Big Data Developer

Responsibilities:

  • Involved in requirement gathering and business analysis, and translated business requirements into technical designs in Hadoop and Big Data.
  • Involved in Sqoop implementation, which helps in loading data from various RDBMS sources into Hadoop systems and vice versa.
  • Support existing GCP Data Management implementations
  • Used AWS Athena extensively to ingest structured data from S3 into other systems such as Redshift or to produce reports.
  • Developed Python scripts to extract the data from the web server output files to load into HDFS.
  • Wrote a Python script that automates launching the EMR cluster and configuring the Hadoop applications.
  • Extensively worked with Avro and Parquet files and converted data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Involved in configuring the Hadoop cluster and load balancing across the nodes.
  • Involved in Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, and monitoring, and in debugging the configuration of multiple nodes using the Hortonworks platform.
  • Wrote PySpark jobs in AWS Glue to merge data from multiple tables and utilized crawlers to populate the AWS Glue Data Catalog with metadata table definitions (see the sketch after this list).
  • Involved in working with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Involved in managing and monitoring Hadoop cluster using Cloudera Manager.
  • Used Python and Shell scripting to build pipelines.
  • Designed and built GCP data-driven solutions for enterprise data warehouses and data lakes.
  • Developed data pipeline using Sqoop, HQL, Spark and Kafka to ingest Enterprise message delivery data into HDFS.
  • Developed AWS CloudFormation templates, set up Auto Scaling for EC2 instances, and was involved in the automated provisioning of the AWS cloud environment using Jenkins.
  • Developed workflows in Oozie and Airflow to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive semi-structured and unstructured data. Loaded unstructured data into the Hadoop Distributed File System (HDFS).
  • Created Hive tables with dynamic and static partitioning, including buckets, for efficiency. Also created external tables in Hive for staging purposes.
  • Loaded Hive tables with data, wrote Hive queries that run on MapReduce, and created a customized BI tool for manager teams that performs query analytics using HiveQL.
  • Aggregated RDDs based on the business requirements, converted the RDDs into DataFrames saved as temporary Hive tables for intermediate processing, and stored the results in HBase/Cassandra and RDBMSs.
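
The AWS Glue bullet above follows the standard Glue PySpark job layout sketched below; the sales_db database, the orders and customers tables, and the S3 output path are hypothetical, and the tables are assumed to have already been crawled into the Glue Data Catalog.

    # Minimal sketch: Glue PySpark job that joins two catalog tables and writes Parquet to S3.
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Load both tables from the Glue Data Catalog and convert to Spark DataFrames.
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="orders").toDF()
    customers = glue_context.create_dynamic_frame.from_catalog(
        database="sales_db", table_name="customers").toDF()

    # Merge the tables and write the enriched result back to S3 as Parquet.
    merged = orders.join(customers, on="customer_id", how="left")
    merged.write.mode("overwrite").parquet("s3://example-bucket/curated/orders_enriched/")

    job.commit()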

Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, jQuery.

Confidential, Columbus, OH

Data Warehouse Developer

Responsibilities:

  • Creation, manipulation, and support of SQL Server databases.
  • Involved in data modeling and the physical and logical design of the database.
  • Helped in integration of the front end with the SQL Server backend.
  • Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
  • Created stored procedures, triggers, indexes, user-defined functions, constraints, etc. on various database objects to obtain the required results.
  • Imported and exported data from one server to other servers using tools like Data Transformation Services (DTS).
  • Wrote T-SQL statements for retrieval of data and was involved in performance tuning of T-SQL queries.
  • Transferred data from various data sources/business systems, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS with features such as data conversion, and created derived columns from the existing columns for the given requirements.
  • Supported the team in resolving SQL Server Reporting Services and T-SQL related issues; proficient in creating and formatting different types of reports such as cross-tab, conditional, drill-down, top N, summary, form, OLAP, and sub-reports.
  • Provided application support via phone. Developed and tested Windows command files and SQL Server queries for production database monitoring in 24/7 support.
  • Created logging for ETL loads at the package and task level to log the number of records processed by each package and each task using SSIS.
  • Developed, monitored and deployed SSIS packages.

Environment: IBM WebSphere DataStage EE/7.0/6.0 (Manager, Designer, Director, Administrator), Ascential Profile Stage 6.0, Ascential QualityStage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX Shell Scripts, Sun Solaris, Windows 2000.

Confidential

Data Warehouse/ETL Developer

Responsibilities:

  • Assisting the team with performance tuning for ETL and database processes
  • Design, develop, implement and assist in validating processes
  • Self-manage time and task priorities, as well as those of other developers on the project
  • Work with data providers to fill data gaps and/or to adjust source-system data structures to facilitate analysis and integration with other company data
  • Develop mapping / sessions / workflows
  • Conduct ETL performance tuning, troubleshooting, support, and capacity estimation
  • Map sources to targets using a variety of tools, including Business Objects Data Services/BODI. Design and develop ETL code to load and transform the source data from various formats into a SQL database.
  • Worked extensively on different types of transformations like source qualifier, expression, filter, aggregator, rank, lookup, stored procedure, sequence generator and joiner.
  • Created, launched, and scheduled tasks/sessions, configured email notifications, and set up tasks to schedule the loads at the required frequency using PowerCenter Server Manager. Generated completion messages and status reports using Server Manager.
  • Administered the Informatica server and ran sessions and batches.
  • Developed shell scripts for automation of Informatica session loads.
  • Involved in the performance tuning of Informatica servers.

Environment: Windows, UNIX script, Oracle 8.0, SQL, PLSQL, MS Access, SQL Server, Informatica 5.1, MS Excel.
