
Azure Data Engineer Resume


Sterling, VA

SUMMARY

  • 7+ years of experience working with business users to analyze business processes and requirements, translating those requirements into data warehouse designs, and documenting and rolling out deliverables.
  • Experience in creating Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple source systems (different file formats and databases), analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this list).
  • Created Logic App workflows to read data from SharePoint and store it in blob containers
  • Experience with code analysis and code management in Azure Databricks
  • Experience in creating secrets and accessing Key Vaults for database and SharePoint credentials
  • Created pipelines, datasets, and linked services in Azure Data Factory
  • Automated deployment using Data Factory's integration with Azure Pipelines
  • Experience in physical and logical data modeling
  • Experience in creating stored procedures and SQL queries
  • Expert-level skills in Extract, Transform, and Load (ETL) tools for data warehousing.
  • In-depth Experience in Data Warehouse ETL Architecture and development of ETL using Teradata, Oracle, SQL, PL/SQL, Informatica, UNIX Shell scripts, SQL*Plus, SQL*Loader.
  • Experience with all stages of Project development life cycle across numerous platforms in a variety of industries.
  • Knowledge of different Schemas (Star and Snowflake) to fit reporting, query and business analysis requirements.
  • Experience with Production support for ETL processes and databases.
  • Strong data warehousing experience with Informatica, including designing tasks, workflows, mappings, and mapplets and scheduling workflows/sessions.
  • Optimized session performance by eliminating performance bottlenecks in sources, targets, and transformations.
  • Experience in using Informatica command-line utilities like pmcmd, pmrepserver, and pmrepagent to control workflows, sessions, and tasks.
  • Experience in working with UNIX Shell Scripting, CRON, FTP and file management in various UNIX environments.
  • Experience in creating functional and technical specifications for ETL processes and designing architecture for ETL.
  • Hands-on experience in writing, testing, and implementing cursors, procedures, functions, triggers, and packages at the database level using PL/SQL.
  • Extensively used PL/SQL bulk load options, collections, PL/SQL tables, VARRAYs, and REF CURSORs.
  • Experience in Performance tuning & Optimization of SQL statements using SQL Trace, Explain Plan and TkProf.
  • Extensively worked with Teradata SQL and associated tools like BTEQ, Fast Load, FastExport, MultiLoad and SQL Assistant.
  • Experience in analyzing data using HiveQL and PySpark programs
  • Proven ability to quickly learn and apply new technologies that translate requirements into client-focused solutions.
  • Excellent analytical, problem solving and communication skills.
  • Ability to multi-task in a fast-paced environment and to work independently or collaboratively.
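
A minimal, illustrative PySpark sketch of the extract/transform/aggregate pattern described above; the paths, column names, and connection details are hypothetical placeholders.

    # Hypothetical PySpark job: extract from files and a database, aggregate usage.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage_aggregation").getOrCreate()

    # Extract from two placeholder sources: CSV files and a JDBC table.
    usage = spark.read.option("header", True).csv("/mnt/raw/usage/*.csv")
    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:sqlserver://<server>;database=<db>")  # placeholder
                 .option("dbtable", "dbo.customers")
                 .option("user", "<user>").option("password", "<password>")
                 .load())

    # Transform and aggregate: daily usage per customer segment.
    daily_usage = (usage
                   .withColumn("event_date", F.to_date("event_ts"))
                   .join(customers, "customer_id")
                   .groupBy("segment", "event_date")
                   .agg(F.sum("bytes_used").alias("total_bytes"),
                        F.countDistinct("customer_id").alias("active_customers")))

    daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage")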

TECHNICAL SKILLS

Cloud: Azure (Data Factory, Data Lake, Databricks, Logic App, ARM, Azure SQL)

Automation Tools: Azure Logic App, Crontab

Big Data: PySpark, Hive

Code Repository Tools: Azure DevOps

Database: SQL Server Management Studio 18, Oracle 11g/10g/9i/8i/7.x, SQL Server 2000/2005, Teradata 13.10/12.0/V2R6/V2R5, MS Access

Database Tools: SQL Navigator, TOAD, Teradata Utilities, SQL*Plus, SQL*Loader, ERwin.

ETL: Azure Data Factory (V2), Azure Databricks, Informatica 10.1/9.5.1/9.1.1

Languages: Python, Pandas, SQL, PL/SQL, UNIX Shell Script, Perl, C, C++

Operating System: Windows 10/7/XP/2000/NT/98/95, UNIX, LINUX, DOS

PROFESSIONAL EXPERIENCE

Confidential

Azure Data Engineer

Responsibilities:

  • Executed ADF jobs using triggers set to run on specific schedules (see the sketch after this list)
  • Monitored jobs for MIC for any interface failures, such as source access failures, and troubleshot the root cause
  • Restarted jobs manually depending on the cause of the failure
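
A hedged sketch of the monitor-and-restart step, assuming the azure-mgmt-datafactory Python SDK; the subscription, resource group, and factory names are placeholders.

    # Query ADF for pipeline runs that failed in the last 24 hours and re-trigger them.
    from datetime import datetime, timedelta, timezone
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    now = datetime.now(timezone.utc)
    filters = RunFilterParameters(last_updated_after=now - timedelta(hours=24),
                                  last_updated_before=now)

    runs = client.pipeline_runs.query_by_factory("<resource-group>", "<factory>", filters)
    for run in runs.value:
        if run.status == "Failed":
            # Restart only after the root cause (e.g. a source access failure) is fixed.
            client.pipelines.create_run("<resource-group>", "<factory>", run.pipeline_name)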

Environment: Azure Logic App, Azure Blob Storage, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL.

Confidential, Sterling, VA

Data Engineer

Responsibilities:

  • Experience in using GET and POST requests to retrieve JSON data from Elasticsearch
  • Parsed nested JSON using Pandas, retrieved data from Elasticsearch via GET and POST requests, and validated the results by comparing them with Kibana dashboards (see the sketch after this list)
  • Parsed nested JSON documents using Python 3 and loaded data to S3 and then to ThoughtSpot
  • Converted data frames from wide to long format and vice versa using Pandas
  • Implemented various NumPy and Pandas functions for mathematical operations on arrays.
  • Communicated with different teams to understand and report UI errors
  • Created worksheets and pinboards in ThoughtSpot
  • Responsible for processing and analyzing UI logs using Pandas and Python
  • Experience in performance tuning of long-running Python scripts
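
An illustrative sketch of the Elasticsearch-to-Pandas flow above; the endpoint, index, query, and field names are hypothetical.

    # Pull documents from Elasticsearch, flatten the nested JSON, reshape wide-to-long.
    import requests
    import pandas as pd

    ES_URL = "https://<elasticsearch-host>:9200/ui-logs/_search"  # placeholder endpoint

    query = {"size": 1000, "query": {"match": {"log.level": "error"}}}
    resp = requests.post(ES_URL, json=query, auth=("<user>", "<password>"), timeout=30)
    resp.raise_for_status()

    # Flatten the nested hit documents into a tabular DataFrame.
    hits = resp.json()["hits"]["hits"]
    df = pd.json_normalize(hits, sep=".")

    # Wide-to-long reshape, e.g. one row per (document, metric) pair.
    long_df = df.melt(id_vars=["_id"],
                      value_vars=[c for c in df.columns if c.startswith("_source.metrics.")],
                      var_name="metric", value_name="value")
    print(long_df.head())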

Environment: Jupyter tools, Elasticsearch 7.9.1, Kibana, Python 3, Pandas, AWS S3, ThoughtSpot

Confidential, San Jose, CA

Azure Data Engineer

Responsibilities:

  • Involved in creating specifications for ETL processes, finalized requirements and prepared specification documents
  • Experience in creating and installing packages on cluster using Databricks
  • Experience in using Azure Databricks notebooks to clean and transform data and load it into Azure SQL staging tables using the JDBC connector
  • Designed and developed Data Factory pipelines to transform Excel, SharePoint Lists, JSON, and Avro formats and load them into Azure SQL Server.
  • Built PySpark scripts to transform data and load it into Azure SQL Server staging tables (see the sketch after this list).
  • Created scopes and access tokens in Databricks so that PySpark scripts could connect to Key Vaults.
  • Created ODBC/JDBC connections to connect PySpark scripts to Azure SQL Database.
  • Experience in creating and loading data into Hive tables with appropriate static and dynamic partitions for efficiency.
  • Implemented frameworks for data quality analysis, data validation, and data profiling using big data technologies such as PySpark and Pandas against an Azure SQL Server database
  • Experience managing Azure Blob Storage, Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services.
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks
  • Developed ETL jobs using PySpark to migrate data from MySQL Server to HDFS.
  • Created reusable ADF pipelines to call REST APIs.
  • Responsible for unit testing and for creating a detailed unit test document with all possible test cases/scripts
  • Used Azure DevOps for scheduling ADF jobs and Logic Apps for scheduling ADF pipelines
  • Worked with structured and unstructured data in PySpark
  • Developed PySpark scripts to integrate data flows between SharePoint, SharePoint lists, and Azure SQL Server.
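
A hedged sketch of the Databricks load step described above, assuming the notebook-provided spark session and dbutils; the secret scope, storage paths, and table names are placeholders.

    # Read raw SharePoint-list extracts from ADLS, clean them, and load an Azure SQL
    # staging table over JDBC, pulling credentials from a Key Vault-backed secret scope.
    from pyspark.sql import functions as F

    sql_user = dbutils.secrets.get(scope="kv-scope", key="sql-user")      # placeholder scope/keys
    sql_pwd = dbutils.secrets.get(scope="kv-scope", key="sql-password")

    raw = spark.read.option("multiline", True).json(
        "abfss://raw@<storage>.dfs.core.windows.net/sharepoint-lists/")

    clean = (raw.dropDuplicates(["id"])
                .filter(F.col("id").isNotNull())
                .withColumn("loaded_at", F.current_timestamp()))

    (clean.write.format("jdbc")
          .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
          .option("dbtable", "stg.sharepoint_lists")
          .option("user", sql_user)
          .option("password", sql_pwd)
          .mode("overwrite")
          .save())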

Environment: Azure Logic App, Azure Blob Storage, Azure Databricks, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL.

Confidential

Azure Data Engineer

Responsibilities:

  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, and to write data back.
  • Involved in creating PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob storage, using the Spark SQL context for transformations.
  • Worked on cloud POC to select the optimal cloud vendor based on a set of rigid success criteria.
  • Design, development, and implementation of performant ETL pipelines using PySpark and Azure Data Factory
  • Integrated data storage solutions with Spark, especially Azure Data Lake Storage and Blob storage.
  • Performance tuning of Hive and Spark jobs.
  • Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
  • Created Hive tables to store the processed results and wrote Hive scripts to transform and aggregate the disparate data.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the sketch after this list).
  • Analyzed large data sets using Hive queries over structured, unstructured, and semi-structured data.
  • Worked with structured data in Hive to improve performance using advanced techniques such as bucketing, partitioning, and optimizing self-joins.
  • Wrote and used complex data types for storing and retrieving data using HQL in Hive.
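
A minimal sketch of the Hive partitioning and bucketing approach described above, expressed through spark.sql; the database, table, and column names are hypothetical.

    # Create a partitioned, bucketed Hive table and load it with a dynamic-partition insert.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.usage_events (
            customer_id STRING,
            event_type  STRING,
            bytes_used  BIGINT
        )
        PARTITIONED BY (event_date DATE)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Allow fully dynamic partitions, then insert from a staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE analytics.usage_events PARTITION (event_date)
        SELECT customer_id, event_type, bytes_used, event_date
        FROM stg_usage_events
    """)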

Environment: Azure Data Factory (V2), Azure Databricks, Python 2.0, SSIS, Azure SQL, Azure Data Lake, Azure Blob Storage, Spark 2.0, Hive.

Confidential

ETL Developer

Responsibilities:

  • Involved in understanding business and data needs, analyzing multiple data sources, and documenting data mappings to meet those needs.
  • Worked extensively with SQL, Informatica, MLoad, FLoad, and FastExport as needed to handle different scenarios.
  • Worked on a Linux/UNIX platform consisting of a Production UNIX node and a Teradata UNIX node, both of which are connected to Teradata servers.
  • Troubleshot Teradata ETL scripts, fixed bugs, addressed production issues, and performed performance tuning
  • Extracted data from Teradata to flat files using BTEQ and loaded data into Teradata from flat files using MultiLoad.
  • Implemented extraction, transformation and load (ETL) scripts in BTEQ and load utilities as per the Business Logic.
  • Used various Teradata analytic functions.
  • Performance tuned and optimized various complex SQL queries.
  • Designed and created mappings/mapplets/sessions/workflows to move data from source to reporting EDW.
  • Developed and maintained complex Informatica mappings using Transformations like Router, Aggregator, Normalizer, Joiner, Expression and Lookup, Update strategy, Sequence generator and Stored Procedure.
  • Extensively worked on Mapplets, Mapping Variables, Mapping Parameters, and Session Parameters, and increased reusability.
  • Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
  • Used the Debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
  • Broke up complicated slow mappings into multiple mappings that ran much faster and could be run in parallel.
  • Optimized session performance by eliminating performance bottlenecks.
  • Involved in extracting and loading volumes of up to 200 million or more records a day.
  • Constructed Shell driver scripts to pre-process data, run and schedule jobs in UNIX.
  • Performed data validation tests using complex SQL statements (see the sketch after this list).
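
A hedged sketch of one such validation test, assuming the teradatasql Python driver; the host, credentials, and table names are placeholders.

    # Reconcile a row count and a sum between a source staging table and the target EDW table.
    import teradatasql

    SRC_SQL = "SELECT COUNT(*), SUM(sale_amt) FROM stg.sales_daily"
    TGT_SQL = "SELECT COUNT(*), SUM(sale_amt) FROM edw.fact_sales WHERE load_dt = CURRENT_DATE"

    con = teradatasql.connect(host="<tdhost>", user="<user>", password="<password>")
    cur = con.cursor()

    cur.execute(SRC_SQL)
    src_cnt, src_sum = cur.fetchone()
    cur.execute(TGT_SQL)
    tgt_cnt, tgt_sum = cur.fetchone()

    status = "PASS" if (src_cnt, src_sum) == (tgt_cnt, tgt_sum) else "FAIL"
    print(f"{status}: source=({src_cnt}, {src_sum}) target=({tgt_cnt}, {tgt_sum})")

    cur.close()
    con.close()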

Environment: Teradata 13.10, Teradata SQL Assistant 13.10, Informatica Power Center 9.5, SUSE Linux Server 10 (2.6.16.60-0 ), SunOS 5.10, Erwin, MS Visio 2007

Confidential, Bentonville, AR.

ETL Developer

Responsibilities:

  • Proficient in ETL programming using Informatica and SQL scripting using Teradata and Oracle.
  • Coordinated with multiple teams for requirements gathering, data analysis, and system impact analysis for new enhancements.
  • Designed and developed the end-to-end ETL process for the Deposits and Loans Affiliate process.
  • Created mappings using a variety of transformations like Source Qualifier, Lookup (static/dynamic cache, connected/unconnected, persistent/non-persistent cache), Stored Procedure (connected/unconnected, pre/post/normal), Router, Filter, Aggregator, Normalizer, Union, Update Strategy, Sorter, and Java transformations.
  • Responsible for identifying the performance bottlenecks and fixing them.
  • Worked on a Linux/UNIX platform consisting of a Production UNIX node and a Teradata UNIX node, both of which are connected to Teradata servers.
  • Troubleshot Teradata ETL scripts, fixed bugs, addressed production issues, and performed performance tuning
  • Used the SVN version control system for UNIX scripts and ETL parameter files.
  • Performed unit testing and integration testing and assisted super users in UAT.
  • Created UNIX shell scripts and automated ETL processes.
  • Extensively used the Informatica Debugger for debugging the Mappings associated with failed Sessions.
  • Involved in performance tuning of targets, sources, mappings, and sessions.
  • Used pmcmd, pmrepagent, and pmrepserver in non-Windows environments (see the sketch after this list).
  • Extensively used Joins, Triggers, Stored Procedures and Functions in interaction with backend database using PL/SQL as part of the process to handle different scenarios.
  • Worked closely with DBA in performance tuning of queries using Analyze Tables and SQL Trace.
  • Created post-session and pre-session shell scripts and mail-notifications.
  • Documented Informatica mappings, design and validation rules.
  • Participated in performance testing activities.
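
A hedged sketch of how a pmcmd workflow kick-off can be wrapped from Python; the integration service, domain, folder, and workflow names are placeholders, and credentials are read from environment variables rather than hard-coded.

    # Start an Informatica workflow via pmcmd and return its exit code (0 = success).
    import os
    import subprocess

    def start_workflow(folder: str, workflow: str) -> int:
        cmd = [
            "pmcmd", "startworkflow",
            "-sv", "IS_PROD",            # integration service (placeholder)
            "-d", "Domain_PROD",         # Informatica domain (placeholder)
            "-u", os.environ["INFA_USER"],
            "-p", os.environ["INFA_PASSWORD"],
            "-f", folder,
            "-wait", workflow,
        ]
        return subprocess.run(cmd).returncode

    if __name__ == "__main__":
        raise SystemExit(start_workflow("DEPOSITS", "wf_load_deposits_daily"))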

Environment: Informatica 9.1, Oracle 10g database, Teradata v12, MS SQL Server 2012, Tortoise SVN, Linux

Confidential

ETL Developer

Responsibilities:

  • Extensively worked in Data Extraction, Transformation and Loading from flat files, Oracle, XML, Teradata Sources into Teradata using BTEQ, FastLoad, MultiLoad, Korn shell scripts and Informatica.
  • Developed a project that identifies subscribers who are excessive OFFNET offenders for voice and data by applying certain rules in Oracle PL/SQL.
  • Created UNIX shell scripts and automated ETL processes.
  • Extensively used the Informatica Debugger for debugging the Mappings associated with failed Sessions.
  • Involved in performance tuning of targets, sources, mappings, and sessions.
  • Used pmcmd, pmrepagent, and pmrepserver in non-Windows environments.
  • Extensively used Joins, Triggers, Stored Procedures and Functions in interaction with backend database using PL/SQL as part of the process to handle different scenarios.
  • Worked closely with DBA in performance tuning of queries using Analyze Tables and SQL Trace.
  • Created post-session and pre-session shell scripts and mail-notifications.
  • Documented Informatica mappings, design and validation rules.
  • Extensively used Informatica Power Center to design multiple mappings with embedded business logic.
  • Tuned Informatica mappings and sessions for optimum performance and used new features like pushdown optimization.
  • Analyzed the functional specs provided by the data architect and created technical spec documents for all the mappings.
  • Utilized various transformations in mappings, such as Joiner, Lookup, Sorter, Aggregator, Union, SQL, Stored Procedure, Update Strategy, Normalizer, and Router, to populate target tables efficiently.
  • Worked on Mapping Variables, Mapping Parameters, Workflow Variables and Session Parameters
  • Used various tasks such as Session, Email, Command, Event Wait, Event Raise, and Control.
  • Developed a solution that allows subscriber account records on the SQL Server to be updated with the current status and address values in eCDW.
  • Developed Source to Target Audits to ensure that the data supporting critical business metrics are reconciled with the external data improving eCDW Data Quality and Integrity.
  • Worked closely with the source team and users to validate the accuracy of the mapped attributes.
  • Unit tested the developed ETL scripts, created test SQLs, and handled System testing and UAT issues.
  • Created Maestro scripts to schedule ETL jobs.
  • Documented all phases: analysis, design, development, testing, and maintenance.

Environment: Teradata 12.0, Teradata SQL Assistant 12.0, Oracle 10g, Informatica Power Center 9.1.1, TOAD, Maestro (TWS) V8.2, Erwin, MS Visio 2007
