Azure Data Engineer Resume
Sterling, VA
SUMMARY
- 7+ years of experience working with business users to analyze business processes and requirements, translate them into data warehouse designs, and document and roll out the deliverables.
- Experience in creating Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple source systems (different file formats and databases), analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch following this summary).
- Created Logic App workflows to read data from SharePoint and store it in Blob Storage containers.
- Experience with code analysis and code management in Azure Databricks.
- Experience in creating secrets and accessing Key Vaults for database and SharePoint credentials.
- Created pipelines, datasets, and linked services in Azure Data Factory.
- Automated deployment using Data Factory's integration with Azure Pipelines
- Experience in physical and logical data modeling.
- Experience in writing stored procedures and SQL queries.
- Expert-level skills in Extract, Transform, and Load (ETL) tools and data warehousing.
- In-depth experience in data warehouse ETL architecture and ETL development using Teradata, Oracle, SQL, PL/SQL, Informatica, UNIX shell scripts, SQL*Plus, and SQL*Loader.
- Experience with all stages of Project development life cycle across numerous platforms in a variety of industries.
- Knowledge of different Schemas (Star and Snowflake) to fit reporting, query and business analysis requirements.
- Experience with Production support for ETL processes and databases.
- Strong data warehousing experience using Informatica, with extensive experience designing tasks, workflows, mappings, and mapplets, and scheduling workflows/sessions in Informatica.
- Optimized session performance by eliminating bottlenecks in sources, targets, and transformations.
- Experience in using Informatica command line utilities like pmcmd, pmrepserver, and pmrepagent to control workflows, sessions and tasks.
- Experience in working with UNIX Shell Scripting, CRON, FTP and file management in various UNIX environments.
- Experience in creating functional and technical specifications for ETL processes and designing architecture for ETL.
- Hands-on experience in writing, testing and implementation of the Cursors, Procedures, Functions, Triggers, and Packages at Database level using PL/SQL.
- Extensively used PL/SQL bulk load options, collections, PL/SQL tables, VARRAYs, and REF CURSORs.
- Experience in Performance tuning & Optimization of SQL statements using SQL Trace, Explain Plan and TkProf.
- Extensively worked with Teradata SQL and associated tools such as BTEQ, FastLoad, FastExport, MultiLoad, and SQL Assistant.
- Experience in analyzing data using HiveQL and PySpark programs.
- Proven ability to quickly learn and apply new technologies that translate requirements into client-focused solutions.
- Excellent analytical, problem solving and communication skills.
- Ability to multi-task in a fast-paced environment and to work independently or collaboratively.
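The following is a minimal PySpark sketch of the multi-source extraction and Spark SQL aggregation described in this summary. The file paths, JDBC connection string, and table/column names are illustrative placeholders, not actual project values.

```python
# Illustrative PySpark sketch: read a delimited file feed and a JDBC source,
# then aggregate usage events per customer with Spark SQL.
# Paths, connection strings, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Source 1: delimited file feed
events = spark.read.option("header", "true").csv("/data/landing/usage_events.csv")

# Source 2: relational reference data over JDBC
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:sqlserver://example-server:1433;database=ExampleDb")
             .option("dbtable", "dbo.customers")
             .option("user", "etl_user")
             .option("password", "***")
             .load())

events.createOrReplaceTempView("events")
customers.createOrReplaceTempView("customers")

# Spark SQL transformation/aggregation to surface usage patterns
usage_by_customer = spark.sql("""
    SELECT c.customer_id,
           c.segment,
           COUNT(*)            AS event_count,
           SUM(e.duration_sec) AS total_duration_sec
    FROM events e
    JOIN customers c ON e.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
""")

usage_by_customer.write.mode("overwrite").parquet("/data/curated/usage_by_customer")
```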
TECHNICAL SKILLS
Cloud: Azure (Data Factory, Data Lake, Databricks, Logic App, ARM, Azure SQL)
Automation Tools: Azure Logic Apps, Crontab
Big Data: PySpark, Hive
Code Repository Tools: Azure DevOps
Database: SQL Server Management Studio 18, Oracle 11g/10g/9i/8i/7.x, SQL Server 2000/2005, Teradata 13.10/12.0/V2R6/V2R5, MS Access
Database Tools: SQL Navigator, TOAD, Teradata Utilities, SQL*Plus, SQL*Loader, ERwin.
ETL: Azure Data Factory (V2), Azure Databricks, Informatica 10.1/9.5.1/9.1.1
Languages: Python, Pandas, SQL, PL/SQL, UNIX Shell Script, Perl, C, C++
Operating System: Windows 10/7/XP/2000/NT/98/95, UNIX, LINUX, DOS
PROFESSIONAL EXPERIENCE
Confidential
Azure Data Engineer
Responsibilities:
- Configured ADF pipelines to execute via triggers set to run on specific schedules
- Monitored MIC jobs for interface failures, such as source access failures, and troubleshot root causes (see the monitoring sketch following this section)
- Restarted jobs manually depending on the cause of the failure
Environment: Azure Logic App, Azure Blob Storage, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL.
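A hedged sketch of how the failure monitoring and manual restarts above could be scripted with the azure-identity and azure-mgmt-datafactory SDKs; the subscription, resource group, factory, and pipeline names are placeholders, and the actual monitoring relied on ADF triggers and the portal rather than necessarily this exact code.

```python
# Hypothetical ADF run-monitoring sketch (azure-identity, azure-mgmt-datafactory).
# Subscription, resource group, and factory names are placeholders.
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

rg, factory = "example-rg", "example-adf"

# Query pipeline runs from the last 24 hours
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
)
runs = adf_client.pipeline_runs.query_by_factory(rg, factory, filters)

for run in runs.value:
    if run.status == "Failed":
        print(f"{run.pipeline_name} run {run.run_id} failed: {run.message}")
        # After reviewing the root cause, trigger a fresh run of the pipeline
        adf_client.pipelines.create_run(rg, factory, run.pipeline_name)
```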
Confidential, Sterling, VA
Data Engineer
Responsibilities:
- Experience in using GET and POST requests to retrieve JSON data from Elasticsearch
- Parsed nested JSON with Pandas, retrieved data from Elasticsearch via GET and POST requests, and validated results against Kibana dashboards (see the sketch following this section)
- Parsed nested JSON documents using Python 3 and loaded the data to S3 and then ThoughtSpot
- Converted DataFrames between wide and long formats using Pandas
- Implemented various functions in NumPy and Pandas for mathematical operations and arrays.
- Communicated with different teams to understand and report UI errors
- Created worksheets and pinboards in ThoughtSpot
- Responsible for processing and analyzing UI logs using Pandas and Python
- Performance-tuned long-running Python scripts
Environment: Jupyter, Elasticsearch 7.9.1, Kibana, Python 3, Pandas, AWS S3, ThoughtSpot
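A minimal sketch of the pattern above: pull JSON from Elasticsearch with a POST search request, flatten the nested hits with Pandas, and reshape wide to long. The host, index, and field names are hypothetical.

```python
# Hypothetical sketch: query Elasticsearch, flatten nested JSON with Pandas,
# and reshape from wide to long. Host, index, and field names are placeholders.
import requests
import pandas as pd

query = {"query": {"match_all": {}}, "size": 100}
resp = requests.post("http://localhost:9200/ui-logs/_search", json=query)
resp.raise_for_status()

# Flatten the nested hit documents into a tabular DataFrame
hits = resp.json()["hits"]["hits"]
df = pd.json_normalize(hits, sep=".")

# Wide-to-long reshape, e.g. one row per (document, metric) pair
long_df = df.melt(
    id_vars=["_id"],
    value_vars=["_source.load_time_ms", "_source.error_count"],
    var_name="metric",
    value_name="value",
)
print(long_df.head())
```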
Confidential, San Jose, CA
Azure Data Engineer
Responsibilities:
- Involved in creating specifications for ETL processes, finalized requirements and prepared specification documents
- Created and installed packages on Databricks clusters
- Used Azure Databricks notebooks to clean and transform data and load it into Azure SQL staging tables using the JDBC connector (see the sketch following this section)
- Designed and developed Data Factory pipelines to transform Excel, SharePoint List, JSON, and Avro sources and load them into Azure SQL Server.
- Built PySpark scripts to transform data and load it into Azure SQL Server staging tables.
- Created secret scopes and access tokens on Databricks so PySpark scripts could connect to Key Vault.
- Created ODBC/JDBC connections to connect PySpark scripts to Azure SQL Database.
- Created and loaded Hive tables with appropriate static and dynamic partitions for query efficiency.
- Implemented data quality analysis, data validation, and data profiling frameworks using PySpark and Pandas against Azure SQL Server
- Experience managing Azure Blob Storage, Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services.
- Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks
- Developed ETL jobs using PySpark to migrate data from MySQL Server to HDFS.
- Created reusable ADF pipelines to call REST APIs.
- Responsible for unit testing and for creating a detailed unit test document with all possible test cases/scripts
- Used Azure DevOps to schedule ADF jobs and Logic Apps to schedule ADF pipelines
- Worked with structured and unstructured data in PySpark
- Developed PySpark scripts to integrate data flows between SharePoint, SharePoint Lists, and Azure SQL Server.
Environment: Azure Logic App, Azure Blob Storage, Azure Databricks, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL.
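A hedged Databricks notebook sketch of the Key Vault-backed secret scope and JDBC load into an Azure SQL staging table described above; the scope, secret keys, server, storage account, and table names are illustrative placeholders.

```python
# Databricks notebook sketch (assumes the notebook-provided spark and dbutils objects).
# Secret scope, key names, server, storage, and table names below are placeholders.
jdbc_url = (
    "jdbc:sqlserver://example-sqlserver.database.windows.net:1433;"
    "database=StagingDb;encrypt=true;trustServerCertificate=false;"
)

# Credentials pulled from a Key Vault-backed secret scope
sql_user = dbutils.secrets.get(scope="kv-scope", key="sql-username")
sql_pass = dbutils.secrets.get(scope="kv-scope", key="sql-password")

# Clean/transform source data read from ADLS Gen2, then load to the staging table
df = spark.read.format("parquet").load(
    "abfss://raw@exampledatalake.dfs.core.windows.net/sharepoint_lists/"
)
cleaned = df.dropDuplicates().na.fill("")

(cleaned.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "stg.SharePointList")
    .option("user", sql_user)
    .option("password", sql_pass)
    .mode("overwrite")
    .save())
```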
Confidential
Azure Data Engineer
Responsibilities:
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
- Created PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob Storage, and used the Spark SQL context for transformations.
- Worked on cloud POC to select the optimal cloud vendor based on a set of rigid success criteria.
- Designed, developed, and implemented performant ETL pipelines using PySpark and Azure Data Factory.
- Integrated data storage solutions with Spark, especially Azure Data Lake Storage and Blob Storage.
- Performance tuning of Hive and Spark jobs.
- Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
- Created Hive tables to store processed results and wrote Hive scripts to transform and aggregate disparate data.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the partitioning sketch following this section).
- Analyzed large data sets using Hive queries over structured, semi-structured, and unstructured data.
- Worked with structured data in Hive to improve performance through advanced techniques such as bucketing, partitioning, and optimizing self-joins.
- Wrote HQL using complex data types to store and retrieve data in Hive.
Environment: Azure Data Factory (V2), Azure Databricks, Python 2.0, SSIS, Azure SQL, Azure Data Lake, Azure Blob Storage, Spark 2.0, Hive.
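A brief sketch, in the PySpark style used elsewhere in this resume, of the dynamically partitioned Hive table pattern mentioned above; the database, table, and column names are hypothetical.

```python
# Hypothetical sketch of a dynamically partitioned Hive table populated via Spark SQL.
# Database, table, and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-demo")
         .enableHiveSupport()
         .getOrCreate())

# Allow fully dynamic partition values on insert
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.usage_daily (
        customer_id STRING,
        event_count BIGINT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Dynamic partition insert: event_date values drive the target partitions
spark.sql("""
    INSERT OVERWRITE TABLE analytics.usage_daily PARTITION (event_date)
    SELECT customer_id,
           COUNT(*) AS event_count,
           event_date
    FROM analytics.usage_events
    GROUP BY customer_id, event_date
""")
```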
Confidential
ETL Developer
Responsibilities:
- Involved in understanding business and data needs, analyzing multiple data sources, and documenting data mappings to meet those needs.
- Worked extensively with SQL, Informatica, MultiLoad, FastLoad, and FastExport as needed to handle different scenarios.
- Worked on a Linux/UNIX platform consisting of a Production UNIX node and a Teradata UNIX node, both of which are connected to Teradata servers.
- Troubleshot Teradata ETL scripts, fixed bugs, addressed production issues, and performance-tuned jobs
- Extracted data from Teradata to flat files using BTEQ and loaded data into Teradata from flat files using MultiLoad (see the sketch following this section).
- Implemented extract, transform, and load (ETL) scripts in BTEQ and load utilities per the business logic.
- Used various Teradata analytic functions.
- Performance tuned and optimized various complex SQL queries.
- Designed and created mappings/mapplets/sessions/workflows to move data from source to reporting EDW.
- Developed and maintained complex Informatica mappings using Transformations like Router, Aggregator, Normalizer, Joiner, Expression and Lookup, Update strategy, Sequence generator and Stored Procedure.
- Extensively worked with mapplets, mapping variables, mapping parameters, and session parameters to increase reusability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Used the Debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
- Broke up complicated slow mappings into multiple mappings that ran much faster and could be run in parallel.
- Optimized session performance by eliminating performance bottlenecks.
- Involved in extracting and loading volumes of up to 200 million or more records a day.
- Constructed Shell driver scripts to pre-process data, run and schedule jobs in UNIX.
- Performed data validation tests using complex SQL statements.
Environment: Teradata 13.10, Teradata SQL Assistant 13.10, Informatica Power Center 9.5, SUSE Linux Server 10 (2.6.16.60-0), SunOS 5.10, Erwin, MS Visio 2007
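A hedged sketch of driving a BTEQ flat-file export from a small Python wrapper, in the spirit of the extract step above; the TDPID, credentials, database/table, and file paths are placeholders, and the production jobs were shell-driven rather than necessarily using this wrapper.

```python
# Hypothetical Python driver for a BTEQ report export to a flat file.
# TDPID, credentials, database/table, and file paths are placeholders;
# assumes the Teradata bteq client is installed and on PATH.
import subprocess
import textwrap

bteq_script = textwrap.dedent("""\
    .LOGON tdprod/etl_user,example_password;
    .EXPORT REPORT FILE = /data/out/accounts_extract.txt;
    SELECT account_id || '|' || account_status
    FROM edw.accounts;
    .EXPORT RESET;
    .LOGOFF;
    .QUIT;
""")

# Feed the script to bteq on stdin and capture its session log
result = subprocess.run(
    ["bteq"],
    input=bteq_script,
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise RuntimeError(f"BTEQ export failed:\n{result.stdout}\n{result.stderr}")
```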
Confidential, Bentonville, AR.
ETL Developer
Responsibilities:
- Proficient in ETL programming using Informatica and SQL scripting using Teradata and Oracle.
- Coordinated with multiple teams for requirements gathering, data analysis, and system impact analysis for new enhancements.
- Designed and developed the end-to-end ETL process for the Deposits and Loans Affiliate process.
- Created mappings using a variety of transformations such as Source Qualifier, Lookup (static/dynamic cache, connected/unconnected, persistent/non-persistent cache), Stored Procedure (connected/unconnected, pre/post/normal), Router, Filter, Aggregator, Normalizer, Union, Update Strategy, Sorter, and Java transformations.
- Responsible for identifying the performance bottlenecks and fixing them.
- Worked on a Linux/UNIX platform consisting of a Production UNIX node and a Teradata UNIX node, both of which are connected to Teradata servers.
- Troubleshot Teradata ETL scripts, fixed bugs, addressed production issues, and performance-tuned jobs
- Used the SVN version control system for UNIX scripts and ETL parameter files.
- Performed unit testing and integration testing and assisted super users in UAT.
- Created UNIX shell scripts and automated ETL processes.
- Extensively used the Informatica Debugger for debugging the Mappings associated with failed Sessions.
- Involved in performance tuning of targets, sources, mappings, and sessions.
- Used pmcmd, pmrepagent, and pmrepserver in non-Windows environments.
- Extensively used Joins, Triggers, Stored Procedures and Functions in interaction with backend database using PL/SQL as part of the process to handle different scenarios.
- Worked closely with DBA in performance tuning of queries using Analyze Tables and SQL Trace.
- Created post-session and pre-session shell scripts and mail-notifications.
- Documented Informatica mappings, design and validation rules.
- Participating in Performance testing activities.
Environment: Informatica 9.1, Oracle 10g database, Teradata v12, MS SQL Server 2012, Tortoise SVN, Linux
Confidential
ETL Developer
Responsibilities:
- Extensively worked on data extraction, transformation, and loading from flat file, Oracle, XML, and Teradata sources into Teradata using BTEQ, FastLoad, MultiLoad, Korn shell scripts, and Informatica.
- Developed a project that identifies subscribers who are excessive OFFNET offenders (voice and data) by applying business rules in Oracle PL/SQL.
- Created UNIX shell scripts and automated ETL processes.
- Extensively used the Informatica Debugger for debugging the Mappings associated with failed Sessions.
- Involved in performance tuning of targets, sources, mappings, and sessions.
- Used pmcmd, pmrepagent, and pmrepserver in non-Windows environments.
- Extensively used Joins, Triggers, Stored Procedures and Functions in interaction with backend database using PL/SQL as part of the process to handle different scenarios.
- Worked closely with DBA in performance tuning of queries using Analyze Tables and SQL Trace.
- Created post-session and pre-session shell scripts and mail-notifications.
- Documented Informatica mappings, design and validation rules.
- Extensively used Informatica Power Center to design multiple mappings with embedded business logic.
- Tuned Informatica mappings and sessions for optimum performance and used newer features such as pushdown optimization.
- Analyzed the functional specs provided by the data architect and created technical specs documents for all the mappings.
- Utilized various transformations in mappings, such as Joiner, Lookup, Sorter, Aggregator, Union, SQL, Stored Procedure, Update Strategy, Normalizer, and Router, to populate target tables efficiently.
- Worked on Mapping Variables, Mapping Parameters, Workflow Variables and Session Parameters
- Used various tasks such as Session, Email, Command, Event Wait, Event Raise, and Control.
- Developed a solution that allows subscriber account records on the SQL Server to be updated with the current status and address values in eCDW.
- Developed source-to-target audits to ensure that the data supporting critical business metrics is reconciled with external data, improving eCDW data quality and integrity.
- Worked closely with the source team and users to validate the accuracy of the mapped attributes.
- Unit tested the developed ETL scripts, created test SQLs, and handled System testing and UAT issues.
- Created Maestro scripts to schedule ETL jobs.
- Documented all phases: analysis, design, development, testing, and maintenance.
Environment: Teradata 12.0, Teradata SQL Assistant 12.0, Oracle 10g, Informatica Power Center 9.1.1, TOAD, Maestro (TWS) V8.2, Erwin, MS Visio 2007