Azure Data Engineer Resume
Sterling, VA
SUMMARY
- 7+ years of experience working with business users to analyze business processes and requirements, translate them into data warehouse designs, and document and roll out the deliverables.
- Experience in creating Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple source systems (different file formats and databases), analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch following this summary).
- Created Logic App workflows to read data from SharePoint and store it in Blob Storage containers.
- Experience with code analysis and code management in Azure Databricks.
- Experience in creating secrets and accessing Key Vaults for database and SharePoint credentials.
- Created pipelines, datasets, and linked services in Azure Data Factory.
- Automated deployment using Data Factory's integration with Azure Pipelines
- Experience in physical and logical data modeling.
- Experience in writing stored procedures and SQL queries.
- Expert-level skills in Extract, Transform, and Load (ETL) tools and data warehousing.
- In-depth experience in data warehouse ETL architecture and ETL development using Teradata, Oracle, SQL, PL/SQL, Informatica, UNIX shell scripts, SQL*Plus, and SQL*Loader.
- Experience with all stages of Project development life cycle across numerous platforms in a variety of industries.
- Knowledge of different Schemas (Star and Snowflake) to fit reporting, query and business analysis requirements.
- Experience with Production support for ETL processes and databases.
- Strong data warehousing experience using Informatica, with extensive experience designing tasks, workflows, mappings, and mapplets, and scheduling workflows/sessions in Informatica.
- Optimized session performance by eliminating bottlenecks in sources, targets, and transformations.
- Experience in using Informatica command line utilities like pmcmd, pmrepserver, and pmrepagent to control workflows, sessions and tasks.
- Experience in working with UNIX Shell Scripting, CRON, FTP and file management in various UNIX environments.
- Experience in creating functional and technical specifications for ETL processes and designing architecture for ETL.
- Hands-on experience in writing, testing and implementation of the Cursors, Procedures, Functions, Triggers, and Packages at Database level using PL/SQL.
- Extensively used PL/SQL bulk load options, collections, PL/SQL tables, VARRAYs, and REF CURSORs.
- Experience in Performance tuning & Optimization of SQL statements using SQL Trace, Explain Plan and TkProf.
- Extensively worked with Teradata SQL and associated tools such as BTEQ, FastLoad, FastExport, MultiLoad, and SQL Assistant.
- Experience in analyzing data using HiveQL and PySpark programs.
- Proven ability to quickly learn and apply new technologies that translate requirements into client-focused solutions.
- Excellent analytical, problem solving and communication skills.
- Ability to multi-task in a fast-paced environment and to work independently or collaboratively.
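The following is a minimal PySpark sketch of the multi-source extraction and Spark SQL aggregation described in this summary. The file paths, JDBC connection string, and table/column names are illustrative placeholders, not actual project values.

```python
# Illustrative PySpark sketch: read a delimited file feed and a JDBC source,
# then aggregate usage events per customer with Spark SQL.
# Paths, connection strings, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Source 1: delimited file feed
events = spark.read.option("header", "true").csv("/data/landing/usage_events.csv")

# Source 2: relational reference data over JDBC
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:sqlserver://example-server:1433;database=ExampleDb")
             .option("dbtable", "dbo.customers")
             .option("user", "etl_user")
             .option("password", "***")
             .load())

events.createOrReplaceTempView("events")
customers.createOrReplaceTempView("customers")

# Spark SQL transformation/aggregation to surface usage patterns
usage_by_customer = spark.sql("""
    SELECT c.customer_id,
           c.segment,
           COUNT(*)            AS event_count,
           SUM(e.duration_sec) AS total_duration_sec
    FROM events e
    JOIN customers c ON e.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
""")

usage_by_customer.write.mode("overwrite").parquet("/data/curated/usage_by_customer")
```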
TECHNICAL SKILLS
Cloud: Azure (Data Factory, Data Lake, Databricks, Logic App, ARM, Azure SQL)
Automation Tools: Azure Logic Apps, Crontab
Big Data: PySpark, Hive
Code Repository Tools: Azure DevOps
Database: SQL Server Management Studio 18, Oracle 11g/10g/9i/8i/7.x, SQL Server 2000/2005, Teradata 13.10/12.0/V2R6/V2R5, MS Access
Database Tools: SQL Navigator, TOAD, Teradata Utilities, SQL*Plus, SQL*Loader, ERwin.
ETL: Azure Data Factory (V2), Azure Databricks, Informatica 10.1/9.5.1/9.1.1
Languages: Python, Pandas, SQL, PL/SQL, UNIX Shell Script, Perl, C, C++
Operating System: Windows 10/7/XP/2000/NT/98/95, UNIX, LINUX, DOS
PROFESSIONAL EXPERIENCE
Confidential
Azure Data Engineer
Responsibilities:
- Configured ADF pipelines to execute via triggers set to run on specific schedules
- Monitored MIC jobs for interface failures, such as source access failures, and troubleshot root causes (see the monitoring sketch following this section)
- Restarted jobs manually depending on the cause of the failure
Environment: Azure Logic App, Azure Blob Storage, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL.
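A hedged sketch of how the failure monitoring and manual restarts above could be scripted with the azure-identity and azure-mgmt-datafactory SDKs; the subscription, resource group, factory, and pipeline names are placeholders, and the actual monitoring relied on ADF triggers and the portal rather than necessarily this exact code.

```python
# Hypothetical ADF run-monitoring sketch (azure-identity, azure-mgmt-datafactory).
# Subscription, resource group, and factory names are placeholders.
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

rg, factory = "example-rg", "example-adf"

# Query pipeline runs from the last 24 hours
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
)
runs = adf_client.pipeline_runs.query_by_factory(rg, factory, filters)

for run in runs.value:
    if run.status == "Failed":
        print(f"{run.pipeline_name} run {run.run_id} failed: {run.message}")
        # After reviewing the root cause, trigger a fresh run of the pipeline
        adf_client.pipelines.create_run(rg, factory, run.pipeline_name)
```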
Confidential, Sterling, VA
Data Engineer
Responsibilities:
- Experience in using GET and POST requests to retrieve JSON data from Elasticsearch
- Parsed nested JSON with Pandas, retrieved data from Elasticsearch via GET and POST requests, and validated results against Kibana dashboards (see the sketch following this section)
- Parsed nested JSON documents using Python 3 and loaded the data to S3 and then ThoughtSpot
- Converted DataFrames between wide and long formats using Pandas
- Implemented various functions in NumPy and Pandas for mathematical operations and arrays.
- Communicated with different teams to understand and report UI errors
- Created worksheets and pinboards in ThoughtSpot
- Responsible for processing and analyzing UI logs using Pandas and Python
- Performance-tuned long-running Python scripts
Environment: Jupyter, Elasticsearch 7.9.1, Kibana, Python 3, Pandas, AWS S3, ThoughtSpot
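A minimal sketch of the pattern above: pull JSON from Elasticsearch with a POST search request, flatten the nested hits with Pandas, and reshape wide to long. The host, index, and field names are hypothetical.

```python
# Hypothetical sketch: query Elasticsearch, flatten nested JSON with Pandas,
# and reshape from wide to long. Host, index, and field names are placeholders.
import requests
import pandas as pd

query = {"query": {"match_all": {}}, "size": 100}
resp = requests.post("http://localhost:9200/ui-logs/_search", json=query)
resp.raise_for_status()

# Flatten the nested hit documents into a tabular DataFrame
hits = resp.json()["hits"]["hits"]
df = pd.json_normalize(hits, sep=".")

# Wide-to-long reshape, e.g. one row per (document, metric) pair
long_df = df.melt(
    id_vars=["_id"],
    value_vars=["_source.load_time_ms", "_source.error_count"],
    var_name="metric",
    value_name="value",
)
print(long_df.head())
```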
Confidential, San Jose, CA
Azure Data Engineer
Responsibilities:
- Involved in creating specifications for ETL processes, finalized requirements and prepared specification documents
- Created and installed packages on Databricks clusters
- Used Azure Databricks notebooks to clean and transform data and load it into Azure SQL staging tables using the JDBC connector (see the sketch following this section)
- Designed and developed Data Factory pipelines to transform Excel, SharePoint List, JSON, and Avro sources and load them into Azure SQL Server.
- Built PySpark scripts to transform data and load it into Azure SQL Server staging tables.
- Created secret scopes and access tokens on Databricks so PySpark scripts could connect to Key Vault.
- Created ODBC/JDBC connections to connect PySpark scripts to Azure SQL Database.
- Created and loaded Hive tables with appropriate static and dynamic partitions for query efficiency.
- Implemented data quality analysis, data validation, and data profiling frameworks using PySpark and Pandas against Azure SQL Server
- Experience managing Azure Blob Storage, Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services.
- Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks
- Developed ETL jobs using PySpark to migrate data from MySQL Server to HDFS.
- Created reusable ADF pipelines to call REST APIs.
- Responsible for unit testing and for creating a detailed unit test document with all possible test cases/scripts
- Used Azure DevOps to schedule ADF jobs and Logic Apps to schedule ADF pipelines
- Worked with structured and unstructured data in PySpark
- Developed PySpark scripts to integrate data flows between SharePoint, SharePoint Lists, and Azure SQL Server.
Environment: Azure Logic App, Azure Blob Storage, Azure Databricks, Azure Data Factory (V2), Azure Data Lake Gen2, Azure SQL.
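A hedged Databricks notebook sketch of the Key Vault-backed secret scope and JDBC load into an Azure SQL staging table described above; the scope, secret keys, server, storage account, and table names are illustrative placeholders.

```python
# Databricks notebook sketch (assumes the notebook-provided spark and dbutils objects).
# Secret scope, key names, server, storage, and table names below are placeholders.
jdbc_url = (
    "jdbc:sqlserver://example-sqlserver.database.windows.net:1433;"
    "database=StagingDb;encrypt=true;trustServerCertificate=false;"
)

# Credentials pulled from a Key Vault-backed secret scope
sql_user = dbutils.secrets.get(scope="kv-scope", key="sql-username")
sql_pass = dbutils.secrets.get(scope="kv-scope", key="sql-password")

# Clean/transform source data read from ADLS Gen2, then load to the staging table
df = spark.read.format("parquet").load(
    "abfss://raw@exampledatalake.dfs.core.windows.net/sharepoint_lists/"
)
cleaned = df.dropDuplicates().na.fill("")

(cleaned.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "stg.SharePointList")
    .option("user", sql_user)
    .option("password", sql_pass)
    .mode("overwrite")
    .save())
```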
Confidential
Azure Data Engineer
Responsibilities:
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
- Created PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob Storage, and used the Spark SQL context for transformations.
- Worked on cloud POC to select the optimal cloud vendor based on a set of rigid success criteria.
- Designed, developed, and implemented performant ETL pipelines using PySpark and Azure Data Factory.
- Integrated data storage solutions with Spark, especially Azure Data Lake Storage and Blob Storage.
- Performance tuning of Hive and Spark jobs.
- Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
- Created Hive tables to store processed results and wrote Hive scripts to transform and aggregate disparate data.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the partitioning sketch following this section).
- Analyzed large data sets using Hive queries over structured, semi-structured, and unstructured data.
- Worked with structured data in Hive to improve performance through advanced techniques such as bucketing, partitioning, and optimizing self-joins.
- Wrote HQL using complex data types to store and retrieve data in Hive.
Environment: Azure Data Factory (V2), Azure Databricks, Python 2.0, SSIS, Azure SQL, Azure Data Lake, Azure Blob Storage, Spark 2.0, Hive.
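A brief sketch, in the PySpark style used elsewhere in this resume, of the dynamically partitioned Hive table pattern mentioned above; the database, table, and column names are hypothetical.

```python
# Hypothetical sketch of a dynamically partitioned Hive table populated via Spark SQL.
# Database, table, and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-demo")
         .enableHiveSupport()
         .getOrCreate())

# Allow fully dynamic partition values on insert
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.usage_daily (
        customer_id STRING,
        event_count BIGINT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Dynamic partition insert: event_date values drive the target partitions
spark.sql("""
    INSERT OVERWRITE TABLE analytics.usage_daily PARTITION (event_date)
    SELECT customer_id,
           COUNT(*) AS event_count,
           event_date
    FROM analytics.usage_events
    GROUP BY customer_id, event_date
""")
```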
Confidential
ETL Developer
Responsibilities:
- Involved in understanding business and data needs, analyzing multiple data sources, and documenting data mappings to meet those needs.
- Worked extensively with SQL, Informatica, MultiLoad, FastLoad, and FastExport as needed to handle different scenarios.
- Worked on a Linux/UNIX platform consisting of a Production UNIX node and a Teradata UNIX node, both of which are connected to Teradata servers.
- Troubleshot Teradata ETL scripts, fixed bugs, addressed production issues, and performance-tuned jobs
- Extracted data from Teradata to flat files using BTEQ and loaded data into Teradata from flat files using MultiLoad (see the sketch following this section).
- Implemented extract, transform, and load (ETL) scripts in BTEQ and load utilities per the business logic.
- Used various Teradata analytic functions.
- Performance tuned and optimized various complex SQL queries.
- Designed and created mappings/mapplets/sessions/workflows to move data from source to reporting EDW.
- Developed and maintained complex Informatica mappings using Transformations like Router, Aggregator, Normalizer, Joiner, Expression and Lookup, Update strategy, Sequence generator and Stored Procedure.
- Extensively worked with mapplets, mapping variables, mapping parameters, and session parameters to increase reusability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Used the Debugger to identify bugs in existing mappings by analyzing data flow and evaluating transformations.
- Broke up complicated slow mappings into multiple mappings that ran much faster and could be run in parallel.
- Optimized session performance by eliminating performance bottlenecks.
- Involved in extracting and loading volumes of up to 200 million or more records a day.
- Constructed Shell driver scripts to pre-process data, run and schedule jobs in UNIX.
- Performed data validation tests using complex SQL statements.
Environment: Teradata 13.10, Teradata SQL Assistant 13.10, Informatica Power Center 9.5, SUSE Linux Server 10 (2.6.16.60-0), SunOS 5.10, Erwin, MS Visio 2007
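A hedged sketch of driving a BTEQ flat-file export from a small Python wrapper, in the spirit of the extract step above; the TDPID, credentials, database/table, and file paths are placeholders, and the production jobs were shell-driven rather than necessarily using this wrapper.

```python
# Hypothetical Python driver for a BTEQ report export to a flat file.
# TDPID, credentials, database/table, and file paths are placeholders;
# assumes the Teradata bteq client is installed and on PATH.
import subprocess
import textwrap

bteq_script = textwrap.dedent("""\
    .LOGON tdprod/etl_user,example_password;
    .EXPORT REPORT FILE = /data/out/accounts_extract.txt;
    SELECT account_id || '|' || account_status
    FROM edw.accounts;
    .EXPORT RESET;
    .LOGOFF;
    .QUIT;
""")

# Feed the script to bteq on stdin and capture its session log
result = subprocess.run(
    ["bteq"],
    input=bteq_script,
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise RuntimeError(f"BTEQ export failed:\n{result.stdout}\n{result.stderr}")
```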
Confidential, Bentonville, AR.
ETL Developer
Responsibilities:
- Proficient in ETL programming using Informatica and SQL scripting using Teradata and Oracle.
- Coordinated with multiple teams for requirements gathering, data analysis, and system impact analysis for new enhancements.
- Designed and developed the end-to-end ETL process for the Deposits and Loans Affiliate process.
- Created mappings using a variety of transformations such as Source Qualifier, Lookup (static/dynamic cache, connected/unconnected, persistent/non-persistent cache), Stored Procedure (connected/unconnected, pre/post/normal), Router, Filter, Aggregator, Normalizer, Union, Update Strategy, Sorter, and Java transformations.
- Responsible for identifying the performance bottlenecks and fixing them.
- Worked on a Linux/UNIX platform consisting of a Production UNIX node and a Teradata UNIX node, both of which are connected to Teradata servers.
- Troubleshot Teradata ETL scripts, fixed bugs, addressed production issues, and performance-tuned jobs
- Used the SVN version control system for UNIX scripts and ETL parameter files.
- Performed unit testing and integration testing and assisted super users in UAT.
- Created UNIX shell scripts and automated ETL processes.
- Extensively used the Informatica Debugger for debugging the Mappings associated with failed Sessions.
- Involved in performance tuning of targets, sources, mappings, and sessions.
- Used pmcmd, pmrepagent, and pmrepserver in non-Windows environments.
- Extensively used Joins, Triggers, Stored Procedures and Functions in interaction with backend database using PL/SQL as part of the process to handle different scenarios.
- Worked closely with DBA in performance tuning of queries using Analyze Tables and SQL Trace.
- Created post-session and pre-session shell scripts and mail-notifications.
- Documented Informatica mappings, design and validation rules.
- Participating in Performance testing activities.
Environment: Informatica 9.1, Oracle 10g database, Teradata v12, MS SQL Server 2012, Tortoise SVN, Linux
Confidential
ETL Developer
Responsibilities:
- Extensively worked on data extraction, transformation, and loading from flat file, Oracle, XML, and Teradata sources into Teradata using BTEQ, FastLoad, MultiLoad, Korn shell scripts, and Informatica.
- Developed a project that identifies subscribers who are excessive OFFNET offenders (voice and data) by applying business rules in Oracle PL/SQL.
- Created UNIX shell scripts and automated ETL processes.
- Extensively used the Informatica Debugger for debugging the Mappings associated with failed Sessions.
- Involved in performance tuning of targets, sources, mappings, and sessions.
- Used pmcmd, pmrepagent, and pmrepserver in non-Windows environments.
- Extensively used Joins, Triggers, Stored Procedures and Functions in interaction with backend database using PL/SQL as part of the process to handle different scenarios.
- Worked closely with DBA in performance tuning of queries using Analyze Tables and SQL Trace.
- Created post-session and pre-session shell scripts and mail-notifications.
- Documented Informatica mappings, design and validation rules.
- Extensively used Informatica Power Center to design multiple mappings with embedded business logic.
- Tuned Informatica mappings and sessions for optimum performance and used newer features such as pushdown optimization.
- Analyzed the functional specs provided by the data architect and created technical specs documents for all the mappings.
- Utilized various transformations in mappings, such as Joiner, Lookup, Sorter, Aggregator, Union, SQL, Stored Procedure, Update Strategy, Normalizer, and Router, to populate target tables efficiently.
- Worked on Mapping Variables, Mapping Parameters, Workflow Variables and Session Parameters
- Used various tasks such as Session, Email, Command, Event Wait, Event Raise, and Control.
- Developed a solution that allows subscriber account records on the SQL Server to be updated with the current status and address values in eCDW.
- Developed source-to-target audits to ensure that the data supporting critical business metrics is reconciled with external data, improving eCDW data quality and integrity.
- Worked closely with the source team and users to validate the accuracy of the mapped attributes.
- Unit tested the developed ETL scripts, created test SQLs, and handled System testing and UAT issues.
- Created Maestro scripts to schedule ETL jobs.
- Documented all phases: analysis, design, development, testing, and maintenance.
Environment: Teradata 12.0, Teradata SQL Assistant 12.0, Oracle 10g, Informatica Power Center 9.1.1, TOAD, Maestro (TWS) V8.2, Erwin, MS Visio 2007