
Azure Cloud Data Engineer Resume


Las Vegas

SUMMARY

  • 7+ years of experience in data warehousing, with exposure to cloud architecture design, modelling, development, testing, maintenance, and customer support environments across multiple domains including Telecom, Networking, Banking, and Gaming.
  • 2+ years of experience in Azure Cloud: Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
  • Experience in designing Azure Cloud Architecture and Implementation plans for hosting complex application workloads on MS Azure.
  • Experience reading continuous JSON data from different source systems via Kafka into Databricks Delta, processing it with Apache Spark Structured Streaming and PySpark, and writing the output files in Parquet format (see the PySpark sketch at the end of this summary).
  • Created data pipelines in Databricks for batch, micro-batch streaming, and continuous streaming processing to meet high-, low-, and ultra-low-latency requirements using built-in Apache Spark modules.
  • Well versed in creating pipelines in Azure Data Factory v2 (ADFv2) using activities such as Move & Transform, Copy, Filter, ForEach, and Databricks.
  • Providing Azure technical expertise including strategic design and architectural mentorship, assessments, POCs, etc., in support of the overall sales lifecycle or consulting engagement process.
  • Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, MapReduce, Pig, Oozie, Kafka, Storm, and HBase.
  • Experience in designing and developing Azure Stream Analytics jobs to process real-time data using Azure Event Hubs, Azure IoT Hub, and Service Bus queues.
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames and Spark Streaming.
  • Expertise in using Spark SQL and U-SQL with various data sources such as JSON, Parquet, and Hive.
  • Experience in writing HQL queries against the Hive data warehouse, performance tuning Hive scripts, resolving automated job failures, and reloading data into the Hive data warehouse when needed.
  • Experience in using accumulators, broadcast variables, and RDD caching for Spark Streaming.
  • Experience in data processing such as collecting, aggregating, and moving data from various sources using Apache Kafka.
  • Strong experience in writing Python applications using libraries such as Pandas, NumPy, SciPy, and Matplotlib.
  • Good understanding of NoSQL databases and hands-on experience writing applications against NoSQL databases such as Cosmos DB.
  • Extensive experience in data modelling: designing conceptual and logical data models and translating them into physical data models for high-volume datasets from sources such as Oracle, Teradata, Vertica, and SQL Server using the Erwin tool.
  • 5+ years of proficient experience with the Teradata database and Teradata load and unload utilities (FastLoad, FastExport, MultiLoad, TPump, BTEQ, TPT, and TPT API).
  • Expert knowledge and experience in Business Intelligence Data Architecture, Data Management and Modeling to integrate multiple, complex data sources that are transactional and non-transactional, structured and unstructured.
  • 4+ years of expert knowledge working with Informatica Power Center 9.6.x/8.x/7.x (Designer, Repository Manager, Repository Server Administrator Console, Server Manager, Workflow Manager, Workflow Monitor).
  • Design and develop relational databases for collecting and storing data, and design and build data input and data collection mechanisms.
  • Well versed in relational and dimensional modeling techniques such as star and snowflake schemas, OLTP, OLAP, normalization, and fact and dimension tables.
  • 4+ years of working experience in Vertica data architecture, designing and writing vsql scripts.
  • Good knowledge of creating SQL queries, collecting statistics, and Teradata SQL query performance tuning techniques, including the optimizer and EXPLAIN plans.
  • Well versed in writing UNIX shell scripts.
  • Self-motivated and hardworking, with strong analytical and problem-solving skills; results-oriented with a spirit of teamwork and effective communication and interpersonal skills. Eager to learn, quick to adapt, well organized, and reliable.
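
As context for the streaming bullets above, here is a minimal PySpark Structured Streaming sketch of the Kafka-to-Databricks-Delta pattern described in this summary. The broker address, topic name, schema fields, and storage paths are illustrative placeholders, not values from an actual engagement.

    # Minimal sketch: read continuous JSON from Kafka, parse it, write Delta/Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read a continuous stream of JSON messages from Kafka (placeholder broker/topic).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "events")
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers the payload as bytes; cast to string and parse the JSON.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write micro-batches to a Delta table (Parquet files under the hood) with checkpointing.
    query = (parsed.writeStream
             .format("delta")
             .option("checkpointLocation", "/mnt/datalake/checkpoints/events")
             .outputMode("append")
             .trigger(processingTime="1 minute")
             .start("/mnt/datalake/delta/events"))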

TECHNICAL SKILLS

Azure Cloud Platform: ADFv2, BLOB Storage, ADLS, Azure SQL DB, SQL Server, Azure Synapse, Azure Analysis Services, Databricks, Mapping Data Flow (MDF), Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning, App Services, Logic Apps, Event Grid, Service Bus, Azure DevOps, GIT Repository Management, ARM Templates

Teradata Tools and Utilities: FastLoad, FastExport, MultiLoad, TPump, TPT, Teradata SQL Assistant, BTEQ

Modelling & DA Specs Tools: CA Erwin Data Modeler, MS Visio, Gromit for DA Specs

ETL Tools: Informatica Power Center 9.x/8.6/8.5/8.1/7, DataStage 11.x/9.x, SSIS

Programming Languages: PySpark, Python, U-SQL, T-SQL, Linux Shell Scripting, Azure PowerShell, Java

Big Data Technologies: Hadoop, HDFS, Hive, Apache Spark, Apache Kafka, Pig, Zookeeper, Sqoop, Oozie, HBase, YARN

Databases: Azure SQL Data Warehouse, Azure SQL DB, Azure Cosmos DB (NoSQL), Teradata, Vertica, RDBMS, MySQL, Oracle, Microsoft SQL Server

IDE and Tools: Eclipse, Tableau, IntelliJ, R Studio, SSMS, Maven, SBT, MS-Project, GitHub, Microsoft Visual Studio

Scheduler Tools: Tivoli workload scheduler, Autosys Scheduler, Control-M

Methodologies: Waterfall, Agile/Scrum, SDLC

PROFESSIONAL EXPERIENCE

Confidential, Las Vegas

Azure Cloud Data Engineer

Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, BLOB Storage, SQL Server, Teradata Utilities, Windows Remote Desktop, UNIX Shell Scripting, Azure PowerShell, Databricks, Python, Erwin Data Modelling Tool, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning.

Responsibilities:

  • Attended requirement calls and worked with Business Analysts and Solution Architects to understand client requirements.
  • Analyzed the data flow from different sources to target to provide the corresponding design Architecture in Azure environment.
  • Took initiative and ownership to provide business solutions on time.
  • Created high-level technical design documents and application design documents per the requirements, delivering clear, well-communicated, and complete design documents.
  • Created DA specs and Mapping Data Flows and provided the details to developers along with the HLDs.
  • Created an Application Interface Document for the downstream team to build a new interface to transfer and receive files through Azure Data Share.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Ingested data in mini-batches and performed RDD transformations on those mini-batches using Spark Streaming to deliver streaming analytics in Databricks.
  • Created and provisioned the Databricks clusters needed for batch and continuous streaming data processing and installed the required libraries on those clusters.
  • Improved performance by optimizing compute time for processing streaming data and reduced cost by optimizing cluster run time.
  • Performed ongoing monitoring, automation, and refinement of data engineering solutions; prepared complex SQL views and stored procedures in Azure SQL DW and Hyperscale.
  • Loaded files from ADLS into the target Azure Data Warehouse using U-SQL scripts.
  • Worked on complex U-SQL scripts for data transformation, table loading, and report generation.
  • Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hub, and Service Bus queues.
  • Created a Linked Service to land data from the Caesars SFTP location into Azure Data Lake.
  • Created numerous pipelines in Azure Data Factory v2 to pull data from source databases such as Informix and Sybase using activities like Move & Transform, Copy, Filter, ForEach, and Databricks.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations (a sketch follows this list).
  • Extensively used SQL Server Import and Export Data tool.
  • Created database users, logins, and permissions during environment setup.
  • Worked with complex SQL, stored procedures, triggers, and packages in large databases across various servers.
  • Created Data Lake Analytics accounts and Data Lake Analytics jobs in the Azure Portal using U-SQL scripts.
  • Helped team members resolve technical issues; handled troubleshooting and project risk and issue identification and management.
  • Addressed resource issues and conducted monthly one-on-ones and weekly meetings.
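
The table-to-table Databricks jobs mentioned above followed a pattern like the hypothetical PySpark sketch below: read a source table over JDBC, apply a transformation, and land the result as a Delta table. The JDBC URL, credentials, and table names are placeholders, and the appropriate source-database JDBC driver is assumed to be installed on the cluster.

    # Hypothetical Databricks table-to-table job (placeholder connection details).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, current_timestamp

    spark = SparkSession.builder.appName("table-to-table-copy").getOrCreate()

    # Pull the source table over JDBC (driver and URL depend on the source system).
    source_df = (spark.read
                 .format("jdbc")
                 .option("url", "jdbc:sybase:Tds:source-host:5000/salesdb")
                 .option("dbtable", "dbo.daily_sales")
                 .option("user", "svc_etl")
                 .option("password", "<secret>")
                 .load())

    # Example transformation: drop invalid rows and stamp the load time.
    clean_df = (source_df
                .filter(col("sale_amount") > 0)
                .withColumn("load_ts", current_timestamp()))

    # Overwrite the target Delta table used by downstream reporting.
    (clean_df.write
     .format("delta")
     .mode("overwrite")
     .saveAsTable("reporting.daily_sales"))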

Confidential, Chicago, IL

Azure Cloud Data Engineer

Environment: Microsoft Azure HDInsight, Hadoop Stack, Sqoop, Hive, Oozie, Microsoft SQL Server, HBase, YARN, Hortonworks, UNIX Shell Scripting, Azure PowerShell, Databricks, Python, Erwin Data Modelling Tool, Azure Cosmos DB, Azure Data Factory (ADF v2), Azure Function Apps, Web Services, Azure Data Lake, BLOB Storage, Azure SQL DB, Azure SQL Data Warehouse.

Responsibilities:

  • Worked with the Hortonworks distribution of Hadoop.
  • Played a lead role in architecting and developing the Confidential Data Lake and in building the Confidential Data Cube on a Microsoft Azure HDInsight cluster.
  • Responsible for managing data coming from disparate data sources.
  • Ingested incremental updates from Inform web services onto the Hadoop data platform using Sqoop.
  • Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Experience in working with Restful APIs.
  • Created HBase tables to store various data formats coming from different applications.
  • Developed scripts to extract and process EDI POS sales data sourced from an SFTP server into the Hive data warehouse using Linux shell scripting.
  • Implemented a proof of concept to analyze streaming data using Apache Spark with Python; used Maven/SBT to build and deploy the Spark programs.
  • Responsible for building the Confidential data cube on the Spark framework by writing Spark SQL queries in Python to improve data processing efficiency and reporting query response time (see the sketch following this list).
  • Developed Spark code in Python in Databricks notebooks.
  • Performance-tuned Sqoop, Hive, and Spark jobs.
  • Responsible for modification of ETL data load scripts, scheduling automated jobs and resolving production issues (if any) on time.
  • Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS/Blob storage.
  • Developed Oozie workflows to automate the ETL process by scheduling multiple Sqoop and Hive jobs.
  • Monitored cluster status and health daily using the Ambari UI.
  • Maintained technical documentation for launching and executing jobs on Hadoop clusters.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Responsible for programming code independently for intermediate to complex modules following development standards.
  • Planned and conducted code reviews for changes and enhancements to ensure standards compliance and systems interoperability.
  • Responsible for modifying the code, debugging, and testing the code before deploying on the production cluster.
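
The Spark SQL work described above amounts to building aggregated reporting tables from Hive-managed data, along the lines of this illustrative PySpark sketch; the database, table, and column names are made up for the example.

    # Illustrative sketch: build an aggregated reporting table ("cube") with Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("sales-cube")
             .enableHiveSupport()     # read tables registered in the Hive metastore
             .getOrCreate())

    cube_df = spark.sql("""
        SELECT store_id,
               product_id,
               sale_date,
               SUM(quantity)    AS total_qty,
               SUM(sale_amount) AS total_sales
        FROM   pos.sales_detail
        GROUP  BY store_id, product_id, sale_date
    """)

    # Persist the aggregate partitioned by date so reporting queries can prune partitions.
    (cube_df.write
     .mode("overwrite")
     .partitionBy("sale_date")
     .format("parquet")
     .saveAsTable("reporting.sales_cube"))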

Confidential, Chicago, IL

Sr. ETL/Teradata Developer

Environment: Teradata R12/R13, Teradata SQL Assistant, Informatica Power Center, SQL, Outlook, Putty, WinSCP, MLOAD, TPUMP, FAST LOAD, FAST EXPORT, SSIS, MySQL, Oracle, Unix, TPT, FTP, Python.

Responsibilities:

  • Involved in understanding the requirements of end users and business analysts and developed strategies for the ETL processes.
  • Performed analysis of complex business issues and provided recommendations for possible solutions; wrote SQL queries.
  • Extracted data from flat files (provided by disparate ERP systems) and loaded the data into Teradata staging using Informatica Power Center.
  • Identified key data and components that fit within the business system/process and documented the gaps that needed solutions.
  • Programmed using T-SQL in SQL Server and PL/SQL in Oracle; worked with Microsoft SSRS and Crystal Reports; configured permits and licenses for individuals, properties, and businesses in the CSDC application suite.
  • Extracted data from different sources such as flat files (pipe-delimited or fixed-length), Excel spreadsheets, and databases.
  • Used the Teradata utilities BTEQ, FastLoad, MultiLoad, and TPump to load data.
  • Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL (see the sketch after this list).
  • Managed all development and support efforts for the Data Integration/Data Warehouse team.
  • Set and followed Informatica best practices, such as creating shared objects in shared folders for reusability and using standard naming conventions for ETL objects; designed complex Informatica transformations, mapplets, mappings, reusable sessions, worklets, and workflows.
  • Used the Informatica Data Quality tool (Developer) to scrub, standardize, and match customer addresses against the reference table, and performed unit testing for all the interfaces.
  • Involved in migration projects to move data from data warehouses on Oracle/DB2 to Teradata.
  • Worked on creating Technical Design Documents (TDD) by performing the impact analysis on the application for the new functionality changes.
  • Performance tuned and optimized various complex SQL queries.
  • Used the BTEQ and SQL Assistant (Queryman) front-end tools to issue SQL commands against the Teradata RDBMS to meet business requirements.
  • Coordinated with the business analysts and developers to discuss issues in interpreting the requirements.
  • Provided on-call support during product releases from lower-level to higher-level production environments.
  • Used Agile methodology for repeated testing.
  • Involved in unit testing and user acceptance testing to verify that data extracted from different source systems was loading into the target according to user requirements.
  • Prepared BTEQ import and export scripts for tables.
  • Wrote BTEQ, FastLoad, and MultiLoad scripts.
  • Interacted with the source team and the business to validate data and perform end-to-end testing.
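
The loads above were implemented with BTEQ, FastLoad, and MultiLoad scripts; as an illustrative equivalent only, this sketch issues the same kind of staging-to-target DML from Python using the Teradata SQL Driver for Python (teradatasql). The host, credentials, and table names are placeholders.

    # Illustrative sketch: run a staging-to-target insert-select on Teradata from Python.
    import teradatasql

    with teradatasql.connect(host="tdprod.example.com", user="etl_user", password="<secret>") as con:
        with con.cursor() as cur:
            # Move validated rows from the staging table into the target warehouse table.
            cur.execute("""
                INSERT INTO dw.sales_fact (sale_id, store_id, sale_date, sale_amount)
                SELECT sale_id, store_id, sale_date, sale_amount
                FROM   stg.sales_stage
                WHERE  sale_amount IS NOT NULL
            """)
            # Simple reconciliation check: report the target row count after the load.
            cur.execute("SELECT COUNT(*) FROM dw.sales_fact")
            print("rows in target:", cur.fetchone()[0])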

Confidential, Plano, TX

ETL/Teradata Developer

Environment: Oracle, Teradata, Teradata SQL Assistant, Informatica Power Center, SQL, MLOAD, TPUMP, FAST LOAD, FAST EXPORT.

Responsibilities:

  • Involved in the complete Software Development Life Cycle (SDLC), from business analysis to development, testing, deployment, and documentation.
  • Involved in cleansing files and transforming and loading data into Teradata using Teradata utilities.
  • Involved in requirements analysis in support of data warehousing efforts, working with business analysts and the QA team in a Waterfall methodology.
  • Loaded data into Teradata tables using the Teradata utilities BTEQ, FastLoad, MultiLoad, FastExport, and TPT.
  • Extensively worked in the performance tuning of Teradata SQL, ETL and other processes to optimize session performance.
  • Involved in unit testing and user acceptance testing to verify that data extracted from different source systems was loading into the target according to user requirements.
  • Developed complex ETL mappings on the SSIS platform as part of the risk data integration efforts.
  • Created packages with different control flow options and data flow transformations such as Conditional Split, Multicast, Union All, and Derived Column.
  • Coordinated with the business analysts and developers to discuss issues in interpreting the requirements.
  • Worked on creating Technical Design Documents (TDD) by performing the impact analysis on the application for the new functionality changes.
  • Extracted data from source tables and transformed the data based on user requirements and loaded data to target database.
  • Responsible for migrating the workflows from development to production environment.
  • Managed all development and support efforts for the Data Integration/Data Warehouse team.
  • Used the Teradata utilities FastLoad, MultiLoad, and TPump to load data.
  • Wrote BTEQ scripts to transform data and FastExport scripts to export data.
  • Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL.
  • Wrote views based on user and/or reporting requirements.
  • Wrote Teradata Macros and used various Teradata analytic functions.

Confidential, Atlanta, GA

ETL/SQL Server Developer

Environment: SQL Server 2016, SSIS, SSAS, SSRS, Microsoft Visual Studio 2012, Teradata, Informatica, T-SQL, MS Access, MS Excel, Putty, WinSCP, Outlook

Responsibilities:

  • Created complex Freeforms at the folder and property level for data entry and maintenance.
  • Undertook other software development project-related tasks as reasonably expected.
  • Analyzed the given documents, understood them, and implemented the requirements.
  • Implemented the changes according to the change request document.
  • Wrote many queries using different SQL objects, functions, and keywords.
  • Created and maintained various scripts (BTEQ, FastLoad, and MultiLoad).
  • Implemented performance tuning wherever required by creating the necessary indexes, eliminating spool space issues, etc.
  • Participated in peer reviews, including but not limited to design and development (coding).
  • Participated in and presented status updates at the regular stand-up meetings.
  • Solved complex problems in a time-bound manner and documented issues and fixes for easy system maintenance.
  • Primarily performed error handling using the ET, UV, and work tables (see the sketch after this list).
  • Analyzed the errors in the error tables and reported them to the client for restartability.
  • Did data reconciliation across source systems.
  • Created unit test cases and documented unit test results.
  • Performed performance tuning, SQL query enhancements, and code enhancements to meet performance targets using EXPLAIN plans.
  • Involved in writing FastLoad and MultiLoad scripts to handle loading and updating.
  • Answered ad-hoc client requests using functions, analytical functions, etc.
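
The ET/UV error-table checks described above can be summarized by the illustrative Python sketch below, which assumes the Teradata SQL Driver for Python (teradatasql); the ET_/UV_ table names follow the common MultiLoad error-table naming convention and, like the host and credentials, are placeholders.

    # Illustrative sketch: report row counts from the MultiLoad error tables after a load.
    import teradatasql

    TARGET = "dw.sales_fact"
    ERROR_TABLES = {
        "ET (acquisition-phase errors)": "dw.ET_sales_fact",   # placeholder name
        "UV (uniqueness violations)":    "dw.UV_sales_fact",   # placeholder name
    }

    with teradatasql.connect(host="tdprod.example.com", user="etl_user", password="<secret>") as con:
        with con.cursor() as cur:
            for label, table in ERROR_TABLES.items():
                cur.execute(f"SELECT COUNT(*) FROM {table}")
                count = cur.fetchone()[0]
                # Non-zero counts are what get reported back so the load can be
                # corrected and restarted.
                print(f"{label} for {TARGET}: {count}")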
