Azure Data Engineer Resume

Tampa, FL

SUMMARY

  • Around 8 years of software development experience across the Azure/Big Data Hadoop stack and automation testing.
  • Around 5 years of experience implementing complete Hadoop solutions, including data acquisition, data validation, data profiling, storage, transformation, analysis and integration with other frameworks to meet business needs, using an Azure technology stack that includes Azure Data Factory (ADF), Azure Data Lake Storage (ADLS Gen2), Azure Logic Apps, Azure Blobs, Azure Synapse Analytics, Spark Streaming, Databricks, Spark SQL, Kafka and Snowflake Data Cloud.
  • Working experience in the Big Data/Hadoop technology stack, including HDFS, MapReduce, HBase, Hive, Pig, Impala, Oozie, Sqoop, Spark SQL, YARN/MRv1/MRv2, Flume (web log processing) and ZooKeeper.
  • Experienced in extracting, transforming and loading data from different source systems to Azure Data Lake Storage (ADLS) using a combination of Azure Data Factory (ADF), Spark SQL and U-SQL (Azure Data Lake Analytics), and processing the data in Azure Databricks.
  • Expertise in Hadoop distributions such as Cloudera and Hortonworks, as well as Azure.
  • Experience importing/exporting data into/from traditional databases such as Teradata and Oracle RDBMS using Sqoop.
  • Experienced in ingesting streaming data into HDFS using Flume, memory channels and custom interceptors.
  • Extensively worked on writing, fine-tuning and profiling MapReduce jobs for optimized performance.
  • Extensive experience implementing data analysis algorithms using MapReduce design patterns.
  • Experience implementing complex MapReduce algorithms to perform map-side joins using the distributed cache.
  • Experience creating Hive internal/external tables, loading them with data and troubleshooting Hive jobs.
  • Experience writing complex Hive queries and Pig scripts for business use cases. Extended Hive and Pig core functionality by writing custom UDFs.
  • Experienced in creating Hive internal/external tables, partitions, dynamic partitions and buckets, and in working with data scientists to run ad-hoc queries on structured data (see the sketch after this list).
  • Good understanding of various Hadoop file formats and compression codecs, i.e. RCFile, Parquet, ORC, Avro, GZip and Snappy.
  • Experienced in handling ETL transformations using Pig Latin scripts, expressions, join operations and custom UDFs for evaluating, filtering and storing data.
  • Expert in analyzing real-time queries using different NoSQL databases, including HBase.
  • Expert in implementing advanced procedures such as text analytics and processing using in-memory computing capabilities such as Apache Spark, written in Python.
  • Experience converting business processes into RDD transformations using Apache Spark and Python.
  • Experience writing producers/consumers and creating messaging-centric applications using Apache Kafka.
  • Experience supporting data-driven analytics projects using Azure and Databricks Spark.
  • Expertise and knowledge in using job scheduling and monitoring tools such as Azkaban, Oozie and ZooKeeper.
  • Expertise in writing shell scripts, cron automation and regular expressions.
  • Experience with SDLC methodologies such as Waterfall and Agile, and with Object-Oriented Analysis and Design (OOAD).
  • Strong hands-on experience in the development of client/server applications using Java/J2EE and XML.
  • Experienced in Object-Oriented Analysis and Object-Oriented Design using the Unified Modeling Language (UML).
  • Working knowledge of web/application servers such as JBoss, Apache Tomcat, IBM WebSphere and Oracle WebLogic.
  • Expertise in tools and utilities such as Eclipse, TOAD for Oracle, Rational Rose (UML tool), WSAD, RAD, Ant and Maven.
  • Experience in developing and executing modular, reusable automated scripts using Selenium, Java, Quick Test Professional (QTP)/UFT and VBScript for testing client/server and web-based n-tier applications. Designed and developed automation frameworks such as the Keyword Driven Framework and Hybrid Framework based on project requirements.
  • Strong knowledge of agile and waterfall development methodologies to minimize customer impact.
  • Well-versed in all phases of testing, such as unit testing with JUnit, integration testing, quality assurance testing, system testing and user acceptance testing (UAT).
  • Good experience with Windows, Linux and UNIX platforms.
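
The Hive table work above (internal/external tables, dynamic partitions, buckets) follows a pattern along the lines of the minimal Spark SQL sketch below; the table name, columns, path and the staging_sales source table are hypothetical.

```python
# Minimal sketch of a Hive external table with dynamic partitioning, via Spark SQL.
# All names and paths are illustrative; assumes a Hive metastore is configured.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partition-example")
    .enableHiveSupport()
    .getOrCreate()
)

# External table: Hive tracks only the metadata; the data stays at the given location.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        order_id STRING,
        amount   DOUBLE,
        order_ts TIMESTAMP
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION '/data/raw/sales'
""")

# Dynamic partitioning: partition values come from the data, not the statement.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Assumes a staging_sales table already exists in the metastore.
spark.sql("""
    INSERT OVERWRITE TABLE sales_raw PARTITION (load_date)
    SELECT order_id, amount, order_ts,
           date_format(order_ts, 'yyyy-MM-dd') AS load_date
    FROM staging_sales
""")
```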

TECHNICAL SKILLS

Programming Languages: Python, Java, Scala

Distributed File Systems: Apache Hadoop HDFS

Hadoop Distributions: Cloudera, Hortonworks

Hadoop Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Azkaban, Oozie, Zookeeper, Flume, Spark, Spark Streaming and Apache Kafka

NoSQL Databases: HBase

Relational Data Stores: Oracle, MySQL, SQL Server, Teradata

Search Platforms: Apache Solr

In-Memory/Stream Processing: Apache Spark, Apache Spark Streaming, Apache Storm

Operating Systems: Windows, UNIX, LINUX

Cloud Platforms: Azure, AWS

PROFESSIONAL EXPERIENCE

Confidential, Tampa, FL

Azure Data Engineer

Responsibilities:

  • Responsible for the design, development and delivery of data from operational systems and files into the Data Lake.
  • Used Python and Spark SQL to convert Hive/SQL native queries into Spark DataFrame transformations in Apache Spark.
  • Extracted, transformed and loaded data from different source systems to Azure Data Lake Storage (ADLS) using a combination of Azure Data Factory (ADF) and Spark SQL, and processed the data in Azure Databricks.
  • Created batch & streaming pipelines in Azure Data Factory (ADF) using Linked Services, Datasets and Pipelines to extract, transform and load data.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning.
  • Created Azure Data Factory (ADF) batch pipelines to ingest data from relational sources into the Azure Data Lake Storage (ADLS Gen2) raw path in full & incremental fashion and then load it into Delta tables after cleansing (see the sketch after this list).
  • Created Azure Data Factory (ADF) streaming pipelines to consume data from Azure Event Hub using Spark and load it into Azure Data Lake Storage (ADLS).
  • Consumed data from the LivePerson API source via Spark with OAuth authentication using PySpark, parsed the response and loaded it into raw and core tables.
  • Created Azure Data Factory pipelines to apply transformations using Databricks Spark per the use case and then move/load the transformed data into the curated data model.
  • Imported data from different sources like HDFS/Hive into Spark DataFrames and Datasets with Spark 2.0.
  • Loaded data from Azure Data Lake Storage to Azure Synapse using Databricks with PySpark/Python and created reusable ADF pipelines for the data loads.
  • Created source secrets in Azure Key Vault and accessed them from Azure Data Factory (ADF) & Databricks to connect to sources.
  • Implemented invoking a pipeline of one Azure Data Factory (ADF) instance from another Azure Data Factory using Azure Logic Apps and the ADF Web activity POST method.
  • Created Azure Logic Apps that trigger when a new email is received with an attachment and load the file to Blob storage.
  • Created a reusable Azure Data Factory pipeline to pull data from Teradata and load it to ADLS Gen2 in incremental or truncate-and-reload fashion.
  • Implemented CI/CD pipelines using Azure DevOps in the cloud with Git and Maven, along with Jenkins plugins.
  • Implemented automation scripts for Voltage encryption in Hadoop ETL jobs using Python to improve development productivity by eliminating manual changes.
  • Built an efficient near real-time data processing pipeline using Kafka, Spark Structured Streaming and HBase to process incoming trades instantly.
  • Developed Apache Spark batch jobs using Python for faster data processing and used Spark SQL for querying.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Worked on designing and developing applications in Spark using Python to compare the performance of Spark with Hive.
  • Developed a NiFi workflow to pick up data from HDFS, drop it on an SFTP server and send email notifications.
  • Ingested data from MySQL/SQL Server into the Data Lake using Sqoop and imported various formats of flat files into HDFS using custom batch jobs.
  • Worked with BAs, end users and architects to define and process requirements, build code efficiently and collaborate with the rest of the team on effective solutions.
  • Delivered projects on time and to specification with quality.
  • Used the version control system Git to access repositories and coordinate with CI tools.
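
The raw-to-Delta ingestion bullet above follows a pattern like the minimal Databricks/PySpark sketch below. The storage account, container, secret scope and column names are hypothetical; `spark` and `dbutils` are the objects a Databricks notebook provides.

```python
# Minimal sketch: read a raw extract from ADLS Gen2, cleanse it, append to a Delta table.
# Storage account, container, secret scope and columns are illustrative only.
from pyspark.sql import functions as F

storage_account = "examplelake"   # hypothetical
container = "raw"                 # hypothetical

# Authenticate to ADLS Gen2 with an account key held in a (hypothetical) secret scope.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="adls-scope", key="storage-account-key"),
)

raw_path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/customers/"

cleansed = (
    spark.read.format("parquet").load(raw_path)
    .filter(F.col("customer_id").isNotNull())      # basic cleansing rule
    .dropDuplicates(["customer_id"])
    .withColumn("load_ts", F.current_timestamp())
)

# Incremental loads append; a full reload would use mode("overwrite").
# Assumes a "core" database already exists in the metastore.
cleansed.write.format("delta").mode("append").saveAsTable("core.customers")
```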

Environment: Azure Data Factory (ADF), Azure Data Lake Storage (ADLS Gen2), Azure Logic Apps, Azure Blobs, Azure Synapse Analytics, Azure Key Vault, Spark Streaming, Databricks, Spark SQL, Kafka, Teradata, SQL Server, Snowflake Data Cloud, Hortonworks, HDFS, Sqoop, Git.

Confidential, Tampa, FL

Azure Data Engineer

Responsibilities:

  • Responsible for analyzing large data sets to develop custom data pipelines that drive business solutions.
  • Responsible for creating reusable Azure Data Factory (ADF) pipelines for data ingestion & transformation from different sources using PySpark in Databricks.
  • Created batch & structured streaming Azure Data Factory (ADF) pipelines using Databricks Spark, Delta tables, ADLS Gen2, Azure Key Vault, Azure Blob & Azure Event Hub.
  • Used Python and Spark SQL to convert Hive/SQL native queries into Spark DataFrame transformations in Apache Spark.
  • Created batch & streaming pipelines in Azure Data Factory (ADF) using Linked Services, Datasets and Pipelines to extract, transform and load data.
  • Created reusable ADF pipelines to extract data from HDFS, land it in Azure Data Lake Storage and create Delta tables.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning.
  • Created Azure Data Factory (ADF) batch pipelines to ingest data from relational sources into the Azure Data Lake Storage (ADLS Gen2) raw path in full & incremental fashion and then load it into Delta tables after cleansing.
  • Created Azure Data Factory (ADF) streaming pipelines to consume data from Azure Event Hub using Spark and load it into Azure Data Lake Storage (ADLS).
  • Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (see the sketch after this list).
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation and aggregation from multiple file formats, analyzing & transforming the data to uncover insights.
  • Created Azure Data Factory pipelines to apply transformations using Databricks Spark per the use case and then move/load the transformed data into the curated data model.
  • Imported data from different sources like HDFS/Hive into Spark DataFrames and Datasets with Spark 2.0.
  • Implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS; created and ran SSIS packages with the ADF V2 Azure-SSIS IR.
  • Created Hive target tables with ORC and Avro input/output formats to hold the data after all the Pig ETL operations, using HQL.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Experience creating in-house frameworks in Python for automation & fixing PRD issues.
  • Worked on designing and developing applications in Spark using Python to compare the performance of Spark with Hive.
  • Delivered projects on time and to specification with quality.
  • Used the version control system Git to access repositories and coordinate with CI tools.
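
The UDF bullet above is in the spirit of the minimal PySpark sketch below; the masking rule and column names are illustrative only.

```python
# Minimal PySpark UDF sketch: a hypothetical business rule that masks account numbers.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@F.udf(returnType=StringType())
def mask_account(acct):
    # Keep only the last four characters visible.
    if acct is None:
        return None
    return "*" * max(len(acct) - 4, 0) + acct[-4:]

df = spark.createDataFrame(
    [("1234567890", 250.0), ("9876543210", 75.5)],
    ["account_number", "amount"],
)

df.withColumn("account_masked", mask_account("account_number")).show()
```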

Environment: Azure Data Factory, Azure Data Lake Storage, Azure Blob, Azure Key Vault, Azure Logic Apps, HDFS, Hive, Sqoop, Apache Kafka, HBase, Oracle, Spark SQL, Spark Streaming, Databricks, Delta tables, Git.

Confidential -Plano, TX

Hadoop Developer

Responsibilities:

  • Ingested structured/relational data into the data lake using Sqoop jobs scheduled with Oozie workflows, pulling incremental data from the RDBMS sources.
  • Ingested streaming data (time-series data) into the data lake using Flume.
  • Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Developed MapReduce programs in Java that run on the Hadoop cluster.
  • Used the Avro data serialization system with Avro tools to handle Avro data files in MapReduce programs.
  • Implemented UDFs in Java for Hive to handle processing that can't be done with Hive's built-in functions.
  • Involved in creating Hive internal/external tables, loading them with data and troubleshooting Hive jobs.
  • Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
  • Experience writing MapReduce jobs on the Hadoop ecosystem using Pig Latin and creating Pig scripts to carry out essential data operations and tasks.
  • Experienced in using the Hive ORC format for better columnar storage, compression and processing.
  • Wrote Pig scripts for advanced analytics on the data for recommendations.
  • Processed the source data into structured data and stored it in the NoSQL database HBase.
  • Involved in converting business transformations into Spark RDDs using Python (see the sketch after this list).
  • Involved in integrating Hive queries into the Spark environment using Spark SQL.
  • Computed complex logic and controlled the data flow using the in-memory processing tool Apache Spark.
  • Experience creating a near real-time data pipeline to consume messages from Kafka using DStream-based Spark Streaming and then load them into Hive (due to limitations with the Structured Streaming API).
  • Implemented a messaging system for different data sources using Apache Kafka and configured high-level consumers for online and offline processing.
  • Developed Oozie workflows to automate the entire data pipeline and scheduled them using the scheduler.
  • Implemented automation scripts in Hadoop ETL jobs using Python to improve development productivity by eliminating manual changes.
  • Experienced in working in an agile environment with onsite/offshore coordination.
  • Used the source control management system Git to manage repositories and check in the latest code changes.
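
The RDD-conversion bullet above follows a pattern like the minimal PySpark sketch below; the input layout, paths and aggregation rule are illustrative only.

```python
# Minimal sketch: express a simple business rule (total spend per customer) as RDD
# transformations. Input lines are assumed to look like: customer_id,product,amount
from pyspark import SparkContext

sc = SparkContext(appName="rdd-transformation-example")

lines = sc.textFile("hdfs:///data/raw/orders/")           # hypothetical path

order_totals = (
    lines.map(lambda line: line.split(","))
         .filter(lambda fields: len(fields) == 3)          # drop malformed records
         .map(lambda fields: (fields[0], float(fields[2])))
         .reduceByKey(lambda a, b: a + b)                   # total spend per customer
)

order_totals.saveAsTextFile("hdfs:///data/processed/order_totals")
sc.stop()
```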

Environment: Hadoop Framework, MapReduce, Hive, Sqoop, Pig, HBase, Cassandra, Apache Kafka, Storm, Flume, Oozie, Maven, Jenkins, Java (JDK 1.6), UNIX Shell Scripting, Oracle 11g/12c, Git.

Confidential

QA Analyst

Responsibilities:

  • Created test cases using element locators and Selenium WebDriver methods.
  • Involved in automation infrastructure development using Java and Selenium.
  • Participated in user interface design using HTML, C#, JavaScript and Flash technologies.
  • Performed API/Web Services testing for applications based on SOAP and RESTful frameworks.
  • Involved in testing XML files and checked whether data was passed and loaded to staging tables.
  • Designed and implemented different automation frameworks, including a Keyword Driven Framework and a Hybrid Framework.
  • Performance tested various applications using a variety of protocols such as C# and Java.
  • Created, enhanced and executed test scripts using Selenium IDE and Selenium RC (Java).
  • Developed Selenium WebDriver automation scripts in Java and Ruby using Rational.
  • Extensively used Selenium (data-driven, XPath locators) and WebDriver to test the web application (see the sketch after this list).
  • Used the scheduler to compose and execute jobs to simulate the production environment in QA.
  • Executed automation scripts on different browsers/environments & reported defects/results to the team.
  • Conducted JMeter load testing with the help of the senior automation tester on the team.
  • Implemented automation using Selenium WebDriver, Java, Cucumber, Watir, Jenkins and Maven.
  • Reported defects with severity and priority against each defect.
  • Developed/maintained code in JUnit and NUnit using Selenium WebDriver.
  • Experienced in using test management and bug reporting tools such as JIRA and Bugzilla to track test progress, execution and deliverables.
  • Performed back-end testing of the database using SQL queries to make sure the SQL Server database reflects the updates/changes and to verify database integrity.
  • Performed load testing to check system behavior and resolved performance issues using HP LoadRunner.
  • Worked on distributed test automation execution on different environments as part of the Continuous Integration process using Jenkins.
  • Executed the test scripts and worked with the development team to investigate bugs.
  • Performed compatibility testing to make sure the application works with other external resources or interfaces such as OS, mobile devices, network, browsers, etc.
  • Excellent multi-tasking skills; prioritized effectively and reported timely, accurate status to management.
  • Raised many ClearQuest requests as part of code fixes and issues in UAT.
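
The XPath-locator WebDriver work above follows a pattern like the minimal sketch below, written in Python for consistency with the other sketches (the project scripts themselves were in Java); the URL, locators and credentials are hypothetical.

```python
# Minimal Selenium WebDriver sketch: explicit wait + XPath locators for a login flow.
# URL, locators and credentials are illustrative only.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/login")

    # Explicit wait before interacting, as a keyword/hybrid framework step would do.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//input[@id='username']"))
    ).send_keys("test_user")

    driver.find_element(By.XPATH, "//input[@id='password']").send_keys("secret")
    driver.find_element(By.XPATH, "//button[@type='submit']").click()

    assert "Dashboard" in driver.title, "login did not reach the dashboard"
finally:
    driver.quit()
```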

Environment: Java, Selenium IDE, Maven, TestNG, Advanced Query Tool (AQT) for DB2, HP ALM/QC, Cucumber, HTML, Python, REST API, SOAP, Oracle 11g/12c, Source Tree, Git.

Confidential

QA Engineer

Responsibilities:

  • Engaged in preparation of Test Plan for the project.
  • Worked effectively with Developers, Agile Team and Project Management to achieve QTP
  • Performed Boundary Value Testing and Load Testing to meet specific business requirements.
  • Created and updated Hybrid Framework and function libraries in use. Customized Test Cases and set properties of Test Sets in QC.
  • Created automated test scripts using HTML, C, C++, C#, VBScript, UNIX and Java.
  • Performed all dimensions of testing including Functionality Testing, Performance Testing, System Integration Testing (SIT), Validation testing, Load Testing and Regression Testing.
  • Performance tested various applications using a variety of protocols such as .NET and Java.
  • Discussed with the team members on the functional aspects of the project.
  • Wrote functional test cases according to test plan.
  • Involved in Test Execution and Defect Reporting using JIRA.
  • Used data from different data sources like MS Excel, MS Word and Notepad for testing in REST and SOAP UI.
  • Updated test cases and test scripts according to changed requirements, as well as using Quick Test Pro (QTP) for regression testing.
  • Parameterized property values of objects, developed reusable functions and used various string functions to verify that the web application functions as expected in QTP.
  • Created VB scripts to call different actions, applied synchronization points in QTP to ensure the application meets the requirements.
  • Used HP Load Runner to predict the behavior of the system infrastructure under simulated loads of emulated virtual client users.
  • Maintained the build environment and the source code control system, and managed build packages using TFS.
  • Created and maintained SQL Scripts to perform back-end testing on the Oracle 10g database.
  • Executed SQL queries to fetch data from databases to verify and compare expected results with those obtained.
  • Experienced in analyzing and filing bugs in HP QC.

Environment: HP QTP/UFT, VBScript, HP ALM/QC, Python, REST API, SOAP, Oracle 11g/12c, Source Tree, Git.
