
Senior AWS Data Engineer Resume


New Jersey

SUMMARY

  • Accomplished IT professional with 8+ years of experience as a Data Engineer and Analyst with expertise in Big Data, ETL development, data warehousing, the Hadoop ecosystem, and cloud engineering.
  • Experience in Data Analysis for Online Transaction Processing (OLTP) and Data Warehousing (OLAP)/Business Intelligence (BI) applications (Tableau, Power BI, QlikView).
  • Extensive knowledge of SaaS, PaaS, and IaaS cloud computing architecture and implementation using Azure, and AWS.
  • Expertise in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Data Science expert with proven problem-solving, coding, debugging, and analytical skills used to effectively convey business requirements; facilitated meetings with business and technical stakeholders and resolved conflicts to determine data requirements.
  • Good experience with Amazon Web Services such as S3, IAM, EC2, EMR, VPC, DynamoDB, Redshift, Amazon RDS, Lambda, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, and SQS.
  • Extensive experience with Spark/Databricks and Python to design and implement large-scale data pipelines for data curation.
  • Utilized AWS Elastic MapReduce to convert and transfer large amounts of data between AWS data stores and databases such as Amazon S3 and Amazon DynamoDB.
  • Working knowledge of Python scikit-learn for machine learning and statistical analysis.
  • Proficient in SQL, SSIS/Power BI, MS Excel macros, and data mining operations.
  • Excellent command of statistical programming languages such as Python and R, along with data mining and natural language processing (NLP).
  • Developed security frameworks to provide fine-grained access to objects in S3 using Lambda and DynamoDB.
  • Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data (an illustrative sketch follows this summary).
  • Extensive experience using AWS S3 to stage and move data, as well as to support data transfers and archiving. Knowledge of Amazon Redshift for various data migration projects, as well as CDC (change data capture) using AWS DMS.
  • Experience importing real-time data to Hadoop using Kafka and implementing Oozie workflows for scheduled imports.
  • Worked with various streaming ingest services with batch and real-time processing using Spark Streaming, Confluent Kafka, Storm, Flume, and Sqoop.
  • Excellent working experience with SQL and NoSQL databases such as Cosmos DB, HBase, Cassandra, and MongoDB; involved in data modeling, tuning, backup, development, and automation of ETL pipelines using Python and SQL.
  • Strong SQL query writing and optimization skills, practical experience with RDBMS such as SQL Server 2012, and familiarity with NoSQL databases like MongoDB.
  • Working knowledge of Spark - SQL in Databricks for extracting, transforming, and aggregating data from multiple formats for analyzing and transforming the data to discover customer usage patterns.
  • Experience in developing and implementing CI/CD pipelines and automation using Jenkins, Git, Docker, and Kubernetes for ML models deployment
  • Good Understanding of dimensional modeling with snowflake and star schema for building fact tables and dimensional tables for analytical services.
  • Experience in development of enterprise solutions using Hadoop components like Hadoop( HDFS, MapReduce, Yarn), Spark, SparkSQL, PySpark, Kafka, Sqoop, HBase, Impala, Pig, Flume, Oozie, Nifi, Kafka, Zookeeper, Storm and Hive.
  • In-depth knowledge of Spark ecosystem and architecture for developing production-ready Spark applications utilizing Spark Streaming, MLlib, GraphX, Spark SQL, Data Frames, Datasets, and Spark ML.
  • In-depth knowledge of Big Data architectures such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB, and NoSQL.
  • Knowledge of Tableau server commands for migrating workbooks and performing backups, debugging, and monitoring.
  • Developed JSON and YAML scripts to deploy pipelines in Azure Data Factory (ADF) to process data with Cosmos activity.
  • Experience with Microsoft SSIS, and Informatica to extract, transform, and load (ETL) data from spreadsheets, databases, and other sources.
  • Maintained corporate metadata definitions for enterprise data stores within a metadata repository by creating, documenting, and maintaining logical and physical database models in compliance with enterprise standards.
  • Ability to work both independently and collaboratively in a fast-paced, multitasking environment. A self-motivated, enthusiastic learner with excellent communication skills.
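The following is a minimal, illustrative PySpark Structured Streaming sketch of the Kafka-to-HDFS pattern referenced in the summary above; the broker address, topic name, schema, and HDFS paths are hypothetical placeholders rather than details from any specific engagement.

    # Hedged sketch: read JSON events from Kafka and persist them to HDFS as Parquet.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder brokers
           .option("subscribe", "user-events")                  # placeholder topic
           .option("startingOffsets", "latest")
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
                 .select(from_json(col("json"), event_schema).alias("e"))
                 .select("e.*"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/events")              # placeholder output path
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()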

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Kafka, Oozie, Flume

Amazon AWS: EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, Quicksight, Kinesis

Microsoft Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB, Azure Active Directory

Hadoop Distribution: Apache Hadoop 2.x/1.x

Programming Languages: Python, Scala, SQL, PySpark, R, Shell Scripting, HiveQL.

Spark components: RDD, Spark SQL (Data Frames and Dataset), and Spark Streaming

NoSQL Databases: Cassandra, HBase, DynamoDB.

Databases: MySQL, Teradata, Oracle, MS SQL SERVER, PostgreSQL.

ETL/BI: Informatica, Talend, Snowflake, Tableau, Power BI.

PROFESSIONAL EXPERIENCE

Senior AWS Data Engineer

Confidential, New Jersey

Responsibilities:

  • Working experience with distributed computing architectures such as the AWS cloud platform (S3, EC2, Redshift, EMR, Lambda, Glue, Elasticsearch), Hadoop, Python, and Spark.
  • Developed data pipelines to ingest, process, and deliver data via batch processing and streaming using Spark, AWS EMR clusters, Lambda, and Databricks.
  • Implemented batch data processing, ETL, and ingestion into data warehouses using Lambda Python functions, Elastic Kubernetes Service (EKS), and S3 services through Airflow automation.
  • Ingested data into the data lake (S3) and used AWS Glue to expose the data to Redshift.
  • Provided ETL solutions to migrate Teradata data from the on-premises system to AWS Redshift, reducing run time by over 60%.
  • Used dbt (data build tool) to transform the data in Redshift after configuring the EMR cluster for data ingestion.
  • Used batch processing to calculate the risk associated with the pricing strategy and feed it into other systems such as discounted cash flow (DCF), PnL, and the Europe credit platform.
  • Developed and tested SQL code for transformations using the data build tool.
  • Utilized Data Frames in Spark to extract, transform, and load (ETL) data from multiple federated sources (JSON, relational databases, etc.).
  • Designed data pipelines with Airflow to schedule PySpark jobs for incremental loads (see the Airflow sketch following this list) and used Flume for weblog server data.
  • Developed an end-to-end ETL pipeline that extracts data from source systems and loads it into an RDBMS using Spark.
  • Automated ingestion into the data lake by scheduling Apache Airflow jobs on the cluster.
  • Automated the extraction of weblogs using Python scripts and Airflow DAGs.
  • Developed and implemented Hive Bucketing and Partitioning.
  • Utilized AWS Kinesis to develop scalable applications for real-time ingestion of financial spreading data into various databases, then built a common learner data model by performing aggregations and necessary transformations and stored the data in HBase.
  • Utilized AWS Lambda and Step Functions to orchestrate multiple ETL jobs, as well as AWS Glue to load and prepare customer data analytics.
  • Implemented AWS Lambda to run code without managing servers, triggering execution from S3 and SNS events.
  • Implemented data migration programs from DynamoDB to Redshift (ETL process) using AWS Lambda by creating Python functions for specific events (see the Lambda sketch following this list).
  • Developed and implemented the AWS cloud computing platform by integrating RDS, Python, DynamoDB, S3, and Redshift.
  • Developed Spark applications in Databricks that combine Spark SQL with a variety of file formats for extracting, transforming, and aggregating data to analyze and transform it for customer insights.
  • Worked with and maintained various file formats, including delimited text files, clickstream logs, Apache logs, Avro, JSON, and XML files.
  • Optimized performance of the database by performing indexing and performance tuning.
  • Built a common learner data model by collecting data using Spark Streaming from AWS S3 bucket in near-real time and performing necessary transformations.
  • Responsible for data loading and transformation for structured, semi-structured, and unstructured sets.
  • Created Hadoop and Spark clusters using AWS EMR; these clusters are used in production to submit and execute Python applications.
  • Designed and developed end-to-end ETL processing from Oracle to AWS using AWS S3, Elastic MapReduce, and Python scripts for business intelligence and advanced analytics, from ingestion of data to consumption.
  • Developed a data pipeline using MapReduce, Flume, Sqoop, and Pig to ingest customer behavioral data into HDFS for analysis.
  • Implemented CI/CD solution using Git, Jenkins, Docker, and Kubernetes to configure big data systems on Amazon Web Services cloud.
  • Developed SQL Scripts and PL/SQL Scripts to obtain data from the database for testing purposes and for meeting business requirements.
  • Provided data analysis and data validation while troubleshooting and resolving complex production issues.
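As a companion to the Airflow scheduling bullet above, here is a hedged sketch of a daily Airflow DAG that submits a PySpark incremental-load job; the DAG id, schedule, script path, connection id, and arguments are hypothetical and not taken from the project.

    # Hedged sketch: Airflow DAG submitting a nightly PySpark incremental load.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    default_args = {
        "owner": "data-eng",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="incremental_load_example",       # placeholder DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:

        incremental_load = SparkSubmitOperator(
            task_id="pyspark_incremental_load",
            application="/opt/jobs/incremental_load.py",   # placeholder script path
            conn_id="spark_default",
            application_args=["--load-date", "{{ ds }}"],  # Airflow logical date
        )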
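For the S3/DynamoDB-to-Redshift Lambda work noted above, here is a minimal sketch of an S3-triggered Lambda handler that submits a Redshift COPY through the Redshift Data API; the cluster, database, IAM role, and table names are placeholder assumptions.

    # Hedged sketch: S3 ObjectCreated event -> COPY the new object into Redshift.
    import boto3

    redshift_data = boto3.client("redshift-data")

    CLUSTER_ID = "analytics-cluster"          # placeholder
    DATABASE = "analytics"                    # placeholder
    DB_USER = "etl_user"                      # placeholder
    COPY_ROLE_ARN = "arn:aws:iam::123456789012:role/redshift-copy-role"  # placeholder


    def lambda_handler(event, context):
        """Triggered by an S3 ObjectCreated event; COPY the object into Redshift."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            copy_sql = (
                f"COPY staging.events FROM 's3://{bucket}/{key}' "
                f"IAM_ROLE '{COPY_ROLE_ARN}' FORMAT AS JSON 'auto';"
            )
            response = redshift_data.execute_statement(
                ClusterIdentifier=CLUSTER_ID,
                Database=DATABASE,
                DbUser=DB_USER,
                Sql=copy_sql,
            )
            print(f"Submitted COPY for s3://{bucket}/{key}: {response['Id']}")
        return {"status": "submitted"}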

Environment: Spark 3.0, Spark SQL, EMR, AWS SQS, AWS SNS, S3, CFT, AWS Lambda, SQL, Java, Python, Cassandra, Oracle, MySQL, Hive, HDFS, Sqoop, Apache Flume, Git, Jenkins, Docker, Kubernetes, Tableau, Zookeeper, HBase.

Azure Data Engineer

Confidential, California

Responsibilities:

  • Created tabular models in Azure Analysis Services and collaborated with business users to meet data warehousing and reporting requirements.
  • Good working experience loading data into the Azure Synapse Analytics (SQL DW) environment from Azure Blob and Data Lake Storage.
  • Experienced in ETL data loading from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL DB, Azure SQL DW) and processed the data in Azure Databricks.
  • Hands-on experience working on the Snowflake data warehouse; moved data from Azure Storage to the Snowflake database.
  • Designed and developed Spark code and Spark REPL applications using Scala and Spark SQL/Streaming for faster processing of data and to handle similar datasets.
  • Created batches and sessions in the on-demand server manager to move data at specific intervals.
  • Developed Hadoop scripts to load and manipulate HDFS (Hadoop Distributed File System) data.
  • Performed Hive queries on local sample files and HDFS data.
  • Used Spark Streaming to divide incoming data into micro-batches as input to the Spark engine for batch processing.
  • Experience using Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Analyzed Hadoop clusters and different Big Data analytical tools, such as Hive, HBase, Spark and Sqoop.
  • Generated user reports, visualization, and Business Intelligence by exporting data from HDFS to a MySQL database using Sqoop.
  • Extracted daily data from multiple vendors using Python APIs.
  • Used Scala and Python to develop Spark applications and implemented Apache Spark to process data from various RDBMS and streaming sources.
  • Integrated multiple data connections and created multiple joins across various sources of data for data preparation.
  • Utilized ADF to automate file validations in Databricks using Python scripts (see the validation sketch following this list).
  • Worked on ETL processes using Spark, Scala, Hive, and HBase, converting Hive/SQL queries into Spark transformations using Scala, RDDs, and Python.
  • Utilized SQL server integration services (SSIS) to develop, design and deploy ETL solutions.
  • Optimized HiveQL performance by analyzing user request patterns and implementing partitions and buckets.
  • Utilized SQL for data processing and developed JSON scripts for deploying the pipeline in Azure Data Factory.
  • Created tables in Azure SQL DW for data reporting and visualization per business requirements, and built control frameworks using SQL DB audit tables to govern the ingestion, transformation, and load of data in Azure.
  • Comprehensive knowledge of Risk Based Monitoring and Centralized data monitoring processes and data driven trial execution.
  • Proficient in programming reports in JReview (SAS and SQL programming), study set-up, and data review in JReview.
  • Understanding of Snowflake utilities, SnowSQL, Snowpipe, and the use of Python for Big Data models.
  • Automated dynamic Snowflake queries with dbt, Git, and Azure DevOps, enabling continuous integration and continuous delivery.
  • Generated various graphical planning reports using Python packages such as NumPy and Matplotlib.
  • Wrote SQL queries against Snowflake and created ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL.
  • Hands-on experience using both Informatica Developer and Informatica Analyst tools to create reference tables.
  • Created databases and schema objects including indexes and tables by writing various functions, stored procedures, and triggers to connect various applications to the existing database.
  • Implemented Normalization and De-Normalization of existing tables using Joins and Indexes effectively to optimize queries and provide fast query retrieval.
  • Created visualization dashboards, generated reports, and built KPI scorecards using tools such as Power BI, Tableau, and Excel (Power View).
  • Collaborated with product managers, scrum masters, and engineers to develop retrospectives, backlogs, and documentation initiatives to help improve Agile projects.
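Hedged sketch of the kind of Databricks file validation an ADF pipeline could invoke as a Python job step, as referenced above; the storage path, required columns, and key checks are illustrative assumptions (in a Databricks notebook the `spark` session is already provided).

    # Hedged sketch: validate an incoming file before downstream processing.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()   # returns the existing session on Databricks

    REQUIRED_COLUMNS = {"customer_id", "order_id", "order_ts", "amount"}

    def validate_file(path: str) -> None:
        df = spark.read.option("header", "true").csv(path)

        missing = REQUIRED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"{path} is missing required columns: {sorted(missing)}")

        if df.limit(1).count() == 0:
            raise ValueError(f"{path} contains no data rows")

        null_keys = df.filter("customer_id IS NULL OR order_id IS NULL").count()
        if null_keys:
            raise ValueError(f"{path} has {null_keys} rows with null keys")

    # Example: the path would be supplied by the ADF pipeline as a job parameter
    validate_file("abfss://landing@examplestore.dfs.core.windows.net/orders/2023-01-01.csv")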

Environment: Azure, ADF, Azure Databricks, Azure SQL, Snowflake, Spark, Hadoop, Hive, Oozie, Java, Linux, Oracle 11g, MySQL, ETL, SSIS, Power BI, Tableau, HDFS, HBase, Sqoop, Python, Scala, PySpark, IDQ Informatica Tool 10.0, IDQ Informatica Developer Tool 9.6.1 HF3.

Data & Analytics Engineer

Confidential

Responsibilities:

  • Gathered business requirements and communicated with users, project managers, and subject matter experts (SMEs) to gain a deeper knowledge of the business processes.
  • Extensive knowledge in text analytics, data visualization using R and Python, and dashboard creation using tools such as Tableau.
  • Worked on data profiling and mapping of functional and non-functional categorized data elements from source to target data environments, as well as developing working documents to support findings.
  • Extracted data from various sources such as flat files, Oracle, and mainframes, and worked on claims data.
  • Implemented a variety of approaches to collect the business requirements and participated in JAD sessions involving the discussion of various reporting needs.
  • Utilized Python SciPy to build factor analysis and cluster analysis models to classify customers into different target groups (see the clustering sketch following this list).
  • Documented Source to Target Mapping documents with data transformation logic.
  • Utilized SQL to analyze and profile data from various sources, including Oracle and Teradata.
  • Good understanding of creating data governance policies, business glossaries, data dictionaries, reference data, metadata, data lineage, and data quality rules.
  • Analyzed, designed, and built modern data solutions using Azure PaaS services; assessed the current production state of applications and determined the impact of new implementations on existing business processes.
  • Worked with Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics) to extract, transform, and load data from source systems to Azure data storage services.
  • Implemented Data Governance using Excel and Collibra.
  • Integrated data lineage with business glossary work within the data governance framework.
  • Provided an overview of areas of system control gaps and issues related to Data Quality.
  • Ensured Collibra was configured correctly for both reference and master data domains.
  • Working knowledge of automating data management processes using Collibra.
  • Created metadata, lineage, and data quality guidelines and rules with data steward and data owners.
  • Created and managed extract and monitor queries using SQL Developer, including advanced connections and joining new tables.
  • Developed parameter- and dimension-based reports, drill-down reports, matrix reports, charts, and tabular reports in Tableau, and effectively used Tableau's data blending feature.
  • Used Tableau's Show Me functionality to create scorecards, dashboards, and heat maps.
  • Created complex data views manually using multiple measures and sorted, filtered, grouped, and created sets.
  • Experience in building complex dashboards using URLs, Hyperlinks, and filters. Published reports, workbooks, and data sources to servers, and exported the reports in different formats.
  • Used key performance indicators (KPIs) and measures to create aggregations, calculated fields, table calculations, totals, and percentages.
  • Utilized Crystal Reports tables, views, and business views to develop a wide range of complex reports, including standard, group, detail and summary, cross-tab, graphical, drill-down, and linked sub-reports.
  • Validated performance data by creating views and stored procedures.
  • Created Tableau visualizations on various Data Sources including Flat Files (Excel, Access) and Relational Databases (Oracle, Teradata).
  • Designed and deployed rich visualizations with Drill Down and Drop-down menu option and Parameterized using Tableau.
  • Worked on performance tuning of the database, including index creation, optimization of SQL statements, and monitoring.
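Illustrative sketch of a SciPy-based customer clustering step in the spirit of the factor/cluster analysis work above; the feature columns, sample values, and choice of three clusters are assumptions for demonstration only.

    # Hedged sketch: segment customers with k-means from scipy.cluster.vq.
    import pandas as pd
    from scipy.cluster.vq import kmeans2, whiten

    # Hypothetical customer features: recency, frequency, monetary value
    customers = pd.DataFrame({
        "recency_days":    [5, 40, 3, 90, 12, 60],
        "orders_per_year": [24, 4, 30, 1, 18, 2],
        "avg_order_value": [80.0, 35.0, 120.0, 20.0, 95.0, 25.0],
    })

    # Scale each feature to unit variance before clustering
    features = whiten(customers.to_numpy(dtype=float))

    centroids, labels = kmeans2(features, k=3, seed=42, minit="++")

    customers["segment"] = labels
    print(customers.groupby("segment").mean())   # per-segment feature profile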

Environment: Tableau, Teradata, SAP BW, MongoDB, Azure ADF, Netezza, Oracle 9i/10g/11g, SQL Server, DB2 LUW, DB2 z/OS, Flat Files, Linux, ERWIN 7.3/8.28

Data/BI Analyst

Confidential

Responsibilities:

  • Implemented database triggers, views, stored procedures, and functions with SQL to enable information entered by WMs to impact respective tables.
  • Performed ad hoc reporting analysis as well as complex data manipulation on MS SQL Server.
  • Wrote complex SQL queries to obtain data from multiple tables using joins, sub-queries, and SQL Server capabilities such as Explain, Stats, Cast, and volatile tables.
  • Wrote complex SQL queries using advanced concepts like aggregate functions.
  • Worked on developing SQL queries to manipulate, transform, or calculate information to fulfill data reporting requirements including identifying the tables and columns to extract data.
  • Wrote several complex SQL queries for extensive data validation, participated in back-end testing, and worked with data quality issues.
  • Implemented data management projects and fulfilled ad-hoc requests using data management software programs and tools including Perl, Toad, MS Access, Excel, and SQL.
  • Worked on claims data; involved in extracting, transforming, and loading data directly from different systems such as flat files, mainframes, Excel, Oracle, and SQL Server.
  • Developed complex SQL queries using Joins, Subqueries, and concepts like Explain, Stats, Cast, and volatile tables on SQL Server to retrieve data from disparate tables.
  • Analyzed requirements for reporting and developed various dashboards, created various views in Tableau (Tree maps, Heat Maps, Scatter plot).
  • Extensive knowledge in creating filters, rapid filters, table computations, computed measurements, and parameters.
  • Utilized data investigation, discovery, and mapping tools to analyze every single data record from various sources.
  • Performed data analysis and data profiling using complex SQL using various sources including SQL Server, Oracle, and Teradata.
  • Analyzed every single record of data from multiple sources using data investigation and discovery tools.
  • Responsible for creating, evaluating, and reporting key risk indicator and key performance indicator metrics that enable management to make timely and effective decisions about threats, risks, and controls.
  • Analyzed metrics, mined data, and identified trends within a help desk environment.
  • Analyzed duplicate data and data inconsistencies to support proper inter-departmental communication and monthly reports.
  • Wrote SQL scripts to test mappings and developed a traceability matrix of business requirements mapped to test scripts to ensure any change in requirements leads to a change in the test cases.
  • Extensive experience in data mapping and data transformation from source to target data models.
  • Worked on complex SQL queries for data validation and backend testing, and helped resolve data quality issues (see the validation sketch following this list).
  • Worked with end users to gain an understanding of information and core data concepts behind their business.
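A minimal sketch of the SQL-driven data validation described above, wrapped in Python so the checks can be scripted and repeated; the connection string, table, and key columns are hypothetical placeholders.

    # Hedged sketch: run duplicate/null/sanity checks against a claims table.
    import pyodbc

    CONN_STR = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sqlserver01;DATABASE=claims_dw;Trusted_Connection=yes;"
    )

    CHECKS = {
        "duplicate_claim_ids": """
            SELECT COUNT(*) FROM (
                SELECT claim_id FROM dbo.claims
                GROUP BY claim_id HAVING COUNT(*) > 1
            ) d;
        """,
        "null_member_ids": "SELECT COUNT(*) FROM dbo.claims WHERE member_id IS NULL;",
        "future_service_dates": "SELECT COUNT(*) FROM dbo.claims WHERE service_date > GETDATE();",
    }

    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        for name, sql in CHECKS.items():
            count = cursor.execute(sql).fetchone()[0]
            status = "OK" if count == 0 else "FAIL"
            print(f"{status:4} {name}: {count} offending rows")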

Environment: Tableau, IBM Cognos, Oracle 9i/10g/11g, SQL Server 2005, Toad 8.6 for Oracle, DB2, Cognos 10.2, Linux, ERWIN 7.3, Flat Files, Hyperion 8.3.

SQL/BI Developer

Confidential

Responsibilities:

  • Performed analysis, design, and data modeling of logical and physical databases using E-R diagrams.
  • Wrote SQL queries and set up relationships between different tables.
  • Extensively used joins and sub-queries for complex queries involving multiple tables from different databases.
  • Experience working in Relational Databases.
  • Developed different business reports according to the client's requirement.
  • Designed tables, constraints, necessary stored procedures, functions, triggers, and packages using T-SQL.
  • Created efficient Queries, Indexes, Views, and Functions.
  • Developed, managed, and maintained the data dictionary and metadata.
  • Designed and developed the payroll module using MS SQL Server.
  • Migrated data from MS Access to SQL Server (see the migration sketch following this list).
  • Designed, developed, and implemented the complete life cycle of data warehouses.
  • Enforced Security requirements using triggers to prevent unauthorized access to the database.
  • Performed regular performance tuning of client databases.
  • Wrote scripts for database tasks and released database objects into production.
  • Performed data extraction, transformation, and loading using ETL tools.
  • Maintained and managed all versions of data models for production, testing, and development databases.
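Hedged sketch of a simple MS Access to SQL Server table migration of the kind noted above, using pandas with ODBC on the Access side and SQLAlchemy on the SQL Server side; the file path, driver names, and table list are placeholder assumptions.

    # Hedged sketch: copy tables from an Access database into SQL Server.
    import pandas as pd
    import pyodbc
    from sqlalchemy import create_engine

    ACCESS_CONN = (
        "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
        r"DBQ=C:\data\payroll.accdb;"                 # placeholder Access file
    )
    SQLSERVER_URL = (
        "mssql+pyodbc://sqlserver01/payroll_db"
        "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
    )

    TABLES = ["employees", "pay_periods", "timesheets"]   # placeholder table list

    engine = create_engine(SQLSERVER_URL)

    with pyodbc.connect(ACCESS_CONN) as src:
        for table in TABLES:
            df = pd.read_sql(f"SELECT * FROM [{table}]", src)
            df.to_sql(table, engine, schema="dbo", if_exists="replace", index=False)
            print(f"Migrated {len(df)} rows from Access table {table}")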

Environment: MS SQL Server, SQL Server Reporting Services 2015, SSIS, SSAS, Windows Server.
