
Sr. Data Engineer (AWS) Resume


Weehawken, NJ

SUMMARY

  • 8+ years of IT industry experience with hands-on work in Data Engineering & Data Analysis.
  • Good knowledge of Data Quality & Data Governance practices & processes.
  • Well-versed in Agile with Scrum, the Waterfall model, and Test-Driven Development (TDD) methodologies.
  • Proficient in working with SQLite, MySQL, and other SQL databases from Python.
  • Practical understanding of data modeling (dimensional & relational) concepts such as Star Schema modeling, Snowflake Schema modeling, and Fact and Dimension tables.
  • Experience in handling the Python and Spark contexts when writing PySpark programs for ETL (see the sketch after this list).
  • Strong knowledge in data visualization using Power BI and Tableau.
  • Hands-on experience with Snowflake and with NoSQL databases like HBase, Cassandra, and MongoDB.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Hands-on experience with Terraform for managing resource scheduling, disposable environments, and multi-tier applications.
  • Experience in building and architecting multiple Data pipelines, end-to-end ETL, and ELT processes for Data ingestion and transformation in GCP and coordinating tasks among the team.
  • Experience in GCP Dataproc, GCS, Cloud functions, Big Table, and BigQuery.
  • Experience with Apache Spark ecosystem using Spark-Core, SQL, Data Frames, and RDDs.
  • Experienced in data manipulation using Python.
  • Hands-on experience working with Amazon Web Services (AWS) using Elastic Map Reduce (EMR), Redshift, and EC2 for data processing.
  • Integrated Collibra DGC using Collibra Connect (Mule ESB) with third-party tools such as Ataccama, IBM IGC, and Tableau to apply DQ rules, import technical lineage, and create reports using the metadata in Collibra DGC.
  • Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP).
  • Proficient in installing, configuring, and using Apache Hadoop ecosystems such as MapReduce, Hive, Pig, Flume, Yarn, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and Zookeeper.
  • Strong experience in designing Big data pipelines such as Data Ingestion, Data Processing (Transformations, enrichment, and aggregations), and Reporting.
  • Experience in integrating Kafka with Spark streaming for high-speed data processing.
  • Experience in implementing Azure data solutions, provisioning storage accounts, Azure Data Factory, Azure Databricks, Azure Blob Storage, Azure Synapse, and Azure Cosmos DB.
  • An excellent team member with the ability to work independently, good interpersonal skills, strong communication skills, and a high level of motivation.
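
The PySpark ETL bullet above refers to managing a single Spark session/context while reading, transforming, and writing data. Below is a minimal illustrative sketch of such a program; the S3 paths and column names are hypothetical placeholders, not the actual pipeline.

    # Minimal PySpark ETL sketch: one SparkSession/SparkContext, read raw data,
    # apply a simple transformation, and write the result out.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sample-etl").getOrCreate()
    sc = spark.sparkContext  # reuse the single SparkContext rather than creating a second one

    raw = spark.read.json("s3a://example-bucket/raw/events/")  # hypothetical input location
    cleaned = (raw.dropDuplicates(["event_id"])
                  .withColumn("event_date", F.to_date("event_ts")))

    (cleaned.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3a://example-bucket/curated/events/"))  # hypothetical output location

    spark.stop()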

PROFESSIONAL EXPERIENCE

Sr. Data Engineer (AWS)

Confidential, Weehawken, NJ

Responsibilities:

  • As a Data Engineer, responsible for building scalable distributed data solutions using Hadoop.
  • Involved in Agile Development process (Scrum and Sprint planning).
  • Worked with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure analytical services, Big Data technologies (Apache Spark), and Databricks.
  • Handled Hadoop cluster installations in Windows environment.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Migrated the on-premises environment to GCP (Google Cloud Platform).
  • Supported continuous storage in AWS using Elastic Block Store (EBS), S3, and Glacier; created volumes and configured snapshots for EC2 instances.
  • Worked on migrating data from on-premises Bigdata platform to AWS cloud architecture.
  • Worked on advanced SQL coding, writing complex stored procedures, functions, CTEs, triggers, views, and dynamic SQL for interactive reports.
  • Integrated Ataccama with Collibra using a Mule ESB connector and published DQ rule results on Collibra using REST API calls.
  • Migrated data warehouses to Snowflake Data warehouse.
  • Developed machine learning models using recurrent neural networks (LSTM) for time-series predictive analysis.
  • Developed business models in the Job Design Palette, with XML-to-XL endpoint-to-endpoint transformations (Salesforce, Confidential, Visualforce, Java, Oracle PL/SQL, Talend ESB, Informatica, TDD, Agile, JIRA).
  • Defined virtual warehouse sizing for Snowflake for different types of workloads.
  • Worked on UNITY, CLOAS, AUS, STP, App SetUp Lite, Stars, Berkw, ODS, and OTTO (Informatica, Java, shell scripting, DB2-UDB, Linux, MS SQL Server, IBM, Windows, Oracle, AIX 6, MySQL).
  • Involved in migrating the existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
  • Involved in migrating an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc & BigQuery, with Cloud Pub/Sub triggering the Apache Airflow jobs.
  • Extracted data from data lakes, and EDW to relational databases for analyzing and getting more meaningful insights using SQL Queries and PySpark.
  • Performed optimization on Spark/Scala jobs.
  • Performed raw data ingestion into S3 from Kinesis Firehose, which would trigger a Lambda function to put refined data into another S3 bucket and write to an SQS queue as Aurora topics; also worked with Oracle to resolve gateway configuration issues and on shell-scripting automation testing.
  • Proficient with complex workflow orchestration tools, namely Oozie, Airflow data pipelines, Azure Data Factory, CloudFormation & Terraform.
  • Implemented token-based authentication to secure the ASP.NET Core Web API and provide authorization to different users.
  • Built and published customized interactive reports and dashboards with report scheduling using Tableau Server; created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
  • Used Spark to process data before ingesting it into HBase; both batch and real-time Spark jobs were created using Scala.
  • Extensively used Terraform in AWS Virtual Private Cloud to automatically set up and modify settings by interfacing with the control layer.
  • Wrote AWS Lambda code in Python for nested JSON files: converting, comparing, sorting, etc. (see the first sketch after this list).
  • Designed, developed, and did maintenance of data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
  • Developed solutions using the Alteryx tool to provide data for the dashboard in formats including JSON, CSV, and Excel. Experienced in working with business intelligence and data warehouse software including SSAS/SSRS/SSIS, Business Objects, Amazon Redshift, Azure Data Warehouse, and Teradata.
  • Developed machine learning models using the Google TensorFlow Keras API and convolutional neural networks for classification problems; fine-tuned model performance by adjusting the epochs, batch size, and Adam optimizer.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
  • Defined best practices for Tableau report development
  • Programmed in Python, PySpark and SQL to streamline the incoming data and build the data pipelines to get the useful insights, and orchestrated pipelines.
  • Handled logging: built SQLs for logging of the purged records, set up DDLs for logging, created staging tables, loaded pre-purge and staging tables with primary key values, conducted unit testing, and built shell scripts and run books for Control-M scheduling and run jobs (Informatica, Java, shell scripting, DB2-UDB, Linux). Worked on various machine learning algorithms such as linear regression, logistic regression, decision trees, random forests, K-means clustering, support vector machines, and XGBoost per client requirements.
  • Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing major regulatory and financial reports using advanced SQL queries in Snowflake.
  • Wrote Sqoop Scripts for importing and exporting data from RDBMS to HDFS.
  • Set up Data Lake in Google cloud using Google cloud storage, BigQuery, and Big Table.
  • Developed scripts in BigQuery and connected them to reporting tools.
  • Developed custom machine learning (ML) algorithms in Scala and then made them available to MLlib in Python via wrappers.
  • Developed Order Management and Messaging system using core ASP.NET MVC.
  • Experience in Data Lineage using Excel and Collibra.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used DataStage as an ETL tool to extract data from source systems and load the data into Oracle.
  • Used SSRS, Tableau, QlikView for Ad-Hoc reporting.
  • Used HBase as the database to store application data, as HBase offers features like high scalability, a distributed column-oriented NoSQL design, and real-time data querying, to name a few.
  • Designed workflows using Airflow to automate the services developed for Change data capture.
  • Designed, developed, and implemented complex SSIS packages, asynchronous ETL processing, ad hoc reporting with SSRS report server, and data mining in SSAS.
  • Integrated both the framework and CloudFormation to automate Azure environment creation, along with the ability to deploy on Azure using build scripts (Azure CLI) and automate solutions using Terraform.
  • Carried out data transformation and cleansing using SQL queries and PySpark.
  • Used Kafka and Spark Streaming to ingest real-time or near real-time data into HDFS (see the second sketch after this list).
  • Worked on downloading BigQuery data into Spark DataFrames for advanced ETL capabilities.
  • Built reports for monitoring data loads into GCP and driving reliability at the site level.
  • Participated in daily stand-ups, bi-weekly scrums, and PI planning.
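
First sketch: an illustrative AWS Lambda handler in Python of the kind referenced above for processing nested JSON files. The bucket names, key layout, and flattening logic are assumptions for demonstration only, not the actual job code.

    # Illustrative AWS Lambda handler: flatten a nested JSON object uploaded to S3
    # and write the flattened copy to a second bucket. All names are placeholders.
    import json
    import boto3

    s3 = boto3.client("s3")
    TARGET_BUCKET = "example-refined-bucket"  # hypothetical target bucket

    def flatten(obj, prefix=""):
        """Recursively flatten nested dicts into dot-separated keys."""
        flat = {}
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                flat.update(flatten(value, f"{name}."))
            else:
                flat[name] = value
        return flat

    def lambda_handler(event, context):
        # Triggered by an S3 put event; process each uploaded object.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            flattened = flatten(json.loads(body))
            s3.put_object(Bucket=TARGET_BUCKET,
                          Key=f"refined/{key}",
                          Body=json.dumps(flattened, sort_keys=True))
        return {"status": "ok"}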
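
Second sketch: a hedged example of the Kafka-to-HDFS ingestion mentioned above, using Spark Structured Streaming. Broker addresses, topic names, and HDFS paths are placeholders, and the job assumes the spark-sql-kafka connector is available on the cluster.

    # Kafka-to-HDFS ingestion with Spark Structured Streaming; brokers, topic,
    # and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical brokers
              .option("subscribe", "events-topic")                # hypothetical topic
              .option("startingOffsets", "latest")
              .load()
              .selectExpr("CAST(value AS STRING) AS value", "timestamp"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/events")           # hypothetical landing path
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()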

Environment: Hadoop 3.3, GCP, AWS, BigQuery, Linux, Machine Learning, Big Table, Spark 3.0, Sqoop 1.4.7, ETL, HDFS, Snowflake DW, Oracle SQL, Tableau, Scala, Collibra, HBase, PySpark, SQL, MapReduce, SSAS, .NET, Azure Databricks, Terraform, Kafka 2.8, GitHub, and Agile process.

Sr. Data Engineer

Confidential, Boston, MA

Responsibilities:

  • Built reporting data warehouse from ERP system using Order Management, Invoice & Service contracts modules.
  • Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery, as well as Azure Data Factory and Databricks.
  • Experience implementing Collibra to automate data management processes.
  • Extensive work in Informatica PowerCenter.
  • Acted as SME for Data Warehouse related processes.
  • Performed Data analysis for building Reporting Data Mart.
  • Worked with Reporting developers to oversee the implementation of report/universe designs.
  • Tuned performance of Informatica mappings and sessions for improving the process and making it efficient after eliminating bottlenecks.
  • Defined best practices for Tableau report development.
  • Administered users, user groups, and scheduled instances for reports in Tableau.
  • Automated building the Azure infrastructure using Terraform and cloud formation templates.
  • Responsible for building scalable distributed data solution using Hadoop.
  • Used HBase as the database to store application data, as HBase offers features like high scalability, a distributed column-oriented NoSQL design, and real-time data querying, to name a few.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Developed multiple POCs using Scala and deployed them on the Yarn cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Expertise in Terraform for multi cloud deployment using single configuration.
  • Worked on complex SQL queries and PL/SQL procedures and converted them to ETL tasks.
  • Worked with PowerShell and UNIX scripts for file transfer, emailing, and other file-related tasks.
  • Worked with deployments from Dev to UAT, and then to Prod.
  • Worked on Data modeling Advanced SQL with Columnar Databases using AWS.
  • Worked with Informatica Cloud for data integration between Salesforce, RightNow, Eloqua, and Web Services applications.
  • Converted Metric Insights reports to Tableau reports.
  • Transformed the data using AWS Glue dynamic frames with PySpark; cataloged the transformed data using crawlers and scheduled the job and crawler using Glue workflow features (see the sketch after this list).
  • Utilized Entity Framework, ADO.NET, and LINQ for connecting to and managing data access with SQL Server.
  • Worked on UNITY, CLOAS, AUS, STP, App SetUp Lite, Stars, Berkw, ODS, and OTTO (Informatica, Java, shell scripting, DB2-UDB, Linux, MS SQL Server, IBM, Windows, Oracle, AIX 6, MySQL).
  • Developed various machine learning models such as logistic regression, KNN, and gradient boosting with Pandas, NumPy, Seaborn, Matplotlib, and scikit-learn in Python; worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
  • Expertise in Informatica cloud apps Data Synchronization (ds), Data Replication (dr), Task Flows & Mapping configurations.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure.
  • Worked on the end-to-end machine learning workflow: wrote Python code for gathering data from Snowflake on AWS, data preprocessing, feature extraction, feature engineering, modeling, evaluation, and model deployment; wrote Python code for exploratory data analysis using scikit-learn and NumPy.
  • Deployed and processed SSAS cubes daily/Weekly to update information using SQL Server Agent.
  • Developed and maintained ETL (Extract, Transform, and Load) mappings to extract data from multiple source systems like Oracle, SQL Server, Netezza, Ab Initio, flat files, and Java, and loaded it into Teradata.
  • Experience building data pipelines in Python/PySpark/HiveQL/Presto/BigQuery and building Python DAGs in Apache Airflow.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators, both older and newer.
  • Used Terraform in managing resource scheduling, disposable environments and multitier application.
  • Developed Shell scripts for job automation and daily backup.
  • Based on the requirements, addition of extra nodes to the cluster to make it scalable.
  • Worked with Pandas, Matplotlib, Seaborn, statsmodels, and pandas-profiling.
  • Worked on migration project which included migrating web methods code to Informatica cloud.
  • Advanced SQL Server Integration Services 2012 (SSIS) and performance tuning of advanced SSIS transformations.
  • Implemented proofs of concept for SOAP & REST APIs.
  • Built web services mappings and exposed them as SOAP WSDLs.
  • Written Programs in Spark using Scala for Data quality Check.
  • Effectively used the data blending feature in Tableau.
  • Worked with Reporting developers to oversee the implementation of reports/dashboard designs in Tableau.
  • Assisted users in creating/modifying worksheets and data visualization dashboards in Tableau.
  • Tuned and performed optimization techniques for improving report/dashboard performance.
  • Assisted report developers with writing required logic and achieving desired goals.
  • Met End Users for gathering and analyzing the requirements.
  • Worked with Business users to identify root causes for any data gaps and develop corrective actions accordingly.
  • Implemented Data Governance using Excel and Collibra.
  • Created Terraform scripts to automate Azure services including firewall, Blob Storage, database, and application configuration; these scripts create stacks or single servers, or join web servers to stacks.
  • Created Ad hoc Oracle data reports for presenting and discussing the data issues with Business.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Creating Star schema cubes using SSAS.
  • Performed gap analysis after reviewing requirements.
  • Identified data issues within the DWH dimension and fact tables like missing keys, joins, etc.
  • Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system.
  • Validated reporting numbers between source and target systems.
  • Found technical solutions and business logic for fixing any missing or incorrect data issues identified.
  • Coordinated with and provided technical details to reporting developers.
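
A rough sketch of the AWS Glue dynamic-frame transformation described above. The catalog database, table, column mappings, and S3 paths are hypothetical stand-ins for the crawler-populated catalog referenced in that bullet, not the actual job.

    # AWS Glue job sketch: read a crawled table from the Data Catalog, remap a few
    # columns with a dynamic frame, and write the result to S3 as Parquet.
    import sys
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the crawled source table from the Data Catalog (hypothetical names).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="example_db", table_name="adobe_raw")

    # Rename/retype a few columns before writing out.
    mapped = ApplyMapping.apply(
        frame=source,
        mappings=[("visitor_id", "string", "visitor_id", "string"),
                  ("hit_time_gmt", "long", "hit_timestamp", "long")])

    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/adobe/"},
        format="parquet")

    job.commit()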

Environment: Informatica PowerCenter 9.5/9.1, .NET, Informatica Cloud, BIT, Machine Learning, Oracle 10g/11g, SQL Server 2005, SSAS, Tableau 9.1, Salesforce, Collibra, RightNow, HBase, Advanced SQL, Snowflake, Databricks, Terraform, Eloqua, Linux, Spark-Scala, PySpark, Web Methods, PowerShell, UNIX

Data Engineer

Confidential, Scottsdale, AZ

Responsibilities:

  • As a Data Engineer, was responsible for building a data lake as a cloud-based solution in AWS using Apache Spark and Hadoop.
  • Involved in Agile methodologies, daily Scrum meetings, and Sprint planning.
  • Involved in designing and developing enterprise reports (SSRS/Business Objects) using the data from ETL Loads SSAS Cubes and various heterogeneous data sources.
  • Installed and configured Hadoop and was responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Integrated both the framework and CloudFormation to automate Azure environment creation, along with the ability to deploy on Azure using build scripts (Azure CLI) and automate solutions using Terraform.
  • Handled Tableau admin activities granting access, managing extracts and installations.
  • Used AWS Cloud and On-Premise environments with Infrastructure Provisioning/ Configuration.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Contributed to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop.
  • Extensive use of the shell SDK in GCP to configure and deploy services such as Dataproc, Cloud Storage, and BigQuery.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Oversaw the configuration of Collibra for reference and master data domains.
  • Used GitHub for source control (branching and merging); advanced T-SQL with use of SQL Server 2012 functionality (triggers, stored procedures).
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (see the sketch after this list).
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Developed the code for Importing and exporting data into HDFS and Hive using Sqoop
  • Developed a data pipeline using Kafka, HBase, Spark, and Hive to ingest, transform, and analyze customer behavioral data.
  • Expertise in Informatica cloud apps Data Synchronization (ds), Data Replication (dr), Task Flows & Mapping configurations.
  • Developed Java RDS wrapper scripts from TDS and Datacube, pivoting the reports vertically for QlikView for net monthly, time period, excess return, and portfolio return.
  • Experience building Power BI reports on Azure Analysis Services for better performance compared to direct query against GCP BigQuery.
  • Database design: wrote complex T-SQL queries, stored procedures, UDFs (functions), and views to implement business rules; advanced SQL development and data manipulation/transformation skills.
  • Developed ETL pipelines in and out of the data warehouse using a mix of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Developed reconciliation process to make sure elastic search index document count match to source records.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Implemented Sqoop to transfer data from Oracle to Hadoop and load it back in Parquet format.
  • Developed incremental and complete-load Python processes to ingest data into Elasticsearch from the Oracle database.
  • Developed Multi-dimensional Objects (Cubes Dimensions) using MS Analysis Services (SSAS).
  • Generated tableau dashboards with combination charts for clear understanding.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Developed various Spark applications using Scala to perform various enrichments of clickstream data merged with user profile data.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
  • Used Terraform in AWS Virtual Private Cloud to automatically set up and modify settings by interfacing with the control layer.
  • Advanced SQL Server Integration Services 2012 (SSIS) and performance tuning of advanced SSIS transformations.
  • Worked on user controls and master pages for data caching and used ADO.NET to implement the data layer to communicate with the database.
  • One-off data loads to Collibra databases.
  • Generated tableau dashboards for sales with forecast and reference lines.
  • Worked on migration project which included migrating web methods code to Informatica cloud.
  • Experience in GCP Dataproc, GCS, Cloud Functions, and BigQuery, as well as Azure Data Factory and Databricks.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Created pipelines in Azure Data Factory using Linked Services/Datasets/Pipelines to extract, transform, and load data from a variety of sources including Azure SQL, Blob Storage, and Azure SQL Data Warehouse, as well as the write-back tool and the reverse.
  • Created ad hoc reports for users in Tableau by connecting various data sources.
  • Pulled the data from the data lake (HDFS) and massaged the data with various RDD transformations.
  • Loaded the data through HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
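
A hedged sketch of the kind of Airflow DAG on GCP described above: a daily job that loads files from GCS into BigQuery and then runs a transformation query. The project, dataset, bucket, and table names are placeholders, and the operators shown are from the Google provider package rather than necessarily the exact operators used on the project.

    # Daily GCS -> BigQuery ETL DAG sketch; all resource names are hypothetical.
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="daily_gcs_to_bq_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        # Load the day's JSON files from a landing bucket into a raw table.
        load_raw = GCSToBigQueryOperator(
            task_id="load_raw",
            bucket="example-landing-bucket",
            source_objects=["events/{{ ds }}/*.json"],
            destination_project_dataset_table="example_project.raw.events",
            source_format="NEWLINE_DELIMITED_JSON",
            write_disposition="WRITE_TRUNCATE",
        )

        # Run a transformation query into a curated table.
        transform = BigQueryInsertJobOperator(
            task_id="transform",
            configuration={
                "query": {
                    "query": "SELECT * FROM `example_project.raw.events` WHERE event_date = '{{ ds }}'",
                    "destinationTable": {
                        "projectId": "example_project",
                        "datasetId": "curated",
                        "tableId": "events",
                    },
                    "writeDisposition": "WRITE_TRUNCATE",
                    "useLegacySql": False,
                }
            },
        )

        load_raw >> transform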

Environment: Hadoop 2.7, Spark 2.7, Hive, Sqoop 1.4.6, AWS, HBase, Snowflake, Tableau, Data Factory, Kafka 2.6.2, Python 3.6, HDFS, Scala, Advanced SQL, Collibra, PySpark, SSAS, Terraform, Elasticsearch & Agile Methodology

Hadoop Developer

Confidential

Responsibilities:

  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML data with the help of MapReduce programs.
  • Used parquet file format for published tables and created views on the tables.
  • In-charge of managing data coming from different sources.
  • Support in running MapReduce Programs in the cluster.
  • Cluster coordination services through ZooKeeper.
  • Involved in loading data from the UNIX file system into the Hadoop Distributed File System.
  • Installed and configured Hive, and also wrote Hive UDFs.
  • Automated all of the jobs for pulling data from the FTP server and loading it into Hive tables, using Oozie workflows.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Wrote data to Parquet tables, both non-partitioned and partitioned, adding data dynamically to partitioned tables using Spark (see the sketch after this list).
  • Processed schema-oriented and non-schema-oriented data using Scala and Spark.
  • Wrote User Defined functions (UDFs) for special functionality for Spark.
  • Used Sqoop export functionality and scheduled the jobs on a daily basis with shell scripting in Oozie.
  • Worked with SQOOP jobs to import the data from RDBMS and used various optimization techniques to optimize Hive and SQOOP.
  • Used SQOOP import functionality for loading Historical data present in a Relational Database system into Hadoop File System (HDFS).
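
A minimal sketch of the partitioned Parquet writes mentioned above: an append to a non-partitioned table and a dynamic-partition overwrite of a partitioned table. The paths and partition column are hypothetical, and dynamic partition overwrite as shown requires a newer Spark release than the Spark 1.6 noted above.

    # Writing non-partitioned and partitioned Parquet tables; paths are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-partition-write").getOrCreate()

    # Only replace the partitions present in the incoming DataFrame.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    incoming = spark.read.parquet("hdfs:///staging/daily_batch")  # hypothetical staging data

    # Non-partitioned table: a straight append.
    incoming.write.mode("append").parquet("hdfs:///warehouse/events_flat")

    # Partitioned table: partition by load_date so only the affected date folders are replaced.
    (incoming.write
             .mode("overwrite")
             .partitionBy("load_date")
             .parquet("hdfs:///warehouse/events_partitioned"))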

Environment: Hadoop, MapReduce, HDFS, Hive, Java, R, Sqoop, Spark, Scala

Data Analyst

Confidential

Responsibilities:

  • Involved in understanding the legacy applications & data relationships.
  • Identified problematic areas and conducted research to determine the best course of action to correct the data.
  • Analyzed problems and solved issues with current and planned systems as they relate to the integration and management of order data.
  • Involved in Data Mapping activities for the data warehouse.
  • Analyzed reports of data duplicates or other errors to provide ongoing appropriate inter-departmental communication and monthly or daily data reports.
  • Monitored for timely and accurate completion of select data elements.
  • Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
  • Monitored data dictionary statistics.
  • Involved in analyzing and adding new features of Oracle 10g such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT in the existing Oracle 10g application.
  • Archived the old data by converting them into SAS data sets and flat files.
  • Extensively used the Erwin tool in forward and reverse engineering, following the corporate standards in naming conventions and using conformed dimensions whenever possible.
  • Ensured a smooth transition from the legacy system to the newer system through the change management process.
  • Planned project activities for the team based on project timelines using Work Breakdown Structure.
  • Compared data with original source documents and validated data accuracy.
  • Used reverse engineering to create Graphical Representation (E-R diagram) and to connect to an existing database.
  • Generate weekly and monthly asset inventory reports.
  • Created Technical Design Documents, Unit Test Cases.
  • Written SQL Scripts and PL/SQL Scripts to extract data from the Database to meet business requirements and for Testing Purposes.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XI R2 (see the sketch after this list).
  • Involved in test case/data preparation, execution, and verification of the test results.
  • Created user guidance documentations.
  • Created reconciliation report for validating migrated data.
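
A small sketch of the reconciliation-style validation described above: the same count/aggregate query is run against the source and migrated Oracle schemas and the results compared. The connection details, credentials, table, and columns are placeholders, not the actual systems.

    # Source-vs-target reconciliation check; all connection and table names are hypothetical.
    import cx_Oracle

    SOURCE_DSN = cx_Oracle.makedsn("src-host", 1521, service_name="SRC")  # hypothetical
    TARGET_DSN = cx_Oracle.makedsn("tgt-host", 1521, service_name="TGT")  # hypothetical

    CHECK_SQL = """
        SELECT COUNT(*) AS row_count, NVL(SUM(order_amount), 0) AS total_amount
        FROM orders
        WHERE order_date >= TRUNC(SYSDATE) - 30
    """

    def fetch_metrics(dsn):
        # Run the validation query and return (row_count, total_amount).
        with cx_Oracle.connect(user="report_user", password="placeholder", dsn=dsn) as conn:
            cur = conn.cursor()
            cur.execute(CHECK_SQL)
            return cur.fetchone()

    source_metrics = fetch_metrics(SOURCE_DSN)
    target_metrics = fetch_metrics(TARGET_DSN)

    if source_metrics == target_metrics:
        print("Reconciliation passed:", source_metrics)
    else:
        print("Mismatch - source:", source_metrics, "target:", target_metrics)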

Environment: UNIX, Shell Scripting, XML Files, XSD, XML, SAS, PL/SQL, Oracle 10g, Erwin 9.5, Autosys.
