
Sr. Data Engineer Resume


SUMMARY

  • Diligent Data Engineer with 8+ years of experience working with large datasets of structured and unstructured data.
  • Performed risk analysis, data visualization, and reporting on various projects. Technical expertise in data pipelines, integration, data profiling, and data cleaning. Worked on Tableau, Power BI, and Shiny in R to create dashboards and visualizations.
  • Experience in data warehousing, data engineering, feature engineering, big data, ETL/ELT, and business intelligence. As a big data architect and engineer, specializes in AWS and Azure frameworks, Cloudera, the Hadoop ecosystem, Spark/PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, tools such as Tableau, Airflow, DBT, and Presto/Athena, and data DevOps frameworks/pipelines, with Python scripting skills for building data-intensive applications and tackling challenging architectural and scalability problems.
  • Experience and training in Confidential Web Services S3, EC2, IAM, Route53, Databases (RDS, DynamoDB, Redshift), VPC, Lambda, EBS, EFS, Glue, Athena, SQS, SNS, API Gateway, Kinesis.
  • Business intelligence application developer with expertise in implementing and developing data warehousing/business intelligence solutions applying advanced techniques using OBIEE; in-depth and comprehensive experience in design, development, testing, security, and support for data warehousing and client/server projects.
  • Experience working with AWS services such as S3, EMR, EC2, Step Functions, and CloudWatch.
  • Experience working with AWS ElastiCache (Memcached & Redis) and NoSQL databases (HBase, Cassandra, MongoDB) for database performance tuning and data modeling.
  • Experience in building data pipelines using Azure Data Factory and Azure Databricks, loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlling and granting database access.
  • Experience in developing Spark applications using Spark SQL, PySpark, and Delta Lake in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal PySpark/Delta Lake sketch follows this summary).
  • Good understanding of Spark Architecture, MPP Architecture, including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.
  • Productionized models in cloud environments, including automated processes, CI/CD pipelines, monitoring/alerting, and troubleshooting; presented models and results to technical and non-technical audiences.
  • Experience in Database Design and development with Business Intelligence using SQL Server 2014/2016 Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP Cubes, Star Schema and Snowflake Schema.
  • Strong skills in visualization tools Power BI, Microsoft Excel - formulas, Pivot Tables, Charts and DAX Commands.
  • Highly skilled in various automation tools, continuous integration workflows, managing binary repositories and containerizing application deployments and test environments.
  • Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
  • Experience in creating Docker containers leveraging existing Linux Containers and AMIs, in addition to creating Docker containers from scratch.
  • Experience in Extraction, Transformation and Loading (ETL) data from various sources into Data Warehouses, as well as data processing like collecting, aggregating, and moving data from various sources using Apache Flume, Kafka, Power BI and Microsoft SSIS.
  • Used Kafka for activity tracking and log aggregation.
  • Experience in efficiently performing ETL using Spark in-memory processing, Spark SQL, and Spark Streaming with the Kafka distributed messaging system.
  • Hands-on experience with Amazon EC2, Confidential S3, Confidential RDS, VPC, IAM, Confidential Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, DynamoDB, and other services of the AWS family.
  • Experience in handling the Python and Spark contexts when writing PySpark programs for ETL.
  • Comfortable wearing multiple hats: Azure architecture/system engineering, network operations, and data engineering.
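
The following is a minimal, illustrative PySpark/Delta Lake sketch of the extract-transform-load pattern described in the summary above; the paths, column names, and the usage-events dataset are hypothetical placeholders rather than project code.

```python
# Minimal PySpark sketch: read raw JSON usage events, aggregate them, and write a Delta table.
# Paths, column names, and the "usage_events" dataset are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Extract: raw JSON files landed by upstream pipelines.
raw = spark.read.json("/mnt/raw/usage_events/")

# Transform: derive a per-customer, per-day usage aggregate.
daily_usage = (
    raw.withColumn("event_date", F.to_date("event_timestamp"))
       .groupBy("customer_id", "event_date")
       .agg(F.count("*").alias("events"),
            F.sum("duration_sec").alias("total_duration_sec"))
)

# Load: persist as a Delta table partitioned by date (assumes Delta Lake is available on the cluster).
(daily_usage.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("/mnt/curated/daily_usage"))
```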

TECHNICAL SKILLS

Programming Languages: Java 1.7/1.8, SQL, Python, Scala, UNIX shell script, PowerShell, YAML

Application/Web Servers: WebLogic, Apache Tomcat 5.x/6.x/7.x/8.x

Hadoop Distributions: Hortonworks, Cloudera

Hadoop/Spark Ecosystem: HDFS, MapReduce, YARN, Hive/Impala, Pig, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark, Spark SQL, Airflow

Cloud Platforms: AWS (Confidential EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB); Azure (Cloud Services PaaS & IaaS, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight); GCP; OpenStack

ETL Tools: Informatica, Data Studio

Reporting Tools: Power BI, Tableau, SSRS

Virtualization: Citrix, VDI, VMware

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Engineer

Responsibilities:

  • Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to the AWS cloud (S3).
  • Developed Python scripts to manage AWS resources through API calls using the Boto3 SDK, and worked with the AWS CLI (see the Boto3 sketch after this list).
  • Set up the CI/CD pipelines using Maven, GitHub, and AWS.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Performed data cleaning and feature selection using MLlib package in PySpark. Deep learning using CNN, RNN, ANN, reinforcement learning.
  • Processed structured and semi-structured datasets using PySpark by loading them into RDDs.
  • Extracted data and loaded it into HDFS using Sqoop commands and scheduled MapReduce jobs on Hadoop.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as S3 and Confidential DynamoDB.
  • Implemented reporting data marts for Sales and Marketing teams on AWS Redshift; handled data schema design and development, ETL pipelines in Python/MySQL stored procedures, and automation using Jenkins.
  • Extensive experience with query performance tuning in Relational and distributed systems.
  • Designed ETL architecture for data transfer from the OLTP to OLAP.
  • Imported attribution data from a proprietary database into the Hadoop ecosystem.
  • Extensively worked on moving data across cloud architectures, including Redshift, Hive, and S3 buckets.
  • Scripted in Unix and Python to simplify repetitive support tasks.
  • Provided data engineering support on Python, Spark, Hive, and Airflow for modelling projects.
  • Designed and executed a recommendation campaign to boost second purchases, generating 6M in additional revenue using item-based collaborative filtering in Python and MSSQL.
  • Strong Experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Configured Spark Streaming to receive ongoing data from Kafka and store it in HDFS (see the streaming sketch after this list).
  • Integrated product data feeds from Kafka into the Spark processing system and stored order details in a PostgreSQL database.
  • Designed a social listening platform to scrape data from Twitter and Instagram; the application calls the Google sentiment analysis and image detection APIs to produce a brand index, competitor analysis, and product analysis.
  • Collaborated with Marketing team to implement search bidding algorithm.
  • Automated build and deployment using Jenkins to reduce human error and speed up production processes.
  • Managed GitHub repositories and permissions, including branching and tagging.
  • Rewrote shell deploy scripts, reducing deployment times from 5+ hours to less than 2 minutes.
  • Drove the strategy for migrating from Perforce to GitHub, including branching, merging, and tagging.
  • Created tables, triggers, stored procedures, and SQL Loader scripts in relational SQL databases.
  • Designed services to store and retrieve user data using a MongoDB database and communicated with remote servers using REST-enabled web services on the Jersey framework.
  • Performed performance tuning in Hive using methods including, but not limited to, dynamic partitioning, bucketing, indexing, file compression, and cost-based optimization.
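
Illustrative sketch of managing AWS resources through Boto3 API calls, as referenced above; the bucket, prefixes, and the archiving task are hypothetical examples rather than the actual production scripts.

```python
# Minimal Boto3 sketch: list S3 objects under a prefix and copy them to an archive prefix.
# Bucket and prefix names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

def archive_prefix(bucket: str, src_prefix: str, dst_prefix: str) -> int:
    """Copy every object under src_prefix to dst_prefix; return the number copied."""
    copied = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=src_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            s3.copy_object(
                Bucket=bucket,
                Key=key.replace(src_prefix, dst_prefix, 1),
                CopySource={"Bucket": bucket, "Key": key},
            )
            copied += 1
    return copied

if __name__ == "__main__":
    print(archive_prefix("my-data-bucket", "hive/metadata/", "archive/hive/metadata/"))
```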
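
A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS flow described above, assuming the Spark/Kafka connector package is available on the cluster; broker addresses, the topic name, and paths are hypothetical.

```python
# Minimal sketch: consume a Kafka topic with Structured Streaming and append it to HDFS as Parquet.
# Brokers, topic, and paths are hypothetical; requires the spark-sql-kafka connector on the cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
         .option("subscribe", "product-events")
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers key/value as bytes; cast to strings and keep the ingestion timestamp.
parsed = events.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("value"),
    F.col("timestamp").alias("event_time"),
)

query = (
    parsed.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/streams/product_events")
          .option("checkpointLocation", "hdfs:///checkpoints/product_events")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```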

Environment: HDFS, Hive, AWS, MapReduce, Kafka, Confidential DynamoDB, PostgreSQL, Spark SQL, Python, Scala, PySpark, SSIS, ELK.

Confidential

Sr. Data Engineer

Responsibilities:

  • Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
  • Built Azure WebJobs for Product Management teams to connect to different APIs and sources, extract data, and load it into Azure Data Warehouse using WebJobs and Azure Functions.
  • Built various pipelines integrating Azure with AWS S3 to bring data into Azure databases.
  • Set up a Spark cluster to process more than 2 TB of data and load it into SQL Server; in addition, built various Spark jobs to run data transformations and actions.
  • Wrote APIs to connect to different media data feeds such as Prisma, DoubleClick Management, Twitter, Facebook, Instagram, and Amnet using Azure WebJobs and Functions integrated with Cosmos DB.
  • Built a trigger-based mechanism using Azure Logic Apps and Functions to reduce the cost of resources such as WebJobs and Data Factories.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Developed a PySpark script to protect raw data by hashing client-specified columns using hashing algorithms (see the hashing sketch after this list).
  • Responsible for Design, Development, and testing of the database and Developed Stored Procedures, Views, and Triggers
  • Created ETL scripts for ad-hoc requests to retrieve data from analytics sites.
  • Experience in custom process design of transformations via Azure Data Factory and automation pipelines; extensively used Azure services such as Azure Data Factory and Logic Apps for ETL, pushing data in/out between databases, Blob storage, and HDInsight (HDFS, Hive tables).
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Implemented Copy activity, Custom Azure Data Factory Pipeline Activities.
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
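
A minimal sketch of hashing client-specified columns in PySpark, as referenced above; the column list, storage paths, and storage account are hypothetical placeholders.

```python
# Minimal PySpark sketch: hash client-specified columns in a raw DataFrame with SHA-256.
# The column list and the ADLS paths/storage account are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pii-hashing").getOrCreate()

SENSITIVE_COLUMNS = ["email", "phone_number", "ssn"]  # supplied by the client

raw = spark.read.parquet("abfss://raw@storageaccount.dfs.core.windows.net/customers/")

hashed = raw
for column in SENSITIVE_COLUMNS:
    # sha2 with 256 bits returns a hex digest; cast first so numbers and nulls hash consistently.
    hashed = hashed.withColumn(column, F.sha2(F.col(column).cast("string"), 256))

hashed.write.mode("overwrite").parquet(
    "abfss://curated@storageaccount.dfs.core.windows.net/customers_hashed/"
)
```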

Environment: Azure, Scala, Hive, HDFS, Apache Spark, Oozie, Sqoop, Cassandra, Shell Scripting, Power BI, MongoDB, Jenkins, UNIX, JIRA, Git

Confidential

AWS Developer

Responsibilities:

  • Developed the strategy for cloud migration and implemented best practices using AWS services such as AWS Database Migration Service and AWS Server Migration Service to move from on-premises to the cloud.
  • Responsible for setting up and building AWS infrastructure with VPC, EC2, S3, DynamoDB, IAM, EBS, Route 53, SNS, SES, SQS, CloudWatch, CloudTrail, Security Groups, Auto Scaling, and RDS using CloudFormation templates.
  • Backed up AWS PostgreSQL to S3 via a daily job run on EMR using DataFrames.
  • Implemented new tools such as Kubernetes with Docker to assist with auto-scaling and continuous integration (CI); uploaded Docker images to the registry so services are deployable through Kubernetes, and used the Kubernetes dashboard to monitor and manage services.
  • Worked on implementing data warehouse solutions in AWS Redshift; worked on various projects to migrate data from other databases to AWS Redshift, RDS, ELB, EMR, DynamoDB, and S3.
  • Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB (see the Lambda sketch after this list).
  • Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.
  • Performed end-to-end architecture and implementation assessments of various AWS services such as Confidential EMR, Redshift, and S3.
  • Implemented machine learning algorithms in Python to predict the quantity a user might want to order for a specific item so suggestions can be made automatically, using Kinesis Firehose and an S3 data lake.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Confidential Simple Storage Service ( Confidential S3) and Confidential DynamoDB.
  • Used AWS CloudWatch to monitor and store logging information.
  • Worked on Docker CE, curl, and Jenkins, configuring and maintaining them for continuous integration and end-to-end automation of all builds and deployments.
  • Deployed projects into Confidential Web Services (AWS) using Confidential Elastic Beanstalk.
  • Created real-time data pipelines and frameworks with Kafka and Spark Streaming, loading data into HBase.
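
Illustrative sketch of the fine-grained S3 access pattern referenced above, assuming entitlements are kept in a DynamoDB table and access is granted through short-lived presigned URLs; the table name, key schema, and event shape are hypothetical.

```python
# Minimal AWS Lambda sketch: look up a caller's entitlement in DynamoDB and, if the requested
# S3 key falls under an allowed prefix, return a short-lived presigned URL.
# The "object-entitlements" table, its key schema, and the event shape are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
ENTITLEMENTS_TABLE = dynamodb.Table("object-entitlements")

def lambda_handler(event, context):
    user_id = event["user_id"]
    bucket = event["bucket"]
    key = event["key"]

    # Fetch the caller's allowed prefixes from DynamoDB.
    item = ENTITLEMENTS_TABLE.get_item(Key={"user_id": user_id}).get("Item")
    allowed_prefixes = item.get("prefixes", []) if item else []

    if not any(key.startswith(prefix) for prefix in allowed_prefixes):
        return {"statusCode": 403, "body": "access denied"}

    # Grant time-limited access instead of handing out long-lived credentials.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=300,
    )
    return {"statusCode": 200, "body": url}
```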

Environment: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, Cloudera, Scala IDE (Eclipse), Scala, Linux Shell Scripting, HDFS, Python, Snowflake, QlikView, JSON, OpenShift, AWS Glacier

Confidential

Azure Data Engineer

Responsibilities:

  • Designed and implemented migration strategies for traditional systems to Azure (lift and shift, Azure Migrate, and other third-party tools).
  • Used various sources to pull data into Power BI such as SQL Server, Excel, Oracle, SQL Azure, etc.
  • Involved in designing logical and physical data models for the staging, DWH, and data mart layers.
  • Created Power BI visualizations and dashboards per requirements.
  • Developed dashboards and visualizations to help business users analyze data and to provide insights to upper management, focusing on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
  • Collaborated with application architects on moving infrastructure-as-a-service (IaaS) applications to platform-as-a-service (PaaS).
  • Built complex distributed systems involving large-scale data handling, metrics collection, data pipelines, and analytics.
  • Architected and implemented ETL and data movement solutions using Azure Data Factory and SSIS; created and ran SSIS packages in ADF V2 with the Azure-SSIS IR.
  • Migrated data from traditional database systems to Azure databases (a minimal sketch of this pattern follows this list).
  • Built data pipelines using Azure services such as Data Factory to load data from legacy SQL Server to Azure databases, using Data Factory, API gateway services, SSIS packages, Talend jobs, and custom .NET and Python code.
  • Deployed Azure Resource Manager JSON templates from PowerShell; worked on the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, Azure Analysis Services.
  • Engaged with business users to gather requirements, design visualizations, and provide training on self-service BI tools.
  • Designed and implemented end-to-end data solutions (storage, integration, processing, visualization) in Azure.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
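
A minimal sketch of a single-table lift from on-premises SQL Server to Azure SQL Database using pandas and SQLAlchemy, offered only as an illustration of the migration pattern referenced above; server names, credentials, and the table are hypothetical, and the production loads described here went through Azure Data Factory/SSIS rather than one-off scripts.

```python
# Minimal sketch: copy one table from an on-premises SQL Server to Azure SQL Database in chunks.
# Server names, credentials, and the dbo.Orders table are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine(
    "mssql+pyodbc://etl_user:password@onprem-sql01/SalesDB"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
target = create_engine(
    "mssql+pyodbc://etl_user:password@myserver.database.windows.net:1433/SalesDB"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Stream the table in chunks so large tables never have to fit in memory at once.
for chunk in pd.read_sql("SELECT * FROM dbo.Orders", source, chunksize=50_000):
    chunk.to_sql("Orders", target, schema="dbo", if_exists="append", index=False)
```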

Confidential

Data Engineer / Data Analyst

Responsibilities:

  • Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
  • Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (see the Airflow sketch after this list).
  • Performed data extraction, transformation, loading, and integration in the data warehouse, operational data stores, and master data management.
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL; wrote SQL queries against Snowflake (see the connector sketch after this list).
  • Strong understanding of AWS components such as EC2 and S3.
  • Implemented AWS Lambdas to drive real-time monitoring dashboards from system logs.
  • Conducted statistical analysis on healthcare data using Python and various tools.
  • Developed simple and complex Map Reduce programs in Java for Data Analysis on different data formats.
  • Used AWS for Tableau Server scaling and secured Tableau Server on AWS to protect the Tableau environment using Confidential VPC, security groups, AWS IAM, and AWS Direct Connect.
  • Responsible for data services and data movement infrastructures
  • Experienced in ETL concepts, building ETL solutions and Data modeling
  • Worked on architecting the ETL transformation layers and writing spark jobs to do the processing.
  • Aggregated daily sales team updates to send reports to executives and organized jobs running on Spark clusters.
  • Loaded application analytics data into the data warehouse at regular intervals.
  • Experienced in dimensional modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
  • Designed and developed Security Framework to provide fine grained access to objects in AWS S3 using AWS Lambda, DynamoDB.
  • Used AWS Athena to query data directly from AWS S3.
  • Worked on Confluence and Jira
  • Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built with Python
  • Proficient in machine learning techniques (decision trees, linear/logistic regression) and statistical modeling.
  • Configured AWS Lambda with multiple functions.
  • Compiled data from various sources to perform complex analysis for actionable results
  • Measured the efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
  • Optimized the TensorFlow model for efficiency.
  • Implemented generalized solution models using AWS SageMaker.
  • Analyzed the system for new enhancements/functionalities and performed impact analysis of the application for implementing ETL changes.
  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS
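
A minimal Airflow sketch of a DAG for automating an ETL pipeline, as referenced above; the DAG id, schedule, and task callables are hypothetical placeholders.

```python
# Minimal Airflow sketch: a daily DAG chaining extract -> transform -> load tasks.
# The DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")

def transform():
    print("clean and aggregate the extracted data")

def load():
    print("write the results to the warehouse")

default_args = {
    "owner": "data-engineering",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_sales_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Define the dependency chain: extract runs first, then transform, then load.
    t_extract >> t_transform >> t_load
```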
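
Illustrative sketch of loading and querying Snowflake from Python with the snowflake-connector-python package, as referenced above; the account, credentials, warehouse, stage, and table names are hypothetical.

```python
# Minimal sketch: load a staged file into Snowflake and run an aggregate query.
# Account, credentials, warehouse, stage, and table names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="ETL_USER",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # COPY INTO ingests files that were already uploaded to the stage.
    cur.execute(
        "COPY INTO daily_sales FROM @etl_stage/daily_sales/ "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )
    cur.execute("SELECT region, SUM(amount) FROM daily_sales GROUP BY region")
    for region, total in cur.fetchall():
        print(region, total)
finally:
    conn.close()
```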

Environment: AWS, Python, HDFS, MapReduce, Flume, Kafka, Zookeeper, Pig, Hive, HQL, HBase, Spark, ETL, Web Services, Red Hat Linux, UNIX.
