AWS Data Engineer Resume
St. Brooklyn, NY
SUMMARY
- A dynamic professional with 6+ years of experience as a Big Data Engineer, ETL Developer, and Java Developer, skilled in planning, designing, and implementing data models for enterprise-level applications.
- Excellent understanding of technologies and frameworks that hold massive volumes of data and run in a highly distributed fashion on Cloudera and Hortonworks Hadoop distributions and Amazon AWS.
- Experience in large-scale application development using the Big Data ecosystem - Hadoop (HDFS, MapReduce, YARN), Spark, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, ZooKeeper, Ambari, Flume, Apache NiFi, AWS, Azure, Google Cloud Platform.
- Sound experience with AWS services such as Amazon EC2, S3, EMR, Amazon RDS, VPC, Elastic Load Balancing, IAM, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and Lambda for triggering resources.
- Performed analytics and cloud migration from on-premises systems to the AWS Cloud using AWS EMR, S3, and DynamoDB.
- Worked on ETL migration services by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
- Detailed knowledge of AWS ElastiCache (Memcached and Redis) and NoSQL databases such as HBase and DynamoDB, as well as database performance tuning, data modeling, disaster recovery, backup, and building data pipelines.
- Designed and developed data pipelines to ingest transactional data into S3 data lakes using BDA, Kinesis Streams, Lambda, and Glue (a minimal sketch follows this summary).
- Developed test programs that read and write data directly from/into HBase tables for testing and analysis purposes.
- Implemented a microservices architecture with API Gateway, Lambda, and DynamoDB, and deployed applications/infrastructure using core AWS services - EC2, S3, RDS, EBS, DynamoDB, SNS, SQS.
- Experience creating and managing reporting and analytics infrastructure for internal business clients using AWS services including Athena, Redshift and Redshift Spectrum, EMR, and QuickSight.
- Extensive experience developing and implementing cloud architecture on Microsoft Azure.
- Created, monitored, and restored Azure SQL databases; migrated Microsoft SQL Server databases to Azure SQL Database.
- Experience with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Big Data technologies (Apache Spark), and Databricks.
- Experience building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse while controlling and granting database access.
- Created connectivity from Azure to an on-premises data center using Azure ExpressRoute for single- and multi-subscription setups.
- Experience in OLTP/OLAP system study, analysis, and E-R modeling, as well as in developing database schemas such as the Star and Snowflake schemas used in relational, dimensional, and multidimensional modeling.
- Extensive knowledge of all phases of data acquisition and data warehousing, and of data modeling and analysis using Star and Snowflake schemas for fact and dimension tables.
- Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Experience creating Web-Services with the Python programming language.
- Involved in migrating legacy applications to the cloud using DevOps tools like GitHub, Jenkins, JIRA, Docker, and Slack.
- Experience in designing interactive dashboards and reports, performing ad-hoc analysis, and building visualizations using Tableau, Power BI, Arcadia, and Matplotlib.
- Worked with Spark to improve the speed and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
- Worked with the MapReduce programming paradigm and the Hadoop Distributed File System (HDFS).
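A minimal sketch of the Kinesis-to-S3 ingestion pattern referenced in the summary above, assuming an AWS Lambda function triggered by a Kinesis stream. The bucket name, key prefix, and JSON record layout are hypothetical placeholders, not details taken from the projects listed here.

```python
# Hedged illustration only: bucket, prefix, and record layout are assumed placeholders.
import base64
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "example-transactions-datalake"   # assumed bucket name
PREFIX = "raw/transactions"                # assumed data-lake prefix


def handler(event, context):
    """Lambda handler triggered by a Kinesis stream; lands raw records in S3."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))

    if not rows:
        return {"written": 0}

    # Partition by ingestion date so a Glue crawler/Athena can prune partitions later.
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"{PREFIX}/ingest_date={day}/{uuid.uuid4()}.json"
    body = "\n".join(json.dumps(r) for r in rows)
    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
    return {"written": len(rows), "key": key}
```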
TECHNICAL SKILLS
Big Data Ecosystems: HDFS, YARN, MapReduce, Spark, Kafka, Hive, Airflow, StreamSets, Sqoop, HBase, Oozie, ZooKeeper, NiFi, Ranger, Ambari
Scripting Languages: Python, Scala, PowerShell, Pig Latin, HiveQL
Cloud Environment: Amazon Web Services (AWS), Microsoft Azure
NoSQL Database: Cassandra, Redis, MongoDB, Neo4j
Databases: MySQL, Oracle, Teradata, MS SQL Server, PostgreSQL, DB2
Version Control: Git, SVN, Bitbucket
ETL & BI Tools: Tableau, Microsoft Excel, Informatica, Power BI, R, Google Data Studio
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Others: Machine learning, NLP, Spring Boot, Jupyter Notebook, Terraform, Docker, Kubernetes, Jenkins, Ansible, Splunk, Jira
PROFESSIONAL EXPERIENCE
Confidential, St. Brooklyn, NY
AWS Data Engineer
Responsibilities:
- Extensively used AWS Athena to query structured data in S3, load it into other systems such as Redshift, and generate reports (a hedged query sketch appears at the end of this role).
- Worked with Spark to improve the speed and optimization of Hadoop's current algorithms.
- Worked with an RDBMS on Amazon EC2 as part of a Proof of Concept (POC).
- Migrated an existing on-premises application to AWS; services such as EC2 and S3 were utilized for data processing and storage.
- Used the Spark Streaming APIs to perform on-the-fly transformations and actions for the common learner data model, which receives data from Kinesis in real time.
- Performed end-to-end architecture and implementation evaluations of AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis. Used Hive as the primary query engine on EMR and created external table schemas for the data being processed.
- Configured Apache Presto and Apache Drill on an AWS EMR (Elastic MapReduce) cluster to integrate different databases such as MySQL and Hive, allowing comparison of operations such as joins and inserts across many data sources controlled from a single platform.
- Developed views and templates with Python and Django's view controller.
- Set up AWS RDS (Relational Database Service) to act as a Hive metastore so that metadata from 20 EMR clusters could be consolidated into a single RDS instance, preventing data loss even if an EMR cluster was terminated.
- Developed and implemented ETL pipelines on S3 parquet files in a data lake using AWS Glue.
- Good experience working with the MuleSoft 3.9+ runtime and Mule Expression Language (MEL) to access payload data, properties, and variables of a Mule message flow.
- Worked on developing batch integrations to transfer data in bulk between enterprise applications using MuleSoft ESB.
- Performed application integration using MuleSoft ESB and IBM Message Broker to integrate and orchestrate services.
- Developed a CloudFormation template in JSON format to enable content delivery with cross-region replication using Amazon Virtual Private Cloud.
- Implemented SQLAlchemy, a Python library that provides full access to SQL.
- Worked with Python OpenStack APIs; used Python scripts to update content in the database and manipulate files.
- Developed a PySpark script to perform ETL as a Glue job, where data is extracted from S3 using a crawler and a Data Catalog is created to store the metadata (see the sketch after this list).
- Designed and developed an entire module in Python and deployed it to AWS Glue using the PySpark library.
- Vast experience with the Teradata database; most work was ELT, with transformations and optimizations done in Teradata. Created pipelines to load data into the EDW.
- Worked on moving a quality-monitoring application's code from AWS EC2 to AWS Lambda, as well as building logical datasets to administer quality monitoring on Snowflake warehouses.
- Worked on creating HDFS workloads on Kubernetes clusters to mimic production workloads for development purposes.
- Worked with CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Jenkins, Bitbucket Pipelines, and Elastic Beanstalk.
- Worked on ETL migration services by creating and deploying AWS Lambda functions to provide a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
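The Glue job described above is illustrated below as a minimal PySpark sketch: read a crawler-registered table from the Glue Data Catalog, apply a few transformations, and write partitioned Parquet back to S3. The database, table, column names, and S3 path are assumptions for illustration, not values from the actual project.

```python
# Sketch of a Glue PySpark ETL job; catalog names, columns, and paths are assumed.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table that a crawler registered in the Glue Data Catalog (names assumed).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="transactions"
)
df = dyf.toDF()

# Example cleanup: de-duplicate, normalize the timestamp, derive a partition column.
cleaned = (
    df.dropDuplicates(["transaction_id"])
      .withColumn("event_ts", F.to_timestamp("event_ts"))
      .withColumn("ingest_date", F.to_date("event_ts"))
)

# Land curated Parquet back in the data lake, partitioned for Athena-friendly pruning.
cleaned.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3://example-curated-bucket/transactions/"
)

job.commit()
```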
Environment: Python, Databricks, PySpark, Kafka, GitLab, PyCharm, AWS S3, Delta Lake, Snowflake, Cloudera CDH 5.9.16, Hive, Impala, Kubernetes, Flume, Apache NiFi, Java, Shell scripting, SQL, Sqoop, Oozie, Oracle, SQL Server, HBase, Power BI, Agile Methodology.
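Related to the Athena bullet at the top of this role, the sketch below shows one common way to run an Athena query from Python with boto3 and collect the results. The database, table, region, and results bucket are hypothetical.

```python
# Hedged sketch: database, table, region, and results bucket are placeholders.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # region assumed

QUERY = """
SELECT ingest_date, COUNT(*) AS txn_count
FROM curated_db.transactions
GROUP BY ingest_date
ORDER BY ingest_date
"""


def run_query(sql):
    """Start an Athena query, poll until it finishes, and return rows of strings."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "curated_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    result = athena.get_query_results(QueryExecutionId=qid)
    # The first row holds the column headers; the rest hold the data values.
    return [
        [col.get("VarCharValue", "") for col in row["Data"]]
        for row in result["ResultSet"]["Rows"][1:]
    ]


if __name__ == "__main__":
    for row in run_query(QUERY):
        print(row)
```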
Confidential, Dallas, TX
Azure Data Engineer
Responsibilities:
- Performed migration of several databases, application servers, and web servers from on-premises environments to the Microsoft Azure cloud environment.
- Used Azure Data Factory extensively for ingesting data from disparate source systems.
- Used Azure Data Factory as an orchestration tool for integrating data from upstream to downstream systems.
- Designed and implemented pipelines in Azure Synapse/ADF to extract, transform, and load data from several sources including Azure SQL, Azure SQL Data Warehouse, etc.
- Hands-on experience designing and building data models and data pipelines with a focus on data warehouses and data lakes.
- Experienced in managing Azure Data Lake Storage (ADLS) and Databricks Delta Lake, and in integrating them with other Azure services.
- Experienced in creating a Data Lake Storage Gen2 storage account and file system.
- Experience in implementing microservices / REST APIs based on MuleSoft ESB / API Gateway.
- Integrated data storage options with Spark, notably Azure Data Lake Storage and Blob Storage.
- Used the Copy activity in Azure Data Factory to copy data among data stores located on-premises and in the cloud.
- Created Python notebooks on Azure Databricks to process datasets and load them into Azure SQL databases (see the sketch at the end of this list).
- Worked on advanced analytics using Python with modules like Pandas, NumPy, and Diplomata for data extraction and manipulation.
- Deployed containerized Airflow on Kubernetes (K8s) for job orchestration (Docker, Jenkins, Airflow, Python).
- Built a trigger-based mechanism to reduce the cost of resources such as Web Jobs and Data Factories using Azure Logic Apps and Functions.
- Worked on creating a star schema for drill-down data analysis. Created PySpark procedures, functions, and packages to load data.
- Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
- Hands-on experience creating pipelines in Azure Data Factory V2 using activities like Move & Transform, Copy, Filter, ForEach, Get Metadata, Lookup, and Databricks.
- Developed code to parse data formats like Parquet, JSON, etc., fetching data from the data lake and other sources to aggregate data.
- Successfully executed a proof of concept for an Azure implementation, with the wider objective of transferring on-premises servers and data to the cloud.
- Created dynamic pipelines, datasets, and linked services in Azure Data Factory (ADF) for data movement and data transformations.
- Developed mapping spreadsheets that provide the data warehouse development (ETL) team with source-to-target data mappings.
- Native integration with Azure Active Directory (Azure AD) and other Azure services enables building modern data warehouse, machine learning, and real-time analytics solutions.
- Used Hive queries to analyze huge data sets of structured, unstructured, and semi-structured data.
- Used structured data in Hive to enhance performance with techniques including bucketing, partitioning, and optimizing self-joins (see the sketch following this role's environment line).
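The Databricks notebook work described above ("load them into Azure SQL databases") is sketched below as a single PySpark cell: read curated Parquet from ADLS Gen2, aggregate it, and write the result to Azure SQL Database over JDBC. The storage account, container, server, table, and credentials are all assumed placeholders; on a real cluster the credentials would come from a Databricks secret scope and the Microsoft SQL Server JDBC driver would need to be installed.

```python
# Illustrative only: paths, server, table, and credentials are assumed placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Read curated Parquet from ADLS Gen2 (abfss path is a placeholder).
source_path = "abfss://curated@examplelake.dfs.core.windows.net/sales/"
df = spark.read.parquet(source_path)

daily = (
    df.withColumn("order_date", F.to_date("order_ts"))
      .groupBy("order_date")
      .agg(F.sum("amount").alias("daily_amount"))
)

# Write the aggregate to Azure SQL Database over JDBC (requires the MSSQL JDBC driver
# on the cluster); credentials shown here are placeholders, normally pulled from a
# Databricks secret scope rather than hard-coded.
jdbc_url = (
    "jdbc:sqlserver://example-server.database.windows.net:1433;"
    "database=exampledb;encrypt=true;loginTimeout=30"
)
(
    daily.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.daily_sales")
    .option("user", "sql_user")        # placeholder
    .option("password", "<redacted>")  # placeholder
    .mode("overwrite")
    .save()
)
```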
Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, Azure PowerShell, Databricks, Python, Kubernetes, Azure SQL Server, Azure Data Warehouse.
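The Hive bucketing/partitioning bullet above can be illustrated with the PySpark sketch below. Note that this uses Spark's DataFrame writer, so the bucketing is Spark-managed rather than Hive-native bucketing; the database, table, and column names are assumptions.

```python
# Spark-managed partitioning/bucketing sketch; table and column names are assumed.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("partition-bucket-sketch")
    .enableHiveSupport()      # assumes a Hive metastore is configured for the cluster
    .getOrCreate()
)

orders = spark.table("curated.orders_raw").withColumn(
    "order_date", F.to_date("order_ts")
)

# Partition by date for pruning; bucket by customer_id so joins keyed on customer_id
# can avoid a full shuffle. bucketBy() only works together with saveAsTable().
(
    orders.write
    .partitionBy("order_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .format("parquet")
    .mode("overwrite")
    .saveAsTable("curated.orders_bucketed")
)

# A self-join keyed on the bucketed column can then exploit the bucket layout.
o = spark.table("curated.orders_bucketed")
pairs = o.alias("a").join(o.alias("b"), "customer_id")
```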
Confidential, Houston, TX
Data Engineer
Responsibilities:
- Created Scala applications for loading/streaming data into NoSQL databases (MongoDB) and HDFS.
- Performed T-SQL tuning and query optimization for SSIS packages.
- Developed distributed algorithms for detecting and successfully processing data trends.
- Extensive experience working with various enterprise Hadoop distributions, including Cloudera (CDH4/CDH5) and Hortonworks, with good knowledge of the MapR distribution.
- Created data flows between SQL Server and Hadoop clusters using Apache NiFi.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop.
- Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Created Hive tables on HDFS to store the data processed by Apache Spark on the Hadoop cluster in Parquet format.
- Created an SSIS package to import data from SQL tables into various Excel sheets
- Used Spark SQL to pre-process, clean, and combine big data sets (see the sketch after this list).
- Performed data validation using Redshift and built pipelines capable of processing more than 100TB per day.
- Developed the SQL server database system to optimize performance.
- Performed migration of databases from conventional data warehouses to Spark clusters.
- Performed frequent cleaning and integrity tests to ensure that the data warehouse was loaded only with high-quality entries.
- Developed SQL queries to extract data from existing sources and validate format correctness.
- Designed patterns for graph databases. Mentored engineers, system analysts, management, business analysts, interns, and others in data modeling and data architecture principles, patterns, and techniques.
- Evaluated multiple modeling options for the introduction of new gene-based data, with pros, cons, and recommendations.
- Created automated tools and dashboards for collecting and displaying dynamic data.
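A hedged sketch of the Spark SQL pre-processing and validation described in this role. The input path, schema, and the 5% rejection threshold are assumptions for illustration only.

```python
# Illustration only: source path, columns, and validation threshold are assumed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cleaning-sketch").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/events/")   # placeholder source

cleaned = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
)

# Simple validation pass: reject the batch if too many rows lost required fields.
total = cleaned.count()
bad = cleaned.filter(F.col("event_ts").isNull() | F.col("amount").isNull()).count()
if total == 0 or bad / total > 0.05:          # 5% threshold is an assumed policy
    raise ValueError(f"Validation failed: {bad} of {total} rows are malformed")

cleaned.createOrReplaceTempView("events_clean")
summary = spark.sql("""
    SELECT date(event_ts) AS event_date, COUNT(*) AS events, SUM(amount) AS amount
    FROM events_clean
    GROUP BY date(event_ts)
""")
summary.write.mode("overwrite").parquet("s3://example-clean-bucket/event_summary/")
```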
Environment: T-SQL, MongoDB, HDFS, Scala, Spark SQL, Relational Databases, Redshift, SSIS, SQL, Linux, Data Validation, MS Excel.
Confidential
Software Engineer
Responsibilities:
- Worked closely with Project Management and Data Analysts to convert high-level requirements into design documents and carry out development in the Informatica Big Data Management tool.
- Developed mappings to load data from legacy files to Teradata and SQL Server using Informatica.
- Developed mappings, parameter sets, and applications using Informatica BDM to load data into Hive databases.
- Designed and developed the application using the Waterfall methodology and followed Test-Driven Development (TDD) and Scrum.
- Used JSON for data exchange between application modules along with XML.
- Worked with PostgreSQL and search-based data storage.
- Designed the flow of the project using the Waterfall model.
- Implemented Spring ORM wiring with Hibernate to provide access to an Oracle RDBMS.
- Created new interfaces using Java, Maven/Ant, Spring MVC, and Hibernate based on provided requirements (XSDs/DDFs).
- Designed a database with 16 tables for an online shopping website on MySQL, utilizing MyBatis to generate classes based on tables. Implemented the online payment function by applying Alipay interface.
- Followed the MVC design pattern, implementing a website for customers to purchase and manage their orders and for merchants to manage orders from customers based on Spring framework.
- Automated test cases using the built-in framework in Selenium WebDriver with the NetBeans IDE.
- Used JBoss application server to deploy application into Production environment.
- Developed a framework using Java, MySQL, and web server technologies.
- Managed code with unit tests and SVN.
Environment: Java/J2EE, JSON, Spring ORM, Hibernate, Maven/Ant, Spring MVC, JSP, Servlets, Spring, Web Services, JBoss, MySQL, SVN, Cloudera Hadoop, PySpark, Python, Hive, JIRA, Agile Scrum, Kanban, Git.