
AWS Data Engineer Resume


Rockville, MD

SUMMARY

  • 8+ years of experience in systems analysis, design, and development across Data Warehousing, AWS Cloud Data Engineering, Data Visualization, Reporting, and Data Quality Solutions.
  • Solid experience with Amazon Web Services including S3, IAM, EC2, EMR, Kinesis, VPC, DynamoDB, Redshift, Amazon RDS, Lambda, Athena, Glue, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS, and other services in the AWS family.
  • Hands-on experience with AWS data analytics services such as Athena, Glue Data Catalog, and QuickSight.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached and Redis).
  • Experience developing Hadoop-based applications using HDFS, MapReduce, Spark, Hive, Sqoop, HBase, and Oozie.
  • Hands-on experience architecting legacy data migration projects from on-premises systems to the AWS Cloud.
  • Wrote AWS Lambda functions in Python that invoke scripts to perform transformations and analytics on large data sets in EMR clusters.
  • Experience building and optimizing AWS data pipelines, architectures, and data sets.
  • Hands-on experience with tools such as Hive for data analysis, Sqoop for data ingestion, and Oozie for scheduling.
  • Experience scheduling and configuring Oozie, including writing Oozie workflows and coordinators.
  • Worked with file formats such as JSON, XML, CSV, ORC, and Parquet, processing both structured and semi-structured data in these formats.
  • Worked on Apache Spark, performing actions and transformations on RDDs, DataFrames, and Datasets using Spark SQL and Spark Streaming contexts (a brief PySpark sketch follows this summary).
  • Good experience with Spark Core, Spark SQL, and Spark Streaming.
  • Good experience with SDLC models including Waterfall, V-Model, and Agile.
  • Participated in daily stand-ups, sprint planning, and review meetings in the Agile model.
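A brief PySpark sketch, under assumed file paths and column names (not project specifics), of the DataFrame transformations, Spark SQL usage, and actions summarized above:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("summary-example").getOrCreate()

# Read semi-structured JSON from an illustrative S3 location into a DataFrame
orders = spark.read.json("s3://example-bucket/raw/orders/")

# Transformations: filter, derive a date column, and aggregate
daily_totals = (
    orders.filter(F.col("status") == "COMPLETED")
          .withColumn("order_date", F.to_date("order_ts"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"))
)

# The same logic expressed through Spark SQL
orders.createOrReplaceTempView("orders")
daily_totals_sql = spark.sql("""
    SELECT to_date(order_ts) AS order_date, SUM(amount) AS total_amount
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY to_date(order_ts)
""")

# Action: write the aggregated result out as Parquet
daily_totals.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")
```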

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, Zookeeper, YARN, Apache Spark, Mahout, Spark MLlib

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos.

Programming: Python, PySpark, Scala, Java, C, C++, Shell script, Perl script, SQL

Cloud Technologies: AWS, Microsoft Azure

Frameworks: Django REST framework, MVC, Hortonworks

Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, TOAD, SQL Navigator, Query Analyzer, SQL Server Management Studio, SQL Assistant, Postman

Versioning tools: SVN, Git, GitHub

Operating Systems: Windows 7/8/XP/2008/2012, Ubuntu Linux, MacOS

Network Security: Kerberos

Database Modeling: Dimensional Modeling, ER Modeling, Star Schema Modeling, Snowflake Modeling

Workflow Orchestration/Monitoring: Apache Airflow

Visualization/Reporting: Tableau, ggplot2, Matplotlib, SSRS, and Power BI

Machine Learning Techniques: Linear and Logistic Regression, Classification and Regression Trees, Random Forest, Association Rules, NLP, and Clustering.

PROFESSIONAL EXPERIENCE

Confidential, Rockville, MD

AWS Data Engineer

Responsibilities:

  • Designed and set up an enterprise data lake supporting use cases including storage, processing, analytics, and reporting of voluminous, rapidly changing data using various AWS services.
  • Used AWS services including S3, EC2, AWS Glue, Athena, Redshift, EMR, SNS, SQS, DMS, and Kinesis.
  • Extracted data from multiple source systems (S3, Redshift, RDS) and created tables and databases in the Glue Data Catalog using Glue crawlers.
  • Created AWS Glue crawlers to crawl the source data in S3 and RDS.
  • Created Glue ETL jobs in Glue Studio, applied various transformations, and loaded the results into S3, Redshift, and RDS.
  • Created recipes in Glue DataBrew and used them in various Glue ETL jobs.
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (Parquet/text files) into Amazon Redshift.
  • Used the Glue Data Catalog with crawlers to expose data in S3 and ran SQL queries against it using Amazon Athena.
  • Wrote PySpark jobs in AWS Glue to merge data from multiple tables and used crawlers to populate the Glue Data Catalog with metadata table definitions (see the Glue job sketch after this job's environment line).
  • Used AWS Glue for transformations and AWS Lambda to automate the process.
  • Created monitors, alarms, notifications, and logs for Lambda functions and Glue jobs using CloudWatch.
  • Performed end-to-end architecture and implementation assessments of AWS services such as Amazon EMR, Redshift, and S3.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
  • Used Athena extensively to run queries on data processed by Glue ETL jobs, then used QuickSight to generate reports for business intelligence.
  • Used DMS to migrate tables from homogeneous and heterogeneous on-premises databases to the AWS Cloud.
  • Created Kinesis Data Streams, Kinesis Data Firehose delivery streams, and Kinesis Data Analytics applications to capture and process streaming data and deliver it to S3, DynamoDB, and Redshift for storage and analysis.
  • Created Lambda functions to run AWS Glue jobs in response to S3 events, as sketched below.
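A minimal sketch of the S3-triggered Lambda pattern described above; the Glue job name and argument keys are illustrative assumptions, not actual project values:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Start a Glue ETL job for each object that lands in the source bucket."""
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Hypothetical job name and argument names, for illustration only
        response = glue.start_job_run(
            JobName="example-etl-job",
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        started.append(response["JobRunId"])
        print(f"Started Glue job run {response['JobRunId']} for s3://{bucket}/{key}")
    return {"job_runs": started}
```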

Environment: AWS Glue, S3, IAM, EC2, RDS, Redshift, Lambda, Boto3, DynamoDB, Apache Spark, Kinesis, Athena, Hive, Sqoop, Python.
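A hedged sketch of the kind of Glue PySpark job referenced above, merging two Glue Data Catalog tables and writing the result to S3; the database, table, key, and path names are placeholders:

```python
import sys
from awsglue.transforms import Join
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Placeholder database and table names registered by a crawler
orders = glueContext.create_dynamic_frame.from_catalog(
    database="example_db", table_name="orders")
customers = glueContext.create_dynamic_frame.from_catalog(
    database="example_db", table_name="customers")

# Merge the two tables on a common key
merged = Join.apply(orders, customers, "customer_id", "customer_id")

# Write the merged result to the curated S3 layer as Parquet
glueContext.write_dynamic_frame.from_options(
    frame=merged,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders_customers/"},
    format="parquet",
)
job.commit()
```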

Confidential, Owings Mills, MD

AWS Data Engineer

Responsibilities:

  • Provisioned key AWS cloud services and configured them for scalability, flexibility, and cost optimization.
  • Created VPCs, private and public subnets, and NAT gateways in a multi-region, multi-zone infrastructure landscape to support worldwide operations.
  • Managed Amazon Web Services (AWS) infrastructure with orchestration tools such as CloudFormation templates, Terraform, and Jenkins pipelines.
  • Created Terraform scripts to automate deployment of EC2 instances, S3, EFS, EBS, IAM roles, snapshots, and a Jenkins server.
  • Built cloud data stores in S3 with logical layers for raw, curated, and transformed data.
  • Created data ingestion modules using AWS Glue to load data into the various S3 layers, with reporting through Athena and QuickSight.
  • Created and managed bucket policies and lifecycle rules for S3 storage per organizational and compliance guidelines.
  • Created parameters and SSM documents using AWS Systems Manager.
  • Established CI/CD tools such as Jenkins and Git-based repositories for code storage, build, and deployment of the Python code base.
  • Built Glue jobs for technical data cleansing such as deduplication, NULL value imputation, and removal of redundant columns (sketched below).
  • Also built Glue jobs for standard data transformations (date, string, and math operations) and business transformations required by business users.
  • Used the Kinesis family (Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics) to collect, process, and analyze streaming data.
  • Created Athena data sources on S3 buckets for ad hoc querying and business dashboarding using QuickSight and Tableau (see the Athena query sketch after the environment line).
  • Copied fact/dimension and aggregate output from S3 to Redshift for historical data analysis using Tableau and QuickSight.
  • Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline.
  • Used the PyCharm IDE for Python/PySpark development and Git for version control and repository management.
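A hedged sketch of the technical-cleansing Glue job described above, expressed with plain PySpark DataFrame operations; column names, fill values, and S3 paths are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleansing-example").getOrCreate()

# Illustrative raw-layer path
raw = spark.read.parquet("s3://example-bucket/raw/customers/")

cleansed = (
    raw.dropDuplicates(["customer_id"])               # deduplication on a business key
       .fillna({"country": "UNKNOWN", "balance": 0})  # NULL value imputation
       .drop("legacy_flag", "unused_col")             # redundant column removal
       .withColumn("full_name",                       # standard string transformation
                   F.concat_ws(" ", F.col("first_name"), F.col("last_name")))
       .withColumn("load_date", F.current_date())     # standard date derivation
)

# Write the cleansed output to the curated layer
cleansed.write.mode("overwrite").parquet("s3://example-bucket/curated/customers/")
```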

Environment: AWS EC2, VPC, S3, EBS, ELB, CloudWatch, CloudFormation, ASG, Lambda, AWS CLI, Git, Glue, Athena, QuickSight, Python, PySpark, Shell scripting, Jenkins.
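A minimal boto3 sketch of ad hoc Athena querying over the S3-backed tables mentioned above; the database, query, and result location are placeholders:

```python
import time
import boto3

athena = boto3.client("athena")

# Placeholder database, query, and result location
query = athena.start_query_execution(
    QueryString="SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date",
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then print the first page of results
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```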

Confidential, NYC, NY

AWS Data Engineer

Responsibilities:

  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets (a sketch follows this job's environment line).
  • Created a Lambda function and configured it to receive events from an S3 bucket.
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability, lineage, and definition of key business elements from Aurora.
  • Wrote code that optimizes the performance of AWS services used by application teams and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).
  • Created AWS Lambda functions in Python for deployment management in AWS, and designed and implemented public-facing websites on Amazon Web Services integrated with other application infrastructure.
  • Created AWS Lambda functions and API Gateway endpoints so that data submitted via API Gateway is handled by the corresponding Lambda function.
  • Built CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM, and CloudWatch service implementations and integrated them with Service Catalog.
  • Performed regular monitoring activities on Unix/Linux servers, including log verification, server CPU usage, memory checks, load checks, and disk space verification, to ensure application availability and performance using CloudWatch and AWS X-Ray. Implemented the AWS X-Ray service at Confidential, allowing development teams to visually detect node and edge latency distribution directly from the service map.
  • Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (ORC/Parquet/text files) into Amazon Redshift.
  • Utilized Python libraries such as Boto3 and NumPy for AWS work.
  • Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
  • Created external tables with partitions using Hive, Amazon Athena, and Redshift.
  • Developed PySpark code for AWS Glue jobs and for EMR.
  • Good understanding of other AWS services such as S3, EC2, IAM, and RDS, with experience in orchestration and data pipelines using AWS Step Functions, Data Pipeline, and Glue.
  • Experience writing SAM templates to deploy serverless applications on the AWS cloud.
  • Hands-on experience with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.
  • Designed and developed ETL jobs in AWS Glue to extract data from S3 objects and load it into a data mart in Redshift.
  • Responsible for logical and physical data modeling for various data sources on Redshift.
  • Used event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
  • Integrated Lambda with SQS and DynamoDB via Step Functions to iterate through a list of messages and update their status in a DynamoDB table, as sketched below.
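A minimal sketch of the Lambda-SQS-DynamoDB integration in the last bullet; the table name, key, and message fields are illustrative assumptions:

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical table and attribute names, for illustration only
table = dynamodb.Table("example-status-table")

def lambda_handler(event, context):
    """Iterate through SQS messages and update each item's status in DynamoDB."""
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        table.update_item(
            Key={"item_id": body["item_id"]},
            UpdateExpression="SET #s = :status, updated_at = :ts",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={
                ":status": body.get("status", "PROCESSED"),
                ":ts": record["attributes"]["SentTimestamp"],
            },
        )
    return {"processed": len(event.get("Records", []))}
```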

Environment: AWS EC2, S3, EBS, ELB, EMR, Lambda, RDS, SNS, SQS, VPC, IAM, CloudFormation, CloudWatch, ELK Stack, Bitbucket, Python, Shell scripting, Git, Jira, Unix/Linux, AWS X-Ray, DynamoDB, Kinesis.
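A hedged sketch of the serverless pattern above (API Gateway invoking Lambda, which persists to DynamoDB); resource names and payload fields are placeholders, not the actual application's schema:

```python
import json
import uuid
import boto3

dynamodb = boto3.resource("dynamodb")
# Placeholder table name
table = dynamodb.Table("example-items-table")

def lambda_handler(event, context):
    """Handle an API Gateway proxy request and store the payload in DynamoDB."""
    payload = json.loads(event.get("body") or "{}")
    item = {
        "item_id": str(uuid.uuid4()),
        "name": payload.get("name", "unknown"),
        "notes": payload.get("notes", ""),
    }
    table.put_item(Item=item)
    return {
        "statusCode": 201,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"item_id": item["item_id"]}),
    }
```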

Confidential, Jersey City, NJ

AWS Data Engineer

Responsibilities:

  • Designed and developed ETL processes with PySpark in AWS Glue to migrate data from S3 and generate reports.
  • Wrote and scheduled Databricks jobs using Airflow (see the DAG sketch after this list).
  • Used SageMaker as a dev endpoint for Glue development.
  • Authored Spark jobs for data filtering and transformation with PySpark DataFrames in both AWS Glue and Databricks.
  • Used the AWS Glue Data Catalog with Athena to access data in S3 and perform SQL query operations.
  • Wrote various data normalization jobs for new data ingested into S3.
  • Created Airflow DAGs to run jobs on daily, weekly, and monthly schedules.
  • Designed and developed ETL processes with PySpark in AWS Glue to migrate data from external sources and S3 files into Amazon Redshift.
  • Wrote and scheduled Glue jobs, building the Data Catalog and mappings from S3 to Redshift.
  • Created AWS Lambda functions with assigned IAM roles and scheduled Python scripts using CloudWatch triggers to support infrastructure needs such as extracting XML tags.
  • Connected Redshift to Tableau to create dynamic dashboards for the analytics team.
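A minimal Airflow DAG sketch in the spirit of the scheduling work above, using a PythonOperator with boto3 to start a Glue job; the DAG id, schedule, and job name are illustrative assumptions:

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def start_glue_job():
    """Kick off a placeholder Glue job run via boto3."""
    glue = boto3.client("glue")
    run = glue.start_job_run(JobName="example-reporting-job")
    print(f"Started Glue job run {run['JobRunId']}")

with DAG(
    dag_id="example_daily_reporting",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # weekly/monthly variants would use "@weekly" / "@monthly"
    catchup=False,
) as dag:
    run_glue_job = PythonOperator(
        task_id="run_glue_job",
        python_callable=start_glue_job,
    )
```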

Environment: AWS EMR 5.0.0, EC2, S3, Oozie 4.2, Kafka, Spark, Spark SQL, PostgreSQL, Shell scripting, Sqoop 1.4, Scala.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Developed Hive and Bash scripts for source data validation and transformation.
  • Automated data loading into HDFS and Hive for pre-processing using One Automation.
  • Gathered data from data warehouses in Teradata and Snowflake.
  • Developed Spark/Scala and Python code for regular-expression projects in the Hadoop/Hive environment (a PySpark sketch follows this list).
  • Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
  • Generated reports using Tableau.
  • Built Big Data applications using Cassandra and Hadoop.
  • Utilized Sqoop, ETL processes, and Hadoop filesystem APIs to implement data ingestion pipelines.
  • Worked on batch data at granularities ranging from hourly and daily to weekly and monthly.
  • Performed Hadoop administration and support activities, installing and configuring Apache Big Data tools and Hadoop clusters using Cloudera Manager.
  • Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows.
  • Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Ambari, Pig, and Hive.
  • Developed and wrote SQL and stored procedures in Teradata; loaded data into Snowflake and wrote SnowSQL scripts.
  • Wrote TDCH scripts for full and incremental refreshes of Hadoop tables.
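A short PySpark sketch of the regular-expression work against Hive tables described above; the database, table, column, and pattern are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark read managed Hive tables directly
spark = (
    SparkSession.builder
    .appName("hive-regex-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Placeholder Hive database and table
logs = spark.table("example_db.web_logs")

# Extract an order id embedded in a free-text message column
parsed = logs.withColumn(
    "order_id",
    F.regexp_extract(F.col("message"), r"order[_-]?(\d+)", 1),
).filter(F.col("order_id") != "")

# Write the parsed result to a Hive staging table for downstream loads into Teradata
parsed.write.mode("overwrite").saveAsTable("example_db.web_logs_parsed")
```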
