
AWS Architect & Hadoop Developer Resume


Vienna, Virginia

PROFESSIONAL SUMMARY:

  • Over 10 years of experience in requirements gathering, analysis, design, development, implementation, testing, and estimation of various business applications using technologies and databases such as Spark, Scala, Python, AWS (Lambda, EMR, EC2), Hadoop, Teradata, DB2, Redshift, DynamoDB, HBase, Hive, SQL, shell scripting, and JCL.
  • Cloud and Big Data Architect and hands-on technical decision maker with extensive experience designing and leading complex enterprise integrated systems, delivering complete end-to-end large-scale data warehouse and big data solutions, and implementing cloud-centered security best practices.
  • Excellent understanding of and hands-on experience with the following cloud migration projects:
  • Teradata and DB2 to HDFS/Hive
  • HDFS data lake to Redshift, Snowflake, and S3 data lake
  • Teradata and Redshift to S3 data lake
  • Teradata/Redshift to Snowflake and S3 OneLake
  • Strong experience building tools and frameworks for migration projects; created more than 50 tools/frameworks for cloud migration.
  • Good experience building reporting dashboards for migration projects using Node.js, HTML, CSS, AWS QuickSight, and Elastic Beanstalk.
  • Strong experience building serverless architectures using AWS Lambda.
  • Good experience creating EMR clusters using Python scripts, CloudFormation templates (CFT), and the AWS Console.
  • Experience with various Amazon Web Services (AWS) components: transient and long-running EMR for big data processing and analysis with Spark on Scala using EMRFS and HDFS; EC2 for virtual servers; S3 and Glacier for object storage; VPC; Auto Scaling; CloudFormation; EBS; Step Functions for ETL pipelines; CloudWatch for log analysis and event triggers; Lambda for serverless frameworks; CloudTrail for auditing; CDH clusters; Redshift; Snowflake; SNS; Route 53 to route traffic to the disaster recovery region; Athena and Glue to analyze data in S3; and Presto, Spark, and Hive on EMR.
  • Skilled in creating AMIs (Amazon Machine Images).
  • Good working experience with EC2 and EMR rehydration to update the AMI of the EC2 instances running the application.
  • Good experience creating transient EMR clusters using Lambda (Python) for batch and ETL workloads, and long-running clusters for daily batch jobs (see the sketch after this summary).
  • Creation and maintenance of database objects such as tables, views, and schemas in DB2, Hive, Redshift, DynamoDB, and Snowflake.
  • Strong experience designing external, partitioned, and non-partitioned tables on HDFS and S3 data lakes.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, NameNode, and DataNode.
  • Good experience setting up S3 cross-region replication.
  • Good experience with AWS disaster recovery exercises to fail over from the east region to the west region.
  • Good exposure to interacting with clients; communicates effectively with people at different levels, including stakeholders, internal teams, and senior management.
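
A minimal sketch of the transient-EMR pattern referenced above, written as a Lambda handler in Python with boto3. The cluster sizing, release label, bucket names, script path, and subnet ID are illustrative placeholders, not values from an actual project:

    import boto3

    emr = boto3.client("emr")

    def lambda_handler(event, context):
        """Launch a transient EMR cluster that runs one Spark step and then terminates itself."""
        response = emr.run_job_flow(
            Name="transient-etl-cluster",
            ReleaseLabel="emr-5.30.0",                  # example release label
            Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
            LogUri="s3://example-logs-bucket/emr/",     # placeholder bucket
            ServiceRole="EMR_DefaultRole",
            JobFlowRole="EMR_EC2_DefaultRole",
            Instances={
                "InstanceGroups": [
                    {"Name": "Master", "InstanceRole": "MASTER",
                     "InstanceType": "m5.xlarge", "InstanceCount": 1},
                    {"Name": "Core", "InstanceRole": "CORE",
                     "InstanceType": "m5.xlarge", "InstanceCount": 2},
                ],
                "Ec2SubnetId": "subnet-0123456789abcdef0",  # placeholder subnet
                "KeepJobFlowAliveWhenNoSteps": False,       # transient: terminate after the steps finish
            },
            Steps=[{
                "Name": "spark-etl-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             "s3://example-code-bucket/jobs/etl_job.py"],
                },
            }],
        )
        return {"JobFlowId": response["JobFlowId"]}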

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Zookeeper, Databricks.

AWS: EMR, EC2, Lambda, Step Functions, CloudWatch, CloudTrail, S3, S3 Glacier, Redshift, DynamoDB, Auto Scaling, Athena, EBS, EFS, IAM, VPC, CloudFormation, SNS, Glue, Route 53, Presto, QuickSight, Hue, Kinesis, Elastic Beanstalk.

Scripting Languages: Spark, Python, DB2 SQL, JCL, and shell scripting (Bash)

Databases: DB2, Hive, Teradata, Redshift, DynamoDB, HBase, and Snowflake.

Tools: Eclipse, Cloudera, Databricks.

Platforms: Linux, MVS OS/390

Methodologies: Agile

Domain: Banking, Retail.

PROFESSIONAL EXPERIENCE:

Confidential, Vienna, Virginia.

AWS Architect & Hadoop Developer

Responsibilities:

  • Interacting with the business team and product owner for requirement gathering and analysis.
  • Performing data profiling and data analysis to identify gaps and redundancy.
  • Design, development, and implementation of ETL solutions on AWS and big data environments, ensuring the migration of existing objects from on-premises systems, Redshift, and HDFS to the S3 data lake and Snowflake.
  • Developed an end-to-end data pipeline using CloudWatch, Step Functions, Lambda, EMR, Spark, EMRFS, HDFS, and Hive.
  • Contributed to innovation by exploring, investigating, and recommending big data and related technologies for various business applications.
  • Developed a data validation tool to compare data between all sources on the cloud and on-premises (databases such as Teradata, DB2, Hive, and Redshift, and files on HDFS, S3, and local storage). Supports multiple file formats, including Parquet, CSV, and JSON.
  • Developed a job monitoring tool to monitor jobs scheduled using Data Pipeline, Step Functions, CA-7, AROW, and Control-M, as well as jobs scheduled on EMR.
  • Developed an interactive UI for the AWS architecture; this UI is very useful in client demos and meetings and for training new joiners on the team.
  • Design, implement, and maintain all AWS infrastructure and services within a managed-service environment while maintaining enterprise-class security.
  • Creation of EC2 instances and transient and long-running EMR clusters to support petabytes of data.
  • Used Databricks to process big data using Python and Spark.
  • Termination of EC2 and EMR instances using scheduled Lambda functions (sketched after this list).
  • Provided security and managed user access using AWS Identity and Access Management (IAM), including creating new policies and IAM roles and setting up cross-account access.
  • Involved in converting Ab Initio, DB2, and Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, and PySpark.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Wrote Pig scripts for joining, grouping, sorting, and filtering data.
  • Developed scripts to convert file formats (CSV to Parquet, Parquet to CSV, and JSON to CSV), as sketched after this list.
  • Used CloudTrail to audit AWS resources and CloudWatch to monitor resources such as EC2 CPU and memory, Amazon RDS DB services, DynamoDB tables, and EBS volumes.
  • Creation of the Redshift cluster and Redshift objects, and setting up Redshift cross-region snapshots.
  • Disaster recovery of the Redshift cluster using cross-region snapshots.
  • Created S3 buckets, managed S3 bucket policies, and utilized S3 Glacier.
  • Created EMR clusters using Python scripts, CloudFormation templates, and the AWS Console.
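
A minimal sketch of the scheduled clean-up Lambda referenced above (triggered by a CloudWatch Events/EventBridge cron rule), written in Python with boto3. The "adhoc-" cluster-name prefix and the AutoTerminate tag are hypothetical conventions used only for illustration:

    import boto3

    emr = boto3.client("emr")
    ec2 = boto3.client("ec2")

    def lambda_handler(event, context):
        """Terminate idle ad-hoc EMR clusters and EC2 instances tagged for clean-up."""
        # EMR clusters sitting in WAITING with no queued work are candidates.
        waiting = emr.list_clusters(ClusterStates=["WAITING"])["Clusters"]
        idle_ids = [c["Id"] for c in waiting if c["Name"].startswith("adhoc-")]
        if idle_ids:
            emr.terminate_job_flows(JobFlowIds=idle_ids)

        # EC2 instances explicitly tagged AutoTerminate=true and still running.
        reservations = ec2.describe_instances(
            Filters=[
                {"Name": "tag:AutoTerminate", "Values": ["true"]},
                {"Name": "instance-state-name", "Values": ["running"]},
            ]
        )["Reservations"]
        instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
        if instance_ids:
            ec2.terminate_instances(InstanceIds=instance_ids)

        return {"terminated_clusters": idle_ids, "terminated_instances": instance_ids}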

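A minimal PySpark sketch of the file-format conversion scripts mentioned above (CSV/Parquet/JSON); the script name, argument order, and paths are illustrative assumptions:

    import sys
    from pyspark.sql import SparkSession

    def convert(src_path, dst_path, src_fmt, dst_fmt):
        """Read a dataset in one format and rewrite it in another (e.g. CSV -> Parquet)."""
        spark = SparkSession.builder.appName("file-format-converter").getOrCreate()

        reader = spark.read
        if src_fmt == "csv":
            reader = reader.option("header", "true")   # treat the first row as the header
        df = reader.format(src_fmt).load(src_path)

        writer = df.write.mode("overwrite")
        if dst_fmt == "csv":
            writer = writer.option("header", "true")
        writer.format(dst_fmt).save(dst_path)

        spark.stop()

    if __name__ == "__main__":
        # e.g. spark-submit convert.py s3://bucket/in/ s3://bucket/out/ csv parquet
        convert(*sys.argv[1:5])
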
Confidential, Vienna, Virginia.

AWS Architect & Hadoop Developer

Responsibilities:

  • Interacting with the business team and product owner for requirement gathering and analysis.
  • Creating end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
  • Design, implement, and maintain all AWS infrastructure and services.
  • Design, development, and implementation of ETL solutions on AWS and big data environments, ensuring the migration of existing objects from HDFS to Redshift and the S3 data lake.
  • Built state machines to execute the data migration steps in a reliable and scalable fashion.
  • Developed a tool to validate data and counts between HDFS, Redshift, and the S3 data lake, generate reports, and trigger automated emails to users/customers.
  • Worked on an AWS disaster recovery exercise that involved creating CloudFormation templates from the existing AWS infrastructure in the east region and deploying them in the west region to enable failover from east to west.
  • Queued Spark jobs on a long-running EMR cluster using Lambda to run ad-hoc jobs (sketched after this list).
  • Monitored CloudTrail logs to audit EC2, EMR, S3, and other AWS resources.
  • Created an external Hive table mapped to DynamoDB to load log data from HDFS into DynamoDB.
  • Good experience installing Hadoop and open-source software.
  • Managed and reviewed Hadoop log files; good experience debugging EMR logs.
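
A minimal sketch of the Lambda pattern for queueing ad-hoc Spark jobs onto a long-running EMR cluster, written in Python with boto3. The cluster name "daily-batch-cluster" and the default script path are hypothetical placeholders:

    import boto3

    emr = boto3.client("emr")

    def lambda_handler(event, context):
        """Add a Spark step to an existing long-running EMR cluster."""
        clusters = emr.list_clusters(ClusterStates=["WAITING", "RUNNING"])["Clusters"]
        cluster_id = next(
            (c["Id"] for c in clusters if c["Name"] == "daily-batch-cluster"), None)
        if cluster_id is None:
            raise RuntimeError("long-running cluster not found")

        script = event.get("script", "s3://example-code-bucket/jobs/adhoc_job.py")
        response = emr.add_job_flow_steps(
            JobFlowId=cluster_id,
            Steps=[{
                "Name": "adhoc-spark-step",
                "ActionOnFailure": "CONTINUE",   # don't tear down the shared cluster on failure
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script],
                },
            }],
        )
        return {"ClusterId": cluster_id, "StepIds": response["StepIds"]}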

Confidential

Lead Engineer

Responsibilities:

  • Implemented solutions for ingesting data from Teradata and DB2 into Redshift and the S3 data lake and processing the data at rest using big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Led the development team in designing and migrating AWS cloud-based solutions.
  • Ability to present technical concepts and cloud managed services clearly to customers and internal and external clients through demos, proposals, and presentations.
  • Built a unified data lake architecture integrating various data sources on Hadoop and AWS S3.
  • Integrated a serverless architecture with Lambda and DynamoDB to store status reports.
  • Used Amazon QuickSight for visualization and ad-hoc analysis, building bar, line, and pie charts.
  • Redesigned the existing Ab Initio ETL mappings and workflows using Spark SQL and HiveQL.
  • Ingested data into Redshift and the S3 data lake from Teradata and DB2.
  • Developed data migration tools using Spark with Scala and Python to migrate data from Teradata to the S3 data lake and Redshift (see the first sketch after this list).
  • Developed a data validation tool to validate the data migrated from Teradata to S3 and Redshift.
  • Developed an end-to-end data pipeline using CloudWatch, Step Functions, Lambda, EMR, Spark, Teradata TBUILD, Teradata BTEQ, PostgreSQL, and EMRFS to migrate data from Teradata to Redshift and S3 as a serverless architecture, terminating the EMR cluster after the data migration.
  • Performed rehydration to update the AMI of the EC2 instances running the application.
  • Assumed IAM roles from a different AWS account using Spark (Scala) and Python to access data in S3 buckets (see the second sketch after this list).
  • Used Presto to query the reports stored in PostgreSQL and built reports to send to end users.
  • Used Glue as a data catalog and accessed structured and unstructured S3 data using Athena.
  • Performed disaster recovery exercises for AWS resources such as Redshift, S3, EC2, and EMR to fail over from the east region to the west region.
  • Used the Hue S3 browser to access S3 and HDFS, the Hue Pig and Hive editors, and the Hue job browser to check the status of jobs running on EMR.
  • Integrated NoSQL databases such as HBase with MapReduce to move bulk data into HBase.
  • Developed tools to download data from Teradata and DB2.
  • Converted Ab Initio graphs into HiveQL.
  • Created Hive tables and promoted them to various subsystems, including production.
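
A minimal PySpark sketch of the Teradata-to-S3 migration pattern described above: read a table over JDBC and land it in the S3 data lake as Parquet. The host, credentials, table, partition column, and bucket are placeholders, and the Teradata JDBC driver jar is assumed to be on the Spark classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("teradata-to-s3").getOrCreate()

    # Pull the source table from Teradata over JDBC.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:teradata://td-host.example.com/DATABASE=SALES")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", "****")        # in practice, fetch from a secret store
          .load())

    # Land the data in the S3 data lake as partitioned Parquet.
    (df.write.mode("overwrite")
       .partitionBy("order_date")            # example partition column
       .parquet("s3://example-datalake/raw/sales/orders/"))

    spark.stop()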

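A minimal boto3 sketch of the cross-account access pattern mentioned above: assume an IAM role in another AWS account via STS and read from an S3 bucket owned by that account. The role ARN and bucket name are placeholders; in a Spark job the same temporary credentials could be supplied through the S3A Hadoop configuration instead:

    import boto3

    def s3_client_for_role(role_arn, session_name="cross-account-read"):
        """Return an S3 client that uses temporary credentials from the assumed role."""
        sts = boto3.client("sts")
        creds = sts.assume_role(RoleArn=role_arn,
                                RoleSessionName=session_name)["Credentials"]
        return boto3.client(
            "s3",
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )

    if __name__ == "__main__":
        # Placeholder account ID, role name, and bucket.
        s3 = s3_client_for_role("arn:aws:iam::111122223333:role/example-read-role")
        for obj in s3.list_objects_v2(Bucket="example-partner-bucket").get("Contents", []):
            print(obj["Key"], obj["Size"])
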
Confidential

DB2 DBA

Responsibilities:

  • Providing 24/7 support for the critical systems.
  • Performance Monitoring and Tuning.
  • Taking regular backups and performing restores whenever needed.
  • Performing maintenance activities for production.
  • Creation of objects such as tables and promoting them to various subsystems, including production.
  • Database replication experience.
  • Automation of housekeeping jobs and handling of production, staging, and test subsystem failures.
  • Measured package performance, suggested SQL tuning, and created indexes and executed DB2 utilities for better data organization and SQL performance.
