
AWS Data Engineer Resume


Washington, D.C.

SUMMARY:

  • 6+ years of experience as a technical software developer in Big Data, the Hadoop ecosystem, Data Warehousing, and Cloud Engineering.
  • Experienced in designing and developing on the Hadoop ecosystem, including HDFS, Spark, Hive, Kafka, MapReduce, Oozie, Spark SQL, and Sqoop.
  • Hands-on experience with the Teradata RDBMS, with knowledge of the BTEQ, MULTILOAD, FASTLOAD, TPUMP, and FASTEXPORT utilities.
  • Good exposure to Relational and Dimensional Data Modeling, Star Schema and Snowflake Schema modeling, and Physical & Logical data modeling.
  • Experienced in RDD architecture and implementing Spark operations, optimizing transformations and actions using PySpark.
  • Delivered IT data analytics projects migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Google Cloud Storage, Cloud Dataproc, and Cloud Composer.
  • Expert in Data Migration, Data Profiling, Data Ingestion, Data Cleansing, Data Import, Transformation, and Data Export using ETL tools.
  • Skilled in creating real time data streaming solutions using Apache Spark / Spark Streaming, Kafka and Flume.
  • Experienced in creating Hive tables, writing complex HiveQL queries, implementing simple and generic custom Hive UDFs, and developing multiple Hive views for the presentation layer.
  • Used Oozie and Airflow to design time-driven and data-driven automated workflows.
  • Experienced with Azure cloud services (HDInsight, Databricks, Blob Storage, Data Lake, Storage Explorer, Data Factory, SQL DB, SQL DWH, and Cosmos DB).
  • Extensive involvement in Continuous Integration (CI) and Continuous Deployment (CD) on numerous Java-based applications using Jenkins, TeamCity, Azure DevOps, Maven, Git, Nexus, Docker, and Kubernetes.
  • Experience with cloud-based services such as AWS EC2 and EMR, and GCP Dataproc, BigQuery, Compute Engine, and GCS for working with distributed data models.
  • Experience with NoSQL databases such as HBase and Cassandra to ensure faster access to data on HDFS.
  • Worked on RDBMSs including MySQL, Oracle, MS SQL Server, PostgreSQL, and Netezza.
  • Worked on building, configuring, monitoring, and supporting the Cloudera and Hortonworks Hadoop platforms.
  • In-depth understanding of Hadoop architecture, workload management, schedulers, scalability, and components such as HDFS, MapReduce, and YARN.
  • Undertook data analysis and collaborated with downstream analytics teams to shape the data according to their requirements.
  • Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a simplified PySpark sketch follows this list).
  • Designed strategies for optimizing all aspects of the continuous integration, release, and deployment processes using containerization and virtualization techniques such as Docker and Kubernetes. Built Docker containers for microservices projects and deployed them to Dev.
  • Deep involvement in building ETL pipelines between several source systems and the Enterprise Data Warehouse using Informatica PowerCenter, SSIS, SSAS, and SSRS.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to explore the data on the fly using quick filters for on-demand information.
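
A minimal PySpark sketch of the extraction, transformation, and aggregation work described above; the paths, schemas, and column names are placeholders, and the production applications were written in Scala with Spark SQL.

```python
# Hypothetical sketch: read usage data arriving in multiple file formats,
# align them into one DataFrame, and aggregate usage per customer.
# Paths and column names are assumptions for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Extraction: the same logical feed can arrive as CSV, JSON, or Parquet.
csv_df = spark.read.option("header", "true").csv("s3://my-bucket/usage/csv/")
json_df = spark.read.json("s3://my-bucket/usage/json/")
parquet_df = spark.read.parquet("s3://my-bucket/usage/parquet/")

# Transformation: align the sources on a common set of columns.
columns = ["customer_id", "event_ts", "feature", "duration_sec"]
unified = (
    csv_df.select(*columns)
          .unionByName(json_df.select(*columns))
          .unionByName(parquet_df.select(*columns))
          .withColumn("duration_sec", F.col("duration_sec").cast("double"))
)

# Aggregation: per-customer usage patterns, written back out as Parquet.
usage = (
    unified.groupBy("customer_id", "feature")
           .agg(F.count("*").alias("events"),
                F.sum("duration_sec").alias("total_duration_sec"))
)
usage.write.mode("overwrite").parquet("s3://my-bucket/usage/aggregated/")
```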

TECHNICAL SKILLS:

Big Data and Hadoop Ecosystem: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala

Hadoop Distribution: Cloudera CDH, Hortonworks HDP, Apache, AWS

Machine Learning Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbour (KNN), Principal Component Analysis

Database: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, Netezza, MS Access, Snowflake, Hive, NoSQL Database (HBase, MongoDB).

Languages: Shell scripting, SQL, PL/SQL, Python, R, PySpark, Pig, HiveQL, Scala, Regular Expressions

IDE & Tools: Eclipse, Visual Studio, NetBeans, JUnit, CI/CD, SQL Developer, SQL Workbench, Tableau.

Operating Systems: Linux (Redhat, CentOS), Mac OS, Windows, Unix

Cloud Environment: AWS (S3, IAM, DynamoDB, Lambda, EMR, Redshift, Kinesis, CloudFormation), Microsoft Azure (HDInsight, Databricks, Blob Storage, Data Lake, Storage Explorer, SQL DWH, Cosmos DB), Google Cloud Platform (GCP) (Compute Engine, GCS)

Others: Jenkins, TeamCity, Jupyter Notebook, DevOps, Maven, Git, Nexus, Docker, Kubernetes

PROFESSIONAL EXPERIENCE:

Confidential, Washington D.C.

AWS Data Engineer

Responsibilities:

  • Worked across the Big Data ecosystem and on cloud data engineering using a wide range of AWS services.
  • Worked with Amazon Web Services to deploy files into buckets and to collect cleaned files in Amazon S3 using Amazon EC2 clusters.
  • Contributed to the creation of production data pipelines that made decision-making quicker and more solution-focused.
  • Developed enterprise solutions using Apache Spark, HDFS, Hive, Zookeeper, and YARN.
  • Ingested data from various sources such as Teradata and SQL Server into AWS S3 buckets using Sqoop, and then performed transformations with Hive.
  • Migrated data from on-premises MS SQL databases to AWS Redshift using the AWS Database Migration Service.
  • Coded extensively in SQL and Python, including ETL scripts, to read data from the cloud servers.
  • Responsible for designing the security framework protecting access to S3 using AWS Lambda and DynamoDB.
  • Designed and developed end-to-end pipelines and implemented various AWS EMR jobs.
  • Used Amazon EMR for MapReduce jobs and tested them locally using Jenkins.
  • Built ETL pipelines and handled data integration using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.
  • Used EMR to transfer large datasets to and from data stores such as Amazon S3 and DynamoDB.
  • Using StreamSets, Kinesis streams, and API gateways, multiple streaming systems supplied more than 100 million data packets per day to S3; built several data-driven pipelines that stored the data on S3 using Step Functions, Lambda, and Spark.
  • Developed a variety of CloudFormation templates for deploying EC2 instances across environments.
  • Used Spark SQL to process the data in Spark; performed all computations in Spark on EMR, after which the data was written back into Hive.
  • Designed and implemented Splunk clustered search heads and indexers, deployment servers, and deployers.
  • Used Kubernetes to deploy, load-balance, scale, and manage Docker containers with multiple namespaced versions.
  • Using Step Functions, integrated Lambda with SQS and DynamoDB to loop through the list of messages and update the status in the DynamoDB table (a minimal Lambda sketch follows this list).
  • Copied data from S3 to Snowflake and connected with SQL Workbench for seamless import and movement of data via S3 (see the Snowflake COPY sketch after this list).
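
A minimal sketch of the Lambda/SQS/DynamoDB status update described above; the queue URL, table name, and the order_id/status message fields are assumptions, and in the real workflow Step Functions drove the loop.

```python
# Hypothetical sketch: read a batch of SQS messages and update a status
# attribute in DynamoDB. Queue URL, table name, and message fields are
# assumptions for illustration.
import json
import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("order_status")  # assumed table name
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder

def handler(event, context):
    # Pull a batch of messages; Step Functions drives the overall loop.
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
    messages = response.get("Messages", [])
    for message in messages:
        body = json.loads(message["Body"])
        # Update the status attribute for the matching DynamoDB item.
        table.update_item(
            Key={"order_id": body["order_id"]},
            UpdateExpression="SET #s = :s",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":s": body["status"]},
        )
        # Delete the processed message from the queue.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
    return {"processed": len(messages)}
```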
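
A sketch of the S3-to-Snowflake copy, assuming the snowflake-connector-python client; the account, credentials, bucket, and table names are placeholders, and a named external stage or storage integration would normally replace the inline credentials.

```python
# Hypothetical sketch: run a COPY INTO statement to load S3 files into a
# Snowflake table. All connection details and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="<user>", password="<password>", account="<account>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
try:
    cur = conn.cursor()
    cur.execute("""
        COPY INTO analytics.events
        FROM 's3://my-bucket/exports/events/'
        CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')
        FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1)
    """)
finally:
    conn.close()
```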

Environment: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, Lambda, CloudFormation, CloudWatch, ELK Stack), Bitbucket, Ansible, Python, Shell Scripting, PowerShell, Git, Jira, JBoss, Bamboo, Docker, PostgreSQL, WebLogic, Maven, WebSphere, Unix/Linux, AWS X-Ray, DynamoDB, Snowflake, Kinesis, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, Splunk, SonarQube.

Confidential

Azure Data Engineer

Responsibilities:

  • Migrated data from database systems to Azure databases.
  • Implemented end-to-end data solutions in Azure.
  • Worked on data migration by transferring data from on-premises databases to Azure cloud servers; the Azure Database Migration Service helped lower downtime while moving data to the Azure data platform.
  • Performed ETL to move data from source systems into Azure data storage services using Azure Data Factory.
  • Designed and developed database objects like tables, views, indexes for normalization.
  • Performed DBA tasks at the application level, such as establishing tables and indexes.
  • Created pipelines in Azure Data Factory and processed the resulting data in Azure Databricks (a simplified Databricks sketch follows this list).
  • Designed and implemented database solutions in Azure SQL Data Warehouse, Azure SQL.
  • Developed recommendations for the proper scale of data infrastructure while proposing topologies that take cost and Azure spending into account.
  • Involved in Data Migration using SQL, SQL Azure, Azure storage and Azure Data Factory, SSIS, PowerShell.
  • Developed C# programs that load data from online APIs and Azure storage blobs to Azure SQL.
  • Worked on AWS databases such as ElastiCache and NoSQL databases (HBase, Cassandra, and MongoDB), along with database performance tuning and data modeling.
  • Implemented a complete automated build-release solution using a combination of technologies like Maven, TFS, Jenkins.
  • Recreated current application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment. Implemented a DWH/BI project using Azure Data Factory and Databricks.
  • Created workflows and new sessions using the PowerCenter Workflow Manager.
  • Designed and implemented ETL pipelines and data movement solutions using Azure Data Factory and SSIS. Created and ran SSIS packages on the ADF V2 Azure-SSIS Integration Runtime.
  • Deployed Azure Resource Manager JSON templates from PowerShell across the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, and Azure Analysis Services.
  • Used Azure Data Factory to migrate on-premises data to an Azure Data Lake repository.
  • Created dashboards and visualizations with an emphasis on Microsoft technologies such as SQL Server Reporting Services and Power BI to assist business users in data analysis and give higher management access to data insights.
  • Used Azure Synapse for data integration and data warehousing and, alongside this, worked with the Business Intelligence team to forecast results using Power BI.
  • Also created Tableau dashboards to provide summarized reports with heat maps and pie charts.
  • Responsible for understanding client requirements and interacting with business owners and other stakeholders to advance the project by providing them with accurate analytics.
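
A simplified Databricks-style PySpark sketch of the ADF/Databricks processing described above, reading Parquet from ADLS Gen2 and writing to Azure SQL over JDBC; the storage account, paths, credentials, and table names are placeholders.

```python
# Hypothetical sketch: read raw Parquet from ADLS Gen2, aggregate it, and
# write the result to Azure SQL over JDBC. All names and credentials are
# placeholders for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-to-azure-sql").getOrCreate()

raw = spark.read.parquet("abfss://raw@<storageaccount>.dfs.core.windows.net/sales/")

# Aggregate raw order events into daily totals per region.
daily = (
    raw.withColumn("order_date", F.to_date("order_ts"))
       .groupBy("order_date", "region")
       .agg(F.sum("amount").alias("total_amount"),
            F.count("*").alias("order_count"))
)

# Write the aggregate to an Azure SQL table over JDBC.
(daily.write
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
      .option("dbtable", "dbo.daily_sales")
      .option("user", "<user>")
      .option("password", "<password>")
      .mode("overwrite")
      .save())
```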

Environment: Azure SQL, Azure Storage Explorer, Azure Storage, Azure Blob Storage, Azure Backup, Azure Files, Azure Data Lake Storage, Azure Data Factory, DBA, SQL Server Management Studio 2016, Visual Studio 2015, VSTS, Azure Blob, Power BI, PowerShell, C# .NET, Cassandra, MongoDB, SSIS, DataGrid, Maven, TFS, Tableau, Jenkins, ETL (Extract, Transform, Load), Business Intelligence (BI).

Confidential

Big Data Developer

Responsibilities:

  • Worked on Hortonworks-HDP 2.5 distribution.
  • Built a scalable distributed data solution using Hadoop.
  • Used Sqoop to import data from MS SQL Server, MySQL, and Teradata into HDFS
  • Wrote HiveQL queries to integrate various tables and create views that generate result sets.
  • Used Flume to collect log data from Web Servers and integrate it into HDFS.
  • Loaded and transformed wide sets of structured and unstructured data.
  • Used different Hive options to write Hive ETL, such as overwriting tables, changing data types, adding columns, changing SerDe properties, and substring operations; used Impala to run queries for optimization.
  • Cleaned and transformed data using MapReduce processes, then loaded the output into Hive tables in different file formats.
  • Queried and analyzed data from Cassandra for quick searching, sorting, and grouping through CQL.
  • Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them (a simplified PySpark sketch follows this list).
  • Designed data pipelines to load data from DynamoDB into AWS S3 buckets and then into HDFS locations for different events.
  • Mastered major Hadoop distributions such as Hortonworks and Cloudera, worked on numerous open-source projects, and prototyped various applications that utilize modern Big Data tools.
  • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, working with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Loaded data into HBase NoSQL database.
  • Designed and developed a controller to invoke the Splunk and Nexus adapters for log extraction and kit download to extract schemas, respectively.
  • Responsible for gathering data, cleaning it, and dealing with missing values in various reports and dashboards, which helped in debugging the Tableau dashboards.
  • Built, managed, and scheduled Oozie workflows for end-to-end job processing.
  • Wrote UDFs in Java to extend Hive and Pig core functionality.
  • Used SparkSQL to analyze large volumes of structured data.
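
A simplified PySpark analogue of the Cassandra joins and Spark SQL analysis described above, assuming the spark-cassandra-connector package is on the classpath; the keyspace, table, host, and column names are placeholders (the original work used Scala).

```python
# Hypothetical sketch: read two Cassandra tables through the
# spark-cassandra-connector, join them, and run a simple aggregation.
# Keyspace, table, host, and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("cassandra-joins")
    .config("spark.cassandra.connection.host", "<cassandra-host>")
    .getOrCreate()
)

def cassandra_table(keyspace, table):
    # Load a Cassandra table as a DataFrame via the connector's data source.
    return (spark.read
                 .format("org.apache.spark.sql.cassandra")
                 .options(keyspace=keyspace, table=table)
                 .load())

orders = cassandra_table("sales", "orders")
customers = cassandra_table("sales", "customers")

# Join on the customer key and aggregate order amounts per region.
summary = (
    orders.join(customers, "customer_id")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
)
summary.show()
```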

Environment & Tools: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Impala, Cassandra, Cloudera, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Splunk, Nexus, Scala, Spark, Spark SQL, Tableau

Confidential

SQL Developer

Responsibilities:

  • Worked on tables, packages, functions, procedures, collections, triggers, cursors, exceptions, synonyms, views, ref cursors, sequences, performance tuning, APIs, interfaces, lookups, and processing constraints.
  • Contributed to delivering the POC for the new webservices DTO model flow.
  • Debugged order management, purchase order, and pricing issues in IAT, UAT, and production, and fixed them.
  • Created DE fix scripts for hold orders and updated the method for handling previously placed orders
  • Prepared test plans and test cases for various types of testing, such as unit, functional, performance, and regression testing.
  • Worked on the specification of functional and technical requirements documentation.
  • Involved in deploying and executing the code in Oracle.
  • Participated in the integration of a third-party tool with Oracle
  • Worked on creating an estimation plan to implement the change request based on the code freeze dates in various situations.
  • Worked closely with clients to gather requirements for solutions.
  • Completed requirement analysis and compiled a list of clarifications and issues.
  • In charge of using SVN to verify code quality
  • Responsible for day-to-day production support operations, job monitoring, incident ticket resolution, on-time deliveries, and code deployment.

Environment & Tools: Oracle 11g/10g, SQL*Plus, TOAD, SQL*Loader, SQL Developer, Shell Scripts, UNIX, Windows XP
