Hadoop Data Engineer Resume
SUMMARY
- 10 years of total IT experience in data systems, including the last 7 years in Big Data architecture and engineering
- Experienced team lead, mentoring engineers and acting as liaison between the team and stakeholders, business units, and data scientists/analysts to keep all teams collaborating smoothly
- Accustomed to working in production environments, managing migrations, installations, and development
- Experience in large-scale distributed systems, with extensive experience as a Hadoop Developer and Big Data Analyst
- Primary technical skills in HDFS, YARN, Pig, Hive, Sqoop, HBase, Flume, Oozie, Zookeeper
- Good experience in extracting data and generating statistical analysis using Business Intelligence tools
- Facilitated meetings following Scrum processes such as Sprint Planning, Backlog Refinement, Sprint Retrospectives, and Requirements Gathering; provided project planning and documentation and ensured projects stayed on track with stakeholder wishes
- Experience in writing SQL queries, Stored Procedures, Triggers, Cursors and Packages
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts, with hands-on experience working with them
- Worked with Data Lakes and Big Data ecosystems (Hadoop, Spark, Hortonworks, Cloudera)
- Used Apache Hadoop for working with Big Data to analyze large data sets efficiently
- Hands on experience in working with Ecosystems like Hive, Pig, Sqoop, MapReduce, Flume, Oozie
- Strong knowledge of Pig and Hive's analytical functions, extending Hive and Pig core functionality by writing custom UDFs
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions
- Hands-on experience developing Teradata PL/SQL procedures and functions and SQL tuning of large databases
- Track record of results as a project manager in an Agile methodology using data-driven analytics
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop
- Experience in handling XML files and related technologies
- Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2, Oracle, and DataStage
- Expert with BI tools such as Tableau and Power BI, data interpretation, modeling, data analysis, and reporting, with the ability to help direct planning based on insights
TECHNICAL SKILLS
- Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache MAVEN, Apache Oozie, Apache Pig, Apache Spark, Apache Tez, Apache Zookeeper, Cloudera Impala, HDFS
- Pig/Pig Latin, HiveQL, MapReduce, XML, FTP, Python, UNIX, Shell scripting, Linux
- Unix/Linux, Windows 10, Ubuntu, macOS
- Parquet, Avro, JSON, ORC, text, CSV
- Cloudera CDH 4/5, Hortonworks HDP 2.3/2.4, MapR, Amazon Web Services (AWS), Elastic Cloud, Elasticsearch
- Apache Spark, Spark Streaming, Storm
- Pentaho, QlikView, Tableau
- Microsoft SQL Server Database Administration (2005, 2008R2, 2012)
- Databases & Data Structures: Apache Cassandra, Amazon Redshift, DynamoDB, Apache HBase, Apache Hive, MongoDB
- Microsoft Project, Primavera P6, VMware, Microsoft Word, Excel, Outlook, PowerPoint; technical documentation skills
PROFESSIONAL EXPERIENCE
HADOOP DATA ENGINEER
Confidential
Responsibilities:
- Created a pipeline to gather data using PySpark, Kafka and HBase
- Sent requests to the source REST-based API from a Scala script via a Kafka producer
- Utilized a cluster of multiple Kafka brokers to handle replication needs and allow for fault tolerance
- Received the JSON response in a Kafka consumer Python script and parsed it into a data frame using a schema containing country code, artist name, number of plays, and genre (see the consumer sketch after this list)
- Established a connection between Hive and Spark to transfer the newly populated data frame
- Stored the data pulled from the API into HBase on Hortonworks Sandbox
- Utilized SQL to query the data to discover music release trends from week to week
- Analyzed system failures, identified root causes, and recommended corrective actions
- Configured ZooKeeper to coordinate servers in the cluster, maintain data consistency, and monitor services
- Designed Hive queries to perform data analysis, data transfer and table design
- Created Hive tables, loaded them with data, and wrote Hive queries to process the data
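The consumer-to-DataFrame step in the bullets above can be sketched roughly as follows, assuming the kafka-python and PySpark libraries and hypothetical topic, broker, and table names (new_releases, localhost:9092, music.releases); the actual names and schema details on the project may have differed.

```python
# Sketch: consume JSON release records from Kafka, build a Spark DataFrame,
# and persist it to a Hive table. Topic, broker, and table names are hypothetical.
import json

from kafka import KafkaConsumer                      # kafka-python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = (SparkSession.builder
         .appName("new-release-ingest")
         .enableHiveSupport()                        # allow saveAsTable into Hive
         .getOrCreate())

schema = StructType([
    StructField("country_code", StringType()),
    StructField("artist_name", StringType()),
    StructField("number_of_plays", IntegerType()),
    StructField("genre", StringType()),
])

consumer = KafkaConsumer(
    "new_releases",                                  # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,                      # stop iterating once the topic is drained
)

rows = [(m.value["country_code"], m.value["artist_name"],
         int(m.value["number_of_plays"]), m.value["genre"])
        for m in consumer]

if rows:
    df = spark.createDataFrame(rows, schema)
    df.write.mode("append").saveAsTable("music.releases")  # hypothetical Hive table
```

The week-over-week trend analysis mentioned above would then be ordinary SQL queries against the resulting table.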
HADOOP DATA ENGINEER
Confidential
Responsibilities:
- Created a pipeline to gather new music releases of a country for a given week using PySpark, Kafka and HBase
- Utilized a cluster of three Kafka brokers to handle replication needs and allow for fault tolerance
- Sent requests to the Confidential REST-based API from a Python script via a Kafka producer (see the producer sketch after this list)
- Received the JSON response in a Kafka consumer Python script and parsed it into a data frame using a schema containing country code, artist name, number of plays, and genre
- Established a connection between HBase and Spark to transfer the newly populated data frame
- Stored the data pulled from the API into HBase on Hortonworks Sandbox
- Utilized SQL to query the data to discover music release trends from week to week
- Assisted in the installation and configuration of Hive, Pig, Sqoop, Flume, Oozie, and HBase on the Hadoop cluster with the latest patches
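The request-and-publish step in the bullets above can be sketched as follows, assuming the requests and kafka-python libraries; the endpoint URL, query parameters, topic name, and broker addresses are placeholders, since the real API is confidential.

```python
# Sketch: call a REST API for a country/week and publish the JSON payload to Kafka.
# The endpoint URL, query parameters, topic, and broker addresses are hypothetical.
import json

import requests
from kafka import KafkaProducer                      # kafka-python

API_URL = "https://api.example.com/releases"         # placeholder for the confidential API

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"],  # three-broker cluster
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
    acks="all",                                      # wait for all in-sync replicas
)

def publish_weekly_releases(country_code: str, week: str, topic: str = "new_releases") -> None:
    """Fetch one country's new releases for a week and send each record to Kafka."""
    resp = requests.get(API_URL, params={"country": country_code, "week": week}, timeout=30)
    resp.raise_for_status()
    for record in resp.json():                       # assumes the API returns a JSON list
        producer.send(topic, value=record)
    producer.flush()                                 # block until all messages are delivered

publish_weekly_releases("US", "2020-W14")
```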
HADOOP CLOUD DATA ENGINEER
Confidential
Responsibilities:
- Managed Spark clusters exclusively from the AWS Management Console
- Created custom test, design and production Spark clusters
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data
- Created and oversaw cloud VMs with AWS EC2 command line clients and the AWS Management Console
- Used an Ansible Python script to generate inventory and push deployments to AWS instances
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets
- Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud
- Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and AWS Redshift
- Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the handler sketch after this list)
- Populated database tables via AWS Kinesis Firehose and AWS Redshift
- Automated the installation of the ELK agent (Filebeat) with an Ansible playbook; developed a Kafka queue system to collect log data without data loss and publish it to various sources
- Used AWS CloudFormation templates together with Terraform and existing plugins
- Used AWS IAM to create new users and groups
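The S3-triggered Lambda pattern referenced above can be sketched as follows; the event shape is the standard S3 notification format, boto3 ships with the Lambda Python runtime, and the bucket names and downstream action are hypothetical placeholders.

```python
# Sketch: AWS Lambda handler triggered by S3 object-created events.
# The downstream processing step is a hypothetical placeholder.
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Inspect each newly created object referenced in the S3 event notification."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Fetch the object's metadata; a real handler would kick off the
        # appropriate script or processing step here.
        head = s3.head_object(Bucket=bucket, Key=key)
        print(f"New object s3://{bucket}/{key} ({head['ContentLength']} bytes)")

    return {"processed": len(records)}
```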