Hadoop Data Engineer Resume
SUMMARY
- 10 years of total IT experience in data systems, including the last 7 years in Big Data architecture and engineering
- Experienced team lead, mentoring engineers and acting as liaison between the team and stakeholders, business units, and data scientists/analysts to keep all teams collaborating smoothly
- Accustomed to working in production environments, managing migrations, installations, and development
- Experience in large-scale distributed systems, with extensive experience as a Hadoop Developer and Big Data Analyst
- Primary technical skills in HDFS, YARN, Pig, Hive, Sqoop, HBase, Flume, Oozie, Zookeeper
- Good experience in extracting data and generating statistical analysis using Business Intelligence tools
- Facilitated meetings following Scrum processes such as Sprint Planning, Backlog Refinement, Sprint Retrospectives, and Requirements Gathering; provided project planning and documentation and ensured projects stayed on track with stakeholder wishes
- Experience in writing SQL queries, Stored Procedures, Triggers, Cursors and Packages
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts, with hands-on experience working with them
- Worked with Data Lakes and Big Data ecosystems (Hadoop, Spark, Hortonworks, Cloudera)
- Used Apache Hadoop for working with Big Data to analyze large data sets efficiently
- Hands on experience in working with Ecosystems like Hive, Pig, Sqoop, MapReduce, Flume, Oozie
- Strong knowledge of Pig and Hive's analytical functions, extending Hive and Pig core functionality by writing custom UDFs
- Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions
- Hands-on experience developing Teradata PL/SQL procedures and functions and SQL tuning of large databases
- Track record of results as a project manager in an Agile methodology using data-driven analytics
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop
- Experience in handling XML files and related technologies
- Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2, Oracle, and DataStage
- Expert with BI tools such as Tableau and Power BI, data interpretation, modeling, data analysis, and reporting, with the ability to help direct planning based on insights
TECHNICAL SKILLS
- Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache MAVEN, Apache Oozie, Apache Pig, Apache Spark, Apache Tez, Apache Zookeeper, Cloudera Impala, HDFS
- Pig/Pig Latin, HiveQL, MapReduce, XML, FTP, Python, UNIX, Shell scripting, Linux
- Unix/Linux, Windows 10, Ubuntu, macOS
- Parquet, Avro, JSON, ORC, text, CSV
- Cloudera CDH 4/5, Hortonworks HDP 2.3/2.4, MapR, Amazon Web Services (AWS), Elastic Cloud, Elasticsearch
- Apache Spark, Spark Streaming, Storm
- Pentaho, QlikView, Tableau
- Microsoft SQL Server Database Administration (2005, 2008R2, 2012)
- Databases & Data Structures: Apache Cassandra, Amazon Redshift, DynamoDB, Apache HBase, Apache Hive, MongoDB
- Microsoft Project, Primavera P6, VMware, Microsoft Word, Excel, Outlook, PowerPoint; technical documentation skills
PROFESSIONAL EXPERIENCE
HADOOP DATA ENGINEER
Confidential
Responsibilities:
- Created a pipeline to gather data using PySpark, Kafka and HBase
- Sent requests to the source REST-based API from a Scala script via a Kafka producer
- Utilized a cluster of multiple Kafka brokers to handle replication needs and allow for fault tolerance
- Received the JSON response in a Kafka consumer Python script and parsed it into a data frame using a schema containing country code, artist name, number of plays, and genre (see the consumer sketch after this list)
- Established a connection between Hive and Spark to transfer the newly populated data frame
- Stored the data pulled from the API into HBase on Hortonworks Sandbox
- Utilized SQL to query the data to discover music release trends from week to week
- Analyzed system failures, identified root causes, and recommended corrective actions
- Configured ZooKeeper to coordinate servers in the cluster, maintain data consistency, and monitor services
- Designed Hive queries to perform data analysis, data transfer and table design
- Created Hive tables, loaded them with data, and wrote Hive queries to process the data
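The consumer-to-DataFrame step in the bullets above can be sketched roughly as follows, assuming the kafka-python and PySpark libraries and hypothetical topic, broker, and table names (new_releases, localhost:9092, music.releases); the actual names and schema details on the project may have differed.

```python
# Sketch: consume JSON release records from Kafka, build a Spark DataFrame,
# and persist it to a Hive table. Topic, broker, and table names are hypothetical.
import json

from kafka import KafkaConsumer                      # kafka-python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = (SparkSession.builder
         .appName("new-release-ingest")
         .enableHiveSupport()                        # allow saveAsTable into Hive
         .getOrCreate())

schema = StructType([
    StructField("country_code", StringType()),
    StructField("artist_name", StringType()),
    StructField("number_of_plays", IntegerType()),
    StructField("genre", StringType()),
])

consumer = KafkaConsumer(
    "new_releases",                                  # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,                      # stop iterating once the topic is drained
)

rows = [(m.value["country_code"], m.value["artist_name"],
         int(m.value["number_of_plays"]), m.value["genre"])
        for m in consumer]

if rows:
    df = spark.createDataFrame(rows, schema)
    df.write.mode("append").saveAsTable("music.releases")  # hypothetical Hive table
```

The week-over-week trend analysis mentioned above would then be ordinary SQL queries against the resulting table.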
HADOOP DATA ENGINEER
Confidential
Responsibilities:
- Created a pipeline to gather new music releases of a country for a given week using PySpark, Kafka and HBase
- Utilized a cluster of three Kafka brokers to handle replication needs and allow for fault tolerance
- Sent requests to the Confidential REST-based API from a Python script via a Kafka producer (see the producer sketch after this list)
- Received the JSON response in a Kafka consumer Python script and parsed it into a data frame using a schema containing country code, artist name, number of plays, and genre
- Established a connection between HBase and Spark to transfer the newly populated data frame
- Stored the data pulled from the API into HBase on Hortonworks Sandbox
- Utilized SQL to query the data to discover music release trends from week to week
- Assisted in the installation and configuration of Hive, Pig, Sqoop, Flume, Oozie, and HBase on the Hadoop cluster with the latest patches
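The request-and-publish step in the bullets above can be sketched as follows, assuming the requests and kafka-python libraries; the endpoint URL, query parameters, topic name, and broker addresses are placeholders, since the real API is confidential.

```python
# Sketch: call a REST API for a country/week and publish the JSON payload to Kafka.
# The endpoint URL, query parameters, topic, and broker addresses are hypothetical.
import json

import requests
from kafka import KafkaProducer                      # kafka-python

API_URL = "https://api.example.com/releases"         # placeholder for the confidential API

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"],  # three-broker cluster
    value_serializer=lambda obj: json.dumps(obj).encode("utf-8"),
    acks="all",                                      # wait for all in-sync replicas
)

def publish_weekly_releases(country_code: str, week: str, topic: str = "new_releases") -> None:
    """Fetch one country's new releases for a week and send each record to Kafka."""
    resp = requests.get(API_URL, params={"country": country_code, "week": week}, timeout=30)
    resp.raise_for_status()
    for record in resp.json():                       # assumes the API returns a JSON list
        producer.send(topic, value=record)
    producer.flush()                                 # block until all messages are delivered

publish_weekly_releases("US", "2020-W14")
```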
HADOOP CLOUD DATA ENGINEER
Confidential
Responsibilities:
- Managed Spark clusters exclusively from the AWS Management Console
- Created custom test, design and production Spark clusters
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data
- Created and oversaw cloud VMs with AWS EC2 command line clients and the AWS Management Console
- Used an Ansible Python script to generate inventory and push deployments to AWS instances
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets
- Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud
- Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and AWS Redshift
- Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the handler sketch after this list)
- Populated database tables via AWS Kinesis Firehose and AWS Redshift
- Automated the installation of the ELK agent (Filebeat) with an Ansible playbook; developed a Kafka queue system to collect log data without data loss and publish it to various sources
- Used AWS CloudFormation templates together with Terraform and existing plugins
- Used AWS IAM to create new users and groups
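The S3-triggered Lambda pattern referenced above can be sketched as follows; the event shape is the standard S3 notification format, boto3 ships with the Lambda Python runtime, and the bucket names and downstream action are hypothetical placeholders.

```python
# Sketch: AWS Lambda handler triggered by S3 object-created events.
# The downstream processing step is a hypothetical placeholder.
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Inspect each newly created object referenced in the S3 event notification."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Fetch the object's metadata; a real handler would kick off the
        # appropriate script or processing step here.
        head = s3.head_object(Bucket=bucket, Key=key)
        print(f"New object s3://{bucket}/{key} ({head['ContentLength']} bytes)")

    return {"processed": len(records)}
```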