Data Engineer/Hadoop Developer Resume
Atlanta, GA
SUMMARY
- Around 7 years of experience with the Hadoop ecosystem, including HDFS, MapReduce, Apache Pig, Hive, HBase, Sqoop, Flume, NiFi, YARN, and ZooKeeper.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Developed MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
- Proficient in Spark with Scala: loading data from local file systems, HDFS, Amazon S3, and relational and NoSQL databases using Spark SQL, importing data into RDDs, and ingesting data from a range of sources with Spark Streaming (see the sketch at the end of this summary).
- Expertise in databases such as Oracle, MySQL, and SQL Server, including data management and CRUD operations.
- Worked with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Developed analytical components using Scala, Spark and Spark SQL.
- Experience in Hadoop Distributions: Cloudera and Hortonworks.
- Experience configuring Ranger and Knox to provide security for Hadoop services.
- Experience with SequenceFile, Avro, and ORC file formats and their compression options.
- Good knowledge of Python collections, Python scripting, and multi-threading.
- Well versed in software development methodologies such as Waterfall, Agile (Scrum), and Test-Driven Development.
- Experience with code repository tools: TortoiseSVN, GitHub, and Visual SourceSafe.
- Strong communication and analytical skills, with a demonstrated ability to handle multiple tasks and to work independently or in a team.
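Illustrative sketch (in Scala) of the Spark SQL loading pattern described above. The JDBC connection details, S3 bucket, table names, and output path are hypothetical placeholders rather than details from an actual project; the Avro read also assumes the spark-avro package and an S3 connector are on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object LoadSourcesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-sources-sketch")
      .getOrCreate()

    // Read a relational table over JDBC (connection details are placeholders).
    val ordersDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Read Avro files landed on S3 (requires spark-avro and the S3A connector).
    val eventsDf = spark.read
      .format("avro")
      .load("s3a://example-bucket/events/")

    // Expose both as temp views and join them with Spark SQL.
    ordersDf.createOrReplaceTempView("orders")
    eventsDf.createOrReplaceTempView("events")
    val joined = spark.sql(
      """SELECT o.order_id, o.amount, e.event_type
        |FROM orders o JOIN events e ON o.customer_id = e.customer_id""".stripMargin)

    // Write the joined result back to HDFS as Parquet.
    joined.write.mode("overwrite").parquet("hdfs:///data/joined_orders_events")

    spark.stop()
  }
}
```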
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hive, HBase, Sqoop, Pig, MapReduce, YARN, Flume, Oozie, ZooKeeper, NiFi, Kafka
IDE Tools: Eclipse, IntelliJ IDEA, PyCharm
Cloud Technologies: AWS EC2, S3, CloudFormation, Lambda
Databases: SQL Server, MySQL
Programming/Scripting Languages: Unix Shell/Bash scripting, Python
Platforms: Windows, Linux and Unix
Version Control: TortoiseSVN, Git
Methodologies: Agile/ Scrum
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Data Engineer/Hadoop Developer
Responsibilities:
- Analyzed data from different sources such as Teradata and MySQL, and imported the data into Hive using Sqoop.
- Identified customers from unstructured data using PySpark and FuzzyWuzzy fuzzy-matching logic.
- Wrote UDFs, UDTFs, and UDAFs in Spark to implement business logic on the data.
- Moved large volumes of data between Hadoop servers using compression techniques.
- Created real-time data pipelines and frameworks with Kafka and Spark Streaming, loading data into HBase (see the sketch after this job entry).
- Built NiFi data pipelines to copy data from JMS MQ into Kafka topics, with in-flight processing such as JSON-to-XML conversion.
- Created custom NiFi processors to apply business logic to the data.
- Developed data frameworks in Python, Java, and Scala.
- Wrote shell scripts to launch jobs with the required options and environment.
- Monitored the Spark Web UI, DAG scheduler, and YARN ResourceManager UI to optimize queries and performance in Spark.
- Worked with different file formats, including Parquet, Avro, and ORC.
- Created a framework using Spark Streaming and Kafka to process data in real time and feed it to downstream APIs.
- Hands-on experience with AWS architecture (ELB, EC2, S3, RDS, CloudFormation, security groups), DevOps practices, Unix (Red Hat), networking, and web services architecture.
Environment: Hive, Spark, Spark SQL, Spark Streaming, Scala, Druid, Oozie, Unix Shell/Bash scripting, HBase, Cassandra, Python, YARN, VersionOne, JDBC, AWS EC2, EMR, S3, CloudFormation, Security Groups.
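A minimal Scala sketch of the kind of Kafka-to-Spark-Streaming pipeline referenced above, written with Structured Streaming. The broker address, topic name, message schema, and paths are assumed placeholders, and it requires the spark-sql-kafka package; the sketch lands the parsed stream as Parquet, whereas the actual pipeline would load into HBase through a connector (for example via foreachBatch).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed JSON schema of the incoming messages (illustrative only).
    val schema = new StructType()
      .add("customerId", StringType)
      .add("amount", DoubleType)
      .add("eventTime", TimestampType)

    // Example business-logic UDF: flag high-value events.
    val flagHighValue = udf((amount: Double) => if (amount > 1000.0) "HIGH" else "NORMAL")

    // Read from a Kafka topic (broker and topic names are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()

    // Parse the JSON payload and apply the UDF.
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("evt"))
      .select($"evt.*")
      .withColumn("valueFlag", flagHighValue($"amount"))

    // For the sketch, land the stream as Parquet; a real pipeline would write
    // to HBase via a connector (e.g. inside foreachBatch).
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/customer_events")
      .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```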
Confidential, Hartford, CT
Data Engineer
Responsibilities:
- Created Hive tables and loaded and analyzed data using Hive queries.
- Optimized existing Hive scripts using various Hive optimization techniques.
- Built reusable Hive UDF libraries that business users can share across jobs.
- Built and implemented Apache Pig scripts to load data from and store data into Hive.
- Loaded data from the Linux file system into HDFS.
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro and to manage the flow of data from source systems into HDFS.
- Implemented Flume to import streaming log data and aggregate it into HDFS.
- Imported data from various sources, performed transformations using Spark, and loaded the results into Hive.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
- Used Scala to write the code for all Spark use cases.
- Monitored the Hadoop cluster using tools such as Ambari and Cloudera Manager.
- Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
- Performed data analysis using Spark with Scala (see the sketch after this job entry).
- Created RDDs, DataFrames, and Datasets.
- Created numerous Spark Streaming jobs that pull JSON messages from Kafka topics, parse them in flight using Java code, and land them on the Hadoop platform.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into databases.
- Worked closely with the admin team on configuring ZooKeeper and used it to coordinate cluster services.
- Created views in Tableau Desktop, published to internal teams for review and further data analysis, with customization using filters and actions.
Environment: Hive, MapReduce, Ambari, Spark, Knox, Spark SQL, Spark Streaming, Scala, Kafka, ZooKeeper, NiFi, Oozie, Unix Shell/Bash scripting, Python, Tableau, YARN, JIRA, JDBC
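A minimal Scala sketch of the Hive-backed Spark analysis pattern referenced above. The database, table, and column names are hypothetical; the point is reading a Hive table into a DataFrame, transforming it, and persisting the summary back to Hive for downstream reporting (e.g. in Tableau).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveAnalysisSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark read and write Hive-managed tables.
    val spark = SparkSession.builder()
      .appName("hive-analysis-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table (database/table names are placeholders).
    val claims = spark.table("analytics.claims_raw")

    // DataFrame transformations: filter, derive a column, aggregate.
    val summary = claims
      .filter(col("claim_status") === "APPROVED")
      .withColumn("claim_year", year(col("claim_date")))
      .groupBy(col("claim_year"), col("region"))
      .agg(sum("claim_amount").as("total_amount"),
           count(lit(1)).as("claim_count"))

    // Persist the result back into Hive as a managed table.
    summary.write.mode("overwrite").saveAsTable("analytics.claims_summary")

    spark.stop()
  }
}
```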
Confidential, TX
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Pig Latin scripts to extract data from output files and load it into HDFS.
- Developed Oozie workflows to automate and schedule the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented Hive UDFs in Java for processing that cannot be done with Hive's built-in functions.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Developed simple to complex Unix shell/Bash scripts as part of the framework development process.
- Developed complex Talend job mappings to load data from various sources using different components.
- Designed, developed, and implemented solutions using Talend Integration Suite.
- Implemented Flume to import streaming log data and aggregate it into HDFS.
- Used Zeppelin Notebook to execute SQL queries against SQL databases.
- Involved in installing the AWS EMR framework.
- Set up a multi-node Hadoop cluster on Amazon EC2 with Pig, Hive, and Sqoop ecosystem tools.
- Moved data to Amazon S3 and ran EMR programs on data stored in S3.
- Created Parquet Hive tables with complex data types corresponding to the Avro schema (see the sketch after this job entry).
Environment: Hive, Spark, Spark SQL, Spark Streaming, Scala, Zeppelin notebook, NiFi, AWS EC2, AWS EMR, AWS S3, Unix Shell scripting, HBase (NoSQL), Control-M, Kafka, YARN, Jenkins, JIRA, JDBC.
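A minimal Scala sketch of converting Avro data on S3 into a Parquet-backed Hive table with matching complex types, as referenced above. The bucket, table name, and partition column are assumptions, and the read requires the spark-avro package and an S3 connector.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read Avro records from S3; nested Avro types (structs, arrays, maps)
    // are carried over into the DataFrame's complex column types.
    val events = spark.read
      .format("avro")
      .load("s3a://example-bucket/raw/customer_events/")

    // Materialize the data as a partitioned, Parquet-backed Hive table whose
    // complex data types mirror the Avro schema.
    events.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("event_date")
      .saveAsTable("analytics.customer_events_parquet")

    spark.stop()
  }
}
```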
Confidential
Java Developer
Responsibilities:
- Involved in the full Software Development Life Cycle (SDLC) of the tracking system, including requirements gathering, conceptual design, analysis, detailed design, development, system testing, and user acceptance
- Worked in Agile Scrum methodology
- Involved in writing exception and validation classes using core Java
- Designed and implemented the user interface using JSP, XSL, DHTML, Servlets, JavaScript, HTML, CSS and AJAX
- Developed framework using Java, MySQL and web server technologies
- Validated the XML documents with XSD validation and transformed to XHTML using XSLT
- Implemented cross-cutting concerns as aspects at the service layer using Spring AOP, and implemented DAO objects using Spring ORM
- Spring beans were used for controlling the flow between UI and Hibernate
- Implemented SOA architecture with web services using SOAP, WSDL, UDDI, and XML via the CXF framework and Apache Commons
- Worked on the database interaction layer for insert, update, and retrieval operations, using queries and stored procedures
- Wrote stored procedures and complex queries for IBM DB2
- Used Eclipse IDE for development and JBoss Application Server for deploying the web application
- Used Apache Camel to create routes for web services
- Used JReport for the generation of reports of the application
- Used WebLogic as the application server and Log4j for application logging and debugging
- Used CVS for version control and ANT as the project build tool
Environment: Java, HTML, CSS, JSTL, JavaScript, Servlets, JSP, Hibernate, Struts, Web Services, Eclipse, JBoss, JMS, JReport, Scrum, MySQL, IBM DB2, SOAP, WSDL, UDDI, AJAX, XML, XSD, XSLT, Oracle, Linux, Log4j, JUnit, ANT, CVS