Data Engineer/Hadoop Developer Resume
Atlanta, GA
SUMMARY
- Around 7 years of experience with the Hadoop ecosystem, including HDFS, MapReduce, Apache Pig, Hive, HBase, Sqoop, Flume, NiFi, YARN, and ZooKeeper.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Developed MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
- Experience importing and exporting terabytes of data between HDFS and relational database systems using Sqoop.
- Proficient in Spark with Scala: loading data from local file systems, HDFS, Amazon S3, and relational and NoSQL databases using Spark SQL, importing data into RDDs, and ingesting data from a range of sources with Spark Streaming (see the sketch at the end of this summary).
- Expertise in databases such as Oracle, MySQL, and SQL Server, including data management and CRUD operations.
- Worked with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Developed analytical components using Scala, Spark and Spark SQL.
- Experience in Hadoop Distributions: Cloudera and Hortonworks.
- Experience configuring Ranger and Knox to provide security for Hadoop services.
- Experience with SequenceFile, Avro, and ORC file formats and their compression options.
- Good knowledge of Python collections, Python scripting, and multi-threading.
- Well versed in software development methodologies such as Waterfall, Agile (Scrum), and Test-Driven Development.
- Experience with code repository tools: TortoiseSVN, GitHub, and Visual SourceSafe.
- Strong communication and analytical skills, with a demonstrated ability to handle multiple tasks and to work independently or in a team.
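Illustrative sketch (in Scala) of the Spark SQL loading pattern described above. The JDBC connection details, S3 bucket, table names, and output path are hypothetical placeholders rather than details from an actual project; the Avro read also assumes the spark-avro package and an S3 connector are on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object LoadSourcesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-sources-sketch")
      .getOrCreate()

    // Read a relational table over JDBC (connection details are placeholders).
    val ordersDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Read Avro files landed on S3 (requires spark-avro and the S3A connector).
    val eventsDf = spark.read
      .format("avro")
      .load("s3a://example-bucket/events/")

    // Expose both as temp views and join them with Spark SQL.
    ordersDf.createOrReplaceTempView("orders")
    eventsDf.createOrReplaceTempView("events")
    val joined = spark.sql(
      """SELECT o.order_id, o.amount, e.event_type
        |FROM orders o JOIN events e ON o.customer_id = e.customer_id""".stripMargin)

    // Write the joined result back to HDFS as Parquet.
    joined.write.mode("overwrite").parquet("hdfs:///data/joined_orders_events")

    spark.stop()
  }
}
```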
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Hive, HBase, Sqoop, Pig, MapReduce, YARN, Flume, Oozie, ZooKeeper, NiFi, Kafka
IDE Tools: Eclipse, IntelliJ IDEA, PyCharm
Cloud Technologies: AWS EC2, S3, CloudFormation, Lambda
Databases: SQL Server, MySQL
Programming/Scripting Languages: Unix Shell/Bash scripting, Python
Platforms: Windows, Linux and Unix
Version Control: TortoiseSVN, Git
Methodologies: Agile/ Scrum
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Data Engineer/Hadoop Developer
Responsibilities:
- Analyzed data from different sources such as Teradata and MySQL, and imported the data into Hive using Sqoop.
- Identified customers from unstructured data using PySpark and FuzzyWuzzy fuzzy-matching logic.
- Wrote UDFs, UDTFs, and UDAFs in Spark to implement business logic on the data.
- Moved large volumes of data between Hadoop servers using compression techniques.
- Created real-time data pipelines and frameworks with Kafka and Spark Streaming, loading data into HBase (see the sketch after this job entry).
- Built NiFi data pipelines to copy data from JMS MQ into Kafka topics, with in-flight processing such as JSON-to-XML conversion.
- Created custom NiFi processors to apply business logic to the data.
- Developed data frameworks in Python, Java, and Scala.
- Wrote shell scripts to launch jobs with the required options and environment.
- Monitored the Spark Web UI, DAG scheduler, and YARN ResourceManager UI to optimize queries and performance in Spark.
- Worked with different file formats, including Parquet, Avro, and ORC.
- Created a framework using Spark Streaming and Kafka to process data in real time and feed it to downstream APIs.
- Hands-on experience with AWS architecture (ELB, EC2, S3, RDS, CloudFormation, security groups), DevOps practices, Unix (Red Hat), networking, and web services architecture.
Environment: Hive, Spark, Spark SQL, Spark Streaming, Scala, Druid, Oozie, Unix Shell/Bash scripting, HBase, Cassandra, Python, YARN, VersionOne, JDBC, AWS EC2, EMR, S3, CloudFormation, Security Groups.
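A minimal Scala sketch of the kind of Kafka-to-Spark-Streaming pipeline referenced above, written with Structured Streaming. The broker address, topic name, message schema, and paths are assumed placeholders, and it requires the spark-sql-kafka package; the sketch lands the parsed stream as Parquet, whereas the actual pipeline would load into HBase through a connector (for example via foreachBatch).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed JSON schema of the incoming messages (illustrative only).
    val schema = new StructType()
      .add("customerId", StringType)
      .add("amount", DoubleType)
      .add("eventTime", TimestampType)

    // Example business-logic UDF: flag high-value events.
    val flagHighValue = udf((amount: Double) => if (amount > 1000.0) "HIGH" else "NORMAL")

    // Read from a Kafka topic (broker and topic names are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()

    // Parse the JSON payload and apply the UDF.
    val parsed = raw
      .select(from_json($"value".cast("string"), schema).as("evt"))
      .select($"evt.*")
      .withColumn("valueFlag", flagHighValue($"amount"))

    // For the sketch, land the stream as Parquet; a real pipeline would write
    // to HBase via a connector (e.g. inside foreachBatch).
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/customer_events")
      .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```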
Confidential, Hartford, CT
Data Engineer
Responsibilities:
- Created Hive tables and loaded and analyzed data using Hive queries.
- Optimized existing Hive scripts using various Hive optimization techniques.
- Built reusable Hive UDF libraries that business users can share across jobs.
- Built and implemented Apache Pig scripts to load data from and store data into Hive.
- Loaded data from the Linux file system into HDFS.
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro and to manage the flow of data from source systems into HDFS.
- Implemented Flume to import streaming log data and aggregate it into HDFS.
- Imported data from various sources, performed transformations using Spark, and loaded the results into Hive.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
- Used Scala to write the code for all Spark use cases.
- Monitored the Hadoop cluster using tools such as Ambari and Cloudera Manager.
- Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
- Performed data analysis using Spark with Scala (see the sketch after this job entry).
- Created RDDs, DataFrames, and Datasets.
- Created numerous Spark Streaming jobs that pull JSON messages from Kafka topics, parse them in flight using Java code, and land them on the Hadoop platform.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into databases.
- Worked closely with the admin team on configuring ZooKeeper and used it to coordinate cluster services.
- Created views in Tableau Desktop, published to internal teams for review and further data analysis, with customization using filters and actions.
Environment: Hive, MapReduce, Ambari, Spark, Knox, Spark SQL, Spark Streaming, Scala, Kafka, ZooKeeper, NiFi, Oozie, Unix Shell/Bash scripting, Python, Tableau, YARN, JIRA, JDBC
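A minimal Scala sketch of the Hive-backed Spark analysis pattern referenced above. The database, table, and column names are hypothetical; the point is reading a Hive table into a DataFrame, transforming it, and persisting the summary back to Hive for downstream reporting (e.g. in Tableau).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveAnalysisSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark read and write Hive-managed tables.
    val spark = SparkSession.builder()
      .appName("hive-analysis-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table (database/table names are placeholders).
    val claims = spark.table("analytics.claims_raw")

    // DataFrame transformations: filter, derive a column, aggregate.
    val summary = claims
      .filter(col("claim_status") === "APPROVED")
      .withColumn("claim_year", year(col("claim_date")))
      .groupBy(col("claim_year"), col("region"))
      .agg(sum("claim_amount").as("total_amount"),
           count(lit(1)).as("claim_count"))

    // Persist the result back into Hive as a managed table.
    summary.write.mode("overwrite").saveAsTable("analytics.claims_summary")

    spark.stop()
  }
}
```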
Confidential, TX
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Pig Latin scripts to extract data from output files and load it into HDFS.
- Developed Oozie workflows to automate and schedule the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented Hive UDFs in Java for processing that cannot be done with Hive's built-in functions.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Developed simple to complex Unix shell/Bash scripts as part of the framework development process.
- Developed complex Talend job mappings to load data from various sources using different components.
- Designed, developed, and implemented solutions using Talend Integration Suite.
- Implemented Flume to import streaming log data and aggregate it into HDFS.
- Used Zeppelin Notebook to execute SQL queries against SQL databases.
- Involved in installing the AWS EMR framework.
- Set up a multi-node Hadoop cluster on Amazon EC2 with Pig, Hive, and Sqoop ecosystem tools.
- Moved data to Amazon S3 and ran EMR programs on data stored in S3.
- Created Parquet Hive tables with complex data types corresponding to the Avro schema (see the sketch after this job entry).
Environment: Hive, Spark, Spark SQL, Spark Streaming, Scala, Zeppelin notebook, NiFi, AWS EC2, AWS EMR, AWS S3, Unix Shell scripting, HBase (NoSQL), Control-M, Kafka, YARN, Jenkins, JIRA, JDBC.
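A minimal Scala sketch of converting Avro data on S3 into a Parquet-backed Hive table with matching complex types, as referenced above. The bucket, table name, and partition column are assumptions, and the read requires the spark-avro package and an S3 connector.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read Avro records from S3; nested Avro types (structs, arrays, maps)
    // are carried over into the DataFrame's complex column types.
    val events = spark.read
      .format("avro")
      .load("s3a://example-bucket/raw/customer_events/")

    // Materialize the data as a partitioned, Parquet-backed Hive table whose
    // complex data types mirror the Avro schema.
    events.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("event_date")
      .saveAsTable("analytics.customer_events_parquet")

    spark.stop()
  }
}
```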
Confidential
Java Developer
Responsibilities:
- Involved in the full Software Development Life Cycle (SDLC) of the tracking system, including requirements gathering, conceptual design, analysis, detailed design, development, system testing, and user acceptance
- Worked in Agile Scrum methodology
- Involved in writing exception and validation classes using core Java
- Designed and implemented the user interface using JSP, XSL, DHTML, Servlets, JavaScript, HTML, CSS and AJAX
- Developed framework using Java, MySQL and web server technologies
- Validated the XML documents with XSD validation and transformed to XHTML using XSLT
- Implemented cross-cutting concerns as aspects at the service layer using Spring AOP, and implemented DAO objects using Spring ORM
- Spring beans were used for controlling the flow between UI and Hibernate
- Implemented SOA architecture with web services using SOAP, WSDL, UDDI, and XML via the CXF framework and Apache Commons
- Worked on the database interaction layer for insert, update, and retrieval operations, using queries and stored procedures
- Wrote stored procedures and complex queries for IBM DB2
- Used Eclipse IDE for development and JBoss Application Server for deploying the web application
- Used Apache Camel to create routes for web services
- Used JReport for the generation of reports of the application
- Used WebLogic as the application server and Log4j for application logging and debugging
- Used CVS for version control and ANT as the project build tool
Environment: Java, HTML, CSS, JSTL, JavaScript, Servlets, JSP, Hibernate, Struts, Web Services, Eclipse, JBoss, JMS, JReport, Scrum, MySQL, IBM DB2, SOAP, WSDL, UDDI, AJAX, XML, XSD, XSLT, Oracle, Linux, Log4j, JUnit, ANT, CVS