Hadoop Developer Resume
Baltimore, MD
PROFESSIONAL SUMMARY:
- Over 7 years of experience in IT, including 4 years in the Hadoop ecosystem.
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, DataNode, NameNode and MapReduce concepts.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Experience in installation, configuration and deployment of Big Data solutions.
- Experience with the Hadoop ecosystem, including HDFS, Hive, Pig, HBase, Oozie and Sqoop, and knowledge of the MapReduce framework.
- Experience working with NoSQL databases, including MongoDB and HBase.
- Experience developing against NoSQL databases using CRUD operations, sharding, indexing and replication.
- Experience working with the Cassandra NoSQL database.
- Experience with ETL using Hive and MapReduce.
- Worked on Amazon Redshift, the data warehouse product that is part of AWS (Amazon Web Services).
- Worked on the Neo4j graph database, creating nodes and relationships between them.
- Experience in developing Pig scripts and HiveQL queries.
- Wrote Hive queries for data analysis and to prepare data for visualization.
- Responsible for developing Pig Latin scripts.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Experience in managing and reviewing Hadoop Log files.
- Used Zookeeper to provide coordination services to the cluster.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Sound knowledge of business intelligence and reporting; prepared dashboards using Tableau.
- Experience in requirement analysis, system design, development and testing of various software applications.
- Hands on experience in application development using Java, RDBMS and Linux Shell Scripting.
- Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
- Experience in all phases of the software development lifecycle: concept, design, development, QA, rollout and enhancements.
- Ability to work independently to help drive solutions in fast-paced, dynamic work environments.
- Strong team building, conflict management, time management and meeting management skills.
- Excellent communication and leadership skills.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, Hive, Pig, Oozie, Sqoop, MapReduce, HBase, MongoDB, ZooKeeper
Database Technologies: PL/SQL, NoSQL, MongoDB, Neo4j
Programming Languages: C, C++, Java
Web Technologies: HTML, JavaScript, AngularJS
Operating Systems: Windows, Linux
Office Tools: MS Word, MS Excel, MS PowerPoint, MS Project
WORK EXPERIENCE:
Confidential, Baltimore, MD
Hadoop Developer
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Developed simple to complex MapReduce streaming jobs in Python, alongside processing implemented in Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Used Impala to read, write and query Hadoop data stored in HDFS, HBase or Cassandra.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Used Mahout to understand machine learning algorithms for efficient data processing.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and various compressed formats (see the sketch below).
Environment: Hadoop 0.20.2, Pig, Hive, Apache Sqoop, Oozie, HBase, ZooKeeper, Cloudera Manager, 30-node cluster on Linux (Ubuntu).
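The multi-format MapReduce work described above could look roughly like the sketch below: a mapper that extracts a grouping field from delimited records and a reducer that sums per-key counts. The CSV layout, field positions and class names are illustrative assumptions, not details from the actual project.

```java
// Hypothetical example: counting records per category from CSV input.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class CategoryCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text category = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed CSV layout: id,category,amount,...
        String[] fields = value.toString().split(",");
        if (fields.length > 1) {
            category.set(fields[1].trim());
            context.write(category, ONE);   // emit (category, 1) per record
        }
    }
}

class CategoryCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();                  // aggregate counts per category
        }
        context.write(key, new IntWritable(sum));
    }
}
```

A driver class would wire these into a Job with input and output paths; the exact driver API varies slightly across Hadoop versions.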
Confidential, Durham, NC
Big Data Developer
Responsibilities:
- Migrated data from Oracle and MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Proposed an automated system using shell scripts to drive the Sqoop jobs.
- Worked in an Agile development approach.
- Created the estimates and defined the sprint stages.
- Developed a strategy for full and incremental loads using Sqoop.
- Worked mainly on Hive queries to categorize data from different claims.
- Integrated the Hive warehouse with HBase.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions (see the sketch below).
- Implemented partitioning, dynamic partitions and bucketing in Hive.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
- Monitored system health and logs and responded to any warning or failure conditions.
- Presented data and data flows using Talend for reusability.
Environment: Apache Hadoop, HDFS, Hive, Java, Sqoop, Cloudera CDH4, Oracle, MySQL, Tableau, Talend, Elasticsearch
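As a rough illustration of the custom Hive UDFs mentioned above, the sketch below uses Hive's simple (reflection-based) UDF API; the function name and the status-normalization logic are assumptions made for the example, not the project's actual UDFs.

```java
// Hypothetical UDF that normalizes free-form status strings before categorization.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class NormalizeStatus extends UDF {
    // Hive resolves evaluate() by reflection for this simple UDF API.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                 // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Once packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.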
Confidential, Memphis, TN
Hadoop Developer
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH3 distribution.
- Experienced in managing and reviewing Hadoop log files
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Imported and exported data between RDBMS and HDFS using Sqoop.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Wrote Hive queries to meet the business requirements.
- Analyzed data using Pig and wrote Pig scripts that group, join and sort the data.
- Hands-on experience with NoSQL databases.
- Worked on MongoDB using its CRUD (create, read, update, delete), indexing, replication and sharding features (see the sketch below).
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements through workshops and meetings with various business users.
- Designed and Developed Dashboards using Tableau.
- Actively participated in weekly meetings with the technical teams to review the code.
Environment: Apache Hadoop, HDFS, Hive, Java, MongoDB, Oracle, MySQL.
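The MongoDB CRUD and indexing work noted above might be sketched roughly as follows with the MongoDB Java driver, shown here in the 3.x-style API for brevity (a project of this era may well have used the older DBCollection API). The database, collection and field names are assumed for illustration.

```java
// Hypothetical sketch of basic CRUD plus an index with the MongoDB Java driver (3.x API).
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

public class MongoCrudSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("analytics");            // assumed names
            MongoCollection<Document> orders = db.getCollection("orders");

            orders.insertOne(new Document("orderId", 1001).append("status", "NEW"));  // create
            Document found = orders.find(eq("orderId", 1001)).first();                // read
            System.out.println("Found: " + found);
            orders.updateOne(eq("orderId", 1001), set("status", "SHIPPED"));          // update
            orders.deleteOne(eq("orderId", 1001));                                    // delete

            orders.createIndex(Indexes.ascending("orderId"));   // index to support the read path
        }
    }
}
```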
Confidential, New Jersey
Lead Consultant
Responsibilities:
- Gathered requirements from product marketing & business analysts.
- Moved semi-structured and structured data sets into HDFS using Flume and Sqoop.
- Installed, set up and maintained a 3-node Hadoop cluster using CDH3.
- Used Hive queries to run reports identifying dropped calls by location (see the sketch below).
- Worked with BI team for further analytics on Pentaho platform.
Environment: Ubuntu, Hive, CM 4.8.2, HBase, Flume, OLAP, Sqoop, MongoDB
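A hedged sketch of the "dropped calls by location" reporting mentioned above, issued from Java through the Hive JDBC driver. The HiveServer2 driver class, connection URL, and the call_records table with its columns are assumptions for illustration; an older HiveServer1 setup would use a different driver class and URL.

```java
// Hypothetical report: count dropped calls per location via Hive JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DroppedCallsReport {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");   // HiveServer2 JDBC driver
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT location, COUNT(*) AS dropped_calls " +
                 "FROM call_records WHERE call_status = 'DROPPED' " +
                 "GROUP BY location ORDER BY dropped_calls DESC")) {
            while (rs.next()) {
                System.out.println(rs.getString("location") + "\t" + rs.getLong("dropped_calls"));
            }
        }
    }
}
```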
Confidential, Charlotte, NC
Software Engineer
Responsibilities:
- Developed various Java classes and SQL queries to retrieve and manipulate the data.
- Created Use case, Sequence diagrams, functional specifications and User Interface diagrams using Star UML.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Gathered and analyzed the business requirements.
- Developed code that creates XML files and flat files from data retrieved from databases and XML sources (see the sketch below).
- Implemented queries using SQL.
- Developed complex SQL queries and stored procedures to process and store the data.
- Involved in unit testing and bug fixing.
Environment: PL/SQL, NoSQL, MongoDB, UML, XML, J2EE
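The XML file generation described above could be sketched roughly as below: read rows over JDBC and stream them out as XML with StAX. The Oracle connection URL, the customers table and the element names are assumptions made for the example.

```java
// Hypothetical export of query results to an XML file using JDBC and StAX.
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class CustomerXmlExport {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:ORCL", "user", "password");   // assumed URL
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM customers"); // assumed table
             FileWriter out = new FileWriter("customers.xml")) {

            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("customers");
            while (rs.next()) {
                xml.writeStartElement("customer");            // one element per row
                xml.writeAttribute("id", rs.getString("id"));
                xml.writeCharacters(rs.getString("name"));
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        }
    }
}
```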
Confidential
Software Engineer
Responsibilities:
- Prepared user requirements document and functional requirements document for different modules.
- Analyzed the business requirements.
- Worked on an architecture with JSPs as the view, Action classes as the controller and a combination of EJBs and Java classes as the model.
- Involved in coding session beans and entity beans to implement the business logic (see the sketch below).
- Prepared SQL scripts for database creation and for migrating existing data to the newer version of the application.
- Developed various JavaBeans and helper classes to support server-side programs.
- Involved in developing backend code for email notifications to admin users, with multi-sheet Excel content generated from XML.
- Modified existing backend code for various levels of enhancement.
- Designed error-handling and error-logging flows.
Environment: PL/SQL, NoSQL, MongoDB, UML, XML, J2EE
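As a rough illustration of the session-bean business logic mentioned above, the sketch below uses EJB 3 annotation style for brevity; the original J2EE project may instead have used EJB 2.x home/remote interfaces and deployment descriptors, and the bean name, method and logic are assumptions for the example.

```java
// Hypothetical stateless session bean standing in for the business-logic layer.
import javax.ejb.Stateless;

@Stateless
public class MigrationEligibilityBean {
    // Business method called from the controller (Action class) layer:
    // decide whether a record created under an older schema version needs migration.
    public boolean needsMigration(int recordSchemaVersion, int targetSchemaVersion) {
        return recordSchemaVersion < targetSchemaVersion;
    }
}
```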