Sr Hadoop Developer Resume
Cary, NC
SUMMARY:
- 7+ years of experience in Analysis, Architectural Design, Prototyping, Development, Integration and Testing of applications using Java/J2EE Technologies
- 3+ years of experience with the Hadoop Ecosystem, including HDFS, MapReduce, Hive, Pig, Storm, Kafka, YARN, HBase, Oozie, ZooKeeper, Flume and Sqoop based Big Data Platforms
- Expertise in design and implementation of Big Data solutions in Banking, Retail and E-commerce domains
- Experienced with NoSQL databases like HBase, Cassandra and MongoDB
- Comprehensive experience in building Web-based applications using J2EE frameworks like Spring, Hibernate, EJB, Struts and JMS
- Excellent ability to use analytical tools to mine data and evaluate the underlying patterns
- Assisted in Cluster maintenance, Cluster Monitoring and Troubleshooting, and Managing and Reviewing data backups and log files
MapReduce
- Hands-on experience in developing MapReduce programs using Apache Hadoop for analyzing Big Data
- Expertise in optimizing traffic across the network using Combiners, joining multiple schema datasets using Joins and organizing data using Partitioners
- Experience in writing Custom Counters for analysing the data, and testing using the MRUnit framework (a counter sketch follows this list)
- Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML and Avro
- Expertise in composing MapReduce Pipelines with many user-defined functions using Apache Crunch
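A minimal sketch of the custom-counter pattern mentioned above; the ClaimsMapper class name, the comma-delimited field layout and the counter names are illustrative assumptions, not actual project code:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that tallies record quality in custom counters
// while emitting (accountType, 1) pairs for downstream aggregation.
public class ClaimsMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Custom counter group, visible in the job's counter report
    public enum RecordQuality { VALID, MALFORMED }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text accountType = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 2 || fields[1].isEmpty()) {
            context.getCounter(RecordQuality.MALFORMED).increment(1);
            return; // drop the bad record, but account for it in the counter
        }
        context.getCounter(RecordQuality.VALID).increment(1);
        accountType.set(fields[1]);
        context.write(accountType, ONE);
    }
}
```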
PIG
- Expertise in writing ad-hoc MapReduce programs using Pig Scripts
- Used Pig as an ETL tool to do transformations, event joins, filtering and some pre-aggregations
- Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources (a UDF sketch follows this list)
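A minimal example of a Pig UDF written in Java, as described above; the class name and the upper-casing logic are illustrative:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF that upper-cases its first chararray argument.
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // Pig treats a null return as a null field
        }
        return input.get(0).toString().toUpperCase();
    }
}
```

In Pig Latin, such a UDF is made available with REGISTER and invoked by its fully qualified class name (or an alias created with DEFINE).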
HIVE
- Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues
- Responsible for performing extensive data validation using HIVE Dynamic Partitioning and Bucketing
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL (a Hive UDF sketch follows this list)
- Worked on different set of tables like External Tables and Managed Tables
- Experienced in working with different Hive SerDes to handle file formats like Avro and XML
- Analyzed the data by performing Hive queries and used Hive UDFs for complex querying
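A minimal Hive UDF sketch matching the custom-UDF work described above; the date layout and the class name are assumptions for illustration:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF that converts "MM/DD/YYYY" strings to "YYYY-MM-DD".
public final class NormalizeDate extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String[] parts = input.toString().split("/");
        if (parts.length != 3) {
            return null; // pass malformed dates through as null rather than guessing
        }
        return new Text(parts[2] + "-" + parts[0] + "-" + parts[1]);
    }
}
```

Such a UDF is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before it can be used in HiveQL.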
NoSQL
- Expert database engineer; NoSQL and relational data modeling
- Responsible for building scalable distributed data solutions using DataStax Cassandra
- Expertise in HBase Cluster Setup, Configurations, HBase Implementation and HBase Client API
- Worked on importing data into HBase using the HBase Shell and the HBase Client API (a client sketch follows this list)
- Expertise in performing large-scale web crawling with Apache Nutch using a Hadoop/HBase cluster
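A minimal sketch of the classic (pre-1.0) HBase Client API usage referred to above; the table name, column family and values are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical round trip: write a cell, then read it back.
public class HBaseClientExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "customers"); // table name is illustrative
        try {
            Put put = new Put(Bytes.toBytes("row-001"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        } finally {
            table.close();
        }
    }
}
```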
Java/J2EE
- Expertise in several J2EE technologies like JDBC, Servlets, JSP, Struts, Spring, Hibernate, JPA, JSF, EJB, JMS, JAX-WS, SOAP, jQuery, AJAX, XML, JSON, HTML5/HTML, XHTML, Maven, and Ant
- Expert knowledge of J2EE Design Patterns like MVC Architecture, Front Controller, Session Facade, Business Delegate and Data Access Object for building J2EE Applications
- Thorough knowledge of JAX-WS for accessing external Web Services, getting the XML response and converting it back to Java objects (a client sketch follows this list)
- Experience in using Jenkins for Continuous Integration and Sonar jobs for Java code quality
- Extensive experience in developing Internet and Intranet applications using J2EE, Servlets, JSP, JBoss, WebLogic, Tomcat, and the Struts framework
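A minimal JAX-WS client sketch for the pattern described above; the WSDL URL, namespace, service name and the QuotePort interface are hypothetical stand-ins for a real service contract:

```java
import java.net.URL;
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.namespace.QName;
import javax.xml.ws.Service;

// Hypothetical service endpoint interface, normally generated from the WSDL with wsimport.
@WebService
interface QuotePort {
    @WebMethod
    String getQuote(String symbol);
}

public class QuoteClient {
    public static void main(String[] args) throws Exception {
        // All names and URLs below are illustrative.
        URL wsdl = new URL("http://example.com/quote?wsdl");
        QName serviceName = new QName("http://example.com/quote", "QuoteService");
        Service service = Service.create(wsdl, serviceName);
        QuotePort port = service.getPort(QuotePort.class);
        // JAX-WS unmarshals the XML response back into Java objects
        System.out.println(port.getQuote("IBM"));
    }
}
```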
SQL, Script & Oracle Database
- Extensive experience with DB2 and Oracle 9i/10g/11g databases (Database Design and SQL Queries)
- Good experience in SQL, PL/SQL, Perl Scripting, Shell Scripting, Partitioning, Data modeling, OLAP, Logical and Physical Database Design, Backup and Recovery procedures
- Experienced with the build tools Maven and Ant and continuous integration tools like Jenkins
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks
- Experienced in Agile SCRUM, RUP (Rational Unified Process) and TDD (Test Driven Development) software development methodologies
TECHNICAL SKILLS:
Hadoop/Big Data/NoSQL Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, Storm, Kafka, YARN, Crunch, ZooKeeper, HBase, Cassandra
Programming Languages: Java (JDK 5/JDK 6), Python, C, SQL, PL/SQL, Shell Script
IDE Tools: Eclipse, Rational Team Concert, NetBeans
Frameworks: Hibernate, Spring, Struts, JMS, EJB, JUnit, MRUnit, JAXB
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, REST Web Services
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Databases: Oracle 11g/10g/9i, MySQL, DB2, Derby, MS-SQL Server
Operating Systems: UNIX, Windows, LINUX
Build Tools: Jenkins, Maven, ANT
Reporting Tools: Jasper Reports, iReport
PROFESSIONAL EXPERIENCE:
Confidential, Cary, NC
Sr Hadoop Developer
Responsibilities:
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats
- Developed MapReduce programs that filter out bad and unnecessary claim records and find unique records based on account type
- Processed semi-structured and unstructured data using MapReduce programs
- Implemented daily cron jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs
- Implemented custom DataTypes, InputFormat, RecordReader, OutputFormat and RecordWriter for MapReduce computations
- Successfully migrated a legacy application to a Big Data application using Hive/Pig/HBase at the production level
- Transformed date-related data into an application-compatible format by developing Apache Pig UDFs
- Developed a MapReduce pipeline for feature extraction and tested the modules using MRUnit
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs
- Responsible for performing extensive data validation using Hive
- Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access
- Worked on different set of tables like External Tables and Managed Tables
- Used Oozie workflow engine to run multiple Hive and Pig jobs
- Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Involved in designing and developing non-trivial ETL processes within Hadoop using tools like Pig, Sqoop, Flume, and Oozie
- Used DML statements to perform different operations on Hive Tables
- Developed Hive queries for creating foundation tables from stage data
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Worked with the Apache Crunch library to write, test and run Hadoop MapReduce pipeline jobs (a pipeline sketch follows this list)
- Involved in joining and data aggregation using Apache Crunch
- Worked with Sqoop to export analyzed data from HDFS environment into RDBMS for report generation and visualization purpose
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping
- Developed Mapping document for reporting tools
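A minimal Apache Crunch pipeline of the kind referenced above, essentially the canonical word count; the input and output paths are taken from the command line:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

// Hypothetical Crunch pipeline: tokenize lines, then count each word.
public class CrunchWordCount {
    public static void main(String[] args) {
        Pipeline pipeline = new MRPipeline(CrunchWordCount.class);
        PCollection<String> lines = pipeline.readTextFile(args[0]);

        PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                for (String word : line.split("\\s+")) {
                    emitter.emit(word);
                }
            }
        }, Writables.strings());

        PTable<String, Long> counts = words.count(); // runs as MapReduce jobs
        pipeline.writeTextFile(counts, args[1]);
        pipeline.done();
    }
}
```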
Environment: Apache Hadoop, HDFS, MapReduce, Apache Crunch, Java (JDK 1.6), MySQL, DbVisualizer, Linux, Sqoop, Apache Hive, Apache Pig
Confidential, Northbrook, IL
Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with major components of the Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie and ZooKeeper
- Implemented a six-node CDH4 Hadoop cluster on CentOS
- Imported and exported data into HDFS and Hive from different RDBMSs using Sqoop
- Experienced in defining job flows to run multiple MapReduce and Pig jobs using Oozie
- Imported log files into HDFS using Flume and loaded them into Hive tables to query the data
- Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries
- Involved in writing APIs to read HBase tables, cleanse data and write to another HBase table
- Created multiple Hive tables, implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats
- Experience working with Apache SOLR for indexing and querying
- Knowledge of ZooKeeper internals
- Experienced in running batch processes using Pig Scripts and developed Pig UDFs for data manipulation according to Business Requirements
- Experienced in writing programs using HBase Client API
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop
- Experienced in design, development, tuning and maintenance of NoSQL database
- Wrote MapReduce programs in Python using the Hadoop Streaming API
- Developed unit test cases for Hadoop MapReduce jobs with MRUnit (a test sketch follows this list)
- Excellent experience in ETL analysis, design, development, testing and implementation of ETL processes, including performance tuning and database query optimization
- Experience in using the Pentaho Data Integration tool for data integration, OLAP analysis and ETL processes
- Experience integrating R with Hadoop using RHadoop for statistical analysis and predictive modelling
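A minimal MRUnit test sketch matching the unit-testing bullet above; it assumes a mapper like the hypothetical ClaimsMapper sketched in the summary, with its comma-delimited input layout:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

// Hypothetical MRUnit test: feed the mapper one record, assert its output pair.
public class ClaimsMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new ClaimsMapper());
    }

    @Test
    public void emitsAccountTypeForValidRecord() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("C1001,RETAIL"))
                 .withOutput(new Text("RETAIL"), new IntWritable(1))
                 .runTest();
    }
}
```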
Environment: Apache Hadoop 1.0, Hive, Pig, HBase, Sqoop, Flume, Java, Linux, MySQL Server 5.155, MS SQL Server 2012, SQL, PL/SQL, SQL Server Data Tools, SQL Server Business Intelligence Development Studio (SSAS, SSIS, SSRS), R
Confidential, Woodcliff Lake, NJ
Hadoop Developer
Responsibilities:
- An initiative from Citibank to move its disconnected legacy billing and monitoring systems to a consolidated platform
- Consolidated customer data from Lending, Insurance, Trading and Billing systems into a warehouse, and subsequently a mart, for business intelligence reporting
- Provided improved revenue capture through leakage elimination, more accurate risk scoring of customer portfolios, and better exposure analysis, in order to offer each customer better products and advice
- Experience working with JIRA for project management, Git for source code management, Jenkins for continuous integration and Crucible for code reviews
- Transferred and exported data into HDFS and Hive from MySQL and DB2 using Sqoop
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes
- Used default MapReduce Input and Output Formats
- Developed HQL queries to implement select, insert and update operations on the database by creating HQL named queries
- Identified customer potential from usage patterns using MapReduce programs (an aggregation sketch follows this list)
- Used Oozie to schedule Hadoop jobs and to call Sqoop from within existing workflows
- Automated regular imports of data into Hive partitions using Sqoop, orchestrated with Apache Oozie
- Clustered customers into categories and provided offers based on those categories using Apache Hive
- Experienced in managing and reviewing Hadoop log files
- Loaded and transformed large sets of data
- Performed grouping, aggregation and sorting using Pig and Hive, which are higher-level abstractions of MapReduce
- Conducted data extraction, including analyzing, reviewing and modeling based on requirements, using higher-level tools such as Hive and Pig
- Supported MapReduce programs running on the cluster
- Involved in creating Hive tables, loading with data and writing Hive queries
- Created data models for customer data using the Cassandra Query Language
- Ran many performance tests using the cassandra-stress tool in order to measure and improve the read and write performance of the cluster
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping
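A minimal reducer sketch for the usage-based customer analysis described above; the key/value layout (customer id mapped to event counts) is an assumption for illustration:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer that totals usage events per customer id.
public class UsageSumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private final LongWritable total = new LongWritable();

    @Override
    protected void reduce(Text customerId, Iterable<LongWritable> counts, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable count : counts) {
            sum += count.get();
        }
        total.set(sum);
        context.write(customerId, total); // one total per customer
    }
}
```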
Environment: Apache Hadoop (Cloudera), Java (JDK 1.6), Teradata, Red Hat Linux, Sqoop, Hive, DbVisualizer, Oozie
Confidential, San Antonio, TX
J2EE Software Developer
Responsibilities:
- Application was developed using the Struts MVC architecture
- Developed action and form classes based on Struts framework to handle the pages
- Developed a web-based reporting for credit monitoring system with HTML5, XHTML, JSTL, custom tags and Tiles using Struts framework
- Developed Servlets and JSPs based on MVC pattern using Struts framework and Spring Framework
- Developed web-based customer management software using Facelets, Icefaces and JSF
- Implemented Ajax frameworks and jQuery tools, for example autocompleters, tab modules, calendars and floating windows
- Configured the struts-config file for form-beans, global forwards, error forwards and action forwards
- Designed and implemented the Report Module (using the JasperReports framework)
- Created several JSPs and populated them with data from the database
- Developed Message-Driven Beans in collaboration with the Java Message Service (JMS) (a bean sketch follows this list)
- Developed Web Services using Apache Axis2 to retrieve data from legacy systems
- Developed Servlets, Action classes, Action Form classes and configured the struts-config.xml file
- Used XML parsing and binding APIs such as JAXP and JAXB in the web service's request/response data marshalling and unmarshalling
- Developed UI components for email and link sharing of documents and files for a Content Management System using Backbone.js and jQuery
- Planned and implemented various SQL queries, stored procedures, and triggers
- Used Hibernate to access the MySQL database and implemented connection pooling
- Developed JavaScript components using the Ext JS framework, such as Grid and Tree Panel, with client reports customized according to user requirements
- Performed building and deployment of WAR and JAR files on test, stage, and production systems on the Apache Tomcat application server
- Used ANT for the build process
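A minimal EJB 3 message-driven bean sketch of the kind described above; the queue name is illustrative and the destination property follows the JBoss-era convention:

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Hypothetical MDB that consumes order messages from a JMS queue.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "queue/orders")
})
public class OrderMessageBean implements MessageListener {
    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String body = ((TextMessage) message).getText();
                // process the incoming payload here
                System.out.println("Received order: " + body);
            }
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```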
Environment: J2EE, Java 1.4.2, Servlets, JSP, JDBC, EJB 3, JMS, jQuery, Backbone.js, HTML5, JSTL, Icefaces, XML, Spring, Struts, Hibernate, Web Services, Apache Tomcat Server, JSF, Ext JS, JAXB, JasperReports, JUnit, SOAP, SOAPUI, JavaScript, UML, Apache Axis 2, ANT, SVN, MySQL
Confidential
Java Developer
Responsibilities:
- Worked on requirement analysis; gathered all possible business requirements from end users and business analysts
- Involved in creating UML diagrams like Class, Activity, and Sequence diagrams using IBM Rational Rose modeling tools
- Worked extensively with core Java code using interfaces and multi-threading techniques
- Involved in production support and documented the application to provide knowledge transfer to the user
- Used Log4j for the logging mechanism and developed wrapper classes to configure the logs (a wrapper sketch follows this list)
- Used JUnit test cases for testing the application modules
- Developed and configured the Java beans using the Spring MVC framework
- Developed the application using Rational Team Concert and worked in an Agile environment
- Developed SQL stored procedures and prepared statements for updating and accessing data from the database
- Conducted SQL performance analysis on Oracle 9i database tables and improved performance through SQL tuning
- Also used C++ to create some libraries used in the application
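A minimal sketch of a Log4j 1.x wrapper class of the kind described above; the class name and the exposed methods are illustrative:

```java
import org.apache.log4j.Logger;

// Hypothetical thin wrapper that centralizes logger creation and common log calls.
public final class AppLogger {

    private final Logger logger;

    private AppLogger(Class<?> clazz) {
        this.logger = Logger.getLogger(clazz);
    }

    public static AppLogger getLogger(Class<?> clazz) {
        return new AppLogger(clazz);
    }

    public void info(String message) {
        logger.info(message);
    }

    public void error(String message, Throwable t) {
        logger.error(message, t);
    }
}
```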
Environment: C++, Java, JDBC, Servlets, JSP, Struts, Eclipse, Oracle 9i, Apache Tomcat, CVS, JavaScript, Log4j