Hadoop Developer Resume
Columbus, OH
SUMMARY
- Overall 8+ years of experience in the design and deployment of Enterprise Applications, Web Applications and Client-Server Technologies using Java and Big Data technologies
- Possesses 3+ years of rich Hadoop experience in the design and development of Big Data applications involving Apache Hadoop Map/Reduce, HDFS, Hive, HBase, Pig, Oozie, Scala, Sqoop, Kafka, Flume, Tez and Spark.
- Expertise in developing solutions around NoSQL databases like HBase and Cassandra
- Experience with major Hadoop distributions, including Cloudera, Hortonworks and MapR
- Excellent understanding of Hadoop architecture, including Map Reduce MRv1 and MRv2 (YARN)
- Developed multiple Map Reduce programs to process large volumes of semi/unstructured data files using different Map Reduce design patterns
- Good knowledge of Amazon AWS services such as EMR and EC2 for fast and efficient processing of Big Data, and of Machine Learning concepts.
- Strong experience in writing Map Reduce jobs in Java and Pig.
- Experience with various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and Map Side joins when writing Map Reduce jobs
- Excellent understanding of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the Map Reduce programming paradigm
- Experience in writing and testing Map-Reduce pipelines using Apache Crunch.
- Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation and standardization
- End-to-end experience in designing and deploying data visualizations using Tableau. Extensive experience in Data Analysis.
- Hands-on experience in writing Map Reduce jobs in Java and queries in Hive, Impala and Pig Latin.
- Excellent hands-on experience in analyzing data using Pig Latin, HQL, HBase and Map Reduce programs in Java.
- Developed UDFs in Java as and when necessary for use with Pig and Hive queries
- Worked with the Zookeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Proficient in using Big Data ingestion tools like Flume and Sqoop
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop
- Experience in handling continuous streaming data using Flume and memory channels.
- Good experience in benchmarking Hadoop clusters.
- Good knowledge of data analysis with R.
- Good knowledge of executing Spark SQL queries against data in Hive
- Experienced in monitoring Hadoop clusters using Cloudera Manager and Web UI.
- Designed and deployed big data analytics data services platform (Hadoop, Storm, Kafka, etc.)
- Developed core modules in large cross-platform applications using Java, J2EE, Hibernate, JAX-WS Web Services, JMS and EJB.
- Extensive experience working with web technologies like HTML, CSS, XML, JSON and jQuery
- Experienced with the build tools Maven and ANT and continuous integration tools like Jenkins.
- Extensive experience in documenting requirements, functional specifications and technical specifications.
- Extensive experience with SQL, PL/SQL and database concepts.
- Experience working with version control tools like SVN and Git (GitHub), JIRA to track issues and Crucible for code reviews
- Strong Database background with Oracle, PL/SQL, Stored Procedures, Triggers, SQL Server, MySQL, and DB2.
- Strong Problem Solving and Analytical skills and abilities to make Balanced & Independent Decisions.
- Good Team Player, Strong Interpersonal, Organizational and Communication skills combined with Self-Motivation, Initiative and Project Management Attributes.
- Strong ability to handle multiple priorities and workloads, and to understand and adapt quickly to new technologies and environments.
TECHNICAL SKILLS
Hadoop Core Services: HDFS, Map Reduce, Spark, YARN
Hadoop Distribution: Hortonworks, Cloudera, Apache
NoSQL Databases: HBase, Cassandra
Hadoop Data Services: Hive, Pig, Impala, Sqoop, Flume, Kafka
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Ganglia, Cloudera Manager
Cloud Computing Tools: Amazon AWS
Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix, JavaScript, Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Databases: Oracle, MySQL, PostgreSQL, Teradata
Operating Systems: UNIX, Windows, LINUX
Build Tools: Jenkins, Maven, ANT
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Warsaw, IN
Sr. Hadoop Developer
Responsibilities:
- Developed simple and complex Map Reduce programs in Java for Data Analysis on different data formats
- Developed Map Reduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
- Developed a secondary sort implementation to deliver sorted values on the reduce side and improve Map Reduce performance.
- Implemented custom Data Types, Input Format, Record Reader, Output Format and Record Writer for Map Reduce computations to handle custom business requirements.
- Implemented Map Reduce programs to classify data into different categories based on record type.
- Worked on Sequence files, RC files, Map side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs
- Responsible for performing extensive data validation using Hive
- Worked with Sqoop import and export functionalities to handle large data set transfers between an Oracle database and HDFS.
- Worked on tuning Hive and Pig scripts to improve performance
- Involved in submitting and tracking Map Reduce jobs using JobTracker.
- Involved in creating Oozie workflow and Coordinator jobs to trigger jobs based on time and data availability
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources
- Involved in loading the created HFiles into HBase for faster access to a large customer base without a performance hit.
- Implemented Hive Generic UDFs for business logic.
- Coordinated with end users on the design and implementation of analytics solutions for User Based Recommendations using R as per project proposals.
- Worked on research team that developed Scala, a programming language with full Java interoperability and a strong type system.
- Improved stability and performance of the Scala plug-in for Eclipse, using product feedback from customers and internal users.
- Redesigned and implemented Scala REPL (read-evaluate-print-loop) to tightly integrate with other IDE features in Eclipse.
- Assisted in monitoring the Hadoop cluster using Ganglia.
- Knowledge of running Hive queries through Spark SQL, which integrates with the Spark environment.
- Implemented test scripts to support test driven development and continuous integration.
- Used the JUnit framework to perform unit and integration testing.
- Configured build scripts for multi-module projects with Maven and Jenkins CI.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, CDH4, Map Reduce, HDFS, Pig, Hive, Impala, Oozie, Java, Kafka, Linux, Scala, Maven, JavaScript, Oracle 11g/10g, SVN, Ganglia
Confidential, Columbus, OH
Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development and major components of Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie and Zookeeper
- Implemented a six-node CDH4 Hadoop cluster on CentOS
- Imported and exported data between different RDBMS and HDFS/Hive using Sqoop
- Defined job flows to run multiple Map Reduce and Pig jobs using Oozie
- Imported log files into HDFS using Flume and loaded them into Hive tables for querying
- Monitored the running Map Reduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS
- Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries
- Involved in writing APIs to read HBase tables, cleanse data and write to another HBase table
- Created multiple Hive tables and implemented Partitioning, Dynamic Partitioning and Bucketing in Hive for efficient data access
- Wrote multiple Map Reduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats
- Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements
- Wrote programs using the HBase Client API
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop
- Experienced in design, development, tuning and maintenance of NoSQL databases
- Wrote a Map Reduce program in Python using the Hadoop Streaming API
- Developed unit test cases for Hadoop Map Reduce jobs with MRUnit
- Analyzed, designed, developed, tested and implemented ETL processes, including database performance tuning and query optimization
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager and Web UI.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Used Maven as the build tool and SVN for code management.
- Worked on writing RESTful web services for the application.
- Implemented testing scripts to support test driven development and continuous integration.
Environment: Hadoop, Map Reduce, HDFS, HBase, Hive, Impala, Pig, Java, SQL, Ganglia, Sqoop, Flume, Oozie, Unix, JavaScript, Maven, Eclipse
Confidential, Springfield, IL
Java / Hadoop Developer
Responsibilities:
- Imported data from different relational data sources, such as Teradata, into HDFS using Sqoop.
- Worked on writing transformer/mapping Map-Reduce pipelines using Apache Crunch and Java.
- Imported bulk data into the Cassandra file system using the Thrift API.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run Map Reduce jobs in the backend.
- Performed analytics on time series data stored in Cassandra using the Java API
- Designed and implemented Incremental Imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data
- Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts that perform transformations, joins and some pre-aggregations before storing the data in HDFS.
- Implemented the workflows using Apache Oozie framework to automate tasks
- Used the Avro data serialization system to work with JSON data formats.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
Environment: Hadoop, HDFS, Hortonworks (HDP 2.1), Map Reduce, Hive, Oozie, Sqoop, Pig, MySQL, Java, REST API, Maven, MRUnit, JUnit.
Confidential, Jacksonville, FL
Sr. Java Developer
Responsibilities:
- Designed, developed, maintained, tested, and troubleshot Java and PL/SQL programs in support of Payroll employees.
- Developed documentation for new and existing programs and designed specific enhancements to the application.
- Implemented web layer using JSF and ICEfaces.
- Implemented business layer using Spring MVC.
- Implemented report retrieval based on start date using HQL.
- Implemented Session Management using Session Factory in Hibernate.
- Developed the DOs and DAOs using Hibernate.
- Implemented a SOAP web service to validate zip codes using Apache Axis.
- Wrote complex queries, PL/SQL Stored Procedures, Functions and Packages to implement Business Rules.
- Wrote a PL/SQL program to send email to a group from the backend.
- Developed scripts to be triggered monthly to produce the current monthly analysis.
- Scheduled Jobs to be triggered on a specific day and time.
- Modified SQL statements to increase the overall performance as a part of basic performance tuning and exception handling.
- Used Cursors, Arrays, Tables and Bulk Collect concepts.
- Extensively used Log4j for logging
- Performed unit testing in all the environments.
- Used Subversion as the version control system
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
- Developed Class diagrams, Sequence diagrams using Rational Rose.
- Responsible for developing Rich Web Interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax and GWT.
- Developed presentation layer using Struts framework, and performed validations using Struts Validator plugin.
- Created SQL scripts for the Oracle database
- Implemented the business logic using Spring Transaction and Spring AOP.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Produced a web service using the WSDL/SOAP standards.
- Implemented J2EE design patterns like the Singleton and Factory patterns.
- Extensively involved in the creation of Session Beans and MDBs using EJB 3.0.
- Used Hibernate framework for Persistence layer.
- Extensively involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
- Built and deployed the application using Maven.
- Performed testing using JUnit.
- Used JIRA to track bugs.
- Extensively used Log4j for logging throughout the application.
- Produced a Web service using REST with Jersey implementation for providing customer information.
- Used SVN for source code versioning and code repository.
Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.