
Hadoop Developer Resume


New York, NY

SUMMARY

  • Over 7 years of experience in the design and deployment of enterprise application development, web applications, client-server technologies, and web programming using Java and Big Data technologies.
  • 3+ years of comprehensive experience as a Hadoop, Big Data & Analytics developer.
  • Expertise in Hadoop architecture and ecosystem components such as HDFS, MapReduce, Pig, Hive, Sqoop, Flume, and Oozie.
  • Complete understanding of Hadoop daemons such as JobTracker, TaskTracker, NameNode, and DataNode, as well as the MRv1 and YARN architectures.
  • Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using various distributions such as Apache, Cloudera, and AWS.
  • Experience in installing and configuring Hadoop stack elements: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, and ZooKeeper.
  • Experience in data processing and analysis using MapReduce, HiveQL, and Pig Latin.
  • Extensive experience in writing user-defined functions (UDFs) in Hive and Pig.
  • Worked on Apache Sqoop to import and export data between HDFS and RDBMS/NoSQL databases.
  • Worked with NoSQL databases such as HBase and MongoDB.
  • Exposure to search, cache, and analytics data solutions such as Solr, Cassandra, and Hive.
  • Experience in job workflow scheduling and design using Oozie.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data, as well as of machine learning concepts.
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
  • Experienced in monitoring Hadoop clusters using Cloudera Manager and the web UI.
  • Developed core modules in large cross-platform applications using Java, J2EE, Hibernate, JAX-WS web services, JMS, and EJB.
  • Extensive experience working with web technologies such as HTML, CSS, XML, JSON, and jQuery.
  • Experienced with build tools such as Maven and Ant, and with continuous integration tools such as Jenkins.
  • Extensive experience in documenting requirements, functional specifications and technical specifications.
  • Extensive experience with SQL, PL/SQL and database concepts.
  • Experience with version control tools such as SVN and Git (including GitHub), JIRA for issue tracking, and Crucible for code reviews.
  • Strong database background with Oracle, PL/SQL, stored procedures, triggers, SQL Server, MySQL, and DB2.
  • Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
  • Good team player with strong interpersonal, organizational, and communication skills, combined with self-motivation, initiative, and project management abilities.
  • Able to handle multiple priorities and workloads, and to quickly understand and adapt to new technologies and environments.

TECHNICAL SKILLS

Hadoop Core Services: HDFS, MapReduce, Spark, YARN.

Hadoop Distributions: Hortonworks, Cloudera, Apache.

NoSQL Databases: HBase, Cassandra.

Hadoop Data Services: Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka (beginner).

Hadoop Operational Services: ZooKeeper, Oozie.

Monitoring Tools: Ganglia, Cloudera Manager.

Cloud Computing Tools: Amazon AWS.

Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, UNIX Shell Scripting.

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB.

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Databases: Oracle, MySQL, PostgreSQL, Teradata.

Operating Systems: UNIX, Linux, Windows.

Build Tools: Jenkins, Maven, Ant.

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans.

Development Methodologies: Agile/Scrum, Waterfall.

PROFESSIONAL EXPERIENCE

Confidential, Hartford, CT

Sr. Hadoop developer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with major components of the Hadoop ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and ZooKeeper.
  • Implemented a six-node CDH4 Hadoop cluster on Hortonworks.
  • Developed simple and complex MapReduce programs in Java for data analysis across different data formats.
  • Developed MapReduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
  • Developed a secondary-sort implementation to obtain sorted values on the reduce side and improve MapReduce performance (see the first sketch after this list).
  • Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations to handle custom business requirements.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Worked with SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily cron jobs that automate the parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
  • Responsible for performing extensive data validation using Hive.
  • Worked with Sqoop import and export functionality to handle large dataset transfers between the Oracle database and HDFS.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Involved in submitting and tracking MapReduce jobs using the JobTracker.
  • Involved in creating Oozie workflow and coordinator jobs that kick off jobs based on time and data availability.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Loaded generated HFiles into HBase for fast access to a large customer base without taking a performance hit.
  • Wrote APIs to read HBase tables, cleanse data, and write to another HBase table. Implemented Hive generic UDFs to implement business logic (see the second sketch after this list).
  • Coordinated with end users on the design and implementation of analytics solutions for user-based recommendations using R, as per project proposals.
  • Worked on a research team that developed Scala, a programming language with full Java interoperability and a strong type system.
  • Used Pentaho Data Integration Designer to create ETL transformations and created dashboards using Pentaho Dashboard Designer.
  • Modified shell scripts that were built to call custom procedures and report errors.
  • Modified custom PL/SQL packages to clean out redundant code and replace existing code with more efficient joins and procedure calls.
  • Developed ETL transformations sourced from a variety of heterogeneous sources, including Microsoft Access, text files, and CSV files.
  • Improved stability and performance of the Scala plug-in for Eclipse, using product feedback from customers and internal users.
  • Redesigned and implemented the Scala REPL (read-eval-print loop) to tightly integrate with other IDE features in Eclipse.
  • Assisted in monitoring the Hadoop cluster using Ganglia.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment (see the third sketch after this list).
  • In Spark, individual execution tasks are expressed as a single, parallelized program flow.
  • Spark is replacing MapReduce as the default open-source execution engine: with help from Cloudera's Apache committers, ecosystem projects such as Hive, Pig, Crunch, and Solr are complementing MapReduce with Spark or making Spark their default engine.
  • Implemented test scripts to support test-driven development and continuous integration.
  • The JUnit framework was used to perform unit and integration testing.
  • Configured build scripts for multi-module projects with Maven and Jenkins CI.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
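
First, a minimal sketch of the secondary-sort pattern referenced above, assuming the org.apache.hadoop.mapreduce API; the class and field names are illustrative rather than taken from the project:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class SecondarySortSketch {

        // Composite key: 'id' is the natural key; 'value' is the field the
        // reducer should see in sorted order.
        public static class CompositeKey implements WritableComparable<CompositeKey> {
            String id;
            long value;

            public void write(DataOutput out) throws IOException {
                out.writeUTF(id);
                out.writeLong(value);
            }

            public void readFields(DataInput in) throws IOException {
                id = in.readUTF();
                value = in.readLong();
            }

            // Full sort order: by id, then by value.
            public int compareTo(CompositeKey o) {
                int cmp = id.compareTo(o.id);
                return cmp != 0 ? cmp : Long.compare(value, o.value);
            }
        }

        // Partition on the natural key only, so all records for an id reach one reducer.
        public static class NaturalKeyPartitioner extends Partitioner<CompositeKey, Text> {
            public int getPartition(CompositeKey key, Text val, int numPartitions) {
                return (key.id.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }

        // Group on the natural key only, so one reduce() call sees all values, sorted.
        public static class NaturalKeyGroupingComparator extends WritableComparator {
            protected NaturalKeyGroupingComparator() {
                super(CompositeKey.class, true);
            }

            public int compare(WritableComparable a, WritableComparable b) {
                return ((CompositeKey) a).id.compareTo(((CompositeKey) b).id);
            }
        }

        // Driver wiring:
        //   job.setPartitionerClass(NaturalKeyPartitioner.class);
        //   job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
    }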
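
Second, a minimal Hive generic UDF sketch; the function name and the null-handling rule are hypothetical stand-ins for the actual business logic:

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Text;

    // Hypothetical rule: normalize a code to upper case, mapping nulls to "UNKNOWN".
    public class NormalizeCode extends GenericUDF {

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentException("normalize_code takes exactly one argument");
            }
            // A production UDF would also validate the argument's type here.
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object arg = args[0].get();
            if (arg == null) {
                return new Text("UNKNOWN");
            }
            return new Text(arg.toString().trim().toUpperCase());
        }

        @Override
        public String getDisplayString(String[] children) {
            return "normalize_code(" + children[0] + ")";
        }
    }

In HiveQL such a UDF is registered with ADD JAR and CREATE TEMPORARY FUNCTION before use.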
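
Third, a minimal sketch of running a Hive query through Spark SQL. The SparkSession API shown here is from Spark releases later than the CDH4 cluster described above (which would have used HiveContext), and the table name is hypothetical:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveOnSparkQuery {
        public static void main(String[] args) {
            // enableHiveSupport() lets Spark SQL read the Hive metastore directly.
            SparkSession spark = SparkSession.builder()
                    .appName("HiveOnSparkQuery")
                    .enableHiveSupport()
                    .getOrCreate();

            // Hypothetical table; any HiveQL the cluster supports can run here.
            Dataset<Row> counts = spark.sql(
                    "SELECT category, COUNT(*) AS n FROM claims GROUP BY category");
            counts.show();
            spark.stop();
        }
    }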

Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Kafka, Spark, Flume, Storm, Knox, Linux, Pentaho, Scala, Maven, JavaScript, Oracle 11g/10g, SVN, Ganglia.

Confidential, New York, NY

Hadoop developer

Responsibilities:

  • Installed and set up Hadoop CDH clusters for the development and production environments.
  • Installed and configured Hive, Pig, Sqoop, Flume, Cloudera Manager, and Oozie on the Hadoop cluster.
  • Planned production cluster hardware and software installation and communicated with multiple teams to get it done.
  • OpenStack also has the benefit of thousands of developers all over the world working in tandem to develop the strongest, most robust, and most secure product possible.
  • Monitored multiple Hadoop cluster environments using Hortonworks tooling; monitored workload and job performance and collected metrics for the Hadoop cluster when required.
  • Installed Hadoop patches, updates, and version upgrades when required.
  • Installed and configured Cloudera Manager, Hive, Pig, Sqoop and Oozie on the CDH4 cluster.
  • Involved in implementing High Availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing ZooKeeper services.
  • Performed an upgrade in development environment from CDH 4.2 to CDH 4.6.
  • Worked with big data developers, designers, and scientists to troubleshoot MapReduce and Hive jobs and tune them for high performance.
  • Automated the end-to-end workflow, from data preparation to the presentation layer, for the Artist Dashboard project using shell scripting.
  • Provided input to Product Management to influence feature requirements for compute and networking in the VMware cloud offering.
  • Developed MapReduce programs that extracted and transformed datasets; the resulting datasets were loaded into Cassandra.
  • Orchestrated Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Conducted root-cause analysis (RCA) to find data issues and resolve production problems.
  • Loaded the generated files into MongoDB for fast access to a large customer base without taking a performance hit (see the sketch after this list).
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
  • Performed data analytics in Hive and then exported the metrics back to the Oracle database using Sqoop.
  • Involved in Minor and Major Release work activities.
  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements.
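
A minimal sketch of the MongoDB bulk load referenced above, assuming the MongoDB Java driver's newer client API (3.7+); the URI, database, collection, and field names are hypothetical:

    import java.util.ArrayList;
    import java.util.List;

    import org.bson.Document;

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;

    public class CustomerLoader {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> customers =
                        client.getDatabase("crm").getCollection("customers");

                // Batch the inserts; insertMany is far cheaper than one insert per record.
                List<Document> batch = new ArrayList<Document>();
                batch.add(new Document("customerId", 1001L).append("state", "NY"));
                batch.add(new Document("customerId", 1002L).append("state", "CT"));
                customers.insertMany(batch);

                // A supporting index keeps point lookups fast for a large customer base.
                customers.createIndex(new Document("customerId", 1));
            }
        }
    }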

Environment: Cloudera Hadoop, MapReduce, HDFS, Hortonworks, Cloudera Manager, Hive, Pig, Sqoop, Oozie, Flume, Linux, ZooKeeper, LDAP.

Confidential, Columbus, OH

Hadoop developer

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with major components of the Hadoop ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and ZooKeeper.
  • Implemented a six-node CDH4 Hadoop cluster on CentOS.
  • Imported and exported data between HDFS/Hive and different RDBMSs using Sqoop.
  • Defined job flows to run multiple MapReduce and Pig jobs using Oozie.
  • Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
  • Monitored the running MapReduce programs on the cluster.
  • Responsible for loading data from UNIX file systems to HDFS.
  • Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries.
  • Wrote APIs to read HBase tables, cleanse data, and write to another HBase table.
  • Created multiple Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
  • Wrote programs using the HBase client API (see the first sketch after this list).
  • Loaded data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop.
  • Designed, developed, tuned, and maintained NoSQL databases.
  • Wrote MapReduce programs in Python with the Hadoop Streaming API.
  • Developed unit test cases for Hadoop MapReduce jobs with MRUnit (see the second sketch after this list).
  • Performed ETL analysis and designed, developed, tested, and implemented ETL processes, including performance tuning and database query optimization.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the web UI.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Used Maven as the build tool and SVN for code management.
  • Worked on writing RESTful web services for the application.
  • Implemented testing scripts to support test-driven development and continuous integration.
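
First, a minimal sketch of the HBase client API usage referenced above. It uses the newer Connection/Table interface (CDH4-era code would have used HTable); the table, column family, and row key are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CustomerTableClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("customers"))) {

                // Write one cell: row key "cust-1001", column family "d", qualifier "state".
                Put put = new Put(Bytes.toBytes("cust-1001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("state"), Bytes.toBytes("OH"));
                table.put(put);

                // Read it back.
                Result result = table.get(new Get(Bytes.toBytes("cust-1001")));
                byte[] state = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("state"));
                System.out.println("state = " + Bytes.toString(state));
            }
        }
    }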
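
Second, a minimal MRUnit sketch of the kind of unit test referenced above; the mapper under test is a hypothetical token counter defined inline so the test is self-contained:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

    public class TokenMapperTest {

        // Minimal mapper under test: emits (token, 1) per whitespace-separated token.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    context.write(new Text(token), ONE);
                }
            }
        }

        @Test
        public void emitsOneCountPerToken() throws IOException {
            MapDriver.newMapDriver(new TokenMapper())
                    .withInput(new LongWritable(0), new Text("hadoop hive"))
                    .withOutput(new Text("hadoop"), new IntWritable(1))
                    .withOutput(new Text("hive"), new IntWritable(1))
                    .runTest(); // fails the test if actual output differs
        }
    }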

Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Impala, Pig, Java, SQL, Ganglia, Sqoop, Flume, Oozie, UNIX, JavaScript, Maven, Eclipse.

Confidential, Springfield, IL

Java / Hadoop Developer

Responsibilities:

  • Imported data from different relational data sources, such as Teradata, into HDFS using Sqoop.
  • Wrote transformer/mapping MapReduce pipelines using Apache Crunch and Java (see the first sketch after this list).
  • Imported bulk data into Cassandra using the Thrift API.
  • Created Hive tables, loaded them with data, and wrote Hive queries that invoke and run MapReduce jobs in the backend.
  • Performed analytics on time-series data stored in Cassandra using the Java API.
  • Designed and implemented incremental imports into Hive tables.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Collected, aggregated, and moved data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Managed and reviewed the Hadoop log files.
  • Migrated ETL jobs to Pig scripts that perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with the Avro data serialization system to handle JSON data formats (see the second sketch after this list).
  • Worked on different file formats, such as SequenceFiles, XML files, and MapFiles, using MapReduce programs.
  • Involved in unit testing; delivered unit test plans and results documents using JUnit and MRUnit.
  • Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization purposes.
  • Developed scripts that automated end-to-end data management and synchronization between all the clusters.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing jobs.
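
First, a minimal sketch of an Apache Crunch transformer pipeline as referenced above; the paths and the per-record transformation are hypothetical placeholders:

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;
    import org.apache.hadoop.conf.Configuration;

    public class LogTransformPipeline {
        public static void main(String[] args) {
            Pipeline pipeline = new MRPipeline(LogTransformPipeline.class, new Configuration());

            PCollection<String> lines = pipeline.readTextFile(args[0]);

            // Each DoFn becomes a mapper (or part of one) in the planned MapReduce jobs.
            PCollection<String> cleaned = lines.parallelDo(new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    String trimmed = line.trim();
                    if (!trimmed.isEmpty()) {
                        emitter.emit(trimmed.toLowerCase()); // placeholder transformation
                    }
                }
            }, Writables.strings());

            pipeline.writeTextFile(cleaned, args[1]);
            pipeline.done(); // triggers planning and execution of the MapReduce jobs
        }
    }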
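
Second, a minimal sketch of writing an Avro container file, as referenced above; the schema and field values are hypothetical:

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class AvroWriteSketch {
        public static void main(String[] args) throws IOException {
            // Avro schemas are themselves JSON; this record type is a made-up example.
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                    + "{\"name\":\"id\",\"type\":\"long\"},"
                    + "{\"name\":\"payload\",\"type\":\"string\"}]}");

            GenericRecord event = new GenericData.Record(schema);
            event.put("id", 42L);
            event.put("payload", "{\"source\":\"app\"}"); // raw JSON carried as a string

            // Writes a compact, splittable Avro container file with the schema embedded.
            DataFileWriter<GenericRecord> writer =
                    new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
            writer.create(schema, new File("events.avro"));
            writer.append(event);
            writer.close();
        }
    }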

Environment: Hadoop, HDFS, Hortonworks (HDP 2.1), MapReduce, Hive, Oozie, Sqoop, Pig, MySQL, Java, REST API, Maven, MRUnit, JUnit.

Confidential, Jacksonville, FL

Sr. Java Developer

Responsibilities:

  • Designed, developed, maintained, tested, and troubleshot Java and PL/SQL programs in support of Payroll employees.
  • Developed documentation for new and existing programs and designed specific enhancements to the application.
  • Implemented the web layer using JSF and ICEfaces.
  • Implemented the business layer using Spring MVC.
  • Implemented report retrieval based on start date using HQL (see the sketch after this list).
  • Implemented session management using the Hibernate SessionFactory.
  • Developed the DOs and DAOs using Hibernate.
  • Implemented a SOAP web service to validate ZIP codes using Apache Axis.
  • Wrote complex queries and PL/SQL stored procedures, functions, and packages to implement business rules.
  • Wrote a PL/SQL program to send email to a group from the backend.
  • Developed scripts triggered monthly to produce the current month's analysis.
  • Scheduled jobs to be triggered on a specific day and time.
  • Modified SQL statements to increase overall performance as part of basic performance tuning and exception handling.
  • Used cursors, arrays, tables, and BULK COLLECT concepts.
  • Extensively used Log4j for logging.
  • Performed unit testing in all environments.
  • Used Subversion as the version control system.
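
A minimal sketch of the SessionFactory and HQL pattern referenced above, assuming a hypothetical mapped Report entity with a startDate property (Hibernate 3-era API, matching the JDK 1.5 stack):

    import java.util.Date;
    import java.util.List;

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class ReportDao {
        // A SessionFactory is expensive to create; build it once and reuse it.
        private static final SessionFactory FACTORY =
                new Configuration().configure().buildSessionFactory(); // reads hibernate.cfg.xml

        @SuppressWarnings("unchecked")
        public List<Report> findReportsSince(Date startDate) {
            Session session = FACTORY.openSession();
            try {
                // HQL targets the (hypothetical) mapped Report entity, not the SQL table.
                return session
                        .createQuery("from Report r where r.startDate >= :start order by r.startDate")
                        .setParameter("start", startDate)
                        .list();
            } finally {
                session.close();
            }
        }
    }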

Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.

Confidential

Java/J2EE developer

Responsibilities:

  • Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
  • Developed Class diagrams, Sequence diagrams using Rational Rose.
  • Responsible for developing rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax, and GWT.
  • Developed the presentation layer using the Struts framework and performed validations using the Struts Validator plugin.
  • Created SQL scripts for the Oracle database.
  • Implemented the business logic using Java, Spring transactions, and Spring AOP.
  • Implemented the persistence layer using Spring JDBC to store and update data in the database (see the first sketch after this list).
  • Produced a web service using the WSDL/SOAP standard.
  • Implemented J2EE design patterns such as the Singleton pattern combined with the Factory pattern.
  • Extensively involved in the creation of session beans and MDBs using EJB 3.0.
  • Used Hibernate framework for Persistence layer.
  • Extensively involved in writing stored procedures for data retrieval, storage, and updates in the Oracle database using Hibernate.
  • Deployed and built the application using Maven.
  • Performed testing using JUnit.
  • Used JIRA to track bugs.
  • Extensively used Log4j for logging throughout the application.
  • Produced a RESTful web service with the Jersey implementation to provide customer information (see the second sketch after this list).
  • Used SVN for source code versioning and as the code repository.
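
First, a minimal sketch of the Spring JDBC persistence pattern referenced above; the table, columns, and connection settings are hypothetical:

    import javax.sql.DataSource;

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.jdbc.datasource.DriverManagerDataSource;

    public class CustomerJdbcDao {
        private final JdbcTemplate jdbc;

        public CustomerJdbcDao(DataSource dataSource) {
            this.jdbc = new JdbcTemplate(dataSource);
        }

        // Update-then-insert in two statements; table and columns are made up.
        public void saveEmail(long customerId, String email) {
            int updated = jdbc.update(
                    "UPDATE customer SET email = ? WHERE id = ?", email, customerId);
            if (updated == 0) {
                jdbc.update("INSERT INTO customer (id, email) VALUES (?, ?)", customerId, email);
            }
        }

        public String findEmail(long customerId) {
            return jdbc.queryForObject(
                    "SELECT email FROM customer WHERE id = ?", String.class, customerId);
        }

        public static void main(String[] args) {
            // DriverManagerDataSource suffices for a sketch; production code would use a pool.
            DriverManagerDataSource ds = new DriverManagerDataSource(
                    "jdbc:oracle:thin:@//localhost:1521/XE", "scott", "tiger");
            CustomerJdbcDao dao = new CustomerJdbcDao(ds);
            dao.saveEmail(7L, "user@example.com");
            System.out.println(dao.findEmail(7L));
        }
    }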
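
Second, a minimal JAX-RS resource sketch of the kind Jersey serves, as referenced above; the path, response shape, and lookup rule are hypothetical:

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    @Path("/customers")
    public class CustomerResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getCustomer(@PathParam("id") long id) {
            // Placeholder lookup; a real implementation would call a DAO/service layer.
            if (id <= 0) {
                return Response.status(Response.Status.NOT_FOUND).build();
            }
            String json = "{\"id\": " + id + ", \"name\": \"Sample Customer\"}";
            return Response.ok(json).build();
        }
    }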

Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.
