Hadoop Developer Resume

Richardson, TX

SUMMARY:

  • 8 years of IT experience in software development, including 4+ years of experience in Big Data Hadoop and NoSQL technologies across domains such as Automobile, Finance, Insurance, Healthcare and Telecom.
  • 4 years of experience in Hadoop working environments, covering MapReduce, HDFS, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig, YARN, Cassandra, Kafka, Spark and Flume.
  • Solid understanding of Hadoop Distributed File System.
  • Good experience with MapReduce (MR), Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark and Zookeeper for data extraction, processing, storage and analysis.
  • In-depth understanding of how MapReduce works and of the Hadoop infrastructure.
  • In-depth understanding of Hadoop architecture and its various components such as JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and MapReduce concepts.
  • Experience in developing custom MapReduce programs in Java using Apache Hadoop for analyzing Big Data (a minimal sketch follows this summary).
  • Extensively worked on Hive for ETL Transformations and optimized Hive Queries.
  • Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra and HBase.
  • Extended Hive and Pig core functionality by writing custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Developed Pig Latin scripts for data cleansing and Transformation.
  • Used Flume to channel data from different sources to HDFS.
  • Job workflow scheduling and monitoring using tools like Oozie.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and NoSQL sources.
  • Good experience in Cloudera, Hortonworks & Apache Hadoop distributions.
  • Worked with relational database systems (RDBMS) such as MySQL, MSSQL, Oracle and NoSQL database systems like HBase and Cassandra.
  • Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Used shell scripting to move log files into HDFS.
  • Good understanding of real-time data processing using Spark.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with different file formats like CSV, text files, Sequence files, XML, JSON and Avro files.
  • Good knowledge of data modelling and data mining to model data as per business requirements.
  • Involved in unit testing of MapReduce programs using Apache MRUnit.
  • Good knowledge of Python and Bash scripting languages.
  • Developed simple to complex MapReduce streaming jobs using Python that were integrated with Hive and Pig.
  • Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading and storing of data.
  • Performed real-time streaming of data using Spark with Kafka and stored the streamed data to HDFS using Scala.
  • Involved in creating database objects like tables, views, procedures, triggers and functions using T-SQL to provide definition and structure and to maintain data efficiently.
  • Expert in data visualization development using Tableau to create complex and innovative dashboards.
  • Generated ETL reports using Tableau and created statistics dashboards for Analytics.
  • Reported and classified bugs and played a major role in carrying out different types of tests, viz. smoke, functional, integration, system, data comparison and regression testing.
  • Experience in creating Master Test Plans, Test Cases, Test Result Reports and Requirements Traceability Matrices, and in creating Status Reports and submitting them to project management.
  • Strong hands on experience in MVC frameworks and Spring MVC.
  • Good at designing and developing Data Access Layer modules with the Hibernate framework for new functionality.
  • Extensive experience working with IDEs like Eclipse, NetBeans and EditPlus.
  • Working knowledge of Agile and waterfall development models.
  • Working experience in all SDLC Phases.
  • Extensively used Java and J2EE technologies like Core Java, Java Beans, Servlets, JSP, Spring, Hibernate, JDBC, JSON objects and design patterns.
  • Experienced in Application Development using Java, J2EE, JSP, Servlets, RDBMS, Tag Libraries, JDBC, Hibernate, XML and Linux shell scripting.
  • Worked with version control, bug tracking and code review systems like CVS, ClearCase and Jira.
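
A minimal, self-contained sketch of the kind of custom Java MapReduce program described above, counting records per event type. The class names, tab-delimited field layout and paths are illustrative assumptions, not details from any specific engagement:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCountJob {

        // Mapper: emits (event type, 1) for each tab-delimited input record.
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventType = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                eventType.set(fields[0]);
                context.write(eventType, ONE);
            }
        }

        // Reducer: sums the counts emitted for each event type.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event-count");
            job.setJarByClass(EventCountJob.class);
            job.setMapperClass(EventMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory must not exist
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }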

TECHNICAL SKILLS:

Big data/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Storm and Avro

Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP, WSDL

Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts.

NoSQL Databases: MongoDB, Cassandra, HBase

Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.

Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, SOAP

Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.

Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP

Operating System: Ubuntu (Linux), Win 95/98/2000/XP, Mac OS, RedHat

ETL Tools: Informatica, Pentaho.

Testing: Hadoop Testing, Hive Testing, Quality Center (QC)

Monitoring and Reporting tools: Ganglia, Nagios, Custom Shell scripts.

PROFESSIONAL EXPERIENCE:

Confidential, Richardson, TX

Hadoop Developer

Responsibilities:

  • Imported data from different relational data sources like RDBMS and Teradata to HDFS using Sqoop.
  • Imported bulk data into HBase using MapReduce programs.
  • Performed analytics on time series data stored in HBase using the HBase API.
  • Designed and implemented incremental imports into Hive tables.
  • Used the REST API to access HBase data and perform analytics.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Experienced with batch processing of data sources using Apache Spark, Elastic search.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Worked with cloud services like Amazon Web Services (AWS).
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects and perform file manipulation operations (see the sketch after this list).
  • Configured a 20-30 node (Amazon EC2 spot instance) Hadoop cluster to transfer data between Amazon S3 and HDFS and also to direct input and output to the Hadoop MapReduce framework.
  • Imported data from different sources like HDFS/HBase into Spark RDDs; developed a data pipeline using Kafka and Storm to store data into HDFS and performed real-time analysis on the incoming data.
  • Prepared scripts to ensure proper data access, manipulation and reporting functions with the R programming language.
  • Formulated procedures for integration of R programming plans with data sources and delivery systems.
  • Developed a scalable queuing system to accommodate the ever-growing message flows across the systems using Amazon Simple Queue Service and Akka actor models.
  • Wrote internal and external API services using Node.js modules.
  • Experience with Primavera integration tools such as Primavera Gateway, and custom integration of Primavera with ERP applications.
  • Developed simple to complex MapReduce streaming jobs using Python that were integrated with Hive and Pig.
  • Experienced in managing and reviewing the Hadoop log files.
  • Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using Amazon S3 for input and output.
  • Involved in Data Asset Inventory to gather, analyze and document business requirements, functional requirements and data specifications for Member Retention from SQL/Hadoop sources.
  • Involved in automation of clickstream data collection and storage into HDFS using Flume.
  • Worked on resolving performance issues and query limits on workbooks connected to a live database by using the data extract option in Tableau.
  • Designed and developed Dashboards for Analytical purposes using Tableau.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data onto HDFS.
  • Worked with the Avro data serialization system to handle JSON data formats.
  • Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
  • Worked on Oozie workflow engine for job scheduling.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
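
As referenced above, a hedged sketch of uploading a file to Amazon S3 and listing bucket contents with the AWS SDK for Java (v1). The bucket name, object key and local path are hypothetical placeholders:

    import java.io.File;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.ObjectListing;
    import com.amazonaws.services.s3.model.PutObjectRequest;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    public class S3FileService {
        public static void main(String[] args) {
            // Credentials are resolved by the default provider chain (env vars, profile, or IAM role).
            AmazonS3 s3 = new AmazonS3Client();

            // Hypothetical bucket, key and local file, for illustration only.
            String bucket = "example-data-bucket";
            s3.putObject(new PutObjectRequest(bucket, "incoming/data.csv", new File("/tmp/data.csv")));

            // List the objects currently stored in the bucket.
            ObjectListing listing = s3.listObjects(bucket);
            for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
            }
        }
    }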

Environment: CDH 5.3, MapReduce, Hive 0.14, Spark 1.4.1, Oozie, Sqoop, Pig 0.11, Java, REST API, Maven, MRUnit, JUnit, Tableau, Cloudera, Python.

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

  • Involved in automation of clickstream data collection and storage into HDFS using Flume.
  • Involved in creating a Data Lake by extracting customers' data from various data sources into HDFS.
  • Used Sqoop to load data from Oracle databases into HDFS.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
  • Employed an Oracle database to create and maintain a Data Mart.
  • Contributed to creating ETL designs and provided end users easy access to Data Marts.
  • Involved in creating Hive tables as per requirement defined with appropriate static and dynamic partitions.
  • Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
  • Involved in production Hadoop cluster setup, administration, maintenance, monitoring and support.
  • Performed real-time streaming of data using Spark with Kafka and stored the streamed data to HDFS using Scala.
  • Developed simple to complex MapReduce streaming jobs using Python that were integrated with Hive and Pig.
  • Involved in automation of clickstream data collection and storage into HDFS using Flume.
  • Experience with Primavera integration tools such as Primavera Gateway, and custom integration of Primavera with ERP applications.
  • Developed a scalable queuing system to accommodate the ever-growing message flows across the systems using Amazon Simple Queue Service and Akka actor models.
  • Worked with cloud services like Amazon Web Services (AWS).
  • Wrote internal and external API services using Node.js modules.
  • Logical implementation and interaction with HBase.
  • Cluster coordination services through Zookeeper.
  • Efficiently put and fetched data to/from HBase by writing MapReduce jobs (see the sketch after this list).
  • Developed MapReduce jobs to automate transfer of data from/to HBase.
  • Created data queries and reports using QlikView and Excel. Created custom queries/reports designed for qualifying verification and information sharing.
  • Assisted with the addition of Hadoop processing to the IT infrastructure.
  • Used Flume to collect the entire web log from the online ad servers and push it into HDFS.
  • Implemented and executed MapReduce jobs to process the log data from the ad servers.
  • Extensively used Core Java, Servlets, JSP and XML.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Back-end Java developer for the Data Management Platform (DMP); built RESTful APIs to build dashboards and to let other groups build dashboards.
  • Responsible for building Scalable distributed data solutions using HortonWorks.
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
  • Worked closely with architects and clients to define and prioritize use cases and develop APIs.
  • Involved in monitoring job performance, capacity planning and workload using Cloudera Manager.
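
As referenced above, a hedged sketch of putting and fetching a single cell through the HBase Java client API (HBase 0.9x-era HTable interface). The table name, column family and row key are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseAccess {
        public static void main(String[] args) throws Exception {
            // Reads Zookeeper quorum and other settings from hbase-site.xml on the classpath.
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "web_events");  // hypothetical table name

            // Write one cell: row "row-001", column family "cf", qualifier "url".
            Put put = new Put(Bytes.toBytes("row-001"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("url"), Bytes.toBytes("/home"));
            table.put(put);

            // Read the same cell back.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("url"))));

            table.close();
        }
    }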

Environment: Hadoop, Pig 0.10, Sqoop, Oozie, MapReduce, HDFS, HBase, Hive 0.10, Core Java, Eclipse, QlikView, Flume, Cloudera, Hortonworks, Oracle 10g, UNIX shell scripting, Cassandra.

Confidential, Penfield, NY

Hadoop Developer

Responsibilities:

  • Worked on Hadoop cluster (CDH 5) with 30 nodes.
  • Worked with highly semi-structured and structured data of 90TB with replication factor 3.
  • Extracted the data from Oracle, MySQL and SQL Server databases into HDFS using Sqoop.
  • Extracted data from weblogs and social media using Flume and loaded it into HDFS.
  • Created jobs in Sqoop with incremental load and populated Hive tables.
  • Developed software to process, cleanse and report on vehicle data utilizing various analytics and REST API languages like Java, Scala and Akka (asynchronous programming framework).
  • Involved in importing real-time data to Hadoop using Kafka and implemented Oozie jobs for daily imports (see the sketch after this list).
  • Involved in automation of clickstream data collection and storage into HDFS using Flume.
  • Wrote internal and external API services using Node.js modules.
  • Experience with Primavera integration tools such as Primavera Gateway, and custom integration of Primavera with ERP applications.
  • Worked with cloud services like Amazon Web Services (AWS).
  • Involved in developing the Asset Tracking project, in which real-time vehicle location data was collected from a JMS queue using IBM Streams and processed for vehicle tracking using ESRI GIS mapping software, Scala and the Akka actor model.
  • Involved in developing web services using REST, the HBase native API and the BigSQL client to query data from HBase.
  • Experienced in Developing Hive queries in BigSQL Client for various use cases.
  • Involved in developing a few shell scripts and automated them using the cron job scheduler.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Responsible for managing data coming from different sources.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
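
As referenced above, a hedged sketch of consuming the real-time feed with the Kafka Java consumer API (the newer client API is used here purely for illustration). The broker address, group id and topic name are hypothetical, and the records are simply printed rather than landed in Hadoop:

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class VehicleEventConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            props.put("group.id", "vehicle-ingest");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("vehicle-events"));  // hypothetical topic

            // Poll the topic and process each record as it arrives.
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        }
    }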

Environment: Hadoop 1.x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL, Java

Confidential, Memphis, TN

Hadoop Developer

Responsibilities:

  • Worked with structured and semi-structured data of approximately 100 TB with a replication factor of 3.
  • Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig and Hive programs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Extensively used Hive/HQL queries to query or search for particular strings in Hive tables in HDFS.
  • Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality (see the sketch after this list).
  • Involved in automation of clickstream data collection and storage into HDFS using Flume.
  • Experience with Primavera integration tools such as Primavera Gateway, and custom integration of Primavera with ERP applications.
  • Created HBase tables to store various formats of data coming from different portfolios.
  • Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
  • Used Flume extensively for gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Experienced with SOLR for indexing and search.
  • Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which of them better suits the current requirements.
  • Used the file system check (fsck) to check the health of files in HDFS.
  • Developed UNIX shell scripts for creating reports from Hive data.
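
As referenced above, a hedged sketch of a custom Hive UDF in Java. The function name and normalization logic are hypothetical examples of extending Hive's built-in functionality:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with:
    //   ADD JAR udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_portfolio AS 'NormalizePortfolioUDF';
    public final class NormalizePortfolioUDF extends UDF {

        // Trims and upper-cases a portfolio code; returns null for null input.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }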

Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, Kafka, HBase, Cassandra, Cloudera Distribution, Oozie, Ambari, Ganglia, Yarn, Shell scripting

Confidential - Plano, TX

Java/J2EE/Hadoop Developer

Responsibilities:

  • Participated in requirement gathering and converting the requirements into technical specifications.
  • Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
  • Developed the application using Struts Framework that leverages classical Model View Controller ( MVC ) architecture.
  • Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
  • Created business logic using Servlets and POJOs and deployed them on WebLogic Server.
  • Wrote complex SQL queries and stored procedures.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
  • Responsible for managing data coming from different sources.
  • Developed MapReduce algorithms.
  • Gained good experience with NoSQL databases.
  • Involved in loading data from the UNIX file system to HDFS (see the sketch after this list).
  • Installed and configured Hive and also wrote Hive UDFs.
  • Integrated Hadoop with Solr and implemented search algorithms.
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 10g database.
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
  • Used the Struts validation framework for form-level validation.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in creating templates and screens in HTML and JavaScript.
  • Involved in integrating Web Services using SOAP.
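
As referenced above, a hedged sketch of loading a file from the UNIX file system into HDFS with the Hadoop FileSystem Java API. The local and HDFS paths are hypothetical, and the NameNode address is assumed to come from core-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalToHdfsLoader {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS and other settings from core-site.xml on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical paths, for illustration only.
            Path local = new Path("/var/logs/app/app.log");
            Path target = new Path("/data/raw/logs/app.log");

            // Copies the local file into HDFS, creating parent directories as needed.
            fs.copyFromLocalFile(local, target);
            fs.close();
        }
    }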

Environment: Hive 0.7.1, Apache Solr 3.x, HBase 0.90.x/0.20.x, JDK 1.5, Struts 1.3, WebSphere 6.1, HTML, XML, JavaScript, JUnit 3.8, Oracle 10g, Amazon Web Services.

Confidential, McLean, VA

Java/J2EE Developer

Responsibilities:

  • Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications
  • Gathered and analyzed information for developing, supporting, and modifying existing web applications based on prioritized business needs
  • Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA)
  • Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages
  • Played a key role in designing the presentation tier components by customizing the Spring framework components, which includes configuring web modules, request processors, error handling components, etc.
  • Implemented the Web Services functionality in the application to allow external applications to access data
  • Used Apache Axis as the Web Service framework for creating and deploying Web Service clients using SOAP and WSDL
  • Worked on Spring to develop different modules to assist the product in handling different requirements
  • Developed validation using Spring's Validation interface and used Spring Core and MVC to develop the applications and access data
  • Implemented Spring Beans using IoC and transaction management features to handle the transactions and business logic
  • Designed and developed different PL/SQL blocks and stored procedures in the DB2 database
  • Involved in writing the DAO layer using Hibernate to access the database (see the sketch after this list)
  • Involved in deploying and testing the application using WebSphere Application Server
  • Developed and implemented several test cases using JUnit framework
  • Involved in troubleshoot technical issues, conduct code reviews, and enforce best practices
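
As referenced above, a hedged sketch of a Hibernate-based DAO. Applicant is a hypothetical mapped entity standing in for the actual domain objects, and the session handling shown is only one common pattern:

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    public class ApplicantDao {
        private final SessionFactory sessionFactory;

        public ApplicantDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        // Persists an applicant record; Applicant is a hypothetical Hibernate-mapped entity.
        public void save(Applicant applicant) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.save(applicant);
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            } finally {
                session.close();
            }
        }

        // Loads an applicant by primary key; returns null if no row exists.
        public Applicant findById(Long id) {
            Session session = sessionFactory.openSession();
            try {
                return (Applicant) session.get(Applicant.class, id);
            } finally {
                session.close();
            }
        }
    }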

Environment: Java SE 6, J2EE 6, JSP 2.1, Servlets 2.5, Java Script, IBM Websphere7, DB2, HTML, XML, Spring 3, Hibernate 3, JUnit, Windows 7, Eclipse 3.5

Confidential, Seattle, WA

Java/J2EE Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC) such as design, development and unit testing.
  • Developed and deployed UI-layer logic of sites using JSP, XML, JavaScript, HTML/DHTML and Ajax.
  • CSS and JavaScript were used to build rich internet pages.
  • The Agile Scrum methodology was followed for the development process.
  • Designed different design specifications for application development that includes front-end, back-end using design patterns.
  • Developed proto-type test screens in HTML and JavaScript.
  • Involved in developing JSPs for client data presentation and client-side data validation within the forms.
  • Developed the application by using the Spring MVC framework.
  • The Collections framework was used to transfer objects between the different layers of the application.
  • Developed data mapping to create a communication bridge between various application interfaces using XML and XSL.
  • Spring IoC was used to inject the values for dynamic parameters.
  • Developed a JUnit testing framework for unit-level testing.
  • Actively involved in code review and bug fixing for improving the performance.
  • Documented application for its functionality and its enhanced features.
  • Created connections through JDBC and used JDBC statements to call stored procedures (see the sketch after this list).
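
As referenced above, a hedged sketch of calling a stored procedure through JDBC. The connection URL, credentials and procedure name are hypothetical placeholders:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class StoredProcCaller {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details, for illustration only.
            String url = "jdbc:oracle:thin:@dbhost:1521:orcl";
            try (Connection conn = DriverManager.getConnection(url, "app_user", "secret")) {
                // {call proc(?, ?)} is the standard JDBC escape syntax for stored procedures.
                try (CallableStatement stmt = conn.prepareCall("{call update_order_status(?, ?)}")) {
                    stmt.setLong(1, 1001L);        // order id
                    stmt.setString(2, "SHIPPED");  // new status
                    stmt.execute();
                }
            }
        }
    }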

Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.

Confidential

Application Developer

Responsibilities:

  • Developed the application under the JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS and JavaScript.
  • Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
  • Developed the application server's persistence layer using JDBC and SQL.
  • Used JDBC to connect the web applications to databases.
  • Implemented a test-first unit testing approach using JUnit.
  • Developed and utilized J2EE services and JMS components for messaging communication in WebLogic (see the sketch after this list).
  • Configured the development environment using the WebLogic application server for developers' integration testing.
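
As referenced above, a hedged sketch of sending a message through JMS from a WebLogic-hosted component. The JNDI names and message payload are hypothetical, and the default InitialContext is assumed to resolve the container's JNDI tree:

    import javax.jms.Queue;
    import javax.jms.QueueConnection;
    import javax.jms.QueueConnectionFactory;
    import javax.jms.QueueSender;
    import javax.jms.QueueSession;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class OrderMessageSender {
        public static void main(String[] args) throws Exception {
            // Hypothetical JNDI names; they would match the WebLogic JMS configuration.
            InitialContext ctx = new InitialContext();
            QueueConnectionFactory factory =
                    (QueueConnectionFactory) ctx.lookup("jms/OrderConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

            QueueConnection connection = factory.createQueueConnection();
            QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);

            // Send a simple text message to the queue.
            TextMessage message = session.createTextMessage("order-created:1001");
            sender.send(message);

            sender.close();
            session.close();
            connection.close();
        }
    }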

Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, Java Script, Web Logic 8.0, HTML, JDBC 3.0, XML, JMS, log4j, Junit, Servlets, MVC
