Hadoop Developer Resume
Richardson, TX
SUMMARY:
- 8 years of IT experience in software development, including 4+ years of experience in Big Data Hadoop and NoSQL technologies across domains such as Automobile, Finance, Insurance, Healthcare and Telecom.
- 4 years of experience in Hadoop working environments, covering MapReduce, HDFS, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig, YARN, Cassandra, Kafka, Spark and Flume.
- Solid understanding of Hadoop Distributed File System.
- Good experience with MapReduce (MR), Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark and Zookeeper for data extraction, processing, storage and analysis.
- In-depth understanding of how MapReduce works and of the Hadoop infrastructure.
- In-depth understanding of Hadoop architecture and its components such as JobTracker, TaskTracker, NameNode, DataNode and ResourceManager, and of MapReduce concepts.
- Experience in developing custom MapReduce programs in Java using Apache Hadoop for analyzing Big Data (a brief sketch follows this summary).
- Extensively worked on Hive for ETL transformations and optimized Hive queries.
- Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice versa.
- Successfully loaded files to Hive and HDFS from MongoDB, Cassandra and HBase.
- Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs).
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed Pig Latin scripts for data cleansing and Transformation.
- Used Flume to channel data from different sources to HDFS.
- Job workflow scheduling and monitoring using tools like Oozie.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX and NoSQL sources.
- Good experience in Cloudera, Hortonworks & Apache Hadoop distributions.
- Worked with relational database systems (RDBMS) such as MySQL, MSSQL, Oracle and NoSQL database systems like HBase and Cassandra.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Used shell scripting to move log files into HDFS.
- Good understanding of real-time data processing using Spark.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Experienced with different file formats like CSV, text files, sequence files, XML, JSON and Avro files.
- Good knowledge of data modeling and data mining to model the data as per business requirements.
- Involved in unit testing of MapReduce programs using Apache MRUnit.
- Good knowledge of Python and Bash scripting languages.
- Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
- Designed and implemented Hive and Pig UDFs using Python for evaluating, filtering, loading and storing data.
- Streamed real-time data using Spark with Kafka and stored the streamed data to HDFS using Scala.
- Involved in creating database objects like tables, views, procedures, triggers and functions using T-SQL to provide definition and structure and to maintain data efficiently.
- Expert in data visualization development using Tableau to create complex and innovative dashboards.
- Generated ETL reports using Tableau and created statistics dashboards for Analytics.
- Reported bugs by classifying them and played a major role in carrying out different types of tests, viz. Smoke, Functional, Integration, System, Data Comparison and Regression testing.
- Experience in creating Master Test Plans, Test Cases, Test Result Reports and Requirements Traceability Matrices, and in creating status reports and submitting them to project management.
- Strong hands-on experience with MVC frameworks and Spring MVC.
- Skilled in designing and developing data access layer modules for new functionality using the Hibernate framework.
- Extensive experience working with IDEs like Eclipse, NetBeans and EditPlus.
- Working knowledge of Agile and waterfall development models.
- Working experience in all SDLC Phases.
- Extensively used Java and J2EE technologies like Core Java, Java Beans, Servlets, JSP, Spring, Hibernate, JDBC, JSON objects and design patterns.
- Experienced in Application Development using Java, J2EE, JSP, Servlets, RDBMS, Tag Libraries, JDBC, Hibernate, XML and Linux shell scripting.
- Worked with software version control, bug tracking and code review systems such as CVS, ClearCase and Jira.
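For context on the custom Java MapReduce development noted in this summary, the following is a minimal, illustrative sketch (not taken from any specific engagement): a mapper/reducer pair that counts occurrences of the first column of a tab-delimited log file. The class names, delimiter and field layout are assumptions made for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Counts occurrences of the first (key) column in a tab-delimited log file. */
public class EventCountJob {

    public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text eventKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                eventKey.set(fields[0]);          // e.g. an event type or customer id
                context.write(eventKey, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable count : counts) {
                total += count.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);   // a combiner is safe here because the reduce is a pure sum
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```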
TECHNICAL SKILLS:
Big data/Hadoop Ecosystem: HDFS, Map Reduce, HIVE, PIG, HBase, Sqoop, Flume, Oozie, Storm and Avro
Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML, REST, SOAP, WSDL
Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Linux shell scripts.
NoSQL Databases: MongoDB, Cassandra, HBase
Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, SOAP
Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.
Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP
Operating System: Ubuntu (Linux), Win 95/98/2000/XP, Mac OS, RedHat
ETL Tools: Informatica, Pentaho.
Testing: Hadoop Testing, Hive Testing, Quality Center (QC)
Monitoring and Reporting tools: Ganglia, Nagios, Custom Shell scripts.
PROFESSIONAL EXPERIENCE:
Confidential, Richardson, TX
Hadoop Developer
Responsibilities:
- Imported data from different relational data sources like RDBMS and Teradata to HDFS using Sqoop.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time series data stored in HBase using the HBase API.
- Designed and implemented incremental imports into Hive tables.
- Used the REST API to access HBase data and perform analytics.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
- Worked with cloud services like Amazon Web Services (AWS).
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects and perform file manipulation operations.
- Configured a 20-30 node Hadoop cluster (Amazon EC2 spot instances) to transfer data from Amazon S3 to HDFS and from HDFS to Amazon S3, and also to direct input and output to the Hadoop MapReduce framework.
- Imported data from different sources like HDFS/HBase into Spark RDDs; developed a data pipeline using Kafka and Storm to store data into HDFS and performed real-time analysis on the incoming data (a Spark Streaming sketch follows this list).
- Prepared scripts to ensure proper data access, manipulation and reporting functions with the R programming language.
- Formulated procedures for integration of R programming plans with data sources and delivery systems.
- Developed a scalable queuing system to accommodate the ever-growing message flows across the systems using Amazon Simple Queue Service (SQS) and Akka actor models.
- Wrote internal and external API services using Node.js modules.
- Experience with Primavera integration tools such as Primavera Gateway, as well as custom integration of Primavera with ERP applications.
- Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
- Experienced in managing and reviewing the Hadoop log files.
- Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using Amazon S3 for input and output.
- Involved in a data asset inventory to gather, analyze and document business requirements, functional requirements and data specifications for Member Retention from SQL/Hadoop sources.
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Worked on resolving performance issues and query limits in workbooks connected to live databases by using the data extract option in Tableau.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data onto HDFS.
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked on different file formats like sequence files, XML files and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Worked on Oozie workflow engine for job scheduling.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
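The Kafka-to-HDFS pipeline work above can be illustrated with a minimal Java sketch in the style of the Spark 1.4-era spark-streaming-kafka (0.8) direct API listed in this environment; the broker address, topic name, batch interval and output path are placeholder assumptions, not project specifics.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple2;

/** Reads click events from a Kafka topic and writes each micro-batch to HDFS as text files. */
public class ClickstreamToHdfs {

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("clickstream-to-hdfs");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "broker1:9092");              // placeholder broker
        Set<String> topics = new HashSet<String>(Arrays.asList("clickstream")); // placeholder topic

        // Direct (receiver-less) stream of (key, value) pairs from Kafka
        JavaPairInputDStream<String, String> kafkaStream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Keep only the message payloads and persist each batch under an HDFS prefix
        JavaDStream<String> events = kafkaStream.map(
                (Tuple2<String, String> record) -> record._2());
        events.dstream().saveAsTextFiles("hdfs://nameservice1/data/clickstream/batch", "txt");

        jssc.start();
        jssc.awaitTermination();
    }
}
```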
Environment: CDH 5.3, MapReduce, Hive 0.14, Spark 1.4.1, Oozie, Sqoop, Pig 0.11, Java, REST API, Maven, MRUnit, JUnit, Tableau, Cloudera, Python.
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Involved in creating a data lake by extracting customer data from various data sources into HDFS.
- Used Sqoop to load data from an Oracle database into HDFS.
- Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
- Employed an Oracle database to create and maintain a data mart.
- Contributed to creating ETL designs and provided end users easy access to data marts.
- Involved in creating Hive tables as per requirements, defined with appropriate static and dynamic partitions.
- Used Hive to analyze the data in HDFS and identify issues and behavioral patterns.
- Involved in production Hadoop cluster setup, administration, maintenance, monitoring and support.
- Streamed real-time data using Spark with Kafka and stored the streamed data to HDFS using Scala.
- Developed simple to complex MapReduce streaming jobs using Python, integrated with Hive and Pig.
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Experience with Primavera integration tools such as Primavera Gateway, as well as custom integration of Primavera with ERP applications.
- Developed a scalable queuing system to accommodate the ever-growing message flows across the systems using Amazon Simple Queue Service (SQS) and Akka actor models.
- Worked with cloud services like Amazon Web Services (AWS).
- Wrote internal and external API services using Node.js modules.
- Logical implementation and interaction with HBase.
- Cluster coordination services through Zookeeper.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs (a brief HBase client sketch follows this list).
- Developed MapReduce jobs to automate the transfer of data to and from HBase.
- Created data queries and reports using QlikView and Excel; created custom queries/reports designed for qualifying verification and information sharing.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume to collect the entire web log from the online ad servers and push it into HDFS.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Extensively used Core Java, Servlets, JSP and XML
- Load and transform large sets of structured, semi structured and unstructured data.
- Back-end Java developer for the Data Management Platform (DMP), building RESTful APIs to create dashboards and to let other groups build dashboards.
- Responsible for building scalable distributed data solutions using Hortonworks.
- Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Worked closely with architects and clients to define and prioritize use cases and develop APIs.
- Involved in monitoring job performance, capacity planning and workload using Cloudera Manager.
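As a minimal illustration of the HBase put/fetch work above, the sketch below uses the HBase 1.x-style Java client API; the table name, column family, row-key layout and values are placeholder assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Writes one click event to HBase and reads it back. */
public class HBaseClickStore {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("clicks"))) {   // placeholder table

            // Put: a row key of userId plus a timestamp suffix is a common time-series pattern
            byte[] rowKey = Bytes.toBytes("user123#20150601120000");
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/home"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("200"));
            table.put(put);

            // Get the row back and read a single column
            Result result = table.get(new Get(rowKey));
            String url = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("url")));
            System.out.println("url = " + url);
        }
    }
}
```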
Environment: Hadoop, Pig 0.10, Sqoop, Oozie, MapReduce, HDFS, HBase, Hive 0.10, Core Java, Eclipse, QlikView, Flume, Cloudera, Hortonworks, Oracle 10g, UNIX Shell Scripting, Cassandra.
Confidential, Penfield, NY
Hadoop Developer.
Responsibilities:
- Worked on Hadoop cluster (CDH 5) with 30 nodes.
- Worked with highly semi-structured and structured data of 90 TB with a replication factor of 3.
- Extracted data from Oracle, MySQL and SQL Server databases into HDFS using Sqoop.
- Extracted data from weblogs and social media using Flume and loaded it into HDFS.
- Created jobs in Sqoop with incremental load and populated Hive tables.
- Developed software to process, cleanse and report on vehicle data utilizing analytics and REST APIs with Java, Scala and Akka (an asynchronous programming framework).
- Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports (a brief Kafka producer sketch follows this list).
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Wrote internal and external API services using Node.js modules.
- Experience with Primavera integration tools such as Primavera Gateway, as well as custom integration of Primavera with ERP applications.
- Worked with cloud services like Amazon Web Services (AWS).
- Involved in developing the Asset Tracking project, collecting real-time vehicle location data from a JMS queue using IBM Streams and processing that data for vehicle tracking using ESRI GIS mapping software, Scala and the Akka actor model.
- Involved in developing web services using REST, the HBase native API and the BigSQL client to query data from HBase.
- Experienced in Developing Hive queries in BigSQL Client for various use cases.
- Involved in developing shell scripts and automating them using the cron job scheduler.
- Implemented test scripts to support test-driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
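A minimal sketch of the kind of Kafka-based real-time ingestion referred to above, using the standard Kafka Java producer client; the broker address, topic name and JSON payload are placeholder assumptions.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Publishes vehicle location events to a Kafka topic for downstream Hadoop ingestion. */
public class VehicleEventProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                          // wait for full acknowledgement

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String payload = "{\"vehicleId\":\"V-42\",\"lat\":35.15,\"lon\":-90.05}";   // illustrative event
            producer.send(new ProducerRecord<>("vehicle-locations", "V-42", payload));
            producer.flush();
        }
    }
}
```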
Environment: Hadoop 1x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, IBM InfoSphere BigInsights, IBM InfoSphere Streams, IBM BigSQL, Java
Confidential, Memphis, TN
Hadoop Developer
Responsibilities:
- Worked with structured and semi-structured data of approximately 100 TB with a replication factor of 3.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Hive/HQL queries to query or search for particular strings in Hive tables in HDFS.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality (a brief Hive UDF sketch follows this list).
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Experience with Primavera integration tools such as Primavera Gateway, as well as custom integration of Primavera with ERP applications.
- Created HBase tables to store data in various formats coming from different portfolios.
- Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
- Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Experienced with Solr for indexing and search.
- Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which one better suited the current requirements.
- Used the file system check (fsck) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
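The custom Hive UDF work above can be illustrated with a minimal Java sketch based on Hive's simple UDF API; the function name and masking behavior are illustrative assumptions rather than a specific production function.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Masks all but the last four characters of a value, e.g. an account number. */
@Description(name = "mask_value",
        value = "_FUNC_(str) - replaces all but the last 4 characters of str with '*'")
public class MaskValueUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // Hive passes NULLs through
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - 4));
        return new Text(masked.toString());
    }
}
```

Once packaged in a JAR, such a function would typically be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.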
Environment: Java, UNIX, HDFS, Pig, Hive, Spark, Scala, MapReduce, Flume, Sqoop, Kafka, HBase, Cassandra, Cloudera Distribution, Oozie, Ambari, Ganglia, Yarn, Shell scripting
Confidential - Plano, TX
Java/J2EE/Hadoop Developer
Responsibilities:
- Participated in requirement gathering and converting the requirements into technical specifications.
- Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
- Developed the application using Struts Framework that leverages classical Model View Controller ( MVC ) architecture.
- Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
- Created business logic using Servlets and POJOs and deployed them on the WebLogic server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Web services for the data maintenance and structures.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Responsible for managing data coming from different sources.
- Developed MapReduce algorithms.
- Gained good experience with NoSQL databases.
- Involved in loading data from the UNIX file system to HDFS (a brief sketch follows this list).
- Installed and configured Hive and wrote Hive UDFs.
- Integrated Hadoop with Solr and implemented search algorithms.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 10g database.
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in creating templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using SOAP.
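Loading data from the UNIX file system into HDFS, as mentioned above, is commonly done with the hadoop fs -put command or programmatically; below is a minimal Java sketch using the Hadoop FileSystem API, with placeholder local and HDFS paths.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Copies a local log file into HDFS using the Hadoop FileSystem API. */
public class LocalToHdfsLoader {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem hdfs = FileSystem.get(conf);

        Path localFile = new Path("/var/log/app/app.log");   // placeholder local path
        Path hdfsDir = new Path("/data/raw/applogs/");        // placeholder HDFS target directory

        if (!hdfs.exists(hdfsDir)) {
            hdfs.mkdirs(hdfsDir);
        }
        // copyFromLocalFile(delSrc = false, overwrite = true, src, dst)
        hdfs.copyFromLocalFile(false, true, localFile, hdfsDir);
        hdfs.close();
    }
}
```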
Environment: Hive 0.7.1, Apache Solr 3.x, HBase 0.90.x/0.20.x, JDK 1.5, Struts 1.3, WebSphere 6.1, HTML, XML, JavaScript, JUnit 3.8, Oracle 10g, Amazon Web Services.
Confidential, McLean, VA
Java/J2EE Developer
Responsibilities:
- Responsible for gathering business and functional requirements for the development and support of in-house and vendor developed applications
- Gathered and analyzed information for developing, supporting, and modifying existing web applications based on prioritized business needs
- Played key role in design and development of new application using J2EE, Servlets, and Spring technologies/frameworks using Service Oriented Architecture (SOA)
- Wrote Action classes, Request Processor, Business Delegate, Business Objects, Service classes and JSP pages
- Played a key role in designing the presentation tier components by customizing the Spring framework components, which includes configuring web modules, request processors, error handling components, etc.
- Implemented the Web Services functionality in the application to allow external applications to access data
- Used Apache Axis as the Web Service framework for creating and deploying Web Service Clients using SOAP and WSDL
- Worked on Spring to develop different modules to assist the product in handling different requirements
- Developed validation using Spring's Validation interface and used Spring Core and MVC to develop the applications and access data
- Implemented Spring beans using IoC and transaction management features to handle the transactions and business logic
- Designed and developed different PL/SQL blocks and stored procedures in the DB2 database
- Involved in writing the DAO layer using Hibernate to access the database (a brief sketch follows this list)
- Involved in deploying and testing the application using WebSphere Application Server
- Developed and implemented several test cases using the JUnit framework
- Involved in troubleshooting technical issues, conducting code reviews and enforcing best practices
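A minimal sketch of a Hibernate-backed DAO layer of the kind described above, using a Spring-managed SessionFactory and declarative transactions; the Applicant entity, its mapping and the bean wiring are assumptions made for the example.

```java
import java.util.List;

import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

/** DAO for Applicant records; Applicant is assumed to be a Hibernate-mapped entity. */
@Repository
public class ApplicantDao {

    @Autowired
    private SessionFactory sessionFactory;   // configured in the Spring application context

    @Transactional
    public void save(Applicant applicant) {
        sessionFactory.getCurrentSession().saveOrUpdate(applicant);
    }

    @Transactional(readOnly = true)
    public Applicant findById(Long id) {
        return (Applicant) sessionFactory.getCurrentSession().get(Applicant.class, id);
    }

    @Transactional(readOnly = true)
    @SuppressWarnings("unchecked")
    public List<Applicant> findAll() {
        return sessionFactory.getCurrentSession()
                .createQuery("from Applicant")
                .list();
    }
}
```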
Environment: Java SE 6, J2EE 6, JSP 2.1, Servlets 2.5, JavaScript, IBM WebSphere 7, DB2, HTML, XML, Spring 3, Hibernate 3, JUnit, Windows 7, Eclipse 3.5
Confidential, Seattle, WA
Java/J2EE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC) such as design, development and unit testing.
- Developed and deployed UI layer logic of sites using JSP, XML, JavaScript, HTML/DHTML and Ajax.
- CSS and JavaScript were used to build rich internet pages.
- The Agile Scrum methodology was followed for the development process.
- Designed different design specifications for application development that includes front-end, back-end using design patterns.
- Developed prototype test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Developed the application by using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Developed data mappings to create a communication bridge between various application interfaces using XML and XSL.
- Used Spring IoC to inject values for the dynamic parameters.
- Developed JUnit tests for unit-level testing.
- Actively involved in code review and bug fixing for improving the performance.
- Documented application for its functionality and its enhanced features.
- Created connections through JDBC and used JDBC statements to call stored procedures (a brief sketch follows this list).
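The JDBC stored-procedure call mentioned above can be sketched as follows; the JNDI data source name, procedure name and parameter types are placeholder assumptions.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

/** Calls a stored procedure through a container-managed JDBC DataSource. */
public class OrderStatusDao {

    public String fetchOrderStatus(long orderId) throws NamingException, SQLException {
        DataSource dataSource = (DataSource) new InitialContext().lookup("jdbc/AppDS"); // placeholder JNDI name

        try (Connection connection = dataSource.getConnection();
             CallableStatement call = connection.prepareCall("{call GET_ORDER_STATUS(?, ?)}")) {
            call.setLong(1, orderId);                     // IN parameter
            call.registerOutParameter(2, Types.VARCHAR);  // OUT parameter
            call.execute();
            return call.getString(2);
        }
    }
}
```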
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.
Confidential
Application Developer
Responsibilities:
- Developed the application under the JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS and JavaScript.
- Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
- Developed the application server persistence layer using JDBC and SQL.
- Used JDBC to connect the web applications to databases.
- Implemented a test-first unit testing approach using JUnit.
- Developed and utilized J2EE services and JMS components for messaging communication in WebLogic (a brief sketch follows this list).
- Configured the development environment using the WebLogic application server for developers' integration testing.
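A minimal sketch of JMS-based messaging of the kind described above, using the classic javax.jms point-to-point API with JNDI lookups; the JNDI names and payload are placeholder assumptions.

```java
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

/** Sends a text message to a JMS queue looked up from the application server's JNDI tree. */
public class OrderMessageSender {

    public void send(String payload) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory factory =
                (QueueConnectionFactory) ctx.lookup("jms/OrderConnectionFactory"); // placeholder JNDI name
        Queue queue = (Queue) ctx.lookup("jms/OrderQueue");                        // placeholder JNDI name

        QueueConnection connection = factory.createQueueConnection();
        try {
            QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            TextMessage message = session.createTextMessage(payload);
            sender.send(message);
        } finally {
            connection.close();   // also closes the session and sender
        }
    }
}
```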
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JMS, log4j, JUnit, Servlets, MVC