
Hadoop Developer Resume


Madison, WA


SUMMARY:

  • 8+ years of professional experience involving project development, implementation, deployment and maintenance using Java/J2EE and Big Data related technologies.
  • Hadoop Developer with 4+ years of working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase etc.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Experience in importing and exporting data in different formats between HDFS, HBase and various relational (RDBMS) databases using Sqoop.
  • Exposure to Cloudera development environment and management using Cloudera Manager.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, MongoDB and custom MapReduce programs in Java.
  • Experience in extending Hive and Pig core functionality by writing custom UDFs using Java (a minimal UDF sketch follows this summary).
  • Developed analytical components using Spark and Spark Streaming.
  • Background with traditional databases such as Oracle, SQL Server, MySQL.
  • Good knowledge and hands-on experience in storing and processing unstructured data using NoSQL databases like HBase and MongoDB.
  • Good knowledge of the distributed coordination service ZooKeeper and experience with Data Warehousing and ETL.
  • Hands on experience in setting up workflow using Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Experience in creating complex SQL queries and SQL tuning, and writing PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers and packages.
  • Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL, NoSQL, MS Access.
  • Profound experience in creating real time data streaming solutions using Apache Spark/Spark Streaming, Kafka.
  • Worked on a prototype Apache Spark Streaming project and converted an existing Java Storm topology.
  • Proficient in visualizing data using Tableau, QlikView, MicroStrategy and MS Excel.
  • Experience in developing ETL scripts for data acquisition and transformation using Informatica and Talend.
  • Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) EC2 virtual servers in the cloud; experienced in writing build scripts for continuous integration systems like Jenkins.
  • Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, JMS, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Experienced in using agile methodologies including extreme programming, SCRUM and Test Driven Development (TDD).
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD). Strong knowledge of the Software Development Life Cycle (SDLC).
  • Devoted to professionalism, highly organized, able to work under strict deadlines with attention to detail, and possessing excellent written and verbal communication skills.
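
Below is a minimal sketch of the kind of custom Hive UDF in Java mentioned in the summary above. The class name, the masking rule and the column it would be applied to are illustrative assumptions, not details taken from this resume.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: masks all but the last four characters of an account
// number so analysts can group or join on it without seeing the full value.
public class MaskAccountUDF extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // Hive passes NULL columns as null
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);           // too short to mask meaningfully
        }
        return new Text("****" + value.substring(value.length() - 4));
    }
}
```

Packaged into a JAR, such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.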

TECHNICAL SKILLS:

Technologies : J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.

Development Tools : Eclipse, NetBeans, SVN, Git, Ant, Maven, SOAP UI, JMX explorer, XML Spy, QC, QTP, Jira, SQL Developer, TOAD.

Methodologies : Agile/Scrum, UML, Rational Unified Process and Waterfall.

NoSQL Technologies : Cassandra, MongoDB, HBase.

Frameworks : Struts, Hibernate and Spring MVC.

Scripting Languages : Unix Shell Scripting, Perl.

Distributed platforms : Hortonworks, Cloudera, MapR

Databases : Oracle 11g/12C, MySQL, MS-SQL Server, Teradata, IBM DB2

Operating Systems : Windows XP/Vista/7/8/10, UNIX, Linux

Software Package : MS Office 2007/2010/2016.

Web/Application Servers : WebLogic, WebSphere Application Server, Apache Tomcat

Visualization : Tableau, QlikView, MicroStrategy and MS Excel.

Version control : CVS, SVN, GIT, TFS.

Web Technologies : HTML, XML, CSS, JavaScript, jQuery, AJAX, AngularJS, SOAP, REST and WSDL.

PROFESSIONAL EXPERIENCE:

HADOOP DEVELOPER

Confidential, Madison, WA

Responsibilities:

  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
  • Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
  • Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
  • Designed and developed PIG data transformation scripts to work against unstructured data from various data points and created a base line.
  • Experienced in implementing Spark RDD transformations, actions to implement business analysis.
  • Designed a Data Quality Framework to perform schema validation and data profiling on Spark (PySpark).
  • Leveraged Spark (PySpark) to manipulate unstructured data and apply text mining to users' table utilization data.
  • Worked on creating and optimizing Hive scripts for data analysts based on the requirements.
  • Created Hive UDFs to encapsulate complex and reusable logic for the end users.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Experienced in migrating HiveQL into Impala to minimize query response time. 
  • Designed an agent-based computational framework based on Scala, Breeze to scale computations for many simultaneous users in real-time.
  • Orchestrated Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced with using different compression codecs like LZO, Snappy, Bzip2 and Gzip to save data and optimize data transfer over the network, along with the Avro, Parquet and ORC file formats.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Implemented data ingestion systems by creating Kafka brokers, Java producers, consumers and custom encoders (see the producer sketch after this list).
  • Implemented Partitioning, Dynamic Partitions and Bucketing in Hive for efficient data access.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Developed some utility helper classes to get data from HBase tables.
  • Good experience in troubleshooting performance issues and tuning Hadoop cluster.
  • Knowledge in Spark Core, Streaming, DataFrames and SQL, MLlib and GraphX.
  • Implemented caching of Spark transformations and actions for reuse.
  • Extracted data from Cassandra through Sqoop, placed it in HDFS and processed it.
  • Used Maven to build and deploy the JARs for MapReduce, Pig and Hive UDFs.
  • Developed workflows in Oozie.
  • Extensively used the Hue browser for interacting with Hadoop components. 
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on Amazon Web Services.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
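
The Kafka ingestion bullet above refers to Java producers; below is a minimal sketch of such a producer using the standard Kafka client API. The broker addresses, topic name and payload are placeholders, not values from the project.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogEventProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker list, topic and payload are illustrative placeholders.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", "all");                      // wait for full acknowledgement
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String payload = "{\"event\":\"login\",\"ts\":1710000000}";
            producer.send(new ProducerRecord<>("log-events", "user-42", payload));
        }
    }
}
```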

Environment: Linux (CentOS, RedHat), UNIX Shell, Pig, Hive, MapReduce, YARN, Spark 1.4.1, Eclipse, Core Java, JDK 1.7, Oozie Workflows, AWS, S3, EMR, Cloudera, HBase, Sqoop, Scala, Kafka, Python, Cassandra, Maven, Hortonworks, Cloudera Manager

HADOOP DEVELOPER

Confidential, Jersey City, NJ

Responsibilities:

  • Wrote MapReduce code that takes customer-related flat files as input and parses the data to extract meaningful (domain-specific) information for further processing.
  • Extensively worked on creating combiners, partitioning and distributed cache to improve the performance of MapReduce jobs (see the combiner sketch after this list).
  • Experience in using Sequence files, ORC, AVRO file formats. 
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS. 
  • Created Hive External tables with partitioning to store the processed data from Map Reduce. 
  • Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries on top of them. 
  • Used Pig to do data transformations, event joins, filter and some pre-aggregations before storing the data into HDFS. 
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS. 
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig. 
  • Optimized PIG jobs by using different compression techniques and performance enhancers. 
  • Worked with Cassandra and utilized NoSQL for non-relational data storage and retrieval.
  • Imported and exported data between relational data stores, MongoDB and HDFS using Sqoop.
  • Performed various performance optimizations such as using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Wrote Hive generic UDFs to perform business logic operations at the record level.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile devices and pushed to HDFS.
  • Involved in troubleshooting errors in Shell, Hive and Map Reduce.
  • Worked on debugging, performance tuning of Hive & Pig Jobs. 
  • Responsible for importing and exporting data from HDFS to MySQL database and vice-versa using Sqoop. 
  • Involved in admin related issues of HBase and other NoSQL databases. 
  • Monitored and Debugged Hadoop jobs/Applications running in production using HUE as GUI. 
  • Implemented a 100-node CDH4 Hadoop cluster on Red Hat Linux using Cloudera Manager.
  • Queried indexed data for analytics using Apache Solr.
  • Exported the analyzed data to the relational databases using Sqoop for Tableau visualization and to generate reports for the BI team. 
  • Automated workflow using Shell Scripts. 
  • Responsible for managing and reviewing Hadoop log files. 
  • Used ZooKeeper for enabling synchronization across the cluster.
  • Performed both major and minor upgrades to the existing cluster, and commissioned and decommissioned nodes to balance load across the cluster.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager. 
  • Good working experience in Agile/Scrum methodologies, including technical discussions with the client and daily scrum calls covering project analysis, specs and development.
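
As referenced in the combiner bullet above, here is a minimal MapReduce sketch that reuses the reducer as a combiner to cut shuffle volume. The record layout (pipe-delimited flat files with a customer ID in the first field) and the class names are assumptions for illustration, not details from the project.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical job: count records per customer ID from pipe-delimited flat files.
public class CustomerRecordCount {

    public static class ParseMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text customerId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length < 2 || fields[0].isEmpty()) {
                return;                        // skip malformed lines
            }
            customerId.set(fields[0]);
            context.write(customerId, ONE);
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "customer record count");
        job.setJarByClass(CustomerRecordCount.class);
        job.setMapperClass(ParseMapper.class);
        job.setCombinerClass(SumReducer.class);    // combiner runs map-side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```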

Environment: Hadoop, Java 1.7, UNIX, Shell Scripting, HDFS, HBase, NOSQL, MapReduce, YARN, Hive, PIG, ORACLE, MongoDB, Zookeeper, Sqoop.

HADOOP DEVELOPER

Confidential, Somerset NJ

Responsibilities:

  • Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Extensively involved in Design phase and delivered Design documents.
  • Importing and exporting data into HDFS and Hive using SQOOP.
  • Wrote MapReduce jobs to standardize and clean the data and calculate aggregates (see the mapper sketch after this list).
  • Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that internally run as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Used Zookeeper for providing coordinating services to the cluster.
  • Involved in Unit testing and delivered Unit test plans and results documents.
  • Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
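
A minimal sketch of the kind of standardization job referenced above is shown below as a map-only cleaning step; the delimiter, column count and column meanings are illustrative assumptions. In the driver, such a job would set the number of reduce tasks to zero.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical map-only cleaning step: trims fields, upper-cases an assumed
// country-code column and drops rows without the expected column count.
public class StandardizeRecordsMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private final Text cleaned = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] cols = value.toString().split(",", -1);
        if (cols.length != 5) {
            context.getCounter("clean", "bad_rows").increment(1);
            return;                            // discard malformed records
        }
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            String field = cols[i].trim();
            if (i == 3) {
                field = field.toUpperCase();   // standardize the country code
            }
            row.append(i == 0 ? "" : ",").append(field);
        }
        cleaned.set(row.toString());
        context.write(NullWritable.get(), cleaned);
    }
}
```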

Environment: Apache Hadoop, Map Reduce, HDFS, Hive, Pig, SQOOP, HBase, Zookeeper, UNIX shell scripting, Eclipse.

JAVA/J2EE DEVELOPER

Confidential, Columbus, OH

Responsibilities:

  • Involved in Business requirements gathering, Design, Development and unit testing of Bill Pay Account Accelerator (BPAA) & Alphanumeric ID projects.
  • Involved in maintenance & development of pnc.com and their related web sites like PNC virtual wallet, Wealth Management & Mutual Funds etc.
  • Responsible for developing use cases, class and sequence diagram for the modules using UML and Rational Rose.
  • Involved in preparation of docs like Functional Specification document and Deployment Instruction documents.
  • Set up the deployment environment on WebSphere 6.1. Developed system preferences UI screens using JSP 2.0 and HTML.
  • Used JavaScript for client-side validations.
  • Coded and unit tested according to client standards. Provided production support and quickly resolved issues until the integration test passed.
  • Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects and identified their source.
  • Used JMS for point-to-point asynchronous messaging for high-volume transactional banking operations (see the JMS sketch after this list).
  • Involved in preparation of unit and system test cases and testing of the module in three phases: unit testing, system testing and regression testing.
  • Involved in writing shell scripts and Ant scripts on UNIX for application deployments to the production region.
  • Developed core banking business components as a Web Service for enterprise-wide SOA Architecture strategy.
  • Used Rational Clear Case as source control management system.
  • Implemented SOA architecture with web services using SOAP, WSDL, UDDI and XML.
  • Involved in deployments in all environments like Dev, Test, UAT and prod respectively.
  • Involved in designing the Credit Card service layer on the mainframe with MQ Series and WBI; provided XML-based messaging services to front-end applications.
  • Extensively used IBM RAD 7.1 IDE for building, testing, and deploying applications.
  • Worked with Single Sign-On (SSO) using SAML for retrieving data from third party applications like Yodlee.
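
The JMS bullet above refers to point-to-point asynchronous messaging; below is a minimal Java sketch of a queue sender using the standard javax.jms API. The JNDI names and the XML payload are illustrative assumptions, not names from the project.

```java
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

// Hypothetical point-to-point sender for an XML payment request.
public class BillPaySender {

    public void send(String xmlPayload) throws Exception {
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory factory =
                (QueueConnectionFactory) ctx.lookup("jms/BankingQCF");      // assumed JNDI name
        Queue queue = (Queue) ctx.lookup("jms/BillPayRequestQueue");        // assumed JNDI name

        QueueConnection connection = factory.createQueueConnection();
        try {
            QueueSession session =
                    connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            TextMessage message = session.createTextMessage(xmlPayload);
            sender.send(message);              // fire-and-forget; consumer processes asynchronously
        } finally {
            connection.close();
            ctx.close();
        }
    }
}
```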

Environment: Java (jdk1.5), J2EE, WebSphere 6.1, IBM RAD 7.5, Rational ClearCase 7.0, XML, JAXP, XSL, XSLT, XML Schema(XSD), WSDL 2.0, SAML 2.0, AJAX 1.0, Web Services, SOA, JSP 2.2, CSS, Servlets, JProfiler , Struts 2.0, Spring, Rational HATS, JavaScript, JCF, HTML, IBM DB2, JMS, AXIS 2, Swing, MQ, Open source technologies (ANT, LOG4j and Junit), Oracle 10g, UNIX.

JAVA DEVELOPER

Confidential

Responsibilities:

  • Assisted in designing and programming for the system, which includes development of Process Flow Diagram, Entity Relationship Diagram, Data Flow Diagram and Database Design.
  • Involved in the Transactions, Login and Reporting modules and customized report generation using controllers; tested and debugged the whole project for proper functionality and documented the modules developed.
  • Designed front-end components using JSF.
  • Involved in developing Java APIs, which communicates with the Java Beans.
  • Implemented MVC architecture using Java, Custom and JSTL tag libraries.
  • Involved in development of POJO classes and writing Hibernate Query Language (HQL) queries (see the HQL sketch after this list).
  • Implemented MVC architecture and DAO design pattern for maximum abstraction of the application and code reusability.
  • Created stored procedures using SQL and PL/SQL for data modification.
  • Used XML, XSL for Data presentation, Report generation and customer feedback documents.
  • Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
  • Developed JUnit test cases for regression testing and integrated with ANT build.
  • Implemented Logging framework using Log4J.
  • Involved in code review and documentation review of technical artifacts.
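
The POJO/HQL bullet above is illustrated below with a small DAO-style query sketch using the Hibernate API. The Account entity, its customerId property and the Configuration-based bootstrap are assumptions; the actual mappings would live in hibernate.cfg.xml and the project's mapping files.

```java
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Hypothetical DAO: runs an HQL query against an assumed mapped entity
// named Account that exposes a customerId property.
public class AccountDao {

    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure().buildSessionFactory(); // reads hibernate.cfg.xml

    public List<?> findByCustomer(Long customerId) {
        Session session = SESSION_FACTORY.openSession();
        try {
            return session
                    .createQuery("from Account a where a.customerId = :custId")
                    .setParameter("custId", customerId)
                    .list();
        } finally {
            session.close();                   // always release the session
        }
    }
}
```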

Environment: J2EE/Java, JSP, Servlets, JSF, Hibernate, Spring, JavaBeans, XML, XSL, HTML, DHTML, JavaScript, CVS, JDBC, Log4J, Oracle 9i, IBM WebSphere Application Server
