
Hadoop/Spark Developer Resume


NJ

SUMMARY:

  • 8 years of professional IT experience in analysis, development, integration and maintenance of web-based and client/server applications using Java and Big Data technologies.
  • 4 years of relevant experience in the Hadoop ecosystem and architecture (HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie).
  • Experience in real-time analytics with Apache Spark (RDD, DataFrame and Streaming APIs).
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data (see the sketch after this list).
  • Experience in integrating Hadoop with Apache Storm and Kafka. Expertise in uploading clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Developed Kafka producers that compress and bind many small files into larger Avro and Sequence files before writing to HDFS, to make the best use of the Hadoop block size.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Extensive hands-on experience in writing MapReduce jobs in Java.
  • Performed data analysis using Hive and Pig; experience in analyzing large datasets using HiveQL and Pig Latin.
  • Experience using partitioning and bucketing in Hive; designed both managed and external Hive tables for optimized performance.
  • Developed custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and used UDFs from the Piggybank repository.
  • Good understanding and knowledge of NoSQL databases like MongoDB, Cassandra and HBase.
  • Experience in monitoring and managing Hadoop clusters using Cloudera Manager.
  • Experience in job workflow scheduling and monitoring using Oozie with Python scripting.
  • Worked extensively on different Hadoop distributions like Cloudera’s CDH and Hortonworks HDP.
  • Good working knowledge in cloud integration with Amazon Web Services (AWS) components like Redshift, DynamoDB, EMR, S3 and EC2 instances.
  • Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
  • Having good knowledge of Scala programming concepts.
  • Expertise in distributed and web environments, focused on core Java technologies such as Collections, Multithreading, I/O, Exception Handling and Memory Management.
  • Expertise in developing web applications using J2EE technologies such as Servlets, JSP, Web Services, Spring, Hibernate, HTML5, JavaScript, jQuery and AJAX.
  • Knowledge of standard build and deployment tools such as Eclipse, Scala IDE, Maven, Subversion, SBT.
  • Extensive knowledge in Software Development Lifecycle (SDLC) using Waterfall, Agile methodologies.
  • Facilitated sprint planning, daily scrums, retrospectives, stakeholder meetings, and software demonstrations.
  • Excellent communication skills, with the ability to communicate complex issues to technical and non-technical audiences, including peers, partners, and senior IT and business management.
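
Illustrative sketch of the DataFrame-over-Hive analysis referenced above, assuming Spark with Hive support on the Cloudera platform; the table and column names (web_logs, page, visit_ts) are hypothetical placeholders, not taken from the projects below.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class HiveAnalytics {
        public static void main(String[] args) {
            // SparkSession with Hive support so existing Hive tables are visible
            SparkSession spark = SparkSession.builder()
                    .appName("HiveAnalytics")
                    .enableHiveSupport()
                    .getOrCreate();

            // Read a (hypothetical) Hive table of web logs as a DataFrame
            Dataset<Row> logs = spark.table("web_logs");

            // Simple analytic: page views per page per day
            Dataset<Row> pageViews = logs
                    .groupBy(col("page"), to_date(col("visit_ts")).alias("day"))
                    .count();

            // Persist the result back to the Hive warehouse
            pageViews.write().mode("overwrite").saveAsTable("web_log_page_views");
            spark.stop();
        }
    }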

TECHNICAL SKILLS:

Languages: Java, XML, SQL, PL/SQL, Pig Latin, Hive QL, Python, Scala

Web Technologies: Java EE (JDBC, JSP, Servlets, JSF, JSTL), AJAX, JavaScript

Big Data Systems: Hadoop, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Flume, Oozie, Impala, Spark, Kafka, Storm

RDBMS: Oracle, MySQL, SQL Server, PostgreSQL, Teradata

NoSQL Databases: HBase, MongoDB, Cassandra

App/Web Servers: Apache Tomcat, WebLogic

SOA: Web services, SOAP, REST

Frameworks: Struts 2, Hibernate, Spring 3.x

Version Control Systems: GIT, CVS, SVN

IDEs: Eclipse, Scala IDE, NetBeans, IntelliJ IDEA, PyCharm

Operating Systems: UNIX, Linux, Windows

WORK EXPERIENCE:

Confidential, NJ

Hadoop/Spark Developer

Responsibilities:

  • Worked on a live 16-node Hadoop cluster running CDH 4.7.
  • Worked with highly unstructured and semi-structured data, 30 TB in size (90 TB with a replication factor of 3).
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Configured Flume to transport web server logs into HDFS; also used the Kite logging module to upload web server logs into HDFS.
  • Extracted data from AgentNodes into HDFS using Python scripts.
  • Developed complex Pig scripts to transform raw data from the staging area.
  • Designed and developed Hive tables to store staging and historical data.
  • Created Hive tables as per requirements; defined internal and external tables with appropriate static and dynamic partitions for efficiency (see the sketch after this list).
  • Processed large datasets on the Hadoop cluster: data stored on HDFS was preprocessed and validated using Pig, and the processed data was then loaded into the Hive warehouse, enabling business analysts to retrieve the required data from Hive.
  • Used the ORC file format with Snappy compression for optimized storage of Hive tables.
  • Resolved performance issues in Hive and Pig scripts by understanding how joins, grouping and aggregation translate into MapReduce jobs.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process. Designed and implemented Java MapReduce programs to support distributed data processing.
  • Involved in migrating MapReduce jobs into Spark jobs and used Spark SQL and DataFrames API to load structured and semi-structured data into Spark clusters.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Used Apache Kafka for importing real time network log data into HDFS.
  • Involved in setting up and managing training sessions. Currently responsible for mentoring peers and leading technical teams.
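
A minimal sketch of the partitioned, ORC/Snappy external Hive tables described above, issued over HiveServer2 JDBC from Java; the host, database, table, column and path names are assumptions for illustration only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateClickstreamTable {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC endpoint (placeholder host and database)
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver:10000/staging", "etl_user", "");
            try (Statement stmt = conn.createStatement()) {
                // External table over raw files, partitioned by load date,
                // stored as ORC with Snappy compression
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS clickstream (" +
                    "  user_id STRING, url STRING, event_ts TIMESTAMP) " +
                    "PARTITIONED BY (load_date STRING) " +
                    "STORED AS ORC " +
                    "LOCATION '/data/warehouse/clickstream' " +
                    "TBLPROPERTIES ('orc.compress'='SNAPPY')");

                // Dynamic-partition load from a staging table
                stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
                stmt.execute(
                    "INSERT OVERWRITE TABLE clickstream PARTITION (load_date) " +
                    "SELECT user_id, url, event_ts, load_date FROM clickstream_stage");
            }
            conn.close();
        }
    }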

Environment: Apache Hadoop, CDH 4.7, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Spark Streaming, Kafka, Linux

Confidential, Madison, WI

Hadoop Developer

Responsibilities:

  • Worked on a live 10-node Hadoop cluster running HDP 2.0.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it into HDFS.
  • Implemented MapReduce programs to transform log data into a structured form and extract user information.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views and visit duration.
  • Developed UDFs for Hive and wrote complex HiveQL queries for data analysis (see the sketch after this list).
  • Developed a well-structured and efficient ad-hoc environment for functional users.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Loaded cache data into HBase using Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Extensive work in the ETL process, consisting of data transformation, sourcing, mapping, conversion and loading using Informatica.
  • Used ETL processes extensively to load data from flat files into the target database, applying business logic in transformation mappings to insert and update records.
  • Created Talend ETL jobs to read data from an Oracle database and import it into HDFS.
  • Worked on data serialization formats for converting complex objects into byte sequences using the Avro, RC and ORC file formats.
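
A sketch of the kind of Hive UDF mentioned above; the function name and the URL-normalization logic are hypothetical, shown only to illustrate the evaluate() contract of the classic Hive UDF API.

    package com.example.hive;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with:
    //   CREATE TEMPORARY FUNCTION strip_query AS 'com.example.hive.StripQueryUdf';
    public class StripQueryUdf extends UDF {
        // Strips the query string from a URL so page views group on the path alone
        public Text evaluate(Text url) {
            if (url == null) {
                return null;
            }
            String s = url.toString();
            int q = s.indexOf('?');
            return new Text(q >= 0 ? s.substring(0, q) : s);
        }
    }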

Environment: Apache Hadoop, Hortonworks HDP 2.0, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Teradata, Talend, Avro, Java, Python, Linux

Confidential, Omaha, NE

Hadoop Developer

Responsibilities:

  • Worked on a live 8-node Hadoop cluster running CDH 4.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS).
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by Business Intelligence tools.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed several MapReduce programs to analyze and transform the data and uncover insights into customer usage patterns (see the sketch after this list).
  • Used Pig as an ETL tool for transformations, event joins and pre-aggregations before storing the data in HDFS.
  • Responsible for creating Hive external tables, loading data into them and querying the data using HiveQL.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, implemented in Hive with the help of ZooKeeper in the cluster.
  • Integrated Oozie with the rest of Hadoop stack supporting several types of Hadoop jobs as well as the system specific jobs (such as Java programs and shell scripts).
  • Created HBase tables to store various data formats coming from different portfolios, worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Used Jenkins for build and continuous integration for software development.
  • Worked with application teams to install operating systems, Hadoop updates, patches and version upgrades as required.
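
A compact sketch of the style of MapReduce program described above; it assumes tab-separated log lines with a user id in the first field and simply counts events per user, which is representative rather than the actual production job.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class UserEventCount {
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text userId = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                // Assumed format: userId<TAB>timestamp<TAB>action...
                String[] fields = value.toString().split("\t");
                if (fields.length > 0 && !fields[0].isEmpty()) {
                    userId.set(fields[0]);
                    ctx.write(userId, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "user event count");
            job.setJarByClass(UserEventCount.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }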

Environment: Apache Hadoop, CDH 4, Sqoop, Flume, MapReduce, Pig, Hive, HBase, Cassandra, MongoDB, Oozie, Zookeeper, Jenkins

Confidential

Java Developer

Responsibilities:

  • Involved in developing business domain concepts into use cases, sequence diagrams, class diagrams, component diagrams and implementation diagrams.
  • Implemented various J2EE design patterns such as Model-View-Controller (MVC), Data Access Object, Business Delegate and Transfer Object.
  • Involved in designing and developing the project using Java/J2EE technologies, following the MVC architecture with JSPs as views and Servlets as controllers.
  • Involved in configuring Struts, Tiles and developing the configuration files.
  • Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
  • Developed and deployed UI layer logics using JSP, XML, JavaScript, HTML/DHTML.
  • Designed network and use case diagrams in StarUML to monitor the workflow.
  • Wrote server-side programs to handle requests coming from different types of devices using RESTful web services.
  • Designed a lightweight model for the product using the Inversion of Control principle and implemented it using the Spring IoC container (see the sketch after this list).
  • Used the Hibernate ORM tool to store and retrieve data from a PostgreSQL database.
  • Provided connections using JDBC to the database and developed SQL queries to manipulate the data.
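
A minimal sketch of wiring a component through the Spring IoC container, using annotation-based configuration for brevity (the project itself may have used XML bean definitions); the service and class names are hypothetical.

    import org.springframework.context.ApplicationContext;
    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    public class IocExample {
        // Hypothetical service abstraction provided by the container
        public interface RateService {
            double rateFor(String productCode);
        }

        public static class FlatRateService implements RateService {
            public double rateFor(String productCode) {
                return 4.5; // placeholder business rule
            }
        }

        @Configuration
        public static class AppConfig {
            @Bean
            public RateService rateService() {
                return new FlatRateService();
            }
        }

        public static void main(String[] args) {
            // The container, not the caller, decides which implementation is used
            ApplicationContext ctx = new AnnotationConfigApplicationContext(AppConfig.class);
            RateService rates = ctx.getBean(RateService.class);
            System.out.println(rates.rateFor("PRD-1"));
        }
    }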

Environment: Java/J2EE, Struts MVC, Tiles, JSP, XML, JavaScript, Spring IoC, WebSphere Application Server, PostgreSQL

Confidential

Java Developer

Responsibilities:

  • Provided production support for various applications, actively working on incidents and issues raised by users; this also involved on-call support during off hours.
  • Developed service layer logic for core modules using JSPs and Servlets and was involved in integration with the presentation layer.
  • Involved in the complete lifecycle of the project, from gathering business requirements to creating the architecture and building applications on Java/J2EE with the Spring MVC framework.
  • Implemented various design patterns in the project such as Business Delegate, Data Transfer Object, Service Locator, Data Access Object and Singleton.
  • Developed XML configuration and mapping files using Hibernate; the Hibernate transaction manager was used to maintain transaction persistence (see the sketch after this list).
  • Developed the user interface using JSP and DHTML to design the dynamic HTML pages.
  • Involved in fixing bugs and minor enhancements for the front-end module.
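
A minimal sketch of explicit Hibernate transaction handling as referenced above, using the plain Hibernate Transaction API rather than Spring's HibernateTransactionManager; the DAO name and configuration files are assumptions.

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;
    import org.hibernate.cfg.Configuration;

    public class GenericDao {
        // SessionFactory built from hibernate.cfg.xml and the XML mapping files
        private static final SessionFactory SESSION_FACTORY =
                new Configuration().configure().buildSessionFactory();

        // Persists a mapped entity inside an explicit transaction,
        // rolling back if anything fails
        public void save(Object entity) {
            Session session = SESSION_FACTORY.openSession();
            Transaction tx = null;
            try {
                tx = session.beginTransaction();
                session.save(entity);
                tx.commit();
            } catch (RuntimeException e) {
                if (tx != null) {
                    tx.rollback();
                }
                throw e;
            } finally {
                session.close();
            }
        }
    }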

Environment: Java, Servlets, JSP, Spring, Hibernate, XML, XPath, jQuery, JavaScript, WebSphere Application Server
