
Hadoop Developer Resume


Piscataway, NJ

SUMMARY

  • Over 8 years of experience with an emphasis on Big Data technologies and the design and development of Java-based enterprise applications
  • Expertise in the creation of on-premise and cloud data lakes
  • Experience working with the Cloudera, Hortonworks and Pivotal distributions of Hadoop
  • Expertise in HDFS, MapReduce, Spark, Hive, Impala, Pig, Sqoop, HBase, Oozie, Flume, Kafka and various other ecosystem components
  • Expertise in Spark framework for batch and real time data processing
  • Experience working with BI teams to translate big data requirements into Hadoop-centric technologies.
  • Experience in performance tuning Hadoop clusters by gathering and analyzing data on the existing infrastructure.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
  • Experience in converting MapReduce applications to Spark.
  • Experience in handling messaging services using Apache Kafka.
  • Experience working with Flume to load log data from multiple sources directly into HDFS.
  • Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop.
  • Good knowledge of NoSQL databases: Cassandra, MongoDB and HBase.
  • Experience in handling multiple relational databases: MySQL, SQL Server, PostgreSQL and Oracle.
  • Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, including exporting and importing data into S3.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience in supporting analysts by administering and configuring HIVE.
  • Experience in running Pig and Hive scripts.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data (see the sketch after this list).
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in writing shell scripts to dump sharded data from landing zones to HDFS.
  • Worked on predictive modeling techniques like Neural Networks, Decision Trees and Regression Analysis.
  • Experience in data mining and Business Intelligence tools such as Tableau, SAS Enterprise Miner, JMP and Enterprise Guide, Confidential SPSS Modeler and MicroStrategy.
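
A minimal sketch of the kind of MapReduce ETL job referred to above: it groups tab-delimited input records by their first field and counts records per key. The class names, field layout and paths are illustrative assumptions, not taken from any specific engagement.

    // Hypothetical MapReduce job: count records per key (first tab-delimited field).
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RecordCountJob {

        public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text outKey = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                // Emit (first field, 1) for every input record.
                outKey.set(line.toString().split("\t")[0]);
                context.write(outKey, ONE);
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable c : counts) {
                    total += c.get();
                }
                context.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "record-count");
            job.setJarByClass(RecordCountJob.class);
            job.setMapperClass(KeyMapper.class);
            job.setCombinerClass(SumReducer.class);   // sum is associative, so reuse as combiner
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }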

TECHNICAL SKILLS

Hadoop Ecosystem Development: HDFS, MapReduce, Spark, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, Cassandra, Kafka, Solr, HCatalog, Sqoop.

Operating System: Linux, Windows XP, Server 2003, Server 2008.

Databases: MySQL, Oracle, MS SQL Server, PostgreSQL, MS Access

Languages: C, Java, Python, SQL, Pig, UNIX shell scripting

PROFESSIONAL EXPERIENCE

Confidential, Jersey City, NJ

Hadoop Developer

Responsibilities:

  • Worked on the creation of business rules in Pig
  • Imported data from legacy systems to Hadoop using Sqoop and Apache Camel
  • Used Pig for data transformation
  • Used Apache Spark for real time and batch processing
  • Used Apache Kafka for handling log messages that are handled by multiple systems
  • Used shell scripting extensively for data munging
  • Worked on HCatalog, which allows Pig and MapReduce to take advantage of SerDe data format definitions already written for Hive
  • Worked on DevOps tools like Chef, Artifactory and Jenkins to configure and maintain the production environment
  • Used Pig to transform data into various formats
  • Stored processed tables in Cassandra from HDFS for applications to access the data in real time
  • Used Solr on Cassandra for implementation of near real-time search
  • Wrote UDFs in Java for Pig (see the sketch after this list)
  • Created ORCFile tables from the existing non-ORCFile Hive tables
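
A minimal sketch of a Java UDF for Pig as mentioned above; the class name and the trim/upper-case behavior are illustrative placeholders, not the actual business rules.

    // Hypothetical Pig EvalFunc UDF: trims and upper-cases a chararray field.
    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class TrimToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            // Return null for empty or null input so Pig can propagate nulls.
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return ((String) input.get(0)).trim().toUpperCase();
        }
    }

In Pig Latin, the jar containing such a UDF would be registered with REGISTER and the class invoked like a built-in function inside a FOREACH ... GENERATE statement.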

Environment: Hortonworks Data Platform 2.2, Pig, Hive, Spark, Kafka, Cassandra, Sqoop, Apache Camel, Apache Crunch, HCatalog, Chef, Jenkins, Artifactory, Avro, Confidential Data Studio

Confidential, Piscataway, NJ

Hadoop Developer

Responsibilities:

  • Worked on the creation of an on-premise and cloud data lake from scratch using the Pivotal distribution
  • Imported data from various relational data stores to HDFS using Sqoop
  • Collected user activity and log data using Kafka for real-time analytics
  • Implemented batch processing using Spark (see the sketch after this list)
  • Converted Hive tables to HAWQ for higher query performance
  • Responsible for loading unstructured and semi-structured data into Hadoop cluster from different data sources using Flume
  • Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries
  • Used the RegEx, JSON, Parquet and Avro SerDes packaged with Hive for serialization and deserialization, to parse the contents of streamed log data
  • Implemented Hive and Pig custom UDFs to achieve comprehensive data analysis
  • Used Pig to develop ad-hoc queries
  • Exported the required business information to an RDBMS using Sqoop, making the data available to the BI team for generating reports
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie
  • Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
  • Used Tableau for visualization and report generation
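
A minimal Java sketch of the kind of Spark batch job described above; the HDFS paths, delimiter and field positions are assumptions for illustration, and the lambdas assume Java 8 (anonymous function classes would be the equivalent on older JDKs).

    // Hypothetical Spark batch job: count pipe-delimited events per event type.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class DailyBatchJob {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("daily-batch");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Read delimited records previously landed on HDFS by Flume/Sqoop.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/landing/events/*");

            // Aggregate event counts per event type (second pipe-delimited field).
            JavaPairRDD<String, Long> countsByType = lines
                    .mapToPair(line -> new Tuple2<>(line.split("\\|")[1], 1L))
                    .reduceByKey((a, b) -> a + b);

            countsByType.saveAsTextFile("hdfs:///data/processed/event_counts");
            sc.stop();
        }
    }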

Environment: Pivotal HD 2.0, Gemfire XD, MapReduce, Spark, Pig, Hive, Kafka, Sqoop, HBase, Cassandra, Flume, Oozie, Tableau, Aspera, AWS, HCatalog

Confidential, Minneapolis, Minnesota

Hadoop Developer

Responsibilities:

  • Imported data from our relational data stores to Hadoop using Sqoop.
  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Performed big data analysis using Pig and user-defined functions (UDFs).
  • Worked on loading tables to Impala for faster retrieval using different file formats.
  • Performance tuning of queries in Impala for faster retrieval.
  • The system was initially developed in Java; the Java filtering program was restructured so that the business rule engine resides in a jar that can be called from both Java and Hadoop.
  • Created Reports and Dashboards using structured and unstructured data.
  • Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet.
  • Performed joins, group-bys and other operations in MapReduce using Java and Pig.
  • Worked on Amazon Web Services (AWS), a complete set of infrastructure and application services for running virtually everything in the cloud, from enterprise applications to big data projects.
  • Processed and formatted the output from Pig and Hive before writing it to the Hadoop output file.
  • Used Hive table definitions to map the output files to tables.
  • Setup and benchmarked Hadoop/HBase clusters for internal use
  • Wrote data ingesters and MapReduce programs
  • Reviewed the HDFS usage and system design for future scalability and fault tolerance
  • Wrote MapReduce/HBase jobs
  • Worked with the HBase NoSQL database (see the sketch after this list).
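
A minimal sketch of writing and reading a cell with the classic HBase HTable client API contemporary with the Java 1.5 stack listed below; the table name, column family, qualifier and row key are illustrative placeholders.

    // Hypothetical HBase round trip using the classic HTable client API.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "transactions");

            // Write one cell: row "row-001", column family "d", qualifier "amount".
            Put put = new Put(Bytes.toBytes("row-001"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("42.50"));
            table.put(put);

            // Read the cell back.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("amount"));
            System.out.println(Bytes.toString(value));

            table.close();
        }
    }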

Environment: Hadoop, Java 1.5, UNIX, Shell Scripting, XML, HDFS, HBase, NoSQL, MapReduce, Hive, Impala, Pig.

Confidential, Bluebell, PA

Hadoop Consultant

Responsibilities:

  • Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning
  • Installed and configured Hive to create tables for the unstructured data in HDFS
  • Hold good expertise in major components of the Hadoop ecosystem, including Hive, Pig, HBase, HBase-Hive integration, Sqoop and Flume.
  • Involved in loading data from UNIX file system to HDFS
  • Responsible for managing and scheduling jobs on Hadoop Cluster
  • Responsible for importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
  • Experienced in managing Hadoop log files
  • Worked on managing data coming from different sources
  • Wrote HQL queries to create tables and loaded data from HDFS to make it structured
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Extensively worked on Hive to transform files from different analytical formats into .txt (text) files, enabling the data to be viewed for further analysis
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs
  • Wrote and modified stored procedures to load and modify data according to the project requirements
  • Responsible for developing Pig Latin scripts to extract data from the web server output files and load it into HDFS
  • Extensively used Flume to collect the log files from the web servers and then integrated these files into HDFS
  • Responsible for implementing schedulers on the JobTracker so that the resources available in the cluster are used effectively for any given MapReduce job.
  • Constantly worked on tuning the performance of Hive and Pig queries to make processing and retrieval of the data more efficient
  • Supported MapReduce programs running on the cluster
  • Created external tables in Hive and loaded the data into these tables (see the sketch after this list)
  • Hands-on experience in database performance tuning and data modeling
  • Monitored the cluster coordination using ZooKeeper
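
A hedged sketch of creating and querying an external Hive table from Java, assuming a HiveServer2 JDBC endpoint is available (on older Hive releases the original HiveServer driver class and URL scheme differ); the connection URL, credentials, schema and HDFS location are placeholders.

    // Hypothetical HiveServer2 JDBC session: external table over Flume-landed logs.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveExternalTableExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hive", "");
            Statement stmt = conn.createStatement();

            // External table over web-server logs already landed on HDFS by Flume.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_logs ("
                    + " host STRING, ts STRING, request STRING, status INT)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                    + " LOCATION '/data/flume/web_logs'");

            // The query runs on the cluster as a MapReduce job.
            ResultSet rs = stmt.executeQuery(
                    "SELECT status, COUNT(*) FROM web_logs GROUP BY status");
            while (rs.next()) {
                System.out.println(rs.getInt(1) + "\t" + rs.getLong(2));
            }
            conn.close();
        }
    }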

Environment: Hadoop, HDFS, MapReduce, Hortonworks, Hive, Java (jdk1.6), DataStax, Flat files, UNIX Shell Scripting, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT.

Confidential, Pittsburgh, PA

Sr. Java Developer

Responsibilities:

  • Developed detail design document based on design discussions.
  • Involved in designing the database tables and java classes used in the application.
  • Involved in development, unit testing and system integration testing of the travel network builder side of the application.
  • Involved in the design, development and building of the travel network file system stored on NAS drives.
  • Set up the Linux environment to interact with the Route-Smart library (.so) file and perform NAS drive file operations using JNI (see the sketch after this list).
  • Implemented and configured Hudson as the continuous integration server and Sonar for maintaining code quality and removing redundant code.
  • Worked with Route-Smart C++ code to interact with the Java application using SWIG and the Java Native Interface.
  • Developed the user interface for requesting a travel network build using JSP and Servlets.
  • Built business logic so that users can specify which version of the travel network files to use for the solve process.
  • Used Spring Data Access Objects (DAO) to access the data through the configured data source.
  • Built an independent property sub-system to ensure that each request always picks up the latest set of properties.
  • Implemented a thread monitor system to monitor threads. Used JUnit to unit test the development modules.
  • Wrote SQL queries and procedures for the application, interacted with third party ESRI functions to retrieve map data.
  • Built and deployed JAR, WAR and EAR files on dev and QA servers.
  • Bug fixing (using Log4j for logging) and testing support after development.
  • Prepared requirements and research to move the map data to the Hadoop framework for future use.
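
A minimal JNI sketch of the native-library integration pattern described above; the class, method and library names are illustrative placeholders, not the actual Route-Smart bindings.

    // Hypothetical JNI bridge: Java declares native methods and loads the .so.
    public class RouteSmartBridge {

        static {
            // Resolves libroutesmart.so from java.library.path on the Linux host.
            System.loadLibrary("routesmart");
        }

        // Implemented in C++ and exposed through a JNI/SWIG-generated stub.
        public native int buildTravelNetwork(String inputDir, String outputDir);

        public static void main(String[] args) {
            int rc = new RouteSmartBridge().buildTravelNetwork(args[0], args[1]);
            System.out.println("Native build returned " + rc);
        }
    }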

Environment: Java 1.6.21, J2EE, Oracle 10g, Log4j 1.17, Windows 7 and Red Hat Linux, Subversion, Spring 3.1.0, ICEfaces 3, ESRI, WebLogic 10.3.5, Eclipse Juno, JUnit 4.8.2, Maven 3.0.3, Hudson 3.0.0 and Sonar 3.0.0

Confidential

Java Developer

Responsibilities:

  • Involved in Requirements gathering, Requirement analysis, Design, Development, Integration and Deployment.
  • Involved in Order Placement / Order Processing module.
  • Responsible for the design and development of the customizations framework
  • Designed and developed UIs using JSP, following the MVC architecture.
  • Developed the application using the Struts framework: the views are programmed as JSP pages with the Struts tag library, the model is a combination of EJBs and Java classes, and the web controllers are implemented as servlets.
  • Used EJB as a middleware in designing and developing a three-tier distributed application.
  • Used the Java Message Service (JMS) API to allow application components to create, send, receive and read messages (see the sketch after this list).
  • Used JUnit for unit testing of the system and Log4J for logging.
  • Created and maintained data using Oracle database and used JDBC for database connectivity.
  • Created and implemented Oracle stored procedures and triggers.
  • Installed WebLogic Server for handling HTTP requests and responses; requests and responses from the client are controlled using session tracking in JSP.
  • Worked on the front-end technologies like HTML, JavaScript, CSS and JSP pages using JSTL tags.
  • Reported daily about the team progress to the Project Manager and Team Lead.
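
A hedged sketch of the JMS point-to-point send pattern referenced above, using the queue-specific API of that J2EE generation; the JNDI names and message body are placeholders for whatever was configured on WebLogic.

    // Hypothetical JMS sender: looks up a queue via JNDI and sends a text message.
    import javax.jms.Queue;
    import javax.jms.QueueConnection;
    import javax.jms.QueueConnectionFactory;
    import javax.jms.QueueSender;
    import javax.jms.QueueSession;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class OrderMessageSender {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext();
            QueueConnectionFactory factory =
                    (QueueConnectionFactory) ctx.lookup("jms/OrderConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

            QueueConnection connection = factory.createQueueConnection();
            QueueSession session =
                    connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);

            TextMessage message = session.createTextMessage("order-12345 placed");
            sender.send(message);

            connection.close();
        }
    }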

Environment: Core Java, J2EE 1.3, JSP 1.2, Servlets 2.3, EJB 2.0, Struts 1.1, JNDI 1.2, JDBC 2.1, Oracle 8i, UML, DAO, JMS, XML, WebLogic 7.0, MVC Design Pattern, Eclipse 2.1, Log4j and JUnit.
