Hadoop Developer Resume
Piscataway, NJ
SUMMARY
- Over 8 years of experience with emphasis on Big Data technologies and the design and development of Java-based enterprise applications
- Expertise in the creation of on-prem and cloud data lakes
- Experience working with Cloudera, Hortonworks and Pivotal Distributions of Hadoop
- Expertise in HDFS, MapReduce, Spark, Hive, Impala, Pig, Sqoop, HBase, Oozie, Flume, Kafka and various other ecosystem components
- Expertise in the Spark framework for batch and real-time data processing
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Experience in performance tuning Hadoop clusters by gathering and analyzing data on the existing infrastructure.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
- Experience in converting MapReduce applications to Spark.
- Experience in handling messaging services using Apache Kafka.
- Experience in working with Flume to load log data from multiple sources directly into HDFS
- Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop
- Good knowledge of NoSQL databases: Cassandra, MongoDB and HBase.
- Experience in handling multiple relational databases: MySQL, SQL Server, PostgreSQL and Oracle.
- Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in supporting analysts by administering and configuring Hive.
- Experience in running Pig and Hive scripts.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Imported and exported data to and from HDFS and Hive using Sqoop.
- Experience in writing shell scripts to dump sharded data from landing zones to HDFS.
- Worked on predictive modeling techniques like Neural Networks, Decision Trees and Regression Analysis.
- Experience in data mining and Business Intelligence tools such as Tableau, SAS Enterprise Miner, JMP and Enterprise Guide, Confidential SPSS Modeler and MicroStrategy.
TECHNICAL SKILLS
Hadoop Ecosystem Development: HDFS, MapReduce, Spark, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, Cassandra, Kafka, Solr, HCatalog, Sqoop.
Operating System: Linux, Windows XP, Server 2003, Server 2008.
Databases: MySQL, Oracle, MS SQL Server, PostgreSQL, MS Access
Languages: C, Java, Python, SQL, Pig, UNIX shell scripting
PROFESSIONAL EXPERIENCE
Confidential, Jersey City, NJ
Hadoop Developer
Responsibilities:
- Worked on the creation of business rules in Pig
- Imported data from legacy systems to Hadoop using Sqoop and Apache Camel
- Used Pig for data transformation
- Used Apache Spark for real-time and batch processing
- Used Apache Kafka to deliver log messages that are consumed by multiple systems
- Used shell scripting extensively for data munging
- Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format definitions already written for Hive
- Worked on DevOps tools like Chef, Artifactory and Jenkins to configure and maintain the production environment
- Used Pig to transform data into various formats
- Stored processed tables in Cassandra from HDFS for applications to access the data in real time
- Used Solr on Cassandra for implementation of near real-time search
- Worked on writing UDFs in Java for Pig
- Created ORCFile tables from the existing non-ORCFile Hive tables
Environment: Hortonworks Data Platform 2.2, Pig, Hive, Spark, Kafka, Cassandra, Sqoop, Apache Camel, Apache Crunch, HCatalog, Chef, Jenkins, Artifactory, Avro, Confidential Data Studio
Confidential, Piscataway, NJ
Hadoop Developer
Responsibilities:
- Worked on the creation of on-premises and cloud data lakes from scratch with the Pivotal distribution
- Imported data from various relational data stores to HDFS using Sqoop
- Collected user activity data and log data using Kafka for real-time analytics
- Implemented batch processing using Spark
- Converted Hive tables to HAWQ for higher query performance
- Responsible for loading unstructured and semi-structured data into the Hadoop cluster from different data sources using Flume
- Used the Hive data warehouse tool to analyze data in HDFS and developed Hive queries
- Used the RegEx, JSON, Parquet and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data
- Implemented custom Hive and Pig UDFs to achieve comprehensive data analysis
- Used Pig to develop ad-hoc queries
- Exported business-required information to an RDBMS using Sqoop so the BI team could generate reports from the data
- Implemented daily workflow for extraction, processing and analysis of data with Oozie
- Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
- Used Tableau for data visualization and report generation
Environment: Pivotal HD 2.0, Gemfire XD, MapReduce, Spark, Pig, Hive, Kafka, Sqoop, HBase, Cassandra, Flume, Oozie, Tableau, Aspera, AWS, HCatalog
Confidential, Minneapolis, Minnesota
Hadoop Developer
Responsibilities:
- Imported data from our relational data stores to Hadoop using Sqoop.
- Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
- Wrote Pig scripts and executed them using the Grunt shell.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Worked on loading tables to Impala for faster retrieval using different file formats.
- Tuned the performance of Impala queries for faster retrieval.
- Restructured the Java filtering program (the system was initially developed in Java) so that the business rule engine lives in a JAR that can be called from both Java and Hadoop.
- Created reports and dashboards using structured and unstructured data.
- Upgraded the operating system and/or Hadoop distribution using Puppet as new versions were released.
- Performed joins, group-by and other operations in MapReduce using Java and Pig.
- Worked with Amazon Web Services (AWS), using its infrastructure and application services to run workloads in the cloud, from enterprise applications to big data projects.
- Processed the output from Pig and Hive and formatted it before sending it to the Hadoop output file.
- Used Hive table definitions to map the output file to tables.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Wrote data ingesters and MapReduce programs.
- Reviewed the HDFS usage and system design for future scalability and fault tolerance.
- Wrote MapReduce/HBase jobs.
- Worked with the HBase NoSQL database.
Environment: Hadoop, Java 1.5, UNIX, Shell Scripting, XML, HDFS, HBase, NoSQL, MapReduce, Hive, Impala, Pig.
Confidential, Bluebell, PA
Hadoop Consultant
Responsibilities:
- Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning
- Installed and configured Hive to create tables for the unstructured data in HDFS
- Hold strong expertise in major components of the Hadoop ecosystem including Hive, Pig, HBase, HBase-Hive integration, Sqoop and Flume.
- Involved in loading data from UNIX file system to HDFS
- Responsible for managing and scheduling jobs on Hadoop Cluster
- Responsible for importing and exporting data into HDFS and Hive using Sqoop
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
- Experienced in managing Hadoop log files
- Worked on managing data coming from different sources
- Wrote HQL queries to create tables and load data from HDFS, giving the data structure
- Loaded and transformed large sets of structured, semi-structured and unstructured data
- Worked extensively with Hive to transform files from different analytical formats into text (.txt) files so the data could be viewed for further analysis
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs
- Wrote and modified stored procedures to load and modify data according to the project requirements
- Responsible for developing Pig Latin scripts to extract data from the web server output files and load it into HDFS
- Extensively used Flume to collect the log files from the web servers and then integrated these files into HDFS
- Responsible for implementing schedulers on the JobTracker so that MapReduce jobs make effective use of the resources available in the cluster.
- Continually tuned the performance of Hive and Pig queries so that they process and retrieve data more efficiently
- Supported MapReduce programs running on the cluster
- Created external tables in Hive and loaded the data into these tables
- Hands-on experience in database performance tuning and data modeling
- Monitored the cluster coordination using ZooKeeper
Environment: Hadoop, HDFS, MapReduce, Hortonworks, Hive, Java (jdk1.6), DataStax, Flat files, UNIX Shell Scripting, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT.
Confidential, Pittsburgh, PA
Sr. Java Developer
Responsibilities:
- Developed the detailed design document based on design discussions.
- Involved in designing the database tables and Java classes used in the application.
- Involved in development, unit testing and system integration testing of the travel network builder side of the application.
- Involved in designing, developing and building the travel network file system stored on NAS drives.
- Set up the Linux environment to interact with the Route-smart library (.so) file and perform NAS drive file operations using JNI.
- Implemented and configured Hudson as the continuous integration server and Sonar for maintaining code quality and removing redundant code.
- Integrated Route-smart C++ code with the Java application using SWIG and the Java Native Interface.
- Developed the user interface for requesting a travel network build using JSP and Servlets.
- Built business logic so users can specify which version of the travel network files to use for the solve process.
- Used Spring Data Access Objects to access data through the data source.
- Built an independent property sub-system to ensure that the request always picks the latest set of properties.
- Implemented a thread monitor system to monitor threads. Used JUnit for unit testing of the developed modules.
- Wrote SQL queries and procedures for the application and interacted with third-party ESRI functions to retrieve map data.
- Built and deployed JAR, WAR and EAR files on dev and QA servers.
- Provided bug fixing (using Log4j for logging) and testing support after development.
- Prepared requirements and researched moving the map data to the Hadoop framework for future use.
Environment: Java 1.6.21, J2EE, Oracle 10g, Log4J 1.17, Windows 7 and Red Hat Linux, Subversion, Spring 3.1.0, ICEfaces 3, ESRI, WebLogic 10.3.5, Eclipse Juno, JUnit 4.8.2, Maven 3.0.3, Hudson 3.0.0 and Sonar 3.0.0
Confidential
Java Developer
Responsibilities:
- Involved in requirements gathering, requirements analysis, design, development, integration and deployment.
- Involved in Order Placement / Order Processing module.
- Responsible for the design and development of the customizations framework.
- Designed and developed UIs using JSP, following MVC architecture.
- Developed the application using the Struts framework. The views are JSP pages using the Struts tag library, the model is a combination of EJBs and Java classes, and the web controllers are Servlets.
- Used EJB as middleware in designing and developing a three-tier distributed application.
- Used the Java Message Service (JMS) API to allow application components to create, send, receive and read messages.
- Used JUnit for unit testing of the system and Log4J for logging.
- Created and maintained data using Oracle database and used JDBC for database connectivity.
- Created and implemented Oracle stored procedures and triggers.
- Installed WebLogic Server for handling HTTP requests and responses. Client requests and responses are managed using session tracking in JSP.
- Worked on front-end technologies such as HTML, JavaScript, CSS and JSP pages using JSTL tags.
- Reported daily about the team progress to the Project Manager and Team Lead.
Environment: Core Java, J2EE 1.3, JSP 1.2, Servlets 2.3, EJB 2.0, Struts 1.1, JNDI 1.2, JDBC 2.1, Oracle 8i, UML, DAO, JMS, XML, WebLogic 7.0, MVC Design Pattern, Eclipse 2.1, Log4j and JUnit.