
Senior Hadoop Developer Resume


Columbus, Ohio

SUMMARY

  • Over 5 years of experience in Information Technology, including Big Data and the Hadoop ecosystem. In-depth knowledge and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume, and Spark.
  • Experience in building Pig scripts to extract, transform, and load data onto HDFS for processing. Excellent knowledge of data mapping and extract, transform, load (ETL) from different data sources. Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
  • Excellent understanding and knowledge of NoSQL databases like HBase and Cassandra.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, DDL, DML, SQL queries, and data models.
  • Well-versed in Amazon Web Services (AWS) cloud services such as EC2 and S3.
  • Very good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Extensively worked on the MRv1 and MRv2 Hadoop architectures.
  • Hands-on experience in writing MapReduce programs and Pig & Hive scripts.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and bucketing (see the sketch after this list).
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Experience working on Application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
  • Extensively used Kafka to load log data from multiple sources directly into HDFS. Knowledge of RabbitMQ. Loaded streaming log data from various web servers into HDFS using Flume.
  • Proficient in using RDBMS concepts with Oracle, SQL Server, and MySQL.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Worked extensively with CDH3, CDH4.
  • Skilled in leadership, self-motivated, and able to work effectively in a team. Possess excellent communication and analytical skills along with a can-do attitude.
  • Strong work ethics with desire to succeed and make significant contributions to the organization. Experience in processing different file formats like XML, JSON and sequence file formats.
  • Good Knowledge in Amazon AWS concepts like EMR and EC2 web services which provides fast and efficient processing of Big Data.
  • Strong knowledge on implementation of SPARK core - SPARK SQL, MLlib, GraphX and Spark streaming.
  • Expert in deploying code through web application servers such as WebSphere, WebLogic, and Apache Tomcat in the AWS cloud.
  • Good Experience in creating Business Intelligence solutions and designing ETL workflows using Tableau.
  • Experience with the NumPy, Matplotlib, pandas, Seaborn, Plotly, and Cufflinks Python libraries.
  • Worked on large datasets using PySpark, NumPy, and pandas.
  • Good experience in Agile engineering practices, Scrum, Test-Driven Development, and Waterfall methodologies.
  • Hands-on Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology.
  • Exposure to Java development projects.
  • Hands-on experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries in Oracle, DB2, and MySQL.
  • Good working experience on different operating systems such as UNIX/Linux, Apple Mac OS X, and Windows.
  • Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
  • Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
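
A minimal PySpark sketch of the partitioned Hive external table pattern referenced above. The table name, columns, and HDFS path are hypothetical, and a Hive-enabled SparkSession backed by a shared metastore is assumed:

```python
from pyspark.sql import SparkSession

# Hive-enabled session backed by a shared metastore (not the embedded Derby DB)
spark = (SparkSession.builder
         .appName("partitioned-external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over data already landed on HDFS; schema and location are illustrative
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events (
        event_id STRING,
        payload  STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/events'
""")

# Allow dynamic partitioning so each distinct event_date resolves its own partition
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

# Load from a hypothetical staging table; the partition column goes last for insertInto
staged = spark.table("staging_events")
(staged.select("event_id", "payload", "event_date")
       .write.mode("append")
       .insertInto("events"))
```

With nonstrict dynamic partitioning, the write resolves partitions from the data itself instead of requiring a static PARTITION clause for each load.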

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark

Programming Languages: Java (5, 6, 7), Python, Scala

Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle 9i/10g/11g

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell

NoSQL/Search Databases: Cassandra, HBase, Elasticsearch

Operating Systems: Linux, Windows XP/7/8

Software Life Cycles: SDLC, Waterfall and Agile models

Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SOAP UI, ANT, Maven, Automation and MR-Unit

Cloud Platforms: Amazon EC2

PROFESSIONAL EXPERIENCE

Confidential, Columbus, Ohio

Senior Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including MapReduce, Hive, and Spark.
  • Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
  • Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
  • Implemented Storm topologies to pre-process data before moving it into HDFS.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
  • Implemented a POC to migrate MapReduce programs to Spark transformations using Spark and Scala.
  • Migrated complex MapReduce programs into Spark RDD transformations and actions (see the sketch after this list).
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of those transformations.
  • Developed a process for batch ingestion of CSV files and Sqoop imports from different sources, and generated views on the data sources using shell scripting and Python.
  • Integrated a shell script to create Solr collections and morphline-based Solr indexes on top of table directories using the MapReduce Indexer Tool within the batch ingestion framework.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Experienced in developing custom input formats and data types to parse and process unstructured and semi structured input data and mapped them into key value pairs to implement business logic in MapReduce.
  • Involved in using HCatalog to access Hive table metadata from MapReduce code.
  • Experience in implementing custom serializers, interceptors, sources, and sinks as required in Flume to ingest data from multiple sources.
  • Experience in setting up fan-in (consolidation) flows in Flume, designing a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Developed shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Converted unstructured data to structured data by writing Spark code.
  • Indexed documents using Apache Solr.
  • Set up SolrCloud for distributed indexing and search.
  • Automated all jobs, from pulling data from different data sources such as MySQL and pushing the result datasets to the Hadoop Distributed File System, to running MapReduce, Pig, and Hive jobs using Kettle and Oozie (workflow management).
  • Worked on NoSQL databases such as Cassandra and MongoDB for POC purposes, storing images and URIs.
  • Integrated bulk data into the Cassandra file system using MapReduce programs.
  • Used Talend ETL tool to develop multiple jobs and in setting workflows.
  • Created Talend jobs to copy files from one server to another and utilized Talend FTP components.
  • Worked on MongoDB for distributed storage and processing.
  • Designed and implemented a Cassandra data store and the associated RESTful web service.
  • Created partitioned tables in Hive and mentored the analyst and SQA teams in writing Hive queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Involved in cluster setup, monitoring, test benchmarks for results.
  • Involved in build/deploy applications using Maven and integrated with CI/CD server Jenkins.
  • Involved in Agile methodologies, daily Scrum meetings, and sprint planning sessions.
  • Handled all Azure management tools on a daily basis.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Administered and maintained Cloudera Hadoop clusters; provisioned, patched, and maintained physical Linux systems.
  • Configured a data fabric that provides seamless, real-time integration and access across the multiple data silos of a big data system.
  • Enabled the processing, management, storage, and analysis of data using the data fabric.
  • Used a data mesh to predict the company's future sales, leveraging the data with machine learning algorithms.
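
The MapReduce-to-RDD migration referenced above can be sketched as a word-count-style aggregation. This is a minimal PySpark illustration (the project itself is described as using Spark with Scala); the input path is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr-to-rdd-sketch").getOrCreate()
sc = spark.sparkContext

# The mapper/shuffle/reducer stages of an MR job map onto RDD operations:
#   mapper  -> flatMap/map (emit key-value pairs)
#   shuffle -> implicit in reduceByKey
#   reducer -> reduceByKey (combine values per key)
lines = sc.textFile("hdfs:///data/raw/events.csv")          # illustrative path
counts = (lines
          .flatMap(lambda line: line.split(","))             # emit fields
          .map(lambda field: (field.strip(), 1))             # key-value pairs
          .reduceByKey(lambda a, b: a + b))                  # combine per key

# Actions trigger execution, much like submitting the MR job
for key, total in counts.take(10):
    print(key, total)
```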

Environment: Hadoop, Confluent Kafka, Apache Cassandra, Hortonworks HDF, HDP, NiFi, Linux, Splunk, YARN, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data Fabric, Data Mesh.

Confidential, Milwaukee WI

Hadoop Developer

Responsibilities:

  • Developed PySpark code to read data from Hive, group the fields, and generate XML files. Enhanced the PySpark code to write the generated XML files to a directory and zip them to CD.
  • Implemented a REST call to submit the generated CDAs to the vendor website. Implemented Impyla to support JDBC/ODBC connections to HiveServer2.
  • Enhanced the PySpark code to replace Spark with Impyla. Performed the installation of Impyla on the edge node.
  • Evaluated the performance of the Spark application by testing in cluster deployment mode vs. local mode.
  • Experimented with submissions of test OIDs to the vendor website.
  • Explored StreamSets Data Collector and implemented it for ingestion into Hadoop.
  • Created a StreamSets pipeline to parse files in XML format and convert them to a format that is fed to Solr.
  • Built a data validation dashboard in Solr to display the message records. Wrote a shell script to run a Sqoop job for bulk data ingestion from Oracle into Hive.
  • Created Hive tables for the ingested data. Scheduled an Oozie job for the Sqoop data ingestion.
  • Worked with JSON file format for StreamSets. Worked with Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Wrote shell scripts to dump data from MySQL to HDFS.
  • Worked on SaltStack automation tools. Helped teams working with batch processing and tools in the Hadoop technology stack (MapReduce, YARN, Pig, Hive, HDFS).
  • Analyzed large volumes of structured data using Spark SQL.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (see the sketch after this list).
  • Enhanced and optimized the product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Worked on Apache Flume for collecting and aggregating huge amounts of log data, storing it on HDFS for further analysis.
  • Worked on Maven 3.3.9 for building and managing Java-based projects. Hands-on experience using Linux and HDFS shell commands. Worked on Kafka for message queuing solutions.
  • Developed unit test cases for Mapper, Reducer, and Driver classes using MRUnit.
  • Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML and JSON.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
  • Wrote HBase client programs in Java and web services.
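
A minimal sketch of the Kafka-to-HDFS streaming flow referenced above, written with Spark Structured Streaming (the original work may have used the older DStream API). The broker address, topic, and HDFS paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Subscribe to a Kafka topic as a streaming DataFrame (broker and topic are illustrative)
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "web-logs")
          .load())

# Kafka values arrive as bytes; cast to a single string column for the text sink
logs = stream.selectExpr("CAST(value AS STRING) AS value")

# Append each micro-batch to HDFS; checkpointing makes the sink restartable
query = (logs.writeStream
         .format("text")
         .option("path", "hdfs:///data/streams/web-logs")
         .option("checkpointLocation", "hdfs:///checkpoints/web-logs")
         .start())

query.awaitTermination()
```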

Environment: Hadoop, Azure, AWS, HDFS, Hive, Hue, Oozie, Java, Linux, Cassandra, Python, Open TSDB

Confidential, Boston, MA

Big Data/Hadoop Engineer

Responsibilities:

  • Wrote HBase client programs in Java and web services.
  • Developed simple to complex MapReduce streaming jobs using Java language for processing and validating the data.
  • Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
  • Developed MapReduce and Spark jobs to discover trends in data usage by users.
  • Implemented Spark using Python and Spark SQL for faster processing of data.
  • Implemented algorithms for real time analysis in Spark
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this list).
  • Streamed data in real time using Spark with Kafka.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
  • Exported the analyzed data to relational databases using Sqoop for further visualization and report generation for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Pig Latin scripts to perform Map Reduce jobs.
  • Developed product profiles using Pig and commodity UDFs.
  • Developed Hive scripts in HiveQL to De-Normalize and Aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Wrote automated HBase test cases for data quality checks using HBase command-line tools.
  • Created UDFs to store specialized data structures in HBase and Cassandra.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
  • Used Tez framework for building high performance jobs in Pig and Hive.
  • Configured Kafka to read and write messages from external programs.
  • Configured Kafka to handle real time data.
  • Developed end-to-end data processing pipelines, from receiving data via the distributed messaging system Kafka through persisting the data into HBase.
  • Wrote a Storm topology to emit data into Cassandra.
  • Wrote a Storm topology to accept data from Kafka producers and process the data.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used the JUnit framework to perform unit testing of the application.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
  • Experience with data wrangling and creating workable datasets.
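
A minimal PySpark sketch of the Spark-Cassandra Connector usage referenced above. The connection host, keyspace, and table names are hypothetical, the connector package is assumed to be on the Spark classpath, and the write-back assumes the target table already exists:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cassandra-load-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")  # illustrative host
         .getOrCreate())

# Read a Cassandra table into a DataFrame via the connector's data source
events = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(table="user_events", keyspace="analytics")
          .load())

# Example aggregation before writing results back
daily = events.groupBy("event_date").count()

# Append aggregated results to another Cassandra table (must already exist)
(daily.write
      .format("org.apache.spark.sql.cassandra")
      .options(table="daily_event_counts", keyspace="analytics")
      .mode("append")
      .save())
```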

Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, Zookeeper, Kafka, Spark streaming, Flume, Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera manager, MySQL, Jaspersoft, Multi-node cluster with Linux-Ubuntu, Windows, Unix.

Confidential

Java Developer

Responsibilities:

  • Worked with front-end applications using HTML, CSS, and JavaScript.
  • Responsible for developing various modules, front-end and back-end components using several design patterns based on client's business requirements.
  • Designed and developed application modules using the Spring and Hibernate frameworks.
  • Designed and developed the front end with Swing and the Spring MVC framework, tag libraries, and custom tag libraries; developed the presentation tier using JSP pages integrating AJAX, custom tags, JSTL, HTML, JavaScript, and jQuery.
  • Used Hibernate to develop persistent classes following ORM principles.
  • Deployed Spring configuration files such as application context, application resources, and application files.
  • Used Java-J2EE patterns like Model View Controller (MVC), Business Delegate, Session façade, Service Locator, Data Transfer Objects, Data Access Objects, Singleton and factory patterns.
  • Used JUnit for Testing Java Classes.
  • Used Waterfall methodology.
  • Worked with Maven for build scripts and Setup the Log4J Logging framework.
  • Involved in the Integration of the Application with other services.
  • Managed version control for the deliverables by streamlining and rebasing the development streams in SVN.

Environment: Java/JDK, J2EE, Spring 2.5, Spring MVC, Hibernate, Eclipse, Tomcat, XML, JSTL, JavaScript, Maven 2, Web Services, jQuery, SVN, JUnit, Log4j, Windows, Oracle.
