Hadoop Developer Resume

LA, CA

SUMMARY:

  • Hadoop Developer with over 6 years of IT experience in the Big Data ecosystem, AWS, ETL, and RDBMS technologies, with domain experience in Entertainment, Banking, Automotive, Healthcare, Retail, and Non-profit organizations.
  • Experience in the development, analysis, and design of ETL methodologies across all phases of the Data Warehousing life cycle.
  • Experience with major Big Data components such as Spark, HDFS, Pig, Hive, Sqoop, Oozie, ZooKeeper, and HBase.
  • Good knowledge of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
  • Experience in developing MapReduce jobs in Java for data cleaning, transformation, pre-processing, and analysis, implementing multiple mappers to handle data from multiple sources.
  • Knowledge of Hadoop daemon functionality, resource utilization, and dynamic tuning to keep clusters available and efficient.
  • Wrote several Sqoop scripts to load data directly into HDFS and Hive tables from different sources.
  • Experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, Parquet, and Avro (see the Hive sketch after this list).
  • More than one year of hands-on experience using the Spark framework with Scala, with good exposure to performance tuning of Hive queries and MapReduce jobs on Spark.
  • Experience in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers (a producer sketch also follows this list).
  • Automated Azure resource creation, querying, and deployment for a POC application using AAD (Azure Active Directory) authentication and the ARM (Azure Resource Manager) API.
  • Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop.
  • Utilized advanced Tableau features to link data from different connections on one dashboard and to filter data across multiple views at once.
  • Extensively worked on the Java persistence layer during an application migration to Cassandra, using Spark to load data to and from the Cassandra cluster.
  • Experience in integrating relational sources such as Oracle, SQL Server, and MS Access, and non-relational sources such as flat files, into the staging area.
  • Worked with AWS EC2 and CloudWatch services, managed CI/CD pipelines through Jenkins, and automated manual tasks with shell scripting.
  • Used stateless session beans to encapsulate business logic and developed web services to integrate clients' APIs.
  • Developed core modules of large cross-platform applications using Java/J2EE, with experience in core Java concepts such as OOP, multithreading, collections, and I/O. Expertise in developing with the Spring MVC framework.
  • Involved in verification and validation testing to ensure that the developed functionality meets the specifications and that the specifications meet business needs.
  • Imported data into Cassandra and processed it using PySpark and Scala.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Performed unit testing with the MRUnit and JUnit testing frameworks and used Log4j to monitor error logs.
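
As an illustration of the Hive partitioning and SerDe work above, here is a minimal sketch in Scala using a Hive-enabled Spark session; the database, table, and column names (staging.raw_transactions, sales.transactions) are hypothetical placeholders, not values from any actual project.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    // Assumes a Hive metastore is configured for this Spark installation
    val spark = SparkSession.builder()
      .appName("HivePartitionSketch")
      .enableHiveSupport()
      .getOrCreate()

    // External staging table over raw JSON files, read through the Hive JSON SerDe
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.raw_transactions (
        |  txn_id STRING, customer_id STRING, amount DOUBLE, txn_date STRING)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |LOCATION '/data/staging/transactions'""".stripMargin)

    // Partitioned Parquet table so queries can prune by date
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.transactions (
        |  txn_id STRING, customer_id STRING, amount DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert: each row is routed to its txn_date partition
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE sales.transactions PARTITION (txn_date)
        |SELECT txn_id, customer_id, amount, txn_date FROM staging.raw_transactions""".stripMargin)

    spark.stop()
  }
}
```

Partitioning by the date column keeps full-table scans off the query path; the SerDe on the staging table lets Hive read the raw JSON in place before it is rewritten as Parquet.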
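Similarly, a minimal sketch of the kind of Kafka producer mentioned above, written in Scala against the standard Kafka Java client; the broker address, topic name, key, and payload are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // placeholder broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for full replication before acknowledging

    val producer = new KafkaProducer[String, String](props)
    try {
      // Hypothetical event, keyed by customer id so one customer's events stay ordered
      producer.send(new ProducerRecord[String, String]("events", "cust-42", """{"action":"login"}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```

Setting acks to "all" trades a little latency for durability, which matters when the topic feeds downstream HDFS loads.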

TECHNICAL SKILLS:

Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, Talend, Impala, Flume, Kafka, Storm and Spark

Cloud Services: Microsoft Azure

NoSQL: HBase, Cassandra, MongoDB

Programming Languages: Java, Python and Scala

Frameworks: Hibernate, Struts, and Spring

Web Services: REST, SOAP

Application Servers: Tomcat, WebSphere

Client Technologies: jQuery, JavaScript, AJAX, HTML5

Operating Systems: UNIX, Windows, Linux (Ubuntu)

Web Technologies: JSP, Servlets, JavaScript, JavaBeans

Databases: Oracle 10g/11g, PostgreSQL

Development Tools: SQL Developer, Ant, Maven, Jenkins

PROFESSIONAL EXPERIENCE:

Confidential, LA, CA

Hadoop Developer

Responsibilities:

  • Designed and deployed Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Kafka, Spark, Impala, and Cassandra, on Cloudera.
  • Identified customer churn at an early stage: performed exploratory data analyses, generated and tested working hypotheses, and prepared and analyzed historical data to identify patterns.
  • Analyzed customer behaviour on set-top boxes to derive a real-time analytical view of each customer's set-top box usage.
  • Implemented Snappy compression and the Parquet file format for staging and computational optimization.
  • Developed Sqoop scripts to import and export data between RDBMS and HDFS/Hive, and handled dynamic incremental loading of customer and transaction data.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Worked on improving the performance and optimization of existing Hadoop algorithms using Spark, via SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Experienced in working with data sources such as Teradata and Oracle; successfully loaded files from Teradata into HDFS, and from HDFS into Hive and Impala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
  • Created a pipeline for processing structured and unstructured streaming data using Spark Streaming, storing the filtered data in S3 as Parquet files (see the streaming sketch after this list).
  • Used the Scala collections framework to store and process complex device metadata and related information.
  • Wrote Apache Spark Streaming API code on a Big Data distribution in an active cluster environment.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Designed and implemented a microservices container CI/CD solution within AWS, leveraging Jenkins, GitLab, Docker, Ansible, and Kubernetes.
  • Implemented a build framework for new projects using Jenkins and Maven.
  • Worked with various teams to define the data-stitching process.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
  • Managed application development using Agile life-cycle methodologies.
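
Below is a minimal sketch of the streaming pipeline described above, using the Spark 1.6-era direct Kafka DStream API; the broker, topic, filter condition, and S3 bucket are placeholders rather than actual project values.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamToS3Sketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamToS3Sketch")
    val ssc = new StreamingContext(conf, Seconds(60)) // one micro-batch per minute

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
    val topics = Set("events")                                      // placeholder topic

    // Direct stream: one Kafka partition maps to one Spark partition, no receivers
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        // Parse each JSON message value, filter, and land the batch as Parquet on S3
        val df = sqlContext.read.json(rdd.map(_._2))
        df.filter("eventType IS NOT NULL")                         // hypothetical filter
          .write.mode("append").parquet("s3a://my-bucket/events/") // placeholder bucket
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```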

Environment: Hadoop 2.7, MapReduce, Cloudera 5.4, Hive 1.2, Spark 1.6, Spark SQL, Teradata, Flume, Kafka, Sqoop 1.4, Oozie 3.0.3, Python, Java (JDK 1.6), MongoDB, Tableau, AWS, Eclipse

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:
  • Extensively involved in the design phase and delivered design documents. Experience with the Hadoop ecosystem, including HDFS, Hive, Pig, Sqoop, and Spark with Scala.
  • Hands-on experience extracting data from different databases and copying it into HDFS and Hive using Sqoop and Flume, with expertise in using compression techniques to optimize data storage.
  • Used different SerDes to convert JSON data into pipe-separated data.
  • Wrote MapReduce jobs using access tokens to get data from customers. Developed simple to complex MapReduce jobs using Hive and Pig for analyzing the data.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Hands-on experience creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Used Pig Latin scripts, UDFs, and UDAFs while analyzing unstructured and semi-structured data.
  • Created Hive UDFs for business requirements and for reusability of functionality.
  • Migrated 20+ TB of data from different databases (i.e., Oracle, PostgreSQL) to Hadoop.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Utilized the in-memory processing capability of Apache Spark to process data using Spark SQL and Spark Streaming via PySpark scripts.
  • Created PySpark scripts to load data from source files into RDDs, create DataFrames from the RDDs, perform transformations and aggregations, and collect the output of the process.
  • Involved in data acquisition, data pre-processing, and data exploration for a telecommunication project in Scala.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this list).
  • Developed scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.3 for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop.
  • Worked on the Oozie workflow engine for job scheduling. Involved in unit testing and delivered unit test plans and results documents.
  • Hands-on experience with Tableau for data visualization and analysis of large data sets, drawing various conclusions.
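
A minimal sketch of the Spark-Cassandra Connector usage described above, in Scala against the Spark 1.x RDD/DataFrame APIs; the keyspace, tables, and columns (metrics.device_usage, metrics.device_totals) are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._ // adds cassandraTable / saveToCassandra

object CassandraRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraRoundTripSketch")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read a (hypothetical) keyspace/table into an RDD of typed pairs
    val usage = sc.cassandraTable("metrics", "device_usage")
      .map(row => (row.getString("device_id"), row.getDouble("bytes")))

    // Aggregate with the DataFrame API
    val totals = usage.toDF("device_id", "bytes")
      .groupBy("device_id")
      .sum("bytes")
      .withColumnRenamed("sum(bytes)", "total_bytes")

    // Write the per-device totals back to Cassandra
    totals.rdd
      .map(r => (r.getString(0), r.getDouble(1)))
      .saveToCassandra("metrics", "device_totals", SomeColumns("device_id", "total_bytes"))

    sc.stop()
  }
}
```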

Environment: Hadoop 2.5, MapReduce, Cloudera 4.6, HDFS, HBase, Hive 0.12, Pig, Spark 1.2, Storm, Flume, Kafka, Sqoop, Oozie, Oracle, Scala, Java (JDK 1.6), Tableau

Confidential, Chicago, IL

Java/Hadoop Developer

Responsibilities:

  • Worked with the product manager and team leader to gather requirements.
  • Worked in a distributed/cloud computing environment (MapReduce/Hadoop, HBase, Hive, Pig, Spark, Sqoop, Oozie, etc.).
  • Loaded customer logs and data into HDFS using Flume and Sqoop.
  • Loaded RDBMS data from Oracle, PostgreSQL, and DB2 into HDFS using Sqoop.
  • Created custom RecordReaders, different partitioning techniques, and custom sorting and shuffling techniques in complex MapReduce jobs as the use cases required (a partitioner sketch follows this list).
  • Implemented compression techniques while loading data from RDBMS into HDFS and Hive to optimize data storage.
  • Used partitioning, dynamic partitioning, and bucketing techniques while creating Hive tables for easy analysis of dynamic data coming from different sources.
  • Worked on reading files in different formats, including compressed files, from HDFS and Hive.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
  • Implemented the associated business modules using Spring MVC and Hibernate data mapping.
  • Developed RESTful services to allow communication between applications using JAX-RS and the Jersey framework. Developed OAuth workflows for device user management.
  • Used various core Java concepts such as multithreading, exception handling, and the Collections API to implement various features.
  • Created jobs for continuous integration builds, testing, and deployment using Jenkins and Maven.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
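
A minimal sketch of a custom MapReduce partitioner of the kind described above, written here in Scala against the Hadoop mapreduce API; the key layout (a region prefix before a comma) is a hypothetical example.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Routes records to reducers by a region prefix in the key, so that all
// records for one region land in the same output partition.
class RegionPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val region = key.toString.split(',')(0)
    // Mask the sign bit so the modulo result is always a valid partition index
    (region.hashCode & Integer.MAX_VALUE) % numPartitions
  }
}

// Wiring it into a job:
//   job.setPartitionerClass(classOf[RegionPartitioner])
```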

Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Flume, Sqoop, Oozie, PostgreSQL, Java (JDK 1.6), JavaScript, Maven, Spring, Hibernate, REST

Confidential

Java Developer

Responsibilities:

  • Prepared high-level and low-level design documents implementing applicable design patterns, with UML diagrams to depict component- and class-level details.
  • Interacted with system analysts and business users for design and requirement clarification.
  • Involved in developing the UI using HTML5, CSS3, JSP, jQuery, AJAX, and JavaScript.
  • Developed tabbed pages using AJAX with jQuery and JSON for quick views of related content, providing both functionality and ease of access for the user.
  • Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
  • Developed APIs using Spring, Spring MVC, Hibernate, and web services technologies.
  • Used various core Java concepts such as multithreading, exception handling, and the Collections API to implement various features and enhancements.
  • Implemented the associated business modules using Hibernate data mapping.
  • Used Collections framework features such as Map and List to retrieve data from web services, manipulate the data to incorporate business logic, and save it to the Oracle database.
  • Created jobs for continuous integration builds and testing using Maven.
  • Implemented test cases for the application using JUnit libraries.
  • Used Log4j to log debug, error, and informational messages at various levels.
  • Designed, developed, and maintained the data layer using Hibernate as the ORM framework.
  • Involved in the analysis, design, development, and production of the application, and developed UML diagrams.

Environment: Java, JDBC, JSP, JBoss, Servlets, Maven, HTML, AngularJS, MongoDB, Hibernate, JavaScript, Eclipse, Struts, SQL Server 2000.
