Hadoop Developer Resume
LA, CA
SUMMARY:
- Hadoop Developer with over 6 years of IT experience in Big Data ecosystem, AWS, ETL, and RDBMS technologies, with domain experience in entertainment, banking, automobile, healthcare, retail, and non-profit organizations.
- Experience in the development, analysis, and design of ETL methodologies across all phases of the data warehousing life cycle.
- Experience with major Big Data components such as Spark, HDFS, Pig, Hive, Sqoop, Oozie, ZooKeeper, and HBase.
- Good knowledge of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Experience in developing MapReduce jobs in Java for data cleaning, transformation, pre-processing, and analysis, implementing multiple mappers to handle data from multiple sources.
- Knowledge of Hadoop daemon functionality, resource utilization, and dynamic tuning to keep the cluster available and efficient.
- Wrote several Sqoop scripts to load data from different sources directly into HDFS and Hive tables.
- Experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, Parquet, and Avro.
- More than one year of hands-on experience using the Spark framework with Scala, with good exposure to performance-tuning Hive queries and MapReduce jobs in the Spark framework.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers (a minimal producer sketch follows this summary).
- Automated Azure resource creation, querying, and deployment for an application POC using AAD (Azure Active Directory) authentication and the ARM (Azure Resource Manager) API.
- Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop.
- Utilized advanced Tableau features to link data from different connections on one dashboard and to filter data across multiple views at once.
- Worked extensively on the Java persistence layer during application migration to Cassandra, using Spark to load data to and from the Cassandra cluster.
- Experience integrating various data sources such as Oracle, SQL Server, and MS Access, as well as non-relational sources such as flat files, into a staging area.
- Worked with AWS EC2 and CloudWatch services, managed CI/CD pipelines through Jenkins, and automated manual tasks using shell scripting.
- Used stateless session beans to encapsulate business logic and developed web services for the modules to integrate the client's API.
- Developed core modules in large cross-platform applications using Java and J2EE, with experience in core Java concepts such as OOP, multithreading, collections, and I/O, and expertise in developing Spring MVC applications.
- Performed verification and validation testing to ensure that the developed functionality meets the specifications and that the specifications meet business needs.
- Imported data into Cassandra and processed it using PySpark and Scala.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Performed unit testing using the MRUnit and JUnit testing frameworks, with Log4j to monitor error logs.
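A minimal sketch of the kind of Kafka producer referenced above, in Java; the broker address, topic name, and sample records are illustrative placeholders rather than details of any specific project.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CustomerEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");            // placeholder broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each source record becomes one message on the ingestion topic.
            for (String line : new String[] {"rec1", "rec2"}) {     // stand-in for real source reads
                producer.send(new ProducerRecord<>("customer-events", line));
            }
            producer.flush();
        }
    }
}
```

A matching consumer (or a sink such as Flume) would then read the topic and land the messages in HDFS.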
TECHNICAL SKILLS:
Big Data Technologies: HDFS, YARN, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, Talend, Impala, Flume, Kafka, Storm, and Spark
Cloud Services: Microsoft Azure, AWS (EC2, S3, CloudWatch)
NoSQL: HBase, Cassandra, MongoDB
Programming Languages: Java, Python and Scala
Frameworks: Hibernate, Struts, and Spring
Web Services & Servers: REST, SOAP, Tomcat, WebSphere
Client Technologies: jQuery, JavaScript, AJAX, HTML5
Operating Systems: UNIX, Windows, Linux (Ubuntu)
Web Technologies: JSP, Servlets, JavaScript, JavaBeans
Databases: Oracle 10g/11g, PostgreSQL 4.x/5.x
Development Tools: SQL Developer, Ant, Maven, Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, LA, CA
Hadoop Developer
Responsibilities:
- Designed and deployed Hadoop clusters and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Kafka, Spark, Impala, and Cassandra, on Cloudera.
- Identified customer churn at an early stage by performing exploratory data analyses, generating and testing working hypotheses, preparing and analyzing historical data, and identifying patterns.
- Analyzed customer behaviour on set-top boxes to derive a real-time analytical view of each customer's set-top box usage.
- Implemented Snappy compression and the Parquet file format for staging and computational optimization.
- Developed Sqoop scripts to import and export data between RDBMS and HDFS/Hive, and handled incremental loading of customer and transaction data dynamically.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Worked with various data sources such as Teradata and Oracle; successfully loaded files from Teradata to HDFS and from HDFS into Hive and Impala.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily runs.
- Created a pipeline for processing structured and unstructured streaming data using Spark Streaming and stored the filtered data in S3 as Parquet files (see the Spark sketch after this list).
- Used the Scala collections framework to store and process complex device metadata and related information.
- Wrote Apache Spark Streaming applications on the Big Data distribution in the active cluster environment.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Designed and implemented a microservices container CI/CD solution within AWS, leveraging Jenkins, GitLab, Docker, Ansible, and Kubernetes.
- Implemented the build framework for new projects using Jenkins and Maven as build tools.
- Coordinated with various teams to identify the data-stitching process.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
- Managed application development using Agile life-cycle methodologies.
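A minimal sketch of the HDFS-to-S3 Parquet flow described in the pipeline bullet above, in Java; the paths, column names, and bucket are hypothetical, and it is written against a Spark 2.x-style SparkSession rather than the Spark 1.6 SQLContext API listed in the environment.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SetTopBoxUsageJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("set-top-box-usage")
                .getOrCreate();

        // Raw events landed on HDFS (path is a placeholder).
        Dataset<Row> events = spark.read().json("hdfs:///data/raw/stb_events");

        // Filter out incomplete records and aggregate usage per customer with Spark SQL.
        events.createOrReplaceTempView("stb_events");
        Dataset<Row> usage = spark.sql(
                "SELECT customer_id, COUNT(*) AS event_count " +
                "FROM stb_events WHERE customer_id IS NOT NULL " +
                "GROUP BY customer_id");

        // Store the filtered, aggregated output in S3 as Snappy-compressed Parquet.
        usage.write()
             .mode("overwrite")
             .option("compression", "snappy")
             .parquet("s3a://example-bucket/curated/stb_usage");

        spark.stop();
    }
}
```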
Environment: Hadoop 2.7, MapReduce, Cloudera 5.4, Hive 1.2, Spark 1.6, Spark SQL, Teradata, Flume, Kafka, Sqoop 1.4, Oozie 3.0.3, Python, Java (JDK 1.6), MongoDB, Tableau, AWS, Eclipse
Confidential, DALLAS, TX
Hadoop Developer
Responsibilities:
- Extensively involved in the design phase and delivered design documents. Experienced with the Hadoop ecosystem, including HDFS, Hive, Pig, Sqoop, and Spark with Scala.
- Hands-on experience extracting data from different databases and copying it into HDFS and Hive using Sqoop and Flume, with expertise in using compression techniques to optimize data storage.
- Used different SerDes to convert JSON data into pipe-separated data.
- Wrote MapReduce jobs that used access tokens to retrieve customer data, and developed simple to complex MapReduce jobs using Hive and Pig to analyze the data.
- Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
- Hands-on experience creating HBase tables to load large sets of semi-structured data coming from various sources.
- Used Pig Latin scripts, UDFs, and UDAFs while analyzing unstructured and semi-structured data.
- Created Hive UDFs for business requirements to make functionality reusable (a minimal UDF sketch follows this list).
- Migrated 20+ TB of data from different databases (e.g., Oracle, PostgreSQL) to Hadoop.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Utilized the in-memory processing capability of Apache Spark to process data with Spark SQL and Spark Streaming via PySpark scripts.
- Created PySpark scripts to load data from source files into RDDs, create DataFrames from the RDDs, perform transformations and aggregations, and collect the output of the process.
- Involved in data acquisition, pre-processing, and exploration for a telecommunications project in Scala.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Migrated MapReduce programs into Spark transformations using Spark and Scala.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Developed scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.3 for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop.
- Worked with the Oozie workflow engine for job scheduling; involved in unit testing and delivered unit test plans and results documents.
- Hands-on experience with Tableau for data visualization and analysis of large data sets, drawing various conclusions.
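A minimal sketch of a reusable Hive UDF of the kind referenced above, in Java; the class name and normalization rule are illustrative only.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizePhoneUDF extends UDF {
    // Hive calls evaluate() once per row; null in, null out.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Strip everything except digits, e.g. "(212) 555-0100" -> "2125550100".
        String digitsOnly = input.toString().replaceAll("[^0-9]", "");
        return new Text(digitsOnly);
    }
}
```

Packaged into a JAR, such a UDF would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION, then called like a built-in function in Hive queries.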
Environment: Hadoop 2.5, MapReduce, Cloudera 4.6, HDFS, HBase, Hive 0.12, Pig, Spark 1.2, Storm, Flume, Kafka, Sqoop, Oozie, Oracle, Scala, Java (JDK 1.6), Tableau.
Confidential, CHICAGO, ILLINOIS
Java/Hadoop Developer
Responsibilities:
- Worked with the product manager and team leader to gather requirements.
- Worked in a distributed/cloud computing environment (MapReduce/Hadoop, HBase, Hive, Pig, Spark, Sqoop, Oozie, etc.).
- Loaded customer logs and data into HDFS using Flume and Sqoop.
- Loaded RDBMS data from Oracle, PostgreSQL, and DB2 into HDFS using Sqoop.
- Created custom RecordReaders, partitioning techniques, and custom sorting and shuffling techniques in complex MapReduce jobs as the use cases required (a minimal custom-partitioner sketch follows this list).
- Implemented compression techniques while loading data from RDBMS into HDFS and Hive to optimize data storage.
- Used partitioning, dynamic partitioning, and bucketing techniques while creating Hive tables for easy analysis of dynamic data coming from different sources.
- Worked on reading different formats and compressed files from HDFS and Hive.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
- Implemented the associated business modules using Spring MVC and Hibernate data mapping.
- Developed RESTful services to allow communication between applications using JAX-RS and the Jersey framework, and developed OAuth workflows for device user management.
- Used core Java concepts such as multithreading, exception handling, and the Collections API to implement various features.
- Created continuous integration jobs for build, test, and deployment using Jenkins and Maven.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
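A minimal sketch of a custom MapReduce partitioner of the kind mentioned above, in Java; the source-prefix routing rule is a hypothetical example rather than the original job's logic.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SourceAwarePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions == 1) {
            return 0;
        }
        // Route records whose key carries the "ORA_" source prefix to reducer 0,
        // and spread the remaining keys across the other reducers by hash.
        if (key.toString().startsWith("ORA_")) {
            return 0;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1) + 1;
    }
}
```

Such a class would be wired into a job with job.setPartitionerClass(SourceAwarePartitioner.class).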
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Flume, Sqoop, Oozie, PostgreSQL, Java (JDK 1.6), JavaScript, Maven, Spring, Hibernate, RESTful services.
Confidential
Java Developer
Responsibilities:
- Prepared high-level and low-level design documents applying relevant design patterns, with UML diagrams to depict components and class-level details.
- Interacted with system analysts and business users for design and requirement clarification.
- Developed the UI (user interface) using HTML5, CSS3, JSP, jQuery, AJAX, and JavaScript.
- Developed tabbed pages using AJAX with jQuery and JSON for quick viewing of related content, providing both functionality and ease of access to the user.
- Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
- Developed APIs using Spring, Spring MVC, Hibernate, and web services technologies.
- Used core Java concepts such as multithreading, exception handling, and the Collections API to implement various features and enhancements.
- Implemented the associated business modules using Hibernate data mapping.
- Used Collections Framework features such as Map and List to retrieve data from web services, manipulate the data to incorporate business logic, and save the data to an Oracle database.
- Created continuous integration jobs for build and testing using Maven.
- Implemented test cases for the application using JUnit libraries (a minimal test sketch follows this list).
- Used Log4j to log debug, error, and informational messages at various levels.
- Designed, developed, and maintained the data layer using Hibernate as the ORM framework.
- Involved in the analysis, design, development, and production of the application, and developed UML diagrams.
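A minimal sketch of a JUnit 4 test case with Log4j logging, as referenced above; the class under test and the values asserted are stand-ins rather than real application logic.

```java
import static org.junit.Assert.assertEquals;

import org.apache.log4j.Logger;
import org.junit.Test;

public class OrderTotalCalculatorTest {
    private static final Logger LOG = Logger.getLogger(OrderTotalCalculatorTest.class);

    @Test
    public void addsLineItemAmounts() {
        LOG.debug("Starting addsLineItemAmounts test");
        double total = 10.0 + 2.5;   // stand-in for a call into the class under test
        assertEquals(12.5, total, 0.0001);
        LOG.info("Computed total: " + total);
    }
}
```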
Environment: Java, JDBC, JSP, JBoss, Servlets, Maven, HTML, AngularJS, MongoDB, Hibernate, JavaScript, Eclipse, Struts, SQL Server 2000.