Hadoop Developer Resume
College Park, MD
SUMMARY:
- Senior Software Developer with experience in technologies including Hadoop / HDFS / MapReduce / Impala / Hive / Pig / HBase / Apache Spark / Flume / Zookeeper / Kafka / Sqoop / Eclipse / Agile / Python / Scala / Java / J2EE / Spring / Hibernate / PL/SQL / DB2 / MySQL / Linux.
- 8+ years of professional IT experience in Big Data ecosystem technologies and Java-related technologies.
- Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
- Hands-on experience with major components of the Hadoop ecosystem such as MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
- Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables and broadcast optimization (see the Scala sketch after this summary).
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages and tasks.
- Experience in exposing Apache Spark as web services.
- Good understanding of the Spark driver, executors and the Spark web UI.
- Experience in submitting Apache Spark and MapReduce jobs to YARN.
- Experience in real time processing using Apache Spark and Kafka.
- Migrated Python machine learning modules to scalable, high-performance and fault-tolerant distributed systems such as Apache Spark.
- Strong experience with Spark SQL UDFs, Hive UDFs and Spark SQL performance tuning. Hands-on experience working with input file formats such as ORC, Parquet, JSON and Avro.
- Good expertise in coding in Python, Scala and Java.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Experience working with columnar NoSQL databases such as HBase.
- Extensive experience in writing Pig and Hive scripts for processing and analyzing large volumes of structured data.
- Experience in writing MapReduce programs and using Apache Hadoop API for analyzing the data.
- Expertise in Spark framework for batch and real-time data processing
- Experience in converting MapReduce applications to Spark
- Worked on the open-source distributed processing framework Apache Spark to achieve near-real-time query processing and iterative processing.
- Working knowledge in Spark Streaming and Spark SQL.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Oozie.
- Good experience with RDBMS technologies such as Oracle, SQL Server and MySQL.
- Expertise in core Java, data structures, algorithms, object-oriented design (OOD) and Java concepts such as the Collections Framework, exception handling and the I/O system.
- Hands-on experience in J2EE technologies such as Servlets, JSP, EJB, JDBC and developing Web Services providers and consumers using SOAP, REST.
- Hands-on experience in developing web applications using MVC (Model View Controller) architecture including Spring MVC, Struts, and Servlets.
- Used Agile Development Methodology and Scrum for the development process
- Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
- Good analytical, communication, problem solving and interpersonal skills.
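A minimal Scala/Spark sketch of the RDD concepts referenced above: lazy transformations versus actions, caching, a broadcast variable and an accumulator. The input path, record layout and lookup values are hypothetical, not taken from a specific engagement.

```scala
import org.apache.spark.sql.SparkSession

object RddBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RddBasics").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: one record per line, "patientId,amount"
    val lines = sc.textFile("hdfs:///data/claims/*.csv")

    // Broadcast a small lookup set instead of shipping it with every task
    val excluded = sc.broadcast(Set("TEST", "DUMMY"))

    // Accumulator counting malformed records across executors
    // (updates inside transformations are best-effort, fine for monitoring)
    val badRecords = sc.longAccumulator("badRecords")

    // Transformations are lazy: nothing runs until an action is invoked
    val amounts = lines
      .map(_.split(","))
      .filter { cols =>
        val ok = cols.length == 2 && !excluded.value.contains(cols(0))
        if (!ok) badRecords.add(1)
        ok
      }
      .map(cols => (cols(0), cols(1).toDouble))
      .cache() // two actions below reuse this RDD, so keep it in memory

    val recordCount = amounts.count()                          // action 1
    val totalByPatient = amounts.reduceByKey(_ + _).collect()  // action 2

    println(s"records=$recordCount, patients=${totalByPatient.length}, bad=${badRecords.value}")
    spark.stop()
  }
}
```

Caching pays off here only because the same RDD feeds two separate actions; with a single action the persist step would just add overhead.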
TECHNICAL SKILLS:
Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB, Apache Zeppelin
Big Data Distributions: Cloudera, Amazon EMR
Programming languages: Core Java, Scala, Python, SQL, Shell Scripting
Operating Systems: Windows, Linux (Ubuntu)
Databases: Oracle, SQL Server
Designing Tools: Eclipse
Java Technologies: JSP, Servlets, Junit, Spring, Hibernate
Web Technologies: XML, HTML, JavaScript, JVM, jQuery, JSON
Linux Experience: System Administration Tools, Puppet, Apache
Web Services: RESTful and SOAP
Frameworks: Jakarta Struts 1.x, Spring 2.x
Development methodologies: Agile, Waterfall
Logging Tools: Log4j
Application / Web Servers: CherryPy, Apache Tomcat, WebSphere
Messaging Services: ActiveMQ, Kafka, JMS
Version Control Tools: Git, SVN and CVS
Analytics: Tableau, SPSS, SAS EM and SAS JMP
PROFESSIONAL EXPERIENCE:
Confidential, College Park, MD
Hadoop Developer
Responsibilities:
- Migrated the existing data from Oracle to HDFS using Sqoop for processing the data.
- Loaded recent transaction data from Oracle into HBase.
- Used Apache Avro to deserialize data from its compact binary format to JSON.
- Designed the HBase row key and column family structure.
- Combined and processed data from HBase and HDFS using Spark Streaming according to BI team requests.
- Used Apache Spark and Scala language to find patients with similar symptoms in the past and medications used for them to achieve best results.
- Pushed web server log data from across the environments into the associated Kafka topic partitions and used Spark SQL to calculate the most prevalent diseases in each city from this data (see the streaming sketch after this section).
- Created multiple Pig Latin and Hive scripts that run as MapReduce jobs for data transformation and cleaning.
- Analysis of Web logs using Hadoop tools for operational and security related activities.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading data into them and writing Hive queries to analyze the data.
- ETL design and development in Hadoop.
- Developed ETL processes for data warehouse and/or Hadoop environment
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases, tuple stores, NoSQL stores, Hadoop, Pig, MySQL and Oracle.
- Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in loading and transforming huge sets of structured, semi structured and unstructured data.
- Worked on different file formats like XML files, Sequence files, JSON, CSV and Map files using Map Reduce Programs.
- Created Qlik Sense and Apache Zeppelin dashboards of Oozie and Falcon data ingestion jobs for efficient monitoring.
- Established relationship between various data and processing elements on a Hadoop environment using Apache Falcon
- Used Apache Falcon framework to simplify data pipeline processing and management on Hadoop clusters
- Continuously monitored and managed Hadoop cluster using Cloudera Manager.
- Performed POCs using technologies such as Spark, Kafka and Scala.
- Responsible for loading data from Teradata database into a Hadoop Hive data warehousing layer, and performing data transformations using Hive
- Wrote custom MapReduce codes, generated JAR files for user defined functions and integrated with Hive to help the analysis team with the statistical analysis.
- Worked extensively on creating Oozie workflows for scheduling different jobs of hive, map reduce and shell scripts.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Worked on migrating tables in SQL to Hive using Sqoop.
- Implemented Kafka/RabbitMQ messaging services to stream large data and insert into database.
- Developed MapReduce programs to analyze third-party files, e.g., funds sold by a parent company to a subsequent chain of companies.
- Used Spark SQL to query data from DB2 and Oracle using the respective connectors available.
- Involved in data ingestion into HDFS using Sqoop and Flume from variety of sources.
Environment: Cloudera 5.x Hadoop, Linux, IBM DB2, HDFS, YARN, Impala, Pig, Hive, Sqoop, Spark, Scala, HBase, MapReduce, Hadoop data lake, Informatica BDM 10, Apache Falcon, Apache Zeppelin
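Hedged illustration of the Kafka-to-Spark path described above: a small Structured Streaming job that subscribes to a web-log topic and aggregates disease mentions per city with Spark SQL functions. The broker address, topic name and pipe-delimited message layout are assumptions for the sketch, not the project's actual configuration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DiseaseByCity {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DiseaseByCity").getOrCreate()
    import spark.implicits._

    // Subscribe to the web-server log topic (broker and topic are assumptions)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "weblogs")
      .load()

    // Assume each Kafka message value looks like "city|disease|timestamp"
    val events = raw
      .selectExpr("CAST(value AS STRING) AS line")
      .select(split($"line", "\\|").as("f"))
      .select($"f".getItem(0).as("city"), $"f".getItem(1).as("disease"))

    // Continuously updated count of disease mentions per city
    val counts = events.groupBy($"city", $"disease").count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```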
Confidential, Little Rock, AR
Senior Hadoop Developer
Responsibilities:
- Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Gathered requirements from the business and performed requirement analysis on the various data points the business wanted to view as part of their output.
- Supported all business areas of ADAC with critical data analysis that helps team members make profitable decisions, acting as a forecasting expert and business analyst, and utilized tools for business optimization and analytics.
- Took the lead on delivering analyses and ad-hoc reports, including data extraction and summarization, using the big data tool set.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Involved in design decisions to pick appropriate mappers and reducers to implement the algorithms.
- Developed data preparation logic to pull daily sales data using Sqoop and transform it through HiveQL and MapReduce jobs.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
- Used Pig as ETL tool to do transformations, joins and pre-aggregations before loading data onto HDFS.
- Involved in designing Hive tables and developing HiveQL queries for batch analysis to monitor key performance indicators (see the HiveQL sketch after this section).
- Scheduling batch jobs in Oozie.
- Implemented JDBC calls to the Hive warehouse and a batch framework to invoke the MapReduce jobs on a scheduled basis.
- Reviewed peers' table creation in Hive, data loading and queries.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Developed Scala programs to perform data scrubbing for unstructured data
- Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats to text files.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Exported the result set from Hive to Oracle Db using Sqoop after processing the data.
- Assisted in designing, building, and maintaining database to analyze life cycle of claim processing and transactions.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Monitored System health and logs and responded accordingly to any warning or failure conditions through the Cloudera Manager.
- Scheduled jobs using Oozie and tracked their progress.
- Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Wrote MapReduce Programs for distinct types of input formats like JSON, XML and CSV formats.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior
Environment: Hortonworks 2.2 Hadoop, Linux, HDFS, YARN, Pig, Hive, Sqoop, Spark, Scala, Flume, MapReduce, Oracle DB, Java, Apache Zeppelin
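A sketch of the kind of HiveQL used for the daily sales preparation and KPI batch analysis mentioned above, issued through Spark with Hive support so the example stays in Scala. Table names, columns and the partition value are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DailySalesKpi {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark run HiveQL against the warehouse metastore
    val spark = SparkSession.builder()
      .appName("DailySalesKpi")
      .enableHiveSupport()
      .getOrCreate()

    // Date-partitioned target table for the daily sales feed
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_daily (
        |  store_id   INT,
        |  product_id INT,
        |  amount     DOUBLE
        |) PARTITIONED BY (sale_date STRING)
        |STORED AS ORC""".stripMargin)

    // Load one day's partition from a Sqoop-fed staging table (name is an assumption)
    spark.sql(
      """INSERT OVERWRITE TABLE sales_daily PARTITION (sale_date = '2017-06-01')
        |SELECT store_id, product_id, amount
        |FROM sales_staging
        |WHERE sale_date = '2017-06-01'""".stripMargin)

    // Example KPI: revenue per store for the day, for downstream reporting
    spark.sql(
      """SELECT store_id, SUM(amount) AS revenue
        |FROM sales_daily
        |WHERE sale_date = '2017-06-01'
        |GROUP BY store_id
        |ORDER BY revenue DESC""".stripMargin)
      .show(20)

    spark.stop()
  }
}
```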
Confidential, Mayfield Village, OH
Hadoop Developer
Responsibilities:
- Used Kafka as the messaging system to collect data sent by the sensors in the cars.
- Implemented Big Data platforms using Cloudera CDH4 as data storage, retrieval and processing systems.
- Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
- Worked on Kafka while dealing with raw data, by transforming into new Kafka topics for further consumption.
- Installed and configured Hortonworks Sandbox as part of POC involving Kafka-Storm-HDFS data flow.
- Pushed web server log data from across the environments into the associated Kafka topic partitions and used Spark SQL to calculate the most prevalent diseases in each city from this data.
- Wrote Java code to deserialize data from protocol buffer format to JSON.
- Loaded unstructured and semi-structured data into Hadoop cluster coming from different sources using Flume.
- Used Sqoop to import data from Oracle to HDFS.
- Designed the HBase row key and column family structure (see the HBase sketch after this section).
- Used the HBase Java API to migrate data between HDFS and HBase.
- Wrote the MapReduce programs to process the data according to requests by BI Teams
- Wrote Hive custom UDF to analyze data by given schema.
- Worked with BI teams and developed Pig scripts for ad hoc queries.
- Implemented virtualization of data sources using Spark by connecting to DB2 and Oracle through the available Spark connectors.
- Used Spark SQL to query data from DB2 and Oracle using the respective connectors.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala
- Analyzed various RDDs using Scala and Python with Spark.
- Worked on the conversion of existing MapReduce batch applications to Spark for better performance.
- Worked on different file formats (ORCFILE, RCFILE, SEQUENCEFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO)
- Exported the business required information to RDBMS using Sqoop to make the data available for BI team to generate reports based on data
- Implemented daily workflow for extraction, processing and analysis of data with Oozie
- Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
- Wrote Pig scripts and executed them using the Grunt shell.
- Performed big data analysis using Pig and user-defined functions (UDFs).
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Kafka, Oracle 11g, Linux, Java 7, Eclipse, Apache Zeppelin
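A minimal Scala sketch, using the standard HBase client API, of the row key and column family design work mentioned above: the key combines a reversed vehicle id with an inverted timestamp so a vehicle's rows sort newest-first without hotspotting a single region. Table, family and column names are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object SensorWriter {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    // Table and column family names are assumptions for illustration
    val table = connection.getTable(TableName.valueOf("vehicle_events"))

    try {
      // Row key: reversed vehicle id + inverted event time, so one vehicle's
      // rows cluster together and newest events sort first
      val vehicleId = "VIN12345"
      val eventTime = System.currentTimeMillis()
      val rowKey = Bytes.toBytes(s"${vehicleId.reverse}#${Long.MaxValue - eventTime}")

      val put = new Put(rowKey)
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("speed"), Bytes.toBytes("64.5"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("fuel"), Bytes.toBytes("0.72"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```

Keeping the column family name short ("d") reduces per-cell storage overhead, since HBase stores the family name with every cell.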
Confidential, Bentonville, AR
Hadoop Developer
Responsibilities:
- Importing Large Data Sets from DB2 to Hive Table using Sqoop.
- Created Hive Managed and External Tables as per the requirements
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
- Installed and configured Hive and wrote Hive UDFs in Java and Python.
- Integrated the hive warehouse with HBase
- Migrated the needed data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Defined workflows using Oozie.
- Used Hive to create partitions on Hive tables and analyzed this data to compute various metrics for reporting.
- Created Data model for Hive tables
- Load the data into HBase tables for UI web application
- Designing and developing tables in HBase and storing aggregated data from Hive.
- Developing Hive Scripts for data aggregating and processing as per the Use Case.
- Writing Java Custom UDF's for processing data in Hive.
- Developing and maintaining Workflow Scheduling Jobs in Oozie for importing data from RDBMS to Hive.
- The Hive tables created per requirement were managed or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
- Implemented partitioning and bucketing in Hive for better organization of the data (see the Hive DDL sketch after this section).
- Optimized Hive queries for performance tuning.
- Involved with the team in fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Deployment of Qlik Sense and Zeppelin dashboards for the Hadoop data ingestion project at HPE (Ambari, MySQL, Qlik Sense hub, Apache Zeppelin):
- As a part of data migration project from Enterprise Data warehouse to Hortonworks Hadoop data lake, created Dashboard reports for Oozie and Falcon jobs on Qliksense hub and Apache Zeppelin notebook.
- Predictive Analytics on performance metrics for Big data platforms at HPE (Python, R, Apache Spark, Zeppelin, Qlik):
- Wrote Python and R code for predictive analysis of big data platform performance metrics using ARIMA modeling on Apache Zeppelin and Apache Spark. The project focused on forecasting performance metrics to identify future trends and take corrective action.
- Supported MapReduce programs running on the cluster.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Experienced in managing and reviewing application log files.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
Environment: Hadoop v2.6.0, HDFS, CDH 5.3.x, MapReduce, HBase, Sqoop, Core Java, Hive, Oozie DB, Spark Streaming and Apache Kafka
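A hedged HiveQL sketch, submitted over the HiveServer2 JDBC driver from Scala, of the external/managed table split with partitioning and bucketing described above. The connection URL, table names, columns and HDFS path are assumptions.

```scala
import java.sql.DriverManager

object ClaimsTables {
  def main(args: Array[String]): Unit = {
    // HiveServer2 JDBC connection; host, port and database are assumptions
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default")
    val stmt = conn.createStatement()

    try {
      // External table over raw files landed in HDFS (path is an assumption)
      stmt.execute(
        """CREATE EXTERNAL TABLE IF NOT EXISTS claims_raw (
          |  claim_id STRING, member_id STRING, amount DOUBLE, claim_date STRING
          |) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
          |LOCATION '/data/raw/claims'""".stripMargin)

      // Managed table, partitioned by date and bucketed by member id
      // (older Hive releases also need hive.enforce.bucketing=true)
      stmt.execute(
        """CREATE TABLE IF NOT EXISTS claims (
          |  claim_id STRING, member_id STRING, amount DOUBLE
          |) PARTITIONED BY (claim_date STRING)
          |CLUSTERED BY (member_id) INTO 16 BUCKETS
          |STORED AS ORC""".stripMargin)

      // Dynamic-partition insert: each row is routed to its claim_date partition
      stmt.execute("SET hive.exec.dynamic.partition=true")
      stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
      stmt.execute(
        """INSERT OVERWRITE TABLE claims PARTITION (claim_date)
          |SELECT claim_id, member_id, amount, claim_date FROM claims_raw""".stripMargin)
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
```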
Confidential, Eden Prairie, MN
Java Developer
Responsibilities:
- Involved in development, testing and maintenance process of the application
- Used Spring MVC framework to implement the MVC architecture.
- Developed Stored Procedures, Triggers and Functions in Oracle.
- Developed spring services, DAO's and performed object relation mappings using Hibernate.
- Involved in understanding the business processes and defining the requirements.
- Involved in designing, developing and deploying reports in MS SQL Server environment using SSRS-2008 and SSIS in Business Intelligence Development Studio (BIDS).
- Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
- Worked with Cassandra Query Language (CQL) to execute queries on the data persisting in the Cassandra cluster.
- Developed database objects in SQL Server 2005 and used SQL to interact with the database to troubleshoot issues.
- Updated and saved the required data in the DB2 database using JDBC, corresponding to actions performed in the Struts classes.
- Involved in bug fixing and resolving issues with the QA.
- Developed SQL scripts to store data validation rules in Oracle database.
- Built test cases and performed unit testing.
- Logging done using Log4j.
- Used CVS for version control.
- Responsible for development of Business Services.
- Developed Business Rules for the project using Java.
- Developed portal screens using JSP, Servlets, and Struts framework.
- Developed the test plans and involved in testing the application.
- Implemented design patterns such as MVC-2, Front Controller and Composite View, along with the Struts framework design patterns, to improve performance.
- Re-engineered the OMT Wholesale Internet Service Engine (WISE) using an n-tiered architecture involving technologies such as EJB, CORBA, XML and Java.
- Involved in Java application testing and maintenance in development and production.
- Involved in developing the customer form data tables and maintaining customer support and customer data in MySQL database tables.
- Involved in mentoring specific projects in application of the new SDLC based on the Agile Unified Process, especially from the project management, requirements and architecture perspectives.
- Designed and developed Views, Model and Controller components implementing MVC Framework
Environment: Java 1.6, J2EE 1.6, Servlets, JDBC, Spring, Hibernate 3.0, JSTL, JSP 2, JMS, Oracle 10g, Web Services, SOAP, RESTful, Maven, Apache Axis, SOAP UI, XML 1.0, JAXB 2.1, JAXP, HTML, JavaScript, CSS3, AJAX, JUnit, Eclipse, WebLogic 10.3, SVN, Shell Script
Confidential, Herndon, VA
Java/J2EE Programmer
Responsibilities:
- Full life cycle experience including requirements analysis, high level design, detailed design, UMLs, data model design, coding, testing and creation of functional and technical design documentation.
- Used the Spring Framework for MVC architecture with Hibernate to implement DAO code, and used web services to interact with other modules and perform integration testing.
- Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
- Designed database and involved in developing SQL Scripts.
- Used SQL navigator as a tool to interact with DB Oracle 10g.
- Developed portal screens using JSP, Servlets, and Struts framework.
- Developed the test plans and involved in testing the application.
- Using RUP and Rational Rose, developed Use Cases, created Class, Sequence and UML diagrams.
- Application Modeling, developing Class diagrams, Sequence Diagrams, Architecture / Deployment diagrams using IBM Rational Software Modeler and publishing them to web perspective with Java Doc.
- Participated in design review sessions for development and implementation discussions.
- Designed and coded presentation (GUI) JSPs with Struts tag libraries for creating Product Service Components (health care codes) using RAD.
- Developing Test Cases and unit testing using JUnit
- Extensive use of AJAX and JavaScript for front-end validations, and JavaScript based component development using EXT JS Framework with cross browser support.
- Made appropriate use of session handling and data scope levels within the application.
- Designed and developed the DAO layer with Hibernate 3.0 standards to access data from the IBM DB2 database through the JPA (Java Persistence API) layer, creating object-relational mappings and writing PL/SQL procedures and functions.
- Integrating Spring injections for DAOs to achieve Inversion of Control, updating Spring Configurations for managing Java objects using callbacks
- Application integration with Spring Web Services to fetch data from external Benefits application using SOA architecture, configuring WSDL based on SOAP specifications and marshalling and un-marshalling using JAXB
- Prepared and executed JUNIT test cases to test the application service layer operations before DAO integration
- Created test environments with WAS for local testing using a test profile, and interacted with the Software Quality Assurance (SQA) team to report and fix defects using Rational ClearQuest.
- Implemented design patterns such as MVC-2, Front Controller and Composite View, along with the Struts framework design patterns, to improve performance.
- Used ClearCase and Subversion for source version control.
- Wrote Ant scripts to automate the builds and installation of modules.
- Involved in writing Test plans and conducted Unit Tests using JUnit.
- Used Log4j for logging statements during development.
- Designed and implemented the log data indexing and search module, and optimized it for performance and accuracy.
- Provided full-text search capability for archived log data utilizing the Apache Lucene library.
- Involved in the testing and integrating of the program at the module level.
- Worked with production support team in debugging and fixing various production issues.
Environment: JDK 1.5, JSP, JSP Custom Tag Libraries, JavaScript, EXT JS, AJAX, XSLT, XML, DOM4J 1.6, EJB, DHTML, Web Services, SOA, WSDL, SOAP, JAXB, IBM RAD, IBM WebSphere Application Server, IBM DB2 8.1, UNIX, UML, IBM Rational ClearCase, JMS, Spring Framework, Hibernate 3.0, PL/SQL, JUnit 3.8, Log4j 1.2, Ant 2.7.