Hadoop Developer Resume
Hoffman Estates, IL
SUMMARY
- 8 years of proactive IT experience in analysis, design, development, implementation and testing of software applications, including 3+ years of Big Data experience with Hadoop, Hive, Spark, Pig, Sqoop and MapReduce programming.
- Extensively worked with the MapReduce programming model and the Hadoop Distributed File System (HDFS).
- Work experience with major components of the Hadoop ecosystem such as Flume, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig and YARN.
- Exceptional understanding of Hadoop architecture and the different components of a Hadoop cluster.
- Strong skills in developing applications involving Big Data technologies such as Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Developed scripts and numerous batch jobs to schedule various Hadoop programs.
- Experience in analyzing data using HiveQL, PIG Latin, and custom MapReduce programs in Java.
- Hands on experience in importing and exporting data from different databases like Oracle, Teradata into HDFS and Hive using Sqoop.
- Extensive experience in collecting and storing stream data like log data in HDFS using Apache Flume.
- Extensively used MapReduce design patterns to solve complex problems in MapReduce programs.
- Developed Hive and Pig queries for data analysis to meet business requirements.
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs and UDTFs.
- Experienced in implementing security mechanisms for Hive data.
- Experience with Hive query performance tuning.
- Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, Flume, MapReduce and Hive.
- Experienced in improving the data cleansing process using Pig Latin operations, transformations and joins.
- Extensive knowledge in NoSQL databases like HBase, Cassandra, MongoDB, CouchDB.
- Experienced in performing CRUD operations using the HBase Java client API and REST API (a minimal Java sketch follows this summary).
- Good knowledge on Cassandra, DataStax Enterprise, DataStax OpsCenter and CQL.
- Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce, Hive and PIG jobs.
- Experienced with processing different file formats like Avro, XML, JSON and Sequence file formats using MapReduce programs.
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real time processing.
- Excellent Java development skills using J2EE Frameworks like Spring, Hibernate, EJBs and Web Services
- Implemented SOAP and RESTful Web Services.
- Exposed to all phases of the complete Software Development Life Cycle (SDLC).
- Extensively worked with Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and core Java design patterns.
- Good knowledge in creating PL/SQL Stored Procedures, Packages, Functions, Cursors with Oracle (9i, 10g, 11g), and MySQL server.
- Worked with JUnit, EasyMock and MRUnit to implement test cases.
- Good knowledge of versioning tools such as ClearCase, Perforce, Subversion and CVS.
- Exposed to methodologies such as Scrum, Agile and Waterfall.
- Multicultural team player with complete flexibility to work independently as well as in a team, and quick to grasp newly emerging technologies.
- Motivated high flier with excellent verbal/written communication skills, strong presentation capabilities and efficient requirement-gathering ability, able to effectively convey requirements to other members of the team.
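Illustrative only, not taken from any of the projects below: a minimal sketch of the HBase Java client CRUD usage noted above, assuming an HBase 1.x client API; the 'customer' table, 'cf' column family and row key are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("customer"))) {

                // Create/update: write one cell into the 'cf' column family
                Put put = new Put(Bytes.toBytes("row-001"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read: fetch the row back and extract the cell value
                Result result = table.get(new Get(Bytes.toBytes("row-001")));
                String name = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name")));
                System.out.println("name = " + name);

                // Delete: remove the row
                table.delete(new Delete(Bytes.toBytes("row-001")));
            }
        }
    }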
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, Map Reduce, Pig, Sqoop, Flume, Oozie, Zookeeper, YARN, Spark, Kafka, Storm.
Scripting Languages: Shell, Python, Scala.
Languages: C, C++, Java, SQL, PL/SQL, PIG Latin, HiveQL, Unix Shell Scripting.
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP.
Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate.
Web Services: SOAP (JAX-WS), WSDL, Apache CXF, Apache Axis, SOA, RESTful (JAX-RS), JMS.
Application Servers: Apache Tomcat 5.5/6.0/7, WebLogic Server 8x/9x/10x, WebSphere 5.1/6.0, JBoss.
Databases: Oracle 9i/10g/11g, IBM DB2, MySQL, MS SQL Server.
NoSQL Databases: HBase, MongoDB, Cassandra.
IDE: Eclipse, NetBeans.
Operating Systems: Linux, UNIX, Mac, Windows 7/8/10.
Reporting Tools: Tableau, Talend.
PROFESSIONAL EXPERIENCE
Confidential, Wilmington, DE
Sr Hadoop Developer
Responsibilities:
- Extracted data from Teradata/MySQL into HDFS using Sqoop import/export.
- Expertise in using data organization design patterns in MapReduce to convert business data into custom formats.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Used the partitioning pattern in MapReduce to move records into categories.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Developed Spark scripts using Scala shell commands.
- Expertise in optimizing MapReduce algorithms using combiners, partitioners and the distributed cache to deliver the best results (a representative job is sketched at the end of this section).
- Optimized MapReduce jobs to use HDFS efficiently by applying Gzip and LZO compression.
- Created SolrCloud collections to load charge-code data and serve results to end users via Solr/Lucene queries with low-latency requirements.
- Implemented Hive generic UDFs to validate business rules specific to each category.
- Implemented performance tuning of Hive queries, such as using partition fields as filters and optimizing join performance.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Used UDFs to implement business logic in Hadoop and was responsible for managing data coming from different sources.
- Automated the process for extraction of data from warehouses and weblogs into HIVE tables by developing workflows and coordinator jobs in Oozie.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and responded accordingly to any warning or failure conditions using Ganglia.
Environment: Apache Hadoop 1.1.2, MapR, MapReduce, HDFS, Hive, PIG, Kafka, Oozie, Sqoop, Flume, Apache Solr, Java, SQL, Eclipse, Unix Script, MySQL, and Ganglia.
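Illustrative sketch of the combiner/partitioner/compression tuning described above, not actual project code; the input layout (tab-separated, category in the first field), class names and counting logic are assumptions.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CategoryCountJob {

        // Emits (category, 1) per line; assumes the category is the first tab-separated field
        public static class CategoryMapper extends Mapper<Object, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text category = new Text();
            @Override
            protected void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
                category.set(value.toString().split("\t", 2)[0]);
                ctx.write(category, ONE);
            }
        }

        // Used as both combiner and reducer: sums the counts per category
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context ctx) throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) sum += v.get();
                ctx.write(key, new LongWritable(sum));
            }
        }

        // Custom partitioner: hash categories evenly across reducers
        public static class CategoryPartitioner extends Partitioner<Text, LongWritable> {
            @Override
            public int getPartition(Text key, LongWritable value, int numPartitions) {
                return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "category-count");
            job.setJarByClass(CategoryCountJob.class);
            job.setMapperClass(CategoryMapper.class);
            job.setCombinerClass(SumReducer.class);        // map-side pre-aggregation cuts shuffle traffic
            job.setReducerClass(SumReducer.class);
            job.setPartitionerClass(CategoryPartitioner.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileOutputFormat.setCompressOutput(job, true); // Gzip-compress the final output on HDFS
            FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }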
Confidential, Hoffman Estates, IL
Hadoop Developer
Responsibilities:
- Worked on a live Hadoop production CDH5 cluster with 50 nodes.
- Worked with highly unstructured and semi structured data of 40 TB in size.
- Analyzed Hadoop clusters and other big data analytical tools such as Hive and Pig, and databases such as HBase.
- Used Sqoop extensively to ingest data from various source systems into HDFS.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using Hive QL.
- Installed the cluster and worked on commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Assisted in managing and reviewing Hadoop log files.
- Assisted in loading large sets of data (Structure, Semi Structured, Unstructured).
- Wrote complex Pig UDF jobs for business transformations (a representative UDF is sketched at the end of this section).
- Worked with the Data Science team, the Teradata team, and the business to gather requirements for various data sources such as web scrapes and APIs.
- Involved in creating Hive/Impala tables, and loading and analyzing data using Hive queries.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Involved in running Hadoop jobs to process millions of records and applied compression techniques.
- Developed multiple MapReduce jobs in java for data cleaning and pre-processing.
- Involved in loading data from LINUX file system to HDFS, and wrote shell scripts for productionizing the MAP (Member Analytics Platform) project.
- Loaded and transformed large sets of structured and semi-structured data.
- Loaded the golden collection into Apache Solr using Morphlines code for the business team.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Performed data modeling in HBase for large transactional sales data.
- Built a proof of concept on Storm for streaming the data from one of the sources.
- Built a proof of concept in Pentaho for Big Data.
- Implemented one of the data source transformations in Spark.
- Worked in Agile methodology and used Scrum for Development and tracking the project.
Environment: HDFS, CDH 5.3.2, Apache Spark 4.1, Kafka, Storm 0.9.5, Cassandra 2.2.0, Hive, Pig, Scala, Java, Sqoop, SQL, Shell scripting.
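Illustrative sketch of a Pig eval UDF of the kind described above, not actual project code; the package, class name and normalization rule are hypothetical.

    package com.example.pig;

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical Pig UDF: trims and upper-cases a field so downstream joins match consistently.
    // Used in a script as: REGISTER myudfs.jar; DEFINE NORMALIZE com.example.pig.NormalizeField();
    public class NormalizeField extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null; // let Pig treat missing values as null
            }
            return input.get(0).toString().trim().toUpperCase();
        }
    }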
Confidential, Melville, NY
Hadoop Developer
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a representative cleaning mapper is sketched at the end of this section).
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data coming from multiple sources.
- Assisted in exporting analyzed data to relational databases(MySQL) using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Generated Tableau reports and built dashboards.
- Worked closely with business units to define development estimates according to Agile methodology.
- Worked on CDH 4.6: 48 nodes, each with 3 TB of storage and 32 GB of RAM.
Environment: CDH4.6, HDFS, Pig, Hive, MapReduce, Cassandra, LINUX, Tableau 8.2, shell scripting, and Big Data.
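Illustrative sketch of a data-cleaning mapper of the kind described above, not actual project code; the pipe delimiter and expected field count are assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleaning step: drops blank/malformed records and counts what was rejected.
    // Assumes pipe-delimited input with a fixed number of fields (hypothetical value).
    public class RecordCleaningMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 12;

        @Override
        protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                ctx.getCounter("cleaning", "blank_lines").increment(1);
                return;
            }
            String[] fields = line.split("\\|", -1);
            if (fields.length != EXPECTED_FIELDS) {
                ctx.getCounter("cleaning", "malformed_records").increment(1);
                return;
            }
            ctx.write(NullWritable.get(), value); // pass clean records through unchanged
        }
    }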
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Used Sqoop to import customer information data from MySQL database into HDFS for data processing.
- Developed a workflow using Oozie to automate the tasks of loading data into HDFS and analyzing the data.
- Developed MapReduce jobs to calculate the total usage of data by commercial routers in different locations (a representative reducer is sketched at the end of this section).
- Developed MapReduce programs for data sorting in HDFS.
- Optimized Hive queries to extract customer information from HDFS or HBase.
- Developed Pig Latin scripts to aggregate the log files of the business clients.
- Loaded and transformed large sets of structured, semi structured data using Pig Scripts.
- Involved in loading data from UNIX file system to HDFS.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS.
- Designed workflows by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Exposure to Amazon Web Services - AWS cloud computing (EMR, EC2 and S3 services).
- Created a load balancer on AWS EC2 for an unstable cluster.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Involved in writing shell scripts in scheduling and automation of tasks.
- Worked on Hive for further analysis and for transforming files from different analytical formats to text files.
- Involved in ETL, data integration and migration; imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Oozie, Java, Linux Shell Scripting and Big Data.
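Illustrative sketch of a reducer for the per-location router-usage totals described above, not actual project code; it assumes an upstream mapper emits (location, bytes-used) pairs.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sums the bytes-used values emitted per location key by a (hypothetical) upstream mapper.
    public class UsagePerLocationReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable total = new LongWritable();

        @Override
        protected void reduce(Text location, Iterable<LongWritable> usages, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable usage : usages) {
                sum += usage.get();
            }
            total.set(sum);
            ctx.write(location, total); // one total-usage record per location
        }
    }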
Confidential, Fayetteville, NY
Hadoop Developer
Responsibilities:
- Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation and testing.
- Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed algorithms for identifying influencers within specified social network channels.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Analyzed data with Hive, Pig and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally in MapReduce.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (a representative producer is sketched at the end of this section).
- Experienced in working with Apache Storm.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecasting based on present results and insights derived from data analysis.
- Involved in collecting data and identifying data patterns to build a trained model using machine learning.
- Developed and generated insights based on brand conversations, which in turn helped effectively drive brand awareness, engagement and traffic to social media pages.
- Developed different formulas for calculating engagement on social media posts.
- Involved in reviewing technical documentation and providing feedback.
- Involved in fixing issues arising out of duration testing.
Environment: Java, NLP, HBase, Machine Learning, Hadoop, HDFS, Map Reduce, Hortonworks, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, MySQL, and eclipse.
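Illustrative sketch of the Kafka ingestion described above, not actual project code; it assumes the kafka-clients producer API, and the broker address, topic name and payload are hypothetical.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker address
            props.put("acks", "all");                         // wait for full acknowledgement
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Each message would later be drained into HDFS/Cassandra by a downstream consumer
                producer.send(new ProducerRecord<>("social-events", "user-42", "{\"action\":\"post\"}"));
            }
        }
    }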
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in Analysis, Designing, Development and Testing phases of the application.
- Involved in creation and maintenance of the backend services using multithreading, Spring, Hibernate, SQL Server and Oracle.
- Developed Web pages using JSPs with Tag libraries, HTML, and JavaScript.
- Wrote J2EE code using Spring and Hibernate to upload input CSV files for credit risk data.
- Implemented the Dependency Injection (IoC) feature of the Spring framework to inject dependencies into objects, and used AOP for logging.
- Designed and developed the persistence layer on an ORM framework using Hibernate (a representative DAO is sketched at the end of this section).
- Implemented various design patterns such as Business Delegate, Data Transfer Object (DTO), Service Locator, Session Facade and Data Access Object (DAO).
- Involved in writing SQL, Stored procedure and PL/SQL for back end. Used Views and Functions at the Oracle Database end.
- Developed various documents within the application using XML by using Eclipse as IDE tool.
- Developed SOAP requests to interact with billing schedule system.
- Used Web Services (SOAP & WSDL) to exchange data between Server part and client.
- Integrated and deployed the application on the WebLogic application server using Ant.
- Developed user interfaces for presenting expense reports and transaction details using JSP, XML, HTML and JavaScript.
- Used Log4J for logging application exceptions and debugging statements.
- Proficient in object-oriented design using UML with Rational Rose.
Environment: JDK, JSP, Tiles, HTML, Java Script, WebLogic, Eclipse, Spring JDBC/ORM/DI, JSF, JPA/Hibernate, Spring, PL/SQL, Windows, CVS, Log4J, Ant.
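Illustrative sketch of a Hibernate-backed DAO wired through Spring setter injection, as described above, not actual project code; the entity, field names and bean wiring are hypothetical, and the Hibernate mapping file is not shown.

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;

    // Hypothetical mapped entity (mapping supplied via a Hibernate .hbm.xml file, not shown)
    class CreditRiskRecord {
        private Long id;
        private String counterparty;
        private double exposure;
        // getters/setters omitted for brevity
    }

    // DAO whose SessionFactory is injected by Spring (setter injection via XML bean config)
    public class CreditRiskRecordDao {
        private SessionFactory sessionFactory;

        public void setSessionFactory(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void save(CreditRiskRecord record) {
            Session session = sessionFactory.openSession();
            try {
                session.beginTransaction();
                session.save(record);                    // INSERT via the ORM mapping
                session.getTransaction().commit();
            } catch (RuntimeException e) {
                session.getTransaction().rollback();     // keep the database consistent on failure
                throw e;
            } finally {
                session.close();
            }
        }
    }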
Confidential
Java/J2EE Developer
Responsibilities:
- Participated in designing the web framework using the Struts framework as an MVC design paradigm.
- Involved in the entire life cycle development of the application; reviewed and analyzed the data model for developing the presentation layer and value objects.
- Used HTML, CSS, XHTML and DHTML in view pages for the front end; extensively involved in developing the web interface using JSP and JSP Standard Tag Libraries (JSTL) with the Struts framework.
- Used Struts and JavaScript for client-side validation and Struts tag libraries to develop the JSP pages.
- Used JSTL in the presentation tier and Spring for dependency injection; configured Struts Validator forms, message resources, action errors, validation.xml and validator-rules.xml.
- Involved in writing client-side scripts using JavaScript and developed the controller using the ActionServlet and action mappings provided by the Struts framework (a representative Action class is sketched at the end of this section).
- Wrote Hibernate configuration and mapping XML files for database access and developed various Java objects (POJOs) as persistence classes for O/R mapping with databases.
- Developed SQL stored procedures and prepared statements for updating and accessing data from database.
- Development carried out under Eclipse Integrated Development Environment (IDE) and used Clearcase Version Control for Project Configuration Management.
Environment: J2EE, Hibernate, Struts 1.2, Spring 2.5, EJB, JSP, JSTL, Servlets, Apache Axis 1.2, JavaScript, HTML, XML, JUnit, Eclipse, TOAD, Apache Tomcat, Clearcase, Oracle9i.
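Illustrative sketch of a Struts 1.x Action of the kind described above, not actual project code; the form field and forward names are hypothetical and would be declared in struts-config.xml.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;
    import org.apache.struts.action.DynaActionForm;

    // Hypothetical Struts 1.x controller: reads a validated form field and selects a forward
    // ("success" / "failure" are assumed mapping names from struts-config.xml).
    public class LoginAction extends Action {
        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request, HttpServletResponse response) {
            DynaActionForm loginForm = (DynaActionForm) form;
            String userId = (String) loginForm.get("userId");

            if (userId != null && userId.trim().length() > 0) {
                request.setAttribute("userId", userId);   // expose to the JSP view
                return mapping.findForward("success");
            }
            return mapping.findForward("failure");
        }
    }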
Confidential
Java/J2EE Developer
Responsibilities:
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server.
- Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created crystal reports for user requirements.
- Provided quick turnaround and resolved issues within the SLA.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Used EJBs to develop business logic and coded reusable components in Java Beans.
- Developed database interaction code with the JDBC API, making extensive use of SQL query statements and advanced prepared statements (a representative query is sketched at the end of this section).
- Used connection pooling for optimization via the JDBC interface.
- Used EJB entity and session beans to implement business logic and session handling and transactions. Developed user-interface using JSP, Servlets, and JavaScript.
- Wrote complex SQL queries and stored procedures.
- Actively involved in the system testing.
- Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product.
- Involved in the development and testing phases of the project following Agile methodology.
Environment: Windows NT 2000/2003, XP, Windows 7/8, C, Java, UNIX, SQL using TOAD, Finacle Core Banking, CRM 10209, Microsoft Office Suite, Microsoft Project.
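Illustrative sketch of the pooled-connection, prepared-statement data access described above, not actual project code; the JNDI name, table and column names are hypothetical.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.sql.DataSource;

    public class AccountDao {

        // Look up the container-managed connection pool (hypothetical JNDI name)
        private DataSource lookupDataSource() throws NamingException {
            InitialContext ctx = new InitialContext();
            return (DataSource) ctx.lookup("java:comp/env/jdbc/BankDS");
        }

        // Fetch one account balance using a parameterized query (no string concatenation)
        public double findBalance(long accountId) throws NamingException, SQLException {
            Connection conn = lookupDataSource().getConnection();   // borrowed from the pool
            try {
                PreparedStatement ps = conn.prepareStatement(
                        "SELECT balance FROM accounts WHERE account_id = ?");
                ps.setLong(1, accountId);
                ResultSet rs = ps.executeQuery();
                return rs.next() ? rs.getDouble("balance") : 0.0;
            } finally {
                conn.close();                                       // returns the connection to the pool
            }
        }
    }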