Hadoop Developer Resume
Sunnyvale, CA
SUMMARY:
- Mani has almost 8 years of experience across Hadoop, Java, and ETL, including 4 years of comprehensive experience as a Hadoop Developer. He has extensive experience in Big Data technologies and in developing standalone and web applications in multi-tiered environments using Java, Hadoop, Hive, HBase, Impala, Pig, Sqoop, J2EE technologies (Spring, Hibernate), Oracle, HTML, and JavaScript. Mani has very good communication, interpersonal, and analytical skills and works well as a team member or independently.
- Passionate about working in Big Data and analytics environments.
- Extending Pig and Hive core functionality by writing custom UDFs for data analysis (a minimal Java sketch follows this summary).
- Experienced in data transformation, file processing, and identifying user behavior by running Pig Latin scripts; expertise in creating Hive internal/external tables and views using a shared metastore.
- Experience in writing HiveQL scripts and developing Hive queries that help visualize business requirements.
- Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
- Experience in developing MapReduce programs on Apache Hadoop to work with Big Data.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Familiarity with reporting tools such as Jaspersoft.
- Good understanding of Data Mining and Machine Learning techniques.
- Good understanding of Zookeeper and Kafka for monitoring and managing Hadoop jobs.
- Experience in NoSQL technologies like HBase, Cassandra, and Neo4j for data extraction and storing huge volumes of data.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Extensive experience with SQL, PL/SQL and database concepts.
- Worked on Oracle, Teradata, and Vertica database systems; good experience in UNIX shell scripting.
- Knowledge of NoSQL databases such as HBase, and MongoDB.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Experience in developing solutions to analyze large data sets efficiently.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig.
- Experience in installing and configuring different Hadoop distributions such as Cloudera (CDH4 & CDH5) and Hortonworks (HDP).
- Good working knowledge of clustering, compression, and continuous performance tuning.
- Experience in Extraction, Transformation, and Loading (ETL) of data from multiple sources like flat files, XML files, and databases. Used Informatica for ETL processing based on business requirements.
- Expertise in installing, configuring, and administering Tomcat and WebSphere. Understanding of cloud-based deployments into Amazon EC2 with Salt.
- Hands on experience working on Talend Integration Suite and Talend Open Studio. Experience in designing Talend jobs using various Talend components.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Handled several techno-functional responsibilities including estimates, identifying functional and technical gaps, requirements gathering, designing solutions, development, developing documentation, and production support.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly, and adaptability to new environments; a self-motivated, focused, and adaptive team player and quick learner.
- An individual with excellent interpersonal and communication skills, strong business acumen, creative problem solving skills, technical competency, team-player spirit, and leadership skills.
- Strong oral and written communication, initiation, interpersonal, learning and organizing skills matched with the ability to manage time and people effectively.
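As a concrete illustration of the custom UDF work mentioned above, a minimal Pig EvalFunc might look like the sketch below. The class name and the upper-casing logic are illustrative assumptions, not the actual production UDFs.

```java
// Illustrative sketch of a custom Pig UDF; the class name and logic are hypothetical.
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class UpperCaseUDF extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pass through null/empty input so malformed records do not fail the job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().toUpperCase();
    }
}
```

In Pig Latin such a UDF would be registered and invoked along the lines of REGISTER myudfs.jar; B = FOREACH A GENERATE UpperCaseUDF(name); (jar and field names hypothetical).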
TECHNICAL SKILLS:
Languages: C, C++, Java, J2EE, Python, PL/SQL
Big Data Ecosystem: Apache Hadoop, HDFS, HBase, Pig, Hive, Sqoop, ZooKeeper, Oozie, Spark, Kafka, Cloudera, Hortonworks, Talend
NoSQL Technologies: MongoDB, Cassandra, Neo4J
Databases: Oracle 11g/10g/9i/8.x, MySQL, MS SQL Server 2000
Web technologies: Core Java, J2EE, JSP, Servlets, EJB, JNDI, JDBC, XML, HTML, JavaScript, Web Services
Web Server: Apache Tomcat 7.0/6.0/5.5
IDE: Eclipse 3.7/3.5/3.2/3.1, Eclipse Kepler, NetBeans, EditPlus 2
Tools: Teradata, SQL Developer, Soap UI
Testing: JUnit, JMock
Operating System: Linux, UNIX and Windows 2000/NT/XP/Vista/7/8/10
Methodologies: Agile, Unified Modeling Language (UML), Design Patterns (Core Java and J2EE)
System Design & Dev: Requirement gathering and analysis, design, development, testing, delivery
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Sunnyvale CA
Responsibilities:
- Worked on big data infrastructure build-out for batch processing as well as real-time processing.
- Installed and configured Hadoop (HDFS), Hive, Hue, Oozie, Pig, Sqoop, Storm, Kafka, Elasticsearch, Redis, Flume, and ZooKeeper on the Hadoop cluster, and developed supporting components in Java/J2EE, XML, and PHP.
- Designed, deployed, and managed cluster nodes for our data platform operations (racking/stacking).
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Managed thousands of Hive databases totaling 250+ TBs.
- Developed enhancements to Hive architecture to improve performance and scalability.
- Collaborated with development teams to define and apply best practices for using Hive.
- Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.
- Implemented a log producer in Scala that watches application logs, transforms incremental log entries, and sends them to a Kafka- and ZooKeeper-based log collection platform (a Java sketch of the producer pattern follows this list).
- Implemented a data export application to fetch processed data from these platforms to consuming application databases in a scalable manner.
- Experience with Storm for real-time processing.
- Involved in loading data from Linux file system to HDFS.
- Experience in setting up salt-formulas for centralized configuration management.
- Monitored the cluster using various tools to track how the nodes were performing.
- Experience with Oozie workflow scheduling.
- Developed Spark scripts using Scala shell commands as per requirements.
- Expertise in cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Transferred data from Hive tables to HBase via stage tables using Pig and used Impala for interactive querying of HBase tables.
- Implemented data auditing for accounting by capturing various logs such as HDFS audit logs and YARN audit logs.
- Worked on a proof of concept to implement Kafka-Storm based data pipeline.
- Configured job scheduling in Linux using shell scripts.
- Created custom Solr Query components to enable optimum search matching.
- Utilized the Solr API to develop custom search jobs and GUI based search applications.
- Also, implemented multiple output formats in the same program to match the use cases.
- Developed Hadoop Streaming MapReduce jobs using Python.
- Installed Apache Spark on YARN and managed master and worker nodes.
- Performed benchmarking of the NoSQL databases Cassandra and HBase.
- Created data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
- Implemented test scripts to support test driven development and continuous integration.
- Clear understanding of Cloudera Manager Enterprise edition.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes.
- Worked on POCs and the implementation and integration of Cloudera for multiple clients.
- Good knowledge of creating ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into the data warehouse.
- Designed and implemented various ETL projects using Informatica and DataStage as data integration tools.
- Exported the analyzed data to HBase using Sqoop to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Participated in the requirements gathering and analysis phase of the project, documenting business requirements through workshops/meetings with various business users.
- Ongoing POC work using Spark and Kafka for real-time processing.
- Deployed the project in a Linux environment.
- Working knowledge with Talend ETL tool to filter data based on end requirements.
- Automated the Apache installation and its components using Salt.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Ongoing POC work comparing the Cassandra and HBase NoSQL databases.
- Worked with NoSQL databases like Cassandra and MongoDB for POC purposes.
- Implemented a POC with Hadoop; extracted data into HDFS with Spark.
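The log producer described above was written in Scala; the sketch below shows the same pattern in Java under assumed names. The broker address (broker1:9092), topic (app-logs), and log file path are placeholders, not the actual configuration.

```java
// Rough Java sketch of the log-producer pattern described above (the actual
// implementation was in Scala). Broker, topic, and file path are placeholders.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");          // placeholder broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Send each line of the application log to the collection topic.
            Files.lines(Paths.get("/var/log/app/application.log"))
                 .forEach(line -> producer.send(
                         new ProducerRecord<>("app-logs", line)));
        }
    }
}
```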
Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Solr, Big Data, Core Java, Eclipse, HBase, Flume, Spark, Scala, Kafka, Cloudera Manager, Impala, RHEL (Linux/UNIX), Cassandra, Puppet, IDMS, UNIX Shell Scripting.
Hadoop Developer
Confidential, Hartford, CT
Responsibilities:
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Responsible for Operating system and Hadoop Cluster monitoring using tools like Nagios, Ganglia, Cloudera Manager.
- Talend administrator with hands-on Big Data (Hadoop) experience on the Cloudera framework.
- Proactively managed Oracle/SQL Server backups, performance tuning, and general maintenance, with capacity planning for the Talend environment.
- Troubleshooting, debugging & fixing Talend specific issues, while maintaining the health and performance of the ETL environment.
- Documented the installation, deployment, administration, and operational processes of the Talend MDM Platform environments (production, pre-prod, test30, test90, and development) for the ETL project.
- Worked on POCs and the implementation and integration of Cloudera and Hortonworks for multiple clients.
- Developed and designed ETL Jobs using Talend Integration Suite (TIS) in Talend 5.2.2.
- Created complex jobs in Talend 5.2.2 using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaFlex, tAggregateRow, tDie, tWarn, tLogCatcher, etc.
- Used tStatsCatcher, tDie, tLogRow to create a generic joblet to store processing stats.
- Created Talend jobs to populate the data into dimensions and fact tables.
- Created Talend ETL job to receive attachment files from pop e-mail using tPop, tFileList, tFileInputMail and then loaded data from attachments into database and archived the files.
- Used Talend joblets and various commonly used Talend transformation components such as tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput & tHashOutput, and many more.
- Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code to capture global map variables and use them in the jobs.
- Created Talend jobs to copy the files from one server to another and utilized Talend FTP components.
- Involved in Hadoop administration on Cloudera, Hortonworks and Apache Hadoop 1.x & 2.x for multiple projects.
- Built and maintained a bill forecasting product that will help in reducing electricity consumption by leveraging the features and functionality of Cloudera Hadoop.
- Created ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into Data warehouse.
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (a Java sketch of this pattern follows this list).
- Imported and exported data into HDFS using Sqoop and Kafka.
- Wrote Hive Queries to have a consolidated view of the mortgage and retail data.
- Loaded the data back to Teradata for BASEL reporting and for business users to analyze and visualize it using Datameer.
- Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Loaded load-ready files from mainframes to Hadoop, converting the files to ASCII format.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Responsible for software installation and configuration, software upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, daily cluster performance monitoring, and keeping clusters healthy on different Hadoop distributions (Hortonworks & Cloudera).
- Developed Pig scripts to replace the existing home-loans legacy process with Hadoop, with the data fed back to legacy retail mainframe systems.
- Wrote Hive and Pig scripts as ETL tools for transformations, event joins, traffic filtering, and some pre-aggregations before storing data in HDFS.
- Developed MapReduce programs to write data with headers and footers and Shell scripts to convert the data to fixed-length format suitable for Mainframes CICS consumption.
- Used Maven for continuous build integration and deployment.
- Agile methodology was used for development using XP Practices (TDD, Continuous Integration).
- Participated in daily scrum meetings and iterative development.
- Supported team using Talend as ETL tool to transform and load the data from different databases.
- Exposure to burn-up, burn-down charts, dashboards, velocity reporting of sprint and release progress.
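The Kafka-to-HDFS streaming ingestion above was implemented in Scala; a roughly equivalent Java sketch using the Kafka direct-stream API is shown below. The broker, consumer group, topic, batch interval, and HDFS path are assumed placeholders, not the actual job configuration.

```java
// Java sketch of a Spark Streaming job reading from Kafka and writing micro-batches
// to HDFS; all names and paths below are illustrative assumptions.
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");        // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "mortgage-stream");              // placeholder group
        Collection<String> topics = Arrays.asList("retail-events");  // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Extract the message payload and write each non-empty micro-batch to HDFS.
        JavaDStream<String> lines = stream.map(ConsumerRecord::value);
        lines.foreachRDD((rdd, time) -> {
            if (!rdd.isEmpty()) {
                rdd.saveAsTextFile("hdfs:///data/stream/batch-" + time.milliseconds());
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```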
Environment: Hadoop, Talend, MapReduce, Cloudera, Hive, Pig, Kafka, Sqoop, Avro, ETL, Hortonworks, Datameer, Teradata, SQL Server, IBM Mainframes, Java 7.0, Log4j, JUnit, MRUnit, SVN, JIRA.
Hadoop Developer
Confidential
Responsibilities:
- Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Extracted (ETL) data from multiple sources like Flat files, XML files, and Databases.
- Analyzed the collected data using MapReduce jobs and presented it through an employee portal built with Spring MVC.
- Processed Big Data using a Hadoop cluster consisting of 40 nodes.
- Implemented high availability for Cloudera Manager daemons using Heartbeat.
- Configured TLS security for Cloudera Manager and configured Hadoop security for CDH 5 using Kerberos through Cloudera Manager.
- Experience in setting up and administering a multi-node MongoDB sharded cluster.
- Installed and configured Ops Manager (MMS) for monitoring MongoDB enterprise-wide.
- In-depth understanding of MongoDB HA strategies, including replica sets and sharding.
- Developed various reports in the specified formats using PL/SQL, UNIX shell scripting, Perl scripting, Jasper Reports, and DataStage.
- Working knowledge with Talend ETL tool to filter data based on end requirements.
- Involved in loading data from Linux file system to HDFS.
- Designed and configured Flume servers to collect data from the network proxy servers and store to HDFS.
- Involved in designing user screens and validations using HTML, jQuery, Ext JS and JSP as per user requirements.
- Developed a data pipeline using Flume and Java MapReduce to ingest employee browsing data into HBase/HDFS for analysis.
- Created and exposed Hive views through Impala for business users.
- Used agentE2EChain for reliability and failover in Flume.
- Involved in Hadoop administration on Cloudera, Hortonworks and Apache Hadoop 1.x & 2.x for multiple projects.
- Developed Informatica mappings using PowerCenter Designer to load data from flat files to the target database (Teradata).
- Extensively used Change Data Capture for production issue analysis, troubleshooting problem tickets, keeping production data efficient, and improving the performance of ETL mappings. Checked data quality using IDQ.
- Created Talend jobs to load data into various Oracle tables. Utilized Oracle stored procedures and wrote some Java code to capture global map variables and use them in the job.
- Used the Regex, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse streamed log data, and implemented custom Hive UDFs (a minimal UDF sketch follows this list).
- Performed data transfer to DR clusters using DistCp, Falcon, and Oozie.
- Managed work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization with the Solr search engine.
- Deployed DevOps using Puppet, Dashboard, and Puppet DB for configuration management to existing infrastructure.
- Designed and developed Talend jobs to load metadata into a SQL Server database.
- Designed and implemented MapReduce jobs for analyzing the data collected by the Flume server.
- Actively worked with the Hadoop administration team to debug slow-running MR jobs and apply the necessary optimizations.
- Designed and implemented RESTful APIs to retrieve data from the Hadoop platform for the employee portal web application.
- Used Spark libraries for designing recommendation engines.
- Developed modules using Spark MLlib to calculate PageRank and shortest-distance connectivity.
- Optimized the full text search function by connecting MongoDB and Elastic Search.
- Utilized AWS framework for content storage and Elastic Search for document search.
- Creating Hive tables and working on them using Hive QL.
- Enabled concurrent access to Hive tables with shared and exclusive locking, which is enabled in Hive with the help of the ZooKeeper implementation in the cluster.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Created Talend Development Standards. This document describes the general guidelines for Talend developers, the naming conventions to be used in the Transformations and also development and production environment structures.
- Used UNIX Shell scripts to run Informatica and Talend jobs.
- Wrote MRUnit tests for unit testing the Map Reduce jobs.
- Used Kafka as a messaging system to get data from different sources.
- Performed functional requirement review. Worked closely with Risk & Compliance Team and BA.
- Developed ANT Scripts to do compilation, packaging and deployment in the WebSphere server.
- Performance-tuned and optimized clusters for best throughput using tools such as Hive, Impala, HBase, and Spark.
- Created and modified several UNIX Shell Scripts according to the changing needs of the project and client requirements. Developed Unix Shell scripts to call Oracle PL/SQL packages and contributed to standard framework.
- Implemented the logging mechanism using Log4j framework.
- Wrote test cases in JUnit for unit testing of classes.
- Facilitated Knowledge transfer sessions.
- Worked in an agile environment.
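One of the custom Hive UDFs referenced above could, in spirit, look like the old-style UDF sketch below. The function name (mask_id) and the masking logic are hypothetical, not the actual production code.

```java
// Illustrative old-style Hive UDF (org.apache.hadoop.hive.ql.exec.UDF); the
// name and masking logic are hypothetical, not the actual production UDF.
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "mask_id",
             value = "_FUNC_(str) - masks all but the last four characters of an id")
public class MaskIdUDF extends UDF {
    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null;            // pass nulls through unchanged
        }
        String s = input.toString();
        int keep = Math.min(4, s.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(s.substring(s.length() - keep));
        result.set(masked.toString());
        return result;
    }
}
```

Such a UDF would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.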
Environment: Agile Development Process, Struts 1.3, Big Data, MongoDB, Spring 2.0, Web Services (JAX-WS, Axis 2), Hibernate 3.0, Cloudera, IDQ, Linux, Impala, Oozie, Puppet, Hortonworks, ETL, Talend, HTML, Avro, XML, MLlib, ANT 1.6, Log4J, XSLT, XSD, jQuery, JavaScript, Shell Scripting.
Hadoop Developer
Confidential, GA
Responsibilities:
- Worked on designing POCs for implementing various ETL processes.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets by running Hive Queries and Pig scripts.
- Involved in creating Hive tables, loading and analyzing data using Hive Queries.
- Extracted the data from Teradata into HDFS using Sqoop.
- Developed simple to complex MapReduce jobs.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Mentored the analyst and test teams in writing Hive queries.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install Hadoop updates, patches and version upgrades as required.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal sketch follows this list).
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance for Hive and Pig queries.
- Developed UNIX Shell scripts to automate repetitive database processes
- Responsible for managing data coming from different sources.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
- Troubleshooting; managed and reviewed data backups and Hadoop log files.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
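A minimal sketch of the kind of data-cleaning MapReduce job described above is shown below. The pipe delimiter and the twelve-field record width are assumptions, not the real schema.

```java
// Minimal sketch of a map-only data-cleaning MapReduce job; the delimiter and
// expected field count are assumptions, not the actual record layout.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanRecordsJob {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        private static final int EXPECTED_FIELDS = 12;   // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Drop records that are blank or do not have the expected field count.
            String line = value.toString().trim();
            if (!line.isEmpty() && line.split("\\|", -1).length == EXPECTED_FIELDS) {
                context.write(new Text(line), NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                        // map-only cleaning pass
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```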
Environment: HDFS, Hive, HBase, MapReduce, Pig, Sqoop, UNIX Shell Scripting, Teradata, Python.
Core Java Developer
Confidential, NY
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, Design, Analysis and Code development.
- Developed a prototype of the application and demonstrated to business users to verify the application functionality.
- Developed and implemented the MVC Architectural Pattern using Struts Framework including JSP, Servlets, Form Beans and Action classes.
- Implemented server side tasks using Servlets and XML.
- Responsible for performance testing using Apache JMeter.
- Independently developed JMeter test scripts according to test specifications/requirements.
- Created and ran JMeter scripts for load testing.
- Used JMeter for load testing of the application and captured its response times.
- Helped develop page templates using the Struts Tiles framework.
- Implemented Struts Validation Framework for Server side validation.
- Developed JSP's with Custom Tag Libraries for control of the business processes in the middle-tier and was involved in their integration.
- Implemented Struts Action classes using the Struts controller component (a minimal Action sketch follows this list).
- Developed Web services (SOAP) through WSDL in Apache Axis to interact with other components.
- Integrated Spring DAO for data access using Hibernate; used HQL and SQL for querying databases.
- Used SAX parsers with XSD validation for parsing XML documents and performed XML transformations using XSLT.
- Written stored procedures, triggers, and cursors using Oracle PL/SQL.
- Created and deployed web pages using HTML, JSP, JavaScript.
- Written JUnit Test cases for performing unit testing.
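A Struts 1.x Action class of the kind implemented above might look roughly like the sketch below. The action name, request parameter, and forward name are illustrative placeholders tied to a hypothetical struts-config.xml mapping, not the actual application code.

```java
// Sketch of a Struts 1.x Action of the kind described above; the form handling,
// business delegate, and forward name are illustrative placeholders.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class CustomerLookupAction extends Action {

    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception {
        // Read the search key submitted by the JSP (parameter name is assumed).
        String customerId = request.getParameter("customerId");

        // Delegate to the middle tier and stash the result for the view.
        request.setAttribute("customer", lookupCustomer(customerId));

        // Forward name must match the mapping in struts-config.xml.
        return mapping.findForward("success");
    }

    private Object lookupCustomer(String customerId) {
        // Placeholder for the real business/service call.
        return customerId;
    }
}
```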
Environment: Java 1.6, JSP, JDBC, Spring Core 3.0, Swing and event handling, multithreading, Struts 1.2, Node JS, Hibernate 3.0, Design Patterns, JMeter, XML, Oracle, PL/SQL Developer, JBoss, WebLogic 10.3, Apache Axis 1.2, Maven, HTML4, JavaScript, SVN, JUnit, UML, Web Services, SOAP, XSLT, Jira.
JAVA /J2EE Developer
Confidential, CO
Responsibilities:
- Updated the design documents with the new enhancements.
- Developed multiple web services to access the company-wide product catalogue.
- Designed and developed UIs using JSP, following MVC architecture.
- Configured the URL mappings and bean classes using springapp-servlet.xml.
- Developed web services interceptor to provide statistics on web services calls.
- Worked on system analysis to modify and enhance the present system.
- Used JMS message handler to capture request and response messages of web services.
- Developed GUIs using existing and developed CSS to display statistics on web service calls.
- Created and ran JMeter scripts for load testing.
- Worked on an AJAX implementation to retrieve content and display it without reloading the existing page.
- Tested web services using the SoapUI tool as part of unit testing.
- Developed GUIs for a whole module representing a system to access IRIS.
- Wrote Ant build files for different modules of the project.
- Prepared design documents for various modules.
- Developed JSP pages for dynamic representation of Customer data on the client side.
- Generated table entities and DAOs using Hibernate tools (a generic DAO sketch follows this list).
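The Hibernate DAOs mentioned above could follow a pattern like the generic sketch below (Hibernate 3-style API). The class itself is illustrative, and the SessionFactory wiring is assumed to come from Spring or hibernate.cfg.xml rather than shown here.

```java
// Illustrative generic Hibernate 3-style DAO; entity classes and the injected
// SessionFactory are assumptions, not the actual generated code.
import java.io.Serializable;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class GenericDao<T> {

    private final SessionFactory sessionFactory;
    private final Class<T> entityClass;

    public GenericDao(SessionFactory sessionFactory, Class<T> entityClass) {
        this.sessionFactory = sessionFactory;
        this.entityClass = entityClass;
    }

    public void save(T entity) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.saveOrUpdate(entity);   // insert or update the mapped entity
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    public T findById(Serializable id) {
        Session session = sessionFactory.openSession();
        try {
            // Hibernate 3 Session.get returns Object, so cast to the entity type.
            return entityClass.cast(session.get(entityClass, id));
        } finally {
            session.close();
        }
    }
}
```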
Environment/Tools: Java, Spring Framework, JSP, Servlets, Spring MVC, JDBC, JMS, JMeter, XML, XSL, HTML, CSS, JavaScript, jQuery, WebSphere, JBoss, JUnit, SOA, SOAP, PL/SQL.
Jr. Java Developer
Confidential
Responsibilities:
- Developing new pages for personals.
- Implementing MVC Design pattern for the Application.
- Using Content Management tool (Dynapub) for publishing data.
- Implementing AJAX to represent data in friendly and efficient manner.
- Developing Action classes.
- Used JMeter for load testing of the application and captured its response times.
- Created a simple user interface for the application's configuration system using MVC design patterns and the Swing framework.
- Implementing Log4j for logging and debugging.
- Implementing a form-based approach for the ease of the programming team.
- Involved in software development life cycle as a team lead.
Environment: Core Java, Java Swing, Struts, J2EE (JSP/Servlets), XML, AJAX, DB2, MySQL, Tomcat, JMeter.