Sr. Big Data Hadoop Developer Resume
Plano, TX
PROFESSIONAL SUMMARY:
- Over 7 years of programming and software development experience, covering data analysis, design, development, testing, and deployment of software systems from development through production, with an emphasis on the object-oriented paradigm.
- Excellent knowledge of Hadoop and its ecosystem, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Experience in performance tuning Hadoop clusters by gathering and analyzing information about the existing infrastructure.
- Experience in setting up Hadoop clusters for new environments.
- Extensively worked with UNIX and Linux file systems and their administration.
- Skilled in setting up monitoring infrastructure for Hadoop clusters using Nagios and Ganglia.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Experience in configuring ZooKeeper to coordinate the servers in a cluster and maintain data consistency.
- Basic knowledge of ETL tools.
- Exposure to Spark, Spark SQL, Spark Streaming, and Scala.
- Worked on the backend using Scala and Spark to implement several aggregation routines.
- Experience in upgrading existing Hadoop clusters to the latest releases.
- Experienced in using NFS (Network File System) for NameNode metadata backup.
- Experience in monitoring clusters around the clock using Ganglia.
- Experience in using Cloudera Manager 4.0 for installation and management of Hadoop clusters.
- Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in supporting analysts by administering and configuring Hive.
- Knowledge of Talend Open Studio for data integration.
- Familiar with Talend platform functionality; explored data integration and transformation methods for large enterprise business data coming from a single repository, a data warehouse, or a big data repository.
- Experience in providing support to data analysts in running Pig and Hive queries.
- Developed MapReduce programs to perform analysis.
- Performed importing and exporting of data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Experience in Data Integration between Pentaho and Hadoop.
- Experience in application development using Java.
- Good knowledge of Core Java and the Collections framework.
- Excellent knowledge of OOP (object-oriented programming).
- Good knowledge of programming with JDBC, Servlets, and JSP.
- Familiar with the Java Virtual Machine (JVM) and multi-threaded processing.
TECHNICAL SKILLS:
Operating Systems: Windows 98/2000/XP/7/8, MS-DOS, UNIX, Linux, Ubuntu
Languages: Core Java, C, C++, XML, SQL, Shell Script, Pig Latin, Python
Web Technologies: jQuery, Applets, JavaScript, CSS, HTML, XHTML, AJAX, XML, JAX-RS (RESTful)
Big Data Ecosystem: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Spark and Scala
Hadoop Distributions: Apache, CDH3, CDH4
Databases: DB2, Oracle, Impala, MySQL & Microsoft SQL Server
NoSQL Databases: HBase, MongoDB
Tools: Eclipse, MyEclipse, RAD, VMware
Testing Tools: JUnit, MRUnit
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Sr. Big Data Hadoop Developer
Responsibilities:
- Observed the setup and monitoring of a scalable HDFS-based distributed system to gain a better understanding of it, and worked closely with the team to understand business requirements and add new support features.
- Developed multiple MapReduce jobs in Java and used different Hive UDFs for data cleaning and processing.
- Migrated required data from Oracle and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
- Involved in loading data from the Linux file system into HDFS.
- Extracted data from various Oracle servers into HDFS using Sqoop. Developed custom MapReduce code, generated JAR files for user-defined functions, and integrated them with Hive to extend the accessibility of statistical procedures across the entire analysis team.
- Worked on Sqoop scripts to pull data from the Oracle database into HDFS.
- Imported data from mainframe datasets into HDFS using Sqoop; also handled importing data from various data sources (Oracle, DB2, Cassandra, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
- Wrote Hive queries for data analysis to meet business requirements. Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions, and streamed data in real time using Spark with Kafka for faster processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (see the sketch after this list).
- Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which is in turn consumed by the Storm application.
- Worked on Kafka and Kafka mirroring to ensure that data is replicated without any loss.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Participated in building a CDH4 test cluster for implementing Kerberos authentication; upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Deployed various topologies to the Storm cluster based on business use cases.
- Built a prototype with HDP Kafka and Storm for a clickstream application.
- Integrated Hive with the HBase database.
- Involved in low-level design of MapReduce jobs, Hive and Impala queries, and shell scripts to process data.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables for more efficient data access (illustrated in the second sketch at the end of this section).
- Used Hive queries to aggregate and mine data, sorted by volume and grouped by vendor and product.
- Worked mainly on Hive and Impala queries to categorize data from different claims.
- Worked in an Agile development approach.
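The Spark Streaming bullet above is the kind of pipeline sketched below. This is a minimal, hypothetical example assuming the spark-streaming-kafka-0-10 integration; the broker address, consumer group, topic name, and HDFS output path are placeholders rather than values from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfsStream")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    // Hypothetical broker, consumer group, and deserialization settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "claims-stream",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("claims-topic"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS, one directory per batch timestamp
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/claims/raw/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```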
Environment: Hadoop (HDFS/MapReduce), Pig, Hive, HBase, Sqoop, Impala, Linux, Java, XML, Eclipse, Kafka, Storm, Spark, Cloudera, CDH4/5 Distribution, DB2, SQL Server, Oracle 11i, MySQL.
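A second sketch, for the Hive dynamic partitioning and bucketing bullet above. It runs the HiveQL through a Hive-enabled SparkSession; the database, table, and column names are hypothetical, and in practice the bucketed DDL (CLUSTERED BY ... INTO N BUCKETS) would typically be issued directly in Hive.

```scala
import org.apache.spark.sql.SparkSession

object HiveDynamicPartitioning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDynamicPartitioning")
      .enableHiveSupport()
      .getOrCreate()

    // Allow partition values to come from the SELECT rather than being hard-coded
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("CREATE DATABASE IF NOT EXISTS claims")

    // Partitioned target table (hypothetical schema); when created directly in Hive,
    // a CLUSTERED BY (member_id) INTO 32 BUCKETS clause would add the bucketing
    spark.sql("""
      CREATE TABLE IF NOT EXISTS claims.claims_by_month (
        claim_id STRING,
        member_id STRING,
        amount DOUBLE)
      PARTITIONED BY (claim_month STRING)
      STORED AS PARQUET
    """)

    // Dynamic-partition insert: each distinct claim_month becomes its own partition;
    // claims.claims_raw is a hypothetical staging table assumed to exist
    spark.sql("""
      INSERT OVERWRITE TABLE claims.claims_by_month PARTITION (claim_month)
      SELECT claim_id, member_id, amount, substr(claim_date, 1, 7) AS claim_month
      FROM claims.claims_raw
    """)

    spark.stop()
  }
}
```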
Confidential, St. Louis, MO
Sr. Big Data Hadoop Developer
Responsibilities:
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Wrote MapReduce jobs using the Java API for data analysis and dimension/fact table generation.
- Installed and configured Pig and wrote Pig Latin scripts.
- Worked on the backend using Scala and Spark (see the aggregation sketch after this list).
- Wrote MapReduce jobs using Pig Latin.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed Java MapReduce programs to transform mainframe data into a structured format.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Developed optimal strategies for distributing mainframe data over the cluster; imported and exported the stored mainframe data into HDFS and Hive.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the UDF sketch at the end of this section).
- Implemented the HBase API to store data into HBase tables from Hive tables.
- Wrote Hive queries joining multiple tables based on business requirements.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in building applications using Maven and integrating with CI servers such as Jenkins to build jobs.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Created Hive tables and worked on them using HiveQL.
- Conducted POC for Hadoop and Spark as part of NextGen platform implementation.
- Used Storm as an automatic retry mechanism for downloading and manipulating data when transient failures occur.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
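A minimal sketch of the Scala and Spark aggregation work referenced above, using a Hive-enabled SparkSession (a newer API than the CDH4-era stack listed in the environment below); the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, desc, lit, sum}

object VendorProductSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("VendorProductSummary")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS staging")

    // Hypothetical external table over delimited files landed in HDFS (e.g. by Sqoop)
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders (
        order_id STRING,
        vendor STRING,
        product STRING,
        amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION 'hdfs:///data/staging/orders'
    """)

    // Aggregate order volume and amount by vendor and product
    val summary = spark.table("staging.orders")
      .groupBy("vendor", "product")
      .agg(count(lit(1)).as("order_volume"), sum("amount").as("total_amount"))
      .orderBy(desc("order_volume"))

    // Persist the result as a managed Hive table for downstream reporting
    summary.write.mode("overwrite").saveAsTable("staging.vendor_product_summary")

    spark.stop()
  }
}
```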
Environment: CDH4, Java, MapReduce, HDFS, Hive, Spark, Scala, Pig, Linux, XML, MySQL, MySQL Workbench, Cloudera, Maven, Java 6, Eclipse, PL/SQL, SQL connector, Subversion.
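A sketch of the kind of Hive UDF mentioned in this section. For brevity it extends the simple org.apache.hadoop.hive.ql.exec.UDF base class rather than GenericUDF, and the class name, function name, and business rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical business rule: normalize vendor codes before joins and aggregations
class NormalizeVendor extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase.replaceAll("\\s+", "_"))
  }
}

// After packaging into a JAR, the function would be registered from Hive roughly as:
//   ADD JAR /path/to/normalize-vendor-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_vendor AS 'NormalizeVendor';
//   SELECT normalize_vendor(vendor), count(*) FROM orders GROUP BY normalize_vendor(vendor);
```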
Confidential, Jersey City, NJ
Hadoop Developer/Admin
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for Cluster maintenance, managing cluster nodes.
- Involved in managing and reviewing data backups and log files.
- Analyzed data using Hadoop components Hive and Pig.
- Hands-on experience with ETL processes.
- Involved in running Hadoop streaming jobs to process terabytes of data.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data using Hadoop.
- Worked on Python scripts.
- Created Hive tables to store data and wrote Hive queries.
- Involved in importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked with NoSQL databases such as Cassandra and MongoDB for POC purposes.
Environment: Hadoop, HDFS, Hive, Pig, Sqoop, Oozie, MapReduce, Cassandra, MongoDB, UNIX, Shell Scripting, Python Scripting.
Confidential, Dallas, TX
Java/J2EE/ Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS.
- Acquired good understanding and experience of NoSQL databases such as HBase and Cassandra.
- Installed and configured Hive and implemented various business requirements by writing Hive UDFs.
- Extensively worked on the user interface for a few modules using HTML, JSPs, JavaScript, Python, and Ajax.
- Generated business logic using servlets and session beans and deployed them on the WebLogic server.
- Created complex SQL queries and stored procedures.
- Used multithreading and collections to improve the performance of the application.
- Developed the XML schema and Web services for the data support and structures.
- Implemented the Web service client for login verification, credit reports and applicant information using Apache Axis 2 web service.
- Experience in creating and consuming RESTful web services.
- Prepared project deliverables: Business Workflow analysis, process analysis, user requirement documents (Use Cases & Use case diagrams) and managed the requirements using Rational Requisite Pro.
- Responsible for managing data coming from different sources.
- Used the Hibernate ORM framework with the Spring Framework for data persistence and transaction management.
- Used Struts validation framework for form level validations.
- Wrote test cases in JUnit for unit testing of classes.
- Developed client-side validations using JavaScript.
- Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions for those defects.
- Built and deployed Java application into multiple UNIX based environments and produced both unit and functional test results along with release notes.
Environment: Hadoop, HBase, Hive, Java, Eclipse, J2EE 1.4, Struts 1.3, JSP, Servlets 2.5, WebSphere 6.1, HTML, XML, ANT 1.6, Python, JavaScript, JUnit 3.8.
Confidential
Java/J2EE Developer
Responsibilities:
- Responsible for gathering and understanding the system requirements by interacting with clients.
- Generated class diagrams and sequence diagrams extensively for the entire process flow using RAD.
- Implemented Spring MVC to integrate business logic and the model, and implemented DAO classes using Hibernate.
- Worked on marshalling and unmarshalling XML using the JiBX parser.
- Interpreted and manipulated Spring and Hibernate configuration files.
- Worked on JMS and Messaging Queue (MQ) configurations.
- Designed and developed GUI Screens for user interfaces using JSP, JavaScript, XSLT, AJAX, XML, HTML, CSS, JSON.
- Consumed external web services by creating service contracts through WSRR (WebSphere Service Registry and Repository) from different development centers (DCs) and validated the services through SOAP UI.
- Worked on SOAP-based web services and tested them using SOAP UI.
- Used Jenkins tool to build the application on the server.
- Extensively worked on deployment and configuration of the application on the WebSphere server (DEV and QA-Smoke) and WebSphere Portal for integration of all modules.
- Developed documentation for QA Environment.
- Loaded records from the legacy database (DB2 v10) into the existing one (Cassandra 1.2.8).
- Synchronized creates, updates, and deletes of records between the legacy database (DB2 v10) and Cassandra 1.2.8.
- Created stored procedures, SQL statements, and triggers for the effective retrieval and storage of data in the database.
- Developed the application using Agile methodologies (Scrum) and an iterative process.
- Used Apache Log4j logging API to log errors and messages.
- Deployed applications in UNIX environments for Dev and QA-Smoke.
- Unit tested the application using JUnit and EasyMock.
- Involved in 24x7 support, maintenance, and enhancement of the application.
Environment: JDK, Spring Framework, XML, HTML, JSP, Hibernate, ANT, JavaScript, XSLT, CSS, AJAX, JMS, SOAP Web Services, WebSphere Application Server, Tomcat, DB2, Cassandra, PL/SQL, MQ Series, JUnit, Log4j, Shell scripting, UNIX
Confidential
Software Engineer
Responsibilities:
- Developed front-end screens using JSP, HTML and CSS.
- Developed server side code using Struts and Servlets.
- Developed core Java classes for exceptions, utility classes, business delegates, and test cases.
- Developed SQL queries using MySQL and established connectivity.
- Worked with Eclipse using Maven plugin for Eclipse IDE.
- Designed the user interface of the application using HTML5, CSS3, Java Server Faces 2.0 (JSF 2.0), JSP, JavaScript.
- Tested the application functionality with JUnit Test Cases.
- Developed all the User Interfaces using JSP and Struts framework.
- Wrote client-side validations using JavaScript.
- Extensively used jQuery for developing interactive web pages.
- Developed the DAO layer using Hibernate and used Hibernate's caching for real-time performance.
- Experience in developing web services for production systems using SOAP and WSDL.
- Developed the user interface presentation screens using HTML, XML, and CSS.
- Experience in working with Spring using AOP, IoC, and JdbcTemplate.
- Developed shell scripts to trigger the Java batch job and to send summary emails with the batch job status and processing summary.
- Coordinated with the QA lead on the development of test plans, test cases, and test code, performed testing, and was responsible for defect allocation and resolution.
- The application was developed in Eclipse IDE and was deployed on Tomcat server.
- Involved in scrum methodology.
- Supported bug fixes and functionality changes.
Environment: Java, Struts 1.1, Servlets, JSP, HTML, CSS, JavaScript, Eclipse 3.2, Tomcat, Maven 2.x, MySQL, Windows and Linux, JUnit.