Big Data / Hadoop Developer Resume
Atlanta, GA
SUMMARY:
- Hadoop Developer with 8 years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
- Nearly 4 years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, Kafka and HBase).
- Also experienced in Hadoop administration: software installation, configuration, upgrades, backup and recovery, cluster setup, daily performance monitoring, and keeping the cluster up and healthy.
- Worked on installing, configuring, and administering Hadoop clusters for distributions such as Cloudera CDH 4/5 and Hortonworks HDP 2.1/2.2.
- Experience in building the Cloudera distribution of Hadoop with Knox Gateway and Apache Ranger.
- Proficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
- Expert in the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (see the sketch after this summary).
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure.
- Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Hands-on experience in managing and reviewing Hadoop logs.
- Good knowledge of YARN configuration.
- Expertise in writing Hadoop jobs to analyze data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Knowledge of NoSQL databases such as HBase, MongoDB, and Cassandra.
- Used HBase alongside Pig/Hive as needed for real-time, low-latency queries.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
- Experience in developing solutions to analyze large data sets efficiently.
- Good knowledge of Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
- Skilled in designing and creating data-ingestion pipelines using technologies such as Apache Storm and Kafka.
- Maintained a list of source systems, data copies, the tools used in data ingestion, and landing locations in Hadoop.
- Developed shell and Python scripts to address various production issues.
- Integrated clusters with Active Directory for Kerberos and User Authentication/Authorization.
- Good knowledge of data compression and serialization formats such as Snappy and Avro.
- Handled large transaction volumes while interfacing front-end applications written in Java, JSP, Struts, WebWork, Spring, JSF, Hibernate, web services, and EJB with WebSphere Application Server and JBoss.
- Experience in Job scheduling using Autosys.
- Delivered zero-defect code for three large projects involving changes to both the front end (Core Java, presentation services) and the back end (DB2).
- Experience in Test-Driven Development (TDD), mocking frameworks, and continuous integration (Hudson and Jenkins).
- Strong experience in designing message flows, writing complex ESQL scripts, and invoking web services through message flows.
- Designed and developed a Batch Framework similar to Spring Batch framework.
- Working knowledge of Node.js and Express JavaScript Framework.
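The following is a minimal, illustrative sketch of the Hive partitioning and bucketing approach referenced above. The table name, columns, host, and bucket count are hypothetical; the statements are issued here through the PyHive client, though the same HiveQL can be run directly in the Hive or Beeline shell.

```python
# Minimal sketch (hypothetical table, columns, and host) of Hive
# partitioning and bucketing, issued through the PyHive client.
from pyhive import hive

conn = hive.connect(host="hive-server", port=10000)
cur = conn.cursor()

# Table partitioned by load date and bucketed on the join/filter key.
cur.execute("""
    CREATE TABLE IF NOT EXISTS market_data (
        symbol STRING,
        price  DOUBLE,
        volume BIGINT
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (symbol) INTO 32 BUCKETS
    STORED AS ORC
""")

# The predicate on the partition column prunes the scan to a single partition.
cur.execute("""
    SELECT symbol, AVG(price) AS avg_price
    FROM market_data
    WHERE load_date = '2016-01-15'
    GROUP BY symbol
""")
print(cur.fetchall())
```

Partitioning keeps each day's load in its own directory so queries filtering on load_date scan only the matching partition, while bucketing on the join key evens out data distribution and enables bucketed map-side joins.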
TECHNICAL SKILLS:
Big Data: Cloudera Distribution, Hortonworks, HDFS, ZooKeeper, YARN, DataNode, NameNode, ResourceManager, NodeManager, HBase, Hive, Flume, Cassandra, MongoDB, Sqoop, Oozie, Pig, MapReduce, Kafka, Spark, Storm, Scala, Impala.
Operating Systems: Linux, Windows, Android, UNIX
Programming Languages: Java (JDK 1.4/1.5/1.6), C/C++, MATLAB, R, HTML, SQL, PL/SQL
Frameworks: Spring 2.x/3.x, Struts 1.x/2.x, Hibernate 2.x/3.x, and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Version Control: Visual SourceSafe, SVN
Web Technologies: Direct Web Remoting, HTML, XML, JMS, Core Java, J2EE, SOAP & REST web services, JSP, Servlets, EJB, JavaScript, Struts, Spring, WebWork, JSF, Ajax.
Database Technologies: Oracle 8i/9i/10g, Microsoft SQL Server, DB2, and MySQL 4.x/5.x
Middleware Technologies: XML Gateway, WebSphere MQ, JMS
Others: JUnit, Ant, Maven, Android platform, Microsoft Office, SQL Developer, DB2 Control Center, Microsoft Visio, Hudson, Subversion, Git, Nexus, Artifactory, and Trac
Development Strategies: Waterfall, Agile, Pair Programming, and Test-Driven Development
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Big data / Hadoop Developer
Responsibilities:
- Handled importing data from various sources such as DB2, SQL Server, and text files using Sqoop, performed transformations using Hive, and loaded the data into HDFS.
- Participated in requirement gathering and documented business requirements by conducting workshops and meetings with business users.
- Involved in sprint planning (Agile methodology) for each implementation task.
- Installed and configured a development cluster for application development and Hadoop tools.
- Created Hive tables and wrote multiple Hive queries to load them for analyzing market data coming from distinct sources.
- Wrote extensive SQL queries for data extraction to test the data against various databases.
- Prepared the design flow for DataStage objects to pull data from various upstream applications, apply the required transformations, and load the data into downstream applications.
- Collaborate with Business Analysts to clarify application requirements.
- Worked on building the Cloudera distribution of Hadoop with Knox Gateway and Apache Ranger.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
- Responsible for monitoring Cluster using Cloudera Manager.
- Developed Pig scripts for change data capture between newly arrived data and current data.
- Developed simple to complex MapReduce jobs using Hive and HBase.
- Used Impala to read, write, and query data in HDFS.
- Wrote a Java program to load MongoDB data into Hive.
- Orchestrated hundreds of Sqoop jobs, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Handled different data formats such as Avro, Parquet, and ORC.
- Generated analytics data using MapReduce programs written in Python.
- Used Kafka to load data into HDFS and move data into NoSQL databases.
- Designed and created data-ingestion pipelines using technologies such as Apache Storm and Kafka.
- Loaded streaming data using Kafka and processed it using Storm.
- Worked on Spark SQL and DataFrames for faster execution of Hive queries using the Spark SQLContext.
- Performed analysis on implementing Spark using Scala.
- Implemented Spark sample programs in Python using PySpark (a brief sketch follows this section).
- Used Spark SQL and Scala for fast cluster computing.
- Worked on improving the performance of Talend jobs.
- Optimized overall cluster performance by caching, persisting, and partitioning data wherever appropriate.
- Active member in developing a proof of concept on streaming data using Apache Kafka, Flume, and Spark Streaming.
- Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
- Scheduled and ordered batch jobs in Autosys.
Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Linux, Python, Spark, Impala, Scala, Kafka, Storm, Knox Gateway, shell scripting, XML, Eclipse, Cloudera (CDH 4/5), DB2, SQL Server, MySQL, Autosys.
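As a rough illustration of the Spark SQL and DataFrame work noted above (PySpark with the Spark 1.x HiveContext/SQLContext API), the sketch below runs a Hive query through Spark and applies DataFrame operations to the result; the table and column names are hypothetical.

```python
# Illustrative PySpark (Spark 1.x style) sketch: run a HiveQL query through
# Spark SQL and work with the result as a DataFrame. Table and column names
# are placeholders, not from an actual project.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive-on-spark-sketch")
sqlContext = HiveContext(sc)   # reads table definitions from the Hive metastore

# Execute an existing HiveQL query; the result comes back as a DataFrame.
df = sqlContext.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# Cache the DataFrame for repeated use, then apply DataFrame operations.
df.cache()
top = df.orderBy(df.total.desc()).limit(10)
top.show()

sc.stop()
```

On Spark 2.x and later the same pattern uses SparkSession with enableHiveSupport() in place of HiveContext.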
Confidential, Lewisville, TX
Hadoop Developer / Admin
Responsibilities:
- Developed several advanced MapReduce programs to process data files received from different sensors.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Used Sqoop to export data from HDFS to RDBMS.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Wrote MongoDB queries to generate reports for display in the dashboard.
- Installed, configured, and deployed DataNode hosts for the Hadoop cluster.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Secured the cluster using Kerberos, kept it up and running at all times, and troubleshot any problems that arose.
- Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Managed commissioning & decommissioning of data nodes.
- Implemented optimization, performance testing, and tuning of Hive and Pig.
- Defined workflows using the Oozie framework for automation.
- Migrated HiveQL to Impala to minimize query response time.
- Implemented internal single sign-on (SSO).
- Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across online modules.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Gained experience in managing and reviewing Hadoop log files.
- Wrote Hadoop Job Client utilities and integrated them into monitoring system.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Created and modified topics (Kafka queues) as required, with varying configurations for replication factors and partitions (a brief sketch follows this section).
- Wrote shell and Python scripts for job automation.
Environment: Hortonworks, HDFS, Hive, HQL scripts, MapReduce, Java, HBase, Pig, Sqoop, Kafka, Impala, shell scripts, Python scripts, Oozie Coordinator, MySQL, and SFTP.
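A hypothetical sketch of the Kafka topic management described above, using the kafka-python admin client; in practice the same settings are often applied with the kafka-topics.sh CLI that ships with Kafka. The broker address, topic names, partition counts, and replication factors are placeholders.

```python
# Hypothetical sketch of Kafka topic creation with explicit partition counts
# and replication factors, using the kafka-python AdminClient.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="broker1:9092")

# One topic per sensor feed: 6 partitions for parallel consumers,
# replication factor 3 so the data survives a broker failure.
topics = [
    NewTopic(name="sensor-events", num_partitions=6, replication_factor=3),
    NewTopic(name="sensor-alerts", num_partitions=3, replication_factor=3),
]
admin.create_topics(new_topics=topics)
admin.close()
```

More partitions allow more consumers in a consumer group to read in parallel, while a replication factor of 3 lets a topic tolerate the loss of a broker.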
Confidential, Jacksonville, FL
Hadoop / Java Developer
Responsibilities:
- Developed the application using a J2EE framework that leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converting the requirements into technical specifications.
- Worked extensively on the user interface for a few modules using JSPs and JavaScript.
- Used Apache Maven to build, configure, package, and deploy the application project.
- Used Jenkins to build the Maven project.
- Designed dynamic, cross-browser-compatible pages using HTML, CSS, jQuery, AngularJS, and JavaScript.
- Good experience with AngularJS directives (ng-app, ng-init, ng-model) for initializing AngularJS application data.
- Worked with Node.js, which handles file and network events asynchronously.
- Developed the user interface using JSP with JavaBeans, JSTL, custom tag libraries, JavaScript, CSS, jQuery, and Node.js.
- Used Node.js to write server-side code and create scalable network applications.
- Used Subversion to maintain different versions of the application code.
- Created the search logic using the TF-IDF algorithm and implemented it in MapReduce (a brief sketch follows this section).
- Integrated Sqoop export to Oracle tables, exporting the top 100 MapReduce results to Oracle.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Used Cassandra to handle large amounts of data across many servers.
- Built reusable Hive UDF libraries for business requirements, enabling users to use these UDFs in Hive queries.
- Maintained Hadoop, Hadoop ecosystem components, third-party software, and databases with updates/upgrades, performance tuning, and monitoring.
- Responsible for modifying API packages.
- Managed and scheduled jobs on a Hadoop cluster.
- Responsible for managing data coming from different sources.
- Experience in managing and reviewing Hadoop log files.
- Used Oracle DB for writing SQL scripts and PL/SQL code for procedures and functions.
- Participated in development/implementation of Cloudera Hadoop environment.
Environment: Core Java, JSP, JavaScript, Jenkins, AngularJS, Node.js, JavaBeans, CSS, HTML, jQuery, Maven, Linux, Oracle, PL/SQL, Cloudera Distribution, Hadoop MapReduce, Sqoop, HBase, Cassandra, Hive, Pig.
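As an illustration of the TF-IDF work mentioned above, the sketch below implements only the term-frequency stage, and does so with Hadoop Streaming in Python rather than the native Java MapReduce used on the project; the input format, field names, and the follow-on passes for document frequency are assumptions.

```python
#!/usr/bin/env python
# Hadoop Streaming sketch of the term-frequency stage of TF-IDF (the full
# algorithm needs further passes for document frequencies and the tf*idf join).
# Illustrative only; input lines are assumed to look like: doc_id<TAB>document text.
import sys


def mapper():
    # Emit "doc_id:term<TAB>1" so the default streaming key (text before the
    # first tab) groups identical doc/term pairs on the same reducer.
    for line in sys.stdin:
        doc_id, _, text = line.rstrip("\n").partition("\t")
        for term in text.lower().split():
            print("%s:%s\t1" % (doc_id, term))


def reducer():
    # Input arrives sorted by key, so identical keys are adjacent; sum them.
    current_key, count = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))


if __name__ == "__main__":
    reducer() if len(sys.argv) > 1 and sys.argv[1] == "reduce" else mapper()
```

It would be submitted with the hadoop-streaming JAR, passing the same script as the mapper and, with a "reduce" argument, as the reducer; later jobs would compute document frequencies and join them to produce the final tf*idf scores.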
Confidential
Java developer
Responsibilities:
- Involved in developing applications using Spring Web MVC and other components of the Spring Framework, with the DispatcherServlet acting as the front controller.
- Implemented an abstract controller and mapped it to a URL in the Spring servlet XML configuration file; implemented the corresponding JSP, into which data was propagated from the controller's model-and-view object. Designed and implemented the MVC architecture using the Spring Framework, which involved writing action classes, forms, custom tag libraries, and JSP pages.
- Developed unit-level test cases using JUnit, with Maven as the build tool and Jenkins to create and run deployment jobs.
- Used GitHub as a code repository.
- Used Spring MVC, JavaScript, and AngularJS for web page development.
- Redesigned the application using HTML5, CSS3, JavaScript, AngularJS, and Node.js.
- Automated job submission via Jenkins scripts.
- Designed, developed and maintained the data layer using Hibernate and performed configuration of Spring Application Framework.
- Used Oracle DB for writing SQL scripts and PL/SQL code for procedures and functions.
- Used Hibernate to persist data into the IBM DB2 UDB database and wrote HQL to access the data.
- Used JMS (Java Messaging Service) for asynchronous communication between different modules.
- Used XML, WSDL, UDDI, and SOAP web services to communicate data between different applications.
- Worked with QA team to design test plan and test cases for User Acceptance Testing (UAT).
Environment: Core Java, J2EE, Spring MVC, Hibernate, HTML, JUnit, GitHub, Jenkins, JavaScript, JSP, AngularJS, Node.js, CSS, JDBC, DB2, PL/SQL, JMS, SVN.
Confidential
Java Developer
Responsibilities:
- Involved in Designing, Coding, Debugging and Deployment of Business Objects.
- Provided Hibernate mapping files for mapping java objects with database tables.
- Used AJAX framework for asynchronous data transfer between the browser and the server.
- Provided JMS support for the application using the WebLogic MQ API.
- Extensively used Java multithreading for downloading files from a URL.
- Provided code for Enterprise JavaBeans (EJB) and their configuration files for the application.
- Used Rational ClearCase version control tool to manage source repository.
- Involved in configuring and deploying the application on WebLogic Application Server 8.1.
- Provided utility classes for the application using Core Java and extensively used Collection package.
- Implemented log4j by enabling logging at runtime without modifying the application binary.
- Performed various DDL and DML operations on the SQL Server database.
Environment: Unix, Java 1.5, J2EE, Spring 2.0, Hibernate, WebLogic MQ, JMS, TOAD, AJAX, JSON, JDK, SAX, JSTL, EJB, JSP 2.0, SQL server 2005, Servlets 2.4, HTML, CSS, XML, XSLT, JavaScript, SQL, WebLogic.