Hadoop/Spark Developer Resume
Stamford, CT
SUMMARY:
- Over 8 years of IT industry experience with strong emphasis on Big Data/Hadoop, Apache Spark, Java/J2EE, Scala and Python.
- About 4.5 years of experience in the ingestion, storage, querying, processing, and analysis of Big Data, with hands-on Hadoop ecosystem development including MapReduce, HDFS, Hive, HiveQL, Pig, Spark, Spark SQL, Spark Streaming, YARN, Kafka, HBase, MongoDB, Cassandra, ZooKeeper, Sqoop, Flume, Impala, Oozie and Storm.
- Experienced with Apache Spark, improving the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Strong knowledge of Spark Core and its libraries: Spark Streaming, Spark SQL, and MLlib.
- Strong experience in writing complex Pig scripts, Hive data modeling, and MapReduce jobs.
- Assisted in extending Hive and Pig core functionality by writing custom UDFs.
- Hands-on experience with message brokers such as Apache Kafka.
- Set up, configured, and programmed on the Hadoop framework, with strong knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Experience with different Hadoop distributions, including Cloudera (CDH3 and CDH4), Hortonworks Data Platform (HDP), and Elastic MapReduce (EMR).
- Experience testing MapReduce programs using MRUnit, JUnit, Ant, and Maven.
- Experience handling different file formats such as Parquet, Apache Avro, SequenceFile, JSON, spreadsheets, text files, XML, and flat files.
- Expertise in writing shell scripts, cron automation, and regular expressions.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Experience developing shell and Python scripts for system management.
- Comprehensive knowledge of debugging, optimizing, and performance tuning DB2, Oracle, and MySQL databases.
- Working knowledge of ETL tools such as Informatica PowerCenter.
- Data visualization using Tableau.
- Involved in developing distributed enterprise and web applications using UML, Java/J2EE, and web technologies including EJB, JSP, Servlets, Struts 2, JMS, JDBC, HTML, XML, JavaScript, Spring, and Hibernate.
- Conversant with web/application servers such as Apache Tomcat and JBoss.
- Experience in installing software, writing test cases, debugging, and testing batch and online systems.
- Well versed in software development methodologies such as Agile and the SDLC.
- Committed to excellence; a self-motivated, far-sighted team player with strong problem-solving skills and a zeal to learn new technologies.
- Strengths include excellent communication, interpersonal, and analytical skills, being a good team player, and the ability to work effectively in a fast-paced, high-volume, deadline-driven environment.
- Willing to relocate: Anywhere
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, MapReduce, Pig, Hive, HiveQL, Storm, Kafka, Spark, Spark SQL, Spark Streaming, Flume, YARN, Ambari, Sqoop, Oozie, Impala, ZooKeeper.
NoSQL Databases: HBase, Cassandra, MongoDB.
Languages: Java/J2EE, Scala, Python, C, PL/SQL.
Scripting Languages: JavaScript, Shell Script, Python.
Java Technologies: Spring MVC, JDBC, JSP, JSON, Applets, Swing, JNDI, JSTL, RMI, JMS, Servlets, EJB, JSF.
Web Technologies: HTML, XML, CSS, JavaScript, jQuery, AJAX.
Web/Application Servers: Apache Tomcat, WebLogic.
Frameworks: Hibernate, Struts, Spring, and Hadoop.
Build Tools: Maven, SBT
ETL Tools: Informatica, Talend.
Databases: Oracle 10g/11g, MS SQL Server, DB2, PostgreSQL and MySQL.
Operating Systems: Red Hat Linux, Ubuntu, Windows, UNIX.
Tools and IDE: Eclipse, Scala IDE, NetBeans, IntelliJ IDEA.
Version Control Systems: SVN, GIT, CVS, VSS.
Methodologies: Agile, SDLC.
PROFESSIONAL EXPERIENCE:
Hadoop/Spark Developer
Confidential, Stamford, CT
Responsibilities:
- Designed and deployed a Hadoop cluster and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Apache Spark, and Impala, on the Hortonworks distribution.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Used Sqoop to move data from relational databases into HDFS and HBase and vice versa.
- Streamed data in real time using Spark with Kafka (see the sketch after this list).
- Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Performed batch processing of data sources using Apache Spark and Elasticsearch.
- Worked on Elasticsearch, creating indexes and mappings for many resources.
- Implemented Spark RDD transformations and actions to carry out business analysis.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Assisted in upgrading, configuring, and maintaining Hadoop ecosystem components such as Pig, Hive, and HBase.
- Created HBase column families to store different data types coming from various sources.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Used Talend as an ETL tool to transform and load data from different databases.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Automated the extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
- Created Hive tables, loaded data, and wrote Hive UDFs.
- Optimized MapReduce code and Pig scripts and performed performance tuning and analysis.
- Exported data into relational databases using Sqoop to make it available for visualization by the BI team.
- Held weekly meetings with technical collaborators and actively participated in code reviews with senior and junior developers, following Agile/Scrum methodologies.
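A minimal sketch of the Spark-with-Kafka streaming flow referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, log-field position, and HDFS output path are illustrative placeholders rather than details of this engagement:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object WebLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("weblog-stream")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Kafka consumer settings; broker list and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream over a hypothetical "weblogs" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Count hits per requested URL in each batch and land the counts in HDFS
    stream.map(record => (record.value.split(" ")(6), 1L))
      .reduceByKey(_ + _)
      .saveAsTextFiles("hdfs:///data/weblogs/url_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```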
Environment: Apache Hadoop, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Talend, Java, Scala, Git, Shell Scripting, Eclipse.
Hadoop Developer
Confidential, Topeka, Kansas
Responsibilities:
- Analyzed large and critical datasets using HDFS, MapReduce, Hive, and Pig.
- Imported data from MySQL databases into HDFS using Sqoop.
- Worked extensively with log files, copying them into HDFS using Flume.
- Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
- Wrote MapReduce jobs and Pig scripts using various input and output formats.
- Wrote extensive MapReduce jobs in Java to exercise the cluster and developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed Pig scripts for data analysis and extended Pig's functionality by developing custom UDFs.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Developed shell and Python scripts to automate and provide control flow to Pig scripts.
- Designed NoSQL schemas on HBase.
- Used Informatica as an ETL tool to transform and load data from different databases.
- Worked on Apache Kafka for real-time processing of data, with sound knowledge of Kafka's producer, consumer, and broker concepts (see the producer sketch after this list).
- Converted Avro data into Parquet format in Impala for faster query processing.
- Exported results into relational databases using Sqoop for visualization and report generation by the BI team.
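For reference on the Kafka producer, consumer, and broker concepts noted above, here is a minimal producer sketch using the standard Kafka client from Scala; the broker address, topic name, key choice, and log path are hypothetical:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object AccessLogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker list is a placeholder; the stock string serializers are used for key and value
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish each log line to a hypothetical "weblogs" topic, keyed by the client host (first field)
      scala.io.Source.fromFile("/var/log/httpd/access.log").getLines().foreach { line =>
        val host = line.split(" ")(0)
        producer.send(new ProducerRecord[String, String]("weblogs", host, line))
      }
    } finally {
      producer.close()
    }
  }
}
```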
Environment: Hortonworks Hadoop, HDFS, Pig, Hive, Kafka, Python, HBase, ZooKeeper, MapReduce, Java, Sqoop, Informatica, Linux, UNIX Shell Scripting, YARN, Parquet and Avro.
Java/Hadoop Developer
Confidential, Boston, MA
Responsibilities:
- Assisted in installing and configuring Hadoop, Pig, Sqoop, and Flume on the Hadoop cluster.
- Wrote MapReduce jobs using Pig Latin scripts and Pig UDFs in Java to discover trends in users' data usage.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported log files into HDFS using Flume.
- Coordinated cluster services using ZooKeeper.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources (see the sketch after this list).
- Imported and exported data between Oracle, DB2, and HDFS using Sqoop.
- Experienced in writing shell scripts.
- Hands-on experience with SequenceFile and Avro file formats.
- Used the Oozie job scheduler to automate job flows.
- Developed Spring MVC classes to handle requests received from front-end logic such as JSP pages.
- Designed and developed UI components using AngularJS, JavaScript, HTML5, CSS, and Bootstrap.
- Logged information at various levels (error, info, debug) to log files using Log4j.
- Unit tested the application using the JUnit framework.
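A minimal sketch of creating an HBase table with a column family and writing a row, illustrating the HBase work described above; it assumes the HBase 2.x client API (older clusters used HTableDescriptor/HColumnDescriptor instead), and the table name, column family, and row values are placeholders:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

object HBaseTableSetup {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create() // reads hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val admin = connection.getAdmin
      val tableName = TableName.valueOf("user_events")

      // One column family for the semi-structured event payloads
      if (!admin.tableExists(tableName)) {
        val descriptor = TableDescriptorBuilder.newBuilder(tableName)
          .setColumnFamily(ColumnFamilyDescriptorBuilder.of("events"))
          .build()
        admin.createTable(descriptor)
      }

      // Write a single row keyed by user id
      val table = connection.getTable(tableName)
      val put = new Put(Bytes.toBytes("user123"))
        .addColumn(Bytes.toBytes("events"), Bytes.toBytes("last_login"), Bytes.toBytes("2016-01-01"))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```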
Environment: Java, Struts, JSP, JDBC, Spring, XML, Hadoop, HDFS, Pig, HBase, MapReduce (Java), Oozie, ZooKeeper, Linux, UNIX Shell Scripting, Flume, AngularJS, JavaScript, jQuery, HTML5, CSS, JUnit.
Java Developer
Confidential, New York City, NY
Responsibilities:
- Involved in the complete software development life cycle (SDLC) of the application, including requirements analysis, design, development, and testing.
- Used Spring Transactions for handling rollbacks and Spring Batch with prepared statements for batch loads/updates to improve performance.
- Developed and implemented the MVC architectural pattern using the Struts framework, including JSP, Servlets, EJB, Form Bean, and Action classes.
- Developed EJB components (session beans, entity beans) using EJB design patterns for business and data processing.
- Implemented the application using Spring (IoC, annotations, MVC, transactions), Hibernate 3.0, SQL, IBM WebSphere 8, and JBoss.
- Developed DAOs (Data Access Objects) using Hibernate as the ORM to interact with the Oracle DBMS.
- Implemented JDBC to connect to the database and read/write data.
- Used the Toad database tool to develop Oracle queries.
- Wrote SQL queries and PL/SQL: stored procedures, functions, triggers, cursors, object types, sequences, indexes, etc.
- Developed the UI using JavaScript, JSP, HTML, CSS, and AngularJS for interactive, cross-browser functionality and a complex user interface.
- Implemented tracing and logging frameworks using Log4j.
- Used SVN for version control and Maven build scripts for deployment.
- Developed the XML schema and web services for data maintenance and structures; wrote JUnit test cases for unit testing of classes.
- Used SOAP Web Services to exchange information.
- Implemented an Agile development process across the software development life cycle.
- Actively participated in daily Scrum meetings to produce quality deliverables on time.
- Developed the web-based presentation layer using JSP, AJAX, and Servlet technologies, implemented with the Struts framework.
- Wrote JUnit test cases and used Ant to build the application.
Environment: Java, Spring MVC, Struts, Hibernate, HTML, JavaScript, JSP, AJAX, IBM WebSphere, Apache Tomcat, Oracle 10g, JUnit, SQL, PL/SQL, XML, UML, Log4j, SOAP, Eclipse.