Hadoop Developer Resume
San Jose, CA
SUMMARY:
- 8+ years of professional experience in Hadoop, Big Data and Java technologies such as HDFS, MapReduce, Apache Pig, Impala, Hive, HBase, Sqoop, Spark, Storm, Kafka, Zookeeper, Oracle, JSP, JDBC and Spring.
- Excellent knowledge of Hadoop Architecture and its related components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
- Experience in managing and reviewing Hadoop log files and with the NoSQL database HBase.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Hands on experience in importing and exporting data from relational databases to HDFS and vice versa using Sqoop.
- Good working knowledge of creating Hive tables and using HiveQL for data analysis to meet business requirements.
- Strong knowledge of Spark Core and its implementation - Spark SQL, RDDs, DataFrames and Spark Streaming.
- Excellent experience with major relational databases - Oracle 8i/9i/10g, SQL Server 2000/2005/2008, DB2, MySQL.
- Proven experience in writing Queries, Stored Procedures, Triggers, Cursors, Functions and Packages using TOAD.
- Hands on experience with build tools like ANT, Maven.
TECHNICAL SKILLS:
Big Data & Hadoop: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Oozie, Sqoop, Spark, Impala, Zookeeper, Flume, Kafka, Cloudera Distribution of Hadoop
Programming Languages: Java JDK1.4/1.5/1.6, Scala, SQL, PL/SQL
Java/J2EE Technologies: Servlets, JSP, JSTL, JDBC, JMS, JNDI, RMI, EJB, JFC/Swing, AWT, Applets, Multi-threading, Java Networking
Frameworks: Struts 2.x/1.x, Spring 2.x, Hibernate 3.x
IDEs: Eclipse 3.x, IntelliJ
Web technologies: JSP, JavaScript, jQuery, AJAX, XML, XSLT, HTML, DHTML, CSS
Web Services: SOAP, REST, WSDL
XML Tools: JAXB, Apache Axis, AltovaXMLSpy
Methodologies: Agile, Scrum, RUP, TDD, OOAD, SDLC
Modeling Tools: UML, Visio
Testing Technologies/Tools: JUnit
Database Servers: Oracle 8i/9i/10g, DB2, SQL Server 2000/2005/2008, MySQL
Version Control: CVS, SVN
Build Tools: ANT, Maven
Platforms: Windows 2000/98/95/NT4.0, UNIX
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Primarily worked with HDFS, MapReduce, YARN, Pig, Hive, HBase, Spark, Spark Streaming and Sqoop.
- Used the REST API and Bulk API of the Eloqua marketing tool to load data into Custom Data Objects (CDOs) on the backend and to start email campaigns in the marketing and sales domain; pushed data from Hive tables to CDOs.
- Developed a REST client in Java using Jersey to consume data from the REST API; the XML responses were marshalled and unmarshalled with JAXB, the data was stored in HDFS, and Hive tables were created on top of it after ETL converted the XML into tabular format (see the Jersey/JAXB sketch after this list).
- Ingested third-party .csv and .xls files into Hadoop and wrote Pig UDFs in Java to clean the data in those files.
- Wrote Hive UDFs in Java for custom functionality based on business requirements (a sample UDF follows this list).
- Wrote input-file validations in Java (checking the number of columns, the column sequence, and that primary columns are not null; if a check fails, the process is killed and an issue is raised before the job continues) and automated them with shell scripting.
- Migrated a few Hive modules with performance issues to Spark DataFrames; used PySpark to build the DataFrames and created denormalized tables so that data analysts can query them directly from BI tools instead of performing joins on their end.
- Used Jupyter notebooks (.ipynb files) to develop the PySpark code.
- Good understanding of Spark tuning for DataFrames and Spark SQL.
- Worked on a POC for creating a big table in HBase; hands-on experience with both the HBase shell and the HBase Java API. Developed a Java API for bulk-loading data into HBase tables.
- Developed a Java API to filter by column, apply multiple conditions and retrieve rows from HBase tables (see the HBase scan sketch after this list).
- Used the Cisco Tidal scheduler to automate all jobs; created daily, weekly and bi-weekly jobs.
- Loaded data into Teradata and partitioned tables in Teradata.
- Wrote SQL in Teradata Studio and pulled tables from the MPP system into Hadoop via Sqoop.
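A minimal sketch of the consume-and-unmarshal step described above, assuming a JAX-RS 2.x (Jersey) client; the endpoint URL and the CdoRecord class with its fields are illustrative placeholders rather than the actual Eloqua CDO schema, and authentication is omitted.

```java
import java.io.StringReader;
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class RestXmlClient {

    // Hypothetical JAXB-annotated record; the real CDO schema differs.
    @XmlRootElement(name = "record")
    public static class CdoRecord {
        public String id;
        public String email;
    }

    public static void main(String[] args) throws Exception {
        Client client = ClientBuilder.newClient();

        // Fetch one XML record from a placeholder endpoint (auth omitted).
        String xml = client.target("https://api.example.com/cdo/records/1")
                           .request(MediaType.APPLICATION_XML)
                           .get(String.class);

        // Unmarshal the XML payload into a Java object with JAXB.
        Unmarshaller u = JAXBContext.newInstance(CdoRecord.class).createUnmarshaller();
        CdoRecord record = (CdoRecord) u.unmarshal(new StringReader(xml));

        // Downstream, rows like this were written to HDFS and exposed via Hive.
        System.out.println(record.id + "\t" + record.email);
        client.close();
    }
}
```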
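A minimal example of the kind of Hive UDF written in Java; NormalizeText and its trim/upper-case rule are hypothetical stand-ins for the business-specific logic.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative only: trims and upper-cases a string column, the kind of
// small business-rule transformation these UDFs implemented.
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Such a UDF is packaged into a JAR, loaded in Hive with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being called from HiveQL.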
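A sketch of the filtered-retrieval API using the HBase 1.x Java client; the table name, column family, qualifiers and values are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("customer"))) {

            // Combine multiple column conditions: every filter must pass.
            FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
            filters.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("region"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("WEST")));
            filters.addFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("status"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("ACTIVE")));

            Scan scan = new Scan();
            scan.setFilter(filters);

            // Retrieve only the rows matching all conditions.
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```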
Environment: Hadoop, Spark, HBase, Java, Cisco Tidal, Teradata, MapR
Confidential, Malvern, PA
Hadoop Developer
Responsibilities:
- Followed Agile methodology (Scrum) during development of the project and oversaw the software development in sprints by attending daily stand-ups.
- Created Hive tables as internal or external tables per requirement, designed for efficiency.
- Implemented partitioning, bucketing in Hive for better organization of the data.
- Worked on Impala for creating tables and querying data.
- Created and modified Hive UDFs in Java to achieve custom functionality across various Hive scripts.
- Moved all flat data files generated from various sources to HDFS for further processing.
- Migrated all the third-party files, covering about 7 years of data, into HDFS and created Hive external tables.
- Wrote an FTP job to transfer files from a Windows shared-drive location to Unix.
- Wrote a Pig UDF (User Defined Function) to standardize date formats (a sketch follows this list).
- Used Sqoop to import data from an Oracle database and joined it with the third-party files.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Worked on creating an effective data model over all the above-mentioned views and tables from sales, OTG, call stats and third-party files to provide a 360-degree view in Tableau.
- Developed MapReduce programs for the third-party files to analyze data, e.g. funds sold by a parent company through its subsequent chain of companies.
- Worked extensively on creating Oozie workflows to schedule Hive, MapReduce and shell-script jobs.
- Worked on migrating SQL tables to Hive using Sqoop.
- Implemented Kafka messaging services to stream large volumes of data and insert them into the database (a producer sketch follows this list).
- Analyzed large data sets by writing Pig scripts.
- Implemented virtualization of data sources with Spark by connecting to DB2 and Oracle through their Spark connectors, and used Spark SQL to query data from both sources (see the Spark SQL sketch after this list).
- Wrote Python code to convert Hive output result sets into Excel sheets for business reporting purposes.
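A sketch of a date-standardizing Pig UDF in Java; the MM/dd/yyyy input format is an assumed example, whereas the real files carried their own vendor-specific formats.

```java
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Converts hypothetical MM/dd/yyyy inputs to yyyy-MM-dd.
public class StandardizeDate extends EvalFunc<String> {
    private static final SimpleDateFormat IN  = new SimpleDateFormat("MM/dd/yyyy");
    private static final SimpleDateFormat OUT = new SimpleDateFormat("yyyy-MM-dd");

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            Date d = IN.parse(input.get(0).toString());
            return OUT.format(d);
        } catch (Exception e) {
            return null; // unparseable dates become null instead of failing the job
        }
    }
}
```

The JAR is registered in the Pig script with REGISTER and the function is then applied inside a FOREACH ... GENERATE statement.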
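A minimal Java producer sketch for the Kafka streaming described above; the broker address, topic name and payload are placeholders, and the database insert is handled by a separate downstream consumer.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each incoming record is published to a topic; a downstream
            // consumer writes the messages into the database.
            producer.send(new ProducerRecord<>("events", "key-1", "sample payload"));
        }
    }
}
```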
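A Java sketch of querying a relational source through Spark SQL's generic JDBC reader; the Oracle connection details, credentials and table name are placeholders, and the project used the respective DB2/Oracle connectors.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JdbcVirtualization {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("oracle-virtualization")
                .getOrCreate();

        // Expose an Oracle table (placeholder connection details) as a Spark view.
        Dataset<Row> orders = spark.read().format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
                .option("dbtable", "SALES.ORDERS")
                .option("user", "etl_user")
                .option("password", "*****")
                .load();

        orders.createOrReplaceTempView("orders");

        // Query the virtualized source with Spark SQL instead of copying the data first.
        spark.sql("SELECT region, COUNT(*) AS order_cnt FROM orders GROUP BY region")
             .show();

        spark.stop();
    }
}
```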
Environment: Java 7, Eclipse IDE, Hive, HBase, MapReduce, Oozie, Sqoop, Pig, Spark, Flume, Impala, MySQL, PL/SQL, Kafka, Linux, CDH.
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Involved in the requirements-gathering phase of the SDLC and helped the team break the complete project into modules with the help of my team lead.
- Installed and configured Hadoop and Hadoop stack on a 4-node cluster.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables (a sketch follows this list).
- Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
- Responsible for managing data from various sources.
- Worked with Kafka to produce streamed data into topics and to consume that data.
- Worked with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Installed and configured Hive and also wrote Hive UDFs that helped spot market trends.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading data into them and writing Hive queries to analyze the data.
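A condensed sketch of a parse-and-aggregate MapReduce job of the kind described above; the pipe-delimited layout, field positions and aggregation are assumptions, not the actual raw-data schema.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawDataRefiner {

    // Parses pipe-delimited raw records and emits (key, amount) pairs,
    // skipping malformed lines.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\\|");
            if (fields.length < 3) {
                return; // malformed record
            }
            try {
                ctx.write(new Text(fields[0]),
                          new LongWritable(Long.parseLong(fields[2].trim())));
            } catch (NumberFormatException e) {
                // skip rows with a non-numeric amount
            }
        }
    }

    // Sums the amounts per key; the refined output feeds the partitioned staging tables.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            ctx.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-data-refiner");
        job.setJarByClass(RawDataRefiner.class);
        job.setMapperClass(ParseMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```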
Environment: HDFS, Pig, Hive, HBase, Sqoop, Spark, Oozie, Flume, Kafka, AWS, Linux Shell Scripting.
Confidential, Salt Lake City, Utah
Java/J2EE Programmer
Responsibilities:
- Full life-cycle experience including requirements analysis, high-level design, detailed design, UML diagrams, data-model design, coding, testing and creation of functional and technical design documentation.
- Used the Spring Framework for the MVC architecture with Hibernate to implement the DAO code, and used Web Services for interaction with other modules and integration testing.
- Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
- Designed database and involved in developing SQL Scripts.
- Used SQL navigator as a tool to interact with DB Oracle 10g.
- Developed portal screens using JSP, Servlets, and Struts framework.
- Developed the test plans and involved in testing the application.
- Implemented design patterns such as MVC-2, Front Controller and Composite View, along with the Struts framework design patterns, to improve performance.
- Used ClearCase and Subversion for maintaining source version control.
- Wrote Ant scripts to automate the builds and installation of modules.
- Involved in writing Test plans and conducted Unit Tests using JUnit .
- Used Log4j for logging statements during development.
- Designed and implemented the log-data indexing and search module and optimized it for performance and accuracy, providing full-text search over archived log data using the Apache Lucene library (a sketch follows this list).
- Involved in the testing and integrating of the program at the module level.
- Worked with production support team in debugging and fixing various production issues.
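A sketch of the Lucene index-and-search flow; it uses the newer no-argument analyzer/config constructors (Lucene 5.x+ style) while the project-era API differed slightly, the in-memory directory stands in for the on-disk index, and the sample log line is invented.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class LogSearchSketch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory index = new RAMDirectory(); // the real module indexed to disk

        // Index an archived log line as a full-text document.
        try (IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("message",
                    "2010-03-14 ERROR payment service timeout", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Full-text search over the archived log data.
        try (DirectoryReader reader = DirectoryReader.open(index)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] hits = searcher
                    .search(new QueryParser("message", analyzer).parse("error AND timeout"), 10)
                    .scoreDocs;
            for (ScoreDoc hit : hits) {
                System.out.println(searcher.doc(hit.doc).get("message"));
            }
        }
    }
}
```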
Environment: Java 1.5, JSP, AJAX, XML, Spring 3.0, Hibernate 2.0, Struts 1.2, Web Services, WebSphere 7.0, JUnit, Oracle 10g, SQL, PL/SQL, Log4j, RAD 7.0/7.5, ClearCase, Unix, HTML, CSS, JavaScript
Confidential, Chicago IL
Java Portal Developer
Responsibilities:
- Coordinated with business analysts and project managers to analyze newly proposed ideas/requirements, designed the integrated tool, and developed and implemented all the modules.
- Designed database and involved in developing SQL Scripts.
- Used Case Studio to develop the DB design and generate SQL files for various databases.
- Contributed significantly to designing the object model for the project as a senior developer and architect.
- Responsible for development of Business Services.
- Developed Business Rules for the project using Java.
- Developed portal screens using JSP, Servlets, and Struts framework.
- Developed the test plans and involved in testing the application.
- Implemented design patterns such as MVC-2, Front Controller and Composite View, along with the Struts framework design patterns, to improve performance.
- Re-engineered the OMT Wholesale Internet Service Engine (WISE) using an n-tier architecture involving technologies such as EJB, CORBA, XML and Java.
- Explored the possibilities of using technologies like JMX for better monitoring of the system.
- Implemented Secure Socket Layer communication for CORBA servers.
- Used CVS for maintaining the source version control.
- Used Log4j for logging statements during development.
- Designed and implemented the log-data indexing and search module and optimized it for performance and accuracy, providing full-text search over archived log data using the Apache Lucene library.
- Involved in the testing and integrating of the program at the module level.
- Worked with production support team in debugging and fixing various production issues.
Environment: Java, J2EE, Struts 1.2/2.0, JDK, JSP, Servlets, EJB 3.0, JavaBeans, JavaScript, HTML, XML, Eclipse, CORBA, SSL, JUnit, Log4j, CVS, WebLogic deployment, Apache Lucene