Hadoop Developer Resume
Sunnyvale, CA
SUMMARY:
- Over eight years of IT experience, including development, testing, and implementation of Business Intelligence and Data Warehousing solutions.
- Over six years of experience with Apache Hadoop components such as HDFS, MapReduce, Hive, and Pig.
- Experience installing Cloudera Hadoop CDH4 on an Amazon EC2 cluster.
- Experience installing, configuring, and administering Hadoop clusters on the major Hadoop distributions.
- Hands-on experience with MapReduce jobs using HiveQL and Pig Latin.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Oozie, Hive, Sqoop, Pig, and Flume.
- Expertise in implementing database projects, covering analysis, design, development, testing, and implementation of end-to-end IT solutions.
- Extensive RDBMS knowledge, developing database applications involving stored procedures, views, triggers, user-defined data types, and functions.
- Knowledge of all phases of the software development life cycle (SDLC), including system analysis and design, software development, testing, implementation, and documentation.
- Excellent logical, analytical, communication, and interpersonal skills; a fast learner with complex systems, a good team player and problem solver, able to perform at a high level under deadlines and adapt to changing priorities.
- Experience extracting data from log files and copying it into HDFS using Flume.
- Developed Hive UDFs and Pig UDFs using Python in Microsoft HDInsight environment.
- Efficient in writing MapReduce programs and using the Apache Hadoop MapReduce API to analyze structured and unstructured data, including handling RSS feeds in MapReduce (a streaming-style sketch follows this list).
- Used Pig for data cleansing and filtering.
- Experience with streaming tools such as Spark Streaming, Spark Structured Streaming, and Kafka Streams.
- Developed Hive scripts to perform analysis on the data.
- Experience developing Sqoop jobs to import data from RDBMS sources into HDFS and to export data from HDFS into relational tables.
- Strong skills with distributed stream-processing frameworks such as Apache Kafka.
- Worked on installing and configuring multi-node big data clusters.
- Good knowledge of Python and RHadoop.
- Good understanding of NoSQL databases such as HBase and DynamoDB.
- Used the AWS environment to run MapReduce jobs, load data into HDFS, and export MapReduce output into Hive tables.
- Responsible for building scalable distributed solutions using Hadoop MapR.
- Used Airflow DAGs (Directed Acyclic Graphs) to schedule tasks automatically and send email alerts when a task fails.
- Created Looper Jobs using Jenkins and created operational playbooks.
- Performed graph analysis with thresholds based on the mean and variance of the linkages.
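Illustrative example of the MapReduce work above, sketched as a small Hadoop Streaming job in Python rather than the Java MapReduce API; the tab-delimited log layout and the status-code field are assumptions, not a real project schema.

#!/usr/bin/env python
# Minimal Hadoop Streaming sketch: counts records per status code in a tab-delimited log.
# Typical invocation (paths are placeholders):
#   hadoop jar hadoop-streaming.jar -input /logs -output /logs_out \
#       -mapper "log_count.py map" -reducer "log_count.py reduce" -file log_count.py
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:                       # assumed layout: timestamp, ip, status, ...
            print("%s\t1" % fields[2])            # emit (status, 1)

def reducer():
    current, total = None, 0
    for line in sys.stdin:                        # input arrives sorted by key
        key, value = line.rstrip("\n").split("\t")
        if key != current and current is not None:
            print("%s\t%d" % (current, total))    # flush the previous key's count
            total = 0
        current = key
        total += int(value)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()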
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, Scala 2.11.8, HDFS, Hive, MapR 2.7.0, Pig, Sqoop, Flume, Oozie, HBase, Spark 2.2.0, Python 2.7, Kafka
Programming Languages: Java (5, 6, 7), Python, C, C++
Databases/RDBMS: MySQL, SQL, PL/SQL, MS SQL Server 2005, Oracle 9i/10g/11g, DB2, Azure SQL Server 2017
Scripting/ Web Languages: JavaScript, HTML5, CSS3, XML, SQL
ETL Tools: Informatica
Operating Systems: Linux CentOS 6.9, Windows XP/7/8/10, UNIX
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS Office, MS Project, and Risk Analysis tools
Utilities/Tools: Eclipse, Tomcat, NetBeans, IntelliJ IDEA CE, JUnit, SQL, Automation, MR-Unit, Airflow 1.10.2 Scheduler, Jenkins 2.107.3
Cloud Platforms: Amazon EC2
Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate
NoSQL Database: Cassandra, HBase, DynamoDB
WORK EXPERIENCE:
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed solutions using Hadoop MapR.
- Worked with the CBB (Customer Backbone) team to retrieve all IDs required to implement customer request-for-action with IDLookup.
- Developed Spark scripts that take a sequence DataFrame as input to the Confidential GraphX traversal JAR and traverse linked IDs for IDLookup.
- Used Scala with Spark to load JSON data from a REST API into a DataFrame, and used DataFrames for transformations and the Spark SQL API for faster data processing (see the PySpark sketch after this list).
- Optimized Spark job configurations for real-time and batch workloads, and benchmarked configurations on joins of roughly 800 GB and 1 TB with and without bucketing.
- Developed automated SQL Server reports for IDLookup, delivered via email using shell scripts.
- Created Airflow DAGs (Directed Acyclic Graphs) with Python scripts to schedule tasks automatically and send email alerts when a task fails (a minimal DAG sketch follows the environment line below).
- Implemented unit test cases using shell scripts.
- Worked on traversal graph analysis on Hive tables with thresholds based on the mean and variance of the linkages.
- Created Looper Jobs using Jenkins and created operational playbooks.
- Created a support plan for manual and automated processes.
- Involved in end-to-end testing in staging and production, and sent daily reports of requests received from ServiceNow and processed by CBB.
- Documented operational playbooks for IDLookup and ServiceNow acceptance support for the team.
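The REST-to-DataFrame step above was written in Scala; the sketch below shows the same idea in PySpark for brevity. The endpoint URL and column names are placeholders, not the actual CBB/IDLookup schema.

# Load JSON returned by a REST API into a Spark DataFrame and query it with Spark SQL.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("IdLookupSketch").getOrCreate()

raw_json = requests.get("https://example.com/api/id-links").text    # hypothetical endpoint
df = spark.read.json(spark.sparkContext.parallelize([raw_json]))    # JSON string -> DataFrame

df.createOrReplaceTempView("id_links")
linked = spark.sql("""
    SELECT id, collect_set(linked_id) AS linked_ids                 -- assumed columns
    FROM id_links
    GROUP BY id
""")
linked.show(truncate=False)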
Environment: Hadoop YARN, Spark 2.2.0, Spark Core, Spark SQL, Scala 2.11.8, Python 2.7, MapR 2.7.0, Hive 0.13.1, Airflow Scheduler 1.10.2, Linux CentOS 6.9, Azure SQL Server 2017, ServiceNow, Jira, Jenkins 2.107.3, IntelliJ IDEA CE.
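A minimal Airflow 1.10-style DAG sketch for the scheduling and failure-alert pattern described above; the DAG id, task commands, and alert address are placeholders rather than the production IDLookup pipeline.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "hadoop",
    "email": ["oncall@example.com"],      # hypothetical alert address
    "email_on_failure": True,             # an email is sent when a task fails
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="idlookup_daily",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract_ids",
                           bash_command="echo 'placeholder for the extract step'")
    report = BashOperator(task_id="email_report",
                          bash_command="echo 'placeholder for the report step'")
    extract >> report                     # run the report only after the extract succeeds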
Confidential, North Chicago, IL
Hadoop Developer
Responsibilities:
- Obtained requirement specifications from SMEs and business analysts in BR and SR meetings for the corporate workplace project, and interacted with business users to build sample report layouts.
- Involved in writing HLDs along with RTMs tracing back to the corresponding BRs and SRs, and reviewed them with the business.
- Implemented an enterprise-level transfer pricing system to ensure tax-efficient supply chains and achieve entity profit targets.
- The IOP implementation involved understanding business requirements and solution design, translating the design into model construction, loading data using ETL logic, validating data, and creating custom reports per end-user requirements.
- Installed and configured Apache Hadoop and Hive/Pig Ecosystems.
- Installed and configured Cloudera Hadoop CDH4 via Cloudera Manager in pseudo-distributed and cluster modes.
- Developed Python APIs representing the memory subsystem.
- Developed Hive UDFs and Pig UDFs using Python in the Microsoft HDInsight environment (see the streaming UDF sketch after this list).
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with HiveQL.
- Developed Python APIs to dump the processor's array structures at the failure point for debugging.
- Developed MapReduce programs to extract and transform data sets; the resulting data sets were loaded to and from Cassandra using Kafka 2.0.x.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Developed Spark Streaming custom receiver to process data from RabbitMQ into Cassandra and Aerospike tables.
- Worked on XML stubs, integrating them with the Excel VB code and the backend database.
- Created MapReduce jobs using Hive/Pig queries.
- Used NoSQL database services such as DynamoDB.
- Responsible for data ingestion using tools such as Flume and Kafka.
- Involved in installing, configuring and managing Hadoop Ecosystem components like Spark, Hive, Pig, Sqoop, Kafka and Flume.
- Used Spark Streaming and the Spark SQL API to process files.
- Used Apache Spark with Python to develop and execute big data analytics.
- Imported and exported data into HDFS using Sqoop, Flume, and Kafka.
- Designed outbound packages to dump IOP-processed data into the out tables for the data warehouse and the Cognos BI team.
- Worked with DB2 to store, analyze, and retrieve data.
- Involved in Unit testing, System Integration testing and UAT post development.
- Provided End User training and configured reports in IOP.
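For the Python Hive UDFs mentioned above, a common HDInsight-era pattern is a streaming script wired in through Hive's TRANSFORM clause; the sketch below assumes a two-column input (record id, raw date) that is not the real schema.

# clean_dates.py -- streaming-style Hive "UDF": normalizes a date column.
# Invoked from HiveQL roughly as:
#   ADD FILE clean_dates.py;
#   SELECT TRANSFORM (record_id, raw_date) USING 'python clean_dates.py'
#          AS (record_id, event_date)
#   FROM staging_events;
import sys
from datetime import datetime

for line in sys.stdin:
    record_id, raw_date = line.rstrip("\n").split("\t")   # Hive passes columns tab-delimited
    try:
        normalized = datetime.strptime(raw_date, "%m/%d/%Y").strftime("%Y-%m-%d")
    except ValueError:
        normalized = "\\N"                                 # Hive's marker for NULL
    print("%s\t%s" % (record_id, normalized))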
Environment: Oracle IOP, Apache Hadoop, HDFS, Sqoop, Flume, Kafka, Cassandra, Cloudera Hadoop CDH4, HiveQL, Pig Latin, Spark, DynamoDB.
Confidential, Great Neck, NY
Hadoop Developer
Responsibilities:
- Worked as a senior developer for the project.
- Used Enterprise Java Beans as a middleware in developing a three-tier distributed application.
- Developed session beans and entity beans for business and data processing.
- Implemented Web Services with REST.
- Developed user interface using HTML, CSS, JSPs and AJAX.
- Performed client-side validation with JavaScript and jQuery, and applied server-side validation to the web pages.
- Used JIRA for bug tracking of the web application.
- Wrote Spring Core and Spring MVC configuration files to associate DAOs with the business layer.
- Worked with HTML, DHTML, CSS, and JavaScript in UI pages.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Extensively worked with JUnit framework to write JUnit test cases to perform unit testing of the application.
- Developed Spark Streaming custom receiver to process data from RabbitMQ into Cassandra and Aerospike tables.
- Developed real-time data ingestion from Kafka to Elasticsearch using Kafka input and Elasticsearch output plugins.
- Implemented JDBC modules in java beans to access the database.
- Designed the tables for the back-end Oracle database.
- The application was hosted on WebLogic and developed using the Eclipse IDE.
- Used XSL/XSLT for transforming and displaying reports. Developed Schemas for XML.
- Involved in writing the ANT scripts to build and deploy the application.
- Developed a web-based reporting for monitoring system with HTML and Tiles using Struts framework.
- Implemented field-level validations with AngularJS, JavaScript, and jQuery.
- Preparation of unit test scenarios and unit test cases.
- Used DynamoDB for running applications.
- Branded the site with CSS.
- Created alter, insert, and delete queries involving lists, sets, and maps in DataStax Cassandra (see the sketch after this list).
- Worked with Spark on parallel computing to deepen knowledge of RDDs with DataStax Cassandra.
- Worked with Scala to demonstrate to management the flexibility of Scala with Spark and Cassandra.
- Performed code reviews and unit testing.
- Used DB2 with the support of Object-Oriented features and Non-Relational structures with XML.
- Involved in unit testing using JUnit.
- Implemented Log4J to trace logs and to track information.
- Involved in project discussions with clients and analyzed complex project requirements as well as prepared design documents.
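The collection-column work in Cassandra mentioned above is sketched below with the DataStax Python driver; the keyspace, table, and column names are placeholders.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo_ks")        # hypothetical keyspace

# ALTER: add a map column to an existing table
session.execute("ALTER TABLE users ADD preferences map<text, text>")

# INSERT: populate list, set, and map columns in one row
session.execute(
    "INSERT INTO users (user_id, emails, tags, preferences) VALUES (%s, %s, %s, %s)",
    ("u1", ["a@example.com"], {"beta"}, {"theme": "dark"}),
)

# DELETE: remove a single map entry and a list element by index
session.execute("DELETE preferences['theme'] FROM users WHERE user_id = %s", ("u1",))
session.execute("DELETE emails[0] FROM users WHERE user_id = %s", ("u1",))

cluster.shutdown()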
Environment: Hive, Pig, HBase, Zookeeper, Sqoop, Kafka, Cassandra, Cloudera, Java, JDBC, JNDI, Struts, Maven, Subversion, JUnit, SQL, DB2, Spring, Hibernate, Oracle, XML, PuTTY, Eclipse, DynamoDB.
Confidential
Hadoop Developer
Responsibilities:
- Involved in automating clickstream data collection and storage into HDFS using Flume.
- Involved in creating Data Lake by extracting customer's data from various data sources into HDFS.
- Used Sqoop to load data from Oracle Database into Hive.
- Developed MapReduce programs to cleanse the data in HDFS obtained from multiple data sources.
- Implemented various Pig UDFs for converting unstructured data into structured data (see the Jython UDF sketch after this list).
- Developed Pig Latin scripts for data processing.
- Involved in writing, optimizing, and testing Pig Latin scripts.
- Involved in creating Hive tables as per requirement defined with appropriate static and dynamic partitions.
- Used Hive to analyze the data in HDFS to identify issues and behavioral patterns.
- Involved in production Hadoop cluster set up, administration, maintenance, monitoring and support.
- Worked on the logical implementation of, and interaction with, HBase.
- Assisted in creating large HBase tables from large data sets across various portfolios.
- Provided cluster coordination services through ZooKeeper.
- Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
- Developed MapReduce jobs to automate transfer of data from/to HBase.
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Flume to collect web logs from the online ad servers and push them into HDFS.
- Implemented custom business logic by writing UDFs in Java, and used various UDFs from Piggybank and other sources.
- Implemented and executed MapReduce jobs to process the log data from the ad servers.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Served as back-end Java developer for the Data Management Platform (DMP), building RESTful APIs so that our team and other groups could build dashboards.
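One way the Pig UDFs above can be written in Python is as a Jython UDF registered from Pig Latin; the log layout below is hypothetical, and the outputSchema decorator is supplied by the Pig runtime.

# parse_udf.py -- Pig (Jython) UDF turning a raw log line into typed fields.
# Registered from Pig Latin roughly as:
#   REGISTER 'parse_udf.py' USING jython AS udfs;
#   parsed = FOREACH raw GENERATE udfs.parse_line(line);
@outputSchema("record:(ip:chararray, url:chararray, status:int)")
def parse_line(line):
    if line is None:
        return None
    parts = line.split(" ")
    if len(parts) < 3:                    # assumed space-delimited layout: ip url status
        return None
    return (parts[0], parts[1], int(parts[2]))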
Environment: Hadoop, Pig, Sqoop, Oozie, MapReduce, HDFS, Hive, Java, Eclipse, HBase, Flume, Oracle 10g, UNIX Shell Scripting, GitHub, Maven.
Confidential
Hadoop Developer
Responsibilities:
- Extracted data from flat files and other RDBMS databases into the staging area and populated the data warehouse.
- Worked on Spark and Cassandra for user behavior analysis and fast execution.
- Developed mapping parameters and variables to support SQL override.
- Used existing ETL standards to develop these mappings.
- Installed and configured Hadoop Map-Reduce, HDFS and developed multiple Map-Reduce jobs in Java for data cleansing and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used UDFs to implement business logic in Hadoop.
- Extracted data from Oracle and DB2 through Sqoop, placed it in HDFS, and processed it.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
- Worked on JVM performance tuning to improve Map-Reduce jobs performance.
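A sketch of the Hive table create/load/query flow mentioned above, driven here from Python through the Hive CLI; the table and column names are placeholders, and each query compiles down to MapReduce jobs on the cluster.

import subprocess

def run_hive(query):
    # 'hive -e' executes a quoted HiveQL statement from the command line
    subprocess.check_call(["hive", "-e", query])

run_hive("""
    CREATE TABLE IF NOT EXISTS web_logs (ip STRING, url STRING, status INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
""")

# Move already-staged HDFS files under the table's warehouse directory
run_hive("LOAD DATA INPATH '/user/etl/web_logs' INTO TABLE web_logs")

# The aggregation below runs as a MapReduce job
run_hive("SELECT status, COUNT(*) FROM web_logs GROUP BY status")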
Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6.
Confidential
Java Developer
Responsibilities:
- Implemented the project according to the Software Development Life Cycle (SDLC).
- Implemented JDBC for mapping an object-oriented domain model to a traditional relational database.
- Created Stored Procedures to manipulate the database and to apply the business logic according to the user’s specifications.
- Developed generic classes encapsulating frequently used functionality so that they could be reused.
- Implemented an exception management mechanism using exception-handling application blocks.
- Designed and developed user interfaces using JSP, JavaScript, and HTML.
- Involved in Database design and developing SQL Queries, stored procedures on MySQL.
- Used CVS for maintaining the Source Code.
- Logging was done through log4j.
Environment: Java, JavaScript, HTML, JDBC drivers, SOAP web services, Unix, shell scripting, SQL Server.