Hadoop Developer Resume
Sunnyvale, California
SUMMARY:
- Software Engineer with 6+ years of overall IT experience, including 4+ years of industry experience with the Hadoop ecosystem and 2+ years with Java/J2EE and PHP technologies.
- Expertise in streaming tools such as Kafka and Spark Streaming, performing real-time analysis on high volumes of live data.
- Expertise in Hive partitioning and bucketing; designed both managed and external Hive tables to optimize query performance.
- Expertise in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs written in Java and Scala.
- Hands-on experience creating Pig and Hive UDFs in Java to analyze data efficiently.
- Extensive experience importing and exporting data with Sqoop between HDFS/Hive/HBase and relational database systems (RDBMS).
- Experienced with scripting technologies including Pig, Python, and shell scripting (Bash).
- Experienced in administering and modifying data in NoSQL databases such as HBase, MongoDB, and Cassandra.
- Excellent knowledge of data interchange and serialization formats such as CSV, JSON, Avro, Parquet, and XML.
- Excellent knowledge of designing and developing applications using Agile methodology and the Scrum framework.
- Experienced in installing and configuring Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Spark, and Kafka.
- Experienced in running Hadoop storage and analytics frameworks on AWS, using tools such as SSH and PuTTY.
- Experienced with AWS instances in the Amazon cloud for interim solutions.
- Experienced with Cassandra clusters in a cloud (AWS) environment, scaling nodes to meet business requirements.
- Experienced in Java development using Hibernate, Servlets, JUnit, JavaScript, JSON, and JDBC.
- Hands-on experience with Maven for dependency management and project structure, and with build and deployment tools such as Jenkins.
- Hands-on experience with Hortonworks and Cloudera Hadoop distributions.
- Extensive knowledge of data acquisition from sources such as MySQL Server and Oracle.
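The partitioned managed/external Hive table work above can be sketched in HiveQL. This is a minimal, hypothetical example (table, column, and path names are invented for illustration): an external table keeps its Parquet files in HDFS even if the table is dropped, while partitioning and bucketing prune and parallelize queries.

```sql
-- Hypothetical external table: dropping it removes only the metadata,
-- not the underlying Parquet files in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS user_events (
  user_id  BIGINT,
  show_id  STRING,
  watch_ms BIGINT
)
PARTITIONED BY (event_date STRING)       -- prunes scans to matching dates
CLUSTERED BY (user_id) INTO 32 BUCKETS   -- speeds up joins/sampling on user_id
STORED AS PARQUET
LOCATION '/data/warehouse/user_events';
```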
TECHNICAL SKILLS:
Big Data Skillset - Frameworks & Environments: Hadoop, Spark, Spark Streaming, Kafka, Hive, Sqoop, Pig, Avro, Parquet, Zookeeper, AWS EC2, AWS S3, AWS EMR, AWS Elasticsearch.
Databases: Cassandra, Oracle 11g/12c, MySQL, HBase.
Web services & Technologies: HTML5, jQuery, CSS3, XML 1.1, PHP.
J2EE Technologies: JDBC, Hibernate framework, Spring framework, Servlet.
WORK EXPERIENCE:
Confidential, Sunnyvale, California
Hadoop Developer
Responsibilities:
- Performed data cleaning on raw data in Spark before storing it into HDFS as an offline data source.
- Extracted data from Kafka, converted it into DStreams in Spark Streaming, and performed transformations to meet feature requirements such as finding the currently trending show within a region.
- Consumed raw data from a Kafka cluster with around a hundred topics to generate distinct datasets used as training data for machine learning models.
- Replaced and complemented existing Spark batch jobs with Spark Streaming jobs to enable real-time data analysis.
- Extracted historical data from offline sources to enrich the view information of real-time streaming data.
- Extracted and merged user interaction data from related Kafka topics, converting it into actionable insights for further analysis.
- Updated existing batch and streaming jobs to adopt the latest attribute values arriving from multiple pipeline updates.
- Created and updated dozens of Hive tables for offline metadata storage.
- Tuned streaming jobs by experimenting with the micro-batch interval to handle peak traffic.
- Loaded offline video metadata from Hive to join with transformed RDDs and generate the required datasets.
- Developed and tested using Zeppelin, Spark Shell, and Eclipse.
- Committed and deployed using GitHub and Jenkins.
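The per-batch "trending show within a region" computation above can be sketched in plain Python (Spark-free so it runs standalone; event shape and names are hypothetical — in the actual job this logic would run as DStream transformations over each micro-batch):

```python
from collections import Counter, defaultdict

def trending_by_region(events):
    """Given (region, show_id) view events from one micro-batch,
    return the most-viewed show per region."""
    counts = defaultdict(Counter)
    for region, show in events:
        counts[region][show] += 1
    # most_common(1) yields the top (show, count) pair for each region
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}

# Hypothetical micro-batch of view events
batch = [("US", "s1"), ("US", "s2"), ("US", "s1"), ("EU", "s3")]
print(trending_by_region(batch))  # {'US': 's1', 'EU': 's3'}
```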
Environment: Spark 1.6.x, Kafka 0.8.2.x, Zookeeper, Hive, Pig, Parquet.
Confidential, Sunnyvale, California
Hadoop Developer
Responsibilities:
- Implemented micro-batch processing using Spark Streaming to update price, inventory, and other details directly in the indexes.
- Merged real-time data with historical signals data updated at different frequencies.
- Kept the latest value of each attribute across multiple pipeline updates.
- Updated the dynamically changing product catalog and other features such as store and online availability.
- Performed transformations and actions on RDDs and Spark Streaming data.
- Used Spark for interactive queries, processing of streaming data, and integration with a Cassandra database over large volumes of data.
- Captured catalog updates in Kafka, processing up to 8,000 events per second.
- Developed using Spark Shell and Eclipse.
- Committed and deployed using GitHub, Maven, and Jenkins.
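The "latest value of each attribute across multiple pipeline updates" step above amounts to a last-write-wins merge keyed by (item, attribute). A minimal Python sketch, with hypothetical field names (in the actual job this would run inside the streaming pipeline):

```python
def latest_attributes(updates):
    """Collapse a stream of (sku, attribute, value, ts) updates,
    keeping only the most recent value per (sku, attribute)."""
    latest = {}  # (sku, attribute) -> (value, ts)
    for sku, attr, value, ts in updates:
        key = (sku, attr)
        if key not in latest or ts > latest[key][1]:
            latest[key] = (value, ts)
    return {key: value for key, (value, _) in latest.items()}

# Hypothetical updates arriving out of order from two pipelines
updates = [
    ("sku1", "price", 9.99, 1),
    ("sku1", "price", 8.49, 3),  # newer price wins
    ("sku1", "stock", 12, 2),
]
print(latest_attributes(updates))
# {('sku1', 'price'): 8.49, ('sku1', 'stock'): 12}
```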
Environment: Spark 1.2.x, Spark 1.3.x, Spark 1.4.x, Kafka 0.8.1.x, Kafka 0.8.2.x, Hive, Pig, Zookeeper, Cassandra.
Confidential, San Jose, California
Hadoop Developer
Responsibilities:
- Created and maintained a Hive warehouse for analysis of user behavior and transaction patterns.
- Implemented various Hive queries to generate aggregated datasets for further analysis.
- Configured Sqoop and developed scripts to extract historical datasets from an RDBMS into HDFS on a weekly basis.
- Created Pig scripts and Pig UDFs to pre-process the data, enriching the original structure into nested and multivalued data.
- Developed reusable Pig UDFs for customized loading, storing, filtering, grouping, and joining.
- Created and modified Hive UDFs and UDAFs to generate business reports.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Implemented workflows in Oozie to automate loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.
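A weekly Sqoop extraction of the kind described above might look like the following command sketch (connection string, credentials, table, and paths are all hypothetical, and the command requires a working Sqoop installation):

```shell
# Hypothetical weekly import: pull last week's orders from MySQL into HDFS
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_pass \
  --table orders \
  --where "order_date >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)" \
  --target-dir /data/raw/orders/$(date +%Y%m%d) \
  --num-mappers 4
```

In practice a command like this would be invoked from the Oozie workflow mentioned above, feeding the downstream Pig and Hive jobs.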
Environment: Hadoop 2.x, HDFS, Sqoop, Pig, Hive, Core Java, Linux, Maven, Git.
Confidential, San Mateo, California
Java Developer
Responsibilities:
- Designed and developed Enterprise Eligibility business and domain objects with an object-relational mapping framework, Hibernate.
- Coded and integrated several business-critical modules of the application using Java, Spring, Hibernate, and REST web services on an application server.
- Developed the web application using Java/J2EE (Spring framework).
- Developed Servlets and JSPs based on the MVC pattern using the Spring Framework.
- Created procedures and functions, and wrote complex SQL queries against the database.
- Used Log4j to create log files for debugging and tracing the application.
Environment: Java EE, Spring, Servlets, Java 1.6, Java 1.7, Oracle, HTML, CSS, JavaScript, jQuery, Eclipse, Hibernate.
Confidential, Foster City, California
Java Developer
Responsibilities:
- Participated in regular code reviews for projects migrating from legacy systems written heavily in C++ to Java.
- Performed performance/load profiling on GoFundMe services with open-source Java-based tools.
- Developed and implemented test plans and test cases based on the high-level and detailed designs.
- Documented and communicated test results.
- Performed code profiling using an open-source tool.
- Built Java backend components: JSPs, Servlets, and business classes.
- Set up stage4, a live production-like environment.
- Contributed to regular status meetings, reporting bugs, problems, and risks.
Environment: Java EE, Spring, Servlets, Java 1.6.