Hadoop Developer Resume

Kansas City, MO

SUMMARY:

  • 8+ years of overall software development experience in Big Data technologies, the Hadoop ecosystem, and Java/J2EE technologies, with programming experience in Java, Scala, Python, and SQL.
  • 4+ years of strong hands-on experience with the Hadoop ecosystem, including Spark, MapReduce, Hive, Pig, HDFS, YARN, HBase, Oozie, Kafka, Sqoop, and Flume.
  • Experience architecting, designing, and building distributed software systems.
  • Experience writing Python scripts to parse XML documents and load the data into databases.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Worked with real-time data processing and streaming techniques using Spark Streaming, Storm, and Kafka.
  • Experience moving data between HDFS and relational database systems (RDBMS) using Apache Sqoop.
  • Experience developing Kafka producers and consumers that stream millions of events per second (see the producer sketch after this list).
  • Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie.
  • Strong experience productionizing end-to-end data pipelines on the Hadoop platform.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
  • Strong experience with UNIX shell scripts and commands.
  • Experience in using various Hadoop Distributions like Cloudera, Hortonworks and Amazon EMR.
  • Strong hands-on development experience with Java, J2EE (Servlets, JSP, Java Beans, EJB, JDBC, JMS, Web Services) and related technologies.
  • Worked with teams to understand requirements, evaluate new features and architecture, and help drive decisions.
  • Experience successfully delivering applications using agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
  • Experience in Object Oriented Analysis, Design, and Programming of distributed web-based applications.
  • Extensive experience in developing standalone multithreaded applications.
  • Configured and developed web applications in Spring, employing the Spring MVC architecture and Inversion of Control.
  • Experience in building, deploying and integrating applications in Application Servers with ANT, Maven and Gradle.
  • Significant application development experience with REST Web Services, SOAP, WSDL, and XML.
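
To illustrate the Kafka producer work referenced above, here is a minimal sketch in Scala using the standard Kafka client API; the broker address, topic name, and event source are hypothetical placeholders rather than details from any specific project.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // hypothetical broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "1")  // trade-off between throughput and delivery guarantees

    val producer = new KafkaProducer[String, String](props)
    try {
      // Placeholder event source: one event per line on stdin
      scala.io.Source.stdin.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("clickstream-events", line))
      }
    } finally {
      producer.close()
    }
  }
}
```

A production producer would typically also tune batching, retries, and compression for the required event rate.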

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, Zookeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters

Java/J2EE & Web Technologies: J2EE, JMS, JSF, JDBC, Servlets, HTML, CSS, XML, XHTML, AJAX, Angular JS, JavaScript

Languages: C, C++, Core Java, Shell Scripting, PL/SQL, Python, Pig Latin

Scripting Languages: JavaScript, UNIX Shell Scripting, Python

Operating systems: Windows, Linux and Unix

Databases (RDBMS / NoSQL): Oracle, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase; ETL: Talend

IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence

Version Control: Git, SVN, CVS

Web Services: RESTful, SOAP

Application/Web Servers: WebLogic, WebSphere, Apache Tomcat

PROFESSIONAL EXPERIENCE:

Confidential, Cranston, Rhode Island

Sr. Hadoop Developer

Responsibilities:

  • Ingested terabytes of clickstream data from external systems such as FTP servers and S3 buckets into HDFS using custom input adapters.
  • Implemented end-to-end pipelines for user behavioral analytics to identify browsing patterns and provide a rich, personalized experience to visitors.
  • Wrote Scala-based Spark applications for various data transformations, denormalization, and other custom processing.
  • Implemented a data pipeline using Spark, Hive, Sqoop, and Kafka to ingest customer behavioral data into the Hadoop platform for user behavioral analytics.
  • Created a multi-threaded Java application running on edge node for pulling the raw clickstream data from FTP servers and AWS S3 buckets.
  • Used the HDFS FileSystem API to connect to FTP servers and HDFS, and the AWS S3 SDK to connect to S3 buckets.
  • Implemented Kafka producers for streaming real-time clickstream events from external REST services into Kafka topics.
  • Developed Spark Streaming jobs in Scala for real-time processing.
  • Involved in creating external Hive tables from the files stored in the HDFS.
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve the performance of HiveQL queries.
  • Used Spark SQL to read data from Hive tables and perform transformations such as reformatting dates and splitting complex columns (see the sketch after this list).
  • Wrote Spark applications to load the transformed data back into Hive tables in Parquet format.
  • Worked on batch processing and scheduled workflows using Oozie.
  • Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
  • Worked on data visualization and analytics with research scientists and business stakeholders.
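
A minimal sketch of the kind of Spark SQL transformation and Parquet load described above, assuming a Hive-enabled SparkSession; the table names, column names, and date formats are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-transform")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read the raw external Hive table (hypothetical name)
    val raw = spark.table("clickstream.raw_events")

    // Reformat the event date and split a composite column into separate fields
    val transformed = raw
      .withColumn("event_dt", date_format(to_date($"event_dt", "MM/dd/yyyy"), "yyyy-MM-dd"))
      .withColumn("page", split($"page_info", "\\|").getItem(0))
      .withColumn("referrer", split($"page_info", "\\|").getItem(1))
      .drop("page_info")

    // Write back to a managed Hive table in Parquet format
    transformed.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("clickstream.events_curated")

    spark.stop()
  }
}
```

In practice the write mode and output layout would follow whatever the downstream Hive consumers expect.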

Environment: Hadoop 2.x, Spark, Scala, Hive, Pig, Sqoop, Oozie, Kafka, Cloudera Manager, Storm, ZooKeeper, HBase, Impala, YARN, Cassandra, JIRA, MySQL, Kerberos, Amazon AWS, Shell Scripting, SBT, Git, Maven.

Confidential, Austin, Texas

Hadoop Developer

Responsibilities:

  • Involved in gathering and analyzing business requirements, and designing Hadoop Stack as per the requirements.
  • Built distributed, scalable, and reliable data pipelines that ingest and process data at scale using Hive and MapReduce.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Developed MapReduce jobs in Java for data cleansing and preprocessing.
  • Loaded transactional data from Teradata using Sqoop and created Hive tables.
  • Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive.
  • Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Created custom Hive UDFs to provide functionality missing from Hive for analytics (see the sketch after this list).
  • Used Impala to analyze the data present in Hive tables.
  • Handled Avro and JSON data in Hive using Hive SerDe.
  • Worked with different compression codecs like GZIP, SNAPPY and BZIP2 in MapReduce, Pig and Hive for better performance.
  • Analyzed the data with HiveQL queries to study customer behavior.
  • Generated automated emails using Python scripts.
  • Implemented recurring workflows in Oozie to automate job scheduling.
  • Worked with application teams to install OS level updates and version upgrades for Hadoop cluster environments.
  • Participated in design and code reviews.
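
A minimal sketch of a custom Hive UDF of the kind described above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the masking logic and class name are hypothetical examples, not the actual functions built for this project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: masks all but the last four characters of a value.
class MaskValue extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked = if (s.length <= 4) s else "*" * (s.length - 4) + s.takeRight(4)
    new Text(masked)
  }
}
```

Once packaged into a JAR, such a function is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.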

Environment: HDFS, Hadoop, Pig, Hive, HBase, Sqoop, Talend, Flume, MapReduce, Podium Data, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, SOAP, REST services, Maven, Agile Methodology, JIRA.

Confidential, Kansas City, MO

Hadoop Developer

Responsibilities:

  • Understood business requirements and was involved in preparing design documents according to client requirements.
  • Imported data from relational databases into the Hadoop cluster using Sqoop.
  • Developed data pipelines using Hive scripts to transform data from Teradata and DB2 data sources; these pipelines included custom UDFs to extend the ETL functionality.
  • Developed multiple MapReduce jobs in Java for complex business requirements, including data cleansing and preprocessing.
  • Developed a UDF to convert data from Hive tables to JSON format per client requirements.
  • Involved in creating tables in Hive and writing scripts and queries to load data into Hive tables from HDFS.
  • Implemented dynamic partitioning and bucketing in Hive as part of performance tuning (see the sketch after this list).
  • Created custom UDFs in Pig and Hive.
  • Performed various transformations on the data, such as changing date patterns and converting to other time zones.
  • Designed and developed Pig Latin scripts to process data in batches for trend analysis.
  • Installed the Oozie workflow engine and scheduled it to run time-dependent Hive and Pig jobs.
  • Stored, processed, and analyzed huge datasets to derive valuable insights.
  • Created various aggregated datasets for easy and faster reporting using Tableau.
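
A minimal sketch of the dynamic-partitioned load described above. The original work ran equivalent statements as Hive scripts; the HiveQL is wrapped here in a Hive-enabled SparkSession only to keep the examples in one language, and all database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-dynamic-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow partition values to be derived from the data itself
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical target table, partitioned by load date.
    // (The bucketing described above would be declared in Hive DDL with
    //  CLUSTERED BY (...) INTO n BUCKETS; it is omitted here because Spark
    //  does not populate Hive-compatible buckets on insert.)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.orders_part (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE)
        |PARTITIONED BY (load_dt STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: the last column in the SELECT feeds load_dt
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.orders_part PARTITION (load_dt)
        |SELECT order_id, customer_id, amount, load_dt
        |FROM staging.orders_raw""".stripMargin)

    spark.stop()
  }
}
```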

Environment: HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Oozie, CDH distribution, Java, Eclipse, Shell Scripts, Tableau, Windows, Linux.

Confidential

Java Developer

Responsibilities:
  • Developed the J2EE application based on the Service Oriented Architecture by employing SOAP and other tools for data exchanges and updates.
  • Worked on all modules of the application, including the front-end presentation layer (developed using Spring MVC, JSP, JSTL, and JavaScript), the business objects (developed using POJOs), and the data access layer (built with the Hibernate framework).
  • Designed the GUI of the application using JavaScript, HTML, CSS, Servlets, and JSP.
  • Wrote AJAX scripts so that requests could be processed quickly and asynchronously.
  • Used the Dependency Injection and AOP features of the Spring framework to handle exceptions.
  • Involved in writing Hibernate Query Language (HQL) for persistence layer.
  • Implemented the persistence layer using Hibernate, with POJOs representing the persistent entities.
  • Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
  • Wrote SQL queries and stored procedures for multiple databases (Oracle and SQL Server).
  • Wrote backend jobs based on Core Java and the Oracle database, run daily/weekly.
  • Used RESTful APIs and SOAP web services for internal and external consumption.
  • Used Core Java concepts such as collections, garbage collection, multithreading, and OOP, along with APIs for encryption and compression of incoming requests, to provide security.
  • Wrote and implemented test scripts to support Test-Driven Development (TDD) and continuous integration.

Environment: Java, JSP, HTML, CSS, Ubuntu Operating System, JavaScript, AJAX, Servlets, Struts, Hibernate, EJB (Session Beans), Log4J, WebSphere, JNDI, Oracle, Windows XP, LINUX, ANT, Eclipse.

Confidential

Java Developer

Responsibilities:

  • Involved in the complete SDLC (software development life cycle) of the application from requirement analysis to testing.
  • Developed the modules based on MVC Architecture and DAO design pattern for maximum abstraction of the application and code reusability.
  • Involved in development of POJO classes and writing Hibernate query language (HQL) queries.
  • Involved in design and development of server side layer using XML, JDBC, JSP, JNDI, EJB and DAO patterns.
  • Developed EJB tier using Singleton and DAO design patterns, which contains business logic, and database access functions.
  • Developed the business methods as per the IBM Rational Rose UML Model.
  • Developed front-end pages using HTML, CSS, JavaScript, AJAX and JSP.
  • Followed the Scrum development cycle for streamlined, iterative, and incremental development.
  • Used JUnit for unit testing of the application and Apache log4j framework for logging services.
  • Provided daily development status reports, weekly status reports, and weekly development summary and defect reports.
  • Worked with the testing team in creating new test cases and created the use cases for the module before the testing phase.

Environment: Java 1.6, HTML, JavaScript, jQuery, Servlets, JSP, Hibernate, JDBC, RESTful Web services, IBM Rational Application Developer (RAD) 6, AJAX, DB2, log4j, Oracle 9i.
