
Hadoop Developer Resume


New York, NY

SUMMARY:

  • Over 5 years of IT experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
  • Around 2 years of experience working with Big Data technologies on systems comprising massive amounts of data running in highly distributed mode on the Cloudera and Hortonworks Hadoop distributions.
  • Excellent hands-on experience in developing and implementing Big Data solutions and data mining applications on Hadoop using HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Kafka, Storm, Spark, Oozie, and ZooKeeper.
  • Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS), and in building highly scalable Big Data solutions using Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase & Cassandra).
  • Experience in design, development, unit testing, integration, debugging, implementation, and production support, as well as client interaction and understanding of business applications, business data flows, and data relationships.
  • Excellent communication and interpersonal skills, technical competency, and the ability to quickly learn new technologies as required. Good team player with the ability to solve problems and to organize and prioritize multiple tasks.
  • In-depth understanding of Hadoop architecture and its various components, such as JobTracker, TaskTracker, YARN, NameNode, DataNode, and MapReduce concepts.
  • Expertise in setting up Hadoop in a pseudo-distributed environment, along with Hive, Pig, HBase, and Sqoop, on the Ubuntu operating system.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
  • In charge of setup, configuration, and security for Hadoop clusters using Kerberos.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screen Hadoop cluster job performance and perform capacity planning.
  • Monitor Hadoop cluster connectivity and security.
  • Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing.
  • Expertise in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
  • Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
  • Excellent understanding of the Hadoop MapReduce programming paradigm; worked with join patterns and implemented map-side and reduce-side joins using MapReduce (see the map-side join sketch at the end of this summary).
  • Responsible for troubleshooting and development on Hadoop technologies like HDFS, Hive, Pig, Flume, MongoDB, Accumulo, Sqoop, Zookeeper, Spark, MapReduce2, YARN, HBase, Tez, Kafka, and Storm.
  • Extensive experience with Big Data ETL and query languages such as Pig Latin and HiveQL.
  • Experience in creating databases, tables, and views in Hive and Impala.
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Hands on experience in in-memory data processing with Apache Spark.
  • Extensive knowledge of developing Spark Streaming jobs by creating RDDs (Resilient Distributed Datasets) and using PySpark and the Spark shell as appropriate.
  • Good knowledge in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Translated, loaded, and presented disparate data sets in various formats from sources such as JSON, text files, Kafka queues, and log data.
  • Expertise in Core Java, Data Structures, Algorithms, Object Oriented Design (OOD) and Java concepts such as OOP Concepts, Collections Framework, Exception Handling, I/O System and Multi-Threading.
  • Expertise in integrating various data sources such as RDBMS, shell scripts, spreadsheets, and text files with Java applications.
  • Developed core modules in large cross-platform applications using JAVA, J2EE, Hibernate, Python, Spring, JSP, Servlets, EJB, JDBC, JavaScript, XML, and HTML.
  • Extensive experience in working with Oracle, MS SQL Server, DB2, MySQL RDBMS databases.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Hands-on experience with sequence files, RC files, combiners, counters, dynamic partitioning, and bucketing for best practices and performance improvement.
  • Expert in creating and designing data ingestion pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
  • Extensive experience in data ingestion technologies, such as Flume, Kafka and Sqoop.
  • Utilized Kafka and Flume to ingest real-time and near-real-time streaming data into HDFS from different sources.
  • Experienced in using Kafka as a distributed publisher-subscriber messaging system.
  • Real-time streaming of data using Spark with Kafka (see the Spark-with-Kafka sketch at the end of this summary).
  • Expertise in importing and exporting data between Hadoop and RDBMS using Sqoop.
  • Extracted data from various log files and pushed it into HDFS using Flume.
  • Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end; also familiar with Pentaho as another ETL tool for Big Data.
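
Below is a minimal sketch of the map-side join pattern referenced above, assuming a small lookup file shipped through the Hadoop distributed cache and a hypothetical comma-separated record layout (order records of customerId,orderId,amount joined against a customers file of customerId,customerName); the class, file, and field names are illustrative only.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Map-side join: the small lookup file is cached on every node and joined
    // against the large fact records inside the mapper, so no reducer is needed.
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // Hypothetical cache file with lines: customerId,customerName
            URI[] cacheFiles = context.getCacheFiles();
            try (BufferedReader reader = new BufferedReader(
                    new FileReader(new Path(cacheFiles[0].getPath()).getName()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);
                    lookup.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Hypothetical fact record layout: customerId,orderId,amount
            String[] fields = value.toString().split(",");
            String customerName = lookup.get(fields[0]);
            if (customerName != null) {
                context.write(new Text(fields[1]), new Text(customerName + "\t" + fields[2]));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance();
            job.setJarByClass(MapSideJoinMapper.class);
            job.setMapperClass(MapSideJoinMapper.class);
            job.setNumReduceTasks(0);                   // pure map-side join, no shuffle
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.addCacheFile(new URI(args[0]));         // small lookup file in HDFS
            FileInputFormat.addInputPath(job, new Path(args[1]));
            FileOutputFormat.setOutputPath(job, new Path(args[2]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Because the small side of the join fits in memory on every mapper, no reducer (and therefore no shuffle) is required, which is the main advantage over a reduce-side join.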
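
And a minimal sketch of the Spark-with-Kafka streaming mentioned above, written against the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic, consumer group, and HDFS output path are hypothetical placeholders, not taken from any specific project.

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    // Consumes a Kafka topic with Spark Streaming and lands each micro-batch in HDFS as text files.
    public class KafkaToHdfsStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");    // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "log-ingest");               // hypothetical consumer group
            Collection<String> topics = Arrays.asList("app-logs");   // hypothetical topic

            JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

            // Keep only the message payloads and write each batch to a time-stamped HDFS directory.
            JavaDStream<String> lines = stream.map(ConsumerRecord::value);
            lines.foreachRDD((rdd, time) ->
                    rdd.saveAsTextFile("hdfs:///data/raw/app-logs/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }

Writing each 30-second micro-batch to its own directory keyed by batch time keeps the landed files immutable and easy for downstream Hive or Pig jobs to pick up.
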
TECHNICAL SKILLS:

Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, ZooKeeper, Oozie, Sqoop, Flume, Puppet, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios

Hadoop Distributions: Cloudera, Hortonworks, AWS

Operating systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS

Programming languages: C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting

JAVA Technologies: JSP, Servlets, Spring, Hibernate, Maven

Databases: MS-SQL, Oracle, MS-Access, NoSQL, MySQL

Reporting/ETL tools: Tableau, Informatica, DataStage, Talend, Pentaho, Power View

Methodologies: Agile/ Scrum, Waterfall

Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

PROFESSIONAL EXPERIENCE:

Confidential, New York, NY

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Pig Latin and Java for data analysis.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files. Developed Pig scripts for analyzing large data sets in HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Knowledge on handling Hive queries using Spark SQL that integrates with Spark environment.
  • Performed extensive data mining using Hive. Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases and for comparison with historical data.
  • Administered and supported the Hortonworks distribution.
  • Used Kafka to load data into HDFS and move it into NoSQL databases such as Cassandra.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
  • Exported data to Tableau and Excel with Power View for presentation and refinement.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources. Implemented Hive generic UDFs to implement business logic (see the UDF sketch after this list).
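
A minimal sketch of a Hive GenericUDF of the kind described in the last bullet; the function name (mask_id) and the masking rule are hypothetical and stand in for the actual business logic.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
    import org.apache.hadoop.io.Text;

    // Hypothetical GenericUDF that masks all but the last four characters of a string column.
    @Description(name = "mask_id", value = "_FUNC_(str) - masks all but the last 4 characters")
    public class MaskIdUDF extends GenericUDF {

        private StringObjectInspector inputOI;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1 || !(args[0] instanceof StringObjectInspector)) {
                throw new UDFArgumentException("mask_id() expects a single string argument");
            }
            inputOI = (StringObjectInspector) args[0];
            return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            String value = inputOI.getPrimitiveJavaObject(args[0].get());
            if (value == null) {
                return null;
            }
            int visible = Math.min(4, value.length());
            String masked = value.substring(0, value.length() - visible).replaceAll(".", "*")
                    + value.substring(value.length() - visible);
            return new Text(masked);
        }

        @Override
        public String getDisplayString(String[] children) {
            return "mask_id(" + children[0] + ")";
        }
    }

Once packaged into a JAR, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION mask_id AS '<class name>' before being used in HiveQL.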

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Unix/Linux, Teradata, ZooKeeper, Tableau, HBase, Cassandra, Kafka

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed simple to complex MapReduce streaming jobs in Python that were integrated with Hive and Pig.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
  • Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra.
  • Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Experience in writing MapReduce jobs in Java, Pig, and Hive, and in tuning MR/Hive queries.
  • Expertise in Tableau Server management (clustering, load balancing, user management, etc.).
  • Expertise in taking backups and restoring the Tableau repository.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Mahout to understand machine learning algorithms for efficient data processing.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats (see the sketch after this list).
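
A minimal sketch of one such MapReduce program in Java, assuming a hypothetical CSV record layout (userId,page,bytes) and aggregating total bytes per user; a counter tracks malformed records and a combiner trims shuffle volume.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Aggregates total bytes per user from CSV records of the form: userId,page,bytes
    public class BytesPerUser {

        public static class ParseMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            // Counter tracks malformed input lines instead of failing the job.
            enum Quality { MALFORMED }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length < 3) {
                    context.getCounter(Quality.MALFORMED).increment(1);
                    return;
                }
                try {
                    context.write(new Text(fields[0]), new LongWritable(Long.parseLong(fields[2].trim())));
                } catch (NumberFormatException e) {
                    context.getCounter(Quality.MALFORMED).increment(1);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable v : values) {
                    total += v.get();
                }
                context.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "bytes-per-user");
            job.setJarByClass(BytesPerUser.class);
            job.setMapperClass(ParseMapper.class);
            job.setCombinerClass(SumReducer.class);   // combiner reduces shuffle volume
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }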

Environment: Hadoop, Pig, Hive, Apache Sqoop, Kafka, Flume, Oozie, HBase, ZooKeeper, Tez, Impala, Mahout, Cloudera Manager, 30-node cluster with Linux-Ubuntu, Tableau.

Confidential, Redwood City, CA

Hadoop Administrator/ Developer

Responsibilities:

  • Hands-on experience in installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as Development, Test, and Production clusters.
  • Used the JobTracker to assign MapReduce tasks to TaskTrackers in a cluster of nodes.
  • Good experience in cluster audit findings and tuning configuration parameters.
  • Implemented Kerberos security in all environments.
  • Defined file system layout and data set permissions.
  • Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users (see the sketch after this list).
  • Demonstrated an understanding of concepts, best practices, and functions needed to implement a Big Data solution in a corporate environment.
  • Worked on pulling data from Oracle databases into the Hadoop cluster.
  • Helped design scalable Big Data clusters and solutions.
  • Managed and reviewed data backups and log files; experienced in deploying Java applications on the cluster.
  • Commissioned and decommissioned nodes as needed.
  • Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and other issues.
  • Worked with network and Linux system engineers to define optimum network configurations, server hardware, and operating systems.
  • Evaluate and propose new tools and technologies to meet the needs of the organization. 
  • Production support responsibilities include cluster maintenance.
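
A minimal sketch of how a client job could target one of the Capacity Scheduler queues mentioned above, using the classic Hadoop 1.x mapred API; the queue name "etl" is a hypothetical example of a queue that would be defined in capacity-scheduler.xml.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    // Submits a simple map-only copy job to a specific Capacity Scheduler queue.
    public class QueueSubmissionExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(QueueSubmissionExample.class);
            conf.setJobName("queued-copy-job");

            // "etl" is a hypothetical queue; the same value can also be supplied
            // as mapred.job.queue.name in the job configuration.
            conf.setQueueName("etl");

            conf.setMapperClass(IdentityMapper.class);
            conf.setNumReduceTasks(0);
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }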

Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Java (J2EE), XML, Microsoft Word & Excel, Linux.

Confidential

Java Developer

Responsibilities:

  • Developed the Training and Appraisal modules using Java, JSP, Servlets, and JavaScript.
  • Developed the UI using Java Swing.
  • Involved in Designing the Database Schema and writing the complex SQL queries.
  • Involved in gathering and analyzing system requirements.
  • Accessed stored procedures and functions using JDBC CallableStatements (see the sketch after this list).
  • Executed and coordinated the installation for the project.
  • Worked on a web-based reporting system with HTML, JavaScript, and JSP.
  • Played key role in the high-level design for the implementation of this application.
  • Involved in Code reviews for other modules developed by peers.
  • Designed and established the process and mapped the functional requirements to the workflow process.
  • Involved in Maintenance and Enhancement of the project.
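
A minimal sketch of the JDBC CallableStatement usage referenced above, written in present-day Java for brevity; the connection URL, credentials, and stored procedure name (GET_EMPLOYEE_APPRAISAL) are hypothetical.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Types;

    // Calls a hypothetical Oracle stored procedure that returns an appraisal score for an employee.
    public class AppraisalDao {

        public double fetchAppraisalScore(String jdbcUrl, String user, String password, long employeeId)
                throws SQLException {
            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
                 CallableStatement call = conn.prepareCall("{call GET_EMPLOYEE_APPRAISAL(?, ?)}")) {
                call.setLong(1, employeeId);                 // IN parameter: employee id
                call.registerOutParameter(2, Types.DOUBLE);  // OUT parameter: appraisal score
                call.execute();
                return call.getDouble(2);
            }
        }

        public static void main(String[] args) throws SQLException {
            AppraisalDao dao = new AppraisalDao();
            double score = dao.fetchAppraisalScore(
                    "jdbc:oracle:thin:@//localhost:1521/ORCL", "app_user", "secret", 1001L);
            System.out.println("Appraisal score: " + score);
        }
    }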

Environment: JDK 1.3, J2EE 1.3, JDBC, Tomcat, Oracle, HTML, Servlets, DHTML, SQL, and JUnit.

Confidential

Java Developer

Responsibilities:

  • Implemented client side validation using JavaScript.
  • Developed user interface using JSP, Struts Tag Libraries to simplify the complexities of the application.
  • Developed business logic using stateless session beans for calculating asset depreciation using straight-line and written-down value approaches.
  • Involved in coding SQL queries, stored procedures, and triggers.
  • Created REST-based web services in JSON, RSS, and CSV formats.
  • Created Java classes to communicate with the database using JDBC.
  • Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
  • Developed the Web Interface using Servlets, Java Beans, Java Server Pages, HTML and CSS.
  • Used JDBC for database access.
  • Extensively used JDBC PreparedStatements to embed SQL queries in the Java code (see the sketch after this list).
  • Developed GUI-related changes using JSP and HTML and client-side validations using JavaScript.
  • Developed DAO (Data Access Objects) using Spring Framework 3.
  • Developed web applications with rich internet application features using Java applets and Silverlight.
  • Used JavaScript to perform client side validations and Struts-Validator Framework for server-side validation.
  • Designed and developed the application using various design patterns, such as session facade, business delegate and service locator.
  • Involved in designing use-case diagrams, class diagrams, and interaction diagrams using UML with Rational Rose.
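
A minimal sketch of the PreparedStatement usage referenced above, in a DAO style similar to the one described; the table and column names (ASSET, ASSET_ID, DEPRECIATION_RATE) are hypothetical, and the code uses present-day Java syntax for brevity.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.sql.DataSource;

    // DAO-style lookup of an asset's depreciation rate using a parameterized PreparedStatement.
    public class AssetDao {

        private final DataSource dataSource;

        public AssetDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public Double findDepreciationRate(long assetId) throws SQLException {
            String sql = "SELECT DEPRECIATION_RATE FROM ASSET WHERE ASSET_ID = ?";
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, assetId);                  // bind the asset id safely
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble(1) : null;
                }
            }
        }
    }

Binding parameters this way keeps the SQL separate from user-supplied values, which is the main reason for preferring PreparedStatement over string concatenation.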

Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, STRUTS, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.
