
Hadoop Developer Resume


New York, New York

PROFESSIONAL SUMMARY:

  • Over 5 years of IT experience as a developer, designer, and quality tester, with cross-platform integration experience using the Hadoop ecosystem, Java, and software functional testing.
  • Around 3 years of experience across a variety of industries, including hands-on experience in Big Data analytics and development.
  • Experience writing Bash shell scripts.
  • Experience debugging code in the Spark framework and Hadoop MapReduce.
  • Hands-on experience in big data development using AWS EMR, Redshift, RDS, Hive, and Presto, plus Hadoop cluster setup, performance fine-tuning, monitoring, and administration.
  • Experience with J2EE technologies such as Spring, Spring MVC, Spring JDBC, web services, SOAP, and WSDL.
  • Good experience in Big Data technologies such as Hadoop frameworks, MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, EMR clusters, and Storm.
  • Experienced Hadoop developer with expertise in providing end-to-end solutions for real-time big data problems by implementing distributed processing concepts such as MapReduce on HDFS and other Hadoop ecosystem components.
  • Experience in working on large scale big data implementations and in production environment.
  • Hands-on experience with data migration from relational databases to the Hadoop platform using Sqoop.
  • Good knowledge of database structures, theories, and principles, as well as RDS and other columnar databases.
  • Working experience developing with Spark 1.6 and 2.x and remediating code for newer versions.
  • Hands on Experience with back-end programming using Scala.
  • Experience with EMR and Hive, and knowledge of shell scripting.
  • Wrote ETL jobs in Spark, with the ability to handle performance and memory issues.
  • Experience with Database Structures, principles and theories.
  • Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS
  • Understanding of managed distributions of Hadoop, like Cloudera and Hortonworks.
  • Experience with Agile development methodology
  • Developed analytical components using Scala, Spark, Spark Stream and Cloudera.
  • Good experience in writing Spark applications using Python and Scala.
  • Experience processing Avro data files using Avro tools and MapReduce programs.
  • Implemented predefined operators in Spark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey.
  • Used sbt to build Scala-based Spark projects and executed them using spark-submit.
  • Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
  • Designed HIVE queries & Pig scripts to perform data analysis, data transfer and table design.
  • Excellent communication skills; strong problem-solving, analytical, and time-management skills.
  • Experience analyzing and resolving performance, scalability and reliability issues.
  • Experience developing data pipelines using Kafka to store data in HDFS. Good experience with the SDLC (Software Development Life Cycle).
  • Exceptional ability to learn new technologies and to deliver outputs in short deadlines.
  • Monitored MapReduce jobs and YARN applications.
  • Strong written, oral, interpersonal, and presentation communication skills.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
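For illustration, the pair-operator semantics named above (map, flatMap, reduceByKey) can be sketched in plain Python on a classic word count. This is a hedged sketch, not Spark code: plain lists stand in for RDDs, and the helper functions are illustrative names, not actual Spark APIs.

```python
# Pure-Python sketch of Spark's map / flatMap / reduceByKey semantics,
# shown on a word count. Plain lists stand in for distributed RDDs;
# flat_map and reduce_by_key are illustrative helpers, not Spark APIs.

def flat_map(f, xs):
    """flatMap: apply f to each element and flatten the results."""
    return [y for x in xs for y in f(x)]

def reduce_by_key(f, pairs):
    """reduceByKey: combine values that share a key with f."""
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return sorted(out.items())

lines = ["to be or not to be"]
words = flat_map(str.split, lines)                  # line -> words
pairs = [(w, 1) for w in words]                     # map: word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)   # sum counts per word
print(counts)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In real Spark the same chain would be `rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)`, executed in parallel across partitions.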

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop, Map Reduce, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Yarn, HBase, Spark with Scala.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, IntelliJ.

PROFESSIONAL EXPERIENCE:

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Worked on installing Kafka on Virtual Machine.
  • Created topic for different users
  • Installed ZooKeeper, brokers, Schema Registry, and Control Center on multiple machines.
  • Set up ACL/SSL security for different users and assigned users to multiple topics.
  • Assigned topic access across multiple user logins.
  • Created process documentation and server diagrams, prepared server requisition documents, and uploaded them to SharePoint.
  • Used Puppet for automation of deployment to the server
  • Monitored errors and warnings on the servers using Splunk.
  • Setting up the machines with Network Control, Static IP, Disabled Firewalls, Swap memory.
  • Created a POC on AWS based on the services required by the project.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
  • Set up Hortonworks infrastructure, from configuring clusters to node setup.
  • Installed Ambari server in the cloud.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to HDFS.
  • Perform architecture design, data modeling, and implementation of Big Data platform and analytic applications for the consumer products
  • Analyze latest Big Data Analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing cluster, commissioning & decommissioning of datanode, namenode recovery, capacity planning, and slots configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Optimized and tuned the application.
  • Created user guides and training overviews for supporting teams.
  • Provide troubleshooting and best practices methodology for development teams. This includes process automation and new application onboarding
  • Design monitoring solutions and baseline statistics reporting to support the implementation
  • Experience with designing and building solutions for data ingestion both real time & batch using Sqoop/PIG/Impala/Kafka.
  • Extremely good knowledge and experience with Map Reduce, Spark Streaming, SparkSQL for data processing and reporting.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Used Apache Kafka for importing real time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Optimized Map Reduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command.
  • Created Hive tables and used static and dynamic partitioning as a data-slicing mechanism.
  • Working experience monitoring clusters, identifying risks, and establishing good practices to be followed in shared environments.
  • Design, develop, test, deploy and maintain the website.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Good understanding on cluster configurations and resource management using YARN
  • Used AWS EMR to process and compute data across multiple clusters on EC2.
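The Spark Streaming work above relies on the micro-batch model: a continuous stream is cut into short, fixed-width intervals, and each interval is handed to the batch engine. The following is a minimal plain-Python sketch of that bucketing (no Spark required); the timestamped records and the interval are made up for illustration.

```python
# Illustrative sketch of Spark Streaming's micro-batch idea: timestamped
# records are bucketed into fixed-width intervals, and each bucket would
# then be processed as an ordinary batch. All data here is hypothetical.

def micro_batches(records, interval):
    """Group (timestamp, value) records into batches of `interval` seconds."""
    batches = {}
    for ts, value in records:
        start = int(ts // interval) * interval  # start of this record's interval
        batches.setdefault(start, []).append(value)
    return dict(sorted(batches.items()))

events = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (2.1, "d")]
print(micro_batches(events, 1))  # {0: ['a'], 1: ['b', 'c'], 2: ['d']}
```

In actual Spark Streaming the batch interval is fixed when the StreamingContext is created, and each interval's records form one RDD in the DStream.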

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NIFI, Linux, Splunk, Java, Puppet, Apache Yarn, Pig, Spark, Amazon EC2.

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Set up Hortonworks infrastructure, from configuring clusters to node setup.
  • Installed Ambari server in the cloud.
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
  • Assigned access across multiple user logins.
  • Installed and configured CDH cluster, using Cloudera manager for easy management of existing Hadoop cluster.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively using Cloudera manager for managing multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes, creating server diagrams, and preparing server requisition documents.
  • Setting up the machines with Network Control, Static IP, Disabled Firewalls, Swap memory.
  • Managed cluster configuration to meet the needs of analysis, whether I/O-bound or CPU-bound.
  • Worked on setting up high availability for a major production cluster. Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Automate operations, installation and monitoring of the Hadoop Framework specifically: HDFS, Map/Reduce, Yarn, HBase.
  • Automated the setup of Hadoop Clusters and creation of Nodes
  • Monitor the improvement of CPU utilization and maintain it.
  • Performance tune and manage growth of the O/S, disk usage, and network traffic
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to HDFS.
  • Perform architecture design, data modeling, and implementation of Big Data platform and analytic applications for the consumer products
  • Analyze latest Big Data Analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing cluster, commissioning & decommissioning of datanode, namenode recovery, capacity planning, and slots configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of MapReduce jobs.
  • Responsible for managing data coming from different sources.
  • Load and transform large sets of structured, semi structured and unstructured data
  • Experience in managing and reviewing Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used predefined Pig functions to convert fixed-width files to delimited files.
  • Worked on tuning Hive and Pig to improve performance and solve performance issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Created Oozie workflows to run multiple MR, Hive and pig jobs.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Develop Spark code using Scala and Spark-SQL for faster testing and data processing
  • Involved in the development of Spark Streaming application for one of the data source using Scala, Spark by applying the transformations.
  • Imported data from different sources such as HDFS and MySQL into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
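The Hive-to-Spark conversion mentioned above boils down to re-expressing SQL clauses as pair-wise transformations. As a hedged sketch (plain Python stands in for RDDs, and the rows and department names are made-up sample data), a GROUP BY / SUM becomes a map into key-value pairs followed by a keyed fold:

```python
# Sketch of rewriting a HiveQL aggregation as RDD-style transformations.
# HiveQL equivalent: SELECT dept, SUM(salary) FROM emp GROUP BY dept
# The rows below are made-up sample data; no Spark is involved here.

rows = [("eng", 100), ("ops", 80), ("eng", 120)]

pairs = [(dept, salary) for dept, salary in rows]  # map: row -> (key, value)

totals = {}
for dept, salary in pairs:                         # reduceByKey-style fold
    totals[dept] = totals.get(dept, 0) + salary

print(sorted(totals.items()))  # [('eng', 220), ('ops', 80)]
```

In Scala/Spark the equivalent would be `rdd.map(r => (r.dept, r.salary)).reduceByKey(_ + _)`, with the fold executed per-partition and then merged.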

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python.

Confidential, New York, New York

Hadoop Developer

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured launched instances for specific applications.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed servers on the Amazon Web Services (AWS) platform instances using Puppet, Chef Configuration management.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with data modelers to model new incoming data sets.
  • Involved in the start-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Installed Hadoop, MapReduce, HDFS, and AWS tooling, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs via Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Worked on tuning Hive and Pig to improve performance and solve performance related issues in Hive and Pig scripts with good understanding of Joins, Group and aggregation and how it does Map Reduce jobs
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
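The Kafka-to-HDFS pipeline described above follows a simple produce/consume/flush loop. Below is a toy sketch under stated assumptions: an in-memory queue stands in for the Kafka topic, a local list of batches stands in for HDFS files, and the message names and batch size are made up. A real pipeline would use Kafka consumers and HDFS writers instead.

```python
import queue

# Toy sketch of a Kafka -> HDFS pipeline: a bounded in-memory queue
# stands in for a Kafka topic; a list of "files" stands in for HDFS.
# Message names and BATCH_SIZE are illustrative only.

topic = queue.Queue()
for msg in ["evt1", "evt2", "evt3", "evt4", "evt5"]:
    topic.put(msg)                        # producer side

BATCH_SIZE = 2
hdfs_files, batch = [], []
while not topic.empty():
    batch.append(topic.get())             # consumer side
    if len(batch) == BATCH_SIZE:
        hdfs_files.append(list(batch))    # "flush" one file per full batch
        batch.clear()
if batch:
    hdfs_files.append(list(batch))        # flush any remaining tail

print(hdfs_files)  # [['evt1', 'evt2'], ['evt3', 'evt4'], ['evt5']]
```

Batching before each flush mirrors why such pipelines buffer writes: HDFS favors a small number of large files over many tiny ones.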

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.

Confidential, New York, New York

SDET

Responsibilities:

  • Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
  • Attended requirements meetings with business analysts and business users.
  • Analyzed requirements and use cases; performed ambiguity reviews of business requirements and functional specification documents.
  • Automated the functionality and interface testing of the application using Quick Test Professional (QTP)
  • Performed database testing using ODBC via automated test scripts.
  • Created a driver script using VBScript to launch QTP automatically and run a number of automated scripts simultaneously.
  • Developed a functional library for the DORS application.
  • Designed, developed, and maintained the automation framework (hybrid framework).
  • Analyzed requirements and prepared automation script scenarios.
  • Prepared the test plan for DORS (Distribution Online Request System) manual testing over different releases, covering GUI testing, functional testing, integration testing, regression testing, interface testing, end-to-end testing, and user acceptance testing.
  • Create an automation test plan / strategy.
  • Developed MapReduce programs to perform data filtering for unstructured data.
  • Designed the application by implementing the Struts framework based on MVC architecture.
  • Designed and developed the front end using JSP, HTML, JavaScript, and jQuery.
  • Developed framework for data processing using Design patterns, Java, XML.
  • Implemented J2EE standards, MVC2 architecture using Struts Framework.
  • Implemented servlets, JSP, and Ajax to design the user interface.
  • Used Spring IoC for dependency injection with the Hibernate and Spring frameworks.
  • Developed EJB components deployed on WebLogic Application Server.
  • Wrote unit tests using the JUnit framework; logging was done with Log4j.
  • Used HTML, CSS, JavaScript, and jQuery to develop front-end pages.
  • Designed and developed various configuration files for Hibernate mappings.
  • Designed and Developed SQL queries and Stored Procedures.
  • Used XML, XSLT, XPATH to extract data from Web Services output XML
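The hybrid automation framework above combines keyword-driven and data-driven testing: test steps live in data, and a small driver dispatches each keyword to an action. The resume's actual tooling is QTP/VBScript; the sketch below shows only the pattern, in Python, with made-up keywords and a numeric stand-in for application state.

```python
# Sketch of the keyword-driven core of a hybrid test framework: steps
# are data, and a driver maps each keyword to an action. The keywords
# and numeric "application state" are made up for illustration; a real
# framework (QTP/VBScript in this resume) would drive the app's UI.

ACTIONS = {
    "add":    lambda state, arg: state + arg,
    "negate": lambda state, arg: -state,
}

def run_script(steps, state=0):
    """Execute (keyword, argument) steps against the application state."""
    for keyword, arg in steps:
        state = ACTIONS[keyword](state, arg)
    return state

script = [("add", 5), ("add", 3), ("negate", None)]
print(run_script(script))  # -8
```

Because the steps are plain data, new test scenarios can be added in spreadsheets or config files without touching the driver code, which is the main selling point of the hybrid approach.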

Environment: HP LoadRunner, HP Quality Center, HP QuickTest Professional, Maven, Windows XP, J2EE, JSP, JDBC, Hibernate, Spring, HTML, XML, CSS, JavaScript, and jQuery.

Confidential, New York, New York

SDET

Responsibilities:

  • Involved in almost all the phases of SDLC.
  • Executed test cases manually and logged defects using ClearQuest.
  • Automated the functionality and interface testing of the application using Quick Test Professional (QTP)
  • Designed, developed, and maintained the automation framework (hybrid framework).
  • Analyzed requirements and prepared automation script scenarios.
  • Developed test data for regression testing using QTP.
  • Wrote test cases in IBM Rational Manual Tester.
  • Conducted cross-browser testing on different platforms.
  • Performed client application testing and web-based application performance, stress, volume, and load testing of the system using LoadRunner 9.5.
  • Analyzed performance of the application itself under various test loads with many simultaneous Vusers.
  • Analyzed the impact on server performance (CPU usage and server memory usage) for varied numbers of simultaneous users.
  • Inserted transactions and rendezvous points into web Vuser scripts.
  • Created Vuser scripts using VuGen and used the Controller to generate and execute LoadRunner scenarios.
  • Fully involved in requirements analysis and documentation of requirement specifications.
  • Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
  • Involved in design of the core implementation logic using MVC architecture.
  • Used Apache Maven to build and configure the application.
  • Developed JAX-WS web services to provide services to the other systems.
  • Developed JAX-WS clients to consume a few of the services provided by other systems.
  • Involved in developing EJB 3.0 stateless session beans in the business tier to expose business services to the service components as well as the web tier.
  • Implemented Hibernate at DAO layer by configuring hibernate configuration file for different databases.
  • Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
  • Developed JavaScript validations to validate form fields.
  • Performed unit testing for the developed code using JUnit.
  • Developed design documents for the code developed.
  • Used SVN repository for version control of the developed code.
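The JUnit unit testing mentioned above follows the arrange/act/assert pattern. Since no runnable Java project accompanies this resume, the sketch below shows the same pattern with Python's unittest; `validate_form` is a hypothetical stand-in for the form-field validation described, not code from the actual project.

```python
import unittest

# Arrange/act/assert unit-test sketch, analogous to the JUnit tests
# mentioned above. validate_form is a hypothetical stand-in for the
# form-field validation described in the resume.

def validate_form(fields):
    """Return the names of required fields that are missing or blank."""
    required = ("name", "email")
    return [f for f in required if not fields.get(f, "").strip()]

class ValidateFormTest(unittest.TestCase):
    def test_complete_form_passes(self):
        self.assertEqual(validate_form({"name": "Ada", "email": "a@b.c"}), [])

    def test_blank_email_is_reported(self):
        self.assertEqual(validate_form({"name": "Ada", "email": "  "}), ["email"])

# Run the suite programmatically rather than via unittest.main(),
# so the interpreter stays alive after the tests complete.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidateFormTest)
result = unittest.TextTestRunner().run(suite)
```

The JUnit 4 equivalent would annotate each method with `@Test` and use `assertEquals` in the same arrange/act/assert shape.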

Environment: SQL, Oracle 10g, Apache Tomcat, HP LoadRunner, IBM Rational Robot, ClearQuest, Java, J2EE, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic, and PL/SQL.
