
Sr. Hadoop Developer/Admin Resume


New York, NY

SUMMARY:

  • Over 9+ years of professional IT experience in analysis, design, and development using Hadoop, Java/J2EE, and SQL.
  • 7+ years of experience developing large-scale applications using Hadoop and other Big Data tools.
  • Experienced with Hadoop ecosystem components such as MapReduce, HBase, Spark, Oozie, Hive, Sqoop, Pig, Scala, Kafka, Flume, and Cassandra.
  • Experience in developing solutions to analyze large data sets efficiently.
  • Experience with the Hadoop 2.0 YARN (MRv2) architecture and with developing YARN applications on it.
  • Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Excellent hands-on experience importing and exporting data between relational database systems such as MySQL and Oracle and HDFS/Hive using Sqoop.
  • Knowledge of Kafka installation and its integration with Spark Streaming.
  • Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
  • Excellent understanding of Hadoop distributed system architecture and design principles.
  • Experience with web-based UI development using jQuery, CSS, HTML, HTML5, XHTML, and JavaScript.
  • Experience in converting MapReduce applications to Spark.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in using job scheduling and workflow designing tools like Oozie.
  • Good at working with BI teams to translate big data requirements into Hadoop-centric solutions.
  • Experience in performance-tuning Hadoop clusters by assessing and analyzing the existing infrastructure.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager and Ambari.
  • Good experience creating real-time data streaming solutions using Spark/Storm, Kafka, and Flume.
  • Very good understanding of NoSQL databases such as MongoDB and HBase.
  • Extensive experience creating class diagrams, activity diagrams, and sequence diagrams using the Unified Modeling Language (UML).
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Good understanding of Data Mining and Machine Learning techniques.
  • Experience in handling messaging services using Apache Kafka (see the producer sketch after this list).
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Experience on NoSQL Databases such as HBase and Cassandra.
  • Experience writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server, and MySQL.
  • Working experience in Development, Production and QA Environments.
  • Experienced in the SDLC, Agile (Scrum) methodology, and iterative Waterfall.
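
A minimal sketch of the kind of Kafka producer used for the messaging work above; the broker address, topic name, and record contents are illustrative assumptions rather than project code:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogEventProducer {
        public static void main(String[] args) {
            // Broker address and plain string serializers (placeholders).
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Each record carries a key (for example the source host) and the raw log line.
                producer.send(new ProducerRecord<>("app-logs", "web01", "GET /index.html 200"));
            }
        }
    }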

PROFESSIONAL EXPERIENCE:

Sr. Hadoop Developer/Admin

Confidential, New York, NY

Skills Used: Hadoop, MapR, Spark, Shark, Kafka, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, Flume, Hortonworks, Oracle 11g, Cassandra, UNIX Shell Scripting.

Responsibilities:

  • Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing (see the mapper sketch after this list).
  • Worked with the team to grow the cluster from 28 to 42 nodes; the additional DataNodes were configured through the Hadoop commissioning process.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Managed and scheduled Jobs on a Hadoop cluster.
  • Involved in defining job flows, managing and reviewing log files.
  • Installed the Oozie workflow engine to run multiple MapReduce, HiveQL, and Pig jobs.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Collected log data from web servers and ingested it into HDFS using Flume.
  • As Cassandra developer, set up, configured, and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
  • Responsible for managing data coming from different sources.
  • Used Scala, Akka, and Teradata for data delivery.
  • Worked on the Spark Core and Spark SQL modules.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Started using Apache NiFi to copy data from the local file system to HDFS.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Participated in requirements gathering from experts and business partners and converted the requirements into technical specifications.
  • Defined Interface Mapping between JDBC Layer and Oracle Stored Procedures.
  • Experience in managing and reviewing Hadoop log files.
  • Worked with Informatica professionals to resolve Informatica upgrade issues.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Responsible for building scalable distributed data solutions in a Hadoop cluster environment with the Hortonworks distribution.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
  • Implemented best income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
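
Illustrative sketch of the kind of MapReduce data-cleansing job referenced above; the delimiter, field count, and filter rule are assumptions for the example, not the actual project code:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only cleansing step: drop malformed records and trim whitespace from each field.
    public class CleansingMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 12;  // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                return;  // skip malformed rows
            }
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    cleaned.append(',');
                }
                cleaned.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }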

Big Data / Hadoop Developer

Confidential, Atlanta, GA

Skills Used: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, ZooKeeper, Kafka, Impala, Akka, Apache Spark, Spark Streaming, Hortonworks, HBase, MongoDB.

Responsibilities:

  • Worked with the BI team on Big Data Hadoop cluster implementation and data integration while developing large-scale system software.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed several custom user-defined functions for Hive and Pig in Java (see the UDF sketch after this list).
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Collected the logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Migrated an existing on-premises application to AWS.
  • Ran Hadoop Streaming jobs to process terabytes of XML-format data.
  • Migrated a MongoDB sharded/replica cluster from one data center to another without downtime.
  • Managed and monitored large production MongoDB sharded cluster environments holding terabytes of data.
  • Worked on importing and exporting data between RDBMSs and HDFS with Hive and Pig using Sqoop.
  • Highly skilled and experienced in the Agile development process for diverse requirements.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala and Python.
  • Set up MongoDB profiling to identify slow queries.
  • Implemented MMS (MongoDB Management Service) monitoring and backup in the cloud and on local servers (on-premises with Ops Manager).
  • Configured Hive and Oozie to store metadata in Microsoft SQL Server.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per user needs.
  • Used the Spark API on Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Extensive experience working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, MapReduce, ZooKeeper, Kafka, Spark, and HBase; worked on a text mining project with Kafka.
  • Developed a data pipeline to store data into HDFS.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
  • Deployed Hadoop YARN, Spark, and Storm integrated with Cassandra, Ignite, and Kafka.
  • Moved data between clusters using DistCp (distributed copy), supported and maintained Sqoop jobs and programs, and designed and developed Spark RDDs and Spark SQL queries.
  • Worked with the customer to provide solutions to various problems, and worked with Spark for POC purposes.
  • Implemented a log producer in Scala that watches application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking MapReduce jobs using the JobTracker.
  • Implemented Hive generic UDFs to implement business logic.
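
A minimal sketch of a Hive UDF in Java of the kind described above; the masking rule and class name are assumptions for illustration:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hive UDF that masks all but the last four characters of a value.
    public class MaskUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String s = input.toString();
            int keep = Math.min(4, s.length());
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < s.length() - keep; i++) {
                masked.append('*');
            }
            masked.append(s.substring(s.length() - keep));
            return new Text(masked.toString());
        }
    }

Once packaged into a JAR, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.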

Hadoop Developer / Administrator

Confidential, Parsippany, NJ

Skills Used: Hadoop, HDFS, HBase, MapReduce, Java, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Hue, Hortonworks, Python, Storm, ZooKeeper, Avro files, SQL, ETL, Cloudera Manager, MySQL, MongoDB.

Responsibilities:

  • Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Estimated the hardware requirements for NameNode and DataNodes & planning the cluster.
  • Created Hive generic UDFs, UDAFs, and UDTFs in Python to process business logic that varies based on policy.
  • Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
  • Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
  • Wrote Scala programs that run on Spark and worked with the Hue interface for querying the data.
  • Worked with Kafka on a proof of concept for carrying out log processing on a distributed system; worked with the NoSQL database HBase to create tables and store data.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Worked on performance tuning of Apache NiFi workflow to optimize the data ingestion speeds.
  • Migrated an existing on-premises application to AWS.
  • Discussed implementation-level details of concurrent programming in Spark using Python with message passing.
  • Involved in Cassandra Data Modeling and Analysis and CQL (Cassandra Query Language).
  • Experience upgrading Apache Ambari, CDH, and HDP clusters.
  • Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
  • Experienced with different kinds of compression techniques such as LZO, Gzip, and Snappy.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Created Data Pipeline of Map Reduce programs using Chained Mappers.
  • Implemented an optimized join by combining different data sets to get top claims by state using MapReduce.
  • Worked on Informatica Schedulers to schedule the workflows.
  • Implemented MapReduce programs to perform map-side joins using the Distributed Cache in Java (see the join sketch after this list); developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
  • Experience upgrading the Hadoop cluster, including HBase and ZooKeeper, from CDH3 to CDH4.
  • Created a complete processing engine based on Cloudera's distribution, tuned for performance.
  • Experienced in Monitoring Cluster using Cloudera manager.
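
A minimal sketch of the map-side join with the Distributed Cache mentioned above; the lookup file name, record layout, and output format are assumptions for the example:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-side join: a small lookup file is shipped to every mapper through the
    // Distributed Cache and loaded into memory in setup().
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> stateNames = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // "states.txt" is the lookup file registered in configureJob() below;
            // the framework localizes it into the task's working directory.
            try (BufferedReader reader = new BufferedReader(new FileReader("states.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);   // e.g. "NJ,New Jersey"
                    if (parts.length == 2) {
                        stateNames.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");   // assumed layout: claimId,stateCode,amount
            String state = stateNames.getOrDefault(fields[1], "UNKNOWN");
            context.write(new Text(state), new Text(fields[0] + "," + fields[2]));
        }

        // Driver fragment: register the lookup file with the Distributed Cache.
        public static void configureJob(Job job) throws Exception {
            job.addCacheFile(new URI("/lookup/states.txt"));
        }
    }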

Hadoop Developer/ Administrator

Confidential, Dover, NH

Skills Used: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Hadoop MapReduce, HBase, Linux, UNIX Shell Scripting, Big Data.

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
  • Analyzed data using Hadoop components Hive and Pig.
  • Responsible for running Hadoop Streaming jobs to process terabytes of XML data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased products on the website.
  • Experienced in developing applications with Hadoop, Impala, Hive, Sqoop, Oozie, Java MapReduce, Spark SQL, HDFS, Pig, and Tez.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by our BI team.
  • Responsible to manage data coming from different sources.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in loading data from UNIX file system to HDFS.
  • Responsible for creating Hive tables, loading data, and writing Hive queries (see the Hive sketch after this list).
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
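
Illustrative sketch of the kind of Hive table creation and query described above, issued from Java through the HiveServer2 JDBC driver; the server address, table name, and column layout are assumptions for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class WebLogHiveQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // HiveServer2 endpoint is a placeholder.
            try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "", "");
                 Statement stmt = conn.createStatement()) {
                // External table over web logs already landed in HDFS (layout assumed).
                stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_logs ("
                        + " visitor_id STRING, url STRING, visit_date STRING)"
                        + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                        + " LOCATION '/data/weblogs'");

                // Unique visitors per day, similar to the web log analysis described above.
                ResultSet rs = stmt.executeQuery(
                        "SELECT visit_date, COUNT(DISTINCT visitor_id) FROM web_logs GROUP BY visit_date");
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }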

Java / Hadoop Developer

Confidential, IN

Skills Used: HDFS, HBase, MapReduce, Storm, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Spark, Scala, Oozie, Hue, ETL, Cloudera Manager, Java, JDK, J2EE, Struts.

Responsibilities:

  • Involved in requirements gathering and business analysis, and translated business requirements into technical designs for Hadoop and Big Data.
  • Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration (see the HBase sketch after this list).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in creating workflows to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Involved in developing shell scripts and automating data management for end-to-end integration work.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing data in HDFS.
  • Developed MapReduce programs for parsing information and loading it into HDFS.
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
  • Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Used HBase to store the majority of data, which needs to be divided based on region.
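
A minimal sketch of writing parsed records into HBase from Java, as in the HBase storage work above; the table name, column family, and row key scheme are assumptions for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ParsedRecordWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("parsed_events"))) {
                // Row key prefixed with a region code so the data can be split by region.
                Put put = new Put(Bytes.toBytes("NY|2015-06-01|evt-0001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("weblog"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("200"));
                table.put(put);
            }
        }
    }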

Java Developer

Confidential

Skills Used: Java, J2EE, Spring, Spring Web Service, JSP, JavaScript, Hibernate, SOAP, CSS, Struts, WebSphere, MQ Series, JUnit, Apache, Windows XP and Linux.

Responsibilities:

  • As a programmer, involved in the design and implementation of the MVC pattern.
  • Extensively used XML, wherein process details are stored in the database and the stored XML is reused whenever needed.
  • Part of the core team that developed the process engine.
  • Developed Action classes and validation with the Struts framework (see the Action sketch after this list).
  • Created project-related documentation, such as role-based user guides.
  • Implemented modules such as Client Management and Vendor Management.
  • Implemented Access Control Mechanism to provide various access levels to the user.
  • Designed and developed the application using J2EE, JSP, Struts, Hibernate, Spring technologies.
  • Coded DAO and Hibernate implementation classes for data access.
  • Coded Spring service classes and transfer objects to pass data between layers.
  • Implemented web services using Axis.
  • Used different features of Struts like MVC, Validation framework and tag library.
  • Created detailed design documents, use cases, and class diagrams using UML.
  • Written ANT scripts to build JAR, WAR and EAR files.
  • Developed a standalone Java component that interacts with Crystal Reports on Crystal Enterprise Server to view and schedule reports, as well as to store data as XML and send it to consumers using SOAP.
  • Deployed the application and tested on WebSphere Application Servers.
  • Developed JavaScript for client side validations in JSP.
  • Coordinated with the onsite, offshore and QA team to facilitate the quality delivery from offshore on schedule.
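
Illustrative sketch of a Struts Action class of the kind mentioned above; the forward names, session attribute, and request parameter are assumptions for the example:

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.struts.action.Action;
    import org.apache.struts.action.ActionForm;
    import org.apache.struts.action.ActionForward;
    import org.apache.struts.action.ActionMapping;

    // Struts 1 style Action handling a client-management request.
    public class ClientAction extends Action {
        @Override
        public ActionForward execute(ActionMapping mapping, ActionForm form,
                                     HttpServletRequest request, HttpServletResponse response)
                throws Exception {
            // Simple access check, then forward to a view configured in struts-config.xml.
            Object user = request.getSession().getAttribute("user");
            if (user == null) {
                return mapping.findForward("login");
            }
            request.setAttribute("clientName", request.getParameter("clientName"));
            return mapping.findForward("success");
        }
    }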
