Hadoop Developer Resume
New York, NY
SUMMARY
- Over 7 years of professional experience in IT, with expertise in enterprise application development, including 4+ years in Big Data analytics and the Hadoop ecosystem across a wide range of applications.
- Excellent hands-on experience with Hadoop ecosystem components such as MapReduce, Impala, HDFS, Hive, Pig, HBase, MongoDB, Cassandra, Flume, Storm, Sqoop, Oozie, Kafka, Spark, Scala and ZooKeeper.
- Excellent understanding of and hands-on experience with NoSQL databases such as Cassandra, MongoDB and HBase.
- Experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology, with good knowledge of J2EE and core Java design patterns.
- Very good hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib and Spark Streaming), Kafka, and predictive analytics (MLlib and R ML packages, including 0xdata's H2O library).
- Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie and ZooKeeper, as well as Hadoop architecture and its components.
- Experienced in working with QA on Hadoop projects to develop test plans, test scripts and test environments, and to understand and resolve defects.
- Experienced with cloud deployments: Hadoop on Azure, AWS EMR, Cloudera Manager, and Hadoop directly on EC2 (non-EMR).
- Experienced in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java, and in extending Hive and Pig core functionality with custom UDFs.
- Experienced in database development, ETL and reporting using SQL Server, SQL, SSIS, SSRS, Crystal Reports XI and SAP BO.
- Excellent knowledge of Hadoop architecture and its major components, including MapReduce, the HDFS framework, Hive, Pig, HBase, ZooKeeper, Sqoop, Flume, Apache Tika, Welbeck and Tableau.
- Experienced in J2EE, JDBC, Servlets, Struts, Hibernate, Ajax, JavaScript, jQuery, CSS, XML and HTML.
- Experienced in using IDEs such as Eclipse and Visual Studio, and with DBMSs such as SQL Server and MySQL.
- Excellent experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Strong experience designing and implementing a Cassandra-based database and related web services for storing unstructured data.
- Good knowledge of Unified Modeling Language (UML), Object-Oriented Analysis and Design, and Agile (Scrum) methodologies.
- Experienced in optimizing MapReduce jobs using combiners and partitioners to deliver the best results; a minimal sketch follows this summary.
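The combiner/partitioner point above is illustrated by the following minimal, hypothetical word-count job in Java. The class names, the first-letter partitioning rule and the input/output paths are illustrative assumptions, not code from any of the projects below.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit (word, 1) for each token
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Custom partitioner: route keys by first letter so related keys land on the same reducer.
    public static class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (Character.toLowerCase(key.toString().charAt(0)) & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);          // combiner pre-aggregates map output locally
        job.setPartitionerClass(FirstLetterPartitioner.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```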
PROFESSIONAL EXPERIENCE
HADOOP DEVELOPER
Confidential, New York, NY
Responsibilities:
- Developed architecture documents, process documentation, server diagrams and requisition documents.
- Participated in Big Data requirements review meetings, partnered with business analysts to clarify specific scenarios, and attended daily meetings to discuss development progress, helping keep meetings productive.
- Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality (see the UDF sketch after this list) and wrote Pig scripts for sorting, joining, filtering and grouping data.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Created detailed AWS Security Groups, which behaved as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
- Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, with Pig and Hive jobs.
- Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
- Worked on importing data from multiple data sources, including Google Docs, into AWS S3 and then into the data lake.
- Implemented business logic by writing UDFs in Java, used various UDFs from Piggybank and other sources, and issued SQL queries via Impala to process data stored in HDFS and HBase.
- Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
- Exported the analyzed data to the databases such as Teradata, MySQL and Oracle using Sqoop for visualization and to generate reports for the BI team.
- Developed an ETL workflow that pushes web server logs to an Amazon S3 bucket (see the S3 upload sketch after this list).
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster, and developed simple to complex MapReduce and streaming jobs in Java alongside Hive and Pig.
- Built a scalable, cost effective, and fault tolerant data warehouse system on Amazon Web Services (AWS) Cloud.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Created a Hive aggregator to update the Hive table after running the data profiling job and implemented Partitioning, Dynamic Partitioning and Bucketing in Hive.
- Used Spark on YARN, compared its performance with MapReduce, and used Cassandra to store the analyzed and processed data for scalability.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the HDFS and to run multiple Hive and Pig jobs.
- Developed Oozie workflows scheduled monthly through a scheduler, and managed and reviewed Hadoop log files.
- Prepared the maintenance manual, system description document and other technical and functional documents to help the offshore team.
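As referenced in the UDF bullet above, here is a minimal sketch of a Java Hive UDF. The class name and the trim/upper-case behavior are illustrative assumptions, not the actual UDFs developed on this project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes a string column (trim + upper-case).
// Registered in Hive with:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```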
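The ETL bullet above mentions pushing web server logs to an Amazon S3 bucket; a minimal sketch using the AWS SDK for Java (v1) follows. The bucket name, object key and local log path are placeholders, not project values.

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class LogUploader {
    public static void main(String[] args) {
        // Credentials and region are resolved from the default provider chain
        // (environment variables, instance profile, or ~/.aws/credentials).
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Hypothetical bucket/key; the real workflow would iterate over rotated log files.
        s3.putObject("example-weblog-bucket",
                     "raw/access.log.gz",
                     new File("/var/log/httpd/access.log.gz"));
    }
}
```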
Environment: Big Data, Hadoop, MapReduce, Flume, Impala, HDFS, HBase, Hive, Pig, Sqoop, Oozie, ZooKeeper, Cassandra, Teradata, MySQL, Oracle, Spark, Scala, Java, UNIX Shell Scripting, AWS
HADOOP DEVELOPER
Confidential, New York, NY
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Conducted a POC on Hortonworks and suggested best practices for the HDP/HDFS platform.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed the Ambari server in the cloud.
- Set up security using Kerberos and Active Directory (AD) on Hortonworks and Cloudera CDH clusters.
- Assigned user access and managed logins for multiple users.
- Installed and configured a CDH cluster, using Cloudera Manager to manage the existing Hadoop cluster.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
- Knowledgeable in documenting processes, server diagrams and server requisition documents.
- Set up machines with network controls, static IPs, disabled firewalls and swap memory.
- Managed cluster configuration to meet the needs of analysis, whether I/O-bound or CPU-bound.
- Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
- Performance-tuned and managed growth of the OS, disk usage and network traffic.
- Responsible for building scalable distributed data solutions using Hadoop.
- Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for consumer products.
- Analyzed the latest Big Data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Job management using Fair scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on tuning Hive and Pig to improve performance and resolve performance issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Managed datasets using pandas data frames and MySQL, and queried MySQL from Python using the MySQLdb connector package to retrieve information.
- Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
- Set up the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing (see the Spark SQL sketch after this list).
- Involved in developing Spark Streaming jobs for various data sources using Scala.
- Imported data from different sources such as HDFS and MySQL into Spark RDDs.
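The Spark work above was done in Scala; to keep the sketches in this document in a single language, here is a roughly equivalent minimal example using Spark's Java API to load a MySQL table over JDBC and query it with Spark SQL. The connection details, table and query are placeholders, not project values.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MySqlToSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("mysql-to-spark")
                .getOrCreate();

        // Hypothetical connection details; the real jobs read from the project's MySQL sources.
        Dataset<Row> orders = spark.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://dbhost:3306/sales")
                .option("dbtable", "orders")
                .option("user", "etl_user")
                .option("password", "etl_password")
                .load();

        orders.createOrReplaceTempView("orders");

        // Spark SQL query over the imported data.
        spark.sql("SELECT status, COUNT(*) AS cnt FROM orders GROUP BY status").show();

        spark.stop();
    }
}
```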
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python, Machine Learning, NLP (Natural Language Processing)
HADOOP DEVELOPER
Confidential, Groveport, OHIO
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Involved in the design and development of various modules in the Hadoop Big Data platform, processing data using MapReduce, Hive, Pig, Sqoop and Oozie.
- Developed the technical strategy of using Apache Spark on Apache Mesos as a next generation, Big Data and "Fast Data" (Streaming) platform.
- Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables (the underlying HBase client calls are sketched after this list).
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats.
- Copied data from HDFS to MongoDB using Pig, Hive and MapReduce scripts and visualized the streaming processed data in Tableau dashboards.
- Successfully loaded files into Hive and HDFS from Oracle, Netezza and SQL Server using Sqoop.
- Extracted data from different databases and copied it into HDFS using Sqoop, applying compression techniques to optimize data storage.
- Developed simple to complex MapReduce jobs in Java alongside Hive and Pig.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Developed programs in Spark based on the application for faster data processing than standard MapReduce programs.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms and created Spark Streaming code to take the source files as input.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Built analytics for structured and unstructured data and managed large data ingestion using Avro, Flume, Thrift, Kafka and Sqoop.
- Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm, and ingested streaming data into Hadoop using Spark, the Storm framework and Scala.
- Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to establish a connection from the Twitter source to HDFS (see the producer sketch after this list).
- Exported the patterns analyzed back to Teradata using Sqoop.
- Organized daily scrum calls for status updates with the offshore team using Rally and AgileCraft, and created monthly status reports for the client.
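The HBase bullet above mentions reading and writing HBase from Spark in Scala; stripped of the Spark wrapper and shown in Java to keep the sketches here in one language, the core HBase client calls look roughly like the following. The table name, column family and row key are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadWrite {
    public static void main(String[] args) throws Exception {
        // Connection settings come from hbase-site.xml on the classpath.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("web_events"))) {  // hypothetical table

            // Write one cell: row key -> cf:page = "/home"
            Put put = new Put(Bytes.toBytes("user123|2016-01-01"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("page"), Bytes.toBytes("/home"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("user123|2016-01-01")));
            byte[] page = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("page"));
            System.out.println(Bytes.toString(page));
        }
    }
}
```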
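As referenced in the Kafka bullet above, a minimal Java producer sketch follows. The broker address and topic name are placeholders, and the hard-coded record stands in for the tweet payloads the real pipeline produced.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TweetProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // In the real pipeline each record carried a tweet read from the Twitter source.
            producer.send(new ProducerRecord<>("tweets", "tweet-id-1", "{\"text\":\"hello\"}"));
        }
    }
}
```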
Environment: Hadoop (Cloudera), MapReduce, Cloudera Manager, HDFS, Hive, Pig, Spark, Storm, Flume, Thrift, Kafka, Sqoop, Oozie, Impala, SQL, Scala, Teradata, Java (JDK 1.6), Tableau, Eclipse and Informatica
Java Developer
Confidential, New York, NY
Responsibilities:
- Launched Amazon EC2 Instances using AWS (Linux/ Ubuntu/RHEL) and configured instances with respect to specific applications
- Developed Enterprise JavaBeans (EJB) classes to implement various business functionalities (session beans).
- Developed various end users screens using JSF, Servlet technologies and UI technologies like HTML, CSS and JavaScript.
- Performed necessary validations of each screen developed, using AngularJS and jQuery.
- Configured the Spring configuration file to use the DispatcherServlet provided by the Spring IoC container.
- Separated secondary functionality from primary functionality using Spring AOP.
- Developed stored procedures for regular cleaning of the database, prepared test cases, and provided support to the QA team in UAT.
- Consumed web services for transferring data between different applications using RESTful APIs with Jersey and JAX-RS (see the client sketch after this list).
- Built the application using TDD (Test Driven Development) approach and involved in different phases of testing like Unit Testing.
- Responsible for fixing bugs based on the test results.
- Involved in SQL statements and stored procedures, handled SQL injection, and persisted data using Hibernate Sessions, Transactions and SessionFactory objects.
- Responsible for Hibernate configuration and integrated the Hibernate framework.
- Analyzed and fixed the bugs reported in QTP and effectively delivered the bug fixes reported with a quick turnaround time.
- Extensively used Java Collections like Lists, Sets and Maps.
- Used PVCS for version control and deployed the application on the JBoss server.
- Used Jenkins to deploy the application in testing environment.
- Involved in Unit testing of the application using JUnit and implemented Log4j to maintain system log.
- Used Maven for building, deploying application and creating JPA based entity objects.
- Developed the presentation layer, which was built using Servlets, JSP and MVC.
- Used Spring Repository to load data from MongoDB database to implement DAO layer.
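As referenced in the REST bullet above, a minimal JAX-RS 2.x client sketch follows. The endpoint URL and response handling are placeholders, not the actual services consumed; Jersey is assumed to be the JAX-RS implementation on the classpath.

```java
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

public class OrderServiceClient {
    public static void main(String[] args) {
        Client client = ClientBuilder.newClient();
        try {
            // Hypothetical endpoint exposed by another application.
            Response response = client.target("http://localhost:8080/api/orders/42")
                                      .request(MediaType.APPLICATION_JSON)
                                      .get();
            String json = response.readEntity(String.class);
            System.out.println(response.getStatus() + ": " + json);
        } finally {
            client.close();
        }
    }
}
```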
Environment: Selenium IDE, Groovy, RC Web Driver, Cucumber, HPQC, MyEclipse, JIRA, MySQL, Oracle, Java, JavaScript, .NET, Python, Microservices, RESTful API Testing, JMeter, VBScript, JUnit, TestNG, Firebug, XPath, Windows