Sr. Hadoop Developer Resume
Seattle, WA
SUMMARY:
- Around 7 years of experience in software administration and development, including 4+ years developing large-scale applications using Hadoop and other Big Data tools.
- Experienced with Hadoop ecosystem components and distributions such as Hadoop MapReduce, Cloudera, Hortonworks, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Cassandra.
- Extensive experience in analyzing data using Hadoop ecosystem tools including HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, HBase, Oozie, Solr, and Zookeeper.
- Experience with distributed systems, large-scale non-relational data stores, MapReduce systems, data modeling, and big data systems.
- Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR), running the Hadoop framework on dynamically scalable Amazon EC2 instances.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience with Amazon Web Services, AWS command line interface, and AWS data pipeline.
- Experience in writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server 2014/2012, MySQL, and IBM DB2.
- Hands-on experience in database tuning and query tuning.
- Excellent understanding of the design and implementation of Teradata data warehousing solutions, Teradata Aster big data analytics, and analytic applications.
- Good working experience using Spark SQL to manipulate DataFrames in Python.
- Good knowledge in NoSQL databases including Cassandra and MongoDB.
- Excellent understanding of how Socket Programming enables two or more hosts to communicate with each other.
- Involved in creating custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HQL (HiveQL).
- Designed and developed solutions for real-time data ingestion using Kafka, Storm, Spark Streaming, and various NoSQL databases.
- Experience with MongoDB native drivers, including the Java and Python drivers.
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts, and Hive data modeling.
- Experience in converting MapReduce applications to Spark.
- Good working knowledge of cloud integration with Amazon Web Services components such as EMR, EC2, and S3.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Good knowledge of job scheduling and workflow design tools such as Oozie.
- Experience working with BI teams to transform big data requirements into Hadoop-centric technologies.
- Experience in performance-tuning Hadoop clusters by assessing and analyzing the existing infrastructure.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
- Background with traditional databases such as Oracle, Teradata, Netezza, and SQL Server, as well as ETL tools/processes and data warehousing architectures.
- Experience in handling messaging services using Apache Kafka.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Experienced in developing and implementing web applications using Java, J2EE, JSP, Servlets, JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, jQuery, CSS, XML, JDBC, and JNDI.
- Working experience in Development, Production and QA Environments.
- Possess strong skills in application programming and system programming using C++ and Python on Windows and Linux platforms, applying principles of object-oriented programming (OOP) and design patterns.
- Experience in working with various Cloudera distributions (CDH4/CDH5) and have knowledge on Hortonworks and Amazon EMR Hadoop Distributions.
- Working experience with version control tools such as SVN, CVS, ClearCase, and PVCS.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, Hana, AWS, MapReduce, Pig, Sqoop, Kafka, Storm, Oozie, Zookeeper, YARN, Avro, EMR, Spark
Scripting Languages: Shell, Python, Perl, Scala
Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP QTP, HP UFT, Selenium, TestNG, JUnit
Programming Languages: Java, C++, C, SQL, PL/SQL, Pig Latin, HQL, CQL
QA Methodologies: Waterfall, Agile, V-model
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
Java Frameworks: MVC, jQuery, Apache Struts 2.0, Spring, and Hibernate
Defect Management: Jira, Quality Center.
Domain Knowledge: GSM, WAP, GPRS, CDMA and UMTS (3G)
Web Services: SOAP (JAX-WS), WSDL, SOA, Restful (JAX-RS), JMS
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Version Controls: GIT, SVN, CVS
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra (DataStax Enterprise 4.6.1)
RDBMS: Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, and PL/SQL
Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7
PROFESSIONAL EXPERIENCE:
Confidential, Seattle, WA
Sr. Hadoop Developer
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, and support of Hadoop.
- Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Added/installed new components and removed them through Cloudera Manager.
- Played a key role in installation and configuration of the various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Wrote complex Hive queries and UDFs in Java and Python (see the Java UDF sketch after this list).
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Involved in implementing HDInsight version 3.3 clusters based on Spark version 1.5.1.
- Good knowledge of the Spark components used in the cluster (Spark Core, Spark SQL, and the Spark Streaming APIs).
- Responsible for data extraction and data ingestion from different data sources into the Hadoop Data Lake by creating ETL pipelines using Pig and Hive.
- Installed Apache NiFi and MiNiFi to make data ingestion fast, easy, and secure from the Internet of Anything with Hortonworks DataFlow; configured and managed permissions for users in Hue.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters; experienced in converting MapReduce applications to Spark.
- Developed and maintained continuous integration and deployment systems using Jenkins, Ant, Akka, and Maven.
- Effectively used Git (version control) to collaborate with team members.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive (HQL), and Pig jobs.
- Loaded huge amounts of data into HDFS using Apache Kafka.
- Collected the log data from web servers and integrated it into HDFS using Flume.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; experienced in managing and reviewing Hadoop log files.
- Constructed system components and developed the server side using Java, EJB, and the Spring Framework; involved in designing the data model for the system.
- Used J2EE design patterns like DAO, Model, Service Locator, MVC, and Business Delegate.
- Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, data integration, and migration.
- Converted all the vap processing from Netezza and re-implemented it using Spark DataFrames and RDDs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Implemented proofs of concept (PoCs) using Kafka, Storm, and HBase for processing streaming data.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Performed component unit testing using the Azure Emulator and analyzed escalated incidents within the Azure SQL database.
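Below is a minimal sketch of the kind of Hive UDF in Java referenced above. The package, class, and column handling are illustrative rather than taken from the project, and it assumes the classic Hive UDF API (org.apache.hadoop.hive.ql.exec.UDF).

```java
// Illustrative Hive UDF (classic org.apache.hadoop.hive.ql.exec.UDF API):
// trims and upper-cases a string column. Package and class names are hypothetical.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class NormalizeString extends UDF {

    // Hive resolves evaluate() by reflection based on the argument types.
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

The packaged jar would then be registered in Hive with ADD JAR and exposed with CREATE TEMPORARY FUNCTION before being called in HiveQL.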
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, Cloudera, AWS, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, Netezza, EMR, Apache NiFi, Flume, Scala, Oracle 11g, Cassandra, SQL, Python, SharePoint, Azure 2015, Git, UNIX Shell Scripting, Linux, Jenkins, and Maven.
Confidential, Port Washington, NY
Hadoop Developer
Responsibilities:
- Worked as a Hadoop developer and admin on the Hortonworks (HDP 2242) distribution for 10 clusters ranging from POC to PROD.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Designed, configured, implemented, and monitored Kafka clusters and connectors.
- Hands-on experience writing MR jobs to cleanse the data and copy it from our cluster to the AWS cluster.
- Experienced in adding/installing new components and removing them through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures
- Worked with HQL and the Criteria API for retrieving data elements from the database.
- Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans.
- Used Sqoop to import data into HDFS from Oracle, MySQL, Netezza, and Access databases and vice versa.
- Changed configurations based on user requirements for better job performance.
- Experienced in configuring Ambari alerts for various components and managing the alerts.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Good troubleshooting skills with Hue, which provides a GUI for developers/business users for day-to-day activities.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis (see the cleansing-mapper sketch after this list).
- Implemented a 100-node CDH4 Hadoop cluster on Red Hat Linux using Cloudera Manager.
- Implemented complex MapReduce programs to perform joins on the map side using the distributed cache.
- Set up Flume for different sources to bring log messages from outside into Hadoop HDFS.
- Implemented NameNode HA in all environments to provide high availability of clusters.
- Working experience in maintaining MySQL databases: creating databases, setting up users, and maintaining backups of cluster metadata databases with cron jobs.
- Set up MySQL master and slave replication and helped business applications maintain their data in MySQL servers.
- Helped the users with production deployments throughout the process.
- Wrote a Storm topology to accept events from a Kafka producer and emit them into a Cassandra DB.
- Experienced in production support, which involves resolving user incidents ranging from sev1 to sev5.
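A minimal sketch of the kind of cleansing MapReduce job described above. The delimiter, field count, class names, and paths are hypothetical; it only illustrates the pattern of a map-only filter preparing records for a Hive schema.

```java
// Illustrative map-only cleansing job: well-formed pipe-delimited records are kept,
// malformed ones are counted and dropped. Field count and class names are hypothetical.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanseJob {

    public static class CleanseMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        private static final int EXPECTED_FIELDS = 12; // hypothetical record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split("\\|").length == EXPECTED_FIELDS) {
                // Emit the record unchanged; it is now safe to load into the Hive table.
                context.write(value, NullWritable.get());
            } else {
                context.getCounter("cleanse", "malformed").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log-cleanse");
        job.setJarByClass(CleanseJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only: no aggregation needed for cleansing
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```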
Environment: Hadoop, MapReduce, Cloudera, AWS, HDFS, Pig, Hive, YARN, HBase, Kafka, Sqoop, Flume, Zookeeper, EMR, Netezza, Hortonworks, Scala, Eclipse, MySQL, Python, UNIX, Shell Scripting
Confidential, San Mateo, CA
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and the loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioural data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Created a customized BI tool for the manager team that performs query analytics using HiveQL.
- Created partitions and buckets based on State for further processing using bucket-based Hive joins.
- Estimated the hardware requirements for the NameNode and DataNodes and planned the cluster.
- Developed framework to import the data from database to HDFS using Sqoop. Developed HQLs to extract data from Hive tables for reporting.
- Hands-on experience writing MR jobs to cleanse the data and copy it from our cluster to the AWS cluster.
- Used an open-source web scraping framework for Python to crawl and extract data from web pages.
- Moved relational database data using Sqoop into Hive dynamic partition tables using staging tables.
- Optimized Hive queries using partitioning and bucketing techniques to control the data distribution.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data (see the HBase client sketch after this list).
- Worked on custom Pig Loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Involved in Cassandra Data Modelling and Analysis and CQL (Cassandra Query Language).
- Experience in upgrading Apache Ambari, CDH, and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Experienced with different kinds of compression techniques such as LZO, Gzip, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Experience in upgrading the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Involved in Agile SDLC during the development of the project.
- Created a complete processing engine based on Cloudera's distribution, enhanced for performance.
- Experienced in monitoring the cluster using Cloudera Manager.
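A minimal sketch of the kind of HBase table write mentioned above, assuming the HBase 1.x Java client API; the table name, column family, qualifiers, and row key are purely illustrative.

```java
// Illustrative HBase write using the HBase 1.x Java client API.
// Table name, column family, qualifiers, and row key are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteSketch {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("web_logs"))) {
            Put put = new Put(Bytes.toBytes("2016-01-01#0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/index.html"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("200"));
            table.put(put);
        }
    }
}
```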
Environment: Hadoop, HDFS, HBase, MapReduce, Java, AWS, JDK 1.5, J2EE 1.4, Struts 1.3, Hive, Pig, Sqoop, Flume, Kafka, Oozie, C++, Hue, Storm, Zookeeper, Avro files, Netezza, SQL, ETL, Python, Cassandra, Cloudera Manager, MySQL, MongoDB.
Confidential, Moberly, MO
Hadoop Developer
Responsibilities:
- Worked as a Hadoop developer responsible for everything related to clusters totaling 60 nodes, ranging from POC to PROD clusters.
- Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari and manually from the command line.
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Experience in Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Responsible for the installation of various Hadoop ecosystem components and Hadoop daemons.
- Working experience in maintaining MySQL databases: creating databases, setting up users, and maintaining database backups.
- Implemented the Kerberos security authentication protocol for the existing cluster.
- Involved in transforming data from Mainframe tables to HDFS and HBase tables using Sqoop and Pentaho Kettle; also worked on Impala to analyze stored data.
- Have a deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment; supported and managed Hadoop clusters using Apache, Hortonworks, Cloudera, and MapReduce.
- Involved in loading data from the UNIX file system to HDFS; created custom Solr query components to enable optimum search matching.
- Involved in writing MapReduce programs and testing them using MRUnit (see the test sketch after this list).
- Installed and configured a local Hadoop cluster with 3 nodes and set up a 4-node cluster on the EC2 cloud.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Installation and configuration of the VMware vSphere client, virtual server creation, and resource allocation.
- Monitored the Hadoop cluster through Cloudera Manager, implemented alerts based on error messages, and provided reports to management on cluster usage metrics.
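A minimal sketch of an MRUnit test of the kind referenced above. The mapper under test and its record layout are hypothetical; the point is only the MapDriver with-input/with-output pattern.

```java
// Illustrative MRUnit test: verifies a simple cleansing mapper keeps well-formed
// records and drops malformed ones. Mapper and record layout are hypothetical.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class LogCleanseMapperTest {

    /** Hypothetical mapper under test: passes through records with exactly 5 fields. */
    public static class LogCleanseMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split("\\|").length == 5) {
                context.write(value, NullWritable.get());
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, NullWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new LogCleanseMapper());
    }

    @Test
    public void wellFormedRecordIsKept() throws IOException {
        Text record = new Text("2016-01-01|GET|/index.html|200|1532");
        mapDriver.withInput(new LongWritable(0), record)
                 .withOutput(record, NullWritable.get())
                 .runTest();
    }

    @Test
    public void malformedRecordIsDropped() throws IOException {
        mapDriver.withInput(new LongWritable(1), new Text("garbage-line"))
                 .runTest(); // no expected output declared, so the mapper must emit nothing
    }
}
```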
Environment: HDFS, MapReduce, HBase, Kafka, YARN, MongoDB, Hive, Impala, Oozie, Pig, Sqoop, Shell Scripting, MySQL, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager
Confidential
Java Developer
Responsibilities:
- Designed a system and developed a framework using J2EE technologies based on MVC architecture.
- Involved in the iterative/incremental development of project application. Participated in the requirement analysis and design meetings.
- Designed and developed UIs using JSP, following MVC architecture.
- Designed and developed Presentation Tier using Struts framework, JSP, Servlets, TagLibs, HTML and JavaScript.
- Designed the controls, including class diagrams and sequence diagrams, using Visio.
- Used the Struts framework in the application: programmed the views using JSP pages with the Struts tag library; the model is a combination of EJBs and Java classes, and the web controllers are Servlets.
- Generated XML pages with templates using XSL. Used JSP, Servlets, and EJBs on the server side.
- Developed a complete external build process and maintained it using Ant.
- Implemented the Home Interface, Remote Interface, and Bean Implementation class.
- Implemented business logic at server side using Session Bean.
- Extensive usage of XML for application configuration, navigation, and task-based configuration.
- Designed and developed unit and integration test cases using JUnit.
- Used EJB features effectively: local interfaces to improve performance, abstract persistence schema, and CMRs.
- Used Struts web application framework implementation to build the presentation tier.
- Wrote PL/SQL queries to access data from the Oracle database.
- Set up WebSphere Application Server and used the Ant tool to build the application and deploy it in WebSphere.
- Prepared test plans and wrote test cases.
- Implemented JMS for making asynchronous requests (see the sketch after this list).
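A minimal sketch of the asynchronous request pattern referenced above, using the JMS 1.1 API; the JNDI names and queue are hypothetical.

```java
// Illustrative JMS sketch: the caller drops a request message on a queue and
// continues without waiting for a reply. JNDI names are hypothetical.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class AsyncRequestSender {
    public void send(String payload) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/RequestQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message); // fire-and-forget; a separate listener consumes it
        } finally {
            connection.close();
        }
    }
}
```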
Environment: Java, J2EE, Struts, Hibernate, JSP, Servlets, HTML, CSS, UML, jQuery, Log4j, XML Schema, JUnit, Tomcat, JavaScript, Oracle 9i, UNIX, Eclipse IDE.
Confidential
Java Developer
Responsibilities:
- Understood and analyzed the requirements.
- Implemented server side programs by using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using Struts Framework.
- Handled the database access by implementing Controller Servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements called from Servlets for database access (see the sketch after this list).
- Designed and documented the stored procedures.
- Widely used HTML for web-based design.
- Involved in Unit testing for various components.
- Worked on the database interaction layer for insert, update, and retrieval operations on data from the Oracle database by writing stored procedures.
- Involved in developing a simulator used by controllers to simulate real-time scenarios using C/C++ programming.
- Used Spring Framework for Dependency Injection and integrated with Hibernate.
- Involved in writing JUnit Test Cases.
- Used Log4j to log any errors in the application.
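A minimal sketch of the JDBC prepared-statement access mentioned above. The DataSource JNDI name, table, and columns are hypothetical; the point is parameter binding rather than string concatenation.

```java
// Illustrative JDBC sketch: a DAO method using a PreparedStatement against a
// container-managed DataSource. JNDI name, table, and columns are hypothetical.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class AccountDao {
    public String findAccountName(int accountId) throws Exception {
        DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppDS");
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT account_name FROM accounts WHERE account_id = ?")) {
            ps.setInt(1, accountId); // bind the parameter instead of concatenating SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("account_name") : null;
            }
        }
    }
}
```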
Environment: Java, J2EE, JSP, Servlets, HTML, DHTML, XML, JavaScript, Struts, C/C++, Eclipse, WebLogic, PL/SQL, and Oracle.