Hadoop Developer Resume
Houston, TX
SUMMARY
- 7+ years of professional IT experience, including 3+ years with the Big Data Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience with cloud platforms including Confidential Azure and Confidential Enterprise Cloud, among other cloud technologies.
- Proficient in installation, configuration, data migration, and upgrades across Hadoop MapReduce, Hive, HDFS, HBase, Sqoop, Oozie, Pig, Cloudera, Scala, ZooKeeper, Flume, Hortonworks, and Cassandra.
- Experience in installation, configuration, support, and management of Cloudera's Hadoop platform along with CDH4 and CDH5 clusters.
- Expertise in Hadoop eco-system (YARN, HDFS, HBase, Hive, etc.)
- Experience in developing a data pipeline through Kafka-Spark API.
- Migrated workloads from Hadoop MapReduce to the Spark framework, using in-memory distributed computing for real-time fraud detection.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experienced in deploying Hadoop clusters using the Puppet tool.
- Experience in scheduling Cron jobs on EMR, Kafka, and Spark using Clover Server.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Proficient in configuring Zookeeper, Cassandra & Flume to the existing Hadoop cluster.
- In depth knowledge of JobTracker, Task Tracker, NameNode, DataNodes and MapReduce concepts.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience in big data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Good understanding of HDFS Designs, Daemons, federation and HDFS high availability (HA).
- Experienced in developing MapReduce programs using Apache Hadoop for working with big data (a minimal sketch appears after this list).
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Good experience in implementing and setting up standards and processes for Hadoop based application design and implementation.
- Experience in object-oriented languages such as Java, including Core Java.
- Expert in developing stored procedures, Views, UDFs, Triggers, Performance Monitoring.
- Performed database performance tuning, tracking, and monitoring; proactively resolved potential bottleneck issues.
- Extensive working knowledge of SSIS, SSRS, and SSAS on MS SQL Server.
- Extensive knowledge of tuning SQL queries and improving database performance.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Hands-on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Ability to adapt evolving technology, strong sense of responsibility and accomplishment.
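As a brief illustration of the MapReduce development noted in the summary, the sketch below shows a minimal word-count style job in Java. It is only a generic example of the programming model; the class names and HDFS paths are placeholders rather than code from any project listed here.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal word-count job: the mapper emits (token, 1), the reducer sums counts per token.
public class TokenCountJob {

    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // emit (token, 1) for every token in the line
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();             // add up all counts for this token
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "token count");
        job.setJarByClass(TokenCountJob.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner reuses the reducer to pre-aggregate
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}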
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Oozie, Spark
Languages: Java, SQL, XML, C++, C, WSDL, XHTML, HTML, CSS, JavaScript, AJAX, PL/SQL.
Java Technologies: Java, J2EE, Hibernate, JDBC, Servlets, JSP, JSTL, JavaBeans, jQuery and EJB.
ETL Tools: Informatica, Pentaho
Design and Modeling: UML and Rational Rose.
Web Services: SOAP, WSDL, UDDI, SDLC
Scripting languages: JavaScript, Shell Script
Version Control: CVS, ClearCase, SVN
Databases: Oracle 10g/9i/8i, SQL Server, DB2, MS-Access
Environments: UNIX, Red Hat Linux, Windows 2000/Server 2008/2007, Windows XP.
PROFESSIONAL EXPERIENCE
Confidential, Seattle, WA
Big Data Engineer
Environment: Azure, HDInsight, Confidential Azure, Confidential Enterprise Cloud, YARN, Ambari, Hive, Java, Sqoop, MySQL, ADF, SSIS
Responsibilities:
- Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements.
- Worked across the Hadoop eco-system (YARN, HDFS, HBase, Hive, etc.) on Confidential Azure and Confidential Enterprise Cloud.
- Developed a data pipeline for data processing using Kafka-Spark API.
- Experienced in writing complex SQL queries, stored procedures, triggers, views, cursors, joins, constraints, DDL, DML, and user-defined functions to implement business logic; also created clustered and non-clustered indexes.
- Developed and Optimized Stored Procedures, Views, and User-Defined Functions for the Application.
- Involved in normalizing the database and bringing it to 3NF; fine-tuned stored procedures to improve performance by reviewing query plans with the SQL Tuning Advisor.
- Implemented a Spark Streaming job that processes data from Kafka and performs analytics on top of it (a minimal sketch follows this list).
- Used SSIS to create ETL packages to validate, extract, transform and load data to data warehouse and data marts.
- Created Data-driven Packages in SSIS as part of Referral Process.
- Developed, deployed and monitored SSIS Packages for new ETL Processes.
- Developed various operational drill-through and drill-down reports using SSRS, with parameters to generate a report from two different data sets.
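To illustrate the Kafka-Spark pipeline described above, the sketch below shows a minimal Spark Streaming consumer in Java, assuming the spark-streaming-kafka-0-10 integration. The broker address, topic name, and the simple "suspicious" filter are placeholders standing in for the actual analytics.

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Minimal Kafka -> Spark Streaming pipeline: consume a topic, filter records, print counts per batch.
public class KafkaSparkPipeline {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-spark-pipeline");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "pipeline-group");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("transactions");  // placeholder topic name

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Stand-in "analytics": keep only records flagged as suspicious and count them per batch.
        JavaDStream<String> suspicious = stream
                .map(ConsumerRecord::value)
                .filter(value -> value.contains("\"suspicious\":true"));

        suspicious.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}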
Confidential, Seattle, WA
Big Data Engineer
Environment: Apache Hadoop, HDFS, Perl, Python, Pig, Hive, Java, Sqoop, Cloudera CDH5, Oracle, MySQL, Tableau, Talend, Elasticsearch, ZoomData, Storm, data governance implementation.
Responsibilities:
- Understanding and analyzing business requirements, High Level Design and Detailed Design
- Extensive scripting in Perl and Python.
- Design and Develop Parsers for different file formats (CSV, XML, Binary, ASCII, Text, etc.).
- Extensive usage of Cloudera Hadoop distribution.
- Executing parameterized Pig, Hive, Impala, and UNIX batches in production.
- Big Data management in Hive and Impala (Table, Partitioning, ETL, etc.).
- Design and Develop File Based data collections in Perl.
- Extensive Usage of Hue and other Cloudera tools.
- Used MRUnit (JUnit for MapReduce) for unit testing of MapReduce jobs.
- Extensive usage of the NoSQL database HBase (a minimal client sketch follows this list).
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, Cassandra and Hive).
- Design and Develop Dashboards in ZoomData and Write Complex Queries.
- Worked on shell programming and crontab automation.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Extensively worked in UNIX and Red Hat environments.
- Performed testing and bug fixing.
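The HBase work noted above can be illustrated with a minimal Java client sketch; the table name, column family, and row key are placeholders, and the connection assumes an hbase-site.xml is available on the classpath.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal HBase client: write one cell and read it back. Names are placeholders.
public class HBaseClientSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("events"))) {

            // Put: row key "row-001", column family "d", qualifier "status".
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("OK"));
            table.put(put);

            // Get the same cell back and decode it.
            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"));
            System.out.println("status = " + Bytes.toString(value));
        }
    }
}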
Confidential, Houston, TX
Hadoop Developer
Environment: Apache Hadoop, HDFS, Pig, Hive, Java, Sqoop, Cloudera CDH5, Oracle, MySQL, Tableau, Talend, Elasticsearch, Storm, data governance implementation.
Responsibilities:
- Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Proposed an automated system using shell scripts to run the Sqoop jobs.
- Worked in an Agile development approach using Storm (including bolts), Flume, and Kafka.
- Created the estimates and defined the sprint stages.
- Developed a strategy for Full load and incremental load using Sqoop.
- Mainly worked on Hive queries to categorize data of different claims.
- Integrated the Hive warehouse with HBase.
- Wrote custom Hive UDFs in Java where the required logic was too complex for built-in functions (a minimal sketch follows this list).
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, Cassandra and Hive).
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Presented data and dataflow using Talend for reusability.
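As referenced in the Hive UDF bullet above, a minimal custom UDF in Java might look like the sketch below. It uses the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name and the normalization logic are illustrative only, not the actual claim-processing functions.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF: normalize a claim code by trimming whitespace and upper-casing it.
// Registered in Hive with, e.g.:
//   ADD JAR hdfs:///user/hive/udfs/claim-udfs.jar;          -- placeholder jar path
//   CREATE TEMPORARY FUNCTION normalize_claim AS 'NormalizeClaimCode';
public class NormalizeClaimCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // Hive passes NULLs straight through
        }
        String normalized = input.toString().trim().toUpperCase();
        return new Text(normalized);
    }
}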
Confidential
Systems Engineer (ATG/Java Developer)
Environment: ATG, Java, JSP, Oracle 9i/10g, WebLogic 10.3.5, SOAP, RESTful, SVN, SQL Developer, UNIX, Eclipse, XML, HTML, CSS, JavaScript, AJAX, jQuery, SCA.
Responsibilities:
- Understanding and analyzing business requirements, High Level Design and Detailed Design
- Involved in three releases of versions eShop 2.0.1, eShop 2.1 & eShop 2.2.
- Provided high-level systems design, including class diagrams, sequence diagrams, and activity diagrams.
- Utilized Java/J2EE Design Patterns - MVC at various levels of the application and ATG Frameworks
- Worked extensively on DCS (ATG Commerce Suite) using the commerce API to accomplish the Store Checkout.
- Developed JSPs and Servlets and worked with web services (REST, SOAP); a minimal Servlet sketch follows this list.
- Served as DB Administrator, creating and maintaining all schemas
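As a small illustration of the JSP/Servlet work noted above, the sketch below shows a plain HttpServlet returning a minimal response. The payload is a placeholder and the servlet would be mapped in web.xml (as was typical on WebLogic 10.3.x); it does not reflect the actual eShop checkout flow.

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal servlet (mapped via web.xml): responds to GET with a small JSON payload.
public class StatusServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("application/json");
        response.setCharacterEncoding("UTF-8");
        PrintWriter out = response.getWriter();
        out.print("{\"status\":\"ok\"}");   // placeholder payload
    }
}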
Java Developer
Environment: JAVA, JSP 2.0, JavaScript, CSS, HTML, XML, Weblogic Application Server 8.1, Eclipse, Oracle 9i.
Responsibilities:
- Involved in development, testing and maintenance process of the application
- Used Struts framework to implement the MVC architecture
- Created JSP, Form Beans for effective way of implementing Model View Controller architecture
- Created Session Beans and Entity Beans for transactions with the database using JDBC (see the JDBC sketch after this list)
- Developed necessary SQL queries for database transactions
- Developed and maintained the application configuration information in various properties files
- Designed and developed HTML front screens and validated user input using JavaScript
- Used Cascading Style Sheets (CSS) to give a better view to the web pages
- Used Eclipse for code development along with CVS for managing the code
- Performed testing and bug fixing
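A minimal sketch of the JDBC-backed database access mentioned above; the JDBC URL, credentials, table, and column names are placeholders, and error handling is reduced to the essentials.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal JDBC query: look up orders for a customer with a parameterized statement.
public class OrderDao {
    private static final String JDBC_URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder URL

    public void printOrders(String customerId) throws SQLException {
        String sql = "SELECT order_id, total FROM orders WHERE customer_id = ?";
        try (Connection conn = DriverManager.getConnection(JDBC_URL, "app_user", "app_password");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, customerId);              // bind parameter instead of concatenating SQL
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("order_id") + " -> " + rs.getBigDecimal("total"));
                }
            }
        }
    }
}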