Big Data Engineer Resume
Mountain View, CA
PROFESSIONAL SUMMARY:
- Over 9 years of experience in the IT industry.
- Over 3 years of experience developing Big Data projects using Hadoop, Hive, Pig, HBase, and other open-source tools and technologies.
- Over 2 years of experience in web application development using Java and J2EE technologies.
- Moved data from diverse sources into Hadoop and defined detailed technical processes for data acquisition.
- Over 5 years of extensive ETL experience using DataStage (versions 8.5/8.1/8.0/7.5/7.0), designing and developing jobs with DataStage Designer, DataStage Manager, DataStage Director, and the DS Debugger.
- Solid experience in writing MapReduce jobs in Java and Pig.
- Experience in installing, configuring, and administering Hadoop clusters across major Hadoop distributions.
- Excellent experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Pig, ZooKeeper, HCatalog, Oozie, Mahout, Cassandra, Flume, Chukwa, Avro, Pentaho Kettle, and Hortonworks.
- Hands-on experience productionizing Hadoop applications (e.g., configuration management, administration, monitoring, debugging, and performance tuning).
- Expertise in several J2EE technologies such as JSP, Servlets, JSF, Hibernate, Spring, Web Services, Struts, Ajax, JDBC, and XML.
- Expertise in client-side design and validation using HTML, CSS, JavaScript, and JSP.
- Strong experience implementing the MVC design pattern using the Struts and Spring frameworks.
- Expertise in designing and developing J2EE-compliant systems using IDEs such as Eclipse and WebSphere Studio Application Developer (WSAD).
- Expert in using J2EE-compliant application servers Apache Tomcat and IBM WebSphere.
- Worked on debugging using logging frameworks such as Apache Log4j.
- Implemented unit testing with JUnit during projects.
- Extensive experience designing, developing, and maintaining data warehouse applications for healthcare, telecommunications, banking, and insurance.
- Extensive experience working with parallel jobs, troubleshooting, and performance tuning.
- Thorough knowledge of data warehousing principles, including fact tables, dimension tables, star schema modeling, and snowflake schema modeling.
- Strong skills in DataStage Administrator in UNIX and Linux environments, report creation using OLAP data sources, and knowledge of OLAP universes.
- Solid experience coding in SQL, SQL*Plus, and PL/SQL stored procedures/functions, triggers, and packages.
- Expert in data warehousing techniques for data cleansing, Slowly Changing Dimensions (SCD), surrogate key assignment, and Change Data Capture (CDC).
- Excellent knowledge of databases such as Oracle 8i/9i/10g/11g, MS SQL Server, DB2, and Teradata.
- Developed interfaces using UNIX shell scripts to automate loading, pushing, and pulling data to and from different servers.
- Worked extensively on projects involving multiple operating systems (Windows NT/2000/9x, UNIX, AIX).
- Excellent experience extracting source data from various databases, sequential files, and flat files, then transforming and loading it into the data warehouse.
- Excellent experience upgrading and migrating ETL DataStage environments from DataStage Enterprise Edition 8.1 to IBM DataStage 8.5.
- Technical expertise in all phases of Software Development Life Cycle.
- Excellent experience writing complex queries to supply data to other teams.
- Strong technical and analytical skills, along with problem-solving, communication, and documentation skills.
TECHNICAL SKILLS:
Hadoop: HDFS, Hive, Pig, Sqoop, Flume, Mahout, ZooKeeper, HBase, HCatalog, Hue, Impala, Tez and HAWQ
Amazon Web Services (AWS): EC2, S3, EMR
Tools: IBM InfoSphere DataStage Enterprise Edition 8.1/8.0/7.5/7.x, Toad 9.6/8.6, ERwin 7.0, SQL*Loader, Autosys r11.0, Tivoli 5.1, Business Objects Enterprise XI R3.1, IBM Change Data Capture 6.5, Balance Optimizer.
Languages: Core Java, JSP, Servlets, JDBC, C/C++, Perl, Korn shell, SQL, PL/SQL, UNIX shell scripting and Pig Latin.
Web Technologies: HTML, Ajax and JavaScript
XML Technologies: XML and XSL
Unit Testing: JUnit
Application Servers: Tomcat, WebSphere and WebLogic
IDE: Eclipse and Oracle JDeveloper 10g
Operating Systems: Windows NT/2000/2003/XP, UNIX, IBM AIX
Databases: Oracle 8i/9i/10g/11g, MS SQL Server 7.0/2000/2003, SQL Server 2008, DB2 UDB, Teradata 13.10/12.0.
Other Tools: MS Visio, ERwin, MS Office
PROFESSIONAL EXPERIENCE:
Confidential, Mountain View, CA
Big Data Engineer
Hardware/Software: Cloudera Hadoop, CDH 5.1.5, CentOS 6.4/6.5, Oracle 10g/11g, SQL Server, MySQL, Hive, Pig, MapReduce, HDFS, Tidal, Java (jdk1.6), JIRA, Informatica 9.6.1 HF1, Sqoop, Tableau, Stash, GitHub, Maven and ZooKeeper
Responsibilities:
- Provided architecture, design, development, and testing services.
- Loaded data from Oracle, MS SQL Server, MySQL, and flat-file sources into HDFS, Hive, Netezza, and Vertica.
- Wrote MapReduce jobs in Java (a representative sketch follows this list).
- Used automation scripts to import Informatica mappings and workflows.
- Used Informatica Developer for incremental loads and a Python framework for data loading.
- Worked closely with Operations/IT on requirements and design of Hadoop clusters to handle very large scale.
- Managed and scheduled jobs on the Hadoop cluster using Tidal.
- Troubleshot MapReduce job failures and issues with Hive and Netezza.
- Provided 24x7 production support on the weekly schedule with the Ops team.
- Followed a story-driven Agile development methodology and actively participated in daily scrum meetings.
- Analyzed business requirement documents written in JIRA and participated in peer code reviews.
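Representative MapReduce sketch (illustrative only): a minimal map-only filtering job of the kind described above. The class names, pipe delimiter, and 12-field record width are assumptions for illustration, not project specifics.

```java
// Illustrative sketch only: a simple record-filtering MapReduce job.
// Field count, delimiter, and all names are assumptions.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordFilterJob {

    // Emits only rows that have the expected number of delimited fields.
    public static class FilterMapper
            extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length == 12) {                 // assumed record width
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "record-filter");
        job.setJarByClass(RecordFilterJob.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);                      // map-only filter
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```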
Confidential, Houston, TX
Big Data Lead
Hardware/Software: Cloudera Hadoop, Hortonworks, CentOS 6.4/6.5, Oracle 10g/11g, SQL Server, Hive, Pig, MapReduce, HDFS, HBase, Avro, Oozie, Java (jdk1.6), JIRA, Sqoop, GitHub, Maven and ZooKeeper, AWS, Tableau.
Responsibilities:
- Provided architecture, design, development, and testing services.
- Loaded data from an Oracle database into HDFS, Hive, and HBase.
- Wrote MapReduce jobs.
- Bulk-loaded generated HFiles into HBase for fast access to a large customer base without a performance hit.
- Used Apache Maven for project builds.
- Loaded data from the UNIX file system into HDFS.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Performed unit testing of MapReduce jobs with MRUnit (see the sketch after this list).
- Used the Oozie scheduler to automate pipeline workflows.
- Actively participated in the software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, and test automation.
- Followed a story-driven Agile development methodology and actively participated in daily scrum meetings.
- Analyzed business requirement documents written in JIRA and participated in peer code reviews in Crucible.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed cluster coordination services through ZooKeeper.
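Representative MRUnit sketch (illustrative only): a unit test for a simple filtering mapper. The mapper under test, the record layout, and the expected outputs are assumptions for illustration, not the project's actual jobs.

```java
// Illustrative MRUnit test sketch; the mapper, inputs, and expected output are assumptions.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class FilterMapperTest {

    // A tiny mapper under test: keeps only pipe-delimited rows with 12 fields.
    public static class FilterMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split("\\|").length == 12) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    private MapDriver<LongWritable, Text, NullWritable, Text> driver;

    @Before
    public void setUp() {
        driver = MapDriver.newMapDriver(new FilterMapper());
    }

    @Test
    public void keepsWellFormedRecords() throws Exception {
        Text row = new Text("a|b|c|d|e|f|g|h|i|j|k|l");   // 12 fields: kept
        driver.withInput(new LongWritable(1L), row)
              .withOutput(NullWritable.get(), row)
              .runTest();
    }

    @Test
    public void dropsShortRecords() throws Exception {
        // A short row should produce no output at all.
        driver.withInput(new LongWritable(2L), new Text("a|b|c"))
              .runTest();
    }
}
```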
Confidential
Hadoop Developer
Hardware/Software: Hadoop, MapReduce, HDFS, Java 1.6, Hortonworks Hadoop distribution (Ambari), Hadoop 2.2.0, Hive, Pig, HBase, Flume, Sqoop, Oozie, Microsoft SQL Server, RHEL 6.3/6.4, CentOS 6.3/6.4, Java Web Services, UNIX/LINUX Shell Scripting.
Responsibilities:
- Provided architecture, design, development, and testing services to Confidential, which builds lead-acid automotive batteries and advanced batteries for hybrid and electric vehicles, as well as interior systems for automobiles.
- Worked on data ingestion from Oracle, SQL Server, SAP, and DB2 into Hadoop using Sqoop.
- Worked on data loading into Hive for data-ingestion history and data content summaries.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Loaded data from the UNIX file system into HDFS.
- Installed and configured Hive, and wrote Hive UDFs.
- Developed Hive UDFs for rating aggregation (a representative sketch follows this list).
- Used Oozie for job scheduling.
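Representative Hive UDF sketch (illustrative only): a classic UDF of the kind used for rating work. The class name and the assumed 0-100 rating scale are illustrative, not the project's actual aggregation logic.

```java
// Illustrative Hive UDF sketch: clamps a raw rating onto an assumed 0-100 scale.
// The class name and the 0-100 range are assumptions for illustration.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;

public class NormalizeRatingUDF extends UDF {

    // Hive resolves this evaluate() signature by reflection.
    public IntWritable evaluate(IntWritable rawRating) {
        if (rawRating == null) {
            return null;                       // pass NULLs through unchanged
        }
        int value = rawRating.get();
        if (value < 0)   value = 0;
        if (value > 100) value = 100;
        return new IntWritable(value);
    }
}
```

Such a UDF is packaged into a JAR and registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before use.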
Confidential, Bloomington, IL
Big Data Developer
Hardware/Software: Hadoop, MapReduce, HDFS, Java 6, Hadoop distribution of Cloudera Manager v4.7.1, Hadoop CDH4.4, Hive, Pig, HBase, Flume, RabbitMQ, Oozie, PostgreSQL, RHEL 6.3/6.4, CentOS 6.3/6.4, Java Web Services, UNIX/LINUX Shell Scripting.
Responsibilities:
- Provided architecture, design, development, and testing services to Confidential for sub-system components within the data aggregation infrastructure associated with project Knight Hawk.
- Worked on the Integrated Customer Platform (ICP) project at Confidential, aligned with the Drive Safe and Save program.
- The project, code-named "Knight Hawk," focused on developing a smartphone-based telematics solution for driver behavior monitoring.
- Architected and implemented a Hadoop batch solution for risk rating under the "pay as you drive / pay how you drive" model.
- Developed Java MapReduce jobs for trip calibration, trip summarization, and data filtering.
- Developed Hive UDFs for rating aggregation.
- Imported and exported data into HDFS and Hive using Sqoop.
- Defined job flows.
- Managed and reviewed Hadoop log files.
- Developed an HBase Java client API for CRUD operations (see the sketch after this list).
- Developed the Java client API for node provisioning, load balancing, and artifact deployment.
- Managed data coming from different sources.
- Used Oozie for job scheduling.
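Representative HBase client sketch (illustrative only): a small CRUD wrapper using the HBase 0.94-era HTable API that shipped with CDH4. The table name, column family, and method shapes are assumptions for illustration.

```java
// Illustrative HBase CRUD sketch using the 0.94-era client API (CDH4).
// Table name, column family, and qualifiers are assumptions for illustration.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TripStore {

    private static final byte[] CF = Bytes.toBytes("d");   // assumed column family

    private final HTable table;

    public TripStore(Configuration conf) throws IOException {
        this.table = new HTable(HBaseConfiguration.create(conf), "trips");
    }

    // Create/update: write one column value under the given row key.
    public void put(String rowKey, String qualifier, String value) throws IOException {
        Put p = new Put(Bytes.toBytes(rowKey));
        p.add(CF, Bytes.toBytes(qualifier), Bytes.toBytes(value));
        table.put(p);
    }

    // Read: fetch one column value, or null if the cell is absent.
    public String get(String rowKey, String qualifier) throws IOException {
        Result r = table.get(new Get(Bytes.toBytes(rowKey)));
        byte[] cell = r.getValue(CF, Bytes.toBytes(qualifier));
        return cell == null ? null : Bytes.toString(cell);
    }

    // Delete: remove an entire row.
    public void delete(String rowKey) throws IOException {
        table.delete(new Delete(Bytes.toBytes(rowKey)));
    }

    public void close() throws IOException {
        table.close();
    }
}
```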
Confidential, Palo Alto, CA
Hadoop Admin/Developer
Hardware/Software: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6/1.7), Hadoop distribution of Hortonworks (Ambari) v1.2/1.3/2.0 Alpha, Hadoop v1.0/1.0.4, Cloudera Manager v4.5.1/4.5.2/4.6 (CDH4.2/4.3), MapR M7, Greenplum HD 2.0.2 (Pivotal 1.0), HAWQ 1.0, HiBench v2.1/2.2, Hive v0.9/0.10/0.11, Pig v0.9/0.10/0.11, HBase v0.94, HCatalog v0.4/0.5, RHEL 6.2/6.3, CentOS 6.2/6.3, ActiveSpaces, Amazon Web Services (AWS), UNIX/LINUX Shell Scripting.
Responsibilities:
- Installed and configured Hadoop distributions of various vendors, namely Hortonworks, Greenplum HD (Pivotal), Cloudera, and MapR.
- Installed and configured cluster-monitoring tools such as Ganglia and Nagios.
- Worked on the TIBCO ActiveSpaces product.
- Installed and configured management consoles for different distributions to monitor cluster health and metrics.
- Generated test data using RandomTextWriter and TeraGen.
- Involved in the HCatalog/Hive and ActiveSpaces integration design.
- Imported and exported data into HDFS and Hive using Sqoop.
- Defined job flows.
- Ran the workloads listed in the HiBench paper after creating test data.
- Managed and reviewed Hadoop log files.
- Ran Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Loaded data from the UNIX file system into HDFS.
- Installed and configured Hive, and wrote Hive UDFs.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
Confidential, San Ramon, CA
Data Engineer
Hardware/Software: Hadoop, MapReduce, HDFS, Hive, Java (jdk1.6), Hadoop distributions of Hortonworks, Cloudera, MapR, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Toad 9.6, Windows NT, UNIX/LINUX Shell Scripting, Autosys r11.0.
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Defined job flows.
- Managed and reviewed Hadoop log files.
- Ran Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Managed data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Loaded data from the UNIX file system into HDFS.
- Installed and configured Hive, and wrote Hive UDFs.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Gained strong business knowledge of health insurance, claims processing, fraud suspect identification, the appeals process, etc.
Confidential, La Palma, CA
DataStage Developer /CDC Developer
Hardware/Software: IBM DataStage 8.1 (Designer, Director, Administrator, Parallel Extender), IBM Change Data Capture 6.5, Balance Optimizer tool, Oracle 10g, SQL Server 2008, DB2 UDB, Flat files, Sequential files, Tivoli 5.1, Teradata 13.10, UNIX Korn Shell Script, ERwin 7.0, Windows NT, AIX UNIX, Business Objects Enterprise XI R3.1.
Responsibilities:
- Interacted with the end-user community to understand the business requirements.
- Prepared the required application design documents based on the required functionality.
- Designed the ETL processes using DataStage to load data from Oracle and DB2 UDB to flat files (fixed width), from flat files to a staging Teradata database, and from staging to the target Teradata data warehouse.
- Implemented a dimensional data model (logical and physical) in the existing architecture using ERwin.
- Used DataStage Parallel Extender stages, namely Sequential File, Lookup, Change Capture, Funnel, Transformer, Column Export, and Row Generator, in the ETL coding.
- Developed Teradata SQL queries to load data from Teradata staging to the enterprise data warehouse.
- Worked extensively on error handling and delete handling.
- Designed and developed the jobs for extracting, transforming, integrating, and loading data using DataStage Designer.
- Developed job sequencers with proper job dependencies, job control stages, and triggers.
- Developed DataStage job sequences using the User Variables, Job Activity, Execute Command, Loop Activity, and Terminator activities.
- Used DataStage Director and its run-time engine to monitor running jobs.
- Involved in performance tuning and optimization of DataStage mappings, using features like partitioning and data/index cache to manage very large volumes of data.
- Performed unit and system testing to verify that data extracted from different source systems loaded into the target according to user requirements.
- Extracted data from the data warehouse using Business Objects for reporting purposes.
Confidential, San Ramon, CA
ETL Developer/JAVA J2EE Developer
Hardware/Software: IBM DataStage 8.1 (Designer, Director, Administrator), Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Toad 9.6, Windows NT, UNIX Shell Scripting, Autosys r11.0, Spring 3.0, Struts 1.1, Hibernate 3.0, Design Patterns, Log4j, Maven, Eclipse, Apache Tomcat 6, Java 1.5, J2EE Servlets, JSP.
Responsibilities:
- Participated in discussions with the team leader, group members, and technical manager regarding technical and business requirement issues.
- Used Parallel Extender for partition parallelism, so the same job effectively ran simultaneously on several processing nodes, each handling a separate subset of the total data.
- Developed the web tier using JSP and Struts MVC to show account details and summaries.
- Implemented various design patterns: Singleton, Business Delegate, Value Object, and Spring DAO.
- Used Spring JDBC to write DAO classes that interact with the database to access account information (a representative sketch follows this list).
- Mapped business objects to the database using Hibernate.
- Used the Tomcat web server for development purposes.
- Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
- Used DataStage to extract client data from Oracle and map it into the target business warehouse.
- Used surrogate keys to track Slowly Changing Dimensions (SCD).
- Created test cases for JUnit testing.
- Imported and exported repositories across projects.
- Wrote Spring configuration XML files containing bean declarations and their dependent object declarations.
- Created and maintained the configuration of the Spring application framework (IoC).
- Used Parallel Extender for parallel processing of data extraction and transformation.
- Worked extensively with Parallel Extender stages such as Sequential File, Dataset, Lookup, Peek, Transformer, Merge, Aggregator, Row Generator, and Surrogate Key Generator to design jobs and load data into fact and dimension tables.
- Wrote user-defined SQL queries to extract data from source systems.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
- Developed automated and scheduled load processes using UNIX shell scripting and Autosys.
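Representative Spring JDBC DAO sketch (illustrative only): a DAO of the kind described above. The account table, columns, and query are assumptions for illustration; the DataSource would be wired through the Spring IoC XML configuration mentioned earlier.

```java
// Illustrative Spring JDBC DAO sketch; the Account shape, table, and SQL are assumptions.
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

public class AccountDao {

    private final JdbcTemplate jdbcTemplate;

    public AccountDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Fetch a single account summary by its primary key.
    public Account findById(long accountId) {
        return jdbcTemplate.queryForObject(
                "SELECT account_id, holder_name, balance FROM accounts WHERE account_id = ?",
                new Object[] { accountId },
                new RowMapper<Account>() {
                    public Account mapRow(ResultSet rs, int rowNum) throws SQLException {
                        return new Account(
                                rs.getLong("account_id"),
                                rs.getString("holder_name"),
                                rs.getDouble("balance"));
                    }
                });
    }

    // Simple value object for the mapped row.
    public static class Account {
        public final long id;
        public final String holderName;
        public final double balance;

        public Account(long id, String holderName, double balance) {
            this.id = id;
            this.holderName = holderName;
            this.balance = balance;
        }
    }
}
```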