Hadoop Developer Resume
Golden Valley, Minnesota
PROFESSIONAL SUMMARY:
- 8+ years of professional IT experience, including 2+ years in Big Data ecosystem technologies such as Hadoop HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, Flume, Oozie and YARN, and 4+ years in data warehouse implementation.
- Very good knowledge of object-oriented concepts with complete software development life cycle (SDLC) experience: requirements gathering, detailed design, development, system testing and user acceptance testing.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Good knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager and YARN concepts.
- Worked on different operating systems like UNIX/Linux and developed various shell scripts.
- Worked with HiveQL to query data from Hive tables in HDFS.
- Worked with Big Data distributions: Cloudera (CDH3, CDH4, CDH5) and Hortonworks.
- Used Pig Latin scripts and custom UDFs to analyze large data sets.
- Good working knowledge of big data ETL and query tools such as Pig Latin and HiveQL.
- Hands-on NoSQL database experience with HBase.
- Knowledge of programming Spark in Scala and using Spark SQL for faster testing and processing of data, with an understanding of real-time data processing using Spark.
- Good knowledge on working with Flume/Kafka to load the log data from different sources into HDFS.
- Extracted data from MySQL, Oracle and SQL Server using Sqoop and loaded it into HDFS; a representative import command is sketched after this summary.
- Hands-on experience in writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.
- Experience in developing pipelines that ingest data from various sources and process it with Hive and Pig.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Good experience in developing ETL pipelines in a data lake.
- Hands-on experience with PuTTY, WinSCP, etc.
- Knowledge of ETL tools such as Talend and Informatica.
- Good working knowledge of Java object-oriented concepts such as classes, objects, abstraction, encapsulation, polymorphism, inheritance and interfaces.
- Experience in Scrum, Agile and Waterfall models.
- Good working experience with healthcare, insurance, retail and state government clients.
- Strong experience with Informatica Power Center, with strong business understanding and knowledge of extraction, transformation and loading of data from source systems such as flat files, Excel, XML, Oracle and SQL Server.
- Extensively involved in ETL Data warehousing using Informatica Power Center 7.x/8.x/9.x Designer tools like Source Analyzer, Target Designer, Mapping Designer, Mapplet Designer, Transformation Developer, Workflow Manager and Workflow Monitor.
- Experience in creating Reusable Transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Normalizer and Rank) and Mappings using Informatica Designer and processing tasks using Workflow Manager to move data from multiple sources into targets.
- Experience with dimensional modeling using star schema.
- Hands on experience in identifying and resolving performance bottlenecks in various levels like sources, mappings and sessions.
- Good understanding of the principles of data warehousing using fact tables, dimension tables and star schema.
- Involved in unit testing to check whether the data loaded into targets is accurate.
- Good Working Knowledge in writing SQL Joins, Nested Queries, Unions.
- Created slowly changing dimensions (SCD) Type1/2 dimension mappings.
- Good in coding using SQL, SQL*Plus, T-SQL, PL/SQL, Stored Procedures/Functions.
- Good experience in performing and supporting Unit testing, System Integration testing (SIT), UAT and production support for issues raised by application users.
- A team player with good Technical, Communication and Interpersonal skills with fast Learning and creative Analytical abilities.
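Representative Sqoop import sketch supporting the summary above. Table name, connection string, credentials and HDFS paths are illustrative placeholders, not details from any specific project.

```bash
# Minimal Sqoop import sketch: pull a hypothetical CUSTOMERS table from Oracle into HDFS.
# Connection details, credentials and paths are illustrative placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user \
  --password-file /user/etl/.oracle_password \
  --table CUSTOMERS \
  --target-dir /data/raw/customers \
  --fields-terminated-by '\t' \
  --num-mappers 4

# Exporting works the same way in reverse with `sqoop export --export-dir ...`.
```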
TECHNICAL SKILLS:
Programming Languages: PL/SQL, Java (Core), VB.NET, C#.NET
Operating Systems: Windows (NT/2000/XP/7/8), Linux, UNIX
Databases: Oracle 10g/11g, MS SQL Server 2008, MySQL, HBase (NoSQL), T-SQL
Big Data ecosystem: Hadoop - HDFS, MapReduce, Apache Pig, Hive, Hue, HBase, Flume, Oozie, YARN, Kafka, Spark, Scala, Storm.
ETL Tools: Informatica Power Center 9.5/9.1/8.6/8.1, Talend
Scheduling Tools: Autosys, Control-M, Informatica Scheduler, Zena.
IDE Tools: Eclipse
Web Technologies: ASP.NET, HTML, XML
OLAP concepts: Data warehousing
Other Technologies: SQL Developer, TOAD.
PROFESSIONAL EXPERIENCE:
Confidential, Golden Valley, Minnesota
Hadoop Developer
Responsibilities:
- Responsible for understanding business requirements, analyzing functional specifications, and development of end to end data transformation pipelines.
- Involved in the development of the system following the Agile Scrum methodology.
- Imported terabytes of data using Sqoop from Relational Database Systems to HDFS.
- Responsible for writing Hadoop Jobs for analyzing data using Hive and Pig.
- Responsible for ingesting data from the edge node into HDFS (the data lake) using shell scripts; a representative ingestion script is sketched after this list.
- Prepared an ETL framework with the help of Sqoop, Pig and Hive to be able to bring in data from the source and make it available for consumption.
- Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data on HDFS.
- Used Pig to perform data validation on the data ingested with Sqoop and pushed the cleansed data set into Hive tables.
- Responsible for creating Hive tables in the data lake, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to further analyze the data and identify issues.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in daily Scrum meetings and reported on project activity, ensuring an effective solution under the Agile Scrum method.
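A minimal sketch of the kind of edge-node-to-HDFS ingestion script mentioned above. Directory names and the landing-zone layout are illustrative assumptions.

```bash
#!/bin/bash
# Sketch: move files from an edge-node landing directory into the HDFS data lake.
# LANDING_DIR and HDFS_DIR are hypothetical paths used for illustration.
LANDING_DIR=/data/landing/claims
HDFS_DIR=/datalake/raw/claims/$(date +%Y%m%d)

mkdir -p "$LANDING_DIR"/archive
hdfs dfs -mkdir -p "$HDFS_DIR"

for f in "$LANDING_DIR"/*.csv; do
  [ -e "$f" ] || continue                 # skip if no files arrived
  hdfs dfs -put -f "$f" "$HDFS_DIR"/ \
    && mv "$f" "$LANDING_DIR"/archive/    # archive only after a successful put
done
```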
Environment: Hadoop, HDFS, Pig, Sqoop, MapReduce, Oracle 11g, Eclipse, Java, Putty, CSV, Oozie, Unix Shell Scripting, Hortonworks, Agile, Linux Red Hat.
Confidential, Richardson, Texas
Hadoop Developer/Big Data Developer
Responsibilities:
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Worked on Big data distribution Hortonworks.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Responsible for data ingestion from RDBMS into Hadoop using Sqoop; automated the workflow using Zena and performed data cleansing and transformations using Pig (Piggybank) and the Elephant Bird API for further data analytics.
- Developed data pipelines using Sqoop, Pig and Hive to ingest customer member data, clinical, biometrics, lab and claims data into HDFS to perform data analytics.
- Further used Pig for transformations, event joins and pre-aggregations with the Elephant Bird Java API before loading JSON-format files onto HDFS.
- Worked on various file formats like JSON, CSV, HL7 etc.
- Used the JSON SerDes packaged with Hive for serialization and deserialization; a representative table definition is sketched after this list.
- Involved in resolving performance issues in Pig and Hive, with an understanding of MapReduce physical plan execution, and used debugging commands to run code in an optimized way.
- Good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Developed Sqoop scripts to handle change data capture, processing incremental records between newly arrived and existing data in RDBMS tables.
- Developed job flows to automate the workflow using Zena automation tool.
- Participated in weekly conference sessions with business analysts and high level architects to report project updates.
- Involved in daily Scrum meetings and reported on project activity, ensuring an effective solution under the Agile Scrum method.
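A sketch of an external Hive table over JSON data using the SerDe shipped with Hive, as referenced above. Table name, columns, HDFS location and the jar path (which varies by distribution) are illustrative assumptions.

```bash
# Sketch: define a Hive external table over JSON files using the SerDe packaged with Hive
# (org.apache.hive.hcatalog.data.JsonSerDe). Table name, columns and location are illustrative;
# the jar path varies by distribution.
hive -e "
ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS claims_json (
  claim_id   BIGINT,
  member_id  STRING,
  claim_amt  DOUBLE,
  svc_date   STRING
)
PARTITIONED BY (load_dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/datalake/raw/claims_json';
"
```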
Environment: HDFS, Map Reduce, Apache Pig, Sqoop, Hive, Oracle 11g, Eclipse, Linux, Putty, JSON, CSV, HL7, Zena Automation Tool, Unix Shell Scripting, Hortonworks, Agile.
Confidential, Minneapolis, Minnesota
Hadoop Consultant/Big Data Developer
Responsibilities:
- Moving data from Oracle to HDFS and vice-versa using SQOOP.
- Wrote Apache Pig scripts to process the HDFS data.
- Associated with creating Hive Tables, and loading and analyzing data using Hive Queries for reports.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Importing and exporting data into HDFS and HIVE using SQOOP.
- Analyzing/Transforming data with HIVE and PIG.
- Load and transform large sets of structured, semi structured and unstructured data.
- Developed job flows to automate the workflow for pig and hive jobs.
- Collecting and aggregating large amounts of data using Apache Flume and staging data in HDFS for further analysis.
- Designed and Implemented Partitioning (Multi-level), Buckets in HIVE.
- Loaded the aggregated data into Oracle from the Hadoop environment using Sqoop for reporting on the dashboard; a representative export command is sketched after this list.
- Extensively involved in performance tuning of Oracle queries.
- Good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Used agile methodology in developing the application, which included iterative application development, weekly status report and stand up meetings.
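Representative Sqoop export sketch for pushing aggregated Hive output back to Oracle, as mentioned above. Table name, directories and connection details are illustrative placeholders.

```bash
# Sketch: export aggregated results from HDFS back to an Oracle reporting table with Sqoop.
# Table name, directories and connection details are illustrative placeholders.
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username rpt_user \
  --password-file /user/etl/.oracle_password \
  --table AGG_SALES_DAILY \
  --export-dir /datalake/curated/agg_sales_daily \
  --input-fields-terminated-by '\t' \
  --num-mappers 4
```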
Environment: Java (JDK 1.6), HDFS, HBase, Map Reduce, Apache Pig, Sqoop, Hive, Ubuntu/CentOS, Oracle 10g, Eclipse, Linux, Python.
Confidential, Quincy, MA
Hadoop Developer/ Big Data Developer
Responsibilities:
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked closely with business intelligence analyst to develop solutions.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries, which internally run MapReduce jobs.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in loading data from LINUX file system to HDFS.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs; a representative job submission is sketched after this list.
- Performed continual data backup using Falcon for data recovery and burst capacity.
- Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Responsible for managing data from multiple sources.
- Extracted files from MySQL/DB2 through Sqoop, placed them in HDFS and processed them.
- Experienced in managing and reviewing Hadoop log files.
- Load and transform large sets of structured, semi structured and unstructured data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
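A sketch of submitting an Oozie workflow that chains Hive and Pig actions, as referenced above. The workflow.xml is assumed to already exist in HDFS; host names, ports and application paths are illustrative.

```bash
# Sketch: submit a prepared Oozie workflow that chains Hive and Pig actions.
# Hosts, ports and the HDFS application path are illustrative placeholders.
cat > job.properties <<'EOF'
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8032
oozie.wf.application.path=${nameNode}/apps/etl/claims-workflow
oozie.use.system.libpath=true
EOF

oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```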
Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, DB2, Oozie, MySQL, Linux, Ubuntu.
Confidential, St. Paul, MN
Informatica Consultant
Responsibilities:
- Involved in the design and development of Data Warehousing project for the improvement of Account Management System.
- Used Transformations like look up, Router, Filter, Joiner, Stored procedure, Source Qualifier, Aggregator and Update strategy extensively.
- Created Mapplet and used them in different Mappings.
- Performed incremental aggregation to load incremental data into Aggregate tables.
- Performed extensive bulk loading into the target using Oracle SQL*Loader; a representative control file and load command are sketched after this list.
- Created standard and reusable Informatica mappings/mapplets.
- Created and ran sessions using Workflow Manager and monitored them using Workflow Monitor.
- Involved in unit testing and in the resolution of various bottlenecks that came up.
- Defined Target Load Order Plan for loading data correctly into different Target Tables.
- Involved in Optimizing the Performance by eliminating Target, Source, Mapping, and Session bottlenecks.
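A minimal SQL*Loader sketch of the bulk loading described above. Table, file and connection details are illustrative placeholders.

```bash
# Sketch: bulk-load a flat file into an Oracle staging table with SQL*Loader.
# Table, file and connection details are illustrative placeholders.
cat > load_accounts.ctl <<'EOF'
LOAD DATA
INFILE 'accounts.dat'
APPEND
INTO TABLE stg_accounts
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(account_id, account_name, balance)
EOF

sqlldr userid=etl_user/secret@ORCL control=load_accounts.ctl log=load_accounts.log direct=true
```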
Environment: Informatica Power Center 8.6, Flat files, Oracle 10G, TOAD, SQL Server 2008, SQL, PL/SQL, T-SQL, Windows 7
Confidential, Camp Hill, PA
Informatica ETL Developer
Responsibilities:
- Gathering requirements for feeding the Health Exchange DB.
- Developed various transformations like Source Qualifier, Sorter, Joiner, Update Strategy, Lookup, Filter, Expressions and Sequence Generator for loading the data into target table.
- Developed transformation logic and designed various Complex Mappings and Mapplets using the Designer.
- Responsible for identifying the missed records in different stages from source to target and resolving the issues.
- Designed and developed various mappings, sessions and workflows to extract data from flat files, and load to Oracle.
- Worked in the performance tuning for mappings and ETL procedures both at mapping and session level.
- Used Workflow monitor to monitor tasks, workflows and also to monitor performance.
- Worked within a team to populate Type 1 and Type 2 slowly changing dimension customer tables, loading facts and dimensions from source to target data marts; a conceptual SQL sketch of the Type 2 pattern follows this list.
- Used mapplets, parameters and variables to promote modularity and facilitate reusability of code.
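The slowly changing dimension loads above were built as Informatica mappings in the Designer GUI; the sketch below is only a conceptual SQL equivalent of the Type 2 pattern, run through sqlplus, with hypothetical table and column names.

```bash
# Conceptual SQL equivalent of a Type 2 SCD load (the actual work was done in Informatica
# mappings); table, column and sequence names are hypothetical.
sqlplus -s etl_user/secret@ORCL <<'SQL'
-- Step 1: expire the current version of customers whose tracked attributes changed.
UPDATE dim_customer d
   SET d.effective_end_dt = SYSDATE,
       d.current_flag     = 'N'
 WHERE d.current_flag = 'Y'
   AND EXISTS (SELECT 1 FROM stg_customer s
                WHERE s.customer_id = d.customer_id
                  AND (s.address <> d.address OR s.status <> d.status));

-- Step 2: insert a new current version for changed and brand-new customers.
INSERT INTO dim_customer (customer_key, customer_id, address, status,
                          effective_start_dt, effective_end_dt, current_flag)
SELECT dim_customer_seq.NEXTVAL, s.customer_id, s.address, s.status,
       SYSDATE, NULL, 'Y'
  FROM stg_customer s
 WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                    WHERE d.customer_id = s.customer_id
                      AND d.current_flag = 'Y');

COMMIT;
SQL
```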
Environment: Informatica power center 8.6, Oracle 10G, Flat Files, Windows 7.
Confidential, Edmond, OK
Graduate Assistant (Database Developer)
Responsibilities:
- Followed the algorithms given by professors and developed tables and database queries.
- Developed appropriate procedures and functions for the project.
- Developed triggers to ensure that queries work correctly and that proper error messages are generated on errors; a representative trigger is sketched after this list.
- Used Oracle as the backend for connectivity and data storage.
- Wrote SQL and PL/SQL stored procedures in order to create database tables and to store data into tables.
- Used PL/SQL triggers in order to identify any erroneous data entered by the users.
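A minimal sketch of the kind of validation trigger described above, created through sqlplus. Schema, table and column names are hypothetical.

```bash
# Sketch: a PL/SQL validation trigger that rejects erroneous user input.
# Schema, table and column names are hypothetical.
sqlplus -s student/secret@ORCL <<'SQL'
CREATE OR REPLACE TRIGGER trg_check_grade
BEFORE INSERT OR UPDATE ON enrollments
FOR EACH ROW
BEGIN
  -- Reject clearly erroneous grade values with a descriptive error message.
  IF :NEW.grade IS NULL OR :NEW.grade NOT BETWEEN 0 AND 100 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Grade must be a number between 0 and 100');
  END IF;
END;
/
SQL
```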
Environment: ORACLE 9i, PL/SQL, Windows 7
Confidential
Informatica Developer
Responsibilities:
- Designed and developed mappings, mapplets and sessions from source to target database using Informatica Power Center, and tuned mappings for improving performance.
- Involved in debugging invalid mappings using break points, testing of stored procedures and functions, testing of Informatica sessions and the target data.
- Extensively used transformations like router, aggregator, lookup, joiner, expression and sequence generator for extracting data.
- Used workflow manager for session management, database connection management and scheduling of jobs to be run in the batch process.
- Worked with different sources such as Oracle, MS SQL Server and flat files.
Environment: Informatica Power Center 8.1, Oracle 9i, SQL Server, T-SQL, TOAD for Oracle, MS SQL Server Management Studio.