Sr. Hadoop Engineer / Data Scientist Resume
Columbus, OH
SUMMARY:
- 10 years of total software development experience with the Hadoop ecosystem, Big Data and Data Science analytical platforms, and enterprise-level cloud-based computing and applications.
- More than 6 years of experience working with Agile methodology.
- Around 4 years of experience leading teams, responsible for the design and implementation of Big Data applications using the Hadoop stack: Spark, Hive, Pig, Oozie, Sqoop, Flume, HBase, and NoSQL databases.
- Hands-on experience writing complex Hive queries, building NiFi flows, and data modeling.
- Experience creating batch-style distributed computing applications using Apache Spark and Flume.
- Hands-on experience with Spark SQL and with the Hadoop architecture, its frameworks, and its various components.
- Experience and in-depth understanding of analyzing data using HiveQL and Pig.
- Worked extensively with PySpark, Hive DDLs, and Hive Query Language (HQL).
- Hands-on experience in Scala.
- Good hands-on experience with Pivotal's query processing engine HAWQ.
- In-depth understanding of NoSQL databases such as HBase.
- Proficient knowledge and hands-on experience in writing shell scripts in Linux.
- Adequate knowledge and working experience in Agile and Waterfall methodologies.
- Experience in importing and exporting data using Sqoop between relational database systems and HDFS.
- Fairly good understanding of Kafka.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ESP.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
- Experienced in requirement analysis, application development, application migration and maintenance using Software Development Lifecycle (SDLC).
- Experience in ETL tools like Informatica Power Center (Repository Manager, Mapping Designer, Workflow Manager and Workflow Monitor).
- Hands-on experience with reporting tools such as MicroStrategy and Tableau.
- Hands-on experience working with schedulers like ESP, DAC, AutoSys, Control-M, and SOS-Berlin.
TECHNICAL SKILLS:
Hadoop/BigData: HDFS, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Oozie, Zookeeper
ETL Tools: Informatica 9.6, 9.5.1, 9.1, 8.6
Business Intelligence Tools: R Studio, Tableau 9.1, MicroStrategy, MS Excel - Analytical Solver
Databases & Tools: Teradata 13, Netezza, Oracle 11g
Scheduler: Database Administration Console 10 (DAC), AutoSys
Project Planning & Tracking: HP ALM, JIRA
Content management: Confluence
Release management: TFS, Tortoise SVN
Programming: R, Python, Spark (PySpark), SQL
Databases: Oracle 11g, Netezza
Operating Systems: Windows 2000, NT, XP, UNIX
PROFESSIONAL EXPERIENCE:
Confidential - Columbus, OH
Sr. Hadoop Engineer / Data Scientist
Responsibilities:
- Experienced in development on the Hortonworks distribution.
- Worked in an Agile environment using Scrum.
- As a lead, responsible for understanding requirements, creating user stories in JIRA, and managing backlogs and defects.
- Conducted sprint planning, backlog grooming, story sizing, and retrospectives.
- Responsible for reviewing and approving code reviews, test cases, and test plans for the QA team.
- Managed JIRA reports and dashboards, creating burndown charts, average age charts, user workload reports, etc.
- Designed, developed, and tested Hadoop ETL using Hive on data at different stages of the pipeline.
- Worked closely with data architects, data modelers, the scheduling team, and DBAs to resolve and remove dependencies.
- Sqooped data from different source systems and automated the loads with Oozie workflows.
- Generated business reports from the data lake using Hadoop SQL (Impala) as per business needs.
- Automated business reports on the data lake using Bash scripts in Unix and delivered them to business owners.
- Worked in different environments such as DEV, QA, the data lake, and the analytics cluster as part of Hadoop development.
- Snapshotted the cleansed data to the analytics cluster for business reporting.
- Developed Pig scripts and Python Hadoop streaming jobs and created Hive tables on top of their output (see the streaming mapper sketch after this list).
- Developed multiple POCs using Scala and PySpark, deployed them on the YARN cluster, and compared the performance of Spark with SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a PySpark sketch of this pattern follows this list).
- Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs.
- Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded the data into HDFS.
- Developed Pig, Hive, Sqoop, Hadoop streaming, and Spark actions in Oozie as part of workflow management.
- Supported MapReduce programs running on the cluster.
- Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
- Good understanding of the workflow management process and its implementation.
- Involved in the development of frameworks used in data pipelines and coordinated with the Hortonworks consultant.
- Involved in data science POCs using R, RapidMiner, and KNIME.
- Worked on data cleansing and exploratory data analysis (EDA) for the data science projects.
- Involved in creating models using machine learning techniques such as linear regression, SVM, decision trees, and random forest (see the modeling sketch after this list).
- Performed customer churn analysis, customer segmentation, and market basket analysis.
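The Hadoop streaming work above pairs small Python mapper/reducer scripts with Hive tables created over the job output. A minimal mapper sketch, assuming tab-delimited input records and a hypothetical event-count use case (the field positions are illustrative, not from the original project):

```python
#!/usr/bin/env python
# mapper.py -- minimal Hadoop streaming mapper (hypothetical event-count use case).
# Reads tab-delimited records from stdin and emits "event_type<TAB>1" pairs,
# which a streaming reducer (or an aggregating Hive query over the output) sums later.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        continue  # skip malformed records
    event_type = fields[2]  # hypothetical position of the event-type column
    print("%s\t%d" % (event_type, 1))
```

A Hive external table pointed at the streaming job's output directory then exposes the aggregated results to downstream reports.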
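The Hive-to-Spark conversion mentioned above follows a common pattern: take a HiveQL aggregation and express the same logic as Spark transformations. A minimal PySpark sketch (the production POCs also used Scala and RDDs; the table and column names here are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-poc")
         .enableHiveSupport()   # read tables registered in the existing Hive metastore
         .getOrCreate())

# Original HiveQL (illustrative):
#   SELECT customer_id, SUM(amount) AS total_amount
#   FROM sales.transactions
#   WHERE txn_date >= '2017-01-01'
#   GROUP BY customer_id;

# Same logic as Spark DataFrame transformations:
txns = spark.table("sales.transactions")          # hypothetical Hive table
totals = (txns
          .filter(F.col("txn_date") >= "2017-01-01")
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_amount")))

# Equivalent RDD-style transformations, mirroring the Scala/RDD version:
totals_rdd = (txns.rdd
              .filter(lambda r: r["txn_date"] >= "2017-01-01")
              .map(lambda r: (r["customer_id"], r["amount"]))
              .reduceByKey(lambda a, b: a + b))

# Persist the DataFrame result back to Hive for reporting.
totals.write.mode("overwrite").saveAsTable("analytics.customer_totals")
```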
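For the churn and segmentation modeling above, a typical workflow is to fit and compare several models on the cleansed features. A minimal scikit-learn sketch, assuming a prepared feature extract with a binary churn label (file and column names are hypothetical, and logistic regression stands in as the regression-based classifier for the binary target):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical cleansed feature extract produced during EDA.
df = pd.read_csv("churn_features.csv")
X = df.drop(columns=["customer_id", "churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),
    "decision_tree": DecisionTreeClassifier(max_depth=6),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

# Fit each candidate model and compare AUC on the hold-out set.
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print("%-20s AUC = %.3f" % (name, auc))
```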
Environment: Informatica 9.6, Teradata 13.10, Tableau, MicroStrategy 10.7, CDH 5.4.5, Hive 1.2.1, HBase 1.1.2, Flume 1.5.2, MapReduce, Sqoop 1.4.6, NiFi (standalone cluster), shell script, Oozie 4.2.0, Zookeeper 3.4.6.
Confidential
Sr. Hadoop Developer / Data Scientist
Responsibilities:
- Working as a lead, responsible for understanding requirements, creating user stories in JIRA, and managing backlogs and defects.
- Conducted sprint planning, backlog grooming, story sizing, and retrospectives.
- Managed the JIRA Scrumban board.
- Responsible for reviewing and approving code reviews, test cases, and test plans for the QA team.
- Managed JIRA reports and dashboards, creating burndown charts, average age charts, workload pie chart reports, and time-since-issues reports for Scrumban projects.
- Worked closely with data architects, data modelers, the scheduling team, and DBAs to resolve and remove dependencies.
- Experienced in development on the Hortonworks distribution.
- Working on a project involving migration of data from mainframes to the HDFS data lake and creating reports by performing transformations on the data in the Hadoop data lake.
- Built a Python script to extract data from HAWQ tables and generate a .dat file for the downstream application (a sketch of this extract appears below).
- Built a generic framework in Python to parse fixed-length raw data; it takes a JSON layout describing the fixed positions of the fields and loads the data into HAWQ tables (see the parser sketch below).
- Built a generic framework in Python that transforms two or more data sets in HDFS.
- Built generic Sqoop/HAWQ frameworks in Python to load data from SQL Server to HDFS and from HDFS to HAWQ (a Sqoop-wrapper sketch appears below).
- Performed extensive data validation, using HAWQ partitions for efficient data access.
- Built a generic Python framework that allows us to update data in HAWQ tables.
- Coordinated in all testing phases and worked closely with Performance testing team to create a baseline for the new application.
- Created automated workflows in CA-ESP that schedule daily data-loading and other transformation jobs.
- Developed functions using PL/Python for various use cases.
- Authored technical design documents and production support documents.
- Wrote python scripts to create automated workflows.
Technology Platforms: PHD 2.0, HAWQ 1.2, Sqoop 1.4, Python 2.6, SQL
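The HAWQ-to-.dat extract above can be sketched with a standard PostgreSQL driver, since HAWQ speaks the Postgres wire protocol. A minimal sketch, assuming psycopg2 and hypothetical connection settings, table, and delimiter:

```python
import psycopg2  # HAWQ is PostgreSQL-compatible, so a standard Postgres driver works

# Hypothetical connection settings and query.
conn = psycopg2.connect(host="hawq-master", port=5432,
                        dbname="edw", user="etl_user", password="secret")
cur = conn.cursor()
cur.execute("SELECT account_id, balance, status FROM edw.accounts")

# Write a pipe-delimited .dat extract for the downstream application.
with open("accounts.dat", "w") as out:
    for row in cur:
        out.write("|".join(str(col) for col in row) + "\n")

cur.close()
conn.close()
```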
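The JSON-driven fixed-length parser can be sketched as follows: the layout file maps each field name to a start position and length, and the parser slices each record accordingly (layout entries, field names, and file paths are hypothetical):

```python
import json

def load_layout(layout_path):
    """Load the JSON layout: a list of {"name", "start", "length"} entries (1-based start)."""
    with open(layout_path) as f:
        return json.load(f)

def parse_fixed_width(data_path, layout):
    """Yield one dict per fixed-length record, slicing each line according to the layout."""
    with open(data_path) as f:
        for line in f:
            record = {}
            for field in layout:
                start = field["start"] - 1
                end = start + field["length"]
                record[field["name"]] = line[start:end].strip()
            yield record

if __name__ == "__main__":
    # Hypothetical layout file: [{"name": "acct_id", "start": 1, "length": 10}, ...]
    layout = load_layout("account_layout.json")
    for rec in parse_fixed_width("accounts_raw.dat", layout):
        print(rec)  # in the real framework, parsed rows were batched and loaded into HAWQ
```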
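The Sqoop framework for landing SQL Server data in HDFS amounts to building and invoking sqoop import commands from Python. A minimal wrapper sketch (the connection string, table, and target directory are hypothetical):

```python
import subprocess

def sqoop_import(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Build and run a 'sqoop import' command for one SQL Server table."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
        "--fields-terminated-by", "|",
    ]
    return subprocess.call(cmd)  # non-zero exit code means the import failed

if __name__ == "__main__":
    # Hypothetical connection details, table, and HDFS target directory.
    rc = sqoop_import(
        jdbc_url="jdbc:sqlserver://sqlhost:1433;databaseName=sales",
        username="etl_user",
        password_file="/user/etl/.sqoop_pwd",
        table="orders",
        target_dir="/data/raw/orders",
    )
    if rc != 0:
        raise SystemExit("sqoop import failed with exit code %d" % rc)
```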
Environment: Informatica, Netezza, Hadoop HDP 2.1, Oracle, SQL Server, Zookeeper 3.4.6, Oozie 4.1.0, MapReduce, YARN 2.6.1, HDFS, Sqoop 1.4.6, Hive 1.2.1, Pig 0.15.0.
Confidential
ETL Lead
Responsibilities:
- Analyzed content and quality of databases, recommended data management procedures, and developed extraction/ETL processes.
- Acted as offshore coordinator, led the offshore team by providing mapping documents, and served as point of contact for the onsite team.
- Documented user requirements, translated requirements into system solutions, and developed the implementation plan and schedule.
- Responsible for migrating Informatica code from one environment to another by creating XML export files using Informatica Repository Manager.
- Developed Informatica mappings to load data into dimension and fact tables.
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged the data in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, and Stored Procedure.
- Developed complex mappings in Informatica to load the data from various sources.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created procedures to truncate data in the target before the session run.
- Extensively used Toad utility for executing SQL scripts and worked on SQL for enhancing the performance of the conversion mapping.
- Created the ETL exception reports and validation reports after the data is loaded into the warehouse database.
- Wrote documentation describing program development, logic, coding, testing, changes, and corrections.
- Created test cases for the developed mappings and then created the integration testing document.
- Followed Informatica recommendations, methodologies and best practices.
Environment: Informatica 9.6, Oracle, OGG, Teradata, HP ALM, Teradata GCFR Framework, Teradata BI Temporal Framework, IBM WS MQ, SAP BO XI
Confidential - San Jose, CA
ETL Developer
Responsibilities:
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged the data in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, and Stored Procedure.
- Developed complex mappings in Informatica to load the data from various sources.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created test cases for the developed mappings and then created the integration testing document.
Environment: Traid Importer, X-Book, Query Center, Informatica 9.1, SQL Developer, Oracle 11g, MySQL
Confidential
ETL Developer
Responsibilities:
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged the data in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, and Stored Procedure.
- Developed complex mappings in Informatica to load the data from various sources.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created test cases for the developed mappings and then created the integration testing document.
Environment: Informatica 9.1, SQL Developer, Oracle 11g, SQL Server 2005, Netezza, Sybase, UNIX
Confidential
ETL Developer
Responsibilities:
- Developed Informatica mappings to load data into dimension and fact tables.
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged the data in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, and Sequence Generator.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
Environment: Informatica 9.1, 8.6, SQL Developer, Oracle 11g, Netezza, QC, DAC, UNIX.
Confidential
ETL Developer
Responsibilities:
- Used Informatica to populate data into the staging area, warehouse, and operational data store.
- Created transformations and mappings including Expression, Aggregator, Filter, Router, Joiner, and Lookup.
- Experience with Slowly Changing Dimensions.
- Parameterized the mappings and increased the re-usability.
- Wrote documentation describing program development, logic, coding, testing, changes, and corrections.
- Created test cases for the developed mappings and then created the integration testing document. Followed Informatica recommendations, methodologies, and best practices.
- Extensively used Toad utility for executing SQL scripts and worked on SQL for enhancing the performance of the conversion mapping.
Environment: Informatica 8.6, Toad, Oracle 10g.