Big Data Lead Resume
Webster, MA
SUMMARY
- 14 years of IT experience in the development of Business Intelligence solutions using Data Warehouse/Data Mart, ETL, OLAP, Hadoop and Client/Server applications.
- Good knowledge of data warehouse concepts such as Star Schema, Snowflake Schema, Dimension and Fact tables, and Slowly Changing Dimensions.
- Experience with Agile development - Organized and facilitated Agile and Scrum meetings, which included Sprint Planning, Daily Scrums or Standups, Sprint Check-In, Sprint Review & Retrospective.
- Extensive experience in designing ETL processes to extract data from Oracle, DB2, SQL Server, flat file & XML sources.
- Expertise in ETL development using Informatica PowerCenter (Designer, Workflow Manager, Workflow Monitor, Repository Manager) for extracting, cleaning, managing, transforming and finally loading data.
- Prepared the Technical Design Document & ETL Specification.
- Experience in debugging and Performance tuning of sources, targets, mappings and workflows.
- Experienced in the Agile/Sprint delivery model.
- Expert in writing PL/SQL stored procedures, functions, packages and triggers to implement the complex business logic.
- Experience in debugging the issues using Informatica Workflow manager, Informatica Workflow monitor, Autosys and database.
- Hands on experience in writing complex SQL for data validation and Reports & Dashboards validation.
- Good experience in Big Data technologies including Hadoop (HDFS & MapReduce), Pig, Hive, Impala, HBase, Spark, Oozie, Kafka, Sqoop and ZooKeeper.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created and maintained Sqoop jobs with incremental loads to populate Hive external tables.
- Working knowledge of the Big Data ecosystem: loaded large sets of data into HDFS and Hive tables using Sqoop and implemented partitioning and bucketing in Hive.
- Wrote Hive UDFs for data processing and was involved in code reviews and bug fixes to improve performance.
- Experience in writing Pig (version 0.11) scripts to transform raw data from several data sources into forming baseline data.
- Developed Hive (version 0.10) scripts for end user/analyst requirements to perform ad hoc analysis.
- Capable of loading large sets of structured, semi-structured and unstructured data from files and databases.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Created UDFs in Java and Python as needed for use in Pig and Hive queries (a minimal sketch follows this list).
- Created Kafka producer and consumer code in Java.
- Created Oozie workflows for scheduling and orchestrating the ETL process.
- Validated and reported performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation.
- Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
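The UDF bullet above is illustrated with a minimal Python script of the kind used through Hive's TRANSFORM clause: it reads tab-separated rows from stdin, normalizes a column, and writes tab-separated rows back to stdout. This is a sketch only; the two-column layout and the phone-number rule are hypothetical.

    #!/usr/bin/env python
    # Minimal TRANSFORM-style streaming script: read tab-separated rows from
    # stdin, keep only the digits of a hypothetical phone_number column, and
    # write tab-separated rows to stdout.
    import re
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        customer_id, phone_number = fields[0], fields[1]
        digits = re.sub(r"\D", "", phone_number)  # strip everything but digits
        print("\t".join([customer_id, digits]))

In HiveQL such a script would typically be wired in with ADD FILE clean_phone.py and then SELECT TRANSFORM(customer_id, phone_number) USING 'python clean_phone.py' AS (customer_id, phone_clean) FROM customers; the table and column names here are illustrative only.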
TECHNICAL SKILLS
Database: Oracle 11g/10g/9i/8i, DB2, SQL Server 2005/2008/2012, Teradata 14, HBase, Cassandra
ETL Tools: Informatica, DataStage, SAS, Talend, SQL*Loader, SQL Server DTS, BCP, Import/Export
Reporting Tools: OBIEE, Cognos, Crystal Reports, Business Objects, MicroStrategy
Big Data Tools: Pig, Hive, Sqoop, Flume, Oozie, HDFS, Impala, Spark
Test Management Tools: Test Director 7.0/8.0, Quality Center 10.0/11.0, ALM, JIRA, Rally, RQM
Operating Systems: Windows, UNIX
Languages: SQL, PL/SQL, Java, Python, Pig Latin, HiveQL, Unix Shell Scripting
Other Tools: Toad, AQT, SQL Developer, SharePoint, MS Office, MS Visio, MS Project
Version Control Tools: MS Visual SourceSafe (VSS), Rational ClearQuest, Git, Subversion (SVN)
PROFESSIONAL EXPERIENCE
Confidential, Webster, MA
Big Data Lead
Responsibilities:
- Understood the business requirements from business users and BRD documents and identified the bottlenecks/challenges in the current system.
- Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into HBase (see the sketch after this section).
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and tuning memory.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Preparing Technical Specification document for the requirements including key project assumptions, technical flow, process flow, ETL logic and error handling process.
- Created an ETL scheduling framework to automate the ETL process based on server availability without manual intervention.
- Wrote PL/SQL stored procedures/functions to implement complex business logic.
- Automated the revenue reports, which helped the revenue team close the quarter much faster.
- Exported platform transactional data to HDFS using Sqoop.
- Created Hive tables and analyzed the loaded data using Hive queries.
- Managed and scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on the MDM process for managing customer data in a centralized database.
- Created an ETL framework to standardize address data cleansing and de-duplicate addresses.
- Created an ETL process to verify customer information against an external vendor application.
- Developed post-session e-mails to notify on the success/failure of a session.
- Worked in an Agile/Sprint model using Jira.
Environment: Informatica, Guidewire, TWS, Greenplum, UNIX, HADOOP, SPARK
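The Spark Streaming bullet in this section describes consuming events from Kafka in near real time and persisting them into HBase. The sketch below shows that pattern with the PySpark DStream API of the Spark 1.x era. It is an illustration only: the broker address, topic name, field names and the persist_to_hbase() helper are hypothetical stand-ins, and on the project the streaming code itself was written in Scala with the HBase write going through the cluster's HBase client.

    # Minimal Spark 1.x Streaming sketch: consume events from Kafka, apply a
    # simple transformation, and hand each batch to a persistence step.
    # Broker list, topic, and persist_to_hbase() are hypothetical placeholders.
    import json

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    def persist_to_hbase(rows):
        # Placeholder: the real job wrote each partition out to HBase.
        for row in rows:
            pass

    sc = SparkContext(appName="learner-data-stream")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    stream = KafkaUtils.createDirectStream(
        ssc, ["learner-events"], {"metadata.broker.list": "broker1:9092"})

    # Each Kafka record arrives as a (key, value) pair; parse the JSON payload.
    events = stream.map(lambda kv: json.loads(kv[1]))
    cleaned = events.filter(lambda e: e.get("learner_id") is not None)

    # Persist each batch, partition by partition.
    cleaned.foreachRDD(lambda rdd: rdd.foreachPartition(persist_to_hbase))

    ssc.start()
    ssc.awaitTermination()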
Confidential
Big Data Lead
Responsibilities:
- Worked closely with the Client team to gather the business requirements and provide data solutions.
- Created technical design documents describing the detailed requirements for the project and how the proposed solution will achieve the goal.
- Demonstrated the model changes and project related specs/docs to the stakeholders including quality adherence team.
- Created, validated and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate Sqoop jobs on weekly and monthly schedules.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark (Python); see the sketch after this section.
- Developed multiple POCs using Scala and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Python.
- Implemented test scripts to support test driven development and continuous integration.
- Consumed the data from the Kafka queue using Spark.
- Load and transform large sets of structured, semi structured and unstructured data.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs).
- Used reporting tools like Tableau, connecting through the Hive ODBC connector, to generate daily data reports.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet the business requirements.
- Responsible for loading data files from external sources such as Oracle and MySQL into staging areas in MySQL databases.
- Actively involved in code review and bug fixing for improving the performance.
- Involved in development, building, testing, and deploy to Hadoop cluster in distributed mode.
- Analyzed data with Hive and Pig.
- Designed Hive tables to load data to and from external tables.
- Wrote DistCp shell scripts to copy data across clusters.
- Identified load bottlenecks and tuned Spark and Hadoop code for better performance.
- Loaded end-to-end Aggregate Spend data in development, QA, UAT environments and validated data at every stage.
- Reviewed team members' mappings, UTC, STC and UTR for compliance with quality standards.
- Worked independently to meet the project goals and deadlines along with resolving issues.
- Handled escalation for support issues and provided the resolution quickly.
- Assisted the QA team in creating, reviewing and modifying test plans and test cases using Mercury Quality Center, and participated in unit and user acceptance testing.
- Guided team members on a day-to-day basis, managed support issues and provided technical guidance.
- Coordinated with various teams (QA/Reporting) to understand their challenges in implementing the project and provided feasible solutions to address their concerns.
Environment: Spark, Hadoop, Guidewire, Oracle, Cognos, Talend, Tableau
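The bullet on converting Hive/SQL queries into Spark transformations is illustrated below with a small PySpark sketch. The table and column names (sales, region, amount) are hypothetical, and the SparkSession API is used here only for brevity; the project work ran against its own schemas with Spark RDDs and PySpark.

    # Minimal sketch of rewriting a Hive aggregation as PySpark DataFrame
    # transformations. The original query would have been something like:
    #   SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark")
             .enableHiveSupport()
             .getOrCreate())

    # The Hive query, still runnable through Spark SQL for comparison.
    sql_result = spark.sql(
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

    # The same logic expressed as DataFrame transformations.
    df_result = (spark.table("sales")
                 .groupBy("region")
                 .agg(F.sum("amount").alias("total")))

    sql_result.show()
    df_result.show()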
Confidential, MN
Hadoop Developer
Responsibilities:
- Understood the existing data model and ETL load process and identified existing system bottlenecks and end-user expectations.
- Worked on the proof of concept for initiating the Apache Hadoop 1.20.2 framework.
- Installed and configured Hadoop clusters and eco-system
- Developed automated scripts to install Hadoop clusters
- Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing and deployment of the Hadoop cluster in fully distributed mode; mapped DB2 V9.7/V10.x data types to Hive data types and validated the mappings.
- Loaded and retrieved unstructured data (CLOB, BLOB, etc.).
- Developed Hive jobs to transfer 8 years of bulk data from DB2 to the HDFS layer.
- Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts (see the sketch after this section).
- Monitored Hadoop cluster job performance and capacity planning.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Experienced in the Hadoop framework, HDFS and MapReduce processing implementation.
- Tuned Hadoop performance for high availability and was involved in the recovery of Hadoop clusters.
- Responsible for coding Java batch programs, RESTful services, MapReduce programs and Hive queries, along with testing, debugging, peer code reviews, troubleshooting and maintaining status reports.
- Used Avro and Parquet file formats for data serialization.
- Operated in 2-3 week sprints, with flexibility on length based on immediate functionality concerns.
- Resolved conflict, improved morale and established clear goals by effectively managing timelines and shared resources with special emphasis on building relationships across departments.
- Proactively identified source data issues and highlighted them to the external team so they could improve their system.
- Highlighted system improvement opportunities to IT managers to automate operational activities and ETL reporting, helping reduce operational effort.
- Worked on multiple project enhancements using Informatica and PL/SQL procedures and incorporated the changes in the AGGS Spend project.
- Allocated tasks to the offshore team and managed offshore resources to complete tasks on time and to the quality standard.
- Handled escalation for support issues and provided the resolution quickly.
- Reviewed team members' mappings, UTC, STC and UTR for compliance with quality standards.
- Guided team members on a daily basis, managed support issues and provided technical guidance.
- Coordinated with various teams (QA/Reporting) to understand their challenges in implementing the project and provided feasible solutions to address their concerns.
Environment: Informatica, DB2, TWS, Teradata, UNIX, Hadoop
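The data integrity/quality bullet above is illustrated with a minimal Hadoop Streaming mapper in Python: it reads delimited rows from stdin and emits one (rule, 1) pair per failed check so a small summing reducer can report totals per rule. The pipe-delimited three-column layout and the rules themselves are hypothetical.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming mapper sketch for record-level quality checks.
    # Emits "rule_name<TAB>1" for every check a row fails.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 3:
            print("malformed_row\t1")
            continue
        record_id, load_date, amount = fields
        if not record_id:
            print("missing_id\t1")
        if not load_date:
            print("missing_date\t1")
        try:
            float(amount)
        except ValueError:
            print("bad_amount\t1")

A companion reducer would simply sum the counts per key, and the pair would be submitted through the standard Hadoop Streaming jar with its -mapper and -reducer options.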
Confidential, Portland, OR
DW BI Lead
Responsibilities:
- Understood the business requirements from business users and BRD documents and identified the bottlenecks/challenges in the current system.
- Establishing data load strategy, utility setup, system requirements, and security considerations to build the application.
- Worked on installing and configuring ZooKeeper to coordinate and monitor cluster resources.
- Implemented test scripts to support test driven development and continuous integration.
- Consumed the data from the Kafka queue using Spark.
- Configured different topologies for the Spark cluster and deployed them on a regular basis.
- Load and transform large sets of structured, semi structured and unstructured data.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs).
- Used reporting tools like Tableau, connecting through the Hive ODBC connector, to generate daily data reports.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Responsible for loading data files from external sources such as Oracle and MySQL into staging areas in MySQL databases.
- Actively involved in code review and bug fixing for improving the performance.
- Involved in development, building, testing, and deploy to Hadoop cluster in distributed mode.
- Wrote shell scripts to upload the credit bureau reporting output file to external vendors' (Experian, Confidential) SFTP servers.
- Automated the revenue reports which helped revenue team close the quarter much faster.
- Exported platform transactional data to HDFS using Sqoop (see the sketch after this section).
- Created Hive tables and analyzed the loaded data using Hive queries.
- Managed and scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on the MDM process for managing customer data in a centralized database.
- Created an ETL framework to standardize address data cleansing and de-duplicate addresses.
- Created an ETL process to verify customer information against an external vendor application.
- Developed post-session e-mails to notify on the success/failure of a session.
- Worked in an Agile/Sprint model using Jira.
- Attended project status meetings and updated management on development progress and challenges.
- Involved in integration testing and user acceptance testing and clarified defects.
- Monitored production loads, analyzing and resolving load issues.
- Resolved deployment issues and coordinated with operations for deploying services in production.
Environment: Informatica, Microstrategy, Autosys
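The Sqoop bullets in this section are illustrated with a small Python wrapper that drives an incremental-append import through subprocess, in the spirit of the scheduled Sqoop loads described above. The JDBC URL, credentials path, table name, check column and mapper count are hypothetical placeholders.

    # Minimal sketch of driving a Sqoop incremental import from Python.
    # All connection details below are hypothetical placeholders.
    import subprocess

    def run_incremental_import(last_value):
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
            "--username", "etl_user",
            "--password-file", "/user/etl/.pwd",
            "--table", "TRANSACTIONS",
            "--target-dir", "/data/raw/transactions",
            "--incremental", "append",
            "--check-column", "TXN_ID",
            "--last-value", str(last_value),
            "--num-mappers", "4",
        ]
        # Raises CalledProcessError if the Sqoop job fails.
        subprocess.check_call(cmd)

    if __name__ == "__main__":
        run_incremental_import(last_value=0)

In practice the last processed value would be read from a control table or a file maintained by the scheduler rather than hard-coded.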
Confidential, Portland, OR
ETL Lead
Responsibilities:
- Worked closely with the Client team to gather the business requirements and provide data solutions.
- Created technical design documents describing the detailed requirements for the project and how the proposed solution will achieve the goal.
- Demonstrated the model changes and project related specs/docs to the stakeholders including quality adherence team.
- Modified existing PL/SQL procedures to validate new rules and enhancements for SAP, CRO and iHCP data, logging records as errors when data issues were found.
- Designed the CMS change detection process to be flexible enough to identify metric changes according to user needs at any point in time.
- Worked on multiple project enhancements using Informatica and PL/SQL procedures and incorporated the changes in the AGGS Spend project.
- Identified load bottlenecks and tuned AGGS mappings for better performance.
- Loaded end-to-end Aggregate Spend data in development, QA, UAT environments and validated data at every stage.
- Reviewed team members' mappings, UTC, STC and UTR for compliance with quality standards.
- Worked independently to meet the project goals and deadlines along with resolving issues.
- Handled escalation for support issues and provided the resolution quickly.
- Assisted the QA team in creating, reviewing and modifying test plans and test cases using Mercury Quality Center, and participated in unit and user acceptance testing.
- Guided team members on a day-to-day basis, managed support issues and provided technical guidance.
- Coordinated with various teams (QA/Reporting) to understand their challenges in implementing the project and provided feasible solutions to address their concerns.
- Used Autosys jobs to run Informatica Workflows.
Environment: Informatica, SAS, Island Pacific (AS/400), Microstrategy, Autosys
Confidential
Dev Lead
Responsibilities:
- Involved in creating and administering the Physical Layer, Business Model & Mapping Layer and Presentation Layer using the Oracle Business Intelligence Administration tool.
- Involved in design and data modeling using the Star schema.
- Developed several mappings to load data from multiple sources to the data warehouse.
- Developed test plans, test strategy and test cases, and decided on automation when required.
- Involved in the entire project life cycle, from analysis, installation, development and testing through production and end-user support.
- Developed test plans, test cases and test scripts for UAT.
- Developed shell scripts to run Ab Initio jobs.
- Executed Informatica workflows from the Informatica Workflow Manager and Autosys.
- Created connection pools, physical tables, defined joins and implemented authorizations in the physical layer of the repository.
- Created dimensional hierarchies, level-based measures and aggregate navigation in the BMM layer.
- Managed security privileges for each subject area and dashboards according to user requirements.
- Created groups in the repository and added users to the groups and granted privileges explicitly and through group inheritance.
- Developed custom reports/Ad-hoc queries using Oracle Answers and assigned them to application specific dashboards.
- Developed different kinds of Reports (pivots, charts, tabular) using global and local Filters.
- Handled Full load and refresh load via staging tables in the ETL Layer.
- Developed and tested stored procedures, functions and packages in PL/SQL for data ETL (see the sketch after this section).
- Managed and conducted System testing, Integration testing, Functional testing, and UAT and Regression testing.
- Loaded data to different databases using SQL scripts to create required test data.
- Executed shell scripts to run PL/SQL programs and Cognos reports from Autosys jobs.
Environment: Oracle, Business Objects, Autosys, Informatica
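The PL/SQL bullet above is illustrated with a minimal cx_Oracle call that invokes a stored procedure from Python, assuming the cx_Oracle driver is available. On the project these procedures were typically run from shell/Autosys jobs; the DSN, credentials and the etl_pkg.load_sales_fact procedure name are hypothetical.

    # Minimal sketch of invoking a PL/SQL ETL procedure from Python.
    # DSN, credentials and procedure name are hypothetical placeholders.
    import cx_Oracle

    conn = cx_Oracle.connect(user="etl_user", password="secret",
                             dsn="dbhost:1521/ORCL")
    try:
        cur = conn.cursor()
        # Pass a batch date parameter to the hypothetical procedure.
        cur.callproc("etl_pkg.load_sales_fact", ["2015-01-31"])
        conn.commit()
    finally:
        conn.close()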
Confidential
Lead - Data Analyst
Responsibilities:
- Developed test plans, test strategy and test cases, and decided on automation when required.
- Involved in the entire project life cycle, from analysis, installation, development and testing through production and end-user support.
- Developed test plans, test cases and test scripts for UAT.
- Used Informatica as an ETL Tool for testing the Data Warehouse.
- Created Test Cases in Quality Center and mapped Test Cases to Requirements in Req Pro.
- Developed the Traceability Matrix and test coverage reports.
- Managed and conducted System testing, Integration testing, Functional testing, and UAT and Regression testing.
- Loaded data to different databases using SQL scripts to create required test data.
- Used Shell scripts extensively for automation of file manipulation and data loading procedures.
- Executed batch jobs and verified status and data in database tables.
- Tracked defects using ClearQuest and Quality Center and generated defect summary reports.
- Executed shell scripts to run PL/SQL programs and jobs.
- Executed Informatica sessions and tasks from Informatica workflow manager and validated results from database and Informatica workflow monitor.
- Prepared status summary reports with details of executed, passed and failed test cases.
- Validated report layouts and data against requirements.
Environment: Informatica, Crystal Reports, Teradata
Confidential, Saint Paul, MN
Senior ETL Developer
Responsibilities:
- Developed test plans, test strategy and test cases, and decided on automation when required.
- Developed Test Cases for Functional, Integration and Regression Testing.
- Collected the test data from the central system to validate the test cases.
- Analyzed use case requirements and developed test cases.
- Performed queries to the database using SQL to check the data integrity using TOAD.
- Participated in Testing Methodologies like Planning, Execution, Bug Tracking and Analyzing.
- Created and executed SQL queries to fetch data from the database to validate and compare expected results with those actually obtained (see the sketch after this section).
- Verified the bugs fixed by developers during each phase of testing such as Black Box Testing.
- Analyzed, documented and maintained Test Results and Test Logs.
- Ran UNIX jobs through PuTTY to validate input and output files.
- Performed manual regression testing for each new build under test.
- Used Mercury's Test Director to log defects and bugs.
- Ran Ab Initio jobs to validate mappings.
- Managed the defect tracking process, which included prioritizing bugs, assigning bugs and verifying bug fixes, using Quality Center.
Environment: Ab Initio, Teradata, DB2, Business Objects
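The expected-vs-actual SQL validation bullet above is illustrated below. The sketch uses the standard-library sqlite3 module with toy in-memory tables so it runs end to end; on the project the same count and control-total queries ran against Teradata/DB2, and the table and column names here are hypothetical.

    # Minimal expected-vs-actual validation sketch: run the same aggregate
    # against a staging and a warehouse table and report any mismatch.
    import sqlite3

    # Toy in-memory tables so the example is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_orders (order_id INTEGER, amount REAL);
        CREATE TABLE dw_orders  (order_id INTEGER, amount REAL);
        INSERT INTO stg_orders VALUES (1, 10.0), (2, 20.5);
        INSERT INTO dw_orders  VALUES (1, 10.0), (2, 20.5);
    """)

    def fetch_summary(table):
        # Row count plus a control total on the amount column.
        return conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM " + table).fetchone()

    source = fetch_summary("stg_orders")
    target = fetch_summary("dw_orders")
    print("PASS" if source == target else "FAIL",
          "source =", source, "target =", target)
    conn.close()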
Confidential
ETL Analyst
Responsibilities:
- Involved in analyzing the business process through Use Cases, Work Flows and Functional specifications.
- Performed database testing using SQL.
- Ran the Informatica jobs.
- Implemented ETL processes using Informatica to load data from flat files into the target Oracle data warehouse database.
- Wrote SQL overrides in Source Qualifier transformations according to business requirements.
- Wrote pre-session and post-session scripts in mappings.
- Created Sessions and Workflow for designed mappings.
- Redesigned some of the existing mappings in the system to meet new functionality.
- Used Workflow Manager for Creating, Validating, Testing and running the sequential and concurrent Batches and Sessions and scheduling them.
- Extensively worked in the performance tuning of the programs, ETL Procedures and processes.
- Developed PL/SQL procedures for processing business logic in the database.
- Involved in the documentation of the complete testing process.
- Interacting with the development and testing teams to improve overall quality of the software.
- Provided the business with thorough analysis of systems and risks.
- Involved in creating periodic status reports.
- Followed Rational Unified Change Management Process.
- Created GUI, Bitmap, Database and Synchronization Checkpoints to compare the behavior of a new version of the application with the previous version
- Developed Test Strategy, prepared Track Sheets to keep track of the tasks assigned to the Jr. Testers, and resolved issues.
- Actively participated in bug meetings to resolve defects in an efficient and timely manner.
Environment: Informatica, Oracle 10g, Cognos
Confidential
ETL Developer
Responsibilities:
- Reviewed the System Requirement Specs (SRS).
- Analyzed the source data coming from Oracle and flat files.
- Used Informatica Designer to create mappings using different transformations to move data to a Data Warehouse. Developed complex mappings in Informatica to load the data from various sources into the Data Warehouse.
- Involved in identifying the bottlenecks in Sources, Targets & Mappings and accordingly optimized them.
- Created and Configured Workflows, Worklets, and Sessions to transport the data to target warehouse Netezza tables using Informatica Workflow Manager.
- Attended various meetings with the developers, clients, and the management team to discuss major defects found during testing, enhancement issues, and future design modifications.
- Developed detailed test conditions and documented test scripts and test procedures.
- Developed various reports and metrics to measure and track development effort.
- Used WinRunner for regression testing and LoadRunner for server performance testing.
- Planned and estimated the testing effort, keeping the plan up to date.
- Coordinated with onsite and offshore teams.
Environment: Informatica, Oracle 9i, Microstrategy
Confidential
Analyst
Responsibilities:
- Designed and implemented stored procedures, views and other application database code objects.
- Maintained SQL scripts, indexes and complex queries for analysis and extraction.
- Performed quality testing and assurance for SQL servers.
- Worked with stakeholders, developers and production teams across units to identify business needs and solution options.
- Ensured best practice application to maintain security and integrity of data.
- Used ClearQuest for defect tracking.
- Involved in discussions with the Business Team and the developers regarding any changes in the requirements.
Environment: Informatica 7.3, Oracle 9i
Confidential
Developer
Responsibilities:
- Reviewed the System Requirement Specs (SRS).
- Created use cases, process flows, data flows, transitions and decision trees by conducting interviews, requirement workshops, brainstorming sessions and questionnaires with the actuarial, stats and finance teams to gather system requirements and the changes that needed to be made.
- Analyzed the business requirements document and was involved in developing the test plan, test objectives, test strategies, test priorities, etc.
- Managed requirements and developed Test Scripts and Test Cases using Test Director.
- Mapped requirements to business scenarios to assure that all requirements were covered.
- Involved in the performance testing and analyzed the response times under various loads.
- Used Performance Monitor and LoadRunner graphs to analyze the results.
- Manually performed Back-End testing by writing SQL queries.
- Worked with users to develop user acceptance plan and test cases.
- Tracked bugs using Test Director and performed regression testing of the entire application once the bugs were fixed.
- Attended various meetings with the developers, clients, and the management team to discuss major defects found during testing, enhancement issues, and future design modifications.
- Developed detailed test conditions and documented test scripts and test procedures.
- Developed various reports and metrics to measure and track testing effort.
- Used WinRunner for regression testing and LoadRunner for server performance testing.
- Planned and estimated the testing effort, keeping the plan up to date.
- Prepared the system test plan & performance test plan.
- Coordinated with the onsite development team.
- Coordinated with the offshore testing team.
Environment: ASP.NET, SQL SERVER 2000, JavaScript