Data Management Run Lead Resume
PROFESSIONAL SUMMARY:
- Knowledge of Hadoop Ecosystem components - HDFS, MapReduce, Hive, Sqoop and HBase - for Data Analytics and for migrating data from different databases to Hadoop HDFS and Hive using Sqoop.
- Business Intelligence professional with about 10 years of experience in Technical and Managerial activities involving different business domains: Banking - Credit Cards, Health Care - Medicare and Medicaid.
- Experience handling multiple teams in Onshore - Offshore model involving 40+ resources simultaneously working on different domains and technologies.
- About 9 years of experience in the Data Warehouse space involving Data Analytics, Data Modeling, Data Mapping, Data Extraction, Data Cleaning, Data Validation, Data Manipulation, Compliance Reporting, Data Quality and ETL Process.
- Extensive knowledge of full Software Development Life Cycle (SDLC) in Waterfall and Agile methodologies.
- Strong Data Warehouse, RDBMS, and SQL background to perform complex operations in various databases like DB2, Oracle, Teradata, MySQL, Vertica, PostgreSQL, etc.
- Experience in optimizing Hive SQL queries and Spark jobs.
- 6 years of experience in ETL process using IBM Datastage and SQL to extract, clean, transform, validate, and load data into various databases.
- In-depth understanding of Star Schema and Snowflake Schema, Normalization Techniques, Slowly Changing Dimensions, Change Data Capture, and Fact and Dimension Tables.
- 6 years extensive hands-on experience in SAS programming to Extract, Transform, Analyze and Report data to various bureaus and agencies.
- Extensive experience in using various SAS procedures such as PROC COMPARE, PROC MEANS, PROC SUMMARY, PROC SQL, PROC TRANSPOSE, PROC IMPORT / EXPORT, PROC REPORT, DATA STEP in SAS/Base and SAS/Macros.
- Developed complex ETL jobs from various sources such as SQL Server, PostgreSQL and file sources and loaded them into target databases using Pentaho, Informatica and Talend Open Studio ETL tools. Created Big Data Hadoop/Talend dashboards.
- Implemented various frameworks like Data Quality Analysis, Data Governance, Data Trending, Data Validation and Data Profiling using technologies like Big Data, DataStage, Spark, Python and Mainframe with databases like Netezza, DB2, Hive and Snowflake.
- Experience in Python scripting to perform data load operations.
- Hands-on experience with Amazon Web Services (AWS), including EC2, VPC, S3, EBS, RDS, IAM, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda and EMR.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts (a short PySpark sketch follows this summary).
- Proficient in different performance tuning, optimization, and Parallelism Techniques in ETL Tools and DB2 SQL Programming.
- Good knowledge of various Scheduling and Code Versioning Tools like Autosys, TFS, SVN and SCCS.
- Strong analytical, logical and programming skills with ability to quickly understand client’s business needs.
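
Illustrative of the Hive-to-PySpark conversion work noted above, a minimal sketch follows; the table and column names (customer_txn, region, txn_amt, txn_dt) are hypothetical placeholders, not project artifacts.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-pyspark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# PySpark equivalent of a simple Hive aggregation such as:
#   SELECT region, SUM(txn_amt) AS total_amt
#   FROM customer_txn
#   WHERE txn_dt >= '2020-01-01'
#   GROUP BY region
txn = spark.table("customer_txn")
summary = (txn
           .filter(F.col("txn_dt") >= "2020-01-01")
           .groupBy("region")
           .agg(F.sum("txn_amt").alias("total_amt")))
summary.show()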
TECHNICAL SKILLS:
ETL Tools: IBM DataStage; Informatica PowerCenter; Ab Initio
Hadoop / Big Data: Hadoop 2.x, HDFS, HBase, Hive 1.2.4, Sqoop, Spark
Cloud Technologies: Amazon Web Services (AWS)
Scripting Languages: Unix Shell Scripting; SAS Scripting; Python; VB Scripting
Databases: IBM DB2; HP Vertica; Teradata; Oracle; MySQL; PostgreSQL; HBase
Development Methodologies: Waterfall Model; Agile Methodologies (Scrum)
Operating Systems: Windows; Linux; Unix
BI Reporting Tools: Cognos Report Studio; Tableau
Change Management Tools: ITSM Remedy; Maximo; Service Now
Scheduling Tools: CA Autosys; IBM Tivoli Workload Scheduler (TWS)
Metadata Tools: IBM Metadata Workbench and Business Glossary
Domain Knowledge: Banking - Credit Services; Health Care - Medicare and Medicaid
PROFESSIONAL EXPERIENCE:
Data Management Run Lead
Confidential
Responsibilities:
- Lead 40+ offshore resources working on different data warehouses and technologies to design, develop, deploy, and support various ETL processes, extract generation and SAS report creation.
- Develop Ab Initio graphs based on design documents to extract, transform and load data into different RDBMS tables, along with generation of extracts for downstream processing.
- Create complex SQL queries to read and analyze data per ad-hoc requests from Business users and downstream teams.
- Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
- Develop a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
- Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift and S3.
- Design and develop ETL integration patterns using Python on Spark.
- Create PySpark DataFrames to bring data from DB2 to Amazon S3 (see the sketch after this list).
- Provide guidance to the development team working on PySpark as an ETL platform.
- Review and approve changes related to different data warehouses to ensure all functionalities are met and coding standards are followed.
- Coordinate with multiple teams to gather requirements, propose design solutions, create implementation and validation plans for process deployment.
- Provide admin / code related support to SAS users along with report generations using SAS/Base, SAS/Macros and other SAS procedures.
- Automate the source file receipt process and partial table load clean-up activities caused by environment failures.
- Analyze issues to identify root cause and propose preventive / corrective actions to Business Teams.
- Create and review value-added services that address long-standing issues as part of continuous process improvement.
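
A hedged sketch of the DB2-to-S3 pattern referenced in the responsibilities above; the JDBC URL, schema/table, credentials and S3 bucket are placeholders, and the job assumes the DB2 JDBC driver and S3A/hadoop-aws configuration are available on the cluster.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-to-s3-sketch").getOrCreate()

# Read a DB2 table over JDBC (URL, table and credentials are placeholders;
# the DB2 JDBC driver JAR must be on the Spark classpath).
db2_df = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "SCHEMA.ACCOUNTS")
          .option("user", "db2_user")
          .option("password", "********")
          .load())

# Land the extract on S3 as Parquet (assumes S3A credentials are configured;
# bucket and prefix are placeholders).
(db2_df.write
 .mode("overwrite")
 .parquet("s3a://example-bucket/landing/accounts/"))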
Environment: Ab Initio GDE, SAS/Base, SAS/Macros, SAS Procedures, Oracle, Spark, PySpark, Shell Scripting, Waterfall Methodology, Linux, Autosys, Service Now, Private Label Credit Cards
Delivery Tech Lead
Confidential
Responsibilities:
- Lead a team of 6 to identify data sources, perform data mapping, gather requirements and create reports for Business / State Agencies.
- Create and automate reports on Medicare and Medicaid Claims using SAS that run daily, weekly, monthly and quarterly.
- Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker workflows.
- Create Technical Design Document from Functional Specification Document as per Client standards.
- Use Python scripting and Informatica Power Center to clean and load files into Greenplum.
- Created an automation process for the Distribution group that receives inventory and sales data and sends activation reports, using Talend and BigQuery.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop.
- Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (see the sketch after this list).
- Worked on analyzing, cleaning and loading of raw data using Hive queries and Informatica.
- Review and optimize code for performance as per latest coding standards.
- Worked on reading and writing multiple data formats like JSON, ORC, Parquet on HDFS using PySpark.
- Worked with Pentaho, SSIS, Talend Open Studio and the Talend Enterprise platform for ETL and data management.
- Perform User Acceptance Testing (UAT) with Business Owners iteratively and deploy to production.
- Analyzed source data to assess data quality using Talend Data Quality.
- Use VB Scripts to generate complex multi-tab Excel reports while aggregating and summarizing data.
- Create data mapping documents by understanding source and destination data for migration.
- Create reusable SAS macros to process claims and member related information to use across teams.
- Worked in a team creating Proof of Concept (POC) in Amazon Web Services to enable integration with Cloud for existing process.
- Involved in development of Ansible playbooks with Python and SSH as a wrapper for management of AWS node configurations and testing playbooks on AWS instances.
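
A hedged sketch of loading CSV files with differing schemas into a Hive ORC table, as referenced above; the paths, column names and target table (claims_db.claims_orc) are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("csv-to-hive-orc-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Fixed target layout; incoming feeds may carry only a subset of these.
target_cols = ["claim_id", "member_id", "claim_amt", "service_dt"]

def conform(df):
    """Add missing target columns as nulls and project a fixed order."""
    for c in target_cols:
        if c not in df.columns:
            df = df.withColumn(c, F.lit(None).cast("string"))
    return df.select(target_cols)

# Hypothetical HDFS paths for feeds with differing schemas.
paths = ["/data/claims/feed_a/*.csv", "/data/claims/feed_b/*.csv"]
frames = [conform(spark.read.option("header", True).csv(p)) for p in paths]

merged = frames[0]
for df in frames[1:]:
    merged = merged.unionByName(df)

# Append into a Hive-managed ORC table (assumed to already exist).
merged.write.mode("append").format("orc").saveAsTable("claims_db.claims_orc")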
Environment: SAS/Base, SAS/Macros, SAS Procedures, Python, Amazon Web Services, Shell Scripting, VB Scripting, Informatica PowerCenter, Talend, Hadoop, HBase, Hive, Sqoop, Oracle, MySQL, PostgreSQL, Waterfall Methodology, Autosys, Spark, PySpark, TFS, Service Now, Windows, Linux, Unix, Tableau, Health Care Claims
Delivery Tech Lead & Senior Digital Data Engineer
Confidential
Responsibilities:
- Lead a team of 4 to create and automate various ETL processes and SAS procedures to compute internal Credit Scoring of customers from raw data.
- Consume, transform and calculate key variables from raw data required for Credit Scoring using SAS scripting and SQL queries.
- Coordinate with multiple teams to gather requirements, code and deliver data for Business Users.
- Provide Level 3 Application Support to identify and resolve any data / process related issues caused by bad data.
- Work with different data sources like HDFS, Hive and Teradata for Spark to process the data.
- Analyze and migrate jobs from existing DB2 platform to Teradata for analytical processing and HP Vertica to DB2 version 11 for operational process.
- Analyzed the end-to-end workflow of existing Vertica process to create compatible and fine-tuned DB2 versions to reduce redundancy and improve performance while migrating process.
- Create reusable scripts and ETL processes in DataStage to handle complex operations that can be leveraged across multiple processes.
- Query optimization and performance tuning of complex SQL queries and DataStage jobs to improve performance and reduce redundancy.
- Create Change Data Capture and Slowly Changing Dimension jobs to capture data changes and process current and historical data in the data warehouse (a simplified sketch follows this list).
- Use different SAS Procedures to extract data from various sources and transform as per requirements to report to Business Users.
- Use SAS data step and other SAS procedures to perform data analysis and comparisons while migrating data between different platforms.
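
The Change Data Capture and Slowly Changing Dimension jobs above were built in IBM DataStage; purely as an illustration of the underlying Type 2 logic, a simplified PySpark sketch follows, with hypothetical table names, keys and tracked columns.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("scd2-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Current dimension rows and the incoming delta (hypothetical tables).
dim = spark.table("dw.customer_dim").filter("current_flag = 'Y'")
stg = spark.table("stg.customer_delta")

# Detect changed rows by comparing a hash of the tracked attributes.
hash_cols = ["name", "address", "risk_grade"]
stg_h = stg.withColumn("row_hash", F.sha2(F.concat_ws("||", *hash_cols), 256))
dim_h = dim.withColumn("row_hash", F.sha2(F.concat_ws("||", *hash_cols), 256))

changed = (stg_h.alias("s")
           .join(dim_h.alias("d"),
                 F.col("s.customer_id") == F.col("d.customer_id"))
           .filter(F.col("s.row_hash") != F.col("d.row_hash"))
           .select("s.*"))

# New current versions; a full job would also expire the matched old rows
# by closing out their current_flag and end_dt before appending these.
new_versions = (changed
                .withColumn("effective_dt", F.current_date())
                .withColumn("end_dt", F.lit(None).cast("date"))
                .withColumn("current_flag", F.lit("Y")))
new_versions.show()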
Environment: IBM DataStage, SAS/Base, SAS/Macros, SAS Procedures, IBM DB2, HP Vertica, Teradata, Autosys, Tableau, Toad, MS Visio, SVN, ITSM Remedy, Waterfall and Agile Methodologies, Windows, Unix, Linux, Shell Scripting, Credit Card Scoring
Application Developer
Confidential
Responsibilities:
- Analyzed the complete flow of data between different sources in the system to identify any validation gaps and implemented data quality checks to ensure sanity of data.
- Secured Authorized Data Source status for the system with a rating of well recorded and well documented.
- Created mapping documents and conversion logic between two systems when migrating from the bank's legacy mainframe system to TSYS.
- Perform data validations across systems to ensure data is flowing as per Business specifications.
- Create Data Quality checks on Key Business Elements to identify and analyze erroneous data and report findings to Business (an illustrative check is sketched after this list).
- Perform trending of key amount fields to identify any unusual account activity that could indicate fraud.
- Create and maintain complete metadata of the client system by automating the process that reduced the turnaround time to 2 hours from 24 hours.
- Migrated existing metadata from the DB2 system to IBM Metadata Workbench to standardize the process and increase the availability of data to a wider audience.
- Create Implementation, Back out and Validation plans to migrate code into production from lower environments.
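
The production Data Quality checks above were implemented in DataStage over DB2; the standalone Python sketch below only illustrates the kind of rule applied to Key Business Elements, with hypothetical field names and sample data.

from datetime import datetime

def check_record(rec):
    """Return a list of data-quality violations for one account record."""
    issues = []
    if not rec.get("account_id"):
        issues.append("missing account_id")
    if rec.get("credit_limit", 0) < 0:
        issues.append("negative credit_limit")
    try:
        datetime.strptime(rec.get("open_dt", ""), "%Y-%m-%d")
    except ValueError:
        issues.append("bad open_dt format")
    return issues

# Hypothetical sample records; the second violates all three rules.
sample = [
    {"account_id": "A100", "credit_limit": 5000, "open_dt": "2015-06-01"},
    {"account_id": "",     "credit_limit": -10,  "open_dt": "06/01/2015"},
]

for rec in sample:
    problems = check_record(rec)
    if problems:
        print(rec.get("account_id") or "<blank>", "->", "; ".join(problems))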
Environment: IBM DataStage, IBM DB2, HP Vertica, TWS, Autosys, UNIX, Linux, Windows, Shell Scripting, Waterfall Methodology, Cognos Reporting Tool, Maximo, SCCS, IBM Metadata Workbench and Business Glossary