Sr. Informatica Developer Resume
Chicago, IL
SUMMARY
- Over 14 years of IT experience across all phases of the SDLC, from software analysis through production support, implementing Business Intelligence (BI) and Big Data analytical applications with a strong emphasis on ETL, modeling, data analytics and reporting tools
- 10+ years of experience in ETL technologies covering data integration, data analytics, and data warehouse design, analysis, development, testing and implementation
- 4+ years of experience in Big Data and Hadoop technologies including HDFS, Sqoop, Hue, Hive, Impala, Spark (SQL), Pig, HBase, NoSQL databases, Kafka, Solr, Flume, Oozie, StreamSets, NiFi, Datameer, Trifacta, Dataiku, Python and Linux
- Expert in designing and developing Apache Sqoop programs to ingest data from Oracle and SQL Server into Hive, and in performing descriptive and predictive analysis on the imported data in Impala
- Implemented SCD Type 1 and Type 2 logic in Hive to process ETL updates on large volumes of data (see the SCD Type 2 sketch after this summary)
- Strong experience in automating data ingestion jobs in batch and near-real-time modes using Oozie
- Experienced in generating keystore files to store DB passwords and in monitoring jobs in Hue
- Expertise in using Hue with Hive, Impala, Spark, Pig, HBase and Solr for data analysis in HDFS
- Strong technical proficiency in Big Data ingestion tools such as NiFi, StreamSets and Informatica, creating data processing flows to load data into HDFS
- Developed Python scripts in PySpark to build data pipelines, perform analysis and ingest data into HDFS
- Strong understanding of Kafka and Flume for capturing messages and storing them in HDFS
- Knowledge of NoSQL databases such as MongoDB, and of Splunk for analyzing web service logs
- Designed and developed collections in Solr and successfully deployed Cloudera Search functionality
- Strong knowledge of Kerberos authentication and Sentry authorization for maintaining security
- Implemented data munging/wrangling in Datameer and Trifacta to transform clickstream data using sessionization logic for analytics, reporting and deep learning (see the sessionization sketch after this summary)
- Experience in machine learning and deep learning, including supervised Random Forest models, using Dataiku
- Strong experience in data modeling, implementing star and snowflake schemas in ERWIN and SAP PowerDesigner with consideration for handling PII, PCI and PHI data under GDPR and HIPAA standards
- Expert in designing and developing canned reports and interactive dashboards in Zoom Data and Tableau 2018.1 using both live and extract connections to Hadoop HDFS
- Experienced in various domains - Insurance, Health Care, Oil & Energy and Airline Industry
- Expertise in both Waterfall and Agile methodologies
- Passionate about excellence, with strong problem-solving, troubleshooting and decision-making skills
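A minimal PySpark sketch of the SCD Type 2 expire-and-insert pattern referenced above. The database, table and column names (edw.dim_customer, stage.stg_customer, customer_id, address, eff_start_dt, eff_end_dt, is_current) are illustrative assumptions rather than details from any project in this resume, and inserts for brand-new keys are omitted for brevity.

```python
# SCD Type 2 sketch: expire changed current rows, append new versions.
# All table/column names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("scd2_sketch")
         .enableHiveSupport()
         .getOrCreate())

dim = spark.table("edw.dim_customer")      # current dimension
stg = spark.table("stage.stg_customer")    # incoming snapshot (customer_id, address)

# Keys whose tracked attribute changed against the current dimension version
changed = (stg.alias("s")
           .join(dim.filter("is_current = 1").alias("d"),
                 F.col("s.customer_id") == F.col("d.customer_id"), "inner")
           .where(F.col("s.address") != F.col("d.address"))
           .select("s.*"))

changed_keys = changed.select("customer_id").withColumn("chg_flag", F.lit(1))

# Expire the current versions of the changed keys
expired = (dim.join(changed_keys, "customer_id", "left")
           .withColumn("eff_end_dt",
                       F.when((F.col("chg_flag") == 1) & (F.col("is_current") == 1),
                              F.current_date()).otherwise(F.col("eff_end_dt")))
           .withColumn("is_current",
                       F.when((F.col("chg_flag") == 1) & (F.col("is_current") == 1),
                              F.lit(0)).otherwise(F.col("is_current")))
           .drop("chg_flag"))

# Append the new current versions of the changed rows
new_rows = (changed
            .withColumn("eff_start_dt", F.current_date())
            .withColumn("eff_end_dt", F.lit(None).cast("date"))
            .withColumn("is_current", F.lit(1)))

rebuilt = expired.unionByName(new_rows)

# Write to a work table; production code would swap or INSERT OVERWRITE the target
rebuilt.write.mode("overwrite").saveAsTable("edw.dim_customer_rebuilt")
```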
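The sessionization logic mentioned above was applied in Datameer and Trifacta; the PySpark sketch below is only an illustrative equivalent. The table names, the user_id and event_ts columns, and the 30-minute timeout are assumptions.

```python
# Clickstream sessionization sketch: start a new session when the gap
# between a user's consecutive events exceeds 30 minutes.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (SparkSession.builder
         .appName("sessionize_sketch")
         .enableHiveSupport()
         .getOrCreate())

clicks = spark.table("raw.clickstream")   # assumed columns: user_id, event_ts

w = Window.partitionBy("user_id").orderBy("event_ts")

sessions = (clicks
            .withColumn("prev_ts", F.lag("event_ts").over(w))
            .withColumn("new_session",
                        F.when(F.col("prev_ts").isNull(), 1)
                         .when(F.unix_timestamp("event_ts")
                               - F.unix_timestamp("prev_ts") > 1800, 1)
                         .otherwise(0))
            # running sum of the start flags yields a per-user session number
            .withColumn("session_num", F.sum("new_session").over(w))
            .withColumn("session_id",
                        F.concat_ws("-", F.col("user_id").cast("string"),
                                    F.col("session_num").cast("string"))))

sessions.write.mode("overwrite").saveAsTable("analytics.clickstream_sessions")
```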
TECHNICAL SKILLS
Big Data Ecosystem: Sqoop, Hue, Hive, Impala, Spark (SQL), Pig, HBase, NoSQL databases, Kafka, Solr, Flume, Oozie, StreamSets, NiFi, Datameer, Trifacta, Dataiku
ETL Tools: Informatica, Business Objects Data Integrator, Data Stage
Reporting Tools: Tableau BI, Zoom Data, SAP Business Objects
Modeling Tools: SAP Power Designer, ERWIN, MS Visio
Databases: Teradata 14.0, Oracle 11g, SQL Server 2014, DB2, Netezza, MS Access
Language Skills: BTeq, SQL, PL/SQL, SQL*Plus, UNIX Perl, Bash Shell Scripting
Utilities: TOAD, Advanced Query Tool (AQT), SQL*Loader, Teradata SQL Assistant, Splunk, Kibana, HP Quality Center, Service Now, Jupyter Notebook, Spyder
Operating Systems: Windows, UNIX and Linux
Version Control: GitHub, GitLab, SharePoint
PROFESSIONAL EXPERIENCE
BIG DATA ARCHITECT
Confidential, Dallas, TX
Technical Environment: Teradata, Hadoop, Hive, Tableau
Responsibilities:
- Involved in the Confidential AI - Accessory Personalization recommendation engine project for Confidential Wireless products
- Actively involved in gathering and analyzing the system specifications, use cases and storage requirements
- Designed and built a data pipeline with CDC to load address information from Hive to a PostgreSQL database using PySpark (see the sketch after this section)
- Analyzed data and led the data analysis team in deriving the metadata, data dictionary and source-to-target mapping specification documentation
- Coded and led the data engineering team in designing the data lake, aggregations and bridge tables, building data pipelines to arrive at a unified view of the customer journey
- Responsible for all data meetings with the enterprise data warehouse (DWH) team, documenting meeting notes and following up on action items
- Guiding the data science team in understanding the unified view in order to design the market basket model for the recommendation engine
- Worked with client partners to resolve impediments and achieve targets on time
- Prepared canned and trend reports to understand the customer journey across different cluster distributions
- Responsible for delivering the data analysis spreadsheet compared against Hadoop, and prepared stories that were approved by the enterprise DWH team
- Facilitating the daily scrum meetings and documenting the team's progress and impediments
- Tracking open questions to closure so the team achieves the goals set for the week
- Maintain the program code in GitLab for version control
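A hedged sketch of the Hive-to-PostgreSQL CDC load described above. The connection details, table names and tracked columns are placeholders, the change detection is a simple hash comparison, and the deltas are landed in a staging table for a downstream merge; it assumes the PostgreSQL JDBC driver is on the Spark classpath.

```python
# CDC sketch: compare hashed Hive rows against the PostgreSQL target and
# land only new/changed rows. Names and credentials are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("addr_cdc_sketch")
         .enableHiveSupport()
         .getOrCreate())

jdbc_url = "jdbc:postgresql://pg-host:5432/custdb"               # placeholder
jdbc_props = {"user": "etl_user", "password": "********",        # placeholder
              "driver": "org.postgresql.Driver"}

src = spark.table("edw.customer_address")                        # Hive source
tgt = spark.read.jdbc(jdbc_url, "public.customer_address", properties=jdbc_props)

# Hash the tracked address columns on both sides
cols = ["street", "city", "state", "zip"]
def with_hash(df):
    return df.withColumn("row_hash", F.sha2(F.concat_ws("|", *cols), 256))

changes = (with_hash(src).alias("s")
           .join(with_hash(tgt).select("address_id", "row_hash").alias("t"),
                 F.col("s.address_id") == F.col("t.address_id"), "left")
           .where(F.col("t.row_hash").isNull() |                  # new keys
                  (F.col("s.row_hash") != F.col("t.row_hash")))   # changed rows
           .select("s.*")
           .drop("row_hash"))

# Land the deltas; a downstream upsert applies them to the target table
changes.write.jdbc(jdbc_url, "staging.customer_address_delta",
                   mode="append", properties=jdbc_props)
```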
BIG DATA ARCHITECT
Confidential, Dallas, TX
Technical Environment: Hue, NiFi, Datameer, Trifacta, Dataiku, Zoom Data, Tableau, Power Designer
Responsibilities:
- Gathered requirements and analyzed data from multiple source systems, including relational, clickstream and portal sources
- Built data pipelines for extraction, loading and transformation of business-critical data
- Estimated and designed the data storage based on the presentation requirements from business
- Profiled the reservation log data to identify the data elements for descriptive and predictive analytics
- Developed the SQL Server data model to store the data attributes from the REST API
- Utilized the latest technologies and the rich ecosystem of tools around Hadoop, including HBase, Hive, Kafka, Solr and Dataiku (machine learning)
- Designed the data ingestion program from SQL Server to the Big Data platform, with consideration of GDPR standards for storing PII and non-PII data for easy access and deletion
- Developed Hive managed and external tables to store data in Parquet format, and refreshed the Impala tables for quick access by downstream applications (see the sketch after this section)
- Implemented Joins, Dynamic Partitioning, File Formats and Compression techniques in HDFS and used various functions on Hive tables.
- Developed and tested the Sqoop programs for data ingestion into the Big Data platform (Hive data store)
- Developed Sqoop wrappers with Python scripting to FTP heavy volumes of data between the data lake and the data warehouse
- Automated the complete logic in Oozie scheduler in Hue by creating workflows and coordinators
- Implemented a data interface to retrieve customer information via REST APIs, pre-processing the data using MapReduce and storing it in HDFS
- Built process groups and processors in NiFi to pull files from various servers, convert them to JSON and place them in HDFS
- Developed automated quality checks with UNIX shell scripts and reusable source-prep and loading jobs to verify and reconcile data during data loads (see the reconciliation sketch after this section)
- Developed Python scripts in Spark and used Datameer in HDFS for data mining and data discovery
- Designed and developed Tableau and Zoom Data reports to present the HDFS data for business decisions
- Reviewed the Tableau reports and trained business users to enable self-serve capability
- Conducted design and code reviews with the enterprise team to ensure the architecture remained compliant
- Participated in project roadmap discussions and handled daily Scrum meetings in 3-week iterations
- Maintained the program code in GitHub for version control
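A minimal sketch of the Hive external Parquet table pattern noted above, with the Impala refresh indicated as a comment; the database, table, column and HDFS path names are illustrative assumptions.

```python
# External Parquet table over files landed in HDFS, partitioned by load date.
# All names/paths are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive_parquet_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS edw.reservations (
        reservation_id BIGINT,
        customer_id    BIGINT,
        amount         DECIMAL(12,2)
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
    LOCATION '/data/edw/reservations'
""")

# Register newly landed partitions with the Hive metastore
spark.sql("MSCK REPAIR TABLE edw.reservations")

# Impala caches metadata separately, so downstream access needs a refresh
# issued in Impala (e.g. via impala-shell or impyla, not shown here):
#   REFRESH edw.reservations;
```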
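The automated quality checks above are described as UNIX shell scripts; the sketch below shows the same row-count reconciliation idea in PySpark instead, to keep a single language across these examples, with hypothetical table names and a simple exact-match rule.

```python
# Row-count reconciliation sketch: fail the load when source and target
# counts diverge, so the scheduler can flag or rerun the job.
import sys
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("recon_sketch")
         .enableHiveSupport()
         .getOrCreate())

src_count = spark.table("stage.reservations_src").count()   # placeholder tables
tgt_count = spark.table("edw.reservations").count()

print(f"source={src_count} target={tgt_count}")

if src_count != tgt_count:
    sys.exit(1)   # non-zero exit signals the failure to the scheduler
```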
TECHNICAL ARCHITECT
Confidential
Technical Environment: Informatica 9.6.1, Informatica 10, PMPC, Oracle 11G, Teradata, DB2, SQL Server, UNIX, MS Visio, Service Now, Sqoop, Hive, Impala, Oozie
Responsibilities:
- Created data profiling for the source data to trap data-related issues
- Designed the data model of the data warehouse, implementing fact and dimension tables
- Documented the high-level process flow and low-level design to help the ETL development team build effective and efficient mappings
- Processed semi-structured and unstructured data into a form suitable for analysis using Perl scripting
- Developed Python scripts to perform data analysis in HDFS and automated them using the Oozie scheduler
- Ingested data from the Oracle database into HDFS on the Big Data platform using Apache Sqoop with failure recovery (see the sketch after this section)
- Built tables and aggregate views in Hive and Impala for downstream applications to access the HDFS data
- Created and processed RDDs and DataFrames using Spark SQL; designed and developed shell scripts, Pig scripts and Hive scripts
- Created collections in Solr and used them for searching through the Cloudera Search functionality
- Performance-tuned jobs processing huge volumes of data to optimize run times
- Worked extensively with different vendors and teams to obtain the required prerequisites on time
- Updated the status reports and reported the status to the client and managers every week
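A hedged sketch of the Sqoop ingestion mentioned above, wrapped in a small Python driver so a non-zero exit can be surfaced to the scheduler for recovery; the connection string, credentials file, source table and Hive target are placeholders, and the options shown are standard Sqoop import flags.

```python
# Sqoop import wrapper: Oracle table -> Hive, with the DB password read from
# a protected file and failures propagated for rerun. Values are placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//ora-host:1521/ORCLPDB",
    "--username", "etl_user",
    "--password-file", "/user/etl_user/.ora_pw",   # file on HDFS, not plain text on the CLI
    "--table", "CUSTOMER_ADDRESS",
    "--hive-import",
    "--hive-table", "edw.customer_address",
    "--num-mappers", "4",
]

result = subprocess.run(sqoop_cmd)
if result.returncode != 0:
    # surface the failure to the scheduler (e.g. Oozie) so the load can be rerun
    raise SystemExit(result.returncode)
```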
Sr. TECHNICAL DESIGNER/LEAD
Confidential, Dallas, TX
Technical Environment: Informatica 9.5.1, Oracle 10G, SAP Data modeller, Teradata, UNIX, Quality Center
Responsibilities:
- Involved in source data analysis, gap analysis and data profiling of the prescriber data
- Created a Logical and Physical data model to store the “single version of truth” of Prescriber data with considerations of HIPAA and PII data standards
- Designed and developed complex Informatica mappings to handle SCD Type 1 and Type 2 logic
- Developed complex stored procedures and functions for downstream application to consume the data
- Wrote Teradata BTeq scripts and used the FLOAD and MLOAD utilities as per requirements
- Enabled pushdown optimization in Informatica sessions to improve the performance of the mappings
- Implemented complex aggregate tables and views on multiple EDW tables to improve accessibility
- Wrote UNIX scripts to validate incoming files and to schedule the Informatica workflows
- Involved in enhancements and maintenance activities of the data warehouse including tuning, modifying of stored procedures for code enhancements
- Unit tested each component developed in Informatica and guided the QA team in understanding the requirements and testing the code functionality against them
- Provided UAT support to users to simulate the production usage and also verify the business needs.
- Created the pre-production documents and plan for the implementation team to deploy the Informatica, UNIX and DB components in production server
- Identified problems in existing production data, developed one-time scripts to correct them, fixed invalid mappings and troubleshot technical problems in the database
- Reviewed all project deliverables to clients and was responsible for them
- Supported multiple concurrent development projects, enhancements and retrofitting
- Reported the status to client and managers by sending the weekly status report
Sr. INFORMATICA DESIGNER
Confidential, Parsippany, NJ
Technical Environment: Informatica 8.6.1, Business Objects 6.5, Oracle 10G, Netezza, SAP Power Designer, WinSQL, Aginity, Mercury Test Director
Responsibilities:
- Developed and enhanced the data models and documented data flow using Visio
- Developed and created data models, data mappings, use cases, stories and/or specs for reports, dashboard and data cubes
- Involved in design, development and maintenance of database for Data warehouse project
- Created complex mappings which involved Slowly Changing Dimensions, implementation of Business Logic and capturing the deleted records in the source systems
- Worked with complex mappings having an average of 15 transformation rules
- Monitored Workflows and Sessions using Workflow Monitor
- Coded Unix Scripts to capture data from different relational systems to flat files
- Used SQL queries to test various reports and ETL load jobs in development, QA and production
- Performed Migration and Validation as per SDLC standards
Sr. INFORMATICA DEVELOPER
Confidential, Chicago, IL
Technical Environment: Informatica 8.5, Business Objects XI, Oracle9i, UNIX, Test Director 8.0
Responsibilities:
- Designed and developed the data model to store the product data in data warehouse
- Developed mappings in Informatica 8.5 using various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union in the Informatica Designer
- Extracted data from flat files and other RDBMS sources into the staging area and populated the DWH
- Maintained source definitions, transformation rules and target definitions using the Repository Manager
- Responsible for the creation and execution of unit, link and system test cases and for reviewing the results with the BSA
- Created stored procedures and functions to truncate data in the target before the session run
- Helped the team write SQL queries in Oracle, which greatly helped in identifying issues
- Designed the Business Objects universe used by the reporting team and resolved the loops in it
- Involved in preparing reports with drill up and drill down options with run time filters and parameters
- Responsible for all the project deliverables like low level design document, unit test case & logs and system test case & logs for all Informatica & Business Objects components
INFORMATICA DEVELOPER
Confidential
Technical Environment: Informatica 7.1, Oracle 9i, UNIX, Mercury Test Director 8.0
Responsibilities:
- Involved in Data Quality Analysis and understood all the phases of the SDLC processes
- Prepared the Mapping Specification and Component Test Plan, Test Cases and Test Results
- Developed Informatica ETL mappings that transfer data from Source to the Target DWH
- Prepared the test data and involved in testing and performance tuning using DB hints
- Involved in Pre-release testing for all the incremental load mappings to avoid data loss.
- Prepared production implementation plan and documents to support the deployment
- Involved in quality processes such as reviews and testing, and prepared the traceability matrix