Big Data/Snowflake/ETL Informatica Developer Resume
Charlotte, NC
SUMMARY
- 9+ years of extensive experience with Informatica Power Center in all phases of Analysis, Design, Development, Implementation, and Support of Data Warehousing applications using Informatica Power Center 10.x/9.x/8.x/7.x.
- Overall 4+ years of experience in developing and implementing Big Data solutions and data mining applications on Hadoop using HDFS, MapReduce, HBase, Pig, Hive, PySpark, Sqoop, Oozie, and ZooKeeper.
- Clear understanding of Data Warehousing concepts with emphasis on ETL and life cycle development, including requirement analysis, design, development, and implementation.
- Extensive working experience in the design and development of data warehouses, data marts, and ODS.
- Hands-on working experience in Data Warehousing development with data migration, data conversion, and extraction/transformation/loading using Informatica Power Center to extract and load data into relational databases such as SQL Server, Oracle, Teradata, and DB2.
- Good working experience with Informatica data integration tools such as Repository Manager, Designer, Workflow Manager, and Workflow Monitor, including scheduling workflows through Workflow Manager.
- Experience in Informatica Power Center for data integration between sources and targets, including development of Joiner, Aggregator, Lookup, Salesforce Lookup, Router, and Sorter transformations.
- Extensively worked with Informatica mapping variables, mapping parameters and parameter files.
- Experience in several facets of MDM implementations including Data Profiling, Data Extraction, Data Validation, Data Cleansing, Data Match, Data Load, Data Migration, and Trust Score validation.
- Experience in working with business analysts to identify and understand requirements in order to create Technical Design Documents.
- Experience working with Informatica Data Quality (IDQ) for data cleansing, data matching and data conversion.
- Experienced in creating IDQ mappings using Consolidation, Key Generator, Match, Merge, Exception, Labeler, Standardizer, and Address Validator transformations with Informatica Developer, and migrating them to Informatica Power Center.
- Designed ETL processes using Informatica to load data from sources to targets through data transformations.
- Extensively worked with Informatica performance tuning involving source-level, target-level, and mapping-level bottlenecks.
- Strong experience in Extraction, Transformation, and Loading (ETL) of data from various sources into Data Warehouses and Data Marts using Informatica Power Center, Power Exchange, and Power Connect as ETL tools on Oracle, DB2, and SQL Server databases.
- Experience with the ETL process using Informatica Power Center/Power Exchange (10.x/9.x/8.x) tools - Source/Target Analyzers, Mapping Designer, Mapplet Designer, Transformation Developer, Repository Manager, Repository Server, Workflow Manager, and Workflow Monitor.
- Expertise in advanced Informatica integrations - Web Services integration, Java and XML Parser/Writer transformations, XML sources/targets, and Message Queues.
- Experience in integrating Mainframe with Informatica Power Center using Power Exchange, and in handling various non-relational sources/targets such as XML, Mainframe, and flat files (fixed-width and delimited).
- Experience in performance tuning - identifying bottlenecks and tuning sources/targets, mappings, transformations, and sessions for better performance and efficiency.
- Hands on experience with Informatica Metadata Manager for data lineage requirements.
- Very good understanding of Informatica Data Quality for data profiling and of Informatica Master Data Management (MDM) for generating Golden Records and Master Records.
- Designed and developed data replication projects in near real time using the IBM InfoSphere Change Data Capture (CDC) tool for Policy Center data mirroring.
- Experience in creating build and deployment plans using the ALM tools Bitbucket and Bamboo for CI/CD deployments.
- Strong experience in Dimensional Modeling using Star and Snowflake schemas, identifying Facts and Dimensions, and physical and logical data modeling using Erwin.
- Experience in using exception handling strategies to capture errors and referential integrity violations during loading processes and to notify the source team of exception records.
- Experience in design, development, unit testing, integration, debugging, implementation, and production support, as well as client interaction and understanding of business applications, business data flow, and data relationships.
- In-depth understanding/knowledge of Hadoop architecture and its various components such as JobTracker, TaskTracker, YARN, NameNode, DataNode, and MapReduce concepts.
- Knowledge of provisioning new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing.
- Extensive experience with ETL and Query tools for Big data like Pig Latin and HiveQL.
- Extensive knowledge of developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets), leveraging PySpark and spark-shell as appropriate.
- Extensive experience in working with Oracle, MS SQL Server, DB2, MySQL RDBMS databases.
- Experienced in using different scheduling tools - Control-M, AutoSys, Maestro/TWS, and cron jobs.
- Hands-on experience across all stages of Software Development Life Cycle (SDLC) including business requirement analysis, data mapping, build, unit testing, systems integration and user acceptance testing.
- Extensive experience in formulating error handling mechanism.
- Excellent analytical skills in understanding client’s organizational structure.
- Excellent problem-solving skills with a strong technical background; result-oriented team player with excellent communication and interpersonal skills.
TECHNICAL SKILLS
Big Data/Hadoop Technologies: Spark, Sqoop, Scala, Hive, Pig, Cloudera, Flume, NoSQL, MapReduce, HBase.
ETL Tools: Informatica Power Center 10.4/10.2/9.6.1/9.5.1/8.6.1, Informatica Cloud (IICS), Informatica Data Quality (IDQ) 9.x, IBM DataStage, Informatica Power Exchange 10.x/9.x
Languages: SQL, PL/SQL, Unix Shell, Perl, C, Java, Python
Application Servers: WebLogic, WebSphere, JBoss, Tomcat.
Cloud Computing Tools: Amazon AWS, S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch
Databases: Microsoft SQL Server 2008, MySQL 4.x/5.x, Oracle 10g, 11g, 12c, DB2, Teradata, Netezza
NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB.
Build Tools: Jenkins, Maven, Ant, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Modeling: Rational Rose, StarUML, Visual Paradigm for UML
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Operating Systems: All versions of UNIX, Windows, Linux, macOS, Sun Solaris
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Big data/snowflake/ETL Informatica developer
Responsibilities:
- Developed and refined the Spark process for the ODS (Operational Data Store), making changes that enhanced the performance of data ingestion from the raw and refined zones through to publishing Postgres data to the core script, using Python and PySpark.
- Developed complex SQL queries for querying data against different databases for the data verification process.
- Prepared the Test Plan and Testing Strategies for the Data Warehousing application.
- Developed ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
- Extensively interacted with developers, business, and management teams to understand the OPM project business requirements and ETL design document specifications.
- Participated in regular project status meetings and QA status meetings.
- Extensively used and developed SQL scripts/queries in backend testing of Databases.
- Validated data fields from the refined zone to ensure the integrity of the published table.
- Converted ingested data (CSV, XML, JSON) to Parquet file format in compressed form (see the sketch after this list).
- Created business models from business cases and enterprise architecture requirements for process monitoring, improvement, and reporting, and led the team in business intelligence solutions development.
- Experience in performing transformations and actions on RDDs, DataFrames, and Datasets using Apache Spark.
- Good knowledge of Spark and Hadoop architecture and experience in using PySpark for data processing.
- Applied advanced DW techniques and Informatica best practices to load a Financial, HR & Supply Management Data Warehouse, Data Marts, and Downstream Systems.
- Developed several complex Informatica mappings, Mapplets, stored procedures, and reusable objects to implement the business logic and to load the data incrementally.
- Designed, developed, and tested Informatica mappings, workflows, worklets, reusable objects, SQL queries, and shell scripts to implement complex business rules.
- Developed gsutil scripts for compression with gzip, backup, and transfer to the edge node, with all necessary file operations required for BQ load jobs.
- Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
- Congregated data from multiple sources and performed resampling to handle the issue of imbalanced data.
- Coded in PostgreSQL to publish 10 million records from more than 90 tables to ensure the integrity of data flow in real time.
- Provided a single environment for data integration and data federation with role-based tools that share common metadata, using Informatica data virtualization.
- Analyzed ETL requirement specifications to develop HLD and LLD documents for SCD Type 1, Type 2, and Type 3 mappings, and was involved in testing various data/reports.
- Experienced as a Senior ETL Developer (Hadoop ETL/Teradata/Vertica/Informatica/DataStage/Mainframe), Subject Matter Expert (SME), Production Support Analyst, and QA Tester.
- Extensively worked on TFS (Microsoft) as a tool to deploy production-level code alongside Git.
- Constructed robust, high-volume data pipelines and architecture to prepare data for analysis by the client.
- Architected complete, scalable data warehouse and ETL pipelines to ingest and process millions of rows daily from 30+ data sources, allowing powerful insights and driving daily business decisions.
- Implemented optimization techniques for data retrieval, storage, and data transfer.
- Created test cases for ETL mappings and design documents for production support.
- Set up, monitored, and used the Job Control System in Development/QA/Prod.
- Extensively worked with flat file and Excel sheet data sources; wrote scripts to test the flat file data in the databases.
- Scheduled and automated jobs to run in a batch process.
- Effectively communicated testing activities and findings in oral and written formats.
- Worked with ETL groups, the acquisition team, and business analysts to understand mappings for dimensions and facts.
- Extracted data from various sources like Oracle, flat files and DB2 server.
- Worked on issues with the migration of data from the Development to the QA/Test environment.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
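A minimal PySpark sketch of the ingestion-format conversion referenced above. The file paths, source layouts, and the snappy codec choice are illustrative assumptions rather than project specifics; reading XML would additionally require the external spark-xml package, so only the CSV and JSON paths are shown.

```python
# Sketch only: convert raw-zone CSV/JSON files to compressed Parquet in the refined zone.
# All paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("raw-to-refined-parquet")
         .config("spark.sql.parquet.compression.codec", "snappy")
         .getOrCreate())

# Read a delimited CSV feed from the raw zone, inferring the schema for brevity.
csv_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/raw/orders/*.csv"))

# Read a JSON feed from the raw zone.
json_df = spark.read.json("/data/raw/events/*.json")

# Write both feeds to the refined zone as snappy-compressed Parquet.
csv_df.write.mode("overwrite").parquet("/data/refined/orders")
json_df.write.mode("overwrite").parquet("/data/refined/events")

spark.stop()
```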
Environment: Informatica Power Center 10.4, Snowflake, Spark 2.4, HBase 1.2, Tableau 10, Power BI, Python 2.7 and 3.4, Scala, PySpark, HDFS, Flume 1.6, Hive, Zeppelin, PostgreSQL, MySQL, TFS, Linux, Spark SQL, Kafka, NiFi, Sqoop 1.4.6, AWS (S3).
Confidential, Redmond, WA
Big data/snowflake/ETL Informatica developer
Responsibilities:
- Reduced the overall cost of the EMR production cluster (Amazon Web Services) by obtaining the best configuration for the running workloads.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Analyzed and defined the researcher's strategy, determined the system architecture and requirements needed to achieve the goals, and developed multiple Kafka producers and consumers as per the software requirement specifications.
- Used Kafka for log aggregation: gathering physical log files off servers and placing them in a central location such as HDFS for processing.
- Developed logical data models and physical data models with experience in Forward and Reverse Engineering using ERwin.
- Designed and developed source to target data mappings. Effectively used Informatica Best Practices / techniques for complex mapping designs.
- Experience in using FTP services to retrieve Flat Files from external sources.
- Worked with a wide range of sources such as delimited flat files, XML sources, MS SQL Server, DB2, and Oracle databases.
- Created complex mappings to transform the Business logic using Connected/Unconnected Lookups, Sorter, Aggregator, Update Strategy, Router and Dynamic lookup transformations for populating target tables in an efficient way.
- Used Power Center Workflow Manager to create workflows and sessions, and used various tasks such as Command, Event Wait, Event Raise, and Email to run with the logic embedded in the mappings.
- Involved in creating and modifying UNIX shell scripts and scheduling through Active Batch.
- Developed PL/SQL stored procedures for source pre-load and target pre-load.
- Developed sessions using Informatica Server Manager and was solely responsible for the daily loads and handling of the reject data.
- Configured Spark Streaming to consume ongoing data from Kafka and store the streamed data in HDFS.
- Worked closely with MDM team to identify the data requirements for their landing tables and created Mappings, Trust and Validation rules, Match Path, Match Column, Match rules, Merge properties, and Batch Group creation as part of Informatica MDM.
- Developed several complex Informatica mappings, Mapplets, stored procedures, and reusable objects to implement the business logic and to load the data incrementally.
- Implemented Spark using Python and Spark SQL for faster processing of data, and worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done in Python (PySpark).
- Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance; responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in MongoDB.
- Worked on deploying code from lower environments to higher environments using deployment groups; created the repository service and integration service in the Informatica Admin Console.
- Effectively used Informatica parameter files for defining mapping variables, workflow variables, FTP connections, and relational connections, and stored and executed parameter variables in the control table.
- Stored and loaded data from HDFS to Amazon S3, backed up the namespace data to NFS, and integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Worked extensively with Spark MLlib to develop a logistic regression model on operational data; created Hive tables as per requirements as internal or external tables, defined with appropriate static/dynamic partitions and bucketing for efficiency.
- Loaded and transformed large sets of structured and semi-structured data using Hive; extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (see the sketch after this list).
- Involved in performing analytics and visualization on the log data to estimate the error rate and study the probability of future errors using regression models.
- Designed and created ETL jobs through Talend to load huge volumes of data into MongoDB, the Hadoop ecosystem, and relational databases.
- Created Hive tables and partitions, and implemented incremental imports to perform ad-hoc queries on structured data.
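A sketch of the Kafka-to-HDFS path referenced above, shown with Spark Structured Streaming for brevity (the original work also used RDD/DStream-style processing). The broker address, topic name, and HDFS paths are placeholder assumptions, and the spark-sql-kafka connector package must be on the classpath.

```python
# Sketch only: consume a Kafka feed and persist it to HDFS as Parquet.
# Broker, topic, and paths are hypothetical; requires the spark-sql-kafka-0-10 package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to the Kafka topic; key/value arrive as bytes and are cast to strings here.
stream_df = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "weblogs")
             .load()
             .select(col("key").cast("string"),
                     col("value").cast("string"),
                     col("timestamp")))

# Write the stream to HDFS in Parquet format, with a checkpoint directory for recovery.
query = (stream_df.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/weblogs")
         .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
         .start())

query.awaitTermination()
```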
Environment: Informatica Power Center 10.1, ERwin, WinSQL, WinSCP, Hadoop YARN, Spark Core, Spark SQL, GraphX, Scala, Python, Kafka, Zeppelin, Jenkins, Docker, Microservices, Pig, Sqoop, Cassandra, Informatica, Cloudera, Oracle 12c, Linux, ETL.
Confidential - Mount Laurel, NJ
ETL/Informatica Developer
Responsibilities:
- Worked on the requirements with business analysts and business users; also involved in working with data modelers.
- Worked closely with data population developers, multiple business units, and a data solution engineer to identify key information for implementing the Data warehouses.
- Analyzed logical and physical data models for Business users to determine common data definitions and establish referential integrity of the system.
- Parsed high-level design spec to simple ETL coding and mapping standards.
- Used Informatica Power Center as an ETL tool to create source/target definitions, mappings, and sessions to extract, transform and load data into staging tables from various sources.
- Wrote Teradata BTEQ scripts and Informatica mappings using TPT to load data from Staging to Base.
- Fine-tuned Teradata BTEQ scripts as necessary using explain plans and by collecting statistics.
- Developed and tested all the backend programs, Informatica mappings, and update processes
- Populated the Staging tables with various Sources like Flat files (Fixed Width and Delimited), Oracle, MySQL, and Informix.
- Created mappings using various transformations such as Source Qualifier, Aggregator, Expression, Lookup, Router, Filter, Rank, Sequence Generator, Java, and Update Strategy.
- Created and used the Normalizer Transformation to normalize the flat files in the source data.
- Experience in converting Informatica Power Center Objects to Informatica Cloud (IICS).
- Implemented the new process with Informatica Cloud, loading data into the Snowflake DWH.
- Experience validating all migrated objects to ensure data quality.
- Worked with different Snowflake features during the development process.
- Extensively built mappings with SCD1, SCD2 implementations as per the requirement of the project to load Dimension and Fact tables.
- Used the Evaluate Expression option in the Debugger tool to validate and fix the code while testing Informatica mappings.
- Handled initial (i.e., history) and incremental loads into target database using mapping variables.
- Used Debugger to debug mappings and created breakpoints for better analysis of mappings.
- Worked with Workflow Manager for maintaining Sessions by performing tasks such as monitoring, editing, scheduling, copying, aborting, and deleting.
- Worked on performance tuning at both the Informatica and database levels by finding the bottlenecks.
- Worked on Maestro job scheduling and Unix Scripting.
- Developed UNIX shell scripts around pmcmd to start and stop sessions and batches and to schedule workflows (see the sketch after this list).
- Involved in migrating the ETL Code to different environments from Dev to UAT and then to Production with ETL Admins.
- Performed Unit testing and created unit test plan of the code developed and involved in System testing and Integration testing as well. Coordinated with the testers and helped in the process of integration testing.
- Heavily involved in Production support on a rotational basis and supported the DWH system using the ticketing tool for the issues raised by Business users.
- Solved tickets of different severities, based on SLAs, for data issues raised by customers using the trouble ticket system.
- Experience in working with the reporting team to build the collection layer for reporting purposes.
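The pmcmd automation referenced above was shell-based; the sketch below shows equivalent calls through Python's subprocess module. The integration service, domain, folder, credentials, and workflow names are hypothetical placeholders, not values from the project.

```python
# Sketch only: wrap pmcmd to start and stop an Informatica workflow.
# Service, domain, folder, credentials, and workflow names are placeholders.
import subprocess
import sys


def run_pmcmd(*args):
    """Run a pmcmd command and abort with its stderr if it fails."""
    cmd = ["pmcmd", *args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.exit(f"pmcmd failed ({' '.join(cmd)}): {result.stderr}")
    return result.stdout


# Start the workflow and wait for it to finish.
run_pmcmd("startworkflow",
          "-sv", "INT_SVC_DEV", "-d", "Domain_Dev",
          "-u", "etl_user", "-p", "etl_password",
          "-f", "FOLDER_SALES", "-wait", "wf_load_sales_fact")

# Stop the same workflow if a long-running load needs to be cancelled.
run_pmcmd("stopworkflow",
          "-sv", "INT_SVC_DEV", "-d", "Domain_Dev",
          "-u", "etl_user", "-p", "etl_password",
          "-f", "FOLDER_SALES", "wf_load_sales_fact")
```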
Environment: Informatica Power Center 8.1, SQL, PL/SQL, UNIX, Shell Scripting, SQL Server 2008, Sybase, Oracle 11g, DB2, Control-M, Cognos 8.4.
Confidential, San Diego, CA
Informatica Developer
Responsibilities:
- Worked with Power Center Designer tools in developing mappings and Mapplets to extract and load the data from flat files and Oracle database.
- Mapped and transformed existing feeds into the new data structures and standards utilizing Router, connected and unconnected Lookup, Expression, Aggregator, Update Strategy, and Stored Procedure transformations.
- Worked on Informatica Power Center tool - Source Analyzer, Data Warehousing Designer, Mapping Designer & Mapplets, and Transformations.
- Used Informatica as an ETL tool to create source/target definitions, mappings and sessions to extract, transform and load data into staging tables from various sources.
- Responsible for data management and data cleansing activities using Informatica Data Quality (IDQ); collaborated with the data management team and conducted a large data management workshop.
- Analyzed and modeled big data sets and developed a conceptual model to handle the velocity and variety of big data.
- Developed shell scripts for automation, file validation, and data loading procedures, and designed reusable components and exception handling techniques (see the sketch after this list).
- Handled service requests in a timely manner using ITIL concepts around change, incident, and problem management.
- Outlined the complete process flow and documented the data conversion, integration and load mechanisms to verify specifications for this data migration project.
- Parsed high-level design specs into simple ETL coding and mapping standards.
- Maintained warehouse metadata, naming standards, and warehouse standards for future development.
- Created the design and technical specifications for the ETL process of the project; worked with Slowly Changing Dimension Type 1, Type 2, and Type 3.
- Maintained Development, Test and Production Mappings, migration using Repository Manager. Involved in enhancements and Maintenance activities of the data warehouse.
- Performance tuning of the process at the mapping level, session level, source level, and the target level.
- Utilized Informatica IDQ to complete the initial data profiling and matching/removing duplicate data for the process of data migration from the legacy systems to the target Oracle Database.
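The file-validation work referenced above was done with shell scripts; a Python equivalent is sketched below under assumed inputs. The pipe delimiter, expected header, and file path are hypothetical.

```python
# Sketch only: validate an inbound pipe-delimited flat file before loading it.
# The expected header and delimiter are assumptions for illustration.
import csv
import sys

EXPECTED_HEADER = ["customer_id", "account_no", "balance", "as_of_date"]


def validate_flat_file(path):
    """Raise ValueError if the header or any row's column count is wrong."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="|")
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            raise ValueError(f"unexpected header in {path}: {header}")
        for line_no, row in enumerate(reader, start=2):
            if len(row) != len(EXPECTED_HEADER):
                raise ValueError(f"bad column count on line {line_no} of {path}")


if __name__ == "__main__":
    try:
        validate_flat_file(sys.argv[1])
        print("file passed validation")
    except (IndexError, OSError, ValueError) as err:
        # Exception handling: fail the load and surface the reason to the scheduler.
        sys.exit(f"validation failed: {err}")
```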
Environment: Informatica Power Center 9.6.1, Informatica Power Exchange 9.6.1, Informatica Data Quality 9.6.1, Amazon Redshift, SQL Server, Data Masking option, Autosys, Shell Scripting, XML, SQL Loader
Confidential
Informatica Developer
Responsibilities:
- Evaluated and documented Business processes to accurately reflect the current and future state of the process
- Identified and tracked the slowly changing dimensions from heterogeneous sources and determined the hierarchies in dimensions.
- Implemented an audit process to ensure the Data Warehouse matches the source systems from all reporting perspectives.
- Extensively used Stored Procedures, Functions and Packages using PL/SQL.
- Created Maestro schedules/jobs for automation of the ETL load process.
- Identified performance issues in existing sources, targets and mappings by analyzing the data flow, evaluating transformations and tuned accordingly for better performance.
- Wrote UNIX shell scripts for moving data from source systems to the Data Warehousing system.
- Involved in developing test data/cases to verify accuracy and completeness of ETL process.
- Involved in Unit testing, User Acceptance testing to check whether the data loads into target are accurate, which was extracted from different source systems according to the user requirements.
- Actively involved in the production support and transferred knowledge to the other team members.
- Coordinated between different teams across the circle and the organization to resolve release-related issues.
Environment: Informatica Power Center 9.6, Power Exchange, Power Analyzer, Erwin, Oracle 11g, SQL, PL/SQL, SQL Server 2014, Cognos, Windows XP, MS Access, UNIX Shell Scripting, Maestro, SQL*Loader, TOAD 8.6.1.0.