Spark Scala & ETL Developer Resume
Tampa, FL
SUMMARY:
- Skilled Information Technology professional with broad-based experience in the development and implementation of enterprise technology solutions. Solid expertise in strategic planning, consulting, project management, and team coordination.
- 9+ years of IT experience in Data Warehousing and Business Intelligence, with emphasis on business requirements analysis, application design, development, testing, implementation, and maintenance of client/server Data Warehouse and Data Mart systems.
- Expertise in the design and development of ETL methodology supporting data transformation and processing in a corporate-wide ETL solution using Informatica PowerCenter 9.x/8.x/7.x (Repository Manager, Source Analyzer, Target Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Workflow Manager, Workflow Monitor) and AutoSys.
- Experience using data modeling tools like Erwin 7.2 and Visio. Expertise in designing and developing Data Marts and Data Warehouses using multi-dimensional models such as Snowflake and Star schemas while implementing decision support systems. Experience working with Fact and Dimension tables.
- Solid understanding of Hadoop architecture and its components, including HDFS, Hive, Pig, MapReduce, Flume, Kafka, and Oozie. Strong knowledge of Hive, Pig, Spark (Scala/Python), DataFrames, and data streaming.
- Experience importing and exporting terabytes of data between relational databases and HDFS using Sqoop (see the Sqoop sketch at the end of this summary).
- Knowledge of job workflow scheduling and coordination tools like Oozie and ZooKeeper for Big Data projects.
- Skilled in functional programming using Scala and Python.
- Expertise in creating business requirements, design, technical, pre- and post-production, pre- and post-installation, and upgrade documents for ETL and Big Data applications.
- Experience in unit, system, and integration testing of Data Warehouse processes.
- Strong database experience using Oracle 11g/10g/9i/8i/7.x, MS SQL Server 2008/2005, DB2, MS Access, SQL and PL/SQL, SQL*Plus, SQL*Loader, and Developer 2000.
- Hands-on working knowledge of Oracle and PL/SQL, including writing stored procedures, functions, and cursors.
- Extensively worked with passive transformations like Expression and Sequence Generator, with connected and unconnected Lookup transformations, and with the Joiner transformation using Normal, Master Outer, Detail Outer, and Full Outer joins. Also used Sorter and Aggregator transformations in combination to tune the performance of aggregations in Informatica mappings.
- Performed tuning at the Informatica transformation level and at the database level to achieve optimal performance.
- Implemented Slowly Changing Dimension (SCD) Type 1 and Type 2 for changed data capture.
- Performed error handling using session logs.
- Led development teams comprising offshore and onshore associates.
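A minimal sketch of the kind of Sqoop import referenced above, driven from Scala via scala.sys.process; the JDBC URL, account, table, and HDFS paths are hypothetical placeholders, not values from any actual engagement:

```scala
import scala.sys.process._

object SqoopImport {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details; real values would come from secured config.
    val cmd = Seq(
      "sqoop", "import",
      "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL", // placeholder JDBC URL
      "--username", "etl_user",                            // placeholder account
      "--password-file", "hdfs:///user/etl/.sqoop.pwd",    // keeps the password off the command line
      "--table", "CLAIMS",                                 // placeholder source table
      "--target-dir", "/data/raw/claims",                  // HDFS landing directory
      "--num-mappers", "8"                                 // parallelism for the import
    )
    val exit = cmd.!                                       // run the CLI and capture the exit code
    if (exit != 0) sys.error(s"sqoop import failed with exit code $exit")
  }
}
```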
TECHNICAL PROFICIENCY:
- Databases: Oracle 11g/10g/9i/8/7, MS SQL Server 2008, Microsoft Access, Teradata
- Operating Systems: HP-UX, Sun Solaris, UNIX/Linux, Windows 7/NT/2000/XP, MS-DOS
- PL/SQL Editors: SQL Navigator, Toad
- ETL Tools: Informatica 8.x/9.x, Informatica MDM 9.x
- Big Data Ecosystem: Spark 1.5 (Scala and Python), Hadoop 2.0, Hive, Pig, HiveQL, HDFS
- Core Languages: Java, Scala
- Application Server: Oracle AS 4.0.8
- SQL Tools: SQL*Loader, SQL*Plus
- Scripting Languages: VBScript, JavaScript, HiveQL, PL/SQL
- Graphical User Interface: Oracle Developer Suite 10g
- Modeling Tools: Erwin 7.2, MS Visio, dimensional data modeling using Star and Snowflake schemas
- Schedulers: TWS Maestro 8.5.1, AutoSys, Control-M
- Bug Tracking Tools: HP Quality Center, Problem Tracker, Jira
- Reporting Tools: Cognos 10.1.1
PROFESSIONAL EXPERIENCE:
Spark Scala & ETL Developer
Confidential, Tampa, FL
Responsibilities:
- Designed and deployed a Hadoop cluster and various analytical tools, including Pig, Hive, HBase, Sqoop, Kafka, and Spark, on the Cloudera distribution.
- Working on a live 20-node Hadoop cluster running CDH 4.4.
- Working with highly unstructured and semi-structured data, 40 TB in size (120 TB with a replication factor of 3).
- Managing external tables in Hive for optimized performance.
- Very good understanding of partitioning and bucketing in Hive (see the DDL in the sketch after this list).
- Developed Spark scripts in Scala on the Spark 1.5 framework per requirements.
- Using Spark APIs over Cloudera Hadoop YARN to perform analytics on Hive data stored in HDFS.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation, queries, and writing data back to HDFS (see the sketch after this list).
- Exploring Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, DataFrames, pair RDDs, double RDDs, and YARN.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Experience in loading data from various sources into HDFS and facilitating report building on top of it per business requirements.
- Performed transformation, cleansing, standardization, and filtering of data using Spark (Scala/Python) and loaded the final data to HDFS.
- Loaded data into Spark's immutable RDDs and performed in-memory computation for faster response times.
- Analyzing how data currently processed by Informatica can be processed more effectively using Spark and its APIs.
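A minimal Spark 1.5-era sketch of the pattern described above, assuming a HiveContext and hypothetical table, path, and column names (claims_ext, state, amount are illustrative, not from an actual project):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.sum

object CleanseAndLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CleanseAndLoad"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Partitioned, bucketed external Hive table (hypothetical schema).
    sqlContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS claims_ext (claim_id STRING, state STRING, amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (claim_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/curated/claims'""".stripMargin)

    // UDF to standardize free-text codes before aggregation.
    sqlContext.udf.register("normalize", (s: String) => if (s == null) null else s.trim.toUpperCase)

    // Read a raw JSON feed from HDFS, then cleanse and filter it.
    val raw = sqlContext.read.json("hdfs:///data/raw/claims")
    val cleaned = raw
      .filter($"amount".isNotNull && $"amount" > 0)
      .selectExpr("normalize(state) AS state", "CAST(amount AS DOUBLE) AS amount")

    // Aggregate with the DataFrame API and write the result back onto HDFS.
    val byState = cleaned.groupBy($"state").agg(sum($"amount").as("total"))
    byState.write.mode("overwrite").parquet("hdfs:///data/agg/claims_by_state")

    // Equivalent aggregation with a pair RDD, shown for comparison.
    val totals = cleaned.rdd
      .map(row => (row.getString(0), row.getDouble(1)))
      .reduceByKey(_ + _)
    totals.saveAsTextFile("hdfs:///data/agg/claims_by_state_rdd")

    sc.stop()
  }
}
```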
Technologies: Cloudera 4.4, Spark, Hive, Pig, Scala, Python, Sqoop, Oozie, Eclipse, Informatica 9.5.1, Oracle 11g, Excel files, Flat Files, AutoSys, HP Quality Center.
Sr. Informatica ETL Developer
Confidential, Hartford, CT
Responsibilities:
- Involved in the analysis, design, development, testing, and implementation of scripts, Informatica transformations, and workflows for extracting data from multiple systems and then importing the given feeds.
- Implemented work scheduling and work tracking to ensure adherence to the project plan.
- Translated high-level requirements documents into design documents.
- Involved in creating technical specifications for the ETL process based on the mapping documents.
- Designed and developed complex ETL processes per the technical specifications.
- Created reusable components like Transformations, Mapplets, Worklets, etc.
- Extracted data from different sources like SQL Server, Oracle, Excel and Flat files.
- Developed complex mappings using transformations such as Source Qualifier, Aggregator, Expression, SQL, Static Lookup, Dynamic Lookup, Filter, Router, Rank, Union, Normalizer, Sequence Generator, Update Strategy, and Joiner.
- Performed tuning of Informatica sessions by implementing database partitioning, increasing block size, data cache size, and sequence buffer length, and adjusting the target-based commit interval.
- Contributed substantially to simplifying ETL development and maintenance by creating shortcuts, reusable Mapplets, and Transformation objects.
- Extensively used Informatica to extract and transform data from different source systems and load it into the target database.
- Developed reusable transformations and Mapplets for data loads to the data warehouse and database.
- Scheduled and ran extraction and loading processes using AutoSys (see the pmcmd sketch after this list).
- Developed complex UNIX scripts to schedule Informatica jobs, handle files, and drive FTP processes.
- Developed unit test case scenarios for thoroughly testing ETL processes and shared them with the testing team.
- Developed Informatica workflows and sessions associated with the mappings.
- Documented standards for Informatica code development and prepared a handbook of standards.
- Scheduled walkthroughs of design documents, specifications, code, test plans etc. as appropriate throughout project lifecycle.
- Used the Workflow Monitor to track the progress of workflows.
- Tuned Informatica mappings and sessions for optimal performance.
- Coordinated code promotion to SIT, UAT, and finally to Prod.
- Planned and coordinated the Informatica upgrade from 9.1 to 9.5.1.
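A minimal sketch of a scheduler-friendly wrapper around Informatica's pmcmd CLI, written in Scala for consistency with the rest of this resume; the service, domain, folder, and workflow names are hypothetical placeholders:

```scala
import scala.sys.process._

object RunWorkflow {
  def main(args: Array[String]): Unit = {
    // Hypothetical names; real ones would come from the job's environment.
    val cmd = Seq(
      "pmcmd", "startworkflow",
      "-sv", "INT_SVC_DEV",   // integration service (placeholder)
      "-d", "DOM_DEV",        // domain (placeholder)
      "-uv", "PM_USER",       // environment variable holding the user name
      "-pv", "PM_PASS",       // environment variable holding the password
      "-f", "FOLDER_EDW",     // repository folder (placeholder)
      "-wait",                // block until the workflow completes
      "wf_load_claims"        // workflow name (placeholder)
    )
    val exit = cmd.!          // run pmcmd and capture its exit code
    sys.exit(exit)            // propagate the code so AutoSys can detect failure
  }
}
```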
Technologies: Informatica 9.1/9.5.1, SQL Server 2008, Teradata 13.10/13.11, Oracle 11g, Excel files, SharePoint, Flat Files, TWS Maestro 8.5.1, HP Quality Center.
ETL Module Lead
Confidential, Hartford, CT
Responsibilities:
- Developed a clear understanding of source systems by reviewing functional specification documents and through one-on-one interaction with the business team.
- Developed data mapping documents that contain transformation rules to implement the business logic.
- Developed various mappings with the collection of all sources, targets, and transformations using the Designer. Used version mapping to update the slowly changing dimensions and keep full history in the target database.
- Performed project planning, work scheduling, and work tracking to ensure adherence to the project plan.
- Translated high-level requirements documents into design documents.
- Used DEC_BASE64(MD5()) checksums to handle the Slowly Changing Dimension (SCD) mappings and loaded data from the source into the Data Mart following the Type 2 process (see the checksum sketch after this list).
- Contributed substantially to simplifying ETL development and maintenance by creating shortcuts, reusable Mapplets, and Transformation objects.
- Used the Update Strategy transformation to effectively migrate slowly changing data from the source system to the target database.
- Used transformations like Aggregator, Filter, Router, Stored Procedure, Sequence Generator, Lookup, Expression, and Update Strategy to implement business logic in the mappings.
- Created pre- and post-session stored procedures to drop and recreate the indexes and keys of source and target tables.
- Understood the components of a data quality plan (data profiling) and made informed choices between source-side and target-side data cleansing.
- Designed data transformations to staging, fact, and dimension tables in the warehouse.
- Involved in designing tables and implementing Informatica mappings and workflows to extract data from the source systems and populate the Staging Area, Dimension, and Fact tables.
- Extensively worked with various lookup caches like Static Cache, Dynamic Cache and Persistent Cache.
- Worked with pmcmd to interact with the Informatica server from the command line and execute batch scripts.
- Scheduled and ran extraction and loading processes using AutoSys.
- Developed unit test case scenarios for thoroughly testing ETL processes and shared them with the testing team.
- Developed strategies such as CDC (Change Data Capture), batch processing, auditing, and recovery.
- Tested data and data integrity among various sources and targets. Used the Debugger with breakpoints to monitor data movement and to identify and fix bugs.
- Used PowerCenter Workflow Manager for session management, database connection management, and scheduling of jobs.
- Wrote PL/SQL in Oracle and SQL in SQL Server 2008 for data audits and maintained that code.
- Developed Informatica workflows and sessions associated with the mappings.
- Scheduled walkthroughs of design documents, specifications, code, test plans etc. as appropriate throughout project lifecycle.
- Used the Workflow Monitor to track the progress of workflows.
- Tuned Informatica mappings and sessions for optimal performance.
- Promoted code to SIT, UAT, and Prod.
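A minimal Scala sketch of the MD5 checksum technique behind the SCD Type 2 handling above: hash the concatenated non-key attributes and compare against the hash stored on the current dimension row. The helper names and the delimiter are illustrative assumptions, not code from the project:

```scala
import java.security.MessageDigest

object ScdChangeDetect {
  // MD5 over the pipe-delimited non-key attributes, mirroring the
  // MD5()-based checksum computed in the Informatica expression.
  def rowHash(attrs: Seq[String]): String = {
    val md = MessageDigest.getInstance("MD5")
    md.digest(attrs.mkString("|").getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString
  }

  // A row is "changed" when the incoming hash differs from the hash
  // stored on the current (Type 2) dimension row for the same key.
  def isChanged(incoming: Seq[String], storedHash: String): Boolean =
    rowHash(incoming) != storedHash
}
```

When a change is detected, the Type 2 process would expire the current dimension row and insert a new one carrying the fresh hash.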
Technologies: Informatica 8.6.1/9.1, UNIX, DB2, Flat Files, Oracle 11g, Excel Files, Altova XMLSpy, XML Files, Toad for Db2, SQL Server 2008, AutoSys, HP Quality Center.
Sr. Informatica Consultant
Confidential, Mechanicsburg, PA
Responsibilities:
- Analyzed business requirements and worked closely with the various application teams and business teams to develop ETL procedures that are consistent across all applications and systems.
- Involved in the analysis, design, development, testing, and implementation of Informatica transformations and workflows for extracting data from multiple systems.
- Interacted with the Business Analysts in collecting the technical and business requirements for the project.
- Experience in translating high-level requirements documents into design documents.
- Extracted data from different sources like SQL Server 2008, Oracle, DB2, and flat files to load into the ODS.
- Worked with PowerCenter Designer tools like Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, and Transformation Developer.
- Extensively worked with active transformations like Filter, Sorter, Aggregator, Router, and Joiner, and with passive transformations like Expression, Lookup, and Sequence Generator.
- Created complex mappings using connected and unconnected Lookup transformations.
- Worked with the Joiner transformation using Normal, Master Outer, Detail Outer, and Full Outer joins. Implemented Slowly Changing Dimension Type 1 and Type 2 for changed data capture.
- Used Sorter and Aggregator transformations in combination to tune the performance of aggregations in the mappings. Implemented active transformations like Filter as early as possible in the mapping.
- Worked with various Informatica PowerCenter objects like mappings, transformations, Mapplets, workflows, and Session tasks.
- Created pre- and post-session stored procedures to drop and recreate the indexes and keys of source and target tables.
- Created transformations like Sequence Generator, Lookup, Joiner, and Source Qualifier in the Informatica Designer.
- Monitored workflows and sessions using the PowerCenter Workflow Monitor and used the Informatica Scheduler to schedule workflows.
- Developed complex mappings to implement Type 2 slowly changing dimensions using transformations such as Source Qualifier, Aggregator, Expression, Static Lookup, Dynamic Lookup, Filter, Router, Rank, Union, Normalizer, Sequence Generator, Update Strategy, and Joiner.
- Involved in designing tables and implementing Informatica mappings and workflows to extract data from the source systems and populate the Staging Area, Dimension, and Fact tables.
- Performed tuning of Informatica sessions by implementing database partitioning, increasing block size, data cache size, and sequence buffer length, and adjusting the target-based commit interval.
- Created Mapplets and used them in different mappings.
- Developed Informatica workflows and sessions associated with the mappings.
- Performed error handling of sessions using the Terse, Normal, Verbose Initialization, and Verbose Data tracing levels.
- Scheduled walkthroughs of design documents, specifications, code, test plans etc. as appropriate throughout project lifecycle.
- Used the Workflow Monitor to track the progress of workflows.
- Involved in post-production support once the code moved to production.
Technologies: Informatica 8.6.1, UNIX, Linux, Oracle 10g, SQL Server 2005 and 2008, Flat Files, HP Quality Center.