Big Data - Hadoop Developer / ETL Lead Developer & Data Architect Resume
KS
PROFESSIONAL SUMMARY:
- Over 10 years of experience in ETL development and data analysis (data warehouse implementation and development) in the Retail, Healthcare, and Banking domains.
- 2 years of work experience as a Hadoop Developer with good knowledge of the Hadoop framework, the Hadoop Distributed File System, and parallel processing implementations.
- Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, and AWS.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands-on experience importing and exporting data using the Hadoop data management tool Sqoop.
- In depth understanding of Data Structures and Algorithms.
- Strong experience in writing MapReduce programs for data analysis; hands-on experience in writing custom partitioners for MapReduce.
- Performed data analysis using Hive and Pig.
- Complete SDLC experience; deployed critical processes within committed timelines.
- Strong production support experience, handling more than three projects in parallel.
- Knowledge about Software Development Lifecycle (SDLC), Agile, Application Maintenance Change Process (AMCP).
- Expertise in Mapping Designer, Workflow Manager, Repository Manager, and Workflow Monitor. Experienced in overall data warehouse, database, ETL, and performance tuning.
- Involved in preparing ETL mapping specification documents and Transformation rules for the mapping.
- Implemented various ETL solutions per business requirements using Informatica 9.x/8.x.
- Extensively worked on Informatica Power Center transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Normalizer, Joiner, Update Strategy, Rank, Aggregator, Stored Procedure, Sorter, and Sequence Generator.
- Experienced in data quality, data modeling, data warehousing, and Star/Snowflake schema design, covering requirements analysis and definition, database design, testing, implementation, and quality processes.
- In-depth knowledge and understanding of dimensional modeling (Star and Snowflake schemas, SCD Types 1, 2, and 3) and of data modeling at the logical and physical levels.
- Involved in performance tuning of Informatica mappings, sessions, and SQL.
- Expertise in integration of various data source definitions like Netezza, SQL Server, Oracle, Flat Files, Excel, SAP, and XML.
- Experienced with Metadata Manager to generate lineage and load Metadata Manager resources.
- Extensively worked with SQL*Loader for bulk loading into the Netezza database.
- Experience in the complete lifecycle of test case design, test plans, test execution, and defect management.
- Experience in UNIX Shell and Perl Scripting.
- Excellent communication and interpersonal skills, ability to learn quickly, good analytical reasoning and high adaptability to new technologies and tools.
- Team player with good interpersonal and problem-solving skills; able to work both in a team and independently.
TECHNICAL SKILLS:
ETL Tools: Informatica Power Center 10.1/9.6.1/9.1/8.6.1, Power Exchange and Power Connect
Databases: Netezza, Oracle 11g/10g/9i, SQL Server 2012/2008, and Teradata.
BI Reporting Tools: Oracle Business Intelligence (OBIEE 11.1.1.5/11.x, OBIEE 10.1.3.4.1/10.x), Dashboards, BI Publisher, BI Administration.
Methodologies: Ralph Kimball dimensional Modeling, Data warehousing
Big Data (Hadoop, AWS, Azure): Hadoop, Hive, Pig, Spark, Flume, Sqoop, HBase, Kafka, Redshift, Azure PDW
Cloud Computing: AWS, S3, EC2, VPC, EBS, Snowball, Load Balancer, Auto Scaling, RDS, Redshift, SQS, SNS, Lambda
Languages: SQL, PL/SQL, HTML, DHTML, XML, UNIX Shell Script, PowerShell.
Data Modeling Tools: MS Office Suite, MS project, MS Visio, Model Right (Data Modeler Tool), Erwin Data Modeler 4.1, UML
Web: Oracle Apex, Microsoft Front-page, HTML, DHTML and XML.
Operating Systems: Linux (RHEL), Windows 8/7/NT, UNIX
Multimedia Tools: Adobe Photoshop and Dreamweaver.
PROFESSIONAL EXPERIENCE:
Confidential, KS
Big Data - Hadoop Developer / ETL Lead Developer & Data Architect
Responsibilities:
- Served as Technical Lead, Business Analyst, and Hadoop Developer.
- Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed simple to complex MapReduce jobs using Hive to cleanse data and load it downstream.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the Sqoop/Hive sketch below).
- Exported the analyzed data from Hive tables to SQL databases using Sqoop for visualization and for generating reports for the BI team.
- Extensively used Hive for data cleansing.
- Created partitioned tables in Hive; managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used UNIX bash scripts to validate files moved from the UNIX file system to HDFS (see the validation sketch below).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and managed data coming from different sources.
- Worked on Informatica Power Center tools: Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
- Translated high-level design specifications into simple ETL coding and mapping standards.
- Designed and customized data models for the data warehouse, supporting data from multiple sources in real time.
- Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
- Created mapping documents to outline data flow from sources to targets.
- Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Extracted data from flat files and other RDBMS databases into the staging area and populated the data warehouse.
- Maintained stored definitions, transformation rules, and target definitions using Informatica Repository Manager.
- Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
- Developed mapping parameters and variables to support SQL override.
- Created mapplets to use them in different mappings.
- Developed mappings to load into staging tables and then to Dimensions and Facts.
- Used existing ETL standards to develop these mappings.
- Worked on different workflow tasks such as Session, Event Raise, Event Wait, Decision, Email, Command, Worklet, Assignment, and Timer, as well as workflow scheduling.
- Created sessions and configured workflows to extract data from various sources, transform it, and load it into the data warehouse.
- Used SCD Type 1 and Type 2 mappings to update Slowly Changing Dimension tables.
- Extensively used SQL*Loader to load data from flat files into Oracle database tables.
- Modified existing mappings for enhancements of new business requirements.
- Used Debugger to test the mappings and fixed the bugs.
- Wrote UNIX shell scripts and pmcmd commands for FTP of files from the remote server and for backups of the repository and folders.
- Involved in Performance tuning at source, target, mappings, sessions, and system levels.
- Prepared migration document to move the mappings from development to testing and then to production repositories.
Environment: Informatica Power Center 10.1, Netezza, Oracle 11gR1, AIX 6.1/Linux RHEL 7.x, SQL Server 2016/2014, SQL Navigator, SQL, PL/SQL, SAP, Autosys.
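A minimal sketch of the Sqoop import and partitioned Hive load described above, assuming a MySQL source; the connection string, tables (orders, stg_orders, orders_part), and paths are hypothetical placeholders rather than the actual project objects.

```bash
#!/bin/bash
# Hypothetical example: pull a MySQL table into HDFS with Sqoop,
# then load it into a date-partitioned Hive table.

# 1. Import the source table from MySQL into HDFS (placeholder connection/table).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user --password-file /user/etl/.mysql.pwd \
  --table orders \
  --target-dir /data/staging/orders \
  --fields-terminated-by '|' \
  --num-mappers 4

# 2. Load the imported files into a staging table, then into a partitioned table.
hive -e "
  CREATE TABLE IF NOT EXISTS stg_orders (
    order_id BIGINT, cust_id BIGINT, amount DECIMAL(12,2), order_dt STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

  LOAD DATA INPATH '/data/staging/orders' INTO TABLE stg_orders;

  CREATE TABLE IF NOT EXISTS orders_part (
    order_id BIGINT, cust_id BIGINT, amount DECIMAL(12,2))
  PARTITIONED BY (order_dt STRING);

  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  INSERT OVERWRITE TABLE orders_part PARTITION (order_dt)
  SELECT order_id, cust_id, amount, order_dt FROM stg_orders;
"
```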
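A small bash sketch in the spirit of the UNIX-to-HDFS file validation mentioned above, comparing existence and record counts; the file paths are illustrative only.

```bash
#!/bin/bash
# Hypothetical validation: confirm a local feed file landed in HDFS intact
# by checking existence and comparing record counts.
set -euo pipefail

LOCAL_FILE=/data/inbound/orders_20180101.dat     # placeholder path
HDFS_FILE=/landing/orders/orders_20180101.dat    # placeholder path

# Fail fast if the HDFS copy is missing.
if ! hdfs dfs -test -e "$HDFS_FILE"; then
  echo "ERROR: $HDFS_FILE not found in HDFS" >&2
  exit 1
fi

local_count=$(wc -l < "$LOCAL_FILE")
hdfs_count=$(hdfs dfs -cat "$HDFS_FILE" | wc -l)

if [ "$local_count" -ne "$hdfs_count" ]; then
  echo "ERROR: record count mismatch (local=$local_count hdfs=$hdfs_count)" >&2
  exit 1
fi
echo "Validation passed: $hdfs_count records."
```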
Confidential, KS
Informatica ETL Lead Developer / Data Analyst & Architect
Responsibilities:
- Involved in the technical and business analysis of user requirements and in identifying the sources.
- Created technical specification documents based on the requirements, using source-to-target (S2T) documents.
- Involved in the preparation of high-level and low-level design documents.
- Involved in Design, analysis, Implementation, Testing and support of ETL processes for Stage, ODS and Mart.
- Prepared ETL standards, naming conventions and wrote ETL flow documentation for Stage, ODS and Mart.
- Followed the Ralph Kimball approach (a bottom-up data warehouse methodology in which individual data marts such as the Shipment, Job Order Cost, Net Contribution, and Detention & Demurrage marts provide views into organizational data and are later combined into the Management Information System (MIS)).
- Prepared a Level 2 update plan to assign work to team members and track the status of each task.
- Designed and developed Informatica mappings and sessions based on business user requirements and business rules to load data from source flat files and Oracle tables into target tables.
- Worked on various kinds of transformations like Expression, Aggregator, Stored Procedure, Lookup, Filter, Joiner, Rank, Router and Update Strategy.
- Developed reusable Mapplets and Transformations.
- Used debugger to debug mappings to gain troubleshooting information about data and error conditions.
- Involved in monitoring the workflows and in optimizing the load times.
- Used Change Data Capture (CDC) to simplify ETL in data warehouse applications.
- Involved in writing procedures, functions in PL/SQL.
- Worked with the SQL*Loader tool to bulk-load data into the database (see the loader sketch below).
- Prepared UNIX shell scripts, scheduled in Autosys for automatic execution at specific times.
- Used Rational ClearCase for version control of all files and folders (check-out, check-in).
- Prepared test scenarios and test cases in HP Quality Center and was involved in unit testing of mappings, system testing, and user acceptance testing.
- Performed defect tracking and reporting with Rational ClearQuest.
Environment: Informatica Power Center 9.6, Flat files, Oracle 11g, SQL Server 2014/2012, Windows 7/NT, UNIX/Linux, Autosys and Netezza.
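An illustrative SQL*Loader invocation for the bulk loads noted above; the control file layout, table, credentials, and file names are placeholders, not the project's actual objects.

```bash
#!/bin/bash
# Hypothetical bulk load of a pipe-delimited flat file into a staging table.

# Control file describing the input layout (placeholder table and columns).
cat > shipments.ctl <<'EOF'
LOAD DATA
INFILE '/data/inbound/shipments.dat'
APPEND
INTO TABLE stg_shipments
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(shipment_id, origin, destination, ship_dt DATE "YYYY-MM-DD", weight_kg)
EOF

# Run SQL*Loader; ORA_PWD is assumed to be exported from a secured parameter file.
sqlldr userid=etl_user/${ORA_PWD}@DWDEV control=shipments.ctl \
       log=shipments.log bad=shipments.bad errors=50
```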
Confidential
Senior ETL Developer & Data Analyst
Responsibilities:
- Data analysis for the complete project life cycle and development.
- Interacted with product owners & DBA teams to design the project for ETL process.
- Responsible for the development, support, and maintenance of ETL (Extract, Transform and Load) processes using Informatica Power Center.
- Integrated heterogeneous data sources such as Oracle, DB2, SQL Server, and flat files (fixed-width and delimited) into the staging area.
- Wrote SQL overrides and used filter conditions in the Source Qualifier, thereby improving mapping performance.
- Designed and developed mappings using Source Qualifier, Expression, Lookup, Router, Aggregator, Filter, Sequence Generator, Stored Procedure, Update Strategy, joiner and Rank transformations.
- Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
- Used debugger to validate the mappings and gain troubleshooting information about data and error conditions.
- Extensively used UNIX scripting and scheduled pmcmd and pmrep commands to interact with the Informatica server from the command line (see the sketch below).
- Implemented performance tuning techniques by identifying and resolving the bottlenecks in source, target, transformations, mappings and sessions to improve performance.
- Troubleshot production failures and provided root cause analysis; worked on emergency code fixes in production.
Environment: Informatica Power Center 9.6, Netezza, Oracle 11gR1, AIX 6.1/Linux RHEL 7.x, SQL Server 2016/2014, SQL Navigator, SQL, PL/SQL, SAP, Autosys.
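A hedged sketch of driving the Informatica server from the command line with pmcmd and pmrep, as mentioned above; the domain, service, repository, folder, and workflow names are placeholders, and credentials are assumed to be exported as environment variables.

```bash
#!/bin/bash
# Hypothetical command-line interaction with the Informatica server.

DOMAIN=Domain_DW          # placeholder domain
INT_SVC=IS_DW             # placeholder Integration Service
REPO=REP_DW               # placeholder repository
FOLDER=F_SALES            # placeholder folder
WF=wf_load_sales_mart     # placeholder workflow

# Start a workflow and wait for it to finish; a non-zero exit code means failure.
pmcmd startworkflow -sv "$INT_SVC" -d "$DOMAIN" \
      -u "$INFA_USER" -p "$INFA_PWD" -f "$FOLDER" -wait "$WF"

if [ $? -ne 0 ]; then
  echo "Workflow $WF failed" >&2
  exit 1
fi

# Connect to the repository and take a backup with pmrep.
pmrep connect -r "$REPO" -d "$DOMAIN" -n "$INFA_USER" -x "$INFA_PWD"
pmrep backup -o /backups/${REPO}_$(date +%Y%m%d).rep
```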
Confidential
ETL/Informatica Developer
Responsibilities:
- Documented all technical and system specifications for all ETL processes, performed unit tests on all processes, and prepared the required programs and scripts.
- Provided technical knowledge of Extract/Transform/Load (ETL) solutions for Business Intelligence projects.
- Worked closely with the project Business Analyst, Data Modeler, and BI Lead to ensure that the end-to-end designs met the business and data requirements.
- Developed the task plan for ETL developers for a specific project.
- Ensured that the delivered ETL code ran and conformed to specifications and design guidelines.
- Proficient in database SQL for user-defined database extract and update statements; SQL and database skills are the two key building blocks for ETL work.
- Understood the range of options and best practices for common ETL design techniques such as change data capture, key generation, and optimization; developed mappings, sessions, and workflows considering all dependencies and maintaining all project standards (see the incremental-extract sketch below).
- Performed data profiling and data quality checks, and used Erwin for data modeling.
- Extensively involved in Data Extraction, Transformation and Loading (ETL process) from Source to target systems using Informatica Power Center
- Owned the assigned reports, worked on them and updated the Report Development Scheduler for status on each report.
- Responsible for determining the bottlenecks and fixing the bottlenecks with performance tuning.
- Analyzed business process workflows and developed ETL procedures to move data from various source systems to target systems
Environment: Informatica Power Center 9.0.1/8.6.1, Netezza, Oracle 9i, Toad, Control-M, UNIX, Flat Files.
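A simplified, timestamp-based incremental extract illustrating one common change-data-capture style (not necessarily the exact CDC mechanism used on this project); the watermark path, connection, table, and column names are placeholders.

```bash
#!/bin/bash
# Hypothetical incremental extract: pull only rows changed since the last run,
# using a last-updated timestamp and a persisted watermark file.

WATERMARK_FILE=/etl/watermarks/customers.last_run    # placeholder path
LAST_RUN=$(cat "$WATERMARK_FILE" 2>/dev/null || echo '1900-01-01 00:00:00')
NOW=$(date '+%Y-%m-%d %H:%M:%S')

# Spool the changed rows to a delimited file for the downstream mapping.
sqlplus -s etl_user/"$ORA_PWD"@SRCDB <<EOF
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 LINESIZE 2000 COLSEP '|'
SPOOL /data/extracts/customers_delta.dat
SELECT customer_id, customer_name, status, last_update_ts
  FROM customers
 WHERE last_update_ts > TO_TIMESTAMP('$LAST_RUN', 'YYYY-MM-DD HH24:MI:SS')
   AND last_update_ts <= TO_TIMESTAMP('$NOW', 'YYYY-MM-DD HH24:MI:SS');
SPOOL OFF
EXIT
EOF

# Advance the watermark only after a successful extract.
echo "$NOW" > "$WATERMARK_FILE"
```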
Confidential
ETL/ Informatica Developer
Responsibilities:
- Analyzed the requirements and prepared the initial-level technical specification document (TSD).
- Designed and built the structure of the mappings and workflows.
- Developed the ETL processes using Informatica tool to load data from text file into the Staging area.
- Developed mappings/sessions using Informatica 8.6.1 for data loading.
- Developed BTEQ scripts to load data from staging to the landing zone (see the sketch below).
- Created various transformations such as Source Qualifier, Aggregator, Joiner, Expression, Lookup, and Sorter.
- Loaded data through BTEQ scripts from CSA to the Edward area.
- Extensively used the Toad utility for executing SQL scripts and worked on SQL to enhance the performance of the conversion mappings.
- Created Test cases for the mappings developed and then created integration Testing Document.
- Prepared the error handling document to maintain the error handling process.
- Automated the Informatica jobs using UNIX shell scripting
- Involved in analysis and data validations
Environment: Informatica Power Center 8.6.1, Teradata, VA Secure FX SSB, UNIX, Flat Files.
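A minimal BTEQ sketch, wrapped in a shell script, of the staging-to-landing loads described above; the TDPID, credentials, and table names are hypothetical.

```bash
#!/bin/bash
# Hypothetical BTEQ load from a staging table to a landing-zone table.
# TD_PWD is assumed to be exported from a secured parameter file.

bteq <<EOF
.LOGON tdprod/etl_user,${TD_PWD}
.SET ERROROUT STDOUT

INSERT INTO landing.orders_lz (order_id, cust_id, amount, load_dt)
SELECT order_id, cust_id, amount, CURRENT_TIMESTAMP
  FROM staging.orders_stg
 WHERE load_dt = CURRENT_DATE;

.IF ERRORCODE <> 0 THEN .QUIT 8
.LOGOFF
.QUIT 0
EOF
```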
Confidential
ETL/ Informatica Developer
Responsibilities:
- Provided solutions after understanding the requirements.
- Designed and built the structure of the mappings and workflows.
- Developed ETL processes using Informatica to load data from Teradata into the target flat file.
- Extensively used SQL scripts/queries for data verification at the backend.
- Executed SQL queries, stored procedures and performed data validation as a part of backend testing.
- Used SQL to test various reports and ETL job loads in development, testing, and production.
- Developed mappings and sessions using Informatica 8.6.1 for data loading, including mappings to load data into flat files.
- Created various transformations such as Source Qualifier, Aggregator, Joiner, Expression, Lookup, and Sorter.
- Generated extracts from the SSB DW; the reports were FTPed to the business location as pipe-separated flat files (see the export sketch below).
- Involved in mapping-level testing and prepared unit test cases (UTC) for every module.
Environment: Informatica Power Center 8.6.1, Teradata, VA Secure FX SSB, UNIX, Flat Files.
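A sketch of the pipe-delimited extract and FTP delivery mentioned above; the host names, credentials, query, and file names are illustrative placeholders only.

```bash
#!/bin/bash
# Hypothetical extract: export a pipe-separated report file from Teradata
# with BTEQ, then FTP it to the business location.

OUT=/data/extracts/claims_report.dat   # placeholder output file

bteq <<EOF
.LOGON tdprod/etl_user,${TD_PWD}
.SET SEPARATOR '|'
.SET WIDTH 5000
.EXPORT REPORT FILE = ${OUT}
SELECT claim_id, member_id, claim_amt, service_dt
  FROM dw.claims
 WHERE service_dt >= CURRENT_DATE - 7;
.EXPORT RESET
.LOGOFF
.QUIT 0
EOF

# Deliver the file (plain FTP shown for illustration; credentials are placeholders).
ftp -inv ftp.businesshost.com <<EOF
user ftp_user ${FTP_PWD}
put ${OUT} claims_report.dat
bye
EOF
```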
Confidential
ETL/ Informatica Developer
Responsibilities:
- Responsible for designing and preparing LLDs and technical design documents for the client and offshore team, including all transformation-level detail.
- Developed mappings, sessions, and workflows considering all dependencies and maintaining all project standards.
- Performed data profiling and data quality checks, and used Erwin for data modeling.
- Extensively involved in Data Extraction, Transformation and Loading (ETL process) from Source to target systems using Informatica Power Center
- Owned the assigned reports, worked on them and updated the Report Development Scheduler for status on each report.
- Responsible for determining the bottlenecks and fixing the bottlenecks with performance tuning.
- Analyzed business process workflows and developed ETL procedures to move data from various source systems to target systems
Environment: Informatica Power Center 9.0.1/8.6.1, Netezza, Oracle 9i, Toad, Control-M, UNIX, Flat Files.
Confidential
Software Engineer
Responsibilities:
- Extensively involved in requirements gathering, writing ETL Specs and preparing design documents
- Designed and developed Informatica mappings for data sharing between interfaces utilizing SCD Type 2 and CDC methodologies (see the sketch below).
- Fixed various performance bottlenecks involving huge data sets by utilizing Informatica partitioning, pushdown optimization, and SQL overrides.
- Worked on parameters, variables, procedures, scheduling, and pre/post-session shell scripts.
- Built sample MicroStrategy reports to validate BI requirements and the loaded data.
- Designed migration plan and cutover documents; created and monitored Informatica batches
- Worked on requirement traceability matrix, provided support for integration and user acceptance testing
Environment: Informatica Power Center 8.1, Oracle 8, SQL Developer, UNIX, HP Quality Center.
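A hedged sketch of an SCD Type 2 update expressed as SQL run from a shell wrapper; the project implemented this through Informatica mappings, and the SQL here only illustrates the technique. The dimension and staging tables, columns, and sequence are hypothetical.

```bash
#!/bin/bash
# Hypothetical SCD Type 2 maintenance of a customer dimension:
# expire changed current rows, then insert new current versions.

sqlplus -s etl_user/"$ORA_PWD"@DWDB <<'EOF'
-- 1. Expire the current dimension rows whose attributes changed in staging.
UPDATE dim_customer d
   SET d.eff_end_dt = SYSDATE,
       d.current_flag = 'N'
 WHERE d.current_flag = 'Y'
   AND EXISTS (SELECT 1
                 FROM stg_customer s
                WHERE s.customer_id = d.customer_id
                  AND (s.customer_name <> d.customer_name
                       OR s.segment <> d.segment));

-- 2. Insert a new current version for changed and brand-new customers.
INSERT INTO dim_customer
  (customer_key, customer_id, customer_name, segment,
   eff_start_dt, eff_end_dt, current_flag)
SELECT dim_customer_seq.NEXTVAL, s.customer_id, s.customer_name, s.segment,
       SYSDATE, NULL, 'Y'
  FROM stg_customer s
 WHERE NOT EXISTS (SELECT 1
                     FROM dim_customer d
                    WHERE d.customer_id = s.customer_id
                      AND d.current_flag = 'Y');

COMMIT;
EXIT
EOF
```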