Sr. Data Engineer Resume
Lincoln, NE
SUMMARY
- 12+ years of Information Technology experience across the full software development life cycle.
- Experience in developing, deploying, and supporting enterprise applications using Hadoop ecosystem components such as HDFS, YARN, Spark, Kafka, HBase, Pig, Impala, Sqoop, Oozie, Flume, and Storm.
- Experience working on a 40-node Hadoop cluster.
- Experience in the design, development, and implementation of data pipelines using Spark.
- Experience importing and exporting data between relational databases and HDFS using Spark and Sqoop.
- Experience in using various file formats such as CSV, XML, JSON, Avro, and Parquet.
- Experience in writing MapReduce Programs.
- Experience in NoSQL databases such as HBase and Cassandra.
- Experience in loading data into HDFS using Flume.
- Experience in loading data from the Linux file system into HDFS.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation, querying, and writing data back to HDFS (a short sketch follows this summary).
- Performed transformations, cleaning, standardization, and filtering of data using Spark with Scala/Python, and loaded the final required data to HDFS.
- Experience in functional programming using Scala and Python.
- Experience in managing external tables in Hive for optimized performance.
- Excellent understanding of partitions and bucketing in Hive.
- Experience in automation and in building CI/CD pipelines using Jenkins.
- Experience in building, deploying, and integrating with Maven.
- Knowledge of job workflow scheduling and coordination tools such as Oozie and ZooKeeper for Big Data projects.
- Experience in various AWS services such as S3, EC2, Athena, and EMR.
- Experienced in creating conceptual, logical, and physical models of relational databases in Visio and Erwin.
- Excellent knowledge of and experience in the data warehouse development life cycle, dimensional modeling, repository management, and administration.
- Excellent knowledge of and experience in implementing star and snowflake schemas and slowly changing dimensions in data marts.
- Excellent knowledge of and experience in loading data into conformed dimensions and fact tables in data marts.
- Extensive experience in Datastage design, mapping, extraction, migration, and development of ETL components, and in integrating with external systems.
- Extensive experience in low-level and high-level design and in mapping documents for designing Datastage jobs.
- Involved in the design of ETL architecture and code reviews, and implemented best practices in Datastage jobs.
- Migrated projects across various versions and involved in testing of different projects.
- Extensively worked on writing test scripts and test plans for project testing.
- Experience in source-system analysis and data extraction from various sources such as flat files, complex flat files, Oracle 8i/9i/10g/11g, DB2, UDB 9.0, MS SQL Server 2005, Informix, Sybase, Teradata, and MS Access.
- Extensively worked on loading data into Teradata using Teradata utilities (BTEQ, FASTLOAD, FASTEXPORT, MULTILOAD, and TPUMP).
- Extensive programming experience using SQL, PL/SQL (stored procedures, functions, and triggers), Oracle, SQL*Loader, and UNIX shell programming.
- Designed and developed UNIX shell scripts for file validation and for scheduling Datastage jobs.
- Proficient in using SVN version control.
- Providing 24/7 data warehouse production support in time-pressured business environments.
- Excellent communication, client interaction, analytical, and problem-solving skills.
- Ability to learn and adapt quickly to new technologies.
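For illustration, here is a minimal Spark 2.x sketch in Scala of the DataFrame/UDF and Hive work summarized above: a UDF applied through the DataFrame API, an aggregation, and a write to a partitioned, bucketed Hive table. All paths, table names, and column names are hypothetical placeholders, not taken from any specific project below.

```scala
// Minimal sketch (hypothetical names and paths): a UDF plus a DataFrame
// aggregation, written back to HDFS as a partitioned, bucketed Hive table.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailySalesRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailySalesRollup")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Example UDF: standardize free-form region codes before aggregating.
    val normalizeRegion = udf((r: String) =>
      Option(r).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val sales = spark.read.parquet("hdfs:///data/raw/sales")  // placeholder path
      .withColumn("region", normalizeRegion($"region"))
      .filter($"amount" > 0)                                  // basic cleaning/filtering

    val daily = sales
      .groupBy($"sale_date", $"region")
      .agg(sum($"amount").alias("total_amount"),
           count(lit(1)).alias("txn_count"))

    // Partitioning by date prunes range scans; bucketing by region speeds joins.
    daily.write.mode("overwrite")
      .partitionBy("sale_date")
      .bucketBy(8, "region")
      .sortBy("region")
      .saveAsTable("analytics.daily_sales")                   // placeholder table

    spark.stop()
  }
}
```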
TECHNICAL SKILLS
- IBM Datastage 11.7/11.3/9.1/8.5/8.1/7.5/7.1 EE (Parallel Extender)
- Datastage Server Edition
- Oracle 12c/11g/10g/9i/8i/8.0/7.0
- Teradata
- SalesForce.com
- DB2 UDB
- MS SQL Server
- SSAS
- MS Access
- Windows XP/7/10
- LINUX/UNIX/AIX
- Autosys
- Tivoli
- T-SQL
- PL/SQL
- UNIX Shell Scripting
- TOAD
- SSMS
- Erwin 4.0
- Visio
- HDFS
- Spark
- Hive
- Scala
- Pig
- Python
- Sqoop
- Oozie
- IntelliJ
PROFESSIONAL EXPERIENCE
Confidential, Lincoln, NE
Sr. Data Engineer
Responsibilities:
- Interacted with users and gathered functional requirements for the application.
- Prepared Mapping and technical design documents.
- Designed and developed an audit, balance, and control framework for ETL.
- Performed analysis to convert the existing ETL process to Hadoop and Spark.
- Worked on a 20-node Hadoop cluster.
- Analyzed how the data processed by Datastage ETL could be processed effectively using Spark and its APIs.
- Analyzed SQL scripts and stored procedures and designed solutions using Spark.
- Developed Spark scripts in Scala per requirements on the Spark 2.x framework (Scala 2.11).
- Performed transformations, cleaning, standardization, and filtering of data using Spark with Scala/Python, and loaded the final required data to HDFS and Hive tables.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation, querying, and writing data back to HDFS.
- Developed Spark code and Spark SQL for faster testing and processing of data.
- Developed Spark code to parse XML/JSON files.
- Developed Spark code to import data from SQL Server to HDFS (a short sketch follows this list).
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark context, Spark DataFrames, pair RDDs, double RDDs, and YARN.
- Experience in ingesting data from various sources into HDFS and facilitating report building on top of it per business requirements.
- Managed external tables in Hive for optimized performance.
- Very good understanding of partitions and bucketing in Hive.
- Worked on setting up the SFTP process.
- Developed UNIX shell scripts for file watching, for archiving and purging files/datasets, and for SFTPing files to and from remote servers.
- Developed a UNIX shell script to validate flat files.
- Wrote complex SQL queries to extract data from databases and perform data analysis.
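As referenced above, a hedged sketch of a Spark JDBC import from SQL Server into HDFS. The connection URL, credentials, table name, and partition bounds are illustrative assumptions, and the Microsoft JDBC driver jar is assumed to be on the Spark classpath. The partitioned read spreads the extract across executors instead of funneling it through a single connection.

```scala
// Hedged sketch: parallel JDBC extract from SQL Server, landed on HDFS
// as Parquet. All connection details and names are placeholders.
import org.apache.spark.sql.SparkSession

object SqlServerToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SqlServerToHdfs").getOrCreate()

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=claims") // placeholder
      .option("dbtable", "dbo.members")                                  // placeholder
      .option("user", sys.env.getOrElse("DB_USER", "etl_user"))
      .option("password", sys.env.getOrElse("DB_PASS", ""))
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      // Partitioned read: splits the extract by key range across executors.
      .option("partitionColumn", "member_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Land the extract on HDFS as Parquet for downstream Hive/Spark use.
    df.write.mode("overwrite").parquet("hdfs:///data/landing/members")

    spark.stop()
  }
}
```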
Environment: Infosphere Datastage 11.7, Quality Stage 11.7, Oracle 11g, Teradata, SQL Server, DB2, MS Access, Rational Software, Perl, UNIX, MS Office Suite, Visio, Toad, Autosys, Serena, Crontab, Spark, Hive, Scala, Python, HDFS, Pig, IntelliJ.
Confidential, Lansing, MI
Sr. Data Engineer
Responsibilities:
- Interacted with end users in finalizing the requirements and documented the Program Specifications for the ETL jobs.
- Prepared ETL low level design document.
- Prepared Data Mapping Documents.
- Designed and developed an audit, balance, and control framework for ETL.
- Extensively worked with XML file processing.
- Prepared an ETL naming standards document.
- Extensively used the XML Hierarchical stage to parse, compose, and validate XML files.
- Designed and developed Datastage parallel jobs and sequence jobs using IIS Suite 11.3.
- Designed and developed parallel jobs using Transformer, Sequential File, Dataset, Sort, Join, Merge, Lookup, Change Apply, Change Capture, Remove Duplicates, Funnel, Filter, Copy, Column Generator, Peek, Modify, Compare, Surrogate Key, Aggregator, Row Generator, Pivot, ODBC Connector, database connector, Head, and Tail stages.
- Extensively used Datastage 11.3 stages like Row Generator, Column Generator, and Peek for development and debugging purposes.
- Extensively worked with Join, Look up (Normal and Sparse) and Merge stages.
- Extensively worked with sequential file, dataset, file set and look up file set stages.
- Designed and developed sequence jobs to execute and control parallel jobs using Job Activity, Routine Activity, Execute Command activity, User Variables activity, Nested Condition activity, Email Notification activity, Start Loop/End Loop activities, and the Exception Handler activity.
- Designed and developed reusable ETL components such as shared containers and multi-instance jobs (schema files, runtime column propagation, etc.).
- Parameterized Datastage jobs to make them reusable.
- Developed Datastage server routines using the Datastage BASIC language as part of the development process.
- Used Datastage Director to monitor jobs, view job logs to debug errors, and schedule jobs.
- Analyzed the performance of the jobs and enhanced it using standard techniques.
- Worked on setting up the SFTP process.
- Developed UNIX shell scripts for file watching, for archiving and purging files/datasets, and for SFTPing files to and from remote servers.
- Developed a UNIX shell script to validate flat files.
- Developed a UNIX shell script to execute the Datastage jobs.
- Wrote complex SQL queries to extract data from databases and perform data analysis.
- Extensive experience extracting data from Oracle, Teradata, DB2, and SQL Server databases.
- Developed stored procedures and called them from Datastage jobs.
- Provided support as Datastage administrator to create projects, clear logs, and unlock jobs.
- Prepared test cases and test plans for unit testing.
- Performed unit testing and integration testing in the test environment.
- Migrated ETL code from the dev environment to higher environments.
- Scheduled Datastage jobs from Autosys.
- Provided 24x7 production support.
- Worked on a 20-node Hadoop cluster.
- Developed Spark scripts in Scala per requirements on the Spark 2.x framework (Scala 2.11).
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark for data aggregation, querying, and writing data back to HDFS.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark context, Spark DataFrames, pair RDDs, double RDDs, and YARN.
- Developed Spark code and Spark SQL for faster testing and processing of data.
- Developed Spark code to parse XML/JSON files (see the JSON sketch following this list).
- Experience importing data from various sources into HDFS and facilitating report building on top of it per business requirements.
- Performed transformations, cleaning, standardization, and filtering of data using Spark with Scala/Python, and loaded the final required data to HDFS.
- Analyzed how the data processed by Datastage could be processed effectively using Spark and its APIs.
- Managed external tables in Hive for optimized performance.
- Very good understanding of partitions and bucketing in Hive.
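As referenced in the list above, a minimal sketch of Spark JSON parsing with an explicit schema; declaring the schema up front avoids a costly inference pass over large files. Field names and paths are hypothetical, and XML parsing, which would typically rely on the separate spark-xml package, is not shown.

```scala
// Minimal sketch (hypothetical fields/paths): parse a JSON feed with an
// explicit schema, then validate it quickly with Spark SQL.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object ParseJsonFeed {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ParseJsonFeed").getOrCreate()

    val schema = StructType(Seq(
      StructField("claim_id",   StringType, nullable = false),
      StructField("member_id",  StringType, nullable = true),
      StructField("amount",     DoubleType, nullable = true),
      StructField("service_dt", DateType,   nullable = true)
    ))

    val claims = spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")   // tolerate malformed records
      .json("hdfs:///data/inbound/claims_json")  // placeholder path

    claims.createOrReplaceTempView("claims_raw")
    // Spark SQL used for a quick sanity check on the parsed feed.
    spark.sql(
      "SELECT service_dt, COUNT(*) AS cnt FROM claims_raw GROUP BY service_dt"
    ).show()

    spark.stop()
  }
}
```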
Environment: Infosphere Datastage 11.7, Quality Stage 11.7, Oracle 11g, Teradata, SQL Server, DB2, MS Access, Rational Software, Perl, UNIX, MS Office Suite, Visio, Toad, Autosys, Serena, Crontab, Spark, Hive, Scala, HDFS, Pig, IntelliJ.
Confidential, Parsippany, NJ
Lead ETL- Datastage Developer
Responsibilities:
- Interacted with end users in finalizing the requirements and documented the Program Specifications for the ETL jobs.
- Prepared Data Mapping Documents and designed the ETL jobs based on the DMD with the required tables in the dev environment.
- Developed Datastage parallel jobs in which, using the required stages, data from different sources was formatted, cleansed, summarized, aggregated, and transformed for loading into SAP.
- Designed several parallel jobs using Sequential File, Dataset, Join, Merge, Lookup, Change Apply, Change Capture, Remove Duplicates, Funnel, Filter, Copy, Column Generator, Peek, Modify, Compare, Surrogate Key, Aggregator, Transformer, and Row Generator stages.
- Extensively used Datastage 9.1 stages like Row Generator, Column Generator, Head, and Peek for development and debugging purposes.
- Created multiple configuration files and defined logical nodes, scratch disk, resource scratch disk, and pools.
- Extensively worked with Join, Look up (Normal and Sparse) and Merge stages.
- Extensively worked with sequential file, dataset, file set and look up file set stages.
- Used the Datastage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executables on an ad hoc or scheduled basis.
- Parameterized Datastage jobs and also created multi-instance jobs.
- Analyzed the performance of the jobs and the project, and enhanced performance using standard techniques.
- Created Master Job Sequencers to control sequence of Jobs using job controls.
- Extensively worked with job sequences using Job Activity, Email Notification, Sequencer, and Wait For File activities to control and execute the Datastage parallel jobs.
- Extensively worked on promoting code across different environments through the Serena version control tool.
- Migrated jobs from development to QA to Production environments.
- Defined UNIX shell scripts for the file watcher and file archiving processes.
- Developed complex queries using different data providers in the same report.
- Worked with the supervisor module in creating users and user groups for different areas and setting privileges for them.
- Extensively developed Datastage server routines using the Datastage BASIC language as part of the development process.
- Performed unit testing and integration testing in the test environment.
Environment: Infosphere Datastage 9.1, Quality Stage 9.1, Oracle 10g, DB2 UDB, Teradata, MS Access, Rational Software, Perl, SAP R/3, UNIX HP-UX 11.23, MS Office Suite, Visio, Toad, Autosys, Serena.
Confidential, Agoura Hills, CA
Sr. ETL- Datastage Developer
Responsibilities:
- Interacted with end users in finalizing the requirements and documented the Program Specifications for the ETL jobs.
- Created logical and physical dimensional data models using Erwin.
- Prepared Data Mapping Documents and designed the ETL jobs based on the DMD with the required tables in the dev environment.
- Provided staging solutions for data validation and cleansing with Quality Stage and Datastage ETL jobs.
- Designed Quality Stage jobs to perform data cleansing using the Investigate, Standardize, Match Frequency, Survive, and Reference Match stages.
- Developed Datastage parallel jobs in which, using the required stages, data from different sources was formatted, cleansed, summarized, aggregated, and transformed for loading into the data warehouse.
- Designed several parallel jobs using Sequential File, Dataset, Join, Merge, Lookup, Change Apply, Change Capture, Remove Duplicates, Funnel, Filter, Copy, Column Generator, Peek, Modify, Compare, Oracle Enterprise, Surrogate Key, Aggregator, Transformer, Decode, and Row Generator stages.
- Extensively used Datastage 8.5 stages like Row Generator, Column Generator, Head, and Peek for development and debugging purposes.
- Extensively used the CDC (Change Data Capture) stage to implement slowly changing dimensions.
- Created multiple configuration files and defined logical nodes, scratch disk, resource scratch disk, and pools.
- Extensively worked with Join, Look up (Normal and Sparse) and Merge stages.
- Extensively worked with sequential file, dataset, file set and look up file set stages.
- Used the Datastage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executables on an ad hoc or scheduled basis.
- Parameterized Datastage jobs and also created multi-instance jobs.
- Analyzed the performance of the jobs and the project, and enhanced performance using standard techniques.
- Created Master Job Sequencers to control sequence of Jobs using job controls.
- Extensively worked with job sequences using Job Activity, Email Notification, Sequencer, and Wait For File activities to control and execute the Datastage parallel jobs.
- Extensive experience in SVN.
- Developed a Linux script to integrate IIS with SVN to commit ETL code to the SVN repository.
- Extensively worked on promoting code across different environments through the SVN version control tool.
- Created PL/SQL procedures, functions, and triggers on database tables to perform validations before loading.
- Migrated jobs from development to QA to Production environments.
- Defined UNIX shell scripts for the file watcher and file archiving processes.
- Developed complex queries using different data providers in the same report.
- Published reports to users' e-mail addresses using Broadcast Agent Publisher.
- Worked with the supervisor module in creating users and user groups for different areas and setting privileges for them.
- Extensively developed Datastage server routines using the Datastage BASIC language as part of the development process.
- Performed unit testing and integration testing in the test environment.
Environment: Websphere Datastage 8.5, Datastage 7.5, ProfileStage, WebSphere Quality Stage, Oracle 10g, DB2 UDB, Teradata, MS Access, CA-7, Rational Software, Perl, SAP R/3, mainframe, UNIX HP-UX 11.23, MS Office Suite, Erwin 4.1, Visio, Cognos 8.x, Toad, Autosys, SVN.
Confidential, Memphis, TN
Datastage Developer
Responsibilities:
- Interacted with end users in finalizing the requirements and documented the Program Specifications for the ETL jobs.
- Created logical and physical dimensional data models using Erwin.
- Prepared Data Mapping Documents and designed the ETL jobs based on the DMD with the required tables in the dev environment.
- Provided staging solutions for data validation and cleansing with Quality Stage and Datastage ETL jobs.
- Designed Quality Stage jobs to perform data cleansing using the Investigate, Standardize, Match Frequency, Survive, and Reference Match stages.
- Developed Datastage parallel jobs in which, using the required stages, data from different sources was formatted, cleansed, summarized, aggregated, and transformed for loading into the data warehouse.
- Designed several parallel jobs using Sequential File, Dataset, Join, Merge, Lookup, Change Apply, Change Capture, Remove Duplicates, Funnel, Filter, Copy, Column Generator, Peek, Modify, Compare, Oracle Enterprise, Surrogate Key, Aggregator, Transformer, Decode, and Row Generator stages.
- Extensively used Datastage 8.1 stages like Row Generator, Column Generator, Head, and Peek for development and debugging purposes.
- Extensively used the CDC (Change Data Capture) stage to implement slowly changing dimensions.
- Created multiple configuration files and defined logical nodes, scratch disk, resource scratch disk, and pools.
- Extensively worked with Join, Look up (Normal and Sparse) and Merge stages.
- Extensively worked with sequential file, dataset, file set and look up file set stages.
- Used the Datastage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executables on an ad hoc or scheduled basis.
- Parameterized Datastage jobs and also created multi-instance jobs.
- Analyzed the performance of the jobs and the project, and enhanced performance using standard techniques.
- Created Master Job Sequencers to control sequence of Jobs using job controls.
- Extensively worked with job sequences using Job Activity, Email Notification, Sequencer, and Wait For File activities to control and execute the Datastage parallel jobs.
- Created PL/SQL procedures, functions, and triggers on database tables to perform validations before loading.
- Migrated jobs from development to QA to Production environments.
- Defined UNIX shell scripts for the file watcher and file archiving processes.
- Developed complex queries using different data providers in the same report.
- Published reports to users' e-mail addresses using Broadcast Agent Publisher.
- Worked with the supervisor module in creating users and user groups for different areas and setting privileges for them.
- Extensively developed Datastage server routines using the Datastage BASIC language as part of the development process.
- Performed unit testing and integration testing in the test environment.
Environment: Websphere Datastage 8.1, Datastage 7.5, ProfileStage, WebSphere Quality Stage, Oracle 10g, DB2 UDB, Teradata, MS Access, CA-7, Rational Software, Perl, SAP R/3, mainframe, UNIX HP-UX 11.23, MS Office Suite, Erwin 4.1, Visio, Cognos 8.x, Toad, Control Center.