ETL Architect - DataStage Resume
SUMMARY
- 10 years of IT industry experience with consistently increasing responsibilities in Business Analysis, Design, Development and Testing.
- Profound knowledge of System Analysis, Design and Development in the fields of databases, data warehousing and client-server technologies.
- Over 3 years’ experience in installing, configuring and testing Hadoop V2 ecosystem components.
- Expertise in developing with Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, Oozie, Impala, Mahout, HBase, YARN, Flume, Spark, Scala and Python.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modelling and data mining, machine learning and advanced data processing. Experience optimizing ETL workflows.
- In-depth understanding of Hadoop architecture and its various components such as HDFS, Job Tracker, Task Tracker, Name Node and Data Node.
- Extensive hands-on experience in writing complex MapReduce jobs and Pig scripts and in Hive data modelling.
- Experience in converting MapReduce applications to Spark.
- Experience in converting HiveQL to Spark SQL.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Good knowledge of job scheduling and workflow design tools such as Oozie.
- Good knowledge of Hadoop cluster administration, and of monitoring and managing Hadoop clusters using Cloudera Manager.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka and Flume.
- Developed a scalable, cost-effective and fault-tolerant data warehouse system on the Amazon EC2 cloud.
- Experience in monitoring and controlling large-scale cloud (AWS) infrastructure.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Good understanding of Data Mining and Machine Learning techniques.
- Experience in handling messaging services using Apache Kafka.
- Developed Python scripts to format and create daily transmission files.
- Strong experience in Java, J2EE, Spring, Struts and Hibernate.
- Extensive experience in creating ETL jobs and transformations using DataStage Designer to move data from multiple sources into the target area.
- Worked extensively with complex jobs using Join, Lookup, Remove Duplicates, Aggregator and other stages.
- Expertise in implementing complex business rules by creating robust jobs, reusable shared containers, routines and shell scripts.
- Experienced in performance tuning of DataStage jobs.
- Worked on various databases such as Oracle 9i, Confidential DB2 and MS SQL Server.
- Very strong knowledge of relational databases (RDBMS) and data modelling, and of building data warehouses and data marts using Star and Snowflake schemas.
- Extensive knowledge in pulling data from different source systems.
- Consolidated data from various departments into the data warehouse by identifying common data and resolving discrepancies in data values, and created various data marts.
- Good knowledge of OLAP reporting tools such as Cognos and Business Objects.
- Excellent working knowledge of UNIX shell scripting and automation of ETL processes using Autosys.
- Experience in Project Planning, Schedule & Resource Management, Communication Management, Business Continuity Planning, Training & Development and People Management.
- Possess excellent interpersonal, communication and organizational skills with proven abilities in training & development, customer relationship management and planning.
- Proficient in managing & leading teams for running successful process operations, with experience of developing procedures and service standards for business excellence.
Areas of Expertise
Presentation Skills: Ability to create/deliver dynamic presentations.
Analytical Skills: Knowledge and experience in analysis and data modelling.
Communication: Possess superior interpersonal, verbal & written skills
Team Skills: Possess an empathetic interpersonal style and a win-win attitude
Adapting Skills: Rich research experience and a background of working in multicultural environments
Leadership: Ability to lead and to be led, conflict and situation management
TECHNICAL SKILLS
- Hadoop Ecosystem
- Hive
- Pig
- Sqoop
- Spark
- Flume
- Python
- Oozie
- MapReduce
- Kafka
- Java
- J2EE
- Spring
- Struts & Hibernate
- ETL DataStage
- UNIX Shell Scripting
- Autosys
- CA7
- Ctrl-M
- UC4
PROFESSIONAL EXPERIENCE
Confidential
Application Developer - Big Data
Responsibilities:
- Created a generic framework to load data from Teradata tables onto HDFS using TPT (Teradata Parallel Transporter).
- With most of the data already loaded into the JRNL schema on HDFS, created dimension tables in the Semantic schema by defining external tables and joining the Teradata tables with the JRNL schema (see the sketch after this list).
- Wrote HQL to load the data onto HDFS.
- Developed scripts and automated data management end to end, including sync-up between all the clusters.
- Extensively used the Hue browser for interacting with Hadoop components.
- Documented the systems processes and procedures for future references.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Worked extensively on UC4 workflows to schedule the Hadoop jobs.
- Worked on creating Data Integration Framework automation to ingest data from databases and files.
- Created multiple generic scripts to SFTP inbound and outbound files, reducing manual intervention.
- Worked closely with the business to understand requirements and deliver on time with 100% customer satisfaction.
- Led a team of 10, in addition to hands-on development, to deliver defect-free code to customers, which was very well received.
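Below is a minimal sketch of the kind of HQL behind the Semantic-schema dimension tables, expressed here through Spark SQL in Scala for consistency with the other examples in this resume; the database, table and column names and the HDFS location are hypothetical placeholders, not the actual project objects.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SemanticDimensionLoad")
  .enableHiveSupport()
  .getOrCreate()

// External table over journal data already landed in the JRNL area of HDFS
// (hypothetical schema and path, for illustration only).
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS jrnl.account_journal (
    account_id STRING,
    txn_date   STRING,
    txn_amount DECIMAL(18,2)
  )
  STORED AS PARQUET
  LOCATION '/data/jrnl/account_journal'
""")

// Dimension table in the Semantic schema, built by joining the JRNL data
// with a reference table sourced from Teradata via TPT.
spark.sql("""
  INSERT OVERWRITE TABLE semantic.dim_account
  SELECT j.account_id,
         r.account_name,
         r.branch_code,
         MAX(j.txn_date) AS last_txn_date
  FROM   jrnl.account_journal j
  JOIN   ref.account_master   r ON j.account_id = r.account_id
  GROUP BY j.account_id, r.account_name, r.branch_code
""")

spark.stop()
```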
Confidential
Application Developer - Big Data
Responsibilities:
- TSYS sends out events over the Confidential message queue; a Kafka receiver reads the data and forwards it using Flume, and a Kafka consumer reads the data depending on the message topic.
- The Kafka consumer and receiver are written in Scala.
- Once received, the data is written to the HDFS file system as text files with a replication factor of 3.
- The data in HDFS is then made source-agnostic per Barclays standards and loaded into a data mart for the Fraud Strategy team to run their rules against; all of this happens in real time.
- Developed a data pipeline using Kafka and Spark to store data into HDFS and performed real-time analytics on the incoming data.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the sketch after this list).
- Developed Spark programs for batch and real-time processing.
- Implemented Spark Streaming using Scala and Spark SQL for faster testing and processing of real-time data.
- Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
- Responsible for reading text files on the Hadoop cluster and cleansing, transforming and writing the data in Avro and Parquet formats with Apache Spark on Scala.
- Performed data transformations from relational (RDBMS) databases to NoSQL databases.
- Worked extensively on NoSQL data marts.
- Coordinated with the offshore team to ensure defect-free, on-time delivery.
- Well versed in Agile methodologies.
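A minimal sketch of the Kafka-to-HDFS Spark Streaming flow described above, written in Scala against the spark-streaming-kafka-0-10 integration (an assumption about the library version); the broker list, topic, consumer group and output path are hypothetical placeholders, and the replication factor of 3 comes from the cluster's dfs.replication setting rather than from the job itself.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaToHdfsStream")
val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

// Hypothetical Kafka connection details, for illustration only.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092,broker2:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "fraud-stream-consumer",
  "auto.offset.reset"  -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Subscribe to the event topic and pull each micro-batch as a direct stream.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Array("card-events"), kafkaParams))

// Persist each non-empty micro-batch to HDFS as text files; downstream jobs
// standardise the data and load it into the fraud data mart.
stream.map(_.value).foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty())
    rdd.saveAsTextFile(s"hdfs:///data/raw/card_events/batch_${time.milliseconds}")
}

ssc.start()
ssc.awaitTermination()
```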
Confidential
Application Architect - Big Data
Responsibilities:
- Responsible for:
- Pulling data from different databases (Oracle, DB2, SQL Server, ...) and mainframe source systems.
- Pushing data from flat-file sources (both external and internal) into the data lake (Hadoop V2 environment).
- Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
- Developed Oozie workflows for application execution.
- Implemented Spark Streaming using Scala and Spark SQL for faster testing and processing of real-time data.
- Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
- Received streams are stored in memory as RDDs.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Performed complex DataFrame joins (inner, outer, left outer, semi join) using Spark SQL (see the sketch after this list).
- Wrote complex Hive queries and UDFs.
- Wrote Pig scripts for data processing.
- Worked on reading multiple data formats on HDFS using PySpark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed multiple POCs using PySpark and deployed on the Yarn cluster, compared the performance of Spark, with Hive and SQL/Teradata.
- Analysed the SQL scripts and designed the solution to implement using PySpark.
- Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability and performance.
- Executed Hadoop ecosystem applications through Apache Hue.
- Handled full CDC, delta processing and incremental updates using Hive and processed the data in Hive tables.
- Developed Pig Latin scripts to extract data from source systems.
- Developed Java MapReduce XML parser programs that process XML files using XSDs and XSLTs per the client's requirements and load the data into Hive tables.
- Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive; stored and retrieved data using HQL; developed Hive queries to analyse reducer output data.
- Developed scripts and automated data management end to end, including sync-up between all the clusters.
- Extensively used the Hue browser for interacting with Hadoop components.
- Documented the systems processes and procedures for future references.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Cluster coordination services through ZooKeeper.
- Responsible for importing data from various RDBMS and mainframe sources into the Hadoop environment using Sqoop import in full/incremental mode.
- Responsible for creating Hive tables once data has landed on Hadoop, which involves (see the sketch after this list):
- Creating Hive databases
- Preparing the CREATE TABLE HQL by referring to the Hadoop file schema
- Creating external/internal Hive tables, partitioned by run date
- Moving the data from the external Hive tables into the internal Hive tables
- Verifying the Hive table data against source system data to make sure data lands correctly on the lake; this unit-testing process made extensive use of the Hadoop tools Hive/Beeline SQL and Pig through the Hue portal.
- Responsible for reading text files on the Hadoop cluster and cleansing, transforming and writing the data in Avro and Parquet formats with Apache Spark on Scala.
- Coordinated with the offshore team to ensure defect-free, on-time delivery.
- Worked with the BIAS team to address admin-related issues.
- Well versed in Agile methodologies.
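A minimal Scala sketch combining the external-to-internal partitioned Hive load and the Spark SQL DataFrame joins described above; the databases, tables, columns and run date are hypothetical placeholders rather than the actual project objects.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("LakeLoadAndJoins")
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical run date used as the partition key for the internal table.
val runDate = "2016-10-31"

// Move data from the external (landing) Hive table into the internal table,
// partitioned by run date.
spark.sql(s"""
  INSERT OVERWRITE TABLE lake.customer_internal PARTITION (run_dt = '$runDate')
  SELECT cust_id, cust_name, region_cd
  FROM   landing.customer_external
""")

// DataFrame joins over the internal tables: inner, left outer and left semi.
val customers = spark.table("lake.customer_internal").where(s"run_dt = '$runDate'")
val accounts  = spark.table("lake.account_internal")

val innerJoin = customers.join(accounts, Seq("cust_id"))                // inner join
val leftOuter = customers.join(accounts, Seq("cust_id"), "left_outer")  // keep all customers
val leftSemi  = customers.join(accounts, Seq("cust_id"), "left_semi")   // customers with accounts

// Persist one of the joined results in Parquet for downstream reporting.
leftOuter.write.mode("overwrite").parquet("/data/marts/customer_account")
```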
Confidential
ETL Architect - DataStage
Responsibilities:
- Created design and mapping documents and walked the development team through them to build the DataStage jobs.
- Responsible for working with various teams to deliver the Marketing, Sales, Opportunity, Revenue, Segmentation and Transactional data to business teams on time.
- Provided daily, weekly and monthly data before business hours to ensure high-quality customer system reporting.
- Data analysis and fixes related to data inconsistency.
- Performance analysis and tuning related to the ETL processes and database queries.
- Implemented shell scripting per the ETL flow requirements.
- Responsible for processing ad-hoc data requests from the business within agreed timelines.
- Responsible for knowledge transfer to team members on DataStage, Teradata and the other products in use on this project.
- Responsible for effective communication between the project team and the customer; provided day-to-day direction to the project team and regular project status to the customer.
- Established quality procedures for the team and continuously monitored and audited to ensure the team met its quality goals.
- Scheduled the DataStage jobs using Autosys calendars.
- Supported the batch runs through Autosys and ensured loads completed within the SLAs.
- Monitored the production jobs and notified the concerned teams in case of failures.
- Implemented production fixes and deployed them to the production environment.
- Continuously monitored file arrivals and informed the SOR team when file arrivals were delayed.
- Sent a complete report of the run stats every day after load completion.
- Coordinated with different teams such as SOR and DBA as part of production support.
- Understood the business needs and requirements and designed the jobs accordingly.
- Prepared Autosys JILs for scheduling the ETL jobs.
Environment: DataStage 8.5/11.3, MS SQL Server, Teradata, Autosys, CA7, Ctrl-M, Informatica 9.1, WinSCP, UNIX shell scripting, Oracle, StarTeam.
Confidential
Technical Lead - ETL/DataStage
Responsibilities:
- Understood the business needs and requirements and designed the jobs accordingly.
- Analysed the data to determine cleansing requirements and implemented effective data cleansing through QualityStage.
- Extensively used ETL to load data from MS SQL Server into the target Confidential DB2 tables.
- Involved in internal and external code reviews, weekly status calls, issue-resolution meetings and onsite code acceptance meetings.
- Developed ETL jobs using various stages such as ODBC Connector, Lookup, Join, Aggregator, Transformer, Sort, Remove Duplicates and Data Set.
- Performed data profiling through the Information Analyzer client.
- Performance-tuned the ETL jobs, along with problem analysis and issue resolution.
- Involved in issue logging/tracking and risk identification/maintenance.
- Prepared Autosys JILs for scheduling the ETL jobs.
- Responsible for the Autosys migration to different environments.
Environment: DataStage 8.5, MS SQL Server, Confidential DB2 V2R6, HP-UX 8000/9000, TOAD 8.1.5, Autosys, Cognos 8.1.
Confidential
Technical Lead - ETL/DataStage
Responsibilities:
- Involved in preparing the mapping sheets based on the requirements document.
- Designed and developed DataStage jobs based on the design document and mapping sheet.
- Modified the UNIX shell scripts used to trigger the DataStage jobs.
- Gained functional knowledge from the business analyst as part of the project.
- Reviewed code and prepared unit test cases to document the results.
- Migrated the code from one environment to another.
- Created Autosys JILs to schedule the DataStage jobs through Autosys.
- Performed full loads and delta loads through Autosys JILs.
- Establish and implement short and long-term goals, objectives and operating procedures. Monitor and evaluate program effectiveness and changes required for improvement.
- Dealt with multiple clients on a daily basis on performance, reviews, negotiations and conflicts.
- Interact with the client on a regular basis to determine the level of satisfaction & ascertain areas of potential dissatisfaction.
Environment: DataStage 8.1, Teradata V2R5, MS SQL Server 2000, UNIX, TOAD 8.0 and Business Objects 5.0.
Confidential
Technical Lead - ETL/DataStage
Responsibilities:
- Created a generic framework to provide a one-stop shop for all source data, also called the Enterprise Staging Platform.
- Created a single DataStage job that can be called multiple times with an invocation ID.
- Extracts the data from different source systems and loads it into an Oracle database.
- Created static ETL framework tables that hold all the details about each source system.
- Created a UNIX script that checks the details in the framework tables and calls the DataStage job with the appropriate invocation ID.
- Created Autosys JILs to call the jobs.
- Performed a lift-and-shift migration from Informatica to DataStage.
- Reviewed code and prepared unit test cases to document the results.