Data Engineer Resume
SUMMARY
- Having 4+ years of experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Python, Sqoop, Oozie, HBase, Spark, and Scala.
- Having 8+ years of experience in data warehouse environments, coupled with extensive experience in SQL Server 2005/2008/2012/2014/2016 and Azure SQL Database development.
- Strong experience in gathering requirements for BI reports and developing high-quality reports for users.
- Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases with data volumes up to 3 PB.
- Good experience in writing scripts using Microsoft Master Data Management.
- Extensive experience in creating complex ETL flows using Talend and Informatica PowerCenter.
- Good experience in installing and configuring various Microsoft technologies and in configuring multi-node Hadoop clusters.
- Performed administration, troubleshooting, and maintenance of multi-node Hadoop clusters.
- Experience in monitoring large-scale Cloudera Hadoop clusters; created a multi-tenancy architecture to support a large number of applications.
- Working experience in DataNet ETL and scheduling using the DJS job scheduler on S3.
- Experience in implementing multi-tenant models for the Hadoop 2.0 ecosystem using various big data technologies.
- Experience in using Microsoft Azure SQL Database, Data Lake, and Azure Data Factory.
- Good experience in writing U-SQL scripts and loading data with Azure Data Factory.
- Worked with Confidential S3 storage and loaded data into Redshift and DynamoDB.
- Good experience with the Hortonworks Hadoop platform and monitoring tools such as Ambari and Hue.
- Excellent experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Implemented Big Data solutions using Hadoop technology, including Pig Latin, Hive, HBase, Sqoop, Flume, and ZooKeeper.
- Good experience with LDAP, Active Directory, and other services in Windows environments.
- Experience in writing DStreams for real-time data processing using Spark and Scala.
- Experience in writing Flume configurations for memory channels, sources, and sinks.
- Core competence in Data Warehousing and Business Intelligence application development, including workflow systems and data capture, processing, management, and delivery functions.
- Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts.
- Expertise in BI reporting tools such as SSRS, MicroStrategy, Crystal Reports, Tableau, TIBCO Spotfire, and Power BI.
- Performed real-time analytics using Revolution R and OFSAA to create statistical models for predictive and reverse product analysis.
- 4+ years of experience with Oracle and MySQL databases and supporting ETL tools.
- Extensive knowledge on all phases of Software Development Life Cycle (SDLC).
- Sound knowledge of database architecture for OLTP and OLAP applications, data analysis, ETL processes, data modeling, and dimensional modeling.
- Experience in creating data models using the Erwin, Innovator, and X-TEA tools and creating table relations using the Maples and ERD Plus tools.
- 6+ years of exposure to and experience in BI tools such as SSIS, SSRS, and SSAS.
- 4+ years of experience in Teradata database architecture and development.
- Experience in loading streaming data into Netezza and performing analytics on real-time data.
- Worked on Informatica PowerCenter components: PowerCenter Designer, Workflow Manager (to create workflows and sessions), Workflow Monitor, and Repository Manager.
- Experience in project tracking, mentoring, version control, software change request management (SCR/SCM, HP Service Manager), project deliveries/quality control, and migration.
- Database testing: expert in writing SQL queries to perform data-driven tests; involved in front-end and back-end testing. Strong knowledge of RDBMS concepts.
- Developed SQL queries in Oracle, SQL Server, DB2, Teradata, and Sybase databases to conduct database testing.
- Exposure to and experience with version control systems such as TFS and SVN.
- Strong analytical skills, capacity for work, and diagnostic ability; gathered requirements for different kinds of projects. Covered the Agile, Waterfall, Incremental, and V-Model SDLC methodologies.
- Efficiently performed defect tracking using tools such as HP ALM, Quality Center, Test Director, Rational ClearQuest, Trac, PVCS Tracker, and Bugzilla.
- Good experience working in an onsite-offshore model, with the leadership qualities to lead a team of 25 resources and guide them to execute projects to high standards.
- Good interpersonal skills; committed, result-oriented, and hardworking, with a quest and zeal to learn new technologies.
- Good knowledge of ISO and IEEE standards.
TECHNICAL SKILLS
Big Data: Hadoop, MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Ambari, Spark, Scala, MongoDB, Azure Data Lake, Azure Data Factory
Analytics: Revolution R, OFSAA, SAS, and Cortana Analytics
DBMS: Oracle 9i/10g, SQL Server 2005/2008/2012/2014, Teradata, Netezza, Confidential Redshift
ETL Tools: SQL Server 2005/2008/2012/2014 Integration Services, Informatica PowerCenter 7.1/8.x/9.1/9.5.x, DataStage, and Ab Initio
OLAP Tools: SQL Server 2005/2008/2012 Analysis Services
Reporting Tools: SQL Server 2005/2008 Reporting Services, Power Query, Power BI, ProClarity Reports, Tableau, TIBCO Spotfire, PerformancePoint Server, QlikView, and MicroStrategy
Frameworks: C#.NET, Core Java
Programming Languages: SQL, PL/SQL, Visual Basic, Java, ASP.NET, C#, C, C++, Python, and Perl
Scripting: UNIX shell scripting, VBScript, JavaScript, XML, Excel, HTML
Operating Systems: Windows Server 2003/XP/Vista/7, Red Hat Linux, Ubuntu, UNIX
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Wrote ETL jobs in DataNet and Informatica to process data from different sources and transform it for multiple targets.
- Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3.
- Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
- Established a strategy for data archival leveraging the Big Data ecosystem.
- Involved in the ETL phase of the project; designed and analyzed the data in Oracle and migrated it to Redshift.
- Created databases and tables in Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data into AWS S3.
- Analyzed existing source databases and prepared high- and low-level designs to create and migrate data into the data warehouse system.
- Created jobs using Hammerstone and DataNet and monitored them using web-based applications.
- Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
- Participated in client meetings, explaining designs, supporting users, and gathering requirements.
- Worked in an Agile methodology, understanding the requirements of the user stories.
- Prepared high-level design documentation for approval.
- Used the data visualization software OBIEE to draw new insights from the extracted data and represent it better.
- Designed data models for dynamic and real-time data, intended for use by various applications with OLAP and OLTP needs.
- Worked in an onsite-offshore model, leading the offshore team to get the work done.
Environment: Redshift, Oracle, EMR, DynamoDB, Python, OBIEE, Informatica, Pig, and Spark.
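The kind of Python-based statistical modeling on transactional data described above can be illustrated with a minimal sketch; the data, helper names, and trend-fit approach here are hypothetical examples, not the actual project model:

```python
# Illustrative sketch only: fit an ordinary least-squares trend to daily
# transaction totals and project the next day's value. All figures are
# made-up sample data, not project data.
def fit_trend(days, totals):
    """Least-squares fit y = a + b*x over paired (days, totals)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(totals) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, totals))
    b = sxy / sxx          # slope: change in total per day
    a = mean_y - b * mean_x  # intercept
    return a, b

def predict(a, b, day):
    """Predict the total for a future day from the fitted trend."""
    return a + b * day

if __name__ == "__main__":
    days = [1, 2, 3, 4, 5]
    totals = [100.0, 110.0, 125.0, 130.0, 145.0]
    a, b = fit_trend(days, totals)
    print(round(predict(a, b, 6), 1))  # projects day 6: 155.0
```

In practice such a model would be fit over far larger windows of transactions, but the shape (fit on history, predict forward) is the same.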
Confidential
Team Lead
Responsibilities:
- Implemented multi-tenant models for the Hadoop 2.0 ecosystem.
- Designed a Hive-based data delivery layer for business intelligence tools to operate directly on HDFS data.
- Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
- Established a strategy for data archival leveraging the Big Data ecosystem.
- Involved in the ETL phase of the project; designed and analyzed the SQL Server, Oracle, and Teradata databases and gathered user requirements for the Hadoop migration.
- Created databases and tables in Hive and wrote complex Pig scripts to process terabytes of data into Hive and HBase.
- Analyzed existing source databases and prepared high- and low-level designs to create and migrate data into the data warehouse system.
- Created jobs using Oozie and monitored them using web-based applications such as Ambari and Hue.
- Performed real-time analytics on transactional data using Revolution R and OFSAA to create statistical models for predictive and reverse product analysis.
- Took on the challenge of storing and analyzing clickstream and web logging data for the first time in the client's history.
- Participated in client meetings, explaining designs, supporting users, and gathering requirements.
- Worked in an Agile methodology, understanding the requirements of the user stories.
- Prepared high-level design documentation for approval.
- Used the data visualization software Tableau to draw new insights from the extracted data and represent it better.
- Designed data models for dynamic and real-time data, intended for use by various applications with OLAP and OLTP needs.
- Worked in an onsite-offshore model, leading the onshore team to get the work done.
Environment: SSIS, SSRS, C#.NET, SQL Server 2008, Hadoop, MapReduce, Hive, Pig, Sqoop, Teradata, Spark, R, and OFSAA
Confidential
Team Lead
Responsibilities:
- Migrated the existing SQL data warehouse to a Hadoop-based system.
- Processed streaming and semi-structured data into the Hadoop system and performed analytics using Hive and MapReduce.
- Formulated a ground-up strategy around ETL, data persistence, aggregation, archival, and extraction.
- Designed a Data Lake architecture as a centralized Data Hub to deliver data on demand to downstream applications.
- Extracted data from various database sources, such as Oracle, Teradata, DB2, and Informix, into HDFS using Apache Sqoop.
- Worked closely with an array of teams and organizations across the company and industry, including partners, customers, and data analysts.
- Derived complex business rules using Pig and MapReduce scripts.
- Used a Falcon-based web application for job scheduling and monitoring, and created partition retention strategies using Falcon.
- Created jobs using Oozie and monitored them using web-based applications such as Ambari and Hue.
- Used Apache Hive extensively to build tables specific to user requirements.
- Designed and built scalable infrastructure and platforms to collect and process very large amounts of data (structured and unstructured).
- Monitored cluster alerts and server performance using Ambari.
- Used the data visualization software Tableau to draw new insights from the extracted data and represent it better.
- Worked in an Agile methodology, understanding the requirements of the user stories, and prepared high-level design documentation for approval.
- Developed draft versions of the scripts in Java MapReduce and Pig (data transformation) and in HiveQL (for ad-hoc querying).
- Fine-tuned the process based on the MapReduce jobs processed.
Environment: SSIS, SSRS, C#.NET, SQL Server 2008, Hadoop, MapReduce, Hive, Pig, Sqoop, Teradata, Flume, Spark, Python, Scala, HBase, Hue, Oozie, R, OFSAA, and Ambari
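The map/reduce pattern behind the draft transformation scripts mentioned above can be sketched in pure Python for illustration; the project itself used Java MapReduce and Pig, and the record format here is a hypothetical example:

```python
# Illustrative sketch of the MapReduce pattern: mappers emit (key, 1)
# pairs keyed on a record field, reducers sum the counts per key.
from collections import defaultdict

def map_phase(records):
    # Mapper: key each CSV record on its first column and emit (key, 1).
    for rec in records:
        key = rec.split(",")[0]
        yield key, 1

def reduce_phase(pairs):
    # Reducer: aggregate mapper output by summing values per key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

if __name__ == "__main__":
    lines = ["web,click", "app,view", "web,click", "web,view"]
    print(reduce_phase(map_phase(lines)))  # {'web': 3, 'app': 1}
```

On a real cluster the framework shuffles mapper output to reducers by key across many nodes; this sketch collapses that into a single in-process pass.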
Confidential
Team Member
Responsibilities:
- Involved in the ETL phase of the project; designed and analyzed the SQL Server database and gathered user requirements.
- Created packages based on client requirements using SQL Server Integration Services and SQL Server 2008.
- Developed reports in matrix, table, and chart form using SQL Server 2008 Reporting Services, as per requirements.
- Participated in client meetings, explaining designs and providing support.
- Generated reports on a weekly and monthly basis as per client requirements.
- Rendered reports in PDF, Excel, and CSV formats as per client requirements.
Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, Informatica, Teradata, Netezza, and MicroStrategy.
Confidential
Team Member
Responsibilities:
- Migrated data from different sources (text-based files, Excel spreadsheets, and Access) to SQL Server databases using SQL Server Integration Services (SSIS).
- Participated in client meetings, explaining designs and providing support.
- Implemented complex business requirements in the backend using efficient stored procedures and flexible functions, facilitating easy implementation in the front-end application.
- Handled different types of errors and maintained event handlers.
- Monitored data loads through SQL jobs.
Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, Netezza, and Teradata
Confidential
Team Member
Responsibilities:
- Loaded the cleansed, transformed, and integrated data into the fact table of the data warehouse.
- Participated in client meetings, explaining designs and providing support.
- Handled different types of errors and maintained event handlers.
- Monitored data loads through SQL jobs.
- Provided data to end users through ProClarity reports.
- Created SSIS packages to load data into Teradata and Netezza.
- Created stored procedures and views on a TwinFin-6 Netezza appliance.
Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, and Teradata