Data Engineer Resume
SUMMARY
- Having 4+ years of experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Python, Sqoop, Oozie, HBase, Spark, and Scala.
- Having 8+ years of experience in data warehouse environments, coupled with extensive experience in SQL Server 2005/2008/2012/2014/2016 and Azure SQL Database development.
- Strong experience in gathering requirements for BI reports and developing high-quality reports for users.
- Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases with data volumes up to 3 PB.
- Good experience in writing scripts using Microsoft Master Data Management.
- Extensive experience in creating complex ETL flows using Talend and Informatica PowerCenter.
- Good experience in installing and configuring various Microsoft technologies and in configuring multi-node Hadoop clusters.
- Performed administration, troubleshooting, and maintenance of multi-node Hadoop clusters.
- Experience in monitoring large-scale Cloudera Hadoop clusters; created a multi-tenancy architecture to support a large number of applications.
- Working experience in DataNet ETL and scheduling using the DJS job scheduler on S3.
- Experience in implementing multi-tenant models for the Hadoop 2.0 ecosystem using various big data technologies.
- Experience in using Microsoft Azure SQL Database, Data Lake, and Azure Data Factory.
- Good experience in writing U-SQL scripts and loading data with Azure Data Factory.
- Worked with Confidential S3 storage and loaded data into Redshift and DynamoDB.
- Good experience with the Hortonworks Hadoop platform and monitoring tools such as Ambari and Hue.
- Excellent experience in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Implemented Big Data solutions using Hadoop technology, including Pig Latin, Hive, HBase, Sqoop, Flume, and ZooKeeper.
- Good experience with LDAP, Active Directory, and other services in Windows environments.
- Experience in writing DStreams for real-time data processing using Spark and Scala.
- Experience in writing Flume configurations for memory channels, sources, and sinks.
- Core competence in Data Warehousing and Business Intelligence application development, including workflow systems and data capture, processing, management, and delivery functions.
- Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts.
- Expertise in BI reporting tools such as SSRS, MicroStrategy, Crystal Reports, Tableau, TIBCO Spotfire, and Power BI.
- Performed real-time analytics using Revolution R and OFSAA to create statistical models for predictive and reverse product analysis.
- 4+ years of experience with Oracle and MySQL databases and supporting ETL tools.
- Extensive knowledge on all phases of Software Development Life Cycle (SDLC).
- Sound knowledge of database architecture for OLTP and OLAP applications, data analysis, ETL processes, data modeling, and dimensional modeling.
- Experience in creating data models using the Erwin, Innovator, and X-TEA tools and creating table relations using the Maples and ERD Plus tools.
- 6+ years of exposure to and experience in BI tools such as SSIS, SSRS, and SSAS.
- 4+ years of experience in Teradata database architecture and development.
- Experience in loading streaming data into Netezza and performing analytics on real-time data.
- Worked on Informatica PowerCenter components: PowerCenter Designer, Workflow Manager (to create workflows and sessions), Workflow Monitor, and Repository Manager.
- Experience in project tracking, mentoring, version control, software change request management (SCR/SCM, HP Service Manager), project deliveries/quality control, and migration.
- Database testing: expert in writing SQL queries to perform data-driven tests; involved in front-end and back-end testing. Strong knowledge of RDBMS concepts.
- Developed SQL queries in Oracle, SQL Server, DB2, Teradata, and Sybase databases to conduct database testing.
- Exposure to and experience with version control systems such as TFS and SVN.
- Strong analytical skills, capacity for work, and diagnostic ability; gathered requirements for different kinds of projects. Covered the Agile, Waterfall, Incremental, and V-Model SDLC methodologies.
- Efficiently performed defect tracking using tools such as HP ALM, Quality Center, Test Director, Rational ClearQuest, Trac, PVCS Tracker, and Bugzilla.
- Good experience working in an onsite-offshore model, with the leadership qualities to lead a team of 25 resources and guide them to execute projects to high standards.
- Good interpersonal skills; committed, result-oriented, and hardworking, with a quest and zeal to learn new technologies.
- Good knowledge of ISO and IEEE standards.
TECHNICAL SKILLS
Big Data: Hadoop, MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Ambari, Spark, Scala, MongoDB, Azure Data Lake, Azure Data Factory
Analytics: Revolution R, OFSAA, SAS, and Cortana Analytics
DBMS: Oracle 9i/10g, SQL Server 2005/2008/2012/2014, Teradata, Netezza, Confidential Redshift
ETL Tools: SQL Server 2005/2008/2012/2014 Integration Services, Informatica PowerCenter 7.1/8.x/9.1/9.5.x, DataStage, and Ab Initio
OLAP Tools: SQL Server 2005/2008/2012 Analysis Services
Reporting Tools: SQL Server 2005/2008 Reporting Services, Power Query, Power BI, ProClarity Reports, Tableau, TIBCO Spotfire, PerformancePoint Server, QlikView, and MicroStrategy
Frameworks: C#.NET, Core Java
Programming Languages: SQL, PL/SQL, Visual Basic, Java, ASP.NET, C#, C, C++, Python, and Perl
Scripting: UNIX shell scripting, VBScript, JavaScript, XML, Excel, HTML
Operating Systems: Windows Server 2003/XP/Vista/7, Red Hat Linux, Ubuntu, UNIX
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Wrote ETL jobs in DataNet and Informatica to process data from different sources and transform it for multiple targets.
- Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3.
- Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
- Established a strategy for data archival leveraging the Big Data ecosystem.
- Involved in the ETL phase of the project; designed and analyzed the data in Oracle and migrated it to Redshift.
- Created databases and tables in Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data into AWS S3.
- Analyzed existing source databases and prepared high- and low-level designs to create and migrate data into the data warehouse system.
- Created jobs using Hammerstone and DataNet and monitored them using web-based applications.
- Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
- Participated in client meetings, explaining designs, supporting users, and gathering requirements.
- Worked in an Agile methodology, understanding the requirements of the user stories.
- Prepared high-level design documentation for approval.
- Used the data visualization software OBIEE to draw new insights from the extracted data and represent it better.
- Designed data models for dynamic and real-time data, intended for use by various applications with OLAP and OLTP needs.
- Worked in an onsite-offshore model, leading the offshore team to get the work done.
Environment: Redshift, Oracle, EMR, DynamoDB, Python, OBIEE, Informatica, Pig, and Spark.
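The kind of Python-based statistical modeling on transactional data described above can be illustrated with a minimal sketch; the data, helper names, and trend-fit approach here are hypothetical examples, not the actual project model:

```python
# Illustrative sketch only: fit an ordinary least-squares trend to daily
# transaction totals and project the next day's value. All figures are
# made-up sample data, not project data.
def fit_trend(days, totals):
    """Least-squares fit y = a + b*x over paired (days, totals)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(totals) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, totals))
    b = sxy / sxx          # slope: change in total per day
    a = mean_y - b * mean_x  # intercept
    return a, b

def predict(a, b, day):
    """Predict the total for a future day from the fitted trend."""
    return a + b * day

if __name__ == "__main__":
    days = [1, 2, 3, 4, 5]
    totals = [100.0, 110.0, 125.0, 130.0, 145.0]
    a, b = fit_trend(days, totals)
    print(round(predict(a, b, 6), 1))  # projects day 6: 155.0
```

In practice such a model would be fit over far larger windows of transactions, but the shape (fit on history, predict forward) is the same.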
Confidential
Team Lead
Responsibilities:
- Implemented multi-tenant models for the Hadoop 2.0 ecosystem.
- Designed a Hive-based data delivery layer for business intelligence tools to operate directly on HDFS data.
- Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
- Established a strategy for data archival leveraging the Big Data ecosystem.
- Involved in the ETL phase of the project; designed and analyzed the SQL Server, Oracle, and Teradata databases and gathered user requirements for the Hadoop migration.
- Created databases and tables in Hive and wrote complex Pig scripts to process terabytes of data into Hive and HBase.
- Analyzed existing source databases and prepared high- and low-level designs to create and migrate data into the data warehouse system.
- Created jobs using Oozie and monitored them using web-based applications such as Ambari and Hue.
- Performed real-time analytics on transactional data using Revolution R and OFSAA to create statistical models for predictive and reverse product analysis.
- Took on the challenge of storing and analyzing clickstream and web logging data for the first time in the client's history.
- Participated in client meetings, explaining designs, supporting users, and gathering requirements.
- Worked in an Agile methodology, understanding the requirements of the user stories.
- Prepared high-level design documentation for approval.
- Used the data visualization software Tableau to draw new insights from the extracted data and represent it better.
- Designed data models for dynamic and real-time data, intended for use by various applications with OLAP and OLTP needs.
- Worked in an onsite-offshore model, leading the onshore team to get the work done.
Environment: SSIS, SSRS, C#.NET, SQL Server 2008, Hadoop, MapReduce, Hive, Pig, Sqoop, Teradata, Spark, R, and OFSAA
Confidential
Team Lead
Responsibilities:
- Migrated the existing SQL data warehouse to a Hadoop-based system.
- Processed streaming and semi-structured data into the Hadoop system and performed analytics using Hive and MapReduce.
- Formulated a ground-up strategy around ETL, data persistence, aggregation, archival, and extraction.
- Designed a Data Lake architecture as a centralized Data Hub to deliver data on demand to downstream applications.
- Extracted data from various database sources, such as Oracle, Teradata, DB2, and Informix, into HDFS using Apache Sqoop.
- Worked closely with an array of teams and organizations across the company and industry, including partners, customers, and data analysts.
- Derived complex business rules using Pig and MapReduce scripts.
- Used a Falcon-based web application for job scheduling and monitoring, and created partition retention strategies using Falcon.
- Created jobs using Oozie and monitored them using web-based applications such as Ambari and Hue.
- Used Apache Hive extensively to build tables specific to user requirements.
- Designed and built scalable infrastructure and platforms to collect and process very large amounts of data (structured and unstructured).
- Monitored cluster alerts and server performance using Ambari.
- Used the data visualization software Tableau to draw new insights from the extracted data and represent it better.
- Worked in an Agile methodology, understanding the requirements of the user stories, and prepared high-level design documentation for approval.
- Developed draft versions of the scripts in Java MapReduce and Pig (data transformation) and in HiveQL (for ad-hoc querying).
- Fine-tuned the process based on the MapReduce jobs processed.
Environment: SSIS, SSRS, C#.NET, SQL Server 2008, Hadoop, MapReduce, Hive, Pig, Sqoop, Teradata, Flume, Spark, Python, Scala, HBase, Hue, Oozie, R, OFSAA, and Ambari
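The map/reduce pattern behind the draft transformation scripts mentioned above can be sketched in pure Python for illustration; the project itself used Java MapReduce and Pig, and the record format here is a hypothetical example:

```python
# Illustrative sketch of the MapReduce pattern: mappers emit (key, 1)
# pairs keyed on a record field, reducers sum the counts per key.
from collections import defaultdict

def map_phase(records):
    # Mapper: key each CSV record on its first column and emit (key, 1).
    for rec in records:
        key = rec.split(",")[0]
        yield key, 1

def reduce_phase(pairs):
    # Reducer: aggregate mapper output by summing values per key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

if __name__ == "__main__":
    lines = ["web,click", "app,view", "web,click", "web,view"]
    print(reduce_phase(map_phase(lines)))  # {'web': 3, 'app': 1}
```

On a real cluster the framework shuffles mapper output to reducers by key across many nodes; this sketch collapses that into a single in-process pass.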
Confidential
Team Member
Responsibilities:
- Involved in the ETL phase of the project; designed and analyzed the SQL Server database and gathered user requirements.
- Created packages based on client requirements using SQL Server Integration Services and SQL Server 2008.
- Developed reports in matrix, table, and chart form using SQL Server 2008 Reporting Services, as per requirements.
- Participated in client meetings, explaining designs and providing support.
- Generated reports on a weekly and monthly basis as per client requirements.
- Rendered reports in PDF, Excel, and CSV formats as per client requirements.
Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, Informatica, Teradata, Netezza, and MicroStrategy.
Confidential
Team Member
Responsibilities:
- Migrated data from different sources (text-based files, Excel spreadsheets, and Access) to SQL Server databases using SQL Server Integration Services (SSIS).
- Participated in client meetings, explaining designs and providing support.
- Implemented complex business requirements in the backend using efficient stored procedures and flexible functions, facilitating easy implementation in the front-end application.
- Handled different types of errors and maintained event handlers.
- Monitored data loads through SQL jobs.
Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, Netezza, and Teradata
Confidential
Team Member
Responsibilities:
- Loaded the cleansed, transformed, and integrated data into the fact table of the data warehouse.
- Participated in client meetings, explaining designs and providing support.
- Handled different types of errors and maintained event handlers.
- Monitored data loads through SQL jobs.
- Provided data to end users through ProClarity reports.
- Created SSIS packages to load data into Teradata and Netezza.
- Created stored procedures and views on a TwinFin-6 Netezza appliance.
Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, and Teradata