
Data Engineer Resume


SUMMARY

  • 4+ years of experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Python, Sqoop, Oozie, HBase, Spark, and Scala.
  • 8+ years of experience in data warehouse environments, with extensive experience in SQL Server 2005/2008/2012/2014/2016 and Azure SQL Database development.
  • Strong experience in gathering requirements for BI reports and developing high-quality reports for users.
  • Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding up to 3 PB of data.
  • Good experience in writing scripts using Microsoft Master Data Management.
  • Extensive experience in creating complex ETL flows using Talend and Informatica PowerCenter.
  • Good experience in installation and configuration of various Microsoft technologies, and experience in configuring multi-node Hadoop systems.
  • Performed administration, troubleshooting, and maintenance of multi-node Hadoop systems.
  • Experience in monitoring large-scale Cloudera Hadoop systems; created a multi-tenancy architecture to support a large number of applications.
  • Working experience with DataNet ETL and scheduling using the DJS job scheduler on S3.
  • Experience in implementing multi-tenant models for the Hadoop 2.0 ecosystem using various big data technologies.
  • Experience in using Microsoft Azure SQL Database, Azure Data Lake, and Azure Data Factory.
  • Good experience in writing U-SQL scripts and loading data with Azure Data Factory.
  • Worked with Confidential S3 storage and loaded data into Redshift and DynamoDB.
  • Good experience with the Hortonworks Hadoop platform and monitoring tools such as Ambari and Hue.
  • Excellent experience with Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Implemented big data solutions using Hadoop technologies including Pig Latin, Hive, HBase, Sqoop, Flume, and ZooKeeper.
  • Good experience with LDAP, Active Directory, and other services in Windows environments.
  • Experience in writing DStreams for real-time data processing using Spark and Scala (see the streaming sketch at the end of this summary).
  • Experience in configuring memory channels, sources, and sinks using Flume.
  • Core competence in data warehousing and business intelligence application development, including workflow systems, data capture, processing, management, and delivery functions.
  • Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience in writing shell scripts.
  • Expertise in BI reporting tools such as SSRS, MicroStrategy, Crystal Reports, Tableau, TIBCO Spotfire, and Power BI.
  • Performed real-time analytics using Revolution R and OFSAA to create statistical models for predictive and reverse product analysis.
  • 4+ years of experience with Oracle and MySQL databases and supporting ETL tools.
  • Extensive knowledge on all phases of Software Development Life Cycle (SDLC).
  • Sound knowledge of database architecture for OLTP and OLAP applications, data analysis, ETL processes, data modeling, and dimensional modeling.
  • Experience in creating data models using Erwin, Innovator, and X-TEA tools, and creating table relationships using Maples and the ERD Plus tool.
  • 6+ years of exposure to and experience with BI tools such as SSIS, SSRS, and SSAS.
  • 4+ years of experience in Teradata database architecture and development.
  • Experience in loading streaming data into Netezza and performing analytics on real-time data.
  • Worked with Informatica PowerCenter components: PowerCenter Designer, Workflow Manager (to create workflows and sessions), Workflow Monitor, and Repository Manager.
  • Experience in project tracking, mentoring, version control, software change request management (SCR/SCM, HP Service Manager), project deliveries/quality control, and migration.
  • Database testing: expert in writing SQL queries to perform data-driven tests; involved in front-end and back-end testing. Strong knowledge of RDBMS concepts.
  • Developed SQL queries against Oracle, SQL Server, DB2, Teradata, and Sybase databases to conduct database testing.
  • Exposure to and experience with version control systems such as TFS and SVN.
  • Strong analytical and diagnostic skills with a large capacity for work; gathered requirements for different kinds of projects. Covered the Agile, Waterfall, Incremental, and V-Model SDLC methodologies.
  • Efficiently performed defect tracking using tools such as HP ALM, Quality Center, TestDirector, Rational ClearQuest, Trac, PVCS Tracker, and Bugzilla.
  • Good experience working in an onsite-offshore model, with the leadership qualities to lead a team of 25 resources and guide them to execute projects to high standards.
  • Good interpersonal skills; committed, result-oriented, and hard-working, with a zeal for learning new technologies.
  • Good knowledge of ISO and IEEE standards.
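
The streaming sketch referenced above: a minimal, hypothetical Spark Streaming (DStream) job written here in Python (PySpark). The socket source, host, port, batch interval, and record layout are illustrative assumptions, not details of any particular engagement.

    # Minimal Spark Streaming (DStream) sketch; source, host, port, batch
    # interval, and record layout are illustrative assumptions.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="ClickCountSketch")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Read newline-delimited events from a hypothetical socket source.
    events = ssc.socketTextStream("localhost", 9999)

    # Count events per page, assuming each line is "user_id,page,timestamp".
    page_counts = (events
                   .map(lambda line: line.split(",")[1])
                   .map(lambda page: (page, 1))
                   .reduceByKey(lambda a, b: a + b))

    page_counts.pprint()  # print each batch's counts to the driver log

    ssc.start()
    ssc.awaitTermination()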

TECHNICAL SKILLS

Big Data: Hadoop, MapReduce, Hive, Pig, Impala, Sqoop, HDFS, HBase, Oozie, Ambari, Spark, Scala, MongoDB, Azure Data Lake, Azure Data Factory

Analytics: Revolution R, OFSAA, SAS and Cortana Analytics

DBMS: Oracle 9i/10g, SQL Server 2005/2008/2012/2014, Teradata, Netezza, Confidential Redshift

ETL Tools: SQL Server 2005/2008/2012/2014 Integration Services, Informatica PowerCenter 7.1/8.x/9.1/9.5.x, DataStage and Ab Initio

OLAP Tools: SQL Server 2005/2008/2012 Analysis Services

Reporting Tools: SQL Server 2005/2008 Reporting Services, Power Query, Power BI, ProClarity Reports, Tableau, TIBCO Spotfire, PerformancePoint Server, QlikView and MicroStrategy

Frameworks: C#.NET, Core Java

Programming Languages: SQL, PL/SQL, Visual Basic, Java, ASP.NET, C#, C, C++, Python and Perl

Scripting: UNIX shell scripting, VBScript, JavaScript, XML, Excel, HTML

Operating Systems: Windows Server 2003, Windows XP/Vista/7, Red Hat Linux, Ubuntu, UNIX

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Wrote ETL jobs in DataNet and Informatica to process data from different sources and transform it for multiple targets.
  • Designed a Redshift-based data delivery layer that lets business intelligence tools operate directly on data in AWS S3 (see the loading sketch after this role).
  • Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Established a data archival strategy leveraging the big data ecosystem.
  • Involved in the ETL phase of the project; designed and analyzed the data in Oracle and migrated it to Redshift.
  • Created databases and tables in Redshift and DynamoDB, and wrote complex EMR jobs to process terabytes of data in AWS S3.
  • Analyzed existing source databases and prepared high- and low-level designs to create and migrate data into the data warehouse system.
  • Created jobs using Hammetstone and DataNet and monitored them using web-based applications.
  • Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
  • Participated in client meetings to explain design views, provide support, and gather requirements.
  • Worked in an Agile methodology to understand the requirements of the user stories.
  • Prepared high-level design documentation for approval.
  • Used the data visualization tool OBIEE to draw new insights from the extracted data and represent it more effectively.
  • Designed data models for dynamic and real-time data intended for use by various applications with OLAP and OLTP needs.
  • Worked in an onsite-offshore model and led the offshore team to deliver the work.

Environment: Redshift, Oracle, EMR, DynamoDB, Python, OBIEE, Informatica, Pig and Spark.
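
The loading sketch referenced above: one common way to feed a Redshift-based delivery layer from S3 is a COPY command issued from Python via psycopg2. The cluster endpoint, credentials, table, bucket, and IAM role below are placeholders, not values from the actual project.

    # Sketch: load S3 files into a Redshift table with a COPY command.
    # Endpoint, credentials, table, bucket, and IAM role are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="********",
    )

    copy_sql = """
        COPY analytics.orders_fact
        FROM 's3://example-bucket/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV
        GZIP
        TIMEFORMAT 'auto';
    """

    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)  # Redshift pulls the files directly from S3
    conn.close()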

Confidential

Team Lead

Responsibilities:

  • Implemented multi-tenant models for the Hadoop 2.0 ecosystem.
  • Designed a Hive-based data delivery layer that lets business intelligence tools operate directly on HDFS data (see the Hive sketch after this role).
  • Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Established a data archival strategy leveraging the big data ecosystem.
  • Involved in the ETL phase of the project; designed and analyzed the SQL Server, Oracle, and Teradata databases and gathered user requirements for the Hadoop migration.
  • Created databases and tables in Hive, and wrote complex Pig scripts to process terabytes of data into Hive and HBase.
  • Analyzed existing source databases and prepared high- and low-level designs to create and migrate data into the data warehouse system.
  • Created jobs using Oozie and monitored them using web-based applications such as Ambari and Hue.
  • Performed real-time analytics on transactional data using Revolution R and OFSAA to create statistical models for predictive and reverse product analysis.
  • Took on the challenge of storing and analyzing clickstream and web log data for the first time in the client's history.
  • Participated in client meetings to explain design views, provide support, and gather requirements.
  • Worked in an Agile methodology to understand the requirements of the user stories.
  • Prepared high-level design documentation for approval.
  • Used the data visualization tool Tableau to draw new insights from the extracted data and represent it more effectively.
  • Designed data models for dynamic and real-time data intended for use by various applications with OLAP and OLTP needs.
  • Worked in an onsite-offshore model and led the onshore team to deliver the work.

Environment: SSIS, SSRS, C#.NET, SQL Server 2008, Hadoop, MapReduce, Hive, Pig, Sqoop, Teradata, Spark, R and OFSAA
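
The Hive sketch referenced above: a Hive-based delivery layer usually amounts to external tables defined over files already in HDFS, which BI tools can then query. The sketch below creates one such table through Spark's Hive support; the database, table, columns, and HDFS path are illustrative assumptions.

    # Sketch: expose raw HDFS files to BI tools as a Hive external table.
    # Database, table, columns, and path are assumptions, not project values.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("HiveDeliveryLayerSketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS delivery")

    # External table over tab-delimited files already landed in HDFS.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS delivery.sales_daily (
            sale_date   STRING,
            store_id    INT,
            product_id  INT,
            amount      DOUBLE
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
        STORED AS TEXTFILE
        LOCATION '/data/raw/sales_daily'
    """)

    # Quick sanity check of what BI tools would see through the table.
    spark.sql("SELECT store_id, SUM(amount) AS total "
              "FROM delivery.sales_daily GROUP BY store_id").show(10)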

Confidential

Team Lead

Responsibilities:

  • Migrated the existing SQL data warehouse to a Hadoop-based system.
  • Processed streaming and semi-structured data into the Hadoop system and performed analytics using Hive and MapReduce.
  • Formulated a ground-up strategy around ETL, data persistence, aggregation, archival, and extraction.
  • Designed a data lake architecture as a centralized data hub to deliver data on demand to downstream applications.
  • Extracted data from various database sources such as Oracle, Teradata, DB2, and Informix to HDFS using Apache Sqoop (see the import sketch after this role).
  • Worked closely with a wide array of teams and organizations in the company and industry, including partners, customers, and data analysts.
  • Derived complex business rules using Pig and MapReduce scripts.
  • Used a Falcon-based web application for job scheduling and monitoring, and created partition retention strategies using Falcon.
  • Created jobs using Oozie and monitored them using web-based applications such as Ambari and Hue.
  • Used Apache Hive extensively to build various tables specific to user requirements.
  • Designed and built a scalable infrastructure and platform to collect and process very large amounts of data (structured and unstructured).
  • Monitored cluster alerts and server performance using Ambari.
  • Used the data visualization tool Tableau to draw new insights from the extracted data and represent it more effectively.
  • Worked in an Agile methodology to understand the requirements of the user stories, and prepared high-level design documentation for approval.
  • Developed draft versions of the scripts in Java MapReduce and Pig (data transformation) and HiveQL (for ad hoc querying).
  • Fine-tuned the process based on the MapReduce jobs processed.

Environment: SSIS, SSRS, C#.NET, SQL Server 2008, Hadoop, MapReduce, Hive, Pig, Sqoop, Teradata, Flume, Spark, Python, Scala, HBase, Hue, Oozie, R, OFSAA and Ambari
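
The import sketch referenced above: a typical Sqoop extraction from a relational source into HDFS, driven here from Python for scheduling purposes. The JDBC URL, credentials, table, split column, and target directory are placeholders rather than actual project values.

    # Sketch: pull one Oracle table into HDFS with Sqoop, invoked from Python.
    # JDBC URL, credentials, table, and target directory are placeholders.
    import subprocess

    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "hdfs:///user/etl/.oracle_pwd",  # avoid plain-text passwords
        "--table", "SALES.ORDERS",
        "--target-dir", "/data/raw/orders",
        "--split-by", "ORDER_ID",
        "--num-mappers", "4",
    ]

    result = subprocess.run(sqoop_cmd)
    if result.returncode != 0:
        raise RuntimeError("Sqoop import failed; check the YARN application logs")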

Confidential

Team Member

Responsibilities:

  • Involved in the ETL phase of the project; designed and analyzed the SQL Server database and gathered the user requirements.
  • Created packages based on client requirements using SQL Server Integration Services and SQL Server 2008.
  • Developed matrix, table, and chart reports per requirements using SQL Server 2008 Reporting Services.
  • Participated in client meetings and explained the design views to supporting teams.
  • Generated reports on a weekly and monthly basis per client requirements.
  • Rendered reports in PDF, Excel, and CSV formats per client requirements (see the export sketch after this role).

Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, Informatica, Teradata, Netezza and MicroStrategy.
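
The export sketch referenced above: SSRS reports can be rendered to PDF, Excel, or CSV through Reporting Services URL access by passing rs:Format. The report server URL, report path, and Windows credentials below are environment-specific assumptions.

    # Sketch: export an SSRS report via URL access (rs:Format) and save it.
    # Server URL, report path, and credentials are placeholders.
    import requests
    from requests_ntlm import HttpNtlmAuth  # SSRS is commonly behind Windows auth

    REPORT_SERVER = "http://reports.example.com/ReportServer"
    REPORT_PATH = "/Sales/WeeklySummary"

    def export_report(fmt, out_file):
        # rs:Format accepts values such as PDF, CSV, and EXCEL.
        url = f"{REPORT_SERVER}?{REPORT_PATH}&rs:Format={fmt}"
        resp = requests.get(url, auth=HttpNtlmAuth("DOMAIN\\svc_reports", "********"))
        resp.raise_for_status()
        with open(out_file, "wb") as f:
            f.write(resp.content)

    export_report("PDF", "weekly_summary.pdf")
    export_report("CSV", "weekly_summary.csv")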

Confidential

Team Member

Responsibilities:

  • Migrated data from different sources (text-based files, Excel spreadsheets, and Access) to SQL Server databases using SQL Server Integration Services (SSIS).
  • Participated in client meetings and explained the design views to supporting teams.
  • Implemented complex business requirements in the backend using efficient stored procedures and flexible functions, facilitating easy integration with the front-end application.
  • Handled different types of errors and maintained event handlers.
  • Monitored data loads through SQL jobs (see the monitoring sketch after this role).

Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008, Netezza and Teradata
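
The monitoring sketch referenced above: data-load jobs scheduled through SQL Server Agent record their outcomes in msdb, which can be polled with a short script. The server name and driver below are assumptions; the msdb tables shown are standard.

    # Sketch: check recent SQL Server Agent job outcomes from msdb via pyodbc.
    # Server and driver are placeholders; the msdb tables are standard.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sqlprod01.example.com;DATABASE=msdb;Trusted_Connection=yes;"
    )

    query = """
        SELECT TOP (20)
               j.name       AS job_name,
               h.run_date,
               h.run_time,
               h.run_status  -- 0 = failed, 1 = succeeded
        FROM msdb.dbo.sysjobhistory AS h
        JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
        WHERE h.step_id = 0           -- job-outcome rows only
        ORDER BY h.run_date DESC, h.run_time DESC;
    """

    for row in conn.cursor().execute(query):
        status = "OK" if row.run_status == 1 else "CHECK"
        print(status, row.job_name, row.run_date, row.run_time)

    conn.close()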

Confidential

Team Member

Responsibilities:

  • Loaded the cleansed, transformed, and integrated data into the fact tables of the data warehouse (see the load sketch after this role).
  • Participated in client meetings and explained the design views to supporting teams.
  • Handled different types of errors and maintained event handlers.
  • Monitored data loads through SQL jobs.
  • Provided data to end users through ProClarity reports.
  • Created SSIS packages to load data into Teradata and Netezza.
  • Created stored procedures and views on the TwinFin-6 Netezza appliance.

Environment: SSIS, SSRS, SSAS, C#.NET, SQL Server 2008 and Teradata
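
The load sketch referenced above: the actual loads were built as SSIS packages, but the core fact-table step amounts to moving cleansed rows from staging into the warehouse. The table names, columns, and connection string below are illustrative assumptions.

    # Sketch: move cleansed staging rows into a warehouse fact table.
    # Tables, columns, and connection string are illustrative assumptions;
    # the loads described above were implemented as SSIS packages.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=dwprod01.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
    )
    cur = conn.cursor()

    # Insert only rows not already present, keyed on the surrogate keys.
    cur.execute("""
        INSERT INTO dbo.FactSales (DateKey, StoreKey, ProductKey, Quantity, Amount)
        SELECT s.DateKey, s.StoreKey, s.ProductKey, s.Quantity, s.Amount
        FROM staging.SalesClean AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM dbo.FactSales AS f
            WHERE f.DateKey = s.DateKey
              AND f.StoreKey = s.StoreKey
              AND f.ProductKey = s.ProductKey
        );
    """)
    conn.commit()
    print("Rows inserted:", cur.rowcount)
    conn.close()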
