Sr. Big Data Engineer Resume

Lebanon, NJ

SUMMARY:

  • 8+ years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Experience in designing Conceptual, Logical, and Physical data models using the Erwin and ER/Studio data modeling tools.
  • Strong knowledge of Spark with Scala for large-scale batch and streaming data processing.
  • Skilled in system analysis, ER and dimensional modeling, database design, and implementing RDBMS-specific features.
  • Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
  • Experienced with distributed computing architectures such as AWS services (e.g., EC2, Redshift, EMR, Elasticsearch), Hadoop, Python, and Spark, and in the effective use of MapReduce, SQL, and Cassandra to solve big data problems.
  • Strong experience writing SQL, PL/SQL, and Transact-SQL programs for stored procedures, triggers, and functions.
  • Expertise in analyzing and documenting business requirement documents (BRD) and functional requirement documents (FRD) along with Use Case Modeling and UML.
  • Experience in UNIX shell scripting, Perl scripting and automation of ETL Processes.
  • Expertise in designing complex mappings and in performance tuning of slowly changing dimension tables and fact tables.
  • Experienced in data scrubbing/cleansing, data quality, data mapping, data profiling, and data validation in ETL.
  • Experienced in creating and documenting metadata for OLTP and OLAP when designing systems.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
  • Excellent knowledge of the Ralph Kimball and Bill Inmon approaches to data warehousing.
  • Excellent at performing data transfer activities between SAS and various databases and data file formats such as XLS, CSV, DBF, and MDB.
  • Experience in working with Excel Pivot and VBA macros for various business scenarios.
  • Experience in data modeling using ER diagrams; dimensional data modeling; Conceptual/Logical/Physical modeling using Third Normal Form (3NF), Star Schema, and Snowflake schemas; and tools like ER/Studio, CA Erwin, and Sybase Power Designer for both forward and reverse engineering.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Expert in writing SQL queries and optimizing the queries in Oracle, SQL Server.
  • Extensive experience in development of Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
  • Experience in designing the Data Mart and creation of Cubes.
  • Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in the databases and to track data quality.
  • Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
  • Experience with distributed data warehousing and/or data mining systems, using one or more Big Data/NoSQL technologies (Hadoop, Hive, HBase, Pig, Cassandra, MongoDB)
  • Experience in importing and exporting data using Sqoop between HDFS and Relational Database Systems (RDBMS) in both directions (a Spark-based sketch of this kind of movement follows this list).
  • Experience in technical consulting and end-to-end delivery covering architecture, data modeling, data governance, and solution design, development, and implementation.
  • Extensive experience in developing and driving the strategic direction of the SAP ERP (SAP ECC) and SAP Business Intelligence (SAP BI) systems.
  • Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
  • Experience with various Teradata utilities such as FastLoad, MultiLoad, BTEQ, and Teradata SQL Assistant.
  • Extensively worked with the Teradata utilities BTEQ, FastExport, and MultiLoad to export and load data to/from different source systems, including flat files.
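
A minimal sketch of the kind of RDBMS-to-HDFS movement described in the Sqoop bullet above, expressed here with Spark and Scala rather than the Sqoop CLI; the JDBC URL, credentials, table name, partition column, and output path are placeholders:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object RdbmsToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdbms-to-hdfs").getOrCreate()

        // Pull the source table over JDBC (connection details are placeholders)
        val orders = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
          .option("dbtable", "SALES.ORDERS")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Land the extract on HDFS as partitioned Parquet for downstream Hive/Spark jobs
        orders.write
          .mode(SaveMode.Overwrite)
          .partitionBy("ORDER_DATE")
          .parquet("hdfs:///data/staging/orders")

        spark.stop()
      }
    }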

TECHNICAL SKILLS:

Data Modeling Tools: Erwin 9.7, ER/Studio v17, Sybase Power Designer

Big Data Technologies: Hadoop 3.0, Hive 2.3, HDFS, HBase 1.2, Apache Flume 1.8, Sqoop 1.4, Spark 2.4, Pig 0.17, Impala 3.0, and MapReduce MRv2/MRv1.

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c, Teradata R15, MS SQL Server 2017

Testing and Defect Tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center, Requisite), MS Visio & Visual SourceSafe

Operating System: Windows 10/8, Unix, Sun Solaris

ETL/Data Warehouse Tools: Informatica 9.6, SAP Business Objects XIR3.1/XIR2, Web Intelligence, Talend, Tableau, Pentaho

Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant

Other Tools: Teradata SQL Assistant, Toad 9.7/8.0, DB Visualizer 6.0, Microsoft Office, Microsoft Visio, Microsoft Excel, Microsoft Project

Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

WORK EXPERIENCE:

Confidential - Lebanon, NJ

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote complex Hive queries to extract data from heterogeneous sources in the data lake and persist the results into HDFS (a Hive extraction sketch follows this list).
  • Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
  • Implemented the logical and physical relational database and maintained database objects in the data model using Erwin.
  • Developed a high-performance, scalable data architecture solution that incorporates a matrix of technologies to relate architectural decisions to business needs.
  • Participated in the integration of the MDM (Master Data Management) hub and data warehouses.
  • Designed the whole data warehouse system including ODS, DWH and data marts.
  • Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP systems.
  • Extensively used Agile methodology as the organizational standard to implement the data models.
  • Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
  • Developed and configured the Informatica MDM hub to support the Master Data Management (MDM), Business Intelligence (BI), and Data Warehousing platforms to meet business needs.
  • Used Load utilities (Fast Load & Multi Load) with the mainframe interface to load the data into Teradata.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Involved in planning, defining, and designing the database using Erwin based on business requirements, and provided documentation.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Involved in database development by creating Oracle PL/SQL Functions, Procedures and Collections.
  • Led architecture and design of data processing, warehousing, and analytics initiatives.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
  • Involved in converting MapReduce programs into Spark transformations using the Spark Python API.
  • Developed Spark scripts using Python and Bash shell commands as per the requirements.
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
  • Maintained metadata (data definitions of table structures) and version control for the data model.
  • Created stored procedures, functions, database triggers and packages as per the business needs for developing ad-hoc and robust reports.
  • Defined best practices for data modeling and extraction and ensured architectural alignment of the designs and development.
  • Worked in a Hadoop environment using Pig, Sqoop, Hive, and HBase, with a detailed understanding of MapReduce programs.
  • Involved in integration of various relational and non-relational sources such as Oracle, XML and Flat Files.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
  • Developed numerous MapReduce jobs in Scala for data cleansing and analyzed data in Impala (a cleansing-job sketch follows this list).
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Worked closely with the SSIS and SSRS developers to explain complex data transformation logic.
  • Developed multiple MapReduce jobs for data cleaning and pre-processing, and analyzed data in Pig.
  • Analyzed existing source systems with the help of data profiling and source system data models.
  • Worked in Data Analysis, data profiling and data governance identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
  • Worked in SQL across a number of dialects, including MySQL, PostgreSQL, Redshift, and Oracle.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from Teradata database.
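
A minimal sketch of the Hive extraction and HDFS persistence referenced above, written in Scala against Spark SQL with Hive support; the database, table, column, and path names are assumptions for illustration:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object HiveExtract {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport lets Spark resolve the Hive metastore tables backing the data lake
        val spark = SparkSession.builder()
          .appName("hive-extract")
          .enableHiveSupport()
          .getOrCreate()

        // Join two lake tables and keep only what the downstream feed needs
        val curated = spark.sql(
          """SELECT c.customer_id, c.segment, SUM(t.amount) AS total_amount
            |FROM datalake.transactions t
            |JOIN datalake.customers c ON t.customer_id = c.customer_id
            |WHERE t.txn_date >= '2019-01-01'
            |GROUP BY c.customer_id, c.segment""".stripMargin)

        // Persist the result set into HDFS as Parquet
        curated.write.mode(SaveMode.Overwrite).parquet("hdfs:///data/curated/customer_totals")

        spark.stop()
      }
    }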
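
A minimal sketch of a cleansing-style MapReduce job written in Scala, as referenced above; the pipe delimiter, expected field count, and input/output paths are assumptions for illustration:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map-only job that drops malformed pipe-delimited records and trims each field
    class CleanseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      private val expectedFields = 5 // assumed record width for illustration

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split('|')
        if (fields.length == expectedFields && fields.forall(_.trim.nonEmpty)) {
          context.write(NullWritable.get(), new Text(fields.map(_.trim).mkString("|")))
        }
      }
    }

    object CleanseJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "hdfs-data-cleanse")
        job.setJarByClass(classOf[CleanseMapper])
        job.setMapperClass(classOf[CleanseMapper])
        job.setNumReduceTasks(0)                  // map-only: no shuffle or reduce phase
        job.setOutputKeyClass(classOf[NullWritable])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))   // raw input directory on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args(1))) // cleansed output directory
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }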

Environment: Oozie 4.3, PL/SQL, NoSQL, MS Excel 2016, Visio, Erwin r9.7, Oracle 12c, Hadoop 3.0, MDM, OLTP, OLAP, ODS, SQL, Apache Hive 2.3, HDFS, HBase 1.2, SSIS, SSRS, Apache Pig 0.17, MapReduce, SAS, Tableau, Azure

Confidential, San Jose, CA

Sr. Data Engineer

Responsibilities:

  • Participated in requirements sessions to gather requirements along with business analysts and product owners.
  • Involved in Agile development methodology as an active member in Scrum meetings.
  • Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
  • Installed and configured Hive, wrote Hive UDFs, and used ZooKeeper for cluster coordination services.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Installed and configured Hadoop Ecosystem components.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
  • Pulled data from the data lake (HDFS) and shaped it with various RDD transformations.
  • Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch, and Virtual Private Cloud (VPC).
  • Worked with Kafka and built use cases relevant to our environment.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (an aggregation sketch follows this list).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
  • Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, actively drive Proof of Concept (POC) and Proof of Technology (POT) evaluations and to implement a Big Data solution.
  • Created integration relational 3NF models that can functionally relate to other subject areas, and was responsible for determining the transformation rules accordingly in the Functional Specification Document.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS.
  • Worked with Spark to improve the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and NoSQL databases such as HBase and Cassandra (a streaming sketch follows this list).
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
  • Developed Spark code using Scala for faster testing and processing of data.
  • Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for our use case.
  • Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
  • Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the resulting data fed to AWS S3.
  • Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
  • Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
  • Involved in loading data from the UNIX file system to HDFS.
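
A minimal Scala sketch of the Kafka-to-HDFS flow described above, written against Spark's Structured Streaming API; the broker address, topic name, and HDFS paths are placeholders:

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

        // Subscribe to the raw event topic (broker and topic names are placeholders)
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")

        // Persist the stream to HDFS as Parquet; the checkpoint gives fault-tolerant restarts
        val query = events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/raw/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()

        query.awaitTermination()
      }
    }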
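
A minimal Scala sketch of the DataFrame/UDF aggregation pattern noted above, with the write-back shown over plain JDBC rather than the Sqoop CLI; the column names, UDF logic, paths, and connection details are assumptions:

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.{col, sum, udf}

    object AggregateAndExport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("aggregate-and-export").getOrCreate()

        // Example UDF: bucket transaction amounts into coarse bands
        val amountBand = udf((amount: Double) => if (amount >= 1000.0) "HIGH" else "LOW")

        // Read the staged extract from the data lake
        val txns = spark.read.parquet("hdfs:///data/staging/transactions")

        // Aggregate per customer and band
        val summary = txns
          .withColumn("band", amountBand(col("amount")))
          .groupBy("customer_id", "band")
          .agg(sum("amount").as("total_amount"))

        // Write the summary back to the relational store over JDBC
        summary.write
          .mode(SaveMode.Append)
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
          .option("dbtable", "MART.CUSTOMER_SUMMARY")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .save()

        spark.stop()
      }
    }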

Environment: Hadoop 3.0, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, YARN, HDFS, HBase 1.2, Kafka, Scala 2.12, NoSQL, Cassandra 3.11, Elasticsearch, MLlib, Teradata 15, MapReduce, UNIX, Zookeeper 3.4

Confidential - Greensboro, NC

Data Architect/Data Modeler

Responsibilities:

  • Involved in documenting data architecture/data modeling and ETL specifications for the data warehouse using Erwin.
  • Developed a high-performance, scalable data architecture solution that incorporates a matrix of technologies to relate architectural decisions to business needs.
  • Worked in ER/Studio on multiple operations across both OLAP and OLTP applications.
  • Involved in debugging and Tuning the PL/SQL code, tuning queries, optimization for the Oracle database.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Designed the Logical Data Model using ER/Studio with the entities and attributes for each subject area.
  • Used the Agile Scrum methodology to build the different phases of Software development life cycle (SDLC).
  • Involved in several facets of MDM implementations including Data Profiling, Metadata acquisition and data migration.
  • Independently coded new programs and designed tables, and loaded and tested them effectively for the given POCs using Big Data/Hadoop.
  • Responsible for full data loads from production to AWS Redshift staging environment.
  • Designed the Fact and Dimension table for Data Marts using ER/Studio.
  • Designed the ODS layer and the dimensional model of the Data Warehouse using Kimball methodologies, sourced from MDM base tables and other transactional systems.
  • Worked on migrating the EDW to AWS using EMR and various other technologies.
  • Designed and produced client reports using Excel, Access, Tableau and SAS.
  • Created logical and physical data models based on Cassandra's data model.
  • Extracted large volumes of data from Amazon Redshift and Elasticsearch on AWS using SQL queries to create reports.
  • Developed the long-term data warehouse roadmap and architecture, and designed and built the data warehouse framework per the roadmap.
  • Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing
  • Worked extensively with Business Objects XI Report Developers in solving critical issues of defining hierarchy, loops and Contexts.
  • Developed and configured the Informatica MDM hub to support the Master Data Management (MDM), Business Intelligence (BI), and Data Warehousing platforms to meet business needs.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.

Environment: ER/Studio v17, Oracle 12c, SDLC, MDM, SQL, PL/SQL, Business objects, AWS, Apache Hive 2.3, SAS, Tableau 8.3.1, HDFS, Amazon Redshift.

Confidential - Atlanta, GA

Sr. Data Analyst/Data Modeler

Responsibilities:

  • Worked as a Data Analyst/Modeler to generate Data Models using ER/Studio and subsequent deployment to Enterprise Data Warehouse.
  • Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
  • Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
  • Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
  • Created tables, views, sequences, indexes, constraints and generated SQL scripts for implementing physical data model.
  • Worked with ETL team for documentation of transformation rules for data migration from OLTP to warehouse for purpose of reporting.
  • Developed operational data store to design data marts and enterprise data warehouses.
  • Designed and Developed Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Created and Maintained Logical Data Model (LDM) for the project. Includes documentation of all Entities, Attributes, Data Relationships, Primary and Foreign Key Structures, Allowed Values, Codes, Business Rules, Glossary Terms, etc.
  • Designed ETL data flows that extract, transform, and load data while optimizing SSIS performance.
  • Developed the required data warehouse model using a Star schema for the generalized model.
  • Involved in normalization/de-normalization, normal forms, and database design methodology.
  • Extensively performed data profiling, data cleansing, and de-duplication of the data, with good knowledge of best practices.
  • Used forward engineering to create a Physical Data Model with DDL that best suits the requirements from the Logical Data Model
  • Designed and Developed Use Cases, Activity Diagrams, and Sequence Diagrams using Unified Modeling Language (UML)
  • Involved in the analysis of the existing claims processing system, mapping phase according to functionality and data conversion procedure.
  • Performed Normalization of the existing OLTP systems (3NF), to speed up the DML statements execution time.
  • Created the XML control files to upload the data into Data warehousing system.
  • Identified the Primary Key, Foreign Key relationships across the entities and across subject areas.
  • Cleansed and organized various tables in a presentable manner to aid understanding of the existing models.

Environment: ER/Studio, Oracle 11g, OLTP, ETL, XML, T-SQL, UML, 3NF, SSIS, DDL, DML.

Confidential

Data Analyst

Responsibilities:

  • Worked with Data Analysts on requirements gathering, business analysis, and project coordination.
  • Performed migration of Reports (Crystal Reports, and Excel) from one domain to another domain using Import/Export Wizard.
  • Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and test processes.
  • Used advanced Excel formulas and functions such as Pivot Tables, LOOKUP, IF with AND, and INDEX/MATCH for data cleaning.
  • Redesigned some of the previous models by adding some new entities and attributes as per the business requirements.
  • Reviewed Stored Procedures for reports and wrote test queries against the source system (SQL Server) to match the results with the actual report against the Data mart (Oracle).
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Performed SQL validation to verify the data extracts integrity and record counts in the database tables
  • Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
  • Effectively used data blending feature in Tableau to connect different databases like Oracle, MS SQL Server.
  • Transferred data with SAS/Access from the databases MS Access, Oracle into SAS data sets on Windows and UNIX.
  • Provided guidance and insight on data visualization and dashboard design best practices in Tableau
  • Performed Verification, Validation and Transformations on the Input data (Text files) before loading into target database.
  • Executed data extraction programs/data profiling and analyzing data for accuracy and quality.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Documented designs and Transformation Rules engine for use of all the designers across the project.
  • Designed and implemented basic SQL queries for testing and report/data validation
  • Used ad hoc queries for querying and analyzing the data.
  • Performed Gap Analysis to check the compatibility of the existing system infrastructure with the new business requirements.

Environment: SQL, PL/SQL, Oracle 9i, SAS, Business Objects, Tableau, Crystal Reports, T-SQL, UNIX, MS Access 2010
