
Sr. Big Data Engineer Resume


Indianapolis, IN

SUMMARY:

  • Over 8 years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS).
  • Knowledge and working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
  • Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expert-level server-side development using J2EE, EJB, J2SE, Spring, Servlets, Python, and C++ on Windows, Unix, and Linux platforms.
  • Involved in writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Extensive experience with ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
  • Experience in cloud development and architecture on Amazon AWS (EC2, Redshift) and basic experience with Azure.
  • Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Excellent experience with job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experience in dimensional data modeling, star schema/snowflake schema, and fact and dimension tables.
  • Upgraded Hadoop CDH to 5.x and Hortonworks; installed, upgraded, and maintained Cloudera Hadoop-based software, Cloudera clusters, and Cloudera Navigator.
  • Built and Deployed Industrial scale Data Lake on premise and Cloud platforms.
  • Strong knowledge of Spark with Scala for large-scale streaming data processing.
  • Thorough knowledge of writing DDL, DML, and transaction queries in SQL for Oracle and Teradata databases.
  • Experienced in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
  • Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams for the requirements.
  • Good working knowledge of UNIX commands, such as changing file and group permissions.
  • Experience in building reports using SQL Server Reporting Services and Crystal Reports.
  • Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
  • Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
  • Experience in text analytics and in developing statistical, machine learning, and data mining solutions to various business problems.
  • Performed extensive data profiling and analysis to detect and correct inaccurate data in the databases and to track data quality.
  • Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop (an illustrative sketch follows this list).
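
As a purely illustrative sketch of the HDFS/RDBMS transfer pattern above (Sqoop was the tool actually used; the connection URL, table, and paths below are placeholder assumptions), the same movement can be expressed in PySpark over JDBC:

    from pyspark.sql import SparkSession

    # Minimal sketch: pull a relational table over JDBC and land it on HDFS as Parquet.
    # The JDBC driver jar must be on the Spark classpath; URL, table, and path are hypothetical.
    spark = SparkSession.builder.appName("rdbms-to-hdfs-sketch").getOrCreate()

    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  # placeholder URL
              .option("dbtable", "sales.orders")                       # placeholder table
              .option("user", "etl_user")
              .option("password", "etl_password")
              .option("fetchsize", "10000")
              .load())

    # Partition by a date column so downstream Hive/Spark jobs can prune partitions efficiently.
    (orders.write.mode("overwrite")
           .partitionBy("order_date")
           .parquet("hdfs:///data/raw/orders"))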

TECHNICAL SKILLS:

Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.

Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

Cloud Management: Amazon Web Services (AWS), Amazon Redshift

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, ANSI SQL, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and defect tracking Tools: HP Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe

Operating System: Windows, Unix, Sun Solaris

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

PROFESSIONAL EXPERIENCE:

Confidential, Indianapolis, IN

Sr. Big Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Knowledge of the architecture and functionality of NoSQL databases like HBase.
  • Used S3 for data storage, responsible for handling huge amounts of data.
  • Used EMR for data pre-analysis by spinning up EC2 instances.
  • Used Kafka to obtain near-real-time data.
  • Good experience in writing data ingestion jobs with tools like Sqoop.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Implemented Spark jobs in Scala using the DataFrame and Spark SQL APIs and pair RDDs for faster data processing; created RDDs, DataFrames, and Datasets.
  • Performed batch processing using Spark implemented in Scala.
  • Performed extensive data validation using Hive and wrote Hive UDFs.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used the Cloudera data platform for deploying Hadoop in some modules.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs; wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming (see the sketch after this list).
  • Created external tables pointing to HBase to access tables with a huge number of columns.
  • Involved in loading the generated HFiles into HBase for faster access to a large customer base without taking a performance hit.
  • Configured the Talend ETL tool for data filtering.
  • Processed data in HBase using Apache Crunch pipelines, a MapReduce programming model that is efficient for processing Avro data formats.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
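
A minimal PySpark Structured Streaming sketch of the Kafka-to-Spark log pipeline referenced above; the broker addresses, topic name, and log schema are assumptions, and the spark-sql-kafka connector package must be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("server-log-stream-sketch").getOrCreate()

    # Hypothetical server-log layout; the real schema is not part of this resume.
    log_schema = StructType([
        StructField("host", StringType()),
        StructField("level", StringType()),
        StructField("message", StringType()),
        StructField("ts", TimestampType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
           .option("subscribe", "server-logs")                              # placeholder topic
           .load())

    logs = (raw.select(from_json(col("value").cast("string"), log_schema).alias("log"))
               .select("log.*"))

    # Persist the parsed stream to HDFS; checkpointing is required for fault tolerance.
    query = (logs.writeStream.format("parquet")
             .option("path", "hdfs:///data/streams/server_logs")
             .option("checkpointLocation", "hdfs:///checkpoints/server_logs")
             .start())
    query.awaitTermination()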

Environment: UNIX, Linux, Java, Apache HDFS, MapReduce, Spark, Pig, Hive, HBase, Kafka, Sqoop, NoSQL, AWS (S3 buckets), EMR cluster, Solr.

Confidential, Merrimack, NH

Sr. Big Data Engineer

Responsibilities:

  • Worked as a Big Data Engineer to import and export data from different databases.
  • Prepared ETL technical Mapping Documents along with test cases for each Mapping for future developments to maintain Software Development Life Cycle (SDLC).
  • Implemented a centralized Data Lake in Hadoop with data from various sources.
  • Wrote Python scripts for Location Analytics project deployment on a Linux cluster/farm and on the AWS Cloud.
  • Involved in data querying and summarization using Hive; created UDFs, UDAFs, and UDTFs.
  • Performed ETL/ELT, data migration, and integration (K2View ADI/Data Fabric, Oracle ODI, SSIS/SSAS, etc.).
  • Created the high-level design for DataStage components.
  • Wrote Scala/Spark applications on AWS EMR to process and transform billions of REST and mobile events generated on Realtor.com every hour.
  • Responsible for delivering the ETL migration of 20K+ jobs from legacy code to DataStage.
  • Planned and tested the DataStage migration from 9.1 to 11.7 BigIntegrate using Hadoop clusters.
  • Created logical and physical data models for relational (OLTP) systems and dimensional (OLAP) star-schema models for fact and dimension tables using Erwin.
  • Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs to run daily.
  • Involved in various phases of development; analyzed and developed the system using Agile Scrum methodology.
  • Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows; managed Hadoop clusters using Apache, Hortonworks, Cloudera, and MapReduce.
  • Used Python, Ruby, Pig, Hive, and Sqoop to implement various tools and utilities for data import and export.
  • Utilized U-SQL for data analytics and ingestion of raw data in Azure and Blob storage.
  • Performed thorough data analysis using ANSI SQL for the purpose of overhauling the database.
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML.
  • Worked with medical claim data in the Oracle database for Inpatient/Outpatient data validation, trend and comparative analysis.
  • Migrated the ETL platform from Ab Initio on AIX to IBM BigIntegrate on Hadoop.
  • Designed the data marts in Erwin using Ralph Kimball's dimensional data mart modeling methodology.
  • Enhanced team productivity by providing strategies and training program design for the K2View Data Fabric platform.
  • Migrated databases from DB2 to the Hadoop ecosystem as part of the Atlas data lake project.
  • Implemented reporting in PySpark and Zeppelin and querying through Airpal and AWS Athena (see the sketch after this list).
  • Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Netezza.
  • Worked with MDM systems team with respect to technical aspects and generating reports.
  • Worked on importing and exporting data between relational database systems like DB2 and HDFS/Hive, using Sqoop.
  • Supported various reporting teams and gained experience with the data visualization tool Tableau.
  • Extensively used Pig for data cleansing via Pig scripts and embedded Pig scripts.
  • Designed the data aggregations on Hive for ETL processing on Amazon EMR to process data as per business requirements.
  • Built reusable Hive UDF libraries for business requirements.
  • Worked with Data Fabric, Hadoop HDFS, Hive, HBase, Spark, Kafka, MarkLogic, and a canonical data model; this project was implemented using Agile methodology.

  • Installed and configured Hortonworks HDP Hadoop using Ambari.
  • Developed and designed data integration and migration solutions in Azure.
  • Developed Sqoop scripts to extract the data from MySQL and load it into HDFS.
  • Prepared complex ANSI SQL queries, views, and stored procedures to load data into the staging area.
  • Involved in Hive-HBase integration by creating hive external tables and specifying storage as HBase format.
  • Analyzed the data that was using the maximum number of resources and made changes in the back-end code using PL/SQL stored procedures and triggers.
  • Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on scheduling of the reports.
  • Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed the data via Hive-HBase integration.
  • Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
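
A small PySpark sketch of the reporting/aggregation pattern mentioned above; the Hive database, table, and column names are illustrative assumptions, not the project's actual schema:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support lets Spark read warehouse tables directly; all names below are hypothetical.
    spark = (SparkSession.builder.appName("claims-report-sketch")
             .enableHiveSupport().getOrCreate())

    claims = spark.table("warehouse.medical_claims")   # placeholder Hive table

    # Aggregate claim counts and paid amounts per month and care setting for reporting.
    report = (claims.groupBy(F.trunc("service_date", "month").alias("service_month"),
                             "care_setting")
              .agg(F.count(F.lit(1)).alias("claim_count"),
                   F.sum("paid_amount").alias("total_paid")))

    report.write.mode("overwrite").saveAsTable("reporting.claims_monthly_summary")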

Environment: Hadoop 3.0, Kafka 1.1, SQL, PL/SQL, OLAP, Big Integrate, ANSI-SQL, Java, Data Lake, Hortonworks, Python, OLTP, SDLC, MDM, Netezza, Oracle 12c, XML, MapReduce, SSRS, UDF, MySQL, T-SQL, Teradata 15, Azure, AWS, ETL, HDFS, Hive 2.3, Sqoop 1.4, Tableau, Pig 0.17.

Confidential, West Point, PA

Sr. Hadoop Data Engineer

Responsibilities:

  • Worked as a Data Engineer; designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Developed Spark and Java applications for data streaming and data transformation.
  • Involved in selecting and integrating the Big Data tools and frameworks required to provide the requested capabilities.

  • Developed Pig scripts and UDFs as per the business logic.
  • Developed a new architecture for the project that uses less infrastructure and costs less, by converting the data load jobs to read directly from on-premise data sources.
  • Wrote BigQuery scripts for further data transformation in Google Cloud Storage.
  • Wrote Azkaban jobs to create clusters, run Spark jobs, and schedule Google BigQuery jobs.
  • Wrote shell scripts to optimize Azkaban projects, read JAR files directly from the cloud bucket, and read through the data in the bucket for debugging.

  • Developed building-block code (a common feed parser) that can work with any Kafka topic and data formats like XML and JSON, along with a driver program (see the sketch after this list).
  • Participated in design discussions and assured functional specifications are delivered in all phases of SDLC in an Agile Environment.
  • Implemented state-based business logic in Hive using generic UDFs.
  • Designed DataStage ETL jobs to extract data from heterogeneous source systems, transform it, and finally load it into the data warehouse.
  • Designed and developed tooling (along the lines of Ambari and Chef) for MPact software, Hadoop, HBase, and Spark deployment, configuration, monitoring, HA, load and data balancing, and scalability on AWS, ESXi, XEN, and distributed clusters, using Java 8, Spring, Scala, Python, and Ruby.
  • Fixed load-balancing issues for DataStage jobs and database jobs on the server.
  • Created data models for AWS Redshift and Hive from dimensional data models.
  • Executed change management processes surrounding new releases of SAS functionality.
  • Prepared complex T-SQL queries, views and stored procedures to load data into staging area.
  • Normalized the existing model to third normal form (3NF) to speed up DML statement execution.
  • Participated in data collection, data cleaning, data mining, developing models and visualizations.
  • Worked with Sqoop to transfer data between HDFS and relational databases like MySQL (and vice versa), and also used Talend for this purpose.
  • Installed and configured Hortonworks Hadoop from scratch for development, along with Hadoop tools like Hive, HBase, Sqoop, ZooKeeper, and Flume.
  • Configured the data fabric, which provides seamless, real-time integration and access across the multiple data silos of a big data system.
  • Built and Deployed Industrial scale Data Lake on premise and Cloud platforms.
  • Developed server-side and client-side web applications using Spring 2.5, Struts 2, EJB, Hibernate, iBATIS, JSF, JSTL, ExtJS, and Web 2.0 Ajax frameworks; developed small intranet sites using Python.
  • Developed automated job flows that run MapReduce jobs internally and ran them through Oozie daily and as needed.
  • Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
  • Extracted Tables and exported data from Teradata through Sqoop and placed in Cassandra.
  • Worked on analyzing and examining customer behavioral data using MongoDB.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Enabled the processing, management, storage, and analysis of data using the data fabric.
  • Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
  • Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.
  • Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.
  • Worked on Master data Management (MDM) Hub and interacted with multiple stakeholders.
  • Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
  • Undertaken data model design for Data Lake holding more than 400 TB data.
  • Extensively involved in development and implementation of SSIS and SSAS applications.
  • Collaborated with ETL, and DBA teams to analyze and provide solutions to data issues and other challenges while implementing the OLAP model.
  • Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
  • Developed data pipeline using Pig and Hive from Teradata, DB2 data sources. These pipelines had customized UDF'S to extend the ETL functionality.
  • Extensively used various stages such as Lookup, Join, Merge, Sort, Remove Duplicates, Filter, and Transformer in DataStage Designer.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
  • Worked on Performance Tuning of the database which includes indexes, optimizing SQL Statements.
  • Worked with OLTP systems to find the daily transactions, the types of transactions that occurred, and the amount of resources used.
  • Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.
  • Created reports analyzing a large-scale database utilizing Microsoft Excel analytics within the legacy system.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
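
A minimal Python sketch of the common feed parser idea described above, using the kafka-python client; the topic name, brokers, and record layouts are placeholder assumptions:

    import json
    import xml.etree.ElementTree as ET
    from kafka import KafkaConsumer  # kafka-python client

    # Parsers keyed by feed format; returning plain dicts keeps downstream handling uniform.
    def parse_json(payload: bytes) -> dict:
        return json.loads(payload)

    def parse_xml(payload: bytes) -> dict:
        root = ET.fromstring(payload)
        return {child.tag: child.text for child in root}

    PARSERS = {"json": parse_json, "xml": parse_xml}

    def run(topic: str, fmt: str, brokers: str = "broker1:9092") -> None:
        """Driver: consume any topic and hand each record to the parser for its format."""
        consumer = KafkaConsumer(topic, bootstrap_servers=brokers,
                                 auto_offset_reset="earliest", enable_auto_commit=True)
        parse = PARSERS[fmt]
        for record in consumer:
            event = parse(record.value)
            print(event)  # in practice the event would be handed to a sink/writer

    if __name__ == "__main__":
        run("customer-feed", "json")  # topic name and format are placeholders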

Environment: Hadoop 3.0, HDFS, HBase 1.2, SSIS, SSAS, OLAP, Hortonworks, Data lake, OLTP, ETL, Java, ANSI-SQL, AWS, SDLC, T-SQL, SAS, MySQL, Big Integrate, HDFS, Sqoop 1.4, Cassandra 3.0, MongoDB, Hive 2.3, SQL, PL/SQL, Teradata 15, Oracle 12c, MDM.

Confidential, Lowell, AR

Sr. Data Analyst/Data Engineer

Roles & Responsibilities

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Used Agile Central (Rally) to enter tasks, giving visibility to the whole team and the Scrum Master.
  • Connected to AWS Redshift through Tableau to extract live data for real time analysis.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
  • Implemented forward engineering to create tables, views, SQL scripts, and mapping documents.
  • Captured the data logs from web server into HDFS using Flume for analysis.
  • Involved in PL/SQL code review and modification for the development of new requirements.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Worked closely with the ETL SSIS developers to explain the complex data transformation logic.
  • Worked with MDM systems team with respect to technical aspects and generating reports.
  • Worked on data mining and data validation to ensure the accuracy of data between the warehouse and source systems (see the sketch after this list).
  • Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
  • Used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
  • Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS).
  • Performed data analysis and data profiling using complex SQL on various sources systems including Oracle.
  • Worked on importing and cleansing high-volume data from various sources such as Teradata, flat files, and SQL Server.
  • Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Integrated various sources into the staging area of the data warehouse to integrate and cleanse the data.
  • Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
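
A minimal Python sketch of the warehouse-versus-source validation described above; the DSNs, tables, and the row-count check itself are illustrative assumptions rather than the project's actual rules:

    import pyodbc

    # Placeholder DSNs; in practice these would point at the source system and the EDW.
    SOURCE_DSN = "DSN=source_oracle;UID=analyst;PWD=secret"
    TARGET_DSN = "DSN=edw_teradata;UID=analyst;PWD=secret"

    # Each entry pairs a source query with its warehouse counterpart (hypothetical tables).
    CHECKS = [
        ("customer", "SELECT COUNT(*) FROM crm.customer", "SELECT COUNT(*) FROM edw.dim_customer"),
        ("orders",   "SELECT COUNT(*) FROM oms.orders",   "SELECT COUNT(*) FROM edw.fact_orders"),
    ]

    def row_count(dsn: str, sql: str) -> int:
        conn = pyodbc.connect(dsn)
        try:
            return conn.cursor().execute(sql).fetchone()[0]
        finally:
            conn.close()

    # Compare row counts between each source table and its warehouse counterpart.
    for name, src_sql, tgt_sql in CHECKS:
        src, tgt = row_count(SOURCE_DSN, src_sql), row_count(TARGET_DSN, tgt_sql)
        print(f"{name}: source={src} target={tgt} -> {'OK' if src == tgt else 'MISMATCH'}")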

Environment: Erwin 9.6, Teradata R14, Oracle 11g, SQL, T-SQL, PL/SQL, AWS, Agile, OLAP, OLTP, SSIS, HDFS, SAS, Flume, SSRS, Sqoop, MapReduce, MySQL.

Confidential, SFO, CA

Data Analyst/Data Modeler

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Modeler responsible for all data-related aspects of the project.
  • Participated in requirements gathering and JAD sessions with users, subject matter experts, and BAs.
  • Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Improved SQL query performance using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
  • Extensively used data transformation tools such as SSIS, Informatica, and DataStage.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Performed data mining using very complex SQL queries and discovered patterns.
  • Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas (see the sketch after this list).
  • Worked on Normalization and De-Normalization techniques for OLAP systems.
  • Created Tableau scorecards and dashboards using stacked bars, bar graphs, geographical maps, and Gantt charts.
  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Used E/R Studio for reverse engineering, connecting to the existing database and ODS to create a graphical representation in the form of entity relationships and elicit more information.
  • Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and stored procedures.
  • Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business users.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
  • Created a list of domains in E/R Studio and worked on building up the data dictionary for the company.
  • Created a data mapping document after each assignment and wrote the transformation rules for each field as applicable.
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications utilizing data management software programs.
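
To illustrate the star-schema pattern referenced above, here is a toy dimensional model built with Python's built-in sqlite3 module; the fact and dimension tables are simplified placeholders, not the project's actual model:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # One fact table keyed to two dimensions -- the minimal shape of a star schema.
    conn.executescript("""
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE fact_sales   (
        sale_id      INTEGER PRIMARY KEY,
        date_key     INTEGER REFERENCES dim_date(date_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        quantity     INTEGER,
        amount       REAL
    );
    """)

    conn.execute("INSERT INTO dim_date VALUES (20240105, '2024-01-05', 'Jan', 2024)")
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme Corp', 'East')")
    conn.execute("INSERT INTO fact_sales VALUES (100, 20240105, 1, 3, 149.85)")

    # A typical dimensional query: measures from the fact table sliced by dimension attributes.
    rows = conn.execute("""
        SELECT d.year, c.region, SUM(f.amount) AS total_sales
        FROM fact_sales f
        JOIN dim_date d     ON f.date_key = d.date_key
        JOIN dim_customer c ON f.customer_key = c.customer_key
        GROUP BY d.year, c.region
    """).fetchall()
    print(rows)  # [(2024, 'East', 149.85)]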

Environment: E/R Studio V15, Teradata, SQL, PL/SQL, T-SQL, OLTP, SSIS, SSRS, OLAP, Tableau, Netezza.

Confidential, Austin, TX

Data Analyst

Responsibilities:

  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Extensively worked in Data Analysis by querying in SQL and generating various PL/SQL objects.
  • Analyzed the Business information requirements and examined the OLAP source systems to identify the measures, dimensions and facts required for the reports.
  • Worked extensively on SQL querying using joins, aliases, functions, triggers, and indexes.
  • Performed the Data Accuracy, Data Analysis, Data Quality checks before and after loading the data.
  • Involved in extensive data validation by writing several complex SQL queries, in back-end testing, and in working through data quality issues.
  • Worked with business analysts to design weekly reports using Crystal Reports.
  • Built dashboards using SSRS and Tableau for the business teams to take cost effective decisions.
  • Conducted design walk through sessions with Business Intelligence team to ensure that reporting requirements are met for the business.
  • Involved in Troubleshooting, resolving and escalating data related issues and validating data to improve data quality.
  • Extensively used SQL for Data Analysis to understand and document the data behavioral trend.
  • Monitored data quality and maintained the integrity of data to ensure effective functioning of the department.
  • Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
  • Used SAS to mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis.
  • Created reports in Tableau, based on client needs, for dynamic interaction with the data produced.
  • Actively participated in data cleansing and anomaly resolution of the legacy application.
  • Developed and tested PL/SQL scripts and stored procedures designed and written to find specific data.
  • Defined and represented Entities, Attributes and Joins between the entities.
  • Used Excel with VBA scripting to maintain existing and develop new reports as required by the business.
  • Wrote UNIX shell scripts to automate loading files into the database using crontab.
  • Performed ad-hoc analyses and interpreted the results as needed (a sketch of a typical trend report follows this list).
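
A small pandas sketch of the kind of ad-hoc trend analysis described in this role; the extract file and column names are hypothetical:

    import pandas as pd

    # Hypothetical daily extract pulled from the OLTP reporting database.
    df = pd.read_csv("daily_transactions.csv", parse_dates=["txn_date"])

    # Weekly trend of transaction volume and value -- the shape of a typical ad-hoc report.
    weekly = (df.groupby(pd.Grouper(key="txn_date", freq="W"))
                .agg(txn_count=("txn_id", "count"), total_amount=("amount", "sum")))

    print(weekly.tail(8))
    weekly.to_csv("weekly_transaction_trend.csv")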

Environment: SQL, PL/SQL, OLAP, OLTP, SSIS, SAS, T-SQL, SSRS, UNIX, Tableau, MS Excel 2010, Business Intelligence.
