Sr. Big Data Engineer Resume

Lowell, AR

SUMMARY

  • 7+ years of experience as a Big Data Engineer / Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
  • Extensive experience in using ER modeling tools such as Erwin and ER/Studio.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF, and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
  • Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
  • Managed ELDM Logical and Physical Data Models in the ER/Studio Repository based on different subject-area requests for the integrated model.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services.
  • Good understanding and exposure to Python programming.
  • Strong experience working with databases like Teradata and proficiency in writing complex SQL and PL/SQL for creating tables, views, indexes, stored procedures, and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Solid understanding of architecture, working of Hadoop framework involving Hadoop Distribute File System and its eco-system components MapReduce, Pig, Hive, HBase, Flume, Sqoop, and Oozie.
  • Experience in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks, and AWS EMR Hadoop distributions.
  • Good Experience on importing and exporting the data from HDFS and Hive into Relational Database Systems like MySQL and vice versa using Sqoop.
  • Good knowledge on NoSQL Databases including HBase, MongoDB, MapR-DB.
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Familiar with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, and RDS.
  • Expertise in Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
  • Experience with Client-Server application development using Oracle PL/SQL, SQL PLUS, SQL Developer, TOAD, and SQL LOADER.
  • Experienced in Data Analysis; proficient in gathering business requirements and handling requirements management.
  • Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.

TECHNICAL SKILLS

Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17

Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

OLAP Tools: Tableau 7, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Reporting Tools: SSRS, Power BI, Tableau, SSAS, MS-Excel, SAS BI Platform.

Cloud Platforms: AWS (EC2, S3, Redshift), MS Azure, Azure Kubernetes Service

Operating Systems: Microsoft Windows Vista, 7, 8, and 10, UNIX, and Linux.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential - Lowell, AR

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
  • Installed and configured a multi-node cluster in the cloud using Azure Kubernetes Service.
  • Implemented a proof of concept deploying the product in Azure.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
  • Responsible for the planning and execution of big data analytics and predictive analytics.
  • Assisted in leading the plan, building, and running states within the Enterprise Analytics Team.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Used SDLC (System Development Life Cycle) methodologies like RUP and Agile.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining the data architecture.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Designed and developed software applications, testing, and building automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Led the architecture and design of data processing, warehousing, and analytics initiatives.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Created data pipelines per business requirements and scheduled them using Oozie coordinators.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Developed Oozie workflow jobs to execute Hive and Sqoop actions.
  • Created data pipelines using processor groups and multiple processors in Apache NiFi.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate Hadoop jobs.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response (see the sketch after this list).
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
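
The RDD work listed above can be illustrated with a minimal PySpark sketch. The HDFS path, field layout, and metric below are illustrative assumptions, not the actual production pipeline.

```python
# Minimal PySpark sketch: pull raw records from the data lake (HDFS), massage
# them with RDD transformations in memory, and compute a simple per-key metric.
# The path, delimiter, and field positions are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-rdd-metrics").getOrCreate()
sc = spark.sparkContext

# Load raw comma-delimited records from an assumed data-lake location.
raw = sc.textFile("hdfs:///data/lake/events/*.csv")

# Parse, drop malformed rows, and keep the working set in memory.
parsed = (raw.map(lambda line: line.split(","))
             .filter(lambda fields: len(fields) >= 3)
             .cache())

# Aggregate a per-key count (e.g. events per customer) for downstream reporting.
counts = (parsed.map(lambda fields: (fields[1], 1))
                .reduceByKey(lambda a, b: a + b))

for key, cnt in counts.take(10):
    print(key, cnt)

spark.stop()
```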

Environment: Hadoop 3.0, HBase, Hive 2.3, HDFS, Oozie 5.1, Sqoop 1.4, Data Pipeline, NiFi, Azure, Azure Kubernetes Service, Azure SQL, Spark 2.4

Confidential - Rosemont, IL

Sr. Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Architected, Designed and Developed Business applications and Data marts for reporting.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Developed Python scripts to automate and provide control flow to Pig scripts.
  • Developed a reconciliation process to make sure the Elasticsearch index document count matches the source record count.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Implemented the Big Data solution using Hadoop and Hive to pull/load the data into the HDFS system.
  • Worked with the Oozie workflow engine to schedule time-based jobs to perform multiple actions.
  • Developed incremental- and full-load Python processes to ingest data into Elasticsearch from an Oracle database.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Experienced in the AWS cloud environment, including S3 storage and EC2 instances.
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
  • Developed REST services to write data into an Elasticsearch index using Python Flask (a sketch follows this list).
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Continuously tuned Hive queries and UDFs for faster execution by employing partitioning and bucketing.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
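
The Flask/Elasticsearch services and the reconciliation process mentioned above can be sketched roughly as follows; the host, index name, document shape, and elasticsearch-py 8.x client API are assumptions, not the actual implementation.

```python
# Rough sketch of a Flask REST service that indexes documents into Elasticsearch
# and exposes an index count for reconciliation against source-record counts.
# Host, index name, and document shape are hypothetical; assumes elasticsearch-py 8.x.
from flask import Flask, jsonify, request
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")  # assumed cluster endpoint
INDEX = "customer_events"                    # hypothetical index name


@app.route("/documents", methods=["POST"])
def index_document():
    """Accept a JSON payload and write it into the Elasticsearch index."""
    doc = request.get_json(force=True)
    result = es.index(index=INDEX, document=doc)
    return jsonify({"id": result["_id"], "result": result["result"]}), 201


@app.route("/reconcile", methods=["GET"])
def reconcile():
    """Expose the index document count so it can be compared with the source count."""
    return jsonify({"index": INDEX, "count": es.count(index=INDEX)["count"]})


if __name__ == "__main__":
    app.run(port=5000)
```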

Environment: Hive 2.3, HDFS, YARN, HBase, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Oozie 4.3

Confidential - St. Louis, MO

Sr. Data Analyst/Engineer

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
  • Involved in all phases of the SDLC using Agile and participated in daily scrum meetings with cross-functional teams.
  • Involved in manipulating, cleansing & processing data using Excel, Access, and SQL, and responsible for loading, extracting, and validating client data.
  • Used Python programs for data manipulation and to automate the generation of reports from multiple data sources and dashboards.
  • Coordinated with Data Architects on AWS provisioning EC2 Infrastructure and deploying applications in Elastic load balancing.
  • Developed Python programs and batch scripts on Windows to automate ETL processes to AWS Redshift (see the sketch after this list).
  • Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
  • Created a sheet selector to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard by using parameters.
  • Performed Reverse Engineering of the current application using Erwin, and developed Logical and Physical data models for Central Model consolidation.
  • Translated logical data models into physical database models and generated DDLs for DBAs.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Involved in extensive data validation by writing several complex SQL queries, involved in back-end testing, and worked with data quality issues.
  • Developed and maintained sales reporting using MS Excel queries, SQL in Teradata, and MS Access.
  • Involved in writing T-SQL and working on SSIS, SSRS, SSAS, Data Cleansing, Data Scrubbing, and Data Migration.
  • Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of Data Analysis responsibilities.
  • Published Workbooks by creating user filters so that only appropriate teams can view it.
  • Worked on SAS Visual Analytics & SAS Web Report Studio for data presentation and reporting.
  • Extensively used SAS/Macros to parameterize the reports so that the user could choose the summary and sub-setting variables to be used from the web application.
  • Created Teradata External loader connections such as Mload, Upsert, Update, and Fastload while loading data into the target tables in Teradata Database.
  • Resolved the data related issues such as: assessing data quality, testing dashboards, evaluating existing data sources.
  • Created DDL scripts for implementing Data Modeling changes, reviewed SQL queries and involved in Database Design and implementing RDBMS specific features.
  • Created data mapping documents mapping Logical Data Elements to Physical Data Elements and Source Data Elements to Destination Data Elements.
  • Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XI R2.
  • Performed GAP analysis of current state to desired state and document requirements to control the gaps identified.
  • Developed the batch program in PL/SQL for OLTP processing and used Unix shell scripts to run it via crontab.
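
The Python-driven Redshift ETL automation referenced in the bullets above can be outlined with a minimal example; the cluster endpoint, table, S3 path, and IAM role are placeholders, and the COPY-from-S3 pattern is one common approach rather than the documented implementation.

```python
# Minimal sketch of a Python ETL step that loads a file already staged in S3
# into AWS Redshift via a COPY command. Endpoint, credentials, table, bucket,
# and IAM role are hypothetical placeholders.
import psycopg2

COPY_SQL = """
    COPY sales_staging
    FROM 's3://example-bucket/exports/sales_extract.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    CSV IGNOREHEADER 1
    TIMEFORMAT 'auto';
"""

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="***",
)
try:
    with conn, conn.cursor() as cur:
        cur.execute(COPY_SQL)                  # bulk-load the staged extract
        cur.execute("ANALYZE sales_staging;")  # refresh optimizer statistics
finally:
    conn.close()
```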

Environment: Erwin 9.0, PL/SQL, Business Objects XI R2, Informatica 8.6, Teradata R13, Teradata SQL Assistant 12.0, Flat Files

Confidential - Houston, TX

Sr. Data Analyst

Responsibilities:

  • Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.
  • Involved in data analysis, data discrepancy reduction in the source and target schemas.
  • Developed complex PL/SQL procedures and packages using views and SQL joins.
  • Developed reports using different SSIS functionalities like sort prompts, cascading parameters, and multi-value parameters.
  • Conducted detailed analysis of data issues, mapped data from source to target, and performed design and data cleansing on the Data Warehouse.
  • Involved in identifying the data requirements and creating a Data Dictionary for the functionalities.
  • Analyzed and built proofs of concept to convert SAS reports into Tableau or use SAS datasets in Tableau.
  • Created or modified T-SQL queries as per the business requirements.
  • Developed and optimized stored procedures for use as a data window source for complex reporting purposes.
  • Performed batch processing of data and designed the SQL scripts, control files, and batch files for data loading.
  • Coordinated with data stewards / data owners to discuss source data quality issues and resolved them based on the findings.
  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Involved in data profiling for multiple sources and answered complex business questions by providing data to business users (a profiling sketch follows this list).
  • Worked on SQL Server concepts SSIS (SQL Server Integration Services), SSAS (Analysis Services) and SSRS (Reporting Services).
  • Developed database objects including tables, indexes, views, sequences, packages, triggers, and procedures to troubleshoot database problems.
  • Worked with different data formats such as flat files, SQL files, databases, XML schemas, and CSV files.
  • Involved in designing parameterized reports for generating ad-hoc reports as per the business requirements.
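
The data-profiling work referenced above can be illustrated with a small pandas sketch; the file name and the chosen metrics (null rates, distinct counts) are illustrative assumptions.

```python
# Small pandas sketch of source-data profiling: row count, null rates, distinct
# values, and data types per column for a flat-file extract. The file name and
# columns are hypothetical.
import pandas as pd

df = pd.read_csv("customer_extract.csv")  # assumed source extract

profile = pd.DataFrame({
    "non_null": df.notna().sum(),
    "null_pct": (df.isna().mean() * 100).round(2),
    "distinct": df.nunique(),
    "dtype": df.dtypes.astype(str),
})

print(f"rows: {len(df)}")
print(profile.sort_values("null_pct", ascending=False))
```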

Environment: SAS, Tableau 8.1, Ad-hoc, SQL, T-SQL, Flat Files, SSIS (VS 2013), SSRS, SSAS, XML, Business Intelligence.

Confidential

Data Analyst

Responsibilities:

  • Worked extensively with the business analysis team and scrum masters to gather requirements and understand the workflows of the organization.
  • Involved in Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
  • Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements in Quality Center.
  • Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
  • Wrote and executed SQL queries to verify that data had been moved from the transactional system to the DSS, data warehouse, and data mart reporting systems in accordance with requirements (see the reconciliation sketch after this list).
  • Created the test environment for Staging area, loading the Staging area with data from multiple sources.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Monitored the data quality of the daily processes and ensured the integrity of the data was maintained to support effective functioning of the departments.
  • Developed data mapping documents for integration into a central model, depicting data flow across systems, and maintained all files in an electronic filing system.
  • Extracted data from various sources such as DB2, CSV, XML, and flat files into DataStage.
  • Used and supported database applications and tools for the extraction, transformation, and analysis of raw data.
  • Performed data analysis and data profiling using complex SQL on various source systems including Oracle and DB2.
  • Wrote SQL scripts to test the mappings and developed a Traceability Matrix of Business Requirements mapped to Test Scripts to ensure any change control in requirements leads to test case updates.
  • Involved in extensive data validation by writing several complex SQL queries, involved in back-end testing, and worked with data quality issues.
  • Delivered files in various formats (e.g., Excel files, tab-delimited text, comma-separated text, pipe-delimited text).
  • Performed ad hoc analyses as needed, with the ability to clearly explain the analysis.
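
The source-to-target verification referenced above can be sketched as a simple row-count reconciliation; the ODBC DSNs and table pairs are hypothetical.

```python
# Simple row-count reconciliation between a transactional source and the
# warehouse, in the spirit of the validation described above. DSN names and
# table pairs are hypothetical.
import pyodbc

TABLE_PAIRS = [
    ("ORDERS", "DW_ORDERS"),        # (source table, warehouse table)
    ("CUSTOMERS", "DW_CUSTOMERS"),
]

src = pyodbc.connect("DSN=oltp_source")  # assumed ODBC data sources
tgt = pyodbc.connect("DSN=warehouse")

def row_count(conn, table):
    """Return SELECT COUNT(*) for a single table."""
    return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

for source_table, target_table in TABLE_PAIRS:
    src_rows = row_count(src, source_table)
    tgt_rows = row_count(tgt, target_table)
    status = "OK" if src_rows == tgt_rows else "MISMATCH"
    print(f"{source_table} -> {target_table}: {src_rows} vs {tgt_rows} [{status}]")

src.close()
tgt.close()
```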

Environment: Oracle 9i, SQL, DB2, XML, ad hoc, Excel 2008, data validation
