
Cloud Data Engineer/Data Architect Resume


SUMMARY:

  • A highly versatile, adaptive, and results-oriented data professional with extensive experience developing and implementing real-time and batch data pipelines for data lakes, data hubs, data warehouses, Master Data Management (MDM), and analytics solutions on both Cloud and hybrid platforms. Has demonstrated success executing complex, cross-functional, multi-disciplinary data-related strategic initiatives for large enterprises and mid-size projects. An expert in designing, developing, and implementing both data pipeline and analytics solutions for a variety of business scenarios. Highly proficient in the data models of the most widely used ERPs as source data sets. A veteran of many languages, tools, and methodologies with an aptitude for quickly learning and mastering new skills. A creative problem solver and a hard-working, dedicated team member.
  • An AWS Big Data certified data engineer with several successfully completed projects building data pipelines and analytics assets in the Data Engineering, Data Analytics, and Enterprise Reporting areas
  • Experienced with real-time event and continuous data streaming, batch (ETL) processing, and job orchestration based on the AWS technology stack, Python, and Spark
  • Extensive experience in dashboard development using Tableau and PowerBI tools
  • Experienced with common Python modules such as boto3, Pandas, and PySpark, as well as the development of custom packages
  • Extensive experience in SQL and database development in Redshift, Snowflake, PostgreSQL, Oracle, and Microsoft SQL Server
  • Extensive experience in data modeling: 3NF, denormalized, dimensional (Kimball), data vault (Linstedt), JSON, etc.
  • Working experience with SDLC waterfall and Agile development practices (Scrum, Kanban)

PROFESSIONAL EXPERIENCE:

Confidential

Cloud Data Engineer/Data Architect

Responsibilities:

  • Built over 30 data streams using Amazon Kinesis, SNS, SQS, API Gateway, and Lambda to ingest and process National Assessment of Educational Progress (NAEP) student assessment data, with the largest datasets totaling over 900 million rows, in an AWS-based data lake platform (S3 and Lake Formation) serving 50+ data scientists. Parsed, converted, transformed, cleansed, secured (anonymized), and processed CSV, JSON, XML, and Parquet file formats, and used the Glue Data Catalog (Hive Metastore) and Athena to prepare data for testing and user consumption. Worked extensively with Python (and 3rd-party libraries), Lambda, Visual Studio Code, Jupyter notebooks, and Step Functions to develop metadata-driven pipelines (an illustrative ingestion sketch follows this list). Developed IAM policies and S3 bucket policies, analyzed CloudWatch logs, helped author CloudFormation templates in YAML, and used GitLab to store and manage code versions.
  • Implemented Databricks clusters in AWS during the last three months of the project in preparation for converting the NAEP data lake to Delta Lake. Developed a number of real-time ingestion pipelines integrated with API Gateway and Kinesis Data Streams in the Bronze layer, replicating the behavior and patterns of the legacy data lake platform.
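
A minimal sketch of the Kinesis-to-S3 landing pattern described above, assuming a Lambda event source mapping on the stream; the bucket name, partition key, and payload fields are hypothetical placeholders, not the project's actual configuration.

```python
# Illustrative sketch only: a minimal Lambda consumer for a Kinesis-backed ingestion
# stream. Bucket name and payload fields are hypothetical.
import base64
import json
import boto3

s3 = boto3.client("s3")
LANDING_BUCKET = "example-naep-raw-landing"  # hypothetical bucket name

def handler(event, context):
    """Decode Kinesis records and land them in S3, partitioned by assessment year."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        year = payload.get("assessment_year", "unknown")
        key = f"raw/assessment_year={year}/{record['kinesis']['sequenceNumber']}.json"
        s3.put_object(
            Bucket=LANDING_BUCKET,
            Key=key,
            Body=json.dumps(payload).encode("utf-8"),
        )
    return {"records_processed": len(event["Records"])}
```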

Confidential

Cloud Data Engineer/Data Architect

Responsibilities:

  • During a 6-week POC, designed, developed, and implemented 32 PySpark-based jobs in AWS Glue aimed at deduping, loading, and transforming online/digital slot machine event and transaction log data for Monopoly games, with tables averaging daily loads of 150 million rows. SciPlay’s petabyte-scale data warehouse consists of tables exceeding 20 billion rows. AWS Glue jobs were tuned to run in parallel in a distributed fashion using hundreds of AWS high-powered compute resources.
  • Moving quickly from POC results into the Production platform build starting December 2020, designed, converted, and developed the ingestion routines for SciPlay’s Monopoly, Jackpot Party, Big Fish, and Quick Hit slot machine games using Databricks’ Delta Lake architecture with Snowflake as the target data warehouse. Implemented Databricks PySpark-based notebooks to de-duplicate and persist petabyte-scale Parquet files in the Snowflake staging schema (an illustrative de-duplication sketch follows this list). Developed staging, denormalized, and dimensional models and structures in Snowflake. Populated and transformed target Snowflake structures using Python-based Databricks notebooks, and orchestrated Databricks jobs using Astronomer DAGs leveraging the Python, Spark, Bash, Databricks, and S3 operators.
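
A minimal sketch of window-based de-duplication of event data in PySpark before it is persisted for staging, as referenced above; the column names and S3 paths are hypothetical, not the project's actual schema.

```python
# Illustrative sketch only: keep the latest record per event_id before staging.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event-dedup").getOrCreate()

# Hypothetical raw path; each file holds slot machine event records.
events = spark.read.parquet("s3://example-raw-bucket/slot-events/")

# Rank duplicates by ingestion timestamp and keep only the newest row per event_id.
w = Window.partitionBy("event_id").orderBy(F.col("ingest_ts").desc())
deduped = (
    events
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Persist the de-duplicated set (Delta format assumes Delta Lake support, e.g. Databricks).
deduped.write.format("delta").mode("overwrite").save("s3://example-staging-bucket/slot-events/")
```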

Confidential, Chicago, IL

Cloud Data Engineer/Data Architect

Responsibilities:

  • Designed, developed, and implemented 80+ ELT routines in AWS Glue using PySpark/Python shell, with the aid of 3rd-party libraries like boto3, Pandas, and NumPy, to move Sales, Finance, and Executive Recruitment data from sources to an S3 data lake and a Redshift data warehouse. Developed custom Python packages designed to parse, format, aggregate, enrich, and load large datasets using SageMaker interactive notebooks (an illustrative staging sketch follows this list). Designed and implemented over 120 Redshift-compliant dimensional (star schema) and denormalized table structures.
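
A minimal sketch of a Glue Python-shell-style job in the spirit of the routines above: aggregate a sales extract with pandas and stage it to S3 as Parquet for a downstream Redshift COPY. The bucket, object keys, and column names are hypothetical placeholders.

```python
# Illustrative sketch only: pandas aggregation of a raw sales extract, staged to S3.
import io
import boto3
import pandas as pd

s3 = boto3.client("s3")
BUCKET = "example-data-lake"                      # hypothetical bucket
SOURCE_KEY = "raw/sales/daily_extract.csv"        # hypothetical source object
TARGET_KEY = "curated/sales/daily_summary.parquet"

# Read the raw extract and aggregate to one row per region and order date.
obj = s3.get_object(Bucket=BUCKET, Key=SOURCE_KEY)
sales = pd.read_csv(io.BytesIO(obj["Body"].read()), parse_dates=["order_date"])
summary = (
    sales.groupby(["region", "order_date"], as_index=False)
         .agg(total_revenue=("revenue", "sum"), order_count=("order_id", "count"))
)
summary["avg_order_value"] = summary["total_revenue"] / summary["order_count"]

# Write the curated summary back to S3 as Parquet (requires pyarrow).
buf = io.BytesIO()
summary.to_parquet(buf, index=False)
s3.put_object(Bucket=BUCKET, Key=TARGET_KEY, Body=buf.getvalue())
```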

Confidential, Wilmington, NC

Cloud Data Engineer/Data Architect

Responsibilities:

  • Designed, developed, and implemented ELT routines for the SBA Loan data warehouse using PySpark scripts in AWS Glue (for 20+ dimensions and 9 facts) and orchestration workflows in Step Functions. Implemented Python scripts consisting of advanced transformation logic such as lookups, deduplication, data type conversion, nested levels of joins and anti-joins, and relationalizing (un-nesting) JSON structures (an illustrative un-nesting sketch follows this list). Developed custom modules leveraging common libraries such as Pandas, NumPy, PySpark, psycopg2, and boto3.
  • As a member of a six-person development team, participated in formal Jira-based Sprint ceremonies, with regimented code review, merge, and CI/CD deployment processes using GitLab, Docker, VS Code, and Terraform (IaC).
  • Designed, developed, and implemented a $1.2 million Snowflake data warehouse in AWS for Valent’s Finance, Procurement, Operations, and Supply Chain data analytics. Developed ingestion routines, configurations, and scripts to populate the target data platform, consisting of an Amazon S3 data lake, the Glue Data Catalog, over 60 batch PySpark data processing jobs, Step Functions for orchestration, and PowerBI Pro for analytics and data consumption. Implemented custom Python 3.x modules in conjunction with libraries like boto3, Pandas, SQLAlchemy, NumPy, 3rd-party JDBC connectors, etc. Also co-developed over 40 PowerBI reports and visualizations.
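
A minimal sketch of flattening nested JSON loan records with PySpark, in the spirit of the un-nesting described above; the field names and paths are hypothetical, not the actual SBA schema.

```python
# Illustrative sketch only: explode a nested array and flatten struct fields.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("loan-unnest").getOrCreate()

# Hypothetical raw path; each record holds borrower details plus an array of disbursements.
loans = spark.read.json("s3://example-bucket/raw/loans/")

flattened = (
    loans
    .withColumn("disbursement", F.explode_outer("disbursements"))
    .select(
        "loan_id",
        F.col("borrower.name").alias("borrower_name"),
        F.col("borrower.state").alias("borrower_state"),
        F.col("disbursement.amount").alias("disbursement_amount"),
        F.col("disbursement.date").alias("disbursement_date"),
    )
)

flattened.write.mode("overwrite").parquet("s3://example-bucket/staging/loan_disbursements/")
```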

Confidential

Cloud Data Engineer/Data Architect

Responsibilities:

  • Implemented Crown’s hybrid-platform (on-premise and Cloud) Master Data Management (MDM) system. The on-premise platform served as the ‘master’ instance, which integrated with 16 different on-premise systems through Apache Kafka (real-time data streams), REST APIs, and Oracle Data Integrator for batch processing. This Lambda-architecture solution provided an online view of Crown’s critical domain areas such as accounts, customers, assets, and opportunities and dispatched them to AWS S3 (Crown’s data lake) using AWS Database Migration Service (DMS). The MDM data in S3 was merged and transformed with Crown’s IoT tower, small cell, and fiber optic geospatial data to obtain a full 360-degree view of customer and account service levels and coverage. Used Lambda with AWS IoT to persist unstructured data to S3 (EMRFS) raw data buckets. A transient EMR cluster was used to transform unstructured data in HBase nightly using Spark (PySpark) data pipelines. IoT semi-structured data was then copied to Redshift and merged with MDM data for canned analytics and advanced analyses by data scientists.
  • Designed and developed a custom, on-premise data warehouse using a Lambda architecture with Kafka real-time data streams (Python) combined with Oracle Data Integrator 12c batch routines for General Ledger, Sales Order, and Accounts Receivable transactions (an illustrative Kafka consumer sketch follows this list).
  • Provided ETL/data integration custom architecture, best-practice design, and support to CACC’s team of internal and contract developers using batch and real-time interfaces with ODI 11g & REST APIs, Oracle PL/SQL, and flat files. The scope of the CACC project included the realignment of new ERP sources and the code refactoring of over 150 interfaces with Workday, Oracle EBS R12, OFSLL, and other custom-built applications. Employed Inmon and Kimball data modeling as well as data vault techniques for CACC’s custom multi-sourced data warehouse.
  • Designed, developed, and implemented ODI 11g topology connections (data, physical server, and logical architecture settings), packages, and mappings/interfaces using the “OdiInvokeRESTfulService” tool and OAUTH2 authentication. This real-time solution supported two of CACC's Workday HCM use cases: obtaining new employee profile and network login information to support the onboarding and termination processes, and obtaining employee timesheet information for the manager payroll approval process. The ODI RESTful service output JSON-formatted files; the JSON driver was then used to parse the data, and CRUD operations were performed in ODI mappings/interfaces against the target data mart tables in Oracle.
  • Implemented a Dealer Financials data warehouse pilot using AWS Redshift and Glue (Python with the Pandas and PySpark libraries), with each transactional table totaling 500 million rows. These Redshift and Glue instances were converted from an existing, custom Oracle 12c database and ODI mappings using the AWS SCT, DMS, and CloudFormation tools.
  • Remediated defective Tableau 10 worksheets and visualizations of HR dashboards caused by data warehouse model conversions from EBS HRMS to Workday data sources.
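
A minimal sketch of a Python consumer that micro-batches Kafka MDM change events to S3, as referenced above; it assumes the kafka-python client, and the topic, broker, bucket, and batch size are hypothetical placeholders.

```python
# Illustrative sketch only: micro-batch Kafka change events into S3 objects.
import json
import time
import boto3
from kafka import KafkaConsumer  # kafka-python client (one of several options)

s3 = boto3.client("s3")
BUCKET = "example-mdm-lake"  # hypothetical bucket

consumer = KafkaConsumer(
    "mdm-account-changes",                     # hypothetical topic
    bootstrap_servers=["broker1:9092"],        # hypothetical broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 500:                      # flush every 500 events
        key = f"raw/mdm/accounts/{int(time.time())}.json"
        s3.put_object(Bucket=BUCKET, Key=key,
                      Body="\n".join(json.dumps(r) for r in batch).encode("utf-8"))
        batch = []
```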

Confidential, Chicago, IL

Data Architect

Responsibilities:

  • Successfully optimized OBIA ODI ETL custom and “out-of-the-box” mappings, reducing incremental load runtimes from 42 hours down to 14 hours. Optimizations included refactoring and rewriting ODI mappings, SQL tuning, and ODI configuration changes. The scope of the optimizations spanned Motorola’s nine (9) OBIA offerings, including Financials, OM/SCM and Inventory Mgmt, Sales, Demand Forecasting, and Contract Mgmt.
  • Successfully converted OBIA Financials, Procurement, and SCM instances in Oracle 12c DB to AWS Redshift using AWS SCT, DMS and CloudFormation tools. Continued to use ODI 11g as the ETL/ELT tool.
  • Responsible for the full conversion of legacy ERP system data of acquired IMI entities into the JD Edwards Enterprise One 9.1 data model for all static and dynamic data sets. The scope of the data conversion involved four (4) legacy ERP sites in Kentucky, Colorado, and Mexico. Data conversion routines were performed in staged batches using Informatica 9.6.1 and Informatica Cloud Services (ICS) with real-time REST API (JSON) interfaces.
  • Worked with business analysts across different functional areas to formally define and validate source (legacy ERP systems) to target (JDE) mappings via interop tables, batch processing rules, required fields, and data formats for more than 56 static and dynamic JDE tables. Completed the batch processing design using staging, consolidation (business logic), cleansing (standardization, exception handling, and error reporting), and loading mappings.

Confidential

Data Engineer

Responsibilities:

  • As a data integrator and ETL developer, designed and implemented customer loyalty data models for Pei Wei’s Big Data and enterprise master data management system for use with its data science and analytics projects. Developed batch extraction and table loading routines for large customer survey and marketing datasets using Python’s Pandas and NumPy libraries (an illustrative chunked-load sketch follows this list).
  • Converted/implemented a customer loyalty and marketing data warehouse similar to Confidential’s (affiliation) using PowerBI, SSIS/SSDT, SSAS/AAS, and Data Factory (batch pipelines) in both on-premise and Microsoft Azure Cloud implementations.
  • Implemented, supported, and provided expert guidance on Oracle BI Applications 11g with ODI 11g for PeopleSoft 9.1 Financials and Procurement and Spend data sources. Responsible for stabilizing ETL platforms, reducing excessive ETL run times, optimizing reports and dashboards, implementing formal development and migration practices, and developing and supporting ODI and OBIA customizations. Improved multiple excessively slow-performing OBIEE reports, reducing runtimes from over 30 minutes to less than 10 seconds.
  • In March 2017, converted over 100 custom interfaces from OBIA 11.1.1.7 to OBIA 11.1.1.10.2 with PeopleSoft FINSCM 9.2 as the source. Converted and migrated the entire OBIA platform to 11.1.1.10.2 across three upgraded environments.
  • Developed and implemented over 50 tables, charts, and visualizations in Tableau 9 using OBIA Financials and Procurement and Spend data sources.
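
A minimal sketch of the chunked pandas load pattern for large survey extracts referenced above; the connection string, file name, table, and columns are hypothetical placeholders.

```python
# Illustrative sketch only: stream a large survey extract in chunks and load a staging table.
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical warehouse connection
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

# Chunked reads keep memory flat even for very large survey files.
for chunk in pd.read_csv("customer_survey_extract.csv", chunksize=100_000):
    chunk["survey_date"] = pd.to_datetime(chunk["survey_date"], errors="coerce")
    chunk["score"] = pd.to_numeric(chunk["score"], errors="coerce").clip(lower=0, upper=10)
    chunk["is_promoter"] = np.where(chunk["score"] >= 9, 1, 0)  # simple derived flag
    chunk.to_sql("stg_customer_survey", engine, if_exists="append", index=False)
```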

Confidential

MDM Architect

Responsibilities:

  • Completed a formal, written assessment of Confidential’s current-state, custom-built BI and data warehouse, leading to a realignment of initiatives, technologies, and business units.
  • Documented all functional and technical interfaces of Confidential’s enterprise systems, data integration, and lineage in preparation for its upcoming MDM project.
  • Designed, implemented and supported custom data warehousing and analytics for PFC’s check data, customer loyalty and marketing programs using the Microsoft Azure stack of PowerBI, SSIS/SSDT, SSAS/AAS and ADF. Developed SSAS cubes from both existing ODS and new data models on restaurant check data and financial activities.

Confidential

Data Integration Architect

Responsibilities:

  • Select Comfort Corporation, Minneapolis, MN (Retail and Manufacturing)
  • Responsible for the functional design, technical architecture, and solution delivery of the financial, accounting, supply chain, and order management workstreams for the enterprise data warehouse and business intelligence deliverables of Select Comfort’s ERP and CRM implementation using the platform stack of OBIEE 11g, Oracle Data Integrator (ODI) 11g, and OBIA 11.1.1.7.1. Data sources included Oracle eBusiness Suite and Siebel CRM, while the target was the OBIA data warehouse. Led 3 onsite and 4 offshore developers plus DBA and infrastructure resources.
  • Implemented automated controls across Informatica, the DAC, and OBIEE to ensure OBIA Financial Analytics is synchronized with JD Edwards Enterprise One.
  • Developed custom Informatica mappings for Procurement and Spend using Universal Adapters, augmenting data source adapters for JD Edwards Enterprise One. Changes included the DAC, the OBIEE Repository, and the OBIEE catalog (dashboards and reports).
