Big Data Enterprise Architect/Engineer Resume

New York

SUMMARY

  • 10 years of experience managing all aspects of Data Science, Big Data, and Enterprise Data Warehouse (EDW) implementations: data design, data mining, hypothesis testing, statistical modeling, machine learning algorithms, and predictive analytics for consumer behavior, product development, and product pricing. Experienced in both supervised and unsupervised learning.
  • Experience in funnel analysis using Markov chains on customer-level data to identify the best-converting paths.
  • Experienced in deploying enterprise data management solutions and consolidating multiple data sources into a single unified interface for a management decision-support system consisting of a dashboard and analytical reporting engine.
  • 4 years of experience contributing extensively to data science projects using techniques such as linear regression, logistic regression, time series analysis, pattern detection, A/B testing, nearest neighbors, cluster analysis, sentiment analysis, decision trees, random forests, ensemble models, SVM, PCA, and neural networks (RNN, CNN).
  • Data science project: used structured, unstructured, and semi-structured big data sensor feeds to develop an analytics solution for predictive maintenance and cost models. Hands-on experience working with Spark libraries in R programming.
  • Automobile shock absorber sensor data time-series modeling: Cox regression (proportional hazards model) to predict device time-to-failure in support of predictive maintenance (a minimal Cox model sketch follows this summary).
  • BNSF train transportation analytics: built a train estimated-time-of-arrival (ETA) prediction model using arrival schedules, track signals, and sensor data.
  • World's leading retail giant (Confidential): big data science advanced analytics; built a churn prediction model and customer product cohort migration (journey) predictive analytics models, and deployed the models within a Customer 360 framework (a churn-model sketch follows this summary).
  • World's leading FMCG company: Artificial Intelligence (AI) project for shipment image processing and text extraction from images; developed the solution in Python using image text extraction and recognition algorithms. Marketing Mix Modeling (MMM): impact analysis and sales revenue forecasting for major retail and sales marketing campaigns.
  • Anomaly and fraud detection for banking and financial services: Confidential Bank anti-money laundering (AML) use cases. Implemented ARIMA-based forecasting algorithms for discrete manufacturing with strong R-squared values (an ARIMA sketch follows this summary), and implemented causal analysis for forecasting applications.
  • Created business value through big data analytics and successfully led data monetization projects for large US and Canadian banking and financial services institutions.
  • Created data lakes using data ingestion and egress techniques for Hadoop environments, leveraging MapReduce for data processing on both the Hortonworks and Cloudera distributions.
  • Used Pig for data cleaning and created table relations and joins in Hive after data loading and workflow orchestration with Sqoop, Flume, and WebHDFS.
  • Experience working with NoSQL and time-series databases such as MongoDB, Cassandra, and InfluxDB; relational databases including Oracle 11g, IBM DB2, and MySQL; and storage layers such as AWS S3 and HDFS.
  • Used Ambari and ZooKeeper for cluster operations and administration, and Oozie for scheduling.
  • Worked with Teradata load utilities such as MultiLoad, FastLoad, BTEQ, and FastExport.
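
The Cox modeling above was done on proprietary sensor data. As a minimal sketch of the technique, assuming a hypothetical schema (hours_to_failure, failed, and two invented sensor features), a proportional hazards fit with the lifelines library might look like this:

```python
# Minimal Cox proportional hazards sketch with lifelines (hypothetical schema).
import pandas as pd
from lifelines import CoxPHFitter

# Assumed columns: duration until failure, event flag, and sensor features.
df = pd.DataFrame({
    "hours_to_failure": [1200, 3400, 560, 2100, 4100, 800],
    "failed":           [1,    0,    1,   1,    0,    1],
    "avg_vibration":    [0.82, 0.31, 0.95, 0.55, 0.22, 0.88],
    "avg_temp_c":       [71,   55,   78,   63,   51,   74],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="hours_to_failure", event_col="failed")
cph.print_summary()  # hazard ratios per sensor feature

# Predicted median time-to-failure supports maintenance scheduling.
print(cph.predict_median(df[["avg_vibration", "avg_temp_c"]]))
```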
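Likewise, the churn prediction model itself is proprietary; the sketch below shows the general shape of such a model with scikit-learn, on invented customer-level features (tenure_months, monthly_spend, support_tickets are assumptions, not the actual Customer 360 schema):

```python
# Minimal churn-prediction sketch with scikit-learn (invented features).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({
    "tenure_months":   [3, 24, 1, 36, 12, 6, 48, 2],
    "monthly_spend":   [20, 80, 15, 95, 40, 25, 110, 18],
    "support_tickets": [4, 0, 5, 1, 2, 3, 0, 6],
    "churned":         [1, 0, 1, 0, 0, 1, 0, 1],
})

X, y = df.drop(columns="churned"), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```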
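For the ARIMA-based forecasting mentioned above, a minimal statsmodels sketch on a synthetic monthly series (the production models used real plant data and tuned (p, d, q) orders) would be:

```python
# Minimal ARIMA forecasting sketch with statsmodels (synthetic series).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly demand series with a slight trend.
rng = np.random.default_rng(0)
y = pd.Series(
    100 + np.arange(48) * 0.5 + rng.normal(0, 2, 48),
    index=pd.date_range("2015-01-01", periods=48, freq="MS"),
)

model = ARIMA(y, order=(1, 1, 1)).fit()  # (p, d, q) chosen for illustration only
print(model.summary())
print(model.forecast(steps=6))           # six-month-ahead forecast
```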

TECHNICAL SKILLS

  • Hadoop (Big Data/HDFS)
  • Data Vault
  • Hortonworks/Cloudera (Ambari, Pig, Hive, Sqoop, Flume, Solr, Storm, ingestion, Impala, NiFi)
  • Teradata 12/13.10/14 (SQL and utilities such as MultiLoad, FastLoad, BTEQ, and FastExport)
  • DB2
  • Oracle (PL/SQL) and Vertica
  • IBM DataStage v8.1/8.5/9.3 (ETL DW/BI)
  • Mainframe (JCL/SYSINs)
  • CA Autosys/Zeke (schedulers)
  • Spark with R programming (sparklyr)
  • Python (pandas)
  • Business intelligence and reporting tools
  • Data warehousing
  • Data modeling (ERwin)
  • Middleware integration on MuleSoft
  • AWS (cloud)
  • Talend and Tableau
  • Code repositories and continuous integration (e.g., Git, Jenkins)
  • Modeling techniques: time series, pattern detection, A/B testing, nearest neighbors, decision trees, regression analysis, random forests, ensemble models, SVM, PCA, neural networks (RNN, CNN), TensorFlow

PROFESSIONAL EXPERIENCE

Confidential, New York

Big Data Enterprise Architect/Engineer

Responsibilities:

  • Served as Big Data Architect, working directly with line-of-business owners on data design and sourcing to leverage data lake capabilities and scale applications for future advanced analytics built on the Hortonworks big data platform and IBM BigIntegrate.
  • Automated the process of creating and ingesting data on the data lake with a single generic framework covering all data source types, reducing maintenance overhead from 10,000+ code streams to 4 (a metadata-driven table-creation sketch follows this list).
  • Guided the teams on automating the creation of 10,000+ Hive tables.
  • Designed and implemented data governance strategies over the data lake and recommended best practices for the Hadoop BigIntegrate framework across Dev/QA/Prod environments (such as short-circuit setup and dynamic node setup).
  • Developed generic code to enable parallelism and spawn YARN containers at run time, and refined job performance.
  • Built PySpark queries that validate data moved across the data lake, reducing QA effort by 65% (a validation sketch follows this list).
  • Developed a reconciliation framework to bring audit controls to the data.
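
The generic ingestion framework itself is proprietary; purely as an illustration of driving Hive table creation from metadata, the sketch below uses hypothetical database, table, and path names:

```python
# Sketch: generate and run Hive CREATE TABLE statements from metadata records.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical metadata records; the real framework read these from a registry.
table_specs = [
    {"db": "lake_raw", "name": "orders",
     "columns": [("order_id", "BIGINT"), ("amount", "DECIMAL(12,2)"), ("ts", "TIMESTAMP")]},
    {"db": "lake_raw", "name": "customers",
     "columns": [("customer_id", "BIGINT"), ("name", "STRING")]},
]

for spec in table_specs:
    cols = ", ".join(f"{c} {t}" for c, t in spec["columns"])
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {spec['db']}")
    spark.sql(
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {spec['db']}.{spec['name']} ({cols}) "
        f"STORED AS ORC LOCATION '/data/lake/{spec['db']}/{spec['name']}'"
    )
```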
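And a minimal sketch of the kind of PySpark validation used here, reconciling row counts and a key measure between layers (table and column names are assumptions):

```python
# Sketch: reconcile row counts and a measure sum between source and target layers.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical table names; the real framework iterated over a control table.
source = spark.table("staging.orders")
target = spark.table("curated.orders")

checks = {
    "row_count": (source.count(), target.count()),
    "amount_sum": (
        source.agg(F.sum("amount")).first()[0],
        target.agg(F.sum("amount")).first()[0],
    ),
}
for name, (src, tgt) in checks.items():
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{name}: source={src} target={tgt} -> {status}")
```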

Confidential

Data Scientist & Architect

Responsibilities:

  • Served as Data Scientist & Architect, working directly with business partners on data design and sourcing to leverage existing BI capabilities and scale applications for future advanced analytics.
  • Evaluated machine learning algorithms and data usage for scoring and classification models; studied the business models and selected the best approaches to improve their performance; also analyzed data for trends.
  • Performed business analysis, requirements gathering, and functional and architectural design for credit risk rating and retail banking.
  • Performed basic validations on the source systems, accomplished intermediate tasks using Unix shell scripting, tested artifacts, and developed parallel jobs.
  • Designed, developed, and implemented a metadata-driven framework with multi-time-zone source integrations, staging, homogenization, error and audit processes, SCD-1 and SCD-2 handling, surrogate key generation, and variable-frequency loading with late-arriving dimensions (an SCD-2 sketch follows this list).
  • Performed source system analysis, data analysis, analysis of integration between modules, and analysis of the business meaning of source entities and relationships.
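
The framework described here was built in IBM DataStage; purely as an illustration of the SCD-2 expire-and-insert logic it implemented, a pandas sketch with an invented two-column dimension might read:

```python
# Sketch of SCD-2 logic: expire changed rows and insert new versions (pandas).
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")
today = pd.Timestamp("2020-01-15")

dim = pd.DataFrame({   # current dimension (hypothetical)
    "sk": [1, 2], "cust_id": [10, 11], "city": ["NYC", "Boston"],
    "eff_from": pd.to_datetime(["2019-01-01", "2019-06-01"]),
    "eff_to": [HIGH_DATE, HIGH_DATE], "current": [True, True],
})
incoming = pd.DataFrame({"cust_id": [10, 12], "city": ["Austin", "Denver"]})

merged = incoming.merge(dim[dim["current"]], on="cust_id", how="left", suffixes=("", "_old"))
changed = merged[merged["city_old"].notna() & (merged["city"] != merged["city_old"])]
new = merged[merged["city_old"].isna()]

# Expire changed rows, then append new versions with fresh surrogate keys.
dim.loc[dim["cust_id"].isin(changed["cust_id"]) & dim["current"],
        ["eff_to", "current"]] = [today, False]
next_sk = dim["sk"].max() + 1
inserts = pd.concat([changed, new])[["cust_id", "city"]].assign(
    sk=range(next_sk, next_sk + len(changed) + len(new)),
    eff_from=today, eff_to=HIGH_DATE, current=True,
)
dim = pd.concat([dim, inserts], ignore_index=True)
print(dim)
```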

Confidential, Atlanta

Senior Consultant

Responsibilities:

  • Interviewed stakeholders to understand the business requirements and project scope, and directed data teams toward future-ready architecture designs.
  • Identified repeated data transformations and processes to consolidate process streams (batch processing, message processing, and asynchronous real-time data processing) and significantly reduce onboarding of new applications.
  • Conceptualized and designed the solution framework for the use case and developed KPIs for measuring ROI through event data analysis, tracking business metrics including the performance of the marketing team.
  • Worked closely with the Senior Director and provided leadership on the data aspects of the organization.
  • Led the development of data-related analytical products and tools.
  • Processed result data on an AWS EMR cluster to identify the top content (a sketch follows this list).
  • Oversaw the development of data handling, storage, integration, and presentation.
  • Experienced in predictive modeling, including linear and non-linear regression, logistic regression, and time series analysis models.
  • Also used statistical techniques such as clustering, regression analysis, hypothesis testing, multivariate statistical analysis, and forecasting methodologies, along with econometric techniques (a clustering sketch follows this list).
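
The EMR jobs themselves are not public; a minimal PySpark sketch of ranking top content, assuming a hypothetical S3 event log with event_type, content_id, and user_id columns:

```python
# Sketch: rank top content by engagement on an EMR Spark cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("top-content").getOrCreate()

# Assumed event-log layout in S3; the path and columns are illustrative.
events = spark.read.json("s3://my-bucket/events/")

top = (events
       .filter(F.col("event_type") == "view")
       .groupBy("content_id")
       .agg(F.count("*").alias("views"),
            F.countDistinct("user_id").alias("viewers"))
       .orderBy(F.desc("views"))
       .limit(20))
top.show()
```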
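And a small scikit-learn sketch of the clustering technique mentioned above, on synthetic spend/frequency features standing in for real customer metrics:

```python
# Sketch: k-means customer segmentation with scikit-learn (synthetic data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Two synthetic segments: low-spend/low-frequency and high-spend/high-frequency.
X = np.vstack([
    rng.normal([20, 2], 3, size=(50, 2)),
    rng.normal([90, 12], 5, size=(50, 2)),
])

X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print("cluster sizes:", np.bincount(km.labels_))
print("centers (scaled):", km.cluster_centers_)
```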

Confidential, Cincinnati, Ohio

Big Data Architect/Supervisor, Analyst & Designer

Responsibilities:

  • Implemented Activity-Based Costing (ABC) techniques and surfaced actions decision makers could take to increase throughput or reduce spending, converting the savings into increased profits.
  • Oversaw a team of 35 and ensured timely delivery with the highest software quality. Served as a backbone for the bank's baseline team and identified tasks that could be automated.
  • Developed and managed functional architecture design requirements with the vendor where required, and flagged data quality inconsistencies in the application architecture along with methods to improve them.
  • Created data flow diagrams (DFDs) and recommended data models suitable for the environment, with the overall technology landscape in scope.
  • Identified 970 applications through data lineage analysis for the FSLDM migration.
  • Estimated projects (Hadoop and data warehouse), prepared requests for proposals (RFPs) and SoWs, and brainstormed architectural design recommendations.
  • Created requirements documents for bringing data from various sources into the operational and MOSAIC data warehouse models; provisioned OLTP and OLAP data models.
  • Created logical and physical data models using ER/Studio for the operational store and data warehouse, had the designs reviewed and approved by the Data Architect Manager, and then published and distributed the changed/new models to the various stakeholders and teams.
  • Streamlined the design process to deliver better designs, and introduced additional processes to improve design standards.
  • Created a business glossary for enterprise-wide users and the logical and physical models using ER/Studio Data Architect 10, then handed the design over to the ETL team to start development.
  • Worked with the reporting and ETL teams to walk them through the design so that all teams were up to speed on the project.
  • Created DDL scripts for Teradata database changes and worked with the DBA to create the tables; validated the results and supported production implementation (a DDL-generation sketch follows this list).
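
A tiny Python sketch of generating such Teradata DDL from a column spec (the function, table, and column names are all illustrative, not the actual warehouse schema):

```python
# Sketch: generate Teradata CREATE TABLE DDL from a simple column spec.
def teradata_ddl(db: str, table: str, columns: list[tuple[str, str]], pk: str) -> str:
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE MULTISET TABLE {db}.{table} (\n  {cols}\n)\n"
        f"UNIQUE PRIMARY INDEX ({pk});"
    )

ddl = teradata_ddl(
    "EDW_STG", "CUSTOMER",
    [("CUSTOMER_ID", "BIGINT NOT NULL"),
     ("NAME", "VARCHAR(100)"),
     ("LOAD_TS", "TIMESTAMP(0)")],
    pk="CUSTOMER_ID",
)
print(ddl)  # reviewed with the DBA before execution
```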

Confidential

Solution Architect/Subject Matter Expert/Business Analyst/Consultant

Responsibilities:

  • Defined, developed, and implemented all related changes to policies, processes, governance, systems, data, and reporting in support of meeting the BCBS 239 principles (RDARR compliance).
  • Developed and managed the product development plan, managed delivery to the plan, and identified risks in data replication strategies.
  • Created a data lake using data ingestion and egress techniques for the Hadoop environment, and created joins in Hive after loading data with Pig/WebHDFS.
  • Used columnar storage formats such as ORC and Parquet for efficient storage and better processing performance (a write-path sketch follows this list).
  • Led, designed, developed, and implemented the Creditor Insurance Critical Illness program, acting as SME with end-to-end support.
  • Solid experience with scheduling (Zeke/Autosys/crontab), troubleshooting production issues across all components of IBM DataStage, and migrating code across Dev and Test environments.
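
A minimal PySpark sketch of the ORC/Parquet write path mentioned above (the paths and the stand-in DataFrame are illustrative):

```python
# Sketch: land the same DataFrame in Parquet and ORC columnar formats.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])  # stand-in data

# Partitioned columnar writes; target paths are hypothetical.
df.write.mode("overwrite").partitionBy("id").parquet("/data/lake/example_parquet")
df.write.mode("overwrite").orc("/data/lake/example_orc")

# Readers benefit from column pruning and predicate pushdown on these formats.
spark.read.parquet("/data/lake/example_parquet").filter("id = 1").show()
```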

Confidential, Rhode Island

Software Engineer

Responsibilities:

  • Designed, built, and tested parallel jobs in DataStage and QualityStage.
  • Performed basic validations on the source systems and accomplished intermediate tasks using Unix shell scripting.
  • Designed, developed, and implemented various approaches to a metadata-driven ETL framework with multi-time-zone source integrations, staging, homogenization, error and audit processes, SCD-1 and SCD-2 handling, surrogate key generation, and variable-frequency loading with late-arriving dimensions.
  • Solid experience with scheduling (Zeke/Autosys), troubleshooting production issues across all components of IBM DataStage, and migrating code across Dev, Test, and Prod environments.
