
Big Data/Data Scientist Consultant Resume


SUMMARY:

  • Confidential is a highly skilled Data Scientist with over 10 years of experience in Database Marketing, Data Warehousing (terabyte-plus) and statistical model development across a variety of industries, including retail, CPG, financial services and healthcare. He specializes in Machine Learning algorithm development and ETL tool development, using languages such as Java/J2EE, Python, Perl, Shell, and C++.
  • Statistical Data Mining techniques: Data Exploration, Significance Testing, Regression (Multinomial/LOGIT), Clustering (K-Means), Segmentation (C4.5, CHAID), and marketing mix models such as Market Basket Analysis (Association Rules) and Demand/Revenue Forecasting
  • Statistical Data Mining software - SAS, KXEN, SPSS, SAS Enterprise Miner, SAS EG
  • Custom Algorithm development and maintenance
  • Recommender Systems and Collaborative Filtering algorithms (see the sketch after this list)
  • Big Data - Hortonworks/Cloudera-based multi-node cluster setup on RHEL/Debian Linux; Hadoop ecosystem (Hive, Sqoop, Oozie) and Apache Spark with Java/MLlib for Machine Learning application development
  • Data Warehousing - DB2, Oracle, SQL Server, Teradata and Informatica
  • Operating Systems: Debian/RHEL, Mac OS X, Windows Server
  • Java J2EE Server - Multi-threading, concurrent data structures, connection pooling, web sockets
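A minimal NumPy sketch of the item-based Collaborative Filtering approach noted above; the ratings matrix is toy data and the weighting scheme is one common choice, not the production algorithm:

    # Item-based collaborative filtering: score items for a user by a
    # similarity-weighted sum of the ratings that user has already given.
    import numpy as np

    ratings = np.array([            # rows = users, cols = items; 0 = unrated
        [5.0, 3.0, 0.0, 1.0],
        [4.0, 0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0, 5.0],
        [0.0, 1.0, 5.0, 4.0],
    ])

    norms = np.linalg.norm(ratings, axis=0)
    sim = (ratings.T @ ratings) / (np.outer(norms, norms) + 1e-9)  # cosine similarity

    def score(user: int) -> np.ndarray:
        r = ratings[user]
        return (sim @ r) / (np.abs(sim) @ (r > 0) + 1e-9)

    preds = score(1)
    preds[ratings[1] > 0] = -np.inf     # mask items the user already rated
    print("recommend item", int(np.argmax(preds)))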

TECHNICAL SKILLS:

Java build tools: Ant, Maven, SBT

Source control: GitHub

Web Containers: Apache, Tomcat, Glassfish

Programming Languages: Database-specific: DB2/Oracle/SQL Server stored procedures, XML

Procedural: C/C++, Visual Basic, Perl, Shell (Korn, Bash), VBA (Visual Basic for Applications), JavaScript, VBScript

Compilers: VC++, Borland C++, GCC/G++

PROFESSIONAL EXPERIENCE:

Confidential

Big Data/Data Scientist Consultant

Responsibilities:

  • Leverage technology and partially automated quantitative methods with Machine Learning to optimize digital media spend and generate revenue lift through targeted online campaigns
  • ETL and distributed data storage: designed and implemented a large-scale (several terabytes) data warehouse on a Hortonworks HDFS multi-node cluster with the full suite of the Hadoop ecosystem - YARN, MapReduce, Tez, Oozie, Sqoop and Hive
  • Machine Learning: the primary objective is media spend and revenue optimization; currently developing/deploying self-calibrating models with Apache Spark (Scala and Java) on Mahout/MLlib/Weka based Machine Learning libraries (see the sketch below)

Tools: Hadoop HDFS based multi-node cluster, Hive, Oozie, Apache Spark/Scala, Maven and SBT build tools
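A minimal PySpark sketch of the kind of MLlib model this role describes; the production code was Scala/Java, and the table and column names here are hypothetical:

    # Logistic regression on campaign events to inform media-spend targeting.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("media-spend").getOrCreate()
    df = spark.table("campaign_events").selectExpr(      # assumed Hive table
        "cast(clicked as double) as label", "impressions", "spend", "recency")

    assembler = VectorAssembler(
        inputCols=["impressions", "spend", "recency"], outputCol="features")
    lr = LogisticRegression(maxIter=20, regParam=0.01)

    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, lr]).fit(train)
    scored = model.transform(test)       # adds probability/prediction columns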

Confidential

Responsibilities:

  • Developed stochastic demand forecasting models using Monte Carlo simulation to facilitate decision making for new product launches. Incorporated sensitivity analysis into the forecasting process to determine key drivers of demand and the Net Present Value (NPV) of the expected revenue over the entire demand cycle (see the sketch below).
  • Developed Logistic Regression based Machine Learning models to predict returns and determine which model inputs best explain the returns phenomenon. The generated insights will be used to fine-tune selling operations and minimize returns.
  • Formulated success criteria for pre- and post-program analysis with a matched-pairs methodology to assess revenue lift by comparing spend patterns of “similar” groups of customers across time periods.

Tools: Base SAS plus SAS Enterprise Miner on Solaris/SunOS
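A hedged NumPy sketch of the Monte Carlo demand/NPV simulation described above; the lognormal demand curve, unit economics and discount rate are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(7)
    n_sims, horizon, rate = 10_000, 8, 0.10     # 8 periods, 10% discount - assumed
    price, unit_cost = 120.0, 70.0              # assumed unit economics

    # Demand per period: lognormal noise around a decaying life-cycle curve.
    base = 5_000 * 0.9 ** np.arange(horizon)
    demand = rng.lognormal(mean=np.log(base), sigma=0.25, size=(n_sims, horizon))

    margin = (price - unit_cost) * demand
    discount = (1 + rate) ** -np.arange(1, horizon + 1)
    npv = margin @ discount                     # one NPV per simulated path

    print("mean NPV:", round(npv.mean()))
    print("5th-95th percentile:", np.percentile(npv, [5, 95]))
    # Sensitivity: vary one input at a time (price, sigma, decay) and re-run.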

Confidential

Regional Client Director

Responsibilities:

  • Aviana, a partner of IBM, is a small consulting company specializing in Advanced Analytics and Business Intelligence application development
  • Aviana at Boeing: Optimize the production operations for the F-15 fighter jet program
  • The primary objective is to identify risk factors in the supply/demand chain that would prevent on-time delivery of orders using SPSS Modeler
  • In a continuous test-learn-calibrate cycle, build various time-based states of the project plan, optimize and measure success rate
  • Techniques used for model building include Regression, Decision Trees (C5.0, CHAID), and Neural Networks (a rough analogue is sketched below)
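The project used SPSS Modeler; as a rough scikit-learn analogue, here is a C5.0-style classification tree flagging supply-chain risk, with hypothetical feature names for the order/part extract:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    df = pd.read_csv("supply_chain.csv")     # assumed order/part history extract
    X = df[["supplier_lead_time", "open_backorders", "part_criticality"]]
    y = df["late_delivery"]                  # 1 = order slipped its promise date

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50).fit(X_tr, y_tr)
    print(export_text(tree, feature_names=list(X.columns)))   # readable risk rules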

Data Scientist Consultant

Confidential

Responsibilities:

  • For commercial accounts, develop ground up and residual pricing models to determine optimal premium based on account risks
  • Develop claims propensity models to proactively identify accounts with high probability of a significant loss
  • Assist actuaries with technical price development using data driven insights resulting from analytics
  • Modeled loss ratios with Weka decision tree (REPTree and M5P algorithms) machine learning systems (see the sketch below)

Tools: RapidMiner Studio with Weka Machine Learning libraries and the Radoop plug-in for interacting with data on a Hadoop cluster
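Production modeling ran through RapidMiner/Weka; a scikit-learn regression tree gives the flavor of the REPTree-style loss-ratio model (the account attributes are assumptions):

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    df = pd.read_csv("commercial_accounts.csv")   # assumed policy-level extract
    X = df[["premium", "industry_code", "prior_claims", "exposure_units"]]
    y = df["loss_ratio"]                          # incurred losses / earned premium

    model = DecisionTreeRegressor(max_depth=5, min_samples_leaf=100)
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print("MAE per fold:", -scores)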

Confidential

Responsibilities:

  • Identified which cohorts of members are at risk of dis-enrolling (i.e., churning) from the plans
  • Determined the factors leading to dis-enrollment and turned them into actionable insights - the “why” part
  • Built Survival and Tree models in R using the survival and rpart packages (see the sketch below)
  • Retrieved data from the Hadoop cluster with rmr-based MapReduce jobs

Tools: Hadoop/HDFS (version 2.5.0) 8-node cluster on Linux x86_64 (version 6.5), Hue/Hive (version 0.12.0-cdh5.1.2), R (version 3.1.1) and RStudio (version 0.98.1062)
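The models were built in R (survival, rpart); this Python lifelines sketch shows the equivalent Cox model shape, with assumed column names:

    import pandas as pd
    from lifelines import CoxPHFitter

    df = pd.read_csv("enrollment.csv")   # assumed member-level extract
    # tenure_months = time enrolled; disenrolled = 1 if the member churned
    cols = ["tenure_months", "disenrolled", "age", "premium_tier", "claims_12m"]

    cph = CoxPHFitter()
    cph.fit(df[cols], duration_col="tenure_months", event_col="disenrolled")
    cph.print_summary()                  # hazard ratios flag at-risk cohorts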

Confidential

Responsibilities:

  • Optimized online user experience with data driven insights delivered through mining semi-structured text data generated by various channels - online, in-branch, phone, mobile etc.
  • New customer acquisition - use analytics on semi-structured textual data to look for opportunities for improving conversion rates
  • Developed Latent Dirichlet Allocation (LDA) and Support Vector Machine (SVM) based Machine Learning topic models in R (RHive, tm packages) and Python (NumPy, scikit-learn) - see the sketch below
  • Data retrieval from Hadoop cluster with Hive based MapReduce jobs
  • Developed master shell script to automate on-going data processing and model calibration - staged on an edge-node of the Hadoop cluster

Tools: Hadoop/HDFS (version 2.5.0) on Linux x86_64 (version 6.5), Hive (version 0.12.0-cdh5.1.2), R (version 3.1.1) and RStudio (version 0.98.1062), Python (version 2.7.8) and Anaconda (version 2.1.0)
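A minimal scikit-learn version of the LDA topic pipeline described above; the feedback corpus and topic count are assumptions:

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = open("feedback.txt").read().splitlines()   # assumed: one comment per line

    vec = CountVectorizer(max_df=0.9, min_df=5, stop_words="english")
    dtm = vec.fit_transform(docs)                     # document-term matrix
    lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(dtm)

    terms = vec.get_feature_names_out()
    for k, comp in enumerate(lda.components_):        # top words per topic
        top = comp.argsort()[-8:][::-1]
        print("topic", k, ":", ", ".join(terms[i] for i in top))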

Confidential

Analytics Consultant/Statistician

Responsibilities:

  • Developed Maximum Likelihood Estimators computed from Probability Mass Functions/Probability Density Functions to augment fee estimates (see the sketch below)
  • Developed dynamic, causal Regression based linear models to communicate key relationships between dependent and independent variables
  • Designed and developed a number of (de-normalized) Data Warehouses and ETL scripts with the primary objective of mining the data

Tools: Base SAS, SAS EG, Teradata, SQL Server, UNIX Shell Scripting, VBA
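A sketch of one way such an MLE fee estimator could look: fit a density to observed fees by minimizing the negative log-likelihood (the gamma family here is an illustrative assumption):

    import numpy as np
    from scipy import stats
    from scipy.optimize import minimize

    fees = np.loadtxt("fees.csv")            # assumed: one observed fee per line

    def neg_log_lik(params):
        shape, scale = params
        if shape <= 0 or scale <= 0:         # keep the search in-bounds
            return np.inf
        return -stats.gamma.logpdf(fees, a=shape, scale=scale).sum()

    res = minimize(neg_log_lik, x0=[2.0, fees.mean() / 2.0], method="Nelder-Mead")
    shape_hat, scale_hat = res.x
    print("MLE: shape=%.2f scale=%.2f" % (shape_hat, scale_hat))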

Confidential

Senior Risk Modeler CCAR

Responsibilities:

  • Developed and stress-tested Risk models as part of the Comprehensive Capital Analysis and Review (CCAR) project in accordance with applicable Dodd-Frank standards
  • Submitted recommendations to the Model Validation and Management team, the Federal Reserve and the OCC
  • The process involved assessing delinquency/default risk in various consumer mortgage portfolios, stress-tested under various scenarios, to assist the JPMC Portfolio Risk Management team in ascertaining the true Risk Weighted valuation of assets and allocating the cash reserves required to offset risk.
  • Calculated Probability of Default (PD) using Regression analysis under various test scenarios, such as Housing Price Index (HPI) shifts over time
  • Computed Exposure At Default (EAD) and Loss Given Default (LGD) for each loan in a given portfolio
  • Forecasted asset values into future quarters based on Risk Weights computed from PD, EAD and LGD (see the sketch below)
  • Conducted ELTV and delinquency-band QoQ migration analysis

Tools: R, Base SAS, SAS EG, Teradata, SQL Server, UNIX Shell Scripting, VBA
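The loan-level computation rests on the standard expected-loss identity EL = PD × LGD × EAD; a pandas sketch with a hypothetical portfolio file and a toy stress multiplier:

    import pandas as pd

    loans = pd.read_csv("portfolio.csv")   # assumed columns: pd, lgd, ead

    # Stress scenario: an HPI shock scales default probability (toy multiplier).
    hpi_stress = 1.4
    loans["pd_stressed"] = (loans["pd"] * hpi_stress).clip(upper=1.0)
    loans["expected_loss"] = loans["pd_stressed"] * loans["lgd"] * loans["ead"]

    print("baseline EL:", (loans["pd"] * loans["lgd"] * loans["ead"]).sum())
    print("stressed EL:", loans["expected_loss"].sum())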

Confidential

Director, Customer Analytics

Responsibilities:

  • Store layout planning
  • Localized assortment and placement of products in stores
  • Enabling pricing and promotion decisions
  • Evaluated available variables in the data and computed various descriptive statistics
  • Performed dimensionality reduction as necessary using techniques such as Market Basket and Principal Component Analysis
  • Built and validated models: identified homogeneous customer segments (Trip Types) using K-Means Clustering, Decision Trees, Association Rules and Logistic Regression, such that segments are well classified and exhibit differentiable behavior; trained and validated models on sample data, with on-going calibration and application to new data (see the sketch below)
  • Preferred modeling tools were SAS and KXEN
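The production models were SAS/KXEN; a scikit-learn sketch of the same PCA-then-K-Means segmentation flow, with assumed basket features:

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    trips = pd.read_csv("trips.csv")        # assumed: one row per store trip
    X = StandardScaler().fit_transform(
        trips[["basket_size", "spend", "pct_grocery", "pct_apparel"]])

    X_reduced = PCA(n_components=2).fit_transform(X)   # dimensionality reduction
    trips["trip_type"] = KMeans(n_clusters=5, n_init=10,
                                random_state=0).fit_predict(X_reduced)
    print(trips.groupby("trip_type").mean())           # profile each segment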

Confidential

Lead Data Architect/Modeler

Responsibilities:

  • Designed and populated data repository in Oracle database with historic load, generation and weather data
  • Developed Regression based load and generation forecasting models fitted to hourly load and generation curves
  • Developed Mixed Integer Programming models to optimize power generation on the grid (a toy sketch follows)

Tools: SPSS Modeler, ILOG/CPLEX, Oracle PL/SQL
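ILOG/CPLEX was the production solver; this PuLP toy shows the MIP shape - meet hourly load at minimum cost with binary on/off commitments per generator (all numbers assumed):

    import pulp

    hours = range(3)
    gens = {"coal": (200, 25.0), "gas": (120, 40.0)}  # (capacity MW, $/MWh)
    load = [150, 260, 300]                            # MW demand per hour

    idx = [(g, t) for g in gens for t in hours]
    prob = pulp.LpProblem("dispatch", pulp.LpMinimize)
    on = pulp.LpVariable.dicts("on", idx, cat="Binary")
    out = pulp.LpVariable.dicts("out", idx, lowBound=0)

    prob += pulp.lpSum(gens[g][1] * out[g, t] for g, t in idx)   # total cost
    for t in hours:
        prob += pulp.lpSum(out[g, t] for g in gens) == load[t]  # meet load
        for g in gens:
            prob += out[g, t] <= gens[g][0] * on[g, t]          # capacity if on

    prob.solve()
    print(pulp.LpStatus[prob.status], {k: v.value() for k, v in out.items()})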

Confidential

Lead Analyst/Modeler

Responsibilities:

  • Tier qualification: evaluated and recommended Silver-tier qualification for Best Buy’s RewardZone loyalty program by applying Action Cluster methodology. This exercise resulted in proposing an invite strategy based on insights deduced by evaluating a customer on a number of key attributes.
  • Marketing mix: produced a RewardZone targeted-offer mailing list as part of a campaign to re-activate customer segments impacted by recent changes in the RZ program. Various customer cohorts likely to respond to certain offers were identified using Action Clusters.
  • Post-Purchase Warranty Renewal Response Model: who is likely to respond to a warranty renewal offer? What is their channel preference? How should offers be timed to maximize response? (See the sketch below.)
  • Feature Vector development and binning: Used as inputs to Action Clustering and developed from high priority business questions from the clients’ perspective
  • Action Cluster scoring: applied existing cluster scoring rules from a previously trained segmentation model to new customer data and produced profiling reports that include the new feature vectors
  • Developed the BTP data warehouse, ETL scripts and a renewal response model with LOGIT to predict the probability of response to an offer
  • Identified channel preferences by leveraging Action Cluster based segmentation; offer-timing scenarios were constructed from historic solicitation data

Tools: SAS, Teradata, Oracle, SpeakEZ, UNIX/AIX Shell scripts, PERL, C++
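The LOGIT response model was built in SAS; a statsmodels sketch of the same shape, with hypothetical binned predictors:

    import pandas as pd
    import statsmodels.api as sm

    hist = pd.read_csv("warranty_offers.csv")   # assumed: one row per solicitation
    X = sm.add_constant(hist[["days_since_purchase", "product_price_bin",
                              "prior_renewals", "store_visits_bin"]])

    logit = sm.Logit(hist["responded"], X).fit()   # responded = 1 if renewed
    print(logit.summary())
    hist["p_respond"] = logit.predict(X)           # rank customers for the mail file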

Confidential

Data Warehouse Developer/Statistical Modeler

Responsibilities:

  • Designed and deployed the BestBuy.com online customer data warehouse on Unix/Oracle 9i
  • Developed ETL scripts, scheduled via cron, for data transformations and enforcing business rules using PL/SQL to populate the data warehouse, which is now approaching a terabyte in size
  • Performance tuned data warehouse tables using techniques such as data partitions, explain plans, dynamic index construction and query hints
  • Developed SAS programs as a middle tier on UNIX and Windows to construct econometric and classification models of online customer behavior using Logistic Regression and Clustering techniques
  • As a member of the QA team, tested raw XML/Schema feeds from the dotcom servers into the data warehouse using Oracle’s native XML parser
  • The propensity-to-purchase-online Logistic Regression model enabled Personalization Marketing to reveal factors that impacted customers’ online shopping experience positively or negatively
  • The serving of personalized/relevant content to the customer resulted in improving the online shopping experience and higher response rates
  • Assisted Confidential Campaign Marketing with launching new campaigns for online customers who showed interest in certain products by adding them to and removing them from their carts without ever purchasing online
  • Campaigns were triggered by frequency of online visits to BestBuy.com, offering visitors special promotions on the products they had added to or removed from their shopping carts in the past

Confidential

Manager, Decision Sciences

Responsibilities:

  • Designed and executed CRM (Customer Relationship Management) initiatives for clients in financial services, retail and healthcare primarily to:
  • Increase customer retention and reduce churn
  • Optimize marketing budgets with targeted communications thereby reducing channel spend
  • Build customer loyalty and deliver higher ROI on various channel investments
  • Data warehouse design and ETL (Extraction Transformation and Loading) script/batch job development for periodic data loads into data marts on SAS, Oracle and Teradata platforms
  • Data Mining, Information delivery/Reporting and Measurement:
  • Supported marketing campaigns using data mining, employing supervised techniques such as Logistic Regression (SAS) and Decision Trees (Salford Systems CART/MARS)
  • Developed Customer Lifetime Value models (see the sketch below)
  • Supported Peppers and Rogers Group (acquired by Carlson) 1to1 CRM initiatives, such as developing and maintaining the living Touchmap that describes the stages of customer lifecycles and interactions at different touchpoints (a home-grown web application built on ASP/.NET)
  • Developed customer segmentation models (unsupervised K-Means)
  • Developed automated dashboards and reports of Key Performance Indicators (KPIs), program measurement and ROI, delivered via Flash-enabled dynamic web dashboards built on the ASP/.NET platform
  • Improved the targeted Marketing Campaign’s response rates and program ROI
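A minimal sketch assuming a simple retention-based Customer Lifetime Value form - a geometric sum over constant retention r and discount d; this is an illustrative assumption, not necessarily the models actually used:

    def clv(annual_margin: float, retention: float, discount: float) -> float:
        # Infinite-horizon CLV = margin * r / (1 + d - r) under constant retention.
        return annual_margin * retention / (1 + discount - retention)

    # Example: $200/yr margin, 80% retention, 10% discount rate -> ~$533
    print(clv(annual_margin=200.0, retention=0.80, discount=0.10))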

Confidential

Data Analyst/Project Lead

Responsibilities:

  • Designed and developed an Energy Consumption application for monitoring, reporting and predicting U of M’s energy consumption on an Oracle 9i database. The data model prototype was designed using MS Access and Visio
  • Used Oracle data as input files for SAS programs and prepared training sets, test sets and hold-out samples for validating prediction models and measuring model goodness
  • Used a combination of home-grown algorithms and SAS to develop energy prediction models using Time Series and quantitative causal models such as Regression (see the sketch below), storing the analyzed data back in the Oracle and MS Access databases
  • Developed the Client/Server front-end with a compiled VB 6.0 executable, MS Access, MS Excel and deployed some management reports on the Facilities Management’s intranet using ASP 3.0, CSS, JavaScript, ADO, ActiveX, XML, DTD and XSLT
  • Improved the staff efficiency by providing a user interface for data entry and automatic batch loading of data where possible
  • The improved energy consumption reporting and forecasting identified high consumption buildings on campus and facilitated accurate budgeting process especially in the face of budget cuts
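A minimal sketch of a causal regression like the energy models above - monthly consumption explained by degree days; the file and feature names are assumptions:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("building_energy.csv")   # assumed: monthly kWh per building
    X = sm.add_constant(df[["heating_degree_days", "cooling_degree_days", "sq_ft"]])

    model = sm.OLS(df["kwh"], X).fit()
    print(model.summary())                    # coefficients = kWh per degree-day
    df["kwh_hat"] = model.predict(X)          # feeds consumption forecasts/budgets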

Confidential

Data Warehouse Developer / GUI Developer

Responsibilities:

  • Developed 2nd-generation Data Warehouses and Data Marts for clinical patient visits and healthcare claims billing on an AIX/Oracle 8i platform, modeled using Visio
  • Wrote ETL scripts with PL/SQL, Triggers and SQL Loader to enforce business rules and populate data warehouse from an OLTP system and flat files
  • Read files of various formats such as spreadsheets, flat files, and data from relational databases into SAS programs for analysis and wrote analyzed data back to a database, flat files and other structured files
  • Mined patient and billing data using SAS
  • Constructed causal models such as Logistic and Multiple Regression for predicting trends at various Confidence Intervals
  • Developed ANOVA and Time Series procedures for detecting correlation and outliers in data (see the sketch below)
  • Developed Web based and Client / Server front-end UI to allow users to perform ad-hoc analysis
  • Key findings in billing data helped reduce cost of high ticket clinical procedures and improved clinical staff productivity
  • Significantly improved cash flow and revenues by reducing claim reimbursement period and delinquent accounts
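A minimal SciPy version of the ANOVA and outlier checks described above; the claims file and its grouping column are assumptions:

    import pandas as pd
    from scipy import stats

    claims = pd.read_csv("claims.csv")      # assumed columns: cost, department

    # One-way ANOVA: do mean claim costs differ across departments?
    groups = [g["cost"].values for _, g in claims.groupby("department")]
    f_stat, p_val = stats.f_oneway(*groups)
    print("F=%.2f p=%.4f" % (f_stat, p_val))

    # Simple z-score outlier flag on claim cost.
    z = (claims["cost"] - claims["cost"].mean()) / claims["cost"].std()
    print(claims[z.abs() > 3].head())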
