Sr. Data Scientist Resume

SUMMARY:

  • Extensive research in Logistic Regression, Credit Scoring and Machine Learning. Published 7 peer-reviewed journal papers in Credit Scoring (Information Value, Reject Inference, KS, Mutual Information, AUC, Weight of Evidence, Maximum Likelihood Estimates for Weighted Logistic Regression).
  • Seasoned Data Scientist specializing in Data Mining, Machine Learning, Financial Fraud detection, Risk Management, Collection, Consumer Loan default prediction, Credit Scoring.
  • Extensive programming experience in SAS, R, Python, C# and C/C++ (UNIX/LINUX).
  • Five academic degrees, including a Ph.D. in Math and an M.S. in Computer Science.
  • 10+ years of working experience in Software Engineering.
  • 20+ years of long-distance running.

COMPUTER EXPERTISE:

Languages: SAS, R, Python, C#, C/C++, VC++, JAVA, PERL, VISUAL BASIC

Software Processes: Waterfall, Spiral, Rapid, Agile, Cloud, Visualization

Operating Systems: UNIX (SunOS and HP), WINDOWS 10/7/VISTA/XP/2000/NT/98, LINUX

Networks/Protocols: SIP, SNMP, TMN, IEEE 802.11-15, RTP, GRE, TCP/IP, CDMA, GSM, OBS, G.711/G.729, H.263, H.323, H.248, PTS (R1/R2), CCS7/SS7

Applications: SAS Enterprise Miner, Hadoop, .NET 2.0, ASP.NET 2.0, Visual Studio 2005, Visual Studio 2008, XML, UML, SQL, SQLXML, MATLAB, MathCAD

WORKING EXPERIENCE:

Sr. Data Scientist

Confidential

Responsibilities:

  • Developed Risk, Fraud, Collection and Direct Mail Response models using SAS (SAS Enterprise Guide 5.1/7.1). Implemented a two-dimensional score cutoff strategy for Direct Mail campaigns.
  • Data Mining: Analyzed and validated raw data from different vendors including Experian, Call Credit BSB/TAC, Clarity and TransUnion; identified and treated special values; derived new variables; and determined the Bad Definition through Roll Rate Analysis and Vintage Analysis.
  • Modeling: Automated variable reduction through SAS macros. Wrote SAS macros for logit plots, missing value replacement, special value treatment, flooring and capping, lift tables and KS.
  • Scoring: Converted raw data into scores in Excel for the IT department to implement the model. Automated the conversion through a VB script.
  • Led development of the company’s first machine learning credit scoring model (Gradient Boosting, Decision Trees and Random Forest) in R. Wrote R code to calculate Mutual Information, KS and AUC, and an R tool to automatically tune Gradient Boosting parameters.

Sr. Data Scientist/Team Leader

Confidential

Responsibilities:

  • Lead a team developing the roadmap and framework for model governance per regulatory requirements from SR 11-7 and CCAR, and conducting post-implementation reviews of models (Hadoop/Hive, R, SAS).
  • Write SAS macros to perform stability analysis for scores and variables, and conduct performance analysis in terms of KS, ROC/AUC, Gini, Lift Table and the scores’ rank order.
  • Present model governance results to the Credit Committee Review on a monthly basis.
  • Develop a Loss Given Default (LGD) model using Linear Regression and Fractional Regression to predict loss on defaulted loans.
  • Train the team in R, SAS, Machine Learning and Model Governance.

Dallas, Texas

Sr. SAS Consultant 

Confidential

Responsibilities:

  • Provided medical claim reports for Medicare Part A and Part B using Base SAS 9.2. Responsible for routine (weekly or monthly) and ad hoc reporting for internal and external customers. Worked with large tables of millions of rows and hundreds of columns in a large data warehouse of thousands of tables.
  • Calculated Part A and Part B weekly workload counts (beginning, received, processed and ending claims) for Prepay, Reopening and Routine claims from daily data records. Automated weekday handling with the SAS function today() and distinguished holiday from non-holiday cases.
  • Ran monthly Comprehensive Error Rate Testing (CERT) and Error Rate reporting for Part A and Part B. Summarized results by projected dollars paid in error, projected dollars paid, and projected error rate.

Dallas, Texas 

Sr. Statistician

Confidential

Responsibilities:

  • Predicted the risk of death within 30 days of discharge and calculated the mortality score for heart failure patients on the basis of age and the worst value obtained within the first 24 hours of hospital presentation for each laboratory and vital sign variable: albumin, total bilirubin, creatine kinase, creatinine, sodium, blood urea nitrogen, partial pressure of carbon dioxide, white blood cell count, troponin-I, glucose, international normalized ratio, brain natriuretic peptide, pH, temperature, pulse, diastolic blood pressure, and systolic blood pressure.
  • Predicted readmission within 30 days of discharge for heart failure patients using Logistic Regression. Independent variables include mortality score, age, gender, race, payment method, history of depression, history of drug use, history of leaving against medical advice, history of missed clinic visits, number of prior inpatient admissions, and number of prior emergency visits.
  • Provided monthly listing of discharges, readmissions, admission rate, mortality rate, time to follow up visit, DRGs under each department and each division of the hospital (Proc Report).
  • Provided monthly listing of top 25 DRGs per charges in fiscal year 2011 (Proc Summary, Proc Sort, DATA step MERGE).
  • Reported monthly frequencies for Order Set 1654 (ICU Therapeutic Hypothermia Following Cardiac Arrest) and 2510 (Therapeutic Hypothermia Following Cardiac Arrest).

Dallas, Texas

Sr. Software Engineer

Confidential

Responsibilities:

  • Enhanced, maintained and sustained ETALK’s Qfiniti software for call recordings (Visual Studio 2005, .NET, SQL, Stored Procedures, VC++, C#, XML).

Richardson, Texas

Sr. Software Engineer

Confidential

Responsibilities:

  • Designed, developed, coded, tested and maintained Nortel’s wireless CDMA and GSM products (UNIX/Linux, C/C++, Protel II, PERL/CGI, VB).

Plano, Texas

Senior Software Engineer

Confidential

Responsibilities:

  • Developed and sustained EMX 2500/5000 Call Processing software in the areas of EMX 2500 CDMA 2000 and SS7/ISUP (C/C++, Assembly and ClearCase under UNIX).

Adjunct Professor

Confidential

Responsibilities:

  • Taught the graduate core course “Performance Evaluation of Computer Networks”, which attracted 60 graduate students on average. Assigned and directed projects to simulate various statistical models and calculate their performance measures in C++ and JAVA.
  • Taught undergraduate core courses “Discrete Math I” and “Discrete Math II”.
  • Conducted academic research on performance evaluations of computer networks.
