Sr. Data Scientist Resume
SUMMARY:
- Extensive research in Logistic Regression, Credit Scoring and Machine Learning. Published 7 peer-reviewed journal papers in Credit Scoring (Information Value, Reject Inference, KS, Mutual Information, AUC, Weight of Evidence, Maximum Likelihood Estimates for Weighted Logistic Regression).
- Seasoned Data Scientist specializing in Data Mining, Machine Learning, Financial Fraud Detection, Risk Management, Collection, Consumer Loan Default Prediction and Credit Scoring.
- Extensive programming experience in SAS, R, Python, C# and C/C++ (UNIX/LINUX).
- Five academic degrees including a Ph.D. in Math and an M.S. in Computer Science.
- 10+ years of working experience in Software Engineering.
- 20+ years of long distance running.
COMPUTER EXPERTISE:
Languages: SAS, R, Python, C#, C/C++, VC++, JAVA, PERL, VISUAL BASIC
Software Processes: Waterfall, Spiral, Rapid, Agile, Cloud, Visualization
Operating Systems: UNIX (SunOS and HP), WINDOWS 10/7/VISTA/XP/2000/NT/98, LINUX
Networks/Protocols: SIP, SNMP, TMN, IEEE 802.11-15, RTP, GRE, TCP/IP, CDMA, GSM, OBS, G.711/G.729, H.263, H.323, H.248, PTS (R1/R2), CCS7/SS7
Applications: SAS Enterprise Miner, Hadoop, .NET 2.0, ASP.NET 2.0, Visual Studio 2005, Visual Studio 2008, XML, UML, SQL, SQLXML, MATLAB, MathCAD
WORKING EXPERIENCE:
Sr. Data Scientist
Confidential
Responsibilities:
- Developed Risk, Fraud, Collection and Direct Mail Response models using SAS (SAS Enterprise Guide 5.1/7.1). Implemented a two-dimensional score cutoff strategy for Direct Mail campaigns.
- Data Mining: Analyzed and validated raw data from different vendors including Experian, Call Credit BSB/TAC, Clarity and TransUnion, identified and treated special values, derived new variables, and determined the Bad Definition through Roll Rate Analysis and Vintage Analysis.
- Modeling: Automated variable reduction through SAS macros. Wrote SAS macros for logit plots, missing value replacement, special value treatment, flooring and capping, lift tables and KS.
- Scoring: Converted raw data into scores in Excel for the IT department to implement the model. Automated the conversion through a VB script.
- Led development of the company’s first credit scoring model with machine learning in R (Gradient Boosting, Decision Trees and Random Forest). Wrote R code to calculate Mutual Information, KS and AUC, and an R tool to automatically tune Gradient Boosting parameters.
Sr. Data Scientist/Team Leader
Confidential
Responsibilities:
- Lead a team to develop the roadmap and framework for model governance per regulatory requirements from SR 11-7 and CCAR, and conduct post-implementation reviews of models (Hadoop/Hive, R, SAS).
- Write SAS macros to perform stability analysis for scores and variables, and conduct performance analysis in terms of KS, ROC/AUC, Gini, Lift Table and scores’ rank order.
- Present model governance results to Credit Committee Review on a monthly basis.
- Develop a Loss Given Default (LGD) model using Linear Regression and Fractional Regression to predict loss on defaulted loans.
- Train the team in R, SAS, Machine Learning and Model Governance.
Dallas, Texas
Sr. SAS Consultant
Confidential
Responsibilities:
- Provided medical claim reports for Medicare Part A and Part B using Base SAS 9.2. Responsible for routine (weekly or monthly) and ad hoc reporting for internal and external customers. Worked with large tables of millions of rows and hundreds of columns in a large data warehouse of thousands of tables.
- Calculated Part A and Part B weekly workload counts (beginning, received, processed and ending claims) for Prepay, Reopening and Routine claims from daily data records. Automated weekday handling with the SAS function today() and distinguished holiday and non-holiday cases.
- Ran monthly Comprehensive Error Rate Testing (CERT) and Error Rate reporting for Part A and Part B. Summarized results by projected dollars paid in error, projected dollars paid, and projected error rate.
Dallas, Texas
Sr. Statistician
Confidential
Responsibilities:
- Predicted the risk of death within 30 days of discharge and calculated the mortality score for heart failure patients on the basis of age and the worst values obtained within the first 24 hours of hospital presentation for the following laboratory and vital sign variables: albumin, total bilirubin, creatine kinase, creatinine, sodium, blood urea nitrogen, partial pressure of carbon dioxide, white blood cell count, troponin-I, glucose, international normalized ratio, brain natriuretic peptide, pH, temperature, pulse, diastolic blood pressure, and systolic blood pressure.
- Predicted readmission within 30 days of discharge for heart failure patients by Logistic Regression. Independent variables include mortality score, age, gender, race, payment method, history of depression, history of drug use, history of leaving against medical advice, history of missed clinic visits, number of prior inpatient admissions, and number of prior emergency visits.
- Provided monthly listings of discharges, readmissions, admission rate, mortality rate, time to follow-up visit, and DRGs for each department and each division of the hospital (Proc Report).
- Provided monthly listings of the top 25 DRGs by charges in fiscal year 2011 (Proc Summary, Proc Sort, DATA step MERGE).
- Reported monthly frequencies for Order Sets 1654 (ICU Therapeutic Hypothermia Following Cardiac Arrest) and 2510 (Therapeutic Hypothermia Following Cardiac Arrest).
Dallas, Texas
Sr. Software Engineer
Confidential
Responsibilities:
- Enhanced, maintained and sustained ETALK’s Qfiniti software for call recordings (Visual Studio 2005, .NET, SQL, Stored Procedures, VC++, C#, XML).
Richardson, Texas
Sr. Software Engineer
Confidential
Responsibilities:
- Designed, developed, coded, tested and maintained Nortel’s wireless CDMA and GSM products (UNIX/Linux, C/C++, Protel II, PERL/CGI, VB).
Plano, Texas
Senior Software Engineer
Confidential
Responsibilities:
- Developed and sustained EMX 2500/5000 Call Processing software in the areas of EMX 2500 CDMA 2000 and SS7/ISUP (C/C++, Assembly and ClearCase under UNIX).
Adjunct Professor
Confidential
Responsibilities:
- Taught the graduate core course “Performance Evaluation of Computer Networks”, which attracted 60 graduate students on average. Assigned and directed projects to simulate various statistical models and calculate their performance measures in C++ and JAVA.
- Taught undergraduate core courses “Discrete Math I” and “Discrete Math II”.
- Conducted academic research on performance evaluations of computer networks.