Research Analyst / Data Scientist Resume
Detroit, MI
SUMMARY
- Data analytics professional with eight years of clinical research experience delivering end-to-end data science projects.
- Proficient in managing the entire data science project life cycle, including:
- Data acquisition (Primary and secondary data collection).
- Power analysis, Hypothesis generation and testing, effect size.
- Data cleaning, data imputation, and outlier detection (chi-square detection, residual analysis, multivariate outlier detection).
- Data Transformation.
- Statistical modeling, both linear and nonlinear (Linear and Logistic regression, Naïve Bayes, Decision trees, Random forest, Neural networks, SVM, K-means clustering, Market Basket Analysis (MBA), KNN).
- Dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis.
- Testing and validation using ROC plots, k-fold cross-validation, confusion matrices, and statistical significance testing; data visualization using the R ggplot2 package.
- Experience in Exploratory Data Analysis: obtaining insights from data, then choosing appropriate Machine Learning algorithms (Classification, Regression, Association, and Clustering).
- Experience with SPSS, R (packages: knitr, dplyr, data.table, SparkR), and Python (scikit-learn, SciPy, NumPy, pandas).
- Experience in employing statistical models such as ANOVA, MANOVA, and repeated-measures ANOVA.
- Excellent written and verbal communication skills; experienced in preparing R scripts for data access, manipulation, and reporting.
- Experience in writing journal articles and budget preparation for grant applications.
TECHNICAL SKILLS
Programming Languages: SPSS, SAS-Base, R, Python
Packages and tools: Pandas, NumPy, SciPy, Scikit-Learn, matplotlib, ggplot2, dplyr, data.table
Machine Learning: Linear regression, Logistic regression, Decision Trees, Support Vector Machines, Ensemble learning such as Random Forest, K-Nearest Neighbor, Unsupervised learning such as Market Basket Analysis and K-means clustering
Statistical Methods: ANOVA, MANOVA, Repeated-Measures ANOVA, Linear regression, Logistic regression, Survival Analysis, parametric and non-parametric tests
PROFESSIONAL EXPERIENCE
Confidential, Detroit, MI
Research Analyst / Data Scientist
Responsibilities:
- Participated in all phases of data collection, data cleaning, model development, visualization, validation, and presentation.
- Built a statistical regression model to diagnose sleep-disordered breathing (sleep apnea) in the aging population.
- Applied machine learning techniques such as regression and classification to predict risk factors associated with sleep-disordered breathing in obese patients, spinal cord injured patients, and the aging population.
- Performed data manipulation and aggregation using the dplyr R package, SPSS, and Python libraries.
- Employed statistical methods in SPSS and R to identify the impact of pharmacological treatments in patients with sleep apnea versus people without sleep apnea.
- Extracted data on spinal cord injured patients from the Veterans Affairs central portal system using medical diagnostic codes.
- Served as a liaison between the Principal Investigators, sleep fellows, and research assistants.
- Presented results in graphical format, using the ggplot2 package and SigmaPlot, at international conferences such as American Thoracic Society and SLEEP.
- Responsible for writing the technical and statistical portions of journal articles published by the research group.
Environment: Python 2.x, R, HDFS, Hadoop 2.3, IBM SPSS, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
Confidential, Detroit, MI
Intern in Maternal and Child Health
Responsibilities:
- Performed data pre-processing and cleaning to prepare data sets for further statistical analysis, including outlier detection and treatment, missing value treatment, variable transformation, and other data manipulation techniques, using SPSS, R, and SAS-Base.
- Manipulated historical data obtained from Medicare using SPSS and SAS-Base for model prediction analysis.
- Utilized logistic regression analysis in R to identify risk factors for low-birth-weight babies.
- Performed cluster analysis to classify mothers into risk groups and to identify factors associated with delivering low-birth-weight babies.
- Accompanied a Nurse Practitioner on field visits and administered health questionnaires to expectant mothers.
Environment: IBM SPSS, Microsoft Excel, Microsoft Access, SAS-Base
Confidential, Detroit, MI
Research Assistant
Responsibilities:
- Responsible for data collection, data entry, and pre-processing and cleaning of data sets for further statistical analysis, including missing value treatment and transformation techniques, using R and SPSS.
- Created data visualizations using the R ggplot2 package and SPSS.
- Conducted journal article searches and selected appropriate articles for journal writing.
Environment: IBM SPSS, Microsoft Excel, SAS-Base