Biostatistician Resume
Washington, DC
SUMMARY:
- Cloudera Certified Data Scientist.
- Looking for opportunities in Data Science.
- 2 years of non-academic experience in large-scale machine learning and statistical modeling of numerical and textual data.
SKILLS:
COMPUTATIONAL: R, SAS (Macro / SQL), Python (NLTK / Scikit-learn / Pandas / BeautifulSoup), Amazon Web Services (Elastic MapReduce / EC2 / S3), SQLite, LaTeX, MATLAB, Stata
HDFS: NoSQL, Hadoop, Pig / Hive
Knowledge of: Sqoop / HBase / Mahout
ANALYTICAL: Statistical Modeling, Machine Learning, Predictive Analysis, Cloud Computing
WORKING EXPERIENCE:
Biostatistician
Confidential, Washington, DC
Responsibilities:
- Built and evaluated models to predict clinical outcomes and financial burden using classification techniques (regularized logistic regression, random forest, SVM, etc.). Targeted high-risk populations for customized prevention strategies in order to minimize the disease prevention budget.
- Used a hidden Markov model (HMM) to model time series of transient nasal colonization status and patient-to-patient bacterial transmission dynamics. Estimated the transition matrix and seasonal transition pattern of the hidden epidemic vs. non-epidemic state.
- Created and executed statistical analysis plans and performed disease risk estimation using classical statistical techniques such as time-to-event analysis, mixed models, and the GEE method for clustered data.
- Assisted in the development and implementation of study protocols and reports, and provided statistical support to the investigative team in study design and the preparation of reports, abstracts, presentations, and manuscripts.
- Provided programming guidance and logic check specifications to the programming team for dataset creation and data cleaning; coordinated data collection and management with the data entry team.
- Reviewed scientific literature, new concepts, and statistical methods; assisted in the selection of appropriate study designs and the development of analysis plans.
- Led data analysis meetings and project-based conference calls with study personnel and external collaborators. Presented research findings to the study principal investigator using tables and graphs, and delivered interim and final reports to the internal review board.
Research Assistant
Confidential, Providence, RI
Responsibilities:
- Performed differential gene expression analysis using an empirical Bayes method.
- Estimated the posterior probability of each model for each gene in the high-throughput dataset.
- Used information from the entire set of genes to form an empirical prior distribution for the parameters of the NB model.
- Detected gene clusters using clustering methods such as k-means and hierarchical clustering.
- Built NLP models to predict sentiment scores from 1 to 5 based on 156,000 movie review texts.
- Predicted document categories (ratings) using Multinomial Naive Bayes, SGD, One-vs-All, and k-NN classifiers, tuning a pipeline with a tf-idf vectorizer via grid search cross-validation to find the optimal parameter combination.
- Used the Python Scikit-learn and Pandas packages for the analysis and modeling.