Consultant Resume
Rochester, MN
PROFESSIONAL SUMMARY:
- Confidential is a data scientist with a strong research background, 12 years of experience in data analysis and scientific programming, and a publication record of 25 peer-reviewed papers.
- Much of his work has focused on healthcare data analytics and discovery research in both academic and industrial settings. Areas of research include data integration, quality control, feature extraction, algorithm development, machine learning and predictive modeling.
- Expert in Python, C++, Perl, SAS and R; proficient in SQL, PHP and Java. Familiar with emerging big data and deep learning technologies such as Hadoop, Spark and TensorFlow.
TECHNICAL SKILLS:
Programming: C++ (15 years), Python (8 years), Perl (12 years), PHP (12 years), SQL (12 years), Java (7 years), HTML (12 years), CSS (6 years), XML (2 years), Bash (10 years)
Statistical programming: SAS (8 years), R (11 years)
Operating systems: Windows (18 years), Unix/Linux (12 years), Mac (8 years)
Machine learning: SVM (5 years), LASSO (5 years), Deep learning/TensorFlow (2 years)
Databases: MySQL (10 years), NoSQL (2 years)
Big data analytics: Hadoop/MapReduce (2 years), Spark (2 years)
HPC: SGE/PBS queuing system (8 years), multi-threading (2 years), MPI (6 years)
Bioinformatics: Bioconductor (10 years), BioPerl (10 years), Microarray data analysis (12 years), NGS data analysis (6 years), Genomics (12 years), Population genetics (9 years), Sequence analysis (8 years)
PROFESSIONAL EXPERIENCE:
Confidential, Rochester, MN
Consultant
Responsibilities:
- Analyze computer systems and write Bash/Python scripts to resolve diverse issues during testing and deployment of NGS clinical pipelines.
- Design and code automation pipelines for regression testing using golden datasets, and then merge them into deployment scripts.
- Conducted unit testing for two novel NGS clinical pipelines; detected and corrected critical logic errors, enabling their release ahead of deadlines.
Independent Consultant
Confidential
Responsibilities:
- Provided on-demand bioinformatics support for the National Institute of Nursing Research.
Consultant
Confidential
Responsibilities:
- Designed a deep learning framework for a personalized skincare product recommendation system.
- Developed custom deep learning (TensorFlow) programs for diverse applications.
- Wrote a Python script to preprocess data, build multiple machine learning models and visualize the top-weighted variables.
- All models achieved an accuracy of around 90%.
- Processed different types of input files and built SDTM DM, AE, EX, LB and XP datasets.
- Generated “define.xml” files from metadata Excel files.
- Left-normalized variants in WGS data from 70 samples using vt and merged the normalized VCF files using VCFtools.
- Built a pipeline in Perl to annotate genetic variants in the merged VCF file with ANNOVAR.
- Analyzed missing heritability in rare variants using simulations and 1000 Genomes data with R, C++, Perl, VCFtools and PLINK.
- Analyzed high-order interactions in cancer expression data using optimization algorithms and C++ multi-threading (github.com/wyp1125/xSyn).
- Published two research papers.
- Wrote HTML and CSS to build web interfaces for querying data.
- Built dynamic web pages to retrieve, analyze and visualize market data using PHP with the GD library.
- Installed libraries, packages and software for various applications.
- Set up and ran Hadoop on AWS EC2.
- Set up and ran TensorFlow and PySpark.
Confidential
Project Director (Consultant)
Responsibilities:
- Wrote SAS programs to detect gene expression profiles associated with weight change in microarray data (>20,000 genes) by fitting a linear regression model to each gene, adjusting for gender and race, and then correcting P-values for multiple testing.
- Wrote R/Bioconductor programs to analyze and visualize unstructured clinical and genomic data: a) used “arrayQualityMetrics” to remove outliers; b) used “affy” and “gcrma” to generate expression values; c) used “preprocessCore”, “sva” and “ggfortify” to correct batch effects and visualize PCA; d) used “qvalue” to control the false discovery rate (FDR) in multiple testing; e) used two-way ANOVA to assess expression differences under interacting categorical variables; f) iteratively processed different clinical variables and computed their correlations with genotypes; g) used “gplots” to draw a bi-clustered heat map.
- Built Support Vector Machine (SVM) models to predict obesity risk, with customized feature selection: SNPs were selected to maximize the correlation between their genetic risk score and body mass index (BMI), and gene expression profiles were selected for significant correlation with BMI and mutual independence.
- Produced two research papers from these analyses, fulfilling the project goals.
- Regularly communicated project status and analysis strategies to the client to ensure all requirements were met.
Confidential, Falls Church, VA
Senior Bioinformatics Scientist
Responsibilities:
- Identified genetic variants responsible for preterm birth by applying a novel algorithm to integrate whole-genome sequencing (WGS) and RNA-seq data from 790 mothers, realizing the value of the $20M investment in generating the data.
- Designed a custom MapReduce framework on an SGI UV2000 to analyze WGS data from 7,000 individuals, and adapted the framework for general use with PySpark.
- Deployed a generic in-memory NoSQL (key-value) database on the SGI UV2000 for querying allele frequencies in WGS data from 7,000 individuals, contributing to a paper published in the journal Brain.
- Wrote SAS programs to validate genetic associations with preterm birth by including software version (batch effect), ancestry proportions and age as covariates in linear regression models.
- Constructed SVM models to evaluate the Complete Genomics (CG) and Illumina Isaac variant-calling pipelines and to filter variant calls, using QC metrics of the variants shared by both platforms as extracted from MasterVar and VCF files (190 samples were sequenced on both the CG and Illumina platforms).
Confidential, NIH, Bethesda, MD
Research Fellow
Responsibilities:
- Constructed and tuned SVM models to predict inactive enhancers from DNA motif signatures.
- Pipelined genetic variant calling using BWA, Samtools, Picard, GATK, PLINK and VCFtools.
Confidential, Ithaca, NY
Postdoctoral Associate
Responsibilities:
- Performed -based RNA-seq data analysis and de novo assembly of RNA-seq data.
- Integrated and statistically analyzed RNA-seq and mass spectrometry data.
Confidential, Athens, GA
Postdoctoral Research Associate
Responsibilities:
- Analyzed complex correlation patterns between (epi)genomic features and gene duplications using SAS.
- Developed software for detecting transposed gene duplications using C++, Perl and Java.
- Results presented in leading plant science and bioinformatics journals.
Confidential, Athens, GA
Research Assistant
Responsibilities:
- Processed and analyzed thousands of microarray datasets using R/Bioconductor and Perl on SGE clusters.
- Developed software for detecting gene synteny and colinearity using C++, Perl and Java.
- Results presented in leading bioinformatics journals.
Confidential, Indianapolis, IN
Summer Intern
Responsibilities:
- Analyzed SNPs associated with maize quantitative traits using logistic regression models coded in C++/GSL.
Confidential, Athens, GA
Research Assistant
Responsibilities:
- Analyzed the evolution of gene families including taste receptor genes, importins and Sp transcription factors.
- Developed statistical procedures in C++ for assessing SNP-coexpression associations, parallelized the computation on SGE clusters, and built web interfaces for querying the results using PHP and SQL.
- Results presented in leading bioinformatics journals.