Data Scientist Resume
Seattle, WA
SUMMARY
- 4 years of experience in data analysis, machine learning, and data mining with large datasets of structured and unstructured data.
- Strong skills in machine learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forests, Support Vector Machines, K-Nearest Neighbors, K-means Clustering, Neural Networks, and Ensemble Methods using Python with Pandas, NumPy, and scikit-learn.
- Proficient in Predictive Modeling, ANOVA, Hypothesis testing, A/B testing, and advanced statistical techniques.
- Proficient at building robust Machine Learning and Deep Learning models, including Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), using TensorFlow, Keras, and PyTorch in Python.
- Worked on Natural Language Processing (NLP) tasks (topic modeling, sentiment analysis, text classification) using bag-of-words, n-grams, TF-IDF, Word2Vec, Doc2Vec, NLTK, spaCy, and Gensim.
- Knowledge of cloud services such as Microsoft Azure and Amazon Web Services (AWS).
- Adept at analyzing large datasets using Apache Spark, PySpark, Spark ML, and AWS; knowledge of Big Data ecosystem components such as Hadoop, MapReduce, Pig, and Hive.
- Knowledge of SQL database servers and NoSQL databases such as HBase, Cassandra, and MongoDB.
- Worked with web scraping tools such as Scrapy and Beautiful Soup in Python to extract data from websites.
- Hands-on experience with Python web frameworks such as Django and Flask, and the Bootstrap front-end framework.
- Experience with visualization tools such as Tableau and Plotly for web apps, and Matplotlib and Seaborn in Python.
- Knowledge of and experience with Git/GitHub version control.
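As an illustration of the NLP skills above, TF-IDF weighting over a bag-of-words representation can be sketched in plain Python (the toy documents are invented for the example; in practice NLTK or scikit-learn's TfidfVectorizer would be used):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is raw term frequency normalized by document length; IDF is
    log(N / df), where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Toy corpus: a term common across documents ("offer") gets a lower
# IDF than a term unique to one document ("spam").
docs = [["spam", "offer", "offer"], ["meeting", "notes"], ["offer", "meeting"]]
w = tf_idf(docs)
```

The resulting per-document dictionaries feed directly into a classifier such as Logistic Regression after vectorization.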
PROFESSIONAL EXPERIENCE
Confidential, Seattle, WA
Data Scientist
Responsibilities:
- Performed data manipulation, data preparation, normalization, and predictive modeling; improved model efficiency and accuracy through evaluation in Python.
- Implemented, tuned, and tested models on Amazon SageMaker, Jupyter Notebooks on EC2, and Microsoft Azure, selecting the best-performing algorithm and parameters.
- Used Python and R to refine models; upgraded the models to improve the product.
- Worked with data formats such as JSON and XML and applied machine learning algorithms in Python.
- Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
- Validated the machine learning classifiers using ROC Curves and Lift Charts.
- Presented dashboards built in Power BI to senior management for deeper insights.
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
- Applied boosting methods to the predictive model to improve its efficiency.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI.
- Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
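The ROC-curve validation mentioned above can be sketched with a minimal AUC computation (a hand-rolled illustration, not the project's actual code; scikit-learn's `roc_auc_score` computes the same quantity):

```python
def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive example is scored above a randomly chosen
    negative one (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A ranking with one negative scored above one positive: AUC = 0.75.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is why it is a convenient single-number summary alongside the full ROC curve and lift chart.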
Confidential, Seattle, WA
Data Scientist
Responsibilities:
- Participated in all phases of data acquisition, data cleaning, model development, validation, and visualization using Python (Pandas, NumPy, scikit-learn) and R to deliver data science solutions.
- Developed models to accurately detect instances of fraud for further action using Logistic Regression, Decision Trees, Random Forests, and Neural Networks provided by scikit-learn in Python.
- Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold, and hold-out schemes, to test and verify the models' significance.
- Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
- Validated the machine learning classifiers using ROC Curves and Lift Charts.
- Segmented customers based on demographics using K-means Clustering.
- Explored different regression and ensemble models in machine learning to perform forecasting.
- Presented dashboards built in Power BI to senior management for deeper insights.
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
- Applied boosting methods to the predictive model to improve its efficiency.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI.
- Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
- Articulated business questions and used mathematical techniques to arrive at answers from the available data.
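The K-means segmentation above can be illustrated with a minimal Lloyd's-algorithm sketch in plain Python (toy 2-D points stand in for demographic features; scikit-learn's `KMeans` is the usual tool in practice):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl
                     else centroids[i] for i, cl in enumerate(clusters)]
    return centroids

# Two well-separated groups of "customers" in 2-D feature space.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = sorted(kmeans(pts, 2))
```

With well-separated groups the centroids converge to the two cluster means, and each customer's segment is simply the index of the nearest centroid.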
Confidential
Jr Data Analyst
Responsibilities:
- Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.
- Built machine learning models to identify whether a user is legitimate using real-time data analysis, and prevented fraudulent transactions by applying supervised learning to the history of customer transactions.
- Extracted data from a SQL Server database, copied it into HDFS, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
- Performed data cleaning, including transforming variables and handling missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Tackled a highly imbalanced fraud dataset using sampling techniques such as undersampling and oversampling with SMOTE (Synthetic Minority Over-sampling Technique) using Python's scikit-learn.
- Utilized PCA, t-SNE, and other feature engineering techniques to reduce high-dimensional data; applied feature scaling and handled categorical attributes using the one-hot encoder from the scikit-learn library.
- Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.
- Identified and evaluated various distributed machine learning libraries such as Mahout, MLlib (Apache Spark), and R.
- Worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
- Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
- Implemented a Python-based distributed random forest via PySpark and MLlib.
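The SMOTE oversampling used for the imbalanced fraud data can be sketched minimally in plain Python (toy 2-D minority points; the imbalanced-learn library's `SMOTE` is the usual production implementation):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples: pick a random minority point,
    pick one of its k nearest minority neighbors, and interpolate at a
    random fraction along the line segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbors = sorted(
            (q for q in minority if q is not p),
            key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)),
        )[:k]
        q = rng.choice(neighbors)
        lam = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(p, q)))
    return synthetic

# Oversample a small minority class before training the fraud classifier.
minority = [(0, 0), (1, 0), (0, 1), (1, 1)]
extra = smote(minority, 5)
```

Because each synthetic point lies between two real minority points, SMOTE densifies the minority region instead of merely duplicating rows, which is what distinguishes it from naive random oversampling.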