
Data Scientist Resume


Seattle, WA

SUMMARY

  • 4 years of experience in Data Analysis, Machine Learning, and Data Mining with large datasets of structured and unstructured data.
  • Strong skills in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, K-means Clustering, Neural Networks, and Ensemble Methods using Python Pandas, NumPy, and scikit-learn.
  • Proficient in Predictive Modeling, ANOVA, Hypothesis testing, A/B testing, and advanced statistical techniques.
  • Proficient at building robust Machine Learning and Deep Learning models, including Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), using TensorFlow, Keras, and PyTorch in Python.
  • Worked on Natural Language Processing (NLP) (topic modeling, sentiment analysis, text classification) using bag-of-words (BOW), n-grams, NLTK, TF-IDF, Word2Vec, Doc2Vec, spaCy, and Gensim.
  • Knowledge of cloud services such as Microsoft Azure and Amazon Web Services (AWS).
  • Adept at analyzing large datasets using Apache Spark, PySpark, Spark ML, and Amazon Web Services (AWS), with knowledge of Big Data ecosystem components like Hadoop, MapReduce, Spark, Pig, and Hive.
  • Knowledge of SQL database servers and NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Worked on web scraping tools like Scrapy and Beautiful Soup in Python to extract data from websites.
  • Hands-on experience with Python web frameworks like Django and Flask, and with Bootstrap (front-end framework).
  • Experience in visualization tools like Tableau and Plotly for web apps, and Matplotlib and Seaborn in Python.
  • Knowledge and experience in GitHub/Git version control tools.
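The NLP and scikit-learn skills above can be illustrated with a minimal text-classification sketch. The corpus, labels, and pipeline below are hypothetical, chosen only to show TF-IDF features feeding a Logistic Regression classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical mini-corpus; labels 1 = positive, 0 = negative sentiment.
docs = [
    "great product, works well",
    "terrible service, very slow",
    "excellent quality and fast shipping",
    "awful experience, would not recommend",
]
labels = [1, 0, 1, 0]

# TF-IDF features (unigrams + bigrams) feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(docs, labels)

print(model.predict(["fast and great"]))
```

A real pipeline would train on far more documents and tune the vectorizer and regularization via cross-validation; the shape of the code stays the same.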

PROFESSIONAL EXPERIENCE

Confidential, Seattle, WA

Data Scientist

Responsibilities:

  • Performed data manipulation, data preparation, normalization, and predictive modeling; improved efficiency and accuracy by evaluating models in Python.
  • Implemented, tuned, and tested models on Amazon SageMaker, Jupyter Notebooks on EC2, and Microsoft Azure with the best-performing algorithm and parameters.
  • Used Python and R to refine models, and upgraded the full set of models to improve the product.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.
  • Presented dashboards to senior management for deeper insights using Power BI.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
  • Applied boosting methods to the predictive model to improve its efficiency.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
  • Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
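Validating a classifier with ROC curves, as described above, might look like the following sketch. The data is synthetic (scikit-learn's `make_classification`) and the model choice (Random Forest) is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # points on the ROC curve
auc = roc_auc_score(y_test, scores)               # area under that curve
print(f"AUC = {auc:.3f}")
```

Plotting `fpr` against `tpr` (e.g. with Matplotlib) gives the ROC curve itself; an AUC near 1.0 indicates strong ranking of positives over negatives, while 0.5 is chance level.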

Confidential, Seattle, WA

Data Scientist

Responsibilities:

  • Participated in all phases of data acquisition, data cleaning, model development, validation, and visualization using Python (Pandas, NumPy, scikit-learn) and R to deliver data science solutions.
  • Developed models to accurately detect instances of fraud for further action using Logistic Regression, Decision Trees, Random Forests, and Neural Networks provided by scikit-learn in Python.
  • Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold, and hold-out schemes, to test and verify the models' significance.
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
  • Validated the machine learning classifiers using ROC Curves and Lift Charts.
  • Segmented the customers based on demographics using K-means Clustering.
  • Explored different regression and ensemble models in machine learning to perform forecasting.
  • Presented dashboards to senior management for deeper insights using Power BI.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
  • Applied boosting methods to the predictive model to improve its efficiency.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
  • Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
  • Articulated business questions and used mathematical techniques to arrive at answers with the available data.
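The stratified k-fold validation scheme mentioned above can be sketched as follows. The dataset is synthetic, with an 80/20 class skew to mimic an imbalanced target such as fraud labels; stratification keeps that ratio in every fold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data (roughly 80% class 0, 20% class 1).
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=1)

# Each of the 5 folds preserves the overall class balance.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())
```

Reporting the mean and standard deviation across folds, rather than a single hold-out score, gives a more honest picture of how the model will generalize.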

Confidential

Jr Data Analyst

Responsibilities:

  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.
  • Built machine learning models to identify whether a user is legitimate using real-time data analysis and prevent fraudulent transactions using the history of customer transactions with supervised learning.
  • Extracted data from a SQL Server database, copied it into HDFS, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
  • Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Tackled highly imbalanced Fraud dataset using sampling techniques like under sampling and oversampling with SMOTE (Synthetic Minority Over-Sampling Technique) using Python Scikit-learn.
  • Utilized PCA, t-SNE, and other feature engineering techniques to reduce high-dimensional data; applied feature scaling and handled categorical attributes using the one-hot encoder from the scikit-learn library.
  • Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.
  • Identified and evaluated various distributed machine learning libraries such as Mahout, MLlib (Apache Spark), and R.
  • Worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Implemented a Python-based distributed random forest via PySpark and MLlib.
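Rebalancing an imbalanced fraud dataset, as described above, can be sketched with simple random oversampling. SMOTE itself lives in the imbalanced-learn package and synthesizes new minority points rather than duplicating rows, so this scikit-learn-only version is a stand-in, and the data here is synthetic:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic imbalanced "fraud" data: 95 legitimate rows, 5 fraud rows.
X_legit = rng.normal(size=(95, 3))
X_fraud = rng.normal(loc=3.0, size=(5, 3))

# Random oversampling with replacement: duplicates minority rows until
# the classes are balanced (SMOTE would interpolate new points instead).
X_fraud_up = resample(X_fraud, replace=True, n_samples=95, random_state=0)

X_balanced = np.vstack([X_legit, X_fraud_up])
y_balanced = np.array([0] * 95 + [1] * 95)
print(X_balanced.shape)  # (190, 3)
```

Oversampling should be applied only to the training split (never before the train/test split), or the duplicated rows leak into evaluation and inflate the metrics.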
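The feature-engineering steps above (feature scaling, one-hot encoding of categorical attributes, PCA for dimensionality reduction) might be wired together like this. The column names and values are hypothetical, since the actual dataset is not shown:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical transaction-style data with numeric and categorical columns.
df = pd.DataFrame({
    "amount": [10.0, 250.0, 32.5, 999.0],
    "n_txn": [1, 12, 3, 40],
    "channel": ["web", "pos", "web", "atm"],   # categorical attribute
})

# Scale numeric columns, one-hot encode the categorical one;
# sparse_threshold=0 forces a dense array so PCA can consume it.
pre = ColumnTransformer(
    [
        ("num", StandardScaler(), ["amount", "n_txn"]),
        ("cat", OneHotEncoder(), ["channel"]),
    ],
    sparse_threshold=0.0,
)

# Reduce the encoded feature space to 2 principal components.
pipe = make_pipeline(pre, PCA(n_components=2))
X2 = pipe.fit_transform(df)
print(X2.shape)  # (4, 2)
```

In practice the pipeline would be fit on the training data only and then applied to new data with `pipe.transform`, so the scaling, encoding, and PCA basis stay fixed.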
