
Data Scientist Resume


New York, NY

SUMMARY

  • Over 6 years of in-depth experience as a Data Scientist with excellent statistical analysis, data mining and machine learning skills.
  • Worked in the Financial Services, Healthcare and Retail domains.
  • Expertise in managing the full life cycle of data science projects, including transforming business requirements into data collection, data cleaning, data preparation, data validation, data mining and data visualization from structured and unstructured data sources.
  • Hands-on experience in writing queries in SQL and R to extract, transform and load (ETL) data from large datasets using data staging.
  • Proven ability in using Text Analytics and statistical modeling techniques such as linear regression, LASSO regression, logistic regression, elastic net, ANOVA, Monte Carlo methods, factor analysis, clustering analysis, principal component analysis and Bayesian inference.
  • Professional working experience in Machine Learning algorithms such as LDA, linear regression, logistic regression, GLM, SVM, Naive Bayes, Random Forests, Decision Trees, Clustering, neural networks and Principal Component Analysis.
  • Working knowledge of recommender systems and feature creation, with model validation using ROC plots and k-fold cross-validation.
  • Professional working experience of using programming languages and tools such as Python, Hive, Spark, Java, PHP and PL/SQL.
  • Working experience in statistical analysis using R, SAS (STAT, macros, EM), SPSS, MATLAB and Excel.
  • Hands on experience of Data Science libraries in Python such as Pandas, Numpy, SciPy, scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Orange, Rpy2, LibSVM, neurolab, NLTK.
  • Familiar with R packages such as ggplot2, caret, dplyr, tidyr, wordcloud, stringr, e1071, MASS, rjson, plyr, FactoMineR and MDP.
  • Working knowledge of NLP based deep learning models in Python 3.
  • Working experience in RDBMS such as SQL Server 2012/2008 and Oracle 11g.
  • Extensive experience of Hadoop, Hive and NoSQL databases such as MongoDB, Cassandra and HBase.
  • Experience in data visualizations using Python, R, D3.js and Tableau 9.4/9.2.
  • Familiar with conducting GAP analysis, User Acceptance Testing (UAT), SWOT analysis, cost benefit analysis and ROI analysis.
  • Deep understanding of Software Development Life Cycle (SDLC) as well as Agile/Scrum methodology to accelerate Software Development iteration.
  • Experience with version control tool- Git.
  • Extensive experience in handling multiple tasks to meet deadlines and creating deliverables in fast-paced environments and interacting with business and end users.
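A minimal sketch of the k-fold cross-validation and ROC-based validation workflow referenced above, using scikit-learn; the synthetic dataset and choice of logistic regression are illustrative only, not taken from any engagement described in this resume.

```python
# Illustrative k-fold cross-validation scored by area under the ROC curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data (hypothetical stand-in for real data)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation; each fold reports ROC AUC on held-out data
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Averaging fold-level AUC (rather than fitting once) gives a more stable estimate of out-of-sample discrimination.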

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop 2.x, Spark 1.6+, Hive 2.1, HBase 1.0+

Languages: Python 2.7/3, R 3, PL/SQL, SAS 9.4, Hive, Pig, Java, PHP

Packages: Pandas, NumPy, scikit-learn, Beautiful Soup, ggplot2, caret, dplyr, tidyr, wordcloud, stringr, e1071, MASS, rjson, plyr, FactoMineR, Seaborn, Matplotlib, MDP, Orange, Rpy2, LibSVM, neurolab, NLTK

Machine Learning: LDA, Naive Bayes, Decision Trees, Regression Models, Neural Networks, SVM, XGBoost, Random Forests, Bagging, Gradient Boosting Machines, k-means

Databases: MySQL 5.x, Oracle 11g, SQL Server 2012/2008, MongoDB 3.2, HBase 1.0+, Cassandra 3.0

Business Analysis: Requirements Engineering, Business Process Modeling & Improvement, Gap analysis, Cause and Effect Analysis, UI Design, UML Modeling, User Acceptance Testing (UAT), RACI Chart, Financial Modeling

Scripting Language: UNIX Shell, HTML, XML, CSS, JSP, SQL, Markdown

Data Analysis/Visualization: Tableau 9.4/9.2, Matplotlib, D3.js, R Shiny

Documentation/Modeling Tools: MS Office 2010, MS Project, MS Visio, Rational Rose, Excel (Pivot Tables, Lookups), SharePoint, Rational Requisite Pro, MS Word, PowerPoint, Outlook

Version Control: Git, TFVC

Operating Systems: Linux, Ubuntu, Mac OS, CentOS, Windows

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Data Scientist

Responsibilities:

  • Designed and developed Use Case, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using UML and Visio with Agile methodology to transform business requirements into analytical goals.
  • Created queries using Spark SQL, Hive, SAS (Proc SQL) and PL/SQL to load large amounts of data from MongoDB and SQL Server into HDFS to spot data trends.
  • Wrote HiveQL to retrieve, query and process raw data.
  • Used Spark SQL to perform data cleansing, transformation and filtering, such as identifying outliers, missing values and invalid values.
  • Utilized K-means clustering technique to classify unlabeled data.
  • Worked on data pattern recognition, data cleaning as well as data visualizations such as Scatter Plot, Box Plot and Histogram Plot to explore the data using packages Matplotlib, Seaborn in Python, ggplot in R and SAS.
  • Used LDA, PCA and Factor Analysis to perform dimensional reduction.
  • Modified and applied machine learning algorithms such as Neural Networks, SVM, Bagging, Gradient Boosting and K-Means using PySpark and MLlib to detect target customers.
  • Worked on customer segmentation based on the similarities of the customers using an unsupervised learning technique - cluster analysis.
  • Used Pandas, Numpy, Scipy, Scikit-learn, NLTK in Python for scientific computing and data analysis.
  • Applied cross validation to evaluate and compare the performance among different models. Validated the machine learning classifiers using ROC Curves and Lift Charts.
  • Configured Spark Streaming with Kafka to clean and aggregate real time data.
  • Involved in Text Analytics such as analyzing text, language syntax, structure and semantics.
  • Generated weekly and monthly reports and maintained, manipulated data using SAS macro, Tableau and D3.js.
  • Involved in using Sqoop to load historical data from SQL Server into HDFS.
  • Used Git for version control.
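The customer-segmentation approach described above (K-means on unlabeled customer data) can be sketched as follows; scikit-learn stands in for the PySpark MLlib pipeline, and the two features (spend, visit frequency) and the synthetic data are purely hypothetical.

```python
# Illustrative K-means segmentation of unlabeled customer records.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two hypothetical customer segments: [annual spend, visits per month]
seg_a = rng.normal([200.0, 5.0], [30.0, 2.0], size=(100, 2))
seg_b = rng.normal([800.0, 20.0], [50.0, 4.0], size=(100, 2))
customers = np.vstack([seg_a, seg_b])

# Standardize features so spend does not dominate the distance metric,
# then assign each customer to one of two clusters
X = StandardScaler().fit_transform(customers)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(len(set(labels)))
```

Standardizing before clustering matters because K-means uses Euclidean distance, which is scale-sensitive.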

Environment: Python 3/2.7, R 3, SAS 9.4, HDFS, MongoDB 3.2, Hadoop, Hive, Linux, Spark, Kafka, Tableau 9.4, D3.js, SQL Server 2012, Spark SQL, PL/SQL, UML, Git

Confidential, Hartford, CT

Data Scientist

Responsibilities:

  • Conducted comprehensive analysis and evaluations of business needs; provided analytical support for policy; evaluated financial, operational and reputational impacts to inform decisions across different models.
  • Retrieved, manipulated, analyzed, aggregated and performed ETL on billions of claim records from databases such as RDBMS and Hadoop clusters using SAS (Proc SQL), PL/SQL, Spark SQL, Sqoop and Flume.
  • Used Matplotlib and Seaborn in Python to visualize the data, and performed feature engineering such as detecting outliers, handling missing values and interpreting variables.
  • Worked on transformation and dimension reduction of the dataset using PCA and Factor Analysis.
  • Developed, validated and executed machine learning algorithms including Naive Bayes, Decision Trees, Regression Models, SVM and XGBoost to identify different kinds of fraud, and built reporting tools that answer applied research and business questions for internal and external clients.
  • Implemented models such as Linear Regression, Lasso Regression, Ridge Regression, Elastic Net, Random Forest and Neural Networks to provide predictions that help reduce the rate of fraud.
  • Experienced in using Pandas, Numpy, SciPy, Scikit-learn to develop various machine learning algorithms.
  • Used PySpark and MLlib to evaluate models with metrics such as F-score, precision and recall, and via A/B testing.
  • Fine-tuned the developed algorithms using regularization term to avoid overfitting.
  • Configured Kafka with the Spark Streaming API to fetch near-real-time data from multiple sources such as web logs.
  • Analyzed real time data using Spark Streaming and Spark core with MLlib.
  • Used the final machine learning model to detect fraud of real time data.
  • Extensively involved in data visualization using D3.js and Tableau.
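The fraud-model evaluation metrics above (precision, recall, F-score on an imbalanced target) can be illustrated with a short scikit-learn sketch; the gradient-boosting classifier and synthetic imbalanced dataset are stand-ins for the actual PySpark/MLlib stack and claim data.

```python
# Illustrative precision/recall/F-score evaluation of a fraud-style
# classifier on synthetic, imbalanced data (~10% positive class).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

clf = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# On imbalanced fraud data, accuracy is misleading; precision and
# recall on the minority (fraud) class are the relevant metrics.
print(precision_score(y_te, pred), recall_score(y_te, pred), f1_score(y_te, pred))
```

Stratified splitting preserves the class ratio in the test set, which keeps the minority-class metrics meaningful.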

Environment: Python 3/2.7, R 3, SAS 9.4, HBase 1.0+, Kafka, HDFS, Hadoop, Hive, Linux, Spark, Tableau 9.2, D3.js, SQL Server 2012, Excel, Spark SQL.

Confidential

Data Scientist

Responsibilities:

  • Created new features from millions of transaction records and trained models using machine learning techniques such as Gradient Boosted Trees and Deep Learning.
  • Analyzed and determined a cutoff point for accepting/declining transactions to minimize fraud losses and improve customer experience, using machine learning algorithms such as Logistic Regression, Classification, Random Forests and Clustering in SAS, R and Python.
  • Used Pandas, Numpy, Seaborn, Scipy, Matplotlib, Scikit-learn, NLTK in Python for implementing various machine learning algorithms.
  • Used SAS, SQL, Oracle, Teradata and MS Office analysis tools to complete analysis requirements. Created SAS data sets by extracting data from Oracle databases and flat files.
  • Used Proc SQL, Proc Import and the SAS Data Step to clean, validate and manipulate data.
  • Updated data weekly and monthly; maintained and manipulated the data for database management. Used SAS macros and Excel macros for the monthly production.
  • Wrote SQL queries to retrieve and validate data in preparation for data mapping documents.
  • Worked on RDBMS like MySQL and NoSQL databases like MongoDB.
  • Used the Agile Scrum methodology to build the different phases of software development life cycle.
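The clean/validate/manipulate workflow described above (Proc SQL and the SAS Data Step) has a close pandas analogue, sketched below; the transaction records and column names are hypothetical, and pandas here only illustrates the same cleaning steps, not the actual SAS code used.

```python
# Illustrative pandas equivalent of a SAS-style cleaning pass:
# drop duplicate keys, impute missing values, then validate.
import pandas as pd

raw = pd.DataFrame({
    "txn_id": [1, 1, 2, 3],                    # hypothetical transaction IDs
    "amount": [100.0, 100.0, None, 250.0],     # one duplicate, one missing
})

# Keep the first record per transaction ID (like PROC SORT NODUPKEY)
clean = raw.drop_duplicates(subset="txn_id")
# Impute missing amounts with the column median
clean = clean.assign(amount=clean["amount"].fillna(clean["amount"].median()))

# Simple validation: no nulls, no duplicate keys remain
assert clean["amount"].notna().all()
assert clean["txn_id"].is_unique
print(len(clean))  # → 3
```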

Environment: SAS 9.4, Base SAS, SAS Macros, SAS/Graph, SAS/Access, SAS/STAT, SAS/ODS, SAS/SQL, SAS/ETL, SAS Enterprise Miner, Python, PL/SQL, Oracle 9i, Hadoop, MongoDB

Confidential

Data Analyst

Responsibilities:

  • Analyzed online user behavior, conversion data and customer journeys; performed funnel analysis and multi-channel attribution.
  • Worked on business forecasting, segmentation analysis and data mining.
  • Involved in the development of Data Warehouse for personal lines property and casualty insurance.
  • Generated graphs and reports using ggplot in RStudio for analyzing models.
  • Developed and implemented business forecasting applications in R and Shiny.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Used available data sources to deep dive and troubleshoot campaign performance issues.

Environment: MySQL 5.5, R 3, caret, Shiny

Confidential

Data Analyst

Responsibilities:

  • Provided analytical support for Claims, Ancillary, and Medical Management.
  • Performed data mapping and logical data modeling; created class diagrams and ER diagrams.
  • Cleaned data by analyzing and eliminating duplicate and inaccurate data using PROC FREQ, PROC MEAN, PROC UNIVARIATE, PROC RANK, and macros in SAS; Used SQL queries to filter data.
  • Converted various SQL statements into stored procedures thereby reducing the number of database accesses.
  • Worked with Quality Control Teams to develop Test Plan and Test Cases.
  • Designed and implemented basic SQL queries for Data Report and Data Validation.
  • Developed user manuals and provided orientation and training to end users for all modified and new systems.

Environment: Base SAS, SAS/Access, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, SAS DI Studio, SAS/Macros, MS Excel, MS Word, PowerPoint, Oracle 9i, DB2, UNIX, SAS Enterprise Miner, SAS EBI (Enterprise Business Intelligence) 9.4
