
Sr. Data Scientist Resume


Cincinnati, OH

SUMMARY:

  • Highly efficient Data Scientist/Data Engineer with around 7 years of experience in Data Analysis, Statistical Analysis, Machine Learning, Deep Learning and Data Mining with large sets of structured and unstructured data in domains such as banking, travel services and manufacturing, with strong functional knowledge of business processes and the latest market trends.
  • Skilled in performing web crawling, web scraping, data parsing, data manipulation and data preparation with methods including describing data contents, computing descriptive statistics, regex extraction, split and combine, remap, merge, subset, reindex, melt and reshape (a brief pandas sketch of these operations follows this list).
  • Proficient in managing the entire data science project life cycle and actively involved in all phases including data acquisition, data cleansing, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, ad-hoc analysis and data visualization.
  • Professional working experience with Python 2.X/3.X libraries including Matplotlib, NumPy, SciPy, Pandas, Beautiful Soup, Seaborn, Scikit-learn and NLTK for analysis purposes.
  • Experience in implementing ETL & data analysis with SQL Queries & various analytic tools such as Anaconda 4.0 / 2.X (Jupyter Notebook, Spyder), Matlab 8.0 and Excel 2010/2013, Spotfire, KNIME, RapidMiner, Alteryx, Azure ML Studio, Adobe Analytics using Adobe Clickstream, etc.
  • Working experience in Statistical Analysis and Testing including Hypothesis testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experimental Design, Sample Size Determination and A/B testing.
  • Hands-on experience in importing and exporting data using SQL queries in relational databases including Oracle 11g/12c, MySQL 5.0 and MS SQL Server, and NoSQL databases like MongoDB 3.3/3.4, HBase, Couchbase, Redis, Vertica and Cassandra.
  • Hands-on experience in Distributed Data/Computing tools like HDFS, Map/Reduce, Hadoop, Apache Hive, Apache Spark, Kafka, Apache Pig.
  • Experienced in JIRA for planning, tracking, reporting and release management.
  • Hands-on experience with Alteryx for ETL, data preparation for EDA, and spatial and predictive analytics.
  • Extensively experienced in using Excel pivot tables to run and analyze result data sets, and in UNIX scripting.
  • Experience with Object Oriented Analysis and Design (OOAD) using UML, Rational Unified Process (RUP), Rational Rose and MS Visio.
  • Working experience in version control tools such as Git 2.X and SVN to coordinate work on files with multiple team members.
  • Driven by delivering the best results and taking ownership of work with intellectual honesty, while being an effective communicator with teammates and a quick learner in any new business industry or software environment.
  • Strong analytical skills with the ability to collect, organize, analyze and disseminate significant amounts of information with attention to detail and accuracy.
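The reshaping and merging operations named above are standard pandas idioms; here is a minimal sketch with hypothetical column names and made-up data, purely to illustrate the steps:

```python
# Illustrative pandas data-preparation sketch (hypothetical data and names):
# melt/reshape, regex extraction, merge, and descriptive statistics.
import pandas as pd

# Hypothetical wide-format data
sales = pd.DataFrame({
    "store_id": ["S1", "S2"],
    "2022_revenue": [100.0, 150.0],
    "2023_revenue": [120.0, 170.0],
})

# Melt/reshape: wide -> long
long_sales = sales.melt(id_vars="store_id", var_name="year_metric",
                        value_name="revenue")

# Regex: split the combined column into its parts
long_sales[["year", "metric"]] = long_sales["year_metric"].str.extract(
    r"(\d{4})_(\w+)")

# Merge with a hypothetical lookup table, then describe the result
stores = pd.DataFrame({"store_id": ["S1", "S2"], "region": ["East", "West"]})
merged = long_sales.merge(stores, on="store_id", how="left")
print(merged.groupby("region")["revenue"].describe())
```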

TECHNICAL SKILLS:

Programming & Scripting Languages: R (packages: stats, zoo, Matrix, data.table, openssl), Python, SQL, C, C++, Java, JCL, COBOL, HTML, CSS, JSP, JavaScript, Scala, Apache Pig Latin.

Database: SQL, MySQL, TSQL, MS Access, Oracle, Hive, MongoDB, Redis, Cassandra, PostgreSQL

Statistical Software: SPSS, R, SAS

Algorithms Skills: Machine Learning, Neural Networks, Deep Learning, NLP, Bayesian Learning, Optimization, Prediction, Pattern Identification, Data / Text mining, Regression, Logistic Regression, Bayesian Belief, Clustering, Classification, Statistical modeling

Data Science/Data Analysis Tools & Techniques: Generalized Linear Models, Logistic Regressions, Boxplots, K-Means, Clustering, Neural networks, AI, Teradata, Tableau, KNIME, Azure ML, Alteryx, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum)

Development Tools: RStudio, Notepad++, Python, Jupyter Notebook, Spyder IDE

Python Libraries: NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, Seaborn, Statsmodels, Keras, TensorFlow, Theano, NLTK, Scrapy

Techniques: Machine learning, Regression, Clustering, Data mining

Machine Learning: Naïve Bayes, Decision trees, Regression models, Random Forests, Time-series, K-means, K-NN

Cloud Technologies: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups), Microsoft Azure, Rackspace, GCP

Operating Systems: Windows, Linux, Unix, macOS, Red Hat

PROFESSIONAL EXPERIENCE:

Sr. Data Scientist

Confidential, Cincinnati, OH

Responsibilities:

  • Gathered, analyzed, documented and translated application requirements into data models, supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Queried and aggregated data using SQL queries and other ETL methods from Amazon Redshift, Azure Cloud, GCP to get the sample dataset.
  • Identified patterns, data quality issues, and leveraged insights by communicating with BI team.
  • Automated manual processes with big data tools (Spark, Python, AWS).
  • Processed data into HDFS and analyzed it using MapReduce, Pig and Hive, producing summary results from Hadoop for downstream systems.
  • Created a pipeline to scrape newly added trials using Microsoft Azure Data Factory, performed data transformation and exploratory data analysis on data from different sources using Microsoft Azure Databricks, visualized the results in Power BI, and presented them to the stakeholders.
  • In the preprocessing phase, used Pandas to remove or replace missing data and applied feature engineering to eliminate unrelated features.
  • Balanced the dataset by over-sampling the minority label class and under-sampling the majority label class (see the sketch after this list).
  • In the data exploration stage, used correlation analysis and graphical techniques to gain insights into the claims data.
  • Applied machine learning techniques to tap into new markets and new customers, and presented recommendations to top management that increased the customer base by 5% and the customer portfolio by 9%.
  • Used KNIME to do customer segmentation and profiling using ML models to design an online marketing campaign that increased the customer base by 3%.
  • Built and trained multi-layered Neural Networks to implement Deep Learning using TensorFlow, Keras, KNIME and Azure ML Studio.
  • Performed hyper-parameter tuning by doing Distributed Cross Validation in Spark to speed up the computation process.
  • Exported trained models to Protobuf to be served by TensorFlow Serving and integrated them with the client's application.
  • Analyzed customer master data to identify prospective business, understand business needs, build client relationships and explore cross-selling opportunities for financial products; 60% of customers (up from 40%) availed themselves of more than 6 products.
  • Improved fraud detection performance by using Random Forest and Gradient Boosting for feature selection with Python Scikit-learn (sketched below).
  • Tested classification algorithms such as Logistic Regression, Gradient Boosting and Random Forest using Pandas and Scikit-learn and evaluated the performance.
  • Worked extensively with data governance team to maintain data models, Metadata and dictionaries.
  • Developed advanced models using multivariate regression, Logistic regression, Random forests, decision trees and clustering.
  • Applied predictive analysis and statistical modeling techniques using Python, R, Tableau and Spotfire to analyze customer behavior and offer customized products, reducing delinquency and default rates; default rates fell from 5% to 2%.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Implemented, tuned and tested the model on AWS EC2 with the best algorithm and parameters.
  • Set up a data preprocessing pipeline to guarantee consistency between the training data and incoming data.
  • Deployed the model on AWS Lambda and collaborated with the development team to build the business solutions.
  • Collected feedback after deployment and retrained the model to improve performance.
  • Discovered flaws in the methodology used to calculate weather peril zone relativities; designed and implemented a 3D algorithm based on k-means clustering and Monte Carlo methods.
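A minimal sketch of the class balancing and Random Forest feature selection described above; the data is synthetic (scikit-learn's make_classification) and the feature names are hypothetical, since the actual dataset is confidential:

```python
# Illustrative sketch: balance a skewed label with random over-/under-sampling,
# then rank features by Random Forest importance (synthetic data).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95],
                           random_state=42)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(10)])
df["label"] = y

# Under-sample the majority class, over-sample the minority class
majority = df[df.label == 0]
minority = df[df.label == 1]
n = len(majority) // 2
balanced = pd.concat([
    majority.sample(n, random_state=42),                # under-sampling
    minority.sample(n, replace=True, random_state=42),  # over-sampling
])

# Random Forest importances as a simple feature-selection signal
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(balanced.drop(columns="label"), balanced["label"])
ranking = pd.Series(rf.feature_importances_,
                    index=balanced.columns.drop("label"))
print(ranking.sort_values(ascending=False).head())
```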

Environment: Data Governance, SQL Server, Azure Cloud, GCP, Python, ETL, NLP, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Visio, HP ALM, Agile, Azure, MDM, SharePoint, Data Quality, Tableau, Spotfire, KNIME and Reference Data Management.

Sr. Data Scientist

Confidential, Urbana, Maryland

Responsibilities:

  • Analyzed customers' historical call data and other network metrics measured every two hours to find correlations between customer trouble calls and truck rolls to individual homes and business locations, using advanced statistical R packages and tools such as Time Series, GLM and NNET implemented on Azure ML Studio; the delivered results made it possible to save $60,000.00 daily on truck rolls.
  • Worked with the team to identify gaps in the data while working on node congestion, one of the recurring issues in the telecommunications industry; detected discrepancies and made recommendations for future data usage in analytics that improved data quality by 30%.
  • Led presentations of prepared dashboards to upper management in Tableau and PowerPoint.
  • Provided expert advice to the Sr. Director on the analytics results and implementation of recommendations.
  • Helped in building Hive, HBase tables in Hadoop to help improve data analytic process.
  • Performed NLP in KNIME (text mining and analysis, topic modeling, n-grams, and sentiment analysis) on survey data to understand customer reactions, built an attrition model, and made recommendations that helped improve customer relationships by 25% (an illustrative Python sketch follows this list).
  • Analytics skills and tools in big data technology: Hadoop ecosystem, HDFS, MapReduce, Hive, HiveQL, Pig, Sqoop, text analytics with NLP, NoSQL enterprise architecture, data modeling with R, SAS, Alteryx, SPSS, Excel, Minitab, Access, Oracle ERP, Cloudera, and Python.
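The NLP work above was done in KNIME; the following is a rough Python/NLTK equivalent of the n-gram and sentiment steps, on made-up survey text, just to show the shape of the analysis:

```python
# Rough NLTK equivalent of the KNIME n-gram and sentiment steps
# (made-up survey responses; VADER compound score is in [-1, 1]).
import nltk
from nltk import ngrams, word_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("punkt", quiet=True)
nltk.download("vader_lexicon", quiet=True)

responses = [
    "The service was slow but the technician was very helpful.",
    "Terrible experience, my issue is still not fixed.",
]

sia = SentimentIntensityAnalyzer()
for text in responses:
    tokens = word_tokenize(text.lower())
    bigrams = list(ngrams(tokens, 2))              # n-gram extraction (n=2)
    score = sia.polarity_scores(text)["compound"]  # sentiment score
    print(round(score, 3), bigrams[:3])
```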

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, Qlikview, MatplotLib, NLP, SpaCy, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), Map Reduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS, Azure ML Studio

Data Scientist

Confidential, Houston, TX

Responsibilities:

  • Worked as part of a team on a corpus of enterprise data to reduce redundancies, coordinated with different functional teams, delivered insights from clients' requirements, and built tools that helped the team work faster and more reliably.
  • Discovered data sources and web-scraped data from public webpages and company databases; cleansed and condensed the data to create features and define classes from the underlying data, making it model-ready and improving model performance, using Adobe Analytics, Tealeaf, SiteSpect and enterprise databases.
  • Designed and built statistical models and feature extraction systems. Used models to solve business problems related to company’s data pipeline and communicated these solutions to executive stakeholders.
  • Performed data integrity checks, data cleansing, exploratory analysis and feature engineering using Python libraries like Pandas, Matplotlib etc.
  • Worked on missing value imputation and outlier identification with statistical methodologies using Pandas and NumPy.
  • Conducted analysis of customer consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (see the RFM sketch after this list).
  • Developed personalized product recommendations with machine learning algorithms, including collaborative filtering and Gradient Boosting Tree, to better meet the needs of existing customers and acquire new ones, resulting in a 4% increase in the customer base.
  • Used RMSE score, confusion matrix, ROC, AUC, cross-validation and A/B testing to evaluate model performance in both simulated and real-world environments, with recall rates as high as 94%.
  • Implemented statistical analysis and testing including hypothesis testing, ANOVA, survival analysis, longitudinal analysis, experimental design, sample size determination and A/B testing to analyze customer behavior and offer customized products, reducing the delinquency and default rate from 6% to 3%.
  • Tackled highly imbalanced datasets using undersampling with ensemble methods, oversampling with SMOTE and cost sensitive algorithms with Python Scikit-learn.
  • Set up a data preprocessing pipeline to guarantee consistency between the training data and incoming data.
  • Optimized custom algorithms with stochastic gradient descent and fine-tuned their parameters with manual tuning and automated tuning such as Bayesian optimization.
  • Collected feedback after deployment and retrained models to improve the performance.
  • Conveyed stories through data to clients using reports and dashboards to help them know and visualize their business metrics better, improving customer satisfaction by 60%.
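A minimal sketch of the RFM segmentation mentioned above, on fabricated transaction data (the real work ran on enterprise customer databases):

```python
# Illustrative RFM (recency/frequency/monetary) + K-Means segmentation
# on fabricated transactions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(["2023-01-05", "2023-03-01", "2023-02-10",
                                  "2023-03-20", "2022-11-30"]),
    "amount": [50.0, 20.0, 200.0, 150.0, 35.0],
})
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

# Recency, frequency, monetary value per customer
rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Standardize, then cluster customers into segments
scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=2, n_init=10,
                        random_state=42).fit_predict(scaled)
print(rfm)
```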

Environment: Machine learning, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), NLP, NLP Core, Spacy, R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau

Data Analyst

Confidential

Responsibilities:

  • Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from SQL server database.
  • Gathered, analyzed, and translated business requirements, communicated with other departments to collect client business requirements and access available data.
  • Responsible for data cleansing, feature scaling and feature engineering using NumPy and Pandas in Python.
  • Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
  • Used information value, principal component analysis, and Chi-square feature selection techniques to identify the most significant features.
  • Applied resampling methods like the Synthetic Minority Over-sampling Technique (SMOTE) to balance the classes in large data sets (sketched below).
  • Designed and implemented a customized linear regression model to predict sales, utilizing diverse sources of data to predict demand, risk and price elasticity.
  • Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost and Gradient Boosting using Python Scikit-Learn, and evaluated performance on customer discount optimization across millions of customers.
  • Used F-Score, AUC/ROC, Confusion Matrix and RMSE to evaluate different model performance.
  • Performed data visualization and designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
  • Resolved overfitting issues using batch normalization and dropout.
  • Conducted in-depth analysis and predictive modelling to uncover hidden opportunities and communicated insights to the product, sales and marketing teams.
  • Built models using Python to predict the probability of attendance for various campaigns and events.
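A hedged sketch of the SMOTE resampling and the evaluation metrics listed above, using synthetic data and the imbalanced-learn library:

```python
# Illustrative SMOTE + evaluation sketch (synthetic data; SMOTE is applied
# to the training split only, so the test set stays untouched).
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

# SMOTE synthesizes new minority-class examples by interpolation
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)[:, 1]
print("F1:", f1_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, proba))
print(confusion_matrix(y_test, pred))
```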

Environment: Python, Tableau 6.0, MLlib, PL/SQL, HDFS, Teradata 12, HADOOP (HDFS), HIVE, AWS.

Data Analyst

Confidential

Responsibilities:

  • Performed visual and exploratory data analysis on both unstructured and structured data from the logs collected by Confidential Maternal/Fetal monitors to validate the accuracy of the models as a part of design and development team.
  • Discovered entities, facts and relationships using SQL in massive quantities of enterprise data to help Research Scientists and Product Designers provide actionable information to senior decision-makers.
  • Analyzed weekly and monthly reporting on staffing needs such as headcount, turnover and cost analysis, and prepared presentation slides for weekly meetings with executives.
  • Assisted in building reporting on inventory of goods for the products the warehouses built, and communicated with management and supervisors across different campuses and locations.
  • Assisted with maintaining electronic filing system for all reports, documentation and official client communication.

Environment: Informatica 9.1, Oracle 11g, SQL Developer, PL/SQL, Cognos, Splunk, TOAD, MS Access, MS Excel.
