Data Scientist Resume


SUMMARY

  • Artificial Intelligence visionary and innovator with a proven track record of taking businesses to the next level by building Data Science capabilities and Machine Learning solutions.
  • Proficient in utilizing advanced statistical models such as linear regression, LASSO, logistic regression, elastic net, ANOVA, generalized linear and non-linear models, Monte Carlo methods, factor analysis, cluster analysis, reinforcement learning, Principal Component Analysis, time series analysis and Bayesian inference.
  • Skilled in machine learning algorithms: support vector machines, decision trees, random forests, artificial neural networks, boosting, K-NN classification, K-means clustering and the Naive Bayes classifier.
  • Proficient in Statistical Modeling, Data Mining and Machine Learning algorithms for Data Science/Forecasting/Predictive Analytics/Reinforcement Learning, such as Linear and Logistic Regression, LDA, Item and Discriminant Analysis, Random Forest, K-Means, Artificial Neural Networks, Decision Trees, SVM, K-Nearest Neighbors, Bayesian methods and Hidden Markov Models.
  • Experience in Big Data technologies such as HDFS, YARN, MapReduce, Pig, Hive, Hue, Spark and Kafka.
  • Strong background in statistics, modeling and optimization, with knowledge of A/B testing.
  • Skilled in using statistical methods including exploratory data analysis, regression analysis, regularized linear models, time-series analysis, cluster analysis, goodness of fit, Monte Carlo simulation, sampling, cross-validation, ANOVA and A/B testing.
  • Skilled in implementing machine learning algorithms such as KNN, decision trees, random forests, SVM, boosting, bagging, stacking and K-means for classification, regression and clustering problems (a brief sketch follows this list).
  • Experience with platforms like H2O, AWS SageMaker, MLflow and Anaconda.
  • Proficient in Python 3.7/3.6 programming using various packages including Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Requests, Plotly, Threading, NLTK, rpy2, Gurobi, PyTorch, TensorFlow and Cookiecutter, for web scraping, NLP, optimization and plotting.
  • Experience in using various packages in R: caret, randomForest, e1071, rpart, nnet, glmnet, tree, party, arules, lars, gbm, kernlab, ROCR, ggplot2, shiny, RWeka, etc.
  • Experience in Computer vision and image processing algorithm design and implementation
  • Experienced in building real-time analytics with Kafka and Spark Streaming.
  • Experience with toolsets including Git, Maven, Jenkins and UNIX scripting.
  • Exposure to all major cloud platforms, namely AWS, Azure and Google Cloud Platform.
  • Experience in data visualization and building intelligent dashboards in Tableau, MS Power BI and Qlik.
  • Hands-on experience in performing statistical tests (ANOVA, hypothesis and A/B testing) and multivariate analysis.
  • Hands-on experience working with relational databases like MySQL and MS Access, and with NoSQL databases such as document, key-value and graph databases.
  • Experience working with AI and deep learning frameworks such as PyTorch and TensorFlow, building CNN and RNN architectures.
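
As a minimal illustration of the cross-validation and model-comparison workflow described above (the dataset and hyperparameters are synthetic/hypothetical):

```python
# Minimal sketch: comparing classifiers with k-fold cross-validation.
# The dataset and hyperparameters are synthetic/hypothetical.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf", C=1.0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```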

TECHNICAL SKILLS

Data Sources: MongoDB, Netezza Datalabs, MySQL, MS SQL Server, Amazon S3 buckets, Amazon Athena, AWS DynamoDB, Azure Blob Storage, Data Lake, HDFS, Oracle, MS Excel, Sharedrive, BigQuery.

Programming Languages: Python, R, SQL, SAS, MATLAB, Unix commands, Git, Bash commands, C, C++, JavaScript, C#, VB, Julia, Golang, OpenGL, HTML, XML, Web Forms, Reports, Web Servers.

Data Visualization: Tableau, R, R Shiny, Python, Weka, Microsoft Power BI, Plotly, D3.js, Chart.js

Data Exploration: Jupyter Notebook, Tableau, Azure ML, MS Power BI, Azure IoT Hub, HDInsight, Streaming Analytics, S3, Redshift

Cloud Services: Google Cloud Platform, BigQuery, AWS, EC2, Lambda Functions, EMR, Azure ML, AWS ML, MS Power BI, Azure IoT Hub, HDInsight, Streaming Analytics

Cloud Computing Platforms: Google VM Instances, G Suite, Amazon AWS EC2, Lambda, Elastic MapReduce.

Machine Learning Libraries: Scikit-learn, TensorFlow, Gensim, Spark ML, PySpark, OpenCV (including OpenCV 4).

Machine Learning Algorithms: MobileNet-SSD, YOLO, Classification, KNN, Regression, Random Forest, Clustering (K-Means), Neural Networks, SVM, Bayesian Algorithms, Multivariate Gaussian, Social Media Analytics, Reinforcement Learning, Sentiment Analysis, Market Basket Analysis, Bagging, Boosting and Stacking

Statistical Techniques: R-squared, Standard Error, SSE, MSE, Z-test, T-test, Null Hypothesis, P-values, Simpson's paradox.

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential

Responsibilities:

  • Working on the smart cities project at Confidential to identify the latest cutting-edge data science tools and technologies for building the IoT (Internet of Things) platform architecture.
  • Develop, train and test machine learning models utilizing SageMaker (a brief sketch follows this list).
  • Working with deep learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK and Keras to build deep learning models for object detection and facial recognition as part of computer vision.
  • Worked with the Azure Cloud Platform for AI/ML, including handwriting recognition and Sketch2Code integration into an ML platform where developers can convert sketches into HTML for front-end development. The Sketch2Code architecture is part of Azure Cognitive Services, whose Custom Vision and Computer Vision services are used for object detection and OCR, including handwriting recognition.
  • Working with AWS CodeCommit as the code repository, S3 buckets to store data, DynamoDB to store NoSQL data and the Glue crawler for the data catalog.
  • Prototyping the algorithms in software using high-level programming languages like Python.
  • Implementing the algorithms on real hardware such as CPUs/GPUs/FPGAs in low-level programming languages like C/C++.
  • Development, simulation, evaluation and implementation in software of computer vision and machine learning algorithms, using programming languages such as Python and C++, applied to a variety of applications.
  • Testing developed and deployed deep learning algorithms and providing ongoing support and enhancements for existing applications.
  • Optimizing the algorithm flow and its code to run fast on the target hardware
  • Design, develop, deploy and support interactive data visualizations
  • Identifying data needs, gaps and sources, verifying data quality, and transforming and cleaning data as needed.
  • Performing statistical analysis and building high-quality prediction and classification systems using data mining and machine learning techniques. Improved prediction and classification by using various machine learning modeling techniques such as SVM, Naive Bayes, Decision Trees, Gradient Boosting (GBM, XGBoost, AdaNet), Random Forest, Linear/Logistic Regression, K-Means and K-NN, along with deep learning applications using the TensorFlow and Keras libraries, with the objective of achieving the lowest test error.
  • Developing, analyzing and comparing algorithms (ML, AI) that could be used to solve a given problem and ranking them by their success probability
  • Designed, constructed and curated large sets of functional requirements as part of the RFP (Request for Proposal) for Machine Learning Modeling, Data Warehouse, Databases and Data Aggregation for the Internet of Things platform architecture.
  • Analyzed and assessed a number of RFI (Request for Information) system requirements for data storage methods, video storage and analytics, data analytics subscriptions, monitoring services, out-of-the-box applications, etc.
  • Designed and developed smart cities RFP use cases, important features and dependencies for multifunctional integrated streetlight pole (Smart lighting pole) and Smart Electric, Gas and Water meters.
  • Developing, training, tuning and validating algorithms based on various datasets (internal/external data sources).
  • Apply objective, analytical and orderly thinking to the analysis of complex problems
  • Communicating findings and suggesting innovative solutions.
  • Visualizing/Presenting data for stakeholders using different tools and dashboard
  • Defining the preprocessing or feature engineering to be done on a given dataset
  • Developing PoCs in the lab, then transferring the technology to the production environment.
  • Create reports, analytics, models, and presentations to support business strategy and tactics.
  • Design requirements for machine learning techniques aimed at solving specific problems.
  • Working with JavaScript, Vue.js, Vuetify.js, GrapesJS and Node-RED to create interactive charts for the IoT platform.
  • Providing technical guidance on machine learning and data processing/analytics.
  • Establish technical foundations, best practices and next-generation technologies.
  • Communicate project plans, design and scope to a wide audience including both technical and non-technical people.
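
A minimal sketch of the SageMaker training flow referenced above, assuming the SageMaker Python SDK (v2); the IAM role ARN, S3 paths and train.py entry script are hypothetical placeholders:

```python
# Minimal sketch: train and deploy a scikit-learn model on SageMaker.
# The IAM role, S3 paths and entry script are hypothetical placeholders.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

estimator = SKLearn(
    entry_point="train.py",         # hypothetical training script
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    framework_version="0.23-1",
    sagemaker_session=session,
)

# Launch a managed training job against data staged in S3.
estimator.fit({"train": "s3://example-bucket/smart-cities/train/"})

# Deploy the trained model to a real-time endpoint for testing.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.t2.medium")
```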

Environment: MS VS Code, Python (Scikit-learn/SciPy/NumPy/Pandas/TensorFlow), R, MySQL, Git, AWS CodeCommit, Glue, YOLO, OpenCV, DynamoDB, SageMaker, Lambda, Anaconda, Snowflake, CRISP-DM, MS Computer Vision, Docker containerization, Confluence, Jira, Kanban, Jupyter Notebook, Tableau, JavaScript, Spark, Linux.

Data Scientist

Confidential

Responsibilities:

  • Building deep learning models to maximize cash collections and optimize workforce efficiency.
  • Acquiring transactional claims-level data and performing data mining to surface meaningful trends for machine learning modeling.
  • Analyzing large volumes of structured and unstructured data using Python, R, Tableau and ML libraries such as Scikit-learn, with an emphasis on evaluating data for anomaly detection, trend analysis and data mining to support various business objectives.
  • Writing complex SQL scripts for data acquisition; flattening long data sets and performing feature engineering on claims data.
  • Performing NLP analysis to determine which N topics are most important for the disposition.
  • Categorizing each note into one of the N topics above using a document-matching NLP methodology.
  • Developing and implementing various complex models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA, regularization and imputation.
  • Performing proper exploratory data analysis, using the K-Means clustering technique to identify outliers and dealing with unbalanced data by bootstrap resampling.
  • Implemented a cost-effective and cutting-edge Data Science/Big Data lab for easy prototyping using advanced tools such as Jupyter Notebook, Docker images, Python, R, NoSQL, RStudio and GPUs/CPUs.
  • Developing optimized and scalable machine learning algorithms capable of performing predictive modeling using PyTorch, Spark, SparkML, Scikit-learn and other packages.
  • Performing feature engineering, including one-hot encoding, to produce effective input features for predictive models.
  • Utilizing principal component analysis and factor analysis for dimensionality reduction of the data. Worked on ensemble methods such as Bagging (Random Forest), Boosting (AdaBoost) and Stacking. Feature engineering: Pearson correlation, F-score; dimensionality reduction: PCA, LDA.
  • Identified, analyzed and interpreted trends or patterns in complex data sets and provided business insights using LDA topic modeling (a brief sketch follows this list).
  • Responsible for analyzing patterns across multiple years. Feature-engineered various metrics to track performance and make necessary improvements in statistical models. Tools/techniques leveraged include Python, R, TensorFlow, LDA, LSA and APIs.
  • Developed a prototype pipeline in Jupyter Notebook to pre-process data and to make user recommendations based on an ensemble of machine learning models.
  • Designed and developed various interactive Intelligent dashboards in Tableau for decision making and various insights supporting business objectives.
  • Interpreting business problems from a variety of business units and projects, and designing technical solutions that may require work from various sub-disciplines of data science.
  • Providing technical leadership on machine learning and data processing/analytics.
  • Involved in Jira user stories creation, Kanban Process, Sprint Planning, Daily Standups, Sprint Demos, Story Grooming, Release Planning, and PO acceptance.
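
A minimal sketch of the LDA topic-modeling step referenced above, using scikit-learn; the sample notes and the topic count are illustrative assumptions:

```python
# Minimal sketch: LDA topic modeling over free-text claim notes.
# The documents and topic count (N = 2) are illustrative assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "patient called about claim denial and appeal status",
    "payment posted after coordination of benefits review",
    "claim resubmitted with corrected billing codes",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(notes)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-document topic distribution

# Show the top words for each learned topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {idx}: {', '.join(top)}")

# Assign each note to its most probable topic (the categorization step).
print(doc_topics.argmax(axis=1))
```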

Environment: Anaconda, CRISP-DM, GPU/CPU servers, Docker containerization, Confluence, BitBucket, Jira, Kanban, Jupyter Labs, Tableau, Google Cloud Platform, BigQuery, Netezza Datalabs, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-learn/SciPy/NumPy/Pandas/PyTorch), R, MySQL, Microsoft SQL Server.

Data Scientist

Confidential

Responsibilities:

  • Completed multiple projects analyzing diverse datasets, utilizing skills in Python (NumPy, SciPy, plotting with Matplotlib, control flow, Pandas), SQL and R, with an emphasis on data modeling and on evaluating data to deliver results and communicate findings.
  • Analyzed large volumes of structured and unstructured data, utilizing skills in Python, R, SAS, SQL, the Hadoop ecosystem (HDFS, YARN, Spark, Hive, Kafka) and ML libraries such as Scikit-learn, with an emphasis on evaluating data for anomaly detection, trend analysis and data mining to support various business objectives.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce. Used Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Performed proper exploratory data analysis and used the K-Means clustering technique to identify outliers. Dealt with unbalanced data by bootstrap resampling.
  • Implemented a cost-effective and cutting-edge Data Science/Big Data lab for easy prototyping using advanced tools such as Apache Spark, Apache Drill, Hadoop/Hive/Impala, Python, R, NoSQL, Apache Zeppelin, Jupyter and RStudio.
  • Used cloud solutions (AWS VPC) to scale these production-grade prototypes to enable advanced analytics capabilities.
  • Set up a Hadoop/Spark cluster in the cloud with a remote Jupyter Notebook interface and used it for ETL and modeling.
  • Identified, analyzed and interpreted trends or patterns in complex data sets and provided business insights using R and an R Shiny app.
  • Responsible for analyzing patterns across multiple years. Feature-engineered various metrics to track performance and make necessary improvements in statistical models. Tools/techniques leveraged include Python, R, TensorFlow, Google Cloud and APIs.
  • Developed a MapReduce framework to extract and transform the data sets stored in HDFS. Involved in efficiently collecting and aggregating large amounts of data into the Hadoop cluster using Flume.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Developed a prototype pipeline in Spark to pre-process data and make user recommendations based on an ensemble of machine learning models (a brief PySpark sketch follows this list).
  • Built BI reports to assist VPs in making critical business decisions.
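
A minimal sketch of a Spark recommendation pipeline in the spirit of the bullet above, using PySpark's ALS collaborative filtering on hypothetical interaction data (the actual ensemble approach is not reproduced here):

```python
# Minimal sketch: collaborative-filtering recommendations with PySpark ALS.
# The ratings data and hyperparameters are illustrative assumptions.
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("recs-sketch").getOrCreate()

ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0), (1, 12, 3.0), (2, 11, 2.0)],
    ["userId", "itemId", "rating"],
)

als = ALS(
    userCol="userId",
    itemCol="itemId",
    ratingCol="rating",
    rank=10,
    maxIter=5,
    coldStartStrategy="drop",  # drop predictions for unseen users/items
)
model = als.fit(ratings)

# Top-3 item recommendations per user.
model.recommendForAllUsers(3).show(truncate=False)

spark.stop()
```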

Environment: Azure Cloud Platform, Azure Data Lake, Data Factory, SQL Database, Anaconda, Machine Learning, Tableau, Databricks, Azure MLflow, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-learn/SciPy/NumPy/Pandas/PyTorch), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, AWS S3.

Data Scientist

Confidential

Responsibilities:

  • Conducted qualitative and quantitative research to gather data from data mart
  • Responsible for data identification, collection, exploration and cleaning for modeling; participated in model development.
  • Visualize, interpret, report findings and develop strategic uses of data.
  • Understand transaction data and develop analytics insights using statistical models and machine learning.
  • Involved in gathering requirements while uncovering and defining multiple dimensions. Extracted data from one or more source files and Databases.
  • Collected a database covering all aspects of item sales. Cleaned, filtered and transformed data to the specified format.
  • Designed various Intelligent reports using various reporting tools.
  • Cleaned data using R, then visualized the data and derived statistical modeling plots.
  • Performed data visualization via ggplot2 in R and Matplotlib in Python.
  • Worked in Amazon Web Services cloud computing environment.
  • Responsible for providing reports, analysis and insightful recommendations to business leaders on key performance metrics pertaining to sales & marketing.
  • Gathered all the data that is required from multiple data sources and creating datasets that will be used in analysis.
  • Used R to assess product performance via classification, tree map and regression models, along with visualizing data for interactive understanding and decision making.
  • Created intelligent dashboards and visualizations on a regular basis using ggplot2 and Tableau (TabPy, Rserve).
  • Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
  • Created dynamic linear models to perform trend analysis on customer transactional data in R.
  • Conducted exploratory and descriptive data analysis of large data sets.
  • Performed exploratory data analysis and data visualization for business intelligence using R and Tableau.
  • Applied concepts of probability, distributions and statistical inference to the given dataset to unearth interesting findings through comparisons, T-tests, F-tests, R-squared and P-values (a brief sketch follows this list).
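
A minimal sketch of the hypothesis-testing step referenced in the last bullet, assuming two hypothetical samples of transaction amounts:

```python
# Minimal sketch: two-sample t-test on hypothetical transaction amounts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=100.0, scale=15.0, size=200)  # e.g., control segment
group_b = rng.normal(loc=105.0, scale=15.0, size=200)  # e.g., treated segment

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Reject the null hypothesis of equal means at the 5% level if p < 0.05.
if p_value < 0.05:
    print("Means differ significantly.")
```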

Environment: Machine learning, AWS, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-Learn/Scipy/NumPy/Pandas/PyTorch), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
