Data Scientist/Machine Learning Resume
San Bruno, CA
SUMMARY:
- 10+ years of overall experience: 3+ years providing business solutions across Data Analysis, Data Science, and Machine Learning platforms, and 7 years in C++ programming and testing for telecom
- Experience implementing Logistic Regression; skilled in Random Forests, Decision Trees, Linear Regression, Naive Bayes, SVM, Clustering, K-NN, neural networks, and Principal Component Analysis
- Worked with Data Science and Machine Learning libraries such as scikit-learn, OpenCV, NumPy, SciPy, Matplotlib, and pandas, along with SQL and Scala
- Hands-on with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction
- Good knowledge of PySpark
- Proficient in statistical programming languages like R and Python 2.x/3.x.
- Involved in all phases of the project life cycle, including data acquisition (sampling methods: SRS/stratified/cluster/systematic/multistage), power analysis, A/B testing, hypothesis testing, EDA (univariate and multivariate analysis), data cleaning, data imputation (outlier detection via chi-square tests, residual analysis, PCA, and multivariate outlier detection), data transformation, feature scaling, feature engineering, statistical modeling both linear and nonlinear (logistic, linear, Naïve Bayes, decision trees, random forest, neural networks, SVM, clustering, KNN), dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis, testing and validation using ROC plots, K-fold cross-validation and statistical significance testing, and data visualization
- Documented methodology, data reports, and model results, and communicated them to the project manager to share knowledge
- Used Natural Language Processing (NLP) for response modeling and sentiment analysis for products
- Supported clients by developing Machine Learning algorithms in Python, including cluster analysis
- Strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies
- Used Microsoft Azure services for data gathering, analysis, and model development
- Used a CUDA-capable NVIDIA GPU (compute capability 3.0) to accelerate neural network training
- Able to design a data warehouse/mart or database model on platforms such as Oracle, SQL Server, and AWS Redshift, with full redundancy and normalization
- Worked with deep learning frameworks such as TensorFlow, with working knowledge of Caffe
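The modeling workflow summarized above (feature scaling, PCA for dimensionality reduction, K-fold cross-validation) can be sketched with scikit-learn; the dataset and hyperparameters below are illustrative, not from an actual project:

```python
# Minimal sketch of the validation workflow described above:
# scaling -> PCA -> logistic regression, scored via 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(
    StandardScaler(),                   # feature scaling
    PCA(n_components=10),               # dimensionality reduction
    LogisticRegression(max_iter=1000),  # linear classifier
)

scores = cross_val_score(model, X, y, cv=5)  # K-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Building the steps into a single pipeline ensures the scaler and PCA are fit only on each training fold, avoiding leakage into the validation fold.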
TECHNICAL SKILLS:
Data Analysis/Statistical Analysis: Hypothesis Test, ANOVA, Survival Analysis, Longitudinal Analysis, Experimental Design and Sample Determination, A/B Test, Z-test, T-test.
Machine Learning: Ensemble methods (Random Forest, Gradient Boosting, XGBoost, AdaBoost, etc.), SVM, KNN, Naive Bayes, Logistic/Linear Regression, Decision Trees (CART/Information Gain), Fuzzy/K-means/K-modes clustering, Hierarchical clustering, TensorFlow, Caffe
Visualization Tools: Tableau, R shiny, seaborn, matplotlib
Programming Languages: Python, R, XML, SQL, C, C++
Libraries: pandas, NumPy, Numba, Keras, scikit-learn, OpenCV, SciPy, Caffe, NLTK (NLP), Google ML, ggplot2
Configuration Management Tools: Git, ClearCase, VSS
Bug Tracking Tools: Jira
PROFESSIONAL EXPERIENCE:
Confidential, San Bruno, CA
Data Scientist/Machine Learning
Responsibilities:
- The project objective was to develop an algorithm that accurately predicts demand across multiple product classes based on historical sales data for multiple products. A further aim was to improve profit by maintaining the right stock of high-demand products while avoiding the cost of carrying unnecessary inventory.
- Gathered business requirements from the client, and formulated the approach and design methodology to match those requirements
- Extracted data by developing a pipeline using Amazon Redshift to retrieve the data from S3
- Performed data cleaning, feature scaling, and feature engineering using R
- Replaced missing data and performed thorough EDA to understand the time-series data
- Worked with data from different sources and platforms, including XML files and Oracle and SQL databases
- Worked in Data Science using Python 2.x/3.x on data transformation and validation techniques, including dimensionality reduction via Principal Component Analysis (PCA) and Factor Analysis, A/B testing, and testing and validation using ROC plots, K-fold cross-validation, and statistical significance testing
- Checked for the existence of trend and seasonality in the data
- Used Python 2.x/3.x and R to develop forecasting models such as ARIMA, hybrid models, and deep neural networks (DNNs) that support decision making, using Keras, TensorFlow, and scikit-learn on a CUDA-capable NVIDIA GPU
- Iteratively refined the models until the best accuracy was achieved
- Generated visualizations using Tableau and R-Shiny to present the findings
- Also worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
Environment: Python 2.x/3.x, R, Linux, Spark, TensorFlow, Tableau, SQL Server 2012, Microsoft Excel, MATLAB, SQL, scikit-learn, pandas, AWS (S3/Redshift), XML
Confidential, San Bruno, California
Data Scientist/NLP Developer
Responsibilities:
- Developed predictive models on large-scale datasets to address various business problems by leveraging advanced statistical modeling, machine learning, and deep learning.
- Identified the key drivers of the revenue drop and predicted which customers were moving from the high-revenue bucket to the low-revenue bucket using predictive models.
- Extracted meaning from large volumes of data to improve decision making and provide business intelligence through data-driven solutions.
- Developed a pipeline using Redshift to retrieve data from S3, and used SQL to retrieve data from an Oracle database
- Worked closely with other analysts and data engineers to develop data infrastructure (data pipelines, reports, dashboards, etc.) and tools such as Azure services to make analytics more effective.
- Gathered data in different formats, including XML and SQL, from different platforms.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python 2.x/3.x.
- Replaced missing data and performed thorough EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
- Used Python 2.x/3.x for data transformation and validation techniques, including dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis
- Encoded text documents as feature vectors and applied algorithms to classify the text by polarity; also verified incremental learning for training and used topic modelling to classify documents into categories.
- Used Python 2.x/3.x and R to develop machine learning algorithms such as Decision Trees, linear/logistic regression, multivariate regression, NLP (Natural Language Processing), Naive Bayes, Random Forests, Gradient Boosting, XGBoost, K-means, and KNN, in both supervised and unsupervised settings, using Keras, TensorFlow, and scikit-learn to support decision making.
- Performed model validation on test and validation sets via K-fold cross-validation and statistical significance testing.
- Performed metric evaluation for regression (RMSE, R², MSE, etc.) and classification (accuracy, precision, recall, concordance, discordance, etc.), with threshold calculations using ROC plots.
- Used predictive analytics and machine learning algorithms to forecast key metrics, presented in dashboards designed in Tableau
- Provided data and analytical support for the company’s highest-priority initiatives.
- Generated visualizations using Tableau to present the findings.
- Participated in production meetings for managers and senior leaders, as well as subject-specific meetings, to create use cases from complex data for consumption by senior leaders.
Environment: Python 3.6, R, scikit-learn, MySQL, SQL, NoSQL, Amazon Redshift, Random Forest, XGBoost, Neural Nets, Logistic Regression, etc.
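The text-classification step described above (documents encoded as feature vectors, classified by polarity) can be sketched with scikit-learn; the tiny corpus below is made up for the example:

```python
# Illustrative sketch: TF-IDF feature vectors fed to a multinomial
# Naive Bayes classifier for sentiment polarity. Toy training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "great product, works perfectly",
    "excellent quality and fast shipping",
    "terrible experience, broke after a day",
    "awful quality, would not recommend",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)

pred = clf.predict(["great quality product"])[0]
print(pred)  # -> positive
```

The same pipeline shape extends to topic classification by swapping in multi-class labels, and MultinomialNB's `partial_fit` supports the incremental-learning variant mentioned above.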
Confidential
Data Analyst/Data Scientist
Responsibilities:
- Acquire data from primary or secondary data sources and maintain databases/data systems.
- Prepared new client data for entry into the new platform.
- Loaded data by converting a CSV file into the corresponding database tables.
- Work with management team to create a prioritized list of needs for each business segment.
- Generated summary reports identifying the key reasons for sales improvement across various regions, helping management make critical decisions; updated and presented the reports to customers.
- Compared sales and revenue impacts with the previous month's reports, supporting the right decisions to improve the business.
- Gathered sales data for different commodities and regions from the database, and prepared the data for analysis based on requirements.
- Ran diagnostic survey tool to measure and predict team performance.
- Extracted, compiled and analyzed data using Excel and Adobe to build reports and provide recommendations to clients to improve team performance.
- Generated ongoing reports of each active account as they are being consulted.
- Involved in client-facing activities where reports were presented to upper-management and to each team.
- Identify and address data quality problems by eliminating duplicates and standardizing data sets.
- Locate and define new process improvement opportunities.
- Used advanced Excel functions to generate spreadsheets and pivot tables.
- Performed daily data queries and prepared reports on daily, weekly, monthly, and quarterly basis.
- Advise client on system usage.
- Execute customized self-service client dashboards.
- Data cleaning and Imputing missing values based on the requirement.
- Grouped the data based on requirements and performed summary statistics.
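The CSV-to-database loading and grouped summary statistics described above can be sketched with pandas and SQLite; the sales CSV and table name below are made up for illustration:

```python
# Minimal sketch: load a CSV into a database table, then compute
# grouped summary statistics with SQL. In-memory SQLite, toy data.
import io
import sqlite3
import pandas as pd

csv_data = io.StringIO(
    "region,commodity,revenue\n"
    "West,Coffee,1200\n"
    "East,Tea,800\n"
    "West,Tea,950\n"
)

df = pd.read_csv(csv_data)

conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False, if_exists="replace")

# Summary statistics grouped by region, as in the reporting step
summary = pd.read_sql(
    "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region", conn
)
print(summary)
```

With a real database, the same `to_sql`/`read_sql` calls work against any SQLAlchemy-compatible connection instead of the in-memory SQLite used here.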
Confidential
Developer/Test Lead
Responsibilities:
- Expertise in networking protocols such as Ethernet, IP, TCP/UDP, SNMP, FTP, Telnet, HTTP, and RIP, and bus protocols such as ARINC-429, RS-422, AHB, and SPI. Experience working with DO-254 and DO-178B standards for avionics.
- Led a test team of 5 members; planned, monitored, and ensured the quality of the team's tasks
- Participated in R&D for telecom network element design involving Ethernet protocols. Involved in integration testing of a switch divided into modules.
- Involved in network element design around Ethernet protocols, including the application and utilization of each protocol at its specific layer of the TCP/IP stack, coding in C++ in line with change requirements.
- Led test activities, planned resources, and validated team progress against the test plan and deadlines.
- Installed and set up the environment (Plug-in, Element Manager & Network Manager).
- Automated regression testing using shell scripting.