Data Scientist/Machine Learning Resume
San Bruno, CA
SUMMARY:
- 10+ years of overall experience: 3+ years providing business solutions across Data Analysis, Data Science, and Machine Learning platforms, and 7 years in C++ programming and testing for telecom
- Experience implementing Logistic Regression; skilled in Random Forests, Decision Trees, Linear Regression, Naive Bayes, SVM, Clustering, K-NN, neural networks, and Principal Component Analysis
- Worked with Data Science and Machine Learning libraries such as scikit-learn, OpenCV, NumPy, SciPy, Matplotlib, and pandas, along with SQL and Scala
- Hands-on with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction
- Good knowledge of PySpark
- Proficient in statistical programming languages like R and Python 2.x/3.x.
- Involved in all phases of the project life cycle, including data acquisition (sampling methods: SRS/stratified/cluster/systematic/multistage), power analysis, A/B testing, hypothesis testing, EDA (univariate and multivariate analysis), data cleaning, data imputation (outlier detection via chi-square tests, residual analysis, PCA, and multivariate outlier detection), data transformation, feature scaling, feature engineering, statistical modeling both linear and nonlinear (logistic, linear, Naïve Bayes, decision trees, random forest, neural networks, SVM, clustering, KNN), dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis, testing and validation using ROC plots, K-fold cross-validation and statistical significance testing, and data visualization
- Documented methodology, data reports, and model results, and communicated them to the project manager to share knowledge
- Used Natural Language Processing (NLP) for response modeling and sentiment analysis for products
- Supported clients by developing Machine Learning algorithms in Python, including cluster analysis
- Strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies
- Used Microsoft Azure services for data gathering, analysis, and model development
- Used a CUDA-capable NVIDIA GPU (compute capability 3.0) to accelerate neural network training
- Able to design a data warehouse/mart or database model on platforms such as Oracle, SQL Server, and AWS Redshift, with full redundancy and normalization
- Worked with deep learning frameworks such as TensorFlow, with working knowledge of Caffe
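The modeling workflow summarized above (feature scaling, PCA for dimensionality reduction, K-fold cross-validation) can be sketched with scikit-learn; the dataset and hyperparameters below are illustrative, not from an actual project:

```python
# Minimal sketch of the validation workflow described above:
# scaling -> PCA -> logistic regression, scored via 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(
    StandardScaler(),                   # feature scaling
    PCA(n_components=10),               # dimensionality reduction
    LogisticRegression(max_iter=1000),  # linear classifier
)

scores = cross_val_score(model, X, y, cv=5)  # K-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Building the steps into a single pipeline ensures the scaler and PCA are fit only on each training fold, avoiding leakage into the validation fold.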
TECHNICAL SKILLS:
Data Analysis/Statistical Analysis: Hypothesis Test, ANOVA, Survival Analysis, Longitudinal Analysis, Experimental Design and Sample Determination, A/B Test, Z-test, T-test.
Machine Learning: Ensemble methods (Random Forest, Gradient Boosting, XGBoost, AdaBoost, etc.), SVM, KNN, Naive Bayes, Logistic/Linear Regression, Decision Trees (CART/Information Gain), Fuzzy/K-means/K-modes clustering, Hierarchical clustering, TensorFlow, Caffe
Visualization Tools: Tableau, R shiny, seaborn, matplotlib
Programming Languages: Python, R, XML, SQL, C, C++
Libraries: pandas, NumPy, Numba, Keras, scikit-learn, OpenCV, SciPy, Caffe, NLTK (NLP), Google ML, ggplot2
Configuration Management Tools: Git, ClearCase, VSS
Bug Tracking Tools: Jira
PROFESSIONAL EXPERIENCE:
Confidential, San Bruno, CA
Data Scientist/Machine Learning
Responsibilities:
- The project objective was to develop an algorithm that accurately predicts demand across multiple product classes based on historical sales data for multiple products. A further aim was to improve profit by maintaining the right stock of high-demand products while avoiding the cost of carrying unnecessary inventory.
- Gathered business requirements from the client, and formulated the approach and design methodology to match those requirements
- Extracted data by developing a pipeline using Amazon Redshift to retrieve the data from S3
- Performed data cleaning, feature scaling, and feature engineering using R
- Replaced missing data and performed thorough EDA to understand the time-series data
- Worked with data from different sources and platforms, including XML files and Oracle and SQL databases
- Worked in Data Science using Python 2.x/3.x on data transformation and validation techniques, including dimensionality reduction via Principal Component Analysis (PCA) and Factor Analysis, A/B testing, and testing and validation using ROC plots, K-fold cross-validation, and statistical significance testing
- Checked for the existence of trend and seasonality in the data
- Used Python 2.x/3.x and R to develop forecasting models such as ARIMA, hybrid models, and deep neural networks (DNNs) that support decision making, using Keras, TensorFlow, and scikit-learn on a CUDA-capable NVIDIA GPU
- Iteratively refined the models until the best accuracy was achieved
- Generated visualizations using Tableau and R-Shiny to present the findings
- Also worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
Environment: Python 2.x/3.x, R, Linux, Spark, TensorFlow, Tableau, SQL Server 2012, Microsoft Excel, MATLAB, SQL, scikit-learn, pandas, AWS (S3/Redshift), XML
Confidential, San Bruno, California
Data Scientist/NLP Developer
Responsibilities:
- Developed predictive models on large-scale datasets to address various business problems by leveraging advanced statistical modeling, machine learning, and deep learning.
- Identified the key drivers of the revenue drop and predicted which customers were moving from the high-revenue bucket to the low-revenue bucket using predictive models.
- Extracted meaning from large volumes of data to improve decision making and provide business intelligence through data-driven solutions.
- Developed a pipeline using Redshift to retrieve data from S3, and used SQL to retrieve data from an Oracle database
- Worked closely with other analysts and data engineers to develop data infrastructure (data pipelines, reports, dashboards, etc.) and tools such as Azure services to make analytics more effective.
- Gathered data in different formats, including XML and SQL, from different platforms.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python 2.x/3.x.
- Replaced missing data and performed thorough EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
- Used Python 2.x/3.x for data transformation and validation techniques, including dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis
- Encoded text documents as feature vectors and applied algorithms to classify the text by polarity; also verified incremental learning for training and used topic modelling to classify documents into categories.
- Used Python 2.x/3.x and R to develop machine learning algorithms such as Decision Trees, linear/logistic regression, multivariate regression, NLP (Natural Language Processing), Naive Bayes, Random Forests, Gradient Boosting, XGBoost, K-means, and KNN, in both supervised and unsupervised settings, using Keras, TensorFlow, and scikit-learn to support decision making.
- Performed model validation on test and validation sets via K-fold cross-validation and statistical significance testing.
- Performed metric evaluation for regression (RMSE, R², MSE, etc.) and classification (accuracy, precision, recall, concordance, discordance, etc.), with threshold calculations using ROC plots.
- Used predictive analytics and machine learning algorithms to forecast key metrics, presented in dashboards designed in Tableau
- Provided data and analytical support for the company’s highest-priority initiatives.
- Generated visualizations using Tableau to present the findings.
- Participated in production meetings for managers and senior leaders, as well as subject-specific meetings, to create use cases from complex data for consumption by senior leaders.
Environment: Python 3.6, R, scikit-learn, MySQL, SQL, NoSQL, Amazon Redshift, Random Forest, XGBoost, Neural Nets, Logistic Regression, etc.
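The text-classification step described above (documents encoded as feature vectors, classified by polarity) can be sketched with scikit-learn; the tiny corpus below is made up for the example:

```python
# Illustrative sketch: TF-IDF feature vectors fed to a multinomial
# Naive Bayes classifier for sentiment polarity. Toy training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "great product, works perfectly",
    "excellent quality and fast shipping",
    "terrible experience, broke after a day",
    "awful quality, would not recommend",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)

pred = clf.predict(["great quality product"])[0]
print(pred)  # -> positive
```

The same pipeline shape extends to topic classification by swapping in multi-class labels, and MultinomialNB's `partial_fit` supports the incremental-learning variant mentioned above.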
Confidential
Data Analyst/Data Scientist
Responsibilities:
- Acquire data from primary or secondary data sources and maintain databases/data systems.
- Prepared new client data for entry into the new platform.
- Loaded data by converting a CSV file into the corresponding database tables.
- Work with management team to create a prioritized list of needs for each business segment.
- Generated summary reports identifying the key reasons for sales improvement across various regions, helping management make critical decisions; updated and presented the reports to customers.
- Compared sales and revenue impacts with the previous month's reports, supporting the right decisions to improve the business.
- Gathered sales data for different commodities and regions from the database, and prepared the data for analysis based on requirements.
- Ran diagnostic survey tool to measure and predict team performance.
- Extracted, compiled and analyzed data using Excel and Adobe to build reports and provide recommendations to clients to improve team performance.
- Generated ongoing reports of each active account as they are being consulted.
- Involved in client-facing activities where reports were presented to upper-management and to each team.
- Identify and address data quality problems by eliminating duplicates and standardizing data sets.
- Locate and define new process improvement opportunities.
- Used advanced Excel functions to generate spreadsheets and pivot tables.
- Performed daily data queries and prepared reports on daily, weekly, monthly, and quarterly basis.
- Advise client on system usage.
- Execute customized self-service client dashboards.
- Data cleaning and Imputing missing values based on the requirement.
- Grouped the data based on requirements and performed summary statistics.
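The CSV-to-database loading and grouped summary statistics described above can be sketched with pandas and SQLite; the sales CSV and table name below are made up for illustration:

```python
# Minimal sketch: load a CSV into a database table, then compute
# grouped summary statistics with SQL. In-memory SQLite, toy data.
import io
import sqlite3
import pandas as pd

csv_data = io.StringIO(
    "region,commodity,revenue\n"
    "West,Coffee,1200\n"
    "East,Tea,800\n"
    "West,Tea,950\n"
)

df = pd.read_csv(csv_data)

conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False, if_exists="replace")

# Summary statistics grouped by region, as in the reporting step
summary = pd.read_sql(
    "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region", conn
)
print(summary)
```

With a real database, the same `to_sql`/`read_sql` calls work against any SQLAlchemy-compatible connection instead of the in-memory SQLite used here.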
Confidential
Developer/Test Lead
Responsibilities:
- Expertise in networking protocols such as Ethernet, IP, TCP/UDP, SNMP, FTP, Telnet, HTTP, and RIP, and bus protocols such as ARINC-429, RS-422, AHB, and SPI. Experience working with DO-254 and DO-178B standards for avionics.
- Led a test team of 5 members; planned, monitored, and ensured the quality of the team's tasks
- Participated in R&D for telecom network element design involving Ethernet protocols. Involved in integration testing of a switch divided into modules.
- Involved in network element design around Ethernet protocols, including the application and utilization of each protocol at its specific layer of the TCP/IP stack, coding in C++ in line with change requirements.
- Led test activities, planned resources, and validated team progress against the test plan and deadlines.
- Installed and set up the environment (Plug-in, Element Manager & Network Manager).
- Automated regression testing using shell scripting.