Senior Data Scientist Resume
Albany, NY
SUMMARY:
- 10+ years of experience in data modeling and analytics, creating SAS programs to produce ARIMA forecasts and predict stochastic endogenous variables
- 5+ years of experience working as a Data Scientist, applying Machine Learning and statistical modeling techniques to solve business problems
- Experience in building big data applications and products using open source tools such as Hadoop, Hive, Apache Spark, MapReduce, Python and R
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs
- Manipulated, merged and restructured datasets from big data using Hadoop, SAS, Python, R and SQL, and built predictive models with machine learning
- Experience in data mining, including predictive behavior analysis, CHAID, CART, Optimization and Customer Segmentation analysis using SAS and SQL
- Outstanding communication skills, with the ability to handle multiple projects within a given timeframe and work cooperatively in a team
TECHNICAL SKILLS:
- Machine Learning, Hadoop / Big Data, Java programming
- Python, C# programming, SQL programming and Oracle
- Statistical Modeling, Regression Modeling, Data Mining
- Predictive Modeling, Time Series Forecasting, R programming
- SAS programming, Multivariate Statistics, MapReduce
- Excel, Word, PowerPoint, Access, Visio
- Tableau, Business Intelligence, Knowledge in HTML and CSS
- Data Visualization, Minitab, Applied Mathematics
WORK EXPERIENCE:
Senior Data Scientist
Confidential
Responsibilities:
- Applied Python and machine learning algorithms such as Logistic Regression, KNN, Random Forest, K-Means and SVM to develop insight into customers’ purchasing behavior and help make better decisions and predictions
- Worked in large-scale data environments such as Hadoop and MapReduce, with working knowledge of Hadoop clusters, nodes and the Hadoop Distributed File System (HDFS)
- Subset, sorted, reshaped, merged, sliced and edited collected data using the NumPy and Pandas modules of Python
- Worked on Natural Language Processing (NLP) with the NLTK module of Python to develop an application for automated customer response
- Built hypotheses and executed ML algorithms using Spark
- Developed Java application code contained in JAR files and deployed it to SAS to enable SAS to connect to Hadoop
- Imported data into R for data exploration and data cleaning for developing predictive models as per requirements
Data Analyst / Graduate Student
Confidential, Albany, NY
Responsibilities:
- Applied logistic regression in Python and SAS to understand the relationships between different attributes of the dataset, including causal relationships between them
- Imported and exported data into and out of Hadoop (HDFS) and Hive using Sqoop
- Developed a project on structural models to produce multivariate time series forecasts of consumption, investment, interest rates and other endogenous stochastic variables in SAS
- Built Logistic Regression Models to predict Probability of Default (PD)
- Created a project for predicting future sales volumes using SAS, including SAS/MACRO, SAS/IML, SAS/GRAPH, SAS/BASE and SAS/STAT
- Developed Two-Stage Least Squares (2SLS) models to solve systems of equations for predicting future loan demand, interest rates, etc., and performed Monte Carlo simulation using SAS
Senior Data Scientist
Confidential
Responsibilities:
- Implemented large-scale data analytics using advanced statistical and Machine Learning models to group similar products together for developing sub-markets
- Executed queries using Hive and developed MapReduce jobs to analyze data
- Subset, sorted, reshaped, merged, sliced and edited collected data using the NumPy, Pandas and scikit-learn modules of Python
- Validated macroeconomic data and conducted predictive analysis on key indicators in Python using machine learning methods such as logistic regression and Random Forest
- Used PROC SQOOP to access the Apache Sqoop utility from a SAS session and transfer data between a database and HDFS
- Implemented Machine learning algorithms on big data to find natural patterns in data that generate insight into customers’ purchasing behavior and help make better decisions and predictions
- Analyzed data sets to help transition credit risk computations, programming in R, Python and Java / Hadoop to deliver solutions
- Built demand forecasting models to predict the trend and future volumes of import/export products utilizing Time Series ARIMA programming in SAS
Statistical Analyst
Confidential, Dearborn, Michigan
Responsibilities:
- Created structural models to produce multivariate time series forecasts of retail sales and performed Monte Carlo simulation to evaluate forecast performance in SAS
- Implemented Hadoop (HDFS), MapReduce, and Hive techniques on big data
- Applied logistic regression in SAS to predict binary outcomes for credit risk analysis
- Programmed simultaneous-equation models in SAS to forecast economic conditions, including interest rates, real GDP, inflation and investment
- Created SAS macros and developed Generalized Linear Models (GLM) using nonlinear regression techniques to predict Benchmark Auction prices for off-lease vehicles
- Used CART, CHAID, decision trees, and Data Mining Techniques in SAS to develop classification models
- Built predictive models to forecast residual values of leased vehicles and estimated the demand for these used vehicles using Regression and Time Series techniques in SAS
- Pulled data from an Oracle database using SQL queries, performed data analysis, generated reports and graphs, and corrected DATA step syntax errors
Statistician / Data Analyst
Confidential
Responsibilities:
- Collected data on economic indicators, analyzed economic trends, and developed models to forecast growth, interest rates, inflation and investments using SAS
- Programming experience with SQL and SAS, including relational database query construction, optimization and predictive modeling
- Practical experience in employing multivariate regression techniques to identify trends in market demand and consumption
- Segmented the customer base using multivariate statistical techniques such as clustering and factor analysis in SAS and Excel to identify potential target customers