Data Scientist Resume
Houston, TX
SUMMARY
- 7+ years of experience interpreting and analyzing data to drive business solutions.
- 2 years of experience in data science, working with machine learning in Python.
- Solid knowledge of mathematics and experience applying it to technical and research fields.
- Certified Base and Advanced SAS programmer with two years of experience in data analysis using SAS in a Windows environment.
- Hands-on experience developing predictive models leveraging data from the finance, advertising, software, and healthcare industries.
- Worked with statistical functions in NumPy, visualization in Matplotlib/Seaborn, and data organization in pandas.
- Proficient at identifying patterns and trends in large, high-dimensional datasets using machine learning (ML) algorithms such as classification algorithms.
- Experience in using classification algorithms such as Logistic Regression, Random Forest, SVM and Naïve Bayes.
- Knowledge and experience of extracting information from text format data using Natural Language Processing (NLP) methods.
- Experience with analyzing and validating large datasets using Weka.
- Familiar with Apache Spark, Hive, Pig, and MapReduce.
- Highly skilled in using visualization tools like Power BI and ggplot2 for creating dashboards
- Experience with intermediate statistical analysis, such as linear and logistic regression, ANOVA, and time-series analysis such as ARIMA.
- Generated descriptive statistics, data validation, and preliminary reporting for input data.
- Knowledge of and experience with Microsoft Office tools including MS Access, MS Word, MS PowerPoint, and MS Excel.
- Proficient with SQL queries, including all join types and INSERT, SELECT, and CREATE statements.
- Assumed ownership for analysis projects from data modeling to final delivery and published results through articles in highly-ranked journals and presented findings at related conferences.
- Hands on experience with statistical software like SPSS and Minitab.
- Knowledge and/or experience in various programming languages including Python, R, Java, Matlab, and C
- Knowledge of and experience with social media analytics tools such as Confidential Social Media Analytics, Microsoft Social Engagement, and Confidential Watson Analytics.
- One year of university-level teaching experience in courses such as Introduction to Big Data Analytics and Data Analytics: Basic Methods, covering statistics for model building and evaluation, experimental research, correlation analysis, regression, confidence intervals, group comparisons, and parametric and non-parametric models.
- Able to communicate clearly, concisely and correctly in written and spoken forms.
- Strong ability to adapt and learn new technologies with agility.
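As an illustration of the classification experience summarized above (Logistic Regression, Random Forest, SVM, Naïve Bayes), a minimal hypothetical sketch comparing these scikit-learn classifiers on synthetic data; the dataset and parameters are placeholders, not drawn from any project listed here.

```python
# Hypothetical sketch: compare the classifiers named in the summary
# on a synthetic dataset (not real project data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a real business dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
    "naive_bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```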
TECHNICAL SKILLS
SAS Skills: Base SAS, SAS/Macro, SAS/Connect, SAS/SQL, SAS/STAT, SAS/ODS, SAS/Graph.
SAS Procedures: Print, Transpose, Contents, Means, Chart, Plot, Tabulate, Univariate, Summary, Sort, Reg, SQL, Copy, Freq, Forms, Upload, Download, Formats.
DBMS and Office Tools: SQL queries, Word, Access, and Excel.
Programming Languages: SAS, Base R, Python, Base Java, Matlab, and C.
Data Visualization tools: Power BI.
Statistical Software: SPSS, Minitab.
Hadoop Environment: Familiar with Apache Spark, Pig, Hive, MapReduce, etc.
Data Mining and Text Mining Software: Weka.
Others: SAP/ERP, GIS, Confidential SPSS Modeler, Confidential Social Media Analytics, Microsoft Social Engagement, Confidential Watson Analytics.
PROFESSIONAL EXPERIENCE
Confidential, Houston, TX
Data Scientist
Responsibilities:
- Cleaned and manipulated complex datasets to create the data foundation for further analytics and the development of key insights (Python, Excel)
- Applied various machine learning algorithms and statistical models such as decision trees, regression models, neural networks, SVM, and clustering to identify volume, using the scikit-learn package in Python and Jupyter Notebook.
- Led technical implementation of advanced analytics projects: defined the mathematical approaches, developed new and effective analytics algorithms, and wrote key pieces of mission-critical source code implementing advanced machine learning algorithms using TensorFlow and other tools and languages as needed.
- Applied multinomial regression, XGBoost, random forest, B-spline, and SVM models to classify whether a package would be delivered on time on a new route.
- Responsible for planning & scheduling new product releases and promotional offers.
- Created data quality scripts using SQL to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Power BI.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
- Key focus on data governance: availability, usability, integrity, and security.
- Worked on text analytics with Python's NLTK library, including the Punkt tokenizer.
- Project No: 01
- Project Name: Supply & Trading - Probabilistic Planning
- Opportunity: through analysis of past variances to plan, determine the likelihood that the plan (e.g. 120 mbpd gasoline production) will be achieved, as well as the probabilities and degree of variance above or below plan. The probabilities can be used to alter the plan during the planning process or to identify risks where the plan has a low likelihood of being achieved so that mitigating actions can be determined.
- Project No: 02
- Project Name: Supply & Trading - Demand Forecasting
- Current State: demand forecasts are developed using third-party software that analyzes historical sales data and statistical forecasting methods to generate forecasted demand. The model does not incorporate other factors (e.g. publicly available data) in its analysis.
- Project No: 03
- Project Name: Finance - Tiered Contractor Pricing Identification and Verification
- Confidential is not consistently receiving the best pricing available, as mandated by approved contract rate sheets. In instances where multi-tiered pricing exists (hourly, weekly, daily, monthly, etc.), Confidential is not always charged the most favorable price, which can lead to overbilling. The solution will define relationships between specific goods and services and identify potential billing errors.
Environment: Machine Learning Algorithms, Python, Power BI, SQL Server
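The missing-value imputation step mentioned in this role can be sketched with a minimal, hypothetical pandas example: numeric columns filled with the median, categorical columns with the mode. The column names and values are placeholders, not taken from the actual project.

```python
# Hypothetical imputation sketch (placeholder data, not project data):
# fill numeric gaps with the median, categorical gaps with the mode.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "volume": [10.0, np.nan, 12.0, 11.0],   # numeric column with a gap
    "route": ["A", "B", None, "A"],          # categorical column with a gap
})

# Median is robust to outliers, a common default for numeric imputation.
df["volume"] = df["volume"].fillna(df["volume"].median())
# Mode (most frequent value) is a common default for categorical columns.
df["route"] = df["route"].fillna(df["route"].mode()[0])

print(df.isna().sum().sum())  # no missing values remain
```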
Confidential, Bentonville, AR
Data Scientist
Responsibilities:
- Responsible for building conceptual models by reviewing Confidential's documents and interviewing managers in different Confidential departments.
- Involved in loading data from Hive and importing it into R for data analysis and visualization.
- Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
- Exploratory analysis and model building to develop predictive insights.
- Created various types of data visualizations using the ggplot2 library in R.
- Visualize, interpret, report findings and develop strategic uses of data.
- Obtained a set of principal variables through feature selection and feature extraction to reduce the number of random variables under consideration.
- Applied feature selection techniques such as wrapper methods and embedded methods, e.g. LASSO (least absolute shrinkage and selection operator) and the Recursive Feature Elimination algorithm for Support Vector Machines.
- Applied feature extraction techniques such as Principal Component Analysis (PCA), Factor Analysis, and Linear Discriminant Analysis (LDA).
- Ground-up experience in data understanding, hypothesis formulation, data preparation, and model building.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
- Responsible for providing reporting, analysis and insightful recommendations to business leaders on key performance metrics pertaining to sales & marketing.
- Designed, developed, and produced reports that connect quantitative data to insights that drive and change business.
- Evaluated and compared feature selection approaches.
- Used Python and SQL to create prediction models involving time series analysis, linear regression, random forest, SVM, and neural networks.
- Perform ad hoc custom analysis as needed using SQL and R.
- Used Power BI and Tableau Server, which provide an easy-to-use drag-and-drop interface, to help users quickly turn their data into business insights.
Environment: Machine Learning Algorithms, Python, Power BI, SQL Server
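The feature selection and extraction techniques named in this role (LASSO, Recursive Feature Elimination, PCA) can be sketched with a minimal, hypothetical scikit-learn example on synthetic data; nothing here reflects the actual project datasets or settings.

```python
# Hypothetical sketch of three dimensionality-reduction techniques
# named above, on synthetic (not project) data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
from sklearn.decomposition import PCA

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       random_state=0)

# Embedded method: LASSO drives coefficients of weak features to zero.
lasso = Lasso(alpha=1.0).fit(X, y)
selected_by_lasso = np.flatnonzero(lasso.coef_)

# Wrapper method: recursive feature elimination around a linear SVM.
rfe = RFE(SVR(kernel="linear"), n_features_to_select=5).fit(X, y)
selected_by_rfe = np.flatnonzero(rfe.support_)

# Feature extraction: project onto the top principal components.
X_reduced = PCA(n_components=5).fit_transform(X)

print(len(selected_by_lasso), len(selected_by_rfe), X_reduced.shape)
```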
Confidential, Rochester, MN
Associate Data Scientist
Responsibilities:
- Analyzed large and complex medical and health record data using machine learning algorithms such as Random Forest, Naïve Bayes, SVM, and logistic regression in Python.
- Validated the results using statistical techniques such as precision, recall, specificity, accuracy and ROC curve.
- Classified complex health information data into predefined categories using supervised learning such as linear regression and the k-nearest neighbors algorithm (k-NN), with Python scientific libraries such as SciPy and NumPy.
- Clustered complex health and medical record data using unsupervised learning such as k-means clustering.
- Produced various routine and ad hoc reports for internal and external clients in evaluating a wide array of data types including case review, medical and prescription drug claims, and eligibility data.
- Worked on large, complex data files such as claims data and health tradition datasets, linking and integrating data from disparate databases, and transforming raw data from complex ASCII files into finished reports using the SAS INFILE statement.
- Transformed company’s old software Monarch report creation system into customized SAS reporting application using advanced features of PROC REPORT’s compute blocks, SAS templates, stored and compiled SAS macros.
- Wrote SAS utility macros that greatly helped the report automation process by reducing programming effort, increasing efficiency, and decreasing the chance of mistakes in manual processes.
Environment: SAS, Python, R, SQL, Excel, Word, PowerPoint.
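The validation step described in this role (precision, recall, accuracy, and ROC) can be sketched with a minimal hypothetical scikit-learn example on synthetic data, standing in for the actual (confidential) medical records.

```python
# Hypothetical validation sketch: compute precision, recall, accuracy,
# and ROC AUC for a classifier on synthetic (not medical) data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (precision_score, recall_score,
                             accuracy_score, roc_auc_score)

X, y = make_classification(n_samples=400, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of positive class

metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),  # uses scores, not labels
}
print(metrics)
```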
Confidential, New York City, NY
Predictive Analytics Analyst
Responsibilities:
- Gathered and validated customer business requirements.
- Helped coordinate the secure transfer and storage of customer data.
- Collaborated with the customer to select datasets for analysis.
- Assessed data quality and prepared data for analysis.
- Performed analytics on complex datasets to reveal new insights and helped customers gain a progressive view of their business.
- Built prediction models using machine learning algorithms such as decision tree and software such as Python or R.
- Compared prediction models built from different algorithms, such as SVM and Naïve Bayes, using recall and precision to identify the most accurate model for a given dataset.
- Combined ML algorithms such as regression and decision trees in Weka or Python to build a more accurate prediction model.
- Built and delivered reports and presentations that tell the story of the data as it impacts the business.
- Provided recommendations to clients as they relate to their data and business objectives.
- Helped to develop and grow the new service area of Business Analytics.
- Validated models include time series models, multivariate models, logistic regression, linear models, non-linear models, generalized linear models and other statistical models.
Environment: SAS, Excel, SPSS, GIS, Confidential SPSS Modeler, Confidential Social Media Analytics, Microsoft Social Engagement, Power BI, Confidential Watson Analytics
Confidential
Research Assistant
Responsibilities:
- Analyzed semi-structured and unstructured data from thousands of verbal autopsy forms filled out in developing countries such as India (6,777 forms) to develop automated tools for assigning the most probable cause of death to each record.
- Implemented architecture to conduct statistical analysis of relevant data coming from the narrative section of the autopsy forms in textual format to draw meaningful conclusions from statistical trends, using NLP techniques and ML algorithms.
- Performed data analysis covering data quality and the relationships among data sources, and classified the data into predefined categories, utilizing classification algorithms like Naïve Bayes and SVM and software like Weka and R.
- Provided thought-leadership and dependable execution on diverse projects as well as recommendations on how best to understand the usage of data through visualization and reports.
- Constructed, cleaned, and documented large datasets for various projects using SAS, R and Excel
- Performed metadata analysis on data gathered from heterogeneous sources.
- Wrote SAS and R code for data manipulation for use in statistical packages for statistical analyses
- Prepared various charts and graphs for all data and drafted appropriate articles; implemented statistical sampling techniques in various settings, including univariate, bivariate, and multivariate methods, with extensive use of logistic regression.
- Designed experiments, applied appropriate statistical methodologies, synthesized data for various projects
- Produced reports in Excel using PROC REPORT.
- Extensively used PROC SQL and SAS macros for application automation.
- Coded highly complex SAS DATA steps and procedures, including advanced techniques and available options, to serve ever-evolving and dynamic business reporting needs.
- Involved in aggregating data residing on the vast network of distributed servers with all forms of data storage.
Environment: SAS, R, Weka, SQL Server, ML, NLP
Confidential
Research Assistant
Responsibilities:
- Assisted with the development, maintenance, and on-going support of business decision models.
- Reporting to the Senior Manager of Loyalty Analytics, facilitated decision making and improved the performance of the loyalty program and team through the analysis and interpretation of data and reports.
- Provided statistical and data analysis support and training to students using different computer software - SAS, R, SPSS, and Excel.
- Maintained and developed necessary business strategic and performance reports.
- Worked with cross-functional teams within the University to improve modeling practices.
- Stayed on top of best practices in reporting and analytics.
- Established scalable, efficient, and automated processes for model development, model validation, model implementation, and large-scale data analysis.
- Extracted information from large datasets using ML algorithms and software such as SAS and Matlab.
- Assumed ownership for analysis projects from data modeling to final delivery and published results through articles in highly-ranked journals and presented findings at related conferences, such as 2014 Rocky Mountain Bioinformatics
- Taught diverse topics for CKME 130 (Introduction to Big Data Analytics), such as an overview of big data, the state of the practice in analytics, the role of the data scientist, big data analytics in industry verticals, and the analytics lifecycle as an end-to-end process.
- Conducted classes and presented material for CKME 132 (Data Analytics: Basic Methods). Topics covered were statistics for model building and evaluation, experimental research, correlation analysis, regression, confidence intervals, group comparisons, and parametric and non-parametric models.
Environment: SAS, Matlab, Base R, Base Python, SQL, ML, SPSS, Excel, Access, Java
Confidential
Hedge fund Analyst
Responsibilities:
- Programmed and synthesized large data sets including data manipulation and transformation
- Developed strategies to analyze appropriate data to meet the objectives and implemented solutions
- Implemented statistical or econometric modeling for strategic/policy/business decisions with extensive use of time series analysis, multivariate analysis and logistic regression.
- Using the statistical models developed, predicted market outlook, risk exposure and conducted scenario analysis.
- Utilized computer programs and tools, including Bloomberg and Matlab to analyze both buy-side and sell-side securities, while considering company performances and economic factors.
- Developed a prediction tool with nonlinear regression modeling to predict the stock prices based primarily on forecasting a time series, which is equivalent to choosing a model and fitting its parameters to the data available.
Environment: Bloomberg, Matlab, Excel