Data Scientist Resume

Oakland, CA

SUMMARY

  • 8 years of working experience in data analysis and statistical modeling, with extensive use of SAS and R.
  • Proficient in R Commander, RStudio, Base SAS, SAS/Macro, SAS/Stat, SAS/Graph, and SAS/SQL.
  • Expert in Hypothesis testing, ANOVA, and Linear and Logistic Regression Analysis.
  • Adept in Factor Analysis, Decision Trees, and clustering techniques (K-means, Hierarchical, DBSCAN).
  • Accomplished in Text Analytics using the Naive Bayes classification method in RStudio (see the sketch after this list).
  • Experienced in causal and mechanistic analysis of business scenarios to identify key performance indicators (KPIs).
  • Proficient in Survey Design, Questionnaire Design, Design of Experiments, and Conjoint Analysis.
  • Capable of performing time series analysis using ARIMA models.
  • Skilled both in structured programming with the SAS DATA step and dynamic programming using SAS Macros.
  • Extensive experience using advanced statistical PROCs such as ANOVA, GLM, and UNIVARIATE.
  • Deep understanding of Statistical Modeling, Multivariate Analysis and Standard Procedures. Familiar with model testing, problem analysis, model comparison and validation.
  • Familiar with a large number of SAS functions and SAS data step options.
  • Accustomed to shell scripting to handle SAS files and manage SAS programs.
  • Strong understanding of Data Warehousing concepts such as Fact Tables, Dimension Tables, Star and Snowflake Schemas, Metadata, and Data Marts.
  • Familiar with collecting data from various databases and cleaning data for statistical analysis and modeling.
  • Proficient in using SQL to manipulate data: query expressions, join statements, subqueries, etc.
  • Proficient in Python scripting; worked with statistical functions in NumPy, visualization using Matplotlib, and pandas for organizing data.
  • Used scikit-learn packages in Python for predictions.
  • Proficient in boosting algorithms such as Gradient Boosting, AdaBoost (adaptive boosting), and XGBoost.
  • Used dimensionality reduction methods such as PCA (Principal Component Analysis) and Factor Analysis. Implemented machine learning methods including Random Forests (bootstrap-aggregated classification trees), K-Means Clustering, KNN (k-nearest neighbors), Naïve Bayes, SVM (Support Vector Machines), Decision Trees, and Linear and Logistic Regression.
  • Considerable understanding of RDBMS (Relational Database Management Systems), OLAP, OLTP, and querying via T-SQL.
  • Knowledge of the basic constructs of HDFS (Hadoop Distributed File System) and MapReduce, and of tools like DMX-h for operations on a Hadoop cluster.
  • Strong analytical and problem-solving skills, along with the ability to understand current business processes and implement efficient solutions.
  • Ability to present complex data and analytics to non-technical audiences.
  • Detail-oriented professional, ensuring the highest level of quality in reports and data analysis.
  • Advanced written and verbal communication skills.
  • Adept at innovating and formulating new ideas and predictive models.
  • Proven ability to multi-task and engage with stakeholders at various levels to process data at large scale (Big Data) with enterprise systems.
  • Member of the Bay Area R User Group (BARUG).
  • Member of Analytics Club.
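
As referenced in the text-analytics bullet above, a minimal sketch of Naive Bayes text classification in R using the e1071 package; the documents and labels are purely illustrative:

    # Minimal Naive Bayes text classification sketch (illustrative data only)
    library(e1071)

    docs   <- c("win cash now", "meeting at noon",
                "cash prize win", "project meeting notes")
    labels <- factor(c("spam", "ham", "spam", "ham"))

    # Build a tiny document-term matrix of word counts
    terms <- unique(unlist(strsplit(docs, " ")))
    dtm   <- t(sapply(strsplit(docs, " "),
                      function(w) table(factor(w, levels = terms))))

    model <- naiveBayes(as.data.frame(dtm), labels)
    predict(model, as.data.frame(dtm))  # class predictions for the documents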

TECHNICAL SKILLS

BI and Visualization: Tableau Desktop 8.3, R, Python and SAS

Databases: MS SQL Server, MS Access, and MySQL

Operating Systems: Windows

Other Tools: MS Office (including Excel, Word, PPT, and Access)

Statistical Tools: RStudio, NumPy (Python), Base SAS, SAS/Macros, SAS/Graph, SAS/Stat

Programming Languages: R, Python, and SQL

Statistical Concepts: Inference Methods (Chi-square/t-test), ANOVA, Regression, Factor Analysis, Logistic Regression, Text Analytics (Naive Bayes), Decision Trees, Clustering (K-means/Hierarchical), and Forecasting/Time Series Analysis (ARIMA model)
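
As one illustration of the clustering concepts listed above, a minimal K-means sketch in R on simulated two-cluster data:

    # Minimal K-means clustering sketch; two simulated clusters in 2-D
    set.seed(7)
    pts <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                 matrix(rnorm(40, mean = 3), ncol = 2))

    km <- kmeans(pts, centers = 2, nstart = 10)
    table(km$cluster)            # cluster sizes
    plot(pts, col = km$cluster)  # quick visual check of the grouping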

PROFESSIONAL EXPERIENCE

Confidential, Oakland, CA

Data Scientist

Responsibilities:

  • Involved in gathering requirements while uncovering and defining multiple dimensions.
  • Extracted data from one or more source files and databases.
  • Participated in continuous interaction with the Marketing and Finance teams to obtain data and ensure data quality.
  • Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
  • Explored the raw data through exploratory data analysis (classification, splitting, cross-validation).
  • Converted raw data to processed data by merging datasets and identifying outliers, errors, trends, missing values, and distributions in the data.
  • Utilized various techniques such as histograms, bar plots, pie charts, scatter plots, and box plots to determine the condition of the data.
  • Conducted data exploration (dplyr, tidyr) to look for trends, patterns, groupings, and deviations in the data to understand the data diagnostics.
  • Designed various reports using pivot tables and different charts such as bar plots, pie charts, and histograms.
  • Identified the financial and non-financial independent attributes that were to be used in modeling.
  • Developed segmentation trees (optimization, pruning, modeling) to find high-risk segments of the population.
  • Performed multi-dimensional segmentation analysis to discover business rules and finalize the segmentation procedure.
  • Used logistic regression to obtain default and non-default probabilities (see the sketch after this list).
  • Identified key performance indicators (KPIs) among all the given attributes.
  • Executed what-if scenario analysis to discover effective, implementable ways of reducing loan defaults.
  • Maintained a log of all the iterations performed in R during the data modeling process.
  • Built a scoring model to score loan applicants' propensity to default, with a high degree of accuracy in capturing defaulters.
  • Created an ROI dashboard tracking campaign spending and measuring its efficacy.
  • Led mid-sized teams for production support and handled multiple tasks, drawing on strong interpersonal communication, technical aptitude, and the ability to adapt to new environments.
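
As referenced in the list above, a minimal sketch of the kind of logistic default-scoring model described; the data frame and column names (loan_amt, income, defaulted) are hypothetical:

    # Hypothetical loan-default scoring sketch in R; data are simulated
    set.seed(42)
    loans <- data.frame(
      loan_amt  = runif(500, 1000, 50000),
      income    = runif(500, 20000, 150000),
      defaulted = rbinom(500, 1, 0.2)
    )

    fit <- glm(defaulted ~ loan_amt + income, data = loans, family = binomial)

    # Score each applicant's propensity to default and flag a high-risk segment
    loans$p_default <- predict(fit, type = "response")
    high_risk <- subset(loans, p_default > 0.25)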

Environment: R, SQL Server, Microsoft Excel and Tableau.

Confidential, Oakland, CA

Research Scientist

Responsibilities:

  • Collected a database for the proposed research project; accumulated raw data and loaded the filtered data into an RDBMS.
  • Performed chi-square and ANOVA tests to identify significant differences between data samples (see the sketch after this list). Performed classification, clustering, and time series analysis in collaboration with research faculty.
  • Contributed to identifying grants and funding opportunities for projects, as well as maintaining the grant life cycle.
  • Coordinated with management and diverse academic and technical staff to identify challenges, develop appropriate maintenance strategies, and generate a platform for projects across diverse domains (i.e., Finance, Marketing, and IT research).
  • Coordinated with the research faculty team to identify and develop trends in business research and the sustainability of long-term projects.
  • Created research reports for projects.
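
As referenced above, a minimal R sketch of the significance tests, using built-in functions on illustrative data:

    # Chi-square test of independence on a hypothetical 2x2 contingency table
    counts <- matrix(c(30, 10, 20, 40), nrow = 2)
    chisq.test(counts)

    # One-way ANOVA comparing a response across three illustrative groups
    set.seed(1)
    df <- data.frame(group    = rep(c("A", "B", "C"), each = 20),
                     response = c(rnorm(20, 5), rnorm(20, 6), rnorm(20, 7)))
    summary(aov(response ~ group, data = df))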

Environment: R, SQL, MS Excel

Confidential, San Leandro, CA

Data Scientist

Responsibilities:

  • Collected a database of item sales in all aspects; cleaned, filtered, and transformed the data to the specified format.
  • Prepared the workspace for R Markdown.
  • Performed data analysis and statistical analysis; generated reports, listings, and graphs.
  • Initiated test analysis to understand the potential of the insurer.
  • Embedded code, i.e., wove code and narrative into a single document format, rendering the document to create a finished output.
  • Customized the process and opened the door for automated, targeted reporting.
  • Responsible for all data reporting, data mining, and fraud detection activities, including data preparation and design, model development, and reporting of results.
  • Used R to identify product performance via classification, tree-map, and regression models, along with visualizing data for interactive understanding and decision making.
  • Found outliers, anomalies, trends, and fraudulent behavior.
  • Used a combination of R and NoSQL for models and analysis, and deployed the same in real time.
  • Customized, labeled, and reused R code chunks.
  • Used forecasting with ARIMA models for time series analysis of customer behavior and purchases (see the sketch after this list).
  • Provided insights on effectively running marketing campaigns, including direct mail, email, mobile, and other digital channels.
  • Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project and to improve quality and efficacy.
  • Produced quality reports for the business team and the business data manager.
  • Performed SEO so pages stayed at the top of web search results.
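
As referenced above, a minimal sketch of ARIMA forecasting in R using the forecast package; the monthly purchase series is simulated for illustration:

    # Hypothetical ARIMA forecast of a simulated monthly purchase series
    library(forecast)

    set.seed(3)
    purchases <- ts(rnorm(48, mean = 100, sd = 10),
                    frequency = 12, start = c(2015, 1))

    fit <- auto.arima(purchases)  # select the (p, d, q) orders automatically
    fc  <- forecast(fit, h = 6)   # forecast the next 6 months
    plot(fc)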

Environment: R, SAS/Macros, SQL, NoSQL, MS Excel, MS Access, Tableau.

Confidential

Financial Reporting

Responsibilities:

  • Scraped both structured and unstructured data from disparate sources (web pages and public repositories)
  • Used SQL joins extensively in SQL Developer to fetch data from the MS SQL database
  • Developed multiple prepared statements and stored procedures for efficient, faster database updates
  • Performed outlier detection analysis on data as part of Dodd-Frank requirements
  • Produced bond and attribute coverage of financial instruments present in the database on a regular basis
  • Developed modules to enhance time-series and term-structure functionality
  • Performed Principal Component Analysis using PROC FACTOR to develop logic for hedging a portfolio (see the sketch after this list)
  • Used proprietary analytics library to develop exponential and cubic spline term-structures for various bond markets
  • Used SAS to perform data mining/prescriptive analysis on bond data to identify under- or over-valued bonds
  • Developed scripts that updated time-series databases with trade data from internal trading systems and external sources (flat files for futures data, risk from dbRisk system and CSA rate from FUSE systems)
  • Updated database using SQL queries for front-end manipulation (hide/show of various columns, markets, sectors, attributes, sources)
  • Used PROC SQL, PROC REPORT, PROC MEANS, PROC FREQ, PROC SUMMARY, PROC CONTENTS, and PROC TABULATE extensively to create sector-based reports for the credit research desk
  • Held discussions with the sales and research teams on timelines/deliverables for feature requests
  • Performed ad-hoc analysis of trade ideas for clients as well as the sales team
  • Presented application features to various audiences by reproducing trade ideas from research journals
  • Involved in Data preparation over multiple iterations with inputs from senior analysts for the problem at hand
  • Assisted in creating fact and dimension tables in star schema model based on requirements
  • Implemented algorithms like Brownian Bridge Construction to interpolate missing values
  • Created tables in an Oracle database and stored rich/cheap data using SAS PROC SQL
  • Developed scripts and ad-hoc tests to ascertain data validity and correct attribute calculation
  • Performed statistical and predictive analysis on corporate market data to identify trends, buy-sell opportunities
  • Optimized data access by efficient SQL coding for high throughput and low latency
  • Executed rich reports after close of business to provide users instant access to prior-day reports
  • Performed correlation and time-series analysis to recommend pairs trading strategies to management
  • Performed advanced statistical analysis like scenario analysis and back testing as per requirements
  • Created a profit-and-loss report for the collateral desk detailing profit at counterparty, trade, book, and desk level granularities
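
The PCA above was done with SAS PROC FACTOR; as a rough equivalent, a minimal R sketch using prcomp() on simulated bond-return data:

    # Illustrative PCA on simulated bond returns (R analogue of PROC FACTOR)
    set.seed(1)
    returns <- matrix(rnorm(250 * 5), ncol = 5,
                      dimnames = list(NULL, paste0("bond", 1:5)))

    pca <- prcomp(returns, center = TRUE, scale. = TRUE)
    summary(pca)        # variance explained by each component
    head(pca$x[, 1:2])  # scores on the first two components (hedge factors)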

Environment: R, MS Excel, Tableau Desktop 8.3, PL/SQL

Confidential

Data Analyst

Responsibilities:

  • Created the database from existing raw data; organized the data into the required types and formats for further manipulation.
  • Performed statistical analysis and generated reports, listings, and graphs using SAS/Base, SAS/Macros, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, and SAS/Access.
  • Used different SAS procedures such as PROC REPORT, UNIVARIATE, TABULATE, FREQ, MEANS, TRANSPOSE, and SUMMARY, along with DATA _NULL_ steps (a rough R analogue of this kind of summary reporting appears after this list).
  • Integrated SAS datasets into Excel using Dynamic Data Exchange (DDE), using SAS to analyze data and produce statistical tables, listings, and graphs for reports.
  • Used SAS/ODS to format HTML and RTF reports.
  • Wrote SAS macros for data cleaning and reporting and to support routine processing.
  • Created and maintained ad hoc SAS programs/macros for validation, extraction, presentation, manipulation, analysis, and reporting.
  • Used SAS Enterprise Guide in a multi-user environment for intermediate data manipulation, analysis, and summary statistics.
  • Optimized existing code for efficiency and automated SAS programs to improve reporting efficiency.
  • Pulled data from the clinical database and prepared customized analysis datasets for specific reporting needs.
  • Transferred and migrated data between platforms for further analysis; extracted data from Oracle via ODBC using the SQL pass-through facility or the LIBNAME method.
  • Responsible for proper coding, documentation, and validation of SAS programs/macros/procedures to produce standardized displays.
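
The work above was done in SAS; as a rough R analogue of a PROC MEANS-style grouped summary (hypothetical data and grouping column):

    # R analogue of a PROC MEANS-style grouped summary; data are illustrative
    set.seed(5)
    trials <- data.frame(site  = rep(c("A", "B"), each = 25),
                         value = rnorm(50, mean = 10))

    aggregate(value ~ site, data = trials,
              FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))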

Environment: UNIX, SAS/Base, SAS/Macros, SAS/Graph, SAS/Stat, SAS/SQL, SAS/ODS, SQL Server, MS Excel, MS Access.

Confidential

Programmer Analyst

Responsibilities:

  • Introduced to the C programming language and completed a class project to apply programming concepts.
  • Learned data structure concepts and SQL Server; applied data modeling concepts in a dummy project on the SQL Server platform.
  • Learned Base SAS and applied it in a class project involving reading raw data, creating tables in a SQL library, and producing reports using SQL programming.
  • Developed SQL queries for data analysis and data extraction.

Environment: SQL Server, MS Excel, MS Access, C
