
Data Scientist Resume


Oakland, CA

SUMMARY

  • 8 years of experience in data analysis and statistical modeling, with extensive use of SAS and R
  • Proficient in R Commander, RStudio, Base SAS, SAS/Macro, SAS/Stat, SAS/Graph, and SAS/SQL.
  • Adept in factor analysis, decision trees, and clustering techniques (k-means, hierarchical, DBSCAN).
  • Accomplished in text analytics using the Naive Bayes classification method in RStudio (see the R sketch after this list).
  • Experienced in causal and mechanistic analysis to identify key performance indicators (KPIs).
  • Proficient in survey design, questionnaire design, design of experiments, and conjoint analysis.
  • Skilled in time series analysis using ARIMA models.
  • Adept in both structured programming with the SAS DATA step and dynamic programming with SAS macros.
  • Extensive experience using advanced statistical procedures such as PROC ANOVA, PROC GLM, and PROC UNIVARIATE.
  • Deep understanding of statistical modeling, multivariate analysis, and standard procedures; familiar with model testing, problem analysis, model comparison, and validation.
  • Familiar with a large number of SAS functions and DATA step options.
  • Accustomed to shell scripting for handling SAS files and managing SAS programs.
  • Strong understanding of data warehousing concepts such as fact tables, dimension tables, star and snowflake schemas, metadata, and data marts.
  • Familiar with collecting data from various databases and cleaning it for statistical analysis and modeling.
  • Proficient in using SQL for data manipulation: query expressions, joins, subqueries, etc.
  • Proficient in Python scripting; worked with statistical functions in NumPy, visualization in Matplotlib, and data organization in Pandas.
  • Used scikit-learn in Python for predictive modeling.
  • Proficient in boosting algorithms such as gradient boosting, AdaBoost (adaptive boosting), and XGBoost.
  • Used dimensionality reduction methods such as PCA (principal component analysis) and factor analysis; implemented bootstrap-based methods such as Random Forests, along with k-means clustering, KNN (k-nearest neighbors), Naive Bayes, SVM (support vector machines), decision trees, and linear and logistic regression.
  • Considerable understanding of RDBMS (relational database management systems), OLAP, OLTP, and querying via T-SQL.
  • Knowledge of the basic constructs of HDFS (Hadoop Distributed File System) and MapReduce, and of tools like DMX-h for operations on a Hadoop cluster.
  • Strong analytical and problem-solving skills, with the ability to understand current business processes and implement efficient solutions.
  • Ability to present complex data and analytics to a non-technical audience.
  • Detail-oriented professional, ensuring the highest level of quality in reports and data analysis.
  • Advanced written and verbal communication skills.
  • Skilled at innovating and formulating new ideas and predictive models.
  • Proven ability to multi-task and engage with stakeholders at various levels to process data at large scale (big data) on enterprise systems.
  • Member of BARUG (Bay Area R User Group).
  • Member of Analytics Club.
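
As a brief illustration of the Naive Bayes text classification noted in the summary, the following is a minimal R sketch using the e1071 package; the toy messages, labels, and hand-built document-term matrix are illustrative assumptions, not project code.

    # Minimal Naive Bayes text classification sketch (toy data, not project data).
    library(e1071)

    messages <- c("win a free prize now", "meeting agenda for monday",
                  "free cash offer inside", "project status report attached")
    labels <- factor(c("spam", "ham", "spam", "ham"))

    # Build a simple binary document-term matrix by hand.
    vocab <- unique(unlist(strsplit(messages, " ")))
    dtm <- t(sapply(strsplit(messages, " "),
                    function(words) as.integer(vocab %in% words)))
    colnames(dtm) <- vocab

    # naiveBayes() treats numeric columns as Gaussian, so convert the 0/1
    # indicators to factors to get count-based conditional probabilities.
    dtm_df <- as.data.frame(lapply(as.data.frame(dtm), factor, levels = c(0, 1)))

    model <- naiveBayes(dtm_df, labels, laplace = 1)   # Laplace smoothing
    predict(model, dtm_df)                  # predicted classes
    predict(model, dtm_df, type = "raw")    # posterior probabilities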

CORE COMPETENCIES

Analytics: Highly skilled analytics professional who has worked for several clients across domains and sectors such as web analytics, retail analytics, and insurance; specialized in classification, clustering, association rule mining, and time series analysis.
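
A minimal R sketch of the association rule mining mentioned above, using the arules package; the toy transactions and thresholds are illustrative assumptions.

    # Association rule mining with the Apriori algorithm (toy transactions).
    library(arules)

    transactions <- list(c("bread", "milk"),
                         c("bread", "diapers", "beer"),
                         c("milk", "diapers", "beer"),
                         c("bread", "milk", "diapers", "beer"))
    trans <- as(transactions, "transactions")

    # Mine rules with modest support and confidence thresholds.
    rules <- apriori(trans, parameter = list(supp = 0.5, conf = 0.7))
    inspect(sort(rules, by = "lift"))   # strongest associations first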

Modelling: Extensive experience in statistical modeling to predict customer potential in various sectors, with particular expertise in customer segmentation for credit card applicants and health insurers. Familiar with time series analysis to forecast future scenarios from historical data.
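
A minimal R sketch of the ARIMA-based forecasting described above, using the forecast package; the built-in AirPassengers series stands in for the historical business data.

    # Fit an ARIMA model and forecast ahead (AirPassengers as stand-in data).
    library(forecast)

    fit <- auto.arima(AirPassengers)   # selects (p, d, q) automatically
    summary(fit)

    fc <- forecast(fit, h = 12)        # 12-month-ahead forecast
    plot(fc)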

Reporting: Began career as a SAS programmer responsible for producing monthly, weekly, and ad hoc reports based on client requirements; also worked in Tableau Desktop for data visualization.

Management: Expert in communicating business requirements to clients and reporting to management.

TECHNICAL SKILLS

BI and Visualization: Tableau Desktop 8.3, R, Python and SAS

Databases: MS SQL Server, MS Access, and MySQL

Operating Systems: Windows

Other Tools: MS Office (including Excel, Word, PPT, and Access)

Statistical Tools: RStudio, NumPy (Python), Base SAS, SAS/Macros, SAS/Graph, SAS/Stat

Programming Languages: R, Python, and SQL

Statistical Concepts: inference methods (chi-square/t-tests), ANOVA, regression, factor analysis, logistic regression, text analytics (Naive Bayes), decision trees, clustering (k-means/hierarchical), and forecasting/time series analysis (ARIMA models)

PROFESSIONAL EXPERIENCE

Confidential, Oakland, CA

Data Scientist

Responsibilities:

  • Involved in gathering requirements while uncovering and defining multiple dimensions.
  • Extracted data from one or more source files and databases.
  • Interacted continuously with Marketing and Finance teams to obtain data and ensure data quality.
  • Accomplished multiple tasks, from collecting and organizing data to interpreting statistical results.
  • Explored the raw data through exploratory data analysis (classification, splitting, cross-validation).
  • Converted raw data to processed data by merging and by identifying outliers, errors, trends, missing values, and distributions in the data.
  • Utilized techniques such as histograms, bar plots, pie charts, scatter plots, and box plots to assess the condition of the data.
  • Conducted data exploration (dplyr, tidyr) to look for trends, patterns, groupings, and deviations in the data as part of data diagnostics.
  • Designed various reports using pivot tables and charts such as bar plots, pie charts, and histograms.
  • Identified the financial and non-financial independent attributes to be used in modeling.
  • Developed segmentation trees (optimization, pruning, modeling) to find high-risk segments of the population.
  • Performed multi-dimensional segmentation analysis to discover business rules and finalize the segmentation procedure.
  • Used logistic regression to estimate default and non-default probabilities (see the R sketch after this list).
  • Identified key performance indicators (KPIs) among the given attributes.
  • Executed what-if scenario analysis to discover effective, implementable ways of reducing loan defaults.
  • Maintained a log of all iterations performed in R during the data modeling process.
  • Built a scoring model for loan applicants' propensity to default, with a high degree of accuracy in capturing defaulters.
  • Created an ROI dashboard on campaign spending to measure campaign efficacy.
  • Led mid-sized teams for production support, handling multiple tasks with strong interpersonal communication, technical aptitude, and the ability to adapt to the environment.
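
A minimal R sketch of the logistic-regression default scoring referenced in the list above; the variables (income, loan_amt, default) and the simulated data are hypothetical stand-ins for the real loan attributes.

    # Score propensity to default with logistic regression (simulated data).
    set.seed(42)
    n <- 500
    loans <- data.frame(income   = rnorm(n, 60, 15),
                        loan_amt = rnorm(n, 20, 5))
    # Simulate a default flag that worsens with loan size, improves with income.
    p <- plogis(-2 + 0.08 * loans$loan_amt - 0.03 * (loans$income - 60))
    loans$default <- rbinom(n, 1, p)

    fit <- glm(default ~ income + loan_amt, data = loans, family = binomial)
    loans$score <- predict(fit, type = "response")   # probability of default
    head(loans[order(-loans$score), ])               # highest-risk applicants first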

Environment: R, SQL-Server, Microsoft Excel and Tableau.

Confidential, Oakland, CA

Research Scientist

Responsibilities:

  • Collected the database for the proposed research project; accumulated raw data and filtered it into an RDBMS.
  • Performed chi-square and ANOVA tests to identify significant differences between data samples (see the R sketch after this list); performed classification, clustering, and time series analysis in collaboration with research faculty.
  • Contributed to identifying grants and funding opportunities for projects, and maintained the grant life cycle.
  • Coordinated with management and diverse academic and technical staff to identify challenges, develop appropriate maintenance strategies, and build a platform for projects across diverse domains (i.e., finance, marketing, IT research).
  • Coordinated with the research faculty team to identify and develop trends in business research and ensure the sustainability of long-term projects.
  • Created research reports for projects.
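
A minimal R sketch of the chi-square and ANOVA testing mentioned above; the built-in mtcars dataset stands in for the research data.

    # Chi-square test of independence on a contingency table
    # (small cell counts may trigger a warning on toy data).
    tbl <- table(mtcars$cyl, mtcars$am)
    chisq.test(tbl)

    # One-way ANOVA: does mean mpg differ across cylinder groups?
    fit <- aov(mpg ~ factor(cyl), data = mtcars)
    summary(fit)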

Environment: R, SQL, MS Excel

Confidential, San Leandro, CA

Data Scientist

Responsibilities:

  • Collected a database of item sales in all aspects; cleaned, filtered, and transformed the data into the specified format.
  • Prepared the workspace for R Markdown.
  • Performed data analysis and statistical analysis; generated reports, listings, and graphs.
  • Initiated test analysis to understand insurer potential.
  • Embedded code (i.e., wove code and narrative into a single document format) and rendered the document to create finished output (see the R sketch after this list).
  • Customized the process, opening the door to automated, targeted reporting.
  • Responsible for all data reporting, data mining, and fraud detection activities, including data preparation and design, model development, and reporting of results.
  • Used R to evaluate product performance via classification, tree maps, and regression models, along with data visualization for interactive understanding and decision making.
  • Identified outliers, anomalies, trends, and fraudulent behavior.
  • Built models and analyses using a combination of R and NoSQL, and deployed them in real time.
  • Customized, labeled, and reused R code chunks.
  • Used forecasting with ARIMA models for time series analysis of customer behavior and purchases.
  • Provided insights on effectively running marketing campaigns, including direct mail, email, mobile, and other digital channels.
  • Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project, as well as to improve quality and efficacy.
  • Produced quality reports for the business team and business data manager.
  • Performed SEO to maintain top placement in web search results.
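
A minimal R sketch of weaving code and narrative into a single document and rendering it, as described above; the file name and document contents are hypothetical examples (rendering requires the rmarkdown package and pandoc).

    # Write a small R Markdown file, then render it to finished HTML output.
    library(rmarkdown)

    writeLines(c("---",
                 "title: Sales Summary",
                 "output: html_document",
                 "---",
                 "",
                 "```{r}",
                 "summary(cars)   # code chunk woven into the narrative",
                 "```"),
               "report.Rmd")

    # Swapping the output format (e.g., pdf_document) re-targets the report,
    # which is how the process opens the door to automated reporting.
    render("report.Rmd")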

Environment: R, SAS/Macros, SQL, NoSQL, MS Excel, MS Access, Tableau.

Confidential

Responsibilities:

  • Scraped both structured and unstructured data from disparate sources (web pages and public repositories).
  • Used SQL joins extensively in SQL Developer to fetch data from the MS SQL database.
  • Developed multiple prepared statements and stored procedures for efficient database updates and speedup.
  • Performed outlier detection analysis on data as part of Dodd-Frank requirements.
  • Produced bond and attribute coverage of financial instruments present in the database on a regular basis.
  • Developed modules to enhance time series and term-structure functionality.
  • Performed principal component analysis using PROC FACTOR to develop portfolio hedging logic.
  • Used a proprietary analytics library to develop exponential and cubic-spline term structures for various bond markets.
  • Used SAS to perform data mining/prescriptive analysis on bond data to identify under- or over-valued bonds.
  • Updated the database using SQL queries for front-end manipulation (hiding/showing various columns, markets, sectors, attributes, and sources).
  • Used PROC SQL, PROC REPORT, PROC MEANS, PROC FREQ, PROC SUMMARY, PROC CONTENTS, and PROC TABULATE extensively to create sector-based reports for the credit research desk.
  • Held discussions with sales and research teams on timelines and deliverables for feature requests.
  • Performed ad hoc analysis on trade ideas for clients as well as the sales team.
  • Presented application features to various audiences by reproducing trade ideas from research journals.
  • Involved in data preparation over multiple iterations, with input from senior analysts on the problem at hand.
  • Assisted in creating fact and dimension tables in a star schema model based on requirements.
  • Implemented algorithms such as Brownian bridge construction to interpolate missing values (see the R sketch after this list).
  • Created tables in an Oracle database and stored rich/cheap valuation data using SAS PROC SQL.
  • Developed scripts and ad hoc tests to ascertain data validity and correct attribute calculation.
  • Performed statistical and predictive analysis on corporate market data to identify trends and buy/sell opportunities.
  • Optimized data access through efficient SQL coding for high throughput and low latency.
  • Executed rich/cheap reports after close of business to give users instant access to the previous day's reports.
  • Performed correlation and time series analysis to recommend pairs-trading strategies to management.
  • Performed advanced statistical analysis, such as scenario analysis and back-testing, as per requirements.
  • Created a profit-and-loss report for the collateral desk detailing profit at counterparty, trade, book, and desk levels of granularity.
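
A minimal R sketch of Brownian-bridge interpolation of missing values, in the spirit of the bullet above; the endpoints, volatility, and gap times are invented examples, and the conditional mean and variance follow the standard Brownian-bridge recursion.

    # Fill a gap by sampling each missing point conditional on the last
    # filled value and the known right endpoint (Brownian bridge).
    set.seed(1)
    t0 <- 0; t1 <- 5        # times of the known observations
    x0 <- 100; x1 <- 104    # known values at t0 and t1
    sigma <- 0.8            # assumed volatility of the underlying process
    t_missing <- 1:4        # times to interpolate

    x_prev <- x0; t_prev <- t0
    filled <- numeric(length(t_missing))
    for (i in seq_along(t_missing)) {
      t <- t_missing[i]
      mean_t <- x_prev + (x1 - x_prev) * (t - t_prev) / (t1 - t_prev)
      var_t  <- sigma^2 * (t - t_prev) * (t1 - t) / (t1 - t_prev)
      filled[i] <- rnorm(1, mean_t, sqrt(var_t))
      x_prev <- filled[i]; t_prev <- t
    }
    filled   # simulated values consistent with both endpoints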

Environment: R, MS Excel, Tableau Desktop 8.3, PL/SQL

Confidential

Data Analyst

Responsibilities:

  • Created the database from existing raw data; organized the data into the required type and format for further manipulation.
  • Performed statistical analysis and generated reports, listings, and graphs using SAS/Base, SAS/Macros, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, and SAS/Access.
  • Integrated SAS datasets into Excel using Dynamic Data Exchange (DDE), using SAS to analyze data and produce statistical tables, listings, and graphs for reports.
  • Used SAS/ODS to format HTML and RTF reports.
  • Wrote SAS macros for data cleaning, reporting, and support of routine processing.
  • Created and maintained ad hoc SAS programs/macros for validation, extraction, presentation, manipulation, analysis, and reporting.
  • Used SAS/EG in a multi-user environment for intermediate data manipulation, analysis, and summary statistics.
  • Optimized existing code and automated SAS programs to improve reporting efficiency.
  • Extracted data from the clinical database and prepared customized analysis datasets for specific reporting needs.
  • Transferred and migrated data between platforms for further analysis; extracted data from Oracle via ODBC using the SQL pass-through facility or the LIBNAME method.
  • Responsible for proper coding, documentation, and validation of SAS programs/macros/procedures to produce standardized displays.

Environment: UNIX SAS/Base, SAS/Macros, SAS/Graph, SAS/Stat, SAS/SQL, SAS/ODS, SQL Server, MS Excel, MS Access.

Confidential

Programmer Analyst

Responsibilities:

  • Learned the C programming language and completed a class project applying programming concepts.
  • Learned data structure concepts and SQL Server; applied data modeling concepts in a practice project on the SQL Server platform.
  • Developed SQL queries for data analysis and data extraction.

Environment: SQL Server, MS Excel, MS Access, C
