Data Scientist Resume
Oakland, CA
SUMMARY
- 8 years of work experience in data analysis and statistical modeling, with extensive use of SAS and R.
- Proficient in R Commander, RStudio, Base SAS, SAS/Macros, SAS/Stat, SAS/Graph, and SAS/SQL.
- Expert in Hypothesis testing, ANOVA, and Linear and Logistic Regression Analysis.
- Adept in factor analysis, decision trees, and clustering techniques (K-means, hierarchical, DBSCAN).
- Accomplished in text analytics using the Naive Bayes classification method in RStudio.
- Experienced in causal and mechanistic analysis to identify key performance indicators (KPIs).
- Proficient in Survey Design, Questionnaire Design, Design of Experiments, and Conjoint Analysis.
- Capable of performing time series analysis using ARIMA models.
- Adept at both structured programming in the SAS DATA step and dynamic programming using SAS Macros.
- Extensive experience using advanced statistical PROCs such as ANOVA, GLM, and UNIVARIATE.
- Deep understanding of Statistical Modeling, Multivariate Analysis and Standard Procedures. Familiar with model testing, problem analysis, model comparison and validation.
- Familiar with a large number of SAS functions and SAS data step options.
- Accustomed to shell scripting for handling SAS files and managing SAS programs.
- Strong understanding of data warehousing concepts such as fact tables, dimension tables, star and snowflake schemas, metadata, and data marts.
- Familiar with collecting data from various databases and cleaning it for statistical analysis and modeling.
- Proficient in using SQL to manipulate data: query expressions, join statements, subqueries, etc.
- Proficient in Python scripting; worked with statistical functions in NumPy, visualization in Matplotlib, and data organization in Pandas.
- Used scikit-learn packages in Python for predictions.
- Proficient in boosting algorithms such as Gradient Boosting, AdaBoost (adaptive boosting), and XGBoost.
- Used dimensionality reduction methods such as PCA (Principal Component Analysis) and factor analysis. Implemented machine learning methods including Random Forests (a bootstrap-aggregation classifier), K-Means clustering, KNN (k-nearest neighbors), Naive Bayes, SVM (support vector machines), decision trees, and linear and logistic regression; see the sketch after this list.
- Considerable understanding of RDBMS (relational database management systems), OLAP, OLTP, and querying via T-SQL.
- Knowledge of the basic constructs of HDFS (Hadoop Distributed File System) and MapReduce, and of tools like DMX-h for operations on a Hadoop cluster.
- Strong analytical and problem-solving skills, along with the ability to understand current business processes and implement efficient solutions.
- Ability to present complex data and analytics to non-analytical audiences.
- Detail-oriented professional, ensuring the highest level of quality in reports and data analysis.
- Advanced written and verbal communication skills.
- Expert in innovation and formulation of new ideas and predictive models.
- Proven ability to multi-task and engage with stakeholders at various levels to process large-scale (Big Data) workloads on enterprise systems.
- Member of an R user group (BARUG, the Bay Area R User Group).
- Member of an analytics club.
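A minimal scikit-learn sketch of the PCA-plus-classifier workflow referenced above; the synthetic data and all parameter choices are illustrative assumptions, not taken from any project described in this resume.

    # Illustrative sketch: PCA for dimensionality reduction feeding a
    # Random Forest classifier; the data here is synthetic.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = Pipeline([
        ("pca", PCA(n_components=10)),  # reduce to 10 principal components
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ])
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))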
TECHNICAL SKILLS
BI and Visualization: Tableau Desktop 8.3, R, Python and SAS
Databases: MS SQL Server, MS Access, and MySQL
Operating Systems: Windows
Other Tools: MS Office (including Excel, Word, PPT, and Access)
Statistical Tools: RStudio, NumPy (Python), Base SAS, SAS/Macros, SAS/Graph, SAS/Stat
Programming Languages: R, Python, and SQL
Statistical Concepts: Inference methods (chi-square/t-test), ANOVA, Regression, Factor Analysis, Logistic Regression, Text Analytics (Naive Bayes), Decision Trees, Clustering (K-means/Hierarchical), and Forecasting/Time Series Analysis (ARIMA model)
PROFESSIONAL EXPERIENCE
Confidential, Oakland, CA
Data Scientist
Responsibilities:
- Involved in gathering requirements while uncovering and defining multiple dimensions.
- Extracted data from one or more source files and Databases.
- Interacted continuously with the Marketing and Finance teams to obtain data and ensure data quality.
- Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
- Explored the raw data through exploratory data analysis (classification, splitting, cross-validation).
- Converted raw data to processed data by merging datasets and identifying outliers, errors, trends, missing values, and distributions.
- Utilized visualization techniques such as histograms, bar plots, pie charts, scatter plots, and box plots to assess the condition of the data.
- Conducted data exploration (dplyr, tidyr) to look for trends, patterns, groupings, and deviations in the data as part of data diagnostics.
- Designed various reports using pivot tables and charts such as bar plots, pie charts, and histograms.
- Identified the financial and non-financial independent attributes that were to be used in modeling.
- Developed segmentation trees (optimization, pruning, modeling) to identify high-risk segments of the population.
- Performed multi-dimensional segmentation analysis to discover business rules and finalize the segmentation procedure.
- Used logistic regression to estimate the probabilities of applicants being defaulters or non-defaulters; see the sketch below this entry.
- Identified key performance indicators (KPIs) among the given attributes.
- Performed what-if scenario analysis to discover effective, implementable ways of reducing loan defaults.
- Maintained a log of all the iterations performed in R during the data modeling process.
- Built a scoring model to score loan applicants' propensity to default, with a high degree of accuracy in capturing defaulters.
- Created an ROI dashboard for campaign spending and measured its efficacy.
- Led mid-sized teams for production support and handled multiple tasks, drawing on strong interpersonal communication, technical aptitude, and the ability to adapt quickly to new environments.
Environment: R, SQL-Server, Microsoft Excel and Tableau.
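For illustration, a hypothetical Python sketch of the default-probability modeling step described above (the project itself used R); the column names and the synthetic data-generating rule are assumptions.

    # Hypothetical sketch: logistic regression scoring propensity to default.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for applicant data; columns are hypothetical.
    rng = np.random.default_rng(0)
    n = 1000
    df = pd.DataFrame({
        "income": rng.normal(60_000, 15_000, n),
        "loan_amount": rng.normal(20_000, 8_000, n),
        "credit_score": rng.normal(680, 60, n),
    })
    # Assumed label: default more likely for large loans and low scores.
    logit = df["loan_amount"] / 10_000 - df["credit_score"] / 200
    df["defaulted"] = (logit + rng.normal(0, 1, n) > 0).astype(int)

    X = df[["income", "loan_amount", "credit_score"]]
    X_train, X_test, y_train, y_test = train_test_split(
        X, df["defaulted"], random_state=0
    )
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]  # per-applicant default probability
    print("AUC:", roc_auc_score(y_test, probs))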
Confidential, Oakland, CA
Research Scientist
Responsibilities:
- Collected the database for the proposed research project; accumulated raw data and filtered it into an RDBMS.
- Performed chi-square and ANOVA tests to identify significant differences between data samples (see the sketch below this entry). Performed classification, clustering, and time series analysis in collaboration with research faculty.
- Contributed to identifying grants and funding opportunities for projects, as well as maintaining the grant life cycle.
- Coordinated with management and diverse academic and technical staff to identify challenges, develop maintenance strategies, and build a platform for projects across diverse domains (i.e., Finance, Marketing, and IT research).
- Coordinated with the research faculty team to identify and develop trends in business research and sustain long-term projects.
- Created research reports for projects.
Environment: R, SQL, MS Excel.
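A minimal Python sketch of the chi-square and ANOVA significance tests mentioned above, run on synthetic samples; nothing here is drawn from the actual research data.

    # Minimal sketch: significance tests via scipy.stats on synthetic data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Chi-square test of independence on a 2x2 contingency table.
    table = np.array([[30, 10], [20, 40]])
    chi2, p_chi, dof, _ = stats.chi2_contingency(table)
    print(f"chi-square p-value: {p_chi:.4f}")

    # One-way ANOVA across three independent samples.
    a = rng.normal(0.0, 1, 50)
    b = rng.normal(0.5, 1, 50)
    c = rng.normal(1.0, 1, 50)
    f_stat, p_anova = stats.f_oneway(a, b, c)
    print(f"ANOVA p-value: {p_anova:.4f}")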
Confidential, San Leandro, CA
Data Scientist
Responsibilities:
- Collected a database of item sales covering all aspects; cleaned, filtered, and transformed the data into the specified format.
- Prepared the workspace for R Markdown reporting.
- Performed data analysis and statistical analysis; generated reports, listings, and graphs.
- Initiated test analysis to understand the potential of the insurer.
- Embedded code, i.e., wove code and narrative into a single document format, rendering the document to create a finished output.
- Customized the process, opening the door to automated, targeted reporting.
- Responsible for all data reporting, data mining activities and fraud detection activities including data prep and design, model development and reporting results.
- Used R to assess product performance via classification, tree maps, and regression models, along with visualizing data for interactive understanding and decision making.
- Identified outliers, anomalies, trends, and fraudulent behavior.
- Used a combination of R and NoSQL for modeling and analysis, and deployed the results in real time.
- Customized R code chunks, including labelling and reusing them.
- Used forecasting with ARIMA models for time-series analysis of customer behavior and purchases; see the sketch below this entry.
- Provided insights on effectively running marketing campaigns, including direct mail, email, mobile, and other digital channels.
- Documented all programs and procedures to ensure an accurate historical record of work completed on the assigned project, as well as to improve quality and efficacy.
- Produced quality reports for business team and business data manager.
- Performed SEO optimization to maintain top placement in web search results.
Environment: R, SAS/Macros, SQL, NoSQL, MS Excel, MS Access, Tableau.
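A brief Python sketch of the ARIMA-style forecasting described above (the project used R); the synthetic monthly series and the (1, 1, 1) order are illustrative assumptions.

    # Illustrative sketch: ARIMA forecast of a synthetic monthly sales series.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    idx = pd.date_range("2014-01-01", periods=48, freq="MS")
    sales = pd.Series(100 + np.arange(48) * 2 + rng.normal(0, 5, 48), index=idx)

    model = ARIMA(sales, order=(1, 1, 1)).fit()
    print(model.forecast(steps=6))  # six-month-ahead forecast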
Confidential
Financial Reporting
Responsibilities:
- Scraped both structured and unstructured data from disparate sources (web pages and public repositories)
- Used SQL joins extensively on SQL Developer to fetch data from MS SQL database
- Developed multiple prepared statements and stored procedures for efficient database updates and speedup
- Performed outlier detection analysis on data as part of Dodd-Frank requirements
- Produced bond and attribute coverage of financial instruments present in the database on a regular basis
- Developed modules to enhance time-series and term-structure functionalities
- Performed principal component analysis using PROC FACTOR to develop portfolio-hedging logic
- Used proprietary analytics library to develop exponential and cubic spline term-structures for various bond markets
- Used SAS to perform data mining/prescriptive analysis on bond data to identify under- or over-valued bonds
- Developed scripts that updated time-series databases with trade data from internal trading systems and external sources (flat files for futures data, risk from dbRisk system and CSA rate from FUSE systems)
- Updated database using SQL queries for front-end manipulation (hide/show of various columns, markets, sectors, attributes, sources)
- Used Proc SQL, Proc Report, Proc Means, Proc Freq, Proc Summary, Proc Contents, and Proc Tabulate extensively to create sector-based reports for the credit research desk
- Held discussions with sales and research teams on timelines/deliverables for feature requests
- Performed ad-hoc analysis on trade ideas for clients as well as the sales team
- Presented application features to various audiences by reproducing trade ideas from research journals
- Involved in Data preparation over multiple iterations with inputs from senior analysts for the problem at hand
- Assisted in creating fact and dimension tables in star schema model based on requirements
- Implemented algorithms like Brownian bridge construction to interpolate missing values; see the sketch below this entry
- Created tables in an Oracle database and stored rich/cheap data using SAS PROC SQL
- Developed scripts and ad-hoc tests to ascertain data validity and correct attribute calculation
- Performed statistical and predictive analysis on corporate market data to identify trends and buy-sell opportunities
- Optimized data access by efficient SQL coding for high throughput and low latency
- Executed rich reports after close of business to give users instant access to the previous day's reports
- Performed correlation and time-series analysis to recommend pairs trading strategies to management
- Performed advanced statistical analysis like scenario analysis and back testing as per requirements
- Created profit and loss report for collateral desk detailing profit at counterparty level, trade level, book level and desk level granularities
Environment: R, MS Excel, Tableau Desktop 8.3, PL/SQL
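A minimal Python sketch of Brownian-bridge interpolation for a missing observation, as referenced above; the function name and the sigma volatility parameter are hypothetical, not taken from the original system.

    # Hypothetical sketch: sample a missing value on a Brownian bridge
    # pinned at two observed points (t0, x0) and (t1, x1).
    import numpy as np

    def brownian_bridge_fill(t0, x0, t1, x1, t, sigma=1.0, rng=None):
        rng = rng or np.random.default_rng()
        mean = x0 + (t - t0) / (t1 - t0) * (x1 - x0)      # linear pull to endpoints
        var = (t - t0) * (t1 - t) / (t1 - t0) * sigma**2  # bridge variance
        return rng.normal(mean, np.sqrt(var))

    # Example: fill a missing day-3 price between observed day-0 and day-5 prices.
    print(brownian_bridge_fill(0, 100.0, 5, 103.0, 3, sigma=0.8))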
Confidential
Data Analyst
Responsibilities:
- Created the Database from raw existing data. Organized the data to required type and format for further manipulation.
- Performed statistical analysis and generated reports, listings, and graphs using SAS/Base, SAS/Macros, SAS/Stat, SAS/Graph, SAS/SQL, SAS/ODS, and SAS/Access.
- Used various SAS procedures such as PROC REPORT, UNIVARIATE, TABULATE, FREQ, MEANS, TRANSPOSE, SUMMARY, and DATA _NULL_ steps.
- Integrated SAS datasets into Excel using Dynamic Data Exchange, using SAS to analyze data and produce statistical tables, listings, and graphs for reports.
- Used SAS/ODS to format HTML and RTF reports.
- Wrote SAS macros for data cleaning, reporting, and supporting routine processing.
- Created and maintained ad hoc SAS programs/macros for validation, extraction, presentation, manipulation, analysis, and reporting.
- Used SAS/EG in multi-user environment for intermediate data manipulation, analysis and summary statistics.
- Optimized existing code for efficiency and automation of SAS Programs to improve reporting efficiency.
- Pulled data from the clinical database and prepared customized analysis datasets for specific reporting needs.
- Transferred and migrated data from one platform to another for further analysis; extracted data from Oracle via ODBC and the SQL pass-through facility or LIBNAME method.
- Responsible for proper coding, documentation, and validation of SAS programs/macros/procedures to produce standardized displays.
Environment: UNIX, SAS/Base, SAS/Macros, SAS/Graph, SAS/Stat, SAS/SQL, SAS/ODS, SQL Server, MS Excel, MS Access.
Confidential
Programmer Analyst
Responsibilities:
- Introduced to the C programming language and completed a class project applying core programming concepts.
- Learned data structure concepts and SQL Server; applied data modeling concepts in a practice project on the SQL Server platform.
- Learned Base SAS and applied it in a class project that involved reading raw data, creating tables in a SQL library, and producing reports using SQL programming.
- Developed SQL queries for data analysis and data extraction.
Environment: SQL Server, MS Excel, MS Access, C