R Developer - Data Scientist Resume
SUMMARY
- Data Scientist and R/Python Developer with 8 years of hands-on experience. Strong skills in analytics, statistical modeling, databases, management and predictive analytics, Six Sigma DMAIC methodology, R, Python, SAS, Minitab, SPSS, advanced MS Excel, OpenOffice, Java, and SQL.
- Expertise in statistical data analysis, including transforming business requirements into analytical models and designing algorithms and strategic solutions that scale across massive volumes of data.
- Proficient in statistical methods such as regression models, time series, prediction models, hypothesis testing, and confidence intervals, and in interpreting machine learning models.
- Expert in R and Python scripting, using NumPy for statistical functions, Matplotlib for visualization, and pandas for organizing data.
- Experience using R and Python packages such as ggplot2, caret, dplyr, Shiny, rjson, plyr, SciPy, and scikit-learn.
- Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards with tools such as Tableau.
- Experience with datasets of varying size and complexity, including both structured and unstructured data. Piped and processed massive data streams in distributed computing environments such as Hadoop to facilitate analysis.
- Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis and modeling
- Domain expertise in the banking, financial, software, and telecommunications industries, covering analysis, predictive modeling, forecasting, business intelligence, training, and quantitative project management
- Professional working experience with machine learning algorithms such as linear regression, logistic regression, Naive Bayes, decision trees, clustering, and Principal Component Analysis.
- Good understanding of software development life cycle methodologies and processes.
- Acquired knowledge of Six Sigma DMAIC methodology by working directly with customers on multiple projects
- Strong knowledge of CMMI process areas and project auditing
- Experienced in writing complex SQL queries, including stored procedures, triggers, joins, and subqueries.
- Experience in interpreting business problems and providing solutions using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
- Experienced in working with teams across multiple locations.
- Experienced in gathering and analyzing business and system requirements. Prepared case studies that led to project improvements
- Strong written and oral communication skills for giving presentations to non-technical stakeholders.
TECHNICAL SKILLS
Languages: Python (NumPy, pandas, Matplotlib, scikit-learn, TensorFlow), R (data.table, dplyr, lubridate, ggplot2, caret), SQL, Transact-SQL, Excel VBA, SAS.
Database: MS SQL Server, Oracle SQL Developer, MS Access.
Data Visualization Tools: Tableau, Power BI
Big Data technologies: Hadoop, Hive, and PySpark
Agile tools: Jira and Confluence
Version control tools: Git, TortoiseGit, Bitbucket
Analytical and Reporting tools: Excel and Alteryx
PROFESSIONAL EXPERIENCE
Confidential
R Developer - Data Scientist
Responsibilities:
- Develop and maintain complex PPNR, credit risk (probability of default, loss given default, and exposure at default), and NIE models using R, Python, RStudio, Oracle SQL, Tableau, and ETL technologies
- Establish a quantitative basis for model modification and execution, drawing on mathematics and statistical programming in R and Python modules such as pandas, NumPy, Matplotlib, and scikit-learn.
- Perform exploratory data analysis in R and generate graphs and charts for analyzing the data using R and Python libraries.
- Perform root cause analysis utilizing credit and market risk modelling techniques
- Review existing mathematical and statistical methods and respond to IT service requests for change impact analysis and to business requests for analysis
- Implement coding changes based on root cause analysis and resolution plans
- Understand client requirements and build ETL solutions with Alteryx jobs that implement all business rules
- Manage the stress testing platform for scenarios published under DFAST (adverse and severely adverse scenarios)
- Define and evolve the 14A schedules for monthly and yearly reporting using extensive capital planning knowledge.
- Analyze macroeconomic variables for feature selection in model development and contribute to architectural decisions at the department and bank-wide level using mathematical techniques.
- Collaborate with SIT, UAT, and development teams to build implementation plans and strategy
- Transform non-standard file formats into a standard format consumable by the model execution system using Alteryx Designer tools.
- Perform complex data blending with Alteryx, extract data from source systems (databases, Excel files, etc.), and develop and test Alteryx workflows.
- Work closely with customers, cross-functional teams, research scientists, software developers, and business teams in an Agile/Scrum work environment to drive data model implementations and algorithms into practice.
Environment: R, Alteryx, Java, Python, GitHub, Oracle 11g, SQL, Excel
Confidential
Data Scientist - R/Python
Responsibilities:
- Worked as part of the credit and quantitative team on credit risk analyses to determine the factors that explain the risk associated with each customer.
- Built a binary logistic regression model to classify whether a customer is potentially bad
- Derived credit scores and credit score assessments from statistical analysis of historical data
- Forecast customer transactions using time series analysis with R and Python libraries
- Developed a character and capacity model to predict a customer's capacity to repay based on various capacity and character indicators
- Worked with datasets of varying size and complexity, including both structured and unstructured data. Piped and processed massive data streams in distributed computing environments such as Hadoop to facilitate analysis
- Worked on Hadoop analysis and was responsible for building scalable distributed data solutions using Hadoop
- Model development, model maintenance and model validation
- Worked with time series models, selecting ARIMA depending on trend, seasonality, and cyclicity
- Ran predictive models on banking data and reviewed quotes to support decisions on requested amounts using quantitative information.
- Gave presentations to management and project team members on the models and guided the project team on model maintenance and forecasting
- Provided training to management and project team members on statistical concepts and models.
Environment: R, Python, Java, Hadoop, Oracle 11g, SQL, PostgreSQL, Excel
Confidential
Data Scientist - R/Python
Responsibilities:
- Carried out data processing and statistical techniques such as sampling, estimation, hypothesis testing, time series, correlation, and regression analysis using R.
- Applied various data mining techniques, including linear regression, logistic regression, classification, and clustering.
- Collected large data sets using Python scripting and Spark SQL
- Studied the existing ETL (extract, transform, and load) workflows and assisted in building workflows
- Responsible for building scalable distributed data solutions using Hadoop
- Involved in loading data from the UNIX file system into HDFS
- Performed data analysis and data profiling using complex SQL queries on various source systems, including Oracle 10g/11g and SQL Server 2012.
- Derived independent variables from financial ratios for modelling and forecasting the data using time series analysis in R and Python
- Developed a character and capacity model to predict a customer's capacity to repay based on various capacity and character indicators
- Model development, model maintenance and model validation
Environment: R, Python, Java, Hadoop, UNIX, Oracle 11g, SQL, PostgreSQL, Excel