Data Scientist Resume
Cincinnati, OH
PROFESSIONAL SUMMARY:
- 7+ years of working experience as a Data Analyst and Data Scientist, with high proficiency in Predictive Modeling, Text Mining, and Machine Learning.
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
- Proficient in Python, R, C/C++, SQL, Tableau.
- Experience in univariate and multivariate analysis, model testing, problem analysis, model comparison and validation, ANOVA, and Regression Analysis.
- Expertise in writing complex SQL queries to obtain filtered data for analysis purposes.
- Working knowledge of implementing tree-based models such as Boosting and Random Forest.
- Experience in using model pipelines to automate tasks and move models into production quickly.
- Skilled in System Analysis, Dimensional data Modeling, Database Design and implementing RDBMS specific features.
- Experience in using Tableau to create dashboards and tell quality data stories.
- Worked with various Python libraries: NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning and deep learning, and NLTK for NLP.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) applied to Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor Analysis/PCA, and Ensembles (see the model-comparison sketch after this summary).
- Experience in Text Mining and good knowledge of NLP components such as Natural Language Understanding (NLU) and Natural Language Generation (NLG).
- Knowledge of Natural Language Processing (NLP) techniques such as Tokenization, Stemming, Lemmatization, Count Vectorization, and TF-IDF.
- Strong software application skills (MS Excel, Access, Word, PowerPoint, Project)
- Deep understanding of statistical analysis, including p-values, A/B testing, Hypothesis testing, the Central Limit Theorem, Bayes' Theorem, and probability distributions.
- Highly skilled in using Hadoop, Spark, and Hive for analysis, extraction, and summarization of data in the infrastructure.
- Experience in writing SQL queries, Stored procedures, Functions and Triggers by using PL/SQL.
- Expertise in Oracle and MySQL technologies. Good exposure to planning and executing all phases of the software development life cycle, including analysis, design, development, and testing.
- Hands-on experience in implementing Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, and Factor Analysis.
- Knowledge in Cloud services such as Microsoft Azure and Amazon AWS.
- Strong problem-solving skills, good communication skills, and a good team player.
- Practiced in clarifying business requirements, performing gap analysis between goals and existing procedures/skillsets, and designing process and system improvements to increase productivity and reduce costs.
- Strong understanding of Agile and Scrum Software Development Life Cycle Methodologies.
- Involved in issue resolution and Root Cause Analysis.
- Experience in working with different operating systems: Windows, UNIX, and Linux.
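Below is a minimal, illustrative sketch of the kind of model comparison referenced in this summary. It uses scikit-learn on a synthetic dataset, so the data and scores are hypothetical; XGBoost is omitted to keep the sketch dependency-free, and GaussianNB stands in for the Bayesian methods listed above.

```python
# Hedged sketch: cross-validated comparison of several common classifiers
# on synthetic data (a stand-in for any real project dataset).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# 5-fold cross-validation gives a more stable estimate than a single split
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```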
TECHNICAL SKILLS:
Programming Languages: SQL, T-SQL, PL/SQL, Java, C, C++, XML, HTML, MATLAB, DAX, Python, R
Statistical Analysis: R, Python, MATLAB, Minitab, Jupyter
RDBMS: Oracle, SQL Server, MS Access, Teradata
Data Modeling: ERwin, TOAD, MS Visio
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Visual Studio, R-Studio
Big Data: Hadoop, Hive, MapReduce, Sqoop, Impala
IDEs: NetBeans, Eclipse, PyCharm, PyScripter, PyStudio
Operating System: LINUX, Windows
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Kimball and Inmon data warehousing approaches, Waterfall Model
PROFESSIONAL EXPERIENCE:
Confidential - Cincinnati, OH
Data Scientist
Responsibilities:
- Participated in all phases of the project life cycle, including data collection, data mining, data cleaning, model development, validation, and report creation.
- Responsible for reporting findings, using gathered metrics to infer and draw logical conclusions about past and future behavior.
- Performed Time Series Analysis, Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM modeling.
- Used Principal Component Analysis and Factor Analysis in feature engineering to analyze high-dimensional data in Python (see the PCA sketch at the end of this role).
- Worked on classification/scripting of multiple attribute models, applying SVM and Regular Expressions to product features such as title and description, and predicting product attribute values using Python.
- Used R machine learning libraries to build and evaluate different models.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Collected data needs and requirements by interacting with other departments.
- Created various types of data visualizations using Python and Tableau.
- Communicated the results to the operations team to support better decision-making.
- Developed new and effective analytics algorithms and wrote key pieces of mission-critical source code.
- Extracted patterns in the structured and unstructured data set and displayed them with interactive charts using ggplot2 and ggiraph packages in R.
- Built initial models using supervised classification techniques such as K-Nearest Neighbors (KNN), Logistic Regression, and Random Forests.
- Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
- Built machine learning text classification models for NLP using Python.
- Applied NLP (text mining and analysis, topic modeling, n-grams, and emotion analysis) to extract clinical data from text.
- Used the NLTK package in Python for Natural Language Processing (NLP) tasks (see the NLTK sketch at the end of this role).
- Used R and Python programming to improve the models; upgraded the entire model suite to improve the product.
- Performed data transformations for rescaling and normalizing variables.
- Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relations between different variables.
- Implemented advanced machine learning algorithms, including regression trees and kernel PCA, in Python and R and in other tools and languages as needed.
- Designed a machine learning pipeline to predict and prescribe, and implemented the machine learning scenario for the given data problem.
Environment: Python, R/RStudio, Tableau, PL/SQL
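The PCA bullet above refers to dimensionality reduction for feature engineering; the following minimal sketch shows one common way to do it with scikit-learn. The input matrix is synthetic and the 95% variance threshold is an illustrative assumption, not a value from the project.

```python
# Hedged sketch: PCA-based feature engineering on high-dimensional data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 60))  # stand-in for a high-dimensional dataset

# PCA is sensitive to feature scale, so standardize first
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(f"Reduced from {X.shape[1]} to {pca.n_components_} features")
print("Top explained-variance ratios:", pca.explained_variance_ratio_[:5])
```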
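Several bullets above mention NLTK-based NLP, so here is a minimal sketch of a typical preprocessing pass (tokenization, stopword removal, stemming, lemmatization). The sample sentence is invented, and the one-time corpus downloads are noted as assumptions.

```python
# Hedged sketch: basic NLP preprocessing with NLTK on an invented sentence.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time corpus downloads (uncomment on first run):
# nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

text = "Patients reported mild headaches after receiving the second dose."

# Lowercase word tokens, keeping alphabetic tokens only
tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]

# Drop common English stopwords
stop_words = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop_words]

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print("Stemmed:   ", [stemmer.stem(t) for t in tokens])
print("Lemmatized:", [lemmatizer.lemmatize(t) for t in tokens])
```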
Confidential - Frederick, MD
Data Scientist
Responsibilities:
- Evaluated data analytics opportunities, such as Fraud Detection, to improve the efficiency of the claims handling process.
- Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
- Created statistical models based on researched information to provide conclusions that guide the company and the industry into the future.
- Handled missing data after import and encoded categorical data when needed.
- Split the data into training and test sets, scaling both when necessary (see the preprocessing sketch at the end of this role).
- Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
- Modeled the impact of marketing tactics on sales and then forecast the impact of future sets of tactics.
- Developed SQL code to extract data from various databases
- Used R and Python for Exploratory Data Analysis and hypothesis tests to compare and identify the effectiveness of creative campaigns.
- Implemented different machine learning models, including regression, classification, and clustering.
- Used Python, R, and SQL to create statistical algorithms involving Linear Regression, Logistic Regression, Random Forest, and Decision Trees for estimating risks.
- Developed statistical models to forecast inventory and procurement cycles (see the forecasting sketch at the end of this role).
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Implemented machine learning algorithms using scikit-learn.
- Worked with a range of proprietary, industry-standard, and open-source data stores to assemble, organize, and analyze data.
- Produced visualizations, summary reports, and presentations using R and Tableau.
Environment: R, Python, Tableau, SQL Server
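The missing-data, encoding, splitting, and scaling bullets above describe a standard preprocessing flow; the sketch below shows one way it might look with pandas and scikit-learn. The DataFrame, column names, and fraud label are hypothetical.

```python
# Hedged sketch: impute missing values, encode categoricals, split, and scale.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical claims data with a gap and a categorical column
df = pd.DataFrame({
    "claim_amount": [1200.0, None, 860.0, 4300.0, 975.0, 2100.0],
    "region": ["east", "west", "east", "south", "west", "south"],
    "is_fraud": [0, 0, 0, 1, 0, 1],
})

# Fill numeric gaps with the median, then one-hot encode the categorical column
df["claim_amount"] = df["claim_amount"].fillna(df["claim_amount"].median())
X = pd.get_dummies(df.drop(columns="is_fraud"), columns=["region"])
y = df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```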
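For the inventory/procurement forecasting bullet, here is a minimal sketch using Holt-Winters exponential smoothing from statsmodels; the monthly demand series is invented, and the additive trend/seasonality settings are illustrative modeling choices rather than project parameters.

```python
# Hedged sketch: forecasting monthly inventory demand with Holt-Winters.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Invented monthly demand: base level, upward trend, small Nov/Dec bump
idx = pd.date_range("2015-01-01", periods=36, freq="MS")
demand = pd.Series(
    [100 + i + (10 if i % 12 in (10, 11) else 0) for i in range(36)], index=idx
)

model = ExponentialSmoothing(
    demand, trend="add", seasonal="add", seasonal_periods=12
).fit()

# Forecast six months ahead, roughly one procurement cycle
print(model.forecast(6))
```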
Confidential - Lewisville, TX
Sr. Data Analyst
Responsibilities:
- Worked closely with the Data Governance Office team in assessing the source systems for project Deliverables.
- Used T-SQL queries to pull data from disparate systems and the Data Warehouse in different environments.
- Used Data Quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies.
- Presented DQ analysis reports and scorecards on all the validated data elements to the business teams and stakeholders.
- Involved in defining source-to-target data mappings, business rules, and data definitions.
- Extensively used open-source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning models.
- Interacted with the Business teams and Project Managers to clearly articulate anomalies, issues, and findings during data validation.
- Performed Data Validation/Data Reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects.
- Extracted data from different databases per business requirements using SQL Server Management Studio (SSMS).
- Wrote complex SQL queries to validate data against different kinds of reports generated by Cognos.
- Extensively used MS Excel (Pivot Tables, VLOOKUP) for data validation.
- Interacted with the ETL and BI teams to understand and support various ongoing projects.
- Generated weekly and monthly reports for various business users according to business requirements; manipulated/mined data from database tables (Redshift, Oracle, Data Warehouse).
- Created automated metrics using complex databases.
- Provided analytical network support to improve quality and standardize work results.
- Used version control (Git).
- Created data pipelines using big data technologies such as Hadoop and Spark (see the Spark sketch at the end of this role).
- Created statistical models in distributed and standalone environments to build various diagnostic, predictive, and prescriptive solutions.
- Utilized a broad variety of statistical packages and platforms, including SAS, R, MLlib, Hadoop, Spark, MapReduce, Pig, and others.
- Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
- Provided input and recommendations on technical issues to BI Engineers, Business and Data Analysts, and Data Scientists.
Environment: AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau
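The Hadoop/Spark pipeline bullet above can be illustrated with a small PySpark sketch; the file paths and column names here are hypothetical stand-ins for the project's actual sources.

```python
# Hedged sketch: a simple extract-transform-load pipeline in PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_pipeline").getOrCreate()

# Extract: read raw data from a hypothetical landing zone
raw = spark.read.csv("/data/landing/claims.csv", header=True, inferSchema=True)

# Transform: deduplicate, filter bad rows, derive a weekly grain
clean = (
    raw.dropDuplicates()
       .filter(F.col("claim_amount") > 0)
       .withColumn("week", F.weekofyear(F.col("claim_date")))
)
weekly = clean.groupBy("week").agg(
    F.count("*").alias("claims"),
    F.sum("claim_amount").alias("total_amount"),
)

# Load: write the aggregate out as Parquet for downstream reporting
weekly.write.mode("overwrite").parquet("/data/marts/claims_weekly")
```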
Confidential - San Antonio, TX
Data Analyst
Responsibilities:
- Gathered requirements by interacting closely with business users and multiple technical teams to design and develop workflows for new functional pieces.
- Collaborated with various business stakeholders to create the Business Requirement Document (BRD) and translated gathered high-level requirements into a Functional Requirement Document (FRD) to assist implementation-side SMEs and developers, along with data flow diagrams, user stories, and use cases.
- Part of a Scrum Agile team.
- Experienced in SQL joins, subqueries, tracing, and performance tuning for faster-running queries (see the T-SQL sketch at the end of this role).
- Extensively used joins and subqueries for complex queries involving multiple tables from different databases.
- Tuned stored procedures and functions to optimize query performance.
- Successfully implemented indexes on tables for optimum performance.
- Developed complex stored procedures using T-SQL to generate ad-hoc reports within SQL Server Reporting Services.
- Strong analytical and problem-solving skills coupled with interpersonal and leadership skills.
- Interacted with clients to gather requirements and assist them with immediate workarounds for application issues.
- Developed detailed test scenarios as documented in business requirements documents, assisted the test team with UAT.
- Used Tableau in real time for analytical purposes.
Environment: MySQL, MS PowerPoint, MS Access, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, MDM, Teradata
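The join/subquery and T-SQL bullets above can be illustrated with a short sketch that runs a T-SQL query from Python via pyodbc; the connection string, tables, and columns are hypothetical.

```python
# Hedged sketch: a T-SQL join with a subquery, executed through pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=sales;Trusted_Connection=yes;"
)

# Customers whose orders exceed the average order total (hypothetical schema)
query = """
SELECT c.customer_id, c.name, o.order_total
FROM dbo.customers AS c
JOIN dbo.orders AS o ON o.customer_id = c.customer_id
WHERE o.order_total > (SELECT AVG(order_total) FROM dbo.orders)
ORDER BY o.order_total DESC;
"""

cursor = conn.cursor()
for row in cursor.execute(query):
    print(row.customer_id, row.name, row.order_total)

cursor.close()
conn.close()
```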
Confidential
Analyst
Responsibilities:
- Assisted Product Manager in documenting business / functional requirements and defining product features
- Performed various SQL/database activities to support data review, data mapping, data quality review, data validation, and report generation
- Participated in the development of end-to-end test plan with regression strategy
- Developed QA test plans, test conditions, test cases, and test scripts to ensure complete and adequate testing, as well as coordinating/conducting user acceptance testing (UAT)
- Monitored testing progress and performance of testing including open defects
- Performed functional, UI, performance and back-end testing, in conjunction with the QA team
- Involved in analyzing functional specifications and created manual test plans and test cases
- Collaborated with the development team to solve the problems encountered in the test scenario runs
- Documented and communicated test results to the relevant stakeholders including Product Manager.
Environment: UNIX, MS Visio, MS Project, MS SharePoint, HTML, Windows, Oracle, PL/SQL