
Data Scientist Resume


Cincinnati, OH

PROFESSIONAL SUMMARY:

  • 7+ years of working experience as a Data Analyst and Data Scientist, with high proficiency in Predictive Modeling, Text Mining, and Machine Learning.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions for various business problems, and generating data visualizations using R, Python, and Tableau.
  • Proficient in Python, R, C/C++, SQL, Tableau.
  • Experience in Univariate and Multivariate Analysis, model testing, problem analysis, model comparison, model validation, ANOVA, and Regression Analysis.
  • Expertise in writing complex SQL queries to obtain filtered data for analysis purposes.
  • Working knowledge in implementing tree-based models such as Boosting and Random Forest.
  • Experience in using Model Pipelines to automate the tasks and put models into production quickly.
  • Skilled in System Analysis, Dimensional data Modeling, Database Design and implementing RDBMS specific features.
  • Experience in using Tableau, creating dashboards, and quality storytelling.
  • Worked with various Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning and deep learning, and NLTK for NLP.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis Testing, Factor Analysis/PCA, and Ensembles.
  • Experience in Text Mining and good knowledge on NLP components such as Natural Language Understanding (NLU) and Natural Language Generation (NLG).
  • Knowledge in Natural Language Processing (NLP) techniques such as Tokenization, Stemming, Lemmatization, Count Vectorization, and TF-IDF.
  • Strong software application skills (MS Excel, Access, Word, PowerPoint, Project)
  • Deep understanding of statistical analysis, including p-values, A/B testing, Hypothesis Testing, the Central Limit Theorem, Bayes' Theorem, and probability distributions.
  • Highly skilled in using Hadoop, Spark, and Hive for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Experience in writing SQL queries, Stored procedures, Functions and Triggers by using PL/SQL.
  • Expertise in Oracle and MySQL technologies. Good exposure to planning and executing all phases of the software development life cycle, including analysis, design, development, and testing.
  • Hands on experience in implementing Forecasting/ Predictive Analytics, Segmentation methodologies, Regression based models, Factor analysis.
  • Knowledge in Cloud services such as Microsoft Azure and Amazon AWS.
  • Strong problem-solving and communication skills; a good team player.
  • Practiced in clarifying business requirements, performing gap analysis between goals and existing procedures/skillsets, and designing process and system improvements to increase productivity and reduce costs.
  • Strong understanding of Agile and Scrum Software Development Life Cycle Methodologies.
  • Involved in issue resolution and Root Cause Analysis.
  • Experience in working with different operating systems: Windows, UNIX, and Linux.
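The tokenization and TF-IDF techniques listed above can be sketched in plain Python. This is an illustrative example only, not code from any of the projects below; the function names (`tokenize`, `tf_idf`) are my own.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks)
        # tf-idf = (term frequency) * log(N / document frequency)
        weights.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tf_idf(docs)
# "the" appears in every document, so its idf (and weight) is zero.
```

In practice this step is usually done with a library vectorizer (e.g. scikit-learn's `TfidfVectorizer`) rather than by hand; the sketch just makes the weighting explicit.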

TECHNICAL SKILLS:

Programming Languages: SQL, T-SQL, PL/SQL, Java, C, C++, XML, HTML, MATLAB, DAX, Python, R

Statistical Analysis: R, Python, MATLAB, Minitab, Jupyter

RDBMS: Oracle, SQL Server, MS Access, Teradata

Data Modeling: Erwin, TOAD, MS Visio

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Visual Studio, R-Studio

Big Data: Hadoop, Hive, MapReduce, Sqoop, Impala

IDEs: NetBeans, Eclipse, PyCharm, PyScripter, PyStudio

Operating System: LINUX, Windows

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Ralph Kimball and Bill Inmon, Waterfall Model.

PROFESSIONAL EXPERIENCE:

Confidential - Cincinnati, OH

Data Scientist

Responsibilities:

  • Participated in all phases of the project life cycle, including data collection, data mining, data cleaning, model development, validation, and report creation.
  • Responsible for reporting findings, using gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Performed Time Series Analysis, Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM modeling.
  • Used Principal Component Analysis & Factor Analysis in feature engineering to analyze high dimensional data in Python.
  • Worked on classification/scripting of multiple attribute models by applying SVM and regular expressions to product features such as title and description, and predicted product attribute values using Python.
  • Used R machine learning library to build and evaluate different models.
  • Implemented rule-based expertise system from the results of exploratory analysis and information gathered from the people from different departments.
  • Collected data needs and requirements by interacting with the other departments.
  • Created various types of data visualizations using Python and Tableau.
  • Communicated the results with operations team for taking best decisions.
  • Developed new and effective analytics algorithms and wrote key pieces of mission-critical source code.
  • Extracted patterns in the structured and unstructured data set and displayed them with interactive charts using ggplot2 and ggiraph packages in R.
  • Built initial models using supervised classification techniques like K-Nearest Neighbor (KNN), Logistic Regression and Random Forests.
  • Used a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
  • Experience in machine learning for NLP text classification using Python.
  • Applied NLP (text mining and analysis, topic modeling, n-grams, and emotion analysis) to extract clinical data from text.
  • Used the NLTK package in Python for Natural Language Processing (NLP) tasks.
  • Used R and Python to improve models; upgraded the models to improve the product.
  • Performed data transformations for rescaling and normalizing variables.
  • Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relationships between variables.
  • Implemented advanced machine learning algorithms including regression trees, kernel PCA, among other methods in Python and R and in other tools and languages as needed.
  • Designed a machine learning pipeline for predictive and prescriptive analytics and implemented it for a given data problem.
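The PCA-based feature engineering described above can be illustrated with a short NumPy sketch. This is a hedged example under my own naming (`pca_reduce`); it is not the project's actual code.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)          # center each feature
    # With full_matrices=False, rows of Vt are the principal axes,
    # ordered by decreasing singular value (explained variance).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # 100 samples, 5 features
Z = pca_reduce(X, 2)                 # reduced to 2 components
```

The same reduction is typically done with `sklearn.decomposition.PCA`; the SVD form above shows what that call computes.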

Environment: Python, R/RStudio, Tableau, PL/SQL

Confidential - Frederick, MD

Data Scientist

Responsibilities:

  • Evaluated data analytics opportunities to improve the efficiency of the claims handling process, such as Fraud Detection.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Created statistical models based on researched information to provide conclusions that guide the company and the industry into the future.
  • Handled missing data after import and encoded categorical data when needed.
  • Split the data into training and test sets, scaling both when necessary.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Measured the impact of marketing tactics on sales and forecast the impact of future sets of tactics.
  • Developed SQL code to extract data from various databases
  • Used R and Python for Exploratory Data Analysis and hypothesis tests to compare and identify the effectiveness of creative campaigns.
  • Implemented different machine learning models for regression, classification, and clustering.
  • Used Python, R, and SQL to create statistical algorithms involving Linear Regression, Logistic Regression, Random Forest, and Decision Trees for estimating risks.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Created and designed reports that will use gathered metrics to infer and draw logical conclusions of past and future behavior.
  • Implemented machine learning algorithms using scikit-learn.
  • Worked with a range of proprietary, industry-standard, and open-source data stores to assemble, organize, and analyze data.
  • Produced visualizations, summary reports, and presentations using R and Tableau.
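The split-and-scale step described above can be sketched as follows; the key point is that scaling statistics are fit on the training set only and then applied to the test set. The helper name `train_test_scale` is illustrative, not from the project.

```python
import numpy as np

def train_test_scale(X, test_frac=0.25, seed=0):
    """Shuffle, split into train/test, and standardize both
    using the training set's mean and standard deviation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(len(X) * (1 - test_frac))
    train, test = X[idx[:cut]], X[idx[cut:]]
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    # Apply the training statistics to both sets to avoid leakage.
    return (train - mu) / sigma, (test - mu) / sigma

X = np.arange(40, dtype=float).reshape(20, 2)
X_train, X_test = train_test_scale(X)
```

In scikit-learn the same pattern is `train_test_split` followed by `StandardScaler.fit_transform` on the training set and `transform` on the test set.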

Environment: R, Python, Tableau, SQL Server

Confidential - Lewisville, TX

Sr. Data Analyst

Responsibilities:

  • Worked closely with the Data Governance Office team in assessing the source systems for project Deliverables.
  • Used T-SQL queries to pull the data from disparate systems and Data warehouse in different environments.
  • Used Data Quality validation techniques to validate Critical Data elements (CDE) and identified various anomalies.
  • Presented DQ analysis reports and scorecards on all validated data elements to business teams and stakeholders.
  • Involved in defining the Source To Target data mappings, business rules, data definitions.
  • Extensively used open-source tools RStudio (R) and Spyder (Python) for statistical analysis and building machine learning models.
  • Interacting with the Business teams and Project Managers to clearly articulate the anomalies, issues, findings during data validation.
  • Performing Data Validation / Data Reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects.
  • Extracting data from different databases as per business requirements using SQL Server Management Studio (SSMS).
  • Writing complex SQL queries for validating the data against different kinds of reports generated by Cognos.
  • Extensively using MS Excel (Pivot tables, VLOOKUP) for data validation.
  • Interacting with the ETL, BI teams to understand / support on various ongoing projects.
  • Generating weekly, monthly reports for various business users according to the business requirements. Manipulating/mining data from database tables (Redshift, Oracle, Data Warehouse).
  • Create automated metrics using complex databases.
  • Providing analytical network support to improve quality and standard work results.
  • Experience with Version Control (Git)
  • Create data pipelines using big data technologies like Hadoop, spark etc.
  • Create statistical models using distributed and standalone approaches to build diagnostic, predictive, and prescriptive solutions.
  • Utilize a broad variety of statistical packages, such as SAS, R, MLlib, Hadoop, Spark, MapReduce, Pig, and others.
  • Interface with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources
  • Provides input and recommendations on technical issues to BI Engineers, Business & Data Analysts and Data Scientists.
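A source-vs-target reconciliation check like the one described above can be sketched with SQLite as a stand-in for the real systems (Salesforce, Cognos, the data warehouse). The table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO source VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO target VALUES (1, 10.0), (2, 20.0);
""")

# Reconciliation: rows present in the source but missing from the target.
missing = conn.execute("""
    SELECT s.id FROM source s
    LEFT JOIN target t ON t.id = s.id
    WHERE t.id IS NULL
""").fetchall()
```

The same LEFT JOIN / IS NULL pattern (or a FULL OUTER JOIN plus value comparisons) scales to the critical-data-element checks mentioned above.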

Environment: AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau

Confidential - San Antonio, TX

Data Analyst

Responsibilities:

  • Gathered requirements by interacting heavily with business users and multiple technical teams to design and develop workflows for the new functional piece.
  • Collaborated with various business stakeholders to create Business Requirement Document (BRD), translated gathered high-level requirements into a Functional Requirement Document (FRD) to assist implementation side SMEs and developers, along with data flow diagrams, user stories and use cases
  • Part of a Scrum Agile team.
  • Experience in SQL joins, subqueries, tracing, and performance tuning for faster query execution.
  • Extensively used joins and subqueries in complex queries involving multiple tables from different databases.
  • Tuned stored procedures and functions for better query performance.
  • Successfully implemented indexes on tables for optimum performance.
  • Developed complex stored procedures using T-SQL to generate Ad-hoc reports within SQL Server Reporting services.
  • Strong analytical and problem-solving skills, coupled with interpersonal and leadership skills.
  • Interacted with clients to gather requirements and assist them with immediate workarounds for application issues.
  • Developed detailed test scenarios as documented in business requirements documents, assisted the test team with UAT.
  • Used Tableau in real time for analytical purposes.
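The indexing-for-performance work mentioned above can be illustrated with SQLite as a stand-in for SQL Server (the table and index names are hypothetical): before the index the filter is a full table scan, after it the planner uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

# Without an index, this filter scans the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index in place, the planner searches the index instead.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()
```

On SQL Server the equivalent check is the graphical or `SET SHOWPLAN` execution plan; the principle (index the filtered column) is the same.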

Environment: MySQL, MS PowerPoint, MS Access, T-SQL, DTS, SSIS, SSRS, SSAS, ETL, MDM, Teradata.

Confidential

Analyst

Responsibilities:

  • Assisted Product Manager in documenting business / functional requirements and defining product features
  • Performed various SQL / database activities to support data review, data mapping, data quality review, data validation and report generation
  • Participated in the development of end-to-end test plan with regression strategy
  • Developed QA test plans, test conditions, test cases and test scripts to ensure complete and adequate testing as well as coordinating / conducting user acceptance testing (UAT)
  • Monitored testing progress and performance of testing including open defects
  • Performed functional, UI, performance and back-end testing, in conjunction with the QA team
  • Involved in analyzing functional specifications and created manual test plans and test cases
  • Collaborated with the development team to solve the problems encountered in the test scenario runs
  • Documented and communicated test results to the relevant stakeholders including Product Manager.

Environment: UNIX, MS Visio, MS Project, MS SharePoint, HTML, Windows, Oracle, PL/SQL
