
Data Scientist Resume


Dallas, TX

PROFESSIONAL SUMMARY:

  • Around 8 years of IT experience, including 5 years in data science with extensive integration of machine learning algorithms on statistical data. Performed advanced analytics, predictive modeling, and data science to solve business problems, enabling fact-based decision-making.
  • Significant expertise in data acquisition, storage, analysis, integration, machine learning, Predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, Ad hoc analysis, A/B testing, multivariate testing, time series analysis, cluster analysis, ANOVA, neural networks and other advanced statistical and econometric techniques.
  • Expertise includes abstracting and quantifying the computational aspects of problems, designing and applying new statistical algorithms, and systems-level software design and implementation on platforms such as R, SAS, Python, and Spark. Experience applying machine learning and statistical modeling techniques to solve business problems.
  • Expert in distilling vast amounts of data into meaningful discoveries at the requisite depth. Able to analyze the most complex projects at various levels.
  • Experience building data-intensive big data applications and products using Hadoop ecosystem components such as Hadoop, Pig, Hive, Sqoop, Apache Spark, and Apache Kafka.
  • Experience working on text understanding, classification, pattern recognition, recommendation systems, targeting systems, and ranking systems using Python.
  • Deep understanding of statistical modeling, multivariate analysis, big data analytics, and standard procedures. Highly efficient in dimensionality reduction methods such as PCA (Principal Component Analysis). Implemented classification and clustering methods such as Random Forests, K-Means Clustering, KNN (K-Nearest Neighbors), Naive Bayes, SVM (Support Vector Machines), Decision Trees, and Linear and Logistic Regression.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and ESP. Experience using various Hadoop distributions (Pivotal, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Experienced in requirement analysis, application development, application migration and maintenance using Software Development Lifecycle (SDLC).
  • Visualization and dashboarding using Tableau, Python's Matplotlib, and graphing in R.
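As an illustration of the K-Means clustering work listed above, here is a minimal plain-Python sketch of Lloyd's algorithm; the data points and the naive initialization are illustrative only, not taken from any engagement described in this resume.

```python
import math

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Naive spread initialization for this sketch; real code uses k-means++
    centroids = points[::len(points) // k][:k]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [tuple(sum(c) / len(cluster) for c in zip(*cluster))
                     for cluster in clusters if cluster]
    return centroids, clusters

# Two well-separated blobs; k=2 recovers them
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

In practice scikit-learn's `KMeans` (with k-means++ initialization and multiple restarts) replaces a hand-rolled loop like this.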

TECHNICAL SKILLS:

Languages: Python, R, Scala, Java

Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, SVM, Decision Trees, Random Forest, Boosting, K-Means, Bagging

Machine learning libraries: Spark ML, Spark MLlib, Scikit-Learn, NLTK, Stanford NLP

Deep Learning: TensorFlow, Keras, CNN, RNN, NLP-RNN, Deep Neural Nets

Big Data: Hadoop, MongoDB, Hive, Map Reduce, Spark, Cassandra

Databases: MySQL, PostgreSQL, NoSQL

IDE: Jupyter Notebook, Spyder, Eclipse, RStudio

Text Analytics: Stemming, NLTK, Pandas, TF-IDF Vectorizer, Word Cloud

Visualization: Tableau, Power BI, Matplotlib, Seaborn, ggplot2
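The TF-IDF vectorization listed under Text Analytics can be sketched in plain Python; this is a simplified stand-in for scikit-learn's `TfidfVectorizer` (smoothing and normalization omitted), with an illustrative toy corpus.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Compute simple TF-IDF scores per document (no smoothing/normalization)."""
    n_docs = len(corpus)
    tokenized = [doc.lower().split() for doc in corpus]
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
scores = tfidf(docs)
# "dog" appears in one document, "the" in all three, so "dog" scores higher
print(scores[1]["dog"] > scores[1]["the"])  # True
```

The key intuition: a term common to every document gets an IDF of log(1) = 0, so ubiquitous words like "the" contribute nothing to the vector.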

EXPERIENCE:

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • Played a key role in optimizing and benchmarking classification models in order to standardize results across different departments.
  • Applied advanced statistical and predictive modeling techniques to build, maintain, and improve on multiple real-time decision systems. Conducted advanced data analysis and developed complex algorithms.
  • Built models using statistical techniques and machine learning classification models like XGBoost, SVM, and Random Forest. Developed and designed advanced predictive analysis models. Modeled and framed business scenarios that are meaningful and impact critical business processes and/or decisions.
  • Worked with big data technologies such as Hadoop, Hive, and MapReduce. Extracted data from HDFS and prepared it for exploratory analysis using data munging. Designed experiments, tested hypotheses, and built models.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
  • Wrote Ansible playbooks with Python SSH as the wrapper to manage configurations of AWS nodes, and tested the playbooks on AWS instances using Python.
  • Performed data/systems analysis to determine the best BI solution (reports, dashboards, scorecards, data cubes, etc.) using Tableau.
  • Developed load scripts for extracting, transforming, and loading data into Tableau applications.
  • Designed and developed new interface elements and objects as required; developed macros and set analysis to provide custom functionality in Tableau.
  • Wrote scripts in Python using Apache Spark and the Elasticsearch engine for use in creating dashboards.
  • Developed and presented clear, concise recommendations outlining alternatives and key decision criteria. Prepared graphs using the ggplot library and Tableau for an overview of the analytical models and results.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
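The Python streaming work mentioned above follows the Hadoop Streaming pattern, where a mapper and reducer communicate over stdin/stdout. A minimal word-count sketch of that pattern (the task and function names are illustrative, not the actual distributed random-forest job):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) pairs, as a streaming mapper would on stdin."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: sum counts per key; Streaming delivers keys sorted."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Simulate the shuffle/sort step Hadoop performs between map and reduce
mapped = sorted(mapper(["big data big models", "big wins"]))
counts = dict(reducer(mapped))
print(counts)  # {'big': 3, 'data': 1, 'models': 1, 'wins': 1}
```

In a real Streaming job, mapper and reducer are separate scripts submitted with `hadoop jar hadoop-streaming.jar`, and the framework handles the sort between them.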

Environment: AWS (S3, EC2), Python (scikit-learn), Tableau, TensorFlow, Linux (Ubuntu), Hive, MongoDB, SQL, Apache Spark, Apache Hadoop. Major models tested: Neural Networks, SVM, Logistic Regression, k-Nearest Neighbors (kNN), Decision Tree. Ensemble trees: Random Forest, GBM, XGBoost

Confidential, Hanover, MD

Data Scientist

Responsibilities:

  • Developed computational and data science solutions for the storage, management, analysis, and visualization of genomic data.
  • Leveraged existing tools and publicly available genomics data to develop, test, or implement bioinformatics pipelines.
  • Extracted patent text and numerical features with the Python library Beautiful Soup; created a Decision Tree algorithm to predict patent classification by disease.
  • Detected near-duplicate news by applying NLP methods (e.g., word2vec) and developing machine learning models like label spreading and clustering.
  • Provided expertise in statistical methods or machine learning with the goal of applying these techniques to health data.
  • Worked with Mobile Science 2.0, Mobile App teams to build a Classifier for Mobile App users that could be used by the digital marketing team to tailor specific messages to groups of users.
  • Used regulatory genomics/epigenetics & computational approaches in genetics and Patient data to perform clustering to group patients with similar diseases.
  • Implemented algorithms in Python, SQLite, Hadoop MapReduce, MongoDB, and R.
  • Worked with big data technologies such as Hadoop, Hive, and MapReduce.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Created complex formulas and calculations within Tableau to meet the needs of complex business logic.
  • Combined Tableau visualizations into interactive dashboards using filter actions, highlight actions, etc., and published them to the web.
  • Developed various data connections from data sources to Tableau Desktop for report and dashboard development.
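The near-duplicate detection described above used learned embeddings (word2vec) and clustering; the core similarity test can be approximated with a much simpler bag-of-words cosine similarity. This sketch uses raw counts instead of embeddings, and the threshold is a hypothetical value, not one from the project.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """Flag documents as near-duplicates when similarity exceeds threshold."""
    return cosine_similarity(Counter(doc_a.lower().split()),
                             Counter(doc_b.lower().split())) >= threshold

a = "stocks rally as markets open higher"
b = "stocks rally as markets open sharply higher"
c = "new genome study published today"
print(is_near_duplicate(a, b))  # True
print(is_near_duplicate(a, c))  # False
```

Embedding-based approaches improve on this by also matching paraphrases that share meaning but few surface words.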

Environment: NLP, Python, Hadoop, MapReduce, Tableau, Spark, Hive, R. Major models tested: K-Means Clustering, SVM, Decision Tree based models: CART, CHAID, Information Gain, Random Forest

Confidential, Minneapolis, MN

Data Scientist

Responsibilities:

  • Worked on data cleaning and reshaping; generated segmented subsets using NumPy and pandas in Python. Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
  • Worked on model selection based on confusion matrices, minimized the Type II error. Generated cost-benefit analysis to quantify the model implementation comparing with the former situation.
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to perform data extraction and merging from large volumes of historical data stored in Oracle 11g, validating the ETL processed data in target database.
  • Conducted model optimization and comparison using stepwise function based on AIC value.
  • Applied various machine learning algorithms and statistical modeling like decision tree, logistic regression, Gradient Boosting Machine to build predictive model using scikit-learn package in Python.
  • Continuously collected business requirements during the whole project life cycle. Identified the variables that significantly affect the target.
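The model-selection step above weighs confusion matrices and minimizes Type II error (false negatives). A minimal sketch of that computation, with made-up labels and predictions for illustration:

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def type_ii_error_rate(y_true, y_pred):
    """Type II error rate = false negatives / actual positives."""
    tp, _, fn, _ = confusion_matrix(y_true, y_pred)
    return fn / (tp + fn) if (tp + fn) else 0.0

y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(type_ii_error_rate(y_true, y_pred))  # 0.2
```

Minimizing this rate (equivalently, maximizing recall) is the right objective when a missed positive is costlier than a false alarm.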

Environment: Decision Tree, Logistic Regression, Hadoop, Teradata, Python, MLlib, SAS, Random Forest, OLAP, HDFS, NLTK, SVM, JSON and XML

Confidential

Data Analyst

Responsibilities:

  • Ran SQL queries against an Oracle database to analyze and manipulate data. Wrote SAS programs to perform ad-hoc analysis and data manipulation.
  • Created various SAS Reports, Tables, Graphs and Summary analysis on PMS systems being used in these properties.
  • Transferred data from Oracle database as well as MS Excel into SAS for analysis and used filters based on the analysis.
  • Used the SAS Import/Export Wizard as well as SAS programming techniques to extract data from Excel. Used Base SAS programming as well as SAS Enterprise Guide 4.0 to produce various reports, charts, and graphs.
  • Participated in technology support team meetings to coordinate, review, and determine the appropriate software system for each hotel property.
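The ad-hoc SQL analysis above ran against Oracle and SAS; setting those platform specifics aside, the same query pattern can be sketched with Python's stdlib sqlite3 (the table name, columns, and rows below are illustrative, not from the actual hotel PMS data):

```python
import sqlite3

# In-memory database standing in for the production schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (property TEXT, nights INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [("Downtown", 2, 240.0), ("Downtown", 1, 110.0), ("Airport", 3, 300.0)],
)

# Ad-hoc aggregation: total nights and revenue per property
rows = conn.execute(
    "SELECT property, SUM(nights), SUM(revenue) "
    "FROM bookings GROUP BY property ORDER BY property"
).fetchall()
print(rows)  # [('Airport', 3, 300.0), ('Downtown', 3, 350.0)]
conn.close()
```

The parameterized `executemany` call mirrors how loads from Excel or Oracle extracts are batched before the summary queries run.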

Environment: SAS, SQLite, Hadoop, MapReduce, SQL, MS Excel.

Confidential

Java Developer

Responsibilities:

  • Actively participated in all the phases of SDLC including Requirements Collection, Design & Analysis of the Customer Specifications, Development and Customization of the application.
  • Developed the application using Agile/Scrum methodology, which involved daily stand-ups, test-driven development, continuous integration, demos, and test automation.
  • Strong hands-on knowledge of Core Java, web-based applications, and OOP concepts.
  • Developed client-side features using HTML, CSS, and JavaScript. Developed server-side components using Spring, Hibernate, Servlets/JSP, and multithreading.
  • Extensively worked with retrieval and manipulation of data from the Oracle database by writing queries using SQL and PL/SQL. Performed web application development by setting up the environment and configuring the application and the WebLogic Application Server.
  • Hands on Experience in coding, unit testing, Integration testing and Bug fixing.

Environment: Oracle/SQL Server and PL/SQL, Spring, Hibernate, Ant, Apache Tomcat, JBoss, WebLogic, UNIX, RDBMS, HTML, CSS, JavaScript, JDBC, Eclipse, Multithreading.
