Data Scientist/Machine Learning/Data Analyst Resume

Durham, NC

PROFESSIONAL SUMMARY:

  • Around 7 years of hands-on experience and comprehensive industry knowledge of Machine Learning, Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence, and Analytics Models (like Decision Trees, Linear & Logistic Regression), using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL and PostgreSQL, and Erwin.
  • Strong knowledge of all phases of the SDLC (Software Development Life Cycle), from analysis and design through development, testing, implementation and maintenance.
  • Experienced in Data Modeling techniques employing data warehousing concepts like star/snowflake schema and Extended Star schema.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries.
  • Expertise in writing functional specifications, translating business requirements to technical specifications, and creating/maintaining/modifying database design documents with detailed descriptions of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Center.
  • Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Expertise in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA and other advanced statistical techniques.
  • Excellent knowledge and experience in OLTP/OLAP system study with a focus on the Oracle Hyperion suite of technology, developing database schemas like star schema and snowflake schema (fact tables, dimension tables) used in relational, dimensional and multidimensional modeling, and physical and logical data modeling using the Erwin tool.
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Working experience with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark.
  • Solid experience and knowledge in provisioning virtual clusters in the AWS cloud, including services like EC2, S3, and EMR.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib and R Shiny, creating visually powerful and actionable interactive reports and dashboards.
  • Excellent Tableau developer with expertise in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau (9.x/10.x).
  • Experienced in Agile methodology and SCRUM process.
  • Strong business sense and the ability to communicate data insights to both technical and non-technical clients.

TECHNICAL SKILLS:

Libraries: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano

Programming Languages: Python, R, SQL, Scala, Pig, C, MATLAB, Java

Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server

Machine Learning: Data Preprocessing, Weighted Least Squares, PCR, PLS, Piecewise/Spline/Quadratic Regression, Discriminant Analysis, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, Polynomial Regression, Azure, Perceptron, Back Propagation, PCA, LDA, UML, RDF, SPARQL

Visualization Tools: Tableau, Python - Matplotlib, Seaborn

Databases: MySQL, SQLite

IDE Tools: PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker

Project Management: JIRA, SharePoint

SDLC Methodologies: Agile, Scrum, Waterfall

Deployment Tools: Anaconda Enterprise, R-Studio, Azure Machine Learning Studio, Oozie, AWS Lambda

PROFESSIONAL EXPERIENCE:

Confidential, Durham, NC

Data Scientist/Machine Learning/Data Analyst

Responsibilities:

  • Developed fraud detection models to predict the likelihood of fraudulent customer activity, suspicious links, and subtle behavior patterns based on customer attributes such as revenue, industry type, competitor products and growth rates. The models, deployed in the production environment, helped detect fraud in advance and aided sales/marketing teams in planning retention strategies such as price discounts and custom licensing plans.
  • Extracted data from HDFS and prepared it for exploratory analysis using data munging.
  • Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like XGBoost, SVM, and Random Forest.
  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization and performed Gap analysis.
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, MongoDB and Hadoop.
  • Set up storage and data analysis tools in the AWS cloud computing infrastructure.
  • Installed and used the Caffe deep learning framework.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Used pandas, numpy, seaborn, matplotlib, scikit-learn, scipy, NLTK in Python for developing various machine learning algorithms.
  • Performed data manipulation and aggregation from different sources using Nexus, Business Objects, Toad, Power BI and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM with the acquisition of Identity Systems.
  • Coded proprietary packages to analyze and visualize SPC file data to identify bad spectra and samples, reducing unnecessary procedures and costs.
  • Programmed a utility in Python that used multiple packages (NumPy, SciPy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
  • As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Validated the machine learning classifiers using ROC curves and lift charts (a minimal sketch of the train-and-validate flow follows this list).
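
A minimal sketch of the train-and-validate flow above, assuming the HDFS extract has already landed as a flat file; the file name and column names (customer_features.csv, is_fraud) are hypothetical placeholders, and Random Forest stands in for the other classifiers mentioned:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    # Hypothetical flat-file extract of customer attributes with a binary fraud label.
    df = pd.read_csv("customer_features.csv")
    X = df.drop(columns=["is_fraud"])
    y = df["is_fraud"]

    # Hold out a stratified test set so the rare fraud class stays represented.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    model = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)
    model.fit(X_train, y_train)

    # Validate with the ROC curve, as in the bullet above.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, scores)
    print("AUC:", roc_auc_score(y_test, scores))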

Environment: Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop 2.7.4, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.

Confidential, NC

Data Scientist/Data Analyst

Responsibilities:

  • Developed a recommender system for the sales team so that users can approach clients with recommendations based on their previous purchases, and built a reinforcement learning model using Thompson sampling to help the ad campaign place ads effectively on the internet (a minimal sketch follows this list).
  • Developed fraud detection using an Artificial Neural Network on continuous data from the Hadoop cluster, used SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Derived data from relational databases to perform complex data manipulations and conducted extensive data checks to ensure data quality. Performed data wrangling to clean, transform and reshape the data utilizing the NumPy and Pandas libraries.
  • Worked with datasets of varying size and complexity, including both structured and unstructured data, and participated in all phases of data mining: data cleaning, data collection, variable selection, feature engineering, model development, validation and visualization; also performed gap analysis.
  • Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
  • Utilized machine learning algorithms such as clustering, linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Extensively used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, NLTK in Python for developing various machine learning algorithms.
  • Used the Python programming language for graphically profiling the datasets and gaining insights into the nature of the data.
  • Researched extensively the nature of the customers and designed multiple models to fit the client's needs; performed extensive behavioral modeling and customer segmentation to discover customer behavior patterns using K-means clustering.
  • Designed and implemented a Recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend products for different customers.
  • Superintended the use of open-source tools RStudio (R) and Spyder (Python) for statistical analysis and building the machine learning models. Involved in defining source-to-target data mappings, business rules and data definitions.
  • Optimized model performance parameters using grid search, evaluating metrics for regression (RMSE, R², MSE), classification (accuracy, precision, recall) and threshold selection using ROC plots to increase model efficiency.
  • Designed and developed standalone data migration applications to retrieve and populate data from Azure Table/Blob storage into Python, HDInsight and Power BI.
  • Implemented Predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
  • Performed hyperparameter tuning by formulating distributed cross-validation in Apache Spark to speed up the computation process (a PySpark sketch follows the Environment line below).
  • Participated in feature engineering, including feature generation, PCA, feature normalization and label encoding with Scikit-learn preprocessing; performed data imputation using various methods from the Scikit-learn package in Python.
  • Implemented batch and real-time model scoring to drive actions. Developed proprietary machine learning algorithms to build customized solutions that go beyond standard industry tools and lead to innovative solutions.
  • Utilized Informatica toolset (Informatica Data Explorer and Data Quality) to inspect legacy data for data profiling.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
  • Created reports, dashboards and sophisticated visualizations using Tableau to clearly communicate data insights, significant features, model scores and performance to both technical and business teams.
  • Worked with business stakeholders to refine and respond to their ad hoc requests and to improve their existing reporting and dashboards as necessary.
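
A minimal Thompson sampling sketch in the spirit of the ad placement model in the first bullet of this list; the click-through rates are simulated stand-ins for real campaign feedback:

    import numpy as np

    rng = np.random.default_rng(0)
    true_ctr = [0.04, 0.06, 0.05]      # hypothetical click-through rate per ad slot
    wins = np.ones(len(true_ctr))      # Beta(1, 1) priors over each slot's CTR
    losses = np.ones(len(true_ctr))

    for _ in range(10_000):
        # Sample a CTR estimate for each slot from its posterior and play the best one.
        draws = rng.beta(wins, losses)
        arm = int(np.argmax(draws))
        reward = rng.random() < true_ctr[arm]   # simulated click/no-click
        wins[arm] += reward
        losses[arm] += 1 - reward

    print("posterior mean CTRs:", wins / (wins + losses))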

Environment: Python 3.6.4, R Studio, MLLib, Regression, NoSQL, SQL Server, Hive, Hadoop Cluster, ETL, Spyder 3.6, Agile, Tableau, Java, NumPy, Pandas, Matplotlib, Scikit-Learn, Seaborn, e1071, ggplot2, Shiny, Tensorflow, AWS (EC2, S3), Azure, HTML, XML, Informatica Power Center, Teradata.
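
For the distributed cross-validation bullet in the role above, a hedged PySpark sketch; the input path (features.parquet) and column names (f1, f2, f3, label) are hypothetical, and logistic regression stands in for whichever estimator was actually tuned:

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cv-demo").getOrCreate()
    df = spark.read.parquet("features.parquet")   # hypothetical feature table

    # Assemble raw columns into the single vector column Spark ML expects.
    data = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features").transform(df)

    lr = LogisticRegression(featuresCol="features", labelCol="label")
    grid = (ParamGridBuilder()
            .addGrid(lr.regParam, [0.01, 0.1])
            .addGrid(lr.elasticNetParam, [0.0, 0.5])
            .build())

    # CrossValidator distributes fold fitting across the cluster; `parallelism`
    # controls how many candidate models are trained at once.
    cv = CrossValidator(estimator=lr,
                        estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(labelCol="label"),
                        numFolds=3,
                        parallelism=4)
    model = cv.fit(data)
    print("best average AUC:", max(model.avgMetrics))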

Confidential, Dearborn, MI

Data scientist/Data Analyst

Responsibilities:

  • As a Data Scientist, I was responsible for coordinating with Financial Operations teams to extract data and perform ad hoc statistical analyses on complex business problems. Our team developed and automated insight-driven reporting, incorporating statistical techniques where appropriate.
  • Researched and worked with technical teams to implement new and emerging technologies to facilitate better data integrity, reliability, and enrichment for quantitative solutions.
  • Supervised data collection and reporting: ensured relevant data was collected at designated stages, entered into the appropriate databases and reported appropriately; monitored assignments to balance the distribution of workload and assess collection efforts.
  • Gathered all required data from multiple data sources and created the datasets used in analysis.
  • Hands on expertise in working with different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • I have worked with various kinds of data (open-source as well as internal). I have developed models for labeled and unlabeled datasets, and have worked with big data technologies, such as Hadoop and Spark, and cloud resources, like Azure and Google Cloud.
  • Applied F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate different models' performance.
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, GIT, Unix commands, NoSQL, MongoDB and Hadoop.
  • Surveyed deep learning tools on NLP tasks and extracted information from documents based on part-of-speech tagging, syntactic parsing, and named-entity recognition (see the NLTK sketch after this list).
  • Exported trained models to Protobuf to be served by TensorFlow Serving and performed the integration with the client's application.
  • Initiated research into the development of social networking analytical system targeting the Gaming industry using NLP. The automated system is intended to identify social sentiment and key opinion leaders, including their degree of connectedness and reach in social media and the blogosphere.
  • Utilized SAS for developing Pareto Chart for identifying highly impacting categories in modules to find the work distribution and created various data visualization charts.
  • Utilized Sqoop to ingest real-time data. Used analytics libraries SciKit-Learn, MLLIB and MLxtend.
  • Optimized algorithms with stochastic gradient descent; fine-tuned algorithm parameters with manual tuning and automated tuning such as Bayesian optimization.
  • Used Big Data tools in Spark (Spark SQL, MLlib) to conduct real-time data analysis.
  • Analyzed survey sentiment data: sentiment scores from past surveys are captured in the latest and average note attitude score fields. The note attitude score is derived from negative game feedback only; a note attitude of zero indicates a more satisfied customer, and satisfaction decreases as the number increases.
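
A minimal sketch of the part-of-speech tagging and named-entity extraction step described in this list, using NLTK's stock English models; the sample sentence is illustrative only:

    import nltk

    # One-time downloads of the tokenizer, tagger and chunker models.
    for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
        nltk.download(pkg)

    text = "The studio in Dearborn shipped a new racing game last March."
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)   # part-of-speech tagging
    tree = nltk.ne_chunk(tagged)    # named-entity recognition over the tagged tokens

    # Print each named entity with its label (GPE, ORGANIZATION, ...).
    for subtree in tree.subtrees(lambda t: t.label() != "S"):
        print(subtree.label(), " ".join(tok for tok, _ in subtree.leaves()))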

Environment: RedShift, Python 3.x (Scikit -Learn/ Keras/ SciPy/ NumPy/ Pandas/ Matplotlib/ NLTK/ Seaborn), R (ggplot2/ caret/ trees/ arules), Tableau (9.x/10.x), Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Gaussian Mixture Model / Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA, GitHub, Agile/ SCRUM

Confidential

Data Scientist/Machine Learning

Responsibilities:

  • Identified, analyzed and interpreted trends and patterns in complex data sets using various regression, classification and clustering ML approaches: Linear & Logistic Regression, Naive Bayes, Decision Trees, Random Forests, Clustering, SVM, Neural Networks, Principal Component Analysis, Bayesian methods and XGBoost.
  • Identified relevant data sources and sets to mine for the client's business needs and collected large structured and unstructured datasets and variables.
  • This project focused on customer segmentation based on machine learning and statistical modeling, generating data products to support customer segmentation.
  • Performed data cleaning including transforming variables and dealing with missing value and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Analyzed and made inferences on test subjects based on self-cleaned data using R programming.
  • Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin.
  • Developed predictive causal model using annual failure rate and standard cost basis for the new bundled service offering.
  • Performed data cleaning, applying backward/forward filling methods to the dataset to handle missing values (a minimal pandas sketch follows this list).
  • Worked on Amazon Web Services (AWS) cloud services to do machine learning on big data.
  • Worked with Big Data tools such as Hadoop (HDFS and MapReduce), HiveQL, Sqoop, Pig Latin and Apache Spark (PySpark).
  • Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
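
A minimal pandas sketch of the backward/forward filling step mentioned above, assuming a time-indexed frame; the column name and values are illustrative:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {"failure_rate": [np.nan, 0.02, np.nan, np.nan, 0.05]},
        index=pd.date_range("2017-01-01", periods=5, freq="D"),
    )

    # Forward-fill gaps from the last known value, then backward-fill
    # anything still missing at the head of the series.
    filled = df.ffill().bfill()
    print(filled)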

Environment: MS SQL Server, R/RStudio, SQL Enterprise Manager, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office, Outlook, SAS E-Miner.

Confidential

Data Analyst/ Jr Data Scientist

Responsibilities:

  • Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed data analysis by retrieving the data from the Hadoop cluster.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Worked on feature generating, PCA, feature normalization and label encoding with Scikit-learn preprocessing.
  • Used Python (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R to develop a variety of models and algorithms for analytic purposes.
  • Worked on Natural Language Processing with the NLTK module of Python and developed NLP models for sentiment analysis.
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests, and KNN to predict customer churn.
  • Conducted analysis of customer behaviors and discovered customer value with RFM (recency, frequency, monetary) analysis; applied customer segmentation with clustering algorithms such as K-Means clustering, Gaussian Mixture Models and hierarchical clustering (a minimal sketch follows this list).
  • Used F-Score, AUC/ROC, Confusion Matrix, Precision, and Recall evaluating different models’ performance.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Ensured jobs ran successfully by resolving data quality issues using advanced knowledge of complex SQL, efficient coding practices, macros and stored procedures.
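
A minimal sketch of the RFM-plus-clustering segmentation described in this list; the transactions file and column names (transactions.csv, customer_id, order_date, amount) are hypothetical:

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
    now = tx["order_date"].max()

    # Roll transactions up to one recency/frequency/monetary row per customer.
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (now - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )

    # Standardize the three features so no single scale dominates the distances.
    X = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    print(rfm.groupby("segment").mean())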

Environment: Hadoop, HDFS, Python, R, Tableau, Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA, GitHub, Agile/ SCRUM, GCP.
