Data Scientist Resume
San Antonio, TX
SUMMARY
- Over 6 years of experience in Data Science and Big Data in the Insurance, Healthcare Services and Business industries, with adept knowledge of Data Analytics, Machine Learning (ML), Predictive Modeling, A/B Testing, Natural Language Processing (NLP) and Deep Learning algorithms.
- Proficient in Data cleaning, Exploratory data analysis (EDA) and Initial Data Analysis (IDA).
- Experienced in facilitating the entire lifecycle of a data science project: Data Extraction, Data Pre-Processing, Feature Engineering, Dimensionality Reduction, Algorithm Implementation, Back Testing and Validation.
- Expert knowledge in machine learning algorithms such as Ensemble Methods (Random forests), Linear, Polynomial, Logistic Regression, Regularized Linear Regression, SVMs, Deep Neural Networks, Extreme Gradient Boosting, Decision Trees, K-Means, K-NN, Gaussian Mixture Models, Hierarchical models, Naïve Bayes.
- Well versed in handling Structured and Unstructured data, Time Series data and statistical methodologies such as Hypothesis Testing, ANOVA, multivariate statistics, regression, classification, modeling, decision theory, time-series analysis and Descriptive Statistics.
- Proficient in data transformations using log, square-root, reciprocal, cube-root, square and Box-Cox transformations, chosen based on the distribution of the dataset (a minimal sketch follows this summary).
- Concrete mathematical background in Statistics, Probability, Differentiation and Integration, Linear Algebra and Geometry
- Proficient in a wide variety of Data Science languages and libraries, including Python, R, SQL, PySpark, Scikit-learn, NumPy, SciPy and Pandas, plus Tableau for visualization.
- Experience with relational and non-relational databases such as MySQL, SQLite, Microsoft SQL Server, MongoDB and Cassandra.
- Adroit at employing various Data Visualization tools like Tableau, Matplotlib, Seaborn, ggplot2, and Plotly.
- Extremely organized, with demonstrated ability to handle multiple tasks and assignments simultaneously within scheduled deadlines.
- Highly competent with Big Data tools such as Hadoop, Spark and Hive, alongside Python, R, SQL and Tableau.
- Excellent understanding of Hadoop architecture and complete understanding of Hadoop daemons and components such as HDFS, YARN, Resource Manager, Node Manager, NameNode, DataNode and the MapReduce programming paradigm.
- Extensive experience with the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie and Sqoop, including working experience with Spark Core, Spark SQL, Spark Streaming and Kafka.
- Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data.
- Hands-on experience in PySpark and Spark Streaming, creating RDDs and applying transformations and actions on them.
- Good understanding of creating Conceptual Data Models, Process/Data Flow Diagrams, Use Case Diagrams, Class Diagrams and State Diagrams.
- Good communication and presentation skills, willing to learn, adapt to new technologies and third-party products.
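A minimal sketch of the distribution-driven transformations mentioned above, assuming NumPy and SciPy; the lognormal sample and all parameters are illustrative, not project data:

```python
# Minimal sketch of distribution-driven data transformations (illustrative data only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1_000)  # right-skewed, strictly positive

log_t = np.log(skewed)       # log transform
sqrt_t = np.sqrt(skewed)     # square-root transform
recip_t = 1.0 / skewed       # reciprocal transform
cbrt_t = np.cbrt(skewed)     # cube-root transform

# Box-Cox searches over a family of power transforms and returns the best-fitting lambda.
boxcox_t, lam = stats.boxcox(skewed)
print(f"Box-Cox lambda: {lam:.3f}, skew before: {stats.skew(skewed):.2f}, "
      f"after: {stats.skew(boxcox_t):.2f}")
```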
TECHNICAL SKILLS
LANGUAGES: Python | R | SAS | MATLAB
DATABASE: SQL | NoSQL | Microsoft Access | Oracle | Teradata
BIG DATA TECHNOLOGIES: Hadoop | MapReduce | Hive | Pig | Kafka | Spark | Sqoop
TOOLS AND UTILITIES: Jupyter | GIT | RStudio | Tableau | PyCharm | Spyder | Visual Studio | PostgreSQL | MySQL | SQLite | Microsoft SQL Server | MongoDB | Cassandra | Neo4j | JSON | MS Access | MLlib | Redshift | HBase | SQL Server Management Studio (SSMS) | SQL Server Reporting Services (SSRS) | SQL Server Integration Services (SSIS) | Crystal Reports | Excel Power Pivot
MACHINE LEARNING: Logistic Regression | Linear Regression | Support Vector Machines | Decision Trees | Random Forests | Ensemble Models | K-Nearest Neighbors | Gradient Boost | Naïve Bayes | K-Means Clustering | Hierarchical Clustering | Density Based Clustering | Gaussian Mixtures | Principal Component Analysis | Natural Language Processing (NLP)
DEEP LEARNING: Artificial Neural Networks | Convolutional Neural Networks | Multi-Layer Perceptron | Recursive Neural Networks | Recurrent Neural Networks | LSTM | BERT | GRU | Softmax Classifier | Back Propagation | Chain Rule | Dropout
LIBRARIES: NumPy | SciPy | Pandas | Scikit-learn | Theano | TensorFlow | Keras | PyTorch | Caret | Statsmodels | XGBoost | NLTK | dplyr | nnet | Glmnet | H2O | mboost | MATLAB Neural Network Toolbox
GRAPH VISUALIZATION: Tableau | Power BI | Matplotlib | Seaborn | Plotly | ggplot2 | Graphviz | Qlik View | Geoplotlib
CLOUD SERVICES: AWS (Athena, EMR, EC2, RDS, S3, CloudWatch, and KMS), Azure
METHODOLOGIES: Agile | Waterfall
PROFESSIONAL EXPERIENCE
Confidential, San Antonio, TX
Data Scientist
Responsibilities:
- Working on the Data and Analytics (Data Science) team, helping the Marketing and Finance teams analyze various data sources to promote the growth of insurance products.
- Worked with the marketing team to provide insights on product sellers using machine learning algorithms.
- Built advanced machine learning classification models such as XGBoost, Random Forest, KNN and SVM, and clustering algorithms such as Hierarchical Clustering and DBSCAN
- Performed Dimensionality Reduction, Model selection and Model boosting methods using Principal Component Analysis (PCA), K-Fold Cross Validation and Gradient Tree Boosting.
- Identified outliers and inconsistencies in data by conducting exploratory data analysis (EDA) using Python (NumPy and Seaborn) to gain insight into the data and validate each feature.
- Performed NLP using spaCy and techniques such as Word2Vec, FastText, Bag of Words, TF-IDF and Doc2Vec (see the TF-IDF sketch after this list)
- Applied transformer and recurrent language models such as BERT, GPT-2 and LSTM
- Developed NLP solutions with Deep Learning algorithms for analyzing text, improving on the existing dictionary-based approaches
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
- Analyzed data to identify glitches and cleaned it to reduce distortions
- Applied multiple Machine Learning (ML) and Data Mining techniques to improve the quality of product ads and personalized recommendations
- Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) and ran them in parallel
- Made optimal use of Tensor Cores and FP16 training, using TensorRT for model optimization before deployment
- Managed AWS EC2 instances using Auto Scaling groups and used ticketing tools like JIRA to monitor work
- Created various types of data visualizations using Tableau, Power BI and Qlik, and libraries such as Matplotlib, Seaborn and ggplot2
- Collaborated with internal business partners to identify needs and recommend improvements
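A minimal TF-IDF sketch for the NLP techniques listed above, assuming Scikit-learn's TfidfVectorizer; the documents and vectorizer parameters are illustrative only:

```python
# Minimal TF-IDF feature-extraction sketch (illustrative documents and parameters).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "auto insurance quote for a new driver",
    "home insurance claim after storm damage",
    "quote for bundled auto and home insurance",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)   # sparse (n_docs, n_terms) matrix

# Each row can feed a downstream classifier or similarity search.
print(tfidf.shape)
print(vectorizer.get_feature_names_out()[:10])
```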
Confidential, Raleigh, NC
Data Scientist
Responsibilities:
- Performed sentiment analysis in NLP on customers' email feedback to determine the tone behind a series of words, using neural network techniques such as Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNNs)
- Deployed Keras and TensorFlow for the NLP implementation and trained models using a cyclic learning-rate schedule
- Applied time series models (ARIMA, SARIMAX) to analyze sales data (a minimal SARIMAX sketch follows this list).
- Used Long Short-Term Memory (LSTM) networks and Prophet for analyzing time series data
- Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
- Utilized t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) to deal with the curse of dimensionality
- Created Neo4j graph visualizations of data flow from source to product and performed data lineage searches
- Used statistical metrics such as F-score, AUC/ROC, confusion matrix and RMSE to evaluate the performance of different models
- Implemented descriptive statistics and hypothesis testing using Chi-square, T-test, Pearson correlation and Analysis of variance (ANOVA).
- Generated ETL mappings, sessions and workflows based on business user requirements to load data from source files and RDBMS tables into target tables
- Extensively used HiveQL and Spark SQL queries to extract meaningful data and load it into external Hive tables
- Implemented CRUD functionalities of the API to handle query requests from the Neo4j database
- Performed data visualization using various libraries and designed dashboards with Tableau; generated complex reports, including charts, summaries and graphs, to interpret the findings for the team and stakeholders
- Used GitHub as fast, lightweight version control software to manage the source code and keep track of changes to files.
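A minimal SARIMAX sketch for the time series work listed above, assuming statsmodels; the synthetic monthly sales series and the (p, d, q)(P, D, Q, s) orders are illustrative assumptions:

```python
# Minimal SARIMAX forecasting sketch on synthetic monthly sales (illustrative data and orders).
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
trend = np.linspace(100, 160, 60)
season = 10 * np.sin(2 * np.pi * np.arange(60) / 12)
sales = pd.Series(trend + season + rng.normal(0, 3, 60), index=idx)

# (p, d, q) x (P, D, Q, s) orders chosen for illustration; in practice tune via AIC/BIC.
model = SARIMAX(sales, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=6))   # six-month-ahead forecast
```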
Confidential
Data Scientist
Responsibilities:
- Developed a churn model for the marketing team to reduce customer churn and improve retention
- Analyzed very large data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience
- Built a Proof of Concept (POC) by researching the user behavior and historical trends and developed a fraud detection model strategy using Random Forests and Decision Trees
- Experimented with multiple algorithms, such as Linear Regression, Logistic Regression, Support Vector Machines (SVM), Random Forest, AdaBoost and Gradient Boosting, using Python and Scikit-Learn, and evaluated their performance on customer discount optimization
- Analyzed data to identify glitches and cleaned it to reduce distortions
- Performed Chi-Square test, ANOVA test to identify significance between data samples
- Used forecasting and ARIMA models for time series analysis of customer behavior and purchases
- Applied multiple Machine Learning (ML) and Data Mining techniques to improve the quality of product ads and personalized recommendations
- Developed NLP solutions with Deep Learning algorithms for analyzing text, improving on the existing dictionary-based approaches
- Addressed overfitting by implementing regularization methods such as L1 and L2 in the algorithms (see the regularization sketch after this list)
- Collaborated in building a high-performance, low-latency system to manage high-velocity data streams
- Employed statistical techniques such as hypothesis testing, t-tests, confidence intervals and error measurements
- Performed various data manipulation techniques in statistical analysis like missing data imputation, indexing, merging, and sampling
- Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect the outliers and extract important variables numerically and graphically
- Created SSIS packages for transferring data from various data sources such as the SAP R/3 system, Oracle, MS Access, Excel, .txt files and .csv files
- Worked with Hadoop Ecosystem covering HDFS, HBase, YARN and MapReduce
- Developed Hive UDFs to bring all the customers' emails into a structured format
- Created different charts such as Heatmaps, Bar charts, Line charts
- Worked in creating different visualizations in Tableau using Bar charts, Line charts, Pie charts, Maps, Scatter Plot charts, and Table reports
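A minimal sketch of the L1/L2 regularization mentioned above, assuming Scikit-learn's LogisticRegression; the synthetic dataset and C value are illustrative only:

```python
# Minimal sketch of L1/L2 regularization to curb overfitting (illustrative synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=50, n_informative=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# L1 (lasso-style) drives uninformative coefficients to exactly zero.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_train, y_train)
# L2 (ridge-style) shrinks all coefficients toward zero without eliminating them.
l2 = LogisticRegression(penalty="l2", C=0.5, max_iter=1000).fit(X_train, y_train)

print("L1 test accuracy:", l1.score(X_test, y_test), "non-zero coefs:", (l1.coef_ != 0).sum())
print("L2 test accuracy:", l2.score(X_test, y_test))
```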
Confidential
Data Engineer
Responsibilities:
- Queried data from SQL server, imported other formats of data and performed data checking, cleansing, manipulation, and reporting using SAS (Base and Macro) and SQL
- Data Warehousing experience with Oracle, Redshift, Teradata and other MPP databases
- Extracted the data from different sources like Teradata and DB2 into HDFS and HBase using Sqoop
- Developed Oozie Workflows for daily incremental loads to get data from Teradata and imported them into Hive tables
- Applied resampling methods like the Synthetic Minority Over-sampling Technique (SMOTE) to balance the classes in large data sets (a minimal sketch follows this list)
- Performed Ad hoc analysis and reported the data using SAS and Excel
- Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from SQL server database
- Partnered with the data science team to prepare structured and unstructured data that they can use for predictive and prescriptive modeling
- Installed and configured a multi-node cluster in the cloud using AWS
- Applied algorithms such as Linear Regression, Logistic Regression and K-NN to predict customer churn and customer interactions
- Used Python to transform data from nested JSON and various other formats into usable data
- Developed Spark code using Python and Spark-SQL for faster testing and processing of data in real time
- Developed the full life cycle of a Data Lake and Data Warehouse with Big Data technologies like Spark and Hadoop
- Created Kafka topics, provided ACLs to users and set up MirrorMaker to transfer data between two Kafka clusters
- Involved in team meetings, discussions with business teams to understand the business use cases
- Experienced in using Linux environments and working within a matrix organizational structure
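A minimal SMOTE sketch for the class-balancing work listed above, assuming the imbalanced-learn package; the synthetic imbalanced dataset is illustrative only:

```python
# Minimal SMOTE class-balancing sketch (illustrative synthetic data; requires imbalanced-learn).
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Heavily imbalanced two-class problem: roughly 5% minority class.
X, y = make_classification(n_samples=10_000, n_features=20, weights=[0.95, 0.05], random_state=1)
print("before:", Counter(y))

# SMOTE synthesizes new minority samples by interpolating between existing neighbours.
X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)
print("after: ", Counter(y_res))
```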
Confidential
Business Analyst
Responsibilities:
- Interacted with the project manager and SMEs of different divisions (Information Technology, Risk and Operations) to establish a business analysis and design methodology around RUP
- Understood and articulated business requirements through interviews, surveys, and observing underwriters performing daily tasks
- Analyzed existing stand-alone (island) automated systems and performed gap analysis
- Interacted with heads of underwriting department to finalize the business requirements for the integrated underwriting solution.
- Prepared BRD and supporting documents containing the essential business elements, detailed definitions, and descriptions of the relationships between the actors to analyze and document business data requirements.
- Translated requirements into functional and technical specifications.
- Designed Use Cases, Use Case diagrams, Activity diagrams, Sequence diagrams in UML methodology using Rational Rose.
- Conducted Joint Application Development (JAD) sessions with the IT Group.
- Worked closely with the UI team to model the screens, which met user-defined requirements and organizational and regulatory standards.
- Assisted with Test Cases and developed strategies with Quality Assurance group to implement them. Efficiently responded to client inquiries and resolved discrepancies. Identified, prioritized, tested and proved essential business functions to assure compliance with vendor and internal auditing.
- Conducted UAT and collaborated with Quality Assurance Analyst in Rational Clear Quest to track defects and used Rational Clear Case to maintain consistency in the builds.