Data Scientist Resume
San Antonio, TX
SUMMARY
- Over 6 years of experience in Data Science and Big Data in the Insurance, Healthcare Services and Business industries, with adept knowledge of Data Analytics, Machine Learning (ML), Predictive Modeling, A/B Testing, Natural Language Processing (NLP) and Deep Learning algorithms.
- Proficient in Data cleaning, Exploratory data analysis (EDA) and Initial Data Analysis (IDA).
- Experienced in facilitating the entire lifecycle of a data science project: Data Extraction, Data Pre-Processing, Feature Engineering, Dimensionality Reduction, Algorithm Implementation, Back Testing and Validation.
- Expert knowledge in machine learning algorithms such as Ensemble Methods (Random forests), Linear, Polynomial, Logistic Regression, Regularized Linear Regression, SVMs, Deep Neural Networks, Extreme Gradient Boosting, Decision Trees, K-Means, K-NN, Gaussian Mixture Models, Hierarchical models, Naïve Bayes.
- Well versed in handling Structured and Unstructured data, Time Series data and statistical methodologies such as Hypothesis Testing, ANOVA, multivariate statistics, regression, classification, modeling, decision theory, time-series analysis and Descriptive Statistics.
- Proficient in data transformations using log, square-root, reciprocal, cube-root, square and Box-Cox transformations, chosen based on the distribution of the dataset (a minimal sketch follows this summary).
- Concrete mathematical background in Statistics, Probability, Differentiation and Integration, Linear Algebra and Geometry
- Proficient in a wide variety of Data Science languages and libraries, including Python, R, SQL, PySpark, Scikit-learn, NumPy, SciPy and Pandas, plus Tableau for visualization.
- Experience with relational and non-relational databases such as MySQL, SQLite, Microsoft SQL Server, MongoDB and Cassandra.
- Adroit at employing various Data Visualization tools like Tableau, Matplotlib, Seaborn, ggplot2, and Plotly.
- Extremely organized, with demonstrated ability to handle multiple tasks and assignments simultaneously within scheduled deadlines.
- Highly competent with Big Data tools such as Hadoop, Spark and Hive, alongside Python, R, SQL and Tableau.
- Excellent understanding of Hadoop architecture and complete understanding of Hadoop daemons and components such as HDFS, YARN, Resource Manager, Node Manager, NameNode, DataNode and the MapReduce programming paradigm.
- Extensive experience with the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie and Sqoop, including working experience with Spark Core, Spark SQL, Spark Streaming and Kafka.
- Experience with complex data processing pipelines, including ETL and data ingestion dealing with unstructured and semi-structured data.
- Hands-on experience in PySpark and Spark Streaming, creating RDDs and applying transformations and actions on them.
- Good understanding of creating Conceptual Data Models, Process/Data Flow Diagrams, Use Case Diagrams, Class Diagrams and State Diagrams.
- Good communication and presentation skills, willing to learn, adapt to new technologies and third-party products.
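A minimal sketch of the distribution-driven transformations mentioned above, assuming NumPy and SciPy; the lognormal sample and all parameters are illustrative, not project data:

```python
# Minimal sketch of distribution-driven data transformations (illustrative data only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1_000)  # right-skewed, strictly positive

log_t = np.log(skewed)       # log transform
sqrt_t = np.sqrt(skewed)     # square-root transform
recip_t = 1.0 / skewed       # reciprocal transform
cbrt_t = np.cbrt(skewed)     # cube-root transform

# Box-Cox searches over a family of power transforms and returns the best-fitting lambda.
boxcox_t, lam = stats.boxcox(skewed)
print(f"Box-Cox lambda: {lam:.3f}, skew before: {stats.skew(skewed):.2f}, "
      f"after: {stats.skew(boxcox_t):.2f}")
```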
TECHNICAL SKILLS
LANGUAGES: Python | R | SAS | MATLAB
DATABASE: SQL | NoSQL | Microsoft Access | Oracle | Teradata
BIG DATA TECHNOLOGIES: Hadoop | MapReduce | Hive | Pig | Kafka | Spark | Sqoop
TOOLS AND UTILITIES: Jupyter | GIT | RStudio | Tableau | PyCharm | Spyder | Visual Studio | PostgreSQL | MySQL | SQLite | Microsoft SQL Server | MongoDB | Cassandra | Neo4j | JSON | MS Access | MLlib | Redshift | HBase | SQL Server Management Studio (SSMS) | SQL Server Reporting Services (SSRS) | SQL Server Integration Services (SSIS) | Crystal Reports | Excel Power Pivot
MACHINE LEARNING: Logistic Regression | Linear Regression | Support Vector Machines | Decision Trees | Random Forests | Ensemble Models | K-Nearest Neighbors | Gradient Boost | Naïve Bayes | K-Means Clustering | Hierarchical Clustering | Density Based Clustering | Gaussian Mixtures | Principal Component Analysis | Natural Language Processing (NLP)
DEEP LEARNING: Artificial Neural Networks | Convolutional Neural Networks | Multi-Layer Perceptron | Recursive Neural Networks | Recurrent Neural Networks | LSTM | BERT | GRU | Softmax Classifier | Back Propagation | Chain Rule | Dropout
LIBRARIES: NumPy | SciPy | Pandas | Scikit-learn | Theano | TensorFlow | Keras | PyTorch | Caret | Statsmodels | XGBoost | NLTK | dplyr | nnet | Glmnet | H2O | mboost | MATLAB Neural Network Toolbox
GRAPH VISUALIZATION: Tableau | Power BI | Matplotlib | Seaborn | Plotly | ggplot2 | Graphviz | Qlik View | Geoplotlib
CLOUD SERVICES: AWS (Athena, EMR, EC2, RDS, S3, CloudWatch, and KMS), Azure
METHODOLOGIES: Agile | Waterfall
PROFESSIONAL EXPERIENCE
Confidential, San Antonio, TX
Data Scientist
Responsibilities:
- Working on the Data and Analytics (Data Science) team, helping the Marketing and Finance teams analyze various data sources to promote the growth of insurance products.
- Worked with the marketing team to provide insights on product sellers using machine learning algorithms.
- Built advanced machine learning classification models such as XGBoost, Random Forest, KNN and SVM, and clustering algorithms such as Hierarchical Clustering and DBSCAN
- Performed Dimensionality Reduction, Model selection and Model boosting methods using Principal Component Analysis (PCA), K-Fold Cross Validation and Gradient Tree Boosting.
- Identified outliers and inconsistencies in data by conducting exploratory data analysis (EDA) using Python (NumPy and Seaborn) to gain insight into the data and validate each feature.
- Performed NLP using spaCy and techniques such as Word2Vec, FastText, Bag of Words, TF-IDF and Doc2Vec (see the TF-IDF sketch after this list)
- Applied transformer and recurrent language models such as BERT, GPT-2 and LSTM
- Developed NLP solutions with Deep Learning algorithms for analyzing text, improving on the existing dictionary-based approaches
- Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS
- Analyzed data to identify glitches and cleaned it to reduce distortions
- Applied multiple Machine Learning (ML) and Data Mining techniques to improve the quality of product ads and personalized recommendations
- Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) and ran them in parallel
- Made optimal use of Tensor Cores and FP16 training, using TensorRT for model optimization before deployment
- Managed AWS EC2 instances using Auto Scaling groups and used ticketing tools like JIRA to monitor work
- Created various types of data visualizations using Tableau, Power BI and Qlik, and libraries such as Matplotlib, Seaborn and ggplot2
- Collaborated with internal business partners to identify needs and recommend improvements
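A minimal TF-IDF sketch for the NLP techniques listed above, assuming Scikit-learn's TfidfVectorizer; the documents and vectorizer parameters are illustrative only:

```python
# Minimal TF-IDF feature-extraction sketch (illustrative documents and parameters).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "auto insurance quote for a new driver",
    "home insurance claim after storm damage",
    "quote for bundled auto and home insurance",
]

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)   # sparse (n_docs, n_terms) matrix

# Each row can feed a downstream classifier or similarity search.
print(tfidf.shape)
print(vectorizer.get_feature_names_out()[:10])
```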
Confidential, Raleigh, NC
Data Scientist
Responsibilities:
- Performed sentiment analysis in NLP on customers' email feedback to determine the tone behind a series of words, using neural network techniques such as Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNNs)
- Deployed Keras and TensorFlow for the NLP implementation and trained models using a cyclic learning-rate schedule
- Applied time series models (ARIMA, SARIMAX) to analyze sales data (a minimal SARIMAX sketch follows this list).
- Used Long Short-Term Memory (LSTM) networks and Prophet for analyzing time series data
- Coordinated the execution of A/B tests to measure the effectiveness of a personalized recommendation system.
- Utilized t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) to deal with the curse of dimensionality
- Created Neo4j graph visualizations of data flow from source to product and performed data lineage searches
- Used statistical metrics such as F-score, AUC/ROC, confusion matrix and RMSE to evaluate the performance of different models
- Implemented descriptive statistics and hypothesis testing using Chi-square, T-test, Pearson correlation and Analysis of variance (ANOVA).
- Generated ETL mappings, sessions and workflows based on business user requirements to load data from source files and RDBMS tables into target tables
- Extensively used HiveQL and Spark SQL queries to extract meaningful data and load it into external Hive tables
- Implemented CRUD functionalities of the API to handle query requests from the Neo4j database
- Performed data visualization using various libraries and designed dashboards with Tableau; generated complex reports, including charts, summaries and graphs, to interpret the findings for the team and stakeholders
- Used GitHub as fast, lightweight version control software to manage the source code and keep track of changes to files.
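A minimal SARIMAX sketch for the time series work listed above, assuming statsmodels; the synthetic monthly sales series and the (p, d, q)(P, D, Q, s) orders are illustrative assumptions:

```python
# Minimal SARIMAX forecasting sketch on synthetic monthly sales (illustrative data and orders).
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
trend = np.linspace(100, 160, 60)
season = 10 * np.sin(2 * np.pi * np.arange(60) / 12)
sales = pd.Series(trend + season + rng.normal(0, 3, 60), index=idx)

# (p, d, q) x (P, D, Q, s) orders chosen for illustration; in practice tune via AIC/BIC.
model = SARIMAX(sales, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=6))   # six-month-ahead forecast
```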
Confidential
Data Scientist
Responsibilities:
- Developed a churn model for the marketing team to reduce customer churn and improve retention
- Analyzed very large data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience
- Built a Proof of Concept (POC) by researching the user behavior and historical trends and developed a fraud detection model strategy using Random Forests and Decision Trees
- Experimented with multiple algorithms, such as Linear Regression, Logistic Regression, Support Vector Machines (SVM), Random Forest, AdaBoost and Gradient Boosting, using Python and Scikit-Learn, and evaluated their performance on customer discount optimization
- Analyzed data to identify glitches and cleaned it to reduce distortions
- Performed Chi-Square test, ANOVA test to identify significance between data samples
- Used forecasting and ARIMA models for time series analysis of customer behavior and purchases
- Applied multiple Machine Learning (ML) and Data Mining techniques to improve the quality of product ads and personalized recommendations
- Developed NLP solutions with Deep Learning algorithms for analyzing text, improving on the existing dictionary-based approaches
- Addressed overfitting by implementing regularization methods such as L1 and L2 in the algorithms (see the regularization sketch after this list)
- Collaborated in building a high-performance, low-latency system to manage high-velocity data streams
- Employed statistical techniques such as hypothesis testing, t-tests, confidence intervals and error measurements
- Performed various data manipulation techniques in statistical analysis like missing data imputation, indexing, merging, and sampling
- Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect the outliers and extract important variables numerically and graphically
- Created SSIS packages for transferring data from various data sources such as the SAP R/3 system, Oracle, MS Access, Excel, .txt files and .csv files
- Worked with Hadoop Ecosystem covering HDFS, HBase, YARN and MapReduce
- Developed Hive UDFs to bring all the customers' emails into a structured format
- Created different charts such as Heatmaps, Bar charts, Line charts
- Worked in creating different visualizations in Tableau using Bar charts, Line charts, Pie charts, Maps, Scatter Plot charts, and Table reports
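A minimal sketch of the L1/L2 regularization mentioned above, assuming Scikit-learn's LogisticRegression; the synthetic dataset and C value are illustrative only:

```python
# Minimal sketch of L1/L2 regularization to curb overfitting (illustrative synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=50, n_informative=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# L1 (lasso-style) drives uninformative coefficients to exactly zero.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_train, y_train)
# L2 (ridge-style) shrinks all coefficients toward zero without eliminating them.
l2 = LogisticRegression(penalty="l2", C=0.5, max_iter=1000).fit(X_train, y_train)

print("L1 test accuracy:", l1.score(X_test, y_test), "non-zero coefs:", (l1.coef_ != 0).sum())
print("L2 test accuracy:", l2.score(X_test, y_test))
```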
Confidential
Data Engineer
Responsibilities:
- Queried data from SQL server, imported other formats of data and performed data checking, cleansing, manipulation, and reporting using SAS (Base and Macro) and SQL
- Data Warehousing experience with Oracle, Redshift, Teradata and other MPP databases
- Extracted the data from different sources like Teradata and DB2 into HDFS and HBase using Sqoop
- Developed Oozie Workflows for daily incremental loads to get data from Teradata and imported them into Hive tables
- Applied resampling methods like the Synthetic Minority Over-sampling Technique (SMOTE) to balance the classes in large data sets (a minimal sketch follows this list)
- Performed Ad hoc analysis and reported the data using SAS and Excel
- Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from SQL server database
- Partnered with the data science team to prepare structured and unstructured data that they can use for predictive and prescriptive modeling
- Installed and configured a multi-node cluster in the cloud using AWS
- Applied algorithms such as Linear Regression, Logistic Regression and K-NN to predict customer churn and customer interactions
- Used Python to transform data from nested JSON and various other formats into usable data
- Developed Spark code using Python and Spark-SQL for faster testing and processing of data in real time
- Developed the full life cycle of a Data Lake and Data Warehouse with Big Data technologies like Spark and Hadoop
- Created Kafka topics, provided ACLs to users and set up MirrorMaker to transfer data between two Kafka clusters
- Involved in team meetings, discussions with business teams to understand the business use cases
- Experienced in using Linux environments and working within a matrix organizational structure
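A minimal SMOTE sketch for the class-balancing work listed above, assuming the imbalanced-learn package; the synthetic imbalanced dataset is illustrative only:

```python
# Minimal SMOTE class-balancing sketch (illustrative synthetic data; requires imbalanced-learn).
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Heavily imbalanced two-class problem: roughly 5% minority class.
X, y = make_classification(n_samples=10_000, n_features=20, weights=[0.95, 0.05], random_state=1)
print("before:", Counter(y))

# SMOTE synthesizes new minority samples by interpolating between existing neighbours.
X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)
print("after: ", Counter(y_res))
```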
Confidential
Business Analyst
Responsibilities:
- Interacted with the project manager and SMEs of different divisions (Information Technology, Risk and Operations) to establish a business analysis and design methodology around RUP
- Understood and articulated business requirements through interviews, surveys, and observing underwriters performing daily tasks
- Analyzed existing stand-alone (island) automated systems and performed gap analysis
- Interacted with heads of underwriting department to finalize the business requirements for the integrated underwriting solution.
- Prepared BRD and supporting documents containing the essential business elements, detailed definitions, and descriptions of the relationships between the actors to analyze and document business data requirements.
- Translated requirements into functional and technical specifications.
- Designed Use Cases, Use Case diagrams, Activity diagrams, Sequence diagrams in UML methodology using Rational Rose.
- Conducted Joint Application Development (JAD) sessions with the IT Group.
- Worked closely with the UI team to model the screens, which met user-defined requirements and organizational and regulatory standards.
- Assisted with Test Cases and developed strategies with Quality Assurance group to implement them. Efficiently responded to client inquiries and resolved discrepancies. Identified, prioritized, tested and proved essential business functions to assure compliance with vendor and internal auditing.
- Conducted UAT and collaborated with Quality Assurance Analyst in Rational Clear Quest to track defects and used Rational Clear Case to maintain consistency in the builds.