Data Analyst (Python) Resume
Estero, FL
SUMMARY:
- Over 8 years of experience in Machine Learning, Deep Learning, and Data Mining with large structured and unstructured datasets, including Data Validation, Data Acquisition, Data Visualization, and Predictive Modeling, and developed predictive models that provide intelligent solutions.
- Experience with statistical programming languages such as R and Python.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating Data Visualizations using R and Python.
- Hands-on experience with Customer Churn, Sales Forecasting, Market Mix Modeling, Customer Classification, Survival Analysis, Sentiment Analysis, Text Mining, and Recommendation Systems.
- Experience in using Statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression and Time Series Analysis to analyze data for further Model Building.
- Strong mathematical knowledge and hands-on experience in implementing Machine Learning algorithms like K-Nearest Neighbors, Logistic Regression, Linear Regression, Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosted Decision Trees, and Stacking Models.
- Experience in building models with Deep Learning frameworks like TensorFlow and Keras.
- Expertise in Machine learning Unsupervised algorithms such as K-Means, Density Based Clustering (DBSCAN), Hierarchical Clustering and strong knowledge on Recommender Systems.
- Hands-on experience in implementing Dimensionality Reduction Techniques like Truncated SVD, Principal Component Analysis, and t-Distributed Stochastic Neighbor Embedding (t-SNE).
- Proficient in advising on the use of data for compiling personnel and statistical reports, preparing personnel action documents, identifying patterns within data, analyzing data, and interpreting results.
- Good knowledge on Deep Learning concepts like Multi-Layer Perceptron, Deep Neural Networks, Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks.
- Hands-on experience with Deep Learning techniques such as Back Propagation, choosing Activation Functions, Weight Initialization based on the Optimizer, avoiding Vanishing and Exploding Gradient problems, using Dropout, Regularization and Batch Normalization, Gradient Monitoring and Clipping, Padding and Striding, Max Pooling, and LSTM (a sketch appears at the end of this summary).
- Experience in using Optimization Techniques like Gradient Descent, Stochastic Gradient Descent, Adam, Adadelta, RMSprop, and Adagrad.
- Actively involved in all phases of data science project life cycle including Data Extraction, Data Cleaning, Data Visualization and building Models.
- Extensive hands-on experience and high proficiency in writing complex SQL queries involving stored procedures, triggers, joins, and subqueries, as well as using MongoDB for data extraction.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.
- Experience with data visualization using tools like ggplot2, Matplotlib, Seaborn, Tableau, and R Shiny, and using Tableau to publish and present dashboards and storylines on web and desktop platforms.
- Experienced in Python data manipulation for loading and extraction, as well as with Python libraries such as NumPy, SciPy, and Pandas, and Spark 2.0 (PySpark, MLlib), to develop a variety of models and algorithms for analytic purposes.
- Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
- Proficient knowledge on Mathematical Matrix Operations, Statistics, Linear Algebra, Probability, Differentiation, Integration and Geometry.
- Extensive experience working in Test-Driven Development and Agile-Scrum environments.
- Experience in Amazon Web Services (AWS) Cloud services like EC2, S3.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2, and D3.js for creating dashboards.
- Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2; regularly accessed JIRA and other internal issue trackers for project development.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
- Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging and Teradata.
- Experience in using the Git version control system; implemented Kafka for building data pipelines and analytic modules.
- Power user of Python libraries including NumPy, Pandas, SciPy, Scikit-learn, statsmodels, Requests, Matplotlib, Plotly, Seaborn, NLTK, TensorFlow, Keras, SQLAlchemy, and Flask.
- Utilize NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
- In-depth understanding of Enterprise Data Warehouse systems, Dimensional Modeling using Facts, Dimensions, Star Schema & Snowflake Schema, and OLAP Cubes like MOLAP, ROLAP, and HOLAP (hybrid). Executed various OLAP operations of Slicing, Dicing, Roll-Up, Drill-Down, and Pivot on multidimensional data and analyzed reports using the Analysis ToolPak in MS Excel.
- Excellent initiative, innovative thinking skills, and the ability to analyze details while adopting a big-picture view; excellent organizational, project management, and problem-solving skills.
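The deep learning techniques bullet above mentions dropout, batch normalization, weight initialization, and gradient clipping; below is a minimal illustrative sketch of how those pieces fit together in Keras. The data shapes and hyperparameters are placeholders, not values from any project described here.

```python
# Minimal sketch: He initialization, batch normalization, dropout, and
# gradient clipping in a small Keras binary classifier (placeholder data).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                      # 20 input features (assumed)
    layers.Dense(64, activation="relu",
                 kernel_initializer="he_normal"),  # He init pairs well with ReLU
    layers.BatchNormalization(),                   # stabilizes layer activations
    layers.Dropout(0.3),                           # regularization against overfitting
    layers.Dense(1, activation="sigmoid"),
])

# clipnorm bounds the gradient norm, guarding against exploding gradients.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),
              loss="binary_crossentropy", metrics=[keras.metrics.AUC()])

X = np.random.rand(256, 20)       # placeholder features
y = np.random.randint(0, 2, 256)  # placeholder labels
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
```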
SKILL:
Languages: Python, SQL, Java, R, MATLAB
Databases: MySQL, Microsoft SQL Server, Oracle, MongoDB
Statistical Tests: Independent & pairwise t-tests, one-way and two-way factorial ANOVA, Pearson's correlation;
Regression Methods: Linear, Multiple, Polynomial, Decision trees and Support vector;
Classification Methods: Logistic Regression, K-NN, Naïve Bayes, Decision Trees and SVM;
Clustering Methods: K-Means, DBSCAN, Hierarchical, Expectation Maximization;
Association Rule Learning: Apriori, Eclat;
Reinforcement Learning: Upper Confidence Bound, Thompson Sampling;
Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural networks with Long short term memory (LSTM), Deep Boltzmann machines;
Dimensionality Reduction: Principal component Analysis (PCA), Linear discriminant Analysis (LDA), Autoencoders;
Text mining: Natural Language processing;
Ensemble Learning: Random forests, Bagging, Stacking, Gradient Boosting;
Model Validation: K-fold Cross-Validation, A/B Tests, Out-of-bag sample estimates
Data Visualization: Tableau, Microsoft Power BI, ggplot2, Matplotlib, Seaborn
Data modeling: Entity relationship Diagrams (ERD), Snowflake Schema
Big Data: Apache Hadoop, HDFS, Kafka, MapReduce, Spark
Cloud Technologies: AWS EC2, S3, Kinesis, Google Colab, Google Compute, Microsoft Azure
Business Intelligence Tools: Tableau, Power BI, SAP Business Intelligence
Other Tools: Spring Boot, Maven, Stata
EXPERIENCE:
Confidential, Estero, FL
Data Analyst (Python)
Responsibilities:
- Designed scalable ETL pipelines in the data warehousing platform by automating data flows from multiple SFTP servers into Teradata using Informatica to develop quantifiable metrics and deliver actionable outputs.
- Migrated various data sources to AWS S3 and scheduled ETL jobs using AWS Glue to build tables in AWS Athena, switching multiple sources in Tableau with strong attention to detail, resulting in high performance.
- Created a self-serve dashboard in Tableau by blending multiple data sources to develop insightful reports on Key Performance Indicators (KPIs) for senior management.
- Optimized existing processes in MS Excel using Python and Tableau to deliver end-to-end business intelligence solutions across HR Operations, HR business partners, and talent acquisition.
- Provided ad-hoc support for business requests from customers with rapidly changing requirements by optimizing complex SQL queries using subqueries, indexing, and stored procedures for day-to-day decision making.
- Responsible for maintaining scalability and managing data quality by building a Python script that parses 150 GB of historical data from AS400 to pre-process unstructured data into Teradata.
- Developed an optimization model for the fleet to put the right cars at the right place, at the right time, and at the right price.
- Predicted competitor reaction to change in price tiers with Sequence to Sequence models in PySpark.
- Leading the development and adoption of a web-app-based planning tool for managing the fleet across the US.
- Drive positive business impacts, such as a 3% increase in vehicle utilization and a 6% increase in average vehicles, by deriving strategic insights from highly varied data sources across Hertz's operations, logistics, reservations, and procurement.
- Understand how customers interact with various products; translate business problems into analytical processes with strong business sense and statistical knowledge; design and evaluate key metrics for products and enhance their performance to support actionable decisions by working on end-to-end projects including data manipulation, predictive modeling, and visualized reporting.
- Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Utilized various techniques like histograms, bar plots, pie charts, scatter plots, and box plots to assess the condition of the data.
- Used various machine learning algorithms such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, KNN, Decision Tree Regressor, SVM, Bagged Decision Trees, Random Forest, AdaBoost, and XGBoost.
- Leverage a broad stack of technologies (Python, Docker, AWS, Airflow, and Spark) to reveal the insights hidden within huge volumes of numeric and textual data.
- Designed and Developed reports, applied transformation for the Data Model, Data validation, established data relationships in Power BI and created supporting documentation for Power BI.
- Designed and trained a machine learning based gradient boosting model and generalized additive model with the H2O framework in Python to forecast the number of reservations likely to be cancelled, facilitating better planning of the supply chain.
- Analyzed the market share of Hertz and competitors in each market to devise strategies, optimize the supply chain and increase profitability.
- Created and developed a forecast model to predict demand for car rentals by hour at each location using the Prophet time series algorithm (see the sketch at the end of this section).
- Applied advanced SQL skills, fluency in R and Python, and advanced Microsoft Office skills (particularly Excel) across analytical platforms.
- Improved the current forecast accuracy by 32% in R.
- Processed millions of rows and performed in-depth analysis using Python and Oracle Analytics, generating insights that reduced new vehicle preparation time and created a $15M additional revenue opportunity.
- Ensure that the model has a low false positive rate; perform text classification and sentiment analysis for unstructured and semi-structured data.
- Monitor products' metrics by extracting large scale key data sets using SQL/R/Python/Hive/Spark, ensure data quality with the understanding of business ecosystems, thus empower exploratory analysis and improve customer experience.
- Apply quantitative modeling techniques such as machine learning algorithms and time series analysis to forecast demands.
- Build dashboards in Tableau that clearly display trends, seasonality, and movements of key metrics broken down by segment, delivering simplified and interpretable data analysis to bring data-supported business solutions to executive-level leaders.
- Partner with operations/finance/procurement/revenue departments, identify potential risks and opportunities, present findings to the senior leadership, help them accomplish data-driven decision-making instead of using experience-based estimations.
- Perform data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.
- Create Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Create various types of data visualizations using Python and Tableau.
- Communicated fleet plan and coordinated execution with teams in operations and revenue management.
- Experienced in developing, maintaining, automating, visualizing, and communicating analysis to senior leadership.
- Collaborated with cross functional technical teams in Revenue Management, Fleet Operations, Marketing to communicate and gather the required business metrics for successful process implementation.
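As a companion to the hourly demand forecast bullet above, here is a minimal Prophet sketch. The history values, horizon, and seasonality settings are illustrative assumptions, not the production configuration.

```python
# Minimal sketch: hourly rental-demand forecast with Prophet (synthetic history).
import numpy as np
import pandas as pd
from prophet import Prophet  # package is named 'prophet' in recent releases

# Prophet expects a 'ds' timestamp column and a 'y' target column.
history = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=24 * 60, freq="H"),
    "y": np.random.poisson(lam=20, size=24 * 60),  # placeholder demand counts
})

m = Prophet(daily_seasonality=True, weekly_seasonality=True)
m.fit(history)

future = m.make_future_dataframe(periods=48, freq="H")  # next 48 hours
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```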
Environment: Python, Tableau, Informatica (ETL), Teradata, MS SQL Server, Oracle, AWS, MS Access, MS Excel.
Confidential, Florham Park, NJ
Data Analyst (Python)
Responsibilities:
- Deliver analytics and insight to address a wide range of business needs utilizing various secondary data sources (e.g., Sales, Rx, HCP/Patient/Payer-level data, Formulary data, quantitative research outputs, etc.).
- Utilize MapReduce and PySpark programs to process data for analysis reports.
- Work on data cleaning to ensure data quality, consistency, and integrity using Pandas/Numpy.
- Perform data preprocessing on messy data including imputation, normalization, scaling, feature engineering etc. using Scikit-Learn.
- Conduct exploratory data analysis using Matplotlib and Seaborn. Maintain and monitor adherence program reporting. Design, experiment and test hypotheses. Apply advanced statistical and predictive modeling techniques to build, maintain and improve on real-time decision-making.
- Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and Ensemble algorithms to predict the probability of patient absence (see the sketch at the end of this section).
- Implement and test the model on AWS EC2; collaborated with development team to get the best algorithm and parameters.
- Leverage appropriate advanced and sophisticated methods and approaches to synthesize, clean, visualize and investigate data as appropriate to deliver analytical recommendations aligned with the business need.
- Analyze disease diagnoses, phenotypic traits, patient demographics, and genetics for epidemiological studies.
- Utilize NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
- Analyze longitudinal time series data to characterize disease trajectories, disease progression, medication adverse event episodes, drug resistance, and disease comorbidities.
- Use machine learning to model human biology based upon vast data sets; run experiments, synthesize molecules, and then conduct Phase 1 and 2 trials.
- Work with bioinformatics colleagues to conduct integrative analyses of EMR data and genomics data to identify potential novel therapeutic targets, to develop predictive models for prognosis and treatment response, and to stratify patient populations for clinical trials.
- Worked on SQL DML queries.
- Identify risks and opportunities that impact the performance of the business, convert them into analytical solutions, and provide appropriate actionable insights.
- Used Alteryx for data preparation and Tableau for visualization and reporting.
- Explore and analyze customer-specific features using Matplotlib and ggplot2; extract structured data from MySQL databases, develop basic visualizations, and analyze A/B test results.
- Leverage BI tools like Tableau Desktop to develop business dashboards that enable leaders to make decisions and forecast the number of credit card defaulters monthly.
- Organize reports and produce rich data visualizations that model data into human-readable form with Tableau, Matplotlib, and Seaborn to show the management team how prediction can help the business.
- Build predictive models including Support Vector Machine, Decision Tree, Naive Bayes Classifier, and CNN/RNN basics to predict whether a thyroid cancer cell is at potential risk of spreading, using Python scikit-learn.
- Design and implement a recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend courses for different customers.
- Collaborate with data engineers and operation team to implement ETL process, write and optimize SQL queries to perform data extraction to fit the analytical requirements.
- Implement the training process using cross-validation and evaluate results based on different performance metrics.
- Currently considering more factors, such as dietary habits, work environment, and mental state, to explore the possibility of making improved predictions.
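The classification bullets above describe imputation, scaling, model fitting, and cross-validated evaluation; the sketch below shows one plausible way to wire those steps together in scikit-learn. The data is synthetic and the estimator choice is illustrative.

```python
# Minimal sketch: preprocessing + classification pipeline with 5-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=15, random_state=42)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # normalize feature ranges
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])

# Cross-validated AUC, mirroring the evaluation step described above.
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC: {scores.mean():.3f}")
```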
Environment: Python (Scikit-Learn/Keras/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Linear and Non-linear Regression, Deep Learning, SVM, Decision Tree, Random Forest, XGBoost, Ensemble, and KNN), MySQL, AWS Redshift, S3, Hadoop Framework, HDFS, Spark (PySpark, MLlib, Spark SQL), Tableau Desktop and Tableau Server.
Confidential, Plano, TX
Data Scientist /Data Analyst
Responsibilities:
- Developed modules and components to load NLP models and to test pre-trained models against new use-case data.
- Implemented different models like Logistic Regression, Random Forest and Gradient-Boost Trees to predict whether a customer will churn or not.
- Built a pre-processing pipeline for handling English and Spanish text data to normalize text (tokenization, lemmatization, and regular expressions to clean unwanted text such as URLs and to tag general information like dates, URLs, and phone numbers).
- Developed a wrapper for the pre-processing pipeline to fix bad Unicode in the text.
- Built a processor which identifies text based on similarity ratio and replaces it with tags.
- Used the Encoder-Decoder architecture (multi-head attention) mechanism to develop a translation model in the pipeline for translating any text other than English and then apply NLP models on the translated text.
- The translation module has the capability to continue re-training the model by adding more data for better context and translation.
- Tailored visualizations for any data set.
- Built time series forecasts (ARIMA).
- Applied a range of machine learning techniques (Random Forests, GLM, PCA, Clustering, Regression, Anomaly Detection).
- Used LDA and Mallet LDA algorithms to identify topics in customer-agent conversations and understand customer intents (issues) (see the sketch at the end of this section).
- Worked with a linguist to analyze and identify sub-topics (intents and sub-intents) from chat sessions and feed them as input to a labeling application.
- Labeled keywords and entities based on the knowledge from the linguist and the output from the topic model.
- Implemented topic refinement methods by finding the evolution of topics in each chat session and topic probabilities for utterances.
- Identified user intents and redirected chats to the appropriate knowledge group for resolution.
- Using the data from the labeling application, built an intent classifier neural network model to predict the customer intents from the chat sessions and utterances.
- Built a pipeline using multiple NLP models in the framework to fit, predict, load and save the models and to be used as an API.
- Good knowledge of NoSQL databases like HBase and MongoDB; time series analysis (ARIMA), neural networks, sentiment analysis, forecasting, and text mining.
- Built predictive models including Support Vector Machine, Random Forests and Naïve Bayes Classifier using Python Scikit-Learn to predict the personalized product choice for each client.
- Designed and developed ETL packages using SSIS to create Data Warehouses from different tables and file sources like Flat and Excel files, with different methods in SSIS such as derived columns, aggregations, Merge joins, count, conditional split and more to transform the data.
- Experience in Hadoop ecosystem components like Hadoop MapReduce, HDFS, Hive, Sqoop, Pig, Flume including their installation and configuration.
- Validated the machine learning classifiers using AUC-ROC Curves and Lift Charts.
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
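The topic-modeling bullet above can be illustrated with a small Gensim LDA example. The chat snippets, normalization steps, and topic count below are toy assumptions, not data from the engagement.

```python
# Minimal sketch: LDA topic modeling over toy chat transcripts with Gensim.
import re
from gensim import corpora
from gensim.models import LdaModel

chats = [
    "my internet connection keeps dropping every hour",
    "I was billed twice this month please refund the charge",
    "the router light is red and wifi is down",
    "why is there an extra fee on my bill",
]

# Light normalization: lowercase, strip non-letters, whitespace tokenize.
docs = [re.sub(r"[^a-z ]", " ", c.lower()).split() for c in chats]

dictionary = corpora.Dictionary(docs)           # token -> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]  # bag-of-words vectors

# Two topics for this toy corpus (e.g., connectivity vs. billing intents).
lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
               passes=10, random_state=1)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```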
Environment: Python, spaCy, Regex, NLTK, Gensim, Pandas, TensorFlow, Keras, Matplotlib, Seaborn
Confidential, New Orleans, LA
Data Scientist/Data Analyst
Responsibilities:
- Gathered and reviewed business requirements and analyzed data sources.
- Performed data collection, data cleaning, feature scaling, feature engineering, and validation; visualized, interpreted, and reported findings and developed strategic uses of data with Python libraries like NumPy, Pandas, SciPy, and Scikit-Learn.
- Involved with Recommendation Systems such as Collaborative filtering and content-based filtering.
- Studied and implemented Fraud detection models to monitor the unconventional purchases from customer bases and alert them with updates.
- Worked with credit analysis and risk modeling algorithms to implement customer acquisition strategies in the real-time business.
- Implemented various statistical techniques to manipulate the data, such as missing data imputation, principal component analysis, sampling, and t-SNE for visualizing high-dimensional data.
- Worked with customer churn models, including random forest regression and lasso regression, along with pre-processing of the data.
- Created a text classification model using RNN and LSTM with TensorFlow.
- Explored and visualized the data to get descriptive statistics and inferential statistics for better understanding the dataset.
- Built predictive models including Support Vector Machine, Decision Tree, Naive Bayes Classifier, and Neural Network, plus ensemble methods, to evaluate how the likelihood to recommend would change for customer groups across different service sets, using Python scikit-learn.
- Implemented the training process using cross-validation and test sets, evaluated results based on different performance metrics, collected feedback, and retrained the model to improve performance.
- Performed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-processing.
- Configured SQL database to store Hive metadata.
- Loaded unstructured data into Hadoop File System (HDFS).
- Performed customer segmentation based on behavior and specific characteristics like age, region, income, and geographic location, applying clustering algorithms to group customers with similar behavior patterns (see the sketch at the end of this section).
- The segmentation results help determine the Customer Lifetime Value of each segment, discover high-value and low-value segments, and improve customer service to retain customers.
- Used Principal Component Analysis and t-SNE in feature engineering to analyze high-dimensional data.
- Analyzed and implemented a few research proof-of-concept models for real-time fraud detection on credit card and online banking purchases.
- Performed data profiling to learn about behavior with various features such as traffic pattern, location, time, and date; integrated external data sources and APIs to discover interesting trends.
- Involved in various pre-processing phases of text data like Tokenizing, Stemming, Lemmatization and converting the raw text data to structured data.
- Predicted potential credit card defaulters with 82% accuracy with Random Forest.
- Provide expertise and consultation regarding consumer and small business behavior score modeling issues, and give advice and guidance to risk managers using the models in strategies.
- Participate in strategically critical analytic initiatives around customer segmentation, channel preference, and targeting/propensity scoring.
- Build customer journey analytic maps and utilize NLP to enhance the customer experience and reduce customer friction points.
- Worked on personalization, target marketing, customer segmentation, and profiling.
- Performed data cleaning, feature scaling, featurization, and feature engineering.
- Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, and Scikit-learn in Python at various stages of developing the machine learning model, and utilized machine learning algorithms such as linear regression, Naive Bayes, Random Forests, Decision Trees, K-means, and KNN.
- Implemented a number of natural language processing mechanisms for chatbots.
- Performed clustering with historical, demographic, and behavioral data as features to implement personalized marketing that offers the right product to the right person at the right time on the right device.
- Addressed overfitting and underfitting by tuning the hyperparameters of the algorithm and by using L1 and L2 regularization.
- Used Spark's Machine learning library to build and evaluate different models.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Applied image processing techniques for general beautification and computer vision purposes, using an image processing toolkit with 3D matrices and manipulating individual pixel values.
- Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
- Completed detailed sentiment analysis using SPSS Modeler Premium (Text Analytics).
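The segmentation bullets above pair clustering with dimensionality reduction; the sketch below shows that pattern with K-Means and PCA on synthetic customers. The feature columns are assumed for illustration.

```python
# Minimal sketch: scale customer features, cluster with K-Means, inspect via PCA.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Assumed columns: age, income, monthly_spend, visits
customers = rng.normal(size=(500, 4)) * [10, 20000, 300, 5] + [40, 60000, 500, 10]

X = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# 2-D PCA projection for visual inspection of segment separation.
coords = PCA(n_components=2).fit_transform(X)
for k in range(4):
    center = coords[segments == k].mean(axis=0).round(2)
    print(f"segment {k}: {np.sum(segments == k)} customers, PCA centroid {center}")
```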
Confidential, Cincinnati, OH
Data Scientist
Responsibilities:
- Extracted, transformed and loaded data from multiple data stores into HDFS using Sqoop.
- Used the Spark Streaming API to collect real-time transactional data (see the sketch at the end of this section).
- Used Python 3.6 programming for handling various datasets and preparing them for further analysis.
- Carried out Statistical Analysis such as Hypothesis and Chi-square tests using R 3.4.
- Initial models were built using supervised classification techniques like K-Nearest Neighbor (KNN), Logistic Regression and Random Forests with Principal component analysis to identify important features.
- Built models using K-means clustering algorithm to create user groups.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop by connecting live/dynamic and static datasets.
- Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
- Wrote Python scripts to parse XML documents and load the data in database.
- Developed Python scripts to clean the raw data.
- Developed live reports in a drill down mode to facilitate usability and enhance user interaction.
- Query Data from Hadoop/Hive & MySQL data sources to build visualization in Tableau.
- Facilitated the automation process for the Delinquency Report, which was required to run on a monthly basis.
- Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL and various reporting procedures.
- Developed statistical reports with charts, bar charts, box plots, and line plots using PROC SGPLOT, PROC GCHART, and PROC GBARLINE.
- Extensive use of PROC FREQ, PROC REPORT, and PROC TABULATE for reporting purposes.
- Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop. Involved in creating Tree Map, Heat maps and background maps.
- Involved in generating dual-axis bar chart, Pie chart and Bubble chart with multiple measures and data blending in case of merging different sources.
- Tested dashboards to ensure data was matching as per the business requirements and if there were any changes in underlying data.
- Created reports using analysis output and exported them to the web to enable the customers to have access through Internet.
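For the real-time collection bullet above, a Structured Streaming sketch in PySpark is shown below. The Kafka broker address, topic name, and record schema are hypothetical.

```python
# Minimal sketch: read a hypothetical 'transactions' Kafka topic with
# Spark Structured Streaming and parse JSON records.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("txn-stream").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("merchant", StringType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
       .option("subscribe", "transactions")                  # assumed topic
       .load())

# Kafka delivers bytes; cast to string and parse with the assumed schema.
txns = (raw.select(from_json(col("value").cast("string"), schema).alias("t"))
           .select("t.*"))

# Write parsed transactions to the console for inspection (sketch only).
query = txns.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```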
Confidential, Mahwah, NJ
Data Scientist/Data Analyst
Responsibilities:
- Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from SQL server database.
- Gathered, analyzed, and translated business requirements, communicating with other departments to collect client business requirements and assess available data.
- Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
- Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
- Used information value, principal components analysis, and Chi-square feature selection techniques to identify important features.
- Applied resampling methods like the Synthetic Minority Oversampling Technique (SMOTE) to balance the classes in large data sets (see the sketch at the end of this section).
- Designed and implemented a customized linear regression model to predict sales, utilizing diverse sources of data to predict demand, risk, and price elasticity.
- Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost, and Gradient Boosting, using Python Scikit-Learn, and evaluated performance on customer discount optimization for millions of customers.
- Used F-score, AUC/ROC, confusion matrix, and RMSE to evaluate the performance of different models.
- Performed data visualization and Designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs to interpret the findings to the team and stakeholders.
- Used Keras for implementation and trained using a cyclic learning rate schedule.
- Resolved overfitting issues using batch normalization and dropout.
- Conducted in-depth analysis and predictive modelling to uncover hidden opportunities; communicate insights to the product, sales and marketing teams.
- Built models using Python and Pyspark to predict the probability of attendance for various campaigns and events.
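The SMOTE and evaluation bullets above suggest the following shape of workflow; the sketch uses synthetic imbalanced data and an illustrative classifier.

```python
# Minimal sketch: SMOTE on the training split only, then AUC and a
# confusion matrix on the untouched test split.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

# Oversample only the training data to avoid leaking test information.
X_bal, y_bal = SMOTE(random_state=7).fit_resample(X_tr, y_tr)

clf = GradientBoostingClassifier().fit(X_bal, y_bal)
proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", round(roc_auc_score(y_te, proba), 3))
print(confusion_matrix(y_te, clf.predict(X_te)))
```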
Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark-ML, Random Forests, SVM, TensorFlow, Keras.
Confidential, McLean, VA
Data Scientist/Data Analyst
Responsibilities:
- Performed data profiling to learn about behavior with various features such as location, time, and date; integrated external data sources and APIs to discover interesting trends.
- Built Machine Learning models to identify fraudulent applications for loan pre-approvals and to identify fraudulent credit card transactions using the history of customer transactions with supervised learning methods.
- Compiled data from various public and private database sources to perform complex analysis and data manipulation for actionable results.
- Proficient in the design and development of various dashboards and reports using visualizations like bar graphs, scatter plots, pie charts, and geographic visualizations, making use of actions and other local and global filters according to end-user requirements.
- Performed data cleaning, feature scaling, featurization, and feature engineering.
- Experience in developing reports and dashboards using QlikView, Qlik Sense, and NPrinting.
- Used Pandas, NumPy, SciPy, Matplotlib, Seaborn, and Scikit-learn in Python at various stages of developing the machine learning model, and utilized machine learning algorithms such as Logistic Regression, Random Forests, Decision Trees, and XGBoost to build predictive models.
- Involved in various pre-processing phases of text data like Stemming, Lemmatization and converting the raw text data to structured data.
- Expertise in Data Warehouse/Data Mart, OLTP, and OLAP implementations; involved in project scope, analysis, requirements gathering, and data modeling.
- Worked on data analysis using the Matplotlib and Seaborn libraries to create data visualizations for better understanding.
- Implemented a number of natural language processing mechanisms for text analysis.
- Performed customer segmentation based on behavior and specific characteristics like age, region, income, and geographic location, applying clustering algorithms to group customers with similar behavior patterns.
- The segmentation results help determine the Customer Lifetime Value of each segment, discover high-value and low-value segments, and improve customer service to retain customers.
- Developed dashboards for Vertica data analysis using reporting tools like Qlik Sense and QlikView.
- Performed clustering with historical, demographic, and behavioral data as features to implement personalized marketing that offers the right product to the right person at the right time on the right device.
- Implemented Principal Component Analysis (PCA) in feature engineering to analyze high-dimensional data.
- Created and designed interactive applications in Qlik Sense for data visualization and analytics.
- Used a cascading model, which gives good performance with credit card data.
- Used the confusion matrix and log loss to validate model performance (see the sketch at the end of this section).
- Addressed overfitting and underfitting by tuning the hyperparameters of the algorithm and by using L1 and L2 regularization.
- Used Spark's Machine learning library to build and evaluate different models.
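The validation and regularization bullets above can be tied together in one small sketch: grid-search L1/L2 penalty strength for a logistic regression and score with log loss. The data and parameter grid are illustrative only.

```python
# Minimal sketch: tune L1/L2 regularization for logistic regression and
# validate with log loss and a confusion matrix.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, log_loss
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1500, n_features=20, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

grid = GridSearchCV(
    LogisticRegression(solver="saga", max_iter=5000),  # saga supports l1 and l2
    {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]},  # C = 1/lambda
    scoring="neg_log_loss", cv=5,
)
grid.fit(X_tr, y_tr)

best = grid.best_estimator_
print("best params:", grid.best_params_)
print("test log loss:", round(log_loss(y_te, best.predict_proba(X_te)), 3))
print(confusion_matrix(y_te, best.predict(X_te)))
```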
Environment: Python, Seaborn, Scikit-learn, Keras, TensorFlow, machine learning libraries, NLP, neural networks, Linux, Windows, Google Cloud, Flask, Jupyter, statistical analysis.