Artificial Intelligence Machine Learning Engineer Resume
Jersey City, NJ
SUMMARY
- Artificial Intelligence / Machine Learning Engineer, Data Analyst, and Patent Analyst with around four years' experience in Data Science and AI.
- An effective communicator with excellent relationship management and strong analytical, statistical, problem-solving, and organizational skills, along with a passion for using data to create positive customer experiences.
- Proven team player who works in sync with corporate objectives and motivates colleagues to achieve business and individual goals.
TECHNICAL SKILLS
Artificial Intelligence and Machine Learning: Data Cleaning, Preprocessing, Data Manipulation, Linear Regression, Logistic Regression, Decision Trees, Random Forest, Bagging and Boosting, Deep Learning, Computer Vision (CNN), Clustering, Natural Language Processing, Azure Machine Learning Studio, Azure AutoML, Docker containerization and Kubernetes, Python, MLOps, SQL, Explainable AI, Shapley values, tree explainer, XGBoost Regressor, Azure Cognitive Services, Computer Vision, Custom Vision, OCR, Text Analytics, Content Moderator, Image and Video Processing, Azure LUIS, Semantic Analysis, REST API, Bot Framework, Bot Emulator, MLOps CI/CD.
Statistical Method: Predictive Analysis, Hypothesis Testing, Principal Component Analysis, Dimensionality Reduction, Market basket Analysis
Programming Languages: Python (NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, TensorFlow, Keras, Artificial Neural Networks, Convolutional Neural Networks, PyTorch, LSTM), SQL, Generative AI (ChatGPT, Prompt Engineering).
Software: Jupyter Notebook, Google Colaboratory, PyCharm, Spark, Tableau, R, Minitab, MS Excel, MS PowerPoint, SSMS (SQL Server), MS SQL, GCP, Vertex AI, Python, PyTorch, Kubernetes, Docker, Power BI, Power Query Editor, Azure Machine Learning Studio, Azure Designer, Azure AutoML, Git, GitHub
PROFESSIONAL EXPERIENCE
Confidential, Jersey City, NJ
Artificial Intelligence Machine Learning Engineer
Responsibilities:
- Developed and performed end-to-end logic, design, and implementation for a PoC of inventory purchase forecasting to reduce food wastage, using a Machine Learning solution on a cloud platform.
- Communicated with stakeholders (business VP, product manager, client) to understand business challenges that could be resolved with ML-based solutions.
- Retrieved data from multiple databases and studied the relations between the different entities and stored procedures.
- Extracted data using web scraping and integrated it with the dataset.
- Performed data analysis and data visualization to understand trends, patterns, and anomalies in the data.
- Designed, built, and trained ML models such as Linear Regression, Decision Tree, Random Forest, Boosted Decision Tree, CatBoost, LightGBM, and XGBoost regressors, as well as time-series forecasting models (ARIMA, ARIMAX).
- Performed hyperparameter tuning using Grid Search and Randomized Search cross-validation to optimize model performance.
- Worked with an AI client utilizing Azure Cognitive Services microservices for Computer Vision use cases such as image tagging/analytics, content moderation, text extraction (OCR), and face recognition.
- Experienced working with Azure LUIS APIs for language detection, phrase extraction, and entity recognition.
- Experienced working with different data structures.
- Designed and built an Azure chatbot using Bot Framework and tested it with Bot Emulator.
- Experienced in prompt engineering for generative AI for data synthesis and image/video creation and augmentation.
- Delivered an ML classification solution for a student retention problem in higher education.
- Designed, built, and implemented classification models such as Random Forest classifier, Artificial Neural Network classifier, and multiclass boosted decision tree.
- Tuned hyperparameters to obtain the best model and evaluated it against appropriate metrics.
- Deployed the best model as a Kubernetes service.
- Project: Crum & Forster (Insurance)
- Delivered an end-to-end predictive AI solution for insurance charge prediction.
- Experienced working with big data: data cleaning, scrubbing, preprocessing, and feature engineering.
- Designed, built, and implemented regression models such as Decision Tree regressor, Random Forest, and Linear Regression for insurance charge prediction.
- Deployed the model as an Azure Container Instance (ACI).
- Experienced working on an end-to-end energy price prediction PoC.
- Worked on exploratory data analysis, feature extraction/engineering, and data wrangling; designed, built, and optimized the ML predictive model.
- Created business-level dashboard reports on the results achieved with the best Machine Learning model.
- Interpreted results with Explainable AI tools such as Shapley values and explanatory graphs and charts to demonstrate the business impact of the use case.
- Knowledge of Big Data frameworks and visualization tools (Hadoop, Spark, Tableau).
- Experienced in building web services using frameworks such as Flask.
- Experienced in using prompt engineering to generate/synthesize data on the OpenAI platform.
- Experienced in building chatbots and integrating them with business applications.
- Experienced using multiple Azure Cognitive Services APIs for business applications: OCR for extracting text from PDF forms, face detection and recognition, sentiment analysis (NLP), language identification (NLU), text analytics (image-to-text, speech-to-text), and Azure LUIS for identifying utterances, intents, and entities from text and images.
- Deployed machine learning pipelines for the best-performing model as container instances / Kubernetes services on a cloud platform and consumed inference-pipeline predictions via a REST API endpoint.
- Created business-level dashboard reports for predictions generated using ML models.
- Visualized ML model predictions for Explainable AI using SHAP values and the SHAP library.
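The grid-search tuning with cross-validation described above can be sketched as follows (a minimal from-scratch illustration on a toy 1-D ridge regressor; all names and the data here are hypothetical, and in practice scikit-learn's GridSearchCV would be used):

```python
import random

def ridge_fit(xs, ys, alpha):
    """Closed-form 1-D ridge regression (no intercept): w = sum(x*y) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mse(w, xs, ys):
    """Mean squared error of the prediction w*x against y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search_cv(xs, ys, alphas, k=5):
    """Score each candidate alpha by k-fold cross-validated MSE; return the best alpha."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # simple interleaved folds
    best_alpha, best_score = None, float("inf")
    for alpha in alphas:
        scores = []
        for fold in folds:
            train = [i for i in range(n) if i not in fold]
            w = ridge_fit([xs[i] for i in train], [ys[i] for i in train], alpha)
            scores.append(mse(w, [xs[i] for i in fold], [ys[i] for i in fold]))
        avg = sum(scores) / k
        if avg < best_score:
            best_alpha, best_score = alpha, avg
    return best_alpha

# Toy data: y = 2x plus small noise, so small regularization should win.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x + random.gauss(0, 0.1) for x in xs]
best = grid_search_cv(xs, ys, alphas=[0.01, 0.1, 1.0, 10.0])
```

A randomized search follows the same loop but samples alpha values from a distribution instead of enumerating a fixed grid.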
Environment and tools: Python, SQL, Jupyter Notebook, PyCharm, TensorFlow, PyTorch, Keras, NumPy, Pandas, Azure Machine Learning Studio, Azure SDK, Power BI, SQLite, Matplotlib, Seaborn, R, Azure Cognitive Services, OCR, Computer Vision, Custom Vision, AutoML, Azure Cognitive microservices, Azure Computer Vision, Explainable AI, Statistical Analysis, Hypothesis Testing, REST API, Git, GitHub, ggplot2, dplyr, Hiplot, Tableau, Kubernetes, Azure AI, PowerShell, Generative AI
Confidential
Data Science Intern
Responsibilities:
- Project: Developed an employee attrition prediction model for the organization and an application serving its predictions.
- Performed data cleaning and descriptive analysis to obtain statistical summaries and visual representations of the data and study its patterns.
- Experienced in exploratory data analysis (EDA), statistical analysis, market basket analysis, and data preprocessing.
- Visualized the target variable against both categorical and numerical variables.
- Determined the probability distribution of the data through hypothesis testing.
- Built various models, including Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine.
- Worked on hyperparameter tuning for model optimization and evaluated model performance.
- Compared model adequacy and accuracy parameters to determine the best model for the data.
- Deployed the model to a production environment as a web application using the Flask framework.
- Used the predictive model for business improvement.
- Skills and Tools: Python, NumPy, Pandas, scikit-learn, SciPy, Matplotlib, Seaborn, data cleaning, data preprocessing, statistical analysis, Mann-Whitney test, binomial test for confidence intervals, multiple linear regression, ridge regression, lasso regression, Logistic Regression, Decision Tree, Random Forest, XGBoost regressor, market basket analysis, Extra Trees regressor, Support Vector Classifier, cross-validation, data pipelines, Docker containerization, Kubernetes, Flask.
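The model-comparison step above hinges on evaluating accuracy alongside recall, since attrition data is typically imbalanced. A minimal sketch in plain Python (hypothetical labels; in practice scikit-learn's metrics module would be used):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = employee leaves, 0 = stays)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def recall(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy example: on imbalanced data a model can score high accuracy yet miss
# half the positive (attrition) cases, which recall exposes.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Here `accuracy` is 0.9 while `recall` is only 0.5, which is why both were compared before selecting the best model.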
Environment: Python, R, Azure ML, Jupyter, TensorFlow, Keras, Matplotlib, Seaborn, Kubernetes, Flask framework, REST API.
Confidential
PGP- AIML/AI Consultant
Responsibilities:
- Supervised Learning - Linear Regression:
- Performed data cleaning, data preprocessing, and exploratory data analysis.
- Built a linear regression model to effectively predict used-car prices, helping the business devise profitable differential-pricing strategies. Model performance was evaluated by root mean squared error, mean absolute error, and R² score.
- Skills and Tools: EDA, linear regression, linear regression assumptions, business insights and suggestions.
- Logistic Regression - AllLifeBank Personal Loan Campaign Modelling:
- Performed exploratory data analysis to find trends in the data.
- Built a decision tree classifier model; applied class weights to address class imbalance.
- Used grid search tuning to obtain optimal hyperparameters.
- Applied post-pruning to obtain the best model and maximize recall for the decision tree.
- Achieved a generalized model with a recall score of 98.6% and minimal overfitting (train recall = 99%).
- Plotted graphs of important features, helping the marketing department identify potential customers with a higher probability of purchasing the loan.
- Skills and Tools: EDA, data preprocessing, logistic regression, finding the optimal threshold using the AUC-ROC curve, decision trees, pruning.
- Ensemble Technique - Travel Package Purchase Prediction:
- Analyzed customer information from a travel company dataset.
- Built an ensemble model (a stronger learner) from weak learner models to predict which customers would purchase the newly introduced package.
- Generated multiple bagging and boosting models with different hyperparameter tunings.
- The tuned XGBoost model had the highest test recall (77%), with 88% test accuracy and the least overfitting on train data.
- Skills and Tools: EDA, data preprocessing, customer profiling, bagging classifiers (Bagging and Random Forest), boosting classifiers (AdaBoost, Gradient Boosting, XGBoost), stacking classifier, hyperparameter tuning using GridSearchCV, and business recommendations.
- Feature Selection, Model Selection and Tuning - Bank Credit Card Churn:
- Built models of several types, including logistic regression, random forest, gradient boosting, and XGBoost.
- Applied regularization techniques to the models.
- Undersampled and oversampled the data to achieve the best model.
- Tuned model hyperparameters using GridSearchCV and RandomizedSearchCV; the XGBoost model tuned by RandomizedSearchCV achieved 83% accuracy and 100% recall on the train and validation sets.
- After finalizing the model, used a pipeline to put it into production.
- Skills and Tools: cross-validation, up- and down-sampling, regularization, pipelines, and hyperparameter tuning.
- Unsupervised Learning (Clustering) - AllLife Bank Credit Card Customer Segmentation:
- To recommend how the bank could better market to and service its customers, used K-Means, hierarchical, and agglomerative clustering to segment existing customers based on their spending patterns and past interactions with the bank.
- For K-Means clustering, the silhouette coefficient was used to decide the number of clusters.
- For hierarchical clustering, the cophenetic coefficient was used to identify the segments.
- Performed cluster profiling to derive the customer segments.
- Skills and Tools: EDA, clustering (K-Means and hierarchical), cluster profiling.
- Deep Neural Network - Bank Customer Churn Prediction: Identified customers more likely to churn by building an artificial neural network. Applying class weights (0:10, 1:90) helped surface more errors in the minority class, the class of interest, and yielded the best score: recall of 87.5% and accuracy of 67.4%. Recall was the evaluation metric for this project. Resampling techniques were also applied to balance the data before training; training and testing on the undersampled data gave 83.7% accuracy and a 66.7% recall score.
- Skills and Tools: TensorFlow, Keras, ANN, Google Colab.
- Computer Vision - Image Classification using CNNs: Applied image preprocessing techniques such as channel resizing, Gaussian blurring, image resizing, image normalization, and one-hot encoding, using the OpenCV library for image processing. Built a convolutional neural network with Keras to identify plant seedlings from 12 different species. Used regularization methods such as dropout layers and filters, and tuned the model with early stopping and model checkpoints.
- Skills and Tools: Keras, CNN, working with images (Gaussian blurring and converting images to grayscale), OpenCV.
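The silhouette-coefficient criterion used in the clustering project above can be sketched from scratch (a minimal illustration on hypothetical 2-D points; in practice scikit-learn's silhouette_score would be used):

```python
def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def silhouette(points, labels):
    """Mean silhouette coefficient: s(i) = (b - a) / max(a, b), where a is the mean
    intra-cluster distance of point i and b is its mean distance to the nearest
    other cluster. Values near 1 indicate a good clustering."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:                      # singleton cluster: silhouette is 0
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(sum(dist(p, q) for q in clusters[m]) / len(clusters[m])
                for m in clusters if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated blobs: the correct 2-cluster labeling scores near 1,
# while a mixed labeling scores much lower.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
good = silhouette(points, [0, 0, 0, 1, 1, 1])
bad = silhouette(points, [0, 1, 0, 1, 0, 1])
```

Choosing k then amounts to running K-Means for several values of k and keeping the one with the highest mean silhouette.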
Confidential
Data Analyst /Junior Data Scientist
Responsibilities:
- Assisted both technical and business stakeholders in the oil, gas, and aviation domains.
- Evaluated existing procedures and methods; identified and documented items such as database content, structure, and application subsystems.
- Synthesized data into meaningful conclusions and actionable recommendations.
- Coordinated UAT (User Acceptance Testing).
- Designed and developed Tableau visualization solutions.
- Mastered designing and deploying rich graphic visualizations with drill-down and drop-down menu options and parameters in Tableau.
- Experienced working on predictive AI for oil and gas commodities.
- Designed, built, and validated crude oil production prediction ML models and optimized the hyperparameters of the best model.
- Developed data visualizations using cross tabs, histograms, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts in Tableau.
- Worked extensively with advanced analysis features: actions, calculations, parameters, background images, maps, trend lines, statistics, and table calculations.
- Identified and cleaned noise from structured and unstructured data collected from various sources.
- Built and trained Machine Learning models to process video, image, and text data, and evaluated the performance of Deep Learning (ANN, CNN) and NLP (Word2vec, GloVe, TF-IDF matrix, TensorFlow, BERT) models.
- Worked on bank transaction fraud detection using logistic regression and decision trees; used pruning techniques to overcome overfitting.
- Performed text preprocessing (lemmatization, stemming, tokenization, and removal of contractions, special characters, and HTML tags); applied TF-IDF vectorization for word embedding; built a random forest model and tuned it with the optimal base learner; evaluated model performance and identified the top words for positive and negative sentiments.
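The TF-IDF vectorization step above can be sketched in plain Python (a from-scratch illustration using a common smoothed-weighting variant, tf × (ln(N/df) + 1); the documents are hypothetical, and in practice scikit-learn's TfidfVectorizer would be used):

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per document.
    tf = term count / document length; idf = ln(N / df) + 1."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency of each term
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: (c / len(tokens)) * (math.log(n / df[t]) + 1)
                        for t, c in tf.items()})
    return weights

docs = ["good movie", "bad movie", "good good plot"]
w = tfidf(docs)
# "movie" appears in 2 of 3 documents, so it is down-weighted relative to
# "bad", which appears in only one - the property that lets the downstream
# random forest focus on discriminative sentiment words.
```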
Environment: SQL, Databases, Python, Jupyter, Tableau, Power BI, TensorFlow, Keras, Matplotlib, Seaborn, Kubernetes
Confidential
Patent Analyst
Responsibilities:
- Performed patent searching, patent analysis, patent mapping, and report preparation.
- Prepared various kinds of reports, such as prior-art searches, patentability searches, infringement searches, invalidity searches, patent maps, and landscaping of patents on metal extraction, production, and heat treatment processes.
- Delivered good-quality reports within deadlines.
- Utilized various patent search and mapping tools such as Micropat, Delphion, Q-Pat, PatBase, Derwent Innovation Index, TotalPatent, and Thomson Innovation.
- Made use of scientific literature search tools such as SciFinder, Ekaswa CDs, and other non-IP search engines.
Confidential
Metallurgical Engineer
Responsibilities:
- Performed precipitation-hardening heat treatment of AlSiCu alloy and studied the effect of temperature and time on mechanical properties.
- Performed cold-rolled cladding of Al alloy and stainless steel to study the improvement in mechanical properties.