Machine Learning Engineer Resume
SUMMARY
- 14+ years of professional IT experience, including around 3 years developing Machine Learning models and 2 years developing applications in Big Data ecosystems.
- Performed Univariate Analysis and analyzed descriptive statistics (Mean, Median, Mode, Range, Standard Deviation, Variance), treated missing data, detected and treated outliers, checked normality with Skewness and Kurtosis, and presented the results with Histograms, Box Plots, etc.
- Performed Statistical Modelling, Data Extraction, Data Screening and Cleaning, Data Exploration, Feature Selection, Feature Engineering, Data Visualization, Linear and Logistic Modelling, and Dimensionality Reduction, implementing Machine Learning algorithms at large scale to deliver insights and inferences targeted at boosting revenue and enriching customer experience.
- Experienced in solving Classification and Regression problems using Machine Learning algorithms such as Linear Regression, Logistic Regression, Decision Trees, Random Forests, Support Vector Machines and Naïve Bayes, and in optimizing performance using Ensemble Learning techniques like Bagging, Boosting and Stacking.
- Experienced in handling petabytes of data using Apache Spark (PySpark). Hands-on experience with Spark APIs such as Spark SQL, Spark Streaming, Spark MLlib, Spark ML and GraphX, and with Spark data structures such as DataFrames, RDDs and Datasets.
- Worked extensively on Spark SQL DataFrames, performed basic DataFrame operations, created UDFs, worked on Caching, Persisting and Repartitioning the DataFrames.
- Experience in Text Mining and Natural Language Processing techniques like topic modelling, tokenizing, stemming and lemmatizing, using the Natural Language Toolkit (NLTK) and spaCy.
- Proficient in evaluating and analyzing the performance of model with Cross validation, ROC-AUC, Confusion Matrix, Precision, Recall, Log-Loss, MSPE, MSAE, R Square, F1-Score and creating Performance reports using tables and visualizations.
- Expertise in solving problems like Underfitting, Overfitting, poor-quality data, irrelevant features, insufficient data and non-representative data through Data Mining, Predictive Modelling, Data Cleaning, Data Screening, Feature Selection, Feature Engineering, Model Building and Evaluation, and Hyperparameter Optimization with Grid Search and Random Search.
- Experience with Python libraries including NumPy, Pandas, SciPy, Scikit-learn, spaCy, Plotly, Matplotlib, Seaborn, Theano, TensorFlow, Keras and NLTK, and R packages like ggplot2.
- Expertise in using big data frameworks such as Hadoop, Spark, PySpark, Spark SQL, Hive, Pig, Apache Kafka, Sqoop, Oozie, Apache Atlas, Flume, Storm, YARN, and NoSQL databases such as Cassandra and MongoDB.
- Expert at Web Scraping using Python libraries like Scrapy, BeautifulSoup, urllib and Regular Expressions.
- Knowledge on CI/CD using Jenkins, Ansible, Kubernetes, Mesos, OpenStack for deployment of models.
- Executed deployed models through the Oozie scheduler.
- Experience in analyzing data using Hadoop Ecosystem including HDFS, Hive, Spark, Spark Streaming, Nifi, Elastic Search, Kibana, Kafka, HBase, Zookeeper, Sqoop, Flume.
- Expertise in Writing, Configuring, Deploying Spark Applications on a Hadoop Cluster.
- Experience in creating Spark Scala JARs using the IntelliJ IDE and executing them.
- Experience in writing queries for moving data from HDFS to Hive and analyzing data using Hive-QL.
- Very good knowledge of Partitioning, Bucketing, Join optimization and Query optimization concepts in Hive.
- Effective team player with excellent interpersonal, communication and listening skills, committed to delivering projects on schedule.
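As a minimal illustration of the model-evaluation metrics listed above (Precision, Recall, F1-Score), a plain-Python sketch; the confusion-matrix counts are made up for the example, not taken from any project:

```python
# Hedged sketch: Precision, Recall and F1 computed by hand from
# binary confusion-matrix counts (illustrative numbers only).

def binary_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class."""
    precision = tp / (tp + fp)          # of predicted positives, how many were right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Example: 40 true positives, 10 false positives, 20 false negatives.
p, r, f1 = binary_metrics(tp=40, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

In practice these values come from scikit-learn's `classification_report`; the hand computation only shows what the numbers mean.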
TECHNICAL SKILLS
Regression Methods: Linear, Multiple, Polynomial, Decision trees and Support vector;
Classification: Logistic Regression, K-NN, Naïve Bayes, Decision trees and SVM;
Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural networks with Long Short-Term memory (LSTM), Mask R-CNN;
Dimensionality Reduction: Singular Value Decomposition (SVD), Principal component Analysis (PCA), Linear discriminant Analysis (LDA);
Text Analytics/Natural Language Processing: Stemming, NLTK, spaCy, TF-IDF, Word2Vec, Doc2Vec, Topic Modelling;
Ensemble Learning: Random forests, Bagging, Stacking, Gradient Boosting.
Scripting Languages: Python (NumPy, Pandas, Scikit-Learn, TensorFlow, re, pickle, Seaborn, Flask, Matplotlib, OpenCV, Keras, PyTorch), R, Java, Scala, C#, C++, SQL, AngularJS.
ETL: Hadoop (Sqoop/Hive)
Web Scraping: BeautifulSoup, Scrapy, Selenium, Urllib, requests, Regular Expressions
Databases: MongoDB, Cassandra, MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Amazon Dynamo DB, Redshift.
Data Visualization: Tableau, ggplot2, Plotly, Matplotlib, Seaborn.
Big Data Tools: Apache Hadoop, Hive, Spark, Sqoop, Oozie, Pig, Kafka, Flume, Atlas, HDFS, YARN, Zeppelin Notebook.
Cloud Services & VCS: Google Kubernetes Engine, Docker, AWS (EC2), Microsoft Azure, Git, GitHub, Bitbucket.
PROFESSIONAL EXPERIENCE
Confidential
Machine Learning Engineer
Responsibilities:
- Extracted data from Denodo using PySpark. Performed various text pre-processing steps like converting to lower case, removing punctuation and stop-words, converting numbers to words, removing white spaces and special characters, and stemming and lemmatization using the NLTK library.
- Focused on Natural Language Processing techniques and used NLP methods for information extraction, topic modelling and parsing to explore trends in the customer contention data.
- Worked with text feature engineering techniques n-grams, TF-IDF, word2vec etc.
- Applied various Classification models such as Naïve Bayes, Logistic Regression, Random Forests, Support Vector Classifiers, Stochastic Gradient Descent, RNN and LSTM from the scikit-learn and Keras libraries.
- Addressed Overfitting and Underfitting by using K-fold Cross Validation.
- Generated Confusion Matrices and Classification Reports to evaluate the accuracy and performance of the models used. Evaluated model performance with metrics such as Precision, Recall, F-Score and AUC-ROC, and used Cross Validation to test the models against different batches of data and optimize them.
- Tuned hyper-parameters manually and with Grid Search to improve performance.
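The TF-IDF weighting used in the text feature engineering above can be sketched with the standard library alone; in practice scikit-learn's TfidfVectorizer or Spark ML's IDF does this, and the corpus below is made up for illustration:

```python
# Hedged sketch: TF-IDF by hand (standard library only).
# tf = term count / doc length; idf = log(n_docs / doc frequency).
import math
from collections import Counter

def tf_idf(corpus):
    """Return a list of {term: tf-idf weight} dicts, one per document."""
    n_docs = len(corpus)
    tokenized = [doc.lower().split() for doc in corpus]
    # document frequency: number of documents containing each term
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
w = tf_idf(docs)
# "the" appears in every document, so it carries zero weight everywhere
assert w[0]["the"] == 0.0
```

Terms that occur in every document get weight zero, which is exactly why TF-IDF down-weights uninformative words like stop-words.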
Confidential
Machine Learning and Big Data Engineer
Responsibilities:
- Performed Variable Identification and checked for percentage of Missing Values, Data Types, Outliers etc.,
- Performed Univariate Analysis and analyzed descriptive statistics (Mean, Median, Mode, Range, Standard Deviation, Variance), checked for missing data, detected outliers, checked normality with Skewness and Kurtosis, and presented the results with Histograms, Box Plots, etc.
- Performed Bivariate analysis using Correlation and Inferential Statistical tests like Z-test, T-test, Chi-Square, ANOVA to Check Multicollinearity and Singularity and presented the results using scatter plots, bar charts, line charts etc.,
- Performed Outlier Detection and Treatment in Python using different techniques like Median Absolute Deviation (MAD), Minimum Covariance Determinant, Histograms and Box plots.
- Performed Feature Engineering such as Missing Value Imputation, Normalization and Scaling, Outliers Detection and Treatment, One-Hot-Encoding, Splitting Features and used Label Encoder to convert categorical variables to numerical values using python scikit-learn library.
- Performed Exploratory Data Analysis (EDA) to visualize through various plots and graphs using matplotlib, NumPy, Pandas, Scikit-learn and seaborn libraries of python, and to understand and discover the patterns on the Data. Calculated Pearson Correlation Coefficient to deal with Multicollinearity.
- Applied various Classification models such as Naïve Bayes, Logistic Regression, Random Forests and Support Vector Classifiers from the scikit-learn library, and improved model performance using Ensemble Learning techniques such as Random Forests, XGBoost and Gradient Boosting.
- Addressed Overfitting and Underfitting by using K-fold Cross Validation.
- Applied K-means clustering to look for churn patterns among customers based on various features.
- Generated Confusion Matrices and Classification Reports to evaluate the accuracy and performance of the models used. Evaluated model performance with metrics such as Precision, Recall, F-Score and AUC-ROC, and used Cross Validation to test the models against different batches of data and optimize them.
- Tuned hyper-parameters manually and with Grid Search to improve model performance. Created a 3-node Spark cluster and applied different Transformations and Actions. Used Spark APIs such as Spark SQL to create Spark DataFrames, and spark.ml and spark.mllib to build machine learning models with Spark.
- Worked on Caching, Persisting and Repartitioning the DataFrames.
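The K-fold Cross Validation used above to address over- and under-fitting can be sketched as a plain-Python fold splitter; in practice scikit-learn's KFold or cross_val_score does this, and the sizes below are illustrative:

```python
# Hedged sketch: K-fold split by hand. Each sample lands in exactly
# one test fold; the remaining samples form that fold's training set.

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # distribute the remainder so fold sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in set(test)]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
# 5 folds of 2 test samples each; every sample tested exactly once
assert len(folds) == 5
assert sorted(i for _, test in folds for i in test) == list(range(10))
```

A model trained and scored once per fold gives k performance estimates, whose spread is what reveals overfitting to any single train/test split.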