ML/Research Engineer Resume
Franklin Lakes, NJ
SUMMARY
- High-performing practitioner with over 7 years of hands-on experience in data munging, machine learning, artificial intelligence, and operations research, offering solid skills in data science and big data analytics
- Passionate about new challenges and continuously pushing the limits of expertise
- Experienced in working with business users, product owners, and stakeholders to deliver BI solutions, self-service reports, and statistical and predictive analyses
- Managed multiple complex projects spanning big data, machine learning, text mining, ETL, BI, data governance, data quality, security and compliance, and industry-standard best practices
- Extracted data from various database sources such as Oracle and SQL Server
- Experienced in common data scrubbing and preparation techniques: feature selection, one-hot encoding, binning, normalization, standardization, and handling missing data
- Deep understanding of statistical concepts and techniques such as probability, likelihood, hypothesis testing, A/B testing, interpreting p-values, t-tests, ANOVA, and ARIMA
- Explored raw data through exploratory data analysis (classification, train/test splitting, cross-validation) using Python packages such as pandas and NumPy
- Assessed the condition of the data with histograms, bar plots, pie charts, scatter plots, and box plots using the Matplotlib, Seaborn, and Plotly packages (a brief sketch of this EDA workflow follows the summary)
- Good knowledge of big data techniques such as MapReduce, Hadoop, and Hive
- Skilled in data ingestion and in automating the secure movement of data between disparate sources and systems, including real-time dataflow management, streaming analytics, data lake integration, and data demand planning
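A minimal sketch of the EDA workflow referenced above; the `transactions.csv` file and its column names are hypothetical stand-ins, not an actual project dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical dataset; the file and column names are illustrative assumptions.
df = pd.read_csv("transactions.csv")

# Assess the condition of the data: shape, dtypes, and missing values.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Distribution checks on a numeric feature: histogram plus box plot.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.histplot(df["amount"], bins=30, ax=axes[0])
sns.boxplot(x=df["amount"], ax=axes[1])
plt.tight_layout()
plt.show()
```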
TECHNICAL SKILLS
Data Science Toolbox: R, Python 2.7/3.x, SAS, MATLAB, Jupyter Notebook, Spyder, Visual Studio Code, PyCharm, TensorFlow, Keras
Databases: Oracle, MS Access, Microsoft SQL Server 2012/2014, Hive, MongoDB
Machine Learning: classification, clustering, and regression techniques; supervised and unsupervised learning; time series and forecasting; deep learning and neural networks (ANN, CNN, RNN, LSTM); NLP (NLTK); image processing; computer vision
Data Modeling Tools: Erwin, ER Studio, snowflake-schema modeling, fact and dimension tables, pivot tables
BI Tools: Tableau 10.5, Power BI, Kibana, Crystal Reports, plotly.io, seaborn, Matplotlib
Languages: SQL, PL/SQL, XML, R, Python, C++, Java, HTML, UNIX shell scripting
Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS Project, MS PowerPoint, Teradata
Big Data: Hadoop, Scala, PySpark, Hive, MapReduce, Sqoop, Oozie
Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7/10, UNIX, macOS
Methodologies: Agile, System Development Life Cycle (SDLC), Waterfall Model, CRISP-DM, Data science lifecycle
Cloud Services: AWS (SageMaker, Lambda, EC2, RDS, ALB/NLB, S3, Glue, Kinesis Firehose); GCP (BigQuery, AutoML, Dataproc, Dataflow, Cloud Build, GKE)
Integration Tools: DevOps, Docker, Kubernetes, Jenkins, Ansible, PyMongo, API gateways, RESTful APIs
PROFESSIONAL EXPERIENCE
ML/Research Engineer
Confidential, Franklin Lakes, NJ
Responsibilities:
- Gathered and scoped business requirements and project goals with stakeholders and managers, uncovering and defining multiple dimensions of the problem
- Interacted continuously with the data engineering team on data acquisition, data sources, data demand planning, and data quality
- Delivered executive-level dynamic dashboards and KPI metrics covering user engagement and fraud analytics using Kibana
- Utilized GCP big data tools for MLOps, such as BigQuery and Dataproc, to streamline data lakes, and AutoML to automate the model-building process
- Implemented a clustering algorithm on the customer database to feed a classification algorithm that determines whether a transaction is fraudulent
- Built a basic k-means clustering model to investigate the underlying factors that define customer behavior, purpose of sale, and transaction life cycle (a minimal sketch follows this section)
- Used ensemble methods, k-fold cross-validation, parameter tuning, and retraining, and pipelined the models for customer behavior analysis, data credibility assessment, and unusual-transaction detection (see the cross-validation sketch after this section)
- Leveraged A/B testing and cross-validation techniques to detect false positives and score model performance
- Exposed the models' prediction functionality to the web development team using joblib, a REST API, API gateways, and Flask (see the serving sketch after this section)
- Performed continuous integration/continuous delivery (CI/CD) of scaled models to production using Jenkins and Ansible
- Containerized and orchestrated CI/CD workloads using DevOps tools such as Docker (images and containers) and Kubernetes with Cloud Build and GKE
- Followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) lifecycle for data collection and processing under a Scrum methodology
- Communicated results and findings to stakeholders and business users
Tools/Environments: MLOps, A/B testing, GCP, AutoML, BigQuery, Scala, Dataproc, Cloud Build, GKE (Google Kubernetes Engine), TensorFlow, REST API, Flask, joblib, CI/CD (DevOps): Jenkins, Docker, Kubernetes, Ansible, Elastic Beanstalk, CRISP-DM
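A minimal sketch of the k-means segmentation referenced above; the synthetic features, their names, and the choice of k are illustrative assumptions, not the production model:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customer transaction features; column names are assumptions.
rng = np.random.default_rng(42)
customers = pd.DataFrame({
    "avg_amount": rng.gamma(2.0, 50.0, size=500),
    "txn_count": rng.poisson(20, size=500),
    "days_since_last": rng.integers(0, 90, size=500),
})

# Standardize so no single feature scale dominates the Euclidean distances.
scaled = StandardScaler().fit_transform(customers)

# k=4 is illustrative; in practice k would be chosen via the elbow method or silhouette score.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Segment labels can then feed the downstream fraud classifier as a feature.
print(customers.groupby("segment").mean())
```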
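Likewise, a hedged sketch of the k-fold cross-validation and parameter-tuning workflow; the synthetic data, parameter grid, and scoring metric are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the transaction feature matrix, imbalanced like fraud labels.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95], random_state=42)

# 5-fold stratified CV with a small illustrative parameter grid.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    scoring="f1",  # balances precision and recall on the rare fraud class
    cv=cv,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```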
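And a minimal sketch of serving a trained model behind a Flask REST endpoint with joblib; the artifact name, route, and payload shape are hypothetical:

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical artifact produced earlier with joblib.dump(model, "fraud_model.joblib").
model = joblib.load("fraud_model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[0.3, 12.5, 1.0]]}.
    payload = request.get_json(force=True)
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```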
Data Scientist
Confidential, Silver Spring, MD
Responsibilities:
- Part of a data science/ML team that drives efforts around risk management, regulatory compliance, and operational efficiency
- Applied best practices and machine learning techniques to identify optimal modeling approaches, and designed, built, and partnered to implement models that address business problems
- Implemented predictive models for current business needs and performed what-if analyses on model structure
- Manipulated, processed, and analyzed large volumes of data using Python (pandas) and SQL
- Utilized Apache Spark through Scala, the PySpark API, and the Py4J library to analyze large chunks of data (a PySpark sketch follows this section)
- Leveraged Athena, Glue, Kinesis, and EC2 to streamline a data lake stored in Amazon S3
- Implemented algorithms for regression analysis, trend finding, user engagement, forecasting, and statistical analysis of high-dimensional data
- Designed easy-to-follow visualizations in Tableau Desktop and published dashboards to Tableau Online
- Resolved problems through rigorous analytic techniques from machine learning and text mining
- Developed a predictive customer churn model to identify customers with a high likelihood of disconnecting from the network, using neural network and logistic regression models (a baseline sketch follows this section)
- Took part in a proof-of-concept project to understand vendor contracts using computer vision (CV) and optical character recognition (OCR)
- Accessed MongoDB using the PyMongo driver library; created and maintained aggregation pipelines with stages such as $match, $group, $bucket, and $facet (see the aggregation sketch after this section)
- Continuously load-balanced, scaled, monitored, and updated the model service/microservice infrastructure to keep the prediction API in production
- In-depth expertise in statistical procedures such as parametric and non-parametric tests, hypothesis testing, ANOVA, and interpreting p-values
Tools/Environments: MongoDB, PyMongo, JSON, Scala, PySpark, Amazon Web Services (AWS): SageMaker, EC2, S3, RDS, ALB/NLB, Lambda, Athena, Glue, Kinesis Firehose; APIs, Tableau, Python (scikit-learn); statistical procedures: hypothesis testing, ANOVA, p-values
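A minimal PySpark sketch of the kind of large-scale aggregation referenced above; the S3 path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transaction-analysis").getOrCreate()

# Hypothetical Parquet location and schema; names are illustrative assumptions.
df = spark.read.parquet("s3://example-bucket/transactions/")

# Aggregate spend per customer and flag high-volume accounts.
summary = (
    df.groupBy("customer_id")
      .agg(F.sum("amount").alias("total_spend"),
           F.count("*").alias("txn_count"))
      .withColumn("high_volume", F.col("txn_count") > 100)
)
summary.show(10)
```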
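A hedged baseline sketch of the churn model described above, using logistic regression on synthetic stand-in data; in practice a neural network would be benchmarked against the same split:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for subscriber features; churners are the rare positive class.
X, y = make_classification(n_samples=5000, n_features=12, weights=[0.85], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Scaled logistic regression baseline for churn probability.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```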
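Finally, a minimal sketch of a PyMongo aggregation pipeline like the ones referenced above; the connection string, database, collection, and field names are hypothetical:

```python
from pymongo import MongoClient

# Hypothetical connection string, database, and collection names.
client = MongoClient("mongodb://localhost:27017")
orders = client["sales"]["orders"]

# Pipeline: filter completed orders, total revenue per customer, top 10.
pipeline = [
    {"$match": {"status": "completed"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
    {"$limit": 10},
]
for doc in orders.aggregate(pipeline):
    print(doc)
```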