Data Science Consultant (AI Engineer) Resume
SUMMARY:
- Data Scientist with 3+ years of experience building, validating, and deploying machine learning solutions.
- AWS Certified Machine Learning - Specialty.
- Experienced in deploying Python machine learning applications built with the Flask framework, using Jenkins and/or Kubernetes to target on-premise and cloud environments.
- Experienced in using open-source and commercially licensed OCR tools to digitize scanned documents and perform downstream NLP analytics.
- Experienced with geospatial analytics tools such as ArcGIS, GeoPandas, and Shapely.
- Experienced in object detection using the Keras, PyTorch, MXNet, and TensorFlow APIs.
- Experienced in using state-of-the-art models such as BERT and XLNet for text classification.
- Experienced in using time-series models such as FBProphet, SARIMAX, ARIMA, and DeepAR (GluonTS) for forecasting; a minimal SARIMAX sketch follows this list.
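A minimal sketch of the kind of SARIMAX forecast referenced above, using statsmodels. The series, seasonal period, and model orders below are illustrative assumptions, not values from any project on this resume:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly series with a trend and yearly seasonality (illustrative).
idx = pd.date_range("2018-01-01", periods=48, freq="MS")
y = pd.Series(
    100 + 0.5 * np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12),
    index=idx,
)

# Fit a seasonal ARIMA model; the (p,d,q)(P,D,Q,s) orders are assumptions
# chosen for illustration, not tuned values from a real project.
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)

# Forecast the next 12 months.
print(result.forecast(steps=12))
```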
PROFESSIONAL EXPERIENCE:
Data Science Consultant (AI Engineer)
Confidential
Responsibilities:
- Develop, test, validate and refine predictive models using artificial intelligence, deep learning and machine learning to optimize customer experiences, revenue generation, operational effectiveness, marketing success and other business outcomes.
- Partner with business clients, establish professional relationships, and communicate with analytics stakeholders to understand business needs.
- Frame problems with stakeholders: research and construct problem frames to understand the analysis context and scope needed to deliver timely, useful results.
- Lead and participate in multidisciplinary analytics project teams.
- Plan and conduct individual interviews with subject matter experts to gather valid information and the data needed for analysis.
- Communicate results to decision makers, explaining the conclusions of the analytics process in both written and oral presentations.
- Create recommendations that are practical, actionable, and have material impact, in addition to being well-supported by analytical models and data.
- Identify unique opportunities to collect new data.
- Perform data studies of new and diverse data sources.
- Find new uses for existing data sources.
- Conduct statistical modeling and experiment design.
- Implement automated processes for efficiently producing models at scale.
- Design, modify and build new data processes.
- Generate algorithms and create computer models.
- Collaborate with database engineers and other scientists.
- Implement new or enhanced software designed to access and handle data more efficiently.
- Developed and operationalized an object detection model to detect customer-owned, in-ground swimming pools.
- Identifying this subset of customers helps the Smart Saver Program target them for promotional marketing initiatives.
- The project saves $5,000 annually and the information collected will benefit future marketing promotions.
- Extracted line items (SKU, item description, item cost, tax, date, and vendor) from scanned receipts.
- Used OCR to convert the images into text, applied an entity recognition system to extract the line items, and used a dependency parser to group related entities together (see the OCR/NER sketch after this job's Environment line).
- Created an image classification model that compares actual and expected solar plant performance and identifies the driver of any deviation between the two metrics. Although it could be framed as a time-series problem, the values are rendered as chart images and classified using computer vision, with better than 93% accuracy. The project is expected to save the business over $500,000 annually.
- Digitized handwritten sticky notes from agile design-thinking sessions using handwriting recognition built on convolutional neural networks and word beam search decoding.
- Contributed to the development of a text classifier built on a pre-trained BERT model.
- The model helps Duke's Customer Experience group accurately and autonomously classify freeform text written by customers in satisfaction surveys. This initiative is expected to save the business over $150,000.
- Built a recommender that, given the text description of a work request, surfaces the most similar past work orders; planners reuse the tasks from those orders to plan the new request, an expected savings of 36.4k man-hours per year in operational-excellence terms. Owned the full lifecycle: initial business research, model development, microservice and API development, deployment, scaling, and a feedback loop that retrains the model automatically on user feedback (see the similarity sketch after this job's Environment line).
- As part of the operations team, wrapped models created by me and other data scientists on the team in Flask apps and deployed them to Pivotal Cloud Foundry, Kubernetes, AWS, and Azure using Docker containers (a minimal Flask wrapper sketch also follows below).
Environment: AWS SageMaker, AWS S3, Python 3.x, Git, Anaconda, shell scripting, Hadoop, scikit-learn, TensorFlow, Keras, Tesseract, SQL Server, Oracle, NumPy, spaCy, OpenCV, MXNet, Kubernetes, Docker, Flask, Unix, Azure Machine Learning Studio, Azure ML, docker-compose, Nirmata
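A minimal sketch of the receipt-extraction pipeline described above (OCR, then entity extraction), assuming Tesseract via pytesseract and spaCy's small English model. The production system's models, entity labels, and file paths are not specified on this resume, so everything below is illustrative:

```python
import pytesseract              # Python wrapper around the Tesseract OCR engine
import spacy
from PIL import Image

# Step 1: OCR - convert the scanned receipt image into raw text.
# "receipt.png" is a placeholder path, not an artifact from the project.
text = pytesseract.image_to_string(Image.open("receipt.png"))

# Step 2: NER - run a general-purpose pipeline over the OCR output.
# A production receipt parser would use a custom-trained model with labels
# like SKU or VENDOR; the stock labels below only approximate that.
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

for ent in doc.ents:
    if ent.label_ in {"MONEY", "DATE", "ORG"}:
        print(ent.label_, "->", ent.text)

# Step 3 (grouping): the dependency parse (token.head, token.dep_) can then
# be used to associate each amount with the item phrase it modifies.
```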
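A minimal sketch of the work-order similarity idea, using TF-IDF and cosine similarity from scikit-learn. The actual model and data are not detailed on this resume, so the corpus and queries here are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder historical work-order descriptions (illustrative only).
past_work_orders = [
    "replace corroded valve on cooling line",
    "inspect transformer bushings for oil leaks",
    "recalibrate pressure sensor on boiler feed pump",
]

# Fit TF-IDF on the historical corpus.
vectorizer = TfidfVectorizer(stop_words="english")
past_vectors = vectorizer.fit_transform(past_work_orders)

def most_similar(request_text: str) -> str:
    """Return the past work order most similar to a new request."""
    request_vector = vectorizer.transform([request_text])
    scores = cosine_similarity(request_vector, past_vectors)[0]
    return past_work_orders[scores.argmax()]

print(most_similar("pressure sensor reading drifts on feed pump"))
```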
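And a minimal sketch of the Flask wrapping pattern from the last bullet: a trained model behind a JSON prediction endpoint. The stand-in model, route name, and payload shape are assumptions for illustration; the Docker, PCF, and Kubernetes deployment steps are omitted:

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Stand-in model: in the projects above this would be a model trained and
# serialized by a data scientist, then loaded here (e.g., via joblib).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [5.1, 3.5, 1.4, 0.2]}.
    features = request.get_json()["features"]
    prediction = model.predict([features])[0]
    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```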
Data Scientist
Confidential
Responsibilities:
- Used SQL to pull data from the enterprise data warehouse.
- Developed a neural network with the Keras API, combining LSTM, attention-with-context, and bidirectional GRU layers to achieve superior classification of patient satisfaction surveys (a bidirectional-GRU sketch follows this job's Environment line).
- Deployed the text classification model to the cloud as a web service.
- Develop, test, validate and refine predictive models using artificial intelligence, deep learning and machine learning to optimize customer experiences, revenue generation, operational effectiveness, marketing success and other business outcomes.
- Leverage image, text, and numeric data within Confidential using AI tools to deliver results.
- Partner with business clients, establish professional relationships, and communicate with analytics stakeholders to understand business needs.
- Frame problems with stakeholders: research and construct problem frames to understand the analysis context and scope needed to deliver timely, useful results.
- Lead and participate in multidisciplinary analytics project teams.
- Plan and conduct individual interviews with subject matter experts to gather valid information and the data needed for analysis.
- Communicate results to decision makers, explaining the conclusions of the analytics process in both written and oral presentations.
Environment: Python 3.x, Git, Anaconda, shell scripting, Hadoop, scikit-learn, TensorFlow, Keras, Tesseract, SQL Server, Oracle, NumPy, spaCy, OpenCV
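A minimal sketch of a bidirectional-GRU text classifier in Keras, in the spirit of the survey-classification bullet above. The vocabulary size, sequence length, layer sizes, and class count are illustrative assumptions, and the attention-with-context layer used in the real project is replaced here with simple max pooling:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size (illustrative)
MAX_LEN = 200         # assumed padded sequence length (illustrative)
NUM_CLASSES = 5       # e.g., 1-5 satisfaction ratings (assumption)

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,), dtype="int32"),
    layers.Embedding(VOCAB_SIZE, 128),
    # Bidirectional GRU reads the survey text in both directions.
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),
    # Max pooling stands in for the attention-with-context layer.
    layers.GlobalMaxPooling1D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```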
Data Scientist
Confidential, El Paso, TX
Responsibilities:
- Wrote R and Python scripts that replaced Excel as the company's primary analysis tool, reducing the time needed for data cleaning and analysis by more than 60%.
- Performed statistical analyses and validated findings using hypothesis testing (a minimal sketch follows this job's Environment line).
- Provided strategies for collecting and storing data that facilitate analytics and visualization.
- Used R to model the effects of interventions (educating schoolchildren on healthy food intake and modifying school lunches) on obesity profiles and vegetable intake.
- Used machine learning tools to analyze the data gathered from the survey.
Environment: Python 3.x, R 3.x, Microsoft Excel, Microsoft Access
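A minimal sketch of the hypothesis-testing workflow mentioned above, using SciPy (not listed in the environment line, so its use here is an assumption). The two samples are synthetic stand-ins for pre- and post-intervention measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for pre- and post-intervention vegetable intake.
before = rng.normal(loc=2.0, scale=0.5, size=100)
after = rng.normal(loc=2.3, scale=0.5, size=100)

# Two-sample t-test: is mean intake different after the intervention?
t_stat, p_value = stats.ttest_ind(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis of equal means.")
```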
Graduate Researcher- Data Science
Confidential, El Paso, TX
Responsibilities:
- Used convolutional neural networks in Python (Keras/TensorFlow/scikit-learn) to model the movement of currency pairs on the Confidential market, achieving 75% accuracy on 4-hour windows.
- Used image recognition in Python (TensorFlow/Keras) to detect whether a molecule is present in scanned electron microscope images at UTEP.
- Used Word2Vec, bag-of-words, and word embeddings with natural language processing tools for sentiment analysis of text data.
- Reduced the dimensionality of high-dimensional data using principal component analysis (a minimal PCA sketch follows this job's Environment line).
- Used data visualization tools such as Matplotlib and Plotly to visualize data.
- Identified overfitting and underfitting and tuned models to improve generalization.
Environment: Python 3.x, Anaconda, R 3.x, Azure Machine Learning Studio, Microsoft Excel, deep learning frameworks (TensorFlow, Keras), NumPy, MS Access, SQL, scikit-learn
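A minimal sketch of PCA-based dimensionality reduction with scikit-learn, as mentioned in the bullets above. The dataset and component count are illustrative choices, not details from the research:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 64-dimensional digit images as a stand-in for high-dimensional data.
X, _ = load_digits(return_X_y=True)

# Standardize features, then project onto the top 10 principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)
print("Variance explained:", round(pca.explained_variance_ratio_.sum(), 3))
```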
Data Scientist
Confidential
Responsibilities:
- Oversaw the bidding process and prepared requirements documents for the first Automated Storage and Retrieval Library in Ireland, at the University of Limerick's Glucksman Library.
- Used R, Access, SQL, MySQL and Excel to identify bottlenecks in warehouse material flows for Pfizer, Novartis, Pepsi, Premier Stationery, and Merck.
- Led the company's machine learning and statistical modeling efforts, including building predictive models and generating data products to support customer segmentation, product recommendation, and allocation planning; prototyped and experimented with ML/DL algorithms and integrated them into production systems for different business needs.
Environment: Python 3.x, R 3.x, Tableau Desktop, Microsoft Excel, MS Access, SQL