Sr. Data Scientist / Machine Learning Engineer Resume
CA
PROFESSIONAL SUMMARY:
- Around 5 years of experience in Data Analysis, Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling and Data Visualization, discovering meaningful business insights.
- Proficient in applying Statistical Modeling and Machine Learning techniques (Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) to Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Factor Analysis, PCA and Ensembles, with good knowledge of Recommendation Systems.
- Expertise in the Scrapy and BeautifulSoup libraries for designing web crawlers.
- Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources and scheduling.
- Excellent knowledge of Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatches.
- Experience working with web languages such as HTML and CSS.
- Experience working with Weka and Meka (Multi-label classification).
- Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
- Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
- Experience in visualization tools such as Tableau for creating dashboards.
- Hands-on experience building Recommendation Engines and applying Natural Language Processing.
- Expertise in designing web crawlers for data gathering and applying LDA.
- Strong experience in using Excel and MS Access to load and analyze data based on business needs.
- Knowledge of cloud services such as Amazon AWS.
- Experienced in using Agile methodology for team-based project development.
- Expert in Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, scikit-learn for machine learning, Theano, TensorFlow and Keras for Deep Learning, and NLTK for NLP.
- Expert in using model pipelines to automate tasks and move models into production quickly.
- Expertise in Dimensionality Reduction techniques such as PCA, LDA and Singular Value Decomposition (SVD).
- Expertise in k-Fold Cross Validation and Grid Search for Model Selection.
- Knowledge of relational databases such as Oracle and SQLite.
- Experience in writing SQL queries, including subqueries.
- Good knowledge of the five stages of the Design Thinking methodology.
- Skilled in handling Big Data ecosystems such as Apache Hadoop, MapReduce, Spark, HDFS, Cassandra, HBase, Sqoop, Hive, Pig, MLlib and ELT.
- Well experienced in Core Java: asynchronous programming, multithreading, collections and several design patterns.
- Good knowledge of version control systems such as Git and SVN and hosting services such as GitHub and Bitbucket.
- Skilled in writing SQL queries for various RDBMSs such as SQL Server, MySQL, PostgreSQL, Teradata and Oracle, and for NoSQL databases such as MongoDB, HBase and Cassandra to handle unstructured data.
- Experienced in evaluating model performance using A/B Testing, k-fold cross validation, R-squared, CAP curves, Confusion Matrices, ROC plots, the Gini Coefficient and Grid Search (see the sketch following this summary).
- Comfortable in fast-paced, multi-tasking environments, both independently and in collaborative teams; at home with challenging projects and working through ambiguity to solve complex problems; a self-motivated, exuberant learner.
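A minimal sketch of the model-selection workflow above, pairing k-fold cross validation with Grid Search in scikit-learn; the dataset and hyperparameter grid are illustrative placeholders rather than project specifics:

```python
# Minimal sketch: grid search over a random forest with 5-fold stratified CV.
# The bundled breast-cancer dataset stands in for real project data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "n_estimators": [100, 300],   # illustrative grid
    "max_depth": [None, 5, 10],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="roc_auc",            # matches the ROC-based evaluation above
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because GridSearchCV refits the winning configuration on the full data by default, the fitted `search` object can slot directly into a model pipeline for deployment.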
TECHNICAL SKILLS:
Python Libraries: scikit-learn, Keras, TensorFlow, NumPy, Pandas, NLTK, Gensim, Matplotlib, ggplot2, Scrapy, BeautifulSoup, Seaborn, Bokeh, NetworkX, Statsmodels, Theano.
Programming Languages: Python, R, SQL, Scala, Pig, C, MATLAB, Java.
Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server.
Machine Learning: Data Preprocessing, Weighted Least Squares, PCR, PLS, Piecewise/Spline Regression, Quadratic Discriminant Analysis, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Trees, K-Means, Polynomial Regression, Azure, Perceptron, Back Propagation, PCA, LDA, UML, RDF, SPARQL.
Data Visualization: Tableau; Python (Matplotlib, Seaborn).
Databases: MySQL, SQLite.
IDE Tools: PyCharm, Spyder, Eclipse, Visual Studio, NetBeans, Amazon SageMaker.
Project Management: JIRA, SharePoint.
SDLC Methodologies: Agile, Scrum, Waterfall
Other Tools: Anaconda Enterprise, RStudio, Azure Machine Learning Studio, Oozie 4.2, AWS Lambda.
PROFESSIONAL EXPERIENCE:
Confidential, CA
Sr. Data Scientist / Machine Learning Engineer
Responsibilities:
- Designed a web crawler to gather the data (see the crawler sketch after this list).
- Studied and analyzed the HTML and CSS structure of the target web pages.
- Extracted the required fields from the pages and stored the data as JSON files.
- Applied reinforcement learning algorithms such as Upper Confidence Bound (UCB) to the applicable data.
- Selected features, then built and optimized classifiers using machine learning techniques.
- Performed data wrangling to clean, transform and reshape the data using the Pandas library; analyzed data using SQL, R, Scala and Python, and presented analytical reports to management and technical teams.
- Developed a sentiment analysis model to determine user sentiment about the product using machine learning algorithms and deep learning RNNs.
- Worked with text feature engineering techniques such as n-grams, TF-IDF and word2vec (see the text-classification sketch after this list).
- Developed a text classification algorithm using classical machine learning algorithms and applied state-of-the-art approaches such as deep neural networks and RNNs.
- Reduced log-loss to below 1.0 on the text classification problem using machine learning and deep learning algorithms.
- Developed low-latency applications and interpretable models using machine learning algorithms.
- Used AWS SageMaker to quickly build, train and deploy machine learning models.
- Worked with performance metrics such as F1 score, precision, recall, log-loss, accuracy and AUC.
- Worked with machine learning algorithms such as regression (linear, logistic), SVMs and decision trees.
- Demonstrated, through thorough systematic search, performance surpassing the state-of-the-art deep learning baseline.
- Used text mining and NLP techniques to determine sentiment about the organization.
- Used K-Means clustering to identify outliers and classify unlabeled data.
- Applied NLP and text-based extraction techniques.
- Worked with datasets of varying size and complexity, including both structured and unstructured data.
- Performed extensive data cleaning and ensured data quality, consistency and integrity using Pandas and NumPy.
- Used patterns and variations in data characteristics to support predictive analysis.
- Implemented predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
- Performed data wrangling to clean, transform and reshape the data using the NumPy and Pandas libraries.
- Contributed to data mining architectures, modeling standards, reporting and data analysis methodologies.
- Conducted research and made recommendations on data mining products, services, protocols and standards in support of procurement and development efforts.
- Used data quality validation techniques to validate Critical Data Elements (CDEs) and identified many anomalies; worked extensively with statistical analysis tools and wrote code in advanced Excel and Python.
- Involved in defining the Source to Target data mappings, Business rules, and data definitions.
- Worked with different data science teams and provided respective data as required on an ad-hoc request basis.
- Guided and advised both application engineering and data scientist teams in mutual agreements/provisions of data.
- Applied techniques such as multivariate regression, Bayesian probabilities, clustering algorithms, machine learning, dynamic programming, stochastic processes, queuing theory and algorithmic knowledge to efficiently research and solve complex development problems.
- Extracted data from Azure Data Lake into an HDInsight cluster, applied Spark transformations and actions, and loaded the results into HDFS.
- Applied engineering methods to define, predict and evaluate the results obtained.
- Led and collaborated with other data scientists to design effective analytical approaches, taking into account performance and scalability on large datasets.
- Applied project knowledge of data wrangling techniques and scripting languages.
- Performed unit and system testing to validate the output of analytic procedures against expected results.
- Identified research opportunities for clients.
- Provided business metrics for the projects to demonstrate improvements.
- Compiled and presented complex information using Tableau.
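Two hedged sketches of the work above. First, the crawl-parse-store flow: fetch a page, pull out the fields of interest, and persist them as JSON. The URL and CSS selector are hypothetical placeholders for the confidential sources:

```python
# Sketch of the crawler's parse-and-store step (hypothetical URL/selector).
import json

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/products")      # placeholder URL
soup = BeautifulSoup(resp.text, "html.parser")

# Collect one record per matched element; the selector is illustrative.
records = [
    {"title": node.get_text(strip=True), "link": node.get("href")}
    for node in soup.select("a.product")
]

with open("products.json", "w") as fh:
    json.dump(records, fh, indent=2)
```

Second, the text-classification pipeline: TF-IDF features over word n-grams feeding a logistic regression, scored with log-loss as described above. The four-message corpus is a stand-in for the real review data:

```python
# Sketch of the TF-IDF + logistic regression text classifier (toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible support", "loved it", "would not buy again"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
    LogisticRegression(),
)
model.fit(X_train, y_train)
print("log-loss:", log_loss(y_test, model.predict_proba(X_test), labels=[0, 1]))
```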
Environment: Python (scikit-learn, Keras, SciPy, NumPy, Pandas, Matplotlib, NLTK, Seaborn), R (ggplot2, caret, trees, arules), Tableau, Machine Learning (Logistic Regression, Random Forests, KNN, K-Means Clustering, Hierarchical Clustering, Ensemble methods, Collaborative Filtering, multivariate regression, Bayesian probabilities, clustering algorithms, dynamic programming).
Confidential, MA
Sr. Data Scientist / Machine Learning Engineer
Responsibilities:
- Gathered, retrieved and organized data and used it to reach meaningful conclusions.
- Developed a system for collecting data and generating reports of the findings, which improved company operations.
- Set up the analytics system to provide insights.
- Initially stored the data in MongoDB; later migrated it to Elasticsearch.
- Used Kibana to visualize the data collected from Twitter using Twitter REST APIs.
- Developed a multi-class, multi-label, 2-stage classification model to identify depression-related tweets and classify depression-indicative symptoms. Used the model to estimate the severity of depression in a patient using Python, scikit-learn, Weka and Meka.
- Conceptualized and created a knowledge graph database of news events extracted from tweets using Java, Virtuoso, Stanford CoreNLP, Apache Jena, RDF.
- Produced and maintained internal and client-based reports.
- Created data stories that non-technical teams could also understand.
- Worked on Descriptive, Diagnostic, Predictive and Prescriptive analytics.
- Implemented character recognition using Support Vector Machines for performance optimization.
- Monitored data quality and maintained data integrity to ensure the effective functioning of the department.
- Managed database design and implemented a comprehensive star schema with shared dimensions.
- Implemented normalization techniques and built the tables per the requirements given by the business users.
- Developed and maintained stored procedures, implemented changes to the database design including tables and views, and documented source-to-target mappings per the business rules.
- Analyzed end-user requirements and communicated and modeled them for the development team.
- Acted as a bridge between technologists and business stakeholders to drive innovation from conception to production.
- Built a machine learning system that automatically scores user assignments based on a few manually scored examples.
- Used visualizations such as histograms, bar plots, pie charts, scatter plots and box plots to assess the condition of the data.
- Researched and developed predictive analytics solutions for business needs.
- Worked on data processing for very large datasets, handling missing values, creating dummy variables and addressing various kinds of noise in the data.
- Mined large datasets using sophisticated analytical techniques to generate insights and inform business decisions.
- Built and tested hypotheses, ensured statistical significance and built statistical models for business application.
- Developed machine learning algorithms with standalone Spark MLlib and Python.
- Designed and developed analytics, machine learning models and visualizations that drive performance and provide insights, from prototyping through production deployment, product recommendation and allocation planning.
- Performed data pre-processing tasks such as merging, sorting, outlier detection, missing value imputation and data normalization to make the data ready for statistical analysis.
- Implemented various machine learning models such as regression, classification, Tree based and Ensemble models.
- Performed model tuning by adjusting hyperparameters, which raised model accuracy.
- Validated the models using appropriate measures such as k-fold cross validation, AUC and ROC to identify the best-performing model.
- Created machine learning and statistical models (SVM, CRF, HMM, sequential tagging).
- Built data platforms for analytics and advanced analytics in Azure.
- Managed tickets using basic SQL queries.
- Segmented customers based on demographics using K-Means clustering.
- Implemented various machine learning algorithms in Spark using MLlib.
- Performed segmentation on customer data to identify target groups for new loans using clustering techniques such as K-Means, then refined the results using Support Vector Regression (see the segmentation sketch after this list).
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
- Accomplished multiple tasks from collecting data to organizing and interpreting statistical information.
- Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
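A minimal sketch of the K-Means segmentation step above, assuming hypothetical age/income columns in place of the confidential customer demographics:

```python
# Sketch: scale demographic features, fit K-Means, label each customer.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.DataFrame({
    "age":    [23, 45, 31, 52, 38, 29],          # illustrative values
    "income": [32_000, 88_000, 54_000, 95_000, 61_000, 41_000],
})

# Standardize so age and income contribute comparably to the distances.
features = StandardScaler().fit_transform(customers)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(features)
print(customers)
```

In the project, the resulting segments were processed further (with Support Vector Regression) to score target groups for new loans.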
Environment: Python 3.6.4, RStudio, MLlib, Regression, SQL Server, Hive, Hadoop Cluster, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, ggplot2, Shiny, TensorFlow, Teradata.
Confidential
Associate Data Analyst
Responsibilities:
- Streamlined information by integrating data from multiple datasets into one database system.
- Created database triggers and designed tables.
- Generated statistics and reports by analyzing the data.
- Cleaned the database by removing obsolete data files and unnecessary information.
- Ran SQL queries to provide solutions to customer-generated tickets.
- Performed specific data queries and wrote scripts.
- Collected data from multiple sources and added it to the database.
- Researched and reconciled data discrepancies occurring among various information systems and reports.
- Identified new sources of data and methods to improve data collection, analysis and reporting.
- Tested prototype software and participated in the approval of new software.
- Identified areas with data inaccuracies, as well as trends in growing data inaccuracies.
- Contributed to methods for working with large datasets and complex processes.
- Found trends and patterns and made recommendations to clients.
- Documented the patterns weekly, monthly and quarterly.
- Collaborated with marketers, salespeople, data architects and database developers.
- Worked with web developers to collect data and streamline data reuse.
- Imported and cleansed high volumes of data from various sources such as Teradata, Oracle and flat files.
- Used Python to cluster credit card holders and implemented predictive analysis.
- Wrote and executed unit, system, integration and UAT scripts in data warehouse projects.
- Developed Informatica mappings using various transformations and PL/SQL packages for the extraction, transformation and loading of data.
- Wrote a Python program to parse and upload CSV files into a PostgreSQL database, using the Requests HTTP library for web API calls (a sketch follows this list).
- Wrote SQL for data profiling and developed data quality reports.
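A minimal sketch of the CSV-to-PostgreSQL loader described above; the connection details, table schema and API endpoint are hypothetical placeholders:

```python
# Sketch: fetch rows from a web API with Requests, then parse a local CSV
# and insert its rows into PostgreSQL (all names are placeholders).
import csv

import psycopg2
import requests

# Hypothetical web API call used to cross-check the load.
api_rows = requests.get("https://api.example.com/accounts").json()
print(f"API reports {len(api_rows)} accounts")

conn = psycopg2.connect(dbname="analytics", user="etl",
                        password="secret", host="localhost")
with conn, conn.cursor() as cur, open("accounts.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        cur.execute(
            "INSERT INTO accounts (id, name, balance) VALUES (%s, %s, %s)",
            (row["id"], row["name"], row["balance"]),
        )
conn.close()
```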
Environment: Python, RStudio, MLlib, Regression, SQL Server, Hive, Hadoop Cluster, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, ggplot2, Shiny, TensorFlow, Teradata.
Confidential
Data Analyst
Responsibilities:
- Successfully completed a Junior Data Analyst internship at Confidential.
- Built an Expense Tracker and Zonal Desk.
- Identified inconsistencies and corrected them or escalated the problems to the next level.
- Assisted in development of interface testing and implementation plans.
- Analyzed data for data quality and validation issues.
- Analyzed the websites regularly to ensure site traffic and conversion funnels were performing well.
- Collaborated with sales and marketing teams to optimize processes that communicate insights effectively.
- Created and maintained automated reports using SQL.
- Developed an understanding of the full Hadoop architecture and drove the related meetings.
- Conducted safety checks to make sure the team felt safe for the retrospectives.
- Aided in data profiling by examining the source data.
- Extracted features from the given dataset and used them to train and evaluate the different classifiers available in the WEKA tool, using these features to differentiate spam messages from legitimate messages (a rough Python analogue follows this list).
- Created numerous SQL queries to modify data based on data requirements and added enhancements to existing procedures.
- Implemented statistical modeling techniques in Python.
- Performed data mappings to map the source data to the destination data.
- Developed use case diagrams to identify the users involved; created activity diagrams and sequence diagrams to depict the process flows.
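WEKA itself is Java-based; below is a rough Python analogue of the feature-extraction and spam-classification workflow described above, with four illustrative messages standing in for the real dataset:

```python
# Rough analogue of the WEKA workflow: token-count features + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "WIN a FREE prize now!!!",       # spam
    "Lunch at noon tomorrow?",       # legitimate
    "Claim your FREE reward today",  # spam
    "Meeting notes attached",        # legitimate
]
labels = ["spam", "ham", "spam", "ham"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)
print(clf.predict(["free prize meeting"]))
```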
Environment: Python, MATLAB, Oracle, HTML5, Tableau, MS Excel, Server Services, Informatica PowerCenter, SQL, Microsoft Test Manager, Adobe Connect, MS Office Suite, LDAP, Hive, Spark, Pig, Oozie.