Machine Learning Resume
NY
SUMMARY:
- 8+ years of experience designing, building, and implementing analytical and enterprise applications using machine learning with Python, R, Scala, and Java.
- Good experience with a focus on Big Data, Deep Learning, Machine Learning, image processing, and AI.
- Very good hands-on experience with Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
- Very good experience implementing and maintaining end-to-end data science products.
- Good experience with periodic model validation and optimization workflows for the data science products developed.
- Good experience extracting and analyzing very large volumes of data, covering a wide range of information from user profiles to transaction history, using machine learning tools.
- Collaborated with engineers to deploy successful models and algorithms into production environments.
- Good understanding of model validation processes and optimizations.
- An excellent understanding of both traditional statistical modeling and machine learning techniques and algorithms such as regression, clustering, ensembling (random forests, gradient boosting), and deep learning (neural networks).
- Proficient in understanding and analyzing business requirements, building predictive models, designing experiments, testing hypotheses, and interpreting statistical results into actionable insights and recommendations.
- Fluent in Python with working knowledge of ML and statistical libraries (e.g., scikit-learn and Pandas).
- Experience processing real-time data and building end-to-end ML pipelines.
- Very strong in Python, statistical analysis, tools, and modeling.
- Very good hands-on experience working with large datasets and deep learning algorithms using Apache Spark and TensorFlow.
- Good knowledge of recurrent neural networks, LSTM networks, and word2vec.
- Good experience refining and improving image recognition pipelines.
- Deep interest in learning both the theoretical and practical aspects of working with and deriving insights from data.
- Developed highly scalable classifiers and tools by leveraging machine learning, Apache Spark, and deep learning.
- Worked under the direction of the CSO to develop an effective solution to a predictive analytics problem, testing a number of potential machine learning algorithms with Apache Spark.
- Built state-of-the-art statistical procedures, algorithms, and models to solve a range of problems in diverse domains.
- Proficient code-writing capability in major programming languages such as Python, R, Java, and Scala.
- Good experience with deep learning frameworks like Caffe and TensorFlow.
- Experience using deep learning to solve problems in image and video analysis.
- Good understanding of Apache Spark's features and advantages over MapReduce and traditional systems.
- Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, and broadcast variables.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Highly organized and detail-oriented, with a strong ability to coordinate and track multiple deliverables, tasks, and dependencies.
- Experience in exposing Apache Spark as web services.
- Experience in real-time processing using Apache Spark and Kafka.
- Good working experience with NoSQL databases such as Cassandra and MongoDB.
- Delivered multiple end-to-end Big Data analytics solutions on distributed systems such as Apache Spark.
- Experience leveraging DevOps techniques and practices such as Continuous Integration, Continuous Deployment, Test Automation, and Build Automation.
- Hands-on experience leading delivery through Agile methodologies.
- Experience managing code on GitHub.
- Good hands-on experience with the Spring and Hibernate frameworks.
- Solid understanding of object-oriented programming.
- Familiarity with the concepts of MVC, JDBC, and RESTful services.
- Familiarity with build tools such as Maven and SBT.
- Knowledge of information extraction and NLP algorithms coupled with deep learning.
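As a concrete illustration of the clustering techniques listed above, here is a minimal pure-Python k-means sketch. It is illustrative only; in practice the work described here would use scikit-learn's KMeans or Spark MLlib rather than a hand-rolled loop.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster went empty.
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```

On well-separated data this converges in a few iterations; the `seed` parameter only controls the initial centroid sample.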
TECHNICAL SKILLS:
Languages: Python, R, Scala, and Java
Machine Learning Libraries: Spark ML, Spark MLlib, scikit-learn, NLTK & Stanford NLP
Deep Learning Framework: TensorFlow
Big Data Frameworks: Apache Spark, Apache Hadoop, Kafka, MongoDB, and Cassandra
Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, SVM, Decision Trees, Random Forest, Boosting, K-means, Bagging, etc.
Big Data Distributions: Cloudera & Amazon EMR
Web Technologies: Flask, Django, and Spring MVC
Front-End Technologies: JSP, HTML5, Ajax, jQuery, and XML
Web Servers: Apache2, Nginx, WebSphere, and Tomcat
Visualization Tools: Apache Zeppelin, Matplotlib, and Tableau
Databases: Oracle, MySQL, and PostgreSQL
NoSQL: MongoDB and Cassandra
Operating Systems: Linux and Windows
Scheduling Tools: Airflow & Oozie
PROFESSIONAL EXPERIENCE:
Confidential
Machine Learning
Responsibilities:
- Converted data from PDF to XML using Python scripts in two stages: from raw XML to processed XML, and from processed XML to CSV files.
- Developed a generic script for processing the regulatory documents.
- Used Python's ElementTree (ET) to parse the XML derived from the PDF files.
- Accessed data stored in sqlite3 data files (DBs) using Python, extracted the metadata and tables, and converted the tables to corresponding CSV files.
- Used XML tags and attributes to map headings, side-headings, and subheadings to rows in the CSV file.
- Used text mining and NLP techniques to find the sentiment about the organization.
- Deployed a spam detection model and performed sentiment analysis of customer product reviews using NLP techniques.
- Developed and implemented predictive models of user behavior data on websites, URL categorization, social network analysis, social mining, and search content based on large-scale machine learning.
- Developed predictive models on large-scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning,and deep learning.
- Extensively used Pandas, NumPy, Seaborn, Matplotlib, scikit-learn, SciPy, and NLTK in Python for developing various machine learning algorithms.
- Used the R programming language for graphically critiquing the datasets and gaining insights into the nature of the data.
- Researched deep learning approaches to implementing NLP.
- Applied clustering, NLP, and neural networks; visualized and presented the results using interactive dashboards.
- Involved in the transfer of files from GitHub to DSX.
- Involved in the execution of CSV files in Data Science Experience.
- A major part of the project was importing the converted CSV files into the Confidential internal API, the InfoSphere Information Governance Catalog.
- Used Beautiful Soup for web scraping (parsing the data).
- Developed the code to capture the descriptions under the headings of the index section into the description column of each CSV row.
- Used other Python libraries such as PDFMiner, PyPDF2, PDFQuery, and sqlite3.
- Converted Unicode to the nearest possible ASCII string using the Unidecode module.
- Added a column to each CSV row giving the parent index number of that row.
Environment: R Studio, AWS S3, NLP, EC2, Neural networks, SVM, Decision trees, MLbase, ad-hoc, Mahout, NoSQL, PL/SQL, MDM, MLlib & Git.
Confidential, NY
Data Scientist
Responsibilities:
- Performed exploratory data analysis, data visualization, and feature selection using Python and Apache Spark.
- Scaled scikit-learn machine learning algorithms using Apache Spark.
- Used techniques such as Fast Fourier Transforms, Convolutional Neural Networks, and deep learning.
- Developed deep convolutional and recurrent neural networks with TensorFlow, with significant Risk Management and Quantitative Finance experience.
- Used multiple machine learning algorithms, including random forests, boosted trees, SVM, SGD, neural networks, and deep learning using TensorFlow.
- Used Python, Convolutional Neural Networks (CNN), Deep Belief Networks (DBN), Theano, Caffe, etc.
- Applied unsupervised and supervised learning methods in analyzing high-dimensional data. Proficient use of Python Scikit-learn, pandas, and NumPy packages.
- Performed data modeling operations using Power BI, Pandas, and SQL.
- Utilized Python libraries including wxPython, NumPy, Twisted, Matplotlib, and Beautiful Soup.
- Developed and implemented predictive models of user behavior data on websites, URL categorization, social network analysis, social mining, and search content based on large-scale machine learning.
- Wrote scripts in Python using Apache Spark and ElasticSearch engine for use in creating dashboards visualized in Grafana.
- Led development of Natural Language Processing (NLP) initiatives with chatbots and virtual assistants.
- Converted Pandas DataFrame datasets to Apache Spark DataFrames.
- Collaborated with engineers to deploy successful models and algorithms into production environments.
- Collaborated with a diverse team including statisticians, the Chief Science Officer, and engineers to build data science project pipelines and algorithms to derive valuable insights from current and new datasets.
- Used PySpark DataFrames to read text, CSV, and image data from HDFS, S3, and Hive.
- Cleaned input text data using the PySpark ML feature extraction API.
- Created features to train algorithms.
- Used various algorithms from the PySpark ML API.
- Trained model using historical data stored in HDFS and Amazon S3.
- Used Spark streaming to load the trained model to predict real-time data from Kafka.
- Stored the result in MongoDB.
- Utilized various new supervised and unsupervised machine learning algorithms/software to perform NLP tasks and compare performance.
- Built a web application that reads data stored in MongoDB.
- Used Apache Zeppelin for visualization of Big Data.
- Fully automated job scheduling, monitoring, and cluster management without human intervention using Airflow.
- Exposed Apache Spark as a web service using Flask; worked with input file formats such as ORC, Parquet, JSON, and Avro.
- Developed highly scalable classifiers and tools by leveraging machine learning, Apache Spark, and deep learning.
- Wrote Spark SQL UDFs and Hive UDFs.
- Optimized Spark code using Apache Spark performance tuning.
- Optimized machine learning algorithms based on need.
- Used Amazon Elastic MapReduce (EMR) to process huge numbers of datasets using Apache Spark and TensorFlow.
Environment: Machine learning, scikit-learn, Pandas, Spark Core, Spark SQL, Spark Streaming, Python, Airflow, Amazon EMR, EC2, S3, NumPy, Matplotlib, TensorFlow, Kafka, Flask, MongoDB, Hive, HDFS, GitHub & REST.
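The text-cleaning step in this role used PySpark's ML feature-extraction API (e.g., Tokenizer and StopWordsRemover). A pure-Python analogue of the same idea, with an illustrative stop-word subset rather than a real stop-word list, looks like this:

```python
import re

# Illustrative subset only; real pipelines use a full stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of"}

def clean_text(doc):
    """Lowercase the document, tokenize on alphanumeric runs,
    and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", doc.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

The resulting token lists are what downstream feature builders (hashing, TF-IDF, word2vec) consume.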
Confidential - Columbus, OH
Data Scientist
Responsibilities:
- Collaborated with internal stakeholders to understand business challenges and develop analytical solutions to optimize business processes.
- Performed analysis using industry-leading text mining, data mining, and analytical tools and open-source software.
- Used MATLAB, C/C++ with OpenCV and SVM, Neural Networks, Random Forest as classifiers.
- Generated graphical reports using the Python packages NumPy and Matplotlib.
- Built various graphs for business decision-making using the Python Matplotlib library.
- Knowledge of information extraction and NLP algorithms coupled with deep learning (ANN and CNN), Theano, Keras, and TensorFlow.
- Built and trained a deep learning network using TensorFlow on the data and reduced wafer scrap by 15% by predicting the likelihood of wafer damage; a combination of z-plot features, image features (pigmentation), and probe features was used.
- Experienced in Artificial Neural Network (ANN) and deep learning models using the Theano, TensorFlow, and Keras packages in Python.
- Used Natural Language Processing (NLP) to pre-process the data, determine the number of words and topics in the emails, and form clusters of words.
- Cleaned input text data using the PySpark ML feature extraction API.
- Used Pandas data frame for exploratory data analysis on sample dataset.
- Wrote scikit-learn-based machine learning algorithms for building POCs on sample datasets.
- Analyzed structured, semi-structured, and unstructured datasets using MapReduce and Apache Spark.
- Implemented an end-to-end lambda architecture to analyze streaming and batch datasets.
- Used Apache Mahout's scalable machine learning algorithms to build a recommendation engine and classification and regression models.
- Converted Mahout's machine learning algorithms to RDD-based Apache Spark MLlib to improve performance.
- Optimized machine learning algorithms based on need.
- Built automatic music/news/POI recommendation inside the vehicle using GPS location, passenger conversation, behavior, and mood, with machine learning and natural language processing.
- Built a smart state-of-charge monitor for electric vehicles based on a Recurrent Neural Network and Seq2Seq forecasting.
- Built multiple machine learning features using Python, Scala, and Java based on need.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Migrated single-machine machine learning algorithms to parallel processing algorithms.
- Developed Hive queries for ad-hoc analysis.
- Used Amazon Elastic MapReduce (EMR) to process huge numbers of datasets using Apache Spark and TensorFlow.
- Served as lead Data Scientist for development of machine learning and NLP engines utilizing population health data.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
- Involved in building complex streaming data Pipeline using Kafka and Apache Spark.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Optimized Hive queries.
- Optimized MapReduce and Apache Spark jobs.
- Wrote custom input formats in MapReduce to analyze image datasets.
- Wrote Hive UDFs based on need.
Environment: Hadoop, MapReduce, Hive, Mahout, Apache Spark, Python, scikit-learn, Pandas, NumPy, Java, Maven, Eclipse, MySQL, Kafka, Sqoop & Flume.
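The email word-clustering step in this role rests on comparing documents as bag-of-words vectors. A minimal sketch of that comparison (the helper names `bow` and `cosine` are hypothetical, chosen for illustration):

```python
import math
from collections import Counter

def bow(tokens):
    """Bag-of-words representation: term -> count."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Pairwise similarities like these feed directly into clustering (e.g., k-means over the vectors), grouping emails that share vocabulary.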
Confidential, NY
Data Scientist
Responsibilities:
- Responsible for performing Machine-learning techniques regression/classification to predict the outcomes.
- Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Designed and automated the process of score cuts that achieve increased close and good rates using advanced R programming.
- Utilized Convolutional Neural Networks to implement a machine learning image recognition component.
- Managed datasets using Pandas DataFrames and MySQL; queried the MySQL relational database (RDBMS) from Python using the MySQLdb connector package to retrieve information.
- Utilized standard Python modules such as csv, itertools, and pickle for development.
- Tech stack is Python 2.7/PyCharm/Anaconda/pandas/NumPy/unittest/R/Oracle.
- Developed large data sets from structured and unstructured data; performed data mining.
- Partnered with modelers to develop data frame requirements for projects.
- Performed Ad-hoc reporting/customer profiling, segmentation using R/Python.
- Tracked various campaigns, generating customer profiling analysis and data manipulation.
- Provided Python programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables; responsible for data mining.
- Analyzed large datasets to answer business questions by generating reports and outcome.
- Worked with a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
- Executed SQL queries from R/Python on complex table configurations.
- Retrieved data from the database through SQL as per business requirements.
- Created, maintained, modified, and optimized SQL Server databases.
- Manipulated data using Python programming.
- Adhered to best practices for project support and documentation.
- Understood the business problem, built hypotheses, and validated them using the data.
- Managed the reporting/dashboarding for the key metrics of the business.
- Involved in data analysis using different analytic techniques and modeling techniques.
Environment: Python, Oracle, scikit-learn, Pandas, NumPy, SciPy, NLTK, Jupyter Notebook, R, and RStudio
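The csv/itertools-based customer segmentation described in this role could look roughly like the following; the column names and sample data are hypothetical, chosen only to show the shape of the approach.

```python
import csv
import io
from itertools import groupby

def segment_by(rows, key):
    """Group rows (dicts) into segments by a key column.
    groupby requires sorted input, so sort by the key first."""
    rows = sorted(rows, key=lambda r: r[key])
    return {seg: list(grp) for seg, grp in groupby(rows, key=lambda r: r[key])}

# Hypothetical sample: read a small CSV and segment customers by region.
SAMPLE = "customer,region\nalice,east\nbob,west\ncarol,east\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
segments = segment_by(rows, "region")
```

The same pattern extends to profiling: once rows are grouped, per-segment aggregates (counts, sums, rates) fall out of a loop over `segments`.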
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Developed end to end enterprise Applications using Spring MVC, REST and JDBC Template Modules.
- Wrote well-designed, testable, efficient Java code.
- Understanding and analyzing complex issues and addressing challenges arising during the software development process, both conceptually and technically.
- Implemented best practices of Automated Build, Test and Deployment.
- Developed design patterns, data structures,and algorithms based on project need.
- Worked on multiple tools such as Toad, Eclipse, SVN, Apache, and Tomcat.
- Deployed models via APIs into applications or workflows
- Worked on User Interface technologies like HTML5, CSS/SCSS.
- Wrote stored procedures and SQL queries based on project need.
- Deployed built JARs to the application server.
- Created Automated Unit Tests using Flexible/Open Source Frameworks
- Developed multi-threaded and transaction-handling code (JMS, database).
Environment: Java, Spring MVC, Hibernate, JMS, HTML5, CSS/SCSS, Junit, Eclipse, Tomcat,and Oracle.