
Machine Learning Resume


NY

SUMMARY:

  • 8+ years of work experience designing, building, and implementing analytical and enterprise applications using machine learning, Python, R, Scala, and Java.
  • Good experience with a focus on Big Data, Deep Learning, Machine Learning, Image Processing, and AI.
  • Very good hands-on experience in Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
  • Very good experience implementing and handling end-to-end data science products.
  • Good experience in periodic model validation and optimization workflows for the data science products developed.
  • Good experience in extracting and analyzing very large volumes of data, covering a wide range of information from user profiles to transaction history, using machine learning tools.
  • Collaborated with engineers to deploy successful models and algorithms into production environments.
  • Good understanding of model validation processes and optimizations.
  • An excellent understanding of both traditional statistical modeling and machine learning techniques and algorithms, such as regression, clustering, ensembling (random forest, gradient boosting), and deep learning (neural networks).
  • Proficient in understanding and analyzing business requirements, building predictive models, designing experiments, testing hypotheses, and interpreting statistical results into actionable insights and recommendations.
  • Fluency in Python with working knowledge of ML and statistical libraries (e.g., Scikit-learn, Pandas).
  • Experience in processing real-time data and building ML pipelines end to end.
  • Very strong in Python, statistical analysis, tools, and modeling.
  • Very good hands-on experience working with large datasets and deep learning algorithms using Apache Spark and TensorFlow.
  • Good knowledge of recurrent neural networks, LSTM networks, and word2vec.
  • Good experience in refining and improving image recognition pipelines.
  • Deep interest in learning both the theoretical and practical aspects of working with and deriving insights from data.
  • Developed highly scalable classifiers and tools by leveraging machine learning, Apache Spark, and deep learning.
  • Worked under the direction of the CSO to develop an effective solution to a predictive analytics problem, testing a number of potential machine learning algorithms in Apache Spark.
  • Built state-of-the-art statistical procedures, algorithms, and models to solve a range of problems in diverse domains.
  • Proficient code-writing capability in major programming languages such as Python, R, Java, and Scala.
  • Good experience with deep learning frameworks like Caffe and TensorFlow.
  • Experience using deep learning to solve problems in image and video analysis.
  • Good understanding of Apache Spark's features and advantages over MapReduce and traditional systems.
  • Solid understanding of RDD operations in Apache Spark, i.e., Transformations and Actions, Persistence (Caching), Accumulators, and Broadcast Variables.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG Scheduler, Task Scheduler, stages, and tasks.
  • Highly organized and detail-oriented, with a strong ability to coordinate and track multiple deliverables, tasks, and dependencies.
  • Experience in exposing Apache Spark as web services.
  • Experience in real-time processing using Apache Spark and Kafka.
  • Good working experience with NoSQL databases like Cassandra and MongoDB.
  • Delivered multiple end-to-end Big Data analytical solutions on distributed systems like Apache Spark.
  • Experience leveraging DevOps techniques and practices such as Continuous Integration, Continuous Deployment, Test Automation, and Build Automation.
  • Hands-on experience leading delivery through Agile methodologies.
  • Experience in managing code on GitHub.
  • Good hands-on experience with the Spring and Hibernate frameworks.
  • Solid understanding of object-oriented programming.
  • Familiarity with the concepts of MVC, JDBC, and RESTful services.
  • Familiarity with build tools such as Maven and SBT.
  • Knowledge of Information Extraction and NLP algorithms coupled with Deep Learning.
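The ensembling techniques named above (random forest, gradient boosting) share a common scikit-learn interface; a minimal sketch on a synthetic dataset, purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data standing in for a real business dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for name, model in [
    ("random_forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gradient_boosting", GradientBoostingClassifier(random_state=0)),
]:
    model.fit(X_train, y_train)                 # each ensemble trains many trees
    scores[name] = model.score(X_test, y_test)  # held-out accuracy
```

Both ensembles expose the same fit/score API, which is what makes it practical to compare several candidate algorithms on one problem.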

TECHNICAL SKILLS:

Languages: Python, R, Scala, and Java

ML Libraries: Spark ML, Spark MLlib, Scikit-learn, NLTK & Stanford NLP

Deep Learning Frameworks: TensorFlow

Big Data Frameworks: Apache Spark, Apache Hadoop, Kafka, MongoDB, and Cassandra

Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, SVM, Decision Trees, Random Forest, Boosting, K-means, Bagging, etc.

Big Data Distributions: Cloudera and Amazon EMR

Web Technologies: Flask, Django, and Spring MVC

Front-End Technologies: JSP, HTML5, Ajax, jQuery, and XML

Web Servers: Apache2, Nginx, WebSphere, and Tomcat

Visualization Tools: Apache Zeppelin, Matplotlib, and Tableau

Databases: Oracle, MySQL, and PostgreSQL

NoSQL: MongoDB and Cassandra

Operating Systems: Linux and Windows

Scheduling Tools: Airflow & Oozie

PROFESSIONAL EXPERIENCE:

Confidential

Machine Learning

Responsibilities:

  • Converted data from PDF to XML using Python scripts in two stages, i.e., from raw XML to processed XML, and from processed XML to CSV files.
  • Developed a generic script for the regulatory documents.
  • Used Python's ElementTree (ET) to parse the XML derived from the PDF files.
  • Accessed data stored in SQLite3 data files (DBs) using Python; extracted the metadata, tables, and table data, and converted the tables to corresponding CSV files.
  • Used the XML tags and attributes to map headings, side-headings, and subheadings to rows in the CSV file.
  • Used text mining and NLP techniques to determine sentiment about the organization.
  • Deployed a spam detection model and performed sentiment analysis of customer product reviews using NLP techniques.
  • Developed and implemented predictive models of user behavior data on websites, URL categorization, social network analysis, social mining, and search content based on large-scale machine learning.
  • Developed predictive models on large-scale datasets to address various business problems by leveraging advanced statistical modeling, machine learning, and deep learning.
  • Extensively used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python for developing various machine learning algorithms.
  • Used the R programming language for graphically critiquing the datasets and gaining insights into the nature of the data.
  • Researched Deep Learning approaches to implementing NLP.
  • Applied clustering, NLP, and neural networks; visualized and presented the results using interactive dashboards.
  • Involved in the transformation of files from GitHub to DSX.
  • Involved in the execution of CSV files in Data Science Experience.
  • A major part of the project was importing the converted CSV files into Confidential's internal API, the InfoSphere Information Governance Catalog.
  • Used Beautiful Soup for web scraping (parsing the data).
  • Developed the code to capture the description under each heading of the index section into the description column of the CSV row.
  • Used other Python libraries such as PDFMiner, PyPDF2, PDFQuery, and sqlite3.
  • Converted Unicode to the nearest possible ASCII string using the Unidecode module.
  • Added a column to each CSV row giving the parent index number of that row.
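The XML-to-CSV step described above can be sketched with Python's built-in ElementTree and csv modules; the document structure and tag names below are hypothetical stand-ins for the processed regulatory XML:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical processed XML, standing in for a document derived from a PDF.
XML = """
<document>
  <section index="1">
    <heading>General Provisions</heading>
    <description>Scope of the regulation.</description>
  </section>
  <section index="2">
    <heading>Definitions</heading>
    <description>Terms used throughout the text.</description>
  </section>
</document>
"""

def xml_to_csv_rows(xml_text):
    """Parse the XML and yield one CSV row per <section> element."""
    root = ET.fromstring(xml_text)
    for section in root.iter("section"):
        yield [
            section.get("index"),                # attribute -> index column
            section.findtext("heading", ""),     # child tag -> heading column
            section.findtext("description", ""), # child tag -> description column
        ]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["index", "heading", "description"])
writer.writerows(xml_to_csv_rows(XML))
```

Each heading and its description become one CSV row, mirroring the heading-to-row mapping in the bullets above.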

Environment: RStudio, AWS S3, NLP, EC2, Neural Networks, SVM, Decision Trees, MLbase, ad-hoc, Mahout, NoSQL, PL/SQL, MDM, MLlib & Git.

Confidential, NY

Data Scientist

Responsibilities:

  • Performed exploratory data analysis, data visualization, and feature selection using Python and Apache Spark.
  • Scaled Scikit-learn machine learning algorithms using Apache Spark.
  • Used techniques such as Fast Fourier Transforms, Convolutional Neural Networks, and deep learning.
  • Developed deep convolutional and recurrent neural networks with TensorFlow; significant Risk Management and Quantitative Finance experience.
  • Used multiple machine learning algorithms, including random forest, boosted trees, SVM, SGD, neural networks, and deep learning using TensorFlow.
  • Used Python, Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Theano, Caffe, etc.
  • Applied unsupervised and supervised learning methods in analyzing high-dimensional data; proficient use of the Python Scikit-learn, Pandas, and NumPy packages.
  • Performed data modeling operations using Power BI, Pandas, and SQL.
  • Utilized the Python libraries wxPython, NumPy, Twisted, and Matplotlib.
  • Used Python libraries like Beautiful Soup and Matplotlib.
  • Developed and implemented predictive models of user behavior data on websites, URL categorization, social network analysis, social mining, and search content based on large-scale machine learning.
  • Wrote scripts in Python using Apache Spark and the ElasticSearch engine for use in creating dashboards visualized in Grafana.
  • Led development of Natural Language Processing (NLP) initiatives with chatbots and virtual assistants.
  • Converted Pandas DataFrames to Apache Spark DataFrames.
  • Collaborated with engineers to deploy successful models and algorithms into production environments.
  • Collaborated with a diverse team, including statisticians, the Chief Science Officer, and engineers, to build data science project pipelines and algorithms to derive valuable insights from current and new datasets.
  • Used PySpark DataFrames to read text, CSV, and image data from HDFS, S3, and Hive.
  • Cleaned input text data using the PySpark machine learning feature extraction API.
  • Created features to train algorithms.
  • Used various algorithms from the PySpark ML API.
  • Trained models using historical data stored in HDFS and Amazon S3.
  • Used Spark Streaming to load the trained model and predict on real-time data from Kafka.
  • Stored the results in MongoDB.
  • Utilized various new supervised and unsupervised machine learning algorithms/software to perform NLP tasks and compare performances.
  • The web application picks up the data stored in MongoDB.
  • Used Apache Zeppelin for visualization of Big Data.
  • Fully automated job scheduling, monitoring, and cluster management without human intervention using Airflow.
  • Built Apache Spark web services using Flask; worked with input file formats like ORC, Parquet, JSON, and Avro.
  • Developed highly scalable classifiers and tools by leveraging machine learning, Apache Spark, and deep learning.
  • Wrote Spark SQL UDFs and Hive UDFs.
  • Optimized Spark code using Apache Spark performance tuning.
  • Optimized machine learning algorithms based on need.
  • Used Amazon Elastic MapReduce (EMR) to process a huge number of datasets using Apache Spark and TensorFlow.
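The clean-text, extract-features, then train flow described in the bullets above can be sketched at small scale with a scikit-learn text pipeline, used here as a stand-in for the Spark ML version; the sample messages and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data standing in for text read from HDFS/S3/Hive.
texts = [
    "win a free prize now", "limited offer click here",
    "meeting rescheduled to friday", "please review the attached report",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

# Clean/tokenize text into features, then fit a classifier --
# the same extract-features-then-train shape as a Spark ML pipeline.
model = Pipeline([
    ("features", CountVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)

prediction = model.predict(["free prize offer"])[0]
```

In the Spark version, the vectorizer and classifier stages would be their PySpark ML counterparts and the fitted model would score records arriving from Kafka.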

Environment: Machine learning, Scikit-learn, Pandas, Spark Core, Spark SQL, Spark Streaming, Python, Airflow, Amazon EMR, EC2, S3, NumPy, Matplotlib, TensorFlow, Kafka, Flask, MongoDB, Hive, HDFS, GitHub & REST.

Confidential - Columbus, OH

Data Scientist

Responsibilities:

  • Collaborated with internal stakeholders to understand business challenges and develop analytical solutions to optimize business processes.
  • Performed analysis using industry-leading text mining, data mining, and analytical tools and open-source software.
  • Used MATLAB and C/C++ with OpenCV, with SVM, Neural Networks, and Random Forest as classifiers.
  • Generated graphical reports using the Python packages NumPy and Matplotlib.
  • Built various graphs for business decision-making using the Python Matplotlib library.
  • Knowledge of Information Extraction and NLP algorithms coupled with Deep Learning (ANNs and CNNs), Theano, Keras, and TensorFlow.
  • Built and trained a deep learning network using TensorFlow on the data and reduced wafer scrap by 15% by predicting the likelihood of wafer damage; a combination of the z-plot features, image features (pigmentation), and probe features is used.
  • Experienced in Artificial Neural Networks (ANNs) and Deep Learning models using the Theano, TensorFlow, and Keras packages in Python.
  • Used Natural Language Processing (NLP) to pre-process the data, determine the number of words and topics in the emails, and form clusters of words.
  • Cleaned input text data using the PySpark machine learning feature extraction API.
  • Used Pandas DataFrames for exploratory data analysis on a sample dataset.
  • Wrote Scikit-learn-based machine learning algorithms for building POCs on a sample dataset.
  • Analyzed structured, semi-structured, and unstructured datasets using MapReduce and Apache Spark.
  • Implemented an end-to-end lambda architecture to analyze streaming and batch datasets.
  • Used Apache Mahout's scalable machine learning algorithms for building a recommendation engine and for building classification and regression models.
  • Converted Mahout's machine learning algorithms to RDD-based Apache Spark MLlib to improve performance.
  • Optimized machine learning algorithms based on need.
  • Built automatic music/news/POI recommendation inside the vehicle using GPS location, passenger conversation, behavior, and mood, using machine learning and natural language processing.
  • Built a smart state-of-charge monitor for electric vehicles based on a Recurrent Neural Network and Seq2Seq forecasting.
  • Built multiple machine learning features using Python, Scala, and Java based on need.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Migrated single-machine machine learning algorithms to parallel processing algorithms.
  • Developed Hive queries for ad-hoc analysis.
  • Used Amazon Elastic MapReduce (EMR) to process a huge number of datasets using Apache Spark and TensorFlow.
  • Served as lead Data Scientist for the development of machine learning and NLP engines utilizing population health data.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Involved in loading data from RDBMS and weblogs into HDFS using Sqoop and Flume.
  • Involved in building a complex streaming data pipeline using Kafka and Apache Spark.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Optimized Hive queries.
  • Optimized MapReduce and Apache Spark jobs.
  • Wrote custom input formats in MapReduce to analyze image datasets.
  • Wrote Hive UDFs based on need.
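The data-cleaning MapReduce jobs mentioned above follow the standard map, shuffle, reduce shape; a minimal Hadoop-Streaming-style sketch in Python, where the cleaning rule (lowercasing and stripping punctuation) is illustrative rather than taken from the original jobs:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Clean each token and emit (word, 1) pairs."""
    for token in line.lower().split():
        token = token.strip(".,;:")
        if token:  # drop tokens emptied by cleaning
            yield token, 1

def reducer(pairs):
    """Sum counts per key; pairs must arrive sorted by key (the shuffle)."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)

lines = ["Hive queries, Hive UDFs", "spark jobs; Spark jobs"]
# Sorting the mapper output simulates the shuffle phase between map and reduce.
shuffled = sorted(pair for line in lines for pair in mapper(line))
counts = dict(reducer(shuffled))
```

In Hadoop Streaming the same mapper and reducer would read stdin and write stdout, with the framework performing the sort between them.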

Environment: Hadoop, MapReduce, Hive, Mahout, Apache Spark, Python, Scikit-learn, Pandas, NumPy, Java, Maven, Eclipse, MySQL, Kafka, Sqoop & Flume.

Confidential, NY

Data Scientist

Responsibilities:

  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Responsible for the design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Designed and automated the process of score cuts that achieve increased close and good rates using advanced R programming.
  • Utilized Convolutional Neural Networks to implement a machine learning image recognition component.
  • Managed datasets using Pandas DataFrames and MySQL; queried the MySQL relational database (RDBMS) from Python using the MySQLdb connector package to retrieve information.
  • Utilized standard Python modules such as csv, itertools, and pickle for development.
  • The tech stack was Python 2.7, PyCharm, Anaconda, Pandas, NumPy, unittest, R, and Oracle.
  • Developed large data sets from structured and unstructured data; performed data mining.
  • Partnered with modelers to develop data frame requirements for projects.
  • Performed ad-hoc reporting, customer profiling, and segmentation using R/Python.
  • Tracked various campaigns, generating customer profiling analysis and data manipulation.
  • Provided Python programming, with detailed direction, in the execution of data analysis that contributed to the final project deliverables; responsible for data mining.
  • Analyzed large datasets to answer business questions by generating reports and outcomes.
  • Worked with a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
  • Executed SQL queries from R/Python on complex table configurations.
  • Retrieved data from the database through SQL as per business requirements.
  • Created, maintained, modified, and optimized SQL Server databases.
  • Manipulated data using Python programming.
  • Adhered to best practices for project support and documentation.
  • Understood the business problem, built hypotheses, and validated them using the data.
  • Managed the reporting/dashboarding for the key metrics of the business.
  • Involved in data analysis using different analytic and modeling techniques.
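Querying a relational database from Python into a DataFrame for profiling, as described above, can be sketched with Pandas; sqlite3 stands in here for the MySQL connector, and the table and column names are hypothetical:

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for the MySQL RDBMS;
# the customers table and its columns are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, segment TEXT, spend REAL);
    INSERT INTO customers VALUES (1, 'retail', 120.0),
                                 (2, 'retail', 80.0),
                                 (3, 'corporate', 500.0);
""")

# Run the SQL query and load the result set into a Pandas DataFrame,
# then profile average spend per customer segment.
df = pd.read_sql_query("SELECT segment, spend FROM customers", conn)
profile = df.groupby("segment")["spend"].mean()
```

With MySQL, only the connection line changes (e.g., a MySQLdb connection object); the read_sql_query and groupby steps are identical.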

Environment: Python, Oracle, Scikit-learn, Pandas, NumPy, SciPy, NLTK, Jupyter Notebook, R, and RStudio.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Developed end-to-end enterprise applications using the Spring MVC, REST, and JDBC Template modules.
  • Wrote well-designed, testable, efficient Java code.
  • Understood and analyzed complex issues and addressed challenges arising during the software development process, both conceptually and technically.
  • Implemented best practices of automated build, test, and deployment.
  • Developed design patterns, data structures, and algorithms based on project need.
  • Worked with multiple tools such as Toad, Eclipse, SVN, Apache, and Tomcat.
  • Deployed models via APIs into applications or workflows.
  • Worked on user interface technologies like HTML5 and CSS/SCSS.
  • Wrote stored procedures and SQL queries based on project need.
  • Deployed built JARs to the application server.
  • Created automated unit tests using flexible/open-source frameworks.
  • Developed multi-threaded and transaction-handling code (JMS, database).

Environment: Java, Spring MVC, Hibernate, JMS, HTML5, CSS/SCSS, JUnit, Eclipse, Tomcat, and Oracle.
