
Data Scientist/Machine Learning Resume


Atlanta, GA

SUMMARY

  • 8+ years of experience in NLP/NLU/NLG/AI/machine learning/computer vision, probabilistic graphical models, inferential statistics, graph theory, and system design.
  • Approved patent (4 years in progress) in statistical modeling and traffic-pattern analysis.
  • Part of an R&D team building new analytics POCs using Apache Spark, Scala, R, and machine learning.
  • Experienced in building AI bots to assist or replace humans in various business domains.
  • Proficient understanding of Spark Core, Spark SQL, Spark Streaming, and Spark MLlib.
  • Expert-level understanding of application design, development, and testing in mainframe environments using PL/1, COBOL, EGL, Easytrieve, DB2, JCL, QC, and VAG.
  • Regression analysis, statistical test analysis, report and dashboard generation, and data management.
  • Git, Java, MySQL, MongoDB, Neo4j, AngularJS, SPSS, Tableau.
  • Python, NumPy, scikit-learn, gensim, NLTK, TensorFlow, Keras.
  • Experience in machine learning, statistics, and regression: linear, logistic, Poisson, and binomial.
  • Single-handedly built a model to replace the role of the doer in the pension sector. This model (patent in progress) generates experience from structured data and, through a bootstrapping mechanism, learns new experience from unseen data.
  • Single-handedly designed and built a complete information-extraction bot POC for KYC extraction. The bot uses adaptive learning techniques and custom supervised classifiers for entity and relation extraction.
  • Hands-on experience with conversational voice assistants using Google Dialogflow.
  • Experience building enterprise solutions involving context awareness, pervasive computing, and applied machine learning.
  • Research and development of machine learning pipeline designs for Optical Character Recognition (handwritten), an anomaly detection system using a multivariate Gaussian model (see the sketch after this list), and healthcare diagnostics systems using PGMs (Bayesian networks).
  • Comfortable presenting to senior management, business stakeholders, and external partners.
  • KBC, chatbots, adaptive supervised learning (deterministic classification), unsupervised learning methods for IE, ANNs and deep NNs for NLP and chatbots, probabilistic models for NLG and inference, and decision science.
  • Hands-on experience with data mining algorithms and approaches.
  • Strong grasp of algorithms and design techniques.
  • Fluency in modern programming languages such as Java, C#, and C++.
  • Architecture and design of reusable server components for web as well as mobile applications.
  • Strong programming expertise in Python and strong database skills in SQL.
  • Solid coding and engineering skills in machine learning.
  • Proficient in Python, with experience building and productionizing end-to-end systems.
  • Knowledge of information extraction and NLP algorithms coupled with deep learning.
  • Experience with file systems, server architectures, databases, SQL, and data movement (ETL).
  • Experience with Hadoop systems.
  • Experience with supervised and unsupervised machine learning algorithms.
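
As a hedged illustration of the multivariate-Gaussian anomaly detection mentioned above: the sketch below fits a Gaussian to normal data and flags low-density points. The data, the epsilon threshold, and the helper names are illustrative assumptions, not project code.

```python
# Minimal sketch: multivariate-Gaussian anomaly detection.
# Assumes a numeric feature matrix of "normal" examples and a density
# threshold epsilon tuned on labeled validation data (both made up here).
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(X_train):
    """Estimate mean and covariance from normal (non-anomalous) data."""
    mu = X_train.mean(axis=0)
    sigma = np.cov(X_train, rowvar=False)
    return multivariate_normal(mean=mu, cov=sigma)

def flag_anomalies(model, X, epsilon=1e-4):
    """Flag rows whose density under the fitted Gaussian is below epsilon."""
    return model.pdf(X) < epsilon

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))            # stand-in for real features
model = fit_gaussian(X_train)
print(flag_anomalies(model, rng.normal(5.0, 1.0, size=(5, 3))))
```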

TECHNICAL SKILLS

Languages: Python, R, Scala, Java, SQL, T-SQL, PL/SQL, ASP, Visual Basic, XML, C, C++, HTML, UNIX shell scripting, Perl

Machine learning libraries: Spark ML, Spark MLlib, scikit-learn, NLTK & Stanford NLP

Deep learning frameworks: TensorFlow, Google Dialogflow

Big Data Frameworks: Apache Spark, Apache Hadoop, Kafka, MongoDB, Cassandra.

Machine learning algorithms: Linear Regression, Logistic Regression, Naive Bayes, SVM, Decision Trees, Random Forest, Boosting, K-means, Bagging, etc.

Big data distributions: Cloudera & Amazon EMR

Web Technologies: Flask, Django and Spring MVC

Front End Technologies: JSP, HTML5, Ajax, jQuery and XML

Web servers: Apache2, Nginx, WebSphere and Tomcat

Visualization Tools: Apache Zeppelin, Matplotlib and Tableau.

Databases: Oracle 11g/12c, MySQL, PostgreSQL, MS Access, SQL Server 2012/2014, Sybase, DB2, Teradata 14/15, Hive

NoSQL: MongoDB and Cassandra

Operating Systems: Linux and Windows

Scheduling Tools: Airflow & Oozie.

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Data Scientist/Machine Learning

Responsibilities:

  • Worked in a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms (a representative scikit-learn sketch follows this list).
  • Installed and used the Caffe deep learning framework.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Developed a voice bot using AI (IVR), improving the interaction between humans and the virtual assistant.
  • Development and Deployment using Google Dialogflow Enterprise.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Rapidly created models in Python using pandas, NumPy, scikit-learn, and plot.ly for data visualization; these models were then implemented in SAS, where they interface with MSSQL databases and are scheduled to update on a timely basis.
  • Performed data analysis using regressions, data cleaning, Excel VLOOKUPs, histograms, and the TOAD client, and presented the analysis and suggested solutions to investors.
  • Gained good knowledge of Hadoop Data Lake implementation and Hadoop architecture for client business data management.
  • Extracted data from HDFS and prepared it for exploratory analysis using data munging.
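
As a small, hedged illustration of the scikit-learn work referenced above: the toy documents, labels, and pipeline below are placeholders showing the general shape of such a model, not the project's actual code or data.

```python
# Illustrative scikit-learn text-classification pipeline (toy data).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["invoice overdue payment", "meeting agenda attached",
        "payment reminder invoice", "team offsite agenda"]
labels = [1, 0, 1, 0]                      # 1 = finance-related (made up)

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),          # bag-of-words / TF-IDF features
    ("model", LogisticRegression()),       # simple linear classifier
])
clf.fit(docs, labels)
print(clf.predict(["please settle the invoice"]))
```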

Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce, Google Dialogflow

Confidential, Raritan, NJ

Data Scientist

Responsibilities:

  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
  • Gathered all required data from multiple data sources and created the datasets used in analysis.
  • Performed exploratory data analysis and data visualizations using R and Tableau.
  • Performed thorough EDA, with univariate and bivariate analysis, to understand intrinsic and combined effects.
  • Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
  • Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Researched improvements to the IVR used internally at J&J.
  • Developed an IVR for clinics so that callers can receive anonymous access to test results.
  • Performed data cleaning and imputation of missing values using R.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Took up ad-hoc requests from different departments and locations.
  • Determined regression model predictors using a correlation matrix for factor analysis in R.
  • Built a regression model to understand an order-fulfillment time-lag issue using scikit-learn in Python (a minimal sketch follows this list).
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Empowered decision makers with data analysis dashboards using Tableau and Power BI.
  • Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
  • Owned the functional and non-functional scaling of software systems in my area of ownership.
  • Provided input and recommendations on technical issues to BI Engineers, Business & Data Analysts, and Data Scientists.
  • As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward/reverse-engineered databases.
  • Established Data architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the other members of the emerging solutions team.
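
A minimal, hedged sketch of the scikit-learn regression mentioned above; the synthetic features standing in for order attributes and the coefficients are invented for illustration.

```python
# Illustrative regression sketch (synthetic data, not project data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))              # e.g. order size, distance, backlog
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("held-out R^2:", round(r2_score(y_te, model.predict(X_te)), 3))
```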

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential - Plano, TX

Data Scientist/Machine Learning

Responsibilities:

  • Involved in defining the source to target data mappings, business rules, and data definitions.
  • Performed data profiling on various source systems required for transferring data to ECH.
  • Defined the list codes and code conversions between the source systems and the data mart using Reference Data Management (RDM).
  • Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to analyze legacy data for data profiling.
  • Worked on DTS packages and DTS import/export for transferring data between SQL Server databases.
  • Involved in upgrading DTS packages to SSIS packages (ETL).
  • Performed end-to-end Informatica ETL testing for these custom tables by writing complex SQL queries on the source database and comparing the results against the target database.
  • Used HP Quality Center v11 for defect tracking of issues.
  • Applied data mining and optimization techniques in B2B and B2C industries; proficient in machine learning, data/text mining, statistical analysis, and predictive modeling.
  • Extracted the source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Performed predictive modeling using state-of-the-art methods.
  • Built and maintained dashboards and reporting based on the statistical models to identify and track key metrics and risk indicators.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS (a PySpark sketch follows this list).
  • Parsed and manipulated raw, complex data streams to prepare them for loading into an analytical tool.
  • Migrated Informatica mappings from SQL Server to Netezza.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.
  • Broad knowledge of programming and scripting, especially in R, Java, and Python.
  • Implemented an event task to execute the application automatically.
  • Involved in developing the Patches & Updates module.
  • Proven experience building sustainable and trustful relationships with senior leaders.
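
As a hedged sketch of the Spark-based predictive modules mentioned above: the column names and rows below are illustrative, and the pipeline shows one common Spark ML pattern (assemble features, fit a classifier), not the project's actual modules.

```python
# Minimal PySpark ML sketch: assemble features, fit logistic regression.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("predictive-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1.0, 20.0, 1), (2.0, 35.0, 0), (3.0, 18.0, 1), (4.0, 50.0, 0)],
    ["tenure", "spend", "label"],          # made-up columns
)
assembler = VectorAssembler(inputCols=["tenure", "spend"], outputCol="features")
train = assembler.transform(df)
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
model.transform(train).select("label", "prediction").show()
spark.stop()
```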

Environment: Erwin 8, Teradata 13, SQL Server 2008, Oracle 9i, SQL*Loader, PL/SQL, ODS, OLAP, OLTP, SSAS, Informatica Power Center 8.1.

Confidential - Fort Lauderdale, FL

BI Developer/Data Analyst

Responsibilities:

  • Involved in the detailed design of data marts using a Star Schema, and planned data marts involving shared dimensions.
  • Conducted one-to-one sessions with business users to gather data for Data Warehouse requirements.
  • Part of a team analyzing database requirements in detail with the project stakeholders through Joint Requirements Development (JRD) sessions.
  • Developed an object model in UML for the conceptual data model using Enterprise Architect.
  • Developed logical and physical data models using Erwin to design OLTP systems for different applications.
  • Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
  • Worked with DBA group to create Best-Fit Physical Data Model with DDL from the Logical Data Model using Forward engineering.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
  • Developed data migration and cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
  • Involved in migration projects moving data from data warehouses on Oracle/DB2 to Teradata.
  • Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables
  • Created entity process association matrices using Zachman Framework, functional decomposition diagrams and data flow diagrams from business requirements documents.
  • Used the Model Manager option in Erwin to synchronize the data models in the Model Mart approach.
  • Gathered various reporting requirements from business analysts.
  • Worked on enhancements to the Data Warehouse model using Erwin as per the business reporting requirements.
  • Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, clustering, neural networks, and Principal Component Analysis.
  • Performed K-means clustering, multivariate analysis, and Support Vector Machines in R (a minimal clustering sketch follows this list).
  • Wrote complex Hive and SQL queries for data analysis and to implement business requirements.
  • Reverse-engineered the reports and identified the data elements (in the source system), dimensions, facts, and measures required for reports.
  • Developed data mapping documents between Legacy, Production, and User Interface Systems.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).
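
The clustering above was done in R; for consistency with the other examples here, this hedged sketch shows the equivalent K-means step in Python with scikit-learn on made-up data.

```python
# Illustrative K-means sketch: scale features, cluster, inspect sizes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(5, 1, (50, 2))])            # two toy clusters

X_scaled = StandardScaler().fit_transform(X)          # scale before K-means
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print("cluster sizes:", np.bincount(km.labels_))
```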

Environment: Erwin r9.5, DB2, Teradata, SQL Server 2008, Informatica 9.1, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word, and Access.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Documented the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Define the list codes and code conversions between the source systems and the data mart.
  • Worked with internal architects, assisting in the development of current and target state data architectures.
  • Coordinated with business users to design new reporting needs in an appropriate, effective, and efficient way based on the existing functionality.
  • Remained knowledgeable in all areas of business operations in order to identify systems needs and requirements.
  • Implemented a metadata repository; maintained data quality, data cleanup procedures, transformations, data standards, a data governance program, scripts, stored procedures, triggers, and the execution of test plans (a rule-based data-quality sketch follows this list).
  • Performed data quality checks in Talend Open Studio.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Generate weekly and monthly asset inventory reports.
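
A hedged sketch of the kind of rule-based data-quality checks described above; the frame, column names, and rules are illustrative assumptions only.

```python
# Illustrative data-quality checks with pandas (made-up rules and data).
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, None],
    "email": ["a@x.com", "bad-email", "b@y.com", "c@z.com"],
})

issues = {
    "missing_customer_id": int(df["customer_id"].isna().sum()),
    "duplicate_customer_id": int(df["customer_id"].duplicated().sum()),
    "invalid_email": int((~df["email"].str.contains("@", na=False)).sum()),
}
print(issues)   # results would feed a data-quality report or cleanup step
```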

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.

Confidential

Data Analyst

Responsibilities:

  • Analyzed business information requirements and modeled class diagrams and/or conceptual domain models.
  • Gathered and reviewed customer information requirements for OLAP and for building the data mart.
  • Performed document analysis involving the creation of use cases and use case narrations using Microsoft Visio, in order to present the efficiency of the gathered requirements.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL (an illustrative aggregation sketch follows this list).
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Created data quality and traceability documents for each source interface.
  • Established standards of procedures.
  • Generated weekly and monthly asset inventory reports.
  • Managed the project requirements, documents, and use cases using IBM Rational RequisitePro.
  • Assisted in building an integrated logical data design and proposed a physical database design for building the data mart.
  • Documented all data mapping and transformation processes in the functional design documents based on the business requirements.
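
As an illustrative, hedged sketch of the claims aggregation described above (the original work used Microsoft Access and Oracle SQL; pandas is used here for consistency with the other examples), with made-up provider data:

```python
# Illustrative per-provider claims aggregation (toy data).
import pandas as pd

claims = pd.DataFrame({
    "provider_id": [101, 101, 102, 102, 103],
    "claim_amount": [250.0, 400.0, 125.0, 300.0, 80.0],
    "supplemental": [True, False, True, True, False],
})

summary = (claims.groupby("provider_id")
                 .agg(total_paid=("claim_amount", "sum"),
                      supplemental_claims=("supplemental", "sum"))
                 .reset_index())
print(summary)
```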

Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
