Sr Data Scientist Resume

Seattle, WA

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Experience in coding SQL/PL/SQL using Procedures, Triggers, and Packages.
  • Extensive experience in Text Analytics, developing statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
  • Excellent Knowledge of Relational Database Design, Data Warehouse/OLAP concepts, and methodologies.
  • Proficient in managing entire data science project life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Statistical Modelling, Testing and Validation, Visualization and Reporting.
  • Proficient in Machine Learning algorithms such as Linear Regression, Ridge, Lasso, Elastic Net Regression, Decision Trees, and Random Forests; more advanced algorithms such as ANN, CNN, and RNN; and Ensemble methods such as Bagging, Boosting, and Stacking.
  • Strong track record in Model Validation and Model tuning, covering model selection, K-fold cross-validation, the Hold-Out scheme, and hyperparameter tuning via Grid Search and HyperOpt (a brief sketch follows this summary).
  • Advanced experience with Python (2.x, 3.x) and its libraries such as NumPy, Pandas, Scikit-learn, XGBoost, LightGBM, PyTorch, Matplotlib, Seaborn.
  • Strong Knowledge in Statistical methodologies such as Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Monte Carlo Sampling and Time Series Analysis.
  • Experience in multiple software tools and languages to provide data-driven analytical solutions to decision makers or research teams.
  • Good knowledge of NoSQL databases such as MongoDB, as well as Apache Impala.
  • Developed, maintained, and taught new tools and methodologies related to data science and high-performance computing.
  • Technical proficiency in design and data modeling for online applications; served as Solution Lead for architecting Data Warehouse/Business Intelligence applications.
  • Excellent command of the Software Development Life Cycle (SDLC) with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling.
  • Experience in designing Data Marts and Star/Snowflake Schemas, and with Data Warehouse concepts such as ODS and MDM architecture.
  • Good knowledge of developing ETL programs for data extraction, transformation, and loading using Informatica.
  • Good knowledge of Data Governance/Data classification and reporting tools like Tableau.
  • Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
  • Developed predictive data models using Decision Trees, Random Forests, Naïve Bayes, Logistic Regression, Social Network Analysis, Cluster Analysis, and Neural Networks.
  • Proficient in writing complex SQL queries, stored procedures, normalization, database design, creating indexes, functions, triggers, and sub-queries.
  • Experience in troubleshooting test scripts, SQL queries, ETL jobs, data warehouse/data mart/data store models.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Hands on experience with RStudio for doing data pre-processing and building machine learning algorithms on different datasets.
  • Collaborated with the lead Data Architect to model the Data Warehouse in accordance with FSLDM subject areas, 3NF format, and Snowflake schema.
  • Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2.
  • Implemented machine learning algorithms on large datasets to understand hidden patterns and capture insights.
  • Predictive Modeling Algorithms: Logistic Regression, Linear Regression, Decision Trees, K-Nearest Neighbours, Bootstrap Aggregation (Bagging), Naive Bayes Classifier, Random Forests, Boosting, Support Vector Machines.
  • Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos.
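
The snippet below is a minimal sketch of the K-fold cross-validation and grid-search hyperparameter tuning mentioned in this summary; the synthetic dataset, Random Forest estimator, and parameter grid are illustrative assumptions rather than an actual project configuration.

```python
# Minimal sketch: K-fold cross-validation + grid-search hyperparameter tuning.
# The dataset, estimator, and parameter grid below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

cv = KFold(n_splits=5, shuffle=True, random_state=42)               # 5-fold CV
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}  # grid to search

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```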

TECHNICAL SKILLS:

Python Libraries: NumPy, SciPy, Pandas, Matplotlib, Seaborn, Bokeh, Plotly, Scikit-Learn, XGBoost, LightGBM, Keras.

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Bayes' Law, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation, Correlation.

Cloud Computing Tools: Microsoft Azure

Databases: Microsoft SQL Server 2008, MySQL 4.x/5.x, Oracle 11g, 12c, DB2, Teradata

NoSQL Databases: Apache Impala, Cassandra, MongoDB

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Database Tools: SQL Server Data Tools, Visual Studio, Spotlight, SQL Server Management Studio, Query Analyzer, Enterprise Manager, JIRA, Profiler

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, Cognos 7.0/6.0.

Data Modelling Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, Oracle Designer, SAP PowerDesigner, Enterprise Architect.

Operating Systems: All versions of UNIX and Windows, Linux, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE:

Sr Data Scientist

Confidential, Seattle, WA

  • Worked with business users, business analysts, program managers, project managers, and system analysts to review business requirements.
  • Collaborated with the manager and insurance agents to create and execute a marketing strategy using Salesforce Marketing Cloud, with a focus on acquiring and retaining customers.
  • Good experience in software development with Python and IDEs such as PyCharm and Jupyter Notebook.
  • Developed a range of machine learning solutions, such as customer segmentation and Decision Tree models, to improve insurance agents' engagement with customers, and built a marketing mix model to reach prospective customers more effectively, improving ROI by 20%.
  • Used Python to write data into JSON files for testing student item-level information; created scripts for data modeling, data import, and data export.
  • Performed data preparation, including mapping unaligned data from various formats, identifying missing data, finding correlations, scaling, and removing junk data, before building a predictive model in Apache Spark.
  • Performed data cleaning, pre-processing, imputation, transformation, scaling, feature engineering, data aggregation, data frame merges, descriptive statistics, data visualization, and score assessment mapping, with reporting on Tableau dashboards.
  • Worked on Azure databases; the database server is hosted on Azure and uses Microsoft credentials to log in to the DB rather than the Windows authentication that is typically used.
  • Used Docker to run a local instance of the application on a laptop; ran the Docker app in the background to test the application while simultaneously querying the local instance of the database to see which tables were inserted/created, using commands such as docker-compose up.
  • Built an Airflow workflow to execute PySpark scripts that pull bulk data into the database (we process around 150 million records); a minimal sketch follows this list.
  • Worked closely with the Document Processing and Scoring Services expert team to identify the rule sets needed to build a predictive model, and performed visualization to gain in-depth knowledge of correlations between variables.
  • Conducted exploratory data analysis using Pandas, NumPy, Matplotlib, Scikit-learn, SciPy, and NLTK in Python while developing various machine learning algorithms.
  • Performed data extraction and manipulation over large relational datasets using SQL, Python, and other analytical tools.
  • Imported customer account and transactional data using Spark MapReduce, visualized the range and distribution of features using Matplotlib, and performed data scrubbing, feature scaling, and missing-value handling.
  • Trained data with different classification models such as Decision Trees, Random Forest, Linear and Logistic Regression, and KNN to classify quartiles and predict scores (see the comparison sketch after this list).
  • Used Python libraries and SQL queries/sub queries to create several datasets which produced statistics, tables, figures, charts and graphs.
  • Managed, developed, and designed a dashboard control panel with graphical representations, pie charts, etc., for the program to view student performance in individual reports, using Django, HTML, CSS, JavaScript, and jQuery calls.
  • Followed the process of updating and maintaining JIRA support tickets, project stories, and their sub-task workflows, communicating with ticket submitters; maintained a record of all loads in JIRA.
  • Uploaded detailed documents on process flows, ETL flows, explanations of scripts used for validating data files, and DataMart tables to Confluence for knowledge sharing and team building.
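
Below is a minimal sketch of an Airflow workflow that triggers a PySpark bulk-load script, in the spirit of the Airflow work described above; the DAG id, schedule, and script path are hypothetical placeholders, not the actual pipeline.

```python
# Minimal Airflow DAG sketch: schedule a PySpark bulk-load job via spark-submit.
# dag_id, schedule, and script path are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="bulk_load_pipeline",          # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="pyspark_bulk_load",
        # spark-submit runs the PySpark script that pulls bulk records into the DB
        bash_command="spark-submit /opt/jobs/bulk_load.py",  # hypothetical path
    )
```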
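
The comparison sketch below shows how the classification models named above (Decision Tree, Random Forest, Logistic Regression, KNN) might be trained and evaluated side by side; the synthetic data and accuracy metric stand in for the real quartile/score targets.

```python
# Illustrative side-by-side training of the classifier families listed above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)                       # train each candidate
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")                        # compare hold-out accuracy
```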

Environment: Python, Django, HTML, CSS, JavaScript, Shell Scripting, SQL, Visual Studio, Integration Services, Power BI, Azure ML, Tableau, Docker

Machine Learning Engineer/Data

Confidential, Boston, MA

Responsibilities:

  • Developed a neural-network-based predictive model and risk analytics utilizing large amounts of structured and unstructured data such as industry sentiment, stock movements, and correlations in economic factors; reduced the time required for stock screening by 75%.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Implemented PySpark jobs for batch processing to handle massive volumes of data from various data sources - Bloomberg, government publications, unstructured news articles, etc. - with data persisted in HDFS. Configured a CI/CD pipeline in Kubernetes and Docker Swarm.
  • Installed and used the PyTorch deep learning framework.
  • Developed Tableau dashboard for analyzing business cycle economic indicators for major economies to identify macro-economic investment opportunities and reduce systemic risk.
  • Developed Python, PySpark, and Hive scripts to filter/map/aggregate data, and used Sqoop to transfer data to and from Hadoop.
  • Developed a Machine Learning test-bed with 24 different model learning and feature learning algorithms.
  • Performed text mining using NLTK to identify trends in the overall market, to find stocks affected by poor sentiment, or to exit companies that are becoming inflated due to sentiment, with 80% accuracy; stored news articles in a MongoDB NoSQL database (a sentiment-scoring sketch follows this list).
  • Optimized client’s portfolio using PyTorch reinforcement learning to combine current holdings, market valuation, macroeconomic indicators and stock fundamental indicators and increased portfolio value by 15% over two years.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.; the engine increased user lifetime by 45% and tripled user conversions for target categories.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
  • Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and big data sources.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we would be able to assign each document a response label for further classification.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
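
A minimal sketch of NLTK-based sentiment scoring of news text, as referenced above; the sample headlines and the negative-sentiment threshold are illustrative assumptions, and VADER is used here as one readily available NLTK analyzer.

```python
# Sketch: score news headlines for sentiment with NLTK's VADER analyzer.
# Headlines and the -0.3 "poor sentiment" cutoff are placeholders.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

headlines = [
    "Company X beats earnings expectations and raises guidance",
    "Regulators open probe into Company Y accounting practices",
]
for text in headlines:
    score = sia.polarity_scores(text)["compound"]  # -1 (negative) .. +1 (positive)
    flag = "poor sentiment" if score < -0.3 else "ok"
    print(f"{score:+.2f}  {flag}  {text}")
```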

Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.

Data Scientist

Confidential - McLean, VA

Responsibilities:

  • Communicated and coordinated with end client for collecting data and performed ETL to define the uniform standard format.
  • Queried and retrieved data from SQL Server database to get the sample dataset.
  • In the pre-processing phase, used Pandas to clean the missing data, perform datatype casting, and merge or group tables for the EDA process.
  • Used PCA along with other Scikit-learn pre-processing techniques such as feature engineering, feature normalization, and label encoding to reduce the high-dimensional data (>150 features); see the sketch after this list.
  • In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into the patient admission and discharge data.
  • Worked on development of data warehouse, Data Lake, and ETL systems using relational and non-relational (SQL and NoSQL) tools.
  • Built and analyzed datasets using R, SAS, MATLAB, and Python (in decreasing order of usage).
  • Applied linear regression in Python and SAS to understand the relationship between different attributes of the dataset and causal relationship between them.
  • Performed complex pattern recognition on financial time series data and forecast returns through ARMA and ARIMA models and exponential smoothing for multivariate time series data (an ARIMA sketch follows this list).
  • Experimented with predictive models including Logistic Regression, Support Vector Machine (SVC), and Random Forest provided by Scikit-learn, XGBoost, LightGBM, and a neural network in Keras to predict show-up probability and visit counts.
  • Created and supported a data management workflow from data collection, storage, and analysis to training and validation.
  • Wrangled data and worked on large datasets (acquired and cleaned the data), analyzing trends by making visualizations with Matplotlib in Python.
  • Implemented, tuned, and tested the model on AWS Lambda with the best-performing algorithm and parameters.
  • Collected the feedback after deployment, retrained the model to improve the performance.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports in Tableau Desktop.
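
The sketch below illustrates the scaling-plus-PCA dimensionality-reduction step referenced above; the random 160-feature matrix stands in for the real patient dataset, and the 95% explained-variance target is an assumed setting.

```python
# Sketch: standardize features, then reduce >150 dimensions with PCA.
# The synthetic matrix and 95% variance target are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 160))            # placeholder for >150 raw features

reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = reducer.fit_transform(X)        # keep components covering ~95% variance
print(X.shape, "->", X_reduced.shape)
```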
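
A hedged ARIMA sketch for the returns-forecasting work mentioned above; the simulated price series and the (1, 1, 1) order are illustrative choices, not the modeling decisions used on the actual data.

```python
# Sketch: fit an ARIMA(1,1,1) model and produce a short-horizon forecast.
# The random-walk price series below is simulated, not real market data.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
prices = pd.Series(
    100 + np.cumsum(rng.normal(0, 1, 250)),
    index=pd.date_range("2020-01-01", periods=250, freq="B"),
)

model = ARIMA(prices, order=(1, 1, 1))   # AR(1), first difference, MA(1)
fitted = model.fit()
print(fitted.forecast(steps=5))          # next 5 business-day forecasts
```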

Environment: SQL Server 2012/2014, AWS EC2, AWS Lambda, AWS S3, AWS EMR, Linux, Python 3.x (Scikit-Learn, NumPy, Pandas, Matplotlib), R, Machine Learning algorithms, Tableau.

Data Analyst/Data Modeler

Confidential, Weston, FL.

Responsibilities:

  • Conducted a complete study of the in-house requirements for the data warehouse. Analyzed the DW project database requirements from the users in terms of the dimensions they want to measure and the facts for which the dimensions need to be analyzed.
  • Prepared questionnaires for users and created various templates for dimensions and facts.
  • Conducted user interviews, gathered requirements, and analyzed the requirements using Rational Rose, RequisitePro, and RUP.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object-oriented Design) using UML and Visio
  • Created logical data model from the conceptual model and its conversion into the physical database design using Erwin
  • Created a dimensional model based on star schemas and designed them using Erwin.
  • Used Erwin for creating tables using Forward Engineering
  • Responsible for identifying and documenting business rules and creating detailed Use Cases
  • Defined the naming standards for data warehouse
  • Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
  • Developed the test plan, test conditions, and test cases to be used in testing, based on business requirements, technical specifications, and/or product knowledge.
  • Collected the information about the existing ODS by reverse engineering the ODS.
  • Identified/documented data sources and transformation rules required to populate and maintain data warehouse content.
  • Assisted in designing the overall ETL strategy
  • Generated DDL statements for the creation of new Sybase objects such as tables, views, indexes, packages, and stored procedures.
  • Created DataStage Server jobs to load data from sequential files, flat files, and MS Access.
  • Used DataStage Manager to import metadata from the repository, create new job categories, and create new data elements.
  • Used DataStage Designer to design and develop jobs for extracting, cleansing, transforming, integrating, and loading data into different Data Marts.

Environment: Erwin 8.2, PL/SQL Developer, Teradata, TOAD Data Analyst 2.1, Oracle 11g, QlikView 11.6, Quality Center 9.2, Informatica PowerCenter 9.1, TD SQL Assistant, Microsoft Visio.
