Data Scientist/Data Analyst Resume

New York, NY

SUMMARY

  • Highly efficient Data Scientist/Data Analyst with 8+ years of experience in data analysis, machine learning, data mining with large sets of structured and unstructured data, data acquisition, data validation, predictive modeling, data visualization, and web scraping. Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Hive.
  • Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization (a minimal sketch of this pipeline pattern follows this list).
  • Experience in using various packages in R and Python, such as ggplot2, caret, dplyr, RWeka, gmodels, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and Beautiful Soup.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
  • Hands-on experience with big data tools such as Hadoop, Spark, Hive, Pig, PySpark, and Spark SQL. Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Expertise in Business Intelligence, Data Warehousing, and Reporting tools in the Financial, Trading & Telecom industries.
  • Good knowledge of Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging.
  • Good industry knowledge, analytical & problem-solving skills, and the ability to work well within a team as well as individually.
  • Proficient in Power BI, Tableau, Qlik, and R Shiny data visualization tools, used to analyze and obtain insights from large datasets and to create visually powerful, actionable interactive reports and dashboards.
  • Experience with multiple programming languages, including Python, Scala, SQL, and shell scripting.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Experience and technical proficiency in designing and data modeling for online applications, and as a solution lead for architecting data warehouse/business intelligence applications.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
  • Highly skilled in using tools such as Tableau, ggplot2, Dash, and Flask for creating dashboards and visualizations.
  • Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2, regularly using JIRA and other internal issue trackers for project development.
  • Highly creative, innovative, committed, intellectually curious, business savvy with good communication and interpersonal skills.
  • Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau.
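
For illustration, a minimal sketch of the pipeline pattern referenced above (PCA for dimensionality reduction, K-fold cross-validation, and ROC-based validation) using scikit-learn; the data and parameters are synthetic placeholders, not from any client project:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for a real modeling dataset
    X, y = make_classification(n_samples=1000, n_features=30, random_state=42)

    pipeline = Pipeline([
        ("pca", PCA(n_components=10)),   # dimensionality reduction
        ("clf", DecisionTreeClassifier(max_depth=5, random_state=42)),
    ])

    # K-fold cross-validation scored with ROC-AUC
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
    print(f"Mean ROC-AUC across folds: {scores.mean():.3f}")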

TECHNICAL SKILLS

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS Enterprise Guide, SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, Schemas, JSON, Ajax, Java, Scala

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Software/Libraries: Keras, Caffe, TensorFlow, OpenCV, Scikit-learn, Pandas, NumPy, Microsoft Visual Studio, Microsoft Office.

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Machine Learning Algorithms: Neural Networks, Decision Trees, Support Vector Machines, Random Forest, Convolutional Neural Networks, Logistic Regression, PCA, K-means, KNN.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Data Scientist/Data Analyst

Responsibilities:

  • Created an aggregated report daily for the client to make investment decisions and help analyze market trends.
  • Built an internal visualization platform for clients to view historical data and make comparisons between various issuers, with analytics for different bonds and markets.
  • The model collects and merges daily data from market providers and applies various cleaning techniques to eliminate bad data points.
  • Data Retrieval using Confidential Data Studio and PostgreSQL.
  • Built the model on the Azure platform, using Python and Spark for model development and Dash by Plotly for visualizations.
  • Built REST APIs to easily add new analytics or issuers into the model.
  • Automated workflows that were previously initiated manually, using Python scripts and Unix shell scripting.
  • Created, activated, and programmed in Anaconda environments.
  • Experience in developing complex Informatica maps, with strong data warehousing concepts and an understanding of standard ETL transformation methodologies.
  • Worked on predictive analytics use-cases using Python language.
  • Cleaned and processed third-party spending data into workable deliverables in the required formats using Excel macros and Python libraries such as NumPy, SQLAlchemy, and Matplotlib.
  • Used pandas to organize the data into time series and tabular formats for manipulation and retrieval.
  • Helped with the migration from the old server to the Jira database (matching fields), with Python scripts for transferring and verifying the information.
  • Analyzed and formatted data using machine learning algorithms with Python scikit-learn.
  • Experience in Python, Jupyter, and the scientific computing stack (NumPy, SciPy, pandas, and Matplotlib).
  • Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were the main source of data for both customers and the internal customer service team.
  • Wrote Python scripts to parse JSON documents and load the data into a database (a minimal sketch follows this list).
  • Generated various graphical capacity planning reports using Python packages such as NumPy and Matplotlib.
  • Used common data science toolkits, such as R, Python, NumPy, Keras, Theano, and TensorFlow.
  • Analyzed the various logs being generated and forecast the next occurrence of events using various Python libraries.
  • Created Autosys batch processes to fully automate the model to pick the latest and best-fitting bond for that market.
  • Created a framework using Plotly, Dash, and Flask for visualizing trends and understanding patterns for each market using historical data.
  • Used Python APIs to extract daily data from multiple vendors.
  • Used Spark and Spark SQL for data integration and manipulation. Worked on a PoC for creating a Docker image on Azure to run the model.
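
For illustration, a minimal sketch of the JSON-to-database loading described above, using pandas and SQLAlchemy against PostgreSQL; the file name, table name, column names, and connection string are hypothetical placeholders:

    import json

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical PostgreSQL connection (requires a driver such as psycopg2)
    engine = create_engine("postgresql://user:password@localhost:5432/marketdata")

    with open("daily_quotes.json") as f:
        records = json.load(f)                   # expects a list of JSON objects

    df = pd.json_normalize(records)              # flatten nested fields into columns
    df["as_of"] = pd.to_datetime(df["as_of"])    # hypothetical timestamp field
    df = df.set_index("as_of").sort_index()      # treat the data as a time series

    # Append the day's records to the existing table
    df.to_sql("daily_quotes", engine, if_exists="append")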

Environment: Python, PySpark, Spark SQL, Plotly, Dash, Flask, Postman, Microsoft Azure, Autosys, Docker

Confidential, Georgia

Data Scientist/Data Analyst

Responsibilities:

  • Implemented Data Exploration to analyze patterns and to select features using Python SciPy.
  • Built Factor Analysis and Cluster Analysis models using Python SciPy to classify customers into different target groups.
  • Designed an A/B experiment for testing the business performance of the new recommendation system.
  • Supported MapReduce programs running on the cluster.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Participated in data acquisition with the Data Engineering team to extract historical and real-time data using Hadoop MapReduce and HDFS.
  • Integrated QlikView with Hive for quick visualizations.
  • Communicated and presented default-customer profiles, analytical results, and strategic implications to senior management for strategic decision making, with reports built using Python and Tableau.
  • Developed scripts in Python to automate the customer query addressing system, decreasing customer query resolution time by 45%.
  • Collaborated with other functional teams across the Risk and Non-Risk groups to apply standard methodologies and ensure a positive customer experience throughout the customer journey.
  • Performed data enrichment jobs to handle missing values, normalize data, and select features (a minimal sketch follows this list).
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
  • Developed Hive queries for analysis, and exported the result set from Hive to MySQL using Sqoop after processing the data.
  • Created HBase tables to store data in various formats coming from different portfolios.
  • Worked on improving the performance of existing Pig and Hive queries.
  • Created reports and dashboards using D3.js and Tableau 9.x to explain and communicate data insights, significant features, model scores, and the performance of the new recommendation system to both technical and business teams.
  • Fully automated the reporting process using shell scripting, macros & pivot tables.
  • Data Profiling using Confidential Data Studio and PostgreSQL.
  • Utilize SQL, Excel and several Marketing/Web Analytics tools (Google Analytics, Bing Ads, AdWords, AdSense, Criteo, Smartly, Survey Monkey, and Mailchimp) in order to complete business & marketing analysis and assessment.
  • Used Git 2.x for version control with Data Engineer team and Data Scientists colleagues.
  • Used Agile methodology and the SCRUM process for project development.
  • Held knowledge transfer (KT) sessions with the client to understand their various data management systems and the underlying data.
  • Experience in building models with deep learning frameworks like TensorFlow, PyTorch, and Keras.
  • Created metadata and a data dictionary for future data use and data refreshes for the same client.
  • Structuring the Data Marts to store and organize the customer's data.
  • Ran SQL scripts and created indexes and stored procedures for data analysis.
  • Applied data lineage methodology for data mapping and maintaining data quality.
  • Prepared scripts in Python and shell for automating administration tasks.
  • Maintained PL/SQL objects such as packages, triggers, and procedures.
  • Mapped the flow of trade cycle data from source to target and documented the same.
  • Performed QA on the data extracted, transformed, and exported to Excel.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Extracted data from HDFS and prepared the data for exploratory analysis using data munging.
  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
  • Completed a highly immersive data science program involving data manipulation & visualization, web scraping, machine learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Worked with different data formats such as JSON and XML and performed machine learning algorithms in Python.
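
For illustration, a minimal sketch of the data-enrichment step described above (missing-value imputation, normalization, and feature selection) using scikit-learn; the file name, target column, and feature count are hypothetical placeholders:

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("customers.csv")        # hypothetical input file
    X = df.drop(columns=["default_flag"])    # hypothetical numeric features
    y = df["default_flag"]                   # hypothetical target column

    X_imputed = SimpleImputer(strategy="median").fit_transform(X)  # missing values
    X_scaled = StandardScaler().fit_transform(X_imputed)           # normalization

    # Keep the 10 features most associated with the target
    selector = SelectKBest(score_func=f_classif, k=10).fit(X_scaled, y)
    print(list(X.columns[selector.get_support()]))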

Environment: ER Studio 9.7, MDM, Git 2.x, Unix, Python (SciPy, NumPy, pandas, StatsModels, Plotly), MySQL, Excel, Google Cloud Platform, Tableau 9.x, D3.js, SVM, Random Forests, Naïve Bayes Classifier, A/B experiments, Agile/SCRUM, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, OLAP, HDFS, ODS, NLTK, JSON, XML, MapReduce.

Confidential - Princeton, NJ

Data Scientist

Responsibilities:

  • Applied Lean Six Sigma process improvement in the plant, developed capacity calculation systems using the purchase order tracking system, and improved inbound efficiency by 23.56%.
  • Worked with machine learning algorithms such as regressions (linear, logistic, etc.), SVMs, and decision trees to classify groups and analyze the most significant variables, such as FTEs, purchase order waiting times, and available capacities, and applied process improvement techniques.
  • Calculated a process cycle efficiency of 33.2% and identified value-added and non-value-added activities.
  • Utilized SAS to develop Pareto charts identifying the highest-impact categories in each module and the workforce distribution, and created various data visualization charts.
  • Performed univariate, bivariate, and multivariate analysis of approx. 4,890 tuples using bar charts, box plots, and histograms.
  • Participated in feature engineering, including feature creation, feature scaling, and One-Hot encoding with scikit-learn (a minimal sketch follows this list).
  • Converted raw data into processed data by merging datasets and identifying outliers, errors, trends, missing values, and distributions in the data.
  • Generated a detailed report after validating the graphs using R and adjusting the variables to fit the model.
  • Wrote SQL scripts to load the relevant data into QlikView applications.
  • Worked on Clustering and factor analysis for classification of data using machine learning algorithms.
  • Developed descriptive and inferential statistics for logistics optimization, average hours per job, and value-throughput data at a 95% confidence interval.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Used packages like dplyr, tidyr, and ggplot2 in RStudio for data visualization, and generated scatter plots and high-low graphs to identify relationships between different variables.
  • Worked on Business forecasting, segmentation analysis and Data mining and prepared management reports defining the problem; documenting the analysis and recommending courses of action to determine the best outcomes.
  • Worked with various statistical models, such as DOE, hypothesis testing, survey testing, and queuing theory.
  • Experience with risk analysis, root cause analysis, cluster analysis, correlation and optimization, and the K-means algorithm for clustering data into groups.
  • Coordinated with data scientists and senior technical staff to identify client needs and document assumptions.
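
For illustration, a minimal sketch of the feature-engineering step described above (feature scaling for numeric columns, One-Hot encoding for categorical columns) with scikit-learn; the column names and values are hypothetical placeholders:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.DataFrame({
        "waiting_time": [4.0, 7.5, 3.2],   # days until a purchase order is fulfilled
        "fte": [12, 8, 15],                # full-time equivalents on the module
        "module": ["assembly", "packing", "assembly"],
    })

    preprocess = ColumnTransformer([
        ("scale", StandardScaler(), ["waiting_time", "fte"]),   # feature scaling
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["module"]),
    ])

    X = preprocess.fit_transform(df)
    print(X)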

Environment: SQL Server 2012, Jupyter, R 3.1.2, Python, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, Microsoft office, SQL Server Management Studio, Business Intelligence Development Studio, MS Access.

Confidential, WI

Data Scientist/Data Analyst

Responsibilities:

  • Aggregated all available information about each customer. The data obtained for predicting churn is classified into the following categories:
  • Demographic data, such as age, gender, education, marital status, employment status, income, home ownership status, and retirement plan.
  • Policy-related data, such as insurance lines, number of policies in the household, household tenure, premium, disposable income, and insured cars.
  • Claims, such as claim settlement duration, number of claims that are filed and denied.
  • Complaints, such as number of open and closed complaints.
  • Survey sentiment data. Sentiment scores from past surveys are captured in the latest and average note attitude score fields. The note attitude score is derived from negative customer feedback only: if the score is zero, the customer is more satisfied, and as the number increases, the satisfaction level decreases.
  • Responsible for building data analysis infrastructure to collect, analyze, and visualize data.
  • Experienced Data Analyst with solid understanding of Data Mapping, Data warehousing (OLTP, OLAP), Data Mining, Data Governance and Data management services with Quality Assurance.
  • Data elements validation using exploratory data analysis (univariate, bivariate, multivariate analysis).
  • Treated missing values, capped outliers, and handled anomalies using statistical methods.
  • Performed variable selection using R-squared and VIF values.
  • Deployed machine learning (Logistic Regression and PCA) to predict customer churn (a minimal sketch follows below).
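
For illustration, a minimal sketch of the churn model described above (PCA followed by Logistic Regression in a scikit-learn pipeline); the data is a synthetic placeholder for the demographic, policy, claims, complaints, and sentiment inputs listed above:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for the aggregated customer table
    X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    churn_model = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=10)),        # compress correlated inputs
        ("logit", LogisticRegression(max_iter=1000)),
    ])
    churn_model.fit(X_train, y_train)
    print(f"Hold-out accuracy: {churn_model.score(X_test, y_test):.3f}")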

Environment: R 3.3.0, Python 3.0, SQL Server, MS Excel, MS PowerPoint.

Confidential

Data Analyst

Responsibilities:

  • Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Applied clustering algorithms, i.e. hierarchical and K-means, with the help of scikit-learn and SciPy (a minimal sketch follows this list).
  • Developed visualizations and dashboards using ggplot, Tableau.
  • Worked on the development of a data warehouse, a data lake, and ETL systems using relational and non-relational tools such as SQL and NoSQL.
  • Built and analyzed datasets using R, SAS, Matlab and Python (in decreasing order of usage).
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Data Manipulation and Aggregation from different source using Nexus, Toad, Business Objects, Powerbase and Smart View.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • As an architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
  • Programmed a utility in Python that used multiple packages (scipy, numpy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for handling various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Performed data analysis in Teradata and on Google Cloud Platform (GCP) with BigQuery.
  • Supported the testing team with system testing, integration testing, and UAT.
  • Involved in the preparation & design of technical documents such as the Bus Matrix document, the PPDM model, and the LDM & PDM.
  • Understood the client's business problems and analyzed the data using appropriate statistical models to generate insights.
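
For illustration, a minimal sketch of the clustering described above: K-means via scikit-learn and hierarchical (agglomerative) clustering via SciPy, run on synthetic placeholder data:

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic stand-in for a real feature matrix
    X, _ = make_blobs(n_samples=300, centers=4, random_state=1)

    # K-means clustering
    kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

    # Hierarchical clustering: build the merge tree, then cut it into 4 clusters
    Z = linkage(X, method="ward")
    hier_labels = fcluster(Z, t=4, criterion="maxclust")   # labels start at 1

    print(np.bincount(kmeans_labels), np.bincount(hier_labels - 1))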

Environment: R, Erwin, Tableau, MDM, QlikView, MLlib, PL/SQL, Teradata, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.

Confidential

Python Developer

Responsibilities:

  • Developed, tested, deployed, and maintained the application.
  • Worked with the UI team to maintain and monitor the application using Java and Python.
  • Developed web-based applications using Python, Django, PHP, Flask, and webapp2.
  • Developed Angular controllers, services, filters, and HTML templates using Angular directives.
  • Rewrote an existing Java application as a Python module to deliver a certain format of data.
  • Wrote Python scripts to parse CSV files and load the data into a database (a minimal sketch follows this list).
  • Generated property list for every application dynamically using Python.
  • Developed an automated testing framework for command-line based tests on Linux using object-oriented Perl, and for Selenium-based tests using Python.
  • Developed test cases using EasyMock and used Postman to test REST calls.
  • Worked in an agile environment and developed a CI/CD (Continuous Integration and Continuous Deployment) pipeline through Jira, GitHub, TeamCity, PyPI, and Docker Hub. Supported Amazon AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
  • Expert in writing Python modules to extract/load asset data from the MySQL source database.
  • Wrote and executed various MySQL database queries from Python using the MySQL Connector and MySQL database packages.
  • Maintained and added software using Python/Django, HTML, CSS, Sass, JavaScript, SQL, and PostgreSQL.
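
For illustration, a minimal sketch of the CSV-to-MySQL loading described above, using the standard csv module and MySQL Connector/Python; the file name, table, columns, and credentials are hypothetical placeholders:

    import csv

    import mysql.connector

    # Hypothetical connection details
    conn = mysql.connector.connect(
        host="localhost", user="app", password="secret", database="assets"
    )
    cursor = conn.cursor()

    with open("assets.csv", newline="") as f:
        reader = csv.DictReader(f)               # assumes a header row
        for row in reader:
            cursor.execute(
                "INSERT INTO assets (name, owner, cost) VALUES (%s, %s, %s)",
                (row["name"], row["owner"], row["cost"]),
            )

    conn.commit()
    cursor.close()
    conn.close()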

Environment: Python, Django, Angular.js, XML, CSS, HTML, DHTML, JavaScript, SQL, PostgreSQL, Jira, REST API, MongoDB.
