
Data Scientist Resume


Greensboro, NC

SUMMARY

  • A highly motivated, adaptable, and skilled graduate seeking a Data Scientist position using Python and MySQL that would draw on skills and experience gained in academic and work settings. Excellent communication and interpersonal skills; able to work with a team at any level.
  • Highly efficient Data Scientist/Data Engineer with over 5 years of experience in data analysis, statistical analysis, machine learning, and data mining with large sets of structured and unstructured data in the banking, travel services, and manufacturing industries, with strong functional knowledge of business processes and the latest market trends.
  • Expertise in the complete software development life cycle, including analysis, design, development, testing, and implementation, in the Hadoop ecosystem, the Documentum 6.5 SP2 suite of products, and Java technologies.
  • Extensive working experience with Python, including scikit-learn, pandas, and NumPy.
  • Extensive experience in Hive, Sqoop, Flume, Hue, and Oozie.
  • Integration Architect & Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL and Cloud technologies.
  • Proficient in predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, the normal distribution, and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Trees, Random Forests, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks (see the classification sketch after this list).
  • Proficient in SAS/BASE, SAS EG, SAS/SQL, SAS MACRO, and SAS/ACCESS.
  • Experience in end-to-end implementation of a data warehouse project based on SAS EG.
  • Experience in extracting data from databases such as DB2 and Oracle, and from SME-IM, MAD, M240, and UNIX servers using SAS.
  • Experienced in the full software life cycle under SDLC, Agile, and Scrum methodologies.
  • Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
  • Experience in designing stunning visualizations using Tableau and publishing and presenting dashboards and storylines on web and desktop platforms.
  • Hands-on experience in implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Proficient in statistical modeling and machine learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
  • Experienced in Big Data with Hadoop 2, HDFS, MapReduce, and Spark.
  • Experienced in Spark 2.1, Spark SQL and PySpark.
  • Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
  • Experience working with data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
  • Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
  • Experience and technical proficiency in designing and data modeling online applications; solution lead for architecting data warehouse/business intelligence applications.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data loads; experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, pivot tables, and OLAP reporting.
  • Good experience in NLP with Apache Hadoop and Python.
  • Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
  • Excellent work ethic; self-motivated, quick learner, and team oriented. Continually provided value-added services to clients through thoughtful experience and excellent communication skills.
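
The modeling bullets above name several supervised classifiers; purely as illustration, here is a minimal scikit-learn sketch of that kind of workflow. The CSV path, column names, and hyperparameters are hypothetical assumptions, not details from any engagement.

    # Minimal sketch of a supervised classification workflow (scikit-learn).
    # File path, column names, and hyperparameters are illustrative only.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customers.csv")        # hypothetical input file
    X = df.drop(columns=["churned"])         # hypothetical feature columns
    y = df["churned"]                        # hypothetical binary target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Compare two of the model families listed above.
    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200, random_state=42)):
        model.fit(X_train, y_train)
        print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))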

TECHNICAL SKILLS

Databases: Oracle, MySQL, MS SQL Server, Sybase, PostgreSQL, MongoDB, NoSQL.

Big Data Frameworks: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, AWS (Amazon EC2, S3, and Redshift), Spark, Storm, Impala, Talend 6, DMX-h.

Programming Languages: Python, SQL, T-SQL, MATLAB, C, C++, HTML, PL/SQL, XML, DHTML, HTTP, Java, Hadoop.

Database Design Tools and Data Modeling: Physical & logical data modeling, dimension tables, Kimball methodology.

IDE: Eclipse, IntelliJ, NetBeans, IBM Rational Application Developer (RAD)

Web Servers: JBoss, WebLogic, WebSphere, Tomcat, Jetty, Apache

Reporting Tools: Shiny, Power BI, Tableau, Jasper Reports, BIRT, Crystal Reports.

Statistical Software: SPSS, R, SAS

Tools and Utilities: Crystal Reports, Power Pivot, DTS, SQL Server Enterprise Manager, SQL Server Profiler, Microsoft Management Console, Visual SourceSafe 6.0, Excel Power Pivot, SQL Server, ProClarity, Microsoft Office 2007/10/13, Visual Studio v14, Excel Data Explorer, Tableau 8/10, Import & Export Wizard, .NET.

Technologies/Tools: Azure Machine Learning, SPSS, Rattle, Caffe, TensorFlow, Theano, Torch, Keras, NumPy.

Operating Systems: UNIX and Linux, Microsoft Windows 8/7/XP.

PROFESSIONAL EXPERIENCE

Confidential, Greensboro, NC

Data Scientist

Responsibilities:

  • Worked as a Data Modeler/Analyst to generate data models using Erwin and developed relational database systems.
  • Used R, Python, MATLAB and Spark to develop a variety of models and algorithms for analytic purposes.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs (see the PySpark sketch after this list).
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL; designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Built data warehouse and data migration applications, with extensive hands-on experience using ETL tools such as Talend and Informatica.
  • Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
  • Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Git, SQL, Unix commands, Python programming, NoSQL, MongoDB, and Hadoop.
  • Extensively worked with the data modeling tool Erwin Data Modeler to design data models.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
  • Built analytical data pipelines to port data into and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
  • Performed K-means clustering, multivariate analysis, and Support Vector Machines in Python and R.
  • Professional Tableau user (Desktop, Online, and Server); experience with Keras and TensorFlow.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R; loaded and stored data via Pig scripts and R for MapReduce operations; created various types of data visualizations using R and Tableau.
  • Worked on machine learning over large data sets using Spark and MapReduce.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Designed tables and implemented naming conventions for logical and physical data models in Erwin 7.0.
  • Designed logical and physical data models for multiple OLTP and Analytic applications.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers.
  • Created S3 buckets and managed roles and policies for them; utilized S3 buckets and Glacier for file storage and backup on the AWS cloud; used DynamoDB to store data for metrics and backend reports (see the S3 sketch after this list).
  • Worked with Elastic Beanstalk for quick deployment of services such as EC2 instances, load balancers, and databases on RDS in the AWS cloud environment.
  • Used Java code to connect to AWS S3 buckets via the AWS SDK to access media files related to the application.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data; created various types of data visualizations using Python and Tableau.
  • Used Amazon Simple Workflow Service (SWF) for data migration between data centers, automating the process and tracking every step, with logs maintained in an S3 bucket.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Programmed a utility in Python that used multiple packages (NumPy, SciPy, pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, Naive Bayes, KNN.
  • Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
  • Used Teradata utilities such as FastExport and MultiLoad (MLOAD) to handle various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Created SSIS packages using Pivot Transformations, Execute SQL Tasks, Data Flow Tasks, etc. to import data into the data warehouse.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
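
As background for the Spark bullets above, a minimal PySpark sketch of the DataFrame/Spark SQL pattern they describe. The HDFS path and column names are hypothetical.

    # Minimal PySpark sketch: load data, register a view, query with Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("example").getOrCreate()

    df = spark.read.json("hdfs:///data/events.json")   # hypothetical path
    df.createOrReplaceTempView("events")               # hypothetical columns below

    # Aggregate with Spark SQL rather than hand-written MapReduce.
    daily = spark.sql("""
        SELECT event_date, COUNT(*) AS n_events
        FROM events
        GROUP BY event_date
        ORDER BY event_date
    """)
    daily.show()

    # Equivalent DataFrame API form.
    df.groupBy("event_date").count().show()
    spark.stop()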
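
Likewise, a hedged boto3 sketch of the S3 usage described above; the bucket, region, and file names are made up.

    # Minimal boto3 sketch: create a bucket, upload a file, list its contents.
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")

    s3.create_bucket(Bucket="example-metrics-bucket")      # hypothetical bucket
    s3.upload_file("report.csv", "example-metrics-bucket",
                   "backups/report.csv")                   # hypothetical file/key

    resp = s3.list_objects_v2(Bucket="example-metrics-bucket")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])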

Environment: R, Python, SQL, Git, HDFS, Pig, Hive, Oracle, DB2, Tableau, Unix commands, NoSQL, MongoDB, SSIS, SSRS, SSAS, AWS (S3, EC2, RDS, SWF, DynamoDB, Glacier), Erwin, OBIEE.

Confidential, Kansas City, Missouri

Data Scientist

Responsibilities:

  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods, including classification, regression, and dimensionality reduction; used the engine to increase user lifetime by 45% and triple user conversations for target categories.
  • Performed data profiling to learn about user behavior with various features such as traffic pattern, location, date, and time.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, regression models, neural networks, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources; used the K-means clustering technique to identify outliers and classify unlabeled data.
  • Determined customer satisfaction and helped enhance the customer experience using NLP.
  • Performed data visualization with Tableau and D3.js and generated dashboards to present the findings.
  • Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection (see the sketch after this list).
  • Prototyped and experimented with ML/DL algorithms and integrated them into the production system for different business needs.
  • Researched existing client processes and guided the team in aligning the systems with HIPAA rules and regulations for all EDI transaction sets.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Ensured that the model had a low false positive rate.
  • Addressed overfitting by implementing regularization methods such as L2 and L1.
  • Used Principal Component Analysis in feature engineering to analyze high dimensional data.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM models to classify whether a package would be delivered on time on the new route.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Models, SVM, Random Forest, Boosting, and Neural Networks.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data; created various types of data visualizations using Python and Tableau.
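
A short scikit-learn sketch of the evaluation approach above: cross-validated ROC AUC for L1- and L2-regularized logistic regression. The data is synthetic and the penalty strengths are illustrative, not values from the project.

    # Sketch: compare L1 and L2 regularization via cross-validated ROC AUC.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=2000, n_features=30,
                               n_informative=10, random_state=0)

    for penalty, solver in (("l1", "liblinear"), ("l2", "lbfgs")):
        clf = LogisticRegression(penalty=penalty, solver=solver, C=0.5,
                                 max_iter=1000)
        scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
        print(penalty, round(scores.mean(), 3))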

Environment: R/Python, CDH5, HDFS, Hadoop, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Confidential - Scottsdale, AZ

Data Scientist

Responsibilities:

  • Worked independently and collaboratively throughout the complete analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analyses and solutions, and documentation of results.
  • Performed statistical analysis to determine peak and off-peak time periods for rate-making purposes.
  • Conducted analysis of customer data for the purposes of designing rates.
  • Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, regression models, clustering, and SVM, to identify volume using the scikit-learn package.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Hands-on experience in implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
  • Performed K-means clustering, Regression, and Decision Trees in R (see the K-means sketch after this list).
  • Partnered with technical and non-technical resources across the business to leverage their support and integrate our efforts.
  • Worked on text analytics and Naive Bayes, creating word clouds and retrieving data from social networking platforms.
  • Proactively analyzed data to uncover insights that increase business value and impact.
  • Prepared data visualization reports for management using R.
  • Held a point of view on the strengths and limitations of statistical models and analyses in various business contexts, and evaluated and effectively communicated the uncertainty in the results.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, regression models, and SVM.
  • Approached analysis in multiple ways to evaluate approaches and compare results.
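
For reference, a minimal K-means sketch. The clustering on this engagement was done in R; this is shown in Python with scikit-learn for consistency with the other sketches, on synthetic data with an illustrative choice of k.

    # Sketch of K-means clustering on synthetic data (scikit-learn).
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    X, _ = make_blobs(n_samples=500, centers=4, random_state=1)
    X_scaled = StandardScaler().fit_transform(X)   # scale before clustering

    km = KMeans(n_clusters=4, n_init=10, random_state=1)
    labels = km.fit_predict(X_scaled)
    print("inertia:", round(km.inertia_, 2))
    print("cluster sizes:", [int((labels == c).sum()) for c in range(4)])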

Environment: Python, R, SQL, and SQL scripts; regression analysis, Decision Trees, Naïve Bayes, SVM, K-Means Clustering, and KNN.

Confidential

Data Modeler

Responsibilities:

  • Developed Internet traffic scoring platform for ad networks, advertisers and publishers (rule engine, site scoring, keyword scoring, lift measurement, linkage analysis).
  • Responsible for defining the key identifiers for each mapping/interface.
  • Clients include eBay, Click Forensics, Cars.com, Turn.com, Microsoft, and Looksmart.
  • Designed the architecture for one of the first analytics 3.0 online platforms: all-purpose scoring with on-demand, SaaS, and API services; currently under implementation.
  • Used web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords (see the sketch after this list).
  • Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests.
  • Reverse-engineered keyword pricing algorithms in the context of pay-per-click arbitrage.
  • Implemented a metadata repository; maintained data quality, data cleanup procedures, transformations, data standards, a data governance program, scripts, stored procedures, and triggers; and executed test plans.
  • Performed data quality work in Talend Open Studio.
  • Automated bidding for advertiser campaigns based on either keyword or category (run-of-site) bidding.
  • Created multimillion-keyword bid lists using extensive web crawling, and identified metrics to measure the quality of each list (yield or coverage, volume, and average keyword financial value).
  • Maintained the enterprise metadata library with any changes or updates.
  • Documented data quality and traceability for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports.
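
As a rough illustration of the keyword-scoring idea above, a hypothetical TF-IDF sketch in Python; the page texts are placeholders, and the production platform's crawling and scoring logic was proprietary.

    # Sketch: score candidate keywords across crawled pages with TF-IDF.
    from sklearn.feature_extraction.text import TfidfVectorizer

    pages = [                                   # placeholder crawled text
        "cheap car insurance quotes online",
        "compare auto insurance rates and quotes",
        "used cars for sale near you",
    ]

    vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    tfidf = vec.fit_transform(pages)

    # Average TF-IDF weight per term across pages as a crude keyword score.
    scores = tfidf.mean(axis=0).A1
    ranked = sorted(zip(vec.get_feature_names_out(), scores),
                    key=lambda t: -t[1])
    print(ranked[:5])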

Environment: Erwin r7.0, SQL Server 2000/2005, Windows XP/NT/2000, Oracle 8i/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica.
