
Data Scientist Resume

Parsippany, NJ

SUMMARY

  • 6+ years of IT industry experience encompassing Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Skilled in performing Data Parsing, Data Manipulation, and Data Preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
  • Experience in using various packages in Python and R such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, and Rpy2.
  • Extensive experience in Text Analytics, developing various Statistical, Machine Learning, and Data Mining solutions to business problems and generating data visualizations using R, Python, and Tableau.
  • Experienced in advanced statistical analysis and predictive modeling in structured and unstructured data environments.
  • Extensive experience in Data Visualization, including producing tables, graphs, and listings using various procedures and tools such as Tableau.
  • Hands-on experience with Data Science libraries in Python such as pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Orange, Rpy2, LibSVM, neurolab, and NLTK.
  • Good understanding of Artificial Neural Networks and Deep Learning models using the Theano and TensorFlow packages in Python.
  • Experienced in Machine Learning Classification Algorithms like Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree & Random Forest classification.
  • Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, R Markdown, ElemStatLearn, caTools, etc.
  • Worked in large-scale database environments like Hadoop and MapReduce, with an understanding of the working mechanism of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
  • Strong expertise in ETL, Data warehousing, Operational Data Store (ODS), Data Marts, OLAP and OLTP technologies.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries and proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.
  • Analytical, performance-focused, and detail-oriented professional, offering in-depth knowledge of data analysis and statistics; utilized complex SQL queries for data manipulation.
  • Equipped with experience in utilizing statistical techniques including Correlation, Hypothesis Testing, and Inferential Statistics, as well as data mining and modeling techniques using Linear and Logistic Regression, Decision Trees, and K-Means Clustering.
  • Expertise in building Supervised and Unsupervised Machine Learning experiments using Microsoft Azure utilizing multiple algorithms to perform detailed predictive analytics and building Web Services models for all types of data: continuous, nominal, and ordinal.
  • Expertise in using Linear & Logistic Regression and Classification Modeling, Decision-trees, Principal Component Analysis (PCA), Cluster and Segmentation analyses, and have authored and co-authored several scholarly articles applying these techniques.
  • Mitigated risk factors through careful analysis of financial and statistical data. Transformed and processed raw data for further analysis, visualization, and modeling.
  • Proficient in researching current processes and emerging technologies that require analytic models, data inputs and outputs, analytic metrics, and user interface needs.

TECHNICAL SKILLS

Languages: R/RStudio (caret, RWeka, ggplot2), SAS, SAS Enterprise Guide, Python, Perl, MATLAB.

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Software/Libraries: Keras, Caffe, TensorFlow, OpenCV, Scikit-learn, Pandas, NumPy, Microsoft Visual Studio, Microsoft Office.

Development Tools: Microsoft SQL Server Management Studio, IntelliJ, Eclipse, NetBeans.

Machine Learning Algorithms: Neural Networks, Decision Trees, Support Vector Machines, Random Forest, Convolutional Neural Networks, Logistic Regression, PCA, K-Means, KNN.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Database Design Tools and Data Modeling: MS Visio, Erwin 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, Kimball & Inmon Methodologies

PROFESSIONAL EXPERIENCE

Confidential, Parsippany, NJ

Data Scientist

Responsibilities:

  • Confidential designs, manufactures and markets innovative, high-quality, high-performance motorized products for recreation and utility use to the international market through global distribution channels.
  • Responsible for modeling complex business problems, discovering business insights and identifying opportunities through the use of statistical, algorithmic, data mining, and visualization techniques.
  • Applied advanced analytics skills, with proficiency at integrating and preparing large, varied datasets, architecting specialized database and computing environments, and communicating results. Used R, Python, MATLAB, and Spark to develop a variety of models and algorithms for analytic purposes.
  • Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Pair RDDs.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL. Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake Schemas.
  • Worked on Data Warehouse and Data Migration applications with extensive hands-on experience using ETL tools such as Talend and Informatica.
  • Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Git, SQL, Unix commands, Python programming, NoSQL, MongoDB, and Hadoop.
  • Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2 based cloud-hosted solution for the client.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, loaded and stored data with Pig scripts and R for MapReduce operations, and created various types of data visualizations using R and Tableau.
  • Worked on machine learning over large-scale data using Spark and MapReduce.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Collaborated with unit managers, end users, development staff, and other stakeholders to integrate data mining results with existing systems.
  • Designed tables and implemented the naming conventions for Logical and Physical Data Models in Erwin 7.0.
  • Provided expertise and recommendations for physical database design, architecture, testing, performance tuning, and implementation.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Collaborated on the data mapping document from source to target and the data quality assessments for the source data.
  • Created S3 buckets and managed roles and policies for S3 buckets. Utilized S3 buckets and Glacier for file storage and backup on the AWS cloud. Used DynamoDB to store data for metrics and backend reports.
  • Created Data Quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python (see the sketch after this list).
  • Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Used Teradata utilities such as FastExport and MLOAD to handle various data migration/ETL tasks from OLTP source systems to OLAP target systems.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
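
A minimal sketch of the kind of Python-based missing-value imputation mentioned above, assuming a pandas DataFrame loaded from a CSV extract; the file name and column handling are hypothetical placeholders, not the actual project code:

    import pandas as pd
    from sklearn.impute import SimpleImputer

    # Load the raw extract (file name is a hypothetical placeholder)
    df = pd.read_csv("raw_extract.csv")

    # Numeric columns: fill missing values with the column median
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])

    # Categorical columns: fill missing values with the most frequent level
    cat_cols = df.select_dtypes(include="object").columns
    df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])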

Environment: Python, SQL, GIT, HDFS, Pig, Hive, Oracle, DB2, Tableau, Unix Commands, NoSQL, MongoDB, SSIS, SSRS, SSAS, AWS (S3, EC2, RDS, SWF, DynamoDB, Glacier), Erwin, OBIEE.

Confidential, Santa Clara, CA

Data Scientist

Responsibilities:

  • Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Revitalized the use, and promoted the importance, of data-derived insights to support strategic planning and tactical decisions. Created and socialized the use of key metrics/KPIs.
  • Developed strategies to access, integrate, and analyze data from disparate sources/platforms for transactions involving credit card, debit card, check, cash, and other payment methods.
  • Directed team of developers to integrate data from three platforms, formerly three different companies, into a single database as a preliminary step toward developing a dedicated EDW.
  • Identified customers and modeled potential savings (~$500K in two months) associated with a recommended switch of select clients to a different processor prior to the fiscal year's peak processing month.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Delivered machine learning projects based on Python, SQL, Spark, and advanced SAS programming; performed exploratory data analysis, data visualization, and feature selection.
  • Applied machine learning algorithms including random forest, boosted trees, SVM, SGD, neural networks, and deep learning using CNTK and TensorFlow.
  • Extensively used open-source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning models.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon (see the sketch after this list).
  • Developed Spark/Scala, R, and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Tracked operations using sensors until certain criteria were met, using Airflow.
  • Responsible for different Data mapping activities from Source systems to Teradata using utilities like TPump, FEXP, BTEQ, MLOAD, and FLOAD.
  • Developed PL/SQL procedures and functions to automate billing operations, customer barring, and number generation.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python for a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Analyzed large datasets, applied machine learning techniques, developed predictive and statistical models, and enhanced existing statistical models by leveraging best-in-class modeling techniques.
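
A minimal sketch of the PySpark feature preparation and random-forest modeling referenced above, using the DataFrame-based pyspark.ml API that ships with MLlib; the table name, feature columns, and label column are hypothetical placeholders:

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = SparkSession.builder.appName("rf-sketch").getOrCreate()

    # Hypothetical table of engineered features with a binary label column
    data = spark.table("payments_features")

    # Assemble the individual feature columns into a single vector column
    assembler = VectorAssembler(
        inputCols=["amount", "merchant_risk", "days_since_last_txn"],
        outputCol="features")

    # Distributed random forest trained on the assembled feature vectors
    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)

    model = Pipeline(stages=[assembler, rf]).fit(data)
    predictions = model.transform(data)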

Environment: Python, MDM, MLlib, PL/SQL, Tableau, SQL Server, Scala, NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, SSRS, Informatica, Spark, Azure, RStudio, MongoDB, Java, Hive.

Confidential - Chicago, IL

Data Modeler/Data Analyst

Responsibilities:

  • Involved in requirement gathering and data analysis; interacted with business users to understand the reporting requirements and analyze BI needs for the user community.
  • Created Entity/Relationship Diagrams, grouped and created the tables, validated the data, identified PKs for lookup tables.
  • Involved in dimensional modeling (Star Schema methodology), building and designing the logical data model into dimensional models.
  • Developed ETL routines using SSIS packages, to plan an effective package development process and design the control flow within the packages.
  • Worked with Big Data Architects to set up the Big Data Platform in the organization and worked on the Hive platform to create Hive data models.
  • Developed customized training documentation based on each client's technical needs and built a curriculum to help each client learn both basic and advanced techniques for using PostgreSQL.
  • Took an active role in the design, architecture, and development of user interface objects in QlikView applications. Connected to various data sources like SQL Server, Oracle, and flat files.
  • Presented the dashboards to business users and cross-functional teams, defined KPIs (Key Performance Indicators), and identified data sources.
  • Delivered end-to-end mapping from source (Guidewire application) to target (CDW) and mapped legacy system coverages to the Landing Zone and the Guidewire Reporting Pack.
  • Performed the Data Accuracy, Data Analysis, Data Quality checks before and after loading the data.
  • Resolved the data type inconsistencies between the source systems and the target system using the Mapping Documents.
  • Generated Tableau dashboards for Claims with forecast and reference lines.
  • Designed, developed, implemented, and maintained Informatica PowerCenter and Informatica Data Quality (IDQ) applications for the matching and merging process.

Environment: Erwin 8.2, Oracle 11g, OBIEE, Crystal Reports, Toad, Sybase PowerDesigner, Datahub, MS Visio, DB2, QlikView 11.6, Informatica.
