
Data Analyst/Machine Learning Engineer Resume


Round Lake, IL

SUMMARY

  • 6 years of experience as a Data Analyst and Data Scientist in e-commerce, healthcare, and IT organizations.
  • Experience in business intelligence and data analysis, reporting, data preparation, data warehousing, visualizations, and software and predictive model design and development using Tableau, RDBMS tools, IBM Cognos, Power BI, Excel, and Python.
  • Solid knowledge of SDLC, Waterfall, Spiral, and Agile Scrum methodologies.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards on web and desktop platforms.
  • Hands on experience in implementing LDA, Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, Principal Component Analysis and good knowledge on Recommender Systems.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian) in Forecasting/Predictive Analytics and Segmentation.
  • Worked and extracted data from various database sources like Oracle, SQL Server, DB2, and Teradata.
  • Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Expertise in all aspects of the Software Development Lifecycle (SDLC), from requirement analysis and design through development, coding, testing, implementation, and maintenance.
  • Hands-on working experience in machine learning and statistics to draw meaningful insights from data.
  • Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Experience in Documentation tools like MS Visio, MS Project, and MS Office to create reports required for the projects and client submissions.
  • Extensive knowledge in Machine learning, Data mining, SQL Queries and Databases.
  • Proficient in writing queries and sub queries to retrieve the data using SQL from various servers including Microsoft SQL, Oracle and MySQL.
  • Well versed with Data Migration, Data Profiling, Data Extraction/ Transformation/ Loading.
  • Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
  • Experience in designing star schema, Snowflake schema for Data Warehouse, ODS architecture by using tools like Erwin Data Modeler, Power Designer, Embarcadero E-R Studio and Microsoft Visio.
  • Experienced in Data loading using PL/SQL Scripts and SQL Server Integration Services packages (SSIS).
  • Skillful in Data Analysis using SQL on Oracle, MS SQL Server, DB2 & Teradata.
  • Designed data marts using Ralph Kimball and Bill Inmon dimensional data modeling techniques.
  • Experience in Oracle, SQL and PL/SQL including all database objects: Stored procedures, stored functions.
  • Responsible for detailed architectural design, data wrangling, and data profiling to ensure data quality of vendor data and source-to-target mapping.
  • Experience in Big Data Hadoop Ecosystem in ingestion, storage, querying, processing and analysis of big data.
  • Good understanding and hands on experience in setting up and maintaining NoSQL Databases like Cassandra and HBase.
  • Expertise on Relational Data modeling (3NF) and Dimensional data modeling.
  • Extensive experience with ETL & Reporting tools like SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
  • Deep knowledge in Linux environments and hands on experience in Terminal and UNIX commands.
  • Interacted with several SMEs, stakeholders, clients, and vendors from different regions around the world over the years to better understand their business processes; expertise in gathering business requirements and Scrum methodology.
  • Worked on generating ad hoc reports and analyzing data to fix errors; took initiative in supporting business decisions.

TECHNICAL SKILLS

Languages: Python, R, SAS, Java, SQL, PL/SQL, MATLAB

Databases: SQL Server, MS-Access, Oracle 11g/10g/9i and Teradata, Hadoop

Big Data technologies: Hadoop, Hive, MapReduce, Kafka.

DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Data Stage, Visual Studio, Crystal Reports.

Tools and Utilities: SQL Server Management Studio, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer

Machine Learning: Linear Regression, Logistic Regression, Gradient Boosting, Random Forests, Maximum Likelihood Estimation, Clustering, Classification, Association Rules, K-Nearest Neighbors (KNN), K-Means Clustering, Decision Tree (CART & CHAID), Neural Networks, Principal Component Analysis, Weight of Evidence (WOE) and Information Value (IV), Factor Analysis, Sampling Design, Time Series Analysis

PROFESSIONAL EXPERIENCE

Confidential - Round Lake, IL

Data Analyst/Machine Learning Engineer

Responsibilities:

  • Worked for the Global Technical Services team supporting Sales Data & Insights at Confidential, an international healthcare corporation.
  • Generated Monthly Global Metric/ KPI reports for the Global Service Performance to US, CA, Europe and Asian Regions supporting PLM project.
  • Worked in large-scale database environments like Hadoop and MapReduce, with working mechanism of Hadoop clusters, nodes and Hadoop Distributed File System (HDFS).
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Analyzed data and performed data preparation by applying a historical model to the data set in Azure ML.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase. Applied various machine learning algorithms and statistical models (decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, neural networks, deep learning, SVM, clustering) to identify volume, using the scikit-learn package in Python and MATLAB.
  • Implemented Predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
  • Worked on AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
  • Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
  • Extensively used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python for developing various machine learning algorithms.
  • Built multi-layer neural networks to implement deep learning using TensorFlow and Keras.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
  • Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments of customers through a discovery approach.
  • Developed Spark/Scala, Python, and R code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources. Used the K-Means clustering technique to identify outliers and classify unlabeled data.
  • Evaluated models using cross-validation, the log loss function, and ROC curves, and used AUC for feature selection, along with Elastic technologies such as Elasticsearch and Kibana.
  • Worked with the NLTK library for NLP data processing and pattern finding.
  • Categorized comments from different social networking sites into positive and negative clusters using Sentiment Analysis and Text Analytics.
  • Ensured a low false positive rate for text classification and sentiment analysis on unstructured and semi-structured data.
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time on a new route.
  • Implemented different models like Logistic Regression, Random Forest and Gradient-Boost Trees to predict whether a given die will pass or fail the test.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the database, and used ETL for data transformation.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Worked on weekly GSP BI reports and supported ad-hoc report requests from users and engineers using Cognos 11.
  • Generated Product Performance KPI Dashboards using Tableau.
  • Worked on SQL queries to generate reports and create ODBC connections using Oracle GSP.
  • Updated and managed data and queries using PL/SQL and retrieved data related to equipment performance and quality reports.
  • Converted Excel spreadsheets into automated MS Access 2013 Database for KPI tracking tools.
  • Worked in parallel on different regions using Oracle PL/SQL and gained knowledge of various database domains.
  • Generated Monthly Product Performance Reports using GSP BI.
  • Automated tasks for ad-hoc reports, metric reports, and Excel spreadsheets to reduce workload, improve quality, and save time.
  • Held weekly and bi-weekly team meetings.
  • Automated Access with Excel to extract information from various Excel files and imported them into MS Access.
  • Held walkthroughs with QA, BA, SMEs, and stakeholders for a better understanding of data analysis with reports.
  • Created Dashboards using IBM Cognos 11.0 and Tableau.
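The K-Means outlier-identification step mentioned above can be sketched roughly as follows. This is a minimal illustration on synthetic 2-D data; the feature set, cluster count, and 99th-percentile distance threshold are assumptions for demonstration, not the production configuration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic data: two dense clusters plus a couple of far-away points.
normal = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
outliers = np.array([[20.0, 20.0], [-15.0, 12.0]])
X = np.vstack([normal, outliers])

# Fit K-Means, then measure each point's distance to its assigned centroid.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Points far beyond the 99th percentile of centroid distance are flagged.
threshold = np.percentile(dist, 99)
flagged = np.where(dist > threshold)[0]
print(flagged)
```

The same distance-to-centroid idea extends to classifying unlabeled data: new points are simply assigned to the nearest learned centroid via `km.predict`.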

Confidential - Rockville, Maryland

Data Scientist/Data Analyst

Responsibilities:

  • Worked on a federal project for DAIDS regulatory support (a division of NIAID) as a consultant for a health informatics company.
  • Generated weekly, monthly ad-hoc reports and clinical trial reports.
  • Responsible for checking the daily inbound, outbound data activities and update appropriate EMR data in the back end of the production server.
  • Assisted in Anomaly Detection alarm algorithm code for daily inbound/outbound activities using Java Technology.
  • Extensively worked on MS SQL server to retrieve, validate and generate the clinical data using SQL queries.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, &KNN for data analysis.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data lifecycle management in both RDBMS, Big Data environments.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Hands-on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Fulfilled federal compliance rules and consistently handled production support for Java and back-end activities within the internal product delivery systems.
  • Worked with senior programmers and QA personnel in performing unit and user acceptance tests to ensure best performance and compatibility of MS Access database or applications on various platforms as well as provided regular system updates as required.
  • Worked closely with the manager and vendors (sub-contractors) and held project status meetings.
  • Assisted production support for the project and took actions by providing appropriate resolutions.
  • Used SSRS to generate vendor related requirement reports on request.
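The classification-model work referenced above (random forests evaluated with cross-validation) can be sketched minimally with scikit-learn on synthetic data. The real pipelines ran on Hadoop/AWS, which this illustration omits, and the dataset and hyperparameters here are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification problem standing in for the real data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# 5-fold cross-validated ROC AUC, a standard way to compare candidate models.
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Swapping `scoring` for `"neg_log_loss"` gives the log-loss view of the same comparison.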

Confidential - Fort Myers, FL

Data Analyst/Data Scientist

Responsibilities:

  • Developed applications of Machine Learning, Statistical Analysis, and Data Visualizations with challenging data Processing problems in sustainability and biomedical domain.
  • Compiled data from various sources, including public and private databases, to perform complex analysis and data manipulation for actionable results.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Worked on Natural Language Processing with NLTK module of python for application development for automated customer response.
  • Performed Data collection, Data cleaning and Data visualization using RStudio, and feature engineering using python libraries such as pandas and NumPy, performed Deep feature synthesis and extracted key statistical findings to develop business strategies.
  • Constructed a new vocabulary to encode text in a machine-readable format using Bag of Words and TF-IDF.
  • Executed process in parallel using distributed environment of TensorFlow across multiple devices (CPUs & GPUs).
  • Employed NLP to classify text within the dataset; categorization involved labelling natural texts with relevant categories from a predefined set.
  • Trained a gradient-boosted decision tree classifier using Extreme Gradient Boosting to identify whether a cohort member was a promoter or detractor.
  • Monitored, tracked, and classified user discussion about products and services in online forums using NLP text analysis.
  • Performed data collection and data cleaning on a huge dataset with extensive missing data and extreme outliers from Hadoop workbooks, and explored the data to draw relationships and correlations between variables.
  • Applied linear regression in Python and SAS to understand the relationship between different attributes of dataset and causal relationship between them.
  • Performed complex pattern recognition on financial time series data and forecast returns through ARMA and ARIMA models and exponential smoothing for multivariate time series data.
  • Pipelined (ingest/clean/munge/transform) data for feature extraction toward downstream classification.
  • Used Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Built model and algorithm templates in Python for deployment on the entire data set using HDFS and MapReduce.
  • Used a two-sample independent t-test to assess differences in mean purchases across dichotomous variables such as gender and marital status, and one-way ANOVA with Tukey's test to assess differences in mean purchases across polychotomous variables.
  • Used Multiple Linear Regression, Decision Tree Regression, Support Vector Regression, and ensemble learning methods like Bagging, Random Forests, and Gradient Boosting Machines; trained on 70% of the data, optimized the models with Grid Search, and made predictions on the test set using each trained model.
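The train/optimize/predict workflow in the last bullet can be sketched roughly like this; synthetic data and a trimmed two-model, two-value hyperparameter grid stand in for the real project's features and search space:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data in place of the project's real feature matrix.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# 70/30 train/test split, as described in the bullet above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# Each candidate model gets its own (deliberately small) parameter grid.
models = {
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [50, 100]}),
    "gbm": (GradientBoostingRegressor(random_state=0),
            {"learning_rate": [0.05, 0.1]}),
}

scores = {}
for name, (estimator, grid) in models.items():
    # Grid Search with 3-fold CV on the training split only.
    search = GridSearchCV(estimator, grid, cv=3).fit(X_train, y_train)
    # Held-out R^2 from the best found configuration.
    scores[name] = r2_score(y_test, search.predict(X_test))

print(scores)
```

Bagging and SVR candidates would slot into the same `models` dictionary without changing the loop.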

Confidential

Data Analyst

Responsibilities:

  • Worked on resolving tickets as a team member in Retail Business Services, supporting ALM and PLM projects.
  • Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
  • Prepared ETL technical Mapping Documents along with test cases for each Mapping for future developments to maintain SDLC and Migration process. Used Talend for Extraction and Reporting purpose.
  • Actively took part in Data Profiling, Data Cleansing, Data Migration, and Data Mapping, and helped ETL developers compare data with original source documents and validate data accuracy.
  • Worked on Tableau, to create dashboards and visualizations.
  • Analyzed customer data in Python and R to track correlations in customer behavior and defined user segments to implement process and product improvements.
  • Conducted reverse engineering based on demo reports to understand the data without documentation.
  • Generated new data mapping documentations and redefined the proper requirements in detail.
  • Generated different data marts gathering the tables needed (Member info, Claim info, Transaction info, Appointment info, Diagnosis info) from the SQL Server database.
  • Created ETL packages to transform data into the right format and join tables together to get all features required using SSIS.
  • Processed data using Python pandas to examine transaction data, identify outliers and inconsistencies.
  • Conducted exploratory data analysis using python NumPy and Seaborn to see the insights of data and validate each feature through different charts and graphs.
  • Built predictive models including Linear regression, Lasso Regression, Random Forest Regression and Support Vector Regression to predict the claim closing gap by using python scikit-learn.
  • Responsible for writing SQL scripts which analyze the aggregated business data.
  • Developed KPI, Quality and financial reports and visualizations using Tableau.
  • Developed weekly predictive models for competitive pricing and limiting of products.
  • Used Pivot tables, VLOOKUP, Index, Match and charts extensively using Excel for the business requirement.
  • Created visualizations to present analysis findings to the internal team and prime vendors.
  • Performed data collation, interpretation and visualization using Excel, SQL and Tableau.
  • Worked extensively on writing SQL queries to retrieve vendor related business data.
  • Responsible for data management of products’ inventory, lifecycle management and procurement using varied vendor management tools such as Procurement Portals, YUMA and Agile.
  • Held walkthroughs with the developers and QA teams to create a better understanding of the system requirements.
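The pandas-based transaction screening mentioned above (identifying outliers and inconsistencies) might look roughly like this minimal sketch; the column names, toy values, and interquartile-range rule are assumptions for illustration, not the project's actual schema:

```python
import pandas as pd

# Toy transaction data standing in for the real feed.
df = pd.DataFrame({
    "txn_id": range(1, 9),
    "amount": [25.0, 30.5, 28.0, 27.5, 900.0, 26.0, 29.0, -5.0],
})

# Inconsistencies: negative amounts are invalid transactions.
df["invalid"] = df["amount"] < 0

# Outliers via the 1.5x interquartile-range (IQR) rule.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
lo_bound, hi_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["outlier"] = ~df["amount"].between(lo_bound, hi_bound)

# Rows needing review: either flag tripped.
print(df[df["outlier"] | df["invalid"]][["txn_id", "amount"]])
```

Flagged rows would then feed the validation and reporting steps described above.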

Confidential

Business Intelligence Analyst

Responsibilities:

  • Created and formatted clinical reports from hospital clientele to assess performance and supply chain statistics using IBM Cognos.
  • Implemented Microsoft Visio and Rational Rose for designing the Use Case Diagrams, Class model, Sequence diagrams, and Activity diagrams for SDLC process of the application.
  • Worked with other teams to analyze customer data and marketing parameters.
  • Conducted Design reviews and Technical reviews with other project stakeholders.
  • Was a part of the complete life cycle of the project from the requirements to the production support.
  • Responsible for optimizing SQL queries to get data results which are accurate and validated accuracy of data to ensure database integrity.
  • Used IBM Cognos Framework Manager to import and retrieve metadata from the database, and Query Studio for metadata manipulation, addressing client queries and requirements.
  • Generated Dashboards using Microsoft Power BI.
  • Responsible for data collection and assessment to optimize and store data.
  • Used IBM Cognos 8 to generate prompt & burst reports and various charts (Pie Charts, Bar Charts, Column Charts, etc.) for data analysis.
  • Developed ETL processes for data conversions and construction of data warehouse using IBM InfoSphere DataStage.
  • Used Star Schema and designed Mappings between sources to operational staging targets.
  • Involved in defining the business/transformation rules applied to sales and service data.
  • Defined list codes and code conversions between the source systems and the data mart.
  • Provided On-call Support for the project and gave a knowledge transfer for the clients.
  • Used Rational Application Developer (RAD) for version control.
  • Developed transformations using jobs like Filter, Join, Lookup, Merge, Hashed file, Aggregator, Transformer and Dataset.
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Coordinated with business users to design new reports in an appropriate, effective, and efficient way based on user needs and existing functionality.
  • Remained knowledgeable in all areas of business operations to identify system needs and requirements.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Created schedules to run reports at specific times in the client environment.

Confidential

Jr. Java Developer

Responsibilities:

  • Worked on an E-Commerce Computer Sales and Management project to design databases and retrieve data via SQL scripting using SQL Server.
  • Initiated object-oriented programming best practice and design pattern to enable flexibility and easy maintainability.
  • Used UML to create use cases, sequence diagrams, collaboration diagrams, and class diagrams, then implemented them in Java.
  • Improved overall performance via multithreading, collections, synchronization, and exception handling strategies.
  • Performed troubleshooting, fixed and deployed many Java bug fixes for the applications, and was involved in fine-tuning existing processes following advanced patterns and methodologies.
  • Wrote stored procedures and triggers and performed database normalization.
  • Implemented and configured Hibernate Ehcache for avoiding unnecessary database hits.
  • Designed web pages for company using HTML and Apache Tomcat.
  • Worked on localhost and Admin Console.
  • Enhanced validation process by creating server-side validation library for user input to UI from exception handler.
  • Ensured application integration for business objects using RESTful services by creating service endpoints.
  • Maintained data using hibernate, JPA and JDBC in MySQL database.
  • Developed and implemented ERP project solutions with team of technical engineers.
  • Trained client on use of software solutions and customized app according to business needs.
