Big Data Analyst Resume

NJ

SUMMARY

  • Microsoft- and SAS-certified, highly skilled data analyst bringing more than 11 years of expertise in big data technology, data mining, data warehousing, data analysis, and data visualization. Applies econometric methods to guide businesses in their decision-making and help them run efficiently; created a data lake using Hadoop, HBase, and Hive.
  • Winner of Summer GIO Hackathon 2015 at JOHN DEERE
  • Expertise in designing concise, pertinent visualizations using Tableau, Power BI, and RStudio, and in publishing and presenting dashboards on web and desktop platforms.
  • Extensive experience using R packages such as foreign, gretl, rattle, and quantmod; also familiar with statistical analysis packages such as lme4, MASS, mice, mlogit, Rcmdr, survival, and truncreg.
  • Robust user of Python packages for statistical analysis such as statsmodels, scikit-learn, NumPy, pandas, and NLTK.
  • Competent in normalization/denormalization techniques for optimal performance in relational and dimensional database environments, and in maintaining referential integrity using triggers and primary and foreign keys.
  • Saved $300K/year in operating costs for the Global Database Administrator team and eliminated 267 days of database-team work.
  • Excellence in MS Excel, with proficiency in lookups and pivot tables and an understanding of VBA macros.
  • Extensive experience in in-depth data analysis across different databases and structures. Strong knowledge of writing T-SQL and PL/SQL queries, dynamic queries, sub-queries, CTEs, and complex joins.

TECHNICAL SKILLS

Programming Languages: Java, PL/SQL, JCL, COBOL

Analytics Languages: R, Python, SQL, Scala, SAS

IDEs and Tools: RStudio, Eclipse, IntelliJ, PyCharm, PySpark, Weka

AWS Technologies: S3, EC2, SQS, SNS, EMR

Databases and ETL: Mainframe, Oracle, NoSQL, MySQL

Web Development: JavaScript, HTML, CSS

Big Data Technologies: MapReduce, Hive, Spark, HBase, Pig, YARN, Azure

Data Visualization: Tableau Desktop/Server, Weka, Power BI, pivot tables, VBA, VLOOKUP

SAS Skills: Base SAS, SAS/GRAPH, SAS Macro, PROC SQL, ODS, SAS/STAT, Enterprise Miner

Competencies: Logistic and Linear Regression, Time Series Analysis, CHAID, Factor Analysis, CART, Survival Analysis

PROFESSIONAL EXPERIENCE

Confidential, NJ

Big Data Analyst

Responsibilities:

  • Fraud Detection: Used anomaly detection to flag outliers in financial transactions for American Express clients, applying the logic on the Spark framework, which allows for large-scale data processing; detected anomalies were reported to downstream teams for further action on the client accounts (see the anomaly-flagging sketch after this list).
  • Hospital Readmission Project: Visualized patterns in a hospital readmission data set in Tableau, then built a predictive model in R to predict readmission risk.
  • Behavioral Analysis: Analyzed more than 100K patient records for early readmission risk using PySpark and the Spark Machine Learning Library (MLlib); a minimal pipeline sketch follows this list.
  • Surgical Schedule Optimization: Designed optimal surgical scheduling and staff planning for a medical college by building a generalized linear model and using the AMPL optimization tool, which helped reduce under-allocated operating hours by 10%.
  • Revenue Analysis: Worked on movie revenue data sets and devised a dynamic forecasting model using stepwise regression, KNN, and ensemble techniques; the averaged ensemble model's results were 92.3% accurate.
  • Twitter Sentiment Analysis: Implemented sentiment analysis of tweets about mobile carriers using NLTK sentiment analysis and the Twitter API (see the sentiment-scoring sketch after this list).
  • Processed and cleaned the data by treating missing values with imputation methods.
  • Detected and treated outliers; ran stepwise regression and all-subsets regression to choose effective variables for the revenue model.
  • Developed a predictive algorithm using Decision Tree (Regression Tree) to implement Pricing Optimization.
  • Identified optimum product prices so that items could be sold for maximum profit while sustaining demand.
  • Predicted revenue using linear modeling and ran a price elasticity model to show what happens to revenue when the price of a product increases, which helped improve profit by 23% (a log-log elasticity sketch follows this list).
  • Applied Logistic Regression (GLM), Linear Discriminant Analysis, and K-Nearest Neighbors to identify fraudulent customers using customer Total Pay credit card transaction data.
  • Caught fraudulent activity more quickly and efficiently, leading to a drop in the cost of fraud to the customer of nearly 96% and a drop in the cost of goods sold of over 95%.
  • Provided additional analyses where needed to determine inefficiencies within the department and implemented the fixes to these problems.
  • Provided special reporting and ad hoc reporting (scholar/fund data /market values/book values) as needed. Researched and resolved data integrity issues.
  • Initiated, compiled, and communicated the merge of Word and OneNote process and procedure documents into one resource file.
  • Initiated and maintained many improvements to multiple FileMaker Endowed Scholarship databases, resulting in significant time savings and increased efficiencies:
  • Created a multi-script process pulling data from multiple databases to be merged into one PDF file for endowed scholarship donor acknowledgement and reporting.
  • Designed and created a FileMaker scholar thank-you letter proofing layout, which increased colleague efficiency and timeliness. Created additional layouts that mirrored Dartmouth stationery, eliminating the need for letterhead stock.
  • Created scripts which identified all funds supported by a household (e.g. husband and wife with multiple and separate funds) and then produced the household's scholar announcement letter.
  • Merged the data from two separate scholarship databases into one database, resulting in significant maintenance time savings.
  • Performed data updates to the Scholarship Fund, Monitored Fund, In Memory Of, and Prizes/Awards databases by importing data from multiple FileMaker and Oracle database sources (Financial Aid Office, Advance, Data Warehouse, and iModules).
  • Provided technical and software support as well as training (FileMaker, Excel, Acrobat Pro, printing and processes) to colleagues.
  • Extracted, compiled, tracked, and analyzed data to generate reports in a variety of layouts (Excel, PDF, Tableau, and SAS dashboards), and modelled data structures for multiple projects using Mainframe and Oracle.
  • Maintained the data integrity during extraction, ingestion, manipulation, processing, analysis and storage.
  • Presented more than 15 impactful time series visualization dashboards and stories using Tableau Desktop and Server, Excel pivot tables, SQL queries, Power BI, SAS, and Visual Basic macros.
  • Responsible for enhancing the data model according to business requirements.
  • Developed scripts for creating sequences in all the databases to accommodate the extended enterprise key.
  • Analyzed and developed performance improvements for the required data and tables.
  • Deployed the changes to various environments and tested the changes.
  • Worked on the QA and staging builds in TFS and merged all the builds based on the requirement.
  • Enhanced existing models to reduce data redundancies.
  • Worked on parent-child key hierarchies and created scripts for their sequences according to their levels, so that data migration from each database would face no major challenges.
  • Worked with the Java team to accommodate changes in the front end.
  • Modified PL/SQL packages for better performance of jobs and batch processes.
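
A minimal sketch of the anomaly flagging described in the fraud-detection bullet above, assuming a simple per-account z-score rule on Spark; the input path and column names (account_id, amount) are hypothetical, and the resume does not specify the project's actual detection logic.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("txn-anomaly-sketch").getOrCreate()

    # Hypothetical transaction table; path and schema are placeholders.
    txns = spark.read.parquet("s3://bucket/transactions/")

    # Per-account mean and standard deviation of the transaction amount.
    stats = txns.groupBy("account_id").agg(
        F.mean("amount").alias("mu"),
        F.stddev("amount").alias("sigma"),
    )

    # Flag transactions more than three standard deviations from the account mean.
    flagged = (
        txns.join(stats, "account_id")
            .filter(F.col("sigma") > 0)
            .withColumn("zscore", (F.col("amount") - F.col("mu")) / F.col("sigma"))
            .filter(F.abs(F.col("zscore")) > 3)
    )

    # Hand the flagged rows to downstream teams for review.
    flagged.write.mode("overwrite").parquet("s3://bucket/anomalies/")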
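The readmission bullet pairs PySpark with MLlib; a hedged sketch of such a pipeline follows, using pyspark.ml with invented feature and label column names (age, num_prior_visits, length_of_stay, readmitted_30d), not the project's actual schema.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("readmission-sketch").getOrCreate()
    patients = spark.read.csv("patients.csv", header=True, inferSchema=True)

    # Assemble hypothetical numeric features into a single vector column.
    assembler = VectorAssembler(
        inputCols=["age", "num_prior_visits", "length_of_stay"],
        outputCol="features",
    )
    lr = LogisticRegression(featuresCol="features", labelCol="readmitted_30d")

    train, test = patients.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Area under the ROC curve on the held-out split.
    auc = BinaryClassificationEvaluator(labelCol="readmitted_30d").evaluate(
        model.transform(test))
    print(f"test AUC: {auc:.3f}")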
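For the Twitter sentiment bullet, a minimal sketch using NLTK's VADER analyzer; the sample tweets are synthetic stand-ins for records pulled from the Twitter API, and the resume does not say which NLTK sentiment approach was actually used.

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    # Synthetic stand-ins for tweets fetched via the Twitter API.
    tweets = [
        "My carrier's coverage has been great this month!",
        "Dropped calls again. Worst network ever.",
    ]

    for tweet in tweets:
        scores = sia.polarity_scores(tweet)  # neg/neu/pos/compound scores
        if scores["compound"] >= 0.05:
            label = "positive"
        elif scores["compound"] <= -0.05:
            label = "negative"
        else:
            label = "neutral"
        print(label, round(scores["compound"], 3), tweet)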
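The price-elasticity bullet rests on a standard idea: in a log-log regression of quantity on price, the slope is the price elasticity of demand, and revenue rises with price only while demand is inelastic. A sketch with statsmodels on synthetic price/quantity data:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Synthetic price/quantity observations for illustration only.
    df = pd.DataFrame({
        "price":    [9.99, 10.99, 11.99, 12.99, 13.99, 14.99],
        "quantity": [520, 480, 430, 400, 360, 330],
    })

    # Log-log specification: the price coefficient is d ln(Q) / d ln(P),
    # i.e. the price elasticity of demand.
    X = sm.add_constant(np.log(df["price"]))
    y = np.log(df["quantity"])
    fit = sm.OLS(y, X).fit()

    elasticity = fit.params["price"]
    print(f"estimated price elasticity: {elasticity:.2f}")
    # Revenue increases with price only while |elasticity| < 1 (inelastic demand).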

Confidential, Westchase, FL

Big Data Analyst

Responsibilities:

  • Leveraged a statistical analysis library on Azure to group multiple data plans into one group, creating 15% cushion capacity to avoid overage across different vendor plans.
  • Achieved: approximately $160K in savings over a one-and-a-half-month tenure for more than 5 teams.
  • Analyzed, retrieved, and aggregated data from multiple vehicle vendors to perform data mapping and to precisely append incorrectly mapped or missing data, with storage and transfer via HDFS and Hive queries.
  • Responsible for building and maintaining effective working relationships with business teams, as well as with external vendor and customer data management teams.
  • Identified inconsistencies in data collected from different sources, worked with business owners/stakeholders to assess business and risk impact, and provided solutions to business owners.
  • Worked with consumers and different teams to gain insights into their telematics data usage and cost-saving targets through data usage planning analysis; generated a data lake in Hadoop to receive the raw data.
  • Analyzed business data requirements, data plan requirements, and data overage requirement specifications; accountable for documenting data plan modifications and usage monitoring, creating data storage in Hadoop, and running queries using Spark (see the Hive-on-Spark sketch after this list).
  • Communicated with data vendors such as Vodafone, Verizon, Iridium, and ORBCOMM to understand their business data plan strategies.
  • Accessed Caterpillar telematics data plan information from multiple data providers' portals in SQL, SAS, and Oracle formats, and ran data profiling, data cleaning, and data analysis on the raw data using R packages and advanced SQL querying.
  • Modelled advanced Visual Basic for Applications macros on various vendor data reports to plan data usage structures with minimum overage cost using Excel and VBA.
  • Employed time series algorithms on data usage to visualize the future scope of data plan usage and estimate overage, generating cost-saving data usage plans for teams (a forecasting sketch follows this list).
  • Created a backdating data plan process that avoided unnecessary data overage costs for 5 teams with the help of VBA, Tableau, and SQL query report generation.
  • Performed data analysis and data profiling using complex SQL queries on various source systems.
  • Developed a SharePoint documentation template to support findings, track project status, and assign specific tasks.
  • Involved with data profiling of multiple sources using SQL Management Studio and presented initial discovery in Excel tables and reports.
  • Used project management tools such as Kanban and SharePoint to keep stakeholders updated about the project.
  • Leveraged sentiment analysis to establish a consumer feedback system using MapReduce and text mining in Java.
  • Developed and systematized end-to-end statistical models on high-volume data sets by manipulating data with Hive queries and Spark on Hadoop for faster results, and resolved stakeholder issues under tight deadlines.
  • Achieved: 343 products showed sales growth after profiling.
  • Gathered business requirements through one-to-one and group meetings with vendors, the Order Management team, and the Supply Chain team; presented initial KPI frameworks to gain line of sight on the project.
  • Employed time series algorithms on parts sales to visualize future part and vehicle requirements, estimate the required availability of various vehicle parts in inventory, and generate a cost-saving inventory management plan for the team.
  • Extracted, compiled, tracked, and analyzed data to generate reports in a variety of layouts (Excel, PDF, Tableau, and SAS dashboards), and modelled data structures for multiple projects using Mainframe and Oracle.
  • Maintained the data integrity during extraction, ingestion, manipulation, processing, analysis and storage.
  • Presented more than 15 impactful time series visualization dashboards and stories using Tableau Desktop and Server, Excel pivot tables, SQL queries, Power BI, SAS, and Visual Basic macros.
  • Modelled basic analytical models using Python and R through SparkR and an API on a 25% data threshold. Achieved: 15% growth in inventory planning accuracy.
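
A sketch of the Hive-backed Spark querying mentioned in the data-plan bullets above; the database, table, and column names (telematics.data_usage, plan_id, mb_used, plan_limit_mb, billing_month) are invented for illustration, not taken from the original project.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("usage-overage-sketch")
             .enableHiveSupport()  # read tables registered in the Hive metastore
             .getOrCreate())

    # Aggregate usage per plan and compute overage against the plan limit.
    overage = spark.sql("""
        SELECT plan_id,
               SUM(mb_used) AS total_mb,
               MAX(plan_limit_mb) AS limit_mb,
               GREATEST(SUM(mb_used) - MAX(plan_limit_mb), 0) AS overage_mb
        FROM telematics.data_usage
        WHERE billing_month = '2016-03'
        GROUP BY plan_id
    """)
    overage.show()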
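Both time-series bullets (data usage overage and parts demand) fit the same forecasting pattern; a hedged sketch using Holt-Winters exponential smoothing from statsmodels on a synthetic monthly series, since the resume does not name the specific algorithm used:

    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Synthetic monthly data-usage series (MB); real inputs would come from
    # the vendor usage reports described above.
    usage = pd.Series(
        [310, 325, 298, 340, 355, 362, 371, 389, 402, 398, 415, 430],
        index=pd.date_range("2015-01-01", periods=12, freq="MS"),
    )

    # Additive-trend Holt-Winters fit; seasonality is omitted for brevity.
    model = ExponentialSmoothing(usage, trend="add", seasonal=None).fit()
    print(model.forecast(3))  # expected usage for the next three months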
