Big Data Platform Support SME / Principal Data Scientist Resume
SUMMARY:
- L1, L2, L3 support for big data platform, data visualization & cloud applications (Spark, Hadoop - HDFS, Hive, Impala, Redshift, S3, Nexus, Trifacta, Alation).
- Data Science Studio (DSS - Dataiku) SME.
- Creating SOPs for support and incident resolution tasks.
- Access management; resolving incidents, tasks, and projects within SLA.
- Configuring big data and data visualization applications.
- Active and proactive monitoring of production nodes using tools like Zabbix, Grafana & CloudWatch for platform support.
- Building data pipelines for big data and cloud-based applications (Ansible, Docker, EMR, and other traditional ETL tools).
- Configuring execution engines (Hive, Spark, Impala) on HDFS (Cloudera, Hortonworks).
- Extensive use of Linux clients (MobaXterm, PuTTY) & WinSCP to run configuration commands, disk cleanup, memory monitoring, and all support and DevOps tasks.
- Proficient with collaborative tools like Box, Slack, Yammer, Jive, ServiceNow, JIRA & Confluence to perform big data platform, cloud & analytical tools application support tasks.
- Consolidating structured and unstructured data from disparate data sources to build data products and eventually deploy or integrate solutions with other applications in production systems.
- Tuning, fitting and optimizing models; feature engineering and deploying models via web services or BI tools.
- Design, architecture & development of analytic solutions to solve business problems.
- Implementing these solutions and training analysts and developers to use the analytic tools and/or data products.
- Exploring data sets to generate patterns using time series models, cross-correlations with time lags, signal processing & filtering techniques, spectral analysis and correlograms.
- Configuring Hive on HDFS, Hortonworks, and Impala for big data analytics.
- Using Python scripts to read & move flat files; web scraping and calling APIs to extract data in JSON & XML.
- Cleaning, rescaling and munging data using R, SAS & Python; dimensionality reduction & normalization.
- Building machine learning models to solve business problems using several techniques, including K-nearest neighbors, Naïve Bayes, simple linear regression, multiple regression, logistic regression, decision trees & neural networks for supervised learning (a minimal Python sketch appears after this summary list).
- Utilizing clustering algorithms for unsupervised learning, along with reinforcement learning techniques.
- Refactoring MapReduce code in Python and Java to optimize query performance.
- Using techniques such as random forests to reduce overfitting or underfitting.
- Leveraging natural language processing, recommender systems & network analysis to build custom data products and solve complex business problems.
- Utilizing NumPy, pandas, scikit-learn and other Python and R libraries to build data products and decision engines.
- Using statistical and probability techniques, linear algebra, gradient descent, hypothesis testing and inference to build models for machine learning algorithms.
- Building data products by extracting data from IoT devices and performing complex event processing for decision engines, predictive models and live streaming dashboards for monitoring.
- Rapid prototyping of products & solutions after analyzing business problems and iterating through simulations of possible solutions; thinking outside the box and challenging status-quo problem-solving techniques.
- Critical thinking & domain knowledge in several industries including: Banking, Finance, Telecommunications, Oil & Gas, Pharmaceuticals, Healthcare technology, Supply chain & logistics, Marketing, Consulting, Professional Services and Information Technology.
- Extending the Spotfire platform using IronPython, R, S+, the Spotfire SDK and JavaScript.
- Building complex advanced visualizations with all available chart types, as well as custom JavaScript or IronPython charts.
- Proficient with advanced custom expressions.
- Scripting OVER, Statistical, Spatial, Ranking, Math, Logic, Binning, Conversion, Date & Time, Text and Property functions.
- Creating and registering custom data functions in R and/or S-Plus. Running SAS and MATLAB scripts through Spotfire.
- Embedding Spotfire analytic data products in web portals, websites and SharePoint.
- Data visualization best practices, interactive dashboards and guided analytics.
- Advanced geomapping configuration with multilayer integration.
- Building elements, joins, procedures & information links in the information model layer.
- Library administration, Information Designer and Administration Manager proficiency.
- Managing licenses, setting up and configuring Spotfire users (5,000+).
- Deploying clusters of Spotfire servers, including Web Player servers, load-balancing servers, Automation Services servers, Statistics Services servers and Spotfire Servers.
- Upgrading, patching, monitoring, LDAP integration, installations and all other administration duties.
- Configuring Spotfire Application Data Services for multiple environments (Composite) including Netezza, Teradata, MS PDW, Oracle and other big data sources.
- Scheduled updates and Automation Services XML jobs.
- Server monitoring using Geneos and Splunk, and creating alerts for exceptions.
- Spotfire infrastructure design and platform configuration for clusters and high availability of Web Player servers.
- Configuring the Spotfire information model by designing and developing back-end stored procedures & complex queries for Spotfire Server information links.
- Deploying web-based dashboards in Spotfire 4.x, 5.x, 6.x and 7.x.
- Domain knowledge of Spotfire center of excellence standards.
- Training analysts, building the knowledge base, and documentation.
- Connecting to data and using the Tableau interface to effectively create powerful visualizations.
- Creating calculations including string manipulation, advanced arithmetic calculations, custom aggregations and ratios, date math, logic statements and quick table calculations.
- Building advanced chart types and visualizations: bar-in-bar charts, bullet graphs, box-and-whisker plots and Pareto charts.
- Building complex calculations to manipulate data and using statistical techniques to analyze it.
- Using parameters and input controls to give users control over certain values.
- Implementing advanced geographic mapping techniques and using custom images and geocoding to build spatial visualizations of non-geographic data.
- Combining data sources by joining multiple tables and using data blending.
- Making visualizations perform as well as possible by using the data engine, extracts and appropriate connection methods.
- Building better dashboards using guided analytics techniques, interactive dashboard design, visual best practices, and efficiency tips and tricks.
- Using groups, bins, hierarchies, sorts, sets, and filters to create focused and effective visualizations.
- Using Measure Name and Measure Value fields to create visualizations with multiple measures and dimensions.
- Tableau Administrator: Windows Server monitoring of Tableau Servers externally, and internal monitoring using the Tableau administrative views workbook.
- Tableau directory service integration using Active Directory.
- Full utilization of TABCMD and TABADMIN for server-side auditing and administration of groups, users, sites, and server status through batch scripting or simple command-prompt commands.
- Implementing end-to-end workbook, database, and trusted security strategies by leveraging ISMEMBEROF(), FULLNAME(), etc. to achieve the level of security required by end users.
- Implementing user or core-based licensing strategies.
- Design and deployment of high availability, failover, and distributed Tableau configurations across multiple domains.
- SAML implementation with a reverse proxy.
- TabJolt for Tableau stress testing.
- Configuration of Tableau VizQL, Backgrounder, and Data Engine processes to adjust for performance across distributed configurations.
- Using F5 load balancing for very active Tableau Servers.
- Dashboard performance recording and tuning; DirectX and browser compatibility considerations for improving user desktop performance.
- Full utilization of Tableau's built-in PostgreSQL repository to monitor user, browser, and server activity.
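The supervised-learning and data-munging bullets above follow a standard Python workflow. The sketch below is a minimal, illustrative example only: the file name claims_sample.csv, its numeric feature columns and the binary target column are hypothetical placeholders, not artifacts from any engagement listed here.

```python
# Minimal sketch of a supervised-learning workflow with pandas and scikit-learn.
# File name and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load and lightly clean a flat file (drop rows with missing values).
df = pd.read_csv("claims_sample.csv").dropna()

X = df.drop(columns=["target"])   # feature matrix (assumed numeric)
y = df["target"]                  # binary label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Rescale features, then fit a logistic regression classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```

The same pipeline pattern extends to the other estimators mentioned (K-nearest neighbors, Naïve Bayes, decision trees, random forests) by swapping the final pipeline step.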
TECHNICAL SKILLS:
Tools: Hortonworks (Hadoop), Apache Spark, Python, R, SAS Base & Enterprise, SSAS, RapidMiner, AWS, Azure, Spotfire, Tableau, QlikView, Orange, SSRS, SPSS, MATLAB, Mathematica, Maple, PerformancePoint, Excel, Salesforce, Google Analytics, Power BI, Azure ML Studio, Looker, Dataiku DSS
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Platform Support SME / Principal Data Scientist
Responsibilities:
- McKesson: Implemented a contract and provider predictive analytics data product, comprising a series of data products tracking contract and provider metrics, KPIs, regression analysis with contract coverage, clinical studies, SLAs, drug manifestations, etc.
- Bristol Myers Squibb: Designed and developed operational business insights and data products for real-world research data on a hybrid cloud platform. End-to-end implementation of the product: data modeling, data mining, ETL, database environment deployment and administration, data wrangling, and data product consolidation and release with modeling algorithms for predictive and prescriptive analytics. Compliance and governance with HIPAA and healthcare industry code sets such as ICD-9, ICD-10, CPT, etc.
- Monsanto: Designed and developed a global supply chain perfect order data product and other finance, marketing & research data products. Redesigned and configured a Spotfire Server and Web Player Server clustered high-availability platform to accommodate a 1,000+ user base across North America, South America, Europe, Asia and Africa. This included a hybrid architecture scaling up and scaling out with cloud integration.
- Redesigned and developed a global supply chain dashboard and guided analytics operational intelligence tool using Spotfire.
- Designed a new reporting data model from the data warehouse (Teradata), other big data sources (structured & unstructured) and the data virtualization layer (Spotfire information model) to feed the Spotfire analytic data products and dashboards with optimum performance.
- Refactoring and development of reports and dashboards using Spotfire and extending the platform with additional custom capabilities in IronPython, JavaScript and R (see the IronPython sketch after this list).
- Setting up and configuring Automation Services for data refreshes and migrations to different Spotfire server environments, and deploying models to production.
- Training of developers and analysts, knowledge-base documentation, and setting up a Spotfire BI standards center of excellence with well-documented best-practice use cases.
- Collaborated with the data science team to integrate machine learning algorithms and predictive analytics data products within the Spotfire platform.
- Planned and executed Spotfire patches and upgrades for 7.0 & 7.5.
- ConocoPhillips: Designed and developed real-time operations analytic data products and dashboards for production engineering support, finance, water disposal, production optimization, and consolidated data products using Spotfire for the business unit.
- USDA: Designed and developed a predictive analytics data product using Tableau for managing business loans to small businesses, farmers, etc. The product included weather data, census data, loan status data, etc.
- Frontier Communications: Redesigned and architected the Spotfire platform, integrating it with AS/400 systems via a data virtualization platform.
- Confidential: Integrating SAS & R into Spotfire & Tableau to build dashboards and analytic products for different lines of business. Developed automated workflows and data products end to end for the governance and compliance department. Also worked with offshore administration teams maintaining Spotfire and Tableau server environments. Pioneer member of the team that established the ETL and BI center of excellence. Transitioned to the data science team, building machine-learning algorithms, text mining and other data products to support different LOBs and enable the bank to meet SLAs under client agreements.
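As a sketch of the IronPython platform extension mentioned in the list above: the fragment below assumes it runs inside the Spotfire scripting environment, where Document is injected, vis is a script parameter of type Visualization, and SelectedRegion is an existing document property. All names are illustrative rather than taken from the actual projects.

```python
# Illustrative IronPython fragment for a Spotfire action control (sketch only).
# 'Document' is provided by Spotfire; 'vis' is assumed to be a script parameter
# of type Visualization, and 'SelectedRegion' an existing document property.
from Spotfire.Dxp.Application.Visuals import VisualContent

# Cast the visualization reference and retitle it from a document property.
content = vis.As[VisualContent]()
content.Title = "Perfect Order Rate - %s" % Document.Properties["SelectedRegion"]

# Expose the active page title as a document property so text areas and
# custom expressions elsewhere in the analysis can react to navigation.
Document.Properties["ActivePageTitle"] = Document.ActivePageReference.Title
```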
Confidential
Business Systems Analyst / SQL Developer / C#, Java, .NET Developer
Responsibilities:
- SQL scripting & tuning, stored procedures, reporting, systems design, analysis & implementation, enterprise data management, requirements gathering, and technical & operational support using the Microsoft technology stack, SAP & Oracle ERP. Application design and development (an illustrative stored-procedure call sketch follows).
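As a hedged illustration of the stored-procedure and reporting work noted above, the sketch below calls a SQL Server stored procedure from Python via pyodbc; Python is used for consistency with the rest of this document, and the driver, server, database, and procedure names are hypothetical placeholders.

```python
# Hypothetical sketch of calling a SQL Server stored procedure via pyodbc.
# Connection details and the procedure name are placeholders, not real systems.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reporting-db;DATABASE=OpsProd;Trusted_Connection=yes;"
)

cur = conn.cursor()
# ODBC CALL escape syntax with a parameter placeholder for the reporting period.
cur.execute("{CALL dbo.usp_MonthlyOrderSummary (?)}", "2015-06")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```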
Confidential
Database / Application Developer II
Responsibilities:
- Designed, developed & implemented an ERP system. Requirements gathering & business process modeling for enhancements to the ERP system. Logical database design for applications & maintenance of the operations production database.