Big Data Developer Resume

Scottsdale, AZ

SUMMARY:

  • More than 12 years of Professional Services, customer-facing experience architecting large-scale database solutions, with a major focus on big data architecture design, data streaming, analytics, and integrations. Experienced in large-scale data streaming projects with Kafka and NiFi on Hortonworks Data Hub 2.0. Big data analytics architect working with Kafka, NiFi, and AWS cloud-based solutions (EMR, S3, DMS, EC2).
  • Treats data analysis, discovery, and model-building as an iterative process: identify the information that can be extracted, then relate, map, associate, and cluster it with other data to produce the result. Source data identification, formats, and mappings to the target result are refined as different elements and aspects of the data are discovered.
  • Proven track record in data mining development. Analytics developer with Revolution R on Hadoop and Cassandra. Cloud design on Azure and AWS (EC2, DMS, EMR, and S3 instances). Extensive data streaming development with Kafka, NiFi, and Spark technologies.
  • SAS analytics, ETL, and data mining developer.
  • Performance-focused Hortonworks, Cloudera, and DataStax Cassandra data processing architectures, informed by performance-metrics algorithm design and supported by data research.
  • Data mining skills include research and development of foundational requirements plans aligned with business use case methodologies. Hadoop administration for 600 clusters; AWS administration for EC2.
  • Develop in Spark and Scala with DataFrames (Spark SQL); a streaming sketch follows this list. Data integration with NiFi on Hortonworks Data Hub. Data mining with SAS Enterprise Data Miner, HP Vertica, Presto, Impala, Hive, S3, and Cassandra databases.
  • Expert in data integration stream processing with Kafka and Data Hub technologies.
  • Streaming data: development of secure data stream technologies for SSL-encrypted data streams.
  • HBase and Cassandra key-value database cluster design and fast indexing configurations.
  • 3 Splunk implementations, 3 Kafka implementations, 4 Hortonworks implementations.
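
Illustrative sketch for the Spark/Kafka streaming work referenced above: a minimal Spark Structured Streaming job that reads a Kafka topic into a DataFrame and aggregates it with Spark SQL. The broker address, topic name, and window size are placeholder assumptions, not project specifics, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic as a streaming DataFrame.
    // "localhost:9092" and "events" are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS k", "CAST(value AS STRING) AS v", "timestamp")

    // Count messages per key in one-minute event-time windows.
    val counts = events
      .groupBy(window($"timestamp", "1 minute"), $"k")
      .count()

    // Write the running counts to the console for inspection.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```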

TECHNICAL SKILLS:

Core ETL tools: Hortonworks Data Hub 2.0 (NiFi), Boomi, Informatica Big Data Edition and PowerCenter, and SAP Data Services 4. ELK Stack experience: 3 yrs.

Data Models: Erwin ERM used for star schema DWs with Teradata, SAP BW, Netezza, and Oracle.

Big Data File Formats: Parquet, Thrift, and Avro; Git and Maven frameworks; JSON/REST APIs. Predictive analytics with SAS Enterprise Data Miner and Revolution R tools.

O/S Admin Support: Windows Server 2008/2012/2016; Linux (Red Hat, Ubuntu, SUSE); VMware.

Cloud Admin Support: AWS 4/5 yrs, Azure 2.5 yrs, Google 2 yrs.

Big Data Admin: Hortonworks stack 5 yrs, AWS 5 yrs, Cloudera stack 3.5 yrs, DataStax 3 yrs.

PROFESSIONAL EXPERIENCE:

Confidential, Scottsdale AZ

Big Data Developer

Responsibilities:

  • Big data lead architect developing new projects and building project teams. Working on multiple projects, with the Hortonworks stack and the DataStax Cassandra stack as the primary solution platforms. Data architect on three AWS cloud projects (EMR, Aurora, Redshift, S3) and three DataStax Cassandra projects, plus Azure cloud solutions and BI analytics with Tableau.
  • Development of data streams with NiFi, Kafka topics, Spark streaming analysis, and CDC processes (a topic-provisioning sketch follows this list). Greenplum database integration and administration for clients who need managed data streams and workflows. Implemented Puppet-based Linux server configuration processes. AWS data replication schemes with AWS DMS. Data administrator for Cassandra, Hortonworks (Hive and Hadoop), and Cloudera (Impala and Hive).
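
A minimal sketch of provisioning a Kafka topic for one of the managed data streams described above, using the Kafka AdminClient API; the broker address, topic name, partition count, and replication factor are illustrative placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}
import scala.jdk.CollectionConverters._

object CreateTopicSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker

    val admin = AdminClient.create(props)
    try {
      // 6 partitions, replication factor 3 -- illustrative values only.
      val topic = new NewTopic("cdc.orders", 6, 3.toShort)
      admin.createTopics(List(topic).asJava).all().get()
    } finally {
      admin.close()
    }
  }
}
```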

Confidential, Austin TX

Data Architect

Responsibilities:

  • Working on gaming industry streaming metrics analysis: ETL design, SQL schemas and queries, and metrics design for data analysis. Dimension and fact table design in BI data marts. Development of Tableau BI reports, Splunk reports for data streaming, and custom metrics analysis driven by requirements gathering and business cases.
  • Designed the Splunk architecture, POC, and implementation to measure streaming data flows from games on casino floors and gather game performance metrics. Developed Splunk load tests to measure the capacity of game machines sending streaming data to Splunk Enterprise server 6.3.1; Splunk Enterprise handled 4,000 streaming data feeds with 1K to 5K data packets processed in 1 to 3 seconds, with very good results.
  • Designed and tested Splunk alerts for various metrics and thresholds outside expected ranges. Developed Splunk benchmark reports on custom data collected from Splunk forwarders using the TCP forwarder and TCP indexer. Tested Splunk messaging with the new HTTP/HTTPS input with very good results (a sketch follows this list). Extensive experience with Splunk indexers, forwarders, alerts, custom data collection, custom reporting, and metrics benchmarks; able to deliver end-to-end Splunk implementation, development, and customization of data, alerts, logging benchmarks, and reporting.
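
The sketch below shows one way to post an event to Splunk's HTTP Event Collector (HEC), the documented HTTP/HTTPS input tested above; the host, token, and event fields are placeholder assumptions, not values from the project.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SplunkHecSketch {
  def main(args: Array[String]): Unit = {
    val client = HttpClient.newHttpClient()

    // One JSON event; the sourcetype and fields are illustrative only.
    val payload =
      """{"event": {"machine_id": "G-1042", "coin_in": 125.0}, "sourcetype": "game_metrics"}"""

    // HEC listens on port 8088 by default and authenticates with "Splunk <token>".
    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://splunk.example.com:8088/services/collector/event"))
      .header("Authorization", "Splunk 00000000-0000-0000-0000-000000000000")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(payload))
      .build()

    val response = client.send(request, HttpResponse.BodyHandlers.ofString())
    println(s"HEC response: ${response.statusCode()} ${response.body()}")
  }
}
```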

Environment: AWS EMR development on Hadoop for data analysis and metadata storage.

Confidential, Washington, D.C.

Responsibilities:

  • Architecture and deployment on the AWS EC2 cloud supporting Hortonworks 2.3 and ETL staging for the US DOL. Developed the statement of work, RFI, RFP, and POC for the new Hortonworks Hadoop stack installed and configured on AWS EC2. Developed project plans and timelines and the AWS EC2 architecture supporting Hortonworks Hadoop stack 2.3. Installed AWS EC2 server instances (4 servers) on Linux Red Hat 7 clusters, with security provided by 128-bit key access to a Linux EC2 gateway server. Developed Kafka topics to route data from various sources via the DOL website.
  • Informatica ETL server for database staging of data migrated to Hive, HBase, and Accumulo databases running on the HDP 2.3 stack. Developed ETL processes with Talend, Sqoop, and Oozie. Developed data mining and metrics with the ELK Stack (Elasticsearch, Logstash, and Kibana). Connected Tableau and SAP BO 4.2 BI tools to Hive via JDBC and developed metrics reports (a Hive JDBC sketch follows this list). Hadoop admin work on the HDP 2.3 stack, with optimization tuning of HDFS clusters and of Ambari, YARN 2, MapReduce 2, ZooKeeper, Atlas, and Falcon servers.
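
A minimal sketch of the Hive-over-JDBC access pattern mentioned above (the same HiveServer2 interface that Tableau and SAP BO connect through); the host, credentials, and table name are placeholders.

```scala
import java.sql.DriverManager

object HiveJdbcSketch {
  def main(args: Array[String]): Unit = {
    // HiveServer2 listens on port 10000 by default; the database and table are placeholders.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hive-host.example.com:10000/default", "hive", "")
    try {
      val stmt = conn.createStatement()
      val rs = stmt.executeQuery(
        "SELECT category, COUNT(*) AS cnt FROM staged_claims GROUP BY category")
      while (rs.next()) {
        println(s"${rs.getString("category")}\t${rs.getLong("cnt")}")
      }
    } finally {
      conn.close()
    }
  }
}
```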

Confidential, Atlanta GA

Responsibilities:

  • DataStax Cassandra implementation with Kafka and Spark streaming financial data, organized by various values and stored as key-value pair indexes for risk analysts' investment stock planning; predictive analysis metrics and pre-categorized data enable fast analysis with no ETL required (a key-value table sketch follows this list). Hadoop Hive optimization and Cloudera Manager administration tasks.
  • CPU core sizing, memory sizing, and network I/O bandwidth managed for rack awareness. Downstream data analysis is done in Hive, with NoSQL queries retrieving data from Cassandra. The Hue interface was used to test and run Hive jobs, which were then deployed on Oozie and Sqoop. Stream processing with Apache Spark: a distributed real-time computation system for processing fast, large streams of data. Apache Storm adds reliable real-time data processing capabilities to HDP 2.1, helping capture new business opportunities with low-latency dashboards, security alerts, and operational enhancements.
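
A minimal sketch of the key-value style Cassandra table design described above, using the DataStax Java driver from Scala; the contact point, keyspace, and column names are placeholder assumptions (the keyspace is assumed to already exist).

```scala
import java.net.InetSocketAddress
import com.datastax.oss.driver.api.core.CqlSession

object CassandraKvSketch {
  def main(args: Array[String]): Unit = {
    val session = CqlSession.builder()
      .addContactPoint(new InetSocketAddress("127.0.0.1", 9042)) // placeholder node
      .withLocalDatacenter("dc1")
      .build()
    try {
      // Partition by instrument, cluster by time descending for fast "latest value" reads.
      session.execute(
        """CREATE TABLE IF NOT EXISTS risk.ticks (
          |  symbol text,
          |  ts timestamp,
          |  price double,
          |  PRIMARY KEY ((symbol), ts)
          |) WITH CLUSTERING ORDER BY (ts DESC)""".stripMargin)

      session.execute(
        "INSERT INTO risk.ticks (symbol, ts, price) VALUES ('ACME', toTimestamp(now()), 101.25)")

      val row = session.execute(
        "SELECT price FROM risk.ticks WHERE symbol = 'ACME' LIMIT 1").one()
      println(s"latest price: ${row.getDouble("price")}")
    } finally {
      session.close()
    }
  }
}
```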

Confidential, San Jose CA

Responsibilities:

  • Implemented SSL-secured data streams via Kafka and AWS SSL (an SSL producer sketch follows this list). Implemented the Hortonworks Ranger and Kerberos security model for HDFS clusters, covering data transfer and database access architecture. Ambari performance and administration advisory: the Hive/Hadoop design criterion is to pull research data based on extended data-mapping attributes added when data is stored and synchronized by Falcon and ZooKeeper in HDFS clusters. Advanced metadata processes are managed synchronously in Hive (Stinger). This new MDM design includes implementation of Falcon data governance via ZooKeeper regional data cluster organization, managed by
  • Confidential 2.2. The future state includes machine learning to categorize data automatically, more quickly, and with multiple added data attributes; the end goal is to make data research algorithms much faster than currently available processes for use with Revolution R data mining. Unstructured data becomes highly structured and categorized across a multitude of definitions, which shortens cluster query times. Data governance with Apache Falcon: a framework for simplifying data management and pipeline processing in Apache Hadoop that lets users automate the movement and processing of datasets for ingest, pipeline, disaster recovery, and data retention use cases; instead of hard-coding complex dataset and pipeline processing logic, users can rely on Apache Falcon for these functions. Operations with Apache Ambari: HDP 2.3 includes the latest version of Apache Ambari, which now supports Apache Storm, Apache Falcon, and Apache Hive and provides extensibility, rolling restarts, and other significant operational improvements.
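
A minimal sketch of a Kafka producer configured for SSL transport encryption, in line with the secure-stream work above; the broker address, topic, and truststore path and password are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.config.SslConfigs

object SslProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9093") // SSL listener
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")

    // SSL settings encrypt traffic between the producer and the brokers.
    props.put("security.protocol", "SSL")
    props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/kafka/ssl/client.truststore.jks")
    props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "changeit")

    val producer = new KafkaProducer[String, String](props)
    try {
      producer.send(new ProducerRecord("secure.events", "sensor-17", """{"reading": 42.0}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```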

Confidential, Phoenix AZ and Des Moines IA

ELT and Reporting Development

Responsibilities:

  • Data marts, BO Universe design, and data modeling for 4 sub-projects. Developed ELT models with ETL.
  • Informatica big data tools used with the Greenplum DW, Teradata DW, and Hadoop; developed data merge (CDC) delta and data stream integration processes (a delta-merge sketch follows this list). Developed ETL jobs and real-time data flows. Defined source-to-target mappings of data tables and joins for aggregate data structures.
  • Developed reporting database schemas (star and other schemas) for ODS, CRM, and operational data applications supporting the mortgage bank's DB systems (Core) and other financial operations in the bank.
  • Provided gap analysis of data messaging and data storage for the WF DEAL/CORE mortgage system, including complete architecture documentation, analysis, and recommendations for an advanced design supporting back-end data fault tolerance and transaction monitoring through the various complex routing processes.
  • Analysis of best practices for COBIT IT audit standards, risk analysis, and application of design patterns. BI reporting performance metrics collected in a data mart from monitoring tools, with report development.
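
A minimal sketch of the CDC delta-merge pattern referenced above, shown here in Spark rather than Informatica: union the current target with the incoming delta and keep the latest version of each key. The file paths and the customer_id/updated_at columns are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object DeltaMergeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cdc-delta-merge").getOrCreate()

    val target = spark.read.parquet("/warehouse/customers")       // current snapshot
    val delta  = spark.read.parquet("/staging/customers_delta")   // changed rows

    // Rank rows per business key by recency and keep only the newest one.
    val latestPerKey = Window.partitionBy("customer_id").orderBy(col("updated_at").desc)
    val merged = target.unionByName(delta)
      .withColumn("rn", row_number().over(latestPerKey))
      .filter(col("rn") === 1)
      .drop("rn")

    merged.write.mode("overwrite").parquet("/warehouse/customers_merged")
  }
}
```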

Confidential, Mechanicsburg, PA

Responsibilities:

  • Role: big data solutions architect. POC with a Hadoop HDFS data warehouse and related technologies, including BusinessObjects Enterprise 4, Hadoop, and BI.
  • Developed Hive and MapReduce tools to design and manage HDFS data blocks and data distribution methods. Multiple data sources (Teradata, Oracle, SAP BW, and DB2) integrated via ETL feeds into Hive databases running on the Hadoop HDFS foundation.
  • Stood up a new Hadoop HDFS cluster environment for the Navy logistics program.
  • Deployed Hive, Pig, and HCatalog tools for data management on Hadoop HDFS clusters.
  • Developed 5 data clusters (2.5 TB each) with multi-source databases.
  • Designed and tested new MapReduce processes for data analysis feeds (Sqoop and Flume) into the BI reporting tools BusinessObjects 4 and Tableau.
  • Installed and implemented Informatica MDM tools and configured them with Hadoop and the Netezza DW.
  • Designed the MDM architecture plan for the multi-source databases listed below:
  • SAP Data Services ETL import to HDFS: data sources integrated from Oracle, Netezza, and Teradata, feeding Hadoop clusters built on a Hive DW structure (a JDBC-to-Hive sketch follows this list).
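
A minimal sketch of the multi-source-to-Hive loading pattern listed above, shown with Spark and a JDBC source rather than the SAP Data Services jobs themselves; the Oracle URL, credentials, and table names are placeholders, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object JdbcToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Read a relational source over JDBC (an Oracle source, for illustration).
    val src = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL")
      .option("dbtable", "LOGISTICS.SHIPMENTS")
      .option("user", "etl_user")
      .option("password", "secret")
      .load()

    // Land it as a managed Hive table for downstream BI queries.
    src.write.mode("overwrite").saveAsTable("logistics.shipments_raw")
  }
}
```
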
Confidential, Juno Beach FL

Data Architect

Responsibilities:

  • Managed technical and business-rules issues for the big data design blueprint (Cassandra and Hive).
  • Designed, developed, and implemented Cassandra and Hive (Hive running on Hadoop).
  • Hadoop and Hive database modeling and ETL schemas developed and tested.
  • Data sizing metrics developed for Cassandra (NoSQL) analysis of the migrated OLTP database.
  • Development and administration of ETL cube processes with Oozie and Informatica Data Exchange.
  • SAS metrics ported to run on the SAP HANA appliance.
  • Cassandra sourced from multiple data schemas, such as Teradata and Oracle databases.
  • ETL processes designed and deployed for Production, QA and Development landscapes
  • MDM Architecture Deployment Plan and Blueprint recommendations (SAP MDM)

Confidential, Boise ID

Sr. DW/BI Architect

Responsibilities:

  • Designed the 2012 SAP MDM and BI strategy and architecture to move the client to big data BI analysis.
  • Developed Informatica ETL to move data into Hive and Hadoop clusters.
  • Deployed the BI/data warehouse with data archives and Teradata DW data models for BI DB design. Developed a custom application in SharePoint 2010 to capture QA data testing results from Data Services ETL tools; scripts developed in JavaScript and VBScript.
  • Developed Informatica PowerCenter and Data Exchange ETL architecture for data loads.
  • Teradata DW and data mart prototype data models.
  • Data matching against the company database and USPS to validate the new data warehouse data migration.
  • Developed BI analytics databases (multi-dimensional DBs) for distributed HDFS DB models.
  • Integration and conversion of SAS metrics to run with BusinessObjects reporting, and migration to Hadoop MapReduce. Data modeling for both Teradata and the new DW.
  • Designed new BI databases with Erwin v9 (star schema, third normal form).

Confidential, Jackson MI

Project Falcon Lead

Responsibilities:

  • Data Mapping and Blueprinting steps, develop business rules with ETL processes
  • Data cleansing and data matching with USPS to correct the company client database
  • Direct report to the PMO team for corporate data migration (Project Falcon)
  • Blueprint Roadmap for SAP MDM, BI and BW architecture
  • HANA deployment and administration and Data Steward implementation
  • Data mapping and blueprinting team effort (team leader for 10 ETL developers)
  • Design Data Migration processes with Data Services ETL tools, and Dashboard Metrics Analysis
  • Develop MDM Plan for legacy Data and move cleansed data into new MDM structure
  • Data Cleansing Processes developed with Data Services QA Module
  • Content BI reporting: SAP SCM, SRM, Vertex, MM
  • Managed a team of 6 Data Services developers; managed technical and business-rules issues

Confidential, Tempe, AZ

SAP MDM Architect

Responsibilities:

  • SAP HANA deployment and configuration, POC for dimensional data analysis (Cubes)
  • MDM architecture for DB sources (Teradata, Oracle 10g, MS SQL 2008) for HANA processing
  • BI content reporting aligned with data governance standards
  • Data migration effort to cleanse current customer data and older data against USPS
  • Architect for Security roles and SSO across SAP Landscapes
  • Configured transports; new ECC upgrade from version 5 to 6
  • ETL tools used: Informatica PowerCenter 9 and SQL Server SSIS
  • BusinessObjects 4.0 Enterprise Web-Intelligence Reports developed for POC
  • Data Services 4 and Administration

Confidential, Atlanta GA

Sr. SAP BI and BW Data Architect

Responsibilities:

  • Designed and deployed Database Cluster with Oracle 11i RAC.
  • Global BI architecture and design details confidential. Teradata Data Source and SAP BW
  • Informatica Power Center 8 used to develop ETL processes for data migrations globally.
  • Teradata 9 Data Migrations to Oracle and SAP BW
  • Informatica Power Center ETL Architect for BI data marts

Confidential

Business Objects Team Leader and Sr. BI Architect for multiple SAP BO projects

Responsibilities:

  • QA process to fix, test, and deploy the latest Voyager build 3.2
  • Architected many new BOE XI 3.1 systems and designed Universes
  • BODS-Data Services (ETL) used for data migration schemes
  • BusinessObjects Universe Design and Report Design/ Development
  • Integration of BW data sources (InfoSets) with the EDW Teradata database
  • Implemented Informatica PowerCenter 8; Teradata ETL and data warehouse models, and more
