
Graph Data Architect Resume


SUMMARY

  • 8 years of IT experience in the analysis, design, implementation, and administration of Business Intelligence solutions (SSIS/SSRS), DataStage 9.1, Netezza TwinFin/Mustang, Linux (RHEL, SUSE), Excel VBA macros, SQL (MS T-SQL, NZSQL, PL/SQL), and Big Data (Hadoop/HaaS, Spark, Hive, Sqoop) across development, testing, and production environments.
  • Expert database designer, fluent in all aspects of database development including E-R modeling, dimensional modeling, and Star/Snowflake schema normalization and optimization using the Kimball methodology.
  • Experience with Semantic Web (OWL/RDF) based model-driven development.
  • Strong grasp of data warehousing strategies including ETL, ELT, and SCD.
  • Experienced in Big Data / Hadoop ecosystem data transfer, CDC, and data transformations.
  • Extensive experience developing stored procedures, UDFs, views, CTEs, triggers, cursors, and complex queries in T-SQL, Netezza NZPLSQL, and Oracle PL/SQL.
  • Solid understanding of IBM Netezza, MS SQL Server, Hadoop, Hive, Spark, and NoSQL in development and testing environments.
  • Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive, NoSQL, and SQL/Netezza.
  • Performance tuning and optimization using SQL Profiler, Performance Monitor, and the Query Optimizer for OLTP, Hive logs for NoSQL workloads, and toDebugString for Spark RDD lineage.
  • Good understanding of Netezza database backup/restore and of investigating and applying necessary Linux server upgrades, hotfixes, and security patches.
  • Hands-on experience with Netezza: NZSQL, NZLOAD, external tables, workload management, and resource allocation.
  • Experience migrating data with Sqoop between HDFS and relational databases (Netezza) in both directions, per client requirements.
  • Experienced in optimizing existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Experienced in performance tuning of Spark jobs by setting the batch interval, choosing the correct level of parallelism, and tuning memory.
  • Experienced in performance tuning at the Spark context level by specifying the number of executors, cores per executor, and memory per executor (see the configuration sketch after this list).
  • Highly proficient in the design and development of DataStage jobs and SSIS packages.
  • Experienced in linking servers with Linux mount services across platforms so DataStage jobs can perform required file transfers and use shared mounts between Linux-to-Linux or Windows-to-Linux servers.
  • Experienced in writing scripts in Bash, Python, Scala, SQL, VBA macros, and BASIC, with working exposure to C# and Java.
  • Extensive experience in requirements analysis, workflow analysis, design, development and implementation, and testing and deployment across the complete software development life cycle (SDLC) for desktop applications and Microsoft web and client/server technologies.
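For reference, a minimal sketch of this executor-level tuning in Scala; the setting values shown are illustrative placeholders, not a recommendation from any specific project.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch of executor-level Spark tuning: executor count, cores per
// executor, memory per executor, and default parallelism set on the SparkConf.
// All values here are illustrative placeholders.
object TunedContext {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("tuned-batch-job")
      .set("spark.executor.instances", "8")   // number of executors
      .set("spark.executor.cores", "4")       // cores per executor
      .set("spark.executor.memory", "6g")     // memory per executor
      .set("spark.default.parallelism", "64") // parallelism for shuffles and RDD operations

    val sc = new SparkContext(conf)
    // ... job logic would go here ...
    sc.stop()
  }
}
```

The same settings are commonly supplied on the spark-submit command line (--num-executors, --executor-cores, --executor-memory) instead of being hard-coded.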

TECHNICAL SKILLS

  • BigData Architecture
  • NETEZZA
  • Hadoop/HaaS
  • SPARK
  • Hive
  • Sqoop
  • MS SQL Server
  • MDM
  • DQS
  • IBM DATASTAGE 8/9
  • TOAD
  • ODBC Admin Management
  • ERWIN Platinum
  • MS Visio
  • ER-Designer
  • T-SQL
  • VB
  • ADO.NET
  • HTML
  • XML
  • Scala
  • LINUX bash shell scripting

PROFESSIONAL EXPERIENCE

Confidential

Graph Data Architect

Responsibilities:

  • Architected and built RDF data models in Turtle (.ttl) format.
  • Built a Scala API providing back-end support for the graph database user interface.
  • Worked on the back end in Scala 2.12.0, implementing HTTP client logic to publish RDF triples (see the sketch after this list).
  • Developed RDF / Semantic Web models on the company's network to define categories of data.
  • Built conceptual RDF models representing the structure of subject-predicate-object (SPO) relations.
  • Built APIs in the credit derivatives space for reference data (client and product) using distributed frameworks (HBase, Hadoop, Spark).
  • Coded a Scala API to insert and delete predicates in the graph database after transforming and mapping incoming data.
  • Integrated a front-to-back sales/trading system with the position, pricing, and risk management system for all financial instruments using Python.
  • Created a Jenkins CI/CD pipeline driven by Git for the Scala and RDF code.
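For illustration, a minimal sketch in Scala of publishing a Turtle payload over HTTP; the endpoint URL, repository name, and the triple itself are hypothetical, since the actual triple store is not named here.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// A minimal sketch of HTTP-client logic that POSTs RDF triples in Turtle format
// to a SPARQL Graph Store endpoint. The endpoint and vocabulary are hypothetical.
object PublishTriples {
  def main(args: Array[String]): Unit = {
    val endpoint = new URL("http://graph-store.example.com/repositories/refdata/statements")

    // One illustrative subject-predicate-object triple, serialized as Turtle.
    val turtle =
      """@prefix ex: <http://example.com/refdata#> .
        |ex:client42 ex:hasProduct ex:creditDefaultSwap .
        |""".stripMargin

    val conn = endpoint.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "text/turtle")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    try out.write(turtle.getBytes(StandardCharsets.UTF_8)) finally out.close()

    println(s"Triple store responded with HTTP ${conn.getResponseCode}")
    conn.disconnect()
  }
}
```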

Confidential

Big Data Technical Architect

Responsibilities:

  • Created Spark Scala scripts for Type 2 CDC (a sketch follows this list).
  • Replaced the data warehouse ETL process with open-source Apache Spark.
  • Created Hadoop processes and Hive/Impala tables.
  • Created Spark DataFrame applications plus Linux shell and Hadoop cluster maintenance scripts.
  • Consolidated the Hadoop cluster and replaced DW ETL processes with Apache Spark and Impala.
  • Created a POC migrating Hive HQL to Spark RDDs (16 nodes, Spark 1.6) and converted Hive HQL queries to Spark HiveContext processes.
  • Reduced data volumes through schema design for data sets on Spark RDDs, Spark SQL, and Hive on Tez.
  • Created a Spark DataFrame application to read from HDFS and analyzed 10 TB of data on YARN to measure performance.
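A minimal sketch of the Type 2 CDC merge logic in Spark Scala follows. It assumes a Spark 2.3+ session with Hive support, and the table and column names (dw.customer_dim, staging.customer_updates, cust_id, address) are illustrative stand-ins rather than the actual warehouse schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// A sketch of a Type 2 CDC (SCD2) merge with Spark DataFrames.
// Assumptions: Spark 2.3+ (for unionByName), Hive-backed tables, and an updates
// feed carrying the same business columns as the dimension (cust_id, address).
object Scd2Merge {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("scd2-cdc").enableHiveSupport().getOrCreate()
    import spark.implicits._

    val current = spark.table("dw.customer_dim")          // existing dimension rows
    val updates = spark.table("staging.customer_updates") // incoming CDC batch

    // Current rows whose tracked attribute changed: these versions get closed out.
    val toExpire = current.as("c")
      .join(updates.as("u"), $"c.cust_id" === $"u.cust_id")
      .where($"c.is_current" === true && $"c.address" =!= $"u.address")
      .select("c.*")

    val expired = toExpire
      .withColumn("is_current", lit(false))
      .withColumn("end_date", current_date())

    // Keys whose current version changed, plus brand-new keys, get a fresh open row.
    val changedOrNew = updates.as("u")
      .join(current.where($"is_current" === true).as("c"),
            $"u.cust_id" === $"c.cust_id", "left_outer")
      .where($"c.cust_id".isNull || $"c.address" =!= $"u.address")
      .select("u.*")

    val opened = changedOrNew
      .withColumn("is_current", lit(true))
      .withColumn("start_date", current_date())
      .withColumn("end_date", lit(null).cast("date"))

    // Everything not being closed (history plus untouched current rows) is kept as-is.
    val unchanged = current.except(toExpire)

    unchanged.unionByName(expired).unionByName(opened)
      .write.mode("overwrite").saveAsTable("dw.customer_dim_new")
  }
}
```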

Confidential

Cloudera Hadoop Admin/Architect

Responsibilities:

  • Deployed a CDH Hadoop cluster and all relevant Cloudera applications.
  • Developed a streaming Flume -> Spark -> HBase -> Solr application (a sketch of the Spark Streaming stage follows this list).
  • Backed up and snapshotted Linux server nodes as well as the CDH cluster.
  • Created and configured the NRT Lily HBase Indexer feeding Solr and a Banana dashboard.
  • Architected the data lake with landing, raw, curated, and consumption zones.
  • Handled all other Hadoop administration responsibilities.
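A minimal sketch of the Spark Streaming stage of that pipeline, assuming the legacy spark-streaming-flume connector and an HBase table named "events" with a "d" column family (both illustrative); Solr indexing is left to the Lily HBase Indexer downstream and is not shown.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Flume -> Spark Streaming -> HBase leg of the pipeline (sketch only).
// Host, port, table name, column family, and row key scheme are illustrative.
object FlumeToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("flume-to-hbase"), Seconds(10))

    // Receive Avro events pushed by a Flume avro sink on this host/port.
    val stream = FlumeUtils.createStream(ssc, "spark-receiver-host", 41414)

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { events =>
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("events"))
        events.foreach { e =>
          val body = e.event.getBody.array()                              // raw Flume event body
          val put  = new Put(Bytes.toBytes(System.nanoTime().toString))   // illustrative row key
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), body)
          table.put(put)
        }
        table.close(); conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```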

Confidential

Cloudera Hadoop Architect

Responsibilities:

  • Implemented Big Data solutions and analyzed Hadoop requirements to determine the best design approach.
  • Developed and managed the CDH Hadoop logical and software architecture.
  • Designed, optimized, and executed Impala SQL for data analytics systems (see the query sketch after this list).
  • Reviewed administrator processes and updated system configuration documentation.
  • Maintained Hadoop clusters, created solutions for failures, and assisted CDH admins.
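For illustration, a minimal sketch of running an Impala query from Scala over JDBC; it assumes the Hive JDBC driver (Impala exposes a HiveServer2-compatible interface), an unsecured cluster, and an illustrative host, port, and fact table.

```scala
import java.sql.DriverManager

// Sketch of querying Impala via the Hive JDBC driver on an unsecured cluster.
// Host, port, database, and the sales_fact table are illustrative assumptions.
object ImpalaQuery {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://impalad-host:21050/default;auth=noSasl")
    try {
      val stmt = conn.createStatement()
      // Illustrative analytical query against a hypothetical fact table.
      val rs = stmt.executeQuery(
        "SELECT region, COUNT(*) AS cnt FROM sales_fact GROUP BY region ORDER BY cnt DESC LIMIT 10")
      while (rs.next()) println(s"${rs.getString("region")}\t${rs.getLong("cnt")}")
    } finally conn.close()
  }
}
```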

Confidential

Big Data Risk Analysis Consultant

Responsibilities:

  • Tested and moved Autosys jobs from the retired DB server to the new DB server, ensuring dependencies were not affected.
  • Created Autosys jobs on the new DB server using the job definitions and dependencies from the retired server.
  • Created control tables in Netezza to manage the monthly data load process end to end.
  • Added CDC load, reinstatement load, and full load options to the monthly data load.
  • Set up jobs to transfer extract files from Linux to Windows servers using NDM.
  • Created databases, tables, and views on the new server matching the schema of the old database.
  • Created .csv extract files using Netezza external tables.
  • Created Bash scripts in Linux to report on Autosys batch and box jobs.
  • Fixed view-definition bugs in Netezza NZSQL.
  • Optimized query performance on the newly consolidated Netezza server using Aginity, SPU utilization, and query plans.
  • Created Excel VBA macros using ActiveX and Form controls.
  • Created a VBA script to modify DSN connections so that Pivot Table sources (via schema.ini) point to the Netezza-generated .csv extract files.
  • Wrote Excel VBA macros to change the DSN connection string of Pivot Tables.
  • Created Excel reports using Pivot Tables sourced directly from Netezza.
  • Created Excel reports using Pivot Tables sourced from the .csv files created in Netezza.
  • Created an identical data mart and built its process flow from Netezza into Hadoop.
  • Created data marts in Hadoop Hive with CDC and reinstatements added from Netezza.
  • Imported data from Netezza to HDFS using Sqoop and aggregated it with Hive on YARN.
  • Set up an Impala database and connected it to a Java web server REST API, with servlets backed by POJO and DAO libraries.
  • Worked on a live 120-node Hadoop cluster running CDH 4.4, with 30 TB of data, 240 TB of capacity, and a replication factor of 3.
  • Imported, exported, analyzed, and transformed data into HDFS and Hive (version 0.1) using Sqoop, and created HQL scripts.
  • Migrated the Hadoop cluster of 120 edge nodes to another shared cluster (HaaS, Hadoop as a Service) and set up the DEV, SIT, and UAT environments from scratch.
  • Spark: created a POC migrating Hive HQL to Spark RDDs (16 nodes, Spark 1.6) and converted Hive HQL queries to Spark HiveContext processes (see the sketch after this list).
  • Reduced data volumes through schema design for data sets on Spark RDDs, Spark SQL, Hive, Hive on Tez, and Impala.
  • Created a Spark DataFrame application to read from HDFS and analyzed 10 TB of data on YARN to measure performance.
  • Used partitioning and bucketing in Hive; designed, managed, and created external Hive tables for performance optimization.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the Netezza OLTP system through Sqoop.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Tuned the Spark context by specifying the number of executors, cores per executor, and memory per executor.
  • Responsible for meeting all SLAs and ensuring Hadoop jobs ran on time.
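A minimal sketch of that HQL-to-HiveContext conversion follows, using the Spark 1.6-era API; the database, table, and column names (risk_mart.positions, portfolio_id, exposure) are illustrative stand-ins for the actual monthly-load data marts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch of running an existing Hive HQL aggregation through Spark's HiveContext
// (Spark 1.6). Database, table, and column names are illustrative assumptions.
object HqlOnSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hql-on-spark"))
    val hiveContext = new HiveContext(sc)

    // The original HQL aggregation runs unchanged against the Hive metastore ...
    val monthly = hiveContext.sql(
      """SELECT portfolio_id, load_month, SUM(exposure) AS total_exposure
        |FROM risk_mart.positions
        |WHERE load_month = '2016-01'
        |GROUP BY portfolio_id, load_month""".stripMargin)

    // ... and the result is persisted back as a Hive table for downstream reports.
    monthly.write.mode("overwrite").saveAsTable("risk_mart.monthly_exposure")
  }
}
```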

Tools: Netezza Striper, Spark, Impala, Hadoop YARN, Hive, Sqoop, Autosys, Scala, Linux batch scripts, Excel VBA macros, Excel Pivot Tables, SVN, ODBC/DSN connection manager

Confidential, St. Louis, MO

BigData Mortgage Risk Analyst

Responsibilities:

  • Involved in gathering user requirements, application design, analysis, and testing (debugging), as well as the hardware requirements needed to meet business needs.
  • Reverse-engineered DB schema and staging-area models from an existing SQL Server 2012 database, then modified and configured them for a SQL Server 2014 database using VS 2010.
  • Used VS 2010, normalization, dimensional modeling, and Enterprise Manager for logical and physical database development.
  • Created a new OLAP database and data marts on Netezza TwinFin for analytic report end users.
  • Created a new OLTP database on MS SQL Server 2014 in normalized form for upsert transactions.
  • Created new database objects per business logic: tables, stored procedures, functions, and materialized views on Netezza; cursors, tables, and views on MS SQL Server.
  • Developed and maintained Netezza NZSQL code and designed Netezza destination tables.
  • Created SSIS packages to load files into staging tables in the Netezza DW using the Bulk Copy Program (BCP) and NZLOAD.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem: UDF/MapReduce jobs and SFTP transfers to send and receive files from various upstream and downstream systems.
  • Optimized Hive scripts to use HDFS efficiently via various compression mechanisms.
  • Created Hive schemas using performance techniques such as partitioning and bucketing, and implemented all VAP processing in Hive tables (see the DDL sketch after this list).
  • Wrote Sqoop scripts to import and export Hive table data to and from various RDBMS systems.
  • Wrote Pig scripts to process unstructured data and make it available for processing in Hive.
  • Developed Unix shell scripts to automate data load processes into the target Netezza DB.
  • Transformed bulk data from two flat-file sources into the Netezza DW using external tables.
  • Developed SSIS packages to import data from flat files, Oracle, and SQL Server into the new OLTP DB in SQL Server, and carried the OLTP data into Netezza with other newly created SSIS packages.
  • Created SSIS packages to clean, transform, and move data using Lookup, Conditional Split, Foreach Loop, Multicast, Data Conversion, Derived Column, Script Component, and other data transformations.
  • Created VBA programs to automatically update Excel workbooks, encompassing class and standard modules and external data queries.
  • Extensively used Excel functions in development, focusing on read/write integration with databases.
  • Interacted with risk managers and portfolio managers to gather inputs on portfolio instruments for calculating risk analysis metrics in VBA macros.
  • Created new UDFs in VBA macros to make the formulas used in report tabs dynamic.
  • Updated JIRA tickets to match evolving business requirements and assigned tickets back to developers with fix requests or approvals as reports were tested.
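For illustration, a minimal sketch of that partitioned, bucketed Hive schema design, issued through a Spark HiveContext to stay in the same Scala toolchain used elsewhere in this resume; the risk.loan_events table, its columns, the ORC format, and the bucket count are all assumptions, not the project's actual schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch of a partitioned, bucketed external Hive table created via HiveContext.
// Table name, columns, location, and bucket count are illustrative assumptions.
object HiveSchemaSetup {
  def main(args: Array[String]): Unit = {
    val hc = new HiveContext(new SparkContext(new SparkConf().setAppName("hive-schema")))

    // External table: data stays under an agreed HDFS location, partitioned by load
    // month and bucketed on loan_id so partition pruning and bucketed joins apply.
    hc.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS risk.loan_events (
        |  loan_id BIGINT,
        |  event_type STRING,
        |  balance DECIMAL(18,2)
        |)
        |PARTITIONED BY (load_month STRING)
        |CLUSTERED BY (loan_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/risk/loan_events'""".stripMargin)
  }
}
```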

Tools: SQL Server 2012/14, Netezza TwinFin, Hadoop, Hive, Sqoop, Pig, Excel VBA macros, MS VS 2010, SSIS 2012/14

Confidential, Kansas City, MO

Lead Data Warehouse Analyst/DBA/DBD

Responsibilities:

  • Prepared strategic plans for Netezza data warehousing projects and related quality documentation.
  • Migrated the production database to a Disaster Recovery (DR) unit and scheduled Linux shell scripts that automatically maintain an incremental nzrestore replica of the production database.
  • Established and maintained databases for assigned projects per business requirements.
  • Implemented procedures for daily database monitoring, backups, and configuration changes.
  • Developed queries, backups, and detailed session-management and log-management records.
  • Created new Netezza database objects such as tables, materialized views, functions, and stored procedures.
  • Migrated from Netezza Mustang to Netezza TwinFin.
  • Recreated database backup and recovery scripts using third-party devices and Netezza utilities, decreasing production downtime from 45 minutes to 8 minutes.
  • Provided admin assistance for the performance and security functions of Netezza TwinFin.
  • Created stored procedures (over 1,000 lines) that retrieve metadata, estimate values for missing data, and replace them automatically between Netezza and MS SQL Server 2012.
  • Created custom event rules and modified existing ones.
  • Controlled Netezza workload management and resource allocation for efficient resource utilization and user service.
  • Tracked and managed DataStage server performance and usage.
  • Good working knowledge of DataStage server administration details.
  • Debugged and resolved connectivity issues between the InfoSphere server and source/destination databases.
  • Designed and scheduled daily DataStage project backups (.dsx) with Linux shell scripting.
  • Designed ETL processes in DataStage to load data from SQL Server 2012 and Netezza.
  • Designed and developed DataStage mappings enabling the extraction, transport, and loading of data into target tables.
  • Used DataStage stages such as Data Set, Sort, Lookup, Peek, Standardization, Row Generator, Remove Duplicates, Filter, External Filter, Aggregator, Funnel, Modify, and Column Export in the ETL coding.
  • Tuned DataStage jobs to enhance their performance.
  • Used DataStage Director and its run-time engine to schedule the solution, test and debug its components, and monitor the resulting executable versions.
  • Wrote release notes and deployment documents, and scheduled jobs via the AS/400 ROBOT scheduler.
  • Used DataStage Parallel Extender stages such as Data Set, Sort, Lookup, Change Capture, Funnel, Peek, and Row Generator in the ETL coding.
  • Designed and developed the jobs for extracting, transforming, integrating, and loading data using DataStage Designer.
  • Developed job sequencers with proper job dependencies, job-control stages, and triggers.
  • Used the DataStage Director and its run-time engine to monitor and troubleshoot running jobs.
