
Sr. Big Data Architect Resume


Kansas City, MO

SUMMARY:

  • 10+ years of experience across the SDLC with key emphasis on Big Data technologies: Spark, Scala, Spark MLlib, Hadoop, Tableau, and Cassandra.
  • Good knowledge of Big Data and data warehouse architecture, including designing star schemas, snowflake schemas, fact and dimension tables, and physical and logical data models using Erwin and ER Studio.
  • Architected, designed, and developed a Big Data solutions practice, including setting up the Big Data roadmap and building the supporting infrastructure and team.
  • Architected and implemented a portfolio recommendation analytics engine using Hadoop MapReduce, Oozie, Spark SQL, Spark MLlib, and Cassandra.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Extensive experience in data modeling, data architecture, solution architecture, data warehousing and business intelligence concepts, and master data management (MDM).
  • Expertise in architecting Big Data solutions covering data ingestion and data storage.
  • Experienced in working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
  • Extensive knowledge of architecting Extract, Transform, Load (ETL) environments using Informatica PowerCenter.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience in integrating various data source definitions such as SQL Server, Oracle, Sybase, ODBC connectors, and flat files.
  • Experience in handling huge volumes of data moving in and out of Teradata and Big Data platforms.
  • Experience in developing Big Data projects using open source tools and technologies such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
  • Architected, solutioned, and modeled DI (Data Integrity) platforms using Sqoop, Flume, Kafka, Spark Streaming, Spark MLlib, and Cassandra.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Strong expertise with Amazon AWS EC2, DynamoDB, S3, Kinesis, and other AWS services.
  • Expertise in data analysis, design, and modeling using tools like Erwin.
  • Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • Extensive experience in using Teradata BTEQ, FLOAD, MLOAD, and FASTEXPORT utilities.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Experienced in using various Hadoop ecosystem components such as MapReduce, Hive, Sqoop, and Oozie.
  • Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
  • Experienced in validating data in HDFS and Hive for each data transaction.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Strong Experience in working with Databases like Oracle 12C/11g/10g/9i, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
  • Experienced in using database tools like SQL Navigator, TOAD.
  • Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark, including SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a representative sketch follows this list).
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Experienced in using Flume to transfer log data files to Hadoop Distributed File System (HDFS)
  • Knowledge and experience in job work-flow scheduling and monitoring tools like Oozie and Zookeeper.
  • Good experience in Shell programming.
  • Knowledge of configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Knowledge and experience of the architecture and functionality of NoSQL databases such as Cassandra and MongoDB.
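
A minimal, illustrative Spark SQL/DataFrame sketch of the kind of work referenced above. The file path, column names, and aggregation are hypothetical examples only, not details from an actual engagement:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PortfolioAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PortfolioAggregation")
      .getOrCreate()

    // Hypothetical input: position records exported to HDFS as CSV
    val positions = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/portfolio/positions.csv")

    // Aggregate market value per account with the DataFrame API
    val byAccount = positions
      .groupBy("account_id")
      .agg(sum("market_value").as("total_value"),
           count(lit(1)).as("position_count"))

    // Expose the result to Spark SQL for downstream queries
    byAccount.createOrReplaceTempView("account_totals")
    spark.sql("SELECT * FROM account_totals ORDER BY total_value DESC LIMIT 20").show()

    spark.stop()
  }
}
```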

TECHNICAL SKILLS:

Hadoop/Big Data: Sqoop, Oozie, Flume, Scala, Akka, Kafka, Storm, MapReduce, HDFS, Hive, Pig, HBase, Zookeeper.

Data Modeling Tools: Erwin R6/R9, Rational System Architect, IBM Infosphere Data Architect, ER Studio and Oracle Designer.

NoSQL Databases: Cassandra, MongoDB, DynamoDB

Database Tools: Microsoft SQL Server 12.0, Teradata 15.0, Oracle 12c/11g/9i and MS Access

Frameworks: MVC, Struts, Spring, Hibernate.

Operating Systems: UNIX, Linux, Windows, Centos, Sun Solaris.

Databases: Oracle 12c/11g/10g/9i, Microsoft Access, MS SQL

Languages: PL/SQL, Pig Latin, HQL, R, Python, XPath, Spark

ETL/Data Warehouse Tools: Informatica 9.6/9.1/8.6.1/8.1, SAP Business Objects XI R3.1/XI R2.

Tools: OBIEE 10g/11g, SAP ECC6 EHP5, GoToMeeting, DocuSign, InsideSales.com, SharePoint, MATLAB.

PROFESSIONAL EXPERIENCE:

Confidential, Kansas City, MO

Sr. Big Data Architect

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
  • Implemented a Big Data ecosystem (Hive, Impala, Sqoop, Flume, Spark, Lambda) with a cloud architecture.
  • Designed and deployed full SDLC of AWS Hadoop cluster based on client's business need. 
  • Experience in BI reporting with AtScale OLAP for Big Data.
  • Developed Complex ETL code through Data manager to design BI related Cubes for data analysis at corporate level. 
  • Built a unified data lake architecture integrating various data sources on Hadoop.
  • Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS).
  • Involved in loading and transforming large sets of data and analyzed them by running Hive queries and Pig scripts.
  • Developed software routines in R, Spark, and SQL to automate calculation and aggregation over large datasets.
  • Integrated NoSQL databases such as HBase with MapReduce to bulk-load data into HBase.
  • Redesigned the existing Informatica ETL mappings & workflows using Spark SQL. 
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts. 
  • Ingested data into Hadoop/Hive/HDFS from different data sources.
  • Wrote Scala code to run Spark jobs on the Hadoop/HDFS cluster.
  • Defined and managed the architecture and life cycle of Hadoop and Spark projects.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning (a representative sketch follows this list).
  • Designed the data processing approach within Hadoop using Pig.
  • Identify query duplication, complexity and dependency to minimize migration efforts 
  • Technology stack: Oracle, Hortonworks HDP cluster, Attunity Visibility, Cloudera Navigator Optimizer, AWS Cloud and Dynamo DB.
  • Experience in AWS, implementing solutions using services such as EC2, S3, RDS, Redshift, and VPC.
  • Worked as a Hadoop consultant on MapReduce, Pig, Hive, and Sqoop.
  • Worked with Spark and Python.
  • Worked using Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, Pig, and Map Reduce.
  • Led architecture and design of data processing, warehousing, and analytics initiatives.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support at-rest encryption at the time.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
  • Performed data profiling and transformation on the raw data using Pig, Python.
  • Experienced with batch processing of data sources using Apache Spark.
  • Developed predictive analytics using Apache Spark's Scala APIs.
  • Involved in big data analysis using Pig and user-defined functions (UDFs).
  • Created Hive External tables and loaded the data into tables and query data using HQL.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers. 
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Developed Spark streaming application to pull data from cloud to Hive table.
  • Used Spark SQL to process huge amounts of structured data.
  • Assigned names to columns using case classes in Scala.
  • Implemented Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Experience in different Hadoop distributions such as Cloudera (CDH3 and CDH4), Hortonworks Data Platform (HDP), and MapR.
  • Experience in integrating Oozie logs into a Kibana dashboard.
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Expert in writing business analytics scripts using Hive SQL.
  • Implemented continuous integration & deployment (CICD) through Jenkins for Hadoop jobs. 
  • Wrote Hadoop jobs to analyze data using Hive and Pig, accessing text files, sequence files, and Parquet files.
  • Supported the daily/weekly ETL batches in the Production environment .
  • Implemented a proof of concept deploying this product on Amazon Web Services (AWS).
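
As a rough illustration of the Kafka-to-Hive streaming ETL pattern referenced above, here is a minimal Structured Streaming sketch. The broker address, topic name, JSON schema, and HDFS paths are hypothetical placeholders; the production application may have used a different Spark streaming API and sink, and this job assumes the spark-sql-kafka connector is on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object ClickstreamEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClickstreamEtl")
      .getOrCreate()

    // Hypothetical schema for JSON messages on the Kafka topic
    val schema = new StructType()
      .add("user_id", StringType)
      .add("event_type", StringType)
      .add("event_ts", TimestampType)

    // Read the stream from Kafka and parse the JSON payload
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // Write micro-batches as Parquet into a directory that can back a Hive external table
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///warehouse/clickstream/events")
      .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
      .start()

    query.awaitTermination()
  }
}
```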

Environment: Big Data, Informatica, Sybase, Spark, YARN, Hive, Pig, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, Redshift, NoSQL, machine learning, Sqoop, MySQL.

Confidential, Auburn Hills, MI

Big Data Architect

Responsibilities:

  • Architected, managed, and delivered technical projects and products for various business groups.
  • Loaded all data from our relational databases into Hive using Sqoop; we also received four flat files from different vendors, each in a different format (text, EDI, and XML).
  • Architected all ETL data loads coming in from the source systems and loading into the data warehouse.
  • Ingested data into Hadoop/Hive/HDFS from different data sources.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
  • Experienced in working with Apache Storm. 
  • Implemented all the data quality rules in Informatica data quality. 
  • Involved in Oracle PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Migrated a large volume (petabytes) of data warehouse data to HDFS.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Experience in data cleansing and data mining. 
  • Designed AWS architecture and cloud migration covering AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
  • Worked with tools such as Flume, Storm, and Spark.
  • Built proofs of concept to determine feasibility and evaluate Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
  • Involved in migration of data from existing RDBMS (oracle and SQL server) to Hadoop using Sqoop for processing data.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Designed the Redshift data model and performed Redshift performance improvements and analysis.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Developed Spark jobs to transform data in HDFS (a representative sketch follows this list).
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Worked across AWS cloud and on-premise environments, handling infrastructure provisioning and configuration.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
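
A minimal sketch of the kind of Spark transformation job over HDFS data referenced above. The paths, column names, and partitioning choice are hypothetical illustrations, not details from the actual project:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyTransform {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailyTransform")
      .getOrCreate()

    // Hypothetical source: raw delimited files landed in HDFS by Sqoop/Flume
    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///landing/orders/")

    // Standardize types, drop bad records, and add a load date for partitioning
    val cleaned = raw
      .withColumn("order_amount", col("order_amount").cast("decimal(18,2)"))
      .filter(col("order_id").isNotNull)
      .withColumn("load_date", current_date())

    // Write back to HDFS as partitioned Parquet for downstream Hive queries
    cleaned.write
      .mode("overwrite")
      .partitionBy("load_date")
      .parquet("hdfs:///curated/orders/")

    spark.stop()
  }
}
```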

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Cloudera, AWS, AWS EMR, Redshift, Flume, Apache Hadoop, Informatica Data Quality, Informatica Metadata Manager, HDFS, Hive, MapReduce, Cassandra, Zookeeper, MySQL, Eclipse, DynamoDB, PL/SQL and Python.

Confidential, Chicago, IL

Sr. Data Engineer

Responsibilities:

  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data. 
  • Developed a data mart for the base data in star schema and snowflake schema, and was involved in developing the data warehouse for the database.
  • Worked on Unit Testing for three reports and created SQL Test Scripts for each report as required 
  • Extensively used Erwin as the main tool for modeling along with Visio.
  • Used R machine learning packages to predict the performance of certain samples.
  • Worked on HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Designed and Modified Database tables and used HBASE Queries to insert and fetch data from tables.
  • Involved in forecast based on the present results and insights derived from Data analysis.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (a representative sketch follows this list).
  • Worked on the Metadata Repository (MRM) to keep definitions and mapping rules up to the mark.
  • Trained a couple of colleagues on the Spotfire tool and gave guidance on creating Spotfire visualizations.
  • Integrated NoSQL databases such as HBase with MapReduce to bulk-load data into HBase.
  • Develop complex ETL mappings using Informatica Power Center and sessions using Informatica workflow manager.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting. 
  • Experienced with Pig Latin operations and writing Pig UDF's to perform analytics.
  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
  • Worked on Programming using PL/SQL, Stored Procedures, Functions, Packages, Database triggers for Oracle.
  • Designed various practical data visualizations, charts, dashboards, prototypes, and demos, and published them in various Tableau workbooks for analytical projects and data visualization teams.
  • Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Configured and developed triggers, workflows, and validation rules, and was hands-on with the deployment process from one sandbox to another.
  • Created automatic field updates via workflows and triggers to satisfy internal compliance requirement of stamping certain data on a call during submission.
  • Established and maintained comprehensive data model documentation including detailed descriptions of business entities, attributes, and data relationships. 
  • Developed enhancements to Mongo DB architecture to improve performance and scalability.
  • Forward-engineered data models, reverse-engineered existing data models, and updated data models.
  • Performed data cleaning and data manipulation activities using NZSQL utility.
  • Analyzed the physical data model to understand the relationship between existing tables. Cleansed the unwanted tables and columns as per the requirements as part of the duty being a Data Analyst. 
  • Analyzed and understood the architectural design of the project in a step-by-step process along with the data flow.
  • Created DDL scripts for implementing data modeling changes; created Erwin reports in HTML and RTF formats depending on the requirement; published data models in the model mart; created naming convention files; and coordinated with DBAs to apply the data model changes.
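
A rough illustration of populating HDFS and Cassandra from Kafka, as referenced above. This sketch uses Spark's batch Kafka source and the DataStax Spark Cassandra Connector, which are assumptions about tooling rather than the exact approach used; the broker, topic, keyspace, table, and paths are hypothetical, and the connector is assumed to be on the classpath with the keyspace/table already created:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsAndCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHdfsAndCassandra")
      .getOrCreate()

    // Batch-read the topic from the earliest offset; broker and topic are placeholders
    val messages = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sensor-readings")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(key AS STRING) AS reading_id", "CAST(value AS STRING) AS payload")

    // Archive the raw payloads to HDFS
    messages.write.mode("append").parquet("hdfs:///raw/sensor_readings/")

    // Load the same records into Cassandra via the Spark Cassandra Connector
    messages.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "sensor_readings"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```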

Environment: Erwin r8.2, Oracle SQL Developer, Oracle Data Modeler, Informatica PowerCenter, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, HBase, Machine Learning, Hadoop, HDFS, MapReduce, Teradata 14, SSIS, R, Business Objects, SQL Server 2008, Windows XP, MS Excel.

Confidential

Data Analyst

Responsibilities:

  • Performed standard management duties on the SQL Server database, including creating user accounts, monitoring daily backups, and performing regular analysis of database performance to suggest improvements.
  • Assisted Kronos project team in SQL Server Reporting Services installation.
  • Developed SQL Server database to replace existing Access databases.
  • Performed testing and analysis of databases using SQL Server analysis tools.
  • Translated business requirements into working logical and physical data models for Data warehouse, Data marts and OLAP applications.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
  • Optimized existing procedures and SQL statements for better performance, using EXPLAIN PLAN, hints, and SQL TRACE to tune SQL queries.
  • Developed interfaces able to connect to multiple databases such as SQL Server and Oracle.
  • Designed and created web applications to receive query string input from customers and facilitate entering the data into SQL Server databases.
  • Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
  • Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.
  • Converted logical models into physical database models to build and generate DDL scripts.
  • Maintained warehouse metadata, naming standards and warehouse standards for future application development. 
  • Extensively used ETL to load data from DB2, Oracle databases. 
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Expertise in and hands-on work with physical, logical, and conceptual data models.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
  • Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
  • Extensively used ETL methodology for supporting data extraction, transformations and loading processing, in a complex EDW using Informatica.
  • Worked with and experienced in star schemas, DB2, and IMS DB.

Environment: ERWIN, UNIX, Oracle, PL/SQL, DB2, Teradata SQL assistant, DQ analyzer
