Big Data & Google Cloud Consultant Resume
SUMMARY:
- A hands-on Big Data & Analytics Architect with 14 years' experience and a strong skill set in Big Data, Google Cloud, Hadoop, Amazon AWS, and Microsoft Azure. A Certified Scrum Master and entrepreneur involved in the implementation of agile projects in financial services and tech start-ups, delivering high-quality software to tight deadlines. Strong team player, leader, and motivator.
- Data Consultant for Confidential working with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance
- Confidential Big Data Architect working with AWS EMR, Redshift Spectrum, Athena, Glue, Data Catalogs, Kinesis Streams, Informatica, Lambda functions, and CloudWatch
- Big Data and Analytics Architect for Confidential in New York creating multi-region Azure cloud-based data lakes using MS Azure HDInsight, SQL DW, Azure Blobs, Azure SQL, DocumentDB, HBase, Spark, Kafka, Docker, Kubernetes, PMML, and IBM BigIntegrate
- Consultant working on Confidential's data architecture for a data lake with PB+ data volume using Hortonworks, AWS, Hive, ORC files, Falcon, Cascading, Flume, Cassandra, and Spark
- Designed a data warehouse for Confidential using Amazon AWS EMR, Hadoop, Pig, Hive, Sqoop, AWS EC2, S3, Kinesis, CloudFormation, Data Pipeline, Redshift, and QlikView.
- Designed ETL flows with Oracle ODI to process up to 100 million banking transactions a day. Certified Scrum Master, MSc in Information Systems and Oracle Certified Professional.
COMPUTER PROFICIENCY:
- Hadoop, Map-Reduce, Flume, Pig, Spark
- Docker, Kubernetes, Azure Container Services
- Hadoop YARN, Cascading, Ambari, Hue
- Azure Data Factory, Polybase, AWS Athena
- Java, PMML, Python, AzureML
- Zementis ADAPA, ODG FastScore
- Informatica BDE, IBM DataStage / BigIntegrate
- Hadoop HDFS, AWS S3, Azure Blob
- Oracle Data Integrator 11G / ETL
- AWS Data Pipeline, Kinesis, Azure Data Factory
- Enterprise Architect, ERWin
- Power BI, QlikView
- Cloudera CDH, Hortonworks, AWS EMR, HDInsight
- Apache Falcon, Oozie
- Google BigQuery, AWS Redshift Spectrum
- Kafka, Spark Streaming, Kinesis Streams
- HBase, DocumentDB, Cassandra
- Lucene, Solr, Kafka, Sqoop
- Teradata, Azure SQLDW, Oracle 11G
- Jira, Bamboo, Confluence, Gliffy, Fisheye
- Liquibase / Flyway (DB change management)
- Linux (shell, bash scripting)
- Git, TortoiseSVN (Subversion)
- Red Hat Linux, CentOS, AWS EC2, CoreOS
CAREER SUMMARY:
Confidential, New York
Big Data & Google Cloud Consultant
Responsibilities:
- Consultant for Confidential working with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance.
- Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake
- Implemented scripts that load data into Google BigQuery and run queries to export it (a minimal sketch follows below)
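A minimal sketch of such a load-and-export script, assuming the google-cloud-bigquery Python client; the project, bucket, table, and column names are hypothetical placeholders, not the actual Confidential assets:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")   # hypothetical project ID

# 1. Load newline-delimited JSON from Cloud Storage into a BigQuery table.
load_job = client.load_table_from_uri(
    "gs://example-bucket/landing/events/*.json",       # hypothetical source path
    "example-project.lake.events",                      # hypothetical table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    ),
)
load_job.result()                                       # block until the load finishes

# 2. Run a query whose results land in a staging table (event_date is a placeholder column).
query_job = client.query(
    "SELECT * FROM `example-project.lake.events` WHERE event_date >= '2018-01-01'",
    job_config=bigquery.QueryJobConfig(
        destination="example-project.lake.events_export",
        write_disposition="WRITE_TRUNCATE",
    ),
)
query_job.result()

# 3. Export the staging table back to Cloud Storage as sharded CSV files.
client.extract_table(
    "example-project.lake.events_export",
    "gs://example-bucket/exports/events-*.csv",         # hypothetical destination
).result()
```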
Confidential, New York
Big Data & Cloud Architect
Responsibilities:
- Consultant working with Confidential clients such as Novo Nordisk, creating AWS-based cloud data lakes.
- Designed the Novo Nordisk data architecture with AWS EMR, Redshift Spectrum, Athena, Glue, Data Catalogs, Kinesis Streams, Informatica, Lambda functions, and CloudWatch
- Created EMR clusters with autoscaling across core and task nodes and Ranger integration
- Created and performance-tuned Redshift clusters with encryption and Redshift Spectrum-based queries over S3 data
- Created an AWS Lambda function that loads data from S3 into Redshift in batch mode, with SNS and DynamoDB integration (a minimal sketch follows below)
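A minimal sketch of such a Lambda handler, assuming psycopg2 is packaged with the function; the table, environment variables, topic, and DynamoDB table names are hypothetical:

```python
import os
import boto3
import psycopg2


def handler(event, context):
    # The new S3 object arrives via the triggering S3 event notification.
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],          # hypothetical env vars
        port=5439,
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        # Bulk-load the file into a staging table via COPY (staging.events is a placeholder).
        cur.execute(
            f"COPY staging.events FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{os.environ['COPY_ROLE_ARN']}' FORMAT AS PARQUET"
        )

    # Record the batch in DynamoDB and notify downstream consumers via SNS.
    boto3.resource("dynamodb").Table("load_audit").put_item(
        Item={"object_key": key, "status": "LOADED"}
    )
    boto3.client("sns").publish(
        TopicArn=os.environ["LOAD_TOPIC_ARN"], Message=f"Loaded {key}"
    )
```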
Confidential, Bay Area, New York
Co-Founder
Responsibilities:
- Created ARKit 2-based AR apps
- Created GCP ML Engine and TensorFlow-based AI training processes (a minimal sketch follows below)
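A minimal sketch of a TensorFlow/Keras training entry point of the kind that can be packaged for GCP ML Engine (now AI Platform); the data, model, and output path are illustrative placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder training data; a real job would read features from Cloud Storage.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = (x_train.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)

# The training service picks up the exported SavedModel from the job directory.
model.save("gs://example-bucket/models/demo")   # hypothetical GCS path
```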
Confidential, New York
Big Data & Analytics Architect
Responsibilities:
- Defined global data architecture using MS Azure HDInsight, SQL DW, Azure Blobs, Azure SQL, DocumentDB, HBase, Neo4j, Spark, Kafka, PolyBase, IBM DataStage / BigIntegrate, and InfoSphere IGC
- Designed analytics and model management architecture for Python-, Scala-, R-, and Java-based models, Zementis ADAPA / PMML-based model deployment, Microsoft RevoR, AzureML, and Power BI
- Defined model runtime and management using Docker containers, Kubernetes, and REST APIs (an illustrative sketch follows this list)
- Designed data / ETL pipelines using Azure Data Factory, IBM DataStage / InfoSphere, Azure Copy, PolyBase, and multi-region data replication
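An illustrative-only sketch of a containerized model-scoring REST endpoint (the deployment described above used Zementis ADAPA / PMML rather than this code); the model artifact and request schema are hypothetical:

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load a pre-trained model baked into the Docker image (hypothetical artifact).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/score", methods=["POST"])
def score():
    # Expect a JSON body such as {"features": [[0.1, 0.2, 0.3]]}.
    features = request.get_json()["features"]
    return jsonify({"predictions": model.predict(features).tolist()})


if __name__ == "__main__":
    # Inside a container this listens on 0.0.0.0 behind a Kubernetes Service.
    app.run(host="0.0.0.0", port=8080)
```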
Confidential, New York, Pittsburgh
Big Data Architect
Responsibilities:
- Defined ETL strategy using Informatica Big Data Edition's Blaze engine, which can run in Hive / MapReduce / Spark / native mode and abstracts mappings away from the underlying execution code.
- Created a roadmap for securing sensitive PHI and other health data at Highmark using HDFS TDE and OS encryption strategies, Ranger, Knox, Kerberos, Protegrity, and Dataguise
- Designed cluster architecture to separate analytics and operational workloads and to fold archiving and DR options into it.
- Defined a big data virtualization strategy comparing tools such as Denodo, Cisco Composite, and Informatica Data Services
Confidential, Seattle
Big Data Architect
Responsibilities:
- Created a data architecture roadmap and data governance policies for big data and presented them to C-level executives
- Designed cluster architecture for components such as Hortonworks, AWS EMR, Spark, Falcon, Oozie, ORC, Cassandra, HDFS, and Flume & Kafka for streaming
- Implemented data governance policies with Knox and Ranger for data access management and auditing, and Apache Atlas for metadata and master data management
- Created HDFS directory-structure best practices and AWS S3 bucket naming and security policies
- Drove ORC file adoption, which resulted in 3x faster jobs and 2.5x better compression
- Designed DataFrames and RDDs for Spark jobs that ran 20x faster than older MapReduce jobs (see the sketch after this list)
- Tuned Spark jobs via configuration settings, and Hive queries using ORC columnar storage, Tez, and CBO explain plans
- Designed Falcon jobs for replication between environments and for archival using HDFS storage tiers
- Created conformed dimensions such as a Geo dimension and designed their porting into Hadoop as UDFs / Lucene indexes and finally into DataStax Cassandra to support recommender systems
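A minimal PySpark sketch of the DataFrame-over-ORC pattern referenced above; the paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orc-dataframe-example").getOrCreate()

# Read an ORC-backed fact set directly into a DataFrame (columnar storage, predicate pushdown).
events = spark.read.orc("hdfs:///data/lake/events_orc")    # hypothetical path

# Typical aggregation of the kind that replaced an older MapReduce job.
daily_counts = (
    events.filter(F.col("event_date") >= "2016-01-01")
          .groupBy("event_date", "country")
          .agg(F.count("*").alias("event_count"))
)

# Write back as ORC for downstream Hive tables.
daily_counts.write.mode("overwrite").orc("hdfs:///data/lake/daily_counts_orc")
```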
Confidential
Big Data Architect
Responsibilities:
- Designed a data warehouse from scratch to process up to 50 TB of data using Hadoop.
- Analyzed clickstream data from Google Analytics with BigQuery.
- Performed capacity planning for cloud infrastructure with the AWS console, EC2 instances, AWS EMR, and S3
- Created AWS EMR clusters using CloudFormation scripts, IAM, Kinesis Firehose, and Data Pipeline
- Designed fact tables stored in Amazon S3 as flat files in Avro and Parquet format, built Hive table structures over them, and exposed the fact tables to QlikView reporting.
- Designed Sqoop-based data exchange with Redshift, Oracle, SQL Server, and MySQL
- Designed star schemas in Amazon Redshift using compression encodings, data distribution keys, sort keys, and table constraints (a minimal sketch follows this list)
- Designed APIs to load data from Omniture, Google Analytics, and Google BigQuery
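A minimal sketch of Redshift star-schema DDL with distribution keys, sort keys, and compression encodings, issued over psycopg2; the cluster endpoint, tables, and columns are hypothetical:

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_id   INTEGER      ENCODE zstd,
    customer_name VARCHAR(256) ENCODE zstd,
    PRIMARY KEY (customer_id)
)
DISTSTYLE ALL;                        -- small dimension: replicate to every node

CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id     BIGINT        ENCODE zstd,
    customer_id INTEGER       ENCODE zstd,
    sale_date   DATE          ENCODE zstd,
    amount      DECIMAL(12,2) ENCODE zstd,
    FOREIGN KEY (customer_id) REFERENCES dim_customer (customer_id)
)
DISTKEY (customer_id)                 -- co-locate fact rows with the join key
SORTKEY (sale_date);                  -- range-restricted scans on date
"""

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",    # hypothetical endpoint
    port=5439, dbname="analytics", user="admin", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(DDL)
```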
Confidential, New York
Co-Founder & Data Architect
Responsibilities:
- Created the Confidential app, which was featured on the Discovery Channel and in the Chicago Tribune
- Tuned EMR Spark clusters and jobs via Spark executor memory, containers, and EC2 instance types
- Set up a Cloudera CDH5 Hadoop cluster with HDFS, HBase, Pig, Oozie, and ZooKeeper to store and process GTFS and user data.
- Set up a DataStax Cassandra cluster and OpsCenter for processing temporal geospatial data, and implemented the Spark-Cassandra connector so Spark can execute on Cassandra DataFrames (a minimal sketch follows below).
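A minimal sketch of reading a Cassandra table into a Spark DataFrame via the DataStax spark-cassandra-connector; the keyspace, table, columns, and host are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-geospatial")
    .config("spark.cassandra.connection.host", "10.0.0.10")   # hypothetical node
    # Requires the connector on the classpath, e.g.
    # --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.0
    .getOrCreate()
)

# Load a temporal/geospatial table as a DataFrame; filters are pushed down to Cassandra.
positions = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="transit", table="vehicle_positions")   # hypothetical names
    .load()
)

recent = positions.filter("event_time >= '2016-01-01'").select("vehicle_id", "lat", "lon")
recent.show(10)
```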