Big Data & Google Cloud Consultant Resume
SUMMARY:
- A hands-on Big Data & Analytics Architect with 14 years' experience and a strong skill set in Big Data, Google Cloud, Hadoop, Amazon AWS, and Microsoft Azure. A Certified Scrum Master and entrepreneur involved in the implementation of agile projects in financial services and tech start-ups, delivering high-quality software to tight deadlines. Strong team player, leader, and motivator.
- Data Consultant for Confidential working with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance
- Confidential Big Data Architect working with AWS EMR, Redshift Spectrum, Athena, Glue, Data Catalogs, Kinesis Streams, Informatica, Lambda functions, and CloudWatch
- Big Data and Analytics Architect for Confidential in New York creating multi-region Azure cloud-based data lakes using MS Azure HDInsight, SQL DW, Azure Blobs, Azure SQL, DocumentDB, HBase, Spark, Kafka, Docker, Kubernetes, PMML, and IBM BigIntegrate
- Consultant working on Confidential's data architecture for a data lake with PB+ data volume using Hortonworks, AWS, Hive, ORC files, Falcon, Cascading, Flume, Cassandra, and Spark
- Designed a data warehouse for Confidential using Amazon AWS EMR, Hadoop, Pig, Hive, Sqoop, AWS EC2, S3, Kinesis, CloudFormation, Data Pipeline, Redshift, and QlikView.
- Designed ETL flows with Oracle ODI to process up to 100 million banking transactions a day. Certified Scrum Master, MSc in Information Systems and Oracle Certified Professional.
COMPUTER PROFICIENCY:
- Hadoop, Map-Reduce, Flume, Pig, Spark
- Docker, Kubernetes, Azure Container Services
- Hadoop YARN, Cascading, Ambari, Hue
- Azure Data Factory, Polybase, AWS Athena
- Java, PMML, Python, AzureML
- Zementis ADAPA, ODG FastScore
- Informatica BDE, IBM DataStage / BigIntegrate
- Hadoop HDFS, AWS S3, Azure Blob
- Oracle Data Integrator 11G / ETL
- AWS Data Pipeline, Kinesis, Azure Data Factory
- Enterprise Architect, ERWin
- Power BI, QlikView
- Cloudera CDH, Hortonworks, AWS EMR, HDInsight
- Apache Falcon, Oozie
- Google BigQuery, AWS Redshift Spectrum
- Kafka, Spark Streaming, Kinesis Streams
- HBase, DocumentDB, Cassandra
- Lucene, Solr, Kafka, Sqoop
- Teradata, Azure SQLDW, Oracle 11G
- Jira, Bamboo, Confluence, Gliffy, Fisheye
- Liquibase / Flyway (DB change management)
- Linux (shell, bash scripting)
- Git, TortoiseSVN (Subversion)
- Red Hat Linux, CentOS, AWS EC2, CoreOS
CAREER SUMMARY:
Confidential, New York
Big Data & Google Cloud Consultant
Responsibilities:
- Consultant for Confidential working with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance.
- Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake
- Implemented scripts that load data into Google BigQuery and run queries to export it (a minimal sketch follows below)
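A minimal sketch of such a load-and-export script, assuming the google-cloud-bigquery Python client; the project, bucket, table, and column names are hypothetical placeholders, not the actual Confidential assets:

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")   # hypothetical project ID

# 1. Load newline-delimited JSON from Cloud Storage into a BigQuery table.
load_job = client.load_table_from_uri(
    "gs://example-bucket/landing/events/*.json",       # hypothetical source path
    "example-project.lake.events",                      # hypothetical table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    ),
)
load_job.result()                                       # block until the load finishes

# 2. Run a query whose results land in a staging table (event_date is a placeholder column).
query_job = client.query(
    "SELECT * FROM `example-project.lake.events` WHERE event_date >= '2018-01-01'",
    job_config=bigquery.QueryJobConfig(
        destination="example-project.lake.events_export",
        write_disposition="WRITE_TRUNCATE",
    ),
)
query_job.result()

# 3. Export the staging table back to Cloud Storage as sharded CSV files.
client.extract_table(
    "example-project.lake.events_export",
    "gs://example-bucket/exports/events-*.csv",         # hypothetical destination
).result()
```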
Confidential, New York
Big Data & Cloud Architect
Responsibilities:
- Consultant working with Confidential clients such as Novo Nordisk, creating AWS-based cloud data lakes.
- Designed the Novo Nordisk data architecture with AWS EMR, Redshift Spectrum, Athena, Glue, Data Catalogs, Kinesis Streams, Informatica, Lambda functions, and CloudWatch
- Created EMR clusters with autoscaling across core and task nodes and Ranger integration
- Created and performance-tuned Redshift clusters with encryption and Redshift Spectrum-based queries over S3 data
- Created an AWS Lambda function that loads data from S3 into Redshift in batch mode, with SNS and DynamoDB integration (a minimal sketch follows below)
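A minimal sketch of such a Lambda handler, assuming psycopg2 is packaged with the function; the table, environment variables, topic, and DynamoDB table names are hypothetical:

```python
import os
import boto3
import psycopg2


def handler(event, context):
    # The new S3 object arrives via the triggering S3 event notification.
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]

    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],          # hypothetical env vars
        port=5439,
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        # Bulk-load the file into a staging table via COPY (staging.events is a placeholder).
        cur.execute(
            f"COPY staging.events FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{os.environ['COPY_ROLE_ARN']}' FORMAT AS PARQUET"
        )

    # Record the batch in DynamoDB and notify downstream consumers via SNS.
    boto3.resource("dynamodb").Table("load_audit").put_item(
        Item={"object_key": key, "status": "LOADED"}
    )
    boto3.client("sns").publish(
        TopicArn=os.environ["LOAD_TOPIC_ARN"], Message=f"Loaded {key}"
    )
```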
Confidential, Bay Area, New York
Co-Founder
Responsibilities:
- Created ARKit 2-based AR apps
- Created GCP ML Engine and TensorFlow-based AI training processes (a minimal sketch follows below)
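A minimal sketch of a TensorFlow/Keras training entry point of the kind that can be packaged for GCP ML Engine (now AI Platform); the data, model, and output path are illustrative placeholders:

```python
import numpy as np
import tensorflow as tf

# Placeholder training data; a real job would read features from Cloud Storage.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = (x_train.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)

# The training service picks up the exported SavedModel from the job directory.
model.save("gs://example-bucket/models/demo")   # hypothetical GCS path
```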
Confidential, New York
Big Data & Analytics Architect
Responsibilities:
- Defined global data architecture using MS Azure HDInsight, SQL DW, Azure Blobs, Azure SQL, DocumentDB, HBase, Neo4j, Spark, Kafka, PolyBase, IBM DataStage / BigIntegrate, and InfoSphere IGC
- Designed analytics and model management architecture for Python-, Scala-, R-, and Java-based models, Zementis ADAPA / PMML-based model deployment, Microsoft RevoR, AzureML, and Power BI
- Defined model runtime and management using Docker containers, Kubernetes, and REST APIs (an illustrative sketch follows this list)
- Designed data / ETL pipelines using Azure Data Factory, IBM DataStage / InfoSphere, Azure Copy, PolyBase, and multi-region data replication
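An illustrative-only sketch of a containerized model-scoring REST endpoint (the deployment described above used Zementis ADAPA / PMML rather than this code); the model artifact and request schema are hypothetical:

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load a pre-trained model baked into the Docker image (hypothetical artifact).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/score", methods=["POST"])
def score():
    # Expect a JSON body such as {"features": [[0.1, 0.2, 0.3]]}.
    features = request.get_json()["features"]
    return jsonify({"predictions": model.predict(features).tolist()})


if __name__ == "__main__":
    # Inside a container this listens on 0.0.0.0 behind a Kubernetes Service.
    app.run(host="0.0.0.0", port=8080)
```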
Confidential, New York, Pittsburgh
Big Data Architect
Responsibilities:
- Defined ETL strategy using Informatica Big Data Edition's Blaze engine, which can run in Hive / MapReduce / Spark / native mode and abstracts mappings away from the underlying execution code.
- Created a roadmap for securing sensitive PHI and other health data at Highmark using HDFS TDE and OS encryption strategies, Ranger, Knox, Kerberos, Protegrity, and Dataguise
- Designed cluster architecture to separate analytics and operational workloads and to fold archiving and DR options into it.
- Defined a big data virtualization strategy comparing tools such as Denodo, Cisco Composite, and Informatica Data Services
Confidential, Seattle
Big Data Architect
Responsibilities:
- Created a data architecture roadmap and data governance policies for big data and presented them to C-level executives
- Designed cluster architecture for components such as Hortonworks, AWS EMR, Spark, Falcon, Oozie, ORC, Cassandra, HDFS, and Flume & Kafka for streaming
- Implemented data governance policies with Knox and Ranger for data access management and auditing, and Apache Atlas for metadata and master data management
- Created HDFS directory-structure best practices and AWS S3 bucket naming and security policies
- Drove ORC file adoption, which resulted in 3x faster jobs and 2.5x better compression
- Designed DataFrames and RDDs for Spark jobs that ran 20x faster than older MapReduce jobs (see the sketch after this list)
- Tuned Spark jobs via configuration settings, and Hive queries using ORC columnar storage, Tez, and CBO explain plans
- Designed Falcon jobs for replication between environments and for archival using HDFS storage tiers
- Created conformed dimensions such as a Geo dimension and designed their porting into Hadoop as UDFs / Lucene indexes and finally into DataStax Cassandra to support recommender systems
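A minimal PySpark sketch of the DataFrame-over-ORC pattern referenced above; the paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orc-dataframe-example").getOrCreate()

# Read an ORC-backed fact set directly into a DataFrame (columnar storage, predicate pushdown).
events = spark.read.orc("hdfs:///data/lake/events_orc")    # hypothetical path

# Typical aggregation of the kind that replaced an older MapReduce job.
daily_counts = (
    events.filter(F.col("event_date") >= "2016-01-01")
          .groupBy("event_date", "country")
          .agg(F.count("*").alias("event_count"))
)

# Write back as ORC for downstream Hive tables.
daily_counts.write.mode("overwrite").orc("hdfs:///data/lake/daily_counts_orc")
```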
Confidential
Big Data Architect
Responsibilities:
- Designed a data warehouse from scratch to process up to 50 TB of data using Hadoop.
- Analyzed clickstream data from Google Analytics with BigQuery.
- Performed capacity planning for cloud infrastructure with the AWS console, EC2 instances, AWS EMR, and S3
- Created AWS EMR clusters using CloudFormation scripts, IAM, Kinesis Firehose, and Data Pipeline
- Designed fact tables stored in Amazon S3 as flat files in Avro and Parquet format, built Hive table structures over them, and exposed the fact tables to QlikView reporting.
- Designed Sqoop-based data exchange with Redshift, Oracle, SQL Server, and MySQL
- Designed star schemas in Amazon Redshift using compression encodings, data distribution keys, sort keys, and table constraints (a minimal sketch follows this list)
- Designed APIs to load data from Omniture, Google Analytics, and Google BigQuery
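A minimal sketch of Redshift star-schema DDL with distribution keys, sort keys, and compression encodings, issued over psycopg2; the cluster endpoint, tables, and columns are hypothetical:

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_id   INTEGER      ENCODE zstd,
    customer_name VARCHAR(256) ENCODE zstd,
    PRIMARY KEY (customer_id)
)
DISTSTYLE ALL;                        -- small dimension: replicate to every node

CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id     BIGINT        ENCODE zstd,
    customer_id INTEGER       ENCODE zstd,
    sale_date   DATE          ENCODE zstd,
    amount      DECIMAL(12,2) ENCODE zstd,
    FOREIGN KEY (customer_id) REFERENCES dim_customer (customer_id)
)
DISTKEY (customer_id)                 -- co-locate fact rows with the join key
SORTKEY (sale_date);                  -- range-restricted scans on date
"""

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",    # hypothetical endpoint
    port=5439, dbname="analytics", user="admin", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(DDL)
```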
Confidential, New York
Co-Founder & Data Architect
Responsibilities:
- Created the Confidential app, which was featured on the Discovery Channel and in the Chicago Tribune
- Tuned EMR Spark clusters and jobs via Spark executor memory, containers, and EC2 instance types
- Set up a Cloudera CDH5 Hadoop cluster with HDFS, HBase, Pig, Oozie, and ZooKeeper to store and process GTFS and user data.
- Set up a DataStax Cassandra cluster and OpsCenter for processing temporal geospatial data, and implemented the Spark-Cassandra connector so Spark can execute on Cassandra DataFrames (a minimal sketch follows below).
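A minimal sketch of reading a Cassandra table into a Spark DataFrame via the DataStax spark-cassandra-connector; the keyspace, table, columns, and host are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-geospatial")
    .config("spark.cassandra.connection.host", "10.0.0.10")   # hypothetical node
    # Requires the connector on the classpath, e.g.
    # --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.0
    .getOrCreate()
)

# Load a temporal/geospatial table as a DataFrame; filters are pushed down to Cassandra.
positions = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="transit", table="vehicle_positions")   # hypothetical names
    .load()
)

recent = positions.filter("event_time >= '2016-01-01'").select("vehicle_id", "lat", "lon")
recent.show(10)
```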