Cloud Solution Architect/Consultant Resume
SUMMARY:
- 10+ years of professional experience in IT, with 6 years as a Cloud, Analytics and Big Data Architect/Engineer
- GCP Certified Professional Cloud Engineer
- Proficient in all cloud service models: SaaS, PaaS and IaaS
- 5+ years of experience in architecting and engineering data pipelines and data lakes
- 4+ years of experience creating ETL mappings and scripts and building BI/reporting dashboards
- 3+ years of experience with Hadoop ecosystem technologies such as BigSQL, Spark, Zeppelin, HDFS, YARN, MapReduce, Hive, Pig, Ambari, Ambari Infra, ZooKeeper, Sqoop, NiFi, Flume, Kafka, Ranger, Knox and Kerberos
- 4+ years of experience architecting, deploying and developing log search/aggregation/analytics solutions using the Elastic Stack, Splunk and Apache Solr
- 1.5+ years of experience architecting analytics and developing solutions on Microsoft Azure and Google Cloud; Coursera certified in GCP Fundamentals: Core Infrastructure
- 5+ years of experience in conceptual, logical and physical data modeling and database development using both SQL and NoSQL database solutions
- Experienced in handling structured, semi-structured and unstructured data
- Experience in creating UNIX bash/shell scripts
- Architecting enterprise applications using BDAT (Business, Data, Application and Technology) principles based on the TOGAF framework
- Experience leveraging Analytics and Data Science in Insurance, Banking, Entertainment, Digital Transformation and Ecommerce initiatives
- Experience in Requirement Analysis, test execution, Change Management, Defect and Incident Management
- Leading and mentoring teams of various sizes; working with cross-BU/LOB teams on various project initiatives
- Effective planning and organizational skills with the ability to adapt to change and perform effectively under pressure
- Excellent analytical, problem-solving, and decision-making skills, verbal and written communication skills, interpersonal and negotiation skills.
TECHNICAL SKILLS:
BI/Reporting: Tableau, Cognos, Kibana, Google Analytics, Google Data Studio
ETL: Hive, Spark, Logstash, BigIntegrate/DataStage, Talend, MS Excel, Dataproc, Dataflow, Cloud Functions, BigQuery
Hadoop Ecosystem: Hortonworks HDP, IBM BigInsights, HDFS, Hive, Pig, Kafka, Flume, Sqoop, Elasticsearch, ZooKeeper, Spark, Spark SQL, Spark Streaming, Zeppelin, Ambari, Ranger, Knox, Kerberos, NiFi, HDF, BigSQL
Databases: Cassandra, Redis, Oracle 11g/10g, MySQL 5, MariaDB, MSSQL Server 2012/2008, SQL, PL/SQL, Oracle SQL Developer, SQL*Loader, Toad
Solution Design/Modelling: MS Visio, MySQL Workbench, Oracle SQL Developer
Languages: Scala, Java, Groovy/Grails, UNIX Shell Scripting, Ruby on Rails, C#, Node.js
DevOps Software: Gitlab, Github, Eclipse, Redmine, Basecamp, Toad, JIRA, Confluence, BitBucket, Sharepoint, Ansible, Jenkins, Bamboo, Elastic Stack, Splunk, Docker, Sourcetree, TortoiseGit, SonarQube
Operating Systems: UNIX/Linux (Redhat, CentOS, Ubuntu, Debian), Windows 10/7
Cloud Technologies: Google Cloud Platform (GCP), Microsoft Azure
PROFESSIONAL EXPERIENCE:
Confidential
Cloud Solution Architect/Consultant
Responsibilities:
- Architected the Scotia Rewards project to migrate EDL data to GCP, reusing the existing HiveQL data-integration work on a Google Dataproc cluster for historical and recurring/delta loads and following a lift-and-shift architecture pattern as far as possible (illustrative sketch at the end of this role)
- Presented and defended the solution/design architecture to the various stakeholders, e.g. the LOB, ISO, Enterprise Architecture and Compliance teams
- Designed and developed an event-triggered data pipeline based on Cloud Pub/Sub for ingestion of PII and non-PII data into the landing area on Google Cloud Storage (GCS) buckets
- Extracted the current EDL data using HiveQL and staged it on the edge node
- Helped configure Diyotta (ETL tool) to ingest data from the edge node into the GCP staging area using Cloud KMS envelope encryption (DEK/KEK)
- Built a proof of concept with the Google Cloud DLP API to detect sensitive infoTypes against simulated bank data
- Developed Data Studio dashboards for the Operations and Executive groups on top of the BigQuery events dataset to monitor the data pipeline and daily loads
- Developed and deployed Dataflow jobs to write event data from Pub/Sub to BigQuery and from Pub/Sub to Pub/Sub
- Promoted code through the bank's CI/CD pipelines (Accelerator) using Jenkins and Bitbucket
- Deployed Artifactory builds using the bank's Impeller pipeline templates and used Maestro for GCP project-level configuration
- Deployed Pub/Sub topics, publishers and subscribers, Cloud Functions, Dataflow jobs, IAM policies, BigQuery datasets and storage buckets as Infrastructure as Code through the bank's Impeller pipeline (Google Deployment Manager)
- Defined and provisioned Cloud IAM policies and roles for the service accounts used by the different GCP components
- Developed system- and application-level operational alerts and dashboards using Stackdriver Logging, Monitoring and Error Reporting
- Worked in an active Agile/Scrum environment for project execution
Technical Environment: Lucidchart, GCP (Cloud Storage, Pub/Sub, BigQuery, Data Studio, Cloud Functions, DLP API, Deployment Manager, Stackdriver Logging and Monitoring, Dataproc, Dataflow), HiveQL, Node.js, Jenkins, SonarQube, Fortify, Bitbucket, JIRA, Confluence, Slack, Artifactory
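Illustrative sketch (assumed names, not project code): one way a lift-and-shift HiveQL delta load could run as a Spark SQL job on Dataproc and land curated output on Cloud Storage; the schema, table and bucket names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of a lift-and-shift delta load: the original HiveQL runs
// largely unchanged under Spark SQL on Dataproc, and the result lands on GCS.
object RewardsDeltaLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rewards-delta-load")
      .enableHiveSupport() // Dataproc provides a Hive metastore out of the box
      .getOrCreate()

    // Placeholder schema and table names; the existing HiveQL is reused as-is.
    val delta = spark.sql(
      """SELECT customer_id, points_earned, points_redeemed, load_dt
        |FROM edl_staging.rewards_raw
        |WHERE load_dt = current_date()
        |""".stripMargin)

    // Dataproc ships with the GCS connector, so gs:// paths work directly.
    delta.write.mode("overwrite").parquet("gs://example-rewards-curated/daily/")

    spark.stop()
  }
}
```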
Confidential
Big Data Architect/Engineer
Responsibilities:
- Provided overall architecture leadership, including roadmaps, planning, technical innovation, security and data governance
- Presented and defended the solution/design architecture to the various stakeholders, e.g. the Data, ISO, Enterprise Architecture and Compliance teams
- Provided technical and process leadership for projects, defining and documenting information integrations between systems and aligning project goals with the reference architecture
- Architected the zoning model for the Enterprise Data Lake (EDL) solution, leveraging the enterprise's information classification and identity and access management (IAM) standards
- Extracted source data from multiple legacy applications and heterogeneous technologies, such as z/OS mainframes, mainframe- and DB2 LUW-based databases, and SQL Server files, into the HDFS ingestion zone using Sqoop, DataStage and Spark jobs (illustrative sketch at the end of this role)
- Provided technical leadership and governance to the big data team and for the implementation of the solution architecture across the Hadoop ecosystem: BigInsights, MapReduce, BigSQL, Pig, Hive, HCatalog, Spark, HBase, Storm, Kafka, Flume, HDFS, Oozie, Ambari, Ranger, Knox and Kerberos
- Compared Google Cloud Platform, Amazon AWS and Microsoft Azure for the data lake and analytics/machine learning toolset migration
- Worked on planning and capacity requirements for migrating the on-premises IBM BigInsights solution to a cloud-native GCP solution built on Dataproc, Dataflow, Cloud Functions, Google Cloud Storage and Pub/Sub
- Defined Cloud IAM policies and roles for the different GCP components
- Architected the data integration patterns (acquisition, ingestion and publication/consumption), security and naming standards for the solution in cooperation with the ETL and BI architects
- Led the security hardening of the data lake solution, using Ranger KMS encryption for data at rest and encrypted communication for data in transit
- Led access provisioning for Ranger policies covering Knox, HDFS, YARN, Hive and BigSQL, based on the zoning architecture
- Integrated the data lake solution with enterprise offerings for security information and event management (SIEM), logging (Splunk), privileged access management (CyberArk) and identity management (ITIM)
- Facilitated and participated in ISO activities for the data lake, including audits, penetration and vulnerability scans, firewall audits, and AD group/service/user account scans
- Guided the team on these tools and big data best practices and helped with deployments
Technical Environment: IBM BigInsights v4.2.5, IBM IIS BigIntegrate/DataStage 11.5, Information Governance Catalog (IGC), Spark 2.1.0, Hadoop 2.7, Ambari, Solr, HDFS, Ranger, Knox, Kerberos, Splunk, YARN, Spark Thrift Server, BigSQL, Cognos, ERWin, Tableau, MS Visio, GCP Dataproc, Dataflow, Pub/Sub
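Illustrative sketch (assumed names, not project code): a Spark JDBC extraction of a single relational source table into the HDFS ingestion zone; the DB2 connection details, table and target path are hypothetical, and equivalent flows also ran through Sqoop and DataStage jobs.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: pull a source table from DB2 over JDBC and land it in the
// HDFS ingestion zone as Parquet. Connection details, table and paths are placeholders.
object IngestCustomerTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("edl-ingest-customer").getOrCreate()

    val customers = spark.read
      .format("jdbc")
      .option("url", "jdbc:db2://db2host.example.com:50000/EDWDB")
      .option("driver", "com.ibm.db2.jcc.DB2Driver")
      .option("dbtable", "CORE.CUSTOMER")
      .option("user", sys.env("DB2_USER"))
      .option("password", sys.env("DB2_PASSWORD"))
      .load()

    // Land the raw extract in the ingestion zone, one folder per load date.
    customers.write
      .mode("overwrite")
      .parquet("/data/edl/ingestion/core/customer/load_dt=2019-01-01")

    spark.stop()
  }
}
```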
Confidential
Big Data Architect/Engineer/Developer
Responsibilities:
- Architected the whole solution and data pipeline for the project
- Deployed the HDFS, YARN, Spark and Zeppelin clustered environment on the TD Cloud
- Deployed the Cassandra cluster environment on the TD Cloud
- Developed on-the-fly Value at Risk (VaR) calculations for different risk models and risk factors, calculating risk across portfolios and hierarchy levels
- Extracted source (.csv) files into the Cassandra cluster using batch Spark jobs
- Developed the Spark code using the Data Source, Dataset/DataFrame and RDD APIs
- Developed Spark ETL batch jobs to load data from Cassandra, apply transformations and aggregations, and load partially pre-aggregated views back into Cassandra (illustrative sketch at the end of this role)
- Connected Spark to the Cassandra cluster using the DataStax Spark Cassandra Connector
- Deployed the Spark Thrift Server so that JDBC/ODBC-based drivers could connect for visualization and analytics against in-memory Spark context data structures
- Used Angular 2 for front-end development of the visualizations and analytics, calling the REST-based web service
- Deployed and tested the Livy REST API for submitting batch and interactive Spark jobs
- Developed proof-of-concept work with the Spark Structured Streaming API
- Maintained and monitored the clustered environment for the Spark standalone cluster and Cassandra ring
- Used Scala and SBT to build the uber JARs
- Guided the team on these tools and helped with deployments
Technical Environment: Spark 2.1.0, Scala 2.11, Hadoop 2.7, HDFS, YARN, Zeppelin Notebook 0.7.1, Spark Thrift Server, Cassandra 3.0.13, Angular 2, MSSQL Server, SQL Server Management Studio, MS Visio, IntelliJ IDEA, SBT, Spark Job Server, Livy REST Spark API, Rackspace Cloud
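Illustrative sketch (assumed names, not project code): the Cassandra-to-Cassandra Spark batch pattern described above, using the DataStax connector's DataFrame API; the keyspace, tables and columns are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Hypothetical sketch: read positions from Cassandra, pre-aggregate exposure per
// portfolio and risk factor, and write the partially aggregated view back to Cassandra.
object PortfolioPreAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("var-pre-aggregation")
      .config("spark.cassandra.connection.host", "cassandra-node1.example.com")
      .getOrCreate()

    val positions = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "risk", "table" -> "positions"))
      .load()

    val byPortfolio = positions
      .groupBy("portfolio_id", "risk_factor")
      .agg(sum("exposure").as("total_exposure"))

    // The target table must already exist in Cassandra with a matching schema.
    byPortfolio.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "risk", "table" -> "portfolio_exposure"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```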
Confidential
Data Architect/Modeller
Responsibilities:
- Gathering requirements from the client on the current state of the models, i.e. the database schemas for the L1, L2 and L3 stages of the database
- Analysing the current state of the schema and processes in place for the data warehousing project
- Designing the data warehouse models and relationships in MS Visio
- Documenting the best practices of the data warehouse models and processes involved
- Analysing and comparing the fact and dimension tables against best practices and performing gap analysis
- Analysing the raw source file formats and naming conventions against best practices
- Documenting the recommendations for the schema and file naming standards
- Analysing and suggesting the change management and release lifecycle of the dimension tables
Technical Environment: MSSQL Server, SQL Server Management Studio, MS Visio, MS Excel, MS PowerPoint
Confidential
Big Data Architect/Engineer
Responsibilities:
- As an SME on this project, designed the business intelligence requirements for application logging, monitoring, troubleshooting and, later, data analytics
- Architected the whole solution for the project with Hadoop and Elastic Stack
- Deployed the Development and Performance/UAT environments for Elastic Stack and Hortonworks Data Platform (HDP)
- Developed, tested and debugged components to support the Data Lake
- Designed the detailed Data Lake Models for Data Ingestion in Microsoft Visio.
- Implemented the data collection, loading, QA, cleansing, enrichment and transformation pipeline
- Developed ETL scripts in Logstash and loaded data into highly distributed Kafka topics, from which data was consumed into both Elasticsearch (short-term) and HDFS (long-term) data stores
- Deployed a clustered Kafka environment and created topics
- Developed mappings to extract data from different sources, such as OS, database, middleware and custom application logs, and load it into the data warehouse
- Developed a Flume job to pull data from the Kafka queue and dump it into HDFS storage
- Developed HiveQL to load data in real time from HDFS into Hive (external and internal) tables
- Developed HQL queries for transformations, joins and aggregations on the log data to make it a better fit for the BI tool
- Used Spark to load log data from HDFS into RDDs for further data analytics
- Used various Spark interpreters to write Scala-based transformations and actions on the JSON log data (illustrative sketch at the end of this role)
- Used Sqoop to pull data from Oracle into HDFS storage
- Extensively worked on Hive tables, partitions and buckets for analyzing large volumes of data
- Developed business intelligence dashboards and reports for the key performance indicators
- Interacted regularly with end users to resolve issues pertaining to the reports, data enrichment and cleansing
- Involved in the continuous enhancements and fixing of production issues.
- Developed the process to onboard different groups across the bank onto this project
- Led a team of 4-5 people on the project, mentoring and educating them on best practices
- Planned the RBAC model for the project for correct authentication and authorization
- Created System Build Guides for the Production deployments
- Worked in Agile and Scrum environment
Technical Environment: HDP 2.4, HDFS, Spark, Scala, Hive, Pig, Kafka, Java, Logstash 2.2, Elasticsearch 2.2.1, Kibana 4.4, Shield, Marvel, Watcher, Beats, Filebeat, Topbeat, Docker, CentOS, RedHat 6, BitBucket, Jira, Confluence, Microsoft Sharepoint, ZooKeeper, JDBC
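Illustrative sketch (assumed names, not project code): the kind of Scala transformation written against the JSON log data in Spark; the log fields (service, level, @timestamp) and paths are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

// Hypothetical sketch: load JSON application logs from HDFS and aggregate daily
// error counts per service for the BI dashboards. Fields and paths are placeholders.
object LogErrorSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-error-summary").getOrCreate()

    val logs = spark.read.json("hdfs:///data/logs/applications/*.json")

    val errorsPerDay = logs
      .filter(col("level") === "ERROR")
      .groupBy(col("service"), to_date(col("@timestamp")).as("log_date"))
      .count()

    // Persist the summary so Hive and the BI layer can pick it up downstream.
    errorsPerDay.write.mode("overwrite").parquet("hdfs:///data/analytics/log_error_summary")

    spark.stop()
  }
}
```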
Confidential
Big Data Architect/Engineer
Responsibilities:
- Architected the whole solution and data pipeline for the project
- Deployed the BI and ETL system with the ELK Stack for decision making
- Implemented the data collection, cleansing, enrichment, transformation, QA and loading pipeline
- Developed mappings to extract data from different sources, such as Twitter, Oracle, MySQL, CSV/flat files and logs, and load it into the Elasticsearch and Hadoop backends
- Developed Kafka topics and fetched the event logs into Logstash (illustrative sketch at the end of this role)
- Developed HiveQL to load data in real time from Elasticsearch into Hive (external and internal) tables
- Developed HQL queries for transformations, joins and aggregations on the data
- Developed histograms, line charts, drill-through, master-detail and complex reports involving multiple filters, as well as multi-page, multi-query reports against multiple databases
- Developed BI dashboards and reports for the key performance indicators in Kibana
- Developed REST-based APIs for different museum exhibits using the Groovy/Grails framework
- Interacted regularly with end users to resolve issues pertaining to the reports
- Modified existing reports based on user change requests
- Designed conceptual, logical and physical data models and developed database schemas supporting API development for the exhibits and the mobile program
- Implemented various performance-tuning techniques on mappings and database queries
- Developed and tested database backup and restore plans in coordination with IT
- Ensured that backup and recovery procedures functioned correctly
- Created UNIX shell scripts to automate jobs for local and remote deployments, database sanitization, and production database backups and restores
- Involved in the continuous enhancements and fixing of production problems
- Wrote database and analytics documentation
- Controlled access permissions and privileges for different applications across the organization
- Documented database design and usage and educated other team members on both
- Provided support for application development teams, including mentoring on best practices for database usage and participating in code walkthroughs
Technical Environment: Java, Groovy on Grails, HDP 2.3, Kafka, Hive, ZooKeeper, Logstash 2.1, Elasticsearch 2.1, Kibana 4.2, Beats, Filebeat, MySQL Server 5.6, MySQL Designer, TOAD, SQL*Loader, Tomcat 6, CentOS, RedHat 6, Gitlab, Redmine, Basecamp
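Illustrative sketch (assumed names, not project code): publishing an exhibit event to a Kafka topic with the Java producer API from Scala, the topic from which Logstash consumed the event logs; the broker, topic and payload are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Hypothetical sketch: publish an exhibit interaction event (JSON string) to a Kafka
// topic, from which Logstash consumes into Elasticsearch and Hive. Names are placeholders.
object ExhibitEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "kafka1.example.com:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val event = """{"exhibit_id":"dinosaur-hall","action":"button_press","ts":"2015-10-01T14:03:22Z"}"""

    // Key by exhibit so events for one exhibit stay ordered within a partition.
    producer.send(new ProducerRecord[String, String]("exhibit-events", "dinosaur-hall", event))
    producer.flush()
    producer.close()
  }
}
```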
Confidential
Analytics/BI Developer
Responsibilities:
- Architected the whole Business Intelligence/ETL solution for mobile video games, tracking gameplay metrics for all genres of games and visualizing them in real-time dashboards
- Implemented data collection and loading, data QA, cleansing, enrichment and transformation
- Designed and developed the REST-based data collection API, delivered as a plugin that embeds easily into a game and records both online and offline game metrics
- Developed ETL mappings and loaded data into the data warehouse
- Developed mappings to extract data from different sources, such as MySQL and logs, and load it into the target
- Designed and managed the backend that stores the processed data in Redis for collecting and aggregating stats (illustrative sketch at the end of this role)
- Developed visualizations and reports for the real-time gameplay metrics and aggregations
- Developed BI dashboards for the key performance indicators
- Developed histograms, line charts, pie charts, drill-through, master-detail and complex reports involving multiple filters, as well as multi-page, multi-query reports against multiple databases; used filters for efficient data retrieval
- Created UNIX shell scripts to automate jobs for local and remote deployments and production database backups and restores
- Assisted in designing Logical and Physical Data Models for the game
- Involved in the continuous enhancements and fixing of production problems
- Documented database design and usage and educated other team members on both
- Wrote database and analytics documentation
Technical Environment: MySQL 5.1, ERWin, Redis DB, C# .NET, Ruby on Rails, Visual Studio 2010, Basecamp, Git
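Illustrative sketch (assumed names, not the project's C#/Ruby code): the Redis counter-aggregation pattern used for gameplay stats, shown with the Jedis client; the key and field names are hypothetical placeholders.

```scala
import redis.clients.jedis.Jedis

// Hypothetical sketch: aggregate per-day gameplay counters in Redis hashes as events
// arrive. Key and field names are placeholders, not the game's actual schema.
object GameplayStats {
  def recordSession(redis: Jedis, gameId: String, day: String, levelReached: Int): Unit = {
    val key = s"stats:$gameId:$day"
    redis.hincrBy(key, "sessions", 1)             // total sessions for the day
    redis.hincrBy(key, s"level:$levelReached", 1) // sessions that reached this level
    redis.expire(key, 90 * 24 * 3600)             // keep roughly 90 days of daily stats
  }

  def main(args: Array[String]): Unit = {
    val redis = new Jedis("localhost", 6379)
    recordSession(redis, gameId = "puzzle-quest", day = "2014-06-01", levelReached = 12)
    println(redis.hgetAll("stats:puzzle-quest:2014-06-01"))
    redis.close()
  }
}
```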
Confidential
Database Developer
Responsibilities:
- Heavily involved in conceptual and physical database modeling, data restoration, maintenance, backups, security and troubleshooting of performance issues
- Worked with application developers and other database administration staff to research, define and correct application data-related issues
- Heavily involved in gathering user requirements, analysis, design, coding, testing and implementation
- Created database objects such as tables, indexes, views, materialized views, procedures and packages
- Involved in the continuous enhancements and fixing of production problems
- Bug fixing for existing web applications
Technical Environment: Oracle 10g, MSSQL Server 2012, MySQL 5.1, MySQL Workbench, MySQL Designer, TOAD, SQL*Loader, SQL Developer, SQL*Plus, CentOS
Confidential
Database Developer
Responsibilities:
- Involved in database modelling, development, query optimization and administration
- Automated different weekly and monthly reports using UNIX shell and PHP scripting
- Designed logical and physical data models using Erwin
- Wrote sequences for automatic generation of unique keys to support primary and foreign key constraints in data conversions
- Used SQL*Loader to load data from flat files received daily from various facilities
- Created database objects such as tables, views, materialized views, procedures and packages using Oracle tools such as Toad and SQL*Plus
- Created indexes on tables to improve performance by eliminating full table scans, and created views to hide the underlying tables and reduce the complexity of large queries
- Involved in the continuous enhancements and fixing of production problems
Technical Environment: Oracle 10g, MSSQL Server 2008, TOAD, SQL*Loader, SQL Developer, SQL*Plus, ERWin, VB .NET, PHP, CentOS