Product Architect - Senior Hadoop/Big Data-Cloud Resume
SUMMARY
- Certified Senior Solution Architect / Senior Technical Manager / Technical Delivery Manager / Product Architect with over 16 years of IT experience across Cloud (AWS, Azure, GCP), Hadoop, Spark, Big Data Analytics, DevOps, EDW, EAI, ETL, SAP HANA BI BW BODS, and SAP VORA-Hadoop integration.
- Architected, designed, developed, and managed multiple go-live implementations on Enterprise Data Warehouse, Analytics, ERP, Business Intelligence, and Big Data/Hadoop ecosystems, leveraging environment scalability, open source tools and libraries, NoSQL DBs, and Cloud/SaaS/PaaS/IaaS/FaaS model features to handle massive data in structured/unstructured/streaming/IoT/batch/blockchain formats.
- Blueprinting enterprise Data Warehousing, Cloud deployment, and Analytics solutions: vision, Data Modeling, Integration, and Data Governance, implemented in line with Ralph Kimball/Inmon methodologies, Hadoop Data Lake design (capacity, modeling), Lambda Architecture, TOGAF, SOA, DevOps, and SAP's LSA++ methodology. Hands-on coding and management of Enterprise Data Warehouses, Data Marts, Data Quality, Data Lineage, Data Retention, Data Audit Control and maintenance, Master Data Management, cross-platform integration with ERP systems, Solution Optimization, Admin activities, end-to-end process automation/scheduling, Business Intelligence reporting, Dashboarding, Analytics, Machine Learning & AI.
- In-depth experience in technical solution blueprinting, architecture, design, development, and administration, along with delivery, support, and DevOps, ensuring seamless development and automated deployments; leading large, diverse teams adhering to Agile, Scrum, and ITIL methodologies; ensuring planned solutions integrate effectively with all domains in both business and technical environments.
- Extensive expertise in designing and handling ETL, EDW, and EDI with BI, Data Services, large SAP ERP Business Warehouses, OLAP Cubes, HANA, Data Modeling, and Reporting with integration to non-ERP, OLTP, and OLAP systems; analyzing legacy systems in terms of data migration, gap analysis, and application migration; SAP integration with Hadoop on the VORA platform.
- Work with business and technical teams on requirements assessment; submission and evaluation of RFIs/RFQs/RFPs for new products; identifying opportunities and providing recommendations on process integration across lines of business; proposal writing; and choosing the optimal technology stack for business use cases and requirements. Good exposure to Pre-Sales.
- Designed and built ingestion/transformation/consumption mechanisms on Data Lakes on-prem and in the Cloud. Defined and implemented the data journey across all layers. Established phases of data persistence and data consumption patterns, ensuring data security by handling Personally Identifiable Information (PII), non-PII, and sensitive data in the Lake and Cloud. Working closely with vendors and consultants to ensure the solution is heading in the right direction.
- Good exposure to Security Architecture, Design, and Admin activities in terms of setting up a suitable platform for solutions across the organization. Aligning and adjusting to business solution needs and asks while ensuring CI/CD processes.
- Summing up: started my professional journey as an Enterprise ETL-BI Developer and progressed towards designing and architecting Enterprise Data Warehouses & BI, near-real-time, ERP, Big Data, Data Lake, and Cloud application platforms with cross-platform integration and technical solutioning; managing projects end to end from a project management, release stabilization, and delivery standpoint; synchronizing business and technology expectations is at my core.
TECHNICAL SKILLS
Hadoop Ecosystems: Cloudera, Hortonworks, IBM BigInsights
Cloud Platforms: AWS, Azure, Google Cloud Platform, Bluemix
Big Data Tools: HDFS, Apache Spark, Hive, Apache Sqoop, Apache Pig, Apache Storm, Kafka, Flume, Spark Streaming, Apache NiFi, Samza, MapReduce, Tez, BigSQL, Impala, NoSQL, Apache Parquet, ORC, Oozie, Apache Airflow, Cloudera Data Steward Studio, Diyotta ETL
NoSQL DBs: HBase, Cassandra, Apache Phoenix, MongoDB, Redis, DynamoDB, DocumentDB
Databases & Appliances: SAP HANA, HP Vertica, Oracle, Teradata, Sybase, MS SQL Server, DB2, Netezza, IBM InfoSphere MDM
Enterprise ETL Tools: DataStage 7.5.x/8.x/9.x/11.x PX Edition (on Hadoop edge node), Informatica, SAS, SSIS/SSAS, Pentaho Data Integration, Talend, IBM InfoSphere Data Quality, Data Steward, IGC (Information Governance Catalog - data governance and lineage), Snowflake (cloud-based data warehouse)
Enterprise Reporting Tools: MicroStrategy, Cognos 10/11.x, IBM Cognos Analytics, Tableau, QlikView, SSRS, Power BI
Enterprise Modeling Tools: UML, CA ERwin, SAP HANA Modeler, SAP BOBJ - IDT, BPMN 2.0
Data Lineage: IBM InfoSphere IGC (Information Governance Catalog); Hadoop side: Apache Atlas, Collibra
SAP BW, Modeling & ETL: SAP BI/BW 7.x-7.4, SAP BW/4HANA, SAP HANA, SAP BODS 4.x, Information Steward, SAP VORA-Hadoop integration, SAP Smart Data Access, SAP ECC 6.0, S/4HANA, SAP Fiori
Operating Systems: UNIX, Linux, Microsoft Windows
DevOps-Virtualization: Docker, Kubernetes, Vagrant, Jenkins, GIT
Programming Languages: PL/SQL, UNIX Shell Scripting, Core Java, Scala, Python
Server-Side Technologies: HTML, DHTML, Core Java.
Scheduling Tools: Apache Airflow, AUTOSYS, Control-M, TIDAL, Unix-Crontab, AQUA, Oozie
Testing Exposure & Tools: Big Data testing, ETL testing, SAP testing, SoapUI, REST API, HP Quality Centre, manual testing, API testing, database testing
Build & Versioning Tools: SBT, Eclipse, SCCS, TortoiseSVN, HERMES, TFS, GitHub, Bitbucket
Teradata & Oracle Utilities: Export, Import, SQL*Loader, FastLoad, MultiLoad, FastExport, TPump, BTEQ
Project & Management Tools and Methodologies: MS Project, Visio, JIRA, IBM InfoSphere Data Architect, Scrum, Lean-Agile
PROFESSIONAL EXPERIENCE
Confidential
Product Architect - Senior Hadoop/Big Data-Cloud
Responsibilities:
- Architecting the restructuring of the current product base, running on SQL Server and C++, onto a Hadoop data lake leveraging NoSQL DBs.
- Blueprinting the high-level approach for the transition to the Big Data world; deriving impact points and brainstorming with the existing customer base to determine future data needs and processing capacity.
- Defining the decision factors and weights in terms of priorities, and designing a flexible, feasible solution that could be delivered in a time-bound, agile way.
- Choosing the right tools for the requirements and designing common components for enterprise-wide use.
- Architecting, defining, and designing the solution flow, from greenfield implementation, migration strategy, and integration aspects through to automation and go-live.
- Defining the migration strategy from the existing data warehousing ETL platform to a Hadoop lake for large customers and a scalable NoSQL DB for mid-range customers.
- Architecting data zoning, access patterns, security, consumer data points, DR, and hybrid cloud implementation (Azure HDInsight, Azure Data Lake, AWS, AWS DevOps, Google Cloud) based on customer preference for how the product is deployed and run.
- Security, production planning, solution automation, DevOps, release automation, and implementing an enterprise-level scheduler.
- Part of the core architecture group responsible for the design and architecture of the new product line on Hadoop and Cloud. Currently rolling out CAT (Consolidated Audit Trail) reporting for orders, routes, and trades across clients (see the streaming sketch after this list).
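A minimal sketch of the kind of streaming ingestion behind the CAT reporting described above, assuming the Kafka/Spark/Cassandra components listed in the environment below; topic, broker, keyspace, table, and schema names are illustrative placeholders, not the actual product code:

    // Sketch only: read order/route/trade events from Kafka and persist them to
    // Cassandra for audit-trail style reporting. All names here are placeholders.
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    object CatAuditIngestSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cat-audit-ingest-sketch")
          .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
          .getOrCreate()

        // Assumed event layout; the real CAT order/route/trade record is far richer.
        val eventSchema = new StructType()
          .add("event_id", StringType)
          .add("event_type", StringType)   // ORDER / ROUTE / TRADE
          .add("client_id", StringType)
          .add("event_ts", TimestampType)
          .add("payload", StringType)

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
          .option("subscribe", "cat-events")                // placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), eventSchema).as("e"))
          .select("e.*")

        // Reuse the Spark-Cassandra connector's batch writer for each micro-batch.
        val writeBatch = (batch: DataFrame, batchId: Long) =>
          batch.write
            .format("org.apache.spark.sql.cassandra")
            .options(Map("keyspace" -> "cat", "table" -> "audit_events")) // placeholders
            .mode("append")
            .save()

        events.writeStream
          .foreachBatch(writeBatch)
          .option("checkpointLocation", "/tmp/checkpoints/cat-audit") // placeholder path
          .start()
          .awaitTermination()
      }
    }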
Environment: Hadoop - Hortonworks (HDF), C#, SQL Server, MySQL, RabbitMQ, JIRA, Agile, Scrum, Kafka, Apache Spark, Cassandra, MongoDB, Redis, Apache NiFi, Kylo ETL, Apache Phoenix, Druid, Azure HDInsight, Apache Airflow, Google Cloud Platform (2 POCs), Cloudera Data Steward Studio, DevOps, Cloud Foundry, Cloud Security, CI/CD, AWS, AWS DevOps, QlikView
Confidential
Senior DWH-ETL-BI-BIG DATA Solution Architect
Responsibilities:
- Led and designed Hadoop ingestion patterns for the GTB data initiative.
- Performed two POCs with two of Scotia's distinct Hadoop lakes (Enterprise Data Lake - EDL, and TenX), weighing options to move forward with one of the lakes to hold the data.
- Designed and implemented ingestion patterns with agreement from all stakeholders, handling data on multiple fronts.
- Tracking back to the true source to get the raw data, defining effective extract patterns, and feeding into the ingestion mechanism while ensuring transparency in terms of data lineage and business lineage.
- Providing technical roadmaps, technology feasibility assessments, and required technical expertise; clearing ambiguities in terms of implementation, results, and outcomes.
- Laying down Data Zoning, Data journey, Lineage, Transformations and Business Intelligence best practices.
- Producing reusable designs and code for teams to replicate, and assisting them in clearing roadblocks using the technologies and tools mentioned above (a representative ingestion sketch follows this list).
- Hands-on with code and design; handling release management activities including code migration and automation.
- Providing recommendations for process integration across lines of business or business capabilities.
- Collaborating with the EA team on enterprise architecture best practices and business solutions.
- Taking on new areas of the technology space in upfront POCs to provide technical feasibility analysis and benchmarking.
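A representative sketch of the reusable raw-zone ingestion pattern described above, written in Spark/Scala as used on this engagement; database, table, path, and column names are hypothetical placeholders rather than the actual GTB artifacts:

    // Sketch only: land a source extract, stamp audit/lineage columns, persist it as
    // partitioned Parquet backing a Hive external table, and refresh the metastore.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{current_timestamp, lit}

    object RawZoneIngestSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("raw-zone-ingest-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val sourceSystem = "core_banking"                            // placeholder source
        val loadDate     = "2018-06-30"                              // normally derived from the batch run
        val landingPath  = s"/data/landing/$sourceSystem/$loadDate"  // placeholder path

        // Read whatever the extract pattern delivered (CSV assumed here for illustration).
        val raw = spark.read
          .option("header", "true")
          .csv(landingPath)

        // Audit/lineage columns keep the data journey traceable back to the true source.
        val stamped = raw
          .withColumn("src_system", lit(sourceSystem))
          .withColumn("load_dt", lit(loadDate))
          .withColumn("ingest_ts", current_timestamp())

        stamped.write
          .mode("append")
          .partitionBy("load_dt")
          .parquet("/data/raw/core_banking/accounts")  // placeholder raw-zone path

        // Make the new partition visible to Hive and downstream consumers.
        spark.sql("MSCK REPAIR TABLE raw_db.core_banking_accounts") // placeholder table
      }
    }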
Environment: Hortonworks Hadoop, Apache Spark, Scala, Apache Tez, IBM InfoSphere DataStage 11.x PX, IBM InfoSphere Information Governance Catalog (data governance), IBM InfoSphere Data Architect, Cognos Analytics, Hive, Sqoop, Kafka, Docker, Apache Atlas, PostgreSQL, Apache Airflow, Cassandra, Tableau, Lean Agile, Scrum, Diyotta ETL, Snowflake cloud-based EDW (POC)
Confidential
Senior DWH-ETL-BI-BIG DATA Solution Architect / Senior Manager
Responsibilities:
- Architected, solutioned, designed, and coded an end-to-end Hadoop implementation in two phases, involving a POC and then productionizing the solution following the Lambda Architecture for both streaming and batch modes.
- Modeled and designed the data warehouse using Hive/BigSQL; used Sqoop to transfer data from existing legacy systems into HDFS in the specific formats required based on volumes. Wrote Pig scripts to transform certain user requirements.
- Wrote Hive UDFs in Core Java for certain user requirements, and integrated Spark and Hive to operate Spark in a Hive context.
- Used Flume to capture network-traffic streaming data into HDFS and created partitioned Hive/BigSQL tables for access by downstream systems.
- Incoming streams are aligned through Kafka and passed on to Flume as topic-specific feeds for better grouping of data.
- Defining and managing Hadoop external partitioned tables created on Parquet files landed in HDFS, and ensuring the latest partitions are recognized by using Hive commands to refresh the metastore (see the sketch after this list).
- Managing data partitions and the data lake with hot and cold data for data analytics and data scientists.
- Used Datameer and Tableau for dashboarding and visualizations.
- Set up a plain-vanilla sandbox for testing sample data and performing a POC on processing using Spark, comparing processing times to better demonstrate the difference. Used Scala for Spark coding to reduce lines of code and complexity.
- Worked with Hadoop admins to optimize the environment based on requirements and fine-tune the components.
- Fine-tuned system performance parameters and was involved in integrating Hive with Spark to have a Hive context in Spark.
- Used DataStage ETL for other diverse sources where master data needed to be cleansed and landed into HDFS, and for BI reporting purposes.
- Sample testing using Hue; managed teams across platforms and geographies following Scrum and Agile methodologies.
- Created and updated micro and macro design documents, performed code reviews, and led a team of 17 members.
- Facilitated the RFP process, liaising with SMEs to write key elements of the business and technical portions of the RFP, evaluating RFP responses, and communicating with procurement staff to facilitate timely resourcing and project advancement.
- Worked on a POC setting up AWS, Spark Streaming, Storm, and Samza for better message channeling, storing results in Cassandra with Spark processing data access requests for faster accessibility.
- Designed Oozie workflows to automate and schedule jobs in Hadoop.
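A minimal sketch of the external partitioned table pattern referenced above (Parquet files in HDFS exposed through Hive, with the latest partition registered per load). It is written here against a Hive-enabled SparkSession for brevity, whereas the BigInsights/Spark stack of that era typically used a HiveContext; all schema, path, and date values are illustrative placeholders:

    // Sketch only: expose Parquet files landed by the streaming/batch layers through a
    // Hive external partitioned table, register the new partition, and query hot data.
    import org.apache.spark.sql.SparkSession

    object ExternalTableRefreshSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("external-table-refresh-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // External table over Parquet files in HDFS (placeholder schema and location).
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.network_traffic (
            |  src_ip STRING,
            |  dst_ip STRING,
            |  bytes BIGINT,
            |  event_ts TIMESTAMP
            |)
            |PARTITIONED BY (event_dt STRING)
            |STORED AS PARQUET
            |LOCATION '/data/curated/network_traffic'""".stripMargin)

        // Register the partition written for the latest landing date so the metastore
        // recognizes it (MSCK REPAIR TABLE is the bulk alternative).
        val eventDt = "2016-09-15" // normally derived from the load run
        spark.sql(
          s"""ALTER TABLE analytics.network_traffic
             |ADD IF NOT EXISTS PARTITION (event_dt = '$eventDt')
             |LOCATION '/data/curated/network_traffic/event_dt=$eventDt'""".stripMargin)

        // Downstream "hot data" consumers typically prune to recent partitions.
        spark.sql(
          s"SELECT src_ip, SUM(bytes) AS total_bytes " +
          s"FROM analytics.network_traffic WHERE event_dt = '$eventDt' " +
          s"GROUP BY src_ip").show(20)
      }
    }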
Environment: Hadoop - IBM BigInsights v4.1, BigSQL, IBM InfoSphere Streams, Redis, Lambda Architecture, IBM InfoSphere DataStage ETL, Cognos 10, Data Modeling, SAS, UNIX Shell Scripting, Oozie Scheduler, Pentaho Data Integration, Apache Parquet, Cloudera, Hive, Apache Pig, Sqoop, Flume, Kafka, Spark, Scala, Datameer, Tableau, Cassandra, MongoDB, AWS, Agile & Scrum, JIRA