Big Data Architect Resume
Jersey City
SUMMARY:
- 16+ years' experience leading the architecture, engineering, development & deployment of data technologies, EAI, SOA, microservices, messaging & in-memory capabilities on premises and on the AWS cloud platform.
- Deep experience designing and implementing Data & Analytics solutions including ODS, Data Warehouses, Lakes, Pipelines, BI, Reporting, Master Data, Metadata, Data Quality, Modelling, Catalogues & Governance.
- Sound understanding of various data solution patterns and when to use them: ETL/ELT, RDBMS, Normalization/De-normalization, Key-Value, In-Memory, Wide Column, Columnar, Graph, Text Indexing, Streaming & Messaging.
- Extensive experience building Data Lakes, Data Hubs & Data Pipelines using Big Data technologies & tools such as Apache Hadoop, Cloudera, HDFS, MapReduce, Spark, YARN, Delta Lake, Hive, Impala, Beam, Ignite & Kudu.
- Practical experience delivering and supporting Cloud strategies including migrating legacy products and implementing SaaS integrations.
- Experience designing and implementing data solutions on AWS using S3, Athena, Glue, EMR, DMS, Lambda, EC2 & Redshift.
- Extensive experience building and operating highly available, distributed systems for data extraction, ingestion, and processing of large data sets using MapReduce and Spark (Scala, Java, SQL & Python).
- Proficiency in MPP cloud data-warehouse-as-a-service offerings, such as Snowflake, BigQuery and Redshift on AWS cloud.
- Experience in design and development of Data Solutions using RDBMS (Oracle, SQL Server, Teradata) & NoSQL databases (HBase, Cassandra, MongoDB).
- Experience in Snowflake: data modelling, building data pipelines, and ELT using SnowSQL, Snowpipe & stored procedures.
- Experience building highly scalable real-time data pipelines using Apache Flink, Storm, Kafka and Spark Streaming on AWS (a brief sketch follows this list).
- Experience working with ETL/ELT concepts of data integration, consolidation, enrichment, and aggregation using DBT & Snowflake.
- Hands-on experience with, and a deep understanding of, the data technologies and tools listed above.
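A minimal, illustrative PySpark Structured Streaming sketch of the Kafka-based real-time ingestion pattern referenced above; the broker address, topic name, payload schema and output paths are hypothetical placeholders rather than details of any engagement listed here.

```python
# Illustrative only: real-time ingestion with Spark Structured Streaming + Kafka.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("card-activity-stream").getOrCreate()

# Assumed payload schema, for illustration only.
schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "card-activity")                 # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Land the parsed events as Parquet for downstream batch and analytics consumers.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/card_activity/")                   # placeholder path
    .option("checkpointLocation", "s3a://example-bucket/chk/card_activity/") # placeholder path
    .outputMode("append")
    .start()
)
query.awaitTermination()
```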
PROFESSIONAL EXPERIENCE:
Confidential, Jersey City
Big Data Architect
Responsibilities:
- Led the definition and implementation of data capabilities such as data discovery & classification, catalog, lineage, integration, mesh and lakes, leveraging AWS cloud and on-premises data technologies within the CITI TTS group.
- Partnered with product owners & business SMEs to analyze business needs and provide supportable, sustainable data solutions; ensured overall technical solutions aligned with business needs and adhered to CITI architectural guiding principles.
- Designed & implemented a CDC pipeline for CITI Commercial to process payments in real time using Spark Streaming, HDFS, HBase, HIVE and Kafka; the platform is optimized to scale to petabytes of data and to support fast insert/update throughput & fast seek queries.
- Led the design & build of a scalable batch data pipeline that retrieves data from the GIW system (AS400) using Hadoop and Spark and loads it into Snowflake (AWS), with NPI data tokenized using Protegrity.
- Collaborated with stakeholders from the Risk & Compliance and Engineering offices to design technical solutions for managing data classification, governance and security in Snowflake.
- Defined data (metadata), identified systems of record and authoritative sources, created data quality rules, data flow diagrams and reconciliation, and applied standards and controls for ETL pipelines.
- Built analytics tools using Java, Scala & Python that use the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Developed an ETL solution to load data from the Operational Data Store (ODS) & Datamart into Snowflake and performed ELT using DBT to transform it into domain-optimized data (a brief sketch follows below).
- Built a data strategy to improve data quality and operational efficiency; enabled enhanced reporting capabilities including business intelligence dashboards and data visualization software (Tableau).
- Replaced legacy third-party vendor data pipelines with Snowflake MarketData to increase operational efficiency and significantly reduce cost.
- Proposed and led POCs of the latest cloud services & data platform technologies to enhance and optimize operations and management of the Enterprise Data Platform.
- Created strategies & plans for data capacity planning, data life cycle management, scalability, backup and archiving, and ensured data security & privacy standards were implemented, including role-based security, encryption & tokenization.
- Actively participated in Agile Scrum development processes including continuous integration, prototyping and testing in a strongly collaborative environment.
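A hedged Python sketch of the kind of ODS-to-Snowflake load and in-warehouse ELT step described above, using the Snowflake Python connector. Connection parameters, stage, table and column names are hypothetical placeholders; in practice the transform step would typically live in a DBT model.

```python
# Illustrative only: bulk load staged ODS extracts into Snowflake, then transform in-warehouse.
import snowflake.connector

conn = snowflake.connector.connect(
    user="SVC_ETL_USER",        # placeholder credentials
    password="***",
    account="example_account",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Bulk-load staged ODS extracts into a raw landing table (COPY INTO).
    cur.execute("""
        COPY INTO RAW.ODS_PAYMENTS
        FROM @ODS_STAGE/payments/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)

    # ELT: derive a domain-optimized table from the raw rows inside Snowflake
    # (placeholder column names; a DBT model would normally own this SQL).
    cur.execute("""
        CREATE OR REPLACE TABLE CURATED.PAYMENTS AS
        SELECT payment_id, account_id, amount, event_ts::date AS payment_date
        FROM RAW.ODS_PAYMENTS
        WHERE amount IS NOT NULL
    """)
finally:
    cur.close()
    conn.close()
```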
Confidential
Big Data Architect
Responsibilities:
- Provided architectural guidance, technological vision and solution development for data platforms supporting product strategy & roadmap and advanced analytics in the JPMC consumer banking business.
- Designed and built a batch data pipeline to ingest debit & credit data from ODS systems into HIVE staging tables, and built transformation logic to convert raw data into domain-optimized data using Spark SQL & Scala (a brief sketch follows below).
- Implemented a streaming pipeline to ingest real-time card activity data from source systems into HIVE using Kafka and Spark Streaming, for use by the Data Science team to run analytics and ML.
- Built a pipeline for optimal extraction, transformation and loading of data from the EDW using Spark (AWS) into S3, enabling the data science team to run ad-hoc queries with Glue and Athena.
- Built high-volume, distributed and scalable data platform capabilities using Spark, S3 & Redshift, and enabled data access for applications and dashboards through microservices (Java).
- Collaborated with analytics team members to optimize data pipelines and bring analytical prototypes to production.
- Improved data quality through testing, tooling and continuous performance evaluation; ensured all technology standards, policies, control points and governance were well defined and measured for success.
- Conducted research and development with emerging technologies, determined their applicability to business use cases, and documented & communicated their recommended usage.
- Collaboratively led all data solutions with an Agile, hands-on engineering mindset and a relentless focus on quality, scalability, performance and timely delivery.
- Designed and worked with the DEV team to deliver a common messaging bus platform using Kafka for routing alert/logging/error messages to multiple channels.
- Developed utility tools for data validation and reconciliation, and UDFs for column-based encryption of NPI data.
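An illustrative PySpark equivalent of the batch transform from HIVE staging tables to a domain-optimized table described above (the resume names Spark SQL & Scala for this work); database, table and column names are hypothetical.

```python
# Illustrative only: batch transform from a Hive staging table to a curated, partitioned table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("card-batch-transform")
    .enableHiveSupport()
    .getOrCreate()
)

raw = spark.table("staging.card_transactions_raw")   # placeholder Hive staging table

curated = (
    raw.filter(F.col("txn_status") == "POSTED")       # placeholder quality/business rule
       .withColumn("txn_date", F.to_date("txn_ts"))
       .select("account_id", "txn_id", "txn_date", "amount", "merchant_category")
)

# Write the domain-optimized result back to Hive, partitioned for downstream queries.
(curated.write.mode("overwrite")
        .partitionBy("txn_date")
        .saveAsTable("curated.card_transactions"))
```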
Confidential
Integration Architect
Responsibilities:
- Worked with Product Managers, Analytics & Business teams to review and gather data/reporting/analytics requirements in order to build trusted and scalable data models, data extraction processes and data applications in GWIM.
- Reviewed business requirements and translated them into data models; created logical & physical data models using best practices to ensure high data quality and reduce redundancy.
- Designed and developed a scalable data ingestion framework to transform a variety of datasets, capture metadata and lineage, and implement data quality using Hadoop, MapReduce, SSIS & SQL Server.
- Designed and maintained database structures including tables, views, triggers, procedures, functions, indexes, materialized views, partitioning strategies and compression.
- Worked on development of new products and enhancements to existing systems iteratively by building quick POCs and converting ideas into real products.
- Migrated the legacy MIDAS data warehouse into a Hadoop Data Lake for wealth management customers; consolidated data from legacy stores (MIDAS, Datamart) onto the Hadoop platform, performed data quality checks such as profiling, wrangling & filtering using a Spark pipeline, and saved the results into HIVE (a brief sketch follows below).
- Worked on the NPI initiative to capture, log and monitor business user/FA activities related to non-public information using Hadoop/Spark/Kafka pipelines.
- Worked on migrating the logging/audit messaging framework from TIBCO EMS to Kafka.
- Designed and configured AutoSys, Control-M & Oozie jobs to run data load jobs and Spark/MR/HIVE processes.
- Built a middleware platform for brokerage account opening (self-directed, guided investment and FA assisted) using Java.
- Developed processes, components, microservices, and SOAP and REST web services based on an SOA approach for integration, orchestration & data processing using Java.
- Developed custom rules engines and implemented event-based solutions using vendor tools (TIBCO BE, ODM & Portrait) for evaluating tiers/commissions based on client assets.
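A hedged PySpark sketch of the kind of data-quality profiling and filtering pass applied while consolidating the legacy MIDAS/Datamart data into HIVE, as described above; all table, column and rule names are placeholders.

```python
# Illustrative only: simple profiling and rule-based filtering before publishing to Hive.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("midas-dq-check")
    .enableHiveSupport()
    .getOrCreate()
)

accounts = spark.table("legacy.midas_accounts")   # placeholder consolidated extract

# Profiling: row count, null count, and distinct count on the key column.
profile = accounts.agg(
    F.count("*").alias("row_count"),
    F.sum(F.col("account_id").isNull().cast("int")).alias("null_account_id"),
    F.countDistinct("account_id").alias("distinct_account_id"),
)
profile.show()

# Filter out records failing basic quality rules, then publish the clean set to Hive.
clean = accounts.filter(
    F.col("account_id").isNotNull() & (F.col("open_date") <= F.current_date())
)
clean.write.mode("overwrite").saveAsTable("wealth.accounts_clean")
```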