Service Reliability Engineer (SRE) Resume
SUMMARY:
- A seasoned disaster recovery and business continuity management planner and practitioner with over thirty years of experience performing Business Capability Modeling (BCP), Risk Compliance Self-Assessments (RCSA), and continuity of business planning for major enterprises. I specialize in Enterprise Confidential and Corporate, providing companies with contingency recovery plans and compliance adherence plans and procedures that support uninterrupted service, meet clients' Service Level Agreements, and protect the company's reputation and brand. I have also led Recovery Plan testing for standalone and cloud-based application recovery, using tabletop exercises and full system tests. Familiar with tools such as RPX Confidential Platform, Fusion Risk Management, and LDRPS.
- In my recent role as a Site Reliability Engineer (SRE), I worked with a team of experts to automate application migration and recovery from on-premises silos to the AWS cloud environment, covering development (AWS Services and Patterns), testing (IV&V, Regression, Information Assurance (IA), User Acceptance Testing (UAT), and Production Acceptance Testing (PAT)), and receipt of an Authorization to Operate (ATO). We then provided Production Support and Maintenance for the Production Cloud environment. Additional automation efforts included Chaos Testing, Failover / Failback testing, SLA Observability, and triggering of “Alarms”, “Alerts”, and “Actions to be Taken” in response to service level threshold crossings (both up and down).
- To obtain a leadership position where I can apply my comprehensive IT industry management and business knowledge to help client companies achieve an optimized, safeguarded, and compliant enterprise. To work closely with clients and end users to ensure proper delivery of secure systems and services that meet and exceed their resiliency expectations by complying with Governance, Risk, and Compliance (GRC) and Confidentiality, Integrity, and Availability (CIA) standards. To additionally ensure that Enterprise Confidential, Business Continuity, and Corporate processes are delivered in a Continuously Available manner while implementing Service Reliability Engineering principles.
TECHNICAL SKILLS:
Site Reliability Engineer: Responsible for guiding production systems at Confidential to adhere to SRE standards, developed at Google, and to automated DR procedures. These cover Monitoring (Logs), Observability (Dashboards), Efficiency (SLA Adherence), Resiliency (Service Level Continuity), Compliance (GRC / CIA), Operational Excellence (Efficiency / Performance), and Automation to reduce Toil and Manual Processes wherever possible. The purpose of an SRE is to help DEV/SEC/OPS teams produce reliable products and services with improved efficiency through Dashboard Observability built on Key Performance Indicators (KPIs) that drive SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements), and that comply with contractual guidelines through the interpretation of SLIs and dashboard thresholds. When thresholds are crossed, they produce Alerts and Actions to be Taken. The SRE automates these Alerts and Actions so that anomalies are mitigated as they occur, before they violate SLA parameters and incur penalties.
Enterprise Resiliency and Corporate: Project Initiation; Project Plan Creation, Coordination, and Management Reporting; Recovery Team Selection and Training; Business Impact Analysis (BIA); Risk Assessment; Recovery Tool Selection; Recovery Tool Implementation; Recovery Plan Creation; Testing, Prototyping, Implementation, and Rollout; Recovery and Awareness Program; Community Coordination of Recovery Activities (Business Park / Building Neighbors, etc.); Adherence to Governmental Organization Guidelines (GRC, Confidential, OSHA, OEM, Homeland Security, etc.); Integration of Recovery Operations within everyday functions in adherence to Version & Release Management.
Information Security Management Systems: ISO 27001, SIEM (Security Information and Event Management), SA (Security Analytics), SOC (Security Operations Center), Load Balancing and Error Handling setup and enforcement via Software Defined Data Centers (SDDC) and VM Vendor Systems, Intrusion Detection, Persistent Penetration Attacks, Firewalls, End-Point Protection, and a full range of security products.
Monitoring and Observability: Dynatrace, Splunk, LightStep, Datadog, Jira, Confluence, MELT (Metrics, Events, Logs, Traces), Amazon CloudWatch, OpenTelemetry Distributed Tracing with Spans, Amazon EC2, Amazon S3, Amazon RDS, AWS Lambda, VMware / vSphere.
PROFESSIONAL EXPERIENCE:
Confidential
Service Reliability Engineer (SRE)
Responsibilities:
- Responsible for guiding the resiliency of production systems at Confidential to adhere to SRE standards and automated procedures, including Monitoring (Logs), Observability (Dashboards), Efficiency (SLA Adherence and Error Budget), Chaos Testing (Chaos Toolkit, Chaos Monkey, Gremlin), Resiliency (Service Continuity), Compliance (GRC / CIA), Operational Excellence (Efficiency / Performance), and Automation to reduce Toil and Manual Processes wherever possible. Additionally, defined and implemented Infrastructure as Code (IaC), Monitoring as Code (MaC), Application Performance Monitoring (APM), and Observability as Code (OaC) to improve analysis and problem resolution.
- The purpose of an SRE is to help DEV/SEC/OPS teams produce reliable products and services with improved efficiency through Dashboard Observability. Dashboards present Key Performance Indicators (KPIs) as SLIs (Service Level Indicators, the actual measured results) compared against SLOs (Service Level Objectives, the desired results), together with SLAs (Service Level Agreements) and Error Budgets (the shortfall the SLO allows), so that contractual guidelines are met through the interpretation of dashboard thresholds and visual displays. When thresholds are crossed, they produce “Alarms from Metrics violations,” “Alerts to notify personnel,” and “Actions to be Taken”. The SRE automates the Alarms, the Alerts that activate personnel, and the Actions that respond to and mitigate anomalies, avoiding SLA violations and the penalties that follow when Error Budgets are exhausted.
- Researched and recommended the use of Robotic Process Automation (RPA), both simple and data-event driven; Machine Learning (ML) to analyze substantial amounts of data; and Artificial Intelligence (AI) and algorithms to act on that data through data-driven events and cognitive decisions.
- Produced an enterprise plan to standardize Disaster Recovery Planning and application migration from on-premises silos to AWS Cloud Regions and Availability Zones. Applications were subjected to migration and cutover to the AWS production environment, Independent Verification & Validation (IV&V) Testing, Regression Testing, Information Assurance, Chaos Testing, Compliance, User Acceptance, and finally Production Acceptance to obtain a Permit to Operate (PTO) for turnover to production operations.
- Performed many other functions designed to optimize the enterprise and ensure compliance, including integrating Chaos Testing Problem Playbooks to direct responses to the problem types identified and resolved during Chaos Testing (Technical, Security Incidents, and Recovery Plan initiation); relating Problems / Incidents to Recovery Plans by notifying the DR Team Leader; and conducting postmortem meetings to identify weaknesses in recovery operations and resolutions and to implement improvements as needed. Recovery planning is considered optimized once no further improvements can be made.
- Worked with Fusion Risk Management to perform Risk Reviews and to create Recovery Plan Playbooks.
- Used Metrics, Logs, Traces, Health Checks, “Alarms”, “Alerts”, and “Actions to be Taken” through manual and automated techniques to recover from problems, incidents, and disaster events.
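The threshold-driven responses described in the responsibilities above can be illustrated with a minimal Python sketch. It assumes the common SRE convention that the error budget is the share of bad events an SLO permits (1 minus the SLO target). The names `Slo`, `budget_remaining`, and `respond`, the 50% alert threshold, and the sample figures are all hypothetical and are not taken from any system named in this resume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: the fraction of events that must be good."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests must succeed


def sli(good: int, total: int) -> float:
    """Measured SLI: fraction of good events (1.0 when there is no traffic)."""
    return 1.0 if total == 0 else good / total


def budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    allowed_bad = (1.0 - slo.target) * total  # bad events the SLO tolerates
    actual_bad = total - good
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def respond(slo: Slo, good: int, total: int, alert_below: float = 0.5) -> str:
    """Map the remaining budget to a response level (assumed thresholds)."""
    remaining = budget_remaining(slo, good, total)
    if remaining <= 0:
        return "ACTION"  # budget exhausted: run the automated mitigation
    if remaining < alert_below:
        return "ALERT"   # budget running low: notify personnel
    return "OK"


if __name__ == "__main__":
    checkout = Slo("checkout-availability", 0.999)
    for good in (99_960, 99_930, 99_800):
        print(good, sli(good, 100_000), respond(checkout, good, 100_000))
```

In practice the alert threshold and the mitigation triggered at each level would come from the service's SLA and playbooks rather than from fixed constants like these.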
Confidential
Enterprise Resiliency and Corporate
Responsibilities:
- Ran my own company, specializing in Enterprise Confidential and Corporate. Performed Business Continuity Management (IT Disaster, Business, Crisis, Emergency, Personal Safety and Workplace Violence Prevention, and COOP, in line with ISO 22301, ISO 27001, FFIEC, and NIST SP 800 standards), Risk Assessments based on RMF, and Business Impact Analysis. Decided on recovery strategy, selected and trained recovery team personnel, provided management awareness, and developed Failover/Failback Recovery Plans and Continuous Operations Plans for the most critical applications and services.
- Ensured Vital Records Management was properly performed and established data synchronization in adherence to Recovery Time Objectives. Integrated recovery operations into the SELC / SDLC and Change Management process to uphold Version and Release Management principles. Created awareness programs, courses, and all associated documentation needed to support recovery operations. Later implemented IT Service Management (ITSM), Asset Management (AM), Identity Management (IM), Identity and Access Management (IAM), Role Based Access Control (RBAC), Multi-Factor Authentication (MFA), Attribute Based Access Control (ABAC), Zero Trust Authentication (ZTA), and related security guidelines so that only properly authorized personnel can access information, based on their job title, location, and biometrics. Implemented Information Security Management Systems (ISO 27000) and Dashboards (Continuous Diagnostics and Mitigation - CDM for DHS) to monitor and respond to incidents and problems.
- My assignments were designed to achieve Best Practices through tools and techniques such as COSO, COBIT 5, ITIL v3/v4, ServiceNow, Systems Engineering Life Cycle (SELC) for development, Systems Development Life Cycle (SDLC) for deployment into production, Configuration and Change Management, IT Service Management, Asset and Infrastructure Management, Data Backup and Recovery, Site Replications for disaster recovery, Continuous and High Availability recovery operations, and adherence to security and compliance requirements for domestic and international laws and regulations. Worked with a range of DR/RMF tools including LDRPS and RPX.
Confidential
Founder
Responsibilities:
- Founded Confidential eVote with my partner, Alex St-Gardien Jecrois, to build an electronic voting system for the country of Haiti. We spent five years researching and designing the system to make sure it met the best standards available today for voting systems. Confidential eVote is based on “One Person - One Vote”, can eliminate fraud and corruption at its inception, can support electronic (remote and Polling Station Walk-In Voters) and paper (Mail-In and Absentee) ballots, and can work for Primary and General Elections.
- Our work led to a proprietary database design; an E-card generating system that vets individuals to ensure they are eligible to vote (the outcome is a voting card, or a personal ID for those who fail vetting); a Blank Ballot Distribution System that takes Polling Station templates and generates paper ballots or screen displays; a Vote Capture system based on interoperability; a Vote Management System to process votes and generate tallies; and an Audit Management System that monitors voting activity throughout elections in real time. We have spent much time forming alliances and ensuring that data integrity is incorporated into our design with technologies like Blockchain and Zero Trust Authentication. We believe we have developed the best voting system design available today and have been seeking funding to produce a prototype.
- The recent introduction of HR-1 and SR-1 has put our system on hold until there is agreement on the direction voting systems will take. We continue to improve our design and have developed a relationship with the Government Blockchain Association (GBA) to explore using Blockchain to build and distribute our system.
Confidential
Business Confidential Consultant
Responsibilities:
- Ensured recoverability, security, and compliance of applications and services to DHS standards.
- Assumed position of Enterprise Architect to assist in implementing Systems Management disciplines.
- Assisted in designing and implementing Risk Management, Compliance, and Service Continuity.
- Performed a Risk Assessment of the environment and identified risks, gaps, obstacles, and exceptions.
- Assisted in locating Tools & Sensors applications designed to achieve Hardware Asset Management (HWAM), Software Asset Management (SWAM), Configuration Setting Management (CSM), and Vulnerability Management (VUL), to be integrated by Splunk and used to drive SOCs and the CDM Dashboard.
- Performed an Analysis of Alternatives (AoA), initially for an Incident Response and Reporting (IRR) product, but recommended selecting a Security Orchestration, Automation, and Response (SOAR) solution for DHS adoption and implementation. Also recommended the Tanium product for SOAR assistance and Patch Management.
- Suggested Business Continuity Management practices to implement and follow.
Confidential
Responsibilities:
- Member of the CISO Development Team, specializing in Service Continuity and Cyber/Technology Threat Response Plans.
- CISO structure was based on RMM V1.1 to establish business processes and reporting metrics.
- Established the Service Continuity Functional Recovery Plan template for High Value Services and helped groups develop their recovery plans.
Confidential
Sr. RSA Archer Developer
Responsibilities:
- Member of the RSA Archer Development team responsible for developing and implementing a Continuous Diagnostics and Mitigation (CDM) Dashboard System capable of detecting and resolving cybercrimes and technology threats throughout the entire government enterprise in near real time (RSA Components include FEM, CM, A&A, and GRC).
- Created Agency and Federal Dashboards through which Agencies reported encountered problems to the Federal Level as a summary report. After performing a Risk Assessment, the Federal Level returned detailed reports listing worst-case problems first and directing government locations to address the highest-impact problems first, then the others.
Confidential
Course Developer / Instructor
Responsibilities:
- Commissioned by the Chairman of DRII, Al Berman, to create an IT/DR course describing how to use Virtual Technologies to create Business Continuity and Disaster Recovery plans capable of instantaneous recovery from a primary data center to a secondary site, without loss of data and in a manner transparent to the end user.
Confidential
DR Subject Matter Expert
Responsibilities:
- Led a project to build three Regional Data Centers (Americas, Asia / Pacific, and Europe) and a Recovery Data Center (Munich). Additionally, implemented Disaster Recovery procedures designed to provide instantaneous recovery.
Confidential, Delaware
HA/DR Project Manager
Responsibilities:
- Consulted as Project Manager for the High Availability / Disaster Recovery (HA/DR) project at Chase Bank. Trained bank staff and turned the process over to them.
Confidential
Disaster Recovery Process Lead
Responsibilities:
- Responsible for establishing a line of business to create automated Disaster Recovery and Business Continuity plans for existing clients and prospects using the company’s line of products and services. Created a job stream that recovered data files, applications, and services in a desired sequence. On completion, the site was successfully recovered to the new destination.
Confidential
Business Continuity Analyst
Responsibilities:
- Worked as the NYC lead consultant responsible for converting the “General Business and Financial Services” Line of Business (LOB) for Confidential to the LDRPS Release 10 program product from Strohl / SunGard.
Confidential
Engagement Manager - Technology Risk Management
Responsibilities:
- Responsible for performing cyber security reviews through IT Audits, IT Sarbanes-Oxley Surveys, IT Risk Assessments using COSO and CERT RMM, Business Continuity Planning, and IT Security; overseeing Basel II and SSAE Supply Chain audits; and carrying out many other functions devoted to selling and closing client contracts for Technology Risk Management services. Directed personnel assigned to Technology Risk Management tasks and performed Project Management over concurrent activities assigned to my staff, adhering to FFIEC, ISO, and NIST standards.