Add DR Arch Options, Multi-Region Deployment, Multi-Region Failover by vkumbha · Pull Request #309 · turbot/guardrails-docs

Merged: 39 commits, merged on Mar 24, 2025
Commits (39):

- `3acba60` Add DR Arch Options (vkumbha, Mar 10, 2025)
- `f613f75` update table format (vkumbha, Mar 10, 2025)
- `2b262de` Updates (rajlearner17, Mar 12, 2025)
- `0e2f486` updates (rajlearner17, Mar 12, 2025)
- `c6c72c3` updates (rajlearner17, Mar 12, 2025)
- `02c008e` Update sidebar and images (vkumbha, Mar 13, 2025)
- `f515c3d` Add title and sidebar label (vkumbha, Mar 13, 2025)
- `2a67b3a` update subtitles (vkumbha, Mar 13, 2025)
- `4e9eb28` reduce sidebar label length (vkumbha, Mar 13, 2025)
- `5779fd6` rename titles (vkumbha, Mar 13, 2025)
- `4cc10e3` update images path (vkumbha, Mar 13, 2025)
- `d0aa4ce` remove multiregion doc (vkumbha, Mar 13, 2025)
- `82a0693` Updates (rajlearner17, Mar 13, 2025)
- `0cd4763` Updates (rajlearner17, Mar 13, 2025)
- `696c66a` Updates (rajlearner17, Mar 13, 2025)
- `4e51b45` Add guide - Disaster Recovery using multi-region architrcture. Closes… (vkumbha, Mar 13, 2025)
- `999d4e3` workspace restore (vkumbha, Mar 13, 2025)
- `49bc1af` too much bold everywhere (vkumbha, Mar 13, 2025)
- `4a0d533` Add multi-region failover guide (vkumbha, Mar 13, 2025)
- `458d40c` Updates (rajlearner17, Mar 13, 2025)
- `6904c65` remove bold for titles (vkumbha, Mar 13, 2025)
- `5928348` update RPO for tier3 (vkumbha, Mar 13, 2025)
- `4fcbb95` Update images and broken links for Getting Started sections. Closes #… (RahulSrivastav14, Mar 13, 2025)
- `472d9ca` Update images and broken links for Using Guardrails. Closes #313 (#319) (RahulSrivastav14, Mar 13, 2025)
- `023da0d` Add guide -Setting up Turbot Workspace Retention Policy Closes #321 (… (rajlearner17, Mar 21, 2025)
- `0388daf` Updates (rajlearner17, Mar 22, 2025)
- `e5c8a0f` Typo fix (rajlearner17, Mar 22, 2025)
- `3102b1c` Format update (rajlearner17, Mar 22, 2025)
- `8509e67` Updates (rajlearner17, Mar 22, 2025)
- `de646dc` Merge branch 'main' into update-dr-intro (rajlearner17, Mar 23, 2025)
- `0fb402e` Updates (rajlearner17, Mar 23, 2025)
- `b17a688` Updates to retail old one for review (rajlearner17, Mar 23, 2025)
- `35bf8f4` Add the landing index page with references (rajlearner17, Mar 23, 2025)
- `8bd4388` Updates (rajlearner17, Mar 23, 2025)
- `fe705a9` Update Multi-Region Failover with Guardrails (rajlearner17, Mar 23, 2025)
- `f5c44b2` Updates (rajlearner17, Mar 23, 2025)
- `ec5f073` Updates (rajlearner17, Mar 23, 2025)
- `9354155` Updates (rajlearner17, Mar 24, 2025)
- `8549cbb` Updates (rajlearner17, Mar 24, 2025)
@@ -0,0 +1,102 @@
---
title: Architecture Options
sidebar_label: Architecture Options
---

# Architecture Options

In this guide, you will:

- Explore architectural considerations for deploying Turbot Guardrails.
- Understand different options available based on organizational risk and availability requirements.


Turbot Guardrails is a comprehensive governance platform that automates discovery, compliance, security, and operational remediation tasks across cloud environments. Due to its critical role as a security and compliance control plane, it's essential to configure Guardrails with high availability and disaster recovery in mind.

This document outlines various architectural options to help you select an approach aligned with your organization's specific high availability (HA) and disaster recovery (DR) needs, based on your risk tolerance and operational requirements.


| Tier | Account | Region | Availability Zone | Availability | RTO | RPO | Use Cases |
|----------|---------------|-----------------|-------------------|--------------|-----|-----|----------------------------------------------|
| Tier 1 | Single-account | Single-region | Single-AZ | 99% | 4 Hr | 4 Hr | Development and non-prod environments |
| Tier 2 | Single-account | Single-region | Multi-AZ | 99.9% | 4 Hr | 4 Hr | Production without rapid DR requirements |
| Tier 3 | Single-account | Multi-region | Multi-AZ | 99.9% | 2 Hr | 2 Hr | Production requiring rapid DR |
| Tier 4 | Multi-account | Multi-region | Multi-AZ | 99.99% | 0 Hr | 0 Hr | Mandated zero downtime DR |
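One way to apply the table is to encode its rows as data and pick the cheapest tier that satisfies every requirement. The sketch below assumes the rows are ordered from lowest to highest cost, as the tier descriptions suggest. `Tier`, `TIERS` and `select_tier` are hypothetical names used for illustration; they are not part of Guardrails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    availability_pct: float
    rto_hours: float
    rpo_hours: float
    survives_region_loss: bool


# Rows transcribed from the table above, assumed ordered from lowest to highest cost.
TIERS = [
    Tier("Tier 1", 99.0, 4, 4, survives_region_loss=False),
    Tier("Tier 2", 99.9, 4, 4, survives_region_loss=False),
    Tier("Tier 3", 99.9, 2, 2, survives_region_loss=True),
    Tier("Tier 4", 99.99, 0, 0, survives_region_loss=True),
]


def select_tier(max_rto_hours: float, max_rpo_hours: float,
                min_availability_pct: float,
                need_region_resilience: bool = False) -> Optional[Tier]:
    """Return the cheapest tier that meets every requirement, or None if none does."""
    for tier in TIERS:
        if (tier.rto_hours <= max_rto_hours
                and tier.rpo_hours <= max_rpo_hours
                and tier.availability_pct >= min_availability_pct
                and (tier.survives_region_loss or not need_region_resilience)):
            return tier
    return None


# 99.9% availability with a 4-hour RTO/RPO, plus a requirement
# to survive the loss of an entire AWS Region.
print(select_tier(4, 4, 99.9, need_region_resilience=True).name)
```

The region-resilience flag matters because Tier 2 already meets a 4-hour RTO/RPO at 99.9%. Only the need to survive a full Region outage pushes the choice to Tier 3.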

<!-- - **Tier 1** – Single-account, single-region, single availability zone.

- 99% Availability
- RTO: 4 Hr.
- RPO: 4 Hr.
- Use cases: Development and non-prod environments

- **Tier 2** – Single-account, single-region, multi-availability zone.

- 99.9% Availability
- RTO: 4 Hr.
- RPO: 4 Hr.
- Use cases: Production deployments without need for rapid DR

- **Tier 3** – Single-account, multi-region, multi-availability zone.

- 99.9% Availability
- RTO: 2 Hr.
- RPO: 2 Hr.
- Use cases: Production deployments with need for rapid DR

- **Tier 4** – Multi-account, multi-region, multi-availability zone.
- 99.99% Availability
- RTO: 0 Hr.
- RPO: 0 Hr.
- Use cases: Mandated zero downtime DR -->

## Tier 1: Development

**Key Characteristics**: Single-account, single-region, single availability zone.

This deployment option is appropriate for non-production and development workspaces, where high availability and disaster recovery are not required for the accounts monitored by Guardrails.

This is the lowest cost infrastructure deployment option available.

![Tier 1 DR Architecture](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/architecture-options/tier-1.png)

This deployment uses one primary RDS instance without a failover configuration. Recovery can be performed from RDS point-in-time backups.
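For Tier 1, recovery is a point-in-time restore of the RDS instance. The minimal sketch below assumes a boto3 RDS client is passed in and calls the real `restore_db_instance_to_point_in_time` operation. The identifiers `turbot-newton` and `turbot-newton-restored` are hypothetical. Pointing Guardrails at the restored instance is a separate step that is not shown here.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def build_pitr_request(source_db_id: str, target_db_id: str,
                       restore_time: Optional[datetime] = None) -> Dict[str, Any]:
    """Keyword arguments for the RDS RestoreDBInstanceToPointInTime API."""
    if source_db_id == target_db_id:
        # A point-in-time restore always creates a new DB instance.
        raise ValueError("use a new identifier for the restored instance")
    request: Dict[str, Any] = {
        "SourceDBInstanceIdentifier": source_db_id,
        "TargetDBInstanceIdentifier": target_db_id,
    }
    if restore_time is None:
        # Restore to the latest restorable time, usually a few minutes behind "now".
        request["UseLatestRestorableTime"] = True
    else:
        request["RestoreTime"] = restore_time
    return request


def start_pitr_restore(rds_client, source_db_id: str, target_db_id: str,
                       restore_time: Optional[datetime] = None) -> Dict[str, Any]:
    """rds_client is expected to be boto3.client("rds") in the deployment region."""
    request = build_pitr_request(source_db_id, target_db_id, restore_time)
    return rds_client.restore_db_instance_to_point_in_time(**request)
```

In practice, you would likely also pass networking parameters such as `DBSubnetGroupName` and `VpcSecurityGroupIds` so that the new instance lands in the Guardrails VPC.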

## Tier 2: High Availability

**Key Characteristics**: Single-account, single-region, multi-availability zone.

This deployment option is appropriate for all production usage. It is the most cost-effective option for production use cases and can achieve a 4-hour RPO/RTO in all circumstances except the loss of an entire AWS Region.

![Tier 2 DR Architecture](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/architecture-options/tier-2.png)

Compared with the **Tier 1** architecture, this deployment makes the following changes:

1. The ECS compute cluster is deployed across multiple availability zones.
2. Lambda functions are deployed across multiple availability zones.
3. An RDS failover instance is deployed in a second availability zone.
4. An ElastiCache failover instance is deployed in a second availability zone.
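A quick way to confirm that the data-store changes in this list are in place is to read back the Multi-AZ flags. The sketch below takes boto3 RDS and ElastiCache clients as inputs and reads the `MultiAZ` field from `describe_db_instances`, plus the `MultiAZ` and `AutomaticFailover` fields from `describe_replication_groups`. It assumes the Guardrails cache is an ElastiCache replication group, and both identifiers are hypothetical. The ECS and Lambda subnet spread is not checked.

```python
from typing import List


def check_multi_az(rds_client, elasticache_client,
                   db_instance_id: str, replication_group_id: str) -> List[str]:
    """Return findings; an empty list means RDS and ElastiCache both report Multi-AZ."""
    findings: List[str] = []

    db = rds_client.describe_db_instances(
        DBInstanceIdentifier=db_instance_id)["DBInstances"][0]
    if not db.get("MultiAZ", False):
        findings.append(f"RDS instance {db_instance_id} has no standby in a second AZ")

    group = elasticache_client.describe_replication_groups(
        ReplicationGroupId=replication_group_id)["ReplicationGroups"][0]
    if group.get("MultiAZ") != "enabled":
        findings.append(f"ElastiCache group {replication_group_id} is not Multi-AZ")
    if group.get("AutomaticFailover") != "enabled":
        findings.append(f"ElastiCache group {replication_group_id} has automatic failover off")

    return findings
```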

## Tier 3: Multi-Region

**Key Characteristics**: Single-account, multi-region, multi-availability zone.

This deployment option is appropriate when regulatory requirements mandate a multi-region solution, or when RTO/RPO requirements are shorter than 4 hours. It also has the benefit of being resilient to the loss of an entire AWS Region.

![Tier 3 DR Architecture](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/architecture-options/tier-3.png)

The key difference in this deployment is that a second Turbot Guardrails deployment is created in the standby region. Its compute cluster is kept dormant and receives no inbound events. When a disaster is declared, DNS is changed to send events to the standby region while the database is restored from a cross-region RDS snapshot. Once the database is restored, the workspace is enabled and events start processing from the queue.

To use this pattern, [cross-region RDS backups](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReplicateBackups.html) must be configured in this account so that the database can be restored in the target region without access to KMS in the primary region. This option also requires Amazon API Gateway, a public DNS endpoint, and an SSL certificate, so that inbound real-time events can be redirected between regions.
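The Tier 3 RPO depends on how far the newest restorable point of the replicated backup trails the current time. The sketch below assumes a boto3 RDS client created in the DR region and the ARN of the replicated backup. It reads `RestoreWindow.LatestTime` from `describe_db_instance_automated_backups` and builds the arguments for restoring from that backup. The 2-hour target comes from the Tier 3 row of the table; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

TIER3_RPO = timedelta(hours=2)  # Tier 3 RPO from the architecture options table


def replicated_backup_lag(rds_dr_client, replicated_backup_arn: str,
                          now: Optional[datetime] = None) -> timedelta:
    """How far the newest restorable point of a replicated backup trails `now`."""
    resp = rds_dr_client.describe_db_instance_automated_backups(
        DBInstanceAutomatedBackupsArn=replicated_backup_arn)
    latest = resp["DBInstanceAutomatedBackups"][0]["RestoreWindow"]["LatestTime"]
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest


def dr_restore_request(replicated_backup_arn: str, target_db_id: str) -> Dict[str, Any]:
    """Arguments for restore_db_instance_to_point_in_time, called in the DR region."""
    return {
        "SourceDBInstanceAutomatedBackupsArn": replicated_backup_arn,
        "TargetDBInstanceIdentifier": target_db_id,
        "UseLatestRestorableTime": True,
    }
```

Comparing `replicated_backup_lag(...)` with `TIER3_RPO` on a schedule gives early warning that a failover would lose more data than the objective allows.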

## Tier 4: Multi-Account

**Key Characteristics**: Multi-account, multi-region, multi-availability zone.

The **Tier 4** deployment option should be considered by any organization with zero RTO/RPO requirements. It allows instantaneous failover between two active Guardrails environments. The Guardrails “Change Window” feature is used to prevent one of the environments from executing any enforcements. When an emergency is declared, the change window on the standby environment can be removed, allowing that environment to become the primary and enforce changes.

In normal day-to-day operation, both environments consume cloud events and maintain independent CMDB databases. As a result, this pattern doubles both the infrastructure costs and the per-control usage costs for Guardrails.

![Tier 4 DR Architecture](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/architecture-options/tier-4.png)

Take care in this configuration to apply policy packs and account onboarding/offboarding to both environments in tandem, using the Guardrails Terraform provider to keep the deployments consistent. Custom scripting may be needed to periodically verify that both environments are configured identically, to meet your organization's DR requirements.
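The periodic consistency check mentioned above can be reduced to comparing two flat snapshots of configuration, one per environment. How you export those snapshots (for example, from Terraform state or the Guardrails GraphQL API) is left open. The key names in the usage example are illustrative only. `config_drift` and `ABSENT` are hypothetical helpers.

```python
from typing import Dict, Tuple

ABSENT = "<absent>"  # marker for a key present in only one environment


def config_drift(primary: Dict[str, object],
                 standby: Dict[str, object]) -> Dict[str, Tuple[object, object]]:
    """Map every key whose value differs between environments to (primary, standby)."""
    drift: Dict[str, Tuple[object, object]] = {}
    for key in sorted(primary.keys() | standby.keys()):
        left = primary.get(key, ABSENT)
        right = standby.get(key, ABSENT)
        if left != right:
            drift[key] = (left, right)
    return drift


# Illustrative snapshots: an account onboarded only in the primary,
# and a policy value that differs between the two environments.
primary = {"account:111122223333": "onboarded",
           "account:444455556666": "onboarded",
           "policy:example @ /prod": "Check"}
standby = {"account:111122223333": "onboarded",
           "policy:example @ /prod": "Enforce"}
print(config_drift(primary, standby))
```

An empty result means the two snapshots agree. Any non-empty result is a candidate alert for your DR process.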
10 changes: 7 additions & 3 deletions in docs/guides/hosting-guardrails/disaster-recovery/index.md
@@ -55,9 +55,13 @@ This section provides detailed step-by-step instructions on how to use DR featur

| Guide | Description
| - | -
| [Hive Restore](guides/hosting-guardrails/disaster-recovery/restore) | Guides to restore a Guardrails database from RDS snapshot.
| [DR Testing](guides/hosting-guardrails/disaster-recovery/dr-testing) | Guides to restore a destroyed workspace.
| [Database Upgrade and Storage Optimization](guides/hosting-guardrails/disaster-recovery/database-upgrade-storage-optimization) | Guides to resize and/or upgrade a database engine version with minimal downtime.
| [Architecture Options](guides/hosting-guardrails/disaster-recovery/architecture-options) | Guides to choose a high availability and disaster recovery architecture for Turbot Guardrails.
| [Hive Restore](guides/hosting-guardrails/disaster-recovery/hive-restore) | Guides to restore a Guardrails database from RDS snapshot.
| [Workspace Restore](guides/hosting-guardrails/disaster-recovery/restore-workspace) | Guides to restore a destroyed workspace.
| [Multi-Region Deployment](guides/hosting-guardrails/disaster-recovery/multi-region-deployment) | Guides to set up a multi-region deployment of Turbot Guardrails using Tier 3 architecture.
| [Multi-Region Failover](guides/hosting-guardrails/disaster-recovery/multi-region-failover) | Guides to set up Disaster Recovery (DR) failover for Turbot Guardrails Multi-Region deployment.

<!-- | [Database Upgrade and Storage Optimization](guides/hosting-guardrails/disaster-recovery/database-upgrade-storage-optimization) | Guides to resize and/or upgrade a database engine version with minimal downtime. -->

## Additional Assistance

@@ -0,0 +1,191 @@
---
title: Multi-Region Deployment
sidebar_label: Multi-Region Deployment
---

# Multi-Region Deployment

## 1. Introduction

### 1.1 Purpose

This document outlines the setup plan for deploying the **Turbot Guardrails** application using the **Tier 3** architecture. The objective is to ensure high availability, minimize downtime, and reduce data loss in the event of a disaster by utilizing a multi-region and multi-availability zone (AZ) deployment strategy.

### 1.2 Scope

This setup applies to all production workloads deployed under the **Tier 3** architecture and is designed to provide high availability and fast recovery.

### 1.3 Target Audience

This guide is intended for **Guardrails Administrators** with experience in AWS cloud infrastructure management and Guardrails deployment. Familiarity with database recovery and restoration processes is beneficial.

## 2. Disaster Recovery Objectives

| Objective | Target |
| ------------------------------ | -------------------------------------------------------- |
| Recovery Time Objective (RTO) | 2 hours |
| Recovery Point Objective (RPO) | 2 hours |
| Availability | 99.9% |
| Use Case | Production deployments requiring rapid disaster recovery |
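Availability and RTO measure different things. A short calculation shows how they interact. The sketch below converts an availability percentage into an annual downtime budget, using a 365-day year. It then measures how much of that budget a single outage lasting the full 2-hour RTO would consume. The helper name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_budget_hours(availability_pct: float) -> float:
    """Maximum unplanned downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


budget = downtime_budget_hours(99.9)  # about 8.76 hours per year
share_of_one_outage = 2 / budget      # one outage lasting the full 2-hour RTO
print(f"{budget:.2f} h/year; one 2 h outage uses {share_of_one_outage:.0%} of it")
```

At 99.9%, one full-length regional failover uses roughly a quarter of the yearly allowance. This leaves room for only a few such events per year.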

## 3. Tier 3 Deployment Architecture

### 3.1 Overview

The **Tier 3** architecture enhances resilience by deploying a **standby environment in a secondary AWS region**. The primary and standby environments adhere to the following principles:

- Installation of **TEF, TED, and TE** follows the steps outlined in the [main installation guide](https://turbot.com/guardrails/docs/guides/hosting-guardrails/installation).
- The sections below list the differences and key considerations for installations that require multi-region disaster recovery (DR).

<!-- - **Cross-region RDS snapshots** for database backup and recovery.
- **Multi-AZ deployment** for compute and storage redundancy.
- **DNS failover** to redirect traffic to the standby region in case of a primary region failure. -->

### 3.2 Architecture Diagram

![Tier 3 Architecture](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/multi-region-deployment/tier-3.png)

## 4. Prerequisites

### 4.1 Glossary

- **Primary Region**: The main region where Turbot Guardrails is installed or will be installed. This region acts as the active environment.
- **Disaster Recovery (DR) Region**: The secondary region where the workspace will be failed over in case of a disaster.

### 4.2 Assumptions

This guide assumes the following setup for deploying Turbot Guardrails:

- A **predefined VPC** (not created by Turbot Guardrails).
- **DNS records** are not managed by Turbot Guardrails.
- **IAM roles** are not provisioned by Turbot Guardrails.
- **API Gateway with an internal load balancer** is used.

### 4.3 Key Considerations

#### VPC Configuration

A predefined VPC with subnets mirroring the primary region must be set up in the DR region.

#### SSL Certificate

- Ensure the certificate is valid and available in both the primary and DR regions.
- If the certificate includes a wildcard domain (e.g., `*.cloudportal.company.com`), no additional changes are required.
- Otherwise, the certificate should be configured to trust the following domains for API Gateway:
  - `gateway.cloudportal.company.com` (primary region)
  - `gateway-dr.cloudportal.company.com` (DR region)
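To check a certificate against these rules, you can match its names against the required hostnames. A wildcard such as `*.cloudportal.company.com` covers exactly one extra left-most label. The sketch below assumes you already have the certificate's names, for example from the `SubjectAlternativeNames` field returned by ACM `describe_certificate`. The function names are hypothetical.

```python
from typing import Iterable, List


def name_matches(cert_name: str, hostname: str) -> bool:
    """True if a certificate name (exact or single-label wildcard) covers hostname."""
    cert_name = cert_name.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if cert_name.startswith("*."):
        first_label, _, rest = hostname.partition(".")
        # "*.example.com" covers "a.example.com" but not "example.com" or "a.b.example.com".
        return bool(first_label) and rest == cert_name[2:]
    return cert_name == hostname


def uncovered_hostnames(cert_names: Iterable[str], required: Iterable[str]) -> List[str]:
    """Required hostnames that no certificate name covers."""
    names = list(cert_names)
    return [h for h in required if not any(name_matches(n, h) for n in names)]


required = ["gateway.cloudportal.company.com", "gateway-dr.cloudportal.company.com"]
print(uncovered_hostnames(["gateway.cloudportal.company.com"], required))
```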

#### Workspace Configuration

- A **single additional workspace** will be installed in the DR region.
- The domain for the DR workspace will follow the pattern: `{workspace_name}-dr.cloudportal.company.com`.
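Appending `-dr` lengthens the workspace label. A long workspace name can therefore exceed the 63-character DNS label limit (RFC 1035). The sketch below builds the DR console domain from the `{workspace_name}-dr` pattern and rejects names that would be invalid. The workspace name `newton` and the default base domain are illustrative assumptions.

```python
MAX_LABEL = 63   # RFC 1035 limit for a single DNS label
MAX_NAME = 253   # practical limit for a full domain name in text form


def dr_workspace_domain(workspace_name: str,
                        base_domain: str = "cloudportal.company.com") -> str:
    """Build the DR console domain using the {workspace_name}-dr pattern."""
    label = f"{workspace_name.lower()}-dr"
    if len(label) > MAX_LABEL:
        raise ValueError(f"'{label}' is {len(label)} characters; DNS labels allow {MAX_LABEL}")
    fqdn = f"{label}.{base_domain}"
    if len(fqdn) > MAX_NAME:
        raise ValueError(f"'{fqdn}' exceeds {MAX_NAME} characters")
    return fqdn


print(dr_workspace_domain("newton"))
```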

#### Product Version Requirements

Both regions must run at least the following versions:

- **TEF:** 1.66.0
- **TED:** 1.45.0
- **TE:** 5.49.0

The **Turbot Resource Name Prefix** must be identical in both regions (default: `turbot`).
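Version checks are easy to get wrong with plain string comparison, because `"1.100.0" < "1.66.0"` is true for strings. The sketch below compares versions numerically against the minimums listed above. It assumes plain `major.minor.patch` version strings with no pre-release suffix. `MINIMUM_VERSIONS` and `version_problems` are hypothetical names.

```python
from typing import Dict, List, Tuple

MINIMUM_VERSIONS = {"TEF": "1.66.0", "TED": "1.45.0", "TE": "5.49.0"}


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.66.0' -> (1, 66, 0); tuples compare numerically, unlike strings."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def version_problems(installed: Dict[str, str]) -> List[str]:
    """List products that are missing or older than the required minimum."""
    problems = []
    for product, minimum in MINIMUM_VERSIONS.items():
        version = installed.get(product)
        if version is None:
            problems.append(f"{product}: not installed")
        elif parse_version(version) < parse_version(minimum):
            problems.append(f"{product}: {version} is older than {minimum}")
    return problems
```

Run the check once per region. Both regions should return an empty list before you proceed.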

### 4.4 Differences Between Primary and DR Regions

| Configuration | Primary Region | DR Region |
|--------------|----------------|------------|
| **TEF Configuration** | • SSL certificate must cover required domains | • SSL certificate must cover required domains |
| | • "API Gateway prefix" parameter set to `gateway` | • "API Gateway prefix" parameter set to `gateway-dr` |
| | • "Guardrails multi-region KMS Key Type" set to `Primary` | • "Guardrails multi-region KMS Key Type" set to KMS key ARN from primary region (alias: `turbot_guardrails`, prefixed with `mrk-`) |
| | | • Manual creation of custom domain names (`gateway.cloudportal.company.com`) for API Gateway |
| **TED Configuration** | • Database name must be identical in both regions | • Database name must be identical in both regions |
| **RDS Configuration** | • Manual configuration of [cross-region RDS DB snapshots](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReplicateBackups.html) with appropriate retention policies | - |
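The DR-region KMS parameter in this table expects a multi-Region key ARN from the primary region. Multi-Region key IDs start with `mrk-`. The sketch below parses a KMS key ARN of the form `arn:<partition>:kms:<region>:<account>:key/<key-id>` and flags the two mistakes that are easiest to make. It rejects alias ARNs, on the assumption (from the table's wording) that the parameter wants the key ARN rather than the alias. The function names are hypothetical.

```python
from typing import Dict, List


def parse_kms_key_arn(arn: str) -> Dict[str, str]:
    """Split arn:<partition>:kms:<region>:<account>:key/<key-id> into its fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS ARN: {arn}")
    resource_type, _, key_id = parts[5].partition("/")
    if resource_type != "key" or not key_id:
        # Alias ARNs (".../alias/turbot_guardrails") are rejected: the key ARN is expected.
        raise ValueError(f"not a KMS key ARN: {arn}")
    return {"partition": parts[1], "region": parts[3], "account": parts[4], "key_id": key_id}


def dr_key_parameter_problems(arn: str, primary_region: str) -> List[str]:
    """Check the value intended for the DR-region multi-region KMS key parameter."""
    key = parse_kms_key_arn(arn)
    problems = []
    if not key["key_id"].startswith("mrk-"):
        problems.append("key ID does not start with 'mrk-', so it is not a multi-Region key")
    if key["region"] != primary_region:
        problems.append(f"key is in {key['region']}, expected the primary region {primary_region}")
    return problems
```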

> [!WARNING]
> When setting up TEF in the DR region, make sure the deployment succeeds without a rollback. If a replica key is created and the stack then rolls back, the replica key cannot be deleted immediately: it is subject to a 7-day deletion waiting period unless removed with AWS Support assistance. **You can create only one replica of each primary key in each AWS Region.**
>
> If necessary, complete the TEF setup in the DR region by setting the Guardrails multi-region KMS Key Type (under Advanced - Deployment) to Primary. Once the setup completes successfully, update the parameter to Replica and delete the multi-region key created in the DR region.

### 4.5 Workspace Deployment in DR Region

- Create a **test workspace** in the DR region.
- Install the same set of **mods** as in the primary region to ensure consistency.

#### Context

Creating a test workspace in the DR region is essential. Installing mods manually during an actual disaster recovery can be time-consuming and may push recovery beyond your Recovery Time Objective (RTO). By preparing this workspace in advance, you can install mods proactively using the same automation methods (such as pipelines, Terraform scripts, or AutoMod updates) and schedules used for your primary workspace. This keeps the DR workspace continuously up to date, so it can quickly and reliably take over if your primary workspace experiences downtime.
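To confirm that the DR workspace really mirrors the primary, compare the installed mods and their versions in both workspaces. The sketch below assumes you can export each workspace's mods as a `{mod_name: version}` mapping. The mod names and versions in the test are illustrative, and `mod_differences` is a hypothetical helper.

```python
from typing import Dict, List


def mod_differences(primary: Dict[str, str], dr: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare {mod_name: version} inventories from the primary and DR workspaces."""
    shared = primary.keys() & dr.keys()
    return {
        "missing_in_dr": sorted(primary.keys() - dr.keys()),
        "only_in_dr": sorted(dr.keys() - primary.keys()),
        "version_mismatch": sorted(m for m in shared if primary[m] != dr[m]),
    }
```

All three lists should be empty after each scheduled mod update. Anything in `missing_in_dr` would otherwise have to be installed during the failover itself.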

## 5. Implementation Steps

### 5.1 Setting Up Cross-Region Database Backup

- Navigate to the Amazon RDS console in the primary region.
- Click "Automated backups".
- Under the "Current Region" tab, select the Turbot Guardrails database (e.g., `turbot-newton`).
- Click the "Actions" dropdown button and choose "Manage cross-Region replication".
- In the "Manage cross-Region replication" window that opens, check the "Enable replication in another AWS Region" option.

![Enable cross-Region replication](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/multi-region-deployment/enable-crossregion-replication.png)

- Fill in the necessary details in the form:
  - Destination Region: Select the DR region.
  - Replicated backup retention period: Choose the appropriate retention period in days.
  - AWS KMS Key: Select the encryption key used for the Turbot database in the DR region. Typically, this follows the format `turbot_databasename` (e.g., `turbot_newton`).
  - Validate the KMS Key ID: Navigate to the KMS service in the DR region to confirm the correct key.

![Manage cross-Region replication](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/multi-region-deployment/manage-crossregion-replication.png)

- Click **Save** to complete the setup.
- Navigate to the **"Replicated"** tab and verify that the database is listed under **"Replicated backups"**.
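The same replication can be enabled through the RDS API, which helps if the DR setup is scripted. The sketch below wraps the real `start_db_instance_automated_backups_replication` operation. It assumes the client is created in the DR (destination) region, for example `boto3.client("rds", region_name=<DR region>)`. The ARNs in the test are placeholders, and the wrapper name is hypothetical.

```python
from typing import Any, Dict


def start_backup_replication(rds_dr_client, source_db_arn: str,
                             retention_days: int, dr_kms_key_arn: str) -> Dict[str, Any]:
    """Enable cross-Region replication of automated backups from the DR region."""
    if ":rds:" not in source_db_arn or ":db:" not in source_db_arn:
        raise ValueError(f"expected an RDS DB instance ARN, got {source_db_arn}")
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return rds_dr_client.start_db_instance_automated_backups_replication(
        SourceDBInstanceArn=source_db_arn,   # primary-region database
        BackupRetentionPeriod=retention_days,
        KmsKeyId=dr_kms_key_arn,             # encryption key that lives in the DR region
    )
```

Afterwards, the replicated backup should appear in the DR region, matching what the "Replicated" tab shows in the console.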

### 5.2 Configuring Workspaces in the Primary Region

- Make sure to set the following policies on the Guardrails workspace:
  - `Turbot > Workspace > Gateway Domain Name`: Fully qualified domain name of the publicly accessible gateway to the workspace, for example `gateway.cloudportal.company.com`. Set to the domain name only; do not include protocol or path information.
  - `Turbot > Workspace > Domain Name`: Fully qualified domain name of the workspace, for example `console.cloudportal.company.com`. Set to the domain name only; do not include protocol or path information.
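Both policies accept only a bare domain name. A small validator can catch pasted URLs before they are applied. The sketch below is a hypothetical pre-check, not Guardrails' own validation. It lowercases the value, rejects any protocol, port, or path, and requires at least two valid DNS labels. Rejecting ports is an extra assumption beyond the policy description.

```python
import re

# One DNS label: 1-63 characters, letters/digits/hyphens, no leading or trailing hyphen.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def normalize_domain_policy_value(value: str) -> str:
    """Validate a value intended for a Turbot > Workspace domain name policy."""
    candidate = value.strip().lower().rstrip(".")
    if "/" in candidate or ":" in candidate:
        raise ValueError(f"{value!r}: use the domain name only, without protocol, port, or path")
    labels = candidate.split(".")
    if len(labels) < 2 or not all(_LABEL.fullmatch(label) for label in labels):
        raise ValueError(f"{value!r} is not a fully qualified domain name")
    return candidate
```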

### 5.3 Configuring API Gateway Custom Domain Name in the DR Region

To ensure seamless failover in the DR region, you need to configure the "API Gateway Custom Domain Name".

- Open the AWS API Gateway service in the DR region.
- Verify that the custom domain `gateway-dr.cloudportal.company.com` is already present.
- Click "Add domain name".
- Enter the same domain name as in the primary region: `gateway.cloudportal.company.com`.
- Configure the following settings:

  ![Add domain name](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/multi-region-deployment/add-domain-name.png)

  - Type: Public
  - API endpoint type: Regional
  - Minimum TLS version: TLS 1.2
  - ACM Certificate: Select the ACM certificate created for Turbot Guardrails. This certificate should be configured to trust both `gateway.cloudportal.company.com` and `gateway-dr.cloudportal.company.com`.
- Click "Add domain name" to finalize the setup.
- Once created, navigate to the "Custom domain name" settings and open the "API mappings" tab.
- Click "Configure API mappings", then select "Add new mapping".

  ![Configure API mappings](/images/docs/guardrails/guides/hosting-guardrails/disaster-recovery/multi-region-deployment/configure-api-mappings.png)

- Set the following values:
  - API: Select `turbot-api`.
  - Stage: Choose `turbot`.
  - Path (optional): Leave blank.
- Click **Save** to apply the changes.
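If you script this step, the console actions map onto the API Gateway v1 (REST API) operations `create_domain_name` and `create_base_path_mapping` in boto3. The sketch below assumes that `turbot-api` is a REST API, so its console "API mapping" corresponds to a base path mapping. It also assumes the client is created in the DR region. The certificate ARN is a placeholder, and the helper names are hypothetical.

```python
from typing import Optional


def find_rest_api_id(apigw_client, name: str = "turbot-api") -> Optional[str]:
    """Look up a REST API's ID by name (first page of up to 500 APIs only)."""
    for api in apigw_client.get_rest_apis(limit=500).get("items", []):
        if api.get("name") == name:
            return api["id"]
    return None


def add_failover_domain(apigw_client, domain_name: str, certificate_arn: str,
                        rest_api_id: str, stage: str = "turbot") -> None:
    """Regional custom domain with TLS 1.2 and an empty-path mapping to the stage."""
    apigw_client.create_domain_name(
        domainName=domain_name,
        regionalCertificateArn=certificate_arn,
        endpointConfiguration={"types": ["REGIONAL"]},
        securityPolicy="TLS_1_2",
    )
    # Omitting basePath maps the API at the root, like leaving "Path" blank.
    apigw_client.create_base_path_mapping(
        domainName=domain_name, restApiId=rest_api_id, stage=stage)
```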

### 5.4 Configuring DNS Records

Ensure that the following DNS records are correctly configured to route traffic appropriately:

- API Gateway DNS record:

  The domain `gateway.cloudportal.company.com` should have an **A** record, configured as an alias record (the target is a hostname, not a fixed IP address), pointing to the API Gateway endpoint in the primary region. The API Gateway endpoint typically follows the format: `abcdefghij.execute-api.us-east-1.amazonaws.com`

- Workspace console DNS record:

  The domain `console.cloudportal.company.com` should have a **CNAME** record pointing to the internal load balancer DNS name in the primary region. This internal load balancer DNS name generally follows the format: `internal-turbot-5-49-0-lb-1234567890.us-east-1.elb.amazonaws.com`
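If your DNS is hosted in Amazon Route 53 (an assumption; this guide assumes DNS is managed outside Guardrails), the console record can be written with `change_resource_record_sets`. The sketch below builds an `UPSERT` change batch for a single CNAME. The API Gateway alias record is not covered. A short TTL such as 60 seconds is a design choice that lets clients pick up a failover change quickly. The helper names are hypothetical.

```python
from typing import Any, Dict


def cname_upsert_batch(record_name: str, target: str, ttl: int = 60) -> Dict[str, Any]:
    """Route 53 ChangeBatch that creates or updates a single CNAME record."""
    return {
        "Comment": f"Point {record_name} at {target}",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }],
    }


def apply_batch(route53_client, hosted_zone_id: str, batch: Dict[str, Any]) -> Dict[str, Any]:
    """route53_client is expected to be boto3.client("route53")."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch)
```

During a failover, the same helper can repoint the console record at the DR region's load balancer.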

## Additional Assistance

Turbot Support is happy to consult with Enterprise customers to help determine a strategy to manage these scenarios. Contact us at [help@turbot.com](mailto:help@turbot.com).