Add doc for disaster recovery volume
Longhorn issue 535
This commit is contained in:
parent
5a7a3cb755
commit
e7c9e7c197
@ -211,6 +211,7 @@ More examples are available at `./examples/`
|
||||
### [Deal with Kubernetes node failure](./docs/node-failure.md)
|
||||
### [Use CSI driver on RancherOS/CoreOS + RKE or K3S](./docs/csi-config.md)
|
||||
### [Restore a backup to an image file](./docs/restore-to-file.md)
|
||||
### [Disaster Recovery Volume](./docs/dr-volume.md)
|
||||
|
||||
# Troubleshooting
|
||||
You can click `Generate Support Bundle` link at the bottom of the UI to download a zip file contains Longhorn related configuration and logs.
|
||||
|
53
docs/dr-volume.md
Normal file
53
docs/dr-volume.md
Normal file
@ -0,0 +1,53 @@
|
||||
# Disaster Recovery Volume
|
||||
## What is Disaster Recovery Volume?
|
||||
To increase the resiliency of the volume, Longhorn supports disaster recovery volume.
|
||||
|
||||
The disaster recovery volume is designed for the backup cluster in the case of the whole main cluster goes down.
|
||||
A disaster recovery volume is normally in standby mode. User would need to activate it before using it as a normal volume.
|
||||
A disaster recovery volume can be created from a volume's backup in the backup store. And Longhorn will monitor its
|
||||
original backup volume and incrementally restore from the latest backup. Once the original volume in the main cluster goes
|
||||
down and users decide to activate the disaster recovery volume in the backup cluster, the disaster recovery volume can be
|
||||
activated immediately in the most condition, so it will greatly reduced the time needed to restore the data from the
|
||||
backup store to the volume in the backup cluster.
|
||||
|
||||
## How to create Disaster Recovery Volume?
|
||||
1. In the cluster A, make sure the original volume X has backup created or recurring backup scheduling.
|
||||
2. Set backup target in cluster B to be same as cluster A's.
|
||||
3. In backup page of cluster B, choose the backup volume X then create disaster recovery volume Y. It's highly recommended
|
||||
to use backup volume name as disaster volume name.
|
||||
4. Attach the disaster recovery volume Y to any node. Then Longhorn will automatically polling for the last backup of the
|
||||
volume X, and incrementally restore it to the volume Y.
|
||||
5. If volume X is down, users can activate volume Y immediately. Once activated, volume Y will become a
|
||||
normal Longhorn volume.
|
||||
5.1. Notice that deactivate a normal volume is not allowed.
|
||||
|
||||
## About Activating Disaster Recovery Volume
|
||||
1. A disaster recovery volume doesn't support creating/deleting/reverting snapshot, creating backup, creating
|
||||
PV/PVC. Users cannot update `Backup Target` in Settings if any disaster recovery volumes exist.
|
||||
|
||||
2. When users try to activate a disaster recovery volume, Longhorn will check the last backup of the original volume. If
|
||||
it hasn't been restored, the restoration will be started, and the activate action will fail. Users need to wait for
|
||||
the restoration to complete before retrying.
|
||||
|
||||
3. For disaster recovery volume, `Last Backup` indicates the most recent backup of its original backup volume. If the icon
|
||||
representing disaster volume is gray, it means the volume is restoring `Last Backup` and users cannot activate this
|
||||
volume right now; if the icon is blue, it means the volume has restored the `Last Backup`.
|
||||
|
||||
## RPO and RTO
|
||||
Typically incremental restoration is triggered by the periodic backup store update. Users can set backup store update
|
||||
interval in `Setting - General - Backupstore Poll Interval`. Notice that this interval can potentially impact
|
||||
Recovery Time Objective(RTO). If it is too long, there may be a large amount of data for the disaster recovery volume to
|
||||
restore, which will take a long time. As for Recovery Point Objective(RPO), it is determined by recurring backup
|
||||
scheduling of the backup volume. You can check [here](snapshot-backup.md) to see how to set recurring backup in Longhorn.
|
||||
|
||||
e.g.:
|
||||
|
||||
If recurring backup scheduling for normal volume A is creating backup every hour, then RPO is 1 hour.
|
||||
|
||||
Assuming the volume creates backup every hour, and incrementally restoring data of one backup takes 5 minutes.
|
||||
|
||||
If `Backupstore Poll Interval` is 30 minutes, then there will be at most one backup worth of data since last restoration.
|
||||
The time for restoring one backup is 5 minute, so RTO is 5 minutes.
|
||||
|
||||
If `Backupstore Poll Interval` is 12 hours, then there will be at most 12 backups worth of data since last restoration.
|
||||
The time for restoring the backups is 5 * 12 = 60 minutes, so RTO is 60 minutes.
|
Loading…
Reference in New Issue
Block a user