2.6 KiB
Concurrent Backup Restore Per Node Limit
Summary
Longhorn has no boundary on the number of concurrent volume backup restoring.
Having a new concurrent-backup-restore-per-node-limit
setting allows the user to limit the concurring backup restoring. Setting this restriction lowers the potential risk of overloading the cluster when volumes restoring from backup concurrently. For ex: during the Longhorn system restore.
Related Issues
https://github.com/longhorn/longhorn/issues/4558
Motivation
Goals
Introduce a new concurrent-backup-restore-per-node-limit
setting to define the boundary of the concurrent volume backup restoring.
Non-goals
None
Proposal
- Introduce a new
concurrent-backup-restore-per-node-limit
setting. - Track the number of per-node volumes restoring from backup with atomic count (thread-safe) in the engine monitor.
User Stories
Allow the user to set the concurrent backup restore per node limit to control the risk of cluster overload when Longhorn volume is restoring from backup concurrently.
User Experience In Detail
- Longhorn holds the engine backup restore when the number of volume backups restoring on a node reaches the
concurrent-backup-restore-per-node-limit
. - The volume backup restore continues when the number of volume backups restoring on a node is below the limit.
Design
Implementation Overview
The concurrent-backup-restore-per-node-limit
Setting
This setting controls how many engines on a node can restore the backup concurrently.
Longhorn engine monitor backs off when the volume backup restoring count reaches the setting limit.
Set the value to 0 to disable backup restore.
Category = SettingCategoryGeneral,
Type = integer
Default = 5 # same as the default replica rebuilding number
Track the volume backup restoring per node
-
Create a new atomic counter in the engine controller.
type EngineController struct { restoringCounter util.Counter }
-
Pass the restoring counter to each of its engine monitors.
type EngineMonitor struct { restoringCounter util.Counter }
-
Increase the restoring counter before backup restore.
Ignore DR volumes (volume.Status.IsStandby).
-
Decrease the restoring counter when the backup restore caller method ends
Test plan
- Test the setting should block backup restore when creating multiple volumes from the backup at the same time.
- Test the setting should be per-node limited.
- Test the setting should not have effect on DR volumes.
Upgrade strategy
None
Note [optional]
None