BackingImage Backup Support
Summary
This feature enables Longhorn to back up a BackingImage to the backup store and restore it.
Related Issues
- [FEATURE] Restore BackingImage for BackupVolume in a new cluster #4165
Motivation
Goals
- When a Volume with a BackingImage is being backed up, the BackingImage will also be backed up.
- User can manually back up the BackingImage.
- When restoring a Volume with a BackingImage, the BackingImage will also be restored.
- User can manually restore the BackingImage.
- All BackingImages are backed up in blocks.
- If the block contains the same data, BackingImages will reuse the same block in backup store instead of uploading another identical one.
Proposal
User Stories
With this feature, users no longer need to manually handle BackingImages across clusters when backing up and restoring Volumes with BackingImages.
User Experience In Detail
Before this feature: The BackingImage is not backed up automatically when a Volume with the BackingImage is backed up, so the user needs to prepare the BackingImage again in the other cluster before restoring the Volume.
After this feature: A BackingImage is backed up automatically when a Volume with the BackingImage is being backed up. The user can also manually back up a BackingImage independently. Then, when the Volume with the BackingImage is restored from the backup store, Longhorn automatically restores the BackingImage at the same time. The user can also manually restore the BackingImage independently.
This improves the user experience and reduces the operational overhead.
Design
Implementation Overview
Backup BackingImage - BackupStore
- Backing up a `BackingImage` is not the same as backing up a `Volume`, which consists of a series of `Snapshots`. A `BackingImage` already has all the blocks we need to back up, so we don't need to find the delta between two `BackingImages`. For `Snapshots`, by contrast, the delta might exist in other `Snapshots` between the current `Snapshot` and the last backed-up `Snapshot`.
- All the `BackingImages` share the same block pool in the backup store, so we can reuse blocks to increase the backup speed and save space. This can happen when a user creates a v1 `BackingImage`, uses the image to add more data, and then exports another v2 `BackingImage`.
- For restoration, we still do a full restore onto one of the ready disks.
- Unlike a `Volume` backup, a `BackingImage` has no size restriction. It can be smaller than 2MB or not a multiple of 2MB, so the last block might not be 2MB.
- When backing up a `BackingImage`:
  - `preload()`: preload the `BackingImage` to find all the sectors that have data inside.
  - `createBackupBackingMapping()`: get all the blocks we need to back up.
    - Block: offset + size (2MB for each block; the last block might be smaller than 2MB).
  - `backupMappings()`: write the blocks to the backup store.
    - If a block is already in the backup store, skip it.
  - `saveBackupBacking()`: save the metadata of the `BackupBackingImage`, including the block mapping, to the backup store. The mapping needs to include the block size.
- When restoring a `BackingImage`:
  - `loadBackupBacking()`: load the metadata of the `BackupBackingImage` from the backup store.
  - `populateBlocksForFullRestore() + restoreBlocks()`: based on the mapping, write the block data to the correct offsets.
- We back up the blocks asynchronously to increase the backup speed.
- For a qcow2 `BackingImage`, the format is not the same as a raw file, so we can't distinguish holes from data sectors. Therefore we back up all the blocks.
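The block handling described above can be illustrated with a small, self-contained Go sketch. It is not Longhorn's implementation: the source does not say how blocks are keyed in the block pool, so the sketch assumes SHA-256 checksums, and it replaces the backup store with an in-memory map. The names `blockMapping`, `blockPool`, `splitIntoBlocks`, `backupBlocks`, `makeImage` and this version of `restoreBlocks` are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockSize is the 2MB block size described above.
const blockSize = 2 * 1024 * 1024

// blockMapping is a hypothetical mapping entry: offset and size locate the
// block in the image, and the checksum is its key in the shared block pool.
type blockMapping struct {
	Offset   int64
	Size     int64
	Checksum string
}

// blockPool is an in-memory stand-in for the block pool in the backup store.
type blockPool map[string][]byte

// splitIntoBlocks covers an image of imageSize bytes with 2MB blocks; the
// last block is shorter when imageSize is not a multiple of 2MB.
func splitIntoBlocks(imageSize int64) []blockMapping {
	var blocks []blockMapping
	for offset := int64(0); offset < imageSize; offset += blockSize {
		size := int64(blockSize)
		if remaining := imageSize - offset; remaining < size {
			size = remaining
		}
		blocks = append(blocks, blockMapping{Offset: offset, Size: size})
	}
	return blocks
}

// backupBlocks stores each block of image in pool unless a block with the
// same checksum is already there. It returns the mapping and the number of
// blocks that actually had to be uploaded.
func backupBlocks(image []byte, pool blockPool) ([]blockMapping, int) {
	mappings := splitIntoBlocks(int64(len(image)))
	uploaded := 0
	for i := range mappings {
		m := &mappings[i]
		data := image[m.Offset : m.Offset+m.Size]
		sum := sha256.Sum256(data)
		m.Checksum = hex.EncodeToString(sum[:])
		if _, exists := pool[m.Checksum]; exists {
			continue // identical block already stored: reuse it
		}
		pool[m.Checksum] = append([]byte(nil), data...)
		uploaded++
	}
	return mappings, uploaded
}

// restoreBlocks rebuilds the full image by copying every block to its offset.
func restoreBlocks(mappings []blockMapping, pool blockPool) []byte {
	var total int64
	for _, m := range mappings {
		if end := m.Offset + m.Size; end > total {
			total = end
		}
	}
	image := make([]byte, total)
	for _, m := range mappings {
		copy(image[m.Offset:m.Offset+m.Size], pool[m.Checksum])
	}
	return image
}

// makeImage returns test data in which every 2MB block differs from the others.
func makeImage(size int) []byte {
	image := make([]byte, size)
	for offset := 0; offset < size; offset += blockSize {
		image[offset] = byte(offset/blockSize + 1)
	}
	return image
}

func main() {
	pool := blockPool{}
	v1 := makeImage(5 * 1024 * 1024) // blocks of 2MB, 2MB and 1MB
	// v2 is v1 with 1MB appended, so only its last block differs.
	v2 := append(append([]byte(nil), v1...), make([]byte, 1024*1024)...)

	m1, uploaded1 := backupBlocks(v1, pool)
	m2, uploaded2 := backupBlocks(v2, pool)
	fmt.Println(len(m1), uploaded1) // 3 3
	fmt.Println(len(m2), uploaded2) // 3 1
	fmt.Println(bytes.Equal(restoreBlocks(m2, pool), v2))
}
```

In this example the second image uploads only one block, because its first two blocks are already in the pool. Storing the size in each mapping entry is what lets the restore step handle a short last block.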
Backup BackingImage - Controller
- Add a new CRD `backupbackingimage.longhorn.io`

  ```go
  type BackupBackingImageSpec struct {
  	Labels           map[string]string `json:"labels"`
  	BackingImageName string            `json:"backingImageName"`
  	SyncRequestedAt  metav1.Time       `json:"syncRequestedAt"`
  }

  type BackupBackingImageStatus struct {
  	OwnerID           string                  `json:"ownerID"`
  	Checksum          string                  `json:"checksum"`
  	URL               string                  `json:"url"`
  	Size              string                  `json:"size"`
  	Labels            map[string]string       `json:"labels"`
  	State             BackupBackingImageState `json:"state"`
  	Progress          int                     `json:"progress"`
  	Error             string                  `json:"error,omitempty"`
  	Messages          map[string]string       `json:"messages"`
  	ManagerAddress    string                  `json:"managerAddress"`
  	BackupCreatedAt   string                  `json:"backupCreatedAt"`
  	LastSyncedAt      metav1.Time             `json:"lastSyncedAt"`
  	CompressionMethod BackupCompressionMethod `json:"compressionMethod"`
  }
  ```

  ```go
  type BackupBackingImageState string

  const (
  	BackupBackingImageStateNew        = BackupBackingImageState("")
  	BackupBackingImageStatePending    = BackupBackingImageState("Pending")
  	BackupBackingImageStateInProgress = BackupBackingImageState("InProgress")
  	BackupBackingImageStateCompleted  = BackupBackingImageState("Completed")
  	BackupBackingImageStateError      = BackupBackingImageState("Error")
  	BackupBackingImageStateUnknown    = BackupBackingImageState("Unknown")
  )
  ```

  - Field `Status.ManagerAddress` indicates the address of the backing-image-manager running the BackingImage backup.
  - Field `Status.Checksum` records the checksum of the BackingImage. A user may create a new BackingImage with the same name but different content after deleting an old one, or there may be another BackingImage with the same name in another cluster. To avoid such conflicts, we use the checksum to check whether they are the same.
  - If the cluster already has a `BackingImage` with the same name as one in the backup store, we still create the `BackupBackingImage` CR. The user can use the checksum to check whether they are the same. For this reason we don't use the `UUID` across clusters, since the user might have already prepared a BackingImage with the same name and content in another cluster.
- Add a new controller `BackupBackingImageController`.
  - Workflow
    - Check and update the ownership.
    - Do cleanup if the deletion timestamp is set.
      - Clean up the backup `BackingImage` on the backup store.
      - Stop the monitoring.
    - If `Status.LastSyncedAt.IsZero() && Spec.BackingImageName != ""`, the CR was created by the User/API layer, so we need to do the backup.
      - Start the monitor.
      - Pick one `BackingImageManager`.
      - Request the `BackingImageManager` to back up the `BackingImage` by calling the `CreateBackup()` gRPC.
    - Otherwise, the `BackupBackingImage` CR was created by the `BackupTargetController`, and the backup `BackingImage` already existed in the remote backup target before the CR was created.
      - Use `backupTargetClient` to get the info of the backup `BackingImage`.
      - Sync the status.
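The branching in the controller workflow above can be sketched as a pure function. This is only an illustration under assumptions: `backupBackingImage` is a hypothetical, pared-down stand-in for the CR, `time.Time` replaces `metav1.Time` to keep the example self-contained, and the action names are invented for the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// backupBackingImage is a hypothetical stand-in for the BackupBackingImage CR
// that keeps only the fields the workflow branches on.
type backupBackingImage struct {
	Deleting         bool      // true when the deletion timestamp is set
	BackingImageName string    // Spec.BackingImageName
	LastSyncedAt     time.Time // Status.LastSyncedAt
}

// reconcileAction names the branch the controller would take (hypothetical).
type reconcileAction string

const (
	actionCleanup       reconcileAction = "cleanup"
	actionCreateBackup  reconcileAction = "create-backup"
	actionSyncFromStore reconcileAction = "sync-from-backup-store"
)

// decideAction mirrors the order of checks in the workflow: cleanup first,
// then the "created by the User/API layer" condition, otherwise a status sync.
func decideAction(bbi backupBackingImage) reconcileAction {
	if bbi.Deleting {
		return actionCleanup
	}
	if bbi.LastSyncedAt.IsZero() && bbi.BackingImageName != "" {
		return actionCreateBackup
	}
	return actionSyncFromStore
}

func main() {
	createdByAPI := backupBackingImage{BackingImageName: "parrot"}
	createdByBackupTarget := backupBackingImage{}
	alreadySynced := backupBackingImage{BackingImageName: "parrot", LastSyncedAt: time.Now()}

	fmt.Println(decideAction(createdByAPI))
	fmt.Println(decideAction(createdByBackupTarget))
	fmt.Println(decideAction(alreadySynced))
}
```

Keeping the decision separate from the side effects (starting the monitor, calling the gRPC, syncing the status) makes the condition easy to unit test.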
- In `BackingImageManager - manager(backing_image.go)`
  - Implement the `CreateBackup()` gRPC.
    - Back up the `BackingImage` to the backup store in blocks.
- In controller `BackupTargetController`
  - Workflow
    - Implement the `syncBackupBackingImage()` function.
      - Create the `BackupBackingImage` CRs whose names are in the backup store but not in the cluster.
      - Delete the `BackupBackingImage` CRs whose names are in the cluster but not in the backup store.
      - Request the `BackupBackingImageController` to reconcile those `BackupBackingImage` CRs.
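The create/delete rule in `syncBackupBackingImage()` is a set difference in both directions. A minimal sketch, assuming the two name lists have already been fetched; `diffBackupBackingImages` is a hypothetical helper, not a Longhorn function:

```go
package main

import (
	"fmt"
	"sort"
)

// diffBackupBackingImages compares the names found in the backup store with
// the names of the BackupBackingImage CRs in the cluster. It returns the CRs
// to create (in the store only) and the CRs to delete (in the cluster only).
func diffBackupBackingImages(inStore, inCluster []string) (toCreate, toDelete []string) {
	storeSet := make(map[string]struct{}, len(inStore))
	for _, name := range inStore {
		storeSet[name] = struct{}{}
	}
	clusterSet := make(map[string]struct{}, len(inCluster))
	for _, name := range inCluster {
		clusterSet[name] = struct{}{}
	}

	for name := range storeSet {
		if _, ok := clusterSet[name]; !ok {
			toCreate = append(toCreate, name)
		}
	}
	for name := range clusterSet {
		if _, ok := storeSet[name]; !ok {
			toDelete = append(toDelete, name)
		}
	}

	// Map iteration order is random, so sort for a stable result.
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	toCreate, toDelete := diffBackupBackingImages(
		[]string{"a", "b", "c"}, // names in the backup store
		[]string{"b", "d"},      // names of the CRs in the cluster
	)
	fmt.Println(toCreate, toDelete) // [a c] [d]
}
```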
- Add a backup API for `BackingImage`
  - Add a new action `backup` to `BackingImage` (`"/v1/backingimages/{name}"`)
    - Create a `BackupBackingImage` CR to initiate the backup process.
    - If the `BackupBackingImage` already exists, there is already a `BackupBackingImage` in the backup store; the user can check the checksum to verify whether they are the same.
  - API Watch: establish a streaming connection to report BackupBackingImage info.
- Trigger
  - Back up through the `BackingImage` operation manually.
  - Back up the `BackingImage` when the user backs up the volume.
    - In the `SnapshotBackup()` API:
      - We get the `BackingImage` of the `Volume`.
      - Back up the `BackingImage` if the `BackupBackingImage` does not exist.
Restoring BackingImage - Controller
- Add a new data source type `restore` for `BackingImageDataSource`

  ```go
  type BackingImageDataSourceType string

  const (
  	BackingImageDataSourceTypeDownload         = BackingImageDataSourceType("download")
  	BackingImageDataSourceTypeUpload           = BackingImageDataSourceType("upload")
  	BackingImageDataSourceTypeExportFromVolume = BackingImageDataSourceType("export-from-volume")
  	BackingImageDataSourceTypeRestore          = BackingImageDataSourceType("restore")

  	DataSourceTypeRestoreParameterBackupURL = "backup-url"
  )

  // BackingImageDataSourceSpec defines the desired state of the Longhorn backing image data source
  type BackingImageDataSourceSpec struct {
  	NodeID          string                     `json:"nodeID"`
  	UUID            string                     `json:"uuid"`
  	DiskUUID        string                     `json:"diskUUID"`
  	DiskPath        string                     `json:"diskPath"`
  	Checksum        string                     `json:"checksum"`
  	SourceType      BackingImageDataSourceType `json:"sourceType"`
  	Parameters      map[string]string          `json:"parameters"`
  	FileTransferred bool                       `json:"fileTransferred"`
  }
  ```
- Create BackingImage APIs
  - No need to change.
    - Create the BackingImage CR with `type=restore` and `backup-url=${URL}`.
    - If the BackingImage already exists in the cluster, the user can use the checksum to verify whether they are the same.
- In `BackingImageController`
  - No need to change; it will create the `BackingImageDataSource` CR.
- In `BackingImageDataSourceController`
  - No need to change; it will create the `BackingImageDataSourcePod` to do the restore.
- In `BackingImageManager - data_source`
  - When initializing the service, if the type is `restore`, restore from `backup-url` by sending a request to the sync service in the same pod.

    ```go
    requestURL := fmt.Sprintf("http://%s/v1/files", client.Remote)
    req, err := http.NewRequest("POST", requestURL, nil)
    if err != nil {
    	return err
    }
    // Pass the restore parameters as query parameters.
    q := req.URL.Query()
    q.Add("action", "restoreFromBackupURL")
    q.Add("url", backupURL)
    q.Add("file-path", filePath)
    q.Add("uuid", uuid)
    q.Add("disk-uuid", diskUUID)
    q.Add("expected-checksum", expectedChecksum)
    // Query() returns a copy, so write the encoded values back to the URL.
    req.URL.RawQuery = q.Encode()
    ```

  - In `sync/service`, implement `restoreFromBackupURL()` to restore the `BackingImage` from the backup store to the local disk.
- In `BackingImageDataSourceController`
  - No need to change; it will take over control when the `BackingImageDataSource` status is `ReadyForTransfer`.
  - If it fails to restore the `BackingImage`, the status of the `BackingImage` will be failed, and the `BackingImageDataSourcePod` will be cleaned up and retried with a backoff limit, like `type=download`. The process is the same as the other `BackingImage` creation processes.
- Trigger
  - Restore through the `BackingImage` operation manually.
  - Restore when the user restores a `Volume` with a `BackingImage`.
    - Restoring a Volume is actually a request to `Create` a Volume with `fromBackup` in the spec.
    - In the `Create()` API, we check whether the `Volume` has the `fromBackup` parameter and a `BackingImage`.
    - Check whether the `BackingImage` exists.
    - Check and restore the `BackupBackingImage` if the `BackingImage` does not exist.
    - Restore the `BackupBackingImage` by creating a `BackingImage` with type `restore` and `backupURL`.
    - Then create the `Volume` CR, so that the admission webhook won't fail because of a missing `BackingImage` (ref).
  - Restore when the user creates a `Volume` through `CSI`.
    - In `CreateVolume()`, we check whether the `Volume` has the `fromBackup` parameter and a `BackingImage`.
    - In `checkAndPrepareBackingImage()`, we restore the `BackupBackingImage` by creating a `BackingImage` with type `restore` and `backupURL`.
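The check performed before creating a Volume from a backup can be sketched as follows. This is a hedged illustration, not the actual `Create()` or `checkAndPrepareBackingImage()` code: `backingImageRequest`, `planBackingImageRestore` and the two lookup maps are hypothetical, and returning an error when no `BackupBackingImage` exists is an assumption the source does not state. Only the `restore` type and the `backup-url` parameter name come from the definitions above.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	sourceTypeRestore  = "restore"    // value of BackingImageDataSourceTypeRestore
	parameterBackupURL = "backup-url" // value of DataSourceTypeRestoreParameterBackupURL
)

// backingImageRequest is a hypothetical, minimal description of the
// BackingImage that would be created to trigger the restore.
type backingImageRequest struct {
	Name       string
	SourceType string
	Parameters map[string]string
}

// errNoBackup is an assumed failure mode: the BackingImage is missing and
// there is no BackupBackingImage to restore it from.
var errNoBackup = errors.New("backing image is missing and has no backup to restore from")

// planBackingImageRestore returns the BackingImage to create before the
// Volume CR, or nil when the volume has no BackingImage or it already exists.
func planBackingImageRestore(name string, inCluster map[string]bool, backupURLs map[string]string) (*backingImageRequest, error) {
	if name == "" || inCluster[name] {
		return nil, nil
	}
	url, ok := backupURLs[name]
	if !ok {
		return nil, errNoBackup
	}
	return &backingImageRequest{
		Name:       name,
		SourceType: sourceTypeRestore,
		Parameters: map[string]string{parameterBackupURL: url},
	}, nil
}

func main() {
	inCluster := map[string]bool{"ubuntu": true}
	// The URL is a placeholder, not a real backup store address.
	backupURLs := map[string]string{"parrot": "<backup-url-of-parrot>"}

	for _, name := range []string{"ubuntu", "parrot", "alpine"} {
		req, err := planBackingImageRestore(name, inCluster, backupURLs)
		fmt.Println(name, req != nil, err)
	}
}
```

Creating the BackingImage before the Volume CR matches the ordering noted above, which keeps the admission webhook from rejecting the Volume for a missing BackingImage.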
API and UI changes In Summary
- `longhorn-ui`:
  - Add a new page of `BackupBackingImage` like `Backup`
    - The columns on the `BackupBackingImage` list page should be: `Name`, `Size`, `State`, `Created At`, `Operation`.
    - `Name` can be clicked and will show the `Checksum` of the `BackupBackingImage`
    - `State`: `BackupBackingImageState` of the `BackupBackingImage` CR
    - `Operation` includes `restore` and `delete`
  - Add a new operation `backup` for every `BackingImage` in the `BackingImage` page
- `API`:
  - Add a new action `backup` to `BackingImage` (`"/v1/backingimages/{name}"`)
    - Create a `BackupBackingImage` CR to initiate the backup process
  - `BackupBackingImage`
    - `GET "/v1/backupbackingimages"`: get all `BackupBackingImage`
  - API Watch: establish a streaming connection to report `BackupBackingImage` info changes.
Test plan
Integration tests
- `BackupBackingImage` Basic Operation
  - Setup
    - Create a `BackingImage`
    - Set up the backup target
  - Back up the `BackingImage`
  - The `BackupBackingImage` CR should be complete
  - Delete the `BackingImage` in the cluster
  - Restore the `BackupBackingImage`
  - The checksum should be the same
- Back up `BackingImage` when backing up and restoring Volume
  - Setup
    - Create a `BackingImage`
    - Set up the backup target
    - Create a Volume with the `BackingImage`
  - Back up the `Volume`
  - The `BackupBackingImage` CR should be created and complete
  - Delete the `BackingImage`
  - Restore the Volume with the same `BackingImage`
  - The `BackingImage` should be restored and the `Volume` should also be restored successfully
  - The `Volume` checksum is the same
Manual tests
- `BackupBackingImage` reuse blocks
  - Setup
    - Create a `BackingImage` A
    - Set up the backup target
  - Create a `Volume` with `BackingImage` A, write some data, and export it to another `BackingImage` B
  - Back up `BackingImage` A
  - Back up `BackingImage` B
  - Check that it reuses the blocks when backing up `BackingImage` B (by trace log)