CSI Snapshot Support
Summary
To let users create, restore, and delete backups programmatically, we want to add support for the CSI snapshot mechanism.
Existing tools can then create Kubernetes VolumeSnapshot API resources, which the CSI plugin can react to.
Related Issues
- https://github.com/longhorn/longhorn/issues/304
- https://github.com/longhorn/longhorn/issues/610
- https://github.com/longhorn/longhorn/issues/1127
- https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-snapshot.md
Motivation
Provide users with programmatic access to backup operations via the standardized CSI interface. See https://kubernetes.io/docs/concepts/storage/volume-snapshots/#provisioning-volume-snapshot
Goals
- add csi snapshot support to our csi driver
Non-goals
- VolumeBackup crd refactor
- Changes to the longhorn backup code
Proposal
- support csi CreateSnapshot call
- support csi DeleteSnapshot call
- support snapshot as ContentSource for restoration during CreateVolume calls
User Stories
Currently, it is hard for users to interact with the backup system programmatically. After this enhancement, users will be able to use the standard Kubernetes CSI mechanisms to create and delete backups and to restore a new volume from a backup.
User Experience In Detail
Backup creation via VolumeSnapshot resource
The user can request a backup of a volume by creating a Kubernetes VolumeSnapshot object.
Example below for a volume named test-vol:
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-pvc
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: test-vol
Restoration via VolumeSnapshot resource
The user can request the creation of a volume based on a previously created VolumeSnapshot object.
Example below for a volume named test-vol-restore:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-restore
spec:
  storageClassName: longhorn
  dataSource:
    name: test-vol-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
Restoration of an existing Longhorn backup (pre-provisioning)
The user can request the creation of a volume based on an existing Longhorn backup that was not created via the CSI layer.
To do so, the user needs to create a VolumeSnapshotContent object with an associated VolumeSnapshot object.
The snapshotHandle of the VolumeSnapshotContent needs to point to an existing Longhorn backup.
Example below for a volume named test-restore-existing-backup:
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotContent
metadata:
  name: test-existing-backup
spec:
  volumeSnapshotClassName: longhorn
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    # NOTE: change this to point to an existing backup on the backupstore
    snapshotHandle: bs://test-vol/backup-625159fb469e492e
  volumeSnapshotRef:
    name: test-snapshot-existing-backup
    namespace: default
---
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-existing-backup
spec:
  volumeSnapshotClassName: longhorn
  source:
    volumeSnapshotContentName: test-existing-backup
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-existing-backup
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-existing-backup
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
Backup deletion via VolumeSnapshot resource
The user can request the deletion of a backup by removing the associated VolumeSnapshot object.
If the DeletionPolicy is Delete, the referenced Longhorn backup is deleted along with the VolumeSnapshotContent object.
If the DeletionPolicy is Retain, both the referenced Longhorn backup and the VolumeSnapshotContent object remain.
The default DeletionPolicy is Delete. To retain the Longhorn backups, the user can create a VolumeSnapshotClass with the DeletionPolicy set to Retain.
https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/#deletionpolicy
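As an illustration of the Retain case, here is a minimal sketch that builds a VolumeSnapshotClass manifest and prints it as JSON, which `kubectl apply -f -` accepts. The driver name and API version are taken from the manifests in this document; the class name `longhorn-retain` and the helper function are made up for this example.

```python
import json


def snapshot_class_manifest(name: str, deletion_policy: str = "Delete") -> dict:
    """Build a VolumeSnapshotClass manifest for the Longhorn CSI driver."""
    if deletion_policy not in ("Delete", "Retain"):
        raise ValueError(f"unsupported deletionPolicy: {deletion_policy!r}")
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1beta1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},
        # driver and deletionPolicy are top-level fields, not under "spec"
        "driver": "driver.longhorn.io",
        "deletionPolicy": deletion_policy,
    }


if __name__ == "__main__":
    # "longhorn-retain" is a hypothetical class name
    print(json.dumps(snapshot_class_manifest("longhorn-retain", "Retain"), indent=2))
```

A VolumeSnapshot that sets volumeSnapshotClassName to this class would then leave its Longhorn backup in place when the VolumeSnapshot is deleted.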
Example below for a snapshot named test-snapshot-pvc
kubectl delete volumesnapshots test-snapshot-pvc
Deleting the VolumeSnapshot object triggers the deletion, and the DeletionPolicy is then followed. The default deletion policy is Delete.
API changes
no changes necessary
Design
Implementation Overview
Implement CSI CreateSnapshot call
Creating a VolumeSnapshot resource triggers the creation of a Longhorn snapshot. That snapshot is then backed up via the Longhorn backup mechanism to a user-defined backupstore (S3, NFS).
While the backup is not yet complete, the CSI snapshot is marked as not ready to use.
The csi-snapshotter generates the CSI snapshot request name from the configured prefix + VolumeSnapshot.uuid.
The same information is used to generate VolumeSnapshotContent.name = snapcontent-uuid.
The csi-snapshotter always calls the Longhorn CSI plugin with the same CSI snapshot request name for a specific VolumeSnapshot object. This means we must either use the name generated by the csi-snapshotter for the Longhorn snapshot, or introduce a CRD (VolumeBackup) that persists the association csiSnapshot - longhornSnapshot - longhornBackup.
If we ignored the requested name, each successive call would generate a new snapshot. A backup of a very big volume can take a long time, during which the call may be repeated, so that is not the desired behavior.
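The idempotency argument above can be sketched as follows. This is a toy in-memory model, not the Longhorn implementation: `FakeBackupStore`, `Backup`, and the request name `snapshot-1234` are hypothetical names chosen for illustration. It shows that repeated calls with the same request name return the same backup, and that readiness follows backup progress.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Backup:
    name: str          # taken from the CSI snapshot request name
    volume: str
    progress: int = 0  # percent uploaded to the backupstore

    @property
    def ready_to_use(self) -> bool:
        # the CSI snapshot is only ready once the backup has completed
        return self.progress >= 100


class FakeBackupStore:
    """In-memory stand-in for the snapshot/backup API (hypothetical)."""

    def __init__(self) -> None:
        self.backups: Dict[str, Backup] = {}
        self.started = 0  # how many backups were actually started

    def create_snapshot(self, request_name: str, volume: str) -> Backup:
        existing = self.backups.get(request_name)
        if existing is not None:
            if existing.volume != volume:
                raise ValueError("request name already used for another volume")
            return existing  # repeated call: reuse, do not start a new backup
        self.started += 1
        backup = Backup(name=request_name, volume=volume)
        self.backups[request_name] = backup
        return backup


store = FakeBackupStore()
first = store.create_snapshot("snapshot-1234", "test-vol")
first.progress = 40  # backup still running
retry = store.create_snapshot("snapshot-1234", "test-vol")
print(retry is first, store.started, retry.ready_to_use)
```

Keying on the request name is what keeps a slow backup from spawning one new snapshot per retry.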
To be able to look up the backup during a later DeleteSnapshot or CreateVolume call, we encode the backupVolume and backupName as part of the snapshotID that is returned from the CSI CreateSnapshot call.
This value is set as the VolumeSnapshotContent.snapshotHandle of the VolumeSnapshotContent object that Kubernetes creates.
We use the format type://backupVolume/backupName, where the default type is bs for direct references to Longhorn backups.
The type prefix lets us refer to a custom Kubernetes resource in the future, which we can use for backup metadata persistence.
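A minimal sketch of the type://backupVolume/backupName format, assuming strict parsing: the function names are illustrative, and rejecting malformed handles with a ValueError is an assumption of this sketch, not a statement about the plugin's behavior.

```python
from typing import Tuple

DEFAULT_TYPE = "bs"  # direct reference to a Longhorn backup


def encode_snapshot_id(backup_volume: str, backup_name: str,
                       ref_type: str = DEFAULT_TYPE) -> str:
    """Build the snapshotID returned from CreateSnapshot."""
    return f"{ref_type}://{backup_volume}/{backup_name}"


def decode_snapshot_id(snapshot_id: str) -> Tuple[str, str, str]:
    """Split a snapshotID into (type, backupVolume, backupName)."""
    ref_type, sep, rest = snapshot_id.partition("://")
    backup_volume, slash, backup_name = rest.partition("/")
    if not (sep and slash and ref_type and backup_volume and backup_name):
        # assumption: malformed handles are rejected rather than guessed at
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    return ref_type, backup_volume, backup_name


handle = encode_snapshot_id("test-vol", "backup-625159fb469e492e")
print(handle)
print(decode_snapshot_id(handle))
```

The handle produced here matches the snapshotHandle used in the pre-provisioning example earlier in this document.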
Implement CSI DeleteSnapshot call
For backup deletion, all we have to do is decode the snapshotID and then trigger the backup delete calls via the Longhorn API.
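The decode-then-delete flow could look roughly like the sketch below. `FakeBackupAPI` is a hypothetical stand-in for the Longhorn API, not its real client. Treating an already-missing backup as success mirrors the CSI requirement that DeleteSnapshot be idempotent; raising on a malformed ID is an assumption of this sketch.

```python
from typing import Dict, Set


class FakeBackupAPI:
    """Hypothetical stand-in for the Longhorn backup API."""

    def __init__(self, backups: Dict[str, Set[str]]) -> None:
        self.backups = backups  # backup volume -> set of backup names

    def delete_backup(self, backup_volume: str, backup_name: str) -> bool:
        names = self.backups.get(backup_volume, set())
        if backup_name in names:
            names.remove(backup_name)
            return True
        return False  # nothing to delete


def delete_snapshot(api: FakeBackupAPI, snapshot_id: str) -> None:
    """Decode the snapshotID and delete the referenced backup."""
    _ref_type, sep, rest = snapshot_id.partition("://")
    backup_volume, slash, backup_name = rest.partition("/")
    if not (sep and slash and backup_volume and backup_name):
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    # A missing backup counts as success, so retried calls are harmless.
    api.delete_backup(backup_volume, backup_name)


api = FakeBackupAPI({"test-vol": {"backup-625159fb469e492e"}})
delete_snapshot(api, "bs://test-vol/backup-625159fb469e492e")
delete_snapshot(api, "bs://test-vol/backup-625159fb469e492e")  # retry is a no-op
print(api.backups)
```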
Add CSI CreateVolume ContentSource support
To create a volume from a CSI snapshot, we decode the snapshotID and look up the backup in the backupstore.
This gives us the backupURL, which we add to the fromBackup field of the Volume resource that Longhorn creates.
As part of prior work, Longhorn already knows how to restore these backups; this was initially used for a StorageClass parameter. By reusing the same mechanism, we do not have to maintain multiple code paths for volume restoration.
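The restore path described above can be sketched as follows. Only the fromBackup field name and the snapshotID format come from this document; the function, the `backup_urls` lookup table, and the example URL are hypothetical placeholders for the real backupstore lookup.

```python
from typing import Dict, Optional, Tuple

BackupKey = Tuple[str, str]  # (backupVolume, backupName)


def build_volume_spec(name: str, size: str, snapshot_id: Optional[str],
                      backup_urls: Dict[BackupKey, str]) -> Dict[str, str]:
    """Return a volume spec, adding fromBackup when a snapshot source is given."""
    spec = {"name": name, "size": size}
    if snapshot_id is None:
        return spec  # plain CreateVolume without a content source
    _ref_type, sep, rest = snapshot_id.partition("://")
    backup_volume, slash, backup_name = rest.partition("/")
    if not (sep and slash and backup_volume and backup_name):
        raise ValueError(f"malformed snapshot ID: {snapshot_id!r}")
    key = (backup_volume, backup_name)
    if key not in backup_urls:
        raise LookupError(f"backup not found in backupstore: {key}")
    spec["fromBackup"] = backup_urls[key]
    return spec


# Illustrative lookup table; the URL is a made-up placeholder.
urls = {("test-vol", "backup-625159fb469e492e"): "s3://example-bucket/placeholder-url"}
restored = build_volume_spec(
    "test-vol-restore", "2Gi", "bs://test-vol/backup-625159fb469e492e", urls)
print(restored)
```

Because the result is an ordinary volume spec with fromBackup set, the existing restore code path is reused unchanged.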
Test plan
See examples of the necessary YAML manifests in the User Experience In Detail section.
Creation test:
- create volume
- write data to volume
- create a VolumeSnapshot object
- wait for VolumeSnapshot to be ready to use
- check for backup existence on the backupstore
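The "wait for ... to be ready to use" steps in these tests need some form of polling. Below is a minimal sketch of such a helper; `wait_until` and its parameters are hypothetical names, and in a real test the condition would read the VolumeSnapshot's status.readyToUse field instead of the fake iterator used here.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 300.0,
               interval: float = 2.0) -> bool:
    """Poll condition until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake readiness source: not ready twice, then ready.
states = iter([False, False, True])
became_ready = wait_until(lambda: next(states), timeout=5.0, interval=0.0)
print(became_ready)
```

The same helper fits the later steps that wait for backup removal or volume restoration, with a different condition.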
Deletion test:
- create volume
- write data to volume
- create a VolumeSnapshot object
- wait for VolumeSnapshot to be ready to use
- check for backup existence on the backupstore
- delete VolumeSnapshot object
- wait for backup removal from the backupstore
Restore csi snapshot test:
- create volume
- write data to volume
- create a VolumeSnapshot object
- wait for VolumeSnapshot to be ready to use
- check for backup existence on the backupstore
- create PVC with content source set to the VolumeSnapshot object
- wait for volume restoration
- verify restored volume data == previously written data
Restore existing longhorn backup test:
- create volume
- write data to volume
- create a longhorn backup
- check for backup existence on the backupstore
- create a VolumeSnapshotContent object pointing to the longhorn backup
- create a VolumeSnapshot object pointing to the VolumeSnapshotContent object
- create PVC with content source set to the VolumeSnapshot object
- wait for volume restoration
- verify restored volume data == previously written data
Upgrade strategy
For CSI snapshot support, the user needs to update their Kubernetes installation to at least 1.17.
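A pre-upgrade check for this minimum version could be sketched as below. The helper names are hypothetical, and the input is assumed to be a version string of the form reported by the cluster, such as v1.17.4.

```python
import re
from typing import Tuple

MIN_VERSION = (1, 17)  # minimum Kubernetes version stated above


def parse_k8s_version(git_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.17.4' or 'v1.18.0+k3s1'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_csi_snapshots(git_version: str) -> bool:
    """True if the cluster version meets the minimum for CSI snapshot support."""
    return parse_k8s_version(git_version) >= MIN_VERSION


for version in ("v1.16.9", "v1.17.4", "v1.18.0+k3s1"):
    print(version, supports_csi_snapshots(version))
```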
In environments where the CSI images are pinned (air gap), users need to manually provide the following images:
- longhornio/csi-provisioner:v1.6.0
- longhornio/csi-snapshotter:v2.1.1
We upgraded the csi-provisioner from 1.4 to 1.6, which still supports Kubernetes 1.13 as a minimum version.
Note
The CRDs and snapshot controller installations are the responsibility of the Kubernetes distribution. See: https://kubernetes.io/docs/concepts/storage/volume-snapshots/#introduction
We are discussing whether Longhorn can provide these as part of the Longhorn installation, but there is no good way to make sure that a snapshot controller is not already deployed in the cluster.
Make sure your cluster contains the CRDs below; Rancher RKE did not deploy them for me. https://github.com/kubernetes-csi/external-snapshotter/tree/master/client/config/crd
Make sure your cluster contains the snapshot controller; Rancher RKE did not deploy it for me. https://github.com/kubernetes-csi/external-snapshotter/tree/master/deploy/kubernetes/snapshot-controller