diff --git a/enhancements/20200904-csi-snapshot-support.md b/enhancements/20200904-csi-snapshot-support.md new file mode 100644 index 0000000..bd25d26 --- /dev/null +++ b/enhancements/20200904-csi-snapshot-support.md @@ -0,0 +1,263 @@ +# CSI Snapshot Support + +## Summary + +To allow users to create/restore/delete backups programmatically, +we want to add support for the csi snapshot mechanism. +This way existing tools can be used to create kubernetes `VolumeSnapshot` api resources, +which we can react to in the csi plugin. + +### Related Issues + +https://github.com/longhorn/longhorn/issues/304 +https://github.com/longhorn/longhorn/issues/610 +https://github.com/longhorn/longhorn/issues/1127 +https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-snapshot.md + +## Motivation + +Provide user with programmatic access for backup operations via the standardized csi interface. +https://kubernetes.io/docs/concepts/storage/volume-snapshots/#provisioning-volume-snapshot + +### Goals + +- add csi snapshot support to our csi driver + +### Non-goals + +- VolumeBackup crd refactor +- Changes to the longhorn backup code + +## Proposal + +- support csi CreateSnapshot call +- support csi DeleteSnapshot call +- support snapshot as ContentSource for restoration during CreateVolume calls + +### User Stories + +Currently, it's hard for users to interact with the backup system programmatically, +after this enhancement the users will be able to use the standard kubernetes csi mechanisms for +backup creation / deletion and restoration of a new volume based on a backup. + +### User Experience In Detail + +#### Backup creation via VolumeSnapshot resource + +The user can request a backup of a volume by creation of a kubernetes `VolumeSnapshot` object. +Example below for a volume named `test-vol` + +```yaml +apiVersion: snapshot.storage.k8s.io/v1beta1 +kind: VolumeSnapshot +metadata: + name: test-snapshot-pvc +spec: + volumeSnapshotClassName: longhorn + source: + persistentVolumeClaimName: test-vol +``` + +#### Restoration via VolumeSnapshot resource + +The user can request the creation of a volume based on a prior created `VolumeSnapshot` object. +Example below for a volume named `test-vol-restore` + +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: test-vol-restore +spec: + storageClassName: longhorn + dataSource: + name: test-vol-snapshot + kind: VolumeSnapshot + apiGroup: snapshot.storage.k8s.io + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 2Gi +``` + +#### Restoration of an existing Longhorn backup (pre-provisioning) + +The user can request the creation of a volume based on a prior longhorn backup, that was not created via the csi layer. +The user needs to create a `VolumeSnapshotContent` object with an associated `VolumeSnapshot` object. +The `snapshotHandle` of the `VolumeSnapshotContent` needs to point to an existing longhorn backup. +Example below for a volume named `test-restore-existing-backup` + +```yaml +apiVersion: snapshot.storage.k8s.io/v1beta1 +kind: VolumeSnapshotContent +metadata: + name: test-existing-backup +spec: + volumeSnapshotClassName: longhorn + driver: driver.longhorn.io + deletionPolicy: Delete + source: + # NOTE: change this to point to an existing backup on the backupstore + snapshotHandle: bs://test-vol/backup-625159fb469e492e + volumeSnapshotRef: + name: test-snapshot-existing-backup + namespace: default +``` + +```yaml +apiVersion: snapshot.storage.k8s.io/v1beta1 +kind: VolumeSnapshot +metadata: + name: test-snapshot-existing-backup +spec: + volumeSnapshotClassName: longhorn + source: + volumeSnapshotContentName: test-existing-backup +``` + +```yaml +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: test-restore-existing-backup +spec: + storageClassName: longhorn + dataSource: + name: test-snapshot-existing-backup + kind: VolumeSnapshot + apiGroup: snapshot.storage.k8s.io + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 2Gi +``` + +#### Backup deletion via VolumeSnapshot resource + +The user can request the deletion of a backup by removing the associated `VolumeSnapshot` object. +If the DeletionPolicy is Delete, then the referenced longhorn backup will be deleted along with the `VolumeSnapshotContent` object. +If the DeletionPolicy is Retain, then both the referenced longhorn backup and `VolumeSnapshotContent` object remain. +The default DeletionPolicy is set to Delete, if the user wants to retain the longhorn backups, +the user can create a snapshotClass with DeletionPolicy set to Retain. +https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/#deletionpolicy + +Example below for a snapshot named `test-snapshot-pvc` +`kubectl delete volumesnapshots test-snapshot-pvc` + +Deletion is triggered by deleting the VolumeSnapshot object, and the DeletionPolicy will be followed. + +The default deletion policy + +### API changes + +no changes necessary + +## Design + +### Implementation Overview + +#### Implement CSI CreateSnapshot call + +The creation of a `VolumeSnapshot` resource triggers the creation of a longhorn snapshot afterwards, +that snapshot will be backed up via the longhorn backup mechanism to a user defined backupstore (s3, nfs). +While the backup isn't completed the csi snapshot will be marked not ready to use. + +The csi-snapshot-request name is generated by the csi snapshotter, based on the configured prefix + `VolumeSnapshot.uuid` +This information is also used to generate the `VolumeSnapshotContent.name = snapcontent-uuid` + +The csi-snapshotter will always call the longhorn csi plugin with the same csi snapshot request name for a specific Snapshot object. +This means we have to use the csi snapshotter generated name, for the longhorn snapshot +or introduce a CRD (VolumeBackup) so that we have a persisted way of associating +csiSnapshot - longhornSnapshot - longhornBackup + +If we would ignore the requested name, we would end up generating a new snapshot for each successive call, +since a backup can take a long time for very big volumes, that would not be the desired behavior. + +To be able to lookup the backup during a following `DeleteSnapshot` or `CreateVolume` call, +we encode the backupVolume and backupName as part of the snapshotID +which is returned from the csi CreateSnapshot call. + +this will be set as the `VolumeSnapshotContent.snapshotHandle` for the kubernetes created `VolumeSnapshotContent` object. +We use the format `type://backupVolume/backupName` where the default type equals `bs` for direct references to longhorn backups. + +This is so that in the future we can refer to a custom kubernetes resource, which we can use for backup metadata persistence. + +#### Implement CSI DeleteSnapshot call + +For backup deletion all we have todo is decode the snapshotID then trigger the backup delete calls via the longhorn api. + +#### Add CSI CreateVolume ContentSource support. + +For the volume creation based on a CSI snapshot we can decode the snapshotID to lookup the backup in the backupstore. +This will provide us with the backupURL which we can add to the `fromBackup` field in the longhorn created Volume resource. +As part of prior work longhorn already knows how to restore these backups, this was initially used for a StorageClass +parameter, by reusing the same mechanism we don't have to maintain multiple code paths for volume restoration. + +### Test plan + +See examples of the necessary yaml manifests in the `User Experience In Detail` section. + +Creation test: +- create volume +- write data to volume +- create a `VolumeSnapshot` object +- wait for `VolumeSnapshot` to be ready to use +- check for backup existence on the backupstore + +Deletion test: +- create volume +- write data to volume +- create a `VolumeSnapshot` object +- wait for `VolumeSnapshot` to be ready to use +- check for backup existence on the backupstore +- delete `VolumeSnapshot` object +- wait for backup removal from the backupstore + +Restore csi snapshot test: +- create volume +- write data to volume +- create a `VolumeSnapshot` object +- wait for `VolumeSnapshot` to be ready to use +- check for backup existence on the backupstore +- create PVC with content source set to the `VolumeSnapshot` object +- wait for volume restoration +- verify restored volume data == previously written data + +Restore existing longhorn backup test: +- create volume +- write data to volume +- create a longhorn backup +- check for backup existence on the backupstore +- create a `VolumeSnapshotContent` object pointing to the longhorn backup +- create a `VolumeSnapshot` object pointing to the `VolumeSnapshotContent` object +- create PVC with content source set to the `VolumeSnapshot` object +- wait for volume restoration +- verify restored volume data == previously written data + +### Upgrade strategy + +For csi snapshot support the user needs to update their kubernetes installation to at least 1.17 + +For environments where the user has pinned their csi images (airgap) +the users need to manually provide the following images: + +- longhornio/csi-provisioner:v1.6.0 +- longhornio/csi-snapshotter:v2.1.1 + +We upgraded the csi-provsioner from 1.4 to 1.6, which still supports kubernetes 1.13 as a minimum version + +## Note + +The CRDs and snapshot controller installations are the responsibility of the Kubernetes distribution. +See: https://kubernetes.io/docs/concepts/storage/volume-snapshots/#introduction + +We are discussing whether longhorn can provide these as part of the longhorn installation, but there isn't really +a good way of making sure that there isn't a snapshot controller already deployed in the cluster. + +Make sure your cluster contains the below crds, rancher rke did not deploy them for me. +https://github.com/kubernetes-csi/external-snapshotter/tree/master/client/config/crd + +Make sure your cluster contains the snapshot controller, rancher rke did not deploy it for me. +https://github.com/kubernetes-csi/external-snapshotter/tree/master/deploy/kubernetes/snapshot-controller