diff --git a/enhancements/20220110-extend-csi-snapshot-to-support-longhorn-snapshot.md b/enhancements/20220110-extend-csi-snapshot-to-support-longhorn-snapshot.md
new file mode 100644
index 0000000..e30f963
--- /dev/null
+++ b/enhancements/20220110-extend-csi-snapshot-to-support-longhorn-snapshot.md
@@ -0,0 +1,290 @@
+# Title
+
+Extend CSI snapshot to support Longhorn snapshot
+
+## Summary
+
+Before this feature, users of [the CSI Snapshotter mechanism](https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html)
+can only create Longhorn backups (out of cluster). We want to extend the CSI Snapshotter to support creating
+Longhorn snapshots (in-cluster) as well.
+
+### Related Issues
+
+https://github.com/longhorn/longhorn/issues/2534
+
+## Motivation
+
+### Goals
+
+Extend the CSI Snapshotter to support:
+* Creating a Longhorn snapshot
+* Deleting a Longhorn snapshot
+* Creating a new PVC from a CSI snapshot that is associated with a Longhorn snapshot
+
+### Non-goals
+
+* Reverting a Longhorn snapshot is not a goal because the CSI snapshotter doesn't support in-place replacement for now:
+  https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot
+
+## Proposal
+
+### User Stories
+
+Before this feature is implemented, users can only use the CSI Snapshotter to create/restore Longhorn backups.
+This means that users must set up a backup target outside of the cluster. Uploading data to and downloading data from
+the backup target is a long and costly operation. Sometimes, users might just want to use the CSI Snapshotter to take
+an in-cluster Longhorn snapshot and create a new volume from that snapshot. The Longhorn snapshot operation
+is cheaper and faster than the backup operation and doesn't require setting up a backup target.
+
+### User Experience In Detail
+
+To use this feature, users need to:
+1. 
Deploy the CSI snapshot CRDs and controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
+1. Deploy a VolumeSnapshotClass with the parameter `type: longhorn-snapshot`. For example:
+   ```yaml
+   kind: VolumeSnapshotClass
+   apiVersion: snapshot.storage.k8s.io/v1beta1
+   metadata:
+     name: longhorn-snapshot
+   driver: driver.longhorn.io
+   deletionPolicy: Delete
+   parameters:
+     type: longhorn-snapshot
+   ```
+1. To create a new CSI snapshot associated with a Longhorn snapshot of the volume `test-vol`, users deploy the following VolumeSnapshot CR:
+   ```yaml
+   apiVersion: snapshot.storage.k8s.io/v1beta1
+   kind: VolumeSnapshot
+   metadata:
+     name: test-snapshot
+   spec:
+     volumeSnapshotClassName: longhorn-snapshot
+     source:
+       persistentVolumeClaimName: test-vol
+   ```
+   A new Longhorn snapshot is created for the volume `test-vol`.
+1. To create a new PVC from the CSI snapshot, users can deploy the following yaml:
+   ```yaml
+   apiVersion: v1
+   kind: PersistentVolumeClaim
+   metadata:
+     name: test-restore-snapshot-pvc
+   spec:
+     storageClassName: longhorn
+     dataSource:
+       name: test-snapshot
+       kind: VolumeSnapshot
+       apiGroup: snapshot.storage.k8s.io
+     accessModes:
+       - ReadWriteOnce
+     resources:
+       requests:
+         storage: 5Gi # should be the same as the size of `test-vol`
+   ```
+   A new PVC will be created with the same content as in the VolumeSnapshot `test-snapshot`.
+1. Deleting the VolumeSnapshot `test-snapshot` will lead to the deletion of the corresponding Longhorn snapshot of the volume `test-vol`.
+
+### API changes
+None
+
+## Design
+
+### Implementation Overview
+
+We follow the specification in [the CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot) when supporting the CSI snapshot.
+
+We define a new parameter, `type`, in the VolumeSnapshotClass.
+The value of the parameter `type` can be `longhorn-snapshot` or `longhorn-backup`. 
+When `type` is `longhorn-snapshot`, the CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn snapshot.
+When `type` is `longhorn-backup`, the CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn backup.
+
+In the [CreateSnapshot function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L539), we get the
+value of the parameter `type`. If it is `longhorn-backup`, we take a Longhorn backup as before. If it is `longhorn-snapshot`, we do the following:
+* Get the name of the Longhorn volume.
+* Check if the volume is in attached state.
+  If it is not, return `codes.FailedPrecondition`.
+  We cannot take a snapshot of a volume that is not attached.
+* Check if a Longhorn snapshot with the same name as the requested CSI snapshot already exists.
+  If yes, return OK without taking a new Longhorn snapshot.
+* Take a new Longhorn snapshot. Encode the snapshotId in the format `snap://volume-name/snapshot-name`.
+  This snapshotId will be used in the later CSI CreateVolume and DeleteSnapshot calls.
+
+In the [CreateVolume function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L63):
+* If the VolumeContentSource is of type `VolumeContentSource_Snapshot`, decode the snapshotId in the format from the above step.
+* Create a new volume with the `dataSource` set to `snap://volume-name/snapshot-name`. This will trigger Longhorn to clone the content of the snapshot to the new volume.
+  Note that if the source volume is not attached, Longhorn cannot verify the existence of the snapshot inside the Longhorn volume.
+  This means that [the API will return an error](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/manager/volume.go#L347-L352) and the new PVC cannot be provisioned. 
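
The snapshotId round trip described above (encode in CreateSnapshot, decode in CreateVolume and DeleteSnapshot) can be sketched in Go as follows. This is a minimal illustration, not the code in longhorn-manager: the helper names `encodeSnapshotID` and `decodeSnapshotID` and the error wording are hypothetical. Only the `snap://volume-name/snapshot-name` format is taken from this proposal.

```go
package main

import (
	"fmt"
	"strings"
)

// snapshotIDPrefix is the scheme this proposal uses for IDs that refer to an
// in-cluster Longhorn snapshot.
const snapshotIDPrefix = "snap://"

// encodeSnapshotID builds "snap://<volume-name>/<snapshot-name>".
// Hypothetical helper, named here for illustration only.
func encodeSnapshotID(volumeName, snapshotName string) string {
	return snapshotIDPrefix + volumeName + "/" + snapshotName
}

// decodeSnapshotID reverses encodeSnapshotID. It returns an error when the ID
// does not carry the "snap://" prefix or when either name is empty.
func decodeSnapshotID(id string) (volumeName, snapshotName string, err error) {
	if !strings.HasPrefix(id, snapshotIDPrefix) {
		return "", "", fmt.Errorf("snapshot ID %q does not start with %q", id, snapshotIDPrefix)
	}
	// Split only on the first "/" so that the volume name is everything
	// before it and the snapshot name is everything after it.
	parts := strings.SplitN(strings.TrimPrefix(id, snapshotIDPrefix), "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("snapshot ID %q is not in the form %svolume-name/snapshot-name", id, snapshotIDPrefix)
	}
	return parts[0], parts[1], nil
}

func main() {
	id := encodeSnapshotID("test-vol", "test-snapshot")
	fmt.Println("encoded:", id)

	vol, snap, err := decodeSnapshotID(id)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("volume:", vol, "snapshot:", snap)
}
```

Keeping both names inside the ID lets the later calls locate the source volume and the snapshot without any extra lookup state, which is why the same string can be passed straight through as the `dataSource` of the new volume.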
+
+In the [DeleteSnapshot function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L675):
+* Decode the snapshotId in the format from the above step.
+  If the type is `longhorn-backup`, we delete the backup as before.
+  If the type is `longhorn-snapshot`, we delete the corresponding Longhorn snapshot of the source volume.
+  If the source volume or the snapshot no longer exists, we return OK as specified in [the CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#deletesnapshot).
+
+### Test plan
+
+Integration test plan.
+
+1. Deploy the CSI snapshot CRDs and controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
+1. Deploy 4 VolumeSnapshotClasses:
+   ```yaml
+   kind: VolumeSnapshotClass
+   apiVersion: snapshot.storage.k8s.io/v1beta1
+   metadata:
+     name: longhorn-backup-1
+   driver: driver.longhorn.io
+   deletionPolicy: Delete
+   ```
+   ```yaml
+   kind: VolumeSnapshotClass
+   apiVersion: snapshot.storage.k8s.io/v1beta1
+   metadata:
+     name: longhorn-backup-2
+   driver: driver.longhorn.io
+   deletionPolicy: Delete
+   parameters:
+     type: longhorn-backup
+   ```
+   ```yaml
+   kind: VolumeSnapshotClass
+   apiVersion: snapshot.storage.k8s.io/v1beta1
+   metadata:
+     name: longhorn-snapshot
+   driver: driver.longhorn.io
+   deletionPolicy: Delete
+   parameters:
+     type: longhorn-snapshot
+   ```
+   ```yaml
+   kind: VolumeSnapshotClass
+   apiVersion: snapshot.storage.k8s.io/v1beta1
+   metadata:
+     name: invalid-class
+   driver: driver.longhorn.io
+   deletionPolicy: Delete
+   parameters:
+     type: invalid
+   ```
+1. Create Longhorn volume `test-vol` of 5GB. Create PV/PVC for the Longhorn volume.
+1. Create a workload that uses the volume. Write some data to the volume.
+   Make sure the data is persisted to the volume by running `sync`.
+1. 
Set up a backup target for Longhorn
+
+#### Scenario 1: CreateSnapshot
+  * `type` is `longhorn-backup` or `""`
+
+    * Create a VolumeSnapshot with the following yaml:
+      ```yaml
+      apiVersion: snapshot.storage.k8s.io/v1beta1
+      kind: VolumeSnapshot
+      metadata:
+        name: test-snapshot-longhorn-backup
+      spec:
+        volumeSnapshotClassName: longhorn-backup-1
+        source:
+          persistentVolumeClaimName: test-vol
+      ```
+    * Verify that a backup is created.
+    * Delete the `test-snapshot-longhorn-backup` VolumeSnapshot.
+    * Verify that the backup is deleted.
+    * Create the `test-snapshot-longhorn-backup` VolumeSnapshot with `volumeSnapshotClassName: longhorn-backup-2`.
+    * Verify that a backup is created.
+  * `type` is `longhorn-snapshot`
+    * volume is in detached state.
+      * Scale down the workload of `test-vol` to detach the volume.
+      * Create `test-snapshot-longhorn-snapshot` VolumeSnapshot with `volumeSnapshotClassName: longhorn-snapshot`.
+      * Verify the error `volume ... invalid state ... for taking snapshot` in the Longhorn CSI plugin.
+    * volume is in attached state.
+      * Scale up the workload to attach `test-vol`.
+      * Verify that a Longhorn snapshot is created for `test-vol`.
+  * invalid type
+    * Create `test-snapshot-invalid` VolumeSnapshot with `volumeSnapshotClassName: invalid-class`.
+    * Verify the error `invalid snapshot type: %v. Must be %v or %v or` in the Longhorn CSI plugin.
+    * Delete `test-snapshot-invalid` VolumeSnapshot. 
+
+#### Scenario 2: Create new volume from CSI snapshot
+  * From `longhorn-backup` type
+    * Create a new PVC with the following yaml:
+      ```yaml
+      apiVersion: v1
+      kind: PersistentVolumeClaim
+      metadata:
+        name: test-restore-pvc
+      spec:
+        storageClassName: longhorn
+        dataSource:
+          name: test-snapshot-longhorn-backup
+          kind: VolumeSnapshot
+          apiGroup: snapshot.storage.k8s.io
+        accessModes:
+          - ReadWriteOnce
+        resources:
+          requests:
+            storage: 5Gi
+      ```
+    * Attach the PVC `test-restore-pvc` and verify the data.
+    * Delete the PVC.
+  * From `longhorn-snapshot` type
+    * Source volume is attached && Longhorn snapshot exists
+      * Create a PVC with the following yaml:
+        ```yaml
+        apiVersion: v1
+        kind: PersistentVolumeClaim
+        metadata:
+          name: test-restore-pvc
+        spec:
+          storageClassName: longhorn
+          dataSource:
+            name: test-snapshot-longhorn-snapshot
+            kind: VolumeSnapshot
+            apiGroup: snapshot.storage.k8s.io
+          accessModes:
+            - ReadWriteOnce
+          resources:
+            requests:
+              storage: 5Gi
+        ```
+      * Attach the PVC `test-restore-pvc` and verify the data.
+      * Delete the PVC.
+    * Source volume is detached
+      * Scale down the workload to detach `test-vol`.
+      * Create the same PVC `test-restore-pvc` as in the `Source volume is attached && Longhorn snapshot exists` section.
+      * Verify that PVC provisioning fails because the source volume is detached, so Longhorn cannot verify the existence of the Longhorn snapshot in the source volume.
+      * Scale up the workload to attach `test-vol`.
+      * Wait for the PVC to finish provisioning and become bound.
+      * Attach the PVC `test-restore-pvc` and verify the data.
+      * Delete the PVC.
+    * Source volume is attached && Longhorn snapshot doesn’t exist
+      * Find the VolumeSnapshotContent of the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
+        Find the Longhorn snapshot name inside the field `VolumeSnapshotContent.snapshotHandle`.
+        Go to Longhorn UI. Delete the Longhorn snapshot.
+      * Repeat the steps in the section `Source volume is attached && Longhorn snapshot exists` above. 
+        The PVC should be stuck in provisioning because the Longhorn snapshot of the source volume doesn't exist.
+      * Delete the PVC `test-restore-pvc`.
+
+#### Scenario 3: Delete CSI snapshot
+  * `longhorn-backup` type
+    * Done in the above step
+  * `longhorn-snapshot` type
+    * volume is attached && snapshot doesn’t exist
+      * Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot` and verify that the VolumeSnapshot is deleted.
+    * volume is attached && snapshot exists
+      * Recreate the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
+      * Verify the creation of a Longhorn snapshot with the name in the field `VolumeSnapshotContent.snapshotHandle`.
+      * Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
+      * Verify that the Longhorn snapshot is removed or marked as removed.
+      * Verify that the VolumeSnapshot `test-snapshot-longhorn-snapshot` is deleted.
+    * volume is detached
+      * Recreate the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
+      * Scale down the workload to detach `test-vol`.
+      * Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
+      * Verify that the VolumeSnapshot `test-snapshot-longhorn-snapshot` is stuck in deleting.
+
+
+### Upgrade strategy
+
+No upgrade strategy needed
+
+## Note [optional]
+
+We need to update the docs and examples to reflect the new parameter in the VolumeSnapshotClass, `type`.