# Title

Extend CSI snapshot to support Longhorn snapshot

## Summary

Before this feature, users of [the CSI Snapshotter mechanism](https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html)
can only create Longhorn backups (out of cluster). We want to extend the CSI Snapshotter to support creating
Longhorn snapshots (in-cluster) as well.

### Related Issues

https://github.com/longhorn/longhorn/issues/2534

## Motivation

### Goals

Extend the CSI Snapshotter to support:
* Creating a Longhorn snapshot
* Deleting a Longhorn snapshot
* Creating a new PVC from a CSI snapshot that is associated with a Longhorn snapshot

### Non-goals

* Reverting a Longhorn snapshot is not a goal because the CSI Snapshotter does not currently support in-place replacement:
  https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot

## Proposal

### User Stories

Before this feature is implemented, users can only use the CSI Snapshotter to create and restore Longhorn backups.
This means that users must set up a backup target outside of the cluster. Uploading data to and downloading data from
a backup target is a slow and costly operation. Sometimes, users just want to use the CSI Snapshotter to take
an in-cluster Longhorn snapshot and create a new volume from that snapshot. The Longhorn snapshot operation
is cheaper and faster than the backup operation and doesn't require setting up a backup target.

### User Experience In Detail

To use this feature, users need to:
1. Deploy the CSI snapshot CRDs and the snapshot controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
1. Deploy a VolumeSnapshotClass with the parameter `type: longhorn-snapshot`. For example:
    ```yaml
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-snapshot
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: longhorn-snapshot
    ```
1. To create a new CSI snapshot associated with a Longhorn snapshot of the volume `test-vol`, users deploy the following VolumeSnapshot CR:
    ```yaml
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshot
    metadata:
      name: test-snapshot
    spec:
      volumeSnapshotClassName: longhorn-snapshot
      source:
        persistentVolumeClaimName: test-vol
    ```
    A new Longhorn snapshot is created for the volume `test-vol`.
1. To create a new PVC from the CSI snapshot, users can deploy the following YAML:
    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-restore-snapshot-pvc
    spec:
      storageClassName: longhorn
      dataSource:
        name: test-snapshot
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi # should be the same as the size of `test-vol`
    ```
    A new PVC will be created with the same content as in the VolumeSnapshot `test-snapshot`.
1. Deleting the VolumeSnapshot `test-snapshot` will lead to the deletion of the corresponding Longhorn snapshot of the volume `test-vol`.

### API changes

None

## Design

### Implementation Overview

We follow [the CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot) when supporting the CSI snapshot.

We define a new parameter, `type`, in the VolumeSnapshotClass.
The value of `type` can be `longhorn-snapshot` or `longhorn-backup`.
When `type` is `longhorn-snapshot`, a CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn snapshot.
When `type` is `longhorn-backup`, a CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn backup.

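The handling of the `type` parameter can be sketched in Go as follows. This is a minimal sketch, not the code in longhorn-manager: `resolveSnapshotType` and `errInvalidSnapshotType` are hypothetical names, and falling back to `longhorn-backup` for a missing or empty `type` is an assumption drawn from the test plan in this document, where a class without parameters is expected to produce a backup.

```go
package main

import (
	"errors"
	"fmt"
)

// Values accepted for the `type` parameter of a VolumeSnapshotClass.
const (
	typeLonghornSnapshot = "longhorn-snapshot"
	typeLonghornBackup   = "longhorn-backup"
)

// errInvalidSnapshotType is a hypothetical sentinel error for this sketch.
var errInvalidSnapshotType = errors.New("invalid snapshot type")

// resolveSnapshotType is a hypothetical helper. It reads the `type`
// parameter and falls back to longhorn-backup when the parameter is missing
// or empty (assumption: this keeps pre-existing classes working).
func resolveSnapshotType(params map[string]string) (string, error) {
	switch t := params["type"]; t {
	case "", typeLonghornBackup:
		return typeLonghornBackup, nil
	case typeLonghornSnapshot:
		return typeLonghornSnapshot, nil
	default:
		return "", fmt.Errorf("%w: %q", errInvalidSnapshotType, t)
	}
}

func main() {
	for _, params := range []map[string]string{
		nil, // class without parameters
		{"type": "longhorn-backup"},
		{"type": "longhorn-snapshot"},
		{"type": "invalid"},
	} {
		t, err := resolveSnapshotType(params)
		fmt.Printf("params=%v type=%q err=%v\n", params, t, err)
	}
}
```

Rejecting unknown values early, instead of silently treating them as backups, makes a misconfigured VolumeSnapshotClass visible as soon as the first snapshot is requested.
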
In the [CreateSnapshot function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L539), we get the
value of the parameter `type`. If it is `longhorn-backup`, we take a Longhorn backup as before. If it is `longhorn-snapshot`, we do the following:
* Get the name of the Longhorn volume.
* Check whether the volume is in the attached state.
  If it is not, return `codes.FailedPrecondition`.
  We cannot take a snapshot of a non-attached volume.
* Check whether a Longhorn snapshot with the same name as the requested CSI snapshot already exists.
  If yes, return OK without taking a new Longhorn snapshot.
* Take a new Longhorn snapshot. Encode the snapshotId in the format `snap://volume-name/snapshot-name`.
  This snapshotId will be used in the later CSI CreateVolume and DeleteSnapshot calls.

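The `longhorn-snapshot` branch described above can be sketched against an in-memory model. This is an illustrative sketch only: the `volume` type and the `createSnapshot` function are hypothetical, and `errFailedPrecondition` stands in for the gRPC status `codes.FailedPrecondition` so that the example depends only on the standard library. Returning the same encoded ID when the snapshot already exists is an assumption about what "return OK" carries.

```go
package main

import (
	"errors"
	"fmt"
)

// errFailedPrecondition stands in for codes.FailedPrecondition.
var errFailedPrecondition = errors.New("failed precondition")

// volume is a hypothetical in-memory stand-in for a Longhorn volume.
type volume struct {
	attached  bool
	snapshots map[string]bool
}

// createSnapshot follows the listed steps: look up the volume, require the
// attached state, reuse an existing snapshot with the same name, otherwise
// record a new one, and return the encoded snapshot ID.
func createSnapshot(volumes map[string]*volume, volName, snapName string) (string, error) {
	vol, ok := volumes[volName]
	if !ok {
		return "", fmt.Errorf("volume %s not found", volName)
	}
	if !vol.attached {
		return "", fmt.Errorf("%w: volume %s is not attached", errFailedPrecondition, volName)
	}
	if vol.snapshots == nil {
		vol.snapshots = map[string]bool{}
	}
	if !vol.snapshots[snapName] {
		vol.snapshots[snapName] = true // models taking a new Longhorn snapshot
	}
	return fmt.Sprintf("snap://%s/%s", volName, snapName), nil
}

func main() {
	volumes := map[string]*volume{"test-vol": {attached: false}}

	_, err := createSnapshot(volumes, "test-vol", "snap-1")
	fmt.Println("detached:", err)

	volumes["test-vol"].attached = true
	id, err := createSnapshot(volumes, "test-vol", "snap-1")
	fmt.Println("attached:", id, err)
}
```

Because a repeated request with the same name does not create a second snapshot, the external snapshotter can safely retry a CreateSnapshot call.
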
In the [CreateVolume function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L63):
* If the VolumeContentSource is of type `VolumeContentSource_Snapshot`, decode the snapshotId in the format from the above step.
* Create a new volume with the `dataSource` set to `snap://volume-name/snapshot-name`. This will trigger Longhorn to clone the content of the snapshot to the new volume.
  Note that if the source volume is not attached, Longhorn cannot verify the existence of the snapshot inside the Longhorn volume.
  This means that [the API will return an error](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/manager/volume.go#L347-L352) and the new PVC cannot be provisioned.

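Decoding the snapshotId can be sketched as below. The function name `decodeSnapshotID` is hypothetical, and the generic `<type>://<volume-name>/<snapshot-name>` layout is an assumption generalized from the `snap://volume-name/snapshot-name` format given in this document; the actual parsing in longhorn-manager may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// decodeSnapshotID is a hypothetical helper that splits an ID of the form
// "<type>://<volume-name>/<snapshot-name>" into its three parts.
func decodeSnapshotID(id string) (snapType, volName, snapName string, err error) {
	snapType, rest, ok := strings.Cut(id, "://")
	if !ok || snapType == "" {
		return "", "", "", fmt.Errorf("snapshot ID %q has no type prefix", id)
	}
	// The volume name ends at the first "/" after the prefix.
	volName, snapName, ok = strings.Cut(rest, "/")
	if !ok || volName == "" || snapName == "" {
		return "", "", "", fmt.Errorf("snapshot ID %q is missing the volume or snapshot name", id)
	}
	return snapType, volName, snapName, nil
}

func main() {
	snapType, volName, snapName, err := decodeSnapshotID("snap://test-vol/snap-1")
	fmt.Println(snapType, volName, snapName, err)

	_, _, _, err = decodeSnapshotID("test-vol/snap-1")
	fmt.Println(err)
}
```

Keeping the type in the ID prefix lets CreateVolume and DeleteSnapshot decide how to act from the ID alone, without looking up the VolumeSnapshotClass again.
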
In the [DeleteSnapshot function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L675):
* Decode the snapshotId in the format from the above step.
  If the type is `longhorn-backup`, we delete the backup as before.
  If the type is `longhorn-snapshot`, we delete the corresponding Longhorn snapshot of the source volume.
  If the source volume or the snapshot no longer exists, we return OK as specified in [the CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#deletesnapshot).

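The idempotent behaviour of the `longhorn-snapshot` branch can be illustrated with a small in-memory model. Everything here is hypothetical (`snapshotStore`, `remove`, `deleteSnapshot`, `errNotFound`); the sketch only shows the pattern of mapping "not found" to success, and it does not model the attached or detached state of the volume.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel for a missing volume or snapshot.
var errNotFound = errors.New("not found")

// snapshotStore is a hypothetical in-memory model: volume name -> snapshot names.
type snapshotStore map[string]map[string]bool

// remove deletes one snapshot and reports errNotFound when the volume or
// the snapshot does not exist.
func (s snapshotStore) remove(volName, snapName string) error {
	snaps, ok := s[volName]
	if !ok {
		return fmt.Errorf("volume %s: %w", volName, errNotFound)
	}
	if !snaps[snapName] {
		return fmt.Errorf("snapshot %s: %w", snapName, errNotFound)
	}
	delete(snaps, snapName)
	return nil
}

// deleteSnapshot treats "not found" as success so that retries are safe;
// any other error is returned to the caller.
func deleteSnapshot(s snapshotStore, volName, snapName string) error {
	if err := s.remove(volName, snapName); err != nil && !errors.Is(err, errNotFound) {
		return err
	}
	return nil
}

func main() {
	store := snapshotStore{"test-vol": {"snap-1": true}}
	fmt.Println(deleteSnapshot(store, "test-vol", "snap-1")) // snapshot exists
	fmt.Println(deleteSnapshot(store, "test-vol", "snap-1")) // already deleted
	fmt.Println(deleteSnapshot(store, "gone-vol", "snap-1")) // volume missing
}
```
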
### Test plan

Integration test plan.

1. Deploy the CSI snapshot CRDs and the snapshot controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
1. Deploy 4 VolumeSnapshotClasses:
    ```yaml
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-backup-1
    driver: driver.longhorn.io
    deletionPolicy: Delete
    ```
    ```yaml
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-backup-2
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: longhorn-backup
    ```
    ```yaml
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-snapshot
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: longhorn-snapshot
    ```
    ```yaml
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: invalid-class
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: invalid
    ```
1. Create a Longhorn volume `test-vol` of 5GB. Create a PV/PVC for the Longhorn volume.
1. Create a workload that uses the volume. Write some data to the volume.
   Make sure the data is persisted to the volume by running `sync`.
1. Set up a backup target for Longhorn.

#### Scenario 1: CreateSnapshot
* `type` is `longhorn-backup` or `""`
  * Create a VolumeSnapshot with the following YAML:
    ```yaml
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshot
    metadata:
      name: test-snapshot-longhorn-backup
    spec:
      volumeSnapshotClassName: longhorn-backup-1
      source:
        persistentVolumeClaimName: test-vol
    ```
  * Verify that a backup is created.
  * Delete the VolumeSnapshot `test-snapshot-longhorn-backup`.
  * Verify that the backup is deleted.
  * Create the `test-snapshot-longhorn-backup` VolumeSnapshot with `volumeSnapshotClassName: longhorn-backup-2`.
  * Verify that a backup is created.
* `type` is `longhorn-snapshot`
  * The volume is in the detached state.
    * Scale down the workload of `test-vol` to detach the volume.
    * Create the `test-snapshot-longhorn-snapshot` VolumeSnapshot with `volumeSnapshotClassName: longhorn-snapshot`.
    * Verify the error `volume ... invalid state ... for taking snapshot` in the Longhorn CSI plugin.
  * The volume is in the attached state.
    * Scale up the workload to attach `test-vol`.
    * Verify that a Longhorn snapshot is created for `test-vol`.
* Invalid type
  * Create the `test-snapshot-invalid` VolumeSnapshot with `volumeSnapshotClassName: invalid-class`.
  * Verify the error `invalid snapshot type: %v. Must be %v or %v or` in the Longhorn CSI plugin.
  * Delete the `test-snapshot-invalid` VolumeSnapshot.

#### Scenario 2: Create new volume from CSI snapshot
* From `longhorn-backup` type
  * Create a new PVC with the following YAML:
    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-restore-pvc
    spec:
      storageClassName: longhorn
      dataSource:
        name: test-snapshot-longhorn-backup
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
    ```
  * Attach the PVC `test-restore-pvc` and verify the data.
  * Delete the PVC.
* From `longhorn-snapshot` type
  * Source volume is attached && Longhorn snapshot exists
    * Create a PVC with the following YAML:
      ```yaml
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: test-restore-pvc
      spec:
        storageClassName: longhorn
        dataSource:
          name: test-snapshot-longhorn-snapshot
          kind: VolumeSnapshot
          apiGroup: snapshot.storage.k8s.io
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
      ```
    * Attach the PVC `test-restore-pvc` and verify the data.
    * Delete the PVC.
  * Source volume is detached
    * Scale down the workload to detach `test-vol`.
    * Create the same PVC `test-restore-pvc` as in the `Source volume is attached && Longhorn snapshot exists` section.
    * Verify that PVC provisioning fails because the source volume is detached, so Longhorn cannot verify the existence of the Longhorn snapshot in the source volume.
    * Scale up the workload to attach `test-vol`.
    * Wait for the PVC to finish provisioning and become bound.
    * Attach the PVC `test-restore-pvc` and verify the data.
    * Delete the PVC.
  * Source volume is attached && Longhorn snapshot doesn't exist
    * Find the VolumeSnapshotContent of the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
      Find the Longhorn snapshot name inside the field `VolumeSnapshotContent.snapshotHandle`.
      Go to the Longhorn UI. Delete the Longhorn snapshot.
    * Repeat the steps in the `Source volume is attached && Longhorn snapshot exists` section above.
      The PVC should be stuck in provisioning because the Longhorn snapshot of the source volume doesn't exist.
    * Delete the PVC `test-restore-pvc`.

#### Scenario 3: Delete CSI snapshot
* `longhorn-backup` type
  * Done in the above step.
* `longhorn-snapshot` type
  * The volume is attached && the snapshot doesn't exist
    * Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot` and verify that the VolumeSnapshot is deleted.
  * The volume is attached && the snapshot exists
    * Recreate the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
    * Verify the creation of the Longhorn snapshot with the name in the field `VolumeSnapshotContent.snapshotHandle`.
    * Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
    * Verify that the Longhorn snapshot is removed or marked as removed.
    * Verify that the VolumeSnapshot `test-snapshot-longhorn-snapshot` is deleted.
  * The volume is detached
    * Recreate the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
    * Scale down the workload to detach `test-vol`.
    * Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
    * Verify that the VolumeSnapshot `test-snapshot-longhorn-snapshot` is stuck in deleting.

### Upgrade strategy

No upgrade strategy needed.

## Note [optional]

We need to update the docs and examples to reflect the new parameter in the VolumeSnapshotClass, `type`.