Add LEP for feature Extend CSI Snapshot support to Longhorn snapshot
Longhorn-2534 Signed-off-by: Phan Le <phan.le@suse.com>
This commit is contained in:
parent
be119b4a84
commit
159f44a0c5
@ -0,0 +1,290 @@
|
||||
# Title
|
||||
|
||||
Extend CSI snapshot to support Longhorn snapshot
|
||||
|
||||
## Summary
|
||||
|
||||
Before this feature, if the user uses [the CSI Snapshotter mechanism](https://kubernetes-csi.github.io/docs/snapshot-restore-feature.html),
|
||||
they can only create Longhorn backups (out of cluster). We want to extend the CSI Snapshotter to support creating for
|
||||
Longhorn snapshot (in-cluster) as well.
|
||||
|
||||
### Related Issues
|
||||
|
||||
https://github.com/longhorn/longhorn/issues/2534
|
||||
|
||||
## Motivation
|
||||
|
||||
### Goals
|
||||
|
||||
Extend the CSI Snapshotter to support:
|
||||
* Creating Longhorn snapshot
|
||||
* Deleting Longhorn snapshot
|
||||
* Creating a new PVC from a CSI snapshot that is associated with a Longhorn snapshot
|
||||
|
||||
### Non-goals
|
||||
|
||||
* Longhorn snapshot Reverting is not a goal because CSI snapshotter doesn't support replace in place for now:
|
||||
https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot
|
||||
|
||||
## Proposal
|
||||
|
||||
### User Stories
|
||||
|
||||
Before this feature is implemented, users can only use CSI Snapshotter to create/restore Longhorn backups.
|
||||
This means that users must set up a backup target outside of the cluster. Uploading/downloading data from
|
||||
backup target is a long/costly operation. Sometimes, users might just want to use CSI Snapshotter to take
|
||||
an in-cluster Longhorn snapshot and create a new volume from that snapshot. The Longhorn snapshot operation
|
||||
is cheap and faster than the backup operation and doesn't require setting up a backup target.
|
||||
|
||||
### User Experience In Detail
|
||||
|
||||
To use this feature, users need to do:
|
||||
1. Deploy the CSI snapshot CRDs, Controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
|
||||
1. Deploy a VolumeSnapshotClass with the parameter `type: longhorn-snapshot`. I.e.,
|
||||
```yaml
|
||||
kind: VolumeSnapshotClass
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: longhorn-snapshot
|
||||
driver: driver.longhorn.io
|
||||
deletionPolicy: Delete
|
||||
parameters:
|
||||
type: longhorn-snapshot
|
||||
```
|
||||
1. To create a new CSI snapshot associated with a Longhorn snapshot of the volume `test-vol`, users deploy the following VolumeSnapshot CR:
|
||||
```yaml
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
kind: VolumeSnapshot
|
||||
metadata:
|
||||
name: test-snapshot
|
||||
spec:
|
||||
volumeSnapshotClassName: longhorn-snapshot
|
||||
source:
|
||||
persistentVolumeClaimName: test-vol
|
||||
```
|
||||
A new Longhorn snapshot is created for the volume `test-vol`
|
||||
1. To create a new PVC from the CSI snapshot, users can deploy the following yaml:
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: test-restore-snapshot-pvc
|
||||
spec:
|
||||
storageClassName: longhorn
|
||||
dataSource:
|
||||
name: test-snapshot
|
||||
kind: VolumeSnapshot
|
||||
apiGroup: snapshot.storage.k8s.io
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 5Gi # should be the same as the size of `test-vol`
|
||||
```
|
||||
A new PVC will be created with the same content as in the VolumeSnapshot `test-snapshot`
|
||||
1. Deleting the VolumeSnapshot `test-snapshot` will lead to the deletion of the corresponding Longhorn snapshot of the volume `test-vol`
|
||||
|
||||
### API changes
|
||||
None
|
||||
|
||||
## Design
|
||||
|
||||
### Implementation Overview
|
||||
|
||||
We follow the specification in [the CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#createsnapshot) when supporting the CSI snapshot.
|
||||
|
||||
We define a new parameter in the VolumeSnapshotClass `type`.
|
||||
The value of the parameter `type` can be `longhorn-snapshot` or `longhorn-backup`.
|
||||
When `type` is `longhorn-snapshot` it means that the CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn snapshot.
|
||||
When `type` is `longhorn-backup` it means that the CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn backup.
|
||||
|
||||
In [CreateSnapshot function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L539), we get the
|
||||
value of parameter `type`. If it is `longhorn-backup`, we take a Longhorn backup as before. If it is `longhorn-snapshot` we do:
|
||||
* Get the name of the Longhorn volume
|
||||
* Check if the volume is in attached state.
|
||||
If it is not, return `codes.FailedPrecondition`.
|
||||
We cannot take a snapshot of non-attached volume.
|
||||
* Check if a Longhorn snapshot with the same name as the requested CSI snapshot already exists.
|
||||
If yes, return OK without taking a new Longhorn snapshot.
|
||||
* Take a new Longhorn snapshot. Encode the snapshotId in the format `snap://volume-name/snapshot-name`.
|
||||
This snaphotId will be used in the later CSI CreateVolume and DeleteSnapshot call.
|
||||
|
||||
In [CreateVolume function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L63):
|
||||
* If the VolumeContentSource is a `VolumeContentSource_Snapshot` type, decode the snapshotId in the format from the above step.
|
||||
* Create a new volume with the `dataSource` set to `snap://volume-name/snapshot-name`. This will trigger Longhorn to clone the content of the snapshot to the new volume.
|
||||
Note that if the source volume is not attached, Longhorn cannot verify the existence of the snapshot inside the Longhorn volume.
|
||||
This means that [the API will return error](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/manager/volume.go#L347-L352) and new PVC cannot be provisioned.
|
||||
|
||||
In [DeleteSnapshot function](https://github.com/longhorn/longhorn-manager/blob/878cfb868c568396d6ebfa4ce096c5d95d9b31e3/csi/controller_server.go#L675):
|
||||
* Decode the snapshotId in the format from the above step.
|
||||
If the type is `longhorn-backup` we delete the backup as before.
|
||||
If the type is `longhorn-snapshot`, we delete the corresponding Longhorn snapshot of the source volume.
|
||||
If the source volume or the snapshot is no longer exist, we return OK as specified in [the CSI spec](https://github.com/container-storage-interface/spec/blob/master/spec.md#deletesnapshot)
|
||||
|
||||
### Test plan
|
||||
|
||||
Integration test plan.
|
||||
|
||||
1. Deploy the CSI snapshot CRDs, Controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
|
||||
1. Deploy 4 VolumeSnapshotClass:
|
||||
```yaml
|
||||
kind: VolumeSnapshotClass
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: longhorn-backup-1
|
||||
driver: driver.longhorn.io
|
||||
deletionPolicy: Delete
|
||||
```
|
||||
```yaml
|
||||
kind: VolumeSnapshotClass
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: longhorn-backup-2
|
||||
driver: driver.longhorn.io
|
||||
deletionPolicy: Delete
|
||||
parameters:
|
||||
type: longhorn-backup
|
||||
```
|
||||
```yaml
|
||||
kind: VolumeSnapshotClass
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: longhorn-snapshot
|
||||
driver: driver.longhorn.io
|
||||
deletionPolicy: Delete
|
||||
parameters:
|
||||
type: longhorn-snapshot
|
||||
```
|
||||
```yaml
|
||||
kind: VolumeSnapshotClass
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
metadata:
|
||||
name: invalid-class
|
||||
driver: driver.longhorn.io
|
||||
deletionPolicy: Delete
|
||||
parameters:
|
||||
type: invalid
|
||||
```
|
||||
1. Create Longhorn volume `test-vol` of 5GB. Create PV/PVC for the Longhorn volume.
|
||||
1. Create a workload that uses the volume. Write some data to the volume.
|
||||
Make sure data persist to the volume by running `sync`
|
||||
1. Set up a backup target for Longhorn
|
||||
|
||||
#### Scenarios 1: CreateSnapshot
|
||||
* `type` is `longhorn-backup` or `""`
|
||||
|
||||
* Create a VolumeSnapshot with the following yaml
|
||||
```yaml
|
||||
apiVersion: snapshot.storage.k8s.io/v1beta1
|
||||
kind: VolumeSnapshot
|
||||
metadata:
|
||||
name: test-snapshot-longhorn-backup
|
||||
spec:
|
||||
volumeSnapshotClassName: longhorn-backup-1
|
||||
source:
|
||||
persistentVolumeClaimName: test-vol
|
||||
```
|
||||
* Verify that a backup is created.
|
||||
* Delete the `test-snapshot-longhorn-backup`
|
||||
* Verify that the backup is deleted
|
||||
* Create the `test-snapshot-longhorn-backup` VolumeSnapshot with `volumeSnapshotClassName: longhorn-backup-2`
|
||||
* Verify that a backup is created.
|
||||
* `type` is `longhorn-snapshot`
|
||||
* volume is in detached state.
|
||||
* Scale down the workload of `test-vol` to detach the volume.
|
||||
* Create `test-snapshot-longhorn-snapshot` VolumeSnapshot with `volumeSnapshotClassName: longhorn-snapshot`.
|
||||
* Verify the error `volume ... invalid state ... for taking snapshot` in the Longhorn CSI plugin.
|
||||
* volume is in attached state.
|
||||
* Scale up the workload to attach `test-vol`
|
||||
* Verify that a Longhorn snapshot is created for the `test-vol`.
|
||||
* invalid type
|
||||
* Create `test-snapshot-invalid` VolumeSnapshot with `volumeSnapshotClassName: invalid-class`.
|
||||
* Verify the error `invalid snapshot type: %v. Must be %v or %v or` in the Longhorn CSI plugin.
|
||||
* Delete `test-snapshot-invalid` VolumeSnapshot.
|
||||
|
||||
#### Scenarios 2: Create new volume from CSI snapshot
|
||||
* From `longhorn-backup` type
|
||||
* Create a new PVC with the flowing yaml:
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: test-restore-pvc
|
||||
spec:
|
||||
storageClassName: longhorn
|
||||
dataSource:
|
||||
name: test-snapshot-longhorn-backup
|
||||
kind: VolumeSnapshot
|
||||
apiGroup: snapshot.storage.k8s.io
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 5Gi
|
||||
```
|
||||
* Attach the PVC `test-restore-pvc` and verify the data
|
||||
* Delete the PVC
|
||||
* From `longhorn-snapshot` type
|
||||
* Source volume is attached && Longhorn snapshot exist
|
||||
* Create a PVC with the following yaml:
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: test-restore-pvc
|
||||
spec:
|
||||
storageClassName: longhorn
|
||||
dataSource:
|
||||
name: test-snapshot-longhorn-snapshot
|
||||
kind: VolumeSnapshot
|
||||
apiGroup: snapshot.storage.k8s.io
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 5Gi
|
||||
```
|
||||
* Attach the PVC `test-restore-pvc` and verify the data
|
||||
* Delete the PVC
|
||||
* Source volume is detached
|
||||
* Scale down the workload to detach the `test-vol`
|
||||
* Create the same PVC `test-restore-pvc` as in the `Source volume is attached && Longhorn snapshot exist` section
|
||||
* Verify that PVC provisioning failed because the source volume is detached so Longhorn cannot verify the existence of the Longhorn snapshot in the source volume.
|
||||
* Scale up the workload to attache `test-vol`
|
||||
* Wait for PVC to finish provisioning and be bounded
|
||||
* Attach the PVC `test-restore-pvc` and verify the data
|
||||
* Delete the PVC
|
||||
* Source volume is attached && Longhorn snapshot doesn’t exist
|
||||
* Find the VolumeSnapshotContent of the VolumeSnapshot `test-snapshot-longhorn-snapshot`.
|
||||
Find the Longhorn snapshot name inside the field `VolumeSnapshotContent.snapshotHandle`.
|
||||
Go to Longhorn UI. Delete the Longhorn snapshot.
|
||||
* Repeat steps in the section `Longhorn snapshot exist` above.
|
||||
PVC should be stuck in provisioning because Longhorn snapshot of the source volume doesn't exist.
|
||||
* Delete the PVC `test-restore-pvc` PVC
|
||||
|
||||
#### Scenarios 3: Delete CSI snapshot
|
||||
* `longhorn-backup` type
|
||||
* Done in the above step
|
||||
* `longhorn-snapshot` type
|
||||
* volume is attached && snapshot doesn’t exist
|
||||
* Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot` and verify that the VolumeSnapshot is deleted.
|
||||
* volume is attached && snapshot exist
|
||||
* Recreate the VolumeSnapshot `test-snapshot-longhorn-snapshot`
|
||||
* Verify the creation of Longhorn snapshot with the name in the field `VolumeSnapshotContent.snapshotHandle`
|
||||
* Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot`
|
||||
* Verify that Longhorn snapshot is removed or marked as removed
|
||||
* Verify that the VolumeSnapshot `test-snapshot-longhorn-snapshot` is deleted.
|
||||
* volume is detached
|
||||
* Recreate the VolumeSnapshot `test-snapshot-longhorn-snapshot`
|
||||
* Scale down the workload to detach `test-vol`
|
||||
* Delete the VolumeSnapshot `test-snapshot-longhorn-snapshot`
|
||||
* Verify that VolumeSnapshot `test-snapshot-longhorn-snapshot` is stuck in deleting
|
||||
|
||||
|
||||
### Upgrade strategy
|
||||
|
||||
No upgrade strategy needed
|
||||
|
||||
## Note [optional]
|
||||
|
||||
We need to update the docs and examples to reflect the new parameter in the VolumeSnapshotClass, `type`.
|
Loading…
Reference in New Issue
Block a user