Add LEP for csi snapshot support
Longhorn #304 Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
parent
93c0d6a8af
commit
9cb7bdc160
263
enhancements/20200904-csi-snapshot-support.md
Normal file
@ -0,0 +1,263 @@
# CSI Snapshot Support

## Summary

To allow users to create, restore, and delete backups programmatically,
we want to add support for the CSI snapshot mechanism.
This way, existing tools can be used to create Kubernetes `VolumeSnapshot` API resources,
which we can react to in the CSI plugin.

### Related Issues

https://github.com/longhorn/longhorn/issues/304
https://github.com/longhorn/longhorn/issues/610
https://github.com/longhorn/longhorn/issues/1127
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-snapshot.md

## Motivation

Provide users with programmatic access to backup operations via the standardized CSI interface.
https://kubernetes.io/docs/concepts/storage/volume-snapshots/#provisioning-volume-snapshot

### Goals

- Add CSI snapshot support to our CSI driver

### Non-goals

- `VolumeBackup` CRD refactor
- Changes to the Longhorn backup code

## Proposal

- Support the CSI `CreateSnapshot` call
- Support the CSI `DeleteSnapshot` call
- Support a snapshot as the `ContentSource` for restoration during `CreateVolume` calls

### User Stories

Currently, it is hard for users to interact with the backup system programmatically.
After this enhancement, users will be able to use the standard Kubernetes CSI mechanisms for
backup creation, backup deletion, and restoration of a new volume based on a backup.

### User Experience In Detail

#### Backup creation via VolumeSnapshot resource

The user can request a backup of a volume by creating a Kubernetes `VolumeSnapshot` object.
The example below is for a volume named `test-vol`:

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-pvc
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: test-vol
```

#### Restoration via VolumeSnapshot resource

The user can request the creation of a volume based on a previously created `VolumeSnapshot` object.
The `dataSource.name` field must match the name of an existing `VolumeSnapshot`.
The example below is for a volume named `test-vol-restore`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-restore
spec:
  storageClassName: longhorn
  dataSource:
    name: test-vol-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```

#### Restoration of an existing Longhorn backup (pre-provisioning)

The user can request the creation of a volume based on an existing Longhorn backup that was not created via the CSI layer.
To do so, the user needs to create a `VolumeSnapshotContent` object with an associated `VolumeSnapshot` object.
The `snapshotHandle` of the `VolumeSnapshotContent` needs to point to an existing Longhorn backup.
The example below is for a volume named `test-restore-existing-backup`:

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotContent
metadata:
  name: test-existing-backup
spec:
  volumeSnapshotClassName: longhorn
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    # NOTE: change this to point to an existing backup on the backupstore
    snapshotHandle: bs://test-vol/backup-625159fb469e492e
  volumeSnapshotRef:
    name: test-snapshot-existing-backup
    namespace: default
```

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-existing-backup
spec:
  volumeSnapshotClassName: longhorn
  source:
    volumeSnapshotContentName: test-existing-backup
```

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-existing-backup
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-existing-backup
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```

#### Backup deletion via VolumeSnapshot resource

The user can request the deletion of a backup by removing the associated `VolumeSnapshot` object.
If the `DeletionPolicy` is `Delete`, the referenced Longhorn backup is deleted along with the `VolumeSnapshotContent` object.
If the `DeletionPolicy` is `Retain`, both the referenced Longhorn backup and the `VolumeSnapshotContent` object remain.
The default `DeletionPolicy` is `Delete`. If the user wants to retain the Longhorn backups,
the user can create a `VolumeSnapshotClass` with `DeletionPolicy` set to `Retain`.
https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/#deletionpolicy

The example below deletes a snapshot named `test-snapshot-pvc`:
`kubectl delete volumesnapshots test-snapshot-pvc`

Deletion is triggered by deleting the `VolumeSnapshot` object, and the `DeletionPolicy` will be followed.

|
||||||
|
The default deletion policy

### API changes

No changes necessary.

## Design

### Implementation Overview

#### Implement CSI CreateSnapshot call

The creation of a `VolumeSnapshot` resource triggers the creation of a Longhorn snapshot.
Afterwards, that snapshot is backed up via the Longhorn backup mechanism to a user-defined backupstore (S3, NFS).
While the backup is not yet complete, the CSI snapshot is marked as not ready to use.

The CSI snapshot request name is generated by the csi-snapshotter, based on the configured prefix + `VolumeSnapshot.uuid`.
This information is also used to generate the `VolumeSnapshotContent.name = snapcontent-uuid`.

The csi-snapshotter will always call the Longhorn CSI plugin with the same CSI snapshot request name for a specific `VolumeSnapshot` object.
This means we either have to use the csi-snapshotter-generated name for the Longhorn snapshot,
or introduce a CRD (`VolumeBackup`) so that we have a persisted way of associating
csiSnapshot - longhornSnapshot - longhornBackup.

If we ignored the requested name, we would end up generating a new snapshot for each successive call.
Since a backup can take a long time for very big volumes, that would not be the desired behavior.
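
The reuse of the request name can be illustrated with a minimal Python sketch. It is not the Longhorn implementation: `SnapshotRequests`, `Backup`, and the in-memory dictionary are hypothetical stand-ins for the state Longhorn keeps, and the only point is that repeated calls with the same request name return the same backup instead of starting a new one.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Backup:
    name: str           # reuses the CSI snapshot request name
    volume: str
    done: bool = False  # flips to True once the backup has been uploaded


class SnapshotRequests:
    """Hypothetical in-memory stand-in for Longhorn's snapshot/backup state."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Backup] = {}

    def create_snapshot(self, request_name: str, volume: str) -> Tuple[Backup, bool]:
        """Return (backup, ready_to_use); only the first call creates a backup."""
        backup = self._by_name.get(request_name)
        if backup is None:
            backup = Backup(name=request_name, volume=volume)
            self._by_name[request_name] = backup
        elif backup.volume != volume:
            # Same name but a different source volume is treated as a conflict.
            raise ValueError(
                f"{request_name!r} already exists for volume {backup.volume!r}"
            )
        # While the backup is still running, the snapshot is reported as not ready.
        return backup, backup.done
```

In this sketch, a retry while the backup is still in progress simply reports `ready_to_use == False` again, which mirrors the "not ready to use" state described above.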

To be able to look up the backup during a subsequent `DeleteSnapshot` or `CreateVolume` call,
we encode the backupVolume and backupName as part of the snapshotID,
which is returned from the CSI `CreateSnapshot` call.

This will be set as the `VolumeSnapshotContent.snapshotHandle` for the Kubernetes-created `VolumeSnapshotContent` object.
We use the format `type://backupVolume/backupName`, where the default type is `bs` for direct references to Longhorn backups.

The type prefix exists so that in the future we can refer to a custom Kubernetes resource, which we can use for backup metadata persistence.
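
The `type://backupVolume/backupName` format can be sketched in a few lines of Python. The function and class names here (`encode_snapshot_id`, `decode_snapshot_id`, `SnapshotID`) are illustrative assumptions, not names from the Longhorn code base, and the validation rules are a guess at what a reasonable parser would reject.

```python
from typing import NamedTuple

DEFAULT_KIND = "bs"  # direct reference to a Longhorn backup, per the format above


class SnapshotID(NamedTuple):
    kind: str
    backup_volume: str
    backup_name: str


def encode_snapshot_id(backup_volume: str, backup_name: str,
                       kind: str = DEFAULT_KIND) -> str:
    """Build a snapshot ID of the form type://backupVolume/backupName."""
    return f"{kind}://{backup_volume}/{backup_name}"


def decode_snapshot_id(snapshot_id: str) -> SnapshotID:
    """Split a snapshot ID back into its type, backup volume, and backup name."""
    kind, sep, rest = snapshot_id.partition("://")
    if not sep or not kind:
        raise ValueError(f"missing type prefix in {snapshot_id!r}")
    backup_volume, sep, backup_name = rest.partition("/")
    if not sep or not backup_volume or not backup_name:
        raise ValueError(f"expected backupVolume/backupName in {snapshot_id!r}")
    return SnapshotID(kind, backup_volume, backup_name)
```

For example, decoding the handle used in the pre-provisioning manifest, `bs://test-vol/backup-625159fb469e492e`, yields the type `bs`, the backup volume `test-vol`, and the backup name `backup-625159fb469e492e`.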

#### Implement CSI DeleteSnapshot call

For backup deletion, all we have to do is decode the snapshotID and then trigger the backup delete calls via the Longhorn API.

#### Add CSI CreateVolume ContentSource support

For volume creation based on a CSI snapshot, we can decode the snapshotID to look up the backup in the backupstore.
This provides us with the backupURL, which we can add to the `fromBackup` field of the Longhorn-created Volume resource.
As part of prior work, Longhorn already knows how to restore these backups; this was initially used for a StorageClass
parameter. By reusing the same mechanism, we don't have to maintain multiple code paths for volume restoration.
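
As a rough sketch of this flow, the Python below decodes the snapshotID, resolves the backup to a URL, and places that URL in `fromBackup`. Everything except the `fromBackup` field name and the `bs` type is an assumption: `volume_spec_from_snapshot` and the `lookup_backup_url` callback are hypothetical, and the callback stands in for whatever Longhorn API call resolves a backup to its URL.

```python
from typing import Callable, Dict

# Hypothetical callback: (backup_volume, backup_name) -> backup URL.
BackupLookup = Callable[[str, str], str]


def volume_spec_from_snapshot(volume_name: str, snapshot_id: str,
                              lookup_backup_url: BackupLookup) -> Dict[str, str]:
    """Build a minimal volume spec whose fromBackup points at the decoded backup."""
    kind, _, rest = snapshot_id.partition("://")
    backup_volume, _, backup_name = rest.partition("/")
    if kind != "bs" or not backup_volume or not backup_name:
        raise ValueError(f"unsupported snapshot ID: {snapshot_id!r}")
    # The restore itself is left to the existing fromBackup mechanism.
    return {
        "name": volume_name,
        "fromBackup": lookup_backup_url(backup_volume, backup_name),
    }
```

The sketch deliberately does nothing beyond filling in `fromBackup`, which reflects the design choice above of reusing the existing restore path rather than adding a second one.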

### Test plan

See examples of the necessary YAML manifests in the `User Experience In Detail` section.

Creation test:
- create volume
- write data to volume
- create a `VolumeSnapshot` object
- wait for `VolumeSnapshot` to be ready to use
- check for backup existence on the backupstore

Deletion test:
- create volume
- write data to volume
- create a `VolumeSnapshot` object
- wait for `VolumeSnapshot` to be ready to use
- check for backup existence on the backupstore
- delete `VolumeSnapshot` object
- wait for backup removal from the backupstore

Restore csi snapshot test:
- create volume
- write data to volume
- create a `VolumeSnapshot` object
- wait for `VolumeSnapshot` to be ready to use
- check for backup existence on the backupstore
- create PVC with content source set to the `VolumeSnapshot` object
- wait for volume restoration
- verify restored volume data == previously written data

Restore existing longhorn backup test:
- create volume
- write data to volume
- create a longhorn backup
- check for backup existence on the backupstore
- create a `VolumeSnapshotContent` object pointing to the longhorn backup
- create a `VolumeSnapshot` object pointing to the `VolumeSnapshotContent` object
- create PVC with content source set to the `VolumeSnapshot` object
- wait for volume restoration
- verify restored volume data == previously written data
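
Several steps above are "wait for ..." steps, which in an automated test usually become a polling loop with a timeout. The helper below is a generic Python sketch of that pattern; `wait_for` and its default timeout and interval are assumptions, and the condition callback (for example, one that checks whether a `VolumeSnapshot` reports ready to use) is left to the test author.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout: float = 300.0, interval: float = 1.0) -> None:
    """Poll condition() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.1f}s")
        time.sleep(interval)
```

Because backups of large volumes can take a long time, the timeout would need to be chosen generously for the backup-related steps.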

### Upgrade strategy

For CSI snapshot support, the user needs to update their Kubernetes installation to at least 1.17.

For environments where the user has pinned their CSI images (air gap),
the user needs to manually provide the following images:

- longhornio/csi-provisioner:v1.6.0
- longhornio/csi-snapshotter:v2.1.1

We upgraded the csi-provisioner from 1.4 to 1.6, which still supports Kubernetes 1.13 as a minimum version.

## Note

The CRDs and snapshot controller installations are the responsibility of the Kubernetes distribution.
See: https://kubernetes.io/docs/concepts/storage/volume-snapshots/#introduction

We are discussing whether Longhorn can provide these as part of the Longhorn installation, but there isn't really
a good way of making sure that a snapshot controller isn't already deployed in the cluster.

Make sure your cluster contains the CRDs below; Rancher RKE did not deploy them for me.
https://github.com/kubernetes-csi/external-snapshotter/tree/master/client/config/crd

Make sure your cluster contains the snapshot controller; Rancher RKE did not deploy it for me.
https://github.com/kubernetes-csi/external-snapshotter/tree/master/deploy/kubernetes/snapshot-controller