longhorn/enhancements/20200904-csi-snapshot-support.md
Joshua Moody f9a7690bc5 Add note about VolumeSnapshotClass to csi snapshot LEP
Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
2020-09-22 14:56:48 -07:00

9.8 KiB

CSI Snapshot Support

Summary

To allow users to create/restore/delete backups programmatically, we want to add support for the csi snapshot mechanism. This way existing tools can be used to create kubernetes VolumeSnapshot api resources, which we can react to in the csi plugin.

https://github.com/longhorn/longhorn/issues/304 https://github.com/longhorn/longhorn/issues/610 https://github.com/longhorn/longhorn/issues/1127 https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/csi-snapshot.md

Motivation

Provide user with programmatic access for backup operations via the standardized csi interface. https://kubernetes.io/docs/concepts/storage/volume-snapshots/#provisioning-volume-snapshot

Goals

  • add csi snapshot support to our csi driver

Non-goals

  • VolumeBackup crd refactor
  • Changes to the longhorn backup code

Proposal

  • support csi CreateSnapshot call
  • support csi DeleteSnapshot call
  • support snapshot as ContentSource for restoration during CreateVolume calls

User Stories

Currently, it's hard for users to interact with the backup system programmatically, after this enhancement the users will be able to use the standard kubernetes csi mechanisms for backup creation / deletion and restoration of a new volume based on a backup.

User Experience In Detail

Backup creation via VolumeSnapshot resource

The user can request a backup of a volume by creation of a kubernetes VolumeSnapshot object. Example below for a volume named test-vol

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-pvc
spec:
  volumeSnapshotClassName: longhorn
  source:
    persistentVolumeClaimName: test-vol

Restoration via VolumeSnapshot resource

The user can request the creation of a volume based on a prior created VolumeSnapshot object. Example below for a volume named test-vol-restore

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-restore
spec:
  storageClassName: longhorn
  dataSource:
    name: test-vol-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Restoration of an existing Longhorn backup (pre-provisioning)

The user can request the creation of a volume based on a prior longhorn backup, that was not created via the csi layer. The user needs to create a VolumeSnapshotContent object with an associated VolumeSnapshot object. The snapshotHandle of the VolumeSnapshotContent needs to point to an existing longhorn backup. Example below for a volume named test-restore-existing-backup

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotContent
metadata:
  name: test-existing-backup
spec:
  volumeSnapshotClassName: longhorn
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    # NOTE: change this to point to an existing backup on the backupstore
    snapshotHandle: bs://test-vol/backup-625159fb469e492e
  volumeSnapshotRef:
    name: test-snapshot-existing-backup
    namespace: default
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-existing-backup
spec:
  volumeSnapshotClassName: longhorn
  source:
    volumeSnapshotContentName: test-existing-backup
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-existing-backup
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-existing-backup
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

Backup deletion via VolumeSnapshot resource

The user can request the deletion of a backup by removing the associated VolumeSnapshot object. If the DeletionPolicy is Delete, then the referenced longhorn backup will be deleted along with the VolumeSnapshotContent object. If the DeletionPolicy is Retain, then both the referenced longhorn backup and VolumeSnapshotContent object remain. The default DeletionPolicy is set to Delete, if the user wants to retain the longhorn backups, the user can create a snapshotClass with DeletionPolicy set to Retain. https://kubernetes.io/docs/concepts/storage/volume-snapshot-classes/#deletionpolicy

Example below for a snapshot named test-snapshot-pvc kubectl delete volumesnapshots test-snapshot-pvc

Deletion is triggered by deleting the VolumeSnapshot object, and the DeletionPolicy will be followed.

The default deletion policy

API changes

no changes necessary

Design

Implementation Overview

Implement CSI CreateSnapshot call

The creation of a VolumeSnapshot resource triggers the creation of a longhorn snapshot afterwards, that snapshot will be backed up via the longhorn backup mechanism to a user defined backupstore (s3, nfs). While the backup isn't completed the csi snapshot will be marked not ready to use.

The csi-snapshot-request name is generated by the csi snapshotter, based on the configured prefix + VolumeSnapshot.uuid This information is also used to generate the VolumeSnapshotContent.name = snapcontent-uuid

The csi-snapshotter will always call the longhorn csi plugin with the same csi snapshot request name for a specific Snapshot object. This means we have to use the csi snapshotter generated name, for the longhorn snapshot or introduce a CRD (VolumeBackup) so that we have a persisted way of associating csiSnapshot - longhornSnapshot - longhornBackup

If we would ignore the requested name, we would end up generating a new snapshot for each successive call, since a backup can take a long time for very big volumes, that would not be the desired behavior.

To be able to lookup the backup during a following DeleteSnapshot or CreateVolume call, we encode the backupVolume and backupName as part of the snapshotID which is returned from the csi CreateSnapshot call.

this will be set as the VolumeSnapshotContent.snapshotHandle for the kubernetes created VolumeSnapshotContent object. We use the format type://backupVolume/backupName where the default type equals bs for direct references to longhorn backups.

This is so that in the future we can refer to a custom kubernetes resource, which we can use for backup metadata persistence.

Implement CSI DeleteSnapshot call

For backup deletion all we have todo is decode the snapshotID then trigger the backup delete calls via the longhorn api.

Add CSI CreateVolume ContentSource support.

For the volume creation based on a CSI snapshot we can decode the snapshotID to lookup the backup in the backupstore. This will provide us with the backupURL which we can add to the fromBackup field in the longhorn created Volume resource. As part of prior work longhorn already knows how to restore these backups, this was initially used for a StorageClass parameter, by reusing the same mechanism we don't have to maintain multiple code paths for volume restoration.

Test plan

See examples of the necessary yaml manifests in the User Experience In Detail section.

Creation test:

  • create volume
  • write data to volume
  • create a VolumeSnapshot object
  • wait for VolumeSnapshot to be ready to use
  • check for backup existence on the backupstore

Deletion test:

  • create volume
  • write data to volume
  • create a VolumeSnapshot object
  • wait for VolumeSnapshot to be ready to use
  • check for backup existence on the backupstore
  • delete VolumeSnapshot object
  • wait for backup removal from the backupstore

Restore csi snapshot test:

  • create volume
  • write data to volume
  • create a VolumeSnapshot object
  • wait for VolumeSnapshot to be ready to use
  • check for backup existence on the backupstore
  • create PVC with content source set to the VolumeSnapshot object
  • wait for volume restoration
  • verify restored volume data == previously written data

Restore existing longhorn backup test:

  • create volume
  • write data to volume
  • create a longhorn backup
  • check for backup existence on the backupstore
  • create a VolumeSnapshotContent object pointing to the longhorn backup
  • create a VolumeSnapshot object pointing to the VolumeSnapshotContent object
  • create PVC with content source set to the VolumeSnapshot object
  • wait for volume restoration
  • verify restored volume data == previously written data

Upgrade strategy

For csi snapshot support the user needs to update their kubernetes installation to at least 1.17

For environments where the user has pinned their csi images (airgap) the users need to manually provide the following images:

  • longhornio/csi-provisioner:v1.6.0
  • longhornio/csi-snapshotter:v2.1.1

We upgraded the csi-provsioner from 1.4 to 1.6, which still supports kubernetes 1.13 as a minimum version

Note

Since we cannot assume that the users distribution has csi snapshotter support, the user needs to create a VolumeSnapshotClass to be able to use the csi snapshot support.

Example longhorn VolumeSnapshotClass

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1beta1
metadata:
  name: longhorn
driver: driver.longhorn.io
deletionPolicy: Delete

The CRDs and snapshot controller installations are the responsibility of the Kubernetes distribution. See: https://kubernetes.io/docs/concepts/storage/volume-snapshots/#introduction

We are discussing whether longhorn can provide these as part of the longhorn installation, but there isn't really a good way of making sure that there isn't a snapshot controller already deployed in the cluster.

Make sure your cluster contains the below crds, rancher rke did not deploy them for me. https://github.com/kubernetes-csi/external-snapshotter/tree/master/client/config/crd

Make sure your cluster contains the snapshot controller, rancher rke did not deploy it for me. https://github.com/kubernetes-csi/external-snapshotter/tree/master/deploy/kubernetes/snapshot-controller