longhorn/enhancements/20220110-extend-csi-snapshot-to-support-longhorn-snapshot.md
David Ko 4f35fda4b2 docs: fix typo
Signed-off-by: David Ko <dko@suse.com>
2022-12-12 14:20:19 +08:00


Title

Extend CSI snapshot to support Longhorn snapshot

Summary

Before this feature, users of the CSI Snapshotter mechanism can only create Longhorn backups (out of cluster). We want to extend the CSI Snapshotter to support creating Longhorn snapshots (in-cluster) as well.

https://github.com/longhorn/longhorn/issues/2534

Motivation

Goals

Extend the CSI Snapshotter to support:

  • Creating Longhorn snapshot
  • Deleting Longhorn snapshot
  • Creating a new PVC from a CSI snapshot that is associated with a Longhorn snapshot

Non-goals

Proposal

User Stories

Before this feature is implemented, users can only use the CSI Snapshotter to create and restore Longhorn backups. This means that users must set up a backup target outside of the cluster, and uploading data to or downloading data from the backup target is a slow and costly operation. Sometimes users just want to use the CSI Snapshotter to take an in-cluster Longhorn snapshot and create a new volume from that snapshot. A Longhorn snapshot is cheaper and faster than a backup and doesn't require setting up a backup target.

User Experience In Detail

To use this feature, users do the following:

  1. Deploy the CSI snapshot CRDs and controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
  2. Deploy a VolumeSnapshotClass with the parameter type: longhorn-snapshot. For example:
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-snapshot
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: longhorn-snapshot
    
  3. To create a new CSI snapshot associated with a Longhorn snapshot of the volume test-vol, users deploy the following VolumeSnapshot CR:
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshot
    metadata:
      name: test-snapshot
    spec:
      volumeSnapshotClassName: longhorn-snapshot
      source:
        persistentVolumeClaimName: test-vol
    
    A new Longhorn snapshot is created for the volume test-vol.
  4. To create a new PVC from the CSI snapshot, users can deploy the following yaml:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-restore-snapshot-pvc
    spec:
      storageClassName: longhorn
      dataSource:
        name: test-snapshot
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi # should be the same as the size of `test-vol`
    
    A new PVC is created with the same content as the VolumeSnapshot test-snapshot.
  5. Deleting the VolumeSnapshot test-snapshot deletes the corresponding Longhorn snapshot of the volume test-vol.

API changes

None

Design

Implementation Overview

We follow the CSI spec when implementing CSI snapshot support.

We define a new parameter, type, in the VolumeSnapshotClass. Its value can be longhorn-snapshot or longhorn-backup. When type is longhorn-snapshot, a CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn snapshot. When type is longhorn-backup, it is associated with a Longhorn backup. If type is omitted, the class behaves as longhorn-backup, as it did before this feature.
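The handling of the type parameter can be sketched as follows. This is a minimal illustration in Python, not the Go code of the Longhorn CSI plugin: the constant and function names are hypothetical, and only the parameter values, the backup default for an empty type (see Scenario 1 of the test plan), and the wording of the error come from this proposal.

```python
# Hypothetical names; only the two parameter values come from the proposal.
SNAPSHOT_TYPE_SNAPSHOT = "longhorn-snapshot"
SNAPSHOT_TYPE_BACKUP = "longhorn-backup"


def resolve_snapshot_type(parameters: dict) -> str:
    """Return the effective snapshot type for a VolumeSnapshotClass."""
    requested = parameters.get("type", "")
    if requested == "":
        # A class without `type` keeps the pre-existing behavior: a backup.
        return SNAPSHOT_TYPE_BACKUP
    if requested in (SNAPSHOT_TYPE_SNAPSHOT, SNAPSHOT_TYPE_BACKUP):
        return requested
    # Any other value is rejected, as in the "invalid type" test scenario.
    raise ValueError(
        f"invalid snapshot type: {requested}. "
        f"Must be {SNAPSHOT_TYPE_SNAPSHOT} or {SNAPSHOT_TYPE_BACKUP}"
    )
```

For example, `resolve_snapshot_type({})` and `resolve_snapshot_type({"type": "longhorn-backup"})` both select the backup path, while `{"type": "invalid"}` raises an error.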

In the CreateSnapshot function, we read the value of the type parameter. If it is longhorn-backup, we take a Longhorn backup as before. If it is longhorn-snapshot, we do the following:

  • Get the name of the Longhorn volume
  • Check if the volume is in the attached state. If it is not, return codes.FailedPrecondition, because we cannot take a snapshot of a volume that is not attached.
  • Check if a Longhorn snapshot with the same name as the requested CSI snapshot already exists. If yes, return OK without taking a new Longhorn snapshot.
  • Take a new Longhorn snapshot. Encode the snapshotId in the format snap://volume-name/snapshot-name. This snapshotId will be used in the later CSI CreateVolume and DeleteSnapshot calls.
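The longhorn-snapshot branch above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the plugin's Go implementation: the `Volume` class, the `FailedPrecondition` exception (standing in for the gRPC code codes.FailedPrecondition), and the function names are all hypothetical. Only the attached-state check, the idempotent behavior, and the snap:// ID format come from the steps above.

```python
from dataclasses import dataclass, field


class FailedPrecondition(Exception):
    """Stand-in for the gRPC status code codes.FailedPrecondition."""


@dataclass
class Volume:
    # Hypothetical in-memory model of a Longhorn volume.
    name: str
    attached: bool
    snapshots: set = field(default_factory=set)


def encode_snapshot_id(volume_name: str, snapshot_name: str) -> str:
    """Build the snapshotId in the format snap://volume-name/snapshot-name."""
    return f"snap://{volume_name}/{snapshot_name}"


def create_snapshot(volume: Volume, csi_snapshot_name: str) -> str:
    """Model the longhorn-snapshot branch of CreateSnapshot."""
    if not volume.attached:
        # A snapshot cannot be taken while the volume is not attached.
        raise FailedPrecondition(f"volume {volume.name} is not attached")
    if csi_snapshot_name not in volume.snapshots:
        # Only take a new snapshot when none with this name exists yet.
        volume.snapshots.add(csi_snapshot_name)
    return encode_snapshot_id(volume.name, csi_snapshot_name)
```

Calling `create_snapshot` twice with the same name returns the same ID and leaves a single snapshot on the volume, which is the idempotent behavior the third step describes.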

In the CreateVolume function:

  • If the VolumeContentSource is of type VolumeContentSource_Snapshot, decode the snapshotId, which has the format from the above step.
  • Create a new volume with the dataSource set to snap://volume-name/snapshot-name. This triggers Longhorn to clone the content of the snapshot to the new volume. Note that if the source volume is not attached, Longhorn cannot verify the existence of the snapshot inside the Longhorn volume. In that case the API returns an error and the new PVC cannot be provisioned.
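A rough Python sketch of these two steps is shown below. It is a model for illustration only: `decode_snapshot_id`, `build_volume_spec`, and the `attached_volumes` lookup (attached volume name mapped to its snapshot names) are hypothetical names, and the real plugin calls the Longhorn API instead of consulting a dictionary.

```python
from typing import Dict, Set, Tuple

SNAPSHOT_ID_PREFIX = "snap://"


def decode_snapshot_id(snapshot_id: str) -> Tuple[str, str]:
    """Split snap://volume-name/snapshot-name into its two names."""
    if not snapshot_id.startswith(SNAPSHOT_ID_PREFIX):
        raise ValueError(f"not a Longhorn snapshot ID: {snapshot_id}")
    rest = snapshot_id[len(SNAPSHOT_ID_PREFIX):]
    volume_name, sep, snapshot_name = rest.partition("/")
    if not sep or not volume_name or not snapshot_name:
        raise ValueError(f"malformed Longhorn snapshot ID: {snapshot_id}")
    return volume_name, snapshot_name


def build_volume_spec(
    new_volume_name: str,
    snapshot_id: str,
    attached_volumes: Dict[str, Set[str]],
) -> dict:
    """Return a volume spec whose dataSource is the snapshot ID.

    Raises RuntimeError when the snapshot cannot be verified, which in the
    real flow leaves the PVC unprovisioned until a retry succeeds.
    """
    source_volume, snapshot_name = decode_snapshot_id(snapshot_id)
    if source_volume not in attached_volumes:
        raise RuntimeError(
            f"cannot verify snapshot: source volume {source_volume} is not attached"
        )
    if snapshot_name not in attached_volumes[source_volume]:
        raise RuntimeError(f"snapshot {snapshot_name} not found in {source_volume}")
    return {"name": new_volume_name, "dataSource": snapshot_id}
```

Raising an error for a detached source volume mirrors the note above: provisioning fails now but can succeed once the source volume is attached again.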

In DeleteSnapshot function:

  • Decode the snapshotId, which has the format from the above step. If the type is longhorn-backup, we delete the backup as before. If the type is longhorn-snapshot, we delete the corresponding Longhorn snapshot of the source volume. If the source volume or the snapshot no longer exists, we return OK as specified in the CSI spec.
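The idempotent delete for the longhorn-snapshot type can be sketched like this. The Python below is a simplified model with hypothetical names (`delete_longhorn_snapshot`, and a `volumes` dictionary mapping volume names to snapshot names). It covers only snap:// IDs and does not model the detached-volume case exercised in the test plan.

```python
from typing import Dict, Set

SNAPSHOT_ID_PREFIX = "snap://"


def delete_longhorn_snapshot(volumes: Dict[str, Set[str]], snapshot_id: str) -> None:
    """Delete the snapshot named by snapshot_id; succeed if it is already gone."""
    if not snapshot_id.startswith(SNAPSHOT_ID_PREFIX):
        raise ValueError(f"not a Longhorn snapshot ID: {snapshot_id}")
    rest = snapshot_id[len(SNAPSHOT_ID_PREFIX):]
    volume_name, _, snapshot_name = rest.partition("/")
    snapshots = volumes.get(volume_name)
    if snapshots is None:
        # Source volume no longer exists: nothing to delete, report OK.
        return
    # discard() does not fail when the snapshot is already absent.
    snapshots.discard(snapshot_name)
```

Returning normally in both "already gone" cases lets a retried DeleteSnapshot call succeed instead of looping on an error.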

Test plan

Integration test plan.

  1. Deploy the CSI snapshot CRDs and controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
  2. Deploy 4 VolumeSnapshotClasses:
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-backup-1
    driver: driver.longhorn.io
    deletionPolicy: Delete
    
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-backup-2
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: longhorn-backup
    
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: longhorn-snapshot
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: longhorn-snapshot
    
    kind: VolumeSnapshotClass
    apiVersion: snapshot.storage.k8s.io/v1beta1
    metadata:
      name: invalid-class
    driver: driver.longhorn.io
    deletionPolicy: Delete
    parameters:
      type: invalid
    
  3. Create Longhorn volume test-vol of 5GB. Create PV/PVC for the Longhorn volume.
  4. Create a workload that uses the volume. Write some data to the volume. Make sure the data is persisted to the volume by running sync.
  5. Set up a backup target for Longhorn

Scenario 1: CreateSnapshot

  • type is longhorn-backup or ""

    • Create a VolumeSnapshot with the following yaml
      apiVersion: snapshot.storage.k8s.io/v1beta1
      kind: VolumeSnapshot
      metadata:
        name: test-snapshot-longhorn-backup
      spec:
        volumeSnapshotClassName: longhorn-backup-1
        source:
          persistentVolumeClaimName: test-vol
      
    • Verify that a backup is created.
    • Delete the test-snapshot-longhorn-backup
    • Verify that the backup is deleted
    • Create the test-snapshot-longhorn-backup VolumeSnapshot with volumeSnapshotClassName: longhorn-backup-2
    • Verify that a backup is created.
  • type is longhorn-snapshot

    • volume is in detached state.
      • Scale down the workload of test-vol to detach the volume.
      • Create test-snapshot-longhorn-snapshot VolumeSnapshot with volumeSnapshotClassName: longhorn-snapshot.
      • Verify the error volume ... invalid state ... for taking snapshot in the Longhorn CSI plugin.
    • volume is in attached state.
      • Scale up the workload to attach test-vol
      • Verify that a Longhorn snapshot is created for the test-vol.
  • invalid type

    • Create test-snapshot-invalid VolumeSnapshot with volumeSnapshotClassName: invalid-class.
    • Verify the error invalid snapshot type: %v. Must be %v or %v or "" in the Longhorn CSI plugin.
    • Delete test-snapshot-invalid VolumeSnapshot.

Scenario 2: Create new volume from CSI snapshot

  • From longhorn-backup type
    • Create a new PVC with the following yaml:
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: test-restore-pvc
      spec:
        storageClassName: longhorn
        dataSource:
          name: test-snapshot-longhorn-backup
          kind: VolumeSnapshot
          apiGroup: snapshot.storage.k8s.io
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
      
    • Attach the PVC test-restore-pvc and verify the data
    • Delete the PVC
  • From longhorn-snapshot type
    • Source volume is attached && Longhorn snapshot exists
      • Create a PVC with the following yaml:
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: test-restore-pvc
        spec:
          storageClassName: longhorn
          dataSource:
            name: test-snapshot-longhorn-snapshot
            kind: VolumeSnapshot
            apiGroup: snapshot.storage.k8s.io
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
        
      • Attach the PVC test-restore-pvc and verify the data
      • Delete the PVC
    • Source volume is detached
      • Scale down the workload to detach the test-vol
      • Create the same PVC test-restore-pvc as in the case above where the source volume is attached and the Longhorn snapshot exists
      • Verify that PVC provisioning fails because the source volume is detached, so Longhorn cannot verify the existence of the Longhorn snapshot in the source volume.
      • Scale up the workload to attach test-vol
      • Wait for the PVC to finish provisioning and become bound
      • Attach the PVC test-restore-pvc and verify the data
      • Delete the PVC
    • Source volume is attached && Longhorn snapshot doesn't exist
      • Find the VolumeSnapshotContent of the VolumeSnapshot test-snapshot-longhorn-snapshot. Find the Longhorn snapshot name inside the field VolumeSnapshotContent.snapshotHandle. Go to Longhorn UI. Delete the Longhorn snapshot.
      • Repeat the steps of the case above where the source volume is attached and the Longhorn snapshot exists. The PVC should be stuck in provisioning because the Longhorn snapshot of the source volume doesn't exist.
      • Delete the PVC test-restore-pvc
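When automating the step that looks up the Longhorn snapshot name, a small helper along these lines may be useful. It is a sketch under assumptions: the function name is hypothetical, the input is a VolumeSnapshotContent object already loaded as a dictionary (for example from `kubectl get volumesnapshotcontent <name> -o json`), and the handle is read from `status.snapshotHandle`, which is where the snapshot.storage.k8s.io API places that field.

```python
from typing import Tuple

SNAPSHOT_ID_PREFIX = "snap://"


def longhorn_snapshot_from_content(content: dict) -> Tuple[str, str]:
    """Return (volume name, Longhorn snapshot name) from a VolumeSnapshotContent."""
    handle = content.get("status", {}).get("snapshotHandle")
    if not handle:
        raise ValueError("VolumeSnapshotContent has no snapshotHandle yet")
    if not handle.startswith(SNAPSHOT_ID_PREFIX):
        # For example a handle that refers to a backup instead of a snapshot.
        raise ValueError(f"handle is not a Longhorn snapshot: {handle}")
    rest = handle[len(SNAPSHOT_ID_PREFIX):]
    volume_name, sep, snapshot_name = rest.partition("/")
    if not sep or not volume_name or not snapshot_name:
        raise ValueError(f"malformed snapshot handle: {handle}")
    return volume_name, snapshot_name
```

The returned snapshot name is the one to delete in the Longhorn UI for the "snapshot doesn't exist" case.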

Scenario 3: Delete CSI snapshot

  • longhorn-backup type
    • Already covered by the backup deletion steps in Scenario 1
  • longhorn-snapshot type
    • volume is attached && snapshot doesn't exist
      • Delete the VolumeSnapshot test-snapshot-longhorn-snapshot and verify that the VolumeSnapshot is deleted.
    • volume is attached && snapshot exists
      • Recreate the VolumeSnapshot test-snapshot-longhorn-snapshot
      • Verify the creation of Longhorn snapshot with the name in the field VolumeSnapshotContent.snapshotHandle
      • Delete the VolumeSnapshot test-snapshot-longhorn-snapshot
      • Verify that the Longhorn snapshot is removed or marked as removed
      • Verify that the VolumeSnapshot test-snapshot-longhorn-snapshot is deleted.
    • volume is detached
      • Recreate the VolumeSnapshot test-snapshot-longhorn-snapshot
      • Scale down the workload to detach test-vol
      • Delete the VolumeSnapshot test-snapshot-longhorn-snapshot
      • Verify that VolumeSnapshot test-snapshot-longhorn-snapshot is stuck in deleting

Upgrade strategy

No upgrade strategy needed

Note [optional]

We need to update the docs and examples to reflect the new type parameter in the VolumeSnapshotClass.