# Support CSI Volume Cloning

## Summary

We want to support CSI volume cloning so users can create a new PVC that has identical data as a source PVC.

### Related Issues

https://github.com/longhorn/longhorn/issues/1815

## Motivation

### Goals

* Support exporting the snapshot data of a volume
* Allow users to create a PVC with identical data as the source PVC

## Proposal

There are multiple parts to implementing this feature:

### Sparse-tools

Implement a function that fetches data from a readable object and sends it to a remote server via HTTP.

### Longhorn engine

* Implement a `VolumeExport()` gRPC in the replica SyncAgentServer. When called, `VolumeExport()` exports volume data at the input snapshot to the receiver on the remote host.
* Implement the `SnapshotCloneCmd()` and `SnapshotCloneStatusCmd()` CLIs. Longhorn manager can trigger the volume cloning process by calling `SnapshotCloneCmd()` on the replica of the new volume, and can fetch the cloning status by calling `SnapshotCloneStatusCmd()` on the replica of the new volume.

### Longhorn manager

* When the volume controller detects that a volume clone is needed, it attaches the target volume, starts one replica for the target volume, auto-attaches the source volume if needed, takes a snapshot of the source volume, and copies the snapshot from a replica of the source volume to the new replica by calling `SnapshotCloneCmd()`. After the snapshot has been copied over to the replica of the new volume, the volume controller marks the volume as having completed cloning.
* Once the cloning is done, the volume controller detaches the source volume if it was auto-attached, and detaches the target volume to allow the workload to start using it. Later on, when the target volume is attached by the workload pod, Longhorn starts rebuilding the other replicas.
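The snapshot export above only needs to transfer the regions that actually contain data. As a rough sketch of that idea (not the actual sparse-tools API; the type and function names here are hypothetical), a replica's per-sector location map can be coalesced into alternating data/hole intervals before transfer:

```go
package main

import "fmt"

// Interval describes a contiguous run of sectors that is either data or a hole.
// This mirrors the idea of an interval-based data layout; the names are hypothetical.
type Interval struct {
	Begin, End int64 // sector offsets, End exclusive
	IsData     bool
}

// coalesce turns a per-sector location map (0 = unknown/hole, >= 1 = index of
// the snapshot file holding that sector) into a list of data/hole intervals.
func coalesce(location []byte) []Interval {
	var out []Interval
	for i := int64(0); i < int64(len(location)); i++ {
		isData := location[i] >= 1
		if len(out) > 0 && out[len(out)-1].IsData == isData {
			out[len(out)-1].End = i + 1 // extend the current run
			continue
		}
		out = append(out, Interval{Begin: i, End: i + 1, IsData: isData})
	}
	return out
}

func main() {
	// Sectors 0-1 live in snapshot file 1, sectors 2-3 are holes, sector 4 in file 2.
	fmt.Println(coalesce([]byte{1, 1, 0, 0, 2}))
	// → [{0 2 true} {2 4 false} {4 5 true}]
}
```

Only the `true` (data) intervals would then need to be read and sent over HTTP; the holes are skipped entirely.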
### Longhorn CSI plugin

* Advertise that the Longhorn CSI driver has the ability to clone a volume, `csi.ControllerServiceCapability_RPC_CLONE_VOLUME`.
* When receiving a volume create request, inspect `req.GetVolumeContentSource()` to see if it is from another volume. If so, create a new Longhorn volume with the appropriate `DataSource` set so the Longhorn volume controller can start cloning later on.

### User Stories

Before this feature, to create a new PVC with the same data as another PVC, users would have to use one of the following methods:

1. Create a backup of the source volume. Restore the backup to a new volume. Create a PV/PVC for the new volume. This method requires a backup target, and the data has to move through an extra layer (the backup target) which might cost money.
1. Create a new PVC (which leads to creating a new Longhorn volume). Mount both the new PVC and the source PVC to the same pod, then copy the data over. See more [here](https://github.com/longhorn/longhorn/blob/v1.1.2/examples/data_migration.yaml). This copying method only applies to PVCs with the `Filesystem` volumeMode. Also, it requires manual steps.

With this cloning feature, users can clone a volume by specifying a `dataSource` in the new PVC pointing to an existing PVC.

### User Experience In Detail

Users can create a new PVC that uses the `longhorn` storageclass from an existing PVC which also uses the `longhorn` storageclass by specifying `dataSource` in the new PVC pointing to the existing PVC:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-pvc
  namespace: myns
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: source-pvc
```

### API changes

The `VolumeCreate` API will check/validate a new field, `DataSource`, in `v.Spec` that specifies the source of the Longhorn volume.
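As a sketch of how such a `DataSource` value might be validated and parsed when handling `VolumeCreate`, assuming the URI-like `vol://` and `snap://` formats described in the Design section below (the helper name and exact placeholder layout are assumptions, not the actual Longhorn manager API):

```go
package main

import (
	"fmt"
	"strings"
)

// parseDataSource splits a Longhorn volume DataSource string into its kind and
// parts, assuming the formats "vol://<volume-name>" and
// "snap://<volume-name>/<snapshot-name>". Hypothetical helper for illustration.
func parseDataSource(ds string) (kind, volume, snapshot string, err error) {
	switch {
	case strings.HasPrefix(ds, "vol://"):
		volume = strings.TrimPrefix(ds, "vol://")
		if volume == "" {
			return "", "", "", fmt.Errorf("invalid volume data source %q", ds)
		}
		return "volume", volume, "", nil
	case strings.HasPrefix(ds, "snap://"):
		parts := strings.SplitN(strings.TrimPrefix(ds, "snap://"), "/", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return "", "", "", fmt.Errorf("invalid snapshot data source %q", ds)
		}
		return "snapshot", parts[0], parts[1], nil
	}
	return "", "", "", fmt.Errorf("unknown data source %q", ds)
}

func main() {
	kind, vol, snap, _ := parseDataSource("snap://source-vol/snap-1")
	fmt.Println(kind, vol, snap) // → snapshot source-vol snap-1
}
```

Validation like this lets `VolumeCreate` reject malformed data sources up front, before the volume controller ever tries to act on them.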
## Design

### Implementation Overview

### Sparse-tools

Implement a generalized function, `SyncContent()`, which syncs the content of a `ReaderWriterAt` object to a file on a remote host. `ReaderWriterAt` is an interface that has `ReadAt()`, `WriteAt()`, and `GetDataLayout()` methods:

```go
type ReaderWriterAt interface {
	io.ReaderAt
	io.WriterAt
	GetDataLayout(ctx context.Context) (<-chan FileInterval, <-chan error, error)
}
```

Using those methods, the Sparse-tools module knows which data/hole intervals to transfer to the file on the remote host.

### Longhorn engine

* Implement a `VolumeExport()` gRPC in the replica SyncAgentServer. When called, `VolumeExport()` will:
  * Create and open a read-only replica from the input snapshot.
  * Pre-load `r.volume.location` (which is the map of data sector to snapshot file) by:
    * Initializing all elements of `r.volume.location` to 1 (the index of the backing file layer) if the volume has a backing file layer and users want to export it; otherwise, initializing all elements of `r.volume.location` to 0 (0 means we don't know the location for this sector yet).
    * Looping over `r.volume.files` and populating `r.volume.location` with the correct values.
  * The replica is then able to know which region is a data/hole region. This logic is implemented inside the replica's `GetDataLayout()` method. The method checks `r.volume.location`: the sector at offset `i` is in a data region if `r.volume.location[i] >= 1`; otherwise, the sector is inside a hole region.
  * Call `SyncContent()` in the Sparse-tools module, passing in the read-only replica, to copy the snapshot to a file on the remote host.
* Implement the `SnapshotCloneCmd()` and `SnapshotCloneStatusCmd()` CLIs.
  * Longhorn manager can trigger the volume cloning process by calling `SnapshotCloneCmd()` on the replica of the new volume.
The command finds a healthy replica of the source volume by listing the replicas of the source controller and selecting an `RW` replica. The command then calls the `CloneSnapshot()` method on the replicas of the target volume. This method in turn calls `SnapshotClone()` on the sync agent of the target replica, which launches a receiver server on the target replica and then calls `VolumeExport()` on the sync agent of the source replica to export the snapshot data to the target replica. Once the snapshot data is copied over, the target replica is reverted to the newly copied snapshot.
  * Longhorn manager can fetch the cloning status by calling `SnapshotCloneStatusCmd()` on the target replica.

### Longhorn manager

* Add a new field to the volume spec, `DataSource`, of type `VolumeDataSource`. Currently, there are 2 types of data sources: `volume` type and `snapshot` type. The `volume` data source type has the format `vol://<volume-name>`. The `snapshot` data source type has the format `snap://<volume-name>/<snapshot-name>`. In the future, we might want to refactor the `fromBackup` field into a new type of data source with the format `bk://<volume-name>/<backup-name>`.
* Add a new field to the volume status, `CloneStatus`, of type `VolumeCloneStatus`:

```go
type VolumeCloneStatus struct {
	SourceVolume string           `json:"sourceVolume"`
	Snapshot     string           `json:"snapshot"`
	State        VolumeCloneState `json:"state"`
}

type VolumeCloneState string

const (
	VolumeCloneStateEmpty     = VolumeCloneState("")
	VolumeCloneStateInitiated = VolumeCloneState("initiated")
	VolumeCloneStateCompleted = VolumeCloneState("completed")
	VolumeCloneStateFailed    = VolumeCloneState("failed")
)
```

* Add a new field to the engine spec, `RequestedDataSource`, of type `VolumeDataSource`.
* Add a new field to the engine status, `CloneStatus`.
`CloneStatus` is a map of the `SnapshotCloneStatus` inside each replica:

```go
type SnapshotCloneStatus struct {
	IsCloning          bool   `json:"isCloning"`
	Error              string `json:"error"`
	Progress           int    `json:"progress"`
	State              string `json:"state"`
	FromReplicaAddress string `json:"fromReplicaAddress"`
	SnapshotName       string `json:"snapshotName"`
}
```

This keeps track of the status of snapshot cloning inside the target replica.

* When the volume controller detects that a volume clone is needed (`v.Spec.DataSource` is of `volume` or `snapshot` type and `v.Status.CloneStatus.State == VolumeCloneStateEmpty`), it auto-attaches the source volume if needed, takes a snapshot of the source volume if needed, fills `v.Status.CloneStatus` with the correct values for `SourceVolume`, `Snapshot`, and `State` (`initiated`), auto-attaches the target volume, starts one replica for the target volume, and sets `e.Spec.RequestedDataSource` to the correct value, `snap://<source-volume-name>/<snapshot-name>`.
* The engine controller monitoring loop starts the snapshot clone by calling `SnapshotCloneCmd()`.
* After the snapshot is copied over to the replica of the new volume, the volume controller marks `v.Status.CloneStatus.State = VolumeCloneStateCompleted` and clears `e.Spec.RequestedDataSource`.
* Once the cloning is done, the volume controller detaches the source volume if it was auto-attached, and detaches the target volume to allow the workload to start using it.
* When the workload attaches the volume, Longhorn starts rebuilding the other replicas of the volume.

### Longhorn CSI plugin

* Advertise that the Longhorn CSI driver has the ability to clone a volume, `csi.ControllerServiceCapability_RPC_CLONE_VOLUME`.
* When receiving a volume create request, inspect `req.GetVolumeContentSource()` to see if it is from another volume. If so, create a new Longhorn volume with the appropriate `DataSource` set so the Longhorn volume controller can start cloning later on.

### Test plan

Integration test plan.

#### Clone volume that doesn't have backing image

1.
Create a PVC:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: source-pvc
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

1. Specify the `source-pvc` in a pod yaml and start the pod.
1. Wait for the pod to be running, then write some data to the mount path of the volume.
1. Clone a volume by creating the PVC:
    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: cloned-pvc
    spec:
      storageClassName: longhorn
      dataSource:
        name: source-pvc
        kind: PersistentVolumeClaim
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    ```
1. Specify the `cloned-pvc` in a cloned pod yaml and deploy the cloned pod.
1. Wait for the `CloneStatus.State` in `cloned-pvc` to be `completed`.
1. In a 3-min retry loop, wait for the cloned pod to be running.
1. Verify the data in `cloned-pvc` is the same as in `source-pvc`.
1. In a 2-min retry loop, verify the volume of the `cloned-pvc` eventually becomes healthy.
1. Clean up the cloned pod and `cloned-pvc`. Wait for the cleanup to finish.
1. Scale down the source pod so the `source-pvc` is detached.
1. Wait for the `source-pvc` to be in the detached state.
1. Clone a volume by creating the PVC:
    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: cloned-pvc
    spec:
      storageClassName: longhorn
      dataSource:
        name: source-pvc
        kind: PersistentVolumeClaim
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    ```
1. Specify the `cloned-pvc` in a cloned pod yaml and deploy the cloned pod.
1. Wait for `source-pvc` to be attached.
1. Wait for a new snapshot to be created in the `source-pvc` volume.
1. Wait for the `CloneStatus.State` in `cloned-pvc` to be `completed`.
1. Wait for `source-pvc` to be detached.
1. In a 3-min retry loop, wait for the cloned pod to be running.
1. Verify the data in `cloned-pvc` is the same as in `source-pvc`.
1. In a 2-min retry loop, verify the volume of the `cloned-pvc` eventually becomes healthy.
1.
Clean up the test.

#### Clone volume that has backing image

1. Deploy a storage class that has backing image parameters:
    ```yaml
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: longhorn-bi-parrot
    provisioner: driver.longhorn.io
    allowVolumeExpansion: true
    parameters:
      numberOfReplicas: "3"
      staleReplicaTimeout: "2880" # 48 hours in minutes
      backingImage: "bi-parrot"
      backingImageURL: "https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2"
    ```

Repeat the `Clone volume that doesn't have backing image` test with `source-pvc` and `cloned-pvc` using the `longhorn-bi-parrot` storageclass instead of `longhorn`.

#### Interrupt volume clone process

1. Create a PVC:
    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: source-pvc
    spec:
      storageClassName: longhorn
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    ```
1. Specify the `source-pvc` in a pod yaml and start the pod.
1. Wait for the pod to be running, then write 1GB of data to the mount path of the volume.
1. Clone a volume by creating the PVC:
    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: cloned-pvc
    spec:
      storageClassName: longhorn
      dataSource:
        name: source-pvc
        kind: PersistentVolumeClaim
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
    ```
1. Specify the `cloned-pvc` in a cloned pod yaml and deploy the cloned pod.
1. Wait for the `CloneStatus.State` in `cloned-pvc` to be `initiated`.
1. Kill all replica processes of the `source-pvc`.
1. Wait for the `CloneStatus.State` in `cloned-pvc` to be `failed`.
1. In a 2-min retry loop, verify the cloned pod cannot start.
1. Clean up the cloned pod and `cloned-pvc`.
1. Redeploy `cloned-pvc` and the cloned pod.
1. In a 3-min retry loop, verify the cloned pod becomes running.
1. Verify `cloned-pvc` has the same data as `source-pvc`.
1. Clean up the test.

### Upgrade strategy

No upgrade strategy needed.