# Support CSI Volume Cloning

## Summary
We want to support CSI volume cloning so users can create a new PVC with the same data as a source PVC.
## Related Issues
https://github.com/longhorn/longhorn/issues/1815
## Motivation

### Goals
- Support exporting the snapshot data of a volume
- Allow users to create a PVC with the same data as the source PVC
## Proposal
There are multiple parts in implementing this feature:
### Sparse-tools
Implement a function that fetches data from a readable object and then sends it to a remote server via HTTP.
### Longhorn engine
- Implementing `VolumeExport()` gRPC in the replica SyncAgentServer. When called, `VolumeExport()` exports volume data at the input snapshot to the receiver on the remote host.
- Implementing `SnapshotCloneCmd()` and `SnapshotCloneStatusCmd()` CLIs. Longhorn manager can trigger the volume cloning process by calling `SnapshotCloneCmd()` on the replica of the new volume. Longhorn manager can fetch the cloning status by calling `SnapshotCloneStatusCmd()` on the replica of the new volume.
### Longhorn manager
- When the volume controller detects that a volume clone is needed, it attaches the target volume, starts 1 replica for the target volume, auto-attaches the source volume if needed, takes a snapshot of the source volume, and copies the snapshot from a replica of the source volume to the new replica by calling `SnapshotCloneCmd()`. After the snapshot has been copied over to the replica of the new volume, the volume controller marks the volume as having completed cloning.
- Once the cloning is done, the volume controller detaches the source volume if it was auto-attached and detaches the target volume to allow the workload to start using it. Later on, when the target volume is attached by the workload pod, Longhorn starts rebuilding the other replicas.
### Longhorn CSI plugin
- Advertise that the Longhorn CSI driver has the ability to clone a volume: `csi.ControllerServiceCapability_RPC_CLONE_VOLUME`.
- When receiving a volume create request, inspect `req.GetVolumeContentSource()` to see if it is from another volume. If so, create a new Longhorn volume with the appropriate `DataSource` set so the Longhorn volume controller can start cloning later on.
### User Stories
Before this feature, to create a new PVC with the same data as another PVC, users would have to use one of the following methods:
- Create a backup of the source volume. Restore the backup to a new volume. Create a PV/PVC for the new volume. This method requires a backup target, and the data has to move through an extra layer (the backup target), which might cost money.
- Create a new PVC (which leads to creating a new Longhorn volume). Mount both the new PVC and the source PVC to the same pod, then copy the data over. See more here. This copying method only applies to PVCs with the `Filesystem` volumeMode. It also requires manual steps.

After this cloning feature, users can clone a volume by specifying a `dataSource` in the new PVC that points to an existing PVC.
### User Experience In Detail
Users can create a new PVC that uses the `longhorn` storageclass from an existing PVC that also uses the `longhorn` storageclass by specifying a `dataSource` in the new PVC pointing to the existing PVC:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-pvc
  namespace: myns
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: source-pvc
```
### API changes
The `VolumeCreate` API will check/validate a new field, `DataSource`, in `v.Spec` that specifies the source of the Longhorn volume.
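As a rough illustration, the validation could look like the sketch below, based on the data source formats defined in the Design section; the helper name and error messages are assumptions for illustration, not the actual longhorn-manager code.

```go
package manager

import (
	"fmt"
	"strings"
)

// validateVolumeDataSource is a hypothetical helper sketching the checks the
// VolumeCreate API could run on the new v.Spec.DataSource field.
func validateVolumeDataSource(ds string) error {
	switch {
	case ds == "":
		return nil // no data source: create a regular empty volume
	case strings.HasPrefix(ds, "vol://"):
		if strings.TrimPrefix(ds, "vol://") == "" {
			return fmt.Errorf("data source %q is missing the source volume name", ds)
		}
	case strings.HasPrefix(ds, "snap://"):
		parts := strings.SplitN(strings.TrimPrefix(ds, "snap://"), "/", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return fmt.Errorf("data source %q must have the form snap://<VOLUME-NAME>/<SNAPSHOT-NAME>", ds)
		}
	default:
		return fmt.Errorf("unrecognized data source %q", ds)
	}
	return nil
}
```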
## Design

### Implementation Overview

#### Sparse-tools
Implement a generalized function, `SyncContent()`, which syncs the content of a `ReaderWriterAt` object to a file on a remote host.
`ReaderWriterAt` is an interface that has `ReadAt()`, `WriteAt()`, and `GetDataLayout()` methods:
```go
type ReaderWriterAt interface {
	io.ReaderAt
	io.WriterAt
	GetDataLayout(ctx context.Context) (<-chan FileInterval, <-chan error, error)
}
```
Using those methods, Sparse-tools knows which intervals are data and which are holes, and transfers only the data intervals to the file on the remote host, as the sketch below illustrates.
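The following is a minimal sketch of a sender driven by `GetDataLayout()`. It assumes `FileInterval` exposes `Kind`, `Begin`, and `End` fields and that a `SparseData` kind exists; the real sparse-tools types and the HTTP transport inside `SyncContent()` are more involved.

```go
package sparse

import "context"

// sendDataIntervals walks the layout reported by GetDataLayout() and
// transfers only the data intervals; holes are skipped so the file on the
// remote host stays sparse. The send callback stands in for the HTTP
// transfer that SyncContent() performs.
func sendDataIntervals(ctx context.Context, rw ReaderWriterAt, send func(offset int64, data []byte) error) error {
	intervals, errc, err := rw.GetDataLayout(ctx)
	if err != nil {
		return err
	}
	for interval := range intervals {
		if interval.Kind != SparseData {
			continue // nothing to transfer for a hole interval
		}
		buf := make([]byte, interval.End-interval.Begin)
		if _, err := rw.ReadAt(buf, interval.Begin); err != nil {
			return err
		}
		if err := send(interval.Begin, buf); err != nil {
			return err
		}
	}
	return <-errc // propagate any error from the layout scan
}
```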
#### Longhorn engine
- Implementing `VolumeExport()` gRPC in the replica SyncAgentServer. When called, `VolumeExport()` will:
  - Create and open a read-only replica from the input snapshot.
  - Pre-load `r.volume.location` (which is the map of data sector to snapshot file) by:
    - If the volume has a backing file layer and users want to export the backing file layer, initializing all elements of `r.volume.location` to 1 (the index of the backing file layer). Otherwise, initializing all elements of `r.volume.location` to 0 (0 means we don't know the location of this sector yet).
    - Looping over `r.volume.files` and populating `r.volume.location` with the correct values.
  - With `r.volume.location` pre-loaded, the replica is able to know which regions are data regions and which are hole regions. This logic is implemented inside the replica's `GetDataLayout()` method: the method checks `r.volume.location`, and the sector at offset `i` is in a data region if `r.volume.location[i] >= 1`; otherwise, the sector is inside a hole region (see the sketch after this list).
  - Call and pass the read-only replica into the `SyncContent()` function in the Sparse-tools module to copy the snapshot to a file on the remote host.
- Implementing `SnapshotCloneCmd()` and `SnapshotCloneStatusCmd()` CLIs.
  - Longhorn manager can trigger the volume cloning process by calling `SnapshotCloneCmd()` on the replica of the new volume. The command finds a healthy replica of the source volume by listing the replicas of the source controller and selecting an `RW` replica. The command then calls the `CloneSnapshot()` method on the replicas of the target volume. This method in turn does:
    - Call `SnapshotClone()` on the sync agent of the target replica. This launches a receiver server on the target replica.
    - Call `VolumeExport()` on the sync agent of the source replica to export the snapshot data to the target replica.
    - Once the snapshot data is copied over, revert the target replica to the newly copied snapshot.
  - Longhorn manager can fetch the cloning status by calling `SnapshotCloneStatusCmd()` on the target replica.
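For concreteness, a sketch of that data/hole classification follows: runs of sectors with the same classification in `r.volume.location` are coalesced into intervals. The `FileInterval`/`Interval` field layout, the `Replica` receiver, and the absence of the channel plumbing are simplifications of the real replica code, not its exact shape.

```go
package replica

// dataLayout sketches GetDataLayout()'s classification: a sector whose
// r.volume.location entry is >= 1 is owned by a snapshot (or backing) file
// and is data; an entry of 0 means no layer owns it, i.e. a hole. Adjacent
// sectors with the same classification are merged into one interval.
func (r *Replica) dataLayout(sectorSize int64) []FileInterval {
	var layout []FileInterval
	loc := r.volume.location
	for i := 0; i < len(loc); {
		isData := loc[i] >= 1
		kind := SparseHole
		if isData {
			kind = SparseData
		}
		j := i
		for j < len(loc) && (loc[j] >= 1) == isData {
			j++ // extend the run while the classification is unchanged
		}
		layout = append(layout, FileInterval{
			Kind:     kind,
			Interval: Interval{Begin: int64(i) * sectorSize, End: int64(j) * sectorSize},
		})
		i = j
	}
	return layout
}
```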
#### Longhorn manager
- Add a new field to the volume spec, `DataSource`, of type `VolumeDataSource`. Currently, there are 2 types of data sources: `volume` type and `snapshot` type. The `volume` data source type has the format `vol://<VOLUME-NAME>`. The `snapshot` data source type has the format `snap://<VOLUME-NAME>/<SNAPSHOT-NAME>`. In the future, we might want to refactor the `fromBackup` field into a new type of data source with the format `bk://<VOLUME-NAME>/<BACKUP-NAME>`.
- Add a new field to the volume status, `CloneStatus`, of type `VolumeCloneStatus`:
  ```go
  type VolumeCloneStatus struct {
  	SourceVolume string           `json:"sourceVolume"`
  	Snapshot     string           `json:"snapshot"`
  	State        VolumeCloneState `json:"state"`
  }

  type VolumeCloneState string

  const (
  	VolumeCloneStateEmpty     = VolumeCloneState("")
  	VolumeCloneStateInitiated = VolumeCloneState("initiated")
  	VolumeCloneStateCompleted = VolumeCloneState("completed")
  	VolumeCloneStateFailed    = VolumeCloneState("failed")
  )
  ```
- Add a new field to the engine spec, `RequestedDataSource`, of type `VolumeDataSource`.
- Add a new field to the engine status, `CloneStatus`. `CloneStatus` is a map of the `SnapshotCloneStatus` inside each replica:
  ```go
  type SnapshotCloneStatus struct {
  	IsCloning          bool   `json:"isCloning"`
  	Error              string `json:"error"`
  	Progress           int    `json:"progress"`
  	State              string `json:"state"`
  	FromReplicaAddress string `json:"fromReplicaAddress"`
  	SnapshotName       string `json:"snapshotName"`
  }
  ```
  This keeps track of the status of the snapshot cloning inside the target replica.
- When the volume controller detects that a volume clone is needed (`v.Spec.DataSource` is of `volume` or `snapshot` type and `v.Status.CloneStatus.State == VolumeCloneStateEmpty`; see the sketch after this list), it auto-attaches the source volume if needed, takes a snapshot of the source volume if needed, fills `v.Status.CloneStatus` with the correct values for `SourceVolume`, `Snapshot`, and `State` (`initiated`), auto-attaches the target volume, starts 1 replica for the target volume, and sets `e.Spec.RequestedDataSource` to the correct value, `snap://<SOURCE-VOL-NAME>/<SNAPSHOT-NAME>`.
- The engine controller monitoring loop starts the snapshot clone by calling `SnapshotCloneCmd()`.
- After the snapshot is copied over to the replica of the new volume, the volume controller marks `v.Status.CloneStatus.State = VolumeCloneStateCompleted` and clears `e.Spec.RequestedDataSource`.
- Once the cloning is done, the volume controller detaches the source volume if it was auto-attached, and detaches the target volume to allow the workload to start using it.
- When the workload attaches the volume, Longhorn starts rebuilding the other replicas of the volume.
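For clarity, the clone-trigger condition could look like the sketch below; the helper name, the `Volume` type shape, and the prefix-based type check are illustrative assumptions, not the exact longhorn-manager code.

```go
package manager

import "strings"

// isVolumeCloneNeeded sketches the check the volume controller performs
// before kicking off a clone: the spec must name a volume or snapshot data
// source, and the clone state machine must not have started yet.
func isVolumeCloneNeeded(v *Volume) bool {
	ds := string(v.Spec.DataSource)
	isCloneType := strings.HasPrefix(ds, "vol://") || strings.HasPrefix(ds, "snap://")
	return isCloneType && v.Status.CloneStatus.State == VolumeCloneStateEmpty
}
```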
#### Longhorn CSI plugin
- Advertise that the Longhorn CSI driver has the ability to clone a volume: `csi.ControllerServiceCapability_RPC_CLONE_VOLUME`.
- When receiving a volume create request, inspect `req.GetVolumeContentSource()` to see if it is from another volume. If so, create a new Longhorn volume with the appropriate `DataSource` set so the Longhorn volume controller can start cloning later on (see the sketch below).
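A minimal sketch of that inspection, using the standard CSI Go bindings, follows; translating the source volume ID into the `vol://` data source format, and the surrounding names, are assumptions about how the pieces above fit together rather than the actual longhorn-csi-plugin code.

```go
package driver

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// dataSourceFromRequest maps a CSI volume content source onto Longhorn's
// volume data source format. An empty result means a regular, empty volume.
func dataSourceFromRequest(req *csi.CreateVolumeRequest) (string, error) {
	src := req.GetVolumeContentSource()
	if src == nil {
		return "", nil // not a clone request
	}
	vol := src.GetVolume()
	if vol == nil {
		// e.g. a snapshot content source, which this sketch does not cover
		return "", status.Error(codes.InvalidArgument, "unsupported volume content source")
	}
	return "vol://" + vol.GetVolumeId(), nil
}
```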
## Test plan
Integration test plan.
### Clone volume that doesn't have backing image
- Create a PVC:
  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: source-pvc
  spec:
    storageClassName: longhorn
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  ```
- Specify the `source-pvc` in a pod yaml and start the pod.
- Wait for the pod to be running, write some data to the mount path of the volume.
- Clone a volume by creating the PVC:
  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: cloned-pvc
  spec:
    storageClassName: longhorn
    dataSource:
      name: source-pvc
      kind: PersistentVolumeClaim
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  ```
- Specify the `cloned-pvc` in a cloned pod yaml and deploy the cloned pod.
- Wait for the `CloneStatus.State` in `cloned-pvc` to be `completed`.
- In a 3-min retry loop, wait for the cloned pod to be running.
- Verify the data in `cloned-pvc` is the same as in `source-pvc`.
- In a 2-min retry loop, verify that the volume of the `cloned-pvc` eventually becomes healthy.
- Cleanup the cloned pod and `cloned-pvc`. Wait for the cleanup to finish.
- Scale down the source pod so the `source-pvc` is detached.
- Wait for the `source-pvc` to be in the detached state.
- Clone a volume by creating the PVC:
  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: cloned-pvc
  spec:
    storageClassName: longhorn
    dataSource:
      name: source-pvc
      kind: PersistentVolumeClaim
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  ```
- Specify the `cloned-pvc` in a cloned pod yaml and deploy the cloned pod.
- Wait for `source-pvc` to be attached.
- Wait for a new snapshot to be created in the `source-pvc` volume.
- Wait for the `CloneStatus.State` in `cloned-pvc` to be `completed`.
- Wait for `source-pvc` to be detached.
- In a 3-min retry loop, wait for the cloned pod to be running.
- Verify the data in `cloned-pvc` is the same as in `source-pvc`.
- In a 2-min retry loop, verify that the volume of the `cloned-pvc` eventually becomes healthy.
- Cleanup the test.
### Clone volume that has backing image
- Deploy a storage class that has the backing image parameters:
  ```yaml
  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: longhorn-bi-parrot
  provisioner: driver.longhorn.io
  allowVolumeExpansion: true
  parameters:
    numberOfReplicas: "3"
    staleReplicaTimeout: "2880" # 48 hours in minutes
    backingImage: "bi-parrot"
    backingImageURL: "https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2"
  ```
- Repeat the `Clone volume that doesn't have backing image` test with `source-pvc` and `cloned-pvc` using the `longhorn-bi-parrot` storageclass instead of the `longhorn` storageclass.
### Interrupt volume clone process
- Create a PVC:
  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: source-pvc
  spec:
    storageClassName: longhorn
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  ```
- Specify the `source-pvc` in a pod yaml and start the pod.
- Wait for the pod to be running, write 1GB of data to the mount path of the volume.
- Clone a volume by creating the PVC:
  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: cloned-pvc
  spec:
    storageClassName: longhorn
    dataSource:
      name: source-pvc
      kind: PersistentVolumeClaim
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  ```
- Specify the `cloned-pvc` in a cloned pod yaml and deploy the cloned pod.
- Wait for the `CloneStatus.State` in `cloned-pvc` to be `initiated`.
- Kill all replica processes of the `source-pvc`.
- Wait for the `CloneStatus.State` in `cloned-pvc` to be `failed`.
- In a 2-min retry loop, verify that the cloned pod cannot start.
- Clean up the cloned pod and `cloned-pvc`.
- Redeploy `cloned-pvc` and the cloned pod.
- In a 3-min retry loop, verify that the cloned pod becomes running and `cloned-pvc` has the same data as `source-pvc`.
- Cleanup the test.
## Upgrade strategy
No upgrade strategy is needed.