# Support Volumes using V2 Data Engine

## Summary

Longhorn's storage stack, based on iSCSI and a customized protocol, has limitations such as increased I/O latencies and reduced IOPS due to the longer data path. This makes it less suitable for latency-critical applications. To overcome these challenges, Longhorn introduces the Storage Performance Development Kit (SPDK) to enhance overall performance. With SPDK integration, Longhorn optimizes system efficiency, addresses latency concerns, and provides a high-performance storage solution capable of meeting diverse workload demands.

### Related Issues

- [[FEATURE] Add a global setting for enabling and disabling SPDK feature](https://github.com/longhorn/longhorn/issues/5778)
- [[FEATURE] Support replica scheduling for SPDK volume](https://github.com/longhorn/longhorn/issues/5711)
- [[FEATURE] Implement Disk gRPC Service in Instance Manager for collecting SPDK disk statistics from SPDK gRPC service](https://github.com/longhorn/longhorn/issues/5744)
- [[FEATURE] Identify and manage orphaned lvols and raid bdevs if the associated Volume resources are not existing](https://github.com/longhorn/longhorn/issues/5827)

## Motivation

### Goals

- Introduce backend store drivers
  - `v1`: legacy data path
  - `v2`: a newly introduced data path based on SPDK
- Introduce disk types and management
- Support volume creation, attachment, detachment and deletion
- Support orphaned replica collection

### Non-goals [optional]

- Support runtime replica rebuilding
- Support changing the number of replicas of a volume
- Support volume expansion
- Support volume backup

## Proposal

### User Stories

Longhorn's storage stack is built upon iSCSI and a customized protocol. However, the longer data path associated with this architecture introduces certain limitations, resulting in increased I/O latencies and reduced IOPS. Consequently, Longhorn may not be the ideal choice for latency-critical applications, as the performance constraints could impede their deployment on the platform.

By incorporating SPDK, Longhorn leverages its capabilities to significantly improve performance levels. The integration of SPDK enables Longhorn to optimize system efficiency, mitigate latency concerns, and deliver a high-performance storage solution that can better meet the demands of diverse workloads.

### User Experience In Detail

- Environment Setup
  - Configure Kernel Modules (uio and uio_pci_generic) and Huge Pages for SPDK

    ```bash
    kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/prerequisite/longhorn-spdk-setup.yaml
    ```

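    Under the hood, the setup on each node amounts to loading the kernel modules and reserving 2 MiB huge pages. A minimal manual sketch (the 1024-page count corresponds to the 2048 MiB default of `v2-data-engine-hugepage-limit`; the sysctl file name is only an example):

    ```bash
    # Load the userspace I/O modules required by SPDK.
    modprobe uio
    modprobe uio_pci_generic

    # Reserve 1024 x 2 MiB huge pages (2 GiB) and persist the reservation across reboots.
    echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    echo "vm.nr_hugepages=1024" >> /etc/sysctl.d/99-longhorn-spdk.conf
    ```
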
  - Install NVMe Userspace Tool and Load `nvme-tcp` Kernel Module

    Install nvme-cli on each node and make sure that its version is equal to or greater than `1.12`.

    ```bash
    kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/prerequisite/longhorn-nvme-cli-installation.yaml
    ```

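    After the manifest finishes, the prerequisites can be verified directly on a node (a quick, optional check):

    ```bash
    # nvme-cli must report version 1.12 or newer.
    nvme version

    # The nvme-tcp kernel module must be available and loaded.
    modprobe nvme-tcp
    lsmod | grep nvme_tcp
    ```
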
  - Restart `kubelet`

    Modifying the Huge Page configuration of a node requires either a restart of kubelet or a complete reboot of the node. This step is crucial to ensure that the changes take effect and are properly applied.

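    On nodes where kubelet is managed by systemd (an assumption; distributions such as K3s or RKE2 use their own service names), the restart is simply:

    ```bash
    # Restart kubelet so the updated huge page capacity is reported to Kubernetes.
    sudo systemctl restart kubelet
    ```
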
- Install Longhorn system
- Enable SPDK Support

  Enable the SPDK feature by changing the `v2-data-engine` setting to `true` after installation. Following this, the instance-manager pods will be restarted automatically.

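  Besides the Longhorn UI, the setting can be changed from the command line; the example below assumes the default `longhorn-system` namespace:

  ```bash
  # Longhorn settings are exposed as custom resources; switch v2-data-engine to true.
  kubectl -n longhorn-system patch settings.longhorn.io v2-data-engine --type merge -p '{"value": "true"}'
  ```
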
- Add Disks for volumes using v2 data engine
  - Legacy disks are classified as `filesystem`-type disks
  - Add one or multiple `block`-type disks into `node.Spec.Disks`

    ```yaml
    block-disk-example1:
      allowScheduling: true
      evictionRequested: false
      path: /path/to/block/device
      storageReserved: 0
      tags: []
      diskType: block
    ```

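    The disk can be added through the Longhorn UI or by editing the Longhorn Node resource directly; a sketch, assuming the default namespace and a node named `worker-1`:

    ```bash
    # Add the block-type disk entry under spec.disks of the Longhorn Node resource.
    kubectl -n longhorn-system edit nodes.longhorn.io worker-1
    ```
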
- Create a storage class utilizing the enhanced performance capabilities offered by SPDK

  ```yaml
  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: longhorn-v2-data-engine
  provisioner: driver.longhorn.io
  allowVolumeExpansion: true
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
  parameters:
    numberOfReplicas: "2"
    staleReplicaTimeout: "2880"
    fromBackup: ""
    fsType: "ext4"
    backendStoreDriver: "v2"
  ```

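  A PersistentVolumeClaim then only needs to reference the class; the claim name and size below are example values:

  ```yaml
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: longhorn-v2-pvc
  spec:
    accessModes:
      - ReadWriteOnce
    storageClassName: longhorn-v2-data-engine
    resources:
      requests:
        storage: 10Gi
  ```
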
- Create workloads that use Longhorn volumes provisioned with the storage class.

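  For example, a pod can mount the claim created above (pod, container, and mount names are illustrative):

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: v2-volume-test
  spec:
    containers:
      - name: app
        image: nginx:stable
        volumeMounts:
          - name: data
            mountPath: /usr/share/nginx/html
    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: longhorn-v2-pvc
  ```
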
### API changes

## Design

### Implementation Overview

- Global settings
  - `v2-data-engine`: This setting allows users to enable v2 data engine support. Default: false.
  - `v2-data-engine-hugepage-limit`: This setting allows users to specify the amount of 2 MiB huge page memory, in MiB, reserved for the v2 data engine. Default: 2048.

- CRD
  - Introduce `diskType` in `node.Spec.Disks`
    - `filesystem`: disks for legacy volumes. These disks, which are actually directories, store and organize data in a hierarchical manner.
    - `block`: block disks for volumes using the v2 data engine

    The replica scheduler assigns replicas of legacy volumes to `filesystem`-type disks, while replicas of volumes using the v2 data engine are scheduled to `block`-type disks.

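    For illustration, a node spec can carry both disk types side by side; the entries below are example values (`/var/lib/longhorn` is the default Longhorn data path):

    ```yaml
    disks:
      fs-disk-example1:
        allowScheduling: true
        evictionRequested: false
        path: /var/lib/longhorn
        storageReserved: 0
        tags: []
        diskType: filesystem
      block-disk-example1:
        allowScheduling: true
        evictionRequested: false
        path: /path/to/block/device
        storageReserved: 0
        tags: []
        diskType: block
    ```
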
  - Introduce `backendStoreDriver` in `volume.Spec`, `engine.Spec` and `replica.Spec`.
    - `backendStoreDriver` is utilized to differentiate between volume types and their associated data paths.
- Introduce `Instance`, `Disk` and `SPDK` gRPC services

  

  - `Instance` gRPC service: It is tasked with managing various operations related to instance management, including creation, deletion, retrieval, listing, and watching. An instance, either an engine or a replica of a legacy volume, represents a process. On the other hand, for replicas of volumes using the v2 data engine, an instance represents a logical volume. In the case of an engine of a volume using the v2 data engine, an instance is associated with a raid bdev, a frontend NVMe target/initiator pair and a bind-mount device.

  - `Disk` gRPC service: It is responsible for managing various disk operations, including creation, deletion, and retrieval. Additionally, it provides functionalities to list or delete replica instances associated with the disks. In the case of a legacy volume, a replica instance is represented as a replica directory on the disk. On the other hand, for a volume using the v2 data engine, a replica instance is a replica backed by a chain of logical volumes.

  - `SPDK` gRPC service: It manages replicas backed by chains of logical volumes and engines constructed from SPDK raid1 bdevs. In addition, the service is responsible for the communication with `spdk_tgt`.

- Proxy gRPC service APIs
  - Update the gRPC service APIs to support the different disk types (filesystem and block) and data engines (v1 and v2).

- Disk orchestration

  Within the Longhorn system, an aio bdev and an lvstore are created on top of a block-type disk. Replicas, in the form of logical volumes (lvols), are then created on the lvstore.

  

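  Conceptually, this layering corresponds to the following SPDK JSON-RPC calls. This is only a sketch: Longhorn drives `spdk_tgt` through its own gRPC services, and the device path, names, and size used here are assumptions.

  ```bash
  # Expose the block-type disk to SPDK as an aio bdev (example device path, 4 KiB block size).
  scripts/rpc.py bdev_aio_create /dev/nvme1n1 disk-1 4096

  # Create a logical volume store (lvstore) on top of the aio bdev.
  scripts/rpc.py bdev_lvol_create_lvstore disk-1 lvs-1

  # Create a logical volume on the lvstore to back one replica (size in MiB on recent SPDK releases).
  scripts/rpc.py bdev_lvol_create -l lvs-1 replica-example-1 10240
  ```
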
- Orphaned replica collection

  The feature has been integrated into the existing framework for collecting and cleaning up orphaned replicas.

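  Orphaned v2 replicas therefore surface as `Orphan` custom resources, just like orphaned replica directories of legacy volumes, and can be inspected and cleaned up with the usual tooling (assuming the default `longhorn-system` namespace):

  ```bash
  # List the orphaned replicas that Longhorn has detected.
  kubectl -n longhorn-system get orphans.longhorn.io

  # Deleting an Orphan resource triggers cleanup of the corresponding on-disk data.
  kubectl -n longhorn-system delete orphans.longhorn.io <orphan-name>
  ```
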
## Test Plan

## Note [optional]