# ublk Target {#ublk}

## Table of Contents {#ublk_toc}

- @ref ublk_intro
- @ref ublk_internal
- @ref ublk_impl
- @ref ublk_op

## Introduction {#ublk_intro}

[ublk](https://docs.kernel.org/block/ublk.html) (or ubd) is a generic framework for
implementing userspace block devices based on `io_uring`. It is designed to
create a highly efficient data path for userspace storage software to provide
high-performance block device services on the local host.

The whole ublk service involves three parts: ublk driver, ublk server and ublk workload.



* __ublk driver__ is a kernel driver added in kernel 6.0. It delivers I/O requests
  from a ublk block device (`/dev/ublkbN`) to a ublk server.

* __ublk workload__ can be any local host process which submits I/O requests to a ublk
  block device, or a kernel filesystem on top of the ublk block device.

* __ublk server__ is the userspace storage software that fetches the I/O requests delivered
  by the ublk driver. The ublk server processes the I/O requests with its specific block
  service logic and connected backends. Once the ublk server gets the response from the
  connected backends, it communicates with the ublk driver and completes the I/O requests.

SPDK ublk target acts as a ublk server. It can handle ublk I/O requests within the whole
SPDK userspace storage software stack.

A typical usage scenario is container attached storage:

* Real storage resources are assigned to SPDK, such as physical NVMe devices and
  distributed block storage.
* SPDK creates refined block devices via the ublk kernel module on top of its organized
  storage resources, based on user configuration.
* A container orchestrator and runtime can then mount and stage the ublk block devices
  for container instances to use.

## ublk Internal {#ublk_internal}

Previously, designs that put I/O processing logic into userspace software have always had
noticeable interaction overhead between the kernel module and the userspace part.

ublk utilizes `io_uring`, which has been proven to be very efficient in decreasing that
interaction overhead. I/O requests are delivered to the userspace ublk server via the
newly added `io_uring` command. A shared buffer, set up via `mmap`, is used for sharing I/O
descriptors from the kernel driver to userspace. The I/O data itself is copied only once
between the specified userspace buffer address and the request/bio's pages by the ublk driver.

### Control Plane

A control device is created by the ublk kernel module at `/dev/ublk-control`. The userspace
server sends control commands to the kernel module via the control device using `io_uring`.

Control commands include adding, configuring, and starting a new ublk block device, as well as
retrieving device information, and stopping and deleting an existing ublk block device.

The add device command creates a ublk char device `/dev/ublkcN`.
It will be used by the ublk userspace server to `mmap` the I/O descriptor buffer.
The start device command exposes a ublk block device `/dev/ublkbN`.
The block device can be formatted and mounted by a kernel filesystem,
or read/written directly by other processes.

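For example, after a ublk device with ID 1 has been added and started, the control device, the
char device and the block device are all visible under `/dev`. The listing below is only an
illustration of the expected device names; the actual output depends on the host.

~~~{.sh}
ls -l /dev/ublk*
# /dev/ublk-control   <- control device created when ublk_drv is loaded
# /dev/ublkc1         <- char device created by the add device command
# /dev/ublkb1         <- block device exposed by the start device command
~~~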
### Data Plane

The datapath between the ublk server and the kernel driver consists of `io_uring` and a shared
memory buffer. The shared memory buffer is an array of I/O descriptors.
Each SQE (Submission Queue Entry) in `io_uring` is assigned one I/O descriptor and
one user buffer address. When the ublk kernel driver receives I/O requests from the upper
layer, it fills the information of those I/O requests into the I/O descriptors. The I/O data
is copied between the specified user buffer address and the request/bio's pages at the
proper time.

At start, the ublk server needs to fill the `io_uring` SQ (Submission Queue). Each
SQE is marked with the operation flag `UBLK_IO_FETCH_REQ`, which means the SQE is
ready to fetch an I/O request.

When a CQE (Completion Queue Entry) indicating an I/O request is returned from the `io_uring`,
the ublk server gets the position of the I/O descriptor from the CQE.
The ublk server then handles the I/O request based on the information in the I/O descriptor.

After the ublk server completes the I/O request, it updates the I/O's completion status
and the ublk operation flag. This time, the operation flag is `UBLK_IO_COMMIT_AND_FETCH_REQ`,
which informs the kernel module that one I/O request is completed, and also that the SQE slot
is free to fetch a new I/O request.

`UBLK_IO_COMMIT_AND_FETCH_REQ` is designed for efficiency in ublk. At runtime, the ublk
server needs to commit I/O results back, and then provide new free SQE slots for fetching
new I/O requests. Without the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, `io_uring_submit()` would
have to be called twice, once for committing I/O results back and once for providing free SQE
slots. With the `UBLK_IO_COMMIT_AND_FETCH_REQ` flag, calling `io_uring_submit()` once is enough
because the ublk driver knows that the submitted SQEs are reused both for committing back I/O
results and for fetching new requests.

## SPDK Implementation {#ublk_impl}

SPDK ublk target is implemented as a high performance ublk server.

It creates one ublk spdk_thread on each SPDK reactor by default, or only on user specified
reactors. When adding a new ublk block device, the SPDK ublk target assigns the queues of the
ublk block device to ublk spdk_threads in round-robin fashion.
That means one ublk device queue will only be processed by one spdk_thread.
One ublk device with multiple queues can get multiple SPDK reactors involved in processing
its I/O requests, and one spdk_thread created by the ublk target may process multiple queues,
each from a different ublk device.
In this way, SPDK reactors can be fully utilized to achieve the best performance,
even when there are only a few ublk devices.

ublk is `io_uring` based. All ublk I/O queues are mapped to `io_uring`s.
A ublk spdk_thread gets I/O requests from available CQEs by polling all of its assigned
`io_uring`s.
When there are completed I/O requests, the ublk spdk_thread submits them back to the
`io_uring` as SQEs in batches.

Currently, the ublk driver has a system thread context limitation: one ublk device queue
can only be processed in the context of the system thread which initialized it. SPDK
can't schedule a ublk spdk_thread between different SPDK reactors. In other words, the SPDK
dynamic scheduler can't rebalance ublk workload by rescheduling ublk spdk_threads.

## Operation {#ublk_op}

### Enabling SPDK ublk target

Build SPDK with the SPDK ublk target enabled.

~~~{.sh}
./configure --with-ublk
make -j
~~~

SPDK ublk target related libraries will then be linked into the SPDK application `spdk_tgt`.
Set up some hugepages for SPDK, and then run the SPDK application `spdk_tgt`.

~~~{.sh}
scripts/setup.sh
build/bin/spdk_tgt &
~~~

Once `spdk_tgt` is initialized, the user can enable the SPDK ublk feature
by creating the ublk target. However, before creating the ublk target, the ublk kernel module
`ublk_drv` must be loaded using `modprobe`.

~~~{.sh}
modprobe ublk_drv
scripts/rpc.py ublk_create_target
~~~

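As a quick sanity check, `/dev/ublk-control` should exist once `ublk_drv` is loaded. If the ublk
spdk_threads should run only on specific reactors (see @ref ublk_impl), the target can instead be
created with an explicit CPU mask; the `-m` cpumask option below is an assumption, so verify it
with `scripts/rpc.py ublk_create_target --help` on your SPDK version.

~~~{.sh}
# Confirm the ublk kernel module created its control device.
ls -l /dev/ublk-control

# Optional alternative: create the ublk target only on reactors 0 and 1.
# NOTE: the '-m' cpumask option is assumed here; check your SPDK version.
scripts/rpc.py ublk_create_target -m 0x3
~~~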
### Creating ublk block device

SPDK bdevs are block devices which will be exposed to the local host kernel
as ublk block devices. SPDK supports several different types of storage backends,
including NVMe, Linux AIO, malloc ramdisk and Ceph RBD. Refer to @ref bdev for
additional information on configuring SPDK storage backends.

This guide will use a malloc bdev (ramdisk) named Malloc0. The following RPC
will create a 256MB malloc bdev with 512-byte block size.

~~~{.sh}
scripts/rpc.py bdev_malloc_create 256 512 -b Malloc0
~~~

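The new bdev can be listed before exposing it through ublk. A minimal check using the standard
`bdev_get_bdevs` RPC:

~~~{.sh}
# Show the Malloc0 bdev that was just created.
scripts/rpc.py bdev_get_bdevs -b Malloc0
~~~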
The following RPC will create a ublk block device exposing the Malloc0 bdev.
The created ublk block device has ID 1. It internally has 2 queues with
queue depth 128.

~~~{.sh}
scripts/rpc.py ublk_start_disk Malloc0 1 -q 2 -d 128
~~~

This RPC replies with the ID of the ublk block device.

~~~
1
~~~

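At this point the device can be verified from both sides. The `ublk_get_disks` RPC name is an
assumption here; the `lsblk` call only requires that the kernel has created the block device.

~~~{.sh}
# List ublk disks known to the SPDK ublk target (RPC name assumed; see scripts/rpc.py --help).
scripts/rpc.py ublk_get_disks

# Confirm the kernel exposes the new block device.
lsblk /dev/ublkb1
~~~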
The path of a ublk block device is determined by its ID; it is created at `/dev/ublkb${ID}`.
So the device we just created is accessible to other processes via `/dev/ublkb1`.
Now applications like fio or dd can work on `/dev/ublkb1` directly.

~~~{.sh}
dd of=/dev/ublkb1 if=/dev/zero bs=512 count=64
~~~

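For a quick performance sanity check, a simple fio job can also be run against the ublk block
device. The parameters below are only illustrative.

~~~{.sh}
fio --name=ublk-randread --filename=/dev/ublkb1 --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --runtime=10 --time_based
~~~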
A ublk block device is a generic kernel block device that can be formatted and
mounted by a kernel file system.

~~~{.sh}
mkfs /dev/ublkb1
mount /dev/ublkb1 /mnt/
mkdir /mnt/testdir
echo "Hello,SPDK ublk Target" > /mnt/testdir/testfile
umount /mnt
~~~

### Deleting ublk block device and exit

After use, a ublk block device can be stopped and deleted by the RPC `ublk_stop_disk` with its ID.
Specify ID 1, and the device `/dev/ublkb1` will be removed.

~~~{.sh}
scripts/rpc.py ublk_stop_disk 1
~~~

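Once `ublk_stop_disk` returns and the kernel has torn the device down, `/dev/ublkb1` should no
longer exist; listing it again is a simple way to confirm the removal.

~~~{.sh}
lsblk /dev/ublkb1 || echo "ublk device 1 has been removed"
~~~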
If ublk is not needed anymore, the SPDK ublk target can be destroyed to free the related SPDK
resources.

~~~{.sh}
scripts/rpc.py ublk_destroy_target
~~~

Of course, the SPDK ublk target and all ublk block devices will be destroyed automatically
when the SPDK application is terminated.