# Writing a Custom Block Device Module {#bdev_module}

## Target Audience

This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.

## Introduction

A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or an
existing storage software stack. This guide demonstrates how to write such a
module.

## Creating A New Module

Block device modules are located in subdirectories under module/bdev today. It
is not currently possible to place the code for a bdev module elsewhere, but
updates to the build system could be made to enable this in the future. To
create a module, add a new directory with a single C file and a Makefile. A
great starting point is to copy the existing 'null' bdev module.

The primary interface that bdev modules will interact with is in
include/spdk/bdev_module.h. That header defines a macro,
SPDK_BDEV_MODULE_REGISTER, that registers a new bdev module. The macro takes a
pointer to a spdk_bdev_module structure, which is used to register the new
bdev module.

The spdk_bdev_module structure describes the module's properties, such as the
initialization (`module_init`) and teardown (`module_fini`) functions, the
function that returns the context size (`get_ctx_size`), which is the amount of
scratch space allocated in each I/O request for use by this module, and the
callbacks that are called each time a new bdev is registered by another module
(`examine_config` and `examine_disk`). Please check the documentation of
struct spdk_bdev_module for more details.

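The registration pattern described above can be sketched in plain C. The block
below is only an illustration under stated assumptions: `demo_bdev_module`,
`DEMO_BDEV_MODULE_REGISTER`, and every other `demo_` name are hypothetical
stand-ins rather than SPDK definitions, and the constructor attribute
(GCC/Clang) is just one common way such a macro can be implemented. A real
module would include include/spdk/bdev_module.h and use
SPDK_BDEV_MODULE_REGISTER instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct spdk_bdev_module. Only the fields named in
 * the text are modeled; this is not SPDK's actual definition. */
struct demo_bdev_module {
	const char *name;
	int (*module_init)(void);
	void (*module_fini)(void);
	int (*get_ctx_size)(void);
	struct demo_bdev_module *next;
};

/* Stand-in for the list of modules kept by the generic layer. */
static struct demo_bdev_module *g_demo_modules;

static void
demo_module_list_add(struct demo_bdev_module *module)
{
	assert(module != NULL && module->name != NULL);
	module->next = g_demo_modules;
	g_demo_modules = module;
}

/* Hypothetical registration macro: the generated function runs before main(). */
#define DEMO_BDEV_MODULE_REGISTER(name, module)                         \
	static void __attribute__((constructor)) demo_register_##name(void) \
	{                                                                   \
		demo_module_list_add(module);                                   \
	}

/* Per-I/O scratch space that this module wants in every request. */
struct demo_io_ctx {
	int retries;
};

static int demo_null_init(void) { return 0; }
static void demo_null_fini(void) { }
static int demo_null_get_ctx_size(void) { return (int)sizeof(struct demo_io_ctx); }

static struct demo_bdev_module demo_null_if = {
	.name = "demo_null",
	.module_init = demo_null_init,
	.module_fini = demo_null_fini,
	.get_ctx_size = demo_null_get_ctx_size,
};

DEMO_BDEV_MODULE_REGISTER(demo_null, &demo_null_if)

/* Look up a registered module by name; returns NULL if it is not registered. */
struct demo_bdev_module *
demo_find_module(const char *name)
{
	struct demo_bdev_module *m;

	for (m = g_demo_modules; m != NULL; m = m->next) {
		if (strcmp(m->name, name) == 0) {
			return m;
		}
	}
	return NULL;
}
```

Registering from a constructor means the module only has to be linked into the
application; no central list of modules needs to be edited.
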
## Creating Bdevs

New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:

~~~{.c}
/*
 * Function table for a block device backend.
 *
 * The backend block device function table provides a set of APIs to allow
 * communication with a backend. The main commands are read/write API
 * calls for I/O via submit_request.
 */
struct spdk_bdev_fn_table {
	/* Destroy the backend block device object */
	int (*destruct)(void *ctx);

	/* Process the IO. */
	void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);

	/* Check if the block device supports a specific I/O type. */
	bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);

	/* Get an I/O channel for the specific bdev for the calling thread. */
	struct spdk_io_channel *(*get_io_channel)(void *ctx);

	/*
	 * Output driver-specific configuration to a JSON stream. Optional - may be NULL.
	 *
	 * The JSON write context will be initialized with an open object, so the bdev
	 * driver should write a name (based on the driver name) followed by a JSON value
	 * (most likely another nested object).
	 */
	int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);

	/* Get spin-time per I/O channel in microseconds.
	 * Optional - may be NULL.
	 */
	uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~

The bdev module must implement these function callbacks, except for those
marked optional in the comments above.

The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module - it may just be
freeing memory or it may be shutting down a piece of hardware.

The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:

~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~

For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (it would have contained only
zeroes). If it isn't supported, the generic bdev code is capable of emulating
it by sending regular write requests.

`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all I/O has been completed in some way.

`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They're strictly optional, and it
probably only makes sense to implement them if the backing storage device is
capable of handling NVMe commands.

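As an illustration of the I/O types above, the following is a minimal sketch of
an `io_type_supported` callback for a hypothetical device that handles reads,
writes, unmaps, and resets. The enum is copied from the listing above only so
that the block compiles on its own; a real module would get it from SPDK's
headers. The function name `demo_bdev_io_type_supported` and the choice of
supported types are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copied from the listing above so that this sketch is self-contained. */
enum spdk_bdev_io_type {
	SPDK_BDEV_IO_TYPE_INVALID = 0,
	SPDK_BDEV_IO_TYPE_READ,
	SPDK_BDEV_IO_TYPE_WRITE,
	SPDK_BDEV_IO_TYPE_UNMAP,
	SPDK_BDEV_IO_TYPE_FLUSH,
	SPDK_BDEV_IO_TYPE_RESET,
	SPDK_BDEV_IO_TYPE_NVME_ADMIN,
	SPDK_BDEV_IO_TYPE_NVME_IO,
	SPDK_BDEV_IO_TYPE_NVME_IO_MD,
	SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};

/* Hypothetical callback matching the io_type_supported signature above. */
static bool
demo_bdev_io_type_supported(void *ctx, enum spdk_bdev_io_type io_type)
{
	(void)ctx; /* this example device has no per-bdev capabilities to consult */

	switch (io_type) {
	case SPDK_BDEV_IO_TYPE_READ:
	case SPDK_BDEV_IO_TYPE_WRITE:
	case SPDK_BDEV_IO_TYPE_UNMAP:
	case SPDK_BDEV_IO_TYPE_RESET:
		return true;
	default:
		/* FLUSH, WRITE_ZEROES, and the NVMe passthrough types are not
		 * handled by this example device. */
		return false;
	}
}
```

Returning false for `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is a reasonable default
for a simple module, since the text above notes that the generic bdev code can
emulate it with regular writes.
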
The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency. The generic bdev layer will
call `get_io_channel` one time per thread, cache the result, and pass that
result to `submit_request`. It will use the corresponding channel for the
thread it calls `submit_request` on.

The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.

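The point that an I/O may complete outside of `submit_request` can be sketched
with a small self-contained model. Everything prefixed `demo_` below is a
hypothetical stand-in: `demo_bdev_io` and `demo_io_channel` model struct
spdk_bdev_io and struct spdk_io_channel only loosely, and
`demo_bdev_io_complete()` stands in for spdk_bdev_io_complete(). Reads and
writes are queued on the channel and completed later by a polling function,
while unsupported requests are failed immediately.

```c
#include <assert.h>
#include <stddef.h>

enum demo_io_status { DEMO_IO_PENDING = 0, DEMO_IO_SUCCESS, DEMO_IO_FAILED };
enum demo_io_type { DEMO_IO_READ = 0, DEMO_IO_WRITE, DEMO_IO_OTHER };

/* Loose stand-in for struct spdk_bdev_io. */
struct demo_bdev_io {
	enum demo_io_type type;
	enum demo_io_status status;
	struct demo_bdev_io *next;
};

/* Loose stand-in for a per-thread I/O channel: a FIFO of in-flight requests. */
struct demo_io_channel {
	struct demo_bdev_io *head;
	struct demo_bdev_io **tail;
};

static void
demo_channel_init(struct demo_io_channel *ch)
{
	ch->head = NULL;
	ch->tail = &ch->head;
}

/* Stand-in for spdk_bdev_io_complete(): each request completes exactly once. */
static void
demo_bdev_io_complete(struct demo_bdev_io *io, enum demo_io_status status)
{
	assert(io->status == DEMO_IO_PENDING);
	io->status = status;
}

/* Hypothetical submit_request: queue reads and writes, fail everything else. */
static void
demo_submit_request(struct demo_io_channel *ch, struct demo_bdev_io *io)
{
	io->status = DEMO_IO_PENDING;

	switch (io->type) {
	case DEMO_IO_READ:
	case DEMO_IO_WRITE:
		/* Hand the request to the "device"; it completes later. */
		io->next = NULL;
		*ch->tail = io;
		ch->tail = &io->next;
		break;
	default:
		/* Unsupported type: complete with failure right away. */
		demo_bdev_io_complete(io, DEMO_IO_FAILED);
		break;
	}
}

/* Hypothetical poller: completes every queued request, returns the count. */
static int
demo_poll(struct demo_io_channel *ch)
{
	int completed = 0;

	while (ch->head != NULL) {
		struct demo_bdev_io *io = ch->head;

		ch->head = io->next;
		demo_bdev_io_complete(io, DEMO_IO_SUCCESS);
		completed++;
	}
	ch->tail = &ch->head;
	return completed;
}
```

In this model a read stays pending after `demo_submit_request()` returns and
only reports success once `demo_poll()` runs on the same channel, which mirrors
the asynchronous completion allowed by the text above.
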
Integrating a new bdev module into the build system requires updates to various
files in the /mk directory.

## Creating Bdevs in an External Repository

A user can build their own bdev module and application on top of existing SPDK
libraries. The example in test/external_code serves as a template for creating,
building, and linking an external bdev module. Refer to
test/external_code/README.md and @ref so_linking for further information.

## Creating Virtual Bdevs

Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take the one additional step of claiming the bdev.

The module can open the underlying bdevs it wishes to route I/O to using
spdk_bdev_open_ext(), where the string name is provided by the user via an RPC.
To ensure that other consumers do not modify the underlying bdev in an unexpected
way, the virtual bdev should take a claim on the underlying bdev before
reading from or writing to the underlying bdev.

There are two slightly different APIs for taking and releasing claims. The
preferred interface uses `spdk_bdev_module_claim_bdev_desc()`. This method allows
claims that ensure there is a single writer with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE`, cooperating shared writers with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`, and shared readers that prevent any
writers with `SPDK_BDEV_CLAIM_READ_MANY_WRITE_NONE`. In all cases,
`spdk_bdev_open_ext()` may be used to open the underlying bdev read-only. If a
read-only bdev descriptor successfully claims a bdev with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE` or `SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`,
the bdev descriptor is promoted to read-write.
Any claim that is obtained with `spdk_bdev_module_claim_bdev_desc()` is
automatically released upon closing the bdev descriptor used to obtain the
claim. Shared claims continue to block new incompatible claims and new writers
until the last claim is released.

The non-preferred interface for obtaining a claim allows the caller to obtain
an exclusive writer claim with `spdk_bdev_module_claim_bdev()`. It may be
released with `spdk_bdev_module_release_bdev()`. If a read-only bdev
descriptor is passed, it is promoted to read-write. NULL may be passed instead
of a bdev descriptor to avoid promotion and to block new writers. New code
should use `spdk_bdev_module_claim_bdev_desc()` with the claim type that is
tailored to the virtual bdev's needs.

The descriptor obtained from the successful spdk_bdev_open_ext() may be used
with spdk_bdev_get_io_channel() to obtain I/O channels for the bdev. This is
likely done in response to the virtual bdev's `get_io_channel` callback.
Channels may be obtained before and/or after claiming the underlying bdev, but
beware there may be other unknown writers until the underlying bdev has been
claimed.

When a virtual bdev module claims an underlying bdev from its `examine_config`
callback, it causes the `examine_disk` callback to only be called for this
module and any others that establish a shared claim. If no claims are taken by
`examine_config` callbacks, all virtual bdevs' `examine_disk` callbacks are
called.