spdk/examples/nvme/fio_plugin

8a1d6f446d examples/nvme_fio_plugin: add initial support for ZNS
This adds initial support for ZNS by aligning the NVMe spec.-defined ZNS
structures and commands with the fio zone representation, and by
implementing the following io-engine functions:

get_zoned_model() / spdk_fio_get_zoned_model(): when the namespace is
ZNS and the Zoned Command Set is enabled, this function informs fio that
the device is ZBD_HOST_MANAGED.

report_zones() / spdk_fio_report_zones(): submits a single
zone-mgmt-recv command and waits for its completion, converts the
spec-defined zone descriptors to the fio ZBD_ZONE representation, and
returns the number of zones in the converted report.

reset_wp() / spdk_fio_reset_wp(): submits multiple zone-mgmt-send
commands covering the range [offset, offset+length] and waits for their
completion.
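
For reference, these hooks plug into fio's io-engine ops roughly as
follows; this is an illustrative sketch based on fio's ioengines.h
prototypes, not the literal SPDK implementation:

/* Sketch: the three zbd hooks and how an io-engine registers them. */
static int spdk_fio_get_zoned_model(struct thread_data *td, struct fio_file *f,
                                    enum zbd_zoned_model *model);
static int spdk_fio_report_zones(struct thread_data *td, struct fio_file *f,
                                 uint64_t offset, struct zbd_zone *zones,
                                 unsigned int nr_zones);
static int spdk_fio_reset_wp(struct thread_data *td, struct fio_file *f,
                             uint64_t offset, uint64_t length);

struct ioengine_ops ioengine = {
        .name            = "spdk",
        /* ... regular queue/event/getevents ops ... */
#ifdef FIO_HAS_ZBD
        .get_zoned_model = spdk_fio_get_zoned_model,
        .report_zones    = spdk_fio_report_zones,
        .reset_wp        = spdk_fio_reset_wp,
#endif
};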

Four helper functions are added to assist in the implementations above.

get_fio_qpair(): retrieves the namespace matching the given fio file,
ensuring that management commands reach the correct namespace.

spdk_fio_qpair_mdts_nbytes(): helps report_zones() retrieve the zone
report within the bounds of the device's maximum data transfer size
(MDTS).

The functions pcu() and pcu_cb() provide a means to submit management
commands and wait for their completion. They are needed because,
although zone-mgmt-send/recv are I/O commands in the context of NVMe,
for fio they are not part of the regular queue/event/getevents flow but
are used in a synchronous/blocking manner.

Note: in the fio zone representation, the start/len/capacity/wp fields
are in units of bytes, whereas the corresponding values in NVMe are in
LBAs/sectors. The offset <-> LBA conversions do not take NVMe
configurations with extended-LBA formats into account. Thus, this
implementation is initial support for ZNS; more work is needed to
support PI/extended-LBA configurations.

Note: a guard, FIO_HAS_ZBD, checks for the required io-engine ops
version and indirectly tests for the availability of the fio zone
representation by checking for a macro introduced in the same fio
release as that representation.

Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Change-Id: Id3d1d61a52db2e55019032c724197df4d559271a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4836
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>

README.md

Compiling fio

First, clone the fio source repository from https://github.com/axboe/fio

git clone https://github.com/axboe/fio

Then check out the latest fio version and compile the code:

cd fio
make

Compiling SPDK

First, clone the SPDK source repository from https://github.com/spdk/spdk

git clone https://github.com/spdk/spdk
git submodule update --init

Then, run the SPDK configure script to enable fio (point it to the root of the fio repository):

cd spdk
./configure --with-fio=/path/to/fio/repo <other configuration options>

Finally, build SPDK:

make

Note to advanced users: These steps assume you're using the DPDK submodule. If you are using your own version of DPDK, the fio plugin requires that DPDK be compiled with -fPIC. You can compile DPDK with -fPIC by modifying your DPDK configuration file and adding the line:

EXTRA_CFLAGS=-fPIC

Usage

To use the SPDK fio plugin with fio, specify the plugin binary using LD_PRELOAD when running fio and set ioengine=spdk in the fio configuration file (see example_config.fio in the same directory as this README).

LD_PRELOAD=<path to spdk repo>/build/fio/spdk_nvme fio
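
For example, assuming the working directory is the root of the SPDK repository and using the sample configuration shipped alongside this README (adjust the paths, and the filename= line inside it, to match your device):

LD_PRELOAD=build/fio/spdk_nvme fio examples/nvme/fio_plugin/example_config.fio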

To select NVMe devices, you pass an SPDK Transport Identifier string as the filename. These are in the form:

filename=key=value [key=value] ... ns=value

Specifically, for local PCIe NVMe devices it will look like this:

filename=trtype=PCIe traddr=0000.04.00.0 ns=1

And remote devices accessed via NVMe over Fabrics will look like this:

filename=trtype=RDMA adrfam=IPv4 traddr=192.168.100.8 trsvcid=4420 ns=1

Note: The specification of the PCIe address should not use the normal ':' and instead only use '.'. This is a limitation in fio - it splits filenames on ':'. Also, the NVMe namespaces start at 1, not 0, and the namespace must be specified at the end of the string.

Currently the SPDK fio plugin is limited to the thread usage model, so fio jobs must also specify thread=1 when using the SPDK fio plugin.
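
Putting the pieces so far together, a minimal job file might look like the following sketch (the PCIe address is a placeholder; substitute your device's address):

[global]
ioengine=spdk
thread=1

[job0]
filename=trtype=PCIe traddr=0000.04.00.0 ns=1
rw=randread
bs=4k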

fio also currently has a race condition on shutdown if dynamically loading the ioengine by specifying the engine's full path via the ioengine parameter - LD_PRELOAD is recommended to avoid this race condition.

When testing random workloads, it is recommended to set norandommap=1. fio's random map processing consumes extra CPU cycles which will degrade performance over time with the fio_plugin since all I/O are submitted and completed on a single CPU core.

When testing fio on multiple NVMe SSDs with the SPDK plugin, it is recommended to use multiple jobs in the fio configuration. A performance gap has been observed between fio (with the SPDK plugin enabled) and SPDK perf (examples/nvme/perf/perf) when testing multiple NVMe SSDs: with a single fio job (i.e., one CPU core), performance is worse than SPDK perf (also using one CPU core) against many NVMe SSDs, but with multiple jobs, fio's performance is similar to SPDK perf's. After analyzing this phenomenon, we believe it is caused by the fio architecture: fio scales well with multiple threads (i.e., multiple CPU cores), but a single thread does not drive many I/O devices efficiently.
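
For example, a two-SSD configuration with one job per device might look like the following sketch (the PCIe addresses are placeholders); it also applies the norandommap recommendation above:

[global]
ioengine=spdk
thread=1
norandommap=1
rw=randread
bs=4k

[ssd0]
filename=trtype=PCIe traddr=0000.04.00.0 ns=1

[ssd1]
filename=trtype=PCIe traddr=0000.05.00.0 ns=1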

End-to-end Data Protection (Optional)

To run with end-to-end data protection (PI) enabled, the following setup steps are required. First, format the device namespace with the proper PI setting. For example:

nvme format /dev/nvme0n1 -l 1 -i 1 -p 0 -m 1

In the fio configuration file, set PRACT, and set PRCHK via flags (GUARD|REFTAG|APPTAG) as appropriate. For example:

pi_act=0
pi_chk=GUARD

The block size should be set to the sum of the data and metadata sizes. For example, if the data block size is 512 bytes and the host-generated PI metadata is 8 bytes, then the block size in the fio configuration file should be 520 bytes:

bs=520

The storage device may use a block format that requires separate metadata (DIX). In this scenario, the fio_plugin will automatically allocate an extra 4KiB buffer per I/O to hold this metadata. For some cases, such as 512 byte blocks with 32 metadata bytes per block and a 128KiB I/O size, 4KiB isn't large enough. In this case, the md_per_io_size option may be specified to increase the size of the metadata buffer.
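
For the case above (512-byte blocks, 32 metadata bytes per block, 128KiB I/Os), each I/O spans 256 blocks and therefore needs 256 * 32 = 8192 bytes of metadata, so the default 4KiB buffer could be enlarged with:

md_per_io_size=8192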

Two options, 'apptag' and 'apptag_mask', are exposed; users can set them in the configuration file when using the application tag and application tag mask in end-to-end data protection. The application tag and application tag mask default to 0x1234 and 0xFFFF, respectively.
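
For example, to override the defaults in the job file (the values below are illustrative):

apptag=0x0888
apptag_mask=0xFFFF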

VMD (Optional)

To enable VMD enumeration, add the enable_vmd flag to the fio configuration file:

enable_vmd=1

ZNS

To use Zoned Namespaces, build the io-engine against, and run with, fio version >= 3.23, and add the following to your fio script:

zonemode=zbd

Also have a look at the example scripts provided with fio:

fio/examples/zbd-seq-read.fio
fio/examples/zbd-rand-write.fio
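
Putting it together, a minimal ZNS job might look like the following sketch (the PCIe address is a placeholder; substitute your device's address):

[global]
ioengine=spdk
thread=1
zonemode=zbd

[zns]
filename=trtype=PCIe traddr=0000.04.00.0 ns=1
rw=write
bs=4k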