doc: Improve NVMe documentation entry page
Give a higher level overview of the NVMe driver, plus point to the examples
and the fio plugin up front.

Change-Id: Ifded628a79c2a648e024698410e2c2a164fe08eb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/364843
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
doc/nvme.md
@@ -1,14 +1,46 @@
# NVMe Driver {#nvme}

# Introduction {#nvme_intro}

The NVMe driver is a C library that may be linked directly into an application
to provide direct, zero-copy data transfer to and from
[NVMe SSDs](http://nvmexpress.org/). It is entirely passive, meaning that it spawns
no threads and only performs actions in response to function calls from the
application itself. The library controls NVMe devices by directly mapping the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) into the local
process and performing [MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O).
I/O is submitted asynchronously via queue pairs and the general flow isn't
entirely dissimilar from Linux's
[libaio](http://man7.org/linux/man-pages/man2/io_submit.2.html).

More recently, the library has been improved to also connect to remote NVMe
devices via NVMe over Fabrics. Users may now call spdk_nvme_probe() on both
local PCI busses and on remote NVMe over Fabrics discovery services. The API is
otherwise unchanged.

# Examples {#nvme_examples}

There are a number of examples provided that demonstrate how to use the NVMe
library. They are all in the [examples/nvme](https://github.com/spdk/spdk/tree/master/examples/nvme)
directory in the repository. The best place to start is
[hello_world](https://github.com/spdk/spdk/blob/master/examples/nvme/hello_world/hello_world.c).

# Running Benchmarks {#nvme_benchmarks}

SPDK provides a plugin to the very popular [fio](https://github.com/axboe/fio)
tool for running some basic benchmarks. See the fio start up
[guide](https://github.com/spdk/spdk/blob/master/examples/nvme/fio_plugin/)
for more details.

# Public Interface {#nvme_interface}

- spdk/nvme.h

# Key Functions {#nvme_key_functions}

Key Functions | Description
------------------------------------------- | -----------
spdk_nvme_probe() | @copybrief spdk_nvme_probe()
spdk_nvme_ctrlr_alloc_io_qpair() | @copybrief spdk_nvme_ctrlr_alloc_io_qpair()
spdk_nvme_ctrlr_get_ns() | @copybrief spdk_nvme_ctrlr_get_ns()
spdk_nvme_ns_cmd_read() | @copybrief spdk_nvme_ns_cmd_read()
spdk_nvme_ns_cmd_write() | @copybrief spdk_nvme_ns_cmd_write()
spdk_nvme_ns_cmd_dataset_management() | @copybrief spdk_nvme_ns_cmd_dataset_management()

@@ -17,76 +49,55 @@ spdk_nvme_qpair_process_completions() | @copybrief spdk_nvme_qpair_process
spdk_nvme_ctrlr_cmd_admin_raw() | @copybrief spdk_nvme_ctrlr_cmd_admin_raw()
spdk_nvme_ctrlr_process_admin_completions() | @copybrief spdk_nvme_ctrlr_process_admin_completions()

# NVMe Initialization {#nvme_initialization}

\msc

app [label="Application"], nvme [label="NVMe Driver"];
app=>nvme [label="nvme_probe()"];
app<<nvme [label="probe_cb(pci_dev)"];
nvme=>nvme [label="nvme_attach(devhandle)"];
nvme=>nvme [label="nvme_ctrlr_start(nvme_controller ptr)"];
nvme=>nvme [label="identify controller"];
nvme=>nvme [label="create queue pairs"];
nvme=>nvme [label="identify namespace(s)"];
app<<nvme [label="attach_cb(pci_dev, nvme_controller)"];
app=>app [label="create block devices based on controller's namespaces"];
\endmsc
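
The same flow in code, as a minimal sketch rather than a complete program: the
callback names are arbitrary, error handling is omitted, and the exact
signatures of spdk_nvme_probe() and the environment setup calls have varied
slightly across SPDK releases, so treat this as illustrative only.

~~~{.c}
#include "spdk/env.h"
#include "spdk/nvme.h"

static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	 struct spdk_nvme_ctrlr_opts *opts)
{
	/* Return true to ask the driver to attach to this controller. */
	return true;
}

static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	  struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	uint32_t nsid;

	/* The controller is initialized; walk its namespaces. */
	for (nsid = 1; nsid <= spdk_nvme_ctrlr_get_num_ns(ctrlr); nsid++) {
		struct spdk_nvme_ns *ns = spdk_nvme_ctrlr_get_ns(ctrlr, nsid);

		if (ns != NULL && spdk_nvme_ns_is_active(ns)) {
			/* Create the application's block device abstraction here. */
		}
	}
}

int
main(void)
{
	struct spdk_env_opts opts;

	spdk_env_opts_init(&opts);
	spdk_env_init(&opts);

	/* Enumerate local PCIe-attached controllers; pass a transport ID
	 * instead of NULL to probe an NVMe over Fabrics discovery service. */
	if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, NULL) != 0) {
		return 1;
	}

	return 0;
}
~~~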

# NVMe I/O Submission {#nvme_io_submission}

I/O is submitted to an NVMe namespace using nvme_ns_cmd_xxx functions. The NVMe
driver submits the I/O request as an NVMe submission queue entry on the queue
pair specified in the command. The function returns immediately, prior to the
completion of the command. The application must poll for I/O completion on each
queue pair with outstanding I/O to receive completion callbacks by calling
spdk_nvme_qpair_process_completions().

@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
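
A minimal sketch of submitting one read and polling it to completion follows.
It is not taken from the SPDK examples; the LBA and alignment are arbitrary,
and the spdk_nvme_ctrlr_alloc_io_qpair() and spdk_zmalloc() signatures shown
here have varied across SPDK versions.

~~~{.c}
#include "spdk/env.h"
#include "spdk/nvme.h"

static void
read_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
	bool *done = arg;

	/* Runs inside spdk_nvme_qpair_process_completions() on the polling thread. */
	*done = true;
}

static void
read_one_block(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
	struct spdk_nvme_qpair *qpair;
	void *buf;
	bool done = false;

	/* NULL/0 requests the default I/O queue pair options. */
	qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);

	/* I/O buffers must come from pinned, DMA-safe memory. */
	buf = spdk_zmalloc(spdk_nvme_ns_get_sector_size(ns), 0x1000, NULL,
			   SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

	/* Returns as soon as the command is placed in the submission queue. */
	spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* LBA */, 1 /* LBA count */,
			      read_complete, &done, 0 /* io_flags */);

	while (!done) {
		/* 0 means "process as many completions as are ready". */
		spdk_nvme_qpair_process_completions(qpair, 0);
	}

	spdk_free(buf);
	spdk_nvme_ctrlr_free_io_qpair(qpair);
}
~~~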

# NVMe Asynchronous Completion {#nvme_async_completion}

NVMe queue pairs (struct spdk_nvme_qpair) provide parallel submission paths for
I/O. I/O may be submitted on multiple queue pairs simultaneously from different
threads. Queue pairs contain no locks or atomics, however, so a given queue
pair may only be used by a single thread at a time. This requirement is not
enforced by the NVMe driver (doing so would require a lock), and violating this
requirement results in undefined behavior.

The userspace NVMe driver follows an asynchronous polled model for
I/O completion.

The number of queue pairs allowed is dictated by the NVMe SSD itself. The
specification allows for thousands, but most devices support between 32
and 128. The specification makes no guarantees about the performance available from
each queue pair, but in practice the full performance of a device is almost
always achievable using just one queue pair. For example, if a device claims to
be capable of 450,000 I/O per second at queue depth 128, in practice it does
not matter if the driver is using 4 queue pairs each with queue depth 32, or a
single queue pair with queue depth 128.

## I/O commands {#nvme_async_io}

The application may submit I/O from one or more threads on one or more queue pairs
and must call spdk_nvme_qpair_process_completions()
for each queue pair that submitted I/O.

When the application calls spdk_nvme_qpair_process_completions(),
if the NVMe driver detects completed I/Os that were submitted on that queue,
it will invoke the registered callback function
for each I/O within the context of spdk_nvme_qpair_process_completions().

## Admin commands {#nvme_async_admin}

The application may submit admin commands from one or more threads
and must call spdk_nvme_ctrlr_process_admin_completions()
from at least one thread to receive admin command completions.
The thread that processes admin completions need not be the same thread that submitted the
admin commands.

When the application calls spdk_nvme_ctrlr_process_admin_completions(),
if the NVMe driver detects completed admin commands submitted from any thread,
it will invoke the registered callback function
for each command within the context of spdk_nvme_ctrlr_process_admin_completions().

It is the application's responsibility to manage the order of submitted admin commands.
If certain admin commands must be submitted while no other commands are outstanding,
it is the application's responsibility to enforce this rule
using its own synchronization method.
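
As an illustration, the sketch below (not from the SPDK sources; error handling
omitted) issues an Identify Controller command through
spdk_nvme_ctrlr_cmd_admin_raw() and polls
spdk_nvme_ctrlr_process_admin_completions() until the callback fires.

~~~{.c}
#include "spdk/env.h"
#include "spdk/nvme.h"

static void
identify_complete(void *arg, const struct spdk_nvme_cpl *cpl)
{
	bool *done = arg;

	*done = true;
}

static void
identify_controller(struct spdk_nvme_ctrlr *ctrlr)
{
	struct spdk_nvme_cmd cmd = {0};
	struct spdk_nvme_ctrlr_data *data;
	bool done = false;

	data = spdk_zmalloc(sizeof(*data), 0x1000, NULL,
			    SPDK_ENV_SOCKET_ID_ANY, SPDK_MALLOC_DMA);

	cmd.opc = SPDK_NVME_OPC_IDENTIFY;
	cmd.cdw10 = 1;	/* CNS = 1: identify controller */

	spdk_nvme_ctrlr_cmd_admin_raw(ctrlr, &cmd, data, sizeof(*data),
				      identify_complete, &done);

	while (!done) {
		/* Any thread may poll; callbacks run inside this call. */
		spdk_nvme_ctrlr_process_admin_completions(ctrlr);
	}

	spdk_free(data);
}
~~~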

## Scaling Performance {#nvme_scaling}

Given the above, the easiest threading model for an application using SPDK is
to spawn a fixed number of threads in a pool and dedicate a single NVMe queue
pair to each thread. A further improvement would be to pin each thread to a
separate CPU core, and often the SPDK documentation will use "CPU core" and
"thread" interchangeably because we have this threading model in mind.

The NVMe driver takes no locks in the I/O path, so it scales linearly in terms
of performance per thread as long as a queue pair and a CPU core are dedicated
to each new thread. In order to take full advantage of this scaling,
applications should consider organizing their internal data structures such
that data is assigned exclusively to a single thread. All operations that
require that data should be done by sending a request to the owning thread.
This results in a message passing architecture, as opposed to a locking
architecture, and will result in superior scaling across CPU cores.
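
A rough sketch of that model, assuming plain pthreads for thread creation and a
hypothetical worker structure; a real application would also pin each thread to
a core and feed it work through message passing rather than spinning.

~~~{.c}
#include <pthread.h>
#include "spdk/nvme.h"

#define NUM_WORKERS 4

struct worker {
	struct spdk_nvme_ctrlr *ctrlr;
	struct spdk_nvme_qpair *qpair;	/* owned and polled by exactly one thread */
	pthread_t thread;
};

static void *
worker_fn(void *arg)
{
	struct worker *w = arg;

	/* Each thread allocates its own queue pair; no locks are needed. */
	w->qpair = spdk_nvme_ctrlr_alloc_io_qpair(w->ctrlr, NULL, 0);

	for (;;) {
		/* Submit I/O for data owned by this thread, then poll for completions. */
		spdk_nvme_qpair_process_completions(w->qpair, 0);
	}

	return NULL;
}

static void
start_workers(struct spdk_nvme_ctrlr *ctrlr)
{
	static struct worker workers[NUM_WORKERS];
	int i;

	for (i = 0; i < NUM_WORKERS; i++) {
		workers[i].ctrlr = ctrlr;
		pthread_create(&workers[i].thread, NULL, worker_fn, &workers[i]);
	}
}
~~~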

# NVMe over Fabrics Host Support {#nvme_fabrics_host}

The NVMe driver supports connecting to remote NVMe-oF targets and
interacting with them in the same manner as local NVMe SSDs.

## Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}

@@ -95,6 +106,7 @@ to the normal enumeration process for local PCIe-attached NVMe devices.

To connect to a remote NVMe over Fabrics subsystem, the user may call
spdk_nvme_probe() with the `trid` parameter specifying the address of
the NVMe-oF target.

The caller may fill out the spdk_nvme_transport_id structure manually
or use the spdk_nvme_transport_id_parse() function to convert a
human-readable string representation into the required structure.
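
For example, a sketch of connecting to a remote subsystem over RDMA; the
address, service ID, and subsystem NQN below are placeholders, and
probe_cb/attach_cb are the same kind of callbacks shown in the initialization
sketch above.

~~~{.c}
#include "spdk/nvme.h"

static void
connect_remote_subsystem(void)
{
	struct spdk_nvme_transport_id trid = {0};
	const char *str = "trtype:RDMA adrfam:IPv4 traddr:192.168.1.100 "
			  "trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1";

	if (spdk_nvme_transport_id_parse(&trid, str) != 0) {
		return;
	}

	/* probe_cb/attach_cb as in the local PCIe case; only the trid differs. */
	spdk_nvme_probe(&trid, NULL, probe_cb, attach_cb, NULL);
}
~~~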

@@ -108,7 +120,6 @@ single NVM subsystem directly, the NVMe library will call `probe_cb`
for just that subsystem; this allows the user to skip the discovery step
and connect directly to a subsystem with a known address.

# NVMe Multi Process {#nvme_multi_process}

This capability enables the SPDK NVMe driver to support multiple processes accessing the

@@ -145,21 +156,6 @@ Example: identical shm_id and non-overlapping core masks

~~~{.sh}
./perf -q 8 -s 131072 -w write -c 0x10 -t 60 -i 1
~~~

## Scalability and Performance {#nvme_multi_process_scalability_performance}

To maximize the I/O bandwidth of an NVMe device, ensure that each application has its own
queue pairs.

The optimal threading model for SPDK is one thread per core, regardless of which process
that thread belongs to in a multi-process environment. To achieve maximum
performance, each thread should also have its own I/O queue pair. Applications that share
memory should be given core masks that do not overlap.

However, admin commands may have some performance impact as there is only one admin queue
pair per NVMe SSD. The NVMe driver will automatically take a cross-process capable lock
to enable the sharing of the admin queue pair. Further, when each process polls the admin
queue for completions, it will only see completions for commands that it originated.

## Limitations {#nvme_multi_process_limitations}

1. Two processes sharing memory may not share any cores in their core mask.