doc: Add an overview of user space drivers
This is designed to address some of the more basic questions we often field.

Change-Id: I53ead3044abf8add0c912e0ce9721e995ae7a6e7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/392983
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
This commit is contained in:
parent afc97ddcf7
commit 495651d1ae
@@ -799,6 +799,7 @@ INPUT = ../include/spdk \
                      nvme.md \
                      nvme-cli.md \
                      nvmf.md \
+                     userspace.md \
                      vagrant.md \
                      vhost.md \
                      virtio.md
@@ -10,6 +10,7 @@

 # Concepts {#concepts}

+- @ref userspace
 - @ref memory
 - @ref porting
doc/userspace.md (new file, 97 lines)
@@ -0,0 +1,97 @@
# User Space Drivers {#userspace}

## Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level -
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). This
separation is aided by features on the CPU itself that enforce memory
separation, called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring). Typically,
drivers run in kernel space (i.e. ring 0 on x86). SPDK contains drivers that
instead are designed to run in user space, but they still interface directly
with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device, and on Linux it is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux -
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt). These two drivers
are "dummy" drivers in the sense that they mostly indicate to the operating
system that the device has a driver bound to it, so it won't automatically try
to re-bind the default driver. They don't actually initialize the hardware in
any way, nor do they even understand what type of device it is. The primary
difference between uio and vfio is that vfio is capable of programming the
platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

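As an illustration (not how SPDK implements it internally), the sketch below
performs the unbind and rebind directly through sysfs for a hypothetical
device at PCI address 0000:01:00.0, assuming the uio_pci_generic module is
already loaded; in practice SPDK's scripts/setup.sh performs these steps for
you.

```c
/* Sketch only: unbind a hypothetical device at 0000:01:00.0 from its current
 * kernel driver and rebind it to uio_pci_generic via sysfs. Must run as root.
 */
#include <stdio.h>

/* Hypothetical helper: write a single value to a sysfs file. */
static int
write_sysfs(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (f == NULL) {
		return -1;
	}
	fprintf(f, "%s", value);
	return fclose(f);
}

int
main(void)
{
	const char *bdf = "0000:01:00.0";

	/* Tell the current driver (e.g. the kernel nvme driver) to let go. */
	write_sysfs("/sys/bus/pci/devices/0000:01:00.0/driver/unbind", bdf);

	/* Ask the PCI core to bind uio_pci_generic to this device on the next
	 * probe, then trigger that probe. */
	write_sysfs("/sys/bus/pci/devices/0000:01:00.0/driver_override",
		    "uio_pci_generic");
	write_sysfs("/sys/bus/pci/drivers_probe", bdf);

	return 0;
}
```
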
Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on
Linux, the devices corresponding to it, such as /dev/nvme0n1, will disappear.
This further means that filesystems mounted on the device are removed and
kernel filesystems can no longer interact with the device. In fact, the entire
kernel block storage stack is no longer involved. Instead, SPDK provides
re-imagined implementations of most of the layers in a typical operating
system storage stack, all as C libraries that can be directly embedded into
your application. This includes primarily the @ref bdev layer, but also block
allocators and filesystem-like components such as @ref blob and @ref blobfs.

User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK @ref
nvme, for instance, maps the BAR for the NVMe device and then follows along
with the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.

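To make the mechanism concrete, here is a minimal sketch (not the SPDK
implementation) that maps BAR0 of a hypothetical NVMe device at 0000:01:00.0
through its sysfs resource0 file, which works for a device bound to uio, and
reads the CAP and VS registers defined by the NVMe specification; with vfio
the mapping is obtained through its ioctl interface instead.

```c
/* Sketch only: map an NVMe controller's BAR0 and read two registers via MMIO.
 * Assumes the device has already been unbound from the kernel nvme driver. */
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
	const char *bar0 = "/sys/bus/pci/devices/0000:01:00.0/resource0";
	int fd = open(bar0, O_RDWR | O_SYNC);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* One page is enough to cover the controller registers read below. */
	volatile uint8_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				      MAP_SHARED, fd, 0);
	if (regs == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	/* Offsets 0x00 (CAP) and 0x08 (VS) come from the NVMe specification. */
	uint64_t cap = *(volatile uint64_t *)(regs + 0x00);
	uint32_t vs = *(volatile uint32_t *)(regs + 0x08);

	printf("CAP = 0x%016" PRIx64 ", NVMe version = 0x%08x\n", cap, vs);

	munmap((void *)regs, 4096);
	close(fd);
	return 0;
}
```
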
## Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are a number of reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, and 2) interrupts introduce software jitter and carry
significant overhead due to forced context switches. Operations in SPDK are
almost universally asynchronous and allow the user to provide a callback on
completion. The callback is called in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair for a bit flip, and
technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
ensure that the host memory being checked is present in the CPU cache after an
update by the device.

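The sketch below shows this submit-then-poll pattern using the public SPDK
NVMe driver API; the controller and namespace handles are assumed to have been
obtained earlier through the driver's probe/attach path, and error handling is
kept minimal.

```c
/* Sketch only: issue one asynchronous read and poll until it completes. */
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/nvme.h"

static void
read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
	/* Runs from within spdk_nvme_qpair_process_completions() below. */
	*(bool *)cb_arg = true;
	if (spdk_nvme_cpl_is_error(cpl)) {
		fprintf(stderr, "read failed\n");
	}
}

static int
read_first_block(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
	struct spdk_nvme_qpair *qpair;
	bool done = false;
	void *buf;

	/* I/O buffers must come from DMA-safe, pinned memory. */
	buf = spdk_dma_zmalloc(0x1000, 0x1000, NULL);
	if (buf == NULL) {
		return -1;
	}

	qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);
	if (qpair == NULL) {
		spdk_dma_free(buf);
		return -1;
	}

	/* Asynchronous submission: this returns immediately. */
	if (spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* LBA */, 1 /* count */,
				  read_done, &done, 0) != 0) {
		spdk_nvme_ctrlr_free_io_qpair(qpair);
		spdk_dma_free(buf);
		return -1;
	}

	/* No interrupt: the callback only fires when we poll the queue pair. */
	while (!done) {
		spdk_nvme_qpair_process_completions(qpair, 0);
	}

	spdk_nvme_ctrlr_free_io_qpair(qpair);
	spdk_dma_free(buf);
	return 0;
}
```
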
## Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from many different places, either in the operating system itself or in
various processes on the system, and the thread topology of those processes
changes over time. Most kernel drivers elect to map hardware queues to cores
(as close to 1:1 as possible), and then when a request is submitted they look
up the correct hardware queue for whatever core the current thread happens to
be running on. Often, they'll need to either acquire a lock around the queue
or temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This is a large improvement
over older hardware interfaces that had only a single queue or no queue at
all, but it still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist because
the application created them. Therefore, the SPDK drivers choose to expose the
hardware queues directly to the application with the requirement that a
hardware queue is only ever accessed from one thread at a time. In practice,
applications assign one hardware queue to each thread (as opposed to one
hardware queue per core in kernel drivers). This guarantees that each thread
can submit requests without having to perform any sort of coordination (i.e.
locking) with the other threads in the system.

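A minimal sketch of this queue-pair-per-thread model follows; the controller
and namespace handles are assumed to already exist, and the actual I/O
submission loop is elided (it would look like the polling example above).

```c
/* Sketch only: each thread allocates its own I/O queue pair and uses it
 * without taking any locks. */
#include <pthread.h>
#include "spdk/nvme.h"

struct io_thread_ctx {
	struct spdk_nvme_ctrlr *ctrlr;
	struct spdk_nvme_ns *ns;
};

static void *
io_thread(void *arg)
{
	struct io_thread_ctx *ctx = arg;

	/* This queue pair is private to the current thread, so submissions and
	 * completions on it require no coordination with other threads. */
	struct spdk_nvme_qpair *qpair =
		spdk_nvme_ctrlr_alloc_io_qpair(ctx->ctrlr, NULL, 0);
	if (qpair == NULL) {
		return NULL;
	}

	/* Submit I/O against ctx->ns with spdk_nvme_ns_cmd_read()/..._write()
	 * and reap completions with spdk_nvme_qpair_process_completions(),
	 * exactly as in the polling example above. */

	spdk_nvme_ctrlr_free_io_qpair(qpair);
	return NULL;
}

/* Launch one I/O thread, and therefore one hardware queue pair, per worker. */
static int
start_io_threads(struct io_thread_ctx *ctx, pthread_t *threads, int count)
{
	for (int i = 0; i < count; i++) {
		if (pthread_create(&threads[i], NULL, io_thread, ctx) != 0) {
			return -1;
		}
	}
	return 0;
}
```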