doc: Add an overview of user space drivers
This is designed to address some of the more basic questions we often field.

Change-Id: I53ead3044abf8add0c912e0ce9721e995ae7a6e7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/392983
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
This commit is contained in:
parent afc97ddcf7
commit 495651d1ae
@@ -799,6 +799,7 @@ INPUT = ../include/spdk \
                      nvme.md \
                      nvme-cli.md \
                      nvmf.md \
+                     userspace.md \
                      vagrant.md \
                      vhost.md \
                      virtio.md
@@ -10,6 +10,7 @@

 # Concepts {#concepts}

+- @ref userspace
 - @ref memory
 - @ref porting
doc/userspace.md (new file, 97 lines)
@@ -0,0 +1,97 @@
# User Space Drivers {#userspace}

## Controlling Hardware From User Space {#userspace_control}

Much of the documentation for SPDK talks about _user space drivers_, so it's
important to understand what that means at a technical level. First and
foremost, a _driver_ is software that directly controls a particular device
attached to a computer. Second, operating systems segregate the system's
virtual memory into two categories of addresses based on privilege level -
[kernel space and user space](https://en.wikipedia.org/wiki/User_space). This
separation is aided by features on the CPU itself that enforce memory
separation, called
[protection rings](https://en.wikipedia.org/wiki/Protection_ring). Typically,
drivers run in kernel space (i.e. ring 0 on x86). SPDK contains drivers that
instead are designed to run in user space, but they still interface directly
with the hardware device that they are controlling.

In order for SPDK to take control of a device, it must first instruct the
operating system to relinquish control. This is often referred to as unbinding
the kernel driver from the device, and on Linux it is done by
[writing to a file in sysfs](https://lwn.net/Articles/143397/).
SPDK then rebinds the device to one of two special device drivers that come
bundled with Linux -
[uio](https://www.kernel.org/doc/html/latest/driver-api/uio-howto.html) or
[vfio](https://www.kernel.org/doc/Documentation/vfio.txt). These two drivers
are "dummy" drivers in the sense that they mostly indicate to the operating
system that the device has a driver bound to it, so it won't automatically try
to re-bind the default driver. They don't actually initialize the hardware in
any way, nor do they even understand what type of device it is. The primary
difference between uio and vfio is that vfio is capable of programming the
platform's
[IOMMU](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit),
which is a critical piece of hardware for ensuring memory safety in user space
drivers. See @ref memory for full details.

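As an illustration (not how SPDK implements it internally), the sketch below
performs the unbind and rebind directly through sysfs for a hypothetical
device at PCI address 0000:01:00.0, assuming the uio_pci_generic module is
already loaded; in practice SPDK's scripts/setup.sh performs these steps for
you.

```c
/* Sketch only: unbind a hypothetical device at 0000:01:00.0 from its current
 * kernel driver and rebind it to uio_pci_generic via sysfs. Must run as root.
 */
#include <stdio.h>

/* Hypothetical helper: write a single value to a sysfs file. */
static int
write_sysfs(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (f == NULL) {
		return -1;
	}
	fprintf(f, "%s", value);
	return fclose(f);
}

int
main(void)
{
	const char *bdf = "0000:01:00.0";

	/* Tell the current driver (e.g. the kernel nvme driver) to let go. */
	write_sysfs("/sys/bus/pci/devices/0000:01:00.0/driver/unbind", bdf);

	/* Ask the PCI core to bind uio_pci_generic to this device on the next
	 * probe, then trigger that probe. */
	write_sysfs("/sys/bus/pci/devices/0000:01:00.0/driver_override",
		    "uio_pci_generic");
	write_sysfs("/sys/bus/pci/drivers_probe", bdf);

	return 0;
}
```
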
Once the device is unbound from the operating system kernel, the operating
system can't use it anymore. For example, if you unbind an NVMe device on
Linux, the devices corresponding to it, such as /dev/nvme0n1, will disappear.
This further means that filesystems mounted on the device are removed and
kernel filesystems can no longer interact with the device. In fact, the entire
kernel block storage stack is no longer involved. Instead, SPDK provides
re-imagined implementations of most of the layers in a typical operating
system storage stack, all as C libraries that can be directly embedded into
your application. This includes primarily the @ref bdev layer, but also block
allocators and filesystem-like components such as @ref blob and @ref blobfs.

User space drivers utilize features in uio or vfio to map the
[PCI BAR](https://en.wikipedia.org/wiki/PCI_configuration_space) for the device
into the current process, which allows the driver to perform
[MMIO](https://en.wikipedia.org/wiki/Memory-mapped_I/O) directly. The SPDK @ref
nvme, for instance, maps the BAR for the NVMe device and then follows along
with the
[NVMe Specification](http://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf)
to initialize the device, create queue pairs, and ultimately send I/O.

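To make the mechanism concrete, here is a minimal sketch (not the SPDK
implementation) that maps BAR0 of a hypothetical NVMe device at 0000:01:00.0
through its sysfs resource0 file, which works for a device bound to uio, and
reads the CAP and VS registers defined by the NVMe specification; with vfio
the mapping is obtained through its ioctl interface instead.

```c
/* Sketch only: map an NVMe controller's BAR0 and read two registers via MMIO.
 * Assumes the device has already been unbound from the kernel nvme driver. */
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
	const char *bar0 = "/sys/bus/pci/devices/0000:01:00.0/resource0";
	int fd = open(bar0, O_RDWR | O_SYNC);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* One page is enough to cover the controller registers read below. */
	volatile uint8_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				      MAP_SHARED, fd, 0);
	if (regs == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}

	/* Offsets 0x00 (CAP) and 0x08 (VS) come from the NVMe specification. */
	uint64_t cap = *(volatile uint64_t *)(regs + 0x00);
	uint32_t vs = *(volatile uint32_t *)(regs + 0x08);

	printf("CAP = 0x%016" PRIx64 ", NVMe version = 0x%08x\n", cap, vs);

	munmap((void *)regs, 4096);
	close(fd);
	return 0;
}
```
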
## Interrupts {#userspace_interrupts}

SPDK polls devices for completions instead of waiting for interrupts. There
are a number of reasons for doing this: 1) practically speaking, routing an
interrupt to a handler in a user space process just isn't feasible for most
hardware designs, and 2) interrupts introduce software jitter and carry
significant overhead due to forced context switches. Operations in SPDK are
almost universally asynchronous and allow the user to provide a callback on
completion. The callback is called in response to the user calling a function
to poll for completions. Polling an NVMe device is fast because only host
memory needs to be read (no MMIO) to check a queue pair for a bit flip, and
technologies such as Intel's
[DDIO](https://www.intel.com/content/www/us/en/io/data-direct-i-o-technology.html)
ensure that the host memory being checked is present in the CPU cache after an
update by the device.

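The sketch below shows this submit-then-poll pattern using the public SPDK
NVMe driver API; the controller and namespace handles are assumed to have been
obtained earlier through the driver's probe/attach path, and error handling is
kept minimal.

```c
/* Sketch only: issue one asynchronous read and poll until it completes. */
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/nvme.h"

static void
read_done(void *cb_arg, const struct spdk_nvme_cpl *cpl)
{
	/* Runs from within spdk_nvme_qpair_process_completions() below. */
	*(bool *)cb_arg = true;
	if (spdk_nvme_cpl_is_error(cpl)) {
		fprintf(stderr, "read failed\n");
	}
}

static int
read_first_block(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns)
{
	struct spdk_nvme_qpair *qpair;
	bool done = false;
	void *buf;

	/* I/O buffers must come from DMA-safe, pinned memory. */
	buf = spdk_dma_zmalloc(0x1000, 0x1000, NULL);
	if (buf == NULL) {
		return -1;
	}

	qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);
	if (qpair == NULL) {
		spdk_dma_free(buf);
		return -1;
	}

	/* Asynchronous submission: this returns immediately. */
	if (spdk_nvme_ns_cmd_read(ns, qpair, buf, 0 /* LBA */, 1 /* count */,
				  read_done, &done, 0) != 0) {
		spdk_nvme_ctrlr_free_io_qpair(qpair);
		spdk_dma_free(buf);
		return -1;
	}

	/* No interrupt: the callback only fires when we poll the queue pair. */
	while (!done) {
		spdk_nvme_qpair_process_completions(qpair, 0);
	}

	spdk_nvme_ctrlr_free_io_qpair(qpair);
	spdk_dma_free(buf);
	return 0;
}
```
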
## Threading {#userspace_threading}

NVMe devices expose multiple queues for submitting requests to the hardware.
Separate queues can be accessed without coordination, so software can send
requests to the device from multiple threads of execution in parallel without
locks. Unfortunately, kernel drivers must be designed to handle I/O coming
from many different places, either in the operating system itself or in
various processes on the system, and the thread topology of those processes
changes over time. Most kernel drivers elect to map hardware queues to cores
(as close to 1:1 as possible), and then when a request is submitted they look
up the correct hardware queue for whatever core the current thread happens to
be running on. Often, they'll need to either acquire a lock around the queue
or temporarily disable interrupts to guard against preemption from threads
running on the same core, which can be expensive. This is a large improvement
over older hardware interfaces that had only a single queue or no queue at
all, but it still isn't always optimal.

A user space driver, on the other hand, is embedded into a single application.
This application knows exactly how many threads (or processes) exist because
the application created them. Therefore, the SPDK drivers choose to expose the
hardware queues directly to the application with the requirement that a
hardware queue is only ever accessed from one thread at a time. In practice,
applications assign one hardware queue to each thread (as opposed to one
hardware queue per core in kernel drivers). This guarantees that each thread
can submit requests without having to perform any sort of coordination (i.e.
locking) with the other threads in the system.

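A minimal sketch of this queue-pair-per-thread model follows; the controller
and namespace handles are assumed to already exist, and the actual I/O
submission loop is elided (it would look like the polling example above).

```c
/* Sketch only: each thread allocates its own I/O queue pair and uses it
 * without taking any locks. */
#include <pthread.h>
#include "spdk/nvme.h"

struct io_thread_ctx {
	struct spdk_nvme_ctrlr *ctrlr;
	struct spdk_nvme_ns *ns;
};

static void *
io_thread(void *arg)
{
	struct io_thread_ctx *ctx = arg;

	/* This queue pair is private to the current thread, so submissions and
	 * completions on it require no coordination with other threads. */
	struct spdk_nvme_qpair *qpair =
		spdk_nvme_ctrlr_alloc_io_qpair(ctx->ctrlr, NULL, 0);
	if (qpair == NULL) {
		return NULL;
	}

	/* Submit I/O against ctx->ns with spdk_nvme_ns_cmd_read()/..._write()
	 * and reap completions with spdk_nvme_qpair_process_completions(),
	 * exactly as in the polling example above. */

	spdk_nvme_ctrlr_free_io_qpair(qpair);
	return NULL;
}

/* Launch one I/O thread, and therefore one hardware queue pair, per worker. */
static int
start_io_threads(struct io_thread_ctx *ctx, pthread_t *threads, int count)
{
	for (int i = 0; i < count; i++) {
		if (pthread_create(&threads[i], NULL, io_thread, ctx) != 0) {
			return -1;
		}
	}
	return 0;
}
```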