diff --git a/doc/concurrency.md b/doc/concurrency.md index 37a2fa89d..46bed84be 100644 --- a/doc/concurrency.md +++ b/doc/concurrency.md @@ -3,60 +3,58 @@ # Theory One of the primary aims of SPDK is to scale linearly with the addition of -hardware. This can mean a number of things in practice. For instance, moving -from one SSD to two should double the number of I/O's per second. Or doubling -the number of CPU cores should double the amount of computation possible. Or -even doubling the number of NICs should double the network throughput. To -achieve this, the software must be designed such that threads of execution are -independent from one another as much as possible. In practice, that means -avoiding software locks and even atomic instructions. +hardware. This can mean many things in practice. For instance, moving from one +SSD to two should double the number of I/O's per second. Or doubling the number +of CPU cores should double the amount of computation possible. Or even doubling +the number of NICs should double the network throughput. To achieve this, the +software's threads of execution must be independent from one another as much as +possible. In practice, that means avoiding software locks and even atomic +instructions. Traditionally, software achieves concurrency by placing some shared data onto the heap, protecting it with a lock, and then having all threads of execution -acquire the lock only when that shared data needs to be accessed. This model -has a number of great properties: +acquire the lock only when accessing the data. This model has many great +properties: -* It's relatively easy to convert single-threaded programs to multi-threaded -programs because you don't have to change the data model from the -single-threaded version. You just add a lock around the data. +* It's easy to convert single-threaded programs to multi-threaded programs + because you don't have to change the data model from the single-threaded + version. 
You add a lock around the data. * You can write your program as a synchronous, imperative list of statements that you read from top to bottom. -* Your threads can be interrupted and put to sleep by the operating system -scheduler behind the scenes, allowing for efficient time-sharing of CPU resources. +* The scheduler can interrupt threads, allowing for efficient time-sharing + of CPU resources. -Unfortunately, as the number of threads scales up, contention on the lock -around the shared data does too. More granular locking helps, but then also -greatly increases the complexity of the program. Even then, beyond a certain -number highly contended locks, threads will spend most of their time -attempting to acquire the locks and the program will not benefit from any -additional CPU cores. +Unfortunately, as the number of threads scales up, contention on the lock around +the shared data does too. More granular locking helps, but then also increases +the complexity of the program. Even then, beyond a certain number of contended +locks, threads will spend most of their time attempting to acquire the locks and +the program will not benefit from more CPU cores. SPDK takes a different approach altogether. Instead of placing shared data in a global location that all threads access after acquiring a lock, SPDK will often -assign that data to a single thread. When other threads want to access the -data, they pass a message to the owning thread to perform the operation on -their behalf. This strategy, of course, is not at all new. For instance, it is -one of the core design principles of +assign that data to a single thread. When other threads want to access the data, +they pass a message to the owning thread to perform the operation on their +behalf. This strategy, of course, is not at all new. 
For instance, it is one of +the core design principles of [Erlang](http://erlang.org/download/armstrong_thesis_2003.pdf) and is the main concurrency mechanism in [Go](https://tour.golang.org/concurrency/2). A message -in SPDK typically consists of a function pointer and a pointer to some context, -and is passed between threads using a +in SPDK consists of a function pointer and a pointer to some context. Messages +are passed between threads using a [lockless ring](http://dpdk.org/doc/guides/prog_guide/ring_lib.html). Message -passing is often much faster than most software developer's intuition leads them to -believe, primarily due to caching effects. If a single core is consistently -accessing the same data (on behalf of all of the other cores), then that data -is far more likely to be in a cache closer to that core. It's often most -efficient to have each core work on a relatively small set of data sitting in -its local cache and then hand off a small message to the next core when done. +passing is often much faster than most software developers' intuition leads them +to believe due to caching effects. If a single core is accessing the same data +(on behalf of all of the other cores), then that data is far more likely to be +in a cache closer to that core. It's often most efficient to have each core work +on a small set of data sitting in its local cache and then hand off a small +message to the next core when done. -In more extreme cases where even message passing may be too costly, a copy of -the data will be made for each thread. The thread will then only reference its -local copy. To mutate the data, threads will send a message to each other -thread telling them to perform the update on their local copy. This is great -when the data isn't mutated very often, but may be read very frequently, and is -often employed in the I/O path. This of course trades memory size for -computational efficiency, so it's use is limited to only the most critical code -paths. 
+In more extreme cases where even message passing may be too costly, each thread +may make a local copy of the data. The thread will then only reference its local +copy. To mutate the data, threads will send a message to each other thread +telling them to perform the update on their local copy. This is great when the +data isn't mutated very often, but is read very frequently, and is often +employed in the I/O path. This of course trades memory size for computational +efficiency, so it is used in only the most critical code paths. # Message Passing Infrastructure @@ -68,48 +66,65 @@ their documentation (e.g. @ref nvme). Most libraries, however, depend on SPDK's abstraction, located in `libspdk_thread.a`. The thread abstraction provides a basic message passing framework and defines a few key primitives. -First, spdk_thread is an abstraction for a thread of execution and -spdk_poller is an abstraction for a function that should be -periodically called on the given thread. On each system thread that the user -wishes to use with SPDK, they must first call spdk_thread_create(). +First, `spdk_thread` is an abstraction for a lightweight, stackless thread of +execution. A lower level framework can execute an `spdk_thread` for a single +timeslice by calling `spdk_thread_poll()`. A lower level framework is allowed to +move an `spdk_thread` between system threads at any time, as long as there is +only a single system thread executing `spdk_thread_poll()` on that +`spdk_thread` at any given time. New lightweight threads may be created at any +time by calling `spdk_thread_create()` and destroyed by calling +`spdk_thread_destroy()`. The lightweight thread is the foundational abstraction for +threading in SPDK. -The library also defines two other abstractions: spdk_io_device and -spdk_io_channel. In the course of implementing SPDK we noticed the -same pattern emerging in a number of different libraries. 
In order to -implement a message passing strategy, the code would describe some object with -global state and also some per-thread context associated with that object that -was accessed in the I/O path to avoid locking on the global state. The pattern -was clearest in the lowest layers where I/O was being submitted to block -devices. These devices often expose multiple queues that can be assigned to -threads and then accessed without a lock to submit I/O. To abstract that, we -generalized the device to spdk_io_device and the thread-specific queue to -spdk_io_channel. Over time, however, the pattern has appeared in a huge -number of places that don't fit quite so nicely with the names we originally -chose. In today's code spdk_io_device is any pointer, whose uniqueness is -predicated only on its memory address, and spdk_io_channel is the per-thread -context associated with a particular spdk_io_device. +There are then a few additional abstractions layered on top of the +`spdk_thread`. One is the `spdk_poller`, which is an abstraction for a +function that should be repeatedly called on the given thread. Another is an +`spdk_msg_fn`, which is a function pointer and a context pointer that can +be sent to a thread for execution via `spdk_thread_send_msg()`. + +The library also defines two additional abstractions: `spdk_io_device` and +`spdk_io_channel`. In the course of implementing SPDK we noticed the same +pattern emerging in a number of different libraries. In order to implement a +message passing strategy, the code would describe some object with global state +and also some per-thread context associated with that object that was accessed +in the I/O path to avoid locking on the global state. The pattern was clearest +in the lowest layers where I/O was being submitted to block devices. These +devices often expose multiple queues that can be assigned to threads and then +accessed without a lock to submit I/O. 
To abstract that, we generalized the +device to `spdk_io_device` and the thread-specific queue to `spdk_io_channel`. +Over time, however, the pattern has appeared in a huge number of places that +don't fit quite so nicely with the names we originally chose. In today's code +`spdk_io_device` is any pointer whose uniqueness is predicated only on its +memory address, and `spdk_io_channel` is the per-thread context associated with +a particular `spdk_io_device`. The threading abstraction provides functions to send a message to any other thread, to send a message to all threads one by one, and to send a message to all threads for which there is an io_channel for a given io_device. +Most critically, the thread abstraction does not actually spawn any system level +threads of its own. Instead, it relies on the existence of some lower level +framework that spawns system threads and sets up event loops. Inside those event +loops, the threading abstraction simply requires the lower level framework to +repeatedly call `spdk_thread_poll()` on each `spdk_thread` that exists. This +makes SPDK very portable to a wide variety of asynchronous, event-based +frameworks such as [Seastar](https://www.seastar.io) or [libuv](https://libuv.org/). + # The event Framework -As the number of example applications in SPDK grew, it became clear that a -large portion of the code in each was implementing the basic message passing -infrastructure required to call spdk_thread_create(). This includes spawning -one thread per core, pinning each thread to a unique core, and allocating -lockless rings between the threads for message passing. Instead of -re-implementing that infrastructure for each example application, SPDK -provides the SPDK @ref event. This library handles setting up all of the -message passing infrastructure, installing signal handlers to cleanly -shutdown, implements periodic pollers, and does basic command line parsing. 
-When started through spdk_app_start(), the library automatically spawns all of -the threads requested, pins them, and calls spdk_thread_create(). This makes -it much easier to implement a brand new SPDK application and is the recommended -method for those starting out. Only established applications with sufficient -message passing infrastructure should consider directly integrating the lower -level libraries. +The SPDK project didn't want to officially pick an asynchronous, event-based +framework for all of the example applications it shipped with, in the interest +of supporting the widest variety of frameworks possible. But the applications do +of course require something that implements an asynchronous event loop in order +to run, so enter the `event` framework located in `lib/event`. This framework +includes things like spawning one thread per core, pinning each thread to a +unique core, polling and scheduling the lightweight threads, installing signal +handlers to shut down cleanly, and basic command line option parsing. When +started through `spdk_app_start()`, the library automatically spawns all of the +threads requested, pins them, and is ready for lightweight threads to be +created. This makes it much easier to implement a brand new SPDK application and +is the recommended method for those starting out. Only established applications +should consider directly integrating the lower level libraries. # Limitations of the C Language