doc: added scheduler framework documentation
Added changelog entry for dynamic scheduler, along with general information
on scheduler framework and behaviour of particular scheduler implementations.

Change-Id: I9fcef56323c4be136b6b531297b070562981eee5
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6151
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
parent 7741de6b7d
commit c15af452f6
@@ -77,6 +77,10 @@ The `--pci-blacklist` command line option has been deprecated, replaced with
 The `--pci-whitelist/-W` command line options have been deprecated, replaced with
 `--pci-allowed/-A`.
 
+Added new experimental `dynamic` scheduler that rebalances idle threads, adjusts CPU frequency
+using dpdk_governor and turns idle reactor cores to interrupt mode. Please see
+[scheduler documentation](https://www.spdk.io/doc/scheduler.html) for details.
+
 ## ioat
 
 The PCI BDF whitelist option has been removed from the `ioat_scan_accel_engine` RPC.
@@ -835,6 +835,7 @@ INPUT += \
 peer_2_peer.md \
 pkgconfig.md \
 porting.md \
+scheduler.md \
 shfmt.md \
 spdkcli.md \
 spdk_top.md \
@@ -1,5 +1,6 @@
 # General Information {#general}
 
 - @subpage event
+- @subpage scheduler
 - @subpage logical_volumes
 - @subpage accel_fw
82
doc/scheduler.md
Normal file
@@ -0,0 +1,82 @@
# Scheduler {#scheduler}

SPDK's event/application framework (`lib/event`) now supports scheduling of
lightweight threads. Schedulers are provided as plugins, called
implementations. A default implementation is provided, but users may wish to
write their own scheduler to integrate into broader code frameworks or meet
their performance needs.

This feature should be considered experimental and is disabled by default. When
enabled, the scheduler framework gathers data for each SPDK thread and reactor
and passes it to a scheduler implementation to perform one of the following
actions.

## Actions

### Move a thread

`spdk_thread`s can be moved to another reactor. Schedulers can examine the
suggested cpu_mask value for each lightweight thread to see if the user has
requested specific reactors, or choose a reactor using whatever algorithm they
deem fit.
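
As an illustration of the choice described above, the following Python sketch
picks the least-loaded reactor that a thread's cpu_mask allows. It is not SPDK
code: `pick_reactor`, the integer bitmask representation and the per-reactor
thread counts are all hypothetical, and "least loaded" is only one of many
policies a scheduler might use.

```python
def pick_reactor(cpu_mask, reactor_load):
    """Return the least-loaded reactor core permitted by cpu_mask.

    cpu_mask     -- integer bitmask; bit N set means core N is acceptable
                    (hypothetical representation chosen for this sketch)
    reactor_load -- mapping of reactor core id -> number of threads on it
    """
    allowed = [core for core in reactor_load if cpu_mask & (1 << core)]
    if not allowed:
        raise ValueError("cpu_mask matches no running reactor")
    # Break ties by the lower core id so the choice is deterministic.
    return min(allowed, key=lambda core: (reactor_load[core], core))


# Four reactors with differing thread counts; the mask allows cores 0, 1 and 3.
reactor_load = {0: 3, 1: 1, 2: 0, 3: 2}
print(pick_reactor(0b1011, reactor_load))
```

Core 2 is the emptiest reactor in this example but is excluded by the mask, so
the sketch falls back to the emptiest core that is allowed.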

### Switch reactor mode

Reactors by default run in a mode that constantly polls for new actions for the
most efficient processing. Schedulers can switch a reactor into a mode that
instead waits for an event on a file descriptor. On Linux, this is implemented
using epoll. This results in reduced CPU usage but may be less responsive when
events occur. A reactor cannot enter this mode if any `spdk_thread`s are
currently scheduled to it. This limitation is expected to be lifted in the
future, allowing `spdk_thread`s to enter interrupt mode.
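
The difference between the two modes can be sketched outside SPDK with a plain
pipe. The example below is only an analogy, assuming a Unix-like system: the
function names are hypothetical, and Python's `selectors` module stands in for
the reactor's wait on a file descriptor (it uses epoll on Linux).

```python
import os
import selectors


def poll_mode(fd, max_spins):
    """Busy-poll a non-blocking fd; returns (data or None, spins used)."""
    spins = 0
    while spins < max_spins:
        spins += 1
        try:
            return os.read(fd, 4096), spins
        except BlockingIOError:
            continue  # nothing to read yet, keep burning CPU
    return None, spins


def interrupt_mode(fd, timeout):
    """Sleep until fd is readable or the timeout expires."""
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        if not sel.select(timeout):
            return None  # timed out without an event
        return os.read(fd, 4096)


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

# No event is pending, so poll mode spins through its whole budget.
print(poll_mode(read_fd, 1000))

# Once an event is written, the waiting side wakes up and reads it.
os.write(write_fd, b"wake")
print(interrupt_mode(read_fd, 1.0))
```

Poll mode notices an event on its next iteration at the cost of a fully busy
core, while the waiting variant uses no CPU until the kernel wakes it.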

### Set frequency of CPU core

The frequency of CPU cores can be modified by the scheduler in response to
load. Only CPU cores that match the application cpu_mask may be modified. The
mechanism for controlling CPU frequency is pluggable and the default provided
implementation is called `dpdk_governor`, based on the `rte_power` library from
DPDK.

#### Known limitation

When SMT (Hyperthreading) is enabled, the two logical CPU cores sharing a single
physical CPU core must run at the same frequency. If one of the two logical
CPU cores is outside the application cpu_mask, the policy and frequency on that
core have to be managed by the administrator.
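
To check whether a given cpu_mask runs into this limitation, one can look for
sibling pairs that straddle the mask. The sketch below is a hypothetical helper,
not part of SPDK; the sibling topology is passed in by hand here, whereas on
Linux it could be read from
`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`.

```python
def unmanaged_siblings(cpu_mask, sibling_groups):
    """List logical cores outside cpu_mask that share a physical core
    with a logical core inside cpu_mask.

    cpu_mask       -- integer bitmask; bit N set means core N is in the mask
    sibling_groups -- iterable of tuples of logical core ids per physical core
    """
    result = []
    for group in sibling_groups:
        inside = [core for core in group if cpu_mask & (1 << core)]
        outside = [core for core in group if not cpu_mask & (1 << core)]
        # Only groups split by the mask need the administrator's attention.
        if inside and outside:
            result.extend(outside)
    return sorted(result)


# Hypothetical 4-core/8-thread layout where core N pairs with core N + 4.
siblings = [(0, 4), (1, 5), (2, 6), (3, 7)]

# The mask covers logical cores 0, 1 and 4; core 1's sibling (5) is left out.
print(unmanaged_siblings(0b00010011, siblings))
```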

## Scheduler implementations

The scheduler in use may be controlled by JSON-RPC. Please use the
[framework_set_scheduler](jsonrpc.md#rpc_framework_set_scheduler) RPC to
switch between schedulers or change their options.

[spdk_top](spdk_top.md#spdk_top) is a useful tool to observe the behavior of
schedulers in different scenarios and workloads.
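
For orientation, a `framework_set_scheduler` call is an ordinary JSON-RPC 2.0
request. The Python sketch below only builds the request body and does not talk
to a running application. The `name` parameter key is an assumption of this
sketch, so consult the JSON-RPC documentation for the parameters the RPC
actually accepts.

```python
import json


def build_set_scheduler_request(name, request_id=1):
    """Build a JSON-RPC 2.0 request body for framework_set_scheduler.

    The "name" parameter key is an assumption made for this sketch.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "framework_set_scheduler",
        "params": {"name": name},
    })


# Request a switch to the dynamic scheduler described in this document.
print(build_set_scheduler_request("dynamic"))
```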

### static [default]

The `static` scheduler is the default scheduler and does no dynamic scheduling.
Lightweight threads are distributed round-robin among reactors, respecting
their requested cpu_mask, and then they are never moved. This is equivalent to
the previous behavior of the SPDK event/application framework.
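
The placement rule can be sketched as follows. This is a simplified model and
not the framework's implementation: `distribute_static`, the bitmask
representation and the way the round-robin cursor advances past skipped
reactors are assumptions made for illustration.

```python
def distribute_static(thread_masks, reactors):
    """Place each thread once, round-robin, honoring its cpu_mask.

    thread_masks -- mapping of thread name -> integer cpu_mask (bit N = core N)
    reactors     -- list of reactor core ids in round-robin order
    """
    placement = {}
    cursor = 0  # index of the reactor to try first for the next thread
    for name, mask in thread_masks.items():
        for offset in range(len(reactors)):
            core = reactors[(cursor + offset) % len(reactors)]
            if mask & (1 << core):
                placement[name] = core
                cursor = (cursor + offset + 1) % len(reactors)
                break
        else:
            raise ValueError(f"no reactor matches the cpu_mask of {name}")
    return placement


threads = {"a": 0b111, "b": 0b111, "c": 0b001, "d": 0b111}
print(distribute_static(threads, [0, 1, 2]))
```

Thread `c` only accepts core 0, so the cursor skips core 2 for it. Once
placed, no thread is ever revisited, which is what makes the scheduler static.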

### dynamic

The `dynamic` scheduler is designed for power saving and reduction of CPU
utilization, especially in cases where workloads show large variations over
time.

Active threads are distributed equally among reactors, taking cpu_mask into
account. All idle threads are moved to the main core. Once an idle thread becomes
active, it is redistributed again.

When a reactor has no scheduled `spdk_thread`s it is switched into interrupt
mode and stops actively polling. After enough threads become active, the
reactor is switched back into poll mode and threads are assigned to it again.

The main core can contain active threads only when their execution time does
not exceed the summed execution time of all idle threads. When no active threads
are present on the main core, the frequency of that CPU core will decrease as
the load decreases. All CPU cores corresponding to the other reactors remain at
maximum frequency.
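
One scheduling pass under these rules might look like the sketch below. It is a
deliberately reduced model, not the scheduler's code: the `rebalance` function
and its data layout are hypothetical, and it leaves out both the execution-time
rule for the main core and CPU frequency scaling.

```python
def rebalance(threads, reactors, main_core=0):
    """Return (placement, modes) for one simplified scheduling pass.

    threads  -- mapping of thread name -> (is_active, cpu_mask)
    reactors -- list of reactor core ids
    """
    placement = {}
    active_count = {core: 0 for core in reactors}
    for name, (is_active, mask) in threads.items():
        if not is_active:
            # Idle threads are gathered on the main core.
            placement[name] = main_core
            continue
        allowed = [core for core in reactors if mask & (1 << core)]
        if not allowed:
            raise ValueError(f"no reactor matches the cpu_mask of {name}")
        # Spread active threads evenly: pick the allowed reactor with the
        # fewest active threads, lowest core id first on ties.
        core = min(allowed, key=lambda c: (active_count[c], c))
        active_count[core] += 1
        placement[name] = core
    occupied = set(placement.values())
    # Reactors left without any thread stop polling.
    modes = {c: "poll" if c in occupied else "interrupt" for c in reactors}
    return placement, modes


threads = {
    "t0": (True, 0b1111),
    "t1": (True, 0b1111),
    "t2": (False, 0b1111),
    "t3": (False, 0b1000),
}
placement, modes = rebalance(threads, [0, 1, 2, 3])
print(placement)
print(modes)
```

In this example the two idle threads land on core 0, the active ones occupy
cores 0 and 1, and reactors 2 and 3 end up in interrupt mode.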