doc: flatten Markdown docs into chapter-per-file
Doxygen interprets each Markdown input file as a separate section (chapter). Concatenate all of the .md files in directories into a single file per section to get a correctly-nested table of contents. In particular, this matters for the navigation in the PDF output.

Change-Id: I778849d89da9a308136e43ac6cb630c4c2bbb3a5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
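The commit does not say how the files were merged. The hypothetical Python helper below sketches one way to automate it, assuming the heading convention visible in the hunks that follow: the merged file starts with a chapter header, each former sub-page keeps its first heading, and every later heading in that sub-page is demoted one level so it nests under the sub-page title. Lines inside `~~~` or ``` fences are left alone. The names `demote_page` and `flatten_chapter` are illustrative and not part of SPDK; dropping the old `- @ref` index lists is left to the caller.

```python
import re

# An ATX heading of level 1-5, e.g. "# Title {#anchor}" or "## Setup".
# Level-6 headings are ignored because Markdown has no level 7 to demote them to.
HEADING = re.compile(r"^#{1,5} ")
# Opening or closing line of a ~~~ or ``` code fence.
FENCE = re.compile(r"^(~~~|```)")


def demote_page(text: str) -> str:
    """Keep a page's first heading; push every later heading down one level."""
    out = []
    seen_first = False
    in_fence = False
    for line in text.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence  # lines inside code blocks are never headings
        elif not in_fence and HEADING.match(line):
            if seen_first:
                line = "#" + line  # "# Foo" -> "## Foo", "## Bar" -> "### Bar"
            seen_first = True
        out.append(line)
    return "\n".join(out)


def flatten_chapter(header: str, pages: list[str]) -> str:
    """Join a chapter header and its demoted sub-pages, separated by blank lines."""
    parts = [header.rstrip("\n")] + [demote_page(p).rstrip("\n") for p in pages]
    return "\n\n".join(parts) + "\n"
```

Called as `flatten_chapter("# iSCSI Target {#iscsi}", [getting_started_text, hotplug_text])`, this would yield heading changes of the same shape as the iSCSI hunks (`# Prerequisites` becoming `## Prerequisites`, `## Setup` becoming `### Setup`).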
parent 388ca48336
commit 1a787169b2
doc/Doxyfile: 30 lines changed
@@ -784,27 +784,15 @@ INPUT = ../include/spdk \
 index.md \
 directory_structure.md \
 porting.md \
-bdev/index.md \
-bdev/getting_started.md \
-blob/index.md \
-blobfs/index.md \
-blobfs/getting_started.md \
-event/index.md \
-ioat/index.md \
-iscsi/index.md \
-iscsi/getting_started.md \
-iscsi/hotplug.md \
-nvme/index.md \
-nvme/async_completion.md \
-nvme/fabrics.md \
-nvme/initialization.md \
-nvme/hotplug.md \
-nvme/io_submission.md \
-nvme/multi_process.md \
-nvmf/index.md \
-nvmf/getting_started.md \
-vhost/index.md \
-vhost/getting_started.md
+blob.md \
+blobfs.md \
+bdev.md \
+event.md \
+ioat.md \
+iscsi.md \
+nvme.md \
+nvmf.md \
+vhost.md
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -1,3 +1,5 @@
+# Block Device Abstraction Layer {#bdev}
+
 # SPDK bdev Getting Started Guide {#bdev_getting_started}
 
 Block storage in SPDK applications is provided by the SPDK bdev layer. SPDK bdev consists of:
@@ -1,3 +0,0 @@
-# Block Device Abstraction Layer {#bdev}
-
-- @ref bdev_getting_started
@@ -1,3 +1,5 @@
+# BlobFS (Blobstore Filesystem) {#blobfs}
+
 # BlobFS Getting Started Guide {#blobfs_getting_started}
 
 # RocksDB Integration {#blobfs_rocksdb}
@@ -1,3 +0,0 @@
-# BlobFS (Blobstore Filesystem) {#blobfs}
-
-- @ref blobfs_getting_started
@@ -1,9 +1,11 @@
+# iSCSI Target {#iscsi}
+
 # Getting Started Guide {#iscsi_getting_started}
 
 The Intel(R) Storage Performance Development Kit iSCSI target application is named `iscsi_tgt`.
 This following section describes how to run iscsi from your cloned package.
 
-# Prerequisites {#iscsi_prereqs}
+## Prerequisites {#iscsi_prereqs}
 
 This guide starts by assuming that you can already build the standard SPDK distribution on your
 platform. The SPDK iSCSI target has been known to work on several Linux distributions, namely
@@ -11,7 +13,7 @@ Ubuntu 14.04, 15.04, and 15.10, Fedora 21, 22, and 23, and CentOS 7.
 
 Once built, the binary will be in `app/iscsi_tgt`.
 
-# Configuring iSCSI Target {#iscsi_config}
+## Configuring iSCSI Target {#iscsi_config}
 
 A `iscsi_tgt` specific configuration file is used to configure the iSCSI target. A fully documented
 example configuration file is located at `etc/spdk/iscsi.conf.in`.
@@ -43,7 +45,7 @@ the target requires elevated privileges (root) to run.
 app/iscsi_tgt/iscsi_tgt -c /path/to/iscsi.conf
 ~~~
 
-# Configuring iSCSI Initiator {#iscsi_initiator}
+## Configuring iSCSI Initiator {#iscsi_initiator}
 
 The Linux initiator is open-iscsi.
 
@@ -58,7 +60,7 @@ Ubuntu:
 apt-get install -y open-iscsi
 ~~~
 
-## Setup
+### Setup
 
 Edit /etc/iscsi/iscsid.conf
 ~~~
@@ -146,3 +148,26 @@ Increase requests for block queue
 ~~~
 echo "1024" > /sys/block/sdc/queue/nr_requests
 ~~~
+
+
+# iSCSI Hotplug {#iscsi_hotplug}
+
+At the iSCSI level, we provide the following support for Hotplug:
+
+1. bdev/nvme:
+At the bdev/nvme level, we start one hotplug monitor which will call
+spdk_nvme_probe() periodically to get the hotplug events. We provide the
+private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
+we will create the block device base on the NVMe device attached, and for the
+remove_cb, we will unregister the block device, which will also notify the
+upper level stack (for iSCSI target, the upper level stack is scsi/lun) to
+handle the hot-remove event.
+
+2. scsi/lun:
+When the LUN receive the hot-remove notification from block device layer,
+the LUN will be marked as removed, and all the IOs after this point will
+return with check condition status. Then the LUN starts one poller which will
+wait for all the commands which have already been submitted to block device to
+return back; after all the commands return back, the LUN will be deleted.
+
+@sa spdk_nvme_probe
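The scsi/lun hot-remove sequence in the iSCSI Hotplug text can be illustrated with a toy Python model: after the hot-remove notification, new I/O fails immediately with a check condition, and a poller deletes the LUN only once every command already handed to the block device has returned. The class, method, and status names are hypothetical stand-ins, not SPDK's scsi/lun API.

```python
GOOD = "GOOD"
CHECK_CONDITION = "CHECK_CONDITION"  # stand-in for the SCSI status, not an SPDK constant


class Lun:
    """Toy model of the scsi/lun hot-remove steps; all names are hypothetical."""

    def __init__(self) -> None:
        self.removed = False
        self.outstanding = 0  # commands already handed to the block device
        self.deleted = False

    def submit(self):
        """Start a command; returns CHECK_CONDITION at once if the LUN was removed."""
        if self.removed:
            return CHECK_CONDITION
        self.outstanding += 1
        return None  # the real status arrives later, via complete()

    def complete(self) -> str:
        """The block device returns one previously submitted command."""
        self.outstanding -= 1
        return GOOD

    def hot_remove(self) -> None:
        """Hot-remove notification from the block device layer."""
        self.removed = True

    def poll(self) -> bool:
        """Poller run after hot-remove: delete the LUN once nothing is in flight."""
        if self.removed and self.outstanding == 0:
            self.deleted = True
        return self.deleted
```

The point the model captures is ordering: marking the LUN removed stops new work right away, while deletion waits for in-flight commands so none of them completes against a freed LUN.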
@@ -1,21 +0,0 @@
-# iSCSI Hotplug {#iscsi_hotplug}
-
-At the iSCSI level, we provide the following support for Hotplug:
-
-1. bdev/nvme:
-At the bdev/nvme level, we start one hotplug monitor which will call
-spdk_nvme_probe() periodically to get the hotplug events. We provide the
-private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
-we will create the block device base on the NVMe device attached, and for the
-remove_cb, we will unregister the block device, which will also notify the
-upper level stack (for iSCSI target, the upper level stack is scsi/lun) to
-handle the hot-remove event.
-
-2. scsi/lun:
-When the LUN receive the hot-remove notification from block device layer,
-the LUN will be marked as removed, and all the IOs after this point will
-return with check condition status. Then the LUN starts one poller which will
-wait for all the commands which have already been submitted to block device to
-return back; after all the commands return back, the LUN will be deleted.
-
-@sa spdk_nvme_probe
@@ -1,4 +0,0 @@
-# iSCSI Target {#iscsi}
-
-- @ref iscsi_getting_started
-- @ref iscsi_hotplug
doc/nvme.md: 191 lines changed
@@ -0,0 +1,191 @@
+# NVMe Driver {#nvme}
+
+# Public Interface {#nvme_interface}
+
+- spdk/nvme.h
+
+# Key Functions {#nvme_key_functions}
+
+Function | Description
+------------------------------------------- | -----------
+spdk_nvme_probe() | @copybrief spdk_nvme_probe()
+spdk_nvme_ns_cmd_read() | @copybrief spdk_nvme_ns_cmd_read()
+spdk_nvme_ns_cmd_write() | @copybrief spdk_nvme_ns_cmd_write()
+spdk_nvme_ns_cmd_dataset_management() | @copybrief spdk_nvme_ns_cmd_dataset_management()
+spdk_nvme_ns_cmd_flush() | @copybrief spdk_nvme_ns_cmd_flush()
+spdk_nvme_qpair_process_completions() | @copybrief spdk_nvme_qpair_process_completions()
+spdk_nvme_ctrlr_cmd_admin_raw() | @copybrief spdk_nvme_ctrlr_cmd_admin_raw()
+spdk_nvme_ctrlr_process_admin_completions() | @copybrief spdk_nvme_ctrlr_process_admin_completions()
+
+
+# NVMe Initialization {#nvme_initialization}
+
+\msc
+
+app [label="Application"], nvme [label="NVMe Driver"];
+app=>nvme [label="nvme_probe()"];
+app<<nvme [label="probe_cb(pci_dev)"];
+nvme=>nvme [label="nvme_attach(devhandle)"];
+nvme=>nvme [label="nvme_ctrlr_start(nvme_controller ptr)"];
+nvme=>nvme [label="identify controller"];
+nvme=>nvme [label="create queue pairs"];
+nvme=>nvme [label="identify namespace(s)"];
+app<<nvme [label="attach_cb(pci_dev, nvme_controller)"];
+app=>app [label="create block devices based on controller's namespaces"];
+
+\endmsc
+
+
+# NVMe I/O Submission {#nvme_io_submission}
+
+I/O is submitted to an NVMe namespace using nvme_ns_cmd_xxx functions
+defined in nvme_ns_cmd.c. The NVMe driver submits the I/O request
+as an NVMe submission queue entry on the queue pair specified in the command.
+The application must poll for I/O completion on each queue pair with outstanding I/O
+to receive completion callbacks.
+
+@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
+spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
+
+
+# NVMe Asynchronous Completion {#nvme_async_completion}
+
+The userspace NVMe driver follows an asynchronous polled model for
+I/O completion.
+
+## I/O commands {#nvme_async_io}
+
+The application may submit I/O from one or more threads on one or more queue pairs
+and must call spdk_nvme_qpair_process_completions()
+for each queue pair that submitted I/O.
+
+When the application calls spdk_nvme_qpair_process_completions(),
+if the NVMe driver detects completed I/Os that were submitted on that queue,
+it will invoke the registered callback function
+for each I/O within the context of spdk_nvme_qpair_process_completions().
+
+## Admin commands {#nvme_async_admin}
+
+The application may submit admin commands from one or more threads
+and must call spdk_nvme_ctrlr_process_admin_completions()
+from at least one thread to receive admin command completions.
+The thread that processes admin completions need not be the same thread that submitted the
+admin commands.
+
+When the application calls spdk_nvme_ctrlr_process_admin_completions(),
+if the NVMe driver detects completed admin commands submitted from any thread,
+it will invote the registered callback function
+for each command within the context of spdk_nvme_ctrlr_process_admin_completions().
+
+It is the application's responsibility to manage the order of submitted admin commands.
+If certain admin commands must be submitted while no other commands are outstanding,
+it is the application's responsibility to enforce this rule
+using its own synchronization method.
+
+
+# NVMe over Fabrics Host Support {#nvme_fabrics_host}
+
+The NVMe driver supports connecting to remote NVMe-oF targets and
+interacting with them in the same manner as local NVMe controllers.
+
+## Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}
+
+The method for connecting to a remote NVMe-oF target is very similar
+to the normal enumeration process for local PCIe-attached NVMe devices.
+To connect to a remote NVMe over Fabrics subsystem, the user may call
+spdk_nvme_probe() with the `trid` parameter specifying the address of
+the NVMe-oF target.
+The caller may fill out the spdk_nvme_transport_id structure manually
+or use the spdk_nvme_transport_id_parse() function to convert a
+human-readable string representation into the required structure.
+
+The spdk_nvme_transport_id may contain the address of a discovery service
+or a single NVM subsystem. If a discovery service address is specified,
+the NVMe library will call the spdk_nvme_probe() `probe_cb` for each
+discovered NVM subsystem, which allows the user to select the desired
+subsystems to be attached. Alternatively, if the address specifies a
+single NVM subsystem directly, the NVMe library will call `probe_cb`
+for just that subsystem; this allows the user to skip the discovery step
+and connect directly to a subsystem with a known address.
+
+
+# NVMe Multi Process {#nvme_multi_process}
+
+This capability enables the SPDK NVMe driver to support multiple processes accessing the
+same NVMe device. The NVMe driver allocates critical structures from shared memory, so
+that each process can map that memory and create its own queue pairs or share the admin
+queue. There is a limited number of I/O queue pairs per NVMe controller.
+
+The primary motivation for this feature is to support management tools that can attach
+to long running applications, perform some maintenance work or gather information, and
+then detach.
+
+## Configuration {#nvme_multi_process_configuration}
+
+DPDK EAL allows different types of processes to be spawned, each with different permissions
+on the hugepage memory used by the applications.
+
+There are two types of processes:
+1. a primary process which initializes the shared memory and has full privileges and
+2. a secondary process which can attach to the primary process by mapping its shared memory
+regions and perform NVMe operations including creating queue pairs.
+
+This feature is enabled by default and is controlled by selecting a value for the shared
+memory group ID. This ID is a positive integer and two applications with the same shared
+memory group ID will share memory. The first application with a given shared memory group
+ID will be considered the primary and all others secondary.
+
+Example: identical shm_id and non-overlapping core masks
+~~~{.sh}
+./perf options [AIO device(s)]...
+[-c core mask for I/O submission/completion]
+[-i shared memory group ID]
+
+./perf -q 1 -s 4096 -w randread -c 0x1 -t 60 -i 1
+./perf -q 8 -s 131072 -w write -c 0x10 -t 60 -i 1
+~~~
+
+## Scalability and Performance {#nvme_multi_process_scalability_performance}
+
+To maximize the I/O bandwidth of an NVMe device, ensure that each application has its own
+queue pairs.
+
+The optimal threading model for SPDK is one thread per core, regardless of which processes
+that thread belongs to in the case of multi-process environment. To achieve maximum
+performance, each thread should also have its own I/O queue pair. Applications that share
+memory should be given core masks that do not overlap.
+
+However, admin commands may have some performance impact as there is only one admin queue
+pair per NVMe SSD. The NVMe driver will automatically take a cross-process capable lock
+to enable the sharing of admin queue pair. Further, when each process polls the admin
+queue for completions, it will only see completions for commands that it originated.
+
+## Limitations {#nvme_multi_process_limitations}
+
+1. Two processes sharing memory may not share any cores in their core mask.
+2. If a primary process exits while secondary processes are still running, those processes
+will continue to run. However, a new primary process cannot be created.
+3. Applications are responsible for coordinating access to logical blocks.
+
+@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
+
+
+# NVMe Hotplug {#nvme_hotplug}
+
+At the NVMe driver level, we provide the following support for Hotplug:
+
+1. Hotplug events detection:
+The user of the NVMe library can call spdk_nvme_probe() periodically to detect
+hotplug events. The probe_cb, followed by the attach_cb, will be called for each
+new device detected. The user may optionally also provide a remove_cb that will be
+called if a previously attached NVMe device is no longer present on the system.
+All subsequent I/O to the removed device will return an error.
+
+2. Hot remove NVMe with IO loads:
+When a device is hot removed while I/O is occurring, all access to the PCI BAR will
+result in a SIGBUS error. The NVMe driver automatically handles this case by installing
+a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location.
+This means I/O in flight during a hot remove will complete with an appropriate error
+code and will not crash the application.
+
+@sa spdk_nvme_probe
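The asynchronous polled model described in the new nvme.md (submission never invokes a callback; callbacks run only inside the application's own call to the completion routine for that queue pair) can be illustrated with a toy Python model. This is not SPDK's implementation and every name in it is hypothetical; only the `max_completions` semantics (0 meaning no limit) are meant to echo the argument of `spdk_nvme_qpair_process_completions()`.

```python
from collections import deque


class QueuePair:
    """Toy model (not SPDK) of polled I/O completion on one queue pair."""

    def __init__(self) -> None:
        self._submitted = deque()  # (callback, arg) pairs accepted by the "device"
        self._completed = deque()  # finished, but not yet reported to the application

    def submit(self, callback, arg) -> None:
        # Submission only queues the request; no callback runs here.
        self._submitted.append((callback, arg))

    def device_finishes(self, n: int) -> None:
        # Stand-in for the hardware finishing the oldest n requests.
        for _ in range(min(n, len(self._submitted))):
            self._completed.append(self._submitted.popleft())

    def process_completions(self, max_completions: int = 0) -> int:
        # Callbacks run only here, on the caller's thread.
        # max_completions == 0 means "no limit"; returns how many were processed.
        count = 0
        while self._completed and (max_completions == 0 or count < max_completions):
            callback, arg = self._completed.popleft()
            callback(arg)
            count += 1
        return count
```

In this model, a finished request stays invisible to the application until it polls, which is why the text requires polling every queue pair that has outstanding I/O.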
@@ -1,33 +0,0 @@
-# NVMe Asynchronous Completion {#nvme_async_completion}
-
-The userspace NVMe driver follows an asynchronous polled model for
-I/O completion.
-
-# I/O commands {#nvme_async_io}
-
-The application may submit I/O from one or more threads on one or more queue pairs
-and must call spdk_nvme_qpair_process_completions()
-for each queue pair that submitted I/O.
-
-When the application calls spdk_nvme_qpair_process_completions(),
-if the NVMe driver detects completed I/Os that were submitted on that queue,
-it will invoke the registered callback function
-for each I/O within the context of spdk_nvme_qpair_process_completions().
-
-# Admin commands {#nvme_async_admin}
-
-The application may submit admin commands from one or more threads
-and must call spdk_nvme_ctrlr_process_admin_completions()
-from at least one thread to receive admin command completions.
-The thread that processes admin completions need not be the same thread that submitted the
-admin commands.
-
-When the application calls spdk_nvme_ctrlr_process_admin_completions(),
-if the NVMe driver detects completed admin commands submitted from any thread,
-it will invote the registered callback function
-for each command within the context of spdk_nvme_ctrlr_process_admin_completions().
-
-It is the application's responsibility to manage the order of submitted admin commands.
-If certain admin commands must be submitted while no other commands are outstanding,
-it is the application's responsibility to enforce this rule
-using its own synchronization method.
@@ -1,24 +0,0 @@
-# NVMe over Fabrics Host Support {#nvme_fabrics_host}
-
-The NVMe driver supports connecting to remote NVMe-oF targets and
-interacting with them in the same manner as local NVMe controllers.
-
-# Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}
-
-The method for connecting to a remote NVMe-oF target is very similar
-to the normal enumeration process for local PCIe-attached NVMe devices.
-To connect to a remote NVMe over Fabrics subsystem, the user may call
-spdk_nvme_probe() with the `trid` parameter specifying the address of
-the NVMe-oF target.
-The caller may fill out the spdk_nvme_transport_id structure manually
-or use the spdk_nvme_transport_id_parse() function to convert a
-human-readable string representation into the required structure.
-
-The spdk_nvme_transport_id may contain the address of a discovery service
-or a single NVM subsystem. If a discovery service address is specified,
-the NVMe library will call the spdk_nvme_probe() `probe_cb` for each
-discovered NVM subsystem, which allows the user to select the desired
-subsystems to be attached. Alternatively, if the address specifies a
-single NVM subsystem directly, the NVMe library will call `probe_cb`
-for just that subsystem; this allows the user to skip the discovery step
-and connect directly to a subsystem with a known address.
@@ -1,19 +0,0 @@
-# NVMe Hotplug {#nvme_hotplug}
-
-At the NVMe driver level, we provide the following support for Hotplug:
-
-1. Hotplug events detection:
-The user of the NVMe library can call spdk_nvme_probe() periodically to detect
-hotplug events. The probe_cb, followed by the attach_cb, will be called for each
-new device detected. The user may optionally also provide a remove_cb that will be
-called if a previously attached NVMe device is no longer present on the system.
-All subsequent I/O to the removed device will return an error.
-
-2. Hot remove NVMe with IO loads:
-When a device is hot removed while I/O is occurring, all access to the PCI BAR will
-result in a SIGBUS error. The NVMe driver automatically handles this case by installing
-a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location.
-This means I/O in flight during a hot remove will complete with an appropriate error
-code and will not crash the application.
-
-@sa spdk_nvme_probe
@@ -1,27 +0,0 @@
-# NVMe Driver {#nvme}
-
-# Public Interface {#nvme_interface}
-
-- spdk/nvme.h
-
-# Key Functions {#nvme_key_functions}
-
-Function | Description
-------------------------------------------- | -----------
-spdk_nvme_probe() | @copybrief spdk_nvme_probe()
-spdk_nvme_ns_cmd_read() | @copybrief spdk_nvme_ns_cmd_read()
-spdk_nvme_ns_cmd_write() | @copybrief spdk_nvme_ns_cmd_write()
-spdk_nvme_ns_cmd_dataset_management() | @copybrief spdk_nvme_ns_cmd_dataset_management()
-spdk_nvme_ns_cmd_flush() | @copybrief spdk_nvme_ns_cmd_flush()
-spdk_nvme_qpair_process_completions() | @copybrief spdk_nvme_qpair_process_completions()
-spdk_nvme_ctrlr_cmd_admin_raw() | @copybrief spdk_nvme_ctrlr_cmd_admin_raw()
-spdk_nvme_ctrlr_process_admin_completions() | @copybrief spdk_nvme_ctrlr_process_admin_completions()
-
-# Key Concepts {#nvme_key_concepts}
-
-- @ref nvme_initialization
-- @ref nvme_io_submission
-- @ref nvme_async_completion
-- @ref nvme_fabrics_host
-- @ref nvme_multi_process
-- @ref nvme_hotplug
@@ -1,16 +0,0 @@
-# NVMe Initialization {#nvme_initialization}
-
-\msc
-
-app [label="Application"], nvme [label="NVMe Driver"];
-app=>nvme [label="nvme_probe()"];
-app<<nvme [label="probe_cb(pci_dev)"];
-nvme=>nvme [label="nvme_attach(devhandle)"];
-nvme=>nvme [label="nvme_ctrlr_start(nvme_controller ptr)"];
-nvme=>nvme [label="identify controller"];
-nvme=>nvme [label="create queue pairs"];
-nvme=>nvme [label="identify namespace(s)"];
-app<<nvme [label="attach_cb(pci_dev, nvme_controller)"];
-app=>app [label="create block devices based on controller's namespaces"];
-
-\endmsc
@@ -1,10 +0,0 @@
-# NVMe I/O Submission {#nvme_io_submission}
-
-I/O is submitted to an NVMe namespace using nvme_ns_cmd_xxx functions
-defined in nvme_ns_cmd.c. The NVMe driver submits the I/O request
-as an NVMe submission queue entry on the queue pair specified in the command.
-The application must poll for I/O completion on each queue pair with outstanding I/O
-to receive completion callbacks.
-
-@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
-spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
@@ -1,59 +0,0 @@
-# NVMe Multi Process {#nvme_multi_process}
-
-This capability enables the SPDK NVMe driver to support multiple processes accessing the
-same NVMe device. The NVMe driver allocates critical structures from shared memory, so
-that each process can map that memory and create its own queue pairs or share the admin
-queue. There is a limited number of I/O queue pairs per NVMe controller.
-
-The primary motivation for this feature is to support management tools that can attach
-to long running applications, perform some maintenance work or gather information, and
-then detach.
-
-# Configuration {#nvme_multi_process_configuration}
-
-DPDK EAL allows different types of processes to be spawned, each with different permissions
-on the hugepage memory used by the applications.
-
-There are two types of processes:
-1. a primary process which initializes the shared memory and has full privileges and
-2. a secondary process which can attach to the primary process by mapping its shared memory
-regions and perform NVMe operations including creating queue pairs.
-
-This feature is enabled by default and is controlled by selecting a value for the shared
-memory group ID. This ID is a positive integer and two applications with the same shared
-memory group ID will share memory. The first application with a given shared memory group
-ID will be considered the primary and all others secondary.
-
-Example: identical shm_id and non-overlapping core masks
-~~~{.sh}
-./perf options [AIO device(s)]...
-[-c core mask for I/O submission/completion]
-[-i shared memory group ID]
-
-./perf -q 1 -s 4096 -w randread -c 0x1 -t 60 -i 1
-./perf -q 8 -s 131072 -w write -c 0x10 -t 60 -i 1
-~~~
-
-# Scalability and Performance {#nvme_multi_process_scalability_performance}
-
-To maximize the I/O bandwidth of an NVMe device, ensure that each application has its own
-queue pairs.
-
-The optimal threading model for SPDK is one thread per core, regardless of which processes
-that thread belongs to in the case of multi-process environment. To achieve maximum
-performance, each thread should also have its own I/O queue pair. Applications that share
-memory should be given core masks that do not overlap.
-
-However, admin commands may have some performance impact as there is only one admin queue
-pair per NVMe SSD. The NVMe driver will automatically take a cross-process capable lock
-to enable the sharing of admin queue pair. Further, when each process polls the admin
-queue for completions, it will only see completions for commands that it originated.
-
-# Limitations {#nvme_multi_process_limitations}
-
-1. Two processes sharing memory may not share any cores in their core mask.
-2. If a primary process exits while secondary processes are still running, those processes
-will continue to run. However, a new primary process cannot be created.
-3. Applications are responsible for coordinating access to logical blocks.
-
-@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
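The first multi-process limitation (processes in one shared memory group may not share cores) reduces to a bitwise AND of their `-c` core masks. A small Python check follows, assuming the usual DPDK convention that bit i of the mask selects core i; the helper names are hypothetical and not part of `perf` or SPDK.

```python
def cores(mask: int) -> list[int]:
    """Core indices selected by a core mask such as 0x10 (bit i selects core i)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def masks_disjoint(mask_a: int, mask_b: int) -> bool:
    """True if two processes in the same shared memory group share no cores."""
    return (mask_a & mask_b) == 0
```

For the two `perf` invocations in the multi-process example, `0x1` selects core 0 and `0x10` selects core 4, so the masks are disjoint; a pair such as `0x3` and `0x2` would both claim core 1 and violate the limitation.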
@@ -1,3 +1,8 @@
+# NVMe over Fabrics Target {#nvmf}
+
+@sa @ref nvme_fabrics_host
+
+
 # Getting Started Guide {#nvmf_getting_started}
 
 The NVMe over Fabrics target is a user space application that presents block devices over the
@@ -18,7 +23,7 @@ machine, the kernel will need to be a release candidate until the code is actual
 system running the SPDK target, however, you can run any modern flavor of Linux as required by your
 NIC vendor's OFED distribution.
 
-# Prerequisites {#nvmf_prereqs}
+## Prerequisites {#nvmf_prereqs}
 
 This guide starts by assuming that you can already build the standard SPDK distribution on your
 platform. By default, the NVMe over Fabrics target is not built. To build nvmf_tgt there are some
@@ -43,7 +48,7 @@ make CONFIG_RDMA=y <other config parameters>
 
 Once built, the binary will be in `app/nvmf_tgt`.
 
-# Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}
+## Prerequisites for InfiniBand/RDMA Verbs {#nvmf_prereqs_verbs}
 
 Before starting our NVMe-oF target we must load the InfiniBand and RDMA modules that allow
 userspace processes to use InfiniBand/RDMA verbs directly.
@@ -59,12 +64,10 @@ modprobe rdma_cm
 modprobe rdma_ucm
 ~~~
 
-# Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}
+## Prerequisites for RDMA NICs {#nvmf_prereqs_rdma_nics}
 
 Before starting our NVMe-oF target we must detect RDMA NICs and assign them IP addresses.
 
-## Detecting Mellannox RDMA NICs
-
 ### Mellanox ConnectX-3 RDMA NICs
 
 ~~~{.sh}
@@ -80,7 +83,7 @@ modprobe mlx5_core
 modprobe mlx5_ib
 ~~~
 
-## Assigning IP addresses to RDMA NICs
+### Assigning IP addresses to RDMA NICs
 
 ~~~{.sh}
 ifconfig eth1 192.168.100.8 netmask 255.255.255.0 up
@@ -88,7 +91,7 @@ ifconfig eth2 192.168.100.9 netmask 255.255.255.0 up
 ~~~
 
 
-# Configuring NVMe over Fabrics Target {#nvmf_config}
+## Configuring NVMe over Fabrics Target {#nvmf_config}
 
 A `nvmf_tgt`-specific configuration file is used to configure the NVMe over Fabrics target. This
 file's primary purpose is to define subsystems. A fully documented example configuration file is
@@ -102,7 +105,7 @@ the target requires elevated privileges (root) to run.
 app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf
 ~~~
 
-# Configuring NVMe over Fabrics Host {#nvmf_host}
+## Configuring NVMe over Fabrics Host {#nvmf_host}
 
 Both the Linux kernel and SPDK implemented NVMe over Fabrics host. Users who want to test
 `nvmf_tgt` with kernel based host should upgrade to Linux kernel 4.8 or later, or can use
@@ -125,7 +128,7 @@ Disconnect:
 nvme disconnect -n "nqn.2016-06.io.spdk.cnode1"
 ~~~
 
-# Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}
+## Assigning CPU Cores to the NVMe over Fabrics Target {#nvmf_config_lcore}
 
 SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
 to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
@@ -166,7 +169,7 @@ on different threads. SPDK gives the user maximum control to determine how many
 to execute subsystems. Configuring different subsystems to execute on different CPU cores prevents
 the subsystem data from being evicted from limited CPU cache space.
 
-# Emulating an NVMe controller {#nvmf_config_virtual_controller}
+## Emulating an NVMe controller {#nvmf_config_virtual_controller}
 
 The SPDK NVMe-oF target provides the capability to emulate an NVMe controller using a virtual
 controller. Using virtual controllers allows storage software developers to run the NVMe-oF target
@@ -1,5 +0,0 @@
-# NVMe over Fabrics {#nvmf}
-
-- @ref nvmf_getting_started
-
-@sa @ref nvme_fabrics_host
@@ -1,3 +1,5 @@
+# vhost {#vhost}
+
 # vhost Getting Started Guide {#vhost_getting_started}
 
 The Storage Performance Development Kit vhost application is named "vhost".
@@ -1,3 +0,0 @@
-# vhost {#vhost}
-
-- @ref vhost_getting_started