doc: Fix Markdown MD032 linter warnings
"MD032 Lists should be surrounded by blank lines" Fix this markdown linter error by inserting newlines or adjusting text to list points using spaces. Signed-off-by: Karol Latecki <karol.latecki@intel.com> Change-Id: I09e1f021b8e95e0c6c58c393d7ecc11ce61c3132 Signed-off-by: Karol Latecki <karol.latecki@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/434 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Maciej Wawryk <maciejx.wawryk@intel.com>
CHANGELOG.md
@@ -227,11 +227,13 @@ Added `spdk_bdev_get_write_unit_size()` function for retrieving required number
of logical blocks for write operation.

New zone-related fields were added to the result of the `get_bdevs` RPC call:

- `zoned`: indicates whether the device is zoned or a regular
  block device
- `zone_size`: number of blocks in a single zone
- `max_open_zones`: maximum number of open zones
- `optimal_open_zones`: optimal number of open zones

The `zoned` field is a boolean and is always present, while the rest is only available for zoned
bdevs.

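The same zone information is also exposed through the bdev C API. The sketch below is purely illustrative (it is not part of this diff) and assumes the caller already knows a registered bdev name; the getter names are taken from the bdev header, so treat the exact usage as an assumption rather than the documented method:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/bdev.h"

/* Print the zone-related properties that get_bdevs also reports. */
static void
print_zone_info(const char *bdev_name)
{
    struct spdk_bdev *bdev = spdk_bdev_get_by_name(bdev_name);

    if (bdev == NULL) {
        return;
    }

    printf("zoned: %s\n", spdk_bdev_is_zoned(bdev) ? "true" : "false");
    if (spdk_bdev_is_zoned(bdev)) {
        printf("zone_size: %" PRIu64 "\n", spdk_bdev_get_zone_size(bdev));
        printf("max_open_zones: %" PRIu32 "\n", spdk_bdev_get_max_open_zones(bdev));
        printf("optimal_open_zones: %" PRIu32 "\n", spdk_bdev_get_optimal_open_zones(bdev));
    }
    printf("write_unit_size: %" PRIu32 "\n", spdk_bdev_get_write_unit_size(bdev));
}
~~~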
@@ -949,6 +951,7 @@ parameter. The function will now update that parameter with the largest possible
for which the memory is contiguous in the physical memory address space.

The following functions were removed:

- spdk_pci_nvme_device_attach()
- spdk_pci_nvme_enumerate()
- spdk_pci_ioat_device_attach()
@@ -958,6 +961,7 @@ The following functions were removed:

They were replaced with generic spdk_pci_device_attach() and spdk_pci_enumerate() which
require a new spdk_pci_driver object to be provided. It can be one of the following:

- spdk_pci_nvme_get_driver()
- spdk_pci_ioat_get_driver()
- spdk_pci_virtio_get_driver()

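As a rough, illustrative sketch of the replacement API (not part of this diff), the generic enumerate call takes one of the driver objects above plus a per-device callback; the callback body and printed output here are assumptions:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/env.h"

/* Called once for each NVMe device found on the PCI bus. */
static int
enum_cb(void *ctx, struct spdk_pci_device *dev)
{
    struct spdk_pci_addr addr = spdk_pci_device_get_addr(dev);
    char bdf[32];

    spdk_pci_addr_fmt(bdf, sizeof(bdf), &addr);
    printf("found NVMe device at %s\n", bdf);
    return 0; /* returning non-zero stops the enumeration */
}

static void
list_nvme_pci_devices(void)
{
    /* The driver object replaces the old spdk_pci_nvme_enumerate() entry point. */
    spdk_pci_enumerate(spdk_pci_nvme_get_driver(), enum_cb, NULL);
}
~~~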
@@ -1138,6 +1142,7 @@ Dropped support for DPDK 16.07 and earlier, which SPDK won't even compile with r
### RPC

The following RPC commands deprecated in the previous release are now removed:

- construct_virtio_user_scsi_bdev
- construct_virtio_pci_scsi_bdev
- construct_virtio_user_blk_bdev
@@ -1326,6 +1331,7 @@ respectively.
### Virtio

The following RPC commands have been deprecated:

- construct_virtio_user_scsi_bdev
- construct_virtio_pci_scsi_bdev
- construct_virtio_user_blk_bdev
@@ -1346,6 +1352,7 @@ spdk_file_get_id() returning unique ID for the file was added.
Added jsonrpc-client C library intended for issuing RPC commands from applications.

Added API enabling iteration over JSON object:

- spdk_json_find()
- spdk_json_find_string()
- spdk_json_find_array()

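A hedged sketch of how one of these lookup helpers might be used is shown below; it is not part of this diff, and the exact signatures (object value, key name, output pointers for key and value) are assumptions inferred from the function names, so treat it as illustrative only:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/json.h"

/* Look up the "name" member of an already-parsed JSON object and print it.
 * `obj` is assumed to point at a decoded object value, e.g. the result of
 * spdk_json_parse(). */
static void
print_name_member(struct spdk_json_val *obj)
{
    struct spdk_json_val *key, *val;
    char *name;

    if (spdk_json_find_string(obj, "name", &key, &val) == 0) {
        name = spdk_json_strdup(val);
        printf("name = %s\n", name);
        free(name);
    }
}
~~~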
@@ -1785,6 +1792,7 @@ write commands.

New API functions that accept I/O parameters in units of blocks instead of bytes
have been added:

- spdk_bdev_read_blocks(), spdk_bdev_readv_blocks()
- spdk_bdev_write_blocks(), spdk_bdev_writev_blocks()
- spdk_bdev_write_zeroes_blocks()

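A minimal, illustrative sketch of the block-based read path follows (not part of this diff); it assumes a bdev descriptor and I/O channel obtained elsewhere, e.g. from spdk_bdev_open() and spdk_bdev_get_io_channel():

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/bdev.h"

static void
read_done(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
    printf("read %s\n", success ? "completed" : "failed");
    spdk_bdev_free_io(bdev_io);
}

/* Offsets and lengths are expressed in logical blocks, not bytes. */
static int
read_some_blocks(struct spdk_bdev_desc *desc, struct spdk_io_channel *ch,
                 void *buf, uint64_t offset_blocks, uint64_t num_blocks)
{
    return spdk_bdev_read_blocks(desc, ch, buf, offset_blocks, num_blocks,
                                 read_done, NULL);
}
~~~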
@@ -1965,6 +1973,7 @@ current set of functions.
Support for SPDK performance analysis has been added to Intel® VTune™ Amplifier 2018.

This analysis provides:

- I/O performance monitoring (calculating standard I/O metrics like IOPS, throughput, etc.)
- Tuning insights on the interplay of I/O and compute devices by estimating how many cores
  would be reasonable to provide for SPDK to keep up with a current storage workload.
@@ -2115,6 +2124,7 @@ NVMe devices over a network using the iSCSI protocol. The application is located
in app/iscsi_tgt and a documented configuration file can be found at etc/spdk/spdk.conf.in.

This release also significantly improves the existing NVMe over Fabrics target.

- The configuration file format was changed, which will require updates to
  any existing nvmf.conf files (see `etc/spdk/nvmf.conf.in`):
  - `SubsystemGroup` was renamed to `Subsystem`.
@@ -2135,6 +2145,7 @@ This release also significantly improves the existing NVMe over Fabrics target.

This release also adds one new feature and provides some better examples and tools
for the NVMe driver.

- The Weighted Round Robin arbitration method is now supported. This allows
  the user to specify different priorities on a per-I/O-queue basis. To
  enable WRR, set the `arb_mechanism` field during `spdk_nvme_probe()`.

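As a hedged illustration (not part of this diff), the `arb_mechanism` field can be set from the probe callback before the controller is attached; the enum value below is assumed from the NVMe arbitration definitions in the SPDK headers:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/nvme.h"

/* probe_cb runs before the controller is attached; requesting weighted
 * round robin here only takes effect if the controller supports it. */
static bool
wrr_probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
             struct spdk_nvme_ctrlr_opts *opts)
{
    opts->arb_mechanism = SPDK_NVME_CC_AMS_WRR;
    return true; /* attach to this controller; pass this callback to spdk_nvme_probe() */
}
~~~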
@@ -2215,6 +2226,7 @@ This release adds a user-space driver with support for the Intel I/O Acceleratio
This is the initial open source release of the Storage Performance Development Kit (SPDK).

Features:

- NVMe user-space driver
- NVMe example programs
- `examples/nvme/perf` tests performance (IOPS) using the NVMe user-space driver

@@ -10,6 +10,7 @@ interrupts, which avoids kernel context switches and eliminates interrupt
handling overhead.

The development kit currently includes:

* [NVMe driver](http://www.spdk.io/doc/nvme.html)
* [I/OAT (DMA engine) driver](http://www.spdk.io/doc/ioat.html)
* [NVMe over Fabrics target](http://www.spdk.io/doc/nvmf.html)
@@ -172,6 +173,7 @@ of the SPDK static ones.

In order to start a SPDK app linked with SPDK shared libraries, make sure
to do the following steps:

- run ldconfig specifying the directory containing SPDK shared libraries
- provide proper `LD_LIBRARY_PATH`

@@ -189,8 +189,8 @@ time the SPDK virtual bdev module supports cipher only as follows:

- AESN-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
  (Note: QAT is functional however is marked as experimental until the hardware has
  been fully integrated with the SPDK CI system.)

In order to support using the bdev block offset (LBA) as the initialization vector (IV),
the crypto module breaks up all I/O into crypto operations of a size equal to the block

doc/blob.md
@@ -40,22 +40,22 @@ NAND too.
The Blobstore defines a hierarchy of storage abstractions as follows.

* **Logical Block**: Logical blocks are exposed by the disk itself, which are numbered from 0 to N, where N is the
  number of blocks in the disk. A logical block is typically either 512B or 4KiB.
* **Page**: A page is defined to be a fixed number of logical blocks defined at Blobstore creation time. The logical
  blocks that compose a page are always contiguous. Pages are also numbered from the beginning of the disk such
  that the first page worth of blocks is page 0, the second page is page 1, etc. A page is typically 4KiB in size,
  so this is either 8 or 1 logical blocks in practice. The SSD must be able to perform atomic reads and writes of
  at least the page size.
* **Cluster**: A cluster is a fixed number of pages defined at Blobstore creation time. The pages that compose a cluster
  are always contiguous. Clusters are also numbered from the beginning of the disk, where cluster 0 is the first cluster
  worth of pages, cluster 1 is the second grouping of pages, etc. A cluster is typically 1MiB in size, or 256 pages.
* **Blob**: A blob is an ordered list of clusters. Blobs are manipulated (created, sized, deleted, etc.) by the application
  and persist across power failures and reboots. Applications use a Blobstore provided identifier to access a particular blob.
  Blobs are read and written in units of pages by specifying an offset from the start of the blob. Applications can also
  store metadata in the form of key/value pairs with each blob which we'll refer to as xattrs (extended attributes).
* **Blobstore**: An SSD which has been initialized by a Blobstore-based application is referred to as "a Blobstore." A
  Blobstore owns the entire underlying device which is made up of a private Blobstore metadata region and the collection of
  blobs as managed by the application.

@htmlonly

@@ -115,19 +115,19 @@ For all Blobstore operations regarding atomicity, there is a dependency on the u
operations of at least one page size. Atomicity here can refer to multiple operations:

* **Data Writes**: For the case of data writes, the unit of atomicity is one page. Therefore if a write operation of
  greater than one page is underway and the system suffers a power failure, the data on media will be consistent at a page
  size granularity (if a single page were in the middle of being updated when power was lost, the data at that page location
  will be as it was prior to the start of the write operation following power restoration.)
* **Blob Metadata Updates**: Each blob has its own set of metadata (xattrs, size, etc). For performance reasons, a copy of
  this metadata is kept in RAM and only synchronized with the on-disk version when the application makes an explicit call to
  do so, or when the Blobstore is unloaded. Therefore, setting of an xattr, for example is not consistent until the call to
  synchronize it (covered later) which is, however, performed atomically.
* **Blobstore Metadata Updates**: Blobstore itself has its own metadata which, like per blob metadata, has a copy in both
  RAM and on-disk. Unlike the per blob metadata, however, the Blobstore metadata region is not made consistent via a blob
  synchronization call, it is only synchronized when the Blobstore is properly unloaded via API. Therefore, if the Blobstore
  metadata is updated (blob creation, deletion, resize, etc.) and not unloaded properly, it will need to perform some extra
  steps the next time it is loaded which will take a bit more time than it would have if shutdown cleanly, but there will be
  no inconsistencies.

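A minimal, illustrative sketch of the xattr-then-synchronize pattern described above (not part of this diff); it assumes a blob that was opened elsewhere, and the xattr name and value are placeholders:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/blob.h"

static void
sync_complete(void *cb_arg, int bserrno)
{
    /* Only after this callback fires is the xattr persisted on disk. */
    printf("metadata sync %s\n", bserrno == 0 ? "succeeded" : "failed");
}

/* Set an xattr in the in-memory copy of the blob's metadata, then ask
 * Blobstore to write that metadata out atomically. */
static void
set_and_persist_xattr(struct spdk_blob *blob)
{
    const char *val = "example";

    spdk_blob_set_xattr(blob, "my_xattr", val, (uint16_t)(strlen(val) + 1));
    spdk_blob_sync_md(blob, sync_complete, NULL);
}
~~~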
### Callbacks

@@ -183,22 +183,22 @@ When the Blobstore is initialized, there are multiple configuration options to c
options and their defaults are:

* **Cluster Size**: By default, this value is 1MB. The cluster size is required to be a multiple of page size and should be
  selected based on the application’s usage model in terms of allocation. Recall that blobs are made up of clusters so when
  a blob is allocated/deallocated or changes in size, disk LBAs will be manipulated in groups of cluster size. If the
  application is expecting to deal with mainly very large (always multiple GB) blobs then it may make sense to change the
  cluster size to 1GB for example.
* **Number of Metadata Pages**: By default, Blobstore will assume there can be as many clusters as there are metadata pages
  which is the worst case scenario in terms of metadata usage and can be overridden here however the space efficiency is
  not significant.
* **Maximum Simultaneous Metadata Operations**: Determines how many internally pre-allocated memory structures are set
  aside for performing metadata operations. It is unlikely that changes to this value (default 32) would be desirable.
* **Maximum Simultaneous Operations Per Channel**: Determines how many internally pre-allocated memory structures are set
  aside for channel operations. Changes to this value would be application dependent and best determined by both a knowledge
  of the typical usage model, an understanding of the types of SSDs being used and empirical data. The default is 512.
* **Blobstore Type**: This field is a character array to be used by applications that need to identify whether the
  Blobstore found here is appropriate to claim or not. The default is NULL and unless the application is being deployed in
  an environment where multiple applications using the same disks are at risk of inadvertently using the wrong Blobstore, there
  is no need to set this value. It can, however, be set to any valid set of characters.

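A hedged sketch of overriding a couple of these defaults at initialization time (not part of this diff); the opts field names and the one-argument `spdk_bs_opts_init()` form are assumptions based on the option names above, and `bs_dev` is assumed to come from a bdev (e.g. via spdk_bdev_create_bs_dev()):

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/blob.h"

static void
init_complete(void *cb_arg, struct spdk_blob_store *bs, int bserrno)
{
    /* bs is ready for blob creation when bserrno == 0 */
}

/* Override a few defaults discussed above before initializing the Blobstore. */
static void
init_blobstore(struct spdk_bs_dev *bs_dev)
{
    struct spdk_bs_opts opts;

    spdk_bs_opts_init(&opts);
    opts.cluster_sz = 4 * 1024 * 1024;   /* 4MiB clusters instead of the 1MiB default */
    opts.max_channel_ops = 1024;         /* per-channel operation structures */
    snprintf(opts.bstype.bstype, sizeof(opts.bstype.bstype), "%s", "my_app");

    spdk_bs_init(bs_dev, &opts, init_complete, NULL);
}
~~~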
### Sub-page Sized Operations

@@ -210,10 +210,11 @@ requires finer granularity it will have to accommodate that itself.
As mentioned earlier, Blobstore can share a single thread with an application or the application
can define any number of threads, within resource constraints, that makes sense. The basic considerations that must be
followed are:

* Metadata operations (API with MD in the name) should be isolated from each other as there is no internal locking on the
  memory structures affected by these API.
* Metadata operations should be isolated from conflicting IO operations (an example of a conflicting IO would be one that is
  reading/writing to an area of a blob that a metadata operation is deallocating).
* Asynchronous callbacks will always take place on the calling thread.
* No assumptions about IO ordering can be made regardless of how many or which threads were involved in the issuing.

@@ -267,18 +268,18 @@ relevant in understanding any kind of structure for what is on the Blobstore.
There are multiple examples of Blobstore usage in the [repo](https://github.com/spdk/spdk):

* **Hello World**: Actually named `hello_blob.c` this is a very basic example of a single threaded application that
  does nothing more than demonstrate the very basic API. Although Blobstore is optimized for NVMe, this example uses
  a RAM disk (malloc) back-end so that it can be executed easily in any development environment. The malloc back-end
  is a `bdev` module thus this example uses not only the SPDK Framework but the `bdev` layer as well.

* **CLI**: The `blobcli.c` example is a command line utility intended to not only serve as example code but as a test
  and development tool for Blobstore itself. It is also a simple single threaded application that relies on both the
  SPDK Framework and the `bdev` layer but offers multiple modes of operation to accomplish some real-world tasks. In
  command mode, it accepts single-shot commands which can be a little time consuming if there are many commands to
  get through as each one will take a few seconds waiting for DPDK initialization. It therefore has a shell mode that
  allows the developer to get to a `blob>` prompt and then very quickly interact with Blobstore with simple commands
  that include the ability to import/export blobs from/to regular files. Lastly there is a scripting mode to automate
  a series of tasks, again, handy for development and/or test type activities.

## Configuration {#blob_pg_config}

@@ -326,15 +327,16 @@ to the unallocated cluster - new extent is chosen. This information is stored in

There are two extent representations on-disk, dependent on `use_extent_table` (default:true) opts used
when creating a blob.

* **use_extent_table=true**: EXTENT_PAGE descriptor is not part of linked list of pages. It contains extents
  that are not run-length encoded. Each extent page is referenced by EXTENT_TABLE descriptor, which is serialized
  as part of linked list of pages. Extent table is run-length encoding all unallocated extent pages.
  Every new cluster allocation updates a single extent page, in case when extent page was previously allocated.
  Otherwise additionally incurs serializing whole linked list of pages for the blob.

* **use_extent_table=false**: EXTENT_RLE descriptor is serialized as part of linked list of pages.
  Extents pointing to contiguous LBA are run-length encoded, including unallocated extents represented by 0.
  Every new cluster allocation incurs serializing whole linked list of pages for the blob.

### Sequences and Batches

@@ -393,5 +395,6 @@ example,
~~~

And for the most part the following conventions are followed throughout:

* functions beginning with an underscore are called internally only
* functions or variables with the letters `cpl` are related to set or callback completions

@@ -20,7 +20,7 @@ properties:
because you don't have to change the data model from the single-threaded
version. You add a lock around the data.
* You can write your program as a synchronous, imperative list of statements
  that you read from top to bottom.
* The scheduler can interrupt threads, allowing for efficient time-sharing
  of CPU resources.

@@ -19,7 +19,7 @@ containerize your SPDK based application.
3. Make sure your host has hugepages enabled
4. Make sure your host has bound your nvme device to your userspace driver
5. Write your Dockerfile. The following is a simple Dockerfile to containerize the nvme `hello_world`
   example:

~~~{.sh}
# start with the latest Fedora

@@ -46,6 +46,7 @@ from the oldest band to the youngest.
The address map and valid map are, along with several other things (e.g. UUID of the device it's
part of, number of surfaced LBAs, band's sequence number, etc.), parts of the band's metadata. The
metadata is split in two parts:

* the head part, containing information already known when opening the band (device's UUID, band's
  sequence number, etc.), located at the beginning blocks of the band,
* the tail part, containing the address map and the valid map, located at the end of the band.
@@ -146,6 +147,7 @@ bdev or OCSSD `nvme` bdev.
Similar to other bdevs, the FTL bdevs can be created either based on JSON config files or via RPC.
Both interfaces require the same arguments which are described by the `--help` option of the
`bdev_ftl_create` RPC call, which are:

- bdev's name
- base bdev's name (base bdev must implement bdev_zone API)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
@@ -161,6 +163,7 @@ on [spdk-3.0.0](https://github.com/spdk/qemu/tree/spdk-3.0.0) branch.

To emulate an Open Channel device, QEMU expects parameters describing the characteristics and
geometry of the SSD:

- `serial` - serial number,
- `lver` - version of the OCSSD standard (0 - disabled, 1 - "1.2", 2 - "2.0"), libftl only supports
  2.0,
@@ -240,6 +243,7 @@ Logical blks per chunk: 24576
```

In order to create FTL on top of an Open Channel SSD, the following steps are required:

1) Attach OCSSD NVMe controller
2) Create OCSSD bdev on the controller attached in step 1 (user could specify parallel unit range
   and create multiple OCSSD bdevs on single OCSSD NVMe controller)

doc/iscsi.md
@@ -309,20 +309,20 @@ sde
At the iSCSI level, we provide the following support for Hotplug:

1. bdev/nvme:
   At the bdev/nvme level, we start one hotplug monitor which will call
   spdk_nvme_probe() periodically to get the hotplug events. We provide the
   private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
   we will create the block device based on the NVMe device attached, and for the
   remove_cb, we will unregister the block device, which will also notify the
   upper level stack (for iSCSI target, the upper level stack is scsi/lun) to
   handle the hot-remove event.

2. scsi/lun:
   When the LUN receives the hot-remove notification from the block device layer,
   the LUN will be marked as removed, and all the IOs after this point will
   return with check condition status. Then the LUN starts one poller which will
   wait for all the commands which have already been submitted to block device to
   return back; after all the commands return back, the LUN will be deleted.

## Known bugs and limitations {#iscsi_hotplug_bugs}

@@ -4950,6 +4950,7 @@ Either UUID or name is used to access logical volume store in RPCs.
A logical volume has a UUID and a name for its identification.
The UUID of the logical volume is generated on creation and can be used as a unique identifier.
The alias of the logical volume takes the format _lvs_name/lvol_name_ where:

* _lvs_name_ is the name of the logical volume store.
* _lvol_name_ is specified on creation and can be renamed.

@@ -6,9 +6,10 @@ Now nvme-cli can support both kernel driver and SPDK user mode driver for most o
Intel specific commands.

1. Clone the nvme-cli repository from the SPDK GitHub fork. Make sure you check out the spdk-1.6 branch.

   ~~~{.sh}
   git clone -b spdk-1.6 https://github.com/spdk/nvme-cli.git
   ~~~

2. Clone the SPDK repository from https://github.com/spdk/spdk under the nvme-cli folder.

@@ -19,47 +20,51 @@ git clone -b spdk-1.6 https://github.com/spdk/nvme-cli.git
5. Execute "<spdk_folder>/scripts/setup.sh" with the "root" account.

6. Update the "spdk.conf" file under the nvme-cli folder to properly configure the SPDK. Notes as follows:

   ~~~{.sh}
   spdk=1
   Indicates whether or not to use spdk. Can be 0 (off) or 1 (on).
   Defaults to 1 which assumes that you have run "<spdk_folder>/scripts/setup.sh", unbinding your drives from the kernel.

   core_mask=0x1
   A bitmask representing which core(s) to use for nvme-cli operations.
   Defaults to core 0.

   mem_size=512
   The amount of reserved hugepage memory to use for nvme-cli (in MB).
   Defaults to 512MB.

   shm_id=0
   Indicates the shared memory ID for the spdk application with which your NVMe drives are associated,
   and should be adjusted accordingly.
   Defaults to 0.
   ~~~

7. Run the "./nvme list" command to get the domain:bus:device.function for each found NVMe SSD.
|
||||
|
||||
8. Run the other nvme commands with domain:bus:device.function instead of "/dev/nvmeX" for the specified device.
|
||||
~~~{.sh}
|
||||
Example: ./nvme smart-log 0000:01:00.0
|
||||
~~~
|
||||
|
||||
~~~{.sh}
|
||||
Example: ./nvme smart-log 0000:01:00.0
|
||||
~~~
|
||||
|
||||
9. Run the "./nvme intel" commands for Intel specific commands against Intel NVMe SSD.
|
||||
~~~{.sh}
|
||||
Example: ./nvme intel internal-log 0000:08:00.0
|
||||
~~~
|
||||
|
||||
~~~{.sh}
|
||||
Example: ./nvme intel internal-log 0000:08:00.0
|
||||
~~~
|
||||
|
||||
10. Execute "<spdk_folder>/scripts/setup.sh reset" with the "root" account and update "spdk=0" in spdk.conf to
|
||||
use the kernel driver if wanted.
|
||||
use the kernel driver if wanted.
|
||||
|
||||
## Use scenarios

### Run as the only SPDK application on the system

1. Modify the spdk to 1 in spdk.conf. If the system has fewer cores or less memory, update the spdk.conf accordingly.

### Run together with other running SPDK applications on shared NVMe SSDs

1. For the other running SPDK application, start with the parameter like "-i 1" to have the same "shm_id".

2. Use the default spdk.conf setting where "shm_id=1" to start the nvme-cli.
@@ -67,21 +72,25 @@ use the kernel driver if wanted.
3. If other SPDK applications run with a different shm_id parameter, update the "spdk.conf" accordingly.

### Run with other running SPDK applications on non-shared NVMe SSDs

1. Properly configure the other running SPDK applications.

   ~~~{.sh}
   a. Only access the NVMe SSDs it wants.
   b. Allocate a fixed amount of memory instead of all available memory.
   ~~~

2. Properly configure the spdk.conf setting for nvme-cli.

   ~~~{.sh}
   a. Not access the NVMe SSDs from other SPDK applications.
   b. Change the mem_size to a proper size.
   ~~~

## Note

1. To run the newly built nvme-cli, either explicitly run it as "./nvme" or add it to the $PATH to avoid
   invoking another already installed version.

2. To run the newly built nvme-cli with SPDK support in an arbitrary directory, copy "spdk.conf" to that
   directory from the nvme-cli folder and update the configuration as suggested.

doc/nvme.md
@@ -249,9 +249,10 @@ DPDK EAL allows different types of processes to be spawned, each with different
on the hugepage memory used by the applications.

There are two types of processes:

1. a primary process which initializes the shared memory and has full privileges and
2. a secondary process which can attach to the primary process by mapping its shared memory
   regions and perform NVMe operations including creating queue pairs.

This feature is enabled by default and is controlled by selecting a value for the shared
memory group ID. This ID is a positive integer and two applications with the same shared
@@ -272,10 +273,10 @@ Example: identical shm_id and non-overlapping core masks

1. Two processes sharing memory may not share any cores in their core mask.
2. If a primary process exits while secondary processes are still running, those processes
   will continue to run. However, a new primary process cannot be created.
3. Applications are responsible for coordinating access to logical blocks.
4. If a process exits unexpectedly, the allocated memory will be released when the last
   process exits.

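The shared memory group ID described above is selected when the environment is initialized. The sketch below is illustrative only (not part of this diff); the application name, shm_id value, and core mask are placeholders for a hypothetical secondary process:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/env.h"

/* A secondary process joins the primary's hugepage memory by using the
 * same shared memory group ID (shm_id). */
static int
init_as_secondary(void)
{
    struct spdk_env_opts opts;

    spdk_env_opts_init(&opts);
    opts.name = "nvme_secondary";
    opts.shm_id = 1;           /* must match the primary process */
    opts.core_mask = "0x2";    /* must not overlap the primary's cores */

    return spdk_env_init(&opts);
}
~~~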
@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
@@ -285,18 +286,18 @@ process exits.
At the NVMe driver level, we provide the following support for Hotplug:

1. Hotplug events detection:
   The user of the NVMe library can call spdk_nvme_probe() periodically to detect
   hotplug events. The probe_cb, followed by the attach_cb, will be called for each
   new device detected. The user may optionally also provide a remove_cb that will be
   called if a previously attached NVMe device is no longer present on the system.
   All subsequent I/O to the removed device will return an error.

2. Hot remove NVMe with IO loads:
   When a device is hot removed while I/O is occurring, all access to the PCI BAR will
   result in a SIGBUS error. The NVMe driver automatically handles this case by installing
   a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location.
   This means I/O in flight during a hot remove will complete with an appropriate error
   code and will not crash the application.

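A minimal sketch of the probe/attach/remove callback wiring described above (illustrative only, not part of this diff); the printed messages are placeholders:

~~~{.c}
#include "spdk/stdinc.h"
#include "spdk/nvme.h"

static bool
probe_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
         struct spdk_nvme_ctrlr_opts *opts)
{
    return true; /* attach to every newly detected controller */
}

static void
attach_cb(void *ctx, const struct spdk_nvme_transport_id *trid,
          struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
    printf("controller attached: %s\n", trid->traddr);
}

static void
remove_cb(void *ctx, struct spdk_nvme_ctrlr *ctrlr)
{
    /* Outstanding I/O to this controller will now complete with errors. */
    printf("controller removed\n");
}

/* Poll for hotplug events; call this periodically, e.g. from a poller. */
static void
hotplug_poll(void)
{
    spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, remove_cb);
}
~~~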
@sa spdk_nvme_probe
@@ -201,6 +201,7 @@ NVMe Domain NQN = "nqn.", year, '-', month, '.', reverse domain, ':', utf-8 stri
~~~

Please note that the following types from the definition above are defined elsewhere:

1. utf-8 string: Defined in [rfc 3629](https://tools.ietf.org/html/rfc3629).
2. reverse domain: Equivalent to domain name as defined in [rfc 1034](https://tools.ietf.org/html/rfc1034).

@@ -11,6 +11,7 @@ for the next SPDK release.

All dependencies should be handled by scripts/pkgdep.sh script.
Package dependencies at the moment include:

- configshell

### Run SPDK application instance

@@ -31,6 +31,7 @@ copy the vagrant configuration file (a.k.a. `Vagrantfile`) to it,
and run `vagrant up` with some settings defined by the script arguments.

By default, the VM created is configured with:

- 2 vCPUs
- 4G of RAM
- 2 NICs (1 x NAT - host access, 1 x private network)

@@ -347,9 +347,9 @@ To enable it on Linux, it is required to modify kernel options inside the
virtual machine.

Instructions below for Ubuntu OS:

1. `vi /etc/default/grub`
2. Make sure mq is enabled: `GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"`
3. `sudo update-grub`
4. Reboot virtual machine

@@ -89,6 +89,7 @@ device (SPDK) can access it directly. The memory can be fragmented into multiple
physically-discontiguous regions and Vhost-user specification puts a limit on
their number - currently 8. The driver sends a single message for each region with
the following data:

* file descriptor - for mmap
* user address - for memory translations in Vhost-user messages (e.g.
  translating vring addresses)
@@ -106,6 +107,7 @@ as they use common SCSI I/O to inquiry the underlying disk(s).

Afterwards, the driver requests the number of maximum supported queues and
starts sending virtqueue data, which consists of:

* unique virtqueue id
* index of the last processed vring descriptor
* vring addresses (from user address space)

@@ -6,8 +6,9 @@ SPDK Virtio driver is a C library that allows communicating with Virtio devices.
It allows any SPDK application to become an initiator for (SPDK) vhost targets.

The driver supports two different usage models:

* PCI - This is the standard mode of operation when used in a guest virtual
  machine, where QEMU has presented the virtio controller as a virtual PCI device.
* vhost-user - Can be used to connect to a vhost socket directly on the same host.

The driver, just like the SPDK @ref vhost, is using pollers instead of standard

@@ -72,21 +72,26 @@ VPP can be configured using a VPP startup file and the `vppctl` command; By defa
Some key values from the iSCSI point of view include:

CPU section (`cpu`):

- `main-core <lcore>` -- logical CPU core used for main thread.
- `corelist-workers <lcore list>` -- logical CPU cores where worker threads are running.

DPDK section (`dpdk`):

- `num-rx-queues <num>` -- number of receive queues.
- `num-tx-queues <num>` -- number of transmit queues.
- `dev <PCI address>` -- whitelisted device.

Session section (`session`):

- `evt_qs_memfd_seg` -- uses a memfd segment for event queues. This is required for SPDK.

Socket server session (`socksvr`):

- `socket-name <path>` -- configure API socket filename (currently SPDK uses default path `/run/vpp-api.sock`).

Plugins section (`plugins`):

- `plugin <plugin name> { [enable|disable] }` -- enable or disable VPP plugin.

### Example:

@@ -60,6 +60,7 @@ other than -t, -s, -n and -a.

## fio
Fio job parameters.

- bs: block size
- qd: io depth
- rw: workload mode

@@ -8,7 +8,7 @@ The following guide explains how to use the scripts in the `spdk/scripts/vagrant
4. Install and configure [Vagrant 1.9.4](https://www.vagrantup.com) or newer

* Note: The extension pack has different licensing than main VirtualBox, please
  review them carefully as the evaluation license is for personal use only.

## Mac OSX Setup (High Sierra)

@@ -20,7 +20,8 @@ Quick start instructions for OSX:
4. Install Vagrant Cask

* Note: The extension pack has different licensing than main VirtualBox, please
  review them carefully as the evaluation license is for personal use only.

```
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew doctor
@@ -38,7 +39,7 @@ review them carefully as the evaluation license is for personal use only.
4. Install and configure [Vagrant 1.9.4](https://www.vagrantup.com) or newer

* Note: The extension pack has different licensing than main VirtualBox, please
  review them carefully as the evaluation license is for personal use only.

- Note: VirtualBox requires virtualization to be enabled in the BIOS.
- Note: You should disable Hyper-V in Windows RS 3 laptop. Search `windows features` un-check Hyper-V, restart laptop
@@ -58,7 +59,7 @@ Following the generic instructions should be sufficient for most Linux distribut
7. rpm -ivh vagrant_2.1.2_x86_64.rpm

* Note: The extension pack has different licensing than main VirtualBox, please
  review them carefully as the evaluation license is for personal use only.

## Configure Vagrant

@@ -7,6 +7,7 @@ Multiple controllers and namespaces can be exposed to the fuzzer at a time. In o
handle multiple namespaces, the fuzzer will round robin assign a thread to each namespace and
submit commands to that thread at a set queue depth. (currently 128 for I/O, 16 for Admin). The
application will terminate under three conditions:

1. The user specified run time expires (see the -t flag).
2. One of the target controllers stops completing I/O operations back to the fuzzer i.e. controller timeout.
3. The user specified a json file containing operations to run and the fuzzer has received valid completions for all of them.
@@ -14,8 +15,10 @@ application will terminate under three conditions:
# Output

By default, the fuzzer will print commands that:

1. Complete successfully back from the target, or
2. Are outstanding at the time of a controller timeout.

Commands are dumped as named objects in json format which can then be supplied back to the
script for targeted debugging on a subsequent run. See `Debugging` below.
By default no output is generated when a specific command is returned with a failed status.

@@ -14,6 +14,7 @@ Like the NVMe fuzzer, there is an example json file showing the types of request
that the application accepts. Since the vhost application accepts both vhost block
and vhost scsi commands, there are three distinct object types that can be passed in
to the application.

1. vhost_blk_cmd
2. vhost_scsi_cmd
3. vhost_scsi_mgmt_cmd

@@ -10,6 +10,7 @@ to emulate an RDMA enabled NIC. NVMe controllers can also be virtualized in emul

## VM Environment Requirements (Host):

- 8 GiB of RAM (for DPDK)
- Enable intel_kvm on the host machine from the bios.
- Enable nesting for VMs in kernel command line (for vhost tests).
@@ -28,6 +29,7 @@ configuration file. For a full list of the variable declarations available for a
`test/common/autotest_common.sh` starting at line 13.

## Steps for Configuring the VM

1. Download a fresh Fedora 26 image.
2. Perform the installation of Fedora 26 server.
3. Create an admin user sys_sgsw (enabling passwordless sudo for this account will make life easier during the tests).
@@ -60,6 +62,7 @@ created above and guest or VM refer to the Ubuntu VM created in this section.
- move .qcow2 file and ssh keys to default locations used by vhost test scripts

Alternatively it is possible to create the VM image manually using following steps:

1. Create an image file for the VM. It does not have to be large, about 3.5G should suffice.
2. Create an ssh keypair for host-guest communications (performed on the host):
   - Generate an ssh keypair with the name spdk_vhost_id_rsa and save it in `/root/.ssh`.