diff --git a/CHANGELOG.md b/CHANGELOG.md index 106721797..7f717c845 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -227,11 +227,13 @@ Added `spdk_bdev_get_write_unit_size()` function for retrieving required number of logical blocks for write operation. New zone-related fields were added to the result of the `get_bdevs` RPC call: + - `zoned`: indicates whether the device is zoned or a regular block device - `zone_size`: number of blocks in a single zone - `max_open_zones`: maximum number of open zones - `optimal_open_zones`: optimal number of open zones + The `zoned` field is a boolean and is always present, while the rest is only available for zoned bdevs. @@ -949,6 +951,7 @@ parameter. The function will now update that parameter with the largest possible for which the memory is contiguous in the physical memory address space. The following functions were removed: + - spdk_pci_nvme_device_attach() - spdk_pci_nvme_enumerate() - spdk_pci_ioat_device_attach() @@ -958,6 +961,7 @@ The following functions were removed: They were replaced with generic spdk_pci_device_attach() and spdk_pci_enumerate() which require a new spdk_pci_driver object to be provided. It can be one of the following: + - spdk_pci_nvme_get_driver() - spdk_pci_ioat_get_driver() - spdk_pci_virtio_get_driver() @@ -1138,6 +1142,7 @@ Dropped support for DPDK 16.07 and earlier, which SPDK won't even compile with r ### RPC The following RPC commands deprecated in the previous release are now removed: + - construct_virtio_user_scsi_bdev - construct_virtio_pci_scsi_bdev - construct_virtio_user_blk_bdev @@ -1326,6 +1331,7 @@ respectively. ### Virtio The following RPC commands have been deprecated: + - construct_virtio_user_scsi_bdev - construct_virtio_pci_scsi_bdev - construct_virtio_user_blk_bdev @@ -1346,6 +1352,7 @@ spdk_file_get_id() returning unique ID for the file was added. Added jsonrpc-client C library intended for issuing RPC commands from applications. Added API enabling iteration over JSON object: + - spdk_json_find() - spdk_json_find_string() - spdk_json_find_array() @@ -1785,6 +1792,7 @@ write commands. New API functions that accept I/O parameters in units of blocks instead of bytes have been added: + - spdk_bdev_read_blocks(), spdk_bdev_readv_blocks() - spdk_bdev_write_blocks(), spdk_bdev_writev_blocks() - spdk_bdev_write_zeroes_blocks() @@ -1965,6 +1973,7 @@ current set of functions. Support for SPDK performance analysis has been added to Intel® VTune™ Amplifier 2018. This analysis provides: + - I/O performance monitoring (calculating standard I/O metrics like IOPS, throughput, etc.) - Tuning insights on the interplay of I/O and compute devices by estimating how many cores would be reasonable to provide for SPDK to keep up with a current storage workload. @@ -2115,6 +2124,7 @@ NVMe devices over a network using the iSCSI protocol. The application is located in app/iscsi_tgt and a documented configuration file can be found at etc/spdk/spdk.conf.in. This release also significantly improves the existing NVMe over Fabrics target. + - The configuration file format was changed, which will require updates to any existing nvmf.conf files (see `etc/spdk/nvmf.conf.in`): - `SubsystemGroup` was renamed to `Subsystem`. @@ -2135,6 +2145,7 @@ This release also significantly improves the existing NVMe over Fabrics target. This release also adds one new feature and provides some better examples and tools for the NVMe driver. + - The Weighted Round Robin arbitration method is now supported. 
This allows the user to specify different priorities on a per-I/O-queue basis. To enable WRR, set the `arb_mechanism` field during `spdk_nvme_probe()`. @@ -2215,6 +2226,7 @@ This release adds a user-space driver with support for the Intel I/O Acceleratio This is the initial open source release of the Storage Performance Development Kit (SPDK). Features: + - NVMe user-space driver - NVMe example programs - `examples/nvme/perf` tests performance (IOPS) using the NVMe user-space driver diff --git a/README.md b/README.md index 5faa2d58d..430362cf7 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,7 @@ interrupts, which avoids kernel context switches and eliminates interrupt handling overhead. The development kit currently includes: + * [NVMe driver](http://www.spdk.io/doc/nvme.html) * [I/OAT (DMA engine) driver](http://www.spdk.io/doc/ioat.html) * [NVMe over Fabrics target](http://www.spdk.io/doc/nvmf.html) @@ -172,6 +173,7 @@ of the SPDK static ones. In order to start a SPDK app linked with SPDK shared libraries, make sure to do the following steps: + - run ldconfig specifying the directory containing SPDK shared libraries - provide proper `LD_LIBRARY_PATH` diff --git a/doc/bdev.md b/doc/bdev.md index 42e5de6a8..fc99ff57b 100644 --- a/doc/bdev.md +++ b/doc/bdev.md @@ -189,8 +189,8 @@ time the SPDK virtual bdev module supports cipher only as follows: - AESN-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC - Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC -(Note: QAT is functional however is marked as experimental until the hardware has -been fully integrated with the SPDK CI system.) + (Note: QAT is functional however is marked as experimental until the hardware has + been fully integrated with the SPDK CI system.) In order to support using the bdev block offset (LBA) as the initialization vector (IV), the crypto module break up all I/O into crypto operations of a size equal to the block diff --git a/doc/blob.md b/doc/blob.md index dad4ae85a..6e10562d4 100644 --- a/doc/blob.md +++ b/doc/blob.md @@ -40,22 +40,22 @@ NAND too. The Blobstore defines a hierarchy of storage abstractions as follows. * **Logical Block**: Logical blocks are exposed by the disk itself, which are numbered from 0 to N, where N is the -number of blocks in the disk. A logical block is typically either 512B or 4KiB. + number of blocks in the disk. A logical block is typically either 512B or 4KiB. * **Page**: A page is defined to be a fixed number of logical blocks defined at Blobstore creation time. The logical -blocks that compose a page are always contiguous. Pages are also numbered from the beginning of the disk such -that the first page worth of blocks is page 0, the second page is page 1, etc. A page is typically 4KiB in size, -so this is either 8 or 1 logical blocks in practice. The SSD must be able to perform atomic reads and writes of -at least the page size. + blocks that compose a page are always contiguous. Pages are also numbered from the beginning of the disk such + that the first page worth of blocks is page 0, the second page is page 1, etc. A page is typically 4KiB in size, + so this is either 8 or 1 logical blocks in practice. The SSD must be able to perform atomic reads and writes of + at least the page size. * **Cluster**: A cluster is a fixed number of pages defined at Blobstore creation time. The pages that compose a cluster -are always contiguous. 
Clusters are also numbered from the beginning of the disk, where cluster 0 is the first cluster -worth of pages, cluster 1 is the second grouping of pages, etc. A cluster is typically 1MiB in size, or 256 pages. + are always contiguous. Clusters are also numbered from the beginning of the disk, where cluster 0 is the first cluster + worth of pages, cluster 1 is the second grouping of pages, etc. A cluster is typically 1MiB in size, or 256 pages. * **Blob**: A blob is an ordered list of clusters. Blobs are manipulated (created, sized, deleted, etc.) by the application -and persist across power failures and reboots. Applications use a Blobstore provided identifier to access a particular blob. -Blobs are read and written in units of pages by specifying an offset from the start of the blob. Applications can also -store metadata in the form of key/value pairs with each blob which we'll refer to as xattrs (extended attributes). + and persist across power failures and reboots. Applications use a Blobstore provided identifier to access a particular blob. + Blobs are read and written in units of pages by specifying an offset from the start of the blob. Applications can also + store metadata in the form of key/value pairs with each blob which we'll refer to as xattrs (extended attributes). * **Blobstore**: An SSD which has been initialized by a Blobstore-based application is referred to as "a Blobstore." A -Blobstore owns the entire underlying device which is made up of a private Blobstore metadata region and the collection of -blobs as managed by the application. + Blobstore owns the entire underlying device which is made up of a private Blobstore metadata region and the collection of + blobs as managed by the application. @htmlonly @@ -115,19 +115,19 @@ For all Blobstore operations regarding atomicity, there is a dependency on the u operations of at least one page size. Atomicity here can refer to multiple operations: * **Data Writes**: For the case of data writes, the unit of atomicity is one page. Therefore if a write operation of -greater than one page is underway and the system suffers a power failure, the data on media will be consistent at a page -size granularity (if a single page were in the middle of being updated when power was lost, the data at that page location -will be as it was prior to the start of the write operation following power restoration.) + greater than one page is underway and the system suffers a power failure, the data on media will be consistent at a page + size granularity (if a single page were in the middle of being updated when power was lost, the data at that page location + will be as it was prior to the start of the write operation following power restoration.) * **Blob Metadata Updates**: Each blob has its own set of metadata (xattrs, size, etc). For performance reasons, a copy of -this metadata is kept in RAM and only synchronized with the on-disk version when the application makes an explicit call to -do so, or when the Blobstore is unloaded. Therefore, setting of an xattr, for example is not consistent until the call to -synchronize it (covered later) which is, however, performed atomically. + this metadata is kept in RAM and only synchronized with the on-disk version when the application makes an explicit call to + do so, or when the Blobstore is unloaded. Therefore, setting of an xattr, for example is not consistent until the call to + synchronize it (covered later) which is, however, performed atomically. 
* **Blobstore Metadata Updates**: Blobstore itself has its own metadata which, like per blob metadata, has a copy in both -RAM and on-disk. Unlike the per blob metadata, however, the Blobstore metadata region is not made consistent via a blob -synchronization call, it is only synchronized when the Blobstore is properly unloaded via API. Therefore, if the Blobstore -metadata is updated (blob creation, deletion, resize, etc.) and not unloaded properly, it will need to perform some extra -steps the next time it is loaded which will take a bit more time than it would have if shutdown cleanly, but there will be -no inconsistencies. + RAM and on-disk. Unlike the per blob metadata, however, the Blobstore metadata region is not made consistent via a blob + synchronization call, it is only synchronized when the Blobstore is properly unloaded via API. Therefore, if the Blobstore + metadata is updated (blob creation, deletion, resize, etc.) and not unloaded properly, it will need to perform some extra + steps the next time it is loaded which will take a bit more time than it would have if shutdown cleanly, but there will be + no inconsistencies. ### Callbacks @@ -183,22 +183,22 @@ When the Blobstore is initialized, there are multiple configuration options to c options and their defaults are: * **Cluster Size**: By default, this value is 1MB. The cluster size is required to be a multiple of page size and should be -selected based on the application’s usage model in terms of allocation. Recall that blobs are made up of clusters so when -a blob is allocated/deallocated or changes in size, disk LBAs will be manipulated in groups of cluster size. If the -application is expecting to deal with mainly very large (always multiple GB) blobs then it may make sense to change the -cluster size to 1GB for example. + selected based on the application’s usage model in terms of allocation. Recall that blobs are made up of clusters so when + a blob is allocated/deallocated or changes in size, disk LBAs will be manipulated in groups of cluster size. If the + application is expecting to deal with mainly very large (always multiple GB) blobs then it may make sense to change the + cluster size to 1GB for example. * **Number of Metadata Pages**: By default, Blobstore will assume there can be as many clusters as there are metadata pages -which is the worst case scenario in terms of metadata usage and can be overridden here however the space efficiency is -not significant. + which is the worst case scenario in terms of metadata usage and can be overridden here however the space efficiency is + not significant. * **Maximum Simultaneous Metadata Operations**: Determines how many internally pre-allocated memory structures are set -aside for performing metadata operations. It is unlikely that changes to this value (default 32) would be desirable. + aside for performing metadata operations. It is unlikely that changes to this value (default 32) would be desirable. * **Maximum Simultaneous Operations Per Channel**: Determines how many internally pre-allocated memory structures are set -aside for channel operations. Changes to this value would be application dependent and best determined by both a knowledge -of the typical usage model, an understanding of the types of SSDs being used and empirical data. The default is 512. + aside for channel operations. 
Changes to this value would be application dependent and best determined by both a knowledge + of the typical usage model, an understanding of the types of SSDs being used and empirical data. The default is 512. * **Blobstore Type**: This field is a character array to be used by applications that need to identify whether the -Blobstore found here is appropriate to claim or not. The default is NULL and unless the application is being deployed in -an environment where multiple applications using the same disks are at risk of inadvertently using the wrong Blobstore, there -is no need to set this value. It can, however, be set to any valid set of characters. + Blobstore found here is appropriate to claim or not. The default is NULL and unless the application is being deployed in + an environment where multiple applications using the same disks are at risk of inadvertently using the wrong Blobstore, there + is no need to set this value. It can, however, be set to any valid set of characters. ### Sub-page Sized Operations @@ -210,10 +210,11 @@ requires finer granularity it will have to accommodate that itself. As mentioned earlier, Blobstore can share a single thread with an application or the application can define any number of threads, within resource constraints, that makes sense. The basic considerations that must be followed are: + * Metadata operations (API with MD in the name) should be isolated from each other as there is no internal locking on the -memory structures affected by these API. + memory structures affected by these API. * Metadata operations should be isolated from conflicting IO operations (an example of a conflicting IO would be one that is -reading/writing to an area of a blob that a metadata operation is deallocating). + reading/writing to an area of a blob that a metadata operation is deallocating). * Asynchronous callbacks will always take place on the calling thread. * No assumptions about IO ordering can be made regardless of how many or which threads were involved in the issuing. @@ -267,18 +268,18 @@ relevant in understanding any kind of structure for what is on the Blobstore. There are multiple examples of Blobstore usage in the [repo](https://github.com/spdk/spdk): * **Hello World**: Actually named `hello_blob.c` this is a very basic example of a single threaded application that -does nothing more than demonstrate the very basic API. Although Blobstore is optimized for NVMe, this example uses -a RAM disk (malloc) back-end so that it can be executed easily in any development environment. The malloc back-end -is a `bdev` module thus this example uses not only the SPDK Framework but the `bdev` layer as well. + does nothing more than demonstrate the very basic API. Although Blobstore is optimized for NVMe, this example uses + a RAM disk (malloc) back-end so that it can be executed easily in any development environment. The malloc back-end + is a `bdev` module thus this example uses not only the SPDK Framework but the `bdev` layer as well. * **CLI**: The `blobcli.c` example is command line utility intended to not only serve as example code but as a test -and development tool for Blobstore itself. It is also a simple single threaded application that relies on both the -SPDK Framework and the `bdev` layer but offers multiple modes of operation to accomplish some real-world tasks. 
In -command mode, it accepts single-shot commands which can be a little time consuming if there are many commands to -get through as each one will take a few seconds waiting for DPDK initialization. It therefore has a shell mode that -allows the developer to get to a `blob>` prompt and then very quickly interact with Blobstore with simple commands -that include the ability to import/export blobs from/to regular files. Lastly there is a scripting mode to automate -a series of tasks, again, handy for development and/or test type activities. + and development tool for Blobstore itself. It is also a simple single threaded application that relies on both the + SPDK Framework and the `bdev` layer but offers multiple modes of operation to accomplish some real-world tasks. In + command mode, it accepts single-shot commands which can be a little time consuming if there are many commands to + get through as each one will take a few seconds waiting for DPDK initialization. It therefore has a shell mode that + allows the developer to get to a `blob>` prompt and then very quickly interact with Blobstore with simple commands + that include the ability to import/export blobs from/to regular files. Lastly there is a scripting mode to automate + a series of tasks, again, handy for development and/or test type activities. ## Configuration {#blob_pg_config} @@ -326,15 +327,16 @@ to the unallocated cluster - new extent is chosen. This information is stored in There are two extent representations on-disk, dependent on `use_extent_table` (default:true) opts used when creating a blob. + * **use_extent_table=true**: EXTENT_PAGE descriptor is not part of linked list of pages. It contains extents -that are not run-length encoded. Each extent page is referenced by EXTENT_TABLE descriptor, which is serialized -as part of linked list of pages. Extent table is run-length encoding all unallocated extent pages. -Every new cluster allocation updates a single extent page, in case when extent page was previously allocated. -Otherwise additionally incurs serializing whole linked list of pages for the blob. + that are not run-length encoded. Each extent page is referenced by EXTENT_TABLE descriptor, which is serialized + as part of linked list of pages. Extent table is run-length encoding all unallocated extent pages. + Every new cluster allocation updates a single extent page, in case when extent page was previously allocated. + Otherwise additionally incurs serializing whole linked list of pages for the blob. * **use_extent_table=false**: EXTENT_RLE descriptor is serialized as part of linked list of pages. -Extents pointing to contiguous LBA are run-length encoded, including unallocated extents represented by 0. -Every new cluster allocation incurs serializing whole linked list of pages for the blob. + Extents pointing to contiguous LBA are run-length encoded, including unallocated extents represented by 0. + Every new cluster allocation incurs serializing whole linked list of pages for the blob. 
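
The `use_extent_table` choice described above is made per blob, at creation time, through the blob options structure. Below is a minimal C sketch of that call path, assuming an already-loaded blobstore handle; the field names follow this document (`use_extent_table`, plus `num_clusters` for the initial size), but the exact `spdk_blob_opts_init()` signature has varied between SPDK releases, so treat it as illustrative rather than version-exact.

~~~{.c}
#include <stdbool.h>
#include <stdio.h>
#include <inttypes.h>
#include "spdk/blob.h"

/* Completion callback for spdk_bs_create_blob_ext(): receives the new blob ID. */
static void
create_blob_done(void *cb_arg, spdk_blob_id blobid, int bserrno)
{
	if (bserrno != 0) {
		fprintf(stderr, "blob creation failed: %d\n", bserrno);
		return;
	}
	printf("created blob 0x%" PRIx64 "\n", blobid);
}

/* Create a blob on an already-loaded blobstore, explicitly requesting the
 * extent-table on-disk representation described above. */
static void
create_blob_with_extent_table(struct spdk_blob_store *bs)
{
	struct spdk_blob_opts opts;

	spdk_blob_opts_init(&opts);     /* newer releases also take sizeof(opts) */
	opts.num_clusters = 16;         /* initial blob size, in clusters */
	opts.use_extent_table = true;   /* EXTENT_TABLE/EXTENT_PAGE layout (the default) */

	spdk_bs_create_blob_ext(bs, &opts, create_blob_done, NULL);
}
~~~

Setting `use_extent_table = false` instead selects the EXTENT_RLE representation, with the per-allocation serialization cost noted above.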
### Sequences and Batches @@ -393,5 +395,6 @@ example, ~~~ And for the most part the following conventions are followed throughout: + * functions beginning with an underscore are called internally only * functions or variables with the letters `cpl` are related to set or callback completions diff --git a/doc/concurrency.md b/doc/concurrency.md index 46bed84be..b0ae7021d 100644 --- a/doc/concurrency.md +++ b/doc/concurrency.md @@ -20,7 +20,7 @@ properties: because you don't have to change the data model from the single-threaded version. You add a lock around the data. * You can write your program as a synchronous, imperative list of statements -that you read from top to bottom. + that you read from top to bottom. * The scheduler can interrupt threads, allowing for efficient time-sharing of CPU resources. diff --git a/doc/containers.md b/doc/containers.md index c8de93eb0..3b472674b 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -19,7 +19,7 @@ containerize your SPDK based application. 3. Make sure your host has hugepages enabled 4. Make sure your host has bound your nvme device to your userspace driver 5. Write your Dockerfile. The following is a simple Dockerfile to containerize the nvme `hello_world` -example: + example: ~~~{.sh} # start with the latest Fedora diff --git a/doc/ftl.md b/doc/ftl.md index b8bd7faeb..ae13889bf 100644 --- a/doc/ftl.md +++ b/doc/ftl.md @@ -46,6 +46,7 @@ from the oldest band to the youngest. The address map and valid map are, along with a several other things (e.g. UUID of the device it's part of, number of surfaced LBAs, band's sequence number, etc.), parts of the band's metadata. The metadata is split in two parts: + * the head part, containing information already known when opening the band (device's UUID, band's sequence number, etc.), located at the beginning blocks of the band, * the tail part, containing the address map and the valid map, located at the end of the band. @@ -146,6 +147,7 @@ bdev or OCSSD `nvme` bdev. Similar to other bdevs, the FTL bdevs can be created either based on JSON config files or via RPC. Both interfaces require the same arguments which are described by the `--help` option of the `bdev_ftl_create` RPC call, which are: + - bdev's name - base bdev's name (base bdev must implement bdev_zone API) - UUID of the FTL device (if the FTL is to be restored from the SSD) @@ -161,6 +163,7 @@ on [spdk-3.0.0](https://github.com/spdk/qemu/tree/spdk-3.0.0) branch. To emulate an Open Channel device, QEMU expects parameters describing the characteristics and geometry of the SSD: + - `serial` - serial number, - `lver` - version of the OCSSD standard (0 - disabled, 1 - "1.2", 2 - "2.0"), libftl only supports 2.0, @@ -240,6 +243,7 @@ Logical blks per chunk: 24576 ``` In order to create FTL on top Open Channel SSD, the following steps are required: + 1) Attach OCSSD NVMe controller 2) Create OCSSD bdev on the controller attached in step 1 (user could specify parallel unit range and create multiple OCSSD bdevs on single OCSSD NVMe controller) diff --git a/doc/iscsi.md b/doc/iscsi.md index 827d421a3..cd9fe5bec 100644 --- a/doc/iscsi.md +++ b/doc/iscsi.md @@ -309,20 +309,20 @@ sde At the iSCSI level, we provide the following support for Hotplug: 1. bdev/nvme: -At the bdev/nvme level, we start one hotplug monitor which will call -spdk_nvme_probe() periodically to get the hotplug events. We provide the -private attach_cb and remove_cb for spdk_nvme_probe(). 
For the attach_cb, -we will create the block device base on the NVMe device attached, and for the -remove_cb, we will unregister the block device, which will also notify the -upper level stack (for iSCSI target, the upper level stack is scsi/lun) to -handle the hot-remove event. + At the bdev/nvme level, we start one hotplug monitor which will call + spdk_nvme_probe() periodically to get the hotplug events. We provide the + private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb, + we will create the block device base on the NVMe device attached, and for the + remove_cb, we will unregister the block device, which will also notify the + upper level stack (for iSCSI target, the upper level stack is scsi/lun) to + handle the hot-remove event. 2. scsi/lun: -When the LUN receive the hot-remove notification from block device layer, -the LUN will be marked as removed, and all the IOs after this point will -return with check condition status. Then the LUN starts one poller which will -wait for all the commands which have already been submitted to block device to -return back; after all the commands return back, the LUN will be deleted. + When the LUN receive the hot-remove notification from block device layer, + the LUN will be marked as removed, and all the IOs after this point will + return with check condition status. Then the LUN starts one poller which will + wait for all the commands which have already been submitted to block device to + return back; after all the commands return back, the LUN will be deleted. ## Known bugs and limitations {#iscsi_hotplug_bugs} diff --git a/doc/jsonrpc.md b/doc/jsonrpc.md index cee57a5eb..4b009a02f 100644 --- a/doc/jsonrpc.md +++ b/doc/jsonrpc.md @@ -4950,6 +4950,7 @@ Either UUID or name is used to access logical volume store in RPCs. A logical volume has a UUID and a name for its identification. The UUID of the logical volume is generated on creation and it can be unique identifier. The alias of the logical volume takes the format _lvs_name/lvol_name_ where: + * _lvs_name_ is the name of the logical volume store. * _lvol_name_ is specified on creation and can be renamed. diff --git a/doc/nvme-cli.md b/doc/nvme-cli.md index 4b79fea87..dc477bf16 100644 --- a/doc/nvme-cli.md +++ b/doc/nvme-cli.md @@ -6,9 +6,10 @@ Now nvme-cli can support both kernel driver and SPDK user mode driver for most o Intel specific commands. 1. Clone the nvme-cli repository from the SPDK GitHub fork. Make sure you check out the spdk-1.6 branch. -~~~{.sh} -git clone -b spdk-1.6 https://github.com/spdk/nvme-cli.git -~~~ + + ~~~{.sh} + git clone -b spdk-1.6 https://github.com/spdk/nvme-cli.git + ~~~ 2. Clone the SPDK repository from https://github.com/spdk/spdk under the nvme-cli folder. @@ -19,47 +20,51 @@ git clone -b spdk-1.6 https://github.com/spdk/nvme-cli.git 5. Execute "/scripts/setup.sh" with the "root" account. 6. Update the "spdk.conf" file under nvme-cli folder to properly configure the SPDK. Notes as following: -~~~{.sh} -spdk=1 -Indicates whether or not to use spdk. Can be 0 (off) or 1 (on). -Defaults to 1 which assumes that you have run "/scripts/setup.sh", unbinding your drives from the kernel. + ~~~{.sh} + spdk=1 + Indicates whether or not to use spdk. Can be 0 (off) or 1 (on). + Defaults to 1 which assumes that you have run "/scripts/setup.sh", unbinding your drives from the kernel. -core_mask=0x1 -A bitmask representing which core(s) to use for nvme-cli operations. -Defaults to core 0. 
+ core_mask=0x1 + A bitmask representing which core(s) to use for nvme-cli operations. + Defaults to core 0. -mem_size=512 -The amount of reserved hugepage memory to use for nvme-cli (in MB). -Defaults to 512MB. + mem_size=512 + The amount of reserved hugepage memory to use for nvme-cli (in MB). + Defaults to 512MB. -shm_id=0 -Indicates the shared memory ID for the spdk application with which your NVMe drives are associated, -and should be adjusted accordingly. -Defaults to 0. + shm_id=0 + Indicates the shared memory ID for the spdk application with which your NVMe drives are associated, + and should be adjusted accordingly. + Defaults to 0. ~~~ 7. Run the "./nvme list" command to get the domain:bus:device.function for each found NVMe SSD. 8. Run the other nvme commands with domain:bus:device.function instead of "/dev/nvmeX" for the specified device. -~~~{.sh} -Example: ./nvme smart-log 0000:01:00.0 -~~~ + + ~~~{.sh} + Example: ./nvme smart-log 0000:01:00.0 + ~~~ 9. Run the "./nvme intel" commands for Intel specific commands against Intel NVMe SSD. -~~~{.sh} -Example: ./nvme intel internal-log 0000:08:00.0 -~~~ + + ~~~{.sh} + Example: ./nvme intel internal-log 0000:08:00.0 + ~~~ 10. Execute "/scripts/setup.sh reset" with the "root" account and update "spdk=0" in spdk.conf to -use the kernel driver if wanted. + use the kernel driver if wanted. ## Use scenarios ### Run as the only SPDK application on the system + 1. Modify the spdk to 1 in spdk.conf. If the system has fewer cores or less memory, update the spdk.conf accordingly. ### Run together with other running SPDK applications on shared NVMe SSDs + 1. For the other running SPDK application, start with the parameter like "-i 1" to have the same "shm_id". 2. Use the default spdk.conf setting where "shm_id=1" to start the nvme-cli. @@ -67,21 +72,25 @@ use the kernel driver if wanted. 3. If other SPDK applications run with different shm_id parameter, update the "spdk.conf" accordingly. ### Run with other running SPDK applications on non-shared NVMe SSDs + 1. Properly configure the other running SPDK applications. -~~~{.sh} -a. Only access the NVMe SSDs it wants. -b. Allocate a fixed number of memory instead of all available memory. -~~~ + + ~~~{.sh} + a. Only access the NVMe SSDs it wants. + b. Allocate a fixed number of memory instead of all available memory. + ~~~ 2. Properly configure the spdk.conf setting for nvme-cli. -~~~{.sh} -a. Not access the NVMe SSDs from other SPDK applications. -b. Change the mem_size to a proper size. -~~~ + + ~~~{.sh} + a. Not access the NVMe SSDs from other SPDK applications. + b. Change the mem_size to a proper size. + ~~~ ## Note + 1. To run the newly built nvme-cli, either explicitly run as "./nvme" or added it into the $PATH to avoid -invoke other already installed version. + invoke other already installed version. 2. To run the newly built nvme-cli with SPDK support in arbitrary directory, copy "spdk.conf" to that -directory from the nvme cli folder and update the configuration as suggested. + directory from the nvme cli folder and update the configuration as suggested. diff --git a/doc/nvme.md b/doc/nvme.md index 49802ad73..6f1fb8546 100644 --- a/doc/nvme.md +++ b/doc/nvme.md @@ -249,9 +249,10 @@ DPDK EAL allows different types of processes to be spawned, each with different on the hugepage memory used by the applications. There are two types of processes: + 1. a primary process which initializes the shared memory and has full privileges and 2. 
a secondary process which can attach to the primary process by mapping its shared memory -regions and perform NVMe operations including creating queue pairs. + regions and perform NVMe operations including creating queue pairs. This feature is enabled by default and is controlled by selecting a value for the shared memory group ID. This ID is a positive integer and two applications with the same shared @@ -272,10 +273,10 @@ Example: identical shm_id and non-overlapping core masks 1. Two processes sharing memory may not share any cores in their core mask. 2. If a primary process exits while secondary processes are still running, those processes -will continue to run. However, a new primary process cannot be created. + will continue to run. However, a new primary process cannot be created. 3. Applications are responsible for coordinating access to logical blocks. 4. If a process exits unexpectedly, the allocated memory will be released when the last -process exits. + process exits. @sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions @@ -285,18 +286,18 @@ process exits. At the NVMe driver level, we provide the following support for Hotplug: 1. Hotplug events detection: -The user of the NVMe library can call spdk_nvme_probe() periodically to detect -hotplug events. The probe_cb, followed by the attach_cb, will be called for each -new device detected. The user may optionally also provide a remove_cb that will be -called if a previously attached NVMe device is no longer present on the system. -All subsequent I/O to the removed device will return an error. + The user of the NVMe library can call spdk_nvme_probe() periodically to detect + hotplug events. The probe_cb, followed by the attach_cb, will be called for each + new device detected. The user may optionally also provide a remove_cb that will be + called if a previously attached NVMe device is no longer present on the system. + All subsequent I/O to the removed device will return an error. 2. Hot remove NVMe with IO loads: -When a device is hot removed while I/O is occurring, all access to the PCI BAR will -result in a SIGBUS error. The NVMe driver automatically handles this case by installing -a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location. -This means I/O in flight during a hot remove will complete with an appropriate error -code and will not crash the application. + When a device is hot removed while I/O is occurring, all access to the PCI BAR will + result in a SIGBUS error. The NVMe driver automatically handles this case by installing + a SIGBUS handler and remapping the PCI BAR to a new, placeholder memory location. + This means I/O in flight during a hot remove will complete with an appropriate error + code and will not crash the application. @sa spdk_nvme_probe diff --git a/doc/nvmf.md b/doc/nvmf.md index 80cc47b30..55c6ba16d 100644 --- a/doc/nvmf.md +++ b/doc/nvmf.md @@ -201,6 +201,7 @@ NVMe Domain NQN = "nqn.", year, '-', month, '.', reverse domain, ':', utf-8 stri ~~~ Please note that the following types from the definition above are defined elsewhere: + 1. utf-8 string: Defined in [rfc 3629](https://tools.ietf.org/html/rfc3629). 2. reverse domain: Equivalent to domain name as defined in [rfc 1034](https://tools.ietf.org/html/rfc1034). diff --git a/doc/spdkcli.md b/doc/spdkcli.md index b3d3e4095..1b35e1eea 100644 --- a/doc/spdkcli.md +++ b/doc/spdkcli.md @@ -11,6 +11,7 @@ for the next SPDK release. All dependencies should be handled by scripts/pkgdep.sh script. 
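
As a companion to the hotplug flow described in doc/nvme.md and doc/iscsi.md above, here is a minimal C sketch of the periodic-probe pattern, using the public `spdk_nvme_probe()` callback signatures; the logging and the decision to attach to every discovered controller are illustrative choices, not part of the documented behavior.

~~~{.c}
#include <stdbool.h>
#include <stdio.h>
#include "spdk/nvme.h"

/* Decide whether to attach to a newly discovered controller. */
static bool
probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	 struct spdk_nvme_ctrlr_opts *opts)
{
	printf("probed %s\n", trid->traddr);
	return true;	/* attach to every controller found */
}

/* Invoked for each controller attached during this probe call. */
static void
attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
	  struct spdk_nvme_ctrlr *ctrlr, const struct spdk_nvme_ctrlr_opts *opts)
{
	printf("attached %s\n", trid->traddr);
	/* e.g. allocate I/O queue pairs or register a block device here */
}

/* Invoked when a previously attached controller is hot removed. */
static void
remove_cb(void *cb_ctx, struct spdk_nvme_ctrlr *ctrlr)
{
	printf("controller removed, quiescing I/O\n");
	/* drain outstanding I/O, then spdk_nvme_detach(ctrlr) when safe */
}

/* Call periodically (e.g. from a poller) to pick up hotplug events. */
static void
check_hotplug(void)
{
	if (spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, remove_cb) != 0) {
		fprintf(stderr, "spdk_nvme_probe() failed\n");
	}
}
~~~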
Package dependencies at the moment include: + - configshell ### Run SPDK application instance diff --git a/doc/vagrant.md b/doc/vagrant.md index 4ab3d38ca..5fba4a210 100644 --- a/doc/vagrant.md +++ b/doc/vagrant.md @@ -31,6 +31,7 @@ copy the vagrant configuration file (a.k.a. `Vagrantfile`) to it, and run `vagrant up` with some settings defined by the script arguments. By default, the VM created is configured with: + - 2 vCPUs - 4G of RAM - 2 NICs (1 x NAT - host access, 1 x private network) diff --git a/doc/vhost.md b/doc/vhost.md index 1f0dde408..c9c020436 100644 --- a/doc/vhost.md +++ b/doc/vhost.md @@ -347,9 +347,9 @@ To enable it on Linux, it is required to modify kernel options inside the virtual machine. Instructions below for Ubuntu OS: + 1. `vi /etc/default/grub` -2. Make sure mq is enabled: -`GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"` +2. Make sure mq is enabled: `GRUB_CMDLINE_LINUX="scsi_mod.use_blk_mq=1"` 3. `sudo update-grub` 4. Reboot virtual machine diff --git a/doc/vhost_processing.md b/doc/vhost_processing.md index f66812bae..93a11633f 100644 --- a/doc/vhost_processing.md +++ b/doc/vhost_processing.md @@ -89,6 +89,7 @@ device (SPDK) can access it directly. The memory can be fragmented into multiple physically-discontiguous regions and Vhost-user specification puts a limit on their number - currently 8. The driver sends a single message for each region with the following data: + * file descriptor - for mmap * user address - for memory translations in Vhost-user messages (e.g. translating vring addresses) @@ -106,6 +107,7 @@ as they use common SCSI I/O to inquiry the underlying disk(s). Afterwards, the driver requests the number of maximum supported queues and starts sending virtqueue data, which consists of: + * unique virtqueue id * index of the last processed vring descriptor * vring addresses (from user address space) diff --git a/doc/virtio.md b/doc/virtio.md index 753356518..ae498f23e 100644 --- a/doc/virtio.md +++ b/doc/virtio.md @@ -6,8 +6,9 @@ SPDK Virtio driver is a C library that allows communicating with Virtio devices. It allows any SPDK application to become an initiator for (SPDK) vhost targets. The driver supports two different usage models: + * PCI - This is the standard mode of operation when used in a guest virtual -machine, where QEMU has presented the virtio controller as a virtual PCI device. + machine, where QEMU has presented the virtio controller as a virtual PCI device. * vhost-user - Can be used to connect to a vhost socket directly on the same host. The driver, just like the SPDK @ref vhost, is using pollers instead of standard diff --git a/doc/vpp_integration.md b/doc/vpp_integration.md index 36b864e7a..7477aec91 100644 --- a/doc/vpp_integration.md +++ b/doc/vpp_integration.md @@ -72,21 +72,26 @@ VPP can be configured using a VPP startup file and the `vppctl` command; By defa Some key values from iSCSI point of view includes: CPU section (`cpu`): + - `main-core ` -- logical CPU core used for main thread. - `corelist-workers ` -- logical CPU cores where worker threads are running. DPDK section (`dpdk`): + - `num-rx-queues ` -- number of receive queues. - `num-tx-queues ` -- number of transmit queues. - `dev ` -- whitelisted device. Session section (`session`): + - `evt_qs_memfd_seg` -- uses a memfd segment for event queues. This is required for SPDK. Socket server session (`socksvr`): + - `socket-name ` -- configure API socket filename (curently SPDK uses default path `/run/vpp-api.sock`). 
Plugins section (`plugins`): + - `plugin { [enable|disable] }` -- enable or disable VPP plugin. ### Example: diff --git a/scripts/perf/nvmf/README.md b/scripts/perf/nvmf/README.md index d617601a5..7199efbb0 100644 --- a/scripts/perf/nvmf/README.md +++ b/scripts/perf/nvmf/README.md @@ -60,6 +60,7 @@ other than -t, -s, -n and -a. ## fio Fio job parameters. + - bs: block size - qd: io depth - rw: workload mode diff --git a/scripts/vagrant/README.md b/scripts/vagrant/README.md index 1f95dc45f..3e74e8027 100644 --- a/scripts/vagrant/README.md +++ b/scripts/vagrant/README.md @@ -8,7 +8,7 @@ The following guide explains how to use the scripts in the `spdk/scripts/vagrant 4. Install and configure [Vagrant 1.9.4](https://www.vagrantup.com) or newer * Note: The extension pack has different licensing than main VirtualBox, please -review them carefully as the evaluation license is for personal use only. + review them carefully as the evaluation license is for personal use only. ## Mac OSX Setup (High Sierra) @@ -20,7 +20,8 @@ Quick start instructions for OSX: 4. Install Vagrant Cask * Note: The extension pack has different licensing than main VirtualBox, please -review them carefully as the evaluation license is for personal use only. + review them carefully as the evaluation license is for personal use only. + ``` /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" brew doctor @@ -38,7 +39,7 @@ review them carefully as the evaluation license is for personal use only. 4. Install and configure [Vagrant 1.9.4](https://www.vagrantup.com) or newer * Note: The extension pack has different licensing than main VirtualBox, please -review them carefully as the evaluation license is for personal use only. + review them carefully as the evaluation license is for personal use only. - Note: VirtualBox requires virtualization to be enabled in the BIOS. - Note: You should disable Hyper-V in Windows RS 3 laptop. Search `windows features` un-check Hyper-V, restart laptop @@ -58,7 +59,7 @@ Following the generic instructions should be sufficient for most Linux distribut 7. rpm -ivh vagrant_2.1.2_x86_64.rpm * Note: The extension pack has different licensing than main VirtualBox, please -review them carefully as the evaluation license is for personal use only. + review them carefully as the evaluation license is for personal use only. ## Configure Vagrant diff --git a/test/app/fuzz/nvme_fuzz/README.md b/test/app/fuzz/nvme_fuzz/README.md index 8c5910c10..2f188b5b8 100644 --- a/test/app/fuzz/nvme_fuzz/README.md +++ b/test/app/fuzz/nvme_fuzz/README.md @@ -7,6 +7,7 @@ Multiple controllers and namespaces can be exposed to the fuzzer at a time. In o handle multiple namespaces, the fuzzer will round robin assign a thread to each namespace and submit commands to that thread at a set queue depth. (currently 128 for I/O, 16 for Admin). The application will terminate under three conditions: + 1. The user specified run time expires (see the -t flag). 2. One of the target controllers stops completing I/O operations back to the fuzzer i.e. controller timeout. 3. The user specified a json file containing operations to run and the fuzzer has received valid completions for all of them. @@ -14,8 +15,10 @@ application will terminate under three conditions: # Output By default, the fuzzer will print commands that: + 1. Complete successfully back from the target, or 2. Are outstanding at the time of a controller timeout. 
+ Commands are dumped as named objects in json format which can then be supplied back to the script for targeted debugging on a subsequent run. See `Debugging` below. By default no output is generated when a specific command is returned with a failed status. diff --git a/test/app/fuzz/vhost_fuzz/README.md b/test/app/fuzz/vhost_fuzz/README.md index 129febad9..ab9656c5b 100644 --- a/test/app/fuzz/vhost_fuzz/README.md +++ b/test/app/fuzz/vhost_fuzz/README.md @@ -14,6 +14,7 @@ Like the NVMe fuzzer, there is an example json file showing the types of request that the application accepts. Since the vhost application accepts both vhost block and vhost scsi commands, there are three distinct object types that can be passed in to the application. + 1. vhost_blk_cmd 2. vhost_scsi_cmd 3. vhost_scsi_mgmt_cmd diff --git a/test/common/config/README.md b/test/common/config/README.md index 609f9de80..e4a15cf37 100644 --- a/test/common/config/README.md +++ b/test/common/config/README.md @@ -10,6 +10,7 @@ to emulate an RDMA enabled NIC. NVMe controllers can also be virtualized in emul ## VM Envronment Requirements (Host): + - 8 GiB of RAM (for DPDK) - Enable intel_kvm on the host machine from the bios. - Enable nesting for VMs in kernel command line (for vhost tests). @@ -28,6 +29,7 @@ configuration file. For a full list of the variable declarations available for a `test/common/autotest_common.sh` starting at line 13. ## Steps for Configuring the VM + 1. Download a fresh Fedora 26 image. 2. Perform the installation of Fedora 26 server. 3. Create an admin user sys_sgsw (enabling passwordless sudo for this account will make life easier during the tests). @@ -60,6 +62,7 @@ created above and guest or VM refer to the Ubuntu VM created in this section. - move .qcow2 file and ssh keys to default locations used by vhost test scripts Alternatively it is possible to create the VM image manually using following steps: + 1. Create an image file for the VM. It does not have to be large, about 3.5G should suffice. 2. Create an ssh keypair for host-guest communications (performed on the host): - Generate an ssh keypair with the name spdk_vhost_id_rsa and save it in `/root/.ssh`.