ivampiresp/Spdk - Spdk - Leaflow Developers

Author	SHA1	Message	Date
Richael Zhuang	4d7b2b36aa	bdev_nvme: record io paths' stat before being destroyed The io paths' stat will get lost when they are destroyed. Record the stat in the nvme_ns structure. Change-Id: I12fc0b04fac0d59e7465fe543ee733f2822a9cdb Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14744 Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2023-01-19 01:57:11 +00:00
Richael Zhuang	f61b004197	bdev_nvme: update nvme_io_path stat when IO completes
Currently we have stat per bdev I/O channel, but for NVMe bdev multipath, we don't have stat per I/O path. Especially for active-active mode, we may want to observe each path's statistics. This patch supports I/O stat for nvme_io_path. Record each nvme_io_path stat using structure spdk_bdev_io_stat.
The following compares bdevperf results, tested on an Arm server with this basic configuration: 1 Null bdev (block size: 4K, num_blocks: 16k); bdevperf run with io size=4k, qdepth=1/32/128, rw type=randwrite/mixed with 70% read/randread. Each run lasted 30 seconds; each item was run 16 times and averaged. The results are as follows.
qdepth  type        IOPS(default)  IOPS(this patch)  diff
1       randwrite   7795157.27     7859909.78        0.83%
1       mix(70% r)  7418607.08     7404026.54        -0.20%
1       randread    8053560.83     8046315.44        -0.09%
32      randwrite   15409191.3     15327642.11       -0.53%
32      mix(70% r)  13760145.97    13714666.28       -0.33%
32      randread    16136922.98    16038855.39       -0.61%
128     randwrite   14815647.56    14944902.74       0.87%
128     mix(70% r)  13414858.59    13412317.46       -0.02%
128     randread    15508642.43    15521752.41       0.08%
Change-Id: I4eb5673f49d65d3ff9b930361d2f31ab0ccfa021 Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14743 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2023-01-19 01:57:11 +00:00
Richael Zhuang	2f500a23fb	bdev/nvme: support switch to another io path after a number of IOs Support specifying rr_min_io for the multipath round-robin policy, which makes I/O switch to another io path after rr_min_io I/Os are routed to the current io path. Change-Id: I09f0d8d24271c0178ff816fa63ce8576b6e8ae47 Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15445 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2023-01-19 01:57:11 +00:00
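The rr_min_io behaviour described in commit 2f500a23fb can be sketched roughly as follows. This is a minimal, standalone model and not the SPDK implementation: `struct rr_selector`, `rr_selector_init()` and `rr_selector_next()` are hypothetical names, and paths are represented only by their index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-channel round-robin selector. */
struct rr_selector {
	size_t   num_paths;   /* number of usable I/O paths */
	size_t   current;     /* index of the path currently in use */
	uint32_t rr_min_io;   /* I/Os to route to one path before switching */
	uint32_t rr_counter;  /* I/Os routed to the current path so far */
};

static void
rr_selector_init(struct rr_selector *s, size_t num_paths, uint32_t rr_min_io)
{
	assert(num_paths > 0);
	s->num_paths = num_paths;
	s->current = 0;
	/* Assumption: treat 0 as "switch after every I/O". */
	s->rr_min_io = rr_min_io > 0 ? rr_min_io : 1;
	s->rr_counter = 0;
}

/* Return the index of the path that should carry the next I/O. */
static size_t
rr_selector_next(struct rr_selector *s)
{
	if (s->rr_counter >= s->rr_min_io) {
		/* The current path has received rr_min_io I/Os: move on. */
		s->current = (s->current + 1) % s->num_paths;
		s->rr_counter = 0;
	}
	s->rr_counter++;
	return s->current;
}
```

With two paths and rr_min_io set to 3, successive calls to `rr_selector_next()` yield path 0 three times, then path 1 three times, then path 0 again.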
Richael Zhuang	6aa4edc27d	bdev/nvme: select io path according to outstanding io number Support selecting the io path according to the number of outstanding I/Os on each path in a channel. It's optional, and can be set by calling RPC "bdev_nvme_set_multipath_policy -s queue_depth". Change-Id: I82cdfbd69b3e105c973844c4f34dc98f0dca2faf Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14734 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2023-01-19 01:57:11 +00:00
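The queue_depth selector from commit 6aa4edc27d amounts to picking the path with the fewest outstanding I/Os. The sketch below illustrates that idea only; `struct path_state` and `select_min_outstanding()` are hypothetical, the `available` flag is an assumed simplification of path health, and tie-breaking by lowest index is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-path view: only what the selection needs. */
struct path_state {
	uint32_t num_outstanding_reqs; /* I/Os submitted but not yet completed */
	int      available;            /* non-zero if the path can take I/O */
};

/*
 * Return the index of the available path with the fewest outstanding
 * requests, or -1 if no path is available. Ties go to the lowest index.
 */
static int
select_min_outstanding(const struct path_state *paths, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!paths[i].available) {
			continue;
		}
		if (best < 0 ||
		    paths[i].num_outstanding_reqs < paths[best].num_outstanding_reqs) {
			best = (int)i;
		}
	}
	return best;
}
```

An unavailable path is skipped even when it has the smallest count, so an idle but broken path never attracts traffic in this model.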
Shuhei Matsumoto	a3ae6eaa75	bdev/nvme: Add an option for the RDMA SRQ size Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I8e678b5681c8039ccd359de8a797ede4eaddf8b5 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14914 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2023-01-17 23:53:01 +00:00
Parameswaran Krishnamurthy	2796687d54	nvme: Added support for TP-8009, Auto-discovery of Discovery controllers for NVMe initiator using mDNS using Avahi
Approach: The Avahi daemon needs to be running to provide the mDNS server service. In SPDK, a client API based on the Avahi-client library is implemented. The client API connects to the Avahi daemon and receives events for new discoveries and for removal of an existing discovery entry.
The following set of new RPCs has been introduced.
scripts/rpc.py bdev_nvme_start_mdns_discovery -b cdc_auto -s _nvme-disc._tcp
The user initiates an mDNS based discovery using this RPC. This starts an Avahi-client based poller looking for new discovery events from the Avahi server. On a new discovery of a discovery controller, the existing bdev_nvme_start_discovery API is invoked with the trid of the discovery controller learnt. This enables automatic connection of the initiator to the subsystems discovered from the discovery controller. Multiple mDNS discovery instances can be run by specifying a unique bdev-prefix and a unique servicename to discover as parameters.
scripts/rpc.py bdev_nvme_stop_mdns_discovery -b cdc_auto
This stops the Avahi poller that was started for the specified service. Internally, the bdev_nvme_stop_discovery API is invoked for each of the discovery controllers learnt automatically by this instance of the mDNS discovery service. This results in termination of connections to all the subsystems learnt by this mDNS discovery instance.
scripts/rpc.py bdev_nvme_get_mdns_discovery_info
This RPC displays the list of mDNS discovery instances running and the trid of the controllers discovered by these instances.
Test Result: root@ubuntu-pm-18-226:~/param-spdk/spdk/build/bin# ./nvmf_tgt -i 1 -s 2048 -m 0xF root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_nvme_start_mdns_discovery -b cdc_auto -s _nvme-disc._tcp root@ubuntu-pm-18-226:~/param-spdk/spdk# root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_nvme_get_mdns_discovery_info [ { "name": "cdc_auto", "svcname": "_nvme-disc._tcp", "referrals": [ { "name": "cdc_auto0", "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.2.21", "trsvcid": "8009", "subnqn": "nqn.2014-08.org.nvmexpress.discovery" } }, { "name": "cdc_auto1", "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.1.21", "trsvcid": "8009", "subnqn": "nqn.2014-08.org.nvmexpress.discovery" } } ] } ] root@ubuntu-pm-18-226:~/param-spdk/spdk# root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_nvme_get_discovery_info [ { "name": "cdc_auto0", "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.2.21", "trsvcid": "8009", "subnqn": "nqn.2014-08.org.nvmexpress.discovery" }, "referrals": [] }, { "name": "cdc_auto1", "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.1.21", "trsvcid": "8009", "subnqn": "nqn.2014-08.org.nvmexpress.discovery" }, "referrals": [] } ] root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_get_bdevs [ { "name": "cdc_auto02n1", "aliases": [ "600110d6-1681-1681-0403-000045805c45" ], "product_name": "NVMe disk", "block_size": 512, "num_blocks": 32768, "uuid": "600110d6-1681-1681-0403-000045805c45", "assigned_rate_limits": { "rw_ios_per_sec": 0, "rw_mbytes_per_sec": 0, "r_mbytes_per_sec": 0, "w_mbytes_per_sec": 0 }, "claimed": false, "zoned": false, "supported_io_types": { "read": true, "write": true, "unmap": true, "write_zeroes": true, "flush": true, "reset": true, "compare": true, "compare_and_write": true, "abort": true, "nvme_admin": true, "nvme_io": true }, "driver_specific": { "nvme": [ { "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.1.40", "trsvcid": "4420", 
"subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.3.0" }, "ctrlr_data": { "cntlid": 3, "vendor_id": "0x0000", "model_number": "SANBlaze VLUN P3T0", "serial_number": "00-681681dc681681dc", "firmware_revision": "V10.5", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.3.0", "oacs": { "security": 0, "format": 1, "firmware": 0, "ns_manage": 1 }, "multi_ctrlr": true, "ana_reporting": true }, "vs": { "nvme_version": "2.0" }, "ns_data": { "id": 1, "ana_state": "optimized", "can_share": true } } ], "mp_policy": "active_passive" } }, { "name": "cdc_auto00n1", "aliases": [ "600110da-09a6-09a6-0302-00005eeb19b4" ], "product_name": "NVMe disk", "block_size": 512, "num_blocks": 2048, "uuid": "600110da-09a6-09a6-0302-00005eeb19b4", "assigned_rate_limits": { "rw_ios_per_sec": 0, "rw_mbytes_per_sec": 0, "r_mbytes_per_sec": 0, "w_mbytes_per_sec": 0 }, "claimed": false, "zoned": false, "supported_io_types": { "read": true, "write": true, "unmap": true, "write_zeroes": true, "flush": true, "reset": true, "compare": true, "compare_and_write": true, "abort": true, "nvme_admin": true, "nvme_io": true }, "driver_specific": { "nvme": [ { "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.2.40", "trsvcid": "4420", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.2.0" }, "ctrlr_data": { "cntlid": 1, "vendor_id": "0x0000", "model_number": "SANBlaze VLUN P2T0", "serial_number": "00-ab09a6f5ab09a6f5", "firmware_revision": "V10.5", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.2.0", "oacs": { "security": 0, "format": 1, "firmware": 0, "ns_manage": 1 }, "multi_ctrlr": true, "ana_reporting": true }, "vs": { "nvme_version": "2.0" }, "ns_data": { "id": 1, "ana_state": "optimized", "can_share": true } } ], "mp_policy": "active_passive" } }, { "name": "cdc_auto01n1", "aliases": [ "600110d6-dce8-dce8-0403-00010b2d3d8c" ], "product_name": "NVMe disk", "block_size": 512, "num_blocks": 32768, "uuid": "600110d6-dce8-dce8-0403-00010b2d3d8c", "assigned_rate_limits": { 
"rw_ios_per_sec": 0, "rw_mbytes_per_sec": 0, "r_mbytes_per_sec": 0, "w_mbytes_per_sec": 0 }, "claimed": false, "zoned": false, "supported_io_types": { "read": true, "write": true, "unmap": true, "write_zeroes": true, "flush": true, "reset": true, "compare": true, "compare_and_write": true, "abort": true, "nvme_admin": true, "nvme_io": true }, "driver_specific": { "nvme": [ { "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.1.40", "trsvcid": "4420", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.3.1" }, "ctrlr_data": { "cntlid": 3, "vendor_id": "0x0000", "model_number": "SANBlaze VLUN P3T1", "serial_number": "01-6ddce86d6ddce86d", "firmware_revision": "V10.5", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.3.1", "oacs": { "security": 0, "format": 1, "firmware": 0, "ns_manage": 1 }, "multi_ctrlr": true, "ana_reporting": true }, "vs": { "nvme_version": "2.0" }, "ns_data": { "id": 1, "ana_state": "optimized", "can_share": true } } ], "mp_policy": "active_passive" } }, { "name": "cdc_auto01n2", "aliases": [ "600110d6-dce8-dce8-0403-00010b2d3d8d" ], "product_name": "NVMe disk", "block_size": 512, "num_blocks": 32768, "uuid": "600110d6-dce8-dce8-0403-00010b2d3d8d", "assigned_rate_limits": { "rw_ios_per_sec": 0, "rw_mbytes_per_sec": 0, "r_mbytes_per_sec": 0, "w_mbytes_per_sec": 0 }, "claimed": false, "zoned": false, "supported_io_types": { "read": true, "write": true, "unmap": true, "write_zeroes": true, "flush": true, "reset": true, "compare": true, "compare_and_write": true, "abort": true, "nvme_admin": true, "nvme_io": true }, "driver_specific": { "nvme": [ { "trid": { "trtype": "TCP", "adrfam": "IPv4", "traddr": "66.1.1.40", "trsvcid": "4420", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.3.1" }, "ctrlr_data": { "cntlid": 3, "vendor_id": "0x0000", "model_number": "SANBlaze VLUN P3T1", "serial_number": "01-6ddce86d6ddce86d", "firmware_revision": "V10.5", "subnqn": "nqn.2014-08.com.sanblaze:virtualun.virtualun.3.1", "oacs": { 
"security": 0, "format": 1, "firmware": 0, "ns_manage": 1 }, "multi_ctrlr": true, "ana_reporting": true }, "vs": { "nvme_version": "2.0" }, "ns_data": { "id": 2, "ana_state": "optimized", "can_share": true } } ], "mp_policy": "active_passive" } } ] root@ubuntu-pm-18-226:~/param-spdk/spdk# root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_nvme_stop_mdns_discovery -b cdc_auto root@ubuntu-pm-18-226:~/param-spdk/spdk# root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_nvme_get_mdns_discovery_info [] root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_nvme_get_discovery_info [] root@ubuntu-pm-18-226:~/param-spdk/spdk# scripts/rpc.py bdev_get_bdevs [] root@ubuntu-pm-18-226:~/param-spdk/spdk# Signed-off-by: Parameswaran Krishnamurthy <parameswaran.krishna@dell.com> Change-Id: Ic2c2e614e2549a655c7f81ae844b80d8505a4f02 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15703 Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: Boris Glimcher <Boris.Glimcher@emc.com> Reviewed-by: <qun.wan@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2023-01-12 17:22:48 +00:00
Shuhei Matsumoto	e33ae4a6d5	bdev/nvme: Count number of NVMe errors per type or code Error counters for NVMe errors were added in the generic bdev layer but we want to know more detailed information for some use cases. Add NVMe error counters per type and per code as module specific statistics. For status codes, the first idea was to have a differently named member for each status code value. However, it was bad and too hard to test, review, and maintain. Instead, we have just two-dimensional uint32_t arrays, and increment one of these uint32_t values based on the status code type and status code. Then, when dumping the JSON, we use spdk_nvme_cpl_get_status_string() and spdk_nvme_cpl_get_status_type_string(). This idea has one potential downside. It consumes 4 (types) * 256 (codes) * 4 (counter) = 4KB per NVMe bdev. We can make this smarter if memory allocation is a problem. Hence we add an option nvme_error_stat to enable this feature only if the user requests it. Additionally, the string returned by spdk_nvme_cpl_get_status_string() or spdk_nvme_cpl_get_status_type_string() has uppercase letters, spaces, and hyphens. These should not be included in JSON strings. Hence, convert these via spdk_strcpy_replace(). Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I07b07621e777bdf6556b95054abbbb65e5f9ea3e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15370 Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot	2023-01-10 13:12:05 +00:00
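The counter layout described in commit e33ae4a6d5 can be illustrated with a small sketch. It is not the SPDK code: the struct and function names are hypothetical, and `to_json_key()` is a stand-in for the string conversion the commit performs with spdk_strcpy_replace(). Lowercasing and mapping spaces and hyphens to underscores is an assumption about the desired key style.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_STATUS_TYPES 4    /* "4 (types)" from the commit message */
#define NUM_STATUS_CODES 256  /* "256 (codes)" from the commit message */

/* Hypothetical per-bdev error statistics: one counter per (type, code). */
struct nvme_error_stat_sketch {
	uint32_t status[NUM_STATUS_TYPES][NUM_STATUS_CODES];
};

/* 4 * 256 * sizeof(uint32_t) = 4096 bytes, the 4KB cost cited above. */
_Static_assert(sizeof(struct nvme_error_stat_sketch) == 4096,
	       "counter table should be 4KB");

/* Count one completion; status code types outside the table are ignored. */
static void
error_stat_record(struct nvme_error_stat_sketch *stat, uint8_t sct, uint8_t sc)
{
	if (sct < NUM_STATUS_TYPES) {
		stat->status[sct][sc]++;
	}
}

/* Copy src to dst, lowercased, with ' ' and '-' replaced by '_'. */
static void
to_json_key(const char *src, char *dst, size_t dst_len)
{
	size_t i;

	assert(dst_len > 0);
	for (i = 0; src[i] != '\0' && i + 1 < dst_len; i++) {
		unsigned char c = (unsigned char)src[i];

		dst[i] = (c == ' ' || c == '-') ? '_' : (char)tolower(c);
	}
	dst[i] = '\0';
}
```

Indexing by the raw type and code values keeps recording to a single increment, at the price of a mostly empty table, which is the trade-off the commit message names.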
Richael Zhuang	2ebbeba7d9	bdev_nvme: remove io_outstanding from nvme_io_path Revert commit:61b8122dc51 to remove io_outstanding in nvme_io_path, because it's decided to use num_outstanding_reqs in spdk_nvme_qpair instead. Change-Id: Ib3afc6e93d4cb426bb46986faf575737312da6b6 Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15977 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2023-01-09 14:49:11 +00:00
Michael Haeuptle	990cd38a8c	bdev_nvme: Support for transport_tos in RPC Added transport_tos parameter to bdev_nvme_set_options and corresponding rpc.py command line. Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com> Change-Id: If95eafbd9963fee8d7b230e91ec84dae8713df23 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15949 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2023-01-05 19:54:53 +00:00
Richael Zhuang	61b8122dc5	bdev_nvme: added io_outstanding in nvme_io_path Added io_outstanding in struct nvme_io_path to record the number of outstanding I/Os in each path, which will be used by multipath to select an I/O path. io_outstanding is updated for I/O sent to a namespace and is not updated for I/O sent to a controller. For the FLUSH case, bdev_nvme_io_complete() is called directly and io_outstanding is not updated. spdk_bdev_io_get_buf() is executed in the generic bdev layer. Hence, we do not update io_outstanding for spdk_bdev_io_get_buf(). Change-Id: I47b515e0f254e5daa7e1e88799a832032b23ff34 Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15032 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-11-24 10:08:43 +00:00
Krzysztof Karas	b5bdbbb959	bdev: enable bdevs based on physical device to generate UUID Add an option "--generate-uuids" to bdev_nvme_set_options RPC to enable generation of UUIDs for NVMe devices that do not provide this value themselves. The identifier is based on the serial number of the device, so a bdev using this NVMe device will always be assigned the same UUID. Part of enhancement from #2516. Change-Id: I86d76274e5702d14ace89d83d1e9129573f543e2 Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15151 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Community-CI: Mellanox Build Bot	2022-11-18 08:38:13 +00:00
paul luse	a6dbe3721e	update Intel copyright notices per Intel policy to include file commit date using git cmd below. The policy does not apply to non-Intel (C) notices. git log --follow -C90% --format=%ad --date default <file> | tail -1 and then pull just the 4 digit year from the result. Intel copyrights were not added to files where Intel either had no contribution or the contribution lacked substance (i.e. license header updates, formatting changes, etc). Contribution date used "--follow -C95%" to get the most accurate date. Note that several files in this patch didn't end the license/(c) block with a blank comment line so these were added as the vast majority of files do have this last blank line. Simply there for consistency. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: Id5b7ce4f658fe87132f14139ead58d6e285c04d4 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15192 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Community-CI: Mellanox Build Bot	2022-11-10 08:28:53 +00:00
Shuhei Matsumoto	a4ebc5714f	bdev/nvme: Fix race between for_each_channel() and io_device_unregister() If an nvme_ctrlr is unregistered while I/O path caches are being cleared, the unregistration would fail. This race was not considered for the case where an nvme_ctrlr is unregistered after the adminq poller has started I/O path cache clearing. As a bug fix, control ANA log page updating and I/O path cache clearing separately by adding a new flag io_path_cache_clearing. Fixes issue #2617 Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Id58684caf9b2a10fa68bdb717288ff2bd799c3f7 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13924 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Dong Yi <dongx.yi@intel.com>	2022-08-12 09:00:59 +00:00
Shuhei Matsumoto	81e92f6bca	bdev/nvme: Calculate and use active ns count to read ANA log page nvme_ctrlr_init_ana_log_page() had used cdata->mnan as the active ns count to allocate a buffer and read an ANA log page into the buffer. However, cdata->mnan was larger than the real active ns count and caused an issue when the corresponding NVMe-oF target set bit 18 of the SGL support field in the Identify Controller Data structure. We still need to use cdata->mnan to allocate the buffer because the number of namespaces may be increased dynamically after initialization. Hence, rename nvme_ctrlr::ana_log_page_size to nvme_ctrlr::max_ana_log_page_size and calculate and use the active ns count to read the ANA log page. Check that the current ana_log_page_size is not larger than nvme_ctrlr->max_ana_log_page_size. Fixes issue #2584 Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Ieb10b9c793c4f48ffd88d517c0e9a55184b7d935 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13653 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-07-14 09:47:41 +00:00
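The sizing logic in commit 81e92f6bca can be worked through with a short sketch. The byte sizes used here (16-byte log header, 32-byte ANA group descriptor, 4 bytes per namespace ID) are my reading of the ANA log page layout and should be treated as assumptions; the function and macro names are hypothetical and do not come from SPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ANA log page layout sizes, in bytes. */
#define ANA_LOG_HEADER_SIZE      16u
#define ANA_GROUP_DESC_SIZE      32u
#define ANA_NSID_ENTRY_SIZE       4u

/*
 * Size of an ANA log page holding num_groups group descriptors and
 * num_ns namespace IDs in total across all groups.
 */
static uint32_t
ana_log_page_size(uint32_t num_groups, uint32_t num_ns)
{
	return ANA_LOG_HEADER_SIZE +
	       num_groups * ANA_GROUP_DESC_SIZE +
	       num_ns * ANA_NSID_ENTRY_SIZE;
}

/*
 * The buffer is sized once from the maximum namespace count (mnan),
 * while each read uses the current active namespace count. A read is
 * only valid if it fits in the pre-allocated buffer.
 */
static bool
ana_read_size_fits(uint32_t num_groups, uint32_t active_ns, uint32_t mnan)
{
	return ana_log_page_size(num_groups, active_ns) <=
	       ana_log_page_size(num_groups, mnan);
}
```

Under these assumptions, two ANA groups with 4 active namespaces need 16 + 2 * 32 + 4 * 4 = 96 bytes per read, while a buffer sized for mnan = 1024 is 16 + 64 + 4096 = 4176 bytes.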
Jim Harris	488570ebd4	Replace most BSD 3-clause license text with SPDX identifier. Many open source projects have moved to using SPDX identifiers to specify license information, reducing the amount of boilerplate code in every source file. This patch replaces the bulk of SPDK .c, .cpp and Makefiles with the BSD-3-Clause identifier. Almost all of these files share the exact same license text, and this patch only modifies the files that contain the most common license text. There can be slight variations because the third clause contains company names - most say "Intel Corporation", but there are instances for Nvidia, Samsung, Eideticom and even "the copyright holder". Used a bash script to automate replacement of the license text with SPDX identifier which is checked into scripts/spdx.sh. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Iaa88ab5e92ea471691dc298cfe41ebfb5d169780 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12904 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Dong Yi <dongx.yi@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: <qun.wan@intel.com>	2022-06-09 07:35:12 +00:00
Konrad Sztyber	e5f9e82291	bdev/nvme: add timeout option to start_discovery It's now possible to specify a time to wait until a connection to the discovery controller and the NVM controllers it exposes is made. Whenever that time is exceeded, a callback is immediately executed. However, depending on the stage of the discovery process, we might need to wait a while before actually stopping it (e.g. because a controller attach is in progress). That means that a discovery service might be visible for a while after it timed out. Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com> Change-Id: I2d01837b581e0fa24c8e777730d88d990c94b1d8 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12684 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-05-18 07:24:06 +00:00
Shuhei Matsumoto	00d46b80b2	bdev/nvme: Disable automatic failback in multipath mode By default, failback to the preferred I/O path is done automatically if it is restored. Some users may want to keep using the backup I/O path even if the preferred I/O path is restored. In this case, bdev_nvme_set_preferred_path can be used to do manual failback. We may be able to clear/fill I/O path cache more strictly but it will be complicated and have bugs. This patch does the minimal change, just skips an apparent case. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I78fe5faee6ff04e88ae3d7c6be6da1c20637c912 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12431 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-05-17 12:54:45 +00:00
Konrad Sztyber	f331ae167b	bdev/nvme: add RPC returning information about discovery service The RPC returns a list of active discovery service connections. Each discovery service is described by a name, its trid, and a list of discovery service trids it refers to. Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com> Change-Id: Ifa4b9501dd353e7b4948ad830575a6c94dafd86b Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12380 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-05-05 14:08:57 +00:00
Shuhei Matsumoto	8f9b977504	bdev/nvme: Add active/active policy for multipath mode The NVMe bdev module supported active-passive policy for multipath mode first. With this patch, the NVMe bdev module also supports active-active policy for multipath mode. Following the Linux kernel native NVMe multipath, the NVMe bdev module supports the round robin algorithm for active-active policy. The multipath policy, active-passive or active-active, is managed per nvme_bdev. The multipath policy is copied to all corresponding nvme_bdev_channels. Different from active-passive, active-active caches even a non_optimized path to provide load balancing across multiple paths. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Ie18b24db60d3da1ce2f83725b6cd3079f628f95b Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12001 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-05-05 07:11:24 +00:00
Shuhei Matsumoto	22b77a3c80	bdev/nvme: Set preferred I/O path in multipath mode If we specify a preferred path manually for each NVMe bdev, we will be able to realize simple static load balancing and make failover more controllable in multipath mode. The idea is to move the I/O path to the NVMe-oF controller to the head of the list and then clear the I/O path cache for each NVMe bdev channel. We could set the I/O path in the I/O path cache directly, but that would have to be conditional and would make the code very complex. Hence, let find_io_path() do that. However, an NVMe bdev channel may be acquired after setting the preferred path. To cover such a case, sort the nvme_ns list of the NVMe bdev too. This feature supports only multipath mode. The NVMe bdev module supports failover mode too. However, to support the latter, the new RPC would need to have trid as parameters, and the code and the usage would become very complex. Add a note for this limitation. To verify one by one exactly, add a unit test. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Ia51c74f530d6d7dc1f73d5b65f854967363e76b0 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12262 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: <tanl12@chinatelecom.cn> Reviewed-by: GangCao <gang.cao@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-05-05 07:11:24 +00:00
Shuhei Matsumoto	2a6a64485c	bdev/nvme: Add bdev_nvme_get_io_paths RPC to monitor I/O path states Add a new RPC bdev_nvme_get_io_paths to query all active I/O paths. One io_path belongs to one nvme_bdev_channel. Each nvme_bdev_channel is associated with one nvme_bdev. If the RPC bdev_nvme_get_io_paths has a bdev name as a parameter, it can simply use spdk_for_each_channel() for the corresponding nvme_bdev. However, users will want to know the I/O paths of all nvme_bdevs, like the RPC bdev_get_bdevs. One io_path has one nvme_qpair. One nvme_qpair belongs to one nvme_poll_group. By relying on these relationships, the RPC bdev_nvme_get_io_paths traverses all nvme_poll_groups by using spdk_for_each_channel() on g_bdev_nvme_ctrlrs. The RPC bdev_nvme_get_io_paths has two modes: display all active I/O paths, or only those of the specified NVMe bdev. The specified bdev name is used just for comparison, and an empty array is returned if no matching io_path is found. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I4a0dbf3ef7aaa9a7b7345fc03dc493cc6d37bc99 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12146 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-04-22 09:44:57 +00:00
Shuhei Matsumoto	50b6329ca0	bdev/nvme: Factor out ctrlr info json dump into a helper function Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I7f1e08ff13d890cb780e7b66c18a77ab85c82029 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12311 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-04-22 09:44:57 +00:00
Shuhei Matsumoto	13ca6e52d3	bdev/nvme: Handle ANA transition (change or inaccessible state) correctly Previously, if a namespace was in ANA inaccessible state, I/O had been queued indefinitely. Fix this issue according to the NVMe spec. Add a temporary poller anatt_timer and a flag ana_transition_timedout for each nvme_ns. Start anatt_timer if the nvme_ns enters ANA transition. If anatt_timer expires, set ana_transition_timedout to true. Cancel anatt_timer or clear ana_transition_timedout if the nvme_ns exits ANA transition. nvme_io_path_become_available() returns false if ana_transition_timedout is true. Add a unit test case to verify these additions. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: Ic76933242046b3e8e553de88221b943ad097c91c Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12194 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>	2022-04-22 09:44:57 +00:00
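The timer and flag handling described in commit 13ca6e52d3 can be modelled as a small state machine. The sketch below is an illustration under simplifying assumptions, not the SPDK code: time is passed in as a plain tick count instead of using a poller, and `struct ns_model` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum ana_state {
	ANA_OPTIMIZED,
	ANA_NON_OPTIMIZED,
	ANA_INACCESSIBLE,
	ANA_CHANGE
};

/* Hypothetical per-namespace model of the timer and the timeout flag. */
struct ns_model {
	enum ana_state state;
	bool     timer_armed;             /* stands in for anatt_timer */
	uint64_t deadline;                /* tick at which the timer expires */
	bool     ana_transition_timedout;
};

/* The commit treats the inaccessible and change states as "in transition". */
static bool
ana_in_transition(enum ana_state s)
{
	return s == ANA_INACCESSIBLE || s == ANA_CHANGE;
}

/* Apply a newly reported ANA state at time `now`; anatt is the timeout. */
static void
ns_set_ana_state(struct ns_model *ns, enum ana_state s, uint64_t now, uint64_t anatt)
{
	ns->state = s;
	if (ana_in_transition(s)) {
		/* Entering transition: start the timer once. */
		if (!ns->timer_armed && !ns->ana_transition_timedout) {
			ns->timer_armed = true;
			ns->deadline = now + anatt;
		}
	} else {
		/* Leaving transition: cancel the timer and clear the flag. */
		ns->timer_armed = false;
		ns->ana_transition_timedout = false;
	}
}

/* Called periodically; marks the transition as timed out on expiry. */
static void
ns_poll(struct ns_model *ns, uint64_t now)
{
	if (ns->timer_armed && now >= ns->deadline) {
		ns->timer_armed = false;
		ns->ana_transition_timedout = true;
	}
}

/* True while it is still worth queueing I/O for this namespace. */
static bool
ns_may_become_available(const struct ns_model *ns)
{
	return ana_in_transition(ns->state) && !ns->ana_transition_timedout;
}
```

Once the flag is set, `ns_may_become_available()` returns false, so a caller can fail queued I/O instead of holding it forever, which is the behaviour the fix is after.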
Ben Walker	c86778398b	bdev/nvme: Remove ctrlr from nvme_ctrlr_channel This was neither set nor used. Signed-off-by: Ben Walker <benjamin.walker@intel.com> Change-Id: I3119135843c5fc0b8724e593db40df46e6b5bdb0 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12097 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-04-07 07:23:56 +00:00
Jim Harris	0bd7ace836	bdev/nvme: add wait_for_attach param to discovery RPC Setting this optional parameter to true makes the RPC completion wait until the attach for all discovered NVM subsystems have completed. This is especially useful for fio or bdevperf, to ensure that all of the namespaces are actually available before testing. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Icf04a122052f72e263a26b3c7582c81eac32a487 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12044 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-04-01 10:03:45 +00:00
Jim Harris	9da3d742ff	bdev_nvme: add nvme_ctrlr::from_discovery_service This keeps track if an nvme_ctrlr was created implicitly by the discovery service. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I493b7cacfe563737f45a1fffca98855a1929a751 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11817 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-03-28 17:10:04 +00:00
Jim Harris	13cffc5e76	bdev_nvme: add timeout parameters to start_discovery RPC These parameters will be used for any controller created by the discovery service. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I221b791f38b9c5797ba084c647a98b82c102a121 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11942 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-03-28 17:10:04 +00:00
Jim Harris	81587f0663	bdev/nvme: always defer start of discovery service We used to wait until the discovery service could connect to the discovery subsystem before calling the callback function provided by the caller (mainly the start_discovery RPC). Moving forward, we will be handling the case where the discovery subsystem is unavailable temporarily. For now, let's not fail the bdev_nvme_start_discovery call if we cannot connect to the discovery subsystem. This will keep the initial service start path the same as the path where the discovery subsystem is temporarily unavailable. In the future, we can consider adding functionality to the start_discovery RPC that waits up to X number of seconds to see if we were able to connect and fail otherwise. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Icb05523b9d59f508bfbc0233595c8bf58c10488f Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11768 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2022-03-21 11:05:17 +00:00
Shuhei Matsumoto	aca0d56e3d	bdev/nvme: Reconnect ctrlr after it is disconnected at completion poller spdk_nvme_ctrlr_disconnect() will be made asynchronous in the following patches and so we will need to have some changes. spdk_nvme_ctrlr_disconnect() disconnects adminq and ctrlr synchronously now. If spdk_nvme_ctrlr_disconnect() is made asynchronous, spdk_nvme_ctrlr_process_admin_completions() will complete to disconnect adminq and ctrlr, and will return -ENXIO only if adminq is disconnected. However even now spdk_nvme_ctrlr_process_admin_completions() returns -ENXIO if adminq is disconnected. So as a preparation, set a callback before calling spdk_nvme_ctrlr_disconnect() and call the callback if it is set and spdk_nvme_ctrlr_process_admin_completions() returns -ENXIO. Besides, fix the return value of bdev_nvme_poll_adminq() in this patch. Change-Id: I2559f86bb8cf9a92b5b386ed816c00b08c9832df Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10950 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-03-21 10:49:11 +00:00
Shuhei Matsumoto	a76bbe3553	bdev/nvme: Disconnect and then free I/O qpair in a ctrlr reset sequence As we do when deleting ctrlr_channel, disconnect and then free I/O qpair in a ctrlr reset sequence. Deleting ctrlr_channel and resetting ctrlr_channel may cause conflicts. This patch processes such conflicts correctly. If destroy_ctrlr_channel_cb() is executed between pending and executing reset_destroy_qpair(), reset_destroy_qpair() is not executed because ctrlr_channel is not found. In this case, destroy_qpair_channel() starts disconnecting qpair and deletes ctrlr_channel. Then disconnected_qpair_cb() releases a reference to poll group. If destroy_ctrlr_channel_cb() is executed between executing reset_destroy_qpair() and disconnected_qpair_cb(), destroy_ctrlr_channel_cb() skips ctrlr_channel for a reset sequence. Change-Id: I1f49f74b94aefbea178680aa53ded3a12876c676 Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10766 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-03-21 10:49:11 +00:00
Shuhei Matsumoto	c113e4cdca	bdev/nvme: Alloc qpair context dynamically on nvme_ctrlr_channel This is another preparation to disconnect qpair asynchronously. Add nvme_qpair object and move the qpair and poll_group pointers and the io_path_list list from nvme_ctrlr_channel to nvme_qpair. nvme_qpair is allocated dynamically when creating nvme_ctrlr_channel, and nvme_ctrlr_channel points to nvme_qpair. We want to keep the times of references at I/O path. Change nvme_io_path to point nvme_qpair instead of nvme_ctrlr_channel, and add nvme_ctrlr_channel pointer to nvme_qpair. nvme_ctrlr_channel may be freed earlier than nvme_qpair. nvme_poll_group lists nvme_qpair instead of nvme_ctrlr_channel and nvme_qpair has a pointer to nvme_ctrlr. By using the nvme_ctrlr pointer of the nvme_qpair, a helper function nvme_ctrlr_channel_get_ctrlr() is not necessary any more. Remove it. Change-Id: Ib3f579d3441f31b9db7d3844ec56c49e2bb53a5d Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11832 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-03-15 09:05:09 +00:00
Shuhei Matsumoto	0fba8dc8cb	bdev/nvme: I/O error resiliency can be configured by global options Add three options for I/O error resiliency to spdk_nvme_bdev_opts. Then the RPC bdev_nvme_set_options can configure these. These can be overridden if these are given by the RPC bdev_nvme_attach_controller. Change-Id: If3ee23aeef8b7585fe0fb5ec4695df5866fc1e74 Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11830 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-03-15 09:05:09 +00:00
Shuhei Matsumoto	00a7998254	bdev/nvme: Move per controller settings into a option structure The following patches will enable us to specify I/O error resiliency options per nvme_ctrlr as global options. To do it easier, move per controller options about I/O error resiliency into struct nvme_ctrlr_opts. prchk_flags is not exactly for resiliency but move it into struct nvme_ctrlr_opts too. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I85fd1738bb6e293cd804b086ade82274485f213d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11829 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-03-09 08:00:45 +00:00
Shuhei Matsumoto	d40292d05a	bdev/nvme: Add prefix "drv_" to instance or pointer of spdk_nvme_ctrlr_opts The following patches will add options per struct nvme_ctrlr in the NVMe bdev module. bdev_opts will be used for it. Additionally, fabrics_connect_timeout_us is set directly to spdk_nvme_ctrlr_opts. So remove it from the RPC request structure. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I981cda5e69375edc43a8581cd3b43497c38a3d56 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11827 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-03-09 08:00:45 +00:00
Alexey Marchuk	2ccaf2acfa	bdev/nvme: Add transport_ack_timeout to bdev_nvme_set_options RPC It may take a long time to detect network transport error when e.g. port is removed on remote target. This timeout depends on 2 parameters - retry_count and ack_timeout. bdev_nvme_set_options supports configuration of retry_count but transport_ack_timeout is missed. Note: this parameter is used by RDMA transport only. Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com> Change-Id: I7c3090dc8e4078f64d444e2392a9e0a6ecdc31c0 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11175 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: <tanl12@chinatelecom.cn> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-01-31 09:44:28 +00:00
Shuhei Matsumoto	80e81273e2	bdev/nvme: Do not use ctrlr for I/O submission if reconnect failed repeatedly If ctrlr_loss_timeout_sec is set to -1, reconnect is tried repeatedly indefinitely, and I/Os continue to be queued. This patch adds another option fast_io_fail_timeout_sec, a flag fast_io_fail_timedout to nvme_ctrlr. If the time fast_io_fail_timeout_sec passed after starting reset, set fast_io_fail_timedout to true not to use the path for I/O submission. fast_io_fail_timeout_sec is initialized to zero as same as ctrlr_loss_timeout_sec and reconnect_delay_sec. The name of the parameter follows the famous DM-multipath, its fast_io_fail_tmo. Change-Id: Ib870cf8e2fd29300c47f1df69617776f4e67bd8c Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10301 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-01-17 14:25:15 +00:00
Shuhei Matsumoto	ae4e54fdc3	bdev/nvme: Retry reconnecting ctrlr after seconds if reset failed Previously reconnect retry was not controlled and was repeated indefinitely. This patch adds two options, ctrlr_loss_timeout_sec and reconnect_delay_sec, to nvme_ctrlr and adds reset_start_tsc, reconnect_is_delayed, and reconnect_delay_timer to nvme_ctrlr to control reconnect retry. Both ctrlr_loss_timeout_sec and reconnect_delay_sec are initialized to zero. This means reconnect is not throttled, as was the case before this patch. A few more changes are added. Change nvme_io_path_is_failed() to return false if reset is throttled even if nvme_ctrlr is resetting or is to be reconnected. spdk_nvme_ctrlr_reconnect_poll_async() may continue returning -EAGAIN indefinitely. To catch such an exceptional case, use ctrlr_loss_timeout_sec. Not only ctrlr reset but also non-multipath ctrlr failover is controlled. So we need to include path failover into ctrlr reconnect. When the active path is removed and switched to one of the alternative paths, if ctrlr reconnect is scheduled, connecting to the alternative path is left to the scheduled reconnect. If resetting or reconnecting ctrlr fails and the retry is scheduled, switch the active path to one of the alternative paths. Restore unit test cases removed in the previous patches. Change-Id: Idec636c4eced39eb47ff4ef6fde72d6fd9fe4f85 Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10128 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>	2022-01-17 14:25:15 +00:00
Jim Harris	932ee64b8f	bdev/nvme: add bdev_nvme_stop_discovery RPC This RPC will stop the specified discovery service, including detaching from any controllers that were attached as part of that discovery service. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I9222876457fc45e1acde680a7bd1925917c22308 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10832 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-01-12 08:20:23 +00:00
Jim Harris	b68f2eeb0b	bdev_nvme: add bdev_nvme_start_discovery RPC This patch adds the framework for a discovery service in the bdev/nvme module. Users can specify an IP/port of a discovery service. The bdev/nvme module will connect to a discovery controller, get the discovery log page, and then register for AERs. It will connect to each subsystem specified in the initial log page. AER completions will trigger fetching the log page again, at which point new subsystems will be connected to, or removed subsystems will be detached. This patch does the following: * Adds the new start_discovery RPC * Connects to the discovery controller * Gets the discovery log page * Registers for AERs * Detach from discovery controllers at shutdown Subsequent patches in this series will: * Connect to subsystems listed in discovery log page * Detach from subsystems that were listed in earlier discovery log pages but subsequently removed * Add a stop_discovery RPC Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I54bfa896a48c5619676f156b5ea9f2d1f886c72f Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10694 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-01-10 15:23:39 +00:00
Shuhei Matsumoto	696ad465d7	bdev/nvme: Remove the failover_in_progress flag from struct nvme_ctrlr The failover_in_progress flag is used to decide the return value of bdev_nvme_failover(). bdev_nvme_delete() calls bdev_nvme_failover() with remove=true to remove nvme_ctrlr->active_path_id. However bdev_nvme_failover() returns zero if nvme_ctrlr->failover_in_progress is true. bdev_nvme_failover() may return zero even if it does not remove nvme_ctrlr->active_path_id. The following will be better. bdev_nvme_failover() returns -EBUSY if nvme_ctrlr->resetting is true, and the caller repeats calling bdev_nvme_failover() until the target trid becomes alternative path or bdev_nvme_failover() returns zero. To do that, the failover_in_progress flag is not necessary any more. Removing the failover_in_progress will also simplify the following patches to unify ctrlr reset and failover. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I57ab944beb1d06ea4def144c81c69705860de35f Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10441 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-12-08 08:31:24 +00:00
Shuhei Matsumoto	8afa746b4d	bdev/nvme: Use new APIs in a reset ctrlr sequence Replace the spdk_nvme_ctrlr_reset_async() and spdk_nvme_reset_poll_async() calls by the spdk_nvme_ctrlr_disconnect(), spdk_nvme_ctrlr_reconnect_async(), and spdk_nvme_ctrlr_reconnect_poll_async() calls in a reset ctrlr sequence. spdk_nvme_ctrlr_disconnect() can fail if ctrlr is already resetting or removed. But both cases are not possible. reset is controlled and the callback to the hot remove is called when the ctrlr is hot removed. So we assume spdk_nvme_ctrlr_disconnect() always succeed. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I1299e198597b2a2110f80b9a868e2dae015682ee Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10092 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2021-12-08 08:31:24 +00:00
Shuhei Matsumoto	72e4a4d46a	bdev/nvme: Each nvme_bdev_channel caches its current io_path Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I3ec3a588ff741cf04383e89f5a701e33bf1987a6 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9894 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-11-17 10:58:12 +00:00
Shuhei Matsumoto	32697257a9	bdev/nvme: ctrlr_channel has a list of io_path pointers This patch enables each nvme_ctrlr_channel to access the underlying nvme_bdev_channels. This change is used to maintain io_path cache of nvme_bdev_channel. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I22cd3763da1642d4e68dee3a9273e9cc698a4ca8 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9893 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: GangCao <gang.cao@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-11-17 10:58:12 +00:00
Kai Li	8f633fa1c3	bdev/nvme: display all ctrlrs for this bdev when dump bdev nvme controller After multipath feature is supported, one bdev will have more than one nvme ctrlr. For ease of viewing, display each ctrlr's trid info. Moreover, rename nvme_bdev_ctrlr_get as nvme_bdev_ctrlr_get_by_name here to keep consistent with nvme_ctrlr_get_by_name. Signed-off-by: Kai Li <lik271@chinatelecom.cn> Change-Id: I417506699bbea6ed13dac0fee942749757d2ae47 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10129 Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2021-11-11 23:24:26 +00:00
Shuhei Matsumoto	84ac18e545	bdev/nvme: Update ANA state if I/O failed by ANA error If I/O got ANA error, ANA state may be out of date. So in this case read ANA log page and update ANA states. Mark nvme_ns to be updating to avoid using while updating ANA state. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: Ia43d38b3a589c84d6d0479dedcced033e76fb194 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9458 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-10-27 11:53:31 +00:00
Shuhei Matsumoto	f3fec96c20	bdev/nvme: Protect ANA log page from concurrent reads by using a new flag If an I/O failed by ANA error, the corresponding ANA state might be out of date. In the following patches, for this case, read the latest ANA log page and update the ANA state. Such reading of the ANA log page may be done on multiple threads concurrently, including AER ANA change. Hence protect the ANA log page by adding a new flag ana_log_page_updating to struct nvme_ctrlr and using it. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I8bb84091d50a5fdc0d9893b585be972dfd31c0f1 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9526 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-10-27 11:53:31 +00:00
Shuhei Matsumoto	1e79d21967	bdev/nvme: Use bitfields to pack a few flags of struct nvme_ctrlr This will enable us to add more flags without creating any extra hole. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I166e2bd3d116c8cebf75bfe4f290b390d9e3888e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9851 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2021-10-27 11:53:31 +00:00
Shuhei Matsumoto	43adb646b8	bdev/nvme: Retry failed I/O up to retry_count times Add bdev_retry_count to spdk_bdev_nvme_opts and retry_count to nvme_bdev_io, respectively. Set the type of both to int because we want to use -1 for infinite retry. Set the default value of bdev_retry_count to zero for backward compatibility. bdev_retry_count is configurable by the RPC bdev_nvme_set_options. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I9bc746fcea54aa8722c76f79c70c2ae2b375aa53 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9864 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-10-27 11:53:31 +00:00
Shuhei Matsumoto	4495bda43f	rpc/bdev_nvme: Deprecate retry_count and add transport_retry_count instead retry_count of struct spdk_bdev_nvme_opts controls the number of retries in the transport layer, and is set to transport_retry_count of struct spdk_nvme_ctrlr_opts. The next patch will add bdev_retry_count to struct spdk_bdev_nvme_opts to control the number of retries in the bdev layer. For clarification, rename retry_count to transport_retry_count of struct spdk_bdev_nvme_opts. Then deprecate the retry_count parameter and add and use a new parameter transport_retry_count instead for the RPC bdev_nvme_set_options. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I0689c54aa1c96ee99d24236e8ff1a594ad7208e4 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9924 Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2021-10-27 11:53:31 +00:00
Ben Walker	b098640f05	bdev/nvme: bdev_nvme_detach_controller now understands host parameters You can now detach specific paths based on the host parameters. This is useful for two paths to the same target that use different local NICs. Change-Id: I4858bfda7d940052ca77ffb0bbe764a688fb315d Signed-off-by: Ben Walker <benjamin.walker@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9827 Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>	2021-10-22 04:28:22 +00:00
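
Commit 43adb646b8 above describes a bdev-layer retry budget: bdev_retry_count defaults to zero (no retry) and -1 means infinite retry. The following is a minimal Python sketch of that rule only, not the SPDK implementation (which is C). The function names `should_retry` and `submit_with_retries` are hypothetical and exist purely for illustration.

```python
from typing import Callable, Tuple

# Per the commit message, -1 is the sentinel for "retry without limit".
INFINITE_RETRY = -1


def should_retry(retries_so_far: int, bdev_retry_count: int = 0) -> bool:
    """Return True if a failed I/O may be retried once more.

    retries_so_far   -- retries already spent on this I/O (hypothetical counter)
    bdev_retry_count -- configured budget; 0 disables retry, -1 is unlimited
    """
    if bdev_retry_count == INFINITE_RETRY:
        return True
    return retries_so_far < bdev_retry_count


def submit_with_retries(io: Callable[[], bool],
                        bdev_retry_count: int = 0) -> Tuple[bool, int]:
    """Call io() until it succeeds or the budget runs out.

    Returns (succeeded, total_attempts). With bdev_retry_count == -1 this
    loops until io() succeeds, so only use it with an io() that eventually does.
    """
    retries = 0
    while True:
        if io():
            return True, retries + 1
        if not should_retry(retries, bdev_retry_count):
            return False, retries + 1
        retries += 1


if __name__ == "__main__":
    # An I/O that fails twice and then succeeds, with a budget of two retries.
    outcomes = iter([False, False, True])
    print(submit_with_retries(lambda: next(outcomes), bdev_retry_count=2))
```

With the default budget of zero, a failed I/O is completed with an error after a single attempt, which matches the backward-compatible behavior the commit message describes.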

89 Commits