ACWU and NACWU are 0-based values. But spdk_bdev_get_acwu()
specifies the compare-and-write-unit in terms of blocks
(i.e. 1-based). So the bdev/nvme module needs to add 1
to this value before registering the bdev.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I7c19975a2bd8c09bb65374838fe20aad690d1ecf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12384
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
When sending a fused compare and write command, we pass a callback
bdev_nvme_comparev_and_writev_done that we expect to be called twice
before marking the io as completed. In order to detect if a call to
bdev_nvme_comparev_and_writev_done is the first or the second one, we
currently rely on the opcode in cdw0. However, cdw0 may be set to 0,
especially when aborting the command. This may cause use-after-free
issues and this may call the user callbacks twice instead of once.
Use a bit in the nvme_bdev_io instead to keep track of the number of
calls to bdev_nvme_comparev_and_writev_done.
Signed-off-by: Alex Michon <amichon@kalrayinc.com>
Change-Id: I0474329e87648e44b08998d0552b2a9dd5d34ac2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12180
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add an new RPC bdev_nvme_get_io_paths to query all active I/O paths.
One io_path belongs to One nvme_bdev_channel.
Each nvme_bdev_channel is associated with one nvme_bdev.
If the RPC bdev_nvme_get_io_paths has a bdev name as a parameter
it can use spdk_for_each_channel() simply for the corresponding
nvme_bdev.
However, users will want to know I/O paths of all nvme_bdevs like
the RPC bdev_get_bdevs.
One io_path has one nvme_qpair. One nvme_qpair belongs to one
nvme_poll_group. By relying on these relationships, the RPC
bdev_nvme_get_io_paths traverses all nvme_poll_groups by using
spdk_for_each_channel() to g_bdev_nvme_ctrlrs.
The RPC bdev_nvme_get_io_paths has two modes, display all or
the specified NVMe bdev's active I/O paths.
The specified bdev name is used just for comparison and empty
array is returned if no matched io_path is found.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4a0dbf3ef7aaa9a7b7345fc03dc493cc6d37bc99
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12146
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
NVMe bdev name already includes the name of the NVMe bdev controller and
the NSID. CNTLID will be a good ID to identify a namespace from a NVMe
bdev when multipath is configured. However, the query RPCs,
bdev_get_bdevs and bdev_nvme_get_controllers had not returned such
information.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I2f2e355ff13f69ced616be803a3152c838cdc980
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12276
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Previously, if a namespace is in ANA inaccessible state, I/O had been
queued infinitely. Fix this issue according to the NVMe spec.
Add a temporary poller anatt_timer and a flag ana_transition_timedout for
each nvme_ns.
Start anatt_timer if the nvme_ns enters ANA transition. If anatt_timer
is expired, set ana_transition_timedout to true. Cancel anatt_timer or
clear ana_transition_timedout if the nvme_ns exits ANA transition.
nvme_io_path_become_available() returns false if ana_transition_timedout
is true.
Add unit test case to verify these addition.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ic76933242046b3e8e553de88221b943ad097c91c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12194
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
We may fail creating qpair when adding io_path while creating a bdev_channel
if connection is down. But if we enable I/O error recovery, we can retry
creating qpair later.
So let nvme_qpair_create() succeed if the ctrlr is being reset or
I/O error recovery is enabled even if creating qpair failed.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I7d4ff036187bb79ada258cfc299582b4d287018b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12288
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: <tanl12@chinatelecom.cn>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previously, only if admin qpair gets error, bdev_nvme_failover() was
called.
However, I/O qpair may get error earlier than admin qpair. In this case,
bdev_nvme_failover() was called but reset was already in progress. So
bdev_nvme_failover() returned without doing anything.
bdev_nvme_reset_complete() executes bdev_nvme_failover() if reset
failed. However the test time of test/nvmf/host/failover.sh was very
short. Timeout came before trying bdev_nvme_failover().
We can replace other bdev_nvme_reset() calls by bdev_nvme_failover()
but this patch focuses on the critical case.
Fixes issue #2128.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I68f54bbf54f92343aa56ae41f2b4cd92421c4bbb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12295
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Setting this optional parameter to true makes the
RPC completion wait until the attach for all
discovered NVM subsystems have completed.
This is especially useful for fio or bdevperf, to
ensure that all of the namespaces are actually
available before testing.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icf04a122052f72e263a26b3c7582c81eac32a487
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12044
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
After the change that the NVMe bdev module disconnects qpair asynchronously,
disconnected_qpair_cb() got NOTICELOG always when a qpair was disconnected
and freed. This was very noisy.
We have three cases that disconnected_qpair_cb() is called now, 1) qpair
was destroyed in a full ctrlr reset sequence, 2) the upper layer closed
I/O channel, and 3) qpair detected error, and was disconnected and freed.
Get NOTICELOG for 3) but get DEBUGLOG for 1) and 2) with some rewording.
Additionally, to improve readability, change if-else ordering.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib63bcfd4b72a82a13d3cda208c71cdb40a42fd6b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12085
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We want to be able to save the discovery configuration
in a generated JSON-RPC file.
The obvious change needed here is to add a
bdev_nvme_start_discovery RPC to the config file
for each discovery context.
But we also need to make sure we do not emit
bdev_nvme_attach_controller RPCs for controllers
that were attached via the discovery service. These
controllers will be attached by the discovery service
instead - or maybe not at all if the discovery
log page returns different results.
Do both of these changes here, since they are
somewhat tied to each other.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic2072150c3efdd0a8d01da09e33a647e4929779b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11818
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This keeps track if an nvme_ctrlr was created
implicitly by the discovery service.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I493b7cacfe563737f45a1fffca98855a1929a751
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11817
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
These parameters will be used for any controller created
by the discovery service.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I221b791f38b9c5797ba084c647a98b82c102a121
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11942
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Attempt to start a connection once per second, but
after a connection is successfully started, change
the timer period to one millisecond instead.
This ensures lower response time to AER events when
the discovery controller is operational, but then
decreasing rate of unsuccessful connect attempts (and
associated log messages) if/when a discovery controller
fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie24036303f5b00f4a42b6575656f401ea4d578f2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11774
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Also cycle through the discovery paths if the initial
connect_async() operation fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I50f36949d9bba0e3bff81505712076f1a1a7aad5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11773
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
If a discovery controller fails at some point, we
will want to detach it. This can happen separately
from detaching the controller because we are stopping
the discovery service. So break out the ctrlr
detach operation into a separate phase of the
discovery_poller.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia601b767d32bda1c8899d3a95029781c0aeee136
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11772
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Bdev modules must not access internal bdev_io
structure, so add a new pointer in a public
section. Pointer in internal section will be
used in next patch
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ib631563015b3e5fa9300d22b7ae59d8db43c8275
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10421
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Detach and stop are two different operations. This
->detach field was used to denote when the associated
discovery service should be stopped. So call the field
'stop' instead. That may trigger the currently
attached discovery controller to be detached.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I61c7fc860cd9dbcfab71eedfd223c06c51a41f27
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11771
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
We will now keep a list of the possible paths to
the discovery subsystem. One of them will be the
path we are currently connected to (which at service
start, is the path specified by the user).
Additional entries are added for discovery log page
entries referencing the discovery subsystem.
When the discovery service starts, we just have the
initial entry in the list - the discovery poller
tries to connect to it, and if the connect starts
successfully, removes it from the list and points
ctx->entry_ctx_in_use to it.
This will be useful later when we want to iterate
through the available paths to the discovery
subsystem if the current path fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5b18e0f20c4607e29ac0f12f27ba7eb169d0206d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11770
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
This reduces some code duplication since the same
function will be reused in an upcoming patch.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id6764171ff93c95de49792a4488f2c205b8eddb6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11769
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
We used to wait until the discovery service could
connect to the discovery subsystem before calling
the callback function provided by the caller (mainly
the start_discovery RPC).
Moving forward, we will be handling the case where
the discovery subsystem is unavailable temporarily.
For now, let's not fail the bdev_nvme_start_discovery
call if we cannot connect to the discovery subsystem.
This will keep the initial service start path the same
as the path where the discovery subsystem is temporarily
unavailable. In the future, we can consider adding
functionality to the start_discovery RPC that waits
up to X number of seconds to see if we were able to
connect and fail otherwise.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icb05523b9d59f508bfbc0233595c8bf58c10488f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11768
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For RDMA transport, adminq will find transport error first because
usually only adminq polls CM events.
Change-Id: I7b22cc8883bf02198f1a90d2654c1de6f2e736e6
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11331
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is a preparation to the following patches.
Change-Id: I1bb0052c745d4f83ff621e4110907a8ac1f1d597
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11330
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
If qpair is disconnected asynchronously, it takes time from detecting
transport error to actually disconnected. We should avoid using the
path as soon as possible after detecting any transport error.
Poll group clears I/O path cache if it finds transport error and avoid
using the path which had transport error.
These changes will reduce the failover time.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I00580159a84372a115ed5e62a6ce13eed4368999
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11329
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
spdk_nvme_ctrlr_disconnect() will be made asynchronous in the
following patches and so we will need to have some changes.
spdk_nvme_ctrlr_disconnect() disconnects adminq and ctrlr synchronously
now.
If spdk_nvme_ctrlr_disconnect() is made asynchronous,
spdk_nvme_ctrlr_process_admin_completions() will complete to disconnect
adminq and ctrlr, and will return -ENXIO only if adminq is disconnected.
However even now spdk_nvme_ctrlr_process_admin_completions() returns
-ENXIO if adminq is disconnected.
So as a preparation, set a callback before calling spdk_nvme_ctrlr_disconnect()
and call the callback if it is set and spdk_nvme_ctrlr_process_admin_completions()
returns -ENXIO.
Besides, fix the return value of bdev_nvme_poll_adminq() in this patch.
Change-Id: I2559f86bb8cf9a92b5b386ed816c00b08c9832df
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10950
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
As we do when deleting ctrlr_channel, disconnect and then free I/O
qpair in a ctrlr reset sequence.
Deleting ctrlr_channel and resetting ctrlr_channel may cause conflicts.
This patch processes such conflicts correctly.
If destroy_ctrlr_channel_cb() is executed between pending and executing
reset_destroy_qpair(), reset_destroy_qpair() is not executed because
ctrlr_channel is not found. In this case, destroy_qpair_channel()
starts disconnecting qpair and deletes ctrlr_channel. Then
disconnected_qpair_cb() releases a reference to poll group.
If destroy_ctrlr_channel_cb() is excuted between executing reset_destroy_qpair()
and disconnected_qpair_cb(), destroy_ctrlr_channel_cb() skips
ctrlr_channel for a reset sequence.
Change-Id: I1f49f74b94aefbea178680aa53ded3a12876c676
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10766
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When connection is disconnected, bdev_nvme will call
bdev_nvme_failover, and then reset the controller.
nvme_ctrlr->reset_start_tsc should be updated in function
bdev_nvme_failover, then bdev_nvme_check_xxx_timeout can
work well.
Change-Id: I99b639545e9dd4082cdc14696bb7872cb4917b1d
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11957
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
For RDMA transport, current synchronous qpair disconnect occupied CPU for
a second when qpair disconnect gets timeout.
To remove this limitation, we will do the following:
- make spdk_nvme_ctrlr_disconnect_io_qpair() asynchronous,
- spdk_nvme_qpair_process_completions() returns -ENXIO only if the
qpair is actually disconnected.
Even at this patch, spdk_nvme_poll_group_process_completions() invokes
disconnected_qpair_cb only if a qpair is actually disconnected. This
behavior will be maintained.
To use the upcoming asynchronous qpair disconnect easily, when
deleting a ctrlr_channel, disconnect the qpair, and then free the qpair
and release a reference to the poll group when the qpair is actually
disconnected.
We need to delete a nvme_qpair asynchronously after the corresponding
nvme_ctrlr_channel is deleted and defer the deletion of the corresponding
nvme_ctrlr until the nvme_qpair is deleted. To satisfy this requirement,
utilize the reference count of the nvme_ctrlr.
disconnected_qpair_cb() may call spdk_nvme_ctrlr_free_io_qpair() and
spdk_io_device_unregister() successively. The spdk_io_device_unregister()
will execute spdk_nvme_detach_async() from its callback.
spdk_nvme_ctrlr_free_io_qpair() has to complete earlier than
spdk_nvme_detach_async() starts. spdk_nvme_ctrlr_free_io_qpair() is
executed after unwinding stack. spdk_nvme_detach_async() is executed
after sending a message. Sending message is later than unwinding stack.
Hence the requirement is satisfied naturally.
spdk_io_device_unregister() for the nvme_ctrlr is required to be called
on the nvme_ctrlr->thread. To satisfy this requirement, redirect
nvme_ctrlr_unregister() to the nvme_ctrlr->thread. This change is too
small to stand as an independent patch. So include the change in this
patch.
Change-Id: Id8c01966c40b1dae9c4ef17f1b0b3f60a0bd17d5
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10765
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is another preparation to disconnect qpair asynchronously.
Add nvme_qpair object and move the qpair and poll_group pointers and
the io_path_list list from nvme_ctrlr_channel to nvme_qpair. nvme_qpair
is allocated dynamically when creating nvme_ctrlr_channel, and
nvme_ctrlr_channel points to nvme_qpair.
We want to keep the times of references at I/O path. Change nvme_io_path
to point nvme_qpair instead of nvme_ctrlr_channel, and add
nvme_ctrlr_channel pointer to nvme_qpair.
nvme_ctrlr_channel may be freed earlier than nvme_qpair. nvme_poll_group
lists nvme_qpair instead of nvme_ctrlr_channel and nvme_qpair has a
pointer to nvme_ctrlr.
By using the nvme_ctrlr pointer of the nvme_qpair, a helper function
nvme_ctrlr_channel_get_ctrlr() is not necessary any more. Remove it.
Change-Id: Ib3f579d3441f31b9db7d3844ec56c49e2bb53a5d
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11832
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The following patches will have the following changes.
Add nvme_qpair object and move qpair and poll_group pointers and
the io_path_list list from nvme_ctrlr_channel to nvme_qpair. nvme_qpair
is allocated dynamically when creating nvme_ctrlr_channel, and
nvme_ctrlr_channel points to nvme_qpair.
qpair is disconnected asynchronously and nvme_ctrlr_channel is
deleted asynchronously.
To make the following patches simpler, refactor two functions,
bdev_nvme_create_ctrlr_channel_cb() and
bdev_nvme_destroy_ctrlr_channel_cb(). The details are as follows.
Factor out nvme_qpair_create() from bdev_nvme_create_ctrlr_channel_cb()
and factor out nvme_qpair_delete() from
bdev_nvme_destroy_ctrlr_channel_cb(). Then reorder a few operation
in these.
Additionally, reorder a operation in _bdev_nvme_add_io_path().
Change-Id: Idf0328fa77a54f40fe52ca72c3842dde82d55972
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11831
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
In the following patches, spdk_nvme_ctrlr_disconnect_io_qpair() will
be changed to be asynchronous, spdk_nvme_ctrlr_disconnect_io_qpair()
will be called first and then spdk_nvme_ctrlr_free_io_qpair() after
the qpair is actually disconnected.
We will not be able to keep the current bdev_nvme_destroy_qpair()
function.
As a preparation, inline bdev_nvme_destroy_qpair() and remove it.
Additionally, this patch has the following changes.
Previously I/O qpair was freed and then I/O path caches were cleared.
Both are SPDK thread local. So there is no dependency for the ordering
of these two operations. However, it will reduce the size of the
following patches if we clear I/O path caches before freeing I/O qpair
when the qpair is disconnected. Hence we clear I/O path caches and then
free I/O qpair.
Remove DTRACE for bdev_nvme_destroy_qpair() for now.
It will be restored in the following patches.
Furthermore, fix potential NULL pointer acces in
bdev_nvme_create_qpair().
Change-Id: I0ab78ccb0d240e56b95b53179341afcd909a31f6
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10746
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add three options for I/O error resiliency to spdk_nvme_bdev_opts.
Then the RPC bdev_nvme_set_options can configure these.
These can be overridden if these are given by the RPC bdev_nvme_attach_controller.
Change-Id: If3ee23aeef8b7585fe0fb5ec4695df5866fc1e74
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11830
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is not in the fast path, so using INFOLOG
instead of DEBUGLOG allows these messages to be
enabled in release builds.
While here, set this flag in the discovery.sh
test script so that we get better information if
there are test failures.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1c0d087b5c0cb40118691f4a1bc16adc2fdaad9c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11932
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The following patches will enable us to specify I/O error resiliency
options per nvme_ctrlr as global options. To do it easier, move
per controller options about I/O error resiliency into struct nvme_ctrlr_opts.
prchk_flags is not exactly for resiliency but move it into struct
nvme_ctrlr_opts too.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I85fd1738bb6e293cd804b086ade82274485f213d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11829
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The following patches will add options per struct nvme_ctrlr in the
NVMe bdev module. bdev_opts will be used for it.
Additionally, fabrics_connect_timeout_us is set directly to
spdk_nvme_ctrlr_opts. So remove it from the RPC request structure.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I981cda5e69375edc43a8581cd3b43497c38a3d56
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11827
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
If 0 - UINT32_MAX or UINT32_MAX - 0 is substituted into a int variable,
we cannot get any expected result.
Fix the bug and add unit test case to verify the fix.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib045273238753e16755328805b38569909c8b83a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11836
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
This will be helpful in later patches, when we handle
detach not just at discovery service stop, but also
when a discovery controller is disconnected.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie62d62f73b328c6e058f6480c61fbdf91e854e2a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11767
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
If the path to a subsystem changes from one discovery
log to the next, we should add the new paths first,
and only then remove paths. This ensures we don't
remove the last path to a subsystem, causing associated
bdevs to get unregisterd and reregistered.
This requires adding a new log_page member to
discovery_ctx, since we now need to walk the log
page to find removed paths after all the new paths
are attached.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I99fc2e40e6f7e2e26d558ebe7bc5208fe474c0ea
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11766
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
These macros are used to prefix the following to
any discovery-related DEBUGLOG or ERRLOG:
Discovery[127.0.0.1:8009]
Inside the brackets are the traddr and trsvcid of
the discovery service associated with that message.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ib1991a13f550bb8c9aaf1194a56b218cbd71c96c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11733
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This is useful for adding trid details to discovery
related log messages in a later patch.
Future patches will update this trid if the
current discovery ctrlr fails and we need to fail
over to a different path to the discovery subsystem.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I51712bab2d891ae9c683f8716b4228741f64e7db
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11732
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
For now, just allocate entries and put them on a new TAILQ on
the discovery_ctx. Future patches will use these to try
to reattach to the discovery subsystem if the current discovery
ctrlr fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3cd841df2260bbe8a497bbbf36dea4a1081f25c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11731
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
It will be referenced in a second location in
an upcoming patch, so move its definition now to
reduce the size of that patch and avoid a forward
declaration.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iae12cc613190c03f0d48d71475df98384f8e47c7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11730
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
This name better describes the purpose of this structure.
Currently it is used to represent discovery log page entries
for NVM subsystems found by the discovery service. Upcoming
patches will also use this structure to represent discovery
log page entries for the discovery subsystem.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I84996c9968200c50c32427f0233cb707cdc2d54c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11547
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
For now, if the discovery service finds a discovery subsystem,
don't connect to it. Support for nested discovery controllers
will be coming soon, but for now we need to make sure we don't
try to connect to a discovery subsystem as if it was an NVM
subsystem.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I00234718b0e39eda6e1cb1b1150a4fadcf6d8b11
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
This is a bug fix. free() was called to the object allocated by
spdk_malloc().
Hence
free(): invalid pointer: 0x00002000146ece00
was printed.
This was found during multipath testing.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Icf6aa6dcdda728fef91b3acad7a1f1ee219c27af
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11710
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In a test case, test/nvmf/host/failover.sh, we got ANA error even if
the target did not enable ANA reporting.
We marked the corresponding namespace as ANA state updating but we had
no way to clear it.
Check if we can read ANA log page before setting the flag.
If read ANA log page failed, disable ANA feature until the nvme_ctrlr
is created again. In this operation, all ana_state_updating flags are
cleared.
Fixes#2335
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4e2608a35d9dfa0395ad74fceebae9faf8cd973c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11399
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It may take a long time to detect network transport error
when e.g. port is removed on remote target. This timeout
depends on 2 parameters - retry_count and ack_timeout.
bdev_nvme_set_options supports configuration of retry_count
but transport_ack_timeout is missed. Note: this parameter
is used by RDMA transport only.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I7c3090dc8e4078f64d444e2392a9e0a6ecdc31c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11175
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: <tanl12@chinatelecom.cn>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This patch aligns namespace comparison with Linux kernel
implementation:
- UUID is optional and may be NULL
- command set (CSI) should be the same
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I8f889989f24cd51b104057217f87eb303b30fa68
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11312
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Add more dtrace probes to help with identifying issues
in production.
Change-Id: I8fb621a15c5e33ae94d75b4fc31135e2635dcfce
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10561
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Now that we have a much more robust retry framework,
set the default bdev_retry_count to 3. Users can
still override this default with the bdev_nvme_set_options
RPC as before. This ensures that by default, we will
retry I/O when possible.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I045bf4969d02be32b951e72a148ce6b6e251dec1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11107
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If ctrlr_loss_timeout_sec is set to -1, reconnect is tried repeatedly
indefinitely, and I/Os continue to be queued.
This patch adds another option fast_io_fail_timeout_sec, a flag
fast_io_fail_timedout to nvme_ctrlr.
If the time fast_io_fail_timeout_sec passed after starting reset,
set fast_io_fail_timedout to true not to use the path for I/O submission.
fast_io_fail_timeout_sec is initialized to zero as same as
ctrlr_loss_timeout_sec and reconnect_delay_sec.
The name of the parameter follows the famous DM-multipath, its fast_io_fail_tmo.
Change-Id: Ib870cf8e2fd29300c47f1df69617776f4e67bd8c
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10301
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Previously reconnect retry was not controlled and was repeated indefinitely.
This patch adds two options, ctrlr_loss_timeout_sec and reconnect_delay_sec,
to nvme_ctrlr and add reset_start_tsc, reconnect_is_delayed, and
reconnect_delay_timer to nvme_ctrlr to control reconnect retry.
Both of ctrlr_loss_timeout_sec and reconnect_delay_sec are initialized to
zero. This means reconnect is not throttled as we did before this patch.
A few more changes are added.
Change nvme_io_path_is_failed() to return false if reset is throttled
even if nvme_ctrlr is reseting or is to be reconnected.
spdk_nvme_ctrlr_reconnect_poll_async() may continue returning -EAGAIN
infinitely. To check out such exceptional case, use ctrlr_loss_timeout_sec.
Not only ctrlr reset but also non-multipath ctrlr failover is controlled.
So we need to include path failover into ctrlr reconnect.
When the active path is removed and switched to one of the alternative paths,
if ctrlr reconnect is scheduled, connecting to the alternative path is left
to the scheduled reconnect.
If reset or reconnect ctrlr is failed and the retry is scheduled,
switch the active path to one of alternative paths.
Restore unit test cases removed in the previous patches.
Change-Id: Idec636c4eced39eb47ff4ef6fde72d6fd9fe4f85
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10128
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
This is a clean up as a preparation to the following patches.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib8bc90e17f52086d4e887463e04f65273bb1079b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11068
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We noticed the difference between the SPDK 21.10 and the latest master
in a test.
The simplified scenario is as follows:
1. Start SPDK NVMe-oF target
2. Run bdevperf for the target with -f parameter to suppress exit
on failure.
3. Kill the target after I/O started.
With the SPDK 21.10, bdevperf retries failed I/Os and exits after
the test time is over.
With the latest SPDK master, bdevperf hungs and does not exit even
after the test time is over.
The cause was as follows:
reset ctrlr is repeated very quickly (once per 10ms by default) and hence
I/Os were queued infinitely because nvme_io_path_is_failed() returned
false if nvme_ctrlr is resetting.
We should queue I/O when nvme_ctrlr is resetting only if reset is throttoled
and fail-fast for the repeated failures is supported.
Hence in this patch, fix the degradation and remove the related unit
test cases.
Reported-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4047d42dc44488a05264c6a841d101a7c371358b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11062
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This RPC will stop the specified discovery service,
including detaching from any controllers that were
attached as part of that discovery service.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9222876457fc45e1acde680a7bd1925917c22308
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10832
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
If a path whose namespace is optimized is restored, the corresponding
I/O path cache should be cleared and the path should be chosen as the
optimal path.
This bug was found by a system test.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ibc3983dbff3418adb090a09df32c2a92a8910d05
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11004
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Rename a few functions for a full ctrlr reset sequence to
clarify what we do and make the following patches easier.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I051e3ab68c3cd77fd6040a2d069d50a700123ae6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10920
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We sort secondary trids to avoid using disconnected trids for failover.
However the sort had a bug.
This bug was found by running test/nvmf/host/multipath.sh in a loop.
Verify the fix by adding unit test.
Fixes#2300
Signed-off-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Change-Id: I22b0ede4d2ef98b786c3e0d1f5337a2d568ba56d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10921
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This patch adds the framework for a discovery
service in the bdev/nvme module.
Users can specify an IP/port of a discovery service.
The bdev/nvme module will connect to a discovery
controller, get the discovery log page, and then
register for AERs. It will connect to each
subsystem specified in the initial log page.
AER completions will trigger fetching the log
page again, at which point new subsystems will
be connected to, or removed subsystems will be
detached.
This patch does the following:
* Adds the new start_discovery RPC
* Connects to the discovery controller
* Gets the discovery log page
* Registers for AERs
* Detach from discovery controllers at shutdown
Subsequent patches in this series will:
* Connect to subsystems listed in discovery log page
* Detach from subsystems that were listed in earlier
discovery log pages but subsequently removed
* Add a stop_discovery RPC
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I54bfa896a48c5619676f156b5ea9f2d1f886c72f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10694
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
If qpair creation failed, ctrlr_ch remains in group->ctrlr_ch_list
but memory for ctrlr_ch is freed. Next attempt to get ctrlr's io
channel will modify data in already freed memory and may corrupt
another allocation.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I85002f2e6ac86a0ffda6dabfa57e79b59074fb5a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10840
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
It is possible that the application calls
get_io_channel during nvme controller reset.
In that case IO qpair won't be created and the
application will get a NULL pointer.
It is possible to repeat get_io_channel later but
there is no such indiciation for the application,
so it can't distinguish between a real failure and
"try again" case during controller reset.
This patch ignores IO qpair creation error if
controller is resetting. IO qpair will be created
when reset completes.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Id39202f5a6878453ff54e35df91d5dc49a5f046a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10828
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We will want to use bdev_nvme_create() to attach to
controllers identified through discovery. In this case,
we won't be reporting bdev names back to an RPC caller,
so there's no need to allocate an array of names to be
filled out since they won't be used.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia386d034df2c2d5a60f9aa18338ba415ec03d763
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10689
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We will need to add another step in the fini path for
stopping discovery pollers, so this patch prepares for
that.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ifecbbac60262f3aae7f7a7ced09b7a600df7c2e8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10590
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We set cb_ctx to NULL when calling spdk_nvme_probe_async(). It looks
that nvme_probe_ctx has not been used anywhere for a long time.
nvme_probe_ctx is not public data structure.
Remove nvme_probe_ctx to simplify the code and make the following
patches easier.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7dd5f970a7fde1c9c189fae3c8f28f84d7aed991
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10554
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This refactoring will be helpful for the following patches to unify ctrlr
reset and failover and failover trid also when reconnecting ctrlr.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I4623a5dd310ac7516c270ccd3b0541c27cc880d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10443
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The failover_in_progress flag is used to decide the return value of
bdev_nvme_failover().
bdev_nvme_delete() calls bdev_nvme_failover() with remove=true to remove
nvme_ctrlr->active_path_id. However bdev_nvme_failover() returns zero
if nvme_ctrlr->failover_in_progress is true. bdev_nvme_failover() may
return zero even if it does not remove nvme_ctrlr->active_path_id.
The following will be better.
bdev_nvme_failover() returns -EBUSY if nvme_ctrlr->resetting is true,
and the caller repeats calling bdev_nvme_failover() until the target trid
becomes alternative path or bdev_nvme_failover() returns zero.
To do that, the failover_in_progress flag is not necessary any more.
Removing the failover_in_progress will also simplify the following
patches to unify ctrlr reset and failover.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I57ab944beb1d06ea4def144c81c69705860de35f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10441
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Checking if nvme_ctrlr can be unregistered is not so simple and
a few changes will be added. So factoring out the check into a
helper function will be valuable.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I310c7e3ad2dae9583df4db575d342c2cb111f3f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10461
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When a I/O or admin passthrough failed, if the corresponding nvme_ctrlr
is not available, we should failover to another path.
When no path was found, if there is at least one nvme_ctrlr which is
not failed, we should wait until it is recovered.
We should improve error recovery not only for multipath (multipath is
"multipath") but also for failover (multipath is omitted or "failover").
To do this easily, clarify the conditions of availability and failure of
nvme_ctrlr and realize them by helper functions.
Use new helper functions for other cases to improve readability too.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I716731f72811d2ec4dfc91f9eadb191d75739af6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10381
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We had not checked the bit 0 of the Namespace Multipath I/O and
Namespace Sharing Capabilities (NMIC) field in the Identify Namespace
data structure.
If the bit 0 of the NMIC is zero, it is likely that namespaces are not
identical.
We should check if the value of the NMIC first, and do it in this patch.
Additionally, it is not usual if the bit 0 of the CMIC and the bit 0 of
the NMIC do not match. So in unit tests rename the parameter multi_ctrlr
by multipath for ut_attach_ctrlr() and use it for the value of the NMIC.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6aa7cbcc99be2507dbf18930f7b585a9ea7d0f90
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10380
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_reset() deletes all qpairs, reset a ctrlr, and then create
all qpairs. Any qpair may fail to be created, and then the reset
request may fail. However, already created qpairs were left.
Let's delete the already created qpairs and then fail the reset request.
This will make us easier to control reconnect, deley reconnect by
a few seconds, or stop reconnect after repeated failures and then
delete ctrlr.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I414e2281b4bf0cbd1cf461d8fc64a22f43d26d13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9896
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Replace the spdk_nvme_ctrlr_reset_async() and spdk_nvme_reset_poll_async()
calls by the spdk_nvme_ctrlr_disconnect(), spdk_nvme_ctrlr_reconnect_async(),
and spdk_nvme_ctrlr_reconnect_poll_async() calls in a reset ctrlr
sequence.
spdk_nvme_ctrlr_disconnect() can fail if ctrlr is already resetting or
removed. But both cases are not possible. reset is controlled and the callback
to the hot remove is called when the ctrlr is hot removed. So we assume
spdk_nvme_ctrlr_disconnect() always succeed.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1299e198597b2a2110f80b9a868e2dae015682ee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10092
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
spdk_for_each_channel() always passes status=0 to its completion callback
if each channel completes the requested function successfully.
bdev_nvme_reset_destroy_qpair() always succeeds.
Hence bdev_nvme_reset_ctrlr() does not have to check if the passed
status is not zero.
The following patches will aggregate multiple flags into a single
state for nvme_ctrlr. This change will simplify these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1c30c9b20c96886516029e69e90dc23d777a69b4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10077
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, we want to retry reconnect if reconnect failed
in a reset ctrlr sequence but we want to delay the retry. While
we wait the delayed retry, we want to quiesce ctrlr completely.
As part of quiesce ctrlr operations, we want to pause adminq poller but
we need to do it on the nvme_ctrlr->thread.
If a reset ctrlr sequence runs on the nvme_ctrlr->thread, we can avoid
redirecting the pending destruct request at completion too.
So we redirect the reset ctrlr sequence into the nvme_ctrlr->thread.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I538b962e2a7b5cf00ebbac2a1e888482ddeeee61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10075
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback bdev_nvme_reset_io_continue() to
redirect to the orig_thread by using bio. Furthermore, use bio->cpl.cdw0
to store the completion status of the reset processing. bdev_nvme_reset()
does not use bio->cpl.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I361cc44494190ba83ad6e360788d78851416c46c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10074
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch supports admin passthrough retry when we get any error
with DNR=0 but ABORTED_BY_REQUEST up to retry_count times.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1bf29570791fdbe8651fa70c4c8685bb740fb86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9944
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When resetting ctrlr, adminq is disconnected first. If adminq is disconnected,
admin passthrough request is rejected with -ENXIO.
But resetting ctrlr may succeed. If resetting ctrlr succeeds, adminq is
connected again, and admin passthrough request will be
submitted successfully.
On the other hand, if ctrlr is failed, admin passthrough request is
rejected with -ENXIO. But when resetting ctrlr, ctrlr is set to unfailed.
Hence bdev_nvme_admin_passthru() skips any ctrlr which is resetting
or failed, and calls bdev_nvme_admin_passthru_complete() with -ENXIO
if no available ctrlr is found.
bdev_nvme_admin_passthru_complete() queues admin passthrough request
and retry it one second later if ctrlr is resetting.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic748dc4faf29ebf717ae5c29dcf7c55fe2ea9243
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9942
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Separate the admin passthrough case from bdev_nvme_io_complete_nvme_status()
into bdev_nvme_admin_passthru_complete_nvme_status() and from
bdev_nvme_io_complete() into bdev_nvme_admin_passthru_complete(),
respectively.
Then make the return type of bdev_nvme_admin_passthru() to void
by using bdev_nvme_admin_passthru_complete().
Besides, refactor bdev_nvme_admin_passthru() slightly.
These clean up make the following patches simpler.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I79b89ee1b6360aa6ac6fc3c03f0469be99b0c1f2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9899
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The NVMe bdev module queues retried I/Os itself now.
bdev_nvme_abort() needs to check and abort the target I/O if it
is queued for retry.
This change will cover admin passthrough requests too because they
will be queued on the same thread as their callers and the public
API spdk_bdev_reset() requires to be submitted on the same thread
as the target I/O or admin passthrough requests.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If37e8188bd3875805cef436437439220698124b9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9913
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The completion status of spdk_bdev_abort() is SPDK_BDEV_IO_STATUS_SUCCESS
or SPDK_BDEV_IO_STATUS_FAILED if it is successfully submitted.
In the generic bdev layer, spdk_bdev_abort() does not update cdw0 but
just set SPDK_BDEV_IO_STATUS_SUCCESS or SPDK_BDEV_IO_STATUS_FAILED.
In the NVMe bdev module, for the abort request, spdk_bdev_io_complete()
is called instead of spdk_bdev_io_complete_nvme_status() and the
completion status is SPDK_BDEV_IO_STATUS_SUCCESS or
SPDK_BDEV_IO_STATUS_FAILED.
So let's skip updating cdw0 and call spdk_bdev_io_complete() directly
with SPDK_BDEV_IO_STATUS_SUCCESS or SPDK_BDEV_IO_STATUS_FAILED if
bdev_nvme_abort() does not find the target I/O in any ctrlr.
The next patch will fix spdk_bdev_io_get_nvme_status() for the abort
I/O.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8fb5389cd27d7467cc6ae18e152bd5228f9437f7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9976
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
This patch enables each nvme_ctrlr_channel to access the underlying
nvme_bdev_channels. This change is used to maintain io_path cache
of nvme_bdev_channel.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I22cd3763da1642d4e68dee3a9273e9cc698a4ca8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9893
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
After multipath feature is supported, one bdev will have more than one
nvme ctrlr. Fore ease of view, display each ctrlr's trid info.
Moreover, rename nvme_bdev_ctrlr_get as nvme_bdev_ctrlr_get_by_name here
to keep consistent with nvme_ctrlr_get_by_name.
Signed-off-by: Kai Li <lik271@chinatelecom.cn>
Change-Id: I417506699bbea6ed13dac0fee942749757d2ae47
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10129
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Current code only print the last namespace of nvme bdev, fix the print
way to show all the namespace.
And this patch will be prepared for the next patch to show io path status for multipath, like: which one is the primary or the backup, and the old status and current status,etc.
Signed-off-by: tanlong <948985618@qq.com>
Change-Id: I4fca154df52c929b8d046198934db0e58586c378
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10140
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
If I/O got ANA error, ANA state may be out of date. So in this case
read ANA log page and update ANA states. Mark nvme_ns to be updating
to avoid using while updating ANA state.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia43d38b3a589c84d6d0479dedcced033e76fb194
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If an I/O failed by ANA error, the corresponding ANA state might be
out of date. In the following patches, for this case, read the latest
ANA log page and update the ANA state. Such reading ANA log page may be
done on multiple threads concurrently including AER ANA change.
Hence protect ANA log page by adding an new flag ana_log_page_updating
to struct nvme_ctrlr and using it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8bb84091d50a5fdc0d9893b585be972dfd31c0f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9526
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Add bdev_retry_count to spdk_bdev_nvme_opts and retry_count to
nvme_bdev_io, respectively.
Set type of both to int because we want use -1 for infinite retry.
Set the default value of bdev_retry_count to zero for the backward
compatibility.
bdev_retry_count is configurable by the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9bc746fcea54aa8722c76f79c70c2ae2b375aa53
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9864
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
retry_count of struct spdk_bdev_nvme_opts controls the number of retries
in the transport layer, and is set to transport_retry_count of struct
spdk_nvme_ctrlr_opts.
The next patch will add bdev_retry_count to struct spdk_bdev_nvme_opts
to control the number of retries in the bdev layer.
For clarification, rename retry_count to transport_retry_count of
struct spdk_bdev_nvme_opts. Then deprecate the retry_count parameter
and add and use an new parameter transport_retry_count instead for
the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0689c54aa1c96ee99d24236e8ff1a594ad7208e4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9924
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
You can now detach specific paths based on the host parameters. This is
useful for two paths to the same target that use different local NICs.
Change-Id: I4858bfda7d940052ca77ffb0bbe764a688fb315d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9827
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
The previous patch supported I/O retry when no available io_path
was found at submission.
This patch supports I/O retry when we get I/O path error at completion.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I93a1664944b15ab0a826a321e2ea7a2574263afe
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9850
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If ANA state is inaccessible or qpair is disconnected, I/O cannot
be submitted.
But if qpair is connected, ANA state may become accessible, or if
qpair is disconnected, it may become connected via resetting.
Hence even if find_io_path() returned NULL, queue I/O and retry it
one second later if qpair is connected or ctrlr is resetting.
Sort retried I/Os by expiration values in ticks, and activate a timed
poller per nvme_bdev_channel only if there is any retried I/O. So
the poller function bdev_nvme_retry_ios() always returns BUSY because
if the poller runs earlier than the closest retried I/O or runs when
there is no retried I/O, it is more like a bug of the framework.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id28110a0d63ebc1c5772814e2ff8a47934df1644
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9830
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>