Commit Graph

391 Commits

Author SHA1 Message Date
Jim Harris
002b25cc5a bdev_nvme: use INFOLOG for discovery messages
This is not in the fast path, so using INFOLOG
instead of DEBUGLOG allows these messages to be
enabled in release builds.

While here, set this flag in the discovery.sh
test script so that we get better information if
there are test failures.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1c0d087b5c0cb40118691f4a1bc16adc2fdaad9c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11932
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2022-03-14 08:44:21 +00:00
Shuhei Matsumoto
00a7998254 bdev/nvme: Move per controller settings into a option structure
The following patches will enable us to specify I/O error resiliency
options per nvme_ctrlr as global options. To do it easier, move
per controller options about I/O error resiliency into struct nvme_ctrlr_opts.

prchk_flags is not exactly for resiliency but move it into struct
nvme_ctrlr_opts too.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I85fd1738bb6e293cd804b086ade82274485f213d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11829
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-03-09 08:00:45 +00:00
Shuhei Matsumoto
d40292d05a bdev/nvme: Add prefix "drv_" to instance or pointer of spdk_nvme_ctrlr_opts
The following patches will add options per struct nvme_ctrlr in the
NVMe bdev module. bdev_opts will be used for it.

Additionally, fabrics_connect_timeout_us is set directly to
spdk_nvme_ctrlr_opts. So remove it from the RPC request structure.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I981cda5e69375edc43a8581cd3b43497c38a3d56
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11827
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-03-09 08:00:45 +00:00
Shuhei Matsumoto
1a00f5c094 bdev/nvme: Fix overflow of RB tree comparison when the NSID is very big
If 0 - UINT32_MAX or UINT32_MAX - 0 is substituted into a int variable,
we cannot get any expected result.

Fix the bug and add unit test case to verify the fix.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib045273238753e16755328805b38569909c8b83a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11836
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
2022-03-09 08:00:45 +00:00
Jim Harris
43d17a844c bdev/nvme: handle detach first in discovery_poller
This will be helpful in later patches, when we handle
detach not just at discovery service stop, but also
when a discovery controller is disconnected.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie62d62f73b328c6e058f6480c61fbdf91e854e2a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11767
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-03-08 07:52:49 +00:00
Jim Harris
a96c1bfdbd bdev/nvme: change order of add/remove for discovery
If the path to a subsystem changes from one discovery
log to the next, we should add the new paths first,
and only then remove paths.  This ensures we don't
remove the last path to a subsystem, causing associated
bdevs to get unregisterd and reregistered.

This requires adding a new log_page member to
discovery_ctx, since we now need to walk the log
page to find removed paths after all the new paths
are attached.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I99fc2e40e6f7e2e26d558ebe7bc5208fe474c0ea
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11766
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-03-08 07:52:49 +00:00
Jim Harris
84bec316c2 bdev/nvme: add additional DEBUGLOGs for discovery
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iba16c5f3273fe2335b847b6bd396e45aa97da7c9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11734
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-02-28 11:06:16 +00:00
Jim Harris
bcb75753dc bdev/nvme: add DISCOVERY_DEBUGLOG/ERRLOG
These macros are used to prefix the following to
any discovery-related DEBUGLOG or ERRLOG:

Discovery[127.0.0.1:8009]

Inside the brackets are the traddr and trsvcid of
the discovery service associated with that message.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ib1991a13f550bb8c9aaf1194a56b218cbd71c96c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11733
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-02-28 11:06:16 +00:00
Jim Harris
9614dca9f9 bdev/nvme: save the trid of the discovery service
This is useful for adding trid details to discovery
related log messages in a later patch.

Future patches will update this trid if the
current discovery ctrlr fails and we need to fail
over to a different path to the discovery subsystem.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I51712bab2d891ae9c683f8716b4228741f64e7db
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11732
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-02-28 11:06:16 +00:00
Jim Harris
a0690464dd bdev/nvme: allocate discovery_entry_ctx for discovery subsystems
For now, just allocate entries and put them on a new TAILQ on
the discovery_ctx.  Future patches will use these to try
to reattach to the discovery subsystem if the current discovery
ctrlr fails.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3cd841df2260bbe8a497bbbf36dea4a1081f25c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11731
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-02-28 11:06:16 +00:00
Jim Harris
21aa2ba37e bdev/nvme: move discovery_attach_cb up in file
It will be referenced in a second location in
an upcoming patch, so move its definition now to
reduce the size of that patch and avoid a forward
declaration.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iae12cc613190c03f0d48d71475df98384f8e47c7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11730
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-02-28 11:06:16 +00:00
Jim Harris
9b04bd8d5c bdev_nvme: rename discovery_ctrlr_ctx to discovery_entry_ctx
This name better describes the purpose of this structure.
Currently it is used to represent discovery log page entries
for NVM subsystems found by the discovery service.  Upcoming
patches will also use this structure to represent discovery
log page entries for the discovery subsystem.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I84996c9968200c50c32427f0233cb707cdc2d54c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11547
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-02-28 11:06:16 +00:00
Jim Harris
09240a1c3c bdev/nvme: don't connect to discovered discovery subsystems
For now, if the discovery service finds a discovery subsystem,
don't connect to it.  Support for nested discovery controllers
will be coming soon, but for now we need to make sure we don't
try to connect to a discovery subsystem as if it was an NVM
subsystem.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I00234718b0e39eda6e1cb1b1150a4fadcf6d8b11
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-02-28 11:06:16 +00:00
Shuhei Matsumoto
ecdbaa2310 bdev/nvme: Call spdk_free() to the object allocated by spdk_malloc()
This is a bug fix. free() was called to the object allocated by
spdk_malloc().

Hence

  free(): invalid pointer: 0x00002000146ece00

was printed.

This was found during multipath testing.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Icf6aa6dcdda728fef91b3acad7a1f1ee219c27af
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11710
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-02-24 14:56:03 +00:00
Shuhei Matsumoto
79829ae40b bdev/nvme: Set ana_state_updating only after starting read ANA log page
In a test case, test/nvmf/host/failover.sh, we got ANA error even if
the target did not enable ANA reporting.

We marked the corresponding namespace as ANA state updating but we had
no way to clear it.

Check if we can read ANA log page before setting the flag.

If read ANA log page failed, disable ANA feature until the nvme_ctrlr
is created again. In this operation, all ana_state_updating flags are
cleared.

Fixes #2335

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4e2608a35d9dfa0395ad74fceebae9faf8cd973c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11399
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-02-09 18:06:15 +00:00
Alexey Marchuk
2ccaf2acfa bdev/nvme: Add transport_ack_timeout to bdev_nvme_set_options RPC
It may take a long time to detect network transport error
when e.g. port is removed on remote target. This timeout
depends on 2 parameters - retry_count and ack_timeout.
bdev_nvme_set_options supports configuration of retry_count
but transport_ack_timeout is missed. Note: this parameter
is used by RDMA transport only.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I7c3090dc8e4078f64d444e2392a9e0a6ecdc31c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11175
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: <tanl12@chinatelecom.cn>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-01-31 09:44:28 +00:00
Evgeniy Kochetov
08f9b40113 bdev/nvme: Fix namespace comparison
This patch aligns namespace comparison with Linux kernel
implementation:
- UUID is optional and may be NULL
- command set (CSI) should be the same

Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I8f889989f24cd51b104057217f87eb303b30fa68
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11312
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2022-01-27 18:53:41 +00:00
Krzysztof Karas
30ea7ecc6f bdev/nvme: implement additional dtrace probes
Add more dtrace probes to help with identifying issues
in production.

Change-Id: I8fb621a15c5e33ae94d75b4fc31135e2635dcfce
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10561
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-01-20 21:12:56 +00:00
Jim Harris
e0415f1720 bdev/nvme: set default bdev_retry_count to 3
Now that we have a much more robust retry framework,
set the default bdev_retry_count to 3.  Users can
still override this default with the bdev_nvme_set_options
RPC as before.  This ensures that by default, we will
retry I/O when possible.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I045bf4969d02be32b951e72a148ce6b6e251dec1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11107
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-01-19 08:55:46 +00:00
Shuhei Matsumoto
80e81273e2 bdev/nvme: Do not use ctrlr for I/O submission if reconnect failed repeatedly
If ctrlr_loss_timeout_sec is set to -1, reconnect is tried repeatedly
indefinitely, and I/Os continue to be queued.

This patch adds another option fast_io_fail_timeout_sec, a flag
fast_io_fail_timedout to nvme_ctrlr.

If the time fast_io_fail_timeout_sec passed after starting reset,
set fast_io_fail_timedout to true not to use the path for I/O submission.

fast_io_fail_timeout_sec is initialized to zero as same as
ctrlr_loss_timeout_sec and reconnect_delay_sec.

The name of the parameter follows the famous DM-multipath, its fast_io_fail_tmo.

Change-Id: Ib870cf8e2fd29300c47f1df69617776f4e67bd8c
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10301
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-01-17 14:25:15 +00:00
Shuhei Matsumoto
ae4e54fdc3 bdev/nvme: Retry reconnecting ctrlr after seconds if reset failed
Previously reconnect retry was not controlled and was repeated indefinitely.

This patch adds two options, ctrlr_loss_timeout_sec and reconnect_delay_sec,
to nvme_ctrlr and add reset_start_tsc, reconnect_is_delayed, and
reconnect_delay_timer to nvme_ctrlr to control reconnect retry.

Both of ctrlr_loss_timeout_sec and reconnect_delay_sec are initialized to
zero. This means reconnect is not throttled as we did before this patch.

A few more changes are added.

Change nvme_io_path_is_failed() to return false if reset is throttled
even if nvme_ctrlr is reseting or is to be reconnected.

spdk_nvme_ctrlr_reconnect_poll_async() may continue returning -EAGAIN
infinitely. To check out such exceptional case, use ctrlr_loss_timeout_sec.

Not only ctrlr reset but also non-multipath ctrlr failover is controlled.
So we need to include path failover into ctrlr reconnect.

When the active path is removed and switched to one of the alternative paths,
if ctrlr reconnect is scheduled, connecting to the alternative path is left
to the scheduled reconnect.

If reset or reconnect ctrlr is failed and the retry is scheduled,
switch the active path to one of alternative paths.

Restore unit test cases removed in the previous patches.

Change-Id: Idec636c4eced39eb47ff4ef6fde72d6fd9fe4f85
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10128
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
2022-01-17 14:25:15 +00:00
Shuhei Matsumoto
f85370b168 bdev/nvme: Use enum to select operations after reset complete
This is a clean up as a preparation to the following patches.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib8bc90e17f52086d4e887463e04f65273bb1079b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11068
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2022-01-17 14:25:15 +00:00
Shuhei Matsumoto
962c4c3800 bdev/nvme: Fix a degradation that I/O gets queued infinitely
We noticed the difference between the SPDK 21.10 and the latest master
in a test.

The simplified scenario is as follows:
1. Start SPDK NVMe-oF target
2. Run bdevperf for the target with -f parameter to suppress exit
   on failure.
3. Kill the target after I/O started.

With the SPDK 21.10, bdevperf retries failed I/Os and exits after
the test time is over.

With the latest SPDK master, bdevperf hungs and does not exit even
after the test time is over.

The cause was as follows:

reset ctrlr is repeated very quickly (once per 10ms by default) and hence
I/Os were queued infinitely because nvme_io_path_is_failed() returned
false if nvme_ctrlr is resetting.

We should queue I/O when nvme_ctrlr is resetting only if reset is throttoled
and fail-fast for the repeated failures is supported.

Hence in this patch, fix the degradation and remove the related unit
test cases.

Reported-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4047d42dc44488a05264c6a841d101a7c371358b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11062
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-01-17 14:25:15 +00:00
Jim Harris
932ee64b8f bdev/nvme: add bdev_nvme_stop_discovery RPC
This RPC will stop the specified discovery service,
including detaching from any controllers that were
attached as part of that discovery service.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9222876457fc45e1acde680a7bd1925917c22308
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10832
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-01-12 08:20:23 +00:00
Jim Harris
f2bf7e9727 bdev/nvme: connect to discovered controllers
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3b05ab3d22851d433e3d0573e65943c4a30b9aa4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10695
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
2022-01-12 08:20:23 +00:00
Shuhei Matsumoto
6ac23b3e60 bdev/nvme: Clear I/O path cache if a path whose ns is optimized is restored
If a path whose namespace is optimized is restored, the corresponding
I/O path cache should be cleared and the path should be chosen as the
optimal path.

This bug was found by a system test.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ibc3983dbff3418adb090a09df32c2a92a8910d05
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11004
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-01-10 22:18:46 +00:00
Shuhei Matsumoto
3308bdf1b9 bdev/nvme: Rename functions for a full ctrlr reset sequence
Rename a few functions for a full ctrlr reset sequence to
clarify what we do and make the following patches easier.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I051e3ab68c3cd77fd6040a2d069d50a700123ae6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10920
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-01-10 22:18:46 +00:00
Shuhei Matsumoto
521a9bb22c bdev/nvme: Fix race between failover and add secondary trid
We sort secondary trids to avoid using disconnected trids for failover.
However the sort had a bug.

This bug was found by running test/nvmf/host/multipath.sh in a loop.

Verify the fix by adding unit test.

Fixes #2300

Signed-off-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Change-Id: I22b0ede4d2ef98b786c3e0d1f5337a2d568ba56d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10921
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2022-01-10 22:18:46 +00:00
Jim Harris
b68f2eeb0b bdev_nvme: add bdev_nvme_start_discovery RPC
This patch adds the framework for a discovery
service in the bdev/nvme module.

Users can specify an IP/port of a discovery service.
The bdev/nvme module will connect to a discovery
controller, get the discovery log page, and then
register for AERs.  It will connect to each
subsystem specified in the initial log page.
AER completions will trigger fetching the log
page again, at which point new subsystems will
be connected to, or removed subsystems will be
detached.

This patch does the following:
* Adds the new start_discovery RPC
* Connects to the discovery controller
* Gets the discovery log page
* Registers for AERs
* Detach from discovery controllers at shutdown

Subsequent patches in this series will:
* Connect to subsystems listed in discovery log page
* Detach from subsystems that were listed in earlier
  discovery log pages but subsequently removed
* Add a stop_discovery RPC

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I54bfa896a48c5619676f156b5ea9f2d1f886c72f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10694
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-01-10 15:23:39 +00:00
Alexey Marchuk
833a5c9d2b bdev/nvme: Remove ctrlr_ch from group's list in error case
If qpair creation failed, ctrlr_ch remains in group->ctrlr_ch_list
but memory for ctrlr_ch is freed. Next attempt to get ctrlr's io
channel will modify data in already freed memory and may corrupt
another allocation.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I85002f2e6ac86a0ffda6dabfa57e79b59074fb5a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10840
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2021-12-27 08:43:03 +00:00
Alexey Marchuk
17e9f58f1f bdev/nvme: Handle failed IO qpair creation
It is possible that the application calls
get_io_channel during nvme controller reset.
In that case IO qpair won't be created and the
application will get a NULL pointer.
It is possible to repeat get_io_channel later but
there is no such indiciation for the application,
so it can't distinguish between a real failure and
"try again" case during controller reset.

This patch ignores IO qpair creation error if
controller is resetting. IO qpair will be created
when reset completes.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Id39202f5a6878453ff54e35df91d5dc49a5f046a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10828
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2021-12-27 08:43:03 +00:00
Jim Harris
ef8f297ba4 bdev_nvme: allow bdev_nvme_create() to take a NULL names arg
We will want to use bdev_nvme_create() to attach to
controllers identified through discovery.  In this case,
we won't be reporting bdev names back to an RPC caller,
so there's no need to allocate an array of names to be
filled out since they won't be used.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia386d034df2c2d5a60f9aa18338ba415ec03d763
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10689
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-21 08:15:47 +00:00
Jim Harris
986f74aead bdev_nvme: split fini ctrlr destruction to separate function
We will need to add another step in the fini path for
stopping discovery pollers, so this patch prepares for
that.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ifecbbac60262f3aae7f7a7ced09b7a600df7c2e8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10590
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-21 08:15:47 +00:00
Shuhei Matsumoto
215518069a bdev/nvme: nvme_ctrlr_create() gets prchk_flags from nvme_async_probe_ctx
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id3deca8e0aba23299347a6aee6f0f44ee683556e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10555
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
619acff501 bdev/nvme: Delete unused nvme_probe_ctx
We set cb_ctx to NULL when calling spdk_nvme_probe_async(). It looks
that nvme_probe_ctx has not been used anywhere for a long time.
nvme_probe_ctx is not public data structure.

Remove nvme_probe_ctx to simplify the code and make the following
patches easier.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7dd5f970a7fde1c9c189fae3c8f28f84d7aed991
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10554
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
bf88d1d4a6 bdev/nvme: Factor out the failover trid operation into a helper function
This refactoring will be helpful for the following patches to unify ctrlr
reset and failover and failover trid also when reconnecting ctrlr.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I4623a5dd310ac7516c270ccd3b0541c27cc880d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10443
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
ffabc8ac29 bdev/nvme: Inline bdev_nvme_failover_start() into bdev_nvme_failover()
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I70593de284f5623db9e30d94b03b6576bd6ca29b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10442
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
696ad465d7 bdev/nvme: Remove the failover_in_progress flag from struct nvme_ctrlr
The failover_in_progress flag is used to decide the return value of
bdev_nvme_failover().

bdev_nvme_delete() calls bdev_nvme_failover() with remove=true to remove
nvme_ctrlr->active_path_id. However bdev_nvme_failover() returns zero
if nvme_ctrlr->failover_in_progress is true. bdev_nvme_failover() may
return zero even if it does not remove nvme_ctrlr->active_path_id.

The following will be better.

bdev_nvme_failover() returns -EBUSY if nvme_ctrlr->resetting is true,
and the caller repeats calling bdev_nvme_failover() until the target trid
becomes alternative path or bdev_nvme_failover() returns zero.

To do that, the failover_in_progress flag is not necessary any more.

Removing the failover_in_progress will also simplify the following
patches to unify ctrlr reset and failover.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I57ab944beb1d06ea4def144c81c69705860de35f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10441
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
74f18d6a07 bdev/nvme: Factor out checking if nvme_ctrlr can be unregistered
Checking if nvme_ctrlr can be unregistered is not so simple and
a few changes will be added. So factoring out the check into a
helper function will be valuable.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I310c7e3ad2dae9583df4db575d342c2cb111f3f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10461
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
7329c1e683 bdev/nvme: Refine and factor out checking if nvme_ctrlr is available or failed
When a I/O or admin passthrough failed, if the corresponding nvme_ctrlr
is not available, we should failover to another path.

When no path was found, if there is at least one nvme_ctrlr which is
not failed, we should wait until it is recovered.

We should improve error recovery not only for multipath (multipath is
"multipath") but also for failover (multipath is omitted or "failover").

To do this easily, clarify the conditions of availability and failure of
nvme_ctrlr and realize them by helper functions.

Use new helper functions for other cases to improve readability too.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I716731f72811d2ec4dfc91f9eadb191d75739af6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10381
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
7cc66c0ab1 bdev/nvme: Check if ns can be shared when configuring multipath
We had not checked the bit 0 of the Namespace Multipath I/O and
Namespace Sharing Capabilities (NMIC) field in the Identify Namespace
data structure.

If the bit 0 of the NMIC is zero, it is likely that namespaces are not
identical.

We should check if the value of the NMIC first, and do it in this patch.

Additionally, it is not usual if the bit 0 of the CMIC and the bit 0 of
the NMIC do not match. So in unit tests rename the parameter multi_ctrlr
by multipath for ut_attach_ctrlr() and use it for the value of the NMIC.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6aa7cbcc99be2507dbf18930f7b585a9ea7d0f90
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10380
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
819fd52907 bdev/nvme: Delete already created qpairs if connect qpair failed while resetting ctrlr
bdev_nvme_reset() deletes all qpairs, reset a ctrlr, and then create
all qpairs. Any qpair may fail to be created, and then the reset
request may fail. However, already created qpairs were left.

Let's delete the already created qpairs and then fail the reset request.

This will make us easier to control reconnect, deley reconnect by
a few seconds, or stop reconnect after repeated failures and then
delete ctrlr.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I414e2281b4bf0cbd1cf461d8fc64a22f43d26d13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9896
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
8afa746b4d bdev/nvme: Use new APIs in a reset ctrlr sequence
Replace the spdk_nvme_ctrlr_reset_async() and spdk_nvme_reset_poll_async()
calls by the spdk_nvme_ctrlr_disconnect(), spdk_nvme_ctrlr_reconnect_async(),
and spdk_nvme_ctrlr_reconnect_poll_async() calls in a reset ctrlr
sequence.

spdk_nvme_ctrlr_disconnect() can fail if ctrlr is already resetting or
removed. But both cases are not possible. reset is controlled and the callback
to the hot remove is called when the ctrlr is hot removed. So we assume
spdk_nvme_ctrlr_disconnect() always succeed.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1299e198597b2a2110f80b9a868e2dae015682ee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10092
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2021-12-08 08:31:24 +00:00
Shuhei Matsumoto
1d524bc384 bdev/nvme: Remove unnecessary error check from bdev_nvme_reset_ctrlr()
spdk_for_each_channel() always passes status=0 to its completion callback
if each channel completes the requested function successfully.

bdev_nvme_reset_destroy_qpair() always succeeds.

Hence bdev_nvme_reset_ctrlr() does not have to check if the passed
status is not zero.

The following patches will aggregate multiple flags into a single
state for nvme_ctrlr. This change will simplify these.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1c30c9b20c96886516029e69e90dc23d777a69b4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10077
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-01 09:20:09 +00:00
Shuhei Matsumoto
f9fba507fe bdev/nvme: Redirect the reset ctrlr operation into nvme_ctrlr->thread
In the following patches, we want to retry reconnect if reconnect failed
in a reset ctrlr sequence but we want to delay the retry. While
we wait the delayed retry, we want to quiesce ctrlr completely.

As part of quiesce ctrlr operations, we want to pause adminq poller but
we need to do it on the nvme_ctrlr->thread.

If a reset ctrlr sequence runs on the nvme_ctrlr->thread, we can avoid
redirecting the pending destruct request at completion too.

So we redirect the reset ctrlr sequence into the nvme_ctrlr->thread.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I538b962e2a7b5cf00ebbac2a1e888482ddeeee61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10075
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-12-01 09:20:09 +00:00
Josh Soref
1960ef167a spelling: module
Part of #2256

* calculated
* changing
* deferred
* deinitialize
* initialization
* particular
* receive
* request
* retrieve
* satisfied
* succeed
* thread
* unplugged
* unregister

Change-Id: I13e38f9160cb1a15a87cb5974785a34604124fa3
Signed-off-by: Josh Soref <jsoref@gmail.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10406
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2021-11-30 09:05:32 +00:00
Shuhei Matsumoto
50b10bc20e bdev/nvme: bdev_nvme_reset_io() redirect to the orig_thread at completion
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback bdev_nvme_reset_io_continue() to
redirect to the orig_thread by using bio. Furthermore, use bio->cpl.cdw0
to store the completion status of the reset processing. bdev_nvme_reset()
does not use bio->cpl.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I361cc44494190ba83ad6e360788d78851416c46c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10074
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-11-23 08:46:36 +00:00
Shuhei Matsumoto
b4447abf70 bdev/nvme: Retry failed admin passthru up to retry_count times
This patch supports admin passthrough retry when we get any error
with DNR=0 but ABORTED_BY_REQUEST up to retry_count times.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1bf29570791fdbe8651fa70c4c8685bb740fb86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9944
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2021-11-23 08:46:36 +00:00
Shuhei Matsumoto
a9a86a14c1 bdev/nvme: Retry admin passthru immediately if it got ctrlr path error
This patch supports admin passthrough retry when we get ctrlr path
error at completion.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ice0045b84054ec66a9db9ef23e21786d2c082b1d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9943
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-11-23 08:46:36 +00:00
Shuhei Matsumoto
35a2f4e22e bdev/nvme: Retry admin passthru a second later if any ctrlr may become available
When resetting ctrlr, adminq is disconnected first. If adminq is disconnected,
admin passthrough request is rejected with -ENXIO.

But resetting ctrlr may succeed. If resetting ctrlr succeeds, adminq is
connected again, and admin passthrough request will be
submitted successfully.

On the other hand, if ctrlr is failed, admin passthrough request is
rejected with -ENXIO. But when resetting ctrlr, ctrlr is set to unfailed.

Hence bdev_nvme_admin_passthru() skips any ctrlr which is resetting
or failed, and calls bdev_nvme_admin_passthru_complete() with -ENXIO
if no available ctrlr is found.

bdev_nvme_admin_passthru_complete() queues admin passthrough request
and retry it one second later if ctrlr is resetting.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic748dc4faf29ebf717ae5c29dcf7c55fe2ea9243
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9942
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2021-11-23 08:46:36 +00:00