bdev_nvme: don't try to read csts on fabrics cmd timeout

It is recommended to read CSTS when there is a timeout.
If CSTS.CFS (Controller Fatal Status) is set, we should
reset the controller.

But if an admin command on a fabrics controller times
out, reading CSTS submits another fabrics command that
could also time out.  Even worse, we are recursively
polling the admin queue for completions in this case.

Fixes issue #1716.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I23d31f6302375c52eba6f4370748d622fbd25ca7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5513
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Jim Harris 2020-12-09 23:11:08 +00:00 committed by Tomasz Zawadzki
parent 037f6bda65
commit 6d9d3f87e3

@@ -1231,11 +1231,18 @@ timeout_cb(void *cb_arg, struct spdk_nvme_ctrlr *ctrlr,
 	SPDK_WARNLOG("Warning: Detected a timeout. ctrlr=%p qpair=%p cid=%u\n", ctrlr, qpair, cid);
-	csts = spdk_nvme_ctrlr_get_regs_csts(ctrlr);
-	if (csts.bits.cfs) {
-		SPDK_ERRLOG("Controller Fatal Status, reset required\n");
-		_bdev_nvme_reset(nvme_bdev_ctrlr, NULL);
-		return;
+	/* Only try to read CSTS if it's a PCIe controller or we have a timeout on an I/O
+	 * queue. (Note: qpair == NULL when there's an admin cmd timeout.) Otherwise we
+	 * would submit another fabrics cmd on the admin queue to read CSTS and check for its
+	 * completion recursively.
+	 */
+	if (nvme_bdev_ctrlr->connected_trid->trtype == SPDK_NVME_TRANSPORT_PCIE || qpair != NULL) {
+		csts = spdk_nvme_ctrlr_get_regs_csts(ctrlr);
+		if (csts.bits.cfs) {
+			SPDK_ERRLOG("Controller Fatal Status, reset required\n");
+			_bdev_nvme_reset(nvme_bdev_ctrlr, NULL);
+			return;
+		}
 	}
 	switch (g_opts.action_on_timeout) {
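
The guard added in this hunk can be isolated as a small predicate. The following is a minimal sketch, not SPDK code: the `sketch_` types and the function name are hypothetical stand-ins for the real SPDK definitions, and the only behavior modeled is the condition itself (PCIe controller, or a non-NULL qpair meaning the timeout occurred on an I/O queue).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SPDK transport type and qpair; not the real definitions. */
enum sketch_trtype {
	SKETCH_TRANSPORT_PCIE,
	SKETCH_TRANSPORT_FABRICS,
};

struct sketch_qpair {
	int id;
};

/*
 * Return true when it is safe to read CSTS from the timeout callback.
 *
 * - PCIe: CSTS is a memory-mapped register, so reading it submits no command.
 * - Fabrics with qpair != NULL: the timeout was on an I/O queue, so the admin
 *   queue used to fetch CSTS is not the queue that timed out.
 * - Fabrics with qpair == NULL: the admin command itself timed out, so reading
 *   CSTS would submit another admin command and poll for it recursively.
 */
static bool
sketch_should_read_csts(enum sketch_trtype trtype, const struct sketch_qpair *qpair)
{
	return trtype == SKETCH_TRANSPORT_PCIE || qpair != NULL;
}
```

Under these assumptions, only the combination of a fabrics transport and a NULL qpair skips the CSTS read; in that case the callback falls through to the `action_on_timeout` handling.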