From 43ad7febb2f7a19b381168f6e26e393a483126ea Mon Sep 17 00:00:00 2001
From: Michael Haeuptle
Date: Wed, 19 Aug 2020 16:00:58 +0000
Subject: [PATCH] lib/nvmf: Fixes stuck subsystem RPC

A subsystem is not transitioned to the paused state while there are
I/Os outstanding (tracked by the subsystem poll group), so an RPC that
pauses the subsystem waits for that count to drop to zero. In general,
AERs are not tracked as outstanding I/Os. However, there are 3 paths
in nvmf_ctrlr_async_event_request which return without adjusting the
outstanding I/O count. If we get into any of these 3 paths, the
subsystem pause can hang forever.

The issue was reproduced with hot plug stress testing under load.

We can get into the second path (SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE)
under these circumstances:
- An AER completion is sent to the initiator due to a namespace change
  (e.g. hot remove/add).
- In this case, type is set to SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE.
- The initiator sends a new AER admin command, hitting the second path,
  where we return without adjusting the outstanding I/O count.

Fixes: 1552

Change-Id: I45f781966cc1e9a601b2305c7985a21154d802e8
Signed-off-by: Michael Haeuptle
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3854
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins
Reviewed-by: Seth Howell
Reviewed-by: Ben Walker
Reviewed-by: JinYu
Reviewed-by: Changpeng Liu
Reviewed-by: Aleksey Marchuk
Reviewed-by: Shuhei Matsumoto
---
 lib/nvmf/ctrlr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/nvmf/ctrlr.c b/lib/nvmf/ctrlr.c
index 2d4a64d22..4afeb32bc 100644
--- a/lib/nvmf/ctrlr.c
+++ b/lib/nvmf/ctrlr.c
@@ -1578,6 +1578,11 @@ nvmf_ctrlr_async_event_request(struct spdk_nvmf_request *req)
 
 	SPDK_DEBUGLOG(SPDK_LOG_NVMF, "Async Event Request\n");
 
+	/* AER cmd is an exception */
+	sgroup = &req->qpair->group->sgroups[ctrlr->subsys->id];
+	assert(sgroup != NULL);
+	sgroup->io_outstanding--;
+
 	/* Four asynchronous events are supported for now */
 	if (ctrlr->nr_aer_reqs >= NVMF_MAX_ASYNC_EVENTS) {
 		SPDK_DEBUGLOG(SPDK_LOG_NVMF, "AERL exceeded\n");
@@ -1600,11 +1605,6 @@ nvmf_ctrlr_async_event_request(struct spdk_nvmf_request *req)
 		return SPDK_NVMF_REQUEST_EXEC_STATUS_COMPLETE;
 	}
 
-	/* AER cmd is an exception */
-	sgroup = &req->qpair->group->sgroups[ctrlr->subsys->id];
-	assert(sgroup != NULL);
-	sgroup->io_outstanding--;
-
 	ctrlr->aer_req[ctrlr->nr_aer_reqs++] = req;
 	return SPDK_NVMF_REQUEST_EXEC_STATUS_ASYNCHRONOUS;
 }