From ff0a7dfc42af80e8267442b7aa82569ffc8e5bdc Mon Sep 17 00:00:00 2001
From: yidong0635
Date: Thu, 4 Jul 2019 15:00:19 -0400
Subject: [PATCH] nvme: Handle CQ polling failures by marking the controller as
 failed.

nvme_transport_qpair_process_completions calls
nvme_rdma_qpair_process_completions, which in some cases returns -1
because of CQ errors.

Handle CQ polling failures by marking the controller as failed, so
that a completion with an error is treated as a controller failure.
Requests are then aborted once the retry counter is exceeded.
Otherwise, the code keeps reporting errors without ever recovering.

This fixes issue #850.

Change-Id: I0b324232310e107bf7fd5722aca54d402a19b14d
Signed-off-by: yidong0635
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460569
Tested-by: SPDK CI Jenkins
Reviewed-by: Ben Walker
Reviewed-by: Shuhei Matsumoto
Reviewed-by: Changpeng Liu
---
 lib/nvme/nvme_qpair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/nvme/nvme_qpair.c b/lib/nvme/nvme_qpair.c
index 7cf40c382..8a52ac855 100644
--- a/lib/nvme/nvme_qpair.c
+++ b/lib/nvme/nvme_qpair.c
@@ -449,6 +449,10 @@ spdk_nvme_qpair_process_completions(struct spdk_nvme_qpair *qpair, uint32_t max_
 	qpair->in_completion_context = 1;
 	ret = nvme_transport_qpair_process_completions(qpair, max_completions);
+	if (ret < 0) {
+		SPDK_ERRLOG("CQ error, abort requests after transport retry counter exceeded\n");
+		qpair->ctrlr->is_failed = true;
+	}
 	qpair->in_completion_context = 0;
 	if (qpair->delete_after_completion_context) {
 		/*
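
The control flow this patch adds can be sketched in isolation as a small C mock. Everything prefixed `mock_` is a hypothetical stand-in written for illustration, not an SPDK type or function. The structs carry only the fields the hunk touches, the fake transport returns whatever value the caller configures, and returning `ret` to the caller is an assumption, since the hunk does not show the end of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the controller: only the flag the patch sets. */
struct mock_ctrlr {
	bool is_failed;
};

/* Hypothetical stand-in for the queue pair. */
struct mock_qpair {
	struct mock_ctrlr *ctrlr;
	int in_completion_context;
	int32_t transport_result;	/* value the fake transport will return */
};

/*
 * Fake transport poll. Returns the configured value: a completion count,
 * or a negative number to simulate a CQ error.
 */
int32_t
mock_transport_process_completions(struct mock_qpair *qpair, uint32_t max_completions)
{
	(void)max_completions;
	return qpair->transport_result;
}

/* Mirrors the patched logic: a negative poll result marks the controller failed. */
int32_t
mock_qpair_process_completions(struct mock_qpair *qpair, uint32_t max_completions)
{
	int32_t ret;

	assert(qpair != NULL && qpair->ctrlr != NULL);

	qpair->in_completion_context = 1;
	ret = mock_transport_process_completions(qpair, max_completions);
	if (ret < 0) {
		fprintf(stderr, "CQ error, marking controller as failed\n");
		qpair->ctrlr->is_failed = true;
	}
	qpair->in_completion_context = 0;

	/* Assumption: the poll result is passed back to the caller unchanged. */
	return ret;
}
```

In this sketch a non-negative poll result leaves `is_failed` untouched, while any negative result sets it. The flag is never cleared here; how a failed controller is later reset is outside what the patch shows.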