From bbd3d96b8520adb21c3bdbd06a683d7aab622c66 Mon Sep 17 00:00:00 2001
From: Shuhei Matsumoto
Date: Tue, 24 Jan 2023 14:12:00 +0900
Subject: [PATCH] nvme_rdma: Ignore response if its QP was already destroyed

This is a workaround but is necessary to fix the github issue #2874.

Due to some unknown reason, in nightly test with Intel e810 NICs when
a qpair is created with synchronous mode and connection errors are
detected, the qpair is destroyed even if requests for the qpair are
still inflight. Then, nvme_rdma_process_recv_completion() causes
NULL pointer access.

To fix this NULL pointer access, change
nvme_rdma_process_recv_completion() to return immediately if
rsp->rqpair is NULL.

Add a TODO comment to find a root cause and really fix the issue.

One of the fixes for the issue #2874.

Signed-off-by: Shuhei Matsumoto
Change-Id: Ic810922f7ea1b32373b15f4e0cf7c2429659cbab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16431
Tested-by: SPDK CI Jenkins
Reviewed-by: Aleksey Marchuk
Reviewed-by: Tomasz Zawadzki
---
 lib/nvme/nvme_rdma.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/lib/nvme/nvme_rdma.c b/lib/nvme/nvme_rdma.c
index 195bb9ead..57fc8a738 100644
--- a/lib/nvme/nvme_rdma.c
+++ b/lib/nvme/nvme_rdma.c
@@ -2508,8 +2508,16 @@ nvme_rdma_process_recv_completion(struct nvme_rdma_poller *poller, struct ibv_wc
 		}
 	} else {
 		rqpair = rdma_rsp->rqpair;
+		if (spdk_unlikely(!rqpair)) {
+			/* TODO: Fix forceful QP destroy when it is not async mode.
+			 * CQ itself did not cause any error. Hence, return 0 for now.
+			 */
+			SPDK_WARNLOG("QP might be already destroyed.\n");
+			return 0;
+		}
 	}
+
 	assert(rqpair->rsps->current_num_recvs > 0);
 	rqpair->rsps->current_num_recvs--;