nvme_rdma: Ignore response if its QP was already destroyed

This is a workaround, but it is necessary to fix GitHub issue #2874.
For a reason not yet understood, in the nightly test with Intel E810
NICs, when a qpair is created in synchronous mode and connection errors
are detected, the qpair is destroyed even if requests for the qpair are
still in flight. nvme_rdma_process_recv_completion() then dereferences
a NULL pointer. To avoid this NULL pointer access, change
nvme_rdma_process_recv_completion() to return immediately if
rdma_rsp->rqpair is NULL. Add a TODO comment to find the root cause and
fix the issue properly.

One of the fixes for issue #2874.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ic810922f7ea1b32373b15f4e0cf7c2429659cbab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16431
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Shuhei Matsumoto 2023-01-24 14:12:00 +09:00 committed by Tomasz Zawadzki
parent 9aabfb59d9
commit bbd3d96b85

@@ -2508,8 +2508,16 @@ nvme_rdma_process_recv_completion(struct nvme_rdma_poller *poller, struct ibv_wc
		}
	} else {
		rqpair = rdma_rsp->rqpair;
		if (spdk_unlikely(!rqpair)) {
			/* TODO: Fix forceful QP destroy when it is not async mode.
			 * CQ itself did not cause any error. Hence, return 0 for now.
			 */
			SPDK_WARNLOG("QP might be already destroyed.\n");
			return 0;
		}
	}
	assert(rqpair->rsps->current_num_recvs > 0);
	rqpair->rsps->current_num_recvs--;
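
The guard added in this hunk can be illustrated in isolation. The following is a minimal sketch, not SPDK code: struct demo_qpair, struct demo_rsp, demo_process_recv_completion() and the handled out-parameter are hypothetical stand-ins invented for this example, and it assumes that the response's back-pointer is set to NULL when its queue pair is destroyed. It only shows the pattern described in the commit message: a completion whose owner is gone is logged and dropped with a return value of 0, because the completion queue itself did not fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the SPDK structures; not the real definitions. */
struct demo_qpair {
	int current_num_recvs;		/* receives posted and not yet completed */
};

struct demo_rsp {
	struct demo_qpair *qpair;	/* assumed to be NULL once the owning QP is destroyed */
};

/*
 * Process one receive completion.
 * Returns 0 in both cases, mirroring the choice in the hunk to not report a
 * stale completion as a CQ error. *handled is set to 1 if the qpair accounting
 * was updated and to 0 if the completion was dropped.
 */
static int
demo_process_recv_completion(struct demo_rsp *rsp, int *handled)
{
	struct demo_qpair *qpair = rsp->qpair;

	*handled = 0;
	if (qpair == NULL) {
		/* The owner is gone: warn and ignore instead of dereferencing NULL. */
		fprintf(stderr, "QP might be already destroyed.\n");
		return 0;
	}

	assert(qpair->current_num_recvs > 0);
	qpair->current_num_recvs--;
	*handled = 1;
	return 0;
}
```

Calling demo_process_recv_completion() on a response with a live qpair decrements its counter, while a response with a NULL qpair leaves all state untouched. As in the commit, this only hides the symptom: the sketch does nothing about why the queue pair was destroyed while requests were still in flight.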