nvme_rdma: Ignore response if its QP was already destroyed
This is a workaround, but it is necessary to fix GitHub issue #2874.

For an unknown reason, in nightly tests with Intel e810 NICs, when a qpair is created in synchronous mode and connection errors are detected, the qpair is destroyed even if requests for the qpair are still in flight. nvme_rdma_process_recv_completion() then dereferences a NULL pointer.

To fix this NULL pointer access, change nvme_rdma_process_recv_completion() to return immediately if rsp->rqpair is NULL.

Add a TODO comment to find the root cause and properly fix the issue.

One of the fixes for issue #2874.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ic810922f7ea1b32373b15f4e0cf7c2429659cbab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16431
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
parent 9aabfb59d9
commit bbd3d96b85
@@ -2508,8 +2508,16 @@ nvme_rdma_process_recv_completion(struct nvme_rdma_poller *poller, struct ibv_wc
		}
	} else {
		rqpair = rdma_rsp->rqpair;
		if (spdk_unlikely(!rqpair)) {
			/* TODO: Fix forceful QP destroy when it is not async mode.
			 * CQ itself did not cause any error. Hence, return 0 for now.
			 */
			SPDK_WARNLOG("QP might be already destroyed.\n");
			return 0;
		}
	}

	assert(rqpair->rsps->current_num_recvs > 0);
	rqpair->rsps->current_num_recvs--;
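
The guard added by this change can be illustrated outside of SPDK with a minimal, self-contained sketch. All `mock_*` names below are hypothetical stand-ins invented for illustration; they are not SPDK types or functions, and the real structures carry far more state. The sketch only shows the idea described in the commit message: a receive completion whose response no longer points at a qpair is logged and ignored instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the structures touched by the diff. */
struct mock_rsps {
	int current_num_recvs; /* receives currently posted for this qpair */
};

struct mock_qpair {
	struct mock_rsps *rsps;
};

struct mock_rsp {
	struct mock_qpair *rqpair; /* assumed to be NULL once the owning QP is destroyed */
};

/* Returns 0 both when the completion is handled and when it is safely ignored. */
static int
mock_process_recv_completion(struct mock_rsp *rsp)
{
	struct mock_qpair *rqpair = rsp->rqpair;

	if (rqpair == NULL) {
		/* Late completion for a destroyed QP. The CQ itself is healthy,
		 * so warn and report success rather than dereferencing NULL.
		 */
		fprintf(stderr, "QP might be already destroyed.\n");
		return 0;
	}

	/* Normal path: account for the consumed receive. */
	assert(rqpair->rsps->current_num_recvs > 0);
	rqpair->rsps->current_num_recvs--;
	return 0;
}
```

Calling `mock_process_recv_completion()` with a response whose `rqpair` is NULL returns 0 and leaves every counter untouched, while a response attached to a live qpair decrements `current_num_recvs` by one. As in the commit, this only hides the symptom; the reason the QP is destroyed while requests are still in flight remains open.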