nvme/rdma: Fix qpair destroy/disconnect race

When a qpair is attached to a poll group, the disconnect
process is asynchronous: we wait for the DISCONNECTED
event from rdmacm before destroying the rdma resources.
However, the user (nvme_perf) can destroy the qpair
immediately, so the memory allocated for the qpair is freed
while the rdma resources are still allocated. We may then
receive an rdmacm event (DISCONNECTED) for the destroyed
qpair, which leads to a use-after-free.
To fix this problem, check the internal qpair state when the
qpair is destroyed; if the disconnect has not finished,
forcefully destroy the rdma resources.

Fixes issue #2515

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@nvidia.com>
Change-Id: I7bfa53c9f6fe6ed787323a8941f1f2db17ea0c20
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12700
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Alexey Marchuk 2022-05-16 18:50:57 +03:00 committed by Jim Harris
parent 007fb1d3cb
commit 1003e28623

@@ -2188,6 +2188,15 @@ nvme_rdma_ctrlr_delete_io_qpair(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_
 	assert(qpair != NULL);
 	rqpair = nvme_rdma_qpair(qpair);
+	if (rqpair->state != NVME_RDMA_QPAIR_STATE_EXITED) {
+		int rc __attribute__((unused));
+		/* qpair was removed from the poll group while the disconnect is not finished.
+		 * Destroy rdma resources forcefully. */
+		rc = nvme_rdma_qpair_disconnected(rqpair, 0);
+		assert(rc == 0);
+	}
 	nvme_rdma_qpair_abort_reqs(qpair, 0);
 	nvme_qpair_deinit(qpair);