Spdk/lib/nvme
Richael Zhuang 4295661eb8 nvme_tcp: fix bug about qpair stuck in CONNECTING state
When running the perf test, the qpair sometimes fails to change from
state CONNECTING to CONNECTED even after the CONNECT req's resp has been
received and processed. When nvme_fabric_qpair_connect_poll
-> nvme_wait_for_completion_robust_lock_timeout_poll runs to process the
CONNECT req's resp, the req may not yet have been finished in
sock_check_zcopy, even though its resp has been received and processed.
In that case tcp_req->ordering.bits.send_ack is still 0 and status->done
is still false. After the req is completed in sock_check_zcopy, we need
to poll this qpair again so that the state enters CONNECTED.

Similarly, if the icreq's resp is received and processed before
nvme_tcp_send_icreq_complete is called by _sock_check_zcopy, the qpair
gets stuck in CONNECTING and never proceeds to send the CONNECT req. We
also need to put the qpair in pgroup->needs_poll to fix this.

I can reproduce this bug with the following configuration.
target: 16 NVMe SSDs, running on 20 cores;
initiator: randread test using nvme perf with 32 CPU cores and
zerocopy enabled.

The error doesn't always occur. The CONNECT failure occurs in about one
run in ten and produces the following log. The icreq failure is less
frequent and shows only the target side's "keep alive timeout" log.

Error reported on the initiator side:
Initialization complete. Launching workers.
[2022-05-23 14:51:07.286794] nvme_qpair.c: 760:spdk_nvme_qpair_process_completions:
*ERROR*: CQ transport error -6 (No such device or address) on qpair id 2
ERROR: unable to connect I/O qpair.
ERROR: init_ns_worker_ctx() failed

And the target side shows:
Disconnecting host  from subsystem nqn.2016-06.io.spdk:cnode2 due to keep alive timeout

Change-Id: Id72c2ffd615ab73c5fc67d36c3ff8b730cebcef7
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12975
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
2022-06-14 09:18:04 +00:00
Makefile Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_ctrlr_cmd.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_ctrlr_ocssd_cmd.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_ctrlr.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_cuse.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_cuse.h Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_discovery.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_fabric.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_internal.h Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_io_msg.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_io_msg.h Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_ns_cmd.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_ns_ocssd_cmd.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_ns.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_opal_internal.h Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_opal.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_pcie_common.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_pcie_internal.h Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_pcie.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_poll_group.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_qpair.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_quirks.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_rdma.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_tcp.c nvme_tcp: fix bug about qpair stuck in CONNECTING state 2022-06-14 09:18:04 +00:00
nvme_transport.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_vfio_user.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme_zns.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
nvme.c Replace most BSD 3-clause license text with SPDX identifier. 2022-06-09 07:35:12 +00:00
spdk_nvme.map nvme: add spdk_nvme_ctrlr_get_discovery_log_page API 2021-12-20 18:12:41 +00:00