In multi-process, we need to make sure we don't
complete a register_operation in the wrong process. So
save the pid in the nvme_register_completion structure
when it is inserted into the STAILQ, then only complete
operations where the pid matches.
Fixes issue #2630.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I58c995237db486fecdd89d95e9e7a64379d0b0e5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13940
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
spdk_nvme_qpair_process_completions() had called
always _nvme_qpair_complete_abort_queued_reqs() at its end.
However, the call was accidentally removed by a commit
59c8bb527b
to fix an issue.
By this removal, aborting request was not completed for some error
cases.
Fix the degradation by restoring the call.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I0099eb7a008f823e1282576504423cdc248911d7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14045
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
nvme_qpair_abort_all_queued_reqs() aborts error injections, queued
requests, aborting queued requests, and outstanding requests. (Aborting
outstanding requests depends on transports.) However, it did not abort
queued aborts.
Include nvme_ctrlr_abort_queued_aborts() into
nvme_qpair_abort_all_queued_reqs() to do really the name of the
function indicates.
nvme_ctrlr_abort_queued_aborts() has been called in a few cases, but
we do not care duplication.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I19102cc6603a72ce5c398a7947cb4d606b692991
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12849
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Vasuki Manikarnike <vasuki.manikarnike@hpe.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Fully asynchronous ctrlr detach (b6ecc3729) introduce a register
operation state machine that waits for operation to complete. When
controller failed to initialize, `nvme_ctrlr_fail` set qpair state to
`DISCONNECTED` immediately, causing qpair process completions to
never complete register operations therefore prevent async detach exit.
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Change-Id: I205c5157b8ea7b4535f98ff4052414310e421446
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12858
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Many open source projects have moved to using SPDX identifiers
to specify license information, reducing the amount of
boilerplate code in every source file. This patch replaces
the bulk of SPDK .c, .cpp and Makefiles with the BSD-3-Clause
identifier.
Almost all of these files share the exact same license text,
and this patch only modifies the files that contain the
most common license text. There can be slight variations
because the third clause contains company names - most say
"Intel Corporation", but there are instances for Nvidia,
Samsung, Eideticom and even "the copyright holder".
Used a bash script to automate replacement of the license text
with SPDX identifier which is checked into scripts/spdx.sh.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iaa88ab5e92ea471691dc298cfe41ebfb5d169780
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12904
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: <qun.wan@intel.com>
This SGL type was missed in the original commit
that added the pretty printing.
Fixes: 4d9ab1e9a1 ("nvme: pretty print dptr")
Reported-by: Ramanjaneya Burugula <burugula@gmail.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ibc655db4e65009071f39f55f691c94a094cea0bc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12705
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
A iterator function nvme_request_add_abort() covers not only a small
I/O request but also children of a large I/O.
However nvme_qpair_abort_queued_reqs_with_cbarg() did not check the
latter. check if cmd_cb_arg matches not only req->cb_arg but also
req->parent_cb_arg.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I015e29b0a8f58920b9a13081330a94f9dd976a45
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12557
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Transport specific qpair_abort_reqs() set SC to SC_ABORTED_SQ_DELETION.
However, nvme_qpair_abort_queued_reqs() set SC to SC_ABORTED_BY_REQUEST
even if its call is not requested by the upper layer.
Change nvme_qpair_abort_queued_reqs() to set SC to SC_ABORTED_SQ_DELETION
for consistency.
nvme_qpair_abort_queued_reqs() is used to abort queued requests that
were sent while adminq was connecting. SC_ABORTED_SQ_DELETION will not
be so bad even for the case.
This change is required for the NVMe bdev module to be resilient for I/O
error. The NVMe bdev module does not retry I/O if SC is
SC_ABORTED_BY_REQUEST.
SC is set to SC_INTERNAL_DEVICE_ERROR if a request is failed to submit
to qpair by a generic qpair layer. We can change it to
SC_ABORTED_SQ_DELETION as well but we keep this for now.
SC_INTERNAL_DEVICE_ERROR is also retriable for the NVMe bdev module.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I7d8d5e97b222fe9275afc4fed024c1654c9579a2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12121
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If poll group is not used and if qpair is disconnecting,
spdk_nvme_qpair_process_completions() has to poll qpair until it is
actually disconnected even if ctrlr is failed.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I6da84f1e35780d21480fbe6f07e76af3048a777b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11018
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The patch
nvme: Set dnr to zero for nvme_qpair_abort_reqs()
1b3172f726
did the change stated in the title.
However,
Revert "nvme/rdma: Correct qpair disconnect process"
c8f986c7ee
destroyed it for RDMA transport.
Additionally, we had still set DNR to 1 in nvme_qpair_init().
This patch fixes both.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Iee60ac24aa7e04cce0f394014c9d9afc9d2b56ec
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11644
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
With async connect, we need to avoid the case
where the initiator is sending the icreq, and
meanwhile the application submits enough I/O
such that the request objects are exhausted, leaving
none for the FABRICS/CONNECT command that we need
to send after the icreq is done.
So allocate an extra request, and then use it
when sending the FABRICS/CONNECT command, rather
than trying to pull one from the qpair's STAILQ.
Fixes issue #2371.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If42a3fbb3fd9d863ee48cf5cae75a9ba1754c349
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11515
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Admin commands can be sent and polled from any thread, which also means
that the error injection queue on the admin qpair can be accessed from
multiple threads. Therefore, any modifications to that queue should be
done under the ctrlr lock.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ib1ed194405cb5b93f65a007b9749fd4433dc367d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11099
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
This is necessary to failover another path when multipath is configured.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I0b6bcf63501e38f75efb4b0d6bec58abb4b67aef
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10250
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
For DSM command, the NVMe drive may take a long time to finish it,
if we set a small timeout value for DSM command, the bdev/nvme module
will try to reset the IO queue pair when timeout happens,
in `spdk_nvme_ctrlr_free_io_qpair`, we will abort the outstanding
IO requests first, then in the `nvme_pcie_ctrlr_delete_io_qpair`,
we will poll the CQ for any requests that have been completed by
the NVMe controller, if there are NVMe completions in the CQ,
we will finish them again, thus double completions happened.
Here we rename `nvme_qpair_abort_reqs` to `nvme_qpair_abort_all_queued_reqs`,
so the common layer will just abort queued request, and let each
transport to abort outstanding requests case by case.
Fix#2233.
Change-Id: Icae6214239160c615418cb514fc51cfe77b59211
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10233
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Abort any queued admin requests once admin queue gets enabled. A request
can get queued if a controller is being reset and it gets submitted
while admin qpair is being reconnected. If these requests aren't
aborted, the init process will stall, as requests don't get resubmitted
while controller is resetting and subsequent admin commands required for
the initialization would be queued too.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If456a297d2d434b3cc741816cbfb13b01d37e963
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9324
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Renamed nvme_qpair_abort_reqs() to nvme_qpair_abort_reqs_with_cbarg() to
highlight the fact that it only aborts requests with specified cb_arg
and to distinguish it from _nvme_qpair_abort_reqs() which aborts all
requests immediately.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I32fec5ab0501b1beb8605689d73ec42a6424fba5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9323
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch introduces asynchronous versions of the ctrlr_(get|set)_reg
functions. Not all transports need to define them - for those that it
doesn't make sense (e.g. PCIe), the transport layer will call the
synchronous API and queue the callback to be executed during the next
process_completions call.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I2e78e72b5eba58340885381cb279f3c28e7995ec
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8607
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This allows for submitting IO requests before the CONNECT command is
sent and not stopping the connect process due to the CONNECT being
queued. It could happen once IO qpair connection is asynchronous.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I04e45b1c2f49f9da3412c843ea899341c56a0420
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8624
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This gives us a context that we can poll on when using the
nvme_wait_for_completion framework for async operations.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I5b23f9096eed5ce95848d1995c1ed0b4c74edc92
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8602
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This avoids an infite loop when user sends and then immediately aborts
all requests while the transport cannot submit them (returning -EAGAIN).
It might happen in two cases: 1) if the transport doesn't have any free
requests or 2) if the qpair is still in the NVME_QPAIR_CONNECTING state.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ia1d0b7c93f387524be0ad39597ec49851e1d3af7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8711
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
async_mode option is currently supported in PCIe transport layer
to create io qpair asynchronously. User polls the io_qpair for
completions, after create cq and sq completes in order, pqpair
is set to READY state. I/O submitted before the qpair is ready
is queued internally. Currently other transports only support
synchronous io qpair creation.
Signed-off-by: Monica Kenguva <monica.kenguva@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ib2f9043872bd5602274e2508cf1fe9ff4211cabb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8911
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If the transport returns error when polling for
completions, it gets to a uint32_t and we end up
trying to resubmit all of the requests that are
currently queued. But that's not correct - if
the transport returns an error we shouldn't be
trying to resubmit requests at all.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9198e3e2d71875cc1e46e0ac928338bb983487f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8395
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Current version provides unclear output
Change-Id: Ib044b00b5f91b1e363911f1b79c51c73c8a6920c
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5743
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In TCP NVME initiator with zero copy enabled requests might be
completed asynchronously - out of qpair_process_completions
context. At the same time we calculate requests completed
asynchronously so that generic NVME layer can resubmit
queued requests after calling qpair_process_requests (or
poll_group_process_requests).
But there is a time gap between async request complete and
qpair_process_completions and the user can submit new IO
thereby decrease the number of free TCP requests. That means
that there might be less free requests than we excpected when
we try to resubmit queued requests.
The solution is change ERRLOG to DEBUG log since it is not a
fatal case.
Change-Id: If045ecd331cc6693e8ef450d8e15432dfa5d8812
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4859
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
When we disconnect a qpair, part of the code path is
calling _nvme_qpair_abort_queued_reqs. This takes
care of aborting any requests that were queued waiting
for slots to open on the submission queue.
It walks the STAILQ one by one and manually completes
them with ABORT status back to the caller.
But if the callback path submits another request, this
request may also get queued to the end of the queued_req
TAILQ. This can result in an infinite loop.
The solution is to use an STAILQ_SWAP to a local, empty
STAILQ. Then we ensure we only abort the requests that
were queued when _nvme_qpair_abort_queued_reqs() started
executing.
Fixes issue #1588.
I used the multipath.sh test to reproduce this on my local
system. If it ever dropped into the STAILQ loop in this
function, we would hit the infinite loop. With this patch,
I confirmed locally that now we safely avoid the infinite
loop and the test passes.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I657db23efe5983bd8613c870ad62695a7fc7f689
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4284
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This becomes a problem when the qpair is reconnected.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I6677b396cf766684a4891ffbee93aa3e4e83374d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3391
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Use another list dedicated to hold queued requests being aborted
to avoid potential infinite recursive calls.
Add a helper function nvme_qpair_abort_queued_req() to move requests
whose cb_arg matches from qpair->queued_req to qpair->aborted_queued_req.
Then nvme_qpair_resubmit_requests() aborts all requests in
qpair->aborted_queued_req.
The first idea was that nvme_qpair_abort_queued_req() aborts queued
requests directly. However, this caused infinite recursive calls.
Hence separate requesting abort to queued requests and actually
aborting queued requests.
The detail of the infinite recursive calls is as follows:
Some SPDK tool submits the next request from the callback to the completion
of a request in the completion polling loop. For such tool, if the callback
submits a request and then aborts the request immediately, and the request
could not be submitted but queued, it will create infinite recursive calls
by request submit and abort, and it will not be able to get out of
completion polling loop.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8196182b981bc52dee2074d7642498a5d6ef97d4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2891
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Having functions without qpair on the interface allows for wider usage
e.g. by nvmf layer.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I3a51ad53f00eb29e2ba2681ef4ff0cc2a197b65d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3176
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
This is a preparation to the next patch.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I15356c69e676dc41d3af69caa6d12c1fcb282152
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3071
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
When a NVMe device is hot removed, subsequent calls to
nvme_qpair_submit_request can fail with ENXIO.
The failure path handling for ENXIO did not free the request which
exhausts the qpair's free_req list eventually and all IOs are stuck
going forward.
This fix adds the same cleanup handling to nvme_qpair_submit_request
for this error case as it is done in _nvme_qpair_submit_request.
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Change-Id: I5677d53965bdbd6d339c013483cdf42ce782099a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3018
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
When one of the children is failed to submit, if any children is
already submitted, the function can return success to wait for those children
to complete, but the parent should be set to failure.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2ea53856ee58da991bceca0058d1e1f55d42af37
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2492
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
This will need to be done separately for poll groups.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I0e432493bdb02e13fe5c73a8a09911cef573307b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1664
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
By aborting all requests from every qpair when it is disconnected,
we can completely avoid having to abort requests when we enable the
qpair since nothing will be left enabled.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Iba3bd866405dd182b72285def0843c9809f6500e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1788
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This is the onlyreasonable thing to do. Plus we need to
be in the destroying or disconnecting state to avoid
an infinite loop when aborting requests.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I38462a01f0455c3d6496434626f6f2f4663bf508
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1857
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When we destroy a qpair, we need to flush all of the I/O.
But some applications will try to resubmit that I/O. We need
to not re-queue those I/O while in the context of the destroy
call so as to avoid an infinite loop.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I3e4863a563d461092f6e6b4a893f965f41bf34e3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1856
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This can cause infinite loops if the callback tries to
queue an additional I/O.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I4b80b97d334082465d9228b799ef901645fa968e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1854
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We should be giving completions for all requests when we destroy a qpair.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I802f5120f2e8289aa825872f8085ac21b5fce0f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1756
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>