ivampiresp/Spdk - Spdk - Leaflow Developers

Author	SHA1	Message	Date
Alexey Marchuk	e385cafa72	nvme: Don't log an error when we can't resubmit all requests In the TCP NVME initiator with zero copy enabled, requests might be completed asynchronously, outside of the qpair_process_completions context. At the same time, we count requests completed asynchronously so that the generic NVME layer can resubmit queued requests after calling qpair_process_requests (or poll_group_process_requests). But there is a time gap between the asynchronous request completion and qpair_process_completions, and the user can submit new IO in that gap, thereby decreasing the number of free TCP requests. That means there might be fewer free requests than we expected when we try to resubmit queued requests. The solution is to change ERRLOG to a DEBUG log since this is not a fatal case. Change-Id: If045ecd331cc6693e8ef450d8e15432dfa5d8812 Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4859 Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot	2020-10-26 11:48:46 +00:00
Jim Harris	792807a444	nvme: fix infinite loop when aborting queued reqs When we disconnect a qpair, part of the code path is calling _nvme_qpair_abort_queued_reqs. This takes care of aborting any requests that were queued waiting for slots to open on the submission queue. It walks the STAILQ one by one and manually completes them with ABORT status back to the caller. But if the callback path submits another request, this request may also get queued to the end of the queued_req TAILQ. This can result in an infinite loop. The solution is to use an STAILQ_SWAP to a local, empty STAILQ. Then we ensure we only abort the requests that were queued when _nvme_qpair_abort_queued_reqs() started executing. Fixes issue #1588. I used the multipath.sh test to reproduce this on my local system. If it ever dropped into the STAILQ loop in this function, we would hit the infinite loop. With this patch, I confirmed locally that now we safely avoid the infinite loop and the test passes. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I657db23efe5983bd8613c870ad62695a7fc7f689 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4284 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: <dongx.yi@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2020-09-17 21:34:58 +00:00
Seth Howell	58216dd07e	lib/nvme: fix mem leak in req submit. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: If64c06177605a8f57d87ba22b86fe58ddebd6f7a Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3921 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com>	2020-09-02 07:38:38 +00:00
Seth Howell	0d8f86f842	lib/nvme: don't submit request if qpair is disconnected. This becomes a problem when the qpair is reconnected. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I6677b396cf766684a4891ffbee93aa3e4e83374d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3391 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-07-31 08:23:44 +00:00
Shuhei Matsumoto	576a373d58	lib/nvme: Abort queued requests whose cb_arg matches Use another list dedicated to hold queued requests being aborted to avoid potential infinite recursive calls. Add a helper function nvme_qpair_abort_queued_req() to move requests whose cb_arg matches from qpair->queued_req to qpair->aborted_queued_req. Then nvme_qpair_resubmit_requests() aborts all requests in qpair->aborted_queued_req. The first idea was that nvme_qpair_abort_queued_req() aborts queued requests directly. However, this caused infinite recursive calls. Hence separate requesting abort to queued requests and actually aborting queued requests. The detail of the infinite recursive calls is as follows: Some SPDK tools submit the next request from the completion callback of a request in the completion polling loop. For such a tool, if the callback submits a request and then aborts the request immediately, and the request could not be submitted but was queued, it will create infinite recursive calls by request submit and abort, and it will not be able to get out of the completion polling loop. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I8196182b981bc52dee2074d7642498a5d6ef97d4 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2891 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-07-13 08:40:42 +00:00
Jacek Kalwas	4d9ab1e9a1	nvme: pretty print dptr Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: I576878fbbafc3d17617ceeec99e40565be7d5d3d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3213 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2020-07-10 07:30:59 +00:00
Jacek Kalwas	64f05eb5c5	nvme: pretty print fabric cmd Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: Ib4bc28026cab208d45c8b876714fa525e5bb38f3 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3200 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-07-10 07:30:59 +00:00
Jacek Kalwas	9cd4723913	nvme: pretty print set/get features Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: Ib6f1811da9a6294983bce04cff01ba1fb5e45607 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3179 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-07-10 07:30:59 +00:00
Jacek Kalwas	41b360d54e	nvme: add missing cmds to admin opc string Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: I6cdcf675ebc8ad31d88b5469f87e1eae066b2e3c Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3178 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-07-10 07:30:59 +00:00
Jacek Kalwas	61668cc43e	nvme: introduce new set of cmd/cpl printers Having functions without qpair on the interface allows for wider usage e.g. by nvmf layer. Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: I3a51ad53f00eb29e2ba2681ef4ff0cc2a197b65d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3176 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Community-CI: Broadcom CI Community-CI: Mellanox Build Bot	2020-07-10 07:30:59 +00:00
Shuhei Matsumoto	d80c9f6257	lib/nvme: Add underscore prefix to nvme_qpair_abort_queued_reqs() This is a preparation to the next patch. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I15356c69e676dc41d3af69caa6d12c1fcb282152 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3071 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>	2020-07-08 07:54:01 +00:00
Michael Haeuptle	89013903fe	NVME: Fixes stuck IOs during hot remove (#1451) When an NVMe device is hot removed, subsequent calls to nvme_qpair_submit_request can fail with ENXIO. The failure path for ENXIO did not free the request, which eventually exhausts the qpair's free_req list so that all subsequent IOs get stuck. This fix adds the same cleanup handling to nvme_qpair_submit_request for this error case as is done in _nvme_qpair_submit_request. Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com> Change-Id: I5677d53965bdbd6d339c013483cdf42ce782099a Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3018 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2020-06-29 09:18:29 +00:00
Shuhei Matsumoto	10c4193363	lib/nvme: Set the parent to failure when submission of any children failed When one of the children fails to submit, if any child has already been submitted, the function can return success and wait for those children to complete, but the parent should be set to failure. Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Change-Id: I2ea53856ee58da991bceca0058d1e1f55d42af37 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2492 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>	2020-06-24 08:19:43 +00:00
Seth Howell	684b3a49f0	lib/nvme: split request resubmission into function. This will need to be done separately for poll groups. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I0e432493bdb02e13fe5c73a8a09911cef573307b Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1664 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>	2020-05-28 07:13:44 +00:00
Seth Howell	6189c0ceb7	lib/nvme: abort all requests when disconnecting a qpair. By aborting all requests from every qpair when it is disconnected, we can completely avoid having to abort requests when we enable the qpair since nothing will be left enabled. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: Iba3bd866405dd182b72285def0843c9809f6500e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1788 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	b2a93a320d	lib/nvme: set qpairs to destroy when ctrlr is removed. This is the only reasonable thing to do. Plus we need to be in the destroying or disconnecting state to avoid an infinite loop when aborting requests. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I38462a01f0455c3d6496434626f6f2f4663bf508 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1857 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	7defb70d3a	lib/nvme: don't requeue I/O while destroying. When we destroy a qpair, we need to flush all of the I/O. But some applications will try to resubmit that I/O. We need to not re-queue those I/O while in the context of the destroy call so as to avoid an infinite loop. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I3e4863a563d461092f6e6b4a893f965f41bf34e3 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1856 Community-CI: Mellanox Build Bot Community-CI: Broadcom CI Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	af2d56ed94	lib/nvme: Don't re-queue I/O while disconnecting. This can cause infinite loops if the callback tries to queue an additional I/O. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I4b80b97d334082465d9228b799ef901645fa968e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1854 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	b874f65743	lib/nvme: disconnect qpairs if they are failed during reset. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I15079cb35d48221bd92b7ca41766148fdb58e668 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1855 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-22 19:06:26 +00:00
Seth Howell	9fe5084860	lib/nvme: when destroying qpairs, abort queued requests. We should be giving completions for all requests when we destroy a qpair. Signed-off-by: Seth Howell <seth.howell@intel.com> Change-Id: I802f5120f2e8289aa825872f8085ac21b5fce0f3 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1756 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Broadcom CI Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com>	2020-04-14 11:34:24 +00:00
Alexey Marchuk	4279766935	nvme: Abort queued reqs when destroying qpair Change-Id: Idef1b88cf47cf9f82b1f4499ef836dfa741c0c7f Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1791 Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: <dongx.yi@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2020-04-14 11:33:39 +00:00
Jacek Kalwas	a7a0d02d8b	nvme: fix command specific status code Given enum was not aligned with spec. This status can be reported when size equals 0. Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com> Change-Id: If51f6b051c13880c1fd4e6bb0a02f134b28b5a88 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/928 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2020-02-20 09:49:24 +00:00
Changpeng Liu	ff9516bdcc	nvme: call the callback for the queued requests when there is submission failure For requests which don't have children requests, SPDK may queue them to the queued_req list due to limited resources; in the completion path, we may resubmit them to the controller. When the controller was removed, the submission path will return -ENXIO and we will free the requests directly, so the callback will not be triggered for these requests. Here we add a flag to indicate whether the request is from the queued_req list or not, so that on a failed submission we can trigger the user's callback. Fix issue #1097 Change-Id: I901ac81733c2319e540d24baf5b8faa1c649eb35 Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/477754 Community-CI: SPDK CI Jenkins <sys_sgci@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-12-20 10:04:57 +00:00
Seth Howell	61537a190e	nvme: replace nvme_qpair_state_equals. nvme_qpair_get_state fits more closely with the semantics in other modules. Change-Id: I6ea8e02abe27253d9b4d779a43ac1963be56356a Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476920 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-12-09 13:55:41 +00:00
Seth Howell	24bca2eadd	nvme: add an enum for why a qpair disconnected Change-Id: I1a9517d9673051615942c873416505704740691a Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475805 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-12-09 13:55:41 +00:00
Seth Howell	3911922005	nvme: remove redundant transport_qp_is_failed checks The qpair state transport_qpair_is_failed is actually equivalent to NVME_QPAIR_IS_CONNECTED in the qpair state machine. There are a couple of places where we check against transport_qp_is_failed and then immediately check to see if we are in the connected state. If we are failed, or we are not in the connected state we return the same value to the calling function. Since the checks for transport_qpair_is_failed are not necessary, they can be removed. As a result, there is no need to keep track of it and it can be removed from the qpair structure. Change-Id: I4aef5d20eb267bfd6118e5d1d088df05574d9ffd Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475802 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-12-09 13:55:41 +00:00
Ziye Yang	542185b7e0	nvme/qpair: merge two if case into one. Purpose: To remove the duplicated code. Change-Id: Iab9989f9928698967533e45e7cffad4f09bde16a Signed-off-by: Ziye Yang <ziye.yang@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/473376 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-11-08 22:18:18 +00:00
Seth Howell	13f30a254e	nvme: don't disconnect qpairs from admin thread. Disconnecting qpairs from the admin thread during a reset led to an inevitable race with the data thread. QP related memory is freed during the disconnect and cannot be touched from the other threads. The only way to fix this is to force the qpair disconnect onto the data thread. This requires a small change in the way that resets are handled for pcie. Please see the code in reset.c for that change. fixes: `bb01a089` Change-Id: I8a39e444c7cbbe85fafca42ffd040e929721ce95 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/472749 Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-31 04:50:59 +00:00
Seth Howell	ae3a9b8f08	nvme_qpair: return -ENXIO when the qpair is failed. This will be the canonical way of informing the user that we have lost the qpair connection somehow. Also update all of the functions that will return -ENXIO to the user. Change-Id: Ic6c7c2d0e07e9d3e857a3476bb6b91fb4b6454fa Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471416 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	81b20a4d96	nvme_ctrlr: Allow resets from failed state Failed is not a final state for either fabric or pcie controllers. We have historically not allowed resets in the failed state, but we should. Instead of checking for the failed state, we should check for the removed state. If the controller is removed, then we cannot even attempt a reset. Change-Id: I2c1a3d85db84f84cd1895cbfaf16575c8b496155 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471415 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	552898ec17	nvme_qpair: fail the ctrlr only for errors on admin qpair. We shouldn't always fail the whole controller if we get a failure on an individual qpair. Change-Id: Id0c90af83e5231593a895be66e7a7de48939e240 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471660 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	4c1a18c41d	nvme_qpair: fix check_enabled. check_enabled had a couple of bugs in it that made it unfriendly for enabling I/O qpairs after a reset. 1. It was calling nvme_qpair_abort_queued_requests before setting the enabled flag to true. For applications that submit new I/O in the completion callback for old I/O, this means you enter an infinite loop of submitting requests and then immediately completing them. So instead, wait for the qpair to reset, then just submit those requests to the lower layer. 2. It didn't check whether we were already in the middle of calling it, so we could reenter function calls like nvme_qpair_abort_queued_requests. Also, now that we have a coherent state machine for qpairs, we can limit the enabling to a specific state in that state machine. Change-Id: Ie0b74819a6b16839965bced47c33dec967f725a8 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470256 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	08d4d977e8	nvme: combine qpair->is_connecting and is_enabled These will form the base of a little state machine for managing the nvme qpair structure. Change-Id: If6f6df38cc17221ac8fcb7d8c0d7e2e808897a99 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470534 Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-22 21:14:22 +00:00
Seth Howell	4473732398	nvme: allow fabrics commands during reconnect. When doing a reset on an NVMe-oF target with active I/O qpairs, we need to be able to submit fabrics commands on them in order to perform a reset. Currently, resetting a fabric controller with any I/O qpairs active will cause the reset to hang indefinitely. Change-Id: Ic972a301390a4dd64adabedfe01aa4e5253e40b0 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469935 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-11 20:13:26 +00:00
Seth Howell	2575aaec5a	nvme: make sure we queue requests in order. My recent changes that introduced batching to queued request resubmission also introduced a regression that can lead to reordering requests before submitting them to the drive. This change prevents that. We wait until inside the internal _nvme_qpair_submit_request function to check for queued entries to avoid queueing a request that has children. If a request that has children gets queued, when we process completions and resubmit the parent, it will result in the children being submitted. Since we only account for the number of requests we completed in the last iteration, some of the child requests may be requeued out of order, or worse, none of the child requests will end up being submitted to the transport and they will all be queued behind previously queued requests. Change-Id: I58e1c458c25fbf3f9f75364f05b1076b166a6212 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470890 Reviewed-by: Ziye Yang <ziye.yang@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-11 18:45:13 +00:00
Seth Howell	f5d88e46e2	nvme: always set ctrlr->is_failed through API Use the standard API function to fail the controller in all cases. This patch, and the several following patches, are aimed at creating a mechanism for reporting up to the application layer that a controller is failed and/or removed. To do this, I use the reset_cb to inform the upper layer that the controller is failed. This also requires changes to how we handle a controller reset to pave the way for doing optional reset retries in the libraries. Change-Id: I06dfce08326c23472a1caa8f6efbac2fd1a720f2 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469635 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2019-10-07 15:05:00 +00:00
Seth Howell	2c68fef058	nvme: move queued request resubmit to generic layer We were already passing up from each transport the number of completions done during the transport specific call. So just use that return code and batch all of the submissions together at one time in the generic code. This change and subsequent moves of code from the transport layer to the generic layer are aimed at making reset handling at the generic NVMe layer simpler. Change-Id: I028aea86d76352363ffffe661deec2215bc9c450 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469757 Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-10-07 15:05:00 +00:00
Seth Howell	afc9800b06	nvme: _nvme_qpair_submit_request does not requeue This will be handled by nvme_qpair_submit_request when it receives -EAGAIN from _nvme_qpair_submit_request. Change-Id: I5e76aae170c981df0cadaadcd5da1163c715006f Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470407 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-10-07 15:05:00 +00:00
Seth Howell	18dc53c531	nvme: move submit_request impl to a private function This patch series is aimed at preserving the order of qpair entries when resubmitting queued requests. The hope is that we will make the API foolproof and future-proof against ever reordering any queued requests. Change-Id: Ib20d61d3abaed637c9c305b75081947630190fd4 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470062 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>	2019-10-07 15:05:00 +00:00
Seth Howell	7630daa204	nvme: move queueing requests to the generic layer The tailq and the requests all belong to the generic layer, might as well put the queueing code there for better encapsulation. Change-Id: Id5f08f798121b50a21044cfc61856999c50ca227 Signed-off-by: Seth Howell <seth.howell@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469758 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2019-09-30 21:17:47 +00:00
Jim Harris	0aa72ffb74	nvme: fix WRITE_TO_RO_RANGE status code WRITE_TO_RO_PAGE was incorrect and misleading. This 0x82 NVMe status code indicates a write to a read-only range of LBAs. So modify the constant name and associated usages to use WRITE_TO_RO_RANGE instead. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I993dbebb5acc2e685a0e99aa14084942ef79d659 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465083 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-08-14 02:19:49 +00:00
Changpeng Liu	e27421b344	nvme: fix req leaks There are many req leaks when a controller failure occurs while submitting IO. All of the children must be freed before freeing the parent req. If some of the child reqs have been sent to the back end and some fail, remove the failed reqs from the parent req and retain the parent req, freeing it only after all of the submitted reqs return. Change-Id: Ieb5423fd19c9bb0420f154b3cfc17918c2b80748 Signed-off-by: Huiming Xie <xiehuiming@huawei.com> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461734 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>	2019-07-22 04:15:34 +00:00
James Bergsten	5acf617c6e	nvme: add functions to pretty-print commands and completions This change attempts to address the Trello request to decode I/O errors in NVMe hello_world example. See https://trello.com/c/MzJJw7hM/2-decode-io-errors-in-nvme-helloworld-example As part of this change, spdk_nvme_cpl_get_status_string was declared in nvme.h, and spdk_nvme_qpair_print_command and spdk_nvme_qpair_print_completion were renamed and added to nvme.h, allowing all three to be used "externally." To test the failing paths, two compile-time defines were added to force a write or read error (bad LBA) respectively. As the example does a read after write, if the write fails, the example fails. Signed-off-by: James Bergsten <jamesx.bergsten@intel.com> Change-Id: Ib94b4a02495eb40966e3f49517a5bdf64485538a Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/457076 Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2019-07-15 07:47:03 +00:00
yidong0635	ff0a7dfc42	nvme: Handle CQ polling failures by marking the controller as failed. nvme_transport_qpair_process_completions calls nvme_rdma_qpair_process_completions, which in some cases returns -1 due to CQ errors. Handle CQ polling failures by marking the controller as failed, so that a completion with an error is treated as a controller failure. Requests will be aborted after the retry counter is exceeded. Otherwise, the code will keep reporting errors without recovery. This is to fix issue #850. Change-Id: I0b324232310e107bf7fd5722aca54d402a19b14d Signed-off-by: yidong0635 <dongx.yi@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460569 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-07-09 01:43:02 +00:00
Darek Stojaczyk	f9a6588f57	nvme: switch to spdk_malloc(). spdk_dma_malloc() is about to be deprecated. Change-Id: I6c308ee546c28c479ceb903bc1749bf5209dc6fe Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com> Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448172 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: <uma.willpower@gmail.com>	2019-06-27 04:34:50 +00:00
Jim Harris	b3d884b700	nvme: assign qpair when req is allocated There's no need to set this every time we allocate a request. While here, fix a typo near where we needed to modify the unit test to remove the qpair assertion. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I8af41a6c483415950f625d1ed2ef46088b75a622 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456270 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-06-04 00:01:35 +00:00
Jim Harris	c85164bd69	nvme: add explicit "inline" keyword to a couple of functions Profiling showed these weren't getting inlined - so add the inline keyword to make sure it happens. This helps improve performance a bit. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Ia86edccc9163258efdcddcce6989a71fb180caf6 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456099 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com>	2019-05-30 23:09:16 +00:00
Jim Harris	ef1f844395	nvme: add qpair parameter to nvme_complete_request In some cases we have the qpair already when calling this function. So pass the qpair to avoid having to get it from the request. This shows about a 3% performance improvement for high-IOPS single-core tests. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I22fcca560492f4e7cf5ffedd252e41a027d0dd79 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/455286 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-05-22 14:51:01 +00:00
Jim Harris	af38d200e6	nvme: add ctrlr option for logging errors Currently the nvme driver will always log any request completed with error status. Some applications may not want this behavior. So provide an option to disable it at the controller level. When this option is enabled, any failed requests from queues associated with that controller (including the admin queue) will not log the failed request. Of course the application will still receive the failed status code and can decide to do its own logging there. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: Ia093fcd23cf321a820fd53183ee7e2dac4f9d378 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454081 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>	2019-05-14 13:51:44 +00:00
Jim Harris	5309873d39	nvme: add qpair is_connecting flag This will be used on the adminq, and set while the qpair is connecting. It allows the qpair_process_completions routine to know that it should still try to process completions, even if the controller is resetting. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I377b9c934295eb5f45f03efd90c2a268defb4bd4 Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453938 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>	2019-05-14 08:48:11 +00:00
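The parent/child rule from commit e27421b344 ("nvme: fix req leaks") can be sketched as follows. This is a minimal illustration, not SPDK's code. `struct demo_req`, the `demo_*` helpers and the `g_freed` counter are all hypothetical. After a partial submission failure, every child that never reached the back end is freed immediately. The parent is freed only once no submitted child is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request type; SPDK's struct nvme_request is different. */
struct demo_req {
	struct demo_req *parent;
	struct demo_req *children;     /* head of the child list */
	struct demo_req *next_sibling; /* link within the parent's child list */
	int num_children;
	bool submitted;                /* true once handed to the back end */
};

static int g_freed; /* number of requests freed, for illustration */

static void demo_req_free(struct demo_req *req)
{
	free(req);
	g_freed++;
}

static struct demo_req *demo_add_child(struct demo_req *parent, bool submitted)
{
	struct demo_req *child = calloc(1, sizeof(*child));

	assert(child != NULL);
	child->parent = parent;
	child->submitted = submitted;
	child->next_sibling = parent->children;
	parent->children = child;
	parent->num_children++;
	return child;
}

static void demo_remove_child(struct demo_req *parent, struct demo_req *child)
{
	struct demo_req **pp = &parent->children;

	while (*pp != NULL && *pp != child) {
		pp = &(*pp)->next_sibling;
	}
	assert(*pp == child);
	*pp = child->next_sibling;
	child->next_sibling = NULL;
	child->parent = NULL;
	parent->num_children--;
}

/* Submission failed partway: free every child that never reached the
 * back end. The parent is freed only if no child is still in flight.
 * Returns true if the parent was freed. */
static bool demo_cleanup_failed_split(struct demo_req *parent)
{
	struct demo_req *child = parent->children;

	while (child != NULL) {
		struct demo_req *next = child->next_sibling;

		if (!child->submitted) {
			demo_remove_child(parent, child);
			demo_req_free(child);
		}
		child = next;
	}
	if (parent->num_children == 0) {
		demo_req_free(parent);
		return true;
	}
	return false;
}

/* Completion of a submitted child: the last one to return frees the parent. */
static void demo_child_completed(struct demo_req *child)
{
	struct demo_req *parent = child->parent;

	demo_remove_child(parent, child);
	demo_req_free(child);
	if (parent->num_children == 0) {
		demo_req_free(parent);
	}
}
```

The parent is never freed on the failure path while a submitted child is outstanding. That child's completion would otherwise dereference freed memory, which is the counterpart to the leak the commit fixes.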


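Commit af38d200e6 makes error logging a per-controller choice. The option applies to every queue of that controller, including the admin queue. The sketch below assumes the field is named `disable_error_logging`, mirroring the option that commit adds; check `struct spdk_nvme_ctrlr_opts` in nvme.h before relying on that name. All `demo_*` types and the log format are hypothetical. The point it shows: the log line is optional, but the callback always receives the failed status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical types; `disable_error_logging` is assumed to mirror the
 * controller option described in commit af38d200e6. */
struct demo_ctrlr_opts {
	bool disable_error_logging;
};

struct demo_ctrlr {
	struct demo_ctrlr_opts opts;
};

struct demo_qpair {
	struct demo_ctrlr *ctrlr; /* admin and I/O qpairs alike */
};

struct demo_cpl {
	unsigned sct; /* status code type */
	unsigned sc;  /* status code */
};

typedef void (*demo_cb_fn)(void *arg, const struct demo_cpl *cpl);

static int g_logged; /* error lines printed, for illustration */

static void demo_complete(struct demo_qpair *qpair, const struct demo_cpl *cpl,
			  demo_cb_fn cb_fn, void *cb_arg)
{
	bool is_error = cpl->sct != 0 || cpl->sc != 0;

	assert(cb_fn != NULL);
	if (is_error && !qpair->ctrlr->opts.disable_error_logging) {
		fprintf(stderr, "request failed: sct 0x%x sc 0x%x\n", cpl->sct, cpl->sc);
		g_logged++;
	}
	/* The application always receives the status, logged or not. */
	cb_fn(cb_arg, cpl);
}

static void demo_record_sc(void *arg, const struct demo_cpl *cpl)
{
	*(unsigned *)arg = cpl->sc;
}
```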
158 Commits