Commit Graph

590 Commits

Author SHA1 Message Date
Alexey Marchuk
4af2b9bfb9 rdma: Fix incoming_queue cleanup when RDMA qpair is destroyed
RDMA qpair might be destroyed by defunct timer, so it can have
active recv elements in incoming_queue. This queue is cleaned
incorrectly, so recv element for the destroyed qpair still may
be presented in the queue and be processed later. That leads
to undefined behaviour.

Fixes #1086

Change-Id: Ieae186b2d2dce4ec88ab886b26165f6ef98e8d05
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/477957
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
2019-12-16 12:31:13 +00:00
Alexey Marchuk
5e3f93a75c rdma: Handle IBV_EVENT_SQ_DRAINED in asynchronous way
Send a message to the qpair thread to avoid modifying qpair
attributes in the acceptor poller thread which handles ibv events

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: If685d8b57aa7cb8d29fb1c2c270023c2ed0c1f84
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476715
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-12-12 14:42:13 +00:00
Alexey Marchuk
d238788bcc rdma: Handle IBV_EVENT_QP_FATAL in asynchronous way
Send a message to the qpair thread to avoid modifying qpair
attributes in the acceptor poller thread which handles ibv events

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I8ea5658a2b226b0be9838eb375a8b80d15c456c5
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476714
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
2019-12-12 14:42:13 +00:00
Alexey Marchuk
dc84fbaaa1 rdma: Add synchronization for LAST_WQE_REACHED event
The following scenario might occur when nvmf_tgt is stopped:
1. nvmf_tgt receives SIGINT, changes state to NVMF_TGT_FINI_STOP_SUBSYSTEMS
2. In this state nvmf_tgt stops all subsystems and disconnects associated qpairs
3. In the case of RDMA qpair, its state will be changed to IBV_QPS_ERR.
Once qpair changes the state to IBV_QPS_ERR, RDMA device generates
LAST_WQE_REACHED event when there are no more WQE that can be sonsumed
from the SRQ by this qpair.
4. When all subsystems are stopped, some of qpair may still be alive since they
haven't received LAST_WQE_REACHED event yet.
5. nvmf_tgt stops all poll groups and forcefully destroyes any qpairs linked to them.
6. At this moment LAST_WQE_REACHED event might be generated and received in another thread.
Handler of this event sends a message with a pointer to qpair. The qpair itself may already
be destroyed.
7. Thread that owned qpair receives a message (LAST_WQE_REACHED) with a pointer to alredy destroyed qpair and
destroyes it for the second time when all pointer are invalid.

ibv events related to qpair should be handled by the thread that
owns this qpair. This commit adds a new structure that describes
ibv event, helper functions for sending the event and a list
of events per rdma qpair; add syncronization for LAST_WQE_REACHED event

Fixes #1075

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I22bff89741708df2518760934ecb4e33fad49473
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476712
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-12-12 14:42:13 +00:00
Jacek Kalwas
baf588a58b nvmf/rdma: remove unsed define
NVMF_REQ_MAX_BUFFERS is defined in nvmf_internal.h and not used in rdma
source code.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I50f24c8fc4ea773378418f7803361d8592f961ae
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475529
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
2019-12-09 14:03:39 +00:00
Jacek Kalwas
f206551388 nvmf: fix status override in case parse_sgl fails
It is valuable to have more detail status instead
SPDK_NVME_SC_INTERNAL_DEVICE_ERROR.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: Ifd003b490a7ae9af017645c97636ceaf2f93d4b0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/476634
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-12-09 14:02:37 +00:00
Changpeng Liu
bc13d02237 nvmf: move transport spdk_nvmf_*_req_get_xfer() function into the common nvmf library
Change-Id: I1619cc9b3feea1feb16282dc6c9cc8d5a380282c
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475952
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: <jacek.kalwas@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-12-06 14:43:41 +00:00
Tomasz Zawadzki
4ea996ce19 nvmf/rdma: dont refer to rtransport when poll_group_create failed
This issue was found by code inspection.
It only occurs when pdk_nvmf_transport_poll_group_create()
itself calls spdk_nvmf_transport_poll_group_destroy().
Other transports at this time do not show this issue.

spdk_nvmf_rdma_poll_group_destroy() depends on
rgroup being assigned a transport, which is only being
done on generic nvmf layer using
spdk_nvmf_transport_poll_group_create()
after successful spdk_nvmf_rdma_poll_group_create().
When failure occurs during create, such assignment was not
performed so any references to rtransport will segfault.

Reported-by: Jacek Kalwas <jacek.kalwas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Id54482d562bd6d7c71371306cf1de93bc05f4e8a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475002
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
2019-11-20 10:01:43 +00:00
Alexey Marchuk
6a77723eb4 rdma: Use WRs chaining when DIF is enabled
This patch adds the following:
1. Change signature of nvmf_rdma_fill_wr_sge - pass ibv_send_wr ** in
order to update caller's variable, add a pointer to the number of extra WRs
2. Add a check for the number of requested WRs to nvmf_request_alloc_wrs
3. Add a function to update remote address offset

Change-Id: I26f6567211b3ebfdb4981a7499f6df25e32cbb3a
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470475
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-10-18 17:28:52 +00:00
Alexey Marchuk
653496d23f rdma: Calculate and allocate the number of WRs required to process the request
RDMA WR has a predefined number of SGL entries (16) and with new logic added to
support DIF the number of entries required to process the request may exceed
this limit since IO unit buffers might be divided into several parts to remove
metadata chunks from the transmition. This change calculates and allocates
required number of WRs.
Error handling section in spdk_nvmf_rdma_request_fill_iovs has been updated
to free allocated WRs

Change-Id: Ie5d659d8305a454949827d1f4aff6d871b7e825d
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470474
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-10-18 17:28:52 +00:00
Alexey Marchuk
838c45c856 rdma: Move rdma WR setup to spdk_nvmf_rdma_request_fill_iovs
This change helps to avoid passing the number of additional WRs to
spdk_nvmf_rdma_request_fill_iovs in the following commits and minimizes
changes in spdk_nvmf_rdma_request_parse_sgl

Change-Id: Id530d0996af661051d94930e3f776bad2dcc5771
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470473
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-10-18 17:28:52 +00:00
Jan Kryl
0a04c076ea nvmf: Add context parameter to new_qpair() callback
It can be useful for passing additional information about nvmf
target to a handler for new nvmf connections. Context can be
stored in globals as it is currently done in nvmf code. However
in case of multiple targets or languages where accessing global
state is challenging (i.e. Rust), this becomes inconvenient.

Change-Id: Ia6a2fdba4601531822b3e5fda7ac5ab89d46f6c5
Signed-off-by: Jan Kryl <jan.kryl@mayadata.io>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469263
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
2019-10-17 16:29:36 +00:00
Alexey Marchuk
568f4d2b3a rdma: Add a function to set WR parameters depending on xfer type.
Use this function in nvmf_request_alloc_wrs and nvmf_rdma_setup_request
to eliminate code duplication

Change-Id: I771e0e899043049c90298e050d4353145e36f0d1
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471034
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-10-16 15:54:08 +00:00
Alexey Marchuk
a335a5245a rdma: Move rdma wr specific initialization to a separate function
Delete a pointer to spdk_nvme_cmd as it is not used directly

Change-Id: I36a6a6d95c0707f446a0797a55a9e60c62f9503c
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470472
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-10-16 15:54:08 +00:00
Alexey Marchuk
06481fc223 rdma: Check -EINVAL return value of spdk_nvmf_rdma_request_fill_iovs
Return -1 from spdk_nvmf_rdma_request_parse_sgl in the case of -EINVAL since 0 value
returned by spdk_nvmf_rdma_request_parse_sgl is treated as no memory case

Change-Id: I536592260a3bb658a1a4bc3a79c5b37fbacd3edc
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-10-16 15:54:08 +00:00
Alexey Marchuk
6ec974ed07 rdma: Update request length in multi SGL case when dif_insert_or_strip is enabled
Change-Id: I70e82b1dc2c0db5db5b5835fed5995d5d0156971
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470470
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-10-16 15:54:08 +00:00
Richael Zhuang
644fc12061 nvmf: upgrade to C11 atomics
Replace legacy __sync builtins with C11 __atomic ones to leverage fine
memory order controlling.

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: I8a1e976be6a0db73af4451a39a8f544310c72989
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
2019-10-15 17:00:07 +00:00
Alexey Marchuk
e1101529e5 rdma: Use nvmf_request dif structure
Change-Id: I1f8eb4300f892905f18ccb6327a5abc30dc44de3
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470468
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-10-11 15:36:19 +00:00
Shuhei Matsumoto
6b05c10930 nvmf/rdma: Change iovpos of struct nvmf_rdma_request from int to uint32_t
The type of iovcnt of struct spdk_nvmf_request is uint32_t, and so
change the type of iovpos of struct spdk_nvmf_rdma_request from int
to uint32_t.

iovpos of struct spdk_nvmf_rdma_request is only incremented and
accessed. It is not used for comparison.

So to avoid rerunning CI, this fix is appended to the patch series.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I65fc5dfb7067f6e8f7cb1e555f010b246a72ec32
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469660
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-10-01 14:04:19 +00:00
Shuhei Matsumoto
68ee93aac7 nvmf/rdma: Pass pointer to iovec directly to nvmf_rdma_fill_wr_sge()
nvmf_rdma_fill_wr_sge() gets pointer to iovec at its head, but
nvmf_rdma_fill_wr_sgl() can pass it to nvmf_rdma_fill_wr_sge()
simply.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I16176d5d36ca9daf57640bfcbc49dfbf997afe54
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469639
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
91f9c6f376 nvmf/rdma: Simplify nvmf_rdma_request_parse_sgl() by cached pointers
Pointers to struct spdk_nvmf_request and struct ibv_send_wr are
used in many lines of spdk_nvmf_rdma_request_parse_sgl().

Caching and using them simplifies and improves readability a little
for spdk_nvmf_rdma_request_parse_sgl().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib000c9d4e7fb7bb415f4ac4622b32b12cc787c80
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469537
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
fda0e558a1 nvmf/rdma: Return rc simply when spdk_nvmf_request_get_buffers/_multi fails
spdk_nvmf_request_get_buffers()/_multi() may return not only -ENOMEM
but also -EINVAL, but spdk_nvmf_rdma_request_fill_iovs() and
nvmf_rdma_request_fill_iovs_multi_sgl() had returned -ENOMEM
regardless of the actual return value. Fix them in this patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic19593ffa9c0731f63d198d4ae16feb3bb47f57c
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469378
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
0462157650 nvmf/rdma: Add spdk_nvmf_request_get_buffers_multi() for multi SGL case
This patch is the end of the effort to unify buffer allocation
among NVMe-oF transports.

This patch aggregates multiple calls of spdk_nvmf_request_get_buffers()
into a single spdk_nvmf_request_get_buffers_multi().

As a side effect, we can move zeroing req->iovcnt into
spdk_nvmf_request_get_buffers() and spdk_nvmf_request_get_buffers_multi()
and do it in this patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I728bd330a1f533019957d58e06831a79fc17e382
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469206
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
c0ee8ef7d5 nvmf: Merge each transport's fill_buffers() into spdk_nvmf_request_get_buffers()
This patch is close to the end of the effort to unify buffer allocation
among NVMe-oF transports.

Merge each transport's fill_buffers() into common
spdk_nvmf_request_get_buffers() of the generic NVMe-oF transport.

One noticeable change is to set req->data_from_pool to true not in
each specific transport but in the generic transport.

The next patch will add spdk_nvmf_request_get_multi_buffers() for
multi SGL case of RDMA transport.

This relatively long patch series is a preparation to support
zcopy APIs in NVMe-oF target.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Icb04e3a1fa4f5a360b1b26d2ab7c67606ca7c9a0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469205
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
16365fd802 nvmf/rdma: Merge filling wr->sg_list of non DIF case and DIF case
This patch merges nvmf_rdma_fill_wr_sgl_with_md_interleave()
into nvmf_rdma_fill_wr_sge(), and then removes
nvmf_rdma_fill_wr_sgl_with_md_interleave().

In nvmf_rdma_fill_wr_sgl(), pass DIF context, remaining data block
size, and offset to nvmf_rdma_fill_wr_sge() in the while loop.
For non DIF case, initialize all of them by zero.

In nvmf_rdma_fill_wr_sge(), classify non-DIF case and DIF case
by checking if DIF context is NULL.

As a minor change of wording, remaining is sufficiently descriptive
and simpler than remaining_io_buffer_length and so use remaining.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I55ed749c540ef34b9a328dca7fd3b4694e669bfe
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469350
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
b48a97d454 nvmf/rdma: Separate filling wr->sg_list from filling req->iov for DIF case
This patch separates filling wr->sg_list from filling req->iov
in nvmf_rdma_fill_buffers_with_md_interleave() and create an new helper function
nvmf_rdma_fill_wr_sgl_with_md_interleave() to fill wr->sg_list by adding iovcnt to
struct spdk_nvmf_rdma_request.

The subsequent patches will merge nvmf_rdma_fill_buffers() into
spdk_nvmf_request_get_buffers().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I03206895e37cf385fb8bd7498f2f4a24797c7ce1
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469204
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
5593b61f93 nvmf/rdma: Separate filling wr->sg_list from filling req->iov
This patch separates filling wr->sg_list from filling req->iov
in nvmf_rdma_fill_buffers() and create an new helper function
nvmf_rdma_fill_wr_sgl() to fill wr->sg_list by adding iovcnt to
struct spdk_nvmf_rdma_request.

The next patch will do the same change for
nvmf_rdma_fill_buffers_with_md_interleave().

The subsequent patches will merge nvmf_rdma_fill_buffers() into
spdk_nvmf_request_get_buffers().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia4cdf134df39997deb06522cbcb6af6666712ccc
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469203
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
5ea1c95947 nvmf/rdma: Update only iov_base when buffer replacement succeeds
When buffer replacement succeeds, only iov_base has to be updated.
This change is small but will be helpful to disaggregate buffer
allocation and filling WR SGL.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ifc72fd783b515dfaecac04939c183097f939e29b
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469202
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-30 21:11:52 +00:00
Shuhei Matsumoto
7fc89387da nvmf/rdma: Factor out setup WR operation from nvmf_rdma_fill_buffers_with_md_interleave()
Factor out setup WR operation from nvmf_rdma_fillbuffers_with_md_interleave()
into a function nvmf_rdma_fill_wr_with_md_interleave().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I92689daa7dcc93aaa68ecf5706d4e1b75d7fabae
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469066
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-27 20:03:16 +00:00
Shuhei Matsumoto
8bbde3758a nvmf/rdma: Cleanup nvmf_rdma_fill_buffers_with_md_interleave()
This patch
- applies nvmf_rdma_get_lkey(),
- changes pointer to struct iovec from iovec to iov,
- changes pointer to ibv_sge from sg_list to sg_ele, and
- passes DIF context instead of decoded data block size and metadata size
- use cached pointer to nvmf_request to call
- change the ordering of operations to setup sg_ele slightly
for nvmf_rdma_fill_buffers_with_md_interleave().

Name changes are from the previous patch.

They are for consistency with nvmf_rdma_fill_buffers() and a
preparation for the next patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I942fb9d07db52b9ef9f43fdfa8235a9e864964c0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469201
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-27 20:03:16 +00:00
Shuhei Matsumoto
c2f60ea452 nvmf/rdma: Move nvmf_rdma_get_lkey() up in a file
This reduces the diff in the next patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I85dccdc1a1a5a51777934121f50a6af97feda5a5
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469480
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-09-27 20:03:16 +00:00
Shuhei Matsumoto
5dbe96a053 nvmf/rdma: Factor out getting lkey and checking length in fill_wr_sge() into get_lkey()
Factor out getting lkey and checking translation length in
nvmf_rdma_fill_wr_sge() into a function nvmf_rdma_get_lkey().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I495ba9ae4a48b4aa7dc35a0bd72708753846dfdc
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469349
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
5e8858d7a3 nvmf/rdma: Simplify nvmf_rdma_fill_wr_sge() by using cached pointers
Cache pointers to iovec and ibv_sge at the head of the function
and use them throughout.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I493759bf3989ced4390d077280cd44c122847d08
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469348
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
89a28bfd1b nvmf/rdma: Factor out WR SGE setup in fill_buffers() into fill_wr_sge()
Factor out setup WR operation from nvmf_rdma_fill_buffers() into a
function nvmf_rdma_fill_wr_sge().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I813f156b83b6e1773ea76d0d1ed8684b1e267691
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468945
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
7c7a0c0a68 nvmf: Pass not num_buffers but length to spdk_nvmf_request_get_buffers()
The subsequent patches unifies getting buffers, filling iovecs, and
filling WRs in a single API. This is a preparation.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I077c4ea8957dcb3c7e4f4181f18b04b343e9927d
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468953
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
aa6964e585 nvmf/rdma: Call spdk_nvmf_request_get_buffers() not once but per WR
This is a preparation to unify getting buffers, filling iovecs,
and filling WRs in a single API in RDMA transport and then to unify
it among RDMA, TCP, and FC transport.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia69d4409c8cccaf8d7298706d61cd4e2d35e4406
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468944
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
22cd4fe2ce nvmf: Check buffer array overflow in spdk_nvmf_request_get_buffers()
This patch makes multi SGL case possible to call spdk_nvmf_request_get_buffers()
per WR.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I977ebb0c6b2a67218c9b6fc20dc26a93a6ec770b
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468943
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
79945ef0ed nvmf: Hold number of allocated buffers in struct spdk_nvmf_request
This patch makes multi SGL case possible to call spdk_nvmf_request_get_buffers()
per WR.

This patch has an unrelated fix to clear req->iovcnt in
reset_nvmf_rdma_request() in UT. We can do the fix in a separate patch
but include it in this patch because it is very small.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If6e5af0505fb199c95ef5d0522b579242a7cef29
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468942
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
410455e40b nvmf/rdma: Reorder allocation of WRs and buffers in multi SGL case
This patch matches the ordering of single SGL case and multi SGL
case for parsing SGL.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Iea026b48e8957e140b71db7afaf8aca88634dc33
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468941
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
5e298147b8 nvmf/rdma: Remove duplicated zeroing req->iovcnt
spdk_nvmf_request_get_buffers() doesn't access req->iovcnt and so
zeroing req->iovcnt before calling spdk_nvmf_request_get_buffers()
can be removed.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8c18bd7e6eaa7c89361bec9293dc4ff4057098a0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469200
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
f0c212614a nvmf/rdma: Fix req->length was not cleared when nvmf_rdma_fill_buffers() fails
In nvmf_rdma_requst_fill_iovs_multi_sgl(), length of descriptors
are accumulated into req->length. However, req->length was not cleared
when nvmf_rdma_fill_buffers() fails in the middle. This patch fixes it.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I80a55d90d09c8af46d570e017d342afd69f41996
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469199
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-09-26 16:12:28 +00:00
Shuhei Matsumoto
d409da0c84 nvmf/rdma: Fix wrong clear of WR SGL when rdma_fill_buffers() fails
wr->num_sge has to be used in spdk_nvmf_rdma_request_fill_iovs(),
and memset() can be used instead of clearing each variable.

Besides, holding cached pointer to the current WR simplifies the
code a little and so is done together in this patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Iebda158f85e3a0e3046686f76991217fa7297c24
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469198
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
2019-09-26 16:12:28 +00:00
Alexey Marchuk
73e87ed244 rdma: Add balanced connection scheduling
The current connection scheduling mechanism (RoundRobin) doesn't take into account the qpair type and assigns each new qpair to the next poll group. As a side effect there might occur a disbalance when some poll group handles more IO qpairs than others. In RDMA transport it is possible to get the qpair type before the controller creation using a private data from the rdma_cm event, this allows to schedule admin and IO qpairs in the balanced way.

Change-Id: I90c368a41c4cd0f5347a83cab7511e4494f05b29
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468993
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
2019-09-25 21:08:15 +00:00
Alexey Marchuk
645d5944f2 rdma: Store poll groups in RDMA transport.
Operations with poll groups list must be protected by rtransport->lock.
Make rtranposrt->lock recursive to avoid unnecessary mutex operations when
the poll group is being destroyed within spdk_nvmf_rdma_poll_group_create

Change-Id: If0856429c10ad3bfcc9942da613796cc86d68d8d
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468992
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-25 21:08:15 +00:00
yidong0635
5e9cea267e rdma: Fix scanbuild warning for gcc9+.
This issue can be reproduced on fedora30.
Add assert here is enough to fix this kind of warning.

Error log:
rdma.c:3070:20: warning: Access to field 'data_buf_pool' results in a
dereference of a null pointer (loaded from field 'transport')
                spdk_mempool_put(group->transport->data_buf_pool, buf);
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

This is to fix issue #965.

Change-Id: Ifb742ab914ee9a0381dca0bb769ba8aa564c816f
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468908
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-09-20 21:43:25 +00:00
Alexey Marchuk
7545e8c829 rdma: add DIF support for read operation
Add DIF verification after IO operation completion

Change-Id: Iaf4f29d07ca84b0341498eb4e44fc8cc159ecb9c
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465249
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-09-20 16:07:46 +00:00
Alexey Marchuk
1bc5710a9e rdma: Add DIF support for write operation
Update transaction length wrt to medata size
Change buffers handling in the case of enabled DIF - add function nvmf_rdma_fill_buffer_with_md_interleave to split SGL into several parts with metadata blocks between them in order to perform RDMA operation with appropriate offsets
Add DIF generation before executing bdev IO operation
Add parsing of DifInsertOrStrip config parameter.
Since there is a limitation on the number of entries in SG list (16), the current approach has a limitation on the max transaction size which depends on the data block size. E.g. if data block size is 512 bytes then the maximum transaction size will be 512 * 16 = 8192 bytes.
In adiition, the size of IO buffer (IOUnitSize conf param) must be aligned to metadata size for better perfromance since metadata is treated as part of this buffer. E.g. if the initiator uses transaction size = 4096, data block size on nvme disk is 512 then IO buffer size should be aligned to (512 + 8) which is 4160. In other case an extra IO buffer will be consumed which will increase the number of entries in SGL and in iov.

Change-Id: I7ad2270fe9dcceb114ece34675eac44e5783a0d5
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465248
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-09-20 16:07:46 +00:00
yidong0635
024127dcfd rdma: Add return value check for memory map notify.
Now code always return 0 , do this like nvme_rdma_mr_map_notify.
That callback can get the right return.

Change-Id: Ief2924e14321b2062f6001e7ae3f50d507206594
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468663
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-09-19 01:55:55 +00:00
Seth Howell
8126509c4f rdma: replace improperly aligned buffers in requests.
It is a very rare thing for a buffer to be split over two memory
regions. In fact, it is only possible in dpdk versions where
--match-allocations is not passed as a startup parameter to dpdk but
dynamic memory allocation is enabled.

By adding a small helper function, we avoid failing an I/O because it
was assigned one of these improperly aligned buffers. Also, we try to
remove the buffer from circulation so that it doesn't get picked up
again by another request.

Also, add a unit test to catch this case.

Change-Id: Ia09865c2f77160a960571665b29c4533b11758ae
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467446
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-09-17 19:43:01 +00:00
Seth Howell
98233769f4 rdma: simplify nvmf_rdma_fill_buffers
Just cleaning up a few things like variable names and ordering to make
the whole function more readable.

Change-Id: I1503cdb43ddd73e063d6e57e9ff0cf2a06e79728
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467445
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-09-17 19:43:01 +00:00
Evgeniy Kochetov
87ebcb08c1 nvmf/rdma: Handle completions for destroyed QP associated with SRQ
IB Architecture Specification vol.1 rel.13. in ch.10.3.1 "QUEUE PAIR
AND EE CONTEXT STATES" suggests the following destroy procedure for
QPs associated with SRQ:
- Put the QP in the Error State;
- wait for the Affiliated Asynchronous Last WQE Reached Event;
- either:
  * drain the CQ by invoking the Poll CQ verb and either wait for CQ
    to be empty or the number of Poll CQ operations has exceeded CQ
    capacity size; or
  * post another WR that completes on the same CQ and wait for this WR
    to return as a WC;
- and then invoke a Destroy QP or Reset QP.

Without the drain step it is possible that LAST_WQE_REACHED event is
received and QP is destroyed before the last receive WR completion is
polled from the CQ.

In SPDK there is no risk of resource leakage in this case. So, instead
of draining we can destroy QP and then just ignore receive completions
without QP and post receive WRs back to SRQ.

Fixes #903

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ice6d3d5afc205c489f768e3b51c6cda8809bee9a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465747
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-12 17:04:48 +00:00
Michal Ben Haim
62615117f7 SPDK: changing TREQ value from 'not specified' to 'not required'.
Signed-off-by: Michal Ben Haim <michal.benhaim@kaminario.com>
Change-Id: Ia7bda5b18db24df97172d4500a499c4635d592d5
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/467499
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-10 17:51:26 +00:00
Shuhei Matsumoto
9796768132 nvmf: Move pending_data_buf_queue to common struct spdk_nvmf_transport_poll_group
This unifies buffer management among transports further and is a
preparation to make buffer allocation asynchronous.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8c588eeac4081f50fe32605feb7352f72c628d95
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466847
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Anil Veerabhadrappa <anil.veerabhadrappa@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-09-09 00:42:22 +00:00
Shuhei Matsumoto
0b068f8530 nvmf/rdma: Pass nvmf_request to nvmf_rdma_fill_buffers
Most variables related with I/O buffer are in struct spdk_nvmf_request
now. So we can pass nvmf_request instead of nvmf_rdma_request to
nvmf_rdma_request_fill_buffers and do it in this patch.

Additionally, we use the cached pointer to nvmf_request in
spdk_nvmf_rdma_request_fill_iovs which is the caller to
nvmf_rdma_request_fill_buffers in this patch.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia7664e9688bd9fa157504b4f5075f79759d0e489
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466212
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
2019-08-30 16:56:46 +00:00
Shuhei Matsumoto
85b9e716e9 nvmf/rdma: Replace RDMA specific get/free_buffers by common APIs
Use spdk_nvmf_request_get_buffers() and spdk_nvmf_request_free_buffers(),
and then remove spdk_nvmf_rdma_request_free_buffers() and
nvmf_rdma_request_get_buffers().

Set rdma_req->data_from_pool to false after
spdk_nvmf_request_free_buffers().

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie1fc4c261c3197c8299761655bf3138eebcea3bc
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465875
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-08-29 18:17:38 +00:00
Shuhei Matsumoto
005b053a02 nvmf: Move data_from_pool flag to common struct spdk_nvmf_request
This is a prepration to unify buffer management among transports.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6b1c208207ae3679619239db4e6e9a77b33291d0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466002
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-08-29 18:17:38 +00:00
Shuhei Matsumoto
04ae83ec93 nvmf: Move allocated buffer pointers to common struct spdk_nvmf_request
This is a preparation to unify buffer management among transports.
struct spdk_nvmf_request already has SPDK_NVMF_MAX_SGL_ENTRIES (16) * 2
iovecs. Hence incresing the number of buffers twice will be no problem.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Idb525abbf35dc9f4b8547b785b5dfa77d106d8c9
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465873
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
2019-08-29 18:17:38 +00:00
Evgeniy Kochetov
01887d3c96 nvmf/rdma: Fix data WR release
One of stop conditions in data WR release function was wrong. This
can cause release of uncompleted data WRs. Release of WRs that are
not yet completed leads to different side-effects, up to data
corruption.

The issue was introduced with send WR batching feature in commit
9d63933b7f.

This patch fixes stop condition and contains some refactoring to
simplify WR release function.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie79f64da345e38038f16a0210bef240f63af325b
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466029
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-29 18:09:14 +00:00
Jacek Kalwas
8a14af685b nvmf/rdma: fix missing destory qp
From rdma_cma.h "Users must destroy any QP associated with an
rdma_cm_id before destroying the ID."

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I5ed0c25221c5401cdde8b31a4e217b9d79e7caaa
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/464290
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-08 20:07:11 +00:00
Seth Howell
59a3afa0ff nvmf/rdma: pass iov_base to spdk_mem_map_translate
We should be checking directly against the base of the iov when doing
memory map translations. The current behavior is to check against the
starting address of the buffer which is a close address, but not exactly
the same.

Change-Id: I7f65224a6836a814708438f2866d84ae22882216
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463893
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: <jiandong.zheng@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-02 07:15:36 +00:00
Jacek Kalwas
db0c7f6a4f nvmf/rdma: fix missing return statement
In case of failure during resource allocation within poll_group_create
there is a lack of return statement which could lead to NULL ptr
dereference.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I84abe64a1843117d76b97e62656bdfc4fe2b35d8
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463195
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-08-02 03:55:32 +00:00
Evgeniy Kochetov
c9c80e6932 nvmf/rpc: Fix io channel reference counting in NVMf statistics
NVMf statistics functions use spdk_get_io_channel function to get a
poll group. It increases reference counter in io channel and causes
problems on application exit. spdk_put_io_channel calls were added to
release the channel.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I832d1eae346c3bc3858ed0ed063ff7a7a897a2f5
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463389
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-29 18:05:09 +00:00
Evgeniy Kochetov
fbe8f8040c nvmf/rdma: Add request latency statistics
This patch adds measurement of time request spends from the moment it
was polled till completion.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I1fcda68735f2210c5365dd06f26c10162e4ddf33
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/445291
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-26 20:30:00 +00:00
Evgeniy Kochetov
251db8144f nvmf/rdma: Add NVMf RDMA transport pending statistics
This patch adds statistics for pending state in NVMf RDMA subsytem
which may help to detect lack of resources and adjust configuration
correctly.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: I9560d931c0dfb469659be42e13b8302c52912420
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452300
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-26 20:30:00 +00:00
Evgeniy Kochetov
38ab383a8f nvmf/rdma: Add RDMA polling statistics
RDMA polling statistics: number of polls and number of completion
entries returned.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: Iabcf2cb6f6a35f595b89b58cdfcd177a637dda13
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/445289
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-26 20:30:00 +00:00
Evgeniy Kochetov
43bb4e6b1f rpc: Add NVMf transport statistics to nvmf_get_stats RPC method
This patch adds transport part to nvmf_get_stats RPC method and basic
infrastructure to report NVMf transport specific statistics.

Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Change-Id: Ie83b34f4ed932dd5f6d6e37897cf45228114bd88
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452299
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-07-26 20:30:00 +00:00
Alexey Marchuk
f0b7a6e7d1 rdma: fix possible double free on qpair destruction
Update rqpair->last_wqe_reached in the context of thread that owns qpair's poll group to avoid possible double free
This patch fixes #858

Change-Id: If5422944b7928c2cc05af528fbcc4482aeef22df
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462012
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Lorne Li <lorneli@163.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-07-23 22:56:57 +00:00
Alexey Marchuk
5282edfd7b rdma: fix double free of qpair struct in case of failed initialization
qpair structure is freed and an error code is returned to the caller in the case of failed qpair initialization in function spdk_nvmf_rdma_qpair_initialize (e.g. bad return value of rdma_create_qp).
The return code is handled by nvmf_tgt_poll_group_add function which destroys the qpair for the second time.
This patch fixes #857

Change-Id: I0773652ecccbbd634ad272106e0a93c1e591d7d2
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462011
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Lorne Li <lorneli@163.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-23 22:55:43 +00:00
lorneli
ba323d44ca nvmf/rdma: log spdk_nvmf_rdma_destroy_defunct_qpair
Func spdk_nvmf_rdma_destroy_defunct_qpair is a "last chance option"
to destroy qp manually if some driver/hardware doesn't drain qp's
failed wr as expected.

There's a probability that ibv_poll_cq polls wr of the destoryed qp
after spdk_nvmf_rdma_destroy_defunct_qpair's execution. Although in
practice the risk of this situation is minimal(if not non-existent),
add a log here so that we could detect this situation easily.

Change-Id: Ifa9534397513bcea34c18fbb8168eef8f53599c1
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462441
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-23 19:35:16 +00:00
lorneli
b4d3066890 nvmf/rdma: defer qp destruction until nvmf layer closes qp
Currently rqpair will be destroyed directly in ibv_poll_cq path
if it has been drained, regardless of whether there are outstanding
I/Os issued to bdev layer. So after outstanding I/Os completing,
spdk_nvmf_rdma_close_qpair will be called from nvmf layer, accessing
a destroyed qp.

This path defers qp destruction in nvmf_rdma_destroy_drained_qpair
func until nvmf layer closes qp.

Fixes 851

Change-Id: I8bcce66f8053ddb105702ac603d5d73af54bdcfc
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461237
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-23 19:35:16 +00:00
Alexey Marchuk
0754417fa9 rdma: Use optimal ceiling integer division
This form of the celinig division allows to remove an extra condition

Change-Id: I8a2de792172ec9115563e7fb914745c476f16e8d
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462198
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-22 09:22:11 +00:00
Darek Stojaczyk
96ec8bff78 nvmf/rdma: switch to spdk_*malloc()
spdk_dma_*malloc() is about to be deprecated.

Change-Id: I5bcac50baca785255eb068086e67c07d120b042f
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459432
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-07-17 01:28:57 +00:00
Jacek Kalwas
e95e4028c1 nvmf/rdma: exclude getaddrinfo from lock
No need to have it under lock. Additionally in case of failure
there was a lack of rdma_destroy_id(). This is addresed within this
change as well.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: Idbb36d51ad4ef7ef81051463f56efc87ef00c966
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462054
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-17 01:03:36 +00:00
Jacek Kalwas
0d4a5f7e69 nvmf/rdma: free list of devices
In case of failure during pd or map allocation freeing list of devices
was missing.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: If62f7b072f3894fd1a7e856c19b4ea51646dd20e
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/462079
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-07-17 00:59:34 +00:00
Jacek Kalwas
114a067738 nvmf/rdma: pd null check
In case of pd allocation by nvmf hooks there is a lack of null
check as oposed to pd allocation by ibv_alloc_pd.

Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: Iead6e0332bdee3da4adb6e657af298215c4e2196
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/461576
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-07-16 01:29:03 +00:00
Hailiang Wang
73a171a07c rdma: assert ibv_send_wr is not NULL
Vhost testing crashed from Nightly testing, because a member
access within null pointer of type 'struct ibv_send_wr'.

Change-Id: If8f34f23864883ea73516d2d1fe3b30137c04316
Signed-off-by: Hailiang Wang <hailiangx.e.wang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/458913
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-06-25 13:37:15 +00:00
Alexey Marchuk
53777de855 rdma: Unset IBV_SEND_SIGNALED flag for RDMA_WRITE operations
Unsetting this flag will decrease the number of WRs retrieved during CQ polling and will decrease
the oeverall processing time. Since RDMA_WRITE operations are always paired with RDMA_SEND (response),
it is possible to track the number of outstanding WRs relying on the completed response WR.
Completed WRs of type RDMA_WR_TYPE_DATA are now always RDMA_READ operations.

The patch shows %2 better peformance for read operations on x86 machine. The performance was measured using perf with the following parameters:
-q 16 -o 4096 -w read -t 300 -c 2
with nvme null device, each measurement was done 4 times

avg IOPS (with patch): 865861.71
avg IOPS (master): 847958.77

avg latency (with patch): 18.46 [us]
avg latency (master): 18.85 [us]

Change-Id: Ifd3329fbd0e45dd5f27213b36b9444308660fc8b
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Evgenii Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/456469
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-06-11 18:07:28 +00:00
Jim Harris
bf647c168a nvmf: increase default max num qps to 128
This matches the Linux kernel target.  Users can
still decrease this default when creating the
transport (i.e. -p option for nvmf_create_transport
in rpc.py).

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icad59350a2cd35cfc4ad76d06399345191680c05

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/454820
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-22 14:50:05 +00:00
Seth Howell
61948a1ca7 rdma: add check for allocating too many SRQ.
We could run into issues with this if we were using an arbitrarily large
amount of cores to run SPDK.

Change-Id: Ia7add027d7e6ef1ccb4a69ac328dbdf4f2751fd8
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452250
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-15 20:29:32 +00:00
Seth Howell
14777890a6 rdma: add an stailq for qpairs pending recv
This will help us not iterate through the whole list of connections when
only some of them have pending recvs.

Change-Id: I681bc98befbdda4e77ef333b7a086c08b2708eb3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449266
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-13 22:09:55 +00:00
Seth Howell
c3884f943c rdma: batch rdma recvs per poll.
This will help save MMIO overhead. Especially in the SRQ case.

Change-Id: I6fb70cf6de4763450f97961f41ccdce3acec2e63
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449265
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-13 22:09:55 +00:00
Seth Howell
b4dc10fbb7 rdma: create a list for qpairs pending send transfers
By creating a list of qpairs, we can avoid looping over every connected
qpair to process sends each time we poll.

Change-Id: If24bbc363176f52fbfb756d56719edd885a21a11
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449264
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-10 22:24:35 +00:00
Seth Howell
9d63933b7f rdma: batch rdma sends.
By batching ibv sends each time we poll, we can reduce the number of
MMIO writes that we do.

Change-Id: Ia5a07b0037365abfa8732629c34d34a9ed49ac70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449253
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-10 22:24:35 +00:00
Seth Howell
350e429a57 rdma: add a flag for disabling srq.
There are cases where srq can be a detriment. Add a flag to allow users
to disable srq even if they have a piece of hardware that supports it.

Change-Id: Ia3be8e8c8e8463964e6ff1c02b07afbf4c3cc8f7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452271
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-05-06 18:11:13 +00:00
Jim Harris
b6206d657c trace: shorten max name from 44 to 24 characters
This restriction helps reduce the amount of padding when
printing out the event trace, allowing it to fit in a
small number of columns.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ifa31e5a6967c7b9bc7028069effb71533f80596f

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452736
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-05-02 08:41:56 +00:00
Jim Harris
617184be3b trace: remove short_name
This was not used by any of the trace register descriptions.
Let's remove it rather keeping it around if we don't need it.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idda809e2911db5be555ff6aa13695484a14bf665

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452734
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-05-02 08:41:56 +00:00
Seth Howell
6cc18a64aa rdma.c: Don't set recv->qpair to NULL
We can use the rpoller->srq to check if a qpair is valid when processing
recv completions.

Change-Id: I6aa360adc48a3312ddcf79f10e2a65b502a7314f
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452247
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-01 18:48:13 +00:00
Seth Howell
33f60621af lib: resize key mempools
Mempools are based off of a ring structure which allocates its elements
as a power of two. It also only exposes n-1 elements to the user. So
when we create a mempool with 2^n elements in it, we have to allocate a
ring with 2^n+1 entries. By decreasing the number of elements in these
key mempools by 1, we can save a decent amount of memory.

Change-Id: I942c9dd4cf59096969bc2559fb46fd2084a07f09
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448875
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-01 17:45:29 +00:00
Seth Howell
d05c553827 rdma: don't spam people with async event messages.
It used to be that we would get async events very infrequently. However,
with the introduction of SRQ, this number has gone up tremendously.
Change the way we report our these events so that we don't spam/confuse
people running the target.

Change-Id: I33070281fa854cbc17784d61bbbb870196ca8780
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452159
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-26 18:10:56 +00:00
Seth Howell
ec47f92b9b rdma: fix potential heap-use-after-free in srq shutdown
If there are outstanding recvs for a qpair when it is destroyed, we need
to clear the qpair from it before reposting it. Otehrwise, we have a
potential heap-use-after-free of double free (depending on whether the
recv completion is in error state or not).

See github issues #730

Change-Id: Ic2009c761cbcc5e89174f62fbd0872d0489c67ca
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452122
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-04-26 11:16:22 +00:00
Seth Howell
89d2efe07e rdma: set the srq param in the initiator.
We were setting this value in the target from our initiator, but it
turns out the rdma_conn_params struct is responsible for setting the
opposite side so we need to add it in the target side when accepting
connections.

Also, add a test to demonstrate target functionality when we overwhelm
the SRQ. It is useful to note that performance really tanks when you
start overwhelming the srq so it may be useful to use this test case to
check performance gains in edge cases over time.

Change-Id: Iac541bd9fc1d82eca9f21e7abc3f625663a6c460
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451678
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-24 09:22:16 +00:00
jiaqizho
b70e698465 rdma:fix core dump when rdma_create_qp return error.
Signed-off-by: jiaqizho <jiaqi.zhou@intel.com>
Change-Id: Ie900e01820f69fc5b2d5e30d519c6b619d7a7281
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449507
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-22 18:40:35 +00:00
Seth Howell
7d7b44f2a6 rdma: decrement descriptor before checking SEND_WITH_INVAL
We were incrementing over the end of the descriptor list and assigning
undefined values to the rsp opcode in SEND_WITH_INVAL case. We were only
hitting this error when mixing sgl and inline requests in the same
workload. We were just by chance hitting a four bit value that was set
to all 1s from the in capsule data from the last request.

Change-Id: Ied06356f3d22fa34a2cd869dfad6bdca8720791d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450873
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-19 17:29:45 +00:00
Seth Howell
2cc6b0dfcb rdma: set the number of wr sge_entries per I/O
This was not being properly set in the multi-sgl path.
Also add a verification step to the fio configuration file to prevent
against future regressions.

Change-Id: I510b6acd92bc2fbc9b6fbec1d59945cc53584ad3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-19 17:29:45 +00:00
JinYu
dd90ff7a21 nvmf/rdma: fix bugs in spdk_nvmf_rdma_qpair_destroy
Rqpair qp and resources maybe not be created, if rqpair fail to
initialise. For example, in function new_qpair, the code run to
spdk_nvmf_qpair_disconnect, but rqpair is initialised in
poll_group_add.

Fix #557 segmentaion fault(core dump)

Change-Id: I1892e6d13e2d53dd5a7c4856d775f9b3b85da961
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450986
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Hailiang Wang <hailiangx.e.wang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-18 21:47:57 +00:00
JinYu
c7395a1171 nvmf: fix the rqpair->current_send_depth
If rsp->status.sc != SUCCESS and xfer == DATA_CONTROLLER_TO_HOST,
We would not send the data WR, so clean the num_outstanding_data_wr.

Fix #728

Change-Id: I32259788e495ed76f8f02a9d871bd56356d93dc4
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450726
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-16 14:42:03 +00:00
Seth Howell
1fb629c4d2 rdma: make the pending_data_buf_queue an STAILQ
Should speed up operations, and allows us to remove the 16 byte link
object from the request structure.

Change-Id: Ie62df1f44d22580a7a7ae41c498295841d1e3064
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448080
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 21:34:55 +00:00
Seth Howell
9f7582c3a5 rdma: reorder qpair elements to plug hole
Saves 8 bytes

Change-Id: Icb429ba79d7a085978950dd3045aa9ef28351101
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448073
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
91105e2031 rdma: Don't store ibv_qp_attr in the qpair.
We were only using one enum from this whole struct, so there is no need
to store it. Plus the queries we use to update it are so infrequent and
only occur during connect and disconnect so I think we can save quite a
bit of space by removing this without compromising performance.

Change-Id: Icf29977a3c10cb289564fa2760a0059f07a0f8cb
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448072
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
ab79560e65 rdma: simplify spdk_nvmf_rdma_poller_poll.
There was a lot of duplicated code here between states. I'm trying to
minimize the duplicated code without making it confusing.

Change-Id: I13183431e554c8a9f501b3385bbd7b59e2c83161
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448066
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
a8169c37e0 rdma: add error path for fill_iovs_multi_sgl
Catch an edge case where a multi sgl request is longer than the allowed
transfer size.

Change-Id: I79779050fe951d16f1240e2c3d8cf5037e576ea2
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/440766
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
6812b63c5f rdma: always allocate buffers for requests upfront
This is important to avoid thrash when we don't have enough buffers to
satisfy a request.

Change-Id: Id35fd492078b8e628c2118317f674f07e95d4dba
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449109
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-04-04 04:34:59 +00:00
Seth Howell
f4adbc79ce rdma: optimize and move buffers variable.
The buffers are really specific to the request and not the wr or data
object. In the case of multiple wr requests, the maximum number of
buffers per req is equal to the number of SGEs in the NVMe-oF request
*2.

Change-Id: Ic59498bfed461d180adb2fb9a481ac5b11fa9252
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449108
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-02 23:26:08 +00:00
Seth Howell
62700dac2e nvmf/rdma: Add support for multiple sgl descriptors to sgl parser
Enable parsing an nvmf request that contains an inline
nvme_sgl_last_segment_descriptor element. This is the next step
towards NVMe-oF SGL support in the NVMe-oF target.

Change-Id: Ia2f1f7054e0de8a9e2bfe4dabe6af4085e3f12c4
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/428745
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-02 23:26:08 +00:00
Seth Howell
934775db43 rdma: make semantic changes to fill_buffers func
Changing i to iovcnt in all references to the req->iov structure will be
important when we start processing multi-sgl requests.

Change-Id: I90a9b6d872b94f846ae7d29a45dd2703eafa6175
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449201
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-29 19:02:22 +00:00
Seth Howell
e70a759489 rdma: pull buffer assignment out of fill_iovs
This will be used by the multi-sgl version of this function as well.

Change-Id: Iafeba4836a77482fa2a158f86f1c17fe7fdeb510
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449104
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-29 19:02:22 +00:00
Seth Howell
a9fc7e1db8 rdma: use LAST_WQE_REACHED event in the SRQ path
This event is generated by NICs utilizing the SRQ feature when the last
RECV for that qpair is processed. I have confirmed this feature.

Change-Id: Ib6d6b6d02987f789b4d5dd3daf734e3351ee1974
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448063
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-25 17:23:51 +00:00
yidong0635
fc43fbba04 rdma: fixed heap used after free issue.
With ASAN to run this cases, it will report issue about heap used after free
in spdk_nvmf_rdma_qpair_destroy. Resources have been released before,
change the order to in this tailq to release resources.

ERROR: AddressSanitizer: heap-use-after-free on address
0x6080000080e0 at pc 0x0000006e1e3f bp 0x7fd48b6c3df0 sp 0x7fd48b6c3de0
READ of size 8 at 0x6080000080e0 thread T3 (reactor_1)
0x6e1e3e in spdk_nvmf_rdma_qpair_destroy spdk/lib/nvmf/rdma.c:813

Change-Id: Ia1c12bca84955a2de60399e6b265c9b8901bb51e
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448534
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-21 18:00:04 +00:00
Seth Howell
e59ac513fb rdma: remove reqs from read/write queues in error
Not doing so can cause us to hit asserts during the shutdown path. This
should fix an intermittent failure we are seeing on the test pool where
we hit the assert rdma_req->state != RDMA_REQUEST_STATE_FREE in
spdk_nvmf_rdma_request_process.

Note that this problem doesn't cause any data corruption when debug is
not enabled, it just causes us to probcess a subset of commands through
the state machine one extra time suring qpair shutdown.

Change-Id: Ibc36bfea87ec4089b8e2c7a915f48714fddb0b09
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447843
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-19 18:18:45 +00:00
Seth Howell
33668b2254 rdma: change structure of drained_qpair to work w/ messages.
This will become important later on.

Change-Id: I94e5af03359e476afbc68664e43f44269ad5974c
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448074
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
7dd3cf441a rdma: limit the completion queue based on the SRQ.
When we have a shared receive queue, the number of outstanding items
associated with a completion queue is deterministic, and limited by how
many RECVs we have total in the SRQ. So, we can set the total size of
the Completion queue at the beginning of time and never resize it.

Change-Id: I787e4c5bbd52ac8948a323d1301f926f887cd91c
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447492
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
a5972c6245 rdma: consolidate common error paths in qpair_init
Consolidating error paths is common practice in SPDK so do that here to
make the function more uniform and save space.

Change-Id: I98c5d5f7feeb688f1d8b24f4d2d3461a43d00c1d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448191
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
97a43680a9 rdma: move cq_resize to its own function.
Change-Id: I07aef399320fd4a014f63760670ea765d2e18b4b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448190
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
fa79f64ad1 rdma: Keep a pointer to the SRQ in the qpair
Change-Id: Id173038b6ad6b1564acf5d6886814f7d310964c7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
01201d3e87 rdma: remove compile time config for SRQ
Change-Id: I44af3ee4dc6ec76045e1d0614910402487098a3d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447120
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
0d3fcd10e9 rdma: add function to create qpair resources.
Change-Id: Id865e2a2821fe04c1f927038d6dd967848cd9a55
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446999
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-15 19:19:17 +00:00
Ben Walker
353fbcdaf0 nvmf/rdma: Create function to destroy rdma resources
This unifies the clean up path between SRQ and normal
operation.

Change-Id: I396d7e3749579f27b5bb1e89b9d6761a77ba5beb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446979
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-03-15 19:19:17 +00:00
Ben Walker
b25751d99d nvmf/rdma: Add a structure to hold rqpair/rpoller resources
Depending on whether SRQ is enabled, resources may be allocated
to the rqpair or to the rpoller. Create a struct to hold these
pointers that can be used in both locations to avoid duplicated
code.

Change-Id: I2c8fc59009201d9e41721e6462a81732b529a9e0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446978
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Eugene Kochetov <evgeniik@mellanox.com>
2019-03-15 19:19:17 +00:00
Ben Walker
527be2bf4e nvmf: Remove qpair_is_idle
This wasn't used anywhere.

Change-Id: I405af3c808be284d19218f3f04c1e90e33e31de8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446977
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-03-15 19:19:17 +00:00
Evgeniy Kochetov
ed0b611fc5 nvmf/rdma: Add shared receive queue support
This is a new feature for NVMEoF RDMA target, that is intended to save
resource allocation (by sharing them) and utilize the
locality (completions and memory) to get the best performance with
Shared Receive Queues (SRQs). We'll create a SRQ per core (poll
group), per device and associate each created QP/CQ with an
appropriate SRQ.

Our testing environment has 2 hosts.
Host 1:
  CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz dual socket (8 cores total)
  Network: ConnectX-5, ConnectX-5 VPI , 100GbE, single-port QSFP28, PCIe3.0 x16
  Disk: Intel Optane SSD 900P Series
  OS: Fedora 27 x86_64
Host 2:
  CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz dual-socket (24 cores total)
  Network: ConnectX-4 VPI , 100GbE, dual-port QSFP28
  Disk: Intel Optane SSD 900P Series
  OS : CentOS 7.5.1804 x86_64
Hosts are connected via Spectrum switch.
Host 1 is running SPDK NVMeoF target.
Host 2 is used as initiator running fio with SPDK plugin.

Configuration:
- SPDK NVMeoF target: cpu mask 0x0F (4 cores), max queue depth 128,
  max SRQ depth 1024, max QPs per controller 1024
- Single NVMf subsystem with single namespace backed by physical SSD disk
- fio with SPDK plugin: randread pattern, 1-256 jobs, block size 4k,
  IO depth 16, cpu_mask 0xFFF0, IO rate 10k, rate process “poisson”

Here is a full fio command line:
fio  --name=Job --stats=1 --group_reporting=1 --idle-prof=percpu \
--loops=1 --numjobs=1 --thread=1 --time_based=1 --runtime=30s \
--ramp_time=5s --bs=4k --size=4G --iodepth=16 --readwrite=randread \
--rwmixread=75 --randrepeat=1 --ioengine=spdk --direct=1 \
--gtod_reduce=0 --cpumask=0xFFF0 --rate_iops=10k \
--rate_process=poisson \
--filename='trtype=RDMA adrfam=IPv4 traddr=1.1.79.1 trsvcid=4420 ns=1'

SPDK allocates the following entities for every work request in
receive queue (shared or not): reqs (1024 bytes), recvs (96 bytes),
cmds (64 bytes), cpls (16 bytes), in_capsule_buffer. All except the
last one are fixed size. In capsule data size is configured to 4096.
Memory consumption calculation (target):
- Multiple SRQ: core_num * ib_devs_num * SRQ_depth * (1200 +
  in_capsule_data_size)
- Multiple RQ: queue_num * RQ_depth * (1200 + in_capsule_data_size)
We ignore admin queues in calculations for simplicity.

Cases:
1. Multiple SRQ with 1024 entries:
   - Mem = 4 * 1 * 1024 * (1200 + 4096) = 20.7 MiB
     (Constant number – does not depend on initiators number)
2. RQ with 128 entries for 64 initiators:
   - Mem = 64 * 128 * (1200 + 4096) = 41.4 MiB

Results:
FIO_JOBS   kIOPS     Bandwidth,MiB/s  AvgLatency,us  MaxResidentSize,kiB
       RQ       SRQ     RQ      SRQ    RQ       SRQ      RQ       SRQ
1      8.623    8.623   33.7    33.7   13.89    14.03    144376   155624
2      17.3     17.3    67.4    67.4   14.03    14.1     145776   155700
4      34.5     34.5    135     135    14.15    14.23    146540   156184
8      69.1     69.1    270     270    14.64    14.49    148116   156960
16     138      138     540     540    14.84    15.38    151216   158668
32     276      276     1079    1079   16.5     16.61    157560   161936
64     513      502     2005    1960   1673     1612     170408   168440
128    535      526     2092    2054   3329     3344     195796   181524
256    571      571     2232    2233   6854     6873     246484   207856

We can see the benefit in memory consumption.

Change-Id: I40c70f6ccbad7754918bcc6cb397e955b09d1033
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/428458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-15 19:19:17 +00:00
Seth Howell
62266a72cf rdma: allocate protection domains for devices up front.
We were only using one pd per device anywas, and this is necessary for
shared receive queue support.

Change-Id: I86668d5b7256277fe50836863408af2215b5adf9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447385
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-12 21:37:51 +00:00
Seth Howell
bb3e441388 rdma: destroy qpairs based on num_outstanding_wr.
Both Mellanox and Soft-RoCE NICs work with this approach.

Change-Id: I7b05e54037761c4d5e58484e1c55934c47ac1ab9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446134
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-08 21:09:09 +00:00
Seth Howell
961cd6ab7e rdma: register a poller to destroy defunct qpairs
Not all RDMA drivers fail back the dummy recv and send operations that
we send to them when destroying a qpair. We still need to free the
resources from these qpairs to avoid eating up all of the system memory
after multiple connect and disconnect events. Since we won't be getting
any more completions, the best heuristic we can use is waiting a long
time and then freeing the resources.

qpair_fini is only called from the proper polling thread so we can safely
call process_pending to flush the qpair before closing it out.

Change-Id: I61e6931d7316d1e78bad26657bb671aa451e29f4
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/443057
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-04 19:12:48 +00:00
Seth Howell
59f0d22e40 rdma: Fix misordered assert and decrement.
In the error path, we were first decrementing a variable and then
asserting that it must be >0. These operations should occur in the
opposite order.

Change-Id: I6cec544faf17bb75cbfca3d3a3c173dc5db14f99
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446440
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-28 21:20:38 +00:00
Seth Howell
756ce464f6 rdma: update default number of shared buffers.
When the decision was made to uncouple the number of shared buffers from
the queue depth and allow the user to decide for themselves, the default
was also significantly lowered, which caused some issues when trying
torun performance tests (See https://github.com/spdk/spdk/issues/699).
While this is a user modifiable variable, it is still best to keep the
higher default value.

The original value was equivalent to max_queue_depth *
SPDK_NVMF_MAX_SGL_ENTRIES * 2 with the defaults for max_queue depth and
max_sgl_entries being 128 and 16 respectively. Hence 4096

fixes: 0b20f2e552

Change-Id: I809e97a10973093a2b485b85bca7160091166f70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446525
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-28 21:09:50 +00:00
Zahra Khatami
a55b2109bb nvmf: remaning changes related to nvmf hooks
Change-Id: I6780fa43cebd9f48d1ae0ea6fbeb92a95c4dfa15
Signed-off-by: zkhatami88 <z.khatami88@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/443653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 21:16:36 +00:00
Seth Howell
b38e3a60c6 rdma: change the logic of rdma_qpair_process_pending
I think this simplifies the process a little bit.

Change-Id: Icc87a59c9f6fd965ef35531975b7036d85c4bc95
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445916
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
80eecdd881 rdma: use an stailq for incoming_queue
Change-Id: Ib1e59db4c5dffc9bc21f26461dabeff0d171ad22
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
bfdc957c75 rdma: remove the state_cntr variable.
We were only using one value from this array to tell us if the qpair was
idle or not. Remove this array and all of the functions that are no
longer needed after it is removed.

This series is aimed at reverting
fdec444aa8 which has been tied to
performance decreases on master.

Change-Id: Ia3627c1abd15baee8b16d07e436923d222e17ffe
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445336
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
04ebc6ea28 RDMA: Remove the state_queues
Since we no longer rely on the state queues for draining qpairs, we can
get rid of most of them. We cn keep just a few, and since we don't ever
remove arbitrary elements, we can use stailqs to perform those
operations. Operations on Stailqs carry about half the overhead as
operations on tailqs

Change-Id: I8f184e6269db853619a3581d387d97a795034798
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
yidong0635
9d838d24ad rdma: add return to avoid address points to the zero page
Error logs in nvmf_rdma_dump_request lead to report error about
address points to the zero page, add judgement to return.
this issue occurs in heavy load fio testing.

Change-Id: I50302be88b3af53f718e3800aa16df7c506ca4e8
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441110
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-15 04:29:40 +00:00
Seth Howell
b7651b681c NVMe-oF: add asserts for SGE counts
We should never be going over these limits in the respective transports,
but add asserts to check this during testing.

Change-Id: Ifcaa82ccf58546a38020b31df54ee5d1d9822b8b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442777
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 23:34:20 +00:00
Seth Howell
145485769e nvmf: remove qpair state activating.
This intermediate state is unused and meaningless. the qpair transitions
into this state right before calling a synchronous operation and then
transitions to active as soon as that operation completes successfully.
If the operation did not complete successfully, we were leaving qpairs
in this weird intermediate state when for all intents and purposes they
had reverted to an uninitialized state. Keeping qpairs in the
uninitialized state until they have been added to a poll group creates a
meaningful distinction between states that can be actionable from the
transport level.

Change-Id: I6de9bc424b393b6fff221aa2f4212aaa91488629
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 20:39:44 +00:00
Seth Howell
b952668186 rdma: destroy uninitialized qpairs immediately.
Connections in the uninitialized state haven't been added to a poll
group yet, so submitting dummy requests to them will be pointless since
they will never be polled. We need to reject the connection and destroy
the qpair immediately.

Change-Id: Id5dd711882e1ae7c13ae32c06da2285186b00a1b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443470
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 20:39:44 +00:00
Seth Howell
825cac2720 rdma.c: Create a single point of entry for qpair disconnect
Since there are multiple events/conditions that can trigger a qpair
disconnection, we need to funnel them to a single point of entry. If
more than one of these events occurs, we can ignore all but the first
since once a disconnect starts, it can't be stopped.

Change-Id: I749c9087a25779fcd5e3fe6685583a610ad983d3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 20:39:44 +00:00
Seth Howell
b6b0a0ba59 rdma: adjust I/O unit based on device SGL support
For devices that support fewer SGE elements than our default values, we
need to adjust the I/O unit size so that we don't ever try to submit
more SGLs than we are allowed to.

Change-Id: I316d88459380f28009cc8a3d9357e9c67b08e871
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-12 18:46:57 +00:00
Seth Howell
92f5548a91 rdma: properly account num_outstanding_data_wr
This value was not being decremented when we got SEND completions for
write operations because we were using the recv send to indicate when we
had completed all writes associated with the request. I also erroneously
made the assumption that spdk_nvmf_rdma_request_parse_sgl would properly
reset this value to zero for all requests. However, for requests that
return SPDK_NVME_DATA_NONE rom spdk_nvmf_rdma_request_get_xfer, this
funxtion is skipped and the value is never reset. This can cause a
coherency issue on admin queues when we request multiple log files. When
the keep_alive request is resent, it can pick up an old rdma_req which
reports the wrong number of outstanding_wrs and it will permanently
increment the qpairs curr_send_depth.

This change decrements num_outstanding_data_wrs on writes, and also
resets that value when the request is freed to ensure that this problem
doesn't occur again.

Change-Id: I5866af97c946a0a58c30507499b43359fb6d0f64
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-12 18:43:44 +00:00
Seth Howell
41cd5ff4fb rdma: fix max_read_depth_definition.
max_read_depth should be based on max_qp_init_read_atomic, or the
maximum number of read values that the initiator will accept as
outstanding.

The device attributes object contains values for both the initiator
(remote side) and the target (local side). All attributes with the name
init in them are meant to correspond to the initiator. The
qp_read_atomic value represents the number of reads and atomic
operations that can have this device as the target. qp_init_read_atomic
represents how many read operations the initiator has said that we can
have outstanding that have the initiator's rdma device as the target.

Since this number represents how many outstanding reads we will send to
the initiator at once, we should use the qp_init_read_atomic value.

Change-Id: Iacc044e8321080de8accd9128ac3777bbb948afc
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442409
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-05 18:04:04 +00:00
Ben Walker
9521d11bdb nvmf/rdma: Remove stray spdk_nvmf_rdma_wr
Wasn't used.

Change-Id: I5b440e18a0a6cbb9b6137b7074a0312e51f41b95
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441592
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-04 19:14:54 +00:00
Ben Walker
608d80a033 nvmf/rdma: Eliminate management channel
This is a holdover from before poll groups were introduced.
We just need a per-thread context for a set of connections,
so now that a poll group exists we can use that instead.

Change-Id: I1a91abf52dac6e77ea8505741519332548595c57
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442430
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-04 18:20:13 +00:00
Ben Walker
4e614b3127 nvmf/rdma: Capitalize SEND in code comment for consistency
The READ and ATOMIC in the comment above are capitalized, so
make this all caps too.

Change-Id: I49fae2ceb826b22953d9b26d42b95f17e2dac617
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442427
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-04 18:12:31 +00:00
Ben Walker
e1dd85a5b7 nvmf: Don't increment current_recv_depth for dummy RECV
When a connection goes to close and has no I/O outstanding,
the current_recv_depth was being decremented beyond 0 and rolling over.

If the poll group then finds a successful receive completion on the next
poll (for a command that arrived prior to starting the disconnect but
hadn't been processed yet), it would trip the max queue depth check
added recently and start another disconnect process. If only one command
arrives in this window, everything actually works out ok.

However, if there are two receive completions sitting in the completion
queue after the disconnect process is started, the first one does the
double disconnect and the second one does another disconnect which ends
up dereferencing a null pointer.

Since there is always a special reserved slot for the dummy recv, don't
do decrements or increments of the current_recv_depth for the dummy
recv. This allows the code to still enforce the actual max_queue_depth
on recvs without underflowing or overflowing the counter.

Change-Id: I56c95b2424e956a3b007b25c50cbf47262245b8f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442642
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-01-30 19:03:46 +00:00
zkhatami88
8e2f0cdb01 nvmf: Add mechanism to override nvmf pd/mr behavior
Change-Id: I8d3abfcd1934bbab5bf8dacae08e8a7f29992b93
Signed-off-by: zkhatami88 <z.khatami88@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/433977
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
2019-01-30 19:03:35 +00:00
Seth Howell
1d0a8e1cec rdma: split PENDING_DATA_TRANSFER into two states.
Since we have different requirements for submitting RDMA read and write
operations, we should track them separately so that we don't block
writes when the device does not have enough resources for read
operations.

Change-Id: I5d6424c0e26f2f5362866d1bb21eb46700c245da
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441794
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-01-28 16:58:50 +00:00
Seth Howell
158dc9470d rdma: Make sure we don't submit too many WRs
Before, the number of WRs and the number of RDMA requests were linked by
a constant multiple. This is no longer the case so we need to make sure
that we don't overshoot the limit of WRs for the qpair.

Change-Id: I0eac75e96c25d78d0656e4b22747f15902acdab7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/439573
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-01-28 16:58:50 +00:00
Seth Howell
dfdd76cf21 rdma: track outstanding data work requests directly.
This gives us more realistic control over the number of requests we can
submit.

Change-Id: Ie717912685eaa56905c32d143c7887b636c1a9e9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441606
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-01-25 19:12:17 +00:00
Seth Howell
7289d370f7 rdma: fix rw_depth to read_depth:
rw_depth was a misinterpretation of the spec. It is based on the value
of max_qp_rd_atom which only governs the number of read and atomic
operations. However, we were using rw_depth to block both read and write
operations which is an unnecessary restriction. write operations should
only be governed by the number of Work Requests posted to the send
queue. We currently guarantee that we will never overshoot the queue
depth for Work requests since they are embedded in the requests and
limited to a size of max_queue_depth.

Change-Id: Ib945ade4ef9a63420afce5af7e4852932345a460
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441165
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-01-25 19:12:17 +00:00
Seth Howell
5301be93cd rdma: set wr opcodes while parsing the SGL.
Change-Id: I88fdf0b48653997f790cf5de6774d1c16621a9c1
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441605
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-01-25 19:12:17 +00:00
Seth Howell
1f9ac1179e rdma: add num_outstanding_data_wr tracker to req
This will be necessary later on when we need to throttle send and recv
requests in software.

Change-Id: Ifb25eaabd15e101fbfc2959a08a321f80857b280
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441604
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-01-25 19:12:17 +00:00
Xiaodong Liu
db5c3ce362 nvmf/rdma: dynamically enlarge CQ size
Assigned CQ size when creating CQ may run over due to
heavy workload with too many qpairs. Enlarge it dynamically
can prevent IBV_EVENT_CQ_ERR caused by CQ's runover.
This patch fixes issue #498:
https://github.com/spdk/spdk/issues/498

Change-Id: I6c2d7194d4147d812d49d4fe787fcba5c6bbede9
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440853
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
2019-01-24 21:51:09 +00:00