1036 Commits

Author SHA1 Message Date
Jim Harris
a95fdad68f nvmf: remove unnecessary size checks when creating transport
The individual transports will adjust these sizes when
necessary.  In fact, we have to remove this check, since
RDMA transport may adjust the io_unit_size based on the
max number of SGEs - and can adjust it to a value that
will fail this check if we reload the configuration.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2708c7f5aaa54a368ec932ec40dd6447f1a4fde0

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452474
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-05-02 14:44:57 +00:00
Jim Harris
b6206d657c trace: shorten max name from 44 to 24 characters
This restriction helps reduce the amount of padding when
printing out the event trace, allowing it to fit in a
small number of columns.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ifa31e5a6967c7b9bc7028069effb71533f80596f

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452736
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-05-02 08:41:56 +00:00
Jim Harris
617184be3b trace: remove short_name
This was not used by any of the trace register descriptions.
Let's remove it rather than keeping it around if we don't need it.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idda809e2911db5be555ff6aa13695484a14bf665

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452734
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2019-05-02 08:41:56 +00:00
Anil Veerabhadrappa
2061874474 lib/nvmf: Validate requested SQ size for both admin and IO queue
During the connect call, the SQ size should be validated against the
max SQ size for that particular queue type (AQ or IOQ).

Change-Id: I977d7556e4d04e37004d16c87efffd3b467fa62c
Signed-off-by: Anil Veerabhadrappa <anil.veerabhadrappa@broadcom.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452376
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-05-01 18:51:28 +00:00
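Note on 2061874474: the per-queue-type check can be sketched as below. This is a minimal illustration, not the SPDK code. The struct `queue_limits`, its fields, and `sq_size_is_valid` are hypothetical names, and the sketch assumes that qid 0 denotes the admin queue and that the requested SQ size is a 0's based value (entries = sqsize + 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transport limits; names are illustrative only. */
struct queue_limits {
	uint16_t max_aq_depth;    /* max entries allowed in the admin queue */
	uint16_t max_queue_depth; /* max entries allowed in an I/O queue */
};

/*
 * Validate a requested SQ size against the limit for its queue type.
 * Assumptions: qid 0 is the admin queue, and sqsize is 0's based,
 * so the queue would hold sqsize + 1 entries.
 */
static bool sq_size_is_valid(const struct queue_limits *lim, uint16_t qid,
			     uint16_t sqsize)
{
	uint32_t max_entries;

	assert(lim != NULL);

	/* Pick the limit that matches the queue type. */
	max_entries = (qid == 0) ? lim->max_aq_depth : lim->max_queue_depth;

	/* Reject a zero-sized request and anything above the limit. */
	return sqsize != 0 && (uint32_t)sqsize + 1 <= max_entries;
}
```

With limits of 32 (admin) and 128 (I/O), a request of 127 would pass for an I/O queue but fail for the admin queue, which is the distinction the commit describes.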
Seth Howell
6cc18a64aa rdma.c: Don't set recv->qpair to NULL
We can use the rpoller->srq to check if a qpair is valid when processing
recv completions.

Change-Id: I6aa360adc48a3312ddcf79f10e2a65b502a7314f
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452247
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-01 18:48:13 +00:00
Seth Howell
33f60621af lib: resize key mempools
Mempools are based off of a ring structure which allocates its elements
as a power of two. A ring of size N also only exposes N-1 elements to
the user. So when we create a mempool with 2^n elements in it, we have
to allocate a ring with 2^(n+1) entries. By decreasing the number of
elements in these key mempools by 1, we can save a decent amount of memory.

Change-Id: I942c9dd4cf59096969bc2559fb46fd2084a07f09
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448875
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-05-01 17:45:29 +00:00
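Note on 33f60621af: the arithmetic behind the saving can be sketched as follows. This assumes only what the message states (the ring size is a power of two and one slot is unusable); `next_pow2` and `ring_slots_for` are illustrative helpers, not SPDK or DPDK functions.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= v (v must be at least 1). */
static size_t next_pow2(size_t v)
{
	size_t p = 1;

	assert(v >= 1);
	while (p < v) {
		p <<= 1;
	}
	return p;
}

/*
 * Slots the backing ring needs in order to hold `elements` objects,
 * assuming the ring size is a power of two and only size - 1 slots
 * are usable.
 */
static size_t ring_slots_for(size_t elements)
{
	return next_pow2(elements + 1);
}
```

Under these assumptions a pool of 4096 elements needs an 8192-slot ring, while a pool of 4095 elements fits in a 4096-slot ring, so dropping a single element halves the ring.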
Seth Howell
d05c553827 rdma: don't spam people with async event messages.
It used to be that we would get async events very infrequently. However,
with the introduction of SRQ, this number has gone up tremendously.
Change the way we report these events so that we don't spam/confuse
people running the target.

Change-Id: I33070281fa854cbc17784d61bbbb870196ca8780
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452159
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-26 18:10:56 +00:00
Seth Howell
ec47f92b9b rdma: fix potential heap-use-after-free in srq shutdown
If there are outstanding recvs for a qpair when it is destroyed, we need
to clear the qpair from it before reposting it. Otherwise, we have a
potential heap-use-after-free or double free (depending on whether the
recv completion is in error state or not).

See GitHub issue #730

Change-Id: Ic2009c761cbcc5e89174f62fbd0872d0489c67ca
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/452122
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-04-26 11:16:22 +00:00
Seth Howell
3856d82b50 subsystem: check for NULL bufs in reservation ops.
At the RDMA level, we allow requests that should contain a data
transfer, but specify a length of zero, to be passed up the stack
without a data buffer. See spdk_nvmf_rdma_request_get_xfer. In the case
of the reservation requests, we weren't checking whether req->data was
NULL before trying to copy into it, causing us to segfault if we got a
malformed reservation request.

Found when using the fuzzer.

Change-Id: I320174ec72a8d298ab6ca44ef6a99691631f00ca
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451786
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-25 22:52:12 +00:00
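Note on 3856d82b50: the kind of guard described can be sketched as below. `struct sketch_request` and `copy_to_request` are hypothetical stand-ins for illustration; they are not the SPDK request structure or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified request: only the fields the sketch needs. */
struct sketch_request {
	void *data;      /* may be NULL when the request carried zero length */
	uint32_t length; /* size of the buffer behind `data` */
};

/*
 * Copy `len` bytes into the request buffer.
 * Returns 0 on success, or -1 if the request has no buffer or the buffer
 * is too small, so the caller can fail the command instead of crashing.
 */
static int copy_to_request(struct sketch_request *req, const void *src,
			   size_t len)
{
	assert(req != NULL && src != NULL);

	if (req->data == NULL || req->length < len) {
		return -1;
	}

	memcpy(req->data, src, len);
	return 0;
}
```

A caller would typically complete the command with an error status when the function returns -1.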
Changpeng Liu
3f4426878a nvmf: disable the protection if the backend doesn't contain valid type
It's not an error if the NVMe drive was formatted to 512 + 8 but
has no protection type, so we will also disable the protection for
the NVMe-oF target in that case.

Change-Id: I07e605cff9545f46c642f7ca783a4727a26abece
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451926
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-24 21:47:18 +00:00
Seth Howell
89d2efe07e rdma: set the srq param in the initiator.
We were setting this value in the target from our initiator, but it
turns out the rdma_conn_params struct is responsible for setting the
opposite side so we need to add it in the target side when accepting
connections.

Also, add a test to demonstrate target functionality when we overwhelm
the SRQ. It is useful to note that performance really tanks when you
start overwhelming the srq so it may be useful to use this test case to
check performance gains in edge cases over time.

Change-Id: Iac541bd9fc1d82eca9f21e7abc3f625663a6c460
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451678
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-24 09:22:16 +00:00
Jim Harris
b92c3d412d nvmf: add tcp trace points for data read from socket
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ib04abb64dd379dd73c7ff3c8318591124b4bb7dd

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451477
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-23 17:59:23 +00:00
Gregory Shapiro
14032a984c NVMF: Add model number as parameter to construct_nvmf_subsystem (-d option).
Change-Id: Ia1a458a0ac1c5a17d2955a3f31c6dfe77538eb17
Signed-off-by: Gregory Shapiro <gregory.shapiro@kaminario.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/438562
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-23 16:51:16 +00:00
Changpeng Liu
68bb3995aa nvmf: trivial optimization to make the code more consistent
Make the use of spdk_uuid_compare() consistent in the file, and
change SPDK_INFOLOG to SPDK_DEBUGLOG to avoid repeated log
messages for the RESERVATION CONFLICT response.

Change-Id: I72fefbd520cefcaf25182c3ca3d21e3d87d17e94
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450884
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-23 16:30:24 +00:00
Changpeng Liu
4fa486a1e3 nvmf: add asynchronous event for reservation notification
Now the host can get an asynchronous event notification when
registrants are unregistered/preempted or a reservation is
released from the associated namespace. The host can then send
Get Log Page to clear the related log pages and Reservation
Report to get a full overview of the current reservation
configuration.

Change-Id: Idc57c19812490c7536503308989871515e9f2361
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/439935
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-23 16:30:24 +00:00
jiaqizho
b70e698465 rdma: fix core dump when rdma_create_qp returns an error.
Signed-off-by: jiaqizho <jiaqi.zhou@intel.com>
Change-Id: Ie900e01820f69fc5b2d5e30d519c6b619d7a7281
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449507
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-22 18:40:35 +00:00
Yair Elharrar
2b0ae30bf1 nvmf: fix segfault in case of multi-range unmap
In case of a DSM Deallocate (unmap) with multiple ranges, individual
bdev IOs are submitted for each range. If the bdev IO cannot be
allocated, the request is queued on io_wait_queue; however previously
submitted ranges may complete before memory is available for the next
range. In such a case, the completion callback will free unmap_ctx,
while the request is still queued for memory - causing a segfault
when the request is dequeued. To fix, introduce a new field tracking
the unmap ranges, and make sure the count is nonzero when the request
is queued for memory.

Signed-off-by: Yair Elharrar <yair@excelero.com>
Change-Id: Ifcac018f14af5ca408c7793ca9543c1e2d63b777
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447542
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-22 15:42:51 +00:00
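Note on 2b0ae30bf1: the fix relies on keeping a nonzero count while the request waits for memory. A minimal sketch of that idea follows; `struct unmap_ctx` here is a hypothetical simplified context with a single counter, and the helper names are illustrative, not the fields or functions the commit introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context shared by all ranges of one unmap request. */
struct unmap_ctx {
	unsigned int count; /* outstanding ranges plus any pending wait */
};

/* Allocate a context with no outstanding work. May return NULL. */
static struct unmap_ctx *unmap_ctx_create(void)
{
	return calloc(1, sizeof(struct unmap_ctx));
}

/* Take a count: call before submitting a range or queueing for memory. */
static void unmap_ctx_get(struct unmap_ctx *ctx)
{
	assert(ctx != NULL);
	ctx->count++;
}

/*
 * Drop a count: call from a range completion or when the wait ends.
 * Frees the context and returns true only when the last count is dropped.
 */
static bool unmap_ctx_put(struct unmap_ctx *ctx)
{
	assert(ctx != NULL && ctx->count > 0);

	if (--ctx->count == 0) {
		free(ctx);
		return true;
	}
	return false;
}
```

If the request takes a count before it is queued for memory, an early completion of a previously submitted range cannot drop the count to zero, so the context is still valid when the request is dequeued.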
Jim Harris
4ff7949893 nvmf: remove unused tcp trace point
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8f2e26f46f8c37312c3201df8210b449279640d0

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451476
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-04-22 15:25:37 +00:00
Seth Howell
7d7b44f2a6 rdma: decrement descriptor before checking SEND_WITH_INVAL
We were incrementing over the end of the descriptor list and assigning
undefined values to the rsp opcode in SEND_WITH_INVAL case. We were only
hitting this error when mixing sgl and inline requests in the same
workload. We were just by chance hitting a four bit value that was set
to all 1s from the in capsule data from the last request.

Change-Id: Ied06356f3d22fa34a2cd869dfad6bdca8720791d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450873
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-19 17:29:45 +00:00
Seth Howell
2cc6b0dfcb rdma: set the number of wr sge_entries per I/O
This was not being properly set in the multi-sgl path.
Also add a verification step to the fio configuration file to guard
against future regressions.

Change-Id: I510b6acd92bc2fbc9b6fbec1d59945cc53584ad3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-19 17:29:45 +00:00
Changpeng Liu
468c6c18bd nvmf: enable get log page with reservation notification page
The reservation notification log page can be returned via the
Get Log Page command with the correct page number. Users get
a zeroed page buffer if the controller doesn't have any
reservation notification log.

Change-Id: I99f5e4b8917a6919eb68359628efa1bead4b21b5
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/439934
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
2019-04-18 22:33:26 +00:00
Changpeng Liu
6025375024 nvmf: generate reservation notice log on controller's thread
All the reservation commands are processed on the subsystem's thread.
However, the reservation notice log is controller related, and
the Get Log Page command with the reservation page will be processed
on the controller's thread, so we use the same thread for generating
the log.

Change-Id: Ie000320d74242b979f6638d703523f063347ec29
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449852
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-18 22:33:26 +00:00
Changpeng Liu
c596ea4bd5 nvmf: update subsystem's poll group information for register command
Existing code only updates the subsystem's poll group reservation
information when unregistering the key. However, the information
also needs to be updated when a new registrant is added or a key
is replaced.

Change-Id: Ib8db9eb457977757251403edb92eda073b846e59
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451274
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-18 22:33:26 +00:00
JinYu
dd90ff7a21 nvmf/rdma: fix bugs in spdk_nvmf_rdma_qpair_destroy
The rqpair's qp and resources may not have been created if the rqpair
failed to initialise. For example, in function new_qpair, the code may
run to spdk_nvmf_qpair_disconnect, but the rqpair is only initialised
in poll_group_add.

Fix #557 segmentation fault (core dump)

Change-Id: I1892e6d13e2d53dd5a7c4856d775f9b3b85da961
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450986
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Hailiang Wang <hailiangx.e.wang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-18 21:47:57 +00:00
JinYu
c7395a1171 nvmf: fix the rqpair->current_send_depth
If rsp->status.sc != SUCCESS and xfer == DATA_CONTROLLER_TO_HOST,
we would not send the data WR, so clean the num_outstanding_data_wr.

Fix #728

Change-Id: I32259788e495ed76f8f02a9d871bd56356d93dc4
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450726
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-16 14:42:03 +00:00
Changpeng Liu
78bfb2a1d0 nvmf: generate reservation notification log pages
A host can use the Asynchronous Event Command to be notified of
the presence of one or more available reservation notification
log pages.  A reservation notification log page should be created
whenever an unmasked reservation notification occurs.

Change-Id: I8b83e5319725286dd0a5efc1b22d8ac4673e31e1
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/439931
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-04-12 01:25:39 +00:00
Changpeng Liu
58d923e6cc nvmf: add parameter check for Reservation Acquire command
The nvme-cli tool doesn't check parameters when submitting
to the NVMf target, so we add an additional check in the NVMf
target to prevent such cases.

Change-Id: Ieb2b3b3c22d71913f2743a0f9cdad4aba184c320
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450574
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-11 01:13:59 +00:00
Changpeng Liu
7c331adfeb nvmf: update the subsystem poll group's reservation information correctly
The existing condition for updating the subsystem poll group's
reservation information is wrong. When the RELEASE command is
received, the reservation type may be changed to none, but it
will not be saved to the subsystem's poll group.

Change-Id: Idc177a0f03fb9611d6eda1e25a5b90caaa73d1be
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450727
Reviewed-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-04-11 01:13:59 +00:00
Seth Howell
1fb629c4d2 rdma: make the pending_data_buf_queue an STAILQ
Should speed up operations, and allows us to remove the 16 byte link
object from the request structure.

Change-Id: Ie62df1f44d22580a7a7ae41c498295841d1e3064
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448080
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 21:34:55 +00:00
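Note on 1fb629c4d2: one general reason a singly-linked tail queue is lighter is that its link holds one pointer where a TAILQ link holds two. The sketch below uses the standard <sys/queue.h> macros; the request types and helper names are hypothetical and only illustrate the size difference and FIFO use, not the actual change made to the request structure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical request linked with a doubly-linked tail queue entry. */
struct req_tailq {
	int id;
	TAILQ_ENTRY(req_tailq) link;   /* two pointers: next and prev */
};

/* The same request linked with a singly-linked tail queue entry. */
struct req_stailq {
	int id;
	STAILQ_ENTRY(req_stailq) link; /* one pointer: next */
};

STAILQ_HEAD(req_queue, req_stailq);

/* Size in bytes of each kind of link member. */
static size_t tailq_link_size(void)
{
	return sizeof(((struct req_tailq *)0)->link);
}

static size_t stailq_link_size(void)
{
	return sizeof(((struct req_stailq *)0)->link);
}

/* Remove the oldest request from the queue and return its id. */
static int pop_front_id(struct req_queue *q)
{
	struct req_stailq *r = STAILQ_FIRST(q);

	assert(r != NULL);
	STAILQ_REMOVE_HEAD(q, link);
	return r->id;
}
```

A queue that is only appended at the tail and consumed from the head, as a pending-buffer queue usually is, does not need the back pointer that TAILQ maintains.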
Ziye Yang
4ee4023a0d nvme/tcp: Replace the data with iov in pdu struct
Purpose: To support the multiple SGL later.

Change-Id: I133a451100b736353cf98a6aaca879d290ff5b67
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448259
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-04 14:28:09 +00:00
Ziye Yang
8f3b4a3a6d nvme/tcp: Add a helper function nvme_tcp_pdu_set_data
This function will be extended later for multiple SGL
support.

Change-Id: I1f6962ec03c72e335efaa311a12d3891312fcc53
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449968
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-04 04:50:04 +00:00
Seth Howell
9f7582c3a5 rdma: reorder qpair elements to plug hole
Saves 8 bytes

Change-Id: Icb429ba79d7a085978950dd3045aa9ef28351101
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448073
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
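Note on 9f7582c3a5: a padding hole of the kind mentioned can be illustrated with two hypothetical structs holding the same members in different orders. These are not the real qpair fields; the layout comments assume a typical LP64 target (4-byte uint32_t, 8-byte pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A pointer between two 32-bit fields forces padding around it. */
struct with_hole {
	uint32_t a; /* offset 0, followed by 4 bytes of padding on LP64 */
	void *p;    /* offset 8 */
	uint32_t b; /* offset 16, followed by 4 bytes of tail padding */
};

/* Same members; the two 32-bit fields now share one 8-byte slot. */
struct reordered {
	void *p;    /* offset 0 */
	uint32_t a; /* offset 8 */
	uint32_t b; /* offset 12 */
};

/* Reordering never makes the struct larger. */
static_assert(sizeof(struct reordered) <= sizeof(struct with_hole),
	      "reordering must not grow the struct");

/* Bytes saved by the reordering on the current target. */
static size_t bytes_saved(void)
{
	return sizeof(struct with_hole) - sizeof(struct reordered);
}
```

On LP64 the first layout occupies 24 bytes and the second 16, so `bytes_saved()` would return 8 there. A tool such as pahole can report holes like this in real structures.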
Seth Howell
91105e2031 rdma: Don't store ibv_qp_attr in the qpair.
We were only using one enum from this whole struct, so there is no need
to store it. Plus the queries we use to update it are so infrequent and
only occur during connect and disconnect so I think we can save quite a
bit of space by removing this without compromising performance.

Change-Id: Icf29977a3c10cb289564fa2760a0059f07a0f8cb
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448072
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
ab79560e65 rdma: simplify spdk_nvmf_rdma_poller_poll.
There was a lot of duplicated code here between states. I'm trying to
minimize the duplicated code without making it confusing.

Change-Id: I13183431e554c8a9f501b3385bbd7b59e2c83161
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448066
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
a8169c37e0 rdma: add error path for fill_iovs_multi_sgl
Catch an edge case where a multi sgl request is longer than the allowed
transfer size.

Change-Id: I79779050fe951d16f1240e2c3d8cf5037e576ea2
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/440766
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-04 04:34:59 +00:00
Seth Howell
6812b63c5f rdma: always allocate buffers for requests upfront
This is important to avoid thrash when we don't have enough buffers to
satisfy a request.

Change-Id: Id35fd492078b8e628c2118317f674f07e95d4dba
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449109
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-04-04 04:34:59 +00:00
Liang Yan
ad08de311e nvmf: fix reservation acquire typo
Change-Id: I91621dd1531eca1737385e4749b8d21152425740
Signed-off-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450026
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-04-04 01:42:05 +00:00
Seth Howell
f4adbc79ce rdma: optimize and move buffers variable.
The buffers are really specific to the request and not the wr or data
object. In the case of multiple wr requests, the maximum number of
buffers per req is equal to the number of SGEs in the NVMe-oF request
*2.

Change-Id: Ic59498bfed461d180adb2fb9a481ac5b11fa9252
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449108
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-04-02 23:26:08 +00:00
Seth Howell
e590f607e6 nvmf: Report that we support more than one SGL element
Change-Id: Idf5aeb1fa3d6a3a83042bd699e0099b95e34f5b9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/428776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-02 23:26:08 +00:00
Seth Howell
62700dac2e nvmf/rdma: Add support for multiple sgl descriptors to sgl parser
Enable parsing an nvmf request that contains an inline
nvme_sgl_last_segment_descriptor element. This is the next step
towards NVMe-oF SGL support in the NVMe-oF target.

Change-Id: Ia2f1f7054e0de8a9e2bfe4dabe6af4085e3f12c4
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/428745
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-04-02 23:26:08 +00:00
Jim Harris
ca44fd6955 nvmf: put \0 at end of default serial number
It's not standard to put a newline here - let's use a null
character instead.

Found while using nvme-cli - when creating a subsystem with
default serial number, the right justified callout text had
an extra newline in it.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8a81dafb4f6c30f7bf2dcebfa7a5b19cfe3ab5fc

Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449645
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-04-02 13:08:06 +00:00
Seth Howell
934775db43 rdma: make semantic changes to fill_buffers func
Changing i to iovcnt in all references to the req->iov structure will be
important when we start processing multi-sgl requests.

Change-Id: I90a9b6d872b94f846ae7d29a45dd2703eafa6175
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449201
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-29 19:02:22 +00:00
Seth Howell
e70a759489 rdma: pull buffer assignment out of fill_iovs
This will be used by the multi-sgl version of this function as well.

Change-Id: Iafeba4836a77482fa2a158f86f1c17fe7fdeb510
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449104
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-29 19:02:22 +00:00
Changpeng Liu
ca76e519f8 nvmf: verify each NVMe command for reservation-enabled NS
The filter function can be used for IO commands, because none of
the Admin commands related to reservations are supported
in SPDK for now.

Change-Id: I44f0bf0017bafaee87d5f8ac03b0fd368f44c810
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436941
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-26 02:57:11 +00:00
Seth Howell
a9fc7e1db8 rdma: use LAST_WQE_REACHED event in the SRQ path
This event is generated by NICs utilizing the SRQ feature when the last
RECV for that qpair is processed. I have confirmed this feature.

Change-Id: Ib6d6b6d02987f789b4d5dd3daf734e3351ee1974
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448063
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-25 17:23:51 +00:00
yidong0635
fc43fbba04 rdma: fixed heap used after free issue.
When this case is run with ASAN, it reports a heap-use-after-free
in spdk_nvmf_rdma_qpair_destroy. Resources have been released before,
so change the order in this tailq to release resources.

ERROR: AddressSanitizer: heap-use-after-free on address
0x6080000080e0 at pc 0x0000006e1e3f bp 0x7fd48b6c3df0 sp 0x7fd48b6c3de0
READ of size 8 at 0x6080000080e0 thread T3 (reactor_1)
0x6e1e3e in spdk_nvmf_rdma_qpair_destroy spdk/lib/nvmf/rdma.c:813

Change-Id: Ia1c12bca84955a2de60399e6b265c9b8901bb51e
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448534
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-21 18:00:04 +00:00
Changpeng Liu
ba431e231e nvmf: store registrants' host id into subsystem's poll group
Now the data structure spdk_nvmf_subsystem_pg_ns_info holds all the
reservation information from the associated namespace, so for the
IO processing routine we don't need to send a message to the
subsystem's thread to check whether the IO command is permitted or not.

Change-Id: Ib6be6abf7bf5f24c230dff80c163a1eb963e20d0
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448256
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-21 17:35:11 +00:00
Changpeng Liu
1fd5b1da33 nvmf: update reservation state to subsystem poll group
Each subsystem's poll group will have a copy of the namespace's
reservation information. For those NVMe commands which may
change the reservation state, the command itself should be
returned after updating each subsystem poll group's
reservation state.  Then it's safe to check the reservation
state in each poll group's thread.

Change-Id: I64a5baedee9024bcac3957b29eb0330a20f21684
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446213
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-21 17:35:11 +00:00
Seth Howell
e59ac513fb rdma: remove reqs from read/write queues in error
Not doing so can cause us to hit asserts during the shutdown path. This
should fix an intermittent failure we are seeing on the test pool where
we hit the assert rdma_req->state != RDMA_REQUEST_STATE_FREE in
spdk_nvmf_rdma_request_process.

Note that this problem doesn't cause any data corruption when debug is
not enabled, it just causes us to process a subset of commands through
the state machine one extra time during qpair shutdown.

Change-Id: Ibc36bfea87ec4089b8e2c7a915f48714fddb0b09
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447843
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-19 18:18:45 +00:00
Seth Howell
33668b2254 rdma: change structure of drained_qpair to work w/ messages.
This will become important later on.

Change-Id: I94e5af03359e476afbc68664e43f44269ad5974c
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448074
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
7dd3cf441a rdma: limit the completion queue based on the SRQ.
When we have a shared receive queue, the number of outstanding items
associated with a completion queue is deterministic, and limited by how
many RECVs we have total in the SRQ. So, we can set the total size of
the Completion queue at the beginning of time and never resize it.

Change-Id: I787e4c5bbd52ac8948a323d1301f926f887cd91c
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447492
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
a5972c6245 rdma: consolidate common error paths in qpair_init
Consolidating error paths is common practice in SPDK so do that here to
make the function more uniform and save space.

Change-Id: I98c5d5f7feeb688f1d8b24f4d2d3461a43d00c1d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448191
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
97a43680a9 rdma: move cq_resize to its own function.
Change-Id: I07aef399320fd4a014f63760670ea765d2e18b4b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448190
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
fa79f64ad1 rdma: Keep a pointer to the SRQ in the qpair
Change-Id: Id173038b6ad6b1564acf5d6886814f7d310964c7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Seth Howell
01201d3e87 rdma: remove compile time config for SRQ
Change-Id: I44af3ee4dc6ec76045e1d0614910402487098a3d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447120
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-18 23:32:21 +00:00
Changpeng Liu
d11aa87320 nvmf: add reservation information to each subsystem's poll group
Change-Id: Idcbc3053daf756c818ae3715b4ba0cbd91ed3d44
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446212
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-15 20:45:43 +00:00
Changpeng Liu
2099401e94 nvmf: rename subsystem poll group's num_channels to num_ns
The channels array in the subsystem's poll group is indexed by
nsid - 1, so renaming the previous num_channels to num_ns
makes more sense.  Also embed the channels into a namespace
data structure here, so that it can be reused in the following
patch.
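
The nsid - 1 indexing described above can be sketched as follows. This is a minimal model, not the SPDK code: `PollGroupNsInfo` and `lookup_ns_info` are hypothetical names, and the only facts taken from the message are that the array is indexed by nsid - 1 and that its length is tracked as num_ns.

```python
class PollGroupNsInfo:
    """Hypothetical per-namespace entry held by a subsystem poll group."""

    def __init__(self, channel):
        self.channel = channel


def lookup_ns_info(ns_info, nsid):
    """Return the entry for a 1-based nsid, or None if it is out of range."""
    num_ns = len(ns_info)
    if nsid < 1 or nsid > num_ns:
        return None
    # Namespace IDs start at 1, array slots start at 0.
    return ns_info[nsid - 1]


ns_info = [PollGroupNsInfo("ch-for-ns1"), PollGroupNsInfo("ch-for-ns2")]
print(lookup_ns_info(ns_info, 2).channel)
```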

Change-Id: If5d9aab4b1d5bcf7a3c22f29fa58d84752f0d4cc
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446211
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-15 20:45:43 +00:00
Seth Howell
0d3fcd10e9 rdma: add function to create qpair resources.
Change-Id: Id865e2a2821fe04c1f927038d6dd967848cd9a55
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446999
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-15 19:19:17 +00:00
Ben Walker
353fbcdaf0 nvmf/rdma: Create function to destroy rdma resources
This unifies the clean up path between SRQ and normal
operation.

Change-Id: I396d7e3749579f27b5bb1e89b9d6761a77ba5beb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446979
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-03-15 19:19:17 +00:00
Ben Walker
b25751d99d nvmf/rdma: Add a structure to hold rqpair/rpoller resources
Depending on whether SRQ is enabled, resources may be allocated
to the rqpair or to the rpoller. Create a struct to hold these
pointers that can be used in both locations to avoid duplicated
code.

Change-Id: I2c8fc59009201d9e41721e6462a81732b529a9e0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446978
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Eugene Kochetov <evgeniik@mellanox.com>
2019-03-15 19:19:17 +00:00
Ben Walker
527be2bf4e nvmf: Remove qpair_is_idle
This wasn't used anywhere.

Change-Id: I405af3c808be284d19218f3f04c1e90e33e31de8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446977
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
2019-03-15 19:19:17 +00:00
Evgeniy Kochetov
ed0b611fc5 nvmf/rdma: Add shared receive queue support
This is a new feature for the NVMe-oF RDMA target that is intended to
save resources (by sharing them) and to utilize
locality (completions and memory) to get the best performance with
Shared Receive Queues (SRQs). We create an SRQ per core (poll
group), per device, and associate each created QP/CQ with the
appropriate SRQ.

Our testing environment has 2 hosts.
Host 1:
  CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz dual socket (8 cores total)
  Network: ConnectX-5, ConnectX-5 VPI , 100GbE, single-port QSFP28, PCIe3.0 x16
  Disk: Intel Optane SSD 900P Series
  OS: Fedora 27 x86_64
Host 2:
  CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz dual-socket (24 cores total)
  Network: ConnectX-4 VPI , 100GbE, dual-port QSFP28
  Disk: Intel Optane SSD 900P Series
  OS : CentOS 7.5.1804 x86_64
Hosts are connected via Spectrum switch.
Host 1 is running SPDK NVMeoF target.
Host 2 is used as initiator running fio with SPDK plugin.

Configuration:
- SPDK NVMeoF target: cpu mask 0x0F (4 cores), max queue depth 128,
  max SRQ depth 1024, max QPs per controller 1024
- Single NVMf subsystem with single namespace backed by physical SSD disk
- fio with SPDK plugin: randread pattern, 1-256 jobs, block size 4k,
  IO depth 16, cpu_mask 0xFFF0, IO rate 10k, rate process “poisson”

Here is a full fio command line:
fio  --name=Job --stats=1 --group_reporting=1 --idle-prof=percpu \
--loops=1 --numjobs=1 --thread=1 --time_based=1 --runtime=30s \
--ramp_time=5s --bs=4k --size=4G --iodepth=16 --readwrite=randread \
--rwmixread=75 --randrepeat=1 --ioengine=spdk --direct=1 \
--gtod_reduce=0 --cpumask=0xFFF0 --rate_iops=10k \
--rate_process=poisson \
--filename='trtype=RDMA adrfam=IPv4 traddr=1.1.79.1 trsvcid=4420 ns=1'

SPDK allocates the following entities for every work request in
receive queue (shared or not): reqs (1024 bytes), recvs (96 bytes),
cmds (64 bytes), cpls (16 bytes), in_capsule_buffer. All except the
last one are fixed size. The in-capsule data size is configured to 4096.
Memory consumption calculation (target):
- Multiple SRQ: core_num * ib_devs_num * SRQ_depth * (1200 +
  in_capsule_data_size)
- Multiple RQ: queue_num * RQ_depth * (1200 + in_capsule_data_size)
We ignore admin queues in calculations for simplicity.

Cases:
1. Multiple SRQ with 1024 entries:
   - Mem = 4 * 1 * 1024 * (1200 + 4096) = 20.7 MiB
     (Constant; does not depend on the number of initiators)
2. RQ with 128 entries for 64 initiators:
   - Mem = 64 * 128 * (1200 + 4096) = 41.4 MiB
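
The two cases above can be reproduced with a short calculation. The per-entry sizes and the formulas come from this message; the function and constant names are only illustrative.

```python
# Fixed per-entry sizes quoted in the message, in bytes.
REQ_SIZE, RECV_SIZE, CMD_SIZE, CPL_SIZE = 1024, 96, 64, 16
FIXED_PER_ENTRY = REQ_SIZE + RECV_SIZE + CMD_SIZE + CPL_SIZE  # 1200
MIB = 1024 * 1024


def srq_bytes(core_num, ib_devs_num, srq_depth, in_capsule_data_size):
    """Memory for one SRQ per core per device."""
    return core_num * ib_devs_num * srq_depth * (FIXED_PER_ENTRY + in_capsule_data_size)


def rq_bytes(queue_num, rq_depth, in_capsule_data_size):
    """Memory for one private receive queue per connection."""
    return queue_num * rq_depth * (FIXED_PER_ENTRY + in_capsule_data_size)


srq = srq_bytes(core_num=4, ib_devs_num=1, srq_depth=1024, in_capsule_data_size=4096)
rq = rq_bytes(queue_num=64, rq_depth=128, in_capsule_data_size=4096)
print(f"SRQ: {srq / MIB:.1f} MiB")  # SRQ: 20.7 MiB
print(f"RQ:  {rq / MIB:.1f} MiB")   # RQ:  41.4 MiB
```

The SRQ figure stays fixed as initiators are added, while the RQ figure grows linearly with queue_num.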

Results:
FIO_JOBS   kIOPS     Bandwidth,MiB/s  AvgLatency,us  MaxResidentSize,kiB
       RQ       SRQ     RQ      SRQ    RQ       SRQ      RQ       SRQ
1      8.623    8.623   33.7    33.7   13.89    14.03    144376   155624
2      17.3     17.3    67.4    67.4   14.03    14.1     145776   155700
4      34.5     34.5    135     135    14.15    14.23    146540   156184
8      69.1     69.1    270     270    14.64    14.49    148116   156960
16     138      138     540     540    14.84    15.38    151216   158668
32     276      276     1079    1079   16.5     16.61    157560   161936
64     513      502     2005    1960   1673     1612     170408   168440
128    535      526     2092    2054   3329     3344     195796   181524
256    571      571     2232    2233   6854     6873     246484   207856

We can see the benefit in memory consumption.

Change-Id: I40c70f6ccbad7754918bcc6cb397e955b09d1033
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/428458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-15 19:19:17 +00:00
Ziye Yang
58739014a3 nvmf/tcp: use the nvme_tcp_readv_data
The purpose is to use a single readv to read both
the payload and the digest (if there is one).

This patch also prepares for supporting
multiple SGLs in the NVMe TCP transport later.
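
As a rough illustration of the idea (not the nvme_tcp_readv_data implementation), one scatter read can fill a payload buffer and a digest buffer in a single call. The sketch uses Python's `os.readv` on a pipe, so it assumes a Unix-like system; the payload length and byte values are made up, and the 4-byte digest length matches the CRC32C digests used by NVMe/TCP.

```python
import os

PAYLOAD_LEN = 8  # illustrative payload size
DIGEST_LEN = 4   # NVMe/TCP digests are 4-byte CRC32C values

# A pipe stands in for the TCP socket.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"PAYLOAD!" + b"\xde\xad\xbe\xef")

payload = bytearray(PAYLOAD_LEN)
digest = bytearray(DIGEST_LEN)

# One system call fills both buffers in order.
nread = os.readv(read_fd, [payload, digest])

os.close(read_fd)
os.close(write_fd)
print(nread, bytes(payload), digest.hex())
```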

Change-Id: Ia30a5e0080b041a65461d2be13db4e0592a70305
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447670
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-13 14:29:17 +00:00
Seth Howell
62266a72cf rdma: allocate protection domains for devices up front.
We were only using one pd per device anyway, and this is necessary for
shared receive queue support.

Change-Id: I86668d5b7256277fe50836863408af2215b5adf9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447385
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-12 21:37:51 +00:00
Seth Howell
bb3e441388 rdma: destroy qpairs based on num_outstanding_wr.
Both Mellanox and Soft-RoCE NICs work with this approach.

Change-Id: I7b05e54037761c4d5e58484e1c55934c47ac1ab9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446134
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-08 21:09:09 +00:00
Changpeng Liu
e39b4d6cdb nvmf: set controller/namespace identify data to enable reservation
Persist through power loss feature is not supported for now.

Change-Id: Id2a5088389dc28b9d28d88c04ff819d20ea11902
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436940
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-08 20:55:10 +00:00
Changpeng Liu
4b55682e3a nvmf: add namespace reservation report command support
For the number of registered controllers field in the Reservation
Status Data Structure, we count all the controllers
in the subsystem whose Host Identifier is the same as that of
an existing registrant.

Change-Id: Ib4de22c7020dbd8294f448f23c0c5c8c142629dd
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436939
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-03-08 20:55:10 +00:00
Ziye Yang
4cd6544d44 nvmf: solve the memory leak issue caused by subsystem listener port
The following leak can be seen when shutting down the NVMe-oF target,
with the TCP transport as an example:

=================================================================
==61022==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 560 byte(s) in 1 object(s) allocated from:
    #0 0x7ffff6efcfe0 in calloc (/lib64/libasan.so.3+0xc6fe0)
    #1 0x4c6216 in spdk_nvmf_tcp_listen /home/ziyeyang/spdk/lib/nvmf/tcp.c:680

Indirect leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7ffff6efcfe0 in calloc (/lib64/libasan.so.3+0xc6fe0)
    #1 0x4a77b8 in spdk_posix_sock_create /home/ziyeyang/spdk/lib/sock/posix/posix.c:291

After checking the issue, it seems that we did not call
spdk_nvmf_transport_stop_listen when removing the subsystem listener.
And this patch can solve this issue.

Change-Id: Ic75d99cb0c6a3ba1c47ac79a2d8e3887b0f6b012
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/447020
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-03-08 20:33:33 +00:00
Changpeng Liu
84ee3a62c7 nvmf: add namespace reservation release command support
The reservation holder may release the reservation on
a namespace. The release notification feature will be supported
in coming patches.

Change-Id: If5d3158e691fcc782f7cf0b67a326bf62edf0531
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436938
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-08 04:48:18 +00:00
Changpeng Liu
8ccf24ed52 nvmf: release the reservation when unregistering one registrant
Unregistering by a host may cause a reservation held by the host
to be released. If a host is the last remaining reservation holder
or is the only reservation holder, then the reservation is released
when the host unregisters.  This may occur with Acquire/preempt
and Register/unregister commands.
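
The release-on-unregister rule can be modelled roughly as below. This is a simplified sketch, not the SPDK data structures: the class and field names are hypothetical, and it assumes that for "all registrants" reservation types every registrant counts as a holder, while other types have exactly one holder.

```python
class NamespaceReservation:
    """Toy model of one namespace's reservation state."""

    def __init__(self, all_registrants=False):
        self.registrants = set()
        self.holder = None  # only used when all_registrants is False
        self.all_registrants = all_registrants
        self.reserved = False

    def register(self, host):
        self.registrants.add(host)

    def acquire(self, host):
        if host not in self.registrants or self.reserved:
            return False
        self.reserved = True
        self.holder = None if self.all_registrants else host
        return True

    def is_holder(self, host):
        if not self.reserved:
            return False
        if self.all_registrants:
            return host in self.registrants
        return host == self.holder

    def unregister(self, host):
        was_holder = self.is_holder(host)
        self.registrants.discard(host)
        if not was_holder:
            return
        # Release when the only holder leaves, or when the last
        # remaining holder of an all-registrants reservation leaves.
        if not self.all_registrants or not self.registrants:
            self.reserved = False
            self.holder = None


resv = NamespaceReservation()
resv.register("host-A")
resv.register("host-B")
resv.acquire("host-A")
resv.unregister("host-B")  # not the holder: reservation stays
resv.unregister("host-A")  # the only holder: reservation is released
print(resv.reserved)
```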

Change-Id: If59fe2fdaa69c8ad70f364618d6c281494ad6245
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446821
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-07 06:46:45 +00:00
Changpeng Liu
71ac18d1ad nvmf: add namespace reservation acquire command support
A registrant can obtain a reservation on a namespace by executing
acquire command. Acquire command is associated with specific namespace.
For now only the Acquire and Preempt reservation acquire actions are
supported; Preempt And Abort will be supported in the future.

Change-Id: Ifcbb6b414827393ffc266ceada5982b743716321
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436937
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-07 06:46:45 +00:00
Changpeng Liu
bc1d0b91b5 nvmf: add namespace reservation register command support
Reservations can be used by two or more hosts to coordinate
access to a shared namespace. A host must register with a namespace
prior to establishing a reservation.  Unregistering by a host
may cause a reservation release; this feature will be supported
after the reservation acquire patch.

Change-Id: Id44aa1f82f30d9ecc5999a2a9a7c20b2af77774a
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/436936
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-07 06:46:45 +00:00
Ziye Yang
791d89bfa7 nvme/tcp: optimize nvme_tcp_build_iovecs function.
Borrow the ideas from iSCSI and optimize
the nvme_tcp_build_iovecs function.

Change-Id: I19b165b5f6dc34b4bf655157170dec5c2ce3e19a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/446836
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2019-03-07 02:59:33 +00:00
Seth Howell
961cd6ab7e rdma: register a poller to destroy defunct qpairs
Not all RDMA drivers fail back the dummy recv and send operations that
we send to them when destroying a qpair. We still need to free the
resources from these qpairs to avoid eating up all of the system memory
after multiple connect and disconnect events. Since we won't be getting
any more completions, the best heuristic we can use is waiting a long
time and then freeing the resources.

qpair_fini is only called from the proper polling thread so we can safely
call process_pending to flush the qpair before closing it out.
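
A minimal sketch of the wait-then-free heuristic follows. It is a model only: `DefunctQpair`, `destroy_poller`, and the timeout value are hypothetical, and the early exit when all work requests have drained is an assumption added for illustration.

```python
class DefunctQpair:
    """Hypothetical qpair that is waiting to be torn down."""

    def __init__(self, outstanding_wr, now, timeout):
        self.outstanding_wr = outstanding_wr
        self.deadline = now + timeout
        self.destroyed = False


def destroy_poller(qpair, now):
    """Called periodically; returns True once resources have been freed."""
    if qpair.destroyed:
        return True
    drained = qpair.outstanding_wr == 0
    timed_out = now >= qpair.deadline
    if drained or timed_out:
        # Either the driver failed back every request, or we gave up waiting.
        qpair.destroyed = True
    return qpair.destroyed


qpair = DefunctQpair(outstanding_wr=2, now=0.0, timeout=5.0)
print(destroy_poller(qpair, now=1.0))  # False
print(destroy_poller(qpair, now=5.0))  # True
```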

Change-Id: I61e6931d7316d1e78bad26657bb671aa451e29f4
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/443057
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-03-04 19:12:48 +00:00
Ziye Yang
5f3c92c2fd nvmf/tcp: fix the space alignment issue in spdk_nvmf_tcp_qpair
Change-Id: Ieedfb46cadc8610ca8a6c33372e3a82ae8052550
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446477
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-03-01 04:43:40 +00:00
Seth Howell
59f0d22e40 rdma: Fix misordered assert and decrement.
In the error path, we were first decrementing a variable and then
asserting that it must be >0. These operations should occur in the
opposite order.
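
The ordering problem can be shown with a toy counter. The class below is hypothetical; it only contrasts "decrement, then assert > 0" with "assert > 0, then decrement".

```python
class Counter:
    """Hypothetical counter that must never drop below zero."""

    def __init__(self, value):
        self.value = value

    def release_misordered(self):
        # Decrementing first makes the check fire when the last
        # legitimate item is released (1 -> 0).
        self.value -= 1
        assert self.value > 0

    def release_fixed(self):
        # Check the precondition first, then decrement.
        assert self.value > 0
        self.value -= 1


ok = Counter(1)
ok.release_fixed()
print(ok.value)

bad = Counter(1)
try:
    bad.release_misordered()
    fired = False
except AssertionError:
    fired = True
print(fired)
```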

Change-Id: I6cec544faf17bb75cbfca3d3a3c173dc5db14f99
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446440
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-28 21:20:38 +00:00
Seth Howell
756ce464f6 rdma: update default number of shared buffers.
When the decision was made to uncouple the number of shared buffers from
the queue depth and allow the user to decide for themselves, the default
was also significantly lowered, which caused some issues when trying
to run performance tests (See https://github.com/spdk/spdk/issues/699).
While this is a user modifiable variable, it is still best to keep the
higher default value.

The original value was equivalent to max_queue_depth *
SPDK_NVMF_MAX_SGL_ENTRIES * 2 with the defaults for max_queue depth and
max_sgl_entries being 128 and 16 respectively. Hence 4096
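
The arithmetic behind that value, using the defaults named above (the variable names are illustrative):

```python
max_queue_depth = 128  # default max_queue_depth cited above
max_sgl_entries = 16   # default SPDK_NVMF_MAX_SGL_ENTRIES cited above

default_num_shared_buffers = max_queue_depth * max_sgl_entries * 2
print(default_num_shared_buffers)  # 4096
```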

fixes: 0b20f2e552

Change-Id: I809e97a10973093a2b485b85bca7160091166f70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446525
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-28 21:09:50 +00:00
Zahra Khatami
a55b2109bb nvmf: remaining changes related to nvmf hooks
Change-Id: I6780fa43cebd9f48d1ae0ea6fbeb92a95c4dfa15
Signed-off-by: zkhatami88 <z.khatami88@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/443653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 21:16:36 +00:00
Seth Howell
b38e3a60c6 rdma: change the logic of rdma_qpair_process_pending
I think this simplifies the process a little bit.

Change-Id: Icc87a59c9f6fd965ef35531975b7036d85c4bc95
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445916
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
80eecdd881 rdma: use an stailq for incoming_queue
Change-Id: Ib1e59db4c5dffc9bc21f26461dabeff0d171ad22
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
bfdc957c75 rdma: remove the state_cntr variable.
We were only using one value from this array to tell us if the qpair was
idle or not. Remove this array and all of the functions that are no
longer needed after it is removed.

This series is aimed at reverting
fdec444aa8 which has been tied to
performance decreases on master.

Change-Id: Ia3627c1abd15baee8b16d07e436923d222e17ffe
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445336
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-22 18:31:02 +00:00
Seth Howell
04ebc6ea28 RDMA: Remove the state_queues
Since we no longer rely on the state queues for draining qpairs, we can
get rid of most of them. We can keep just a few, and since we don't ever
remove arbitrary elements, we can use stailqs to perform those
operations. Operations on stailqs carry about half the overhead of
operations on tailqs.

Change-Id: I8f184e6269db853619a3581d387d97a795034798
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-22 18:31:02 +00:00
Shuhei Matsumoto
df99e28158 nvmf: Expose bdev's PI setting to NVMe-oF Initiator
This patch exposes the backend bdev's PI setting to the corresponding
NVMe-oF Initiator through the Identify command, and removes the check
that the block size is a multiple of 512.

These changes enable the NVMe-oF Initiator to send extended LBA payloads.

Change-Id: Ia7aa8332d36f056872a515b6da90c83112edb909
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/445056
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-22 00:36:55 +00:00
Ziye Yang
2da86de69f nvmf/tcp: fix error message printing in spdk_nvmf_tcp_qpair_set_recv_state
If the current recv_state of the qpair is the same as the state to be
set, we print an error message. After checking the current code,
we should add a check to avoid this.

Change-Id: I49334f637c48e565e785d1fe6d0f000e18b2048a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-21 18:04:10 +00:00
Ziye Yang
a1c5442d16 nvmf/tcp: remove the tqpair->group = NULL statement
Purpose: fix the core dump that occurs when buffers are
returned later in spdk_nvmf_tcp_request_free_buffers.

If we keep this statement, we cannot return the buffer
to the polling group.

Change-Id: Ib5c95ba54b37540950e654110fe6317cab507076
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445435
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-21 03:37:47 +00:00
yidong0635
9d838d24ad rdma: add return to avoid address points to the zero page
Error logging in nvmf_rdma_dump_request can report an error about an
address pointing to the zero page; add a check and return early.
This issue occurs in heavy load fio testing.

Change-Id: I50302be88b3af53f718e3800aa16df7c506ca4e8
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441110
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2019-02-15 04:29:40 +00:00
Changpeng Liu
d5b89466cc nvmf: add get/set features with reservation notification mask support
Change-Id: I93089c4b362930d1e2b3a847639e6cc18b15f217
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/439933
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-14 01:28:43 +00:00
Ziye Yang
2d0ce5b48b nvmf/tcp: Implement correct behavior of timeout for C2Htermreq case
From TP8000 spec 7.4.7,

"In response to a C2HTermReq PDU, the host shall terminate the connection.
If the host does not terminate the connection in an implementation specific
period that does not exceed 30 seconds, the controller may terminate the
connection on its own".

This means the timeout is designed for the case where the target has
sent a C2HTermReq: if the host does not terminate the connection,
the target should terminate the connection.
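
The timeout behaviour can be sketched as follows. The 30-second bound is the limit quoted from the spec text above; `Connection` and `poll_term_req_timeout` are hypothetical names, and an implementation is free to pick a shorter period.

```python
C2H_TERM_REQ_TIMEOUT_S = 30  # must not exceed the 30 seconds quoted above


class Connection:
    """Hypothetical target-side view of one TCP connection."""

    def __init__(self):
        self.term_req_sent_at = None  # time the C2HTermReq was sent
        self.closed = False


def poll_term_req_timeout(conn, now):
    """Close the connection if the host ignored our C2HTermReq for too long."""
    if conn.closed or conn.term_req_sent_at is None:
        return False
    if now - conn.term_req_sent_at >= C2H_TERM_REQ_TIMEOUT_S:
        conn.closed = True
        return True
    return False


conn = Connection()
conn.term_req_sent_at = 100.0
print(poll_term_req_timeout(conn, now=110.0))  # False
print(poll_term_req_timeout(conn, now=130.0))  # True
```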

PS: Detecting a malicious connection that never sends a response
(such as no response to an R2T PDU) should be handled in another patch.

Change-Id: I586dbb235d99aeab5d748a19b9128cd8b0cef183
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/440831
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-13 18:20:28 +00:00
Changpeng Liu
da30cda946 nvmf: add get/set features with reservation persistence support
The persistence feature can't be supported for now, but as these features
are mandatory for reservations, add the two functions here;
we can enable them with future patches for the power loss persist feature.

Change-Id: Ic358eda00058809bbfd6984b0861f8b6b5aabecd
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/438213
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-13 06:10:53 +00:00
Seth Howell
bdc81134c2 nvmf: use io unit size in transport buffer pools
When this structure was brought up to the generic layer, the tcp
transport was using max_io_size and the rdma transport was using
io_unit_size. In the interest of conserving memory, we should use
io_unit_size instead of max_io_size.

Change-Id: I2633306fcbfd8c3d557445959c745cb2d9a0999e
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442778
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 23:34:20 +00:00
Seth Howell
b7651b681c NVMe-oF: add asserts for SGE counts
We should never be going over these limits in the respective transports,
but add asserts to check this during testing.

Change-Id: Ifcaa82ccf58546a38020b31df54ee5d1d9822b8b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442777
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 23:34:20 +00:00
Seth Howell
145485769e nvmf: remove qpair state activating.
This intermediate state is unused and meaningless. The qpair transitions
into this state right before calling a synchronous operation and then
transitions to active as soon as that operation completes successfully.
If the operation did not complete successfully, we were leaving qpairs
in this weird intermediate state when for all intents and purposes they
had reverted to an uninitialized state. Keeping qpairs in the
uninitialized state until they have been added to a poll group creates a
meaningful distinction between states that can be actionable from the
transport level.

Change-Id: I6de9bc424b393b6fff221aa2f4212aaa91488629
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 20:39:44 +00:00
Seth Howell
b952668186 rdma: destroy uninitialized qpairs immediately.
Connections in the uninitialized state haven't been added to a poll
group yet, so submitting dummy requests to them will be pointless since
they will never be polled. We need to reject the connection and destroy
the qpair immediately.

Change-Id: Id5dd711882e1ae7c13ae32c06da2285186b00a1b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443470
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 20:39:44 +00:00
Seth Howell
825cac2720 rdma.c: Create a single point of entry for qpair disconnect
Since there are multiple events/conditions that can trigger a qpair
disconnection, we need to funnel them to a single point of entry. If
more than one of these events occurs, we can ignore all but the first
since once a disconnect starts, it can't be stopped.
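
One way to picture the single entry point is a guarded function that every trigger calls, with only the first call doing any work. The names below are hypothetical and the trigger strings are just examples.

```python
class Qpair:
    """Hypothetical qpair with a one-shot disconnect path."""

    def __init__(self):
        self.disconnect_started = False
        self.teardown_count = 0

    def start_disconnect(self, reason):
        """Single entry point; returns True only for the first caller."""
        if self.disconnect_started:
            # A disconnect is already in flight and cannot be stopped,
            # so later triggers are ignored.
            return False
        self.disconnect_started = True
        self.teardown_count += 1
        return True


qpair = Qpair()
print(qpair.start_disconnect("cm disconnect event"))  # True
print(qpair.start_disconnect("qp fatal event"))       # False
print(qpair.teardown_count)                           # 1
```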

Change-Id: I749c9087a25779fcd5e3fe6685583a610ad983d3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-12 20:39:44 +00:00
Seth Howell
b6b0a0ba59 rdma: adjust I/O unit based on device SGL support
For devices that support fewer SGE elements than our default values, we
need to adjust the I/O unit size so that we don't ever try to submit
more SGLs than we are allowed to.
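
A sketch of the adjustment, under the assumption that the I/O unit is grown until the largest I/O fits in the number of SGEs the device supports. The function name and the sizes in the example are illustrative and are not taken from the patch.

```python
def adjust_io_unit_size(max_io_size, io_unit_size, default_max_sge, device_max_sge):
    """Grow io_unit_size so that max_io_size fits in the usable SGE count."""
    usable_sge = min(default_max_sge, device_max_sge)
    # Ceiling division: smallest unit that covers max_io_size in usable_sge pieces.
    min_unit = -(-max_io_size // usable_sge)
    return max(io_unit_size, min_unit)


# Device supports only 4 SGEs: the unit must grow.
print(adjust_io_unit_size(131072, 8192, default_max_sge=16, device_max_sge=4))   # 32768
# Device supports more than the default: nothing changes.
print(adjust_io_unit_size(131072, 8192, default_max_sge=16, device_max_sge=30))  # 8192
```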

Change-Id: I316d88459380f28009cc8a3d9357e9c67b08e871
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-12 18:46:57 +00:00
Seth Howell
92f5548a91 rdma: properly account num_outstanding_data_wr
This value was not being decremented when we got SEND completions for
write operations because we were using the recv send to indicate when we
had completed all writes associated with the request. I also erroneously
made the assumption that spdk_nvmf_rdma_request_parse_sgl would properly
reset this value to zero for all requests. However, for requests that
return SPDK_NVME_DATA_NONE from spdk_nvmf_rdma_request_get_xfer, this
function is skipped and the value is never reset. This can cause a
coherency issue on admin queues when we request multiple log files. When
the keep_alive request is resent, it can pick up an old rdma_req which
reports the wrong number of outstanding_wrs and it will permanently
increment the qpair's curr_send_depth.

This change decrements num_outstanding_data_wr on writes, and also
resets that value when the request is freed to ensure that this problem
doesn't occur again.
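
The accounting rule can be sketched like this. It is a simplified model, not the rdma.c implementation: the wr_request and wr_qpair structs and the wr_* helpers are hypothetical, and only the two counters discussed above are modelled:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced request and qpair state. */
struct wr_request {
	uint32_t num_outstanding_data_wr; /* data work requests still in flight */
};

struct wr_qpair {
	uint32_t current_send_depth; /* work requests posted on the send queue */
};

/* Post n data work requests for a request and account for them. */
static void wr_post_data(struct wr_qpair *qpair, struct wr_request *req, uint32_t n)
{
	req->num_outstanding_data_wr = n;
	qpair->current_send_depth += n;
}

/* Called once per data completion, for reads and for writes alike. */
static void wr_on_data_completion(struct wr_qpair *qpair, struct wr_request *req)
{
	assert(req->num_outstanding_data_wr > 0);
	assert(qpair->current_send_depth > 0);
	req->num_outstanding_data_wr--;
	qpair->current_send_depth--;
}

/*
 * Reset the counter when the request is freed, so a reused request that
 * skips SGL parsing (no data transfer) never inherits a stale count.
 */
static void wr_request_free(struct wr_request *req)
{
	req->num_outstanding_data_wr = 0;
}
```

Resetting on free is the defensive half of the fix: even if some path forgets a decrement, the next user of the request object starts from zero.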

Change-Id: I5866af97c946a0a58c30507499b43359fb6d0f64
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2019-02-12 18:43:44 +00:00
Seth Howell
ceb32abbd8 nvmf: don't set qpair->group to NULL.
The typical rdma qpair disconnect function goes through the function
_nvmf_rdma_disconnect_retry. When this function was introduced, it was
discovered that we could receive a qpair disconnect event for a given
qpair before that qpair had been assigned to a poll group. In order to
ensure that the disconnect procedure completed properly, we waited on
the current thread in _nvmf_rdma_disconnect_retry for the qpair to be
assigned a poll group before we finally disconnected. See rdma.c:2250.
Since _nvmf_rdma_disconnect_retry was not necessarily called from the
poll group's thread, we relied upon the assumption that the group
variable would never be set back to NULL. See the comment on rdma.c:
2243.

However, in _spdk_nvmf_qpair_destroy we were setting the group back to
NULL. This can result in the following sequence of operations across
multiple threads, which prevents a qpair from ever being fully
destroyed.
1. thread 1: receive a disconnect event - call nvmf_rdma_disconnect
2. thread 1: from nvmf_rdma_disconnect call
spdk_nvmf_rdma_qpair_inc_refcnt - setting rqpair->refcnt to 1.
3. thread 2: call spdk_nvmf_rdma_poller_poll.
4. thread 2: in spdk_nvmf_rdma_poller_poll reap a completion with an
error status which causes us to call spdk_nvmf_qpair_disconnect -
rdma:2846
5. thread 2: spdk_nvmf_qpair_disconnect calls _spdk_nvmf_qpair_destroy which sets
qpair->group = NULL
6. thread 1: from nvmf_rdma_disconnect we call
_nvmf_rdma_disconnect_retry which checks if qpair->group == NULL. If
that is the case, we assume that the qpair has not been assigned a group
yet and send ourselves a message to call _nvmf_rdma_disconnect_retry again. See rdma.c:2253.
7. thread 2: from _spdk_nvmf_qpair_destroy we call
spdk_nvmf_transport_qpair_fini which results in a call to
spdk_nvmf_rdma_close_qpair, which sends dummy send and recv requests to the
qpair.
8. thread 2: we call poller_poll and get completions for both the send
and recv dummy requests. This results in a call to
spdk_nvmf_rdma_qpair_destroy.
9. thread 2: spdk_nvmf_rdma_qpair_destroy checks rqpair->refcnt and when
it sees that it is not 0 (see step 2 above), it returns without
freeing the resources. See rdma.c:629.
10. thread 1: we keep churning in _nvmf_rdma_disconnect_retry sending
ourselves messages because rqpair->group is going to be null. Thread 1
never reaches line 2257 where it sends a message to call
_nvmf_rdma_qpair_disconnect. _nvmf_rdma_qpair_disconnect is the function
that decreases the rqpair->refcnt and allows us to make forward progress
on destroying the qpair.
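
The livelock in steps 5, 6 and 10 can be modelled in a few lines. This is a single-threaded illustration only, with hypothetical names (race_qpair, race_disconnect_retry, race_qpair_destroy); the real code passes messages between threads rather than returning an enum:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct race_group {
	int id;
};

/* Hypothetical, reduced qpair: just the group pointer and the refcount. */
struct race_qpair {
	struct race_group *group;
	int refcnt;
};

enum race_step {
	RACE_RETRY,        /* would re-send the retry message to ourselves */
	RACE_DISCONNECTED  /* went on to drop the reference */
};

/*
 * Model of the retry function: a NULL group is read as "not yet assigned
 * to a poll group", so the caller retries later. Only a non-NULL group
 * lets the reference taken in step 2 be dropped.
 */
static enum race_step race_disconnect_retry(struct race_qpair *qpair)
{
	if (qpair->group == NULL) {
		return RACE_RETRY;
	}
	qpair->refcnt--;
	return RACE_DISCONNECTED;
}

/* Model of the destroy path; clear_group selects the old behaviour. */
static void race_qpair_destroy(struct race_qpair *qpair, bool clear_group)
{
	if (clear_group) {
		qpair->group = NULL;
	}
}
```

With clear_group set, every later call to race_disconnect_retry returns RACE_RETRY and refcnt stays at 1, which mirrors the stuck qpair. Leaving the group pointer intact lets the retry proceed and release the reference.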

I encountered this issue while trying to disconnect from our target
using the kernel initiator with an x722 NIC. I think the timing on this
bug comes out with that specific configuration because some of the calls
in the disconnect path on thread 1 fail, causing it to take longer and
giving the second thread a chance to delete the qpair.

There are really two issues at play here. We don't have a single point
of entry for disconnecting RDMA qpairs, and we rely on the qpair->group
variable never being set back to NULL. This patch addresses the second
issue, and the next patch in the series addresses the first.

Change-Id: I65395d0bbb67edfa7bad2ddc70906606c3d83781
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443304
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-02-11 19:25:51 +00:00
Ben Walker
7a4d6af182 nvmf/tcp: Stay in AWAIT_PDU_READY state until at least 1 byte arrives
This doesn't fix any bug, but it makes more sense to leave the qpair
in the NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_READY state until it
receives at least one byte.

Change-Id: Ic5f34a733a80b58f65a1334fae7e07dbded2b3d0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2019-02-08 16:35:12 +00:00
Ben Walker
63de221bf6 nvmf/tcp: Eliminate management channel in favor of poll group
The management channel was used in the RDMA transport prior
to the introduction of poll groups and made its way over to
the TCP transport when it was written. Eliminate it in favor
of just using the poll group.

Change-Id: Icde631dd97a6a29190c4a4a6a10a0cb7c4f07a0e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442432
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
2019-02-06 16:02:43 +00:00
Seth Howell
41cd5ff4fb rdma: fix max_read_depth_definition.
max_read_depth should be based on max_qp_init_read_atomic, or the
maximum number of read values that the initiator will accept as
outstanding.

The device attributes object contains values for both the initiator
(remote side) and the target (local side). All attributes with the name
init in them are meant to correspond to the initiator. The
qp_read_atomic value represents the number of reads and atomic
operations that can have this device as the target. qp_init_read_atomic
represents how many read operations the initiator has said that we can
have outstanding that have the initiator's rdma device as the target.

Since this number represents how many outstanding reads we will send to
the initiator at once, we should use the qp_init_read_atomic value.

Change-Id: Iacc044e8321080de8accd9128ac3777bbb948afc
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442409
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-02-05 18:04:04 +00:00