Commit Graph

2592 Commits

Tomasz Zawadzki
07e31b028a ut/vhost: select vhost_backend for UT
As of right now the UT always used an empty structure of
struct spdk_vhost_dev_backend during the test, which meant
VHOST_BACKEND_BLK.
alloc_vdev() will require further changes to test both types
of backends. So for now change it to VHOST_BACKEND_SCSI,
since that backend currently does not touch any fields outside
of struct spdk_vhost_dev.
The next patch will do the same for the blk backend.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib5af7520bc8a21a7af03b810d4cc42726797a331
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12749
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
2022-05-20 19:40:56 +00:00
Tomasz Zawadzki
91426dc600 ut/vhost: add vhost_blk.c and stubs
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I5218d6ea95f6edb6f664bad75b17c68c0760d637
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10977
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
2022-05-20 19:40:56 +00:00
Tomasz Zawadzki
69820927da ut/vhost: initialize vhost libraries
The vhost library was not initialized as part of the test;
it will become necessary later in the series.

Suite startup/cleanup have no matching CUnit test case,
so only assert() can be used rather than CU_ASSERT().

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ieaa3d2f6b6f1899105362181f285f585ff9724d7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10945
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-05-20 19:40:56 +00:00
Alexey Marchuk
619b4dba8a lib/reduce: Check if user's buffer crosses huge page boundary
If the compress driver doesn't support SGL input or output,
then we need to copy the user's buffers into reduce's internal
buffers.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I0c07243a5b668d0e0adcc153e5b573f59c26ab64
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12281
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-05-20 17:39:57 +00:00
Alexey Marchuk
b86e85f56f lib/reduce: Properly allocate comp/decomp buffers
The reduce library allocates one big chunk of memory and
then splits it between requests. The problem is that the
chunk of memory assigned to a request may cross a huge
page boundary, and if the compress driver doesn't support
SGL input or output, the operation fails.
To avoid this problem, align the buffer start on 2 MiB
and check whether each chunk of memory crosses a huge page
boundary. A sketch of that check follows this entry.

Fixes issue #2454

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie730b8ba928f27a43bde1222b6c18d29b797575a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12249
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-05-20 17:39:57 +00:00
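
The check described in the commit above amounts to verifying that a buffer of a
given length stays inside one 2 MiB huge page. Below is a minimal, self-contained
sketch of such a check; the constant and helper name are illustrative, not the
reduce library's internals.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define HUGE_PAGE_SIZE (2 * 1024 * 1024) /* 2 MiB */

    /* Return true if [buf, buf + len) stays within a single 2 MiB huge page. */
    static bool
    buf_fits_in_huge_page(const void *buf, size_t len)
    {
        uintptr_t start = (uintptr_t)buf;
        uintptr_t end;

        if (len == 0) {
            return true;
        }
        end = start + len - 1;
        return (start / HUGE_PAGE_SIZE) == (end / HUGE_PAGE_SIZE);
    }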
Jonas Pfefferle
192e64bcc5 bdev: spdk_bdev_ext_io_opts missing size check
ext_io_opts uses the size member to allow backwards
compatibility; however, we currently only check that it is
less than or equal to the current size of the opts struct and
that it is not 0. The size is only used when we copy opts
because of a split or push/pull.
This patch introduces size checks to allow safe access
to e.g. the metadata and memory domain pointers of the
user-provided opts pointer. The minimum size of the struct
passed is now the size of the initial version of
spdk_bdev_ext_io_opts. To avoid introducing additional
checks when opts are consumed by a bdev module, we
now always copy if the size is smaller than the
current opts struct size.
When introducing new members to opts, additional
checks might be needed if those are directly accessed
through the passed pointer or bdev_io->internal.ext_opts.
A simplified illustration of this size-based pattern follows this entry.

Change-Id: Ibd181a5840a3d5022018a9f61403df961ffd6e1d
Signed-off-by: Jonas Pfefferle <pepperjo@japf.ch>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12550
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-05-20 15:55:50 +00:00
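
The size-based compatibility idiom the commit above relies on can be shown with a
simplified, stand-alone example; the struct layout and the field-checking macro
below are stand-ins, not the exact SPDK definitions.

    #include <stddef.h>
    #include <string.h>

    /* Simplified stand-in for spdk_bdev_ext_io_opts. */
    struct ext_io_opts {
        size_t size;             /* sizeof() as seen by the caller's SPDK version */
        void *memory_domain;
        void *memory_domain_ctx;
        void *metadata;
    };

    /* True if the caller's struct is large enough to contain 'field'. */
    #define EXT_IO_OPTS_FIELD_OK(opts, field) \
        (offsetof(struct ext_io_opts, field) + sizeof((opts)->field) <= (opts)->size)

    static void
    copy_opts(struct ext_io_opts *dst, const struct ext_io_opts *src)
    {
        /* Assumes the caller already validated that src->size is at least the
         * size of the first struct version and at most sizeof(*dst). Copy only
         * what the caller provided, zero the rest, then fix up dst->size so
         * later accesses can use the full current layout. */
        memset(dst, 0, sizeof(*dst));
        memcpy(dst, src, src->size);
        dst->size = sizeof(*dst);
    }

Copying and then fixing up size is what lets a bdev module read any field without
re-checking the caller's struct version on every access.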
GangCao
7cfb12f437 Bdev/Lvol: check base bdev's md before examining
To fix issue #2514

Change-Id: If507382202e729f5934a354e2515a035ad5aeb0c
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12750
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-05-20 09:18:18 +00:00
Shuhei Matsumoto
e4584d937e bdev/nvme: Poll adminq more often during ctrlr disconnection
During ctrlr reconnection, spdk_nvme_ctrlr_reconnect_poll_async()
is executed by a non-timed poller.

We should poll the adminq more often during ctrlr disconnection too.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib1f5b41015aed20deda8df6f2c837981ac233c04
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12615
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-05-20 09:17:28 +00:00
Shuhei Matsumoto
fcf52fbff5 bdev/nvme: Reversed orderings for reset between PCIe and NVMe-oF
As described in the NVMe specification, a controller level reset
includes the following actions:
- the controller stops processing any outstanding admin or I/O
  commands;
- all I/O SQs and CQs are deleted.

In a full controller reset sequence for a PCIe controller, if we do
a controller level reset first, we can abort outstanding commands
after the hardware has actually been stopped.

For an NVMe-oF controller, each I/O qpair is an independent network
connection and is disconnected safely. We do not want to change
the NVMe-oF controller behavior.

Fixes issue #2360

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: If05febac74705bfd3df5abd15064c1203126e027
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12447
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-05-20 09:17:28 +00:00
Shuhei Matsumoto
736b9da034 nvme: Do Controller Level Reset when disconnecting adminq for PCIe
As described in the previous patches, we need to delete all I/O
SQs/CQs before aborting trackers when disconnecting a controller.

The following patches reorder the operations. This patch changes
adminq disconnection to initiate a Controller Level Reset, and the
adminq completion path processes it if ctrlr->is_disconnecting is true.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I64f06bae2ce8a9127124029fd042db0028198e3c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12560
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-05-19 08:23:57 +00:00
Alexey Marchuk
1eca87c39c blobstore: Preallocate md_page for new cluster
When a new cluster is added to a thin provisioned blob,
an md_page is allocated to update extents in the base dev.
This memory allocation reduces performance; it can
take 250 usec to 1 msec on an ARM platform.

Since we may have only 1 outstanding cluster
allocation per io_channel, we can preallocate the md_page
on each channel and remove the dynamic memory allocation.

With this change blob_write_extent_page() expects
that the md_page is given by the caller. Since this function
is also used during snapshot deletion, this patch also
updates that process. Now we allocate a single page
and reuse it for each extent in the snapshot.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815a4c8c69bd38d8eff4f45c088e5d05215b9e57
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12129
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-05-18 09:02:02 +00:00
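
The optimization above follows a common pattern: allocate a scratch page once per
io_channel rather than per request. Here is a generic sketch under that assumption;
the types and the 4 KiB page size are illustrative, and the blobstore itself uses
its own DMA-capable allocators rather than plain aligned_alloc().

    #include <stdlib.h>

    #define MD_PAGE_SIZE 4096

    /* Illustrative per-channel context: the scratch page is allocated once at
     * channel creation and reused by the single outstanding cluster allocation,
     * removing a malloc from the I/O path. */
    struct channel_ctx {
        void *cluster_md_page;
    };

    static int
    channel_create(struct channel_ctx *ch)
    {
        ch->cluster_md_page = aligned_alloc(MD_PAGE_SIZE, MD_PAGE_SIZE);
        return ch->cluster_md_page != NULL ? 0 : -1;
    }

    static void
    channel_destroy(struct channel_ctx *ch)
    {
        free(ch->cluster_md_page);
        ch->cluster_md_page = NULL;
    }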
GangCao
7bcd316de1 bdev: abort all IOs when unregistering the bdev
To fix issue #2484.

When unregistering the bdev, a message is sent to each thread
to abort all the IOs, including IOs from the nomem_io,
need_buf_small and need_buf_large queues.

A new SPDK_BDEV_STATUS_UNREGISTERING state is
added to indicate this unregister operation.

In this case, the bdev unregister operation becomes an
async operation, as each thread is sent a message
to abort the IOs and, as the last step, it unregisters
the required bdev and the associated io device.

On the other hand, the queued_resets are handled
separately and not aborted in the bdev unregister.

New unit test cases are also added:
  enomem_multi_bdev_unregister: aborts IOs from the
nomem_io queue during the unregister operation
  bdev_open_ext_unregister: handles the events and
async operations from the unregister operation

Change-Id: Ib1663c0f71ffe87144869cb3a684e18eb956046b
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12573
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
2022-05-18 07:30:00 +00:00
Alexey Marchuk
007fb1d3cb nvme: Fix keyed/unkeyed SGL nvme cmd dump
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I0a08518b5c30455a17158aa440715515d0c066fc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12133
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-05-17 20:11:43 +00:00
Shuhei Matsumoto
00d46b80b2 bdev/nvme: Disable automatic failback in multipath mode
By default, failback to the preferred I/O path is done automatically
if it is restored. Some users may want to keep using the backup I/O
path even if the preferred I/O path is restored. In this case,
bdev_nvme_set_preferred_path can be used to do manual failback.

We may be able to clear/fill the I/O path cache more strictly, but that
would be complicated and bug-prone. This patch makes the minimal change
and just skips the obvious case.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I78fe5faee6ff04e88ae3d7c6be6da1c20637c912
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12431
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-05-17 12:54:45 +00:00
Alexey Marchuk
b0262063d3 vbdev_lvol: Report memory domains
Update functional test to verify that lvol supports
memory domains

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I5e91eedc8879359c3add45d417b6f3eaad4d75b9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11375
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-05-16 10:14:26 +00:00
Alexey Marchuk
248ccd8607 lvol: Use blobstore ext API in data path
The new blobstore ext API is used when the user
provides ext_io_opts in the bdev layer.
To store blobstore ext_io_opts, vbdev_lvol reports
a non-zero get_ctx_size in the bdev module interface.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I64076b5369533be0c1d69ca48aef9d70a9abe488
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11373
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-05-16 10:14:26 +00:00
Alexey Marchuk
a236084542 blob: Add readv/writev_ext functions
These functions accept an optional spdk_blob_ext_io_opts
structure. If this structure is provided by the user,
then the readv/writev_ext ops of the base dev will be used
in the data path.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I370dd43f8c56f5752f7a52d0780bcfe3e3ae2d9e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11371
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-05-16 10:14:26 +00:00
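
A hedged usage sketch of the new blob ext read path follows; the exact prototype
and the layout of spdk_blob_ext_io_opts should be checked against
include/spdk/blob.h, and the callback and wrapper here are placeholders.

    #include <sys/uio.h>
    #include "spdk/blob.h"

    static void
    read_done(void *cb_arg, int bserrno)
    {
        /* bserrno == 0 on success */
    }

    /* Assumes blob, channel, iov/iovcnt and io_unit offset/length are already
     * set up by the caller. */
    static void
    read_with_ext_opts(struct spdk_blob *blob, struct spdk_io_channel *channel,
                       struct iovec *iov, int iovcnt, uint64_t offset, uint64_t length,
                       struct spdk_blob_ext_io_opts *opts)
    {
        /* When opts is non-NULL, the blobstore uses the base dev's readv_ext op. */
        spdk_blob_io_readv_ext(blob, channel, iov, iovcnt, offset, length,
                               read_done, NULL, opts);
    }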
Alexey Marchuk
ba8f1a9e5d blob: Add readv/writev ext ops to spdk_bs_dev
Introduce the spdk_blob_ext_io_opts structure, which
is used in the new *_ext functions.
The zeroes dev is updated with an implementation of
readv_ext which uses the memory domain's memzero
or a regular memset().

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Id94542196eff999827bf00591fd43804256fccb4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11369
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-05-16 10:14:26 +00:00
Alexey Marchuk
5fd9561f54 dma: Add memzero function
Add functions to the memory domains library to set and
call a memzero callback.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ia6ddc3c9e0ca6e9172189964d180444e5da71d30
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12343
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-05-16 10:14:26 +00:00
Shuhei Matsumoto
5e5423de93 nvme: Add DISABLED to ctrlr's state to show completion of Controller Level Reset
In the following patches, nvme_ctrlr_process_init() will be used to
disable the controller when disconnecting the admin qpair for PCIe
transport. In this case, we will have to exit nvme_ctrlr_process_init()
after CSTS.RDY is 0. However, spdk_nvme_ctrlr_reset() and
spdk_nvme_ctrlr_reconnect_poll_async() have to continue
nvme_ctrlr_process_init() until the controller becomes ready.

To differentiate stop and continue clearly, add a new state
NVME_CTRLR_STATE_DISABLED to enum nvme_ctrlr_state.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ic0a5fb7114d4eeb1cefec28bc404184768fb0a96
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12613
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-05-12 07:28:02 +00:00
paul luse
d58a2f6cc5 lib/accel: support multiple accel modules (aka engines) at once
We enable multiple engines by:

* getting rid of the globals that point to the one available HW
and one available SW engine

* adding a submit_tasks() entry point for the SW engine so that
it is treated like any other engine allowing us to just call
submit_tasks() to the assigned engine for the opcode instead of
checking what is supported

* changing the definition of engine capabilities from
"HW accelerated" to simply "supported"

* during init, use a global (g_engines_opc) that contains engines
and is indexed by opcode so we know what the best engine is for each
opcode

* future patches will add RPCs to override engine priorities or
specifically assign opcodes to an engine.

Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I9b9f3d5a2e499124aa7ccf71f0da83c8ee3dd9f9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11870
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-05-05 07:11:32 +00:00
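
The opcode-indexed lookup described above can be sketched generically; the enum
values and table below are illustrative, not the accel framework's definitions.

    #include <stddef.h>

    enum opcode {
        OPC_COPY,
        OPC_FILL,
        OPC_CRC32C,
        OPC_LAST
    };

    struct engine {
        const char *name;
        int (*submit_tasks)(void *task);
    };

    /* One "best" engine per opcode, filled in at init time by priority. */
    static struct engine *g_engines_opc[OPC_LAST];

    static int
    submit(enum opcode opc, void *task)
    {
        /* No capability check in the I/O path: the table is assumed to already
         * point at an engine that supports this opcode. */
        return g_engines_opc[opc]->submit_tasks(task);
    }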
Shuhei Matsumoto
8f9b977504 bdev/nvme: Add active/active policy for multipath mode
The NVMe bdev module supported the active-passive policy for multipath mode
first. With this patch, the NVMe bdev module also supports the active-active
policy for multipath mode. Following the Linux kernel native NVMe multipath,
the NVMe bdev module uses a round-robin algorithm for the active-active
policy.

The multipath policy, active-passive or active-active, is managed per
nvme_bdev. The multipath policy is copied to all corresponding
nvme_bdev_channels.

Different from active-passive, active-active caches even non-optimized
paths to provide load balancing across multiple paths. A generic sketch of
the round-robin selection follows this entry.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ie18b24db60d3da1ce2f83725b6cd3079f628f95b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12001
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-05-05 07:11:24 +00:00
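
The round-robin selection used by the active-active policy can be sketched
generically; the list handling below is illustrative and not the
nvme_bdev_channel code.

    #include <stddef.h>
    #include <sys/queue.h>

    struct io_path {
        STAILQ_ENTRY(io_path) link;
        /* ... path state ... */
    };

    STAILQ_HEAD(io_path_list, io_path);

    struct bdev_channel {
        struct io_path_list io_paths;
        struct io_path *current; /* last path an I/O was sent to */
    };

    /* Pick the next path after 'current', wrapping to the list head. */
    static struct io_path *
    round_robin_next_path(struct bdev_channel *ch)
    {
        struct io_path *next;

        next = ch->current != NULL ? STAILQ_NEXT(ch->current, link) : NULL;
        if (next == NULL) {
            next = STAILQ_FIRST(&ch->io_paths);
        }
        ch->current = next;
        return next; /* NULL only if the list is empty */
    }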
Shuhei Matsumoto
22b77a3c80 bdev/nvme: Set preferred I/O path in multipath mode
If we specify a preferred path manually for each NVMe bdev, we will
be able to realize simple static load balancing and make failover
more controllable in multipath mode.

The idea is to move the I/O path to the specified NVMe-oF controller to
the head of the list and then clear the I/O path cache for each NVMe bdev
channel. We could set the I/O path in the I/O path cache directly, but it
would have to be conditional and would make the code very complex. Hence,
let find_io_path() do that.

However, an NVMe bdev channel may be acquired after setting the preferred
path. To cover such a case, sort the nvme_ns list of the NVMe bdev too.

This feature supports only multipath mode. The NVMe bdev module supports
failover mode too. However, to support the latter, the new RPC would need
to take a trid as a parameter, and the code and the usage would become very
complex. Add a note for this limitation.

Add unit tests to verify each case exactly.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ia51c74f530d6d7dc1f73d5b65f854967363e76b0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12262
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: <tanl12@chinatelecom.cn>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-05-05 07:11:24 +00:00
Jim Harris
81a3b8a596 nvmf: make nacwu 0-based
spdk_bdev_get_acwu() returns a 1-based number, so we need
to subtract 1 from it before assigning the value to
nsdata->nacwu.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I32708b28a35670cba6013a48b79389fa48226285
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12399
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-04-29 07:29:06 +00:00
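
The conversion is a single subtraction from a 1-based value to a 0-based field; a
tiny sketch of that rule (helper name is illustrative):

    #include <stdint.h>

    /* Convert the 1-based atomic compare & write unit (ACWU) reported by the bdev
     * layer into the 0-based NACWU field of the NVMe Identify Namespace data.
     * Assumes acwu_one_based >= 1. */
    static inline uint16_t
    acwu_to_nacwu(uint16_t acwu_one_based)
    {
        return acwu_one_based - 1;
    }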
Richael Zhuang
9bff828f99 sock: introduce dynamic zerocopy according to data size
MSG_ZEROCOPY is not always effective as mentioned in
https://www.kernel.org/doc/html/v4.15/networking/msg_zerocopy.html.

Currently in SPDK, once we enable sendmsg zerocopy, all data
transferred through _sock_flush is sent with zerocopy, and vice
versa. Here dynamic zerocopy is introduced to allow data to be sent
with or without MSG_ZEROCOPY according to its size; it can be enabled
by setting "enable_dynamic_zerocopy" to true.

Tested with 16 P4610 NVMe SSDs and 2 initiators; the target's and
initiators' configurations are the same as in the SPDK report:
https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2104.pdf

For the posix socket with rw_percent=0 (randwrite), it shows a 1.9%~8.3%
performance boost, tested with 1~40 target CPU cores and qdepth=128,256,512,
and no obvious influence when the read percentage is greater than 50%.

For the uring socket with rw_percent=0 (randwrite), it shows a 1.8%~7.9%
performance boost, tested with 1~40 target CPU cores and qdepth=128,256,512,
and still a 1%~7% improvement when the read percentage is greater than 50%.

The following is part of the detailed data.

posix:
qdepth=128
rw_percent      0             |           30
cpu  origin  thisPatch  opt   | origin  thisPatch opt
1	286.5	298.5	4.19%		 307	304.15	-0.93%
4	1042.5	1107	6.19%		1135.5	1136	0.04%
8	1952.5	2058	5.40%		2170.5	2170.5	0.00%
12	2658.5	2879	8.29%		3042	3046	0.13%
16	3247.5	3460.5	6.56%		3793.5	3775	-0.49%
24	4232.5	4459.5	5.36%		4614.5	4756.5	3.08%
32	4810	5095	5.93%		4488	4845	7.95%
40	5306.5	5435	2.42%		4427.5	4902	10.72%

qdepth=512
rw_percent      0             |           30
cpu  origin  thisPatch  opt   | origin  thisPatch opt
1    275	 287	4.36%		294.4	295.45	0.36%
4	 979	1041	6.33%		1073	1083.5	0.98%
8	1822.5	1914.5	5.05%		2030.5	2018.5	-0.59%
12	2441	2598.5	6.45%		2808.5	2779.5	-1.03%
16	2920.5	3109.5	6.47%		3455	3411.5	-1.26%
24	3709	3972.5	7.10%		4483.5	4502.5	0.42%
32	4225.5	4532.5	7.27%		4463.5	4733	6.04%
40	4790.5	4884.5	1.96%		4427	4904.5	10.79%

uring:
qdepth=128
rw_percent      0             |           30
cpu  origin  thisPatch  opt   | origin  thisPatch opt
1	270.5	287.5	6.28%		295.75	304.75	3.04%
4	1018.5	1089.5	6.97%		1119.5	1156.5	3.31%
8	1907	2055	7.76%		2127	2211.5	3.97%
12	2614	2801	7.15%		2982.5	3061.5	2.65%
16	3169.5	3420	7.90%		3654.5	3781.5	3.48%
24	4109.5	4414	7.41%		4691.5	4750.5	1.26%
32	4752.5	4908	3.27%		4494	4825.5	7.38%
40	5233.5	5327	1.79%		4374.5	4891	11.81%

qdepth=512
rw_percent      0             |           30
cpu  origin  thisPatch  opt   | origin  thisPatch opt
1	259.95	 276	6.17%		286.65	294.8	2.84%
4	955 	1021	6.91%		1070.5	1100	2.76%
8	1772	1903.5	7.42%		1992.5	2077.5	4.27%
12	2380.5	2543.5	6.85%		2752.5	2860	3.91%
16	2920.5	3099	6.11%		3391.5	3540	4.38%
24	3697	3912	5.82%		4401	4637	5.36%
32	4256.5	4454.5	4.65%		4516	4777	5.78%
40	4707	4968.5	5.56%		4400.5	4933	12.10%

Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: I730dcf89ed2bf3efe91586421a89045fc11c81f0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12210
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-04-28 07:29:28 +00:00
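
The decision the commit above adds, using MSG_ZEROCOPY only when the payload is
large enough to amortize its overhead, can be sketched as follows; the threshold
value and function shape are assumptions, not the sock module's actual code.

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    #ifndef MSG_ZEROCOPY
    #define MSG_ZEROCOPY 0x4000000
    #endif

    /* Illustrative threshold; the real cut-off used by the sock module may differ. */
    #define DYNAMIC_ZCOPY_THRESHOLD (8 * 1024)

    static ssize_t
    flush_iovs(int fd, struct msghdr *msg, size_t total_len,
               bool zcopy_enabled, bool dynamic_zcopy)
    {
        int flags = MSG_DONTWAIT;

        /* With dynamic zerocopy, small writes fall back to ordinary copied sends. */
        if (zcopy_enabled && (!dynamic_zcopy || total_len >= DYNAMIC_ZCOPY_THRESHOLD)) {
            flags |= MSG_ZEROCOPY;
        }

        return sendmsg(fd, msg, flags);
    }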
Alex Michon
2bc134eb4b bdev/nvme: Fix aborting fused commands
When sending a fused compare and write command, we pass a callback,
bdev_nvme_comparev_and_writev_done, that we expect to be called twice
before marking the I/O as completed. In order to detect whether a call to
bdev_nvme_comparev_and_writev_done is the first or the second one, we
currently rely on the opcode in cdw0. However, cdw0 may be set to 0,
especially when aborting the command. This may cause use-after-free
issues and may invoke the user callbacks twice instead of once.
Use a bit in the nvme_bdev_io instead to keep track of the number of
calls to bdev_nvme_comparev_and_writev_done.

Signed-off-by: Alex Michon <amichon@kalrayinc.com>
Change-Id: I0474329e87648e44b08998d0552b2a9dd5d34ac2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12180
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-26 07:47:09 +00:00
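
The fix above replaces "infer first/second completion from cdw0" with an explicit
flag in the per-I/O context; here is a generic sketch of that pattern (the context
type is illustrative, not nvme_bdev_io).

    #include <stdbool.h>

    /* Illustrative per-I/O context for a fused compare-and-write pair. */
    struct fused_io_ctx {
        bool first_phase_done; /* set when the first of the two completions arrives */
    };

    /* Returns true only when both completions of the fused pair have arrived,
     * so the user callback can be invoked exactly once. */
    static bool
    fused_completion(struct fused_io_ctx *io)
    {
        if (!io->first_phase_done) {
            io->first_phase_done = true;
            return false;
        }
        return true;
    }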
Konrad Sztyber
3056c8ac02 nvmf/tcp: delay qpair destruction
This patch adds an extra spdk_thread_send_msg() call to destroy a qpair
to make sure that it isn't freed from the context of a socket write
callback.  Otherwise, spdk_sock_close() won't abort pending requests,
causing their completions to be exected after the qpair is freed.

Fixes #2471

Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ia510d5d754baccca1e444afdb10696ab9b58e28b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-25 07:36:05 +00:00
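
The extra hop added by the patch above looks roughly like the following; the qpair
type and teardown body are placeholders, while spdk_thread_send_msg() and
spdk_get_thread() are the real thread-library calls.

    #include <stdlib.h>
    #include "spdk/thread.h"

    /* Placeholder qpair type; the real code uses the transport's qpair structure. */
    struct my_qpair {
        int sock_fd;
    };

    static void
    _deferred_qpair_destroy(void *ctx)
    {
        struct my_qpair *qpair = ctx;

        /* Actual teardown runs here, outside the socket write callback. */
        free(qpair);
    }

    /* Called from the socket write callback: bounce through the thread's message
     * queue so spdk_sock_close() can still abort requests pending on this qpair.
     * The return value of spdk_thread_send_msg() is ignored here for brevity. */
    static void
    schedule_qpair_destroy(struct my_qpair *qpair)
    {
        spdk_thread_send_msg(spdk_get_thread(), _deferred_qpair_destroy, qpair);
    }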
Shuhei Matsumoto
494eb6e58b bdev: Fix race among bdev_reset(), bdev_close(), and bdev_unregister()
There is a race condition when a bdev is unregistered while reset is
submitted from the upper layer very frequently.

spdk_io_device_unregister() may fail because it is called while
spdk_for_each_channel() is processed.

    spdk_io_device_unregister io_device bdev_Nvme0n1 (0x7f4be8053aa1)
    has 1 for_each calls outstanding

To avoid this failure, defer calling spdk_io_device_unregister() until
the reset completes if a reset is in progress when unregistration is
ready to proceed; the reset completion then calls
spdk_io_device_unregister().

A bdev cannot be opened if it is already being deleted, so we do not
need to hold the mutex.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ida1681ba9f3096670ff62274b35bb3e4fd69398a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12222
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2022-04-22 09:45:14 +00:00
Shuhei Matsumoto
50b6329ca0 bdev/nvme: Factor out ctrlr info json dump into a helper function
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I7f1e08ff13d890cb780e7b66c18a77ab85c82029
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12311
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-22 09:44:57 +00:00
Shuhei Matsumoto
13ca6e52d3 bdev/nvme: Handle ANA transition (change or inaccessible state) correctly
Previously, if a namespace was in the ANA inaccessible state, I/O was
queued indefinitely. Fix this issue according to the NVMe spec.

Add a temporary poller anatt_timer and a flag ana_transition_timedout for
each nvme_ns.

Start anatt_timer when the nvme_ns enters ANA transition. If anatt_timer
expires, set ana_transition_timedout to true. Cancel anatt_timer or
clear ana_transition_timedout when the nvme_ns exits ANA transition.

nvme_io_path_become_available() returns false if ana_transition_timedout
is true.

Add unit test cases to verify these additions.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ic76933242046b3e8e553de88221b943ad097c91c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12194
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
2022-04-22 09:44:57 +00:00
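
The ANATT handling above can be sketched with SPDK's poller API; the namespace
context below is illustrative, and the real logic lives in bdev_nvme.

    #include "spdk/thread.h"

    struct ns_ctx {
        struct spdk_poller *anatt_timer;
        bool ana_transition_timedout;
    };

    static int
    anatt_expired(void *arg)
    {
        struct ns_ctx *ns = arg;

        /* One-shot timeout: mark the namespace so queued I/O is failed instead
         * of waiting forever, then stop the timer. */
        ns->ana_transition_timedout = true;
        spdk_poller_unregister(&ns->anatt_timer);
        return SPDK_POLLER_BUSY;
    }

    /* anatt_sec would come from the controller data (Identify Controller ANATT). */
    static void
    ana_transition_start(struct ns_ctx *ns, uint32_t anatt_sec)
    {
        if (ns->anatt_timer == NULL) {
            ns->anatt_timer = SPDK_POLLER_REGISTER(anatt_expired, ns,
                                                   anatt_sec * 1000ULL * 1000ULL);
        }
    }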
Ben Walker
e22c933edb idxd: Make many internal idxd_user functions take an idxd_user object
This reduces a lot of casting.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ibc04f422858642d0e20c9b020bb6c5d1b70256fe
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11534
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2022-04-20 08:20:45 +00:00
Shuhei Matsumoto
4b73223542 nvme_rdma: Wait until lingering qpair becomes quiet before completing disconnection
The code to handle the lingering qpair when deleting it was really
complicated.

The RDMA transport can now connect or disconnect a qpair asynchronously.

So we can fold the code that handles the lingering qpair into the
qpair disconnect code.

If the disconnected qpair is still busy, defer completion of the
disconnection until the qpair becomes idle.

If a poll group is not used, we can complete the disconnection
immediately because the cq is already destroyed.

The related data and unit test cases are not necessary anymore,
so delete them in this patch.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ic8f81143fcad0714ac9b7db862313aa8094eeefb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11778
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-04-18 18:35:29 +00:00
Shuhei Matsumoto
9717b0c3df nvme_rdma: Connect and disconnect qpair asynchronously
Add three states, INITIALIZING, EXITING, and EXITED to the rqpair
state.

Add async parameter to nvme_rdma_ctrlr_create_qpair() and set it
to opts->async_mode for I/O qpair and true for admin qpair.

Replace all nvme_rdma_process_event() calls by
nvme_rdma_process_event_start() calls.

nvme_rdma_ctrlr_connect_qpair() sets rqpair->state to INITIALIZING
when starting to process CM events.

nvme_rdma_ctrlr_connect_qpair_poll() calls
nvme_rdma_process_event_poll() with ctrlr->ctrlr_lock if qpair is
not admin qpair.

nvme_rdma_ctrlr_disconnect_qpair() returns before polling CM events
if qpair->async is true or qpair->poll_group is not NULL; otherwise
it polls CM events until completion. Comments are added to clarify
why we do it like this.

nvme_rdma_poll_group_process_completions() does not process submission
for any qpair which is still connecting.

Change-Id: Ie04c3408785124f2919eaaba7b2bd68f8da452c9
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11442
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-04-18 18:35:29 +00:00
Tomasz Zawadzki
6f89388ed3 lib/vhost: move vhost_user related fields from spdk_vhost_dev
The spdk_vhost_dev structure should only contain generic fields
that are to be used by the vhost, vhost_blk, or vhost_scsi
layer.

The vhost_user backend can hold its properties in
spdk_vhost_user_dev, which is maintained within rte_vhost.

Both structures contain references back to each other.
The reference in spdk_vhost_dev is a void pointer to
allow future transports to keep the reference
to their own structures.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I68640c524426d885c20242146365ba242fa9df8e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11813
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2022-04-15 07:49:32 +00:00
paul luse
37b68d7287 accel: cleanup by getting rid of capabilties enum
In support of upcoming patches and to greatly simplify things,
the capabilities enum which held bit positions for each opcode
has been removed.  Only the opcodes enum remains and thus only
opcodes are used throughout.  For the capabilities bitmap a helper
function is added to convert from opcode to bit position.  Right
now it is used in the IO path, but in upcoming patches that goes away
and the conversion is only done at init time.

Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ic4ad15b9f24ad3675a7bba4831f4e81de9b7bc70
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11949
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-04-14 08:32:50 +00:00
Ziv Hirsch
e749fa9c27 nvmf: fix buffer overflow on admin commands
When req->iovcnt is bigger than 1, `memset(req->data, 0, req->length)` is wrong.

Signed-off-by: Ziv Hirsch <zivhirsch13@gmail.com>
Change-Id: Ie53eba686b4c5889bbde3b3644d51acbef303b42
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12216
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-04-14 08:31:35 +00:00
Kamil Godzwon
492cd95440 valgrind: fixed ASAN/Valgrind options
Patch for not running tests if ASAN and
Valgrind options are both enabled.

Fixes #2422

Signed-off-by: Kamil Godzwon <kamilx.godzwon@intel.com>
Change-Id: I50c91bede687f0aee571c1f2540530a7fafcb49c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11998
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-04-11 13:05:16 +00:00
Tomasz Zawadzki
f9fccbae63 lib/vhost: separate out vhost_user backend callbacks
Previously spdk_vhost_dev_backend held callbacks
for vhost_blk and vhost_scsi functionality, along
with ones that are called by the vhost_user backend.

This patch separates out those callbacks into two
structures:
- spdk_vhost_dev_backend - to be implemented by vhost_blk
and vhost_scsi
- spdk_vhost_user_dev_backend - implemented only by the
vhost_user backend; callbacks for session management
specific to that transport

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I348090df5dddeb2b1945b082b85aec53d03c781b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11812
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2022-04-11 07:44:09 +00:00
Ben Walker
3edf1e200e test/bdev: In bdev_nvme_ut, handle spdk_nvme_poll_group_remove when there is no group

The real implementation handles this by returning -ENOENT, so do the
same in the test.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I405b6f60bf4dcdb22c57e48bbaf66d57522a49c5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11508
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
2022-04-07 07:23:56 +00:00
Ben Walker
2250a441c4 test/bdev: In bdev_nvme_ut, count a disconnect as "activity"
Count disconnecting a queue pair as activity so that the unit test
poll_threads() calls don't bail out until the disconnected_qpair_cb is
called at least once.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Idc437d6c589dbf133bfcbb5edba1087f928a718c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11507
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
2022-04-07 07:23:56 +00:00
Ben Walker
c86778398b bdev/nvme: Remove ctrlr from nvme_ctrlr_channel
This was neither set nor used.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I3119135843c5fc0b8724e593db40df46e6b5bdb0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12097
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-07 07:23:56 +00:00
yupeng
64eebbd132 bdev/raid: Add concat module
The concat module can combine multiple underlying bdevs into a single
bdev. It is a special raid level. You can add a new bdev to the end of
the concat bdev; the concat bdev's size is then increased, and it won't
change the layout of the existing data. This is the major difference
between concat and raid0. If you add a new underlying device to raid0,
the whole data layout will be changed. So the concat bdev is extendable.

Change-Id: Ibbeeaf0606ff79b595320c597a5605ab9e4e13c4
Signed-off-by: Peng Yu <yupeng0921@gmail.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11070
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-05 07:39:00 +00:00
Shuhei Matsumoto
428b17a0a8 bdev: Add spdk_for_each_bdev/bdev_leaf for clean up and further improvements
To execute a callback function for each registered bdev or each unclaimed
bdev, add new public APIs, spdk_for_each_bdev() and
spdk_for_each_bdev_leaf().

These functions are safe against race conditions because each bdev is
opened before and closed after the provided callback function is executed.
A usage sketch follows this entry.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I59b702ffec7b4fc5e9779de5a3a75d44922b829b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12088
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-05 07:30:47 +00:00
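
A hedged usage sketch of the new iterator; check include/spdk/bdev.h for the exact
prototype. The callback here simply counts registered bdevs.

    #include "spdk/bdev.h"

    static int
    count_bdev(void *ctx, struct spdk_bdev *bdev)
    {
        int *count = ctx;

        (*count)++;
        return 0; /* returning non-zero stops the iteration */
    }

    static int
    count_registered_bdevs(void)
    {
        int count = 0;

        /* Each bdev is opened before and closed after the callback, so the
         * iteration is safe against concurrent unregistration. */
        spdk_for_each_bdev(&count, count_bdev);
        return count;
    }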
Alexey Marchuk
be440c01c9 raid: Report memory domains
Use spdk_bdev_readv/writev_block_ext even when
there is no ext opts passed by bdev layer

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I0b9f17150cdba1a1023478bae745ab4438ea99bb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10070
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-04-04 09:57:56 +00:00
Alexey Marchuk
99719ef049 raid0: Use extended bdev rw API
This is a preparation for supporting memory domains
in bdev_raid.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I3a6e01eccd4d7e4bc197dc5ffe268d42081d41de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11429
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-04-04 09:57:56 +00:00
Alexey Marchuk
1299439f3d bdev: pull/push data if bdev doesn't support memory domains

If the bdev doesn't support any memory domain, then allocate an
internal bounce buffer, pull data for a write operation before
IO submission, and push data to the memory domain once IO completes
for a read operation.

Update the test tool and add a simple implementation of the
pull/push functions.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie9b94463e6a818bcd606fbb898fb0d6e0b5d5027
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10069
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-04-04 09:57:56 +00:00
Shuhei Matsumoto
4573e4cc23 module/bdev: Use spdk_bdev_unregister_by_name() if possible
Replace spdk_bdev_get_by_name() + spdk_bdev_unregister() with
spdk_bdev_unregister_by_name() wherever possible.

This simplifies the code and makes it more reliable.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I91388c9d0b2e244cb745720a480803b03c42a226
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12066
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-04-04 09:57:43 +00:00
Shuhei Matsumoto
96c007d301 bdev: Add spdk_bdev_unregister_by_name() to handle race condtions
To unregister a bdev correctly, we had to call
spdk_bdev_open_ext(), spdk_bdev_desc_get_bdev(), spdk_bdev_unregister(),
and then spdk_bdev_close(). This was correct but complicated.

Hence add a new public API, spdk_bdev_unregister_by_name(), which performs
the whole correct sequence of bdev unregistration. A usage sketch follows
this entry.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I9068d4ac49dca944436e0ba587308fd356dfef75
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12065
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-04-04 09:57:43 +00:00
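
The single-call replacement described above, sketched under the assumption of the
public bdev API (see include/spdk/bdev.h for the exact prototypes):

    #include "spdk/bdev.h"

    static void
    unregister_done(void *cb_arg, int rc)
    {
        /* rc == 0 on success */
    }

    /* New, single-call sequence: the open/lookup/unregister/close dance is
     * handled inside the library. */
    static int
    delete_my_bdev(const char *name, struct spdk_bdev_module *module)
    {
        return spdk_bdev_unregister_by_name(name, module, unregister_done, NULL);
    }

The old four-call sequence remains valid; the new API just wraps it so callers
cannot get the ordering wrong.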
Tomasz Zawadzki
6301f8915d lib/sock: provide a hint to picking optimal poll group
The process of matching qpair to poll group is split into
two distinct parts that occur on different threads.
See spdk_nvmf_tgt_new_qpair().

This results in a race condition for TCP between spdk_sock_map_lookup()
and spdk_sock_map_insert(), which are called in spdk_nvmf_get_optimal_poll_group()
and spdk_nvmf_poll_group_add() respectively.

Fixes #2113

This patch picks a hint from nvmf_tcp for the next poll group,
which is then passed down to spdk_sock_map_lookup().

When a matching placement_id exists but does not have
a poll group assigned, the hint will be used.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I4abde2bc9c39225c9f5dd7c3654fa2639bb0a27f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10271
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2022-04-01 12:41:26 +00:00
Chunsong Feng
0db0c443df nvmf/rdma: Improve read performance in DIF strip mode
An rdma buffer for stripping DIF metadata is added. The CPU strips the DIF
metadata and copies it to the rdma buffer, improving the rdma write
bandwidth. The network bandwidth during a 4KB random read test increased
from 79 Gbps to 99 Gbps, and IOPS increased from 2075K to 2637K.

Fixes issue #2418

Signed-off-by: Chunsong Feng <fengchunsong@huawei.com>
Change-Id: If1c31256f0390f31d396812fa33cd650bf52b336
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11861
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-04-01 11:19:18 +00:00