Use spdk_bdev_readv/writev_blocks_ext even when
no ext opts are passed by the bdev layer.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I0b9f17150cdba1a1023478bae745ab4438ea99bb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10070
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This is preparation for memory domain support
in bdev_raid.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I3a6e01eccd4d7e4bc197dc5ffe268d42081d41de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11429
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Use spdk_bdev_open_ext() + spdk_bdev_desc_get_bdev() +
spdk_bdev_close() instead of spdk_bdev_get_by_name().
Additionally, use bdev pointer instead of &fdisk->disk because it is
more readable.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I924595268e6785592a6e777e90a8c245a0346719
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12070
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
spdk_bdev_get_by_name() uses an RB tree and is fast. However, it is not
safe against race conditions. We could use spdk_bdev_open_ext() instead,
but what we want to get is not an spdk_bdev but a vbdev_delay.
vbdev_delay is managed by the g_delay_nodes list.
g_delay_nodes includes only vbdev_delay entries. Even though the lookup
is O(N), the list is small, and traversing it is more intuitive and safer.
So replace spdk_bdev_get_by_name() with a simple list traversal.
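The traversal described above can be sketched as follows. This is a minimal
illustration, not the module's code: struct delay_node, its fields, and
delay_node_find() are hypothetical names, and the real node carries more state.

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a delay vbdev node; the real struct has more fields. */
struct delay_node {
	const char *name;
	TAILQ_ENTRY(delay_node) link;
};

TAILQ_HEAD(delay_node_list, delay_node);

/* Walk the module-private list and return the node with a matching name, or NULL. */
static struct delay_node *
delay_node_find(struct delay_node_list *list, const char *name)
{
	struct delay_node *node;

	TAILQ_FOREACH(node, list, link) {
		if (strcmp(node->name, name) == 0) {
			return node;
		}
	}
	return NULL;
}
```

Because the list holds only delay vbdevs, a hit is already known to be of the
right type, which a lookup in the global bdev name tree cannot guarantee.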
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I3e184066e237e10132523591133900231055b5af
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12069
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Replace spdk_bdev_get_by_name() + spdk_bdev_unregister() by
spdk_bdev_unregister_by_name() wherever possible.
This simplifies the code and makes the code more reliable.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I91388c9d0b2e244cb745720a480803b03c42a226
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12066
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
For deletion RPCs, a boolean false response had been sent rather than
an error response when they failed. However, the boolean false response
caused false negatives: test code mistakenly treated the failure as
success. For example, the following test code treats the call as
successful if the JSON RPC returns a boolean false response.
    if $rpc_py bdev_pmem_delete $pmem_bdev_name; then
        error "bdev_pmem_delete deleted pmem bdev for second time!"
    fi
This patch fixes the false negatives by explicitly returning an
error response if a deletion RPC fails.
Until now, only the bdev_virtio_detach_controller RPC had implemented this.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I5409a070cbd2364dbb63b42421b032534c6f9a0b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12077
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
The process of matching qpair to poll group is split into
two distinct parts that occur on different threads.
See spdk_nvmf_tgt_new_qpair().
This results in a race condition for TCP between spdk_sock_map_lookup()
and spdk_sock_map_insert(), which are called in spdk_nvmf_get_optimal_poll_group()
and spdk_nvmf_poll_group_add() respectively.
Fixes #2113
This patch picks a hint from nvmf_tcp for next poll group,
which is then passed down to spdk_sock_map_lookup().
When a matching placement_id exists but does not have
a poll group assigned, the hint will be used.
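The hint behaviour can be sketched with a simplified model. This is not the
sock map implementation: struct map_entry and map_lookup() are hypothetical,
and the real map is protected by a lock and keeps reference counts.

```c
#include <stddef.h>

/* Hypothetical, simplified model: one entry per placement_id. */
struct map_entry {
	int placement_id;
	void *group;	/* NULL until a poll group is assigned */
};

/*
 * Return the group for placement_id. If the entry exists but has no group
 * yet, assign the caller's hint so that a later insert agrees with this
 * lookup. Returns NULL when no entry matches.
 */
static void *
map_lookup(struct map_entry *entries, size_t count, int placement_id, void *hint)
{
	for (size_t i = 0; i < count; i++) {
		if (entries[i].placement_id != placement_id) {
			continue;
		}
		if (entries[i].group == NULL && hint != NULL) {
			entries[i].group = hint;
		}
		return entries[i].group;
	}
	return NULL;
}
```

Recording the hint at lookup time means the lookup and the later insert, which
run on different threads, resolve to the same poll group.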
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I4abde2bc9c39225c9f5dd7c3654fa2639bb0a27f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10271
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Setting this optional parameter to true makes the
RPC completion wait until the attach for all
discovered NVM subsystems has completed.
This is especially useful for fio or bdevperf, to
ensure that all of the namespaces are actually
available before testing.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icf04a122052f72e263a26b3c7582c81eac32a487
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12044
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Sendmsg_idx is used in _sock_check_zcopy to check whether idx in
MSG_ERRQUEUE and req's idx match. There is no need to update
it if zerocopy is disabled.
Change-Id: I24bb367e0bff006782d9052470857f6e4db90681
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12104
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We try to avoid writing to a closed socket by checking whether the return
value of recv() is zero. However, it is not possible to completely avoid
writing to a socket that has already been closed by the target.
Repeatedly adding and removing a listener in the NVMe-oF TCP target caused
SIGPIPE in the NVMe-oF initiator.
Fix the issue by adding MSG_NOSIGNAL to the flags of sendmsg().
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I273679c91c4b867792e966b1dc2121f6d2188f16
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12119
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The function returns -ENOBUFS when handling OCF_WRITE_FLUSH requests
on a disk whose size has all zeros in its lower 32 bits. The request
buffer length is zeroed when a new ocf_io is created and a 64-bit value
is assigned to a 32-bit value in vbdev_ocf.c:io_handle.
This patch fixes the condition to check whether the request has a payload
(data->iovs) before checking the size limitation.
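The truncation and the reordered check can be sketched as below. Both
functions are hypothetical: the exact size condition in the OCF module is not
spelled out in this message, so check_request() only stands in for its shape.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Narrowing a 64-bit byte count to 32 bits keeps only the low 32 bits. */
static uint32_t
narrow_len(uint64_t bytes)
{
	return (uint32_t)bytes;
}

/*
 * Hypothetical stand-in for the corrected check: a request without a
 * payload (iovs == NULL, e.g. a flush) skips the size check entirely.
 */
static int
check_request(const void *iovs, uint32_t len)
{
	if (iovs == NULL) {
		return 0;
	}
	return len == 0 ? -ENOBUFS : 0;
}
```

For example, a 4 GiB disk (0x100000000 bytes) narrows to a length of 0, which
is why a flush covering the whole disk tripped the size check.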
Signed-off-by: Gal Hammer <gal.hammer@huawei.com>
Signed-off-by: Shai Fultheim <shai.fultheim@huawei.com>
Change-Id: I2c6d03fee32a8fbed7beffdac6fef6a478ea4211
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10896
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
After the change that made the NVMe bdev module disconnect qpairs asynchronously,
disconnected_qpair_cb() always emitted a NOTICELOG when a qpair was disconnected
and freed. This was very noisy.
There are now three cases in which disconnected_qpair_cb() is called: 1) the qpair
was destroyed in a full ctrlr reset sequence, 2) the upper layer closed the
I/O channel, and 3) the qpair detected an error and was disconnected and freed.
Emit a NOTICELOG for 3) but a DEBUGLOG for 1) and 2), with some rewording.
Additionally, to improve readability, change the if-else ordering.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib63bcfd4b72a82a13d3cda208c71cdb40a42fd6b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12085
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We want to be able to save the discovery configuration
in a generated JSON-RPC file.
The obvious change needed here is to add a
bdev_nvme_start_discovery RPC to the config file
for each discovery context.
But we also need to make sure we do not emit
bdev_nvme_attach_controller RPCs for controllers
that were attached via the discovery service. These
controllers will be attached by the discovery service
instead - or maybe not at all if the discovery
log page returns different results.
Do both of these changes here, since they are
somewhat tied to each other.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic2072150c3efdd0a8d01da09e33a647e4929779b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11818
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The only rc that we may get in the _comp_bdev_io_submit func
is ENOTSUP. ENOMEM is not possible since the submission funcs
are void.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I8980644a02889c5e64a2b9b1382dff6d8a7ffa9b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11974
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
With recent changes libreduce should provide correct buffers
if the driver doesn't support SGL in/out. This patch verifies
that we don't use SGLs when they are not supported.
Since even a single buffer can be split on 2MB page
boundary, it is not enough just to check iovs count.
Added asserts that the first elements of mbufs are
not null to avoid scan-build errors.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I620e43bf5b1abd25cab412fe08346a6d767c9be9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11973
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
rte_pktmbuf_free frees the given mbuf and any chained mbufs.
Freeing every mbuf in a loop can therefore double free some mbufs.
Instead, use rte_pktmbuf_free_bulk, which correctly releases
chained mbufs.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I55fd7832ff656f519a4ed2f02de8ef1a0f637a02
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11972
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In case of any error (except a device error) the compression
operation is queued for later resubmission. However, in some
cases (e.g. an mbufs config error or a compress driver error) this
resubmission does not make sense since we'll hit the same
problem later. This patch enqueues operations only when there were
no mbufs or comp_operation descriptors or the compress operation
was not processed.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I1e0eab5e4ea80f84d969814a916b6cd783a77fe1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11971
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When configuring multipath, if the first controller did not have
namespace#0 but the second controller had namespace#0, prchk_flags
was not set as expected because we could set prchk only for the first
controller.
This patch fixes the bug by copying prchk_flags from the first
controller to the following controllers.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib866fc88bfdf981d1e89ef5a863f50ff41f4e159
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12050
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This cleanup makes the following patches a little easier.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I0415de9b99567b4de1ad7b35298ea51a664d4a32
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12049
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This keeps track of whether an nvme_ctrlr was created
implicitly by the discovery service.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I493b7cacfe563737f45a1fffca98855a1929a751
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11817
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
These parameters will be used for any controller created
by the discovery service.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I221b791f38b9c5797ba084c647a98b82c102a121
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11942
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Attempt to start a connection once per second, but
after a connection is successfully started, change
the timer period to one millisecond instead.
This ensures a lower response time to AER events when
the discovery controller is operational, while
decreasing the rate of unsuccessful connect attempts (and
associated log messages) if/when a discovery controller
fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie24036303f5b00f4a42b6575656f401ea4d578f2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11774
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Also cycle through the discovery paths if the initial
connect_async() operation fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I50f36949d9bba0e3bff81505712076f1a1a7aad5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11773
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
If a discovery controller fails at some point, we
will want to detach it. This can happen separately
from detaching the controller because we are stopping
the discovery service. So break out the ctrlr
detach operation into a separate phase of the
discovery_poller.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia601b767d32bda1c8899d3a95029781c0aeee136
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11772
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
It's already done in ioat_destroy_cb(). Proper fix required moving
the rest of the cleanup code in accel_engine_ioat_exit() to a new
device unregister callback.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Iaaa595cf5b51f7a4842315fc06270b0857ebf0c5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11930
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Fill is sent in as a uint8, so we need to populate the full uint64
input with the uint8 pattern or we'll get a miscompare. This is
how idxd was doing it; instead of adding the same code to ioat, just
move it up a layer.
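Expanding the pattern can be sketched as below. expand_fill() is a
hypothetical helper name; only the memset-based replication is the point.

```c
#include <stdint.h>
#include <string.h>

/* Replicate an 8-bit fill value into every byte of a 64-bit pattern. */
static uint64_t
expand_fill(uint8_t fill)
{
	uint64_t pattern;

	memset(&pattern, fill, sizeof(pattern));
	return pattern;
}
```

Because every byte of the result is identical, the value does not depend on
the host byte order.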
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ia4aab1c6230f35ab88bb8a0e3b8e16dbd93007c7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11947
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Some compress drivers may not support SGL for in or
out buffers. Extend spdk_reduce_backing_dev with two
flags that will be used by the reduce library to correctly
build iovs.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Icee9383364124888c2109894c959c06710d91250
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11968
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
- More user-friendly style of error and debug messages.
- Remove "ERROR" word from SPDK_ERRLOG() to avoid duplicating
in the log.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Iaee068f96e66f567fc23b34ae0ae6221c1bd710c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11632
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
- Fixed duplication of key, key2, drv_name, cipher, etc., fields in
struct bdev_names and struct vbdev_crypto. Moved all of them into
the new struct vbdev_crypto_opts, which is re-used by both structs.
This also removes duplication in the error handling and finalization
logic that checks the keys are zeroed out and properly freed.
- Moved unhexlify into vbdev rpc code. All keys are passed to vbdev
already in binary form.
- Provide meaningful error messages in the rpc response on key
validation issues during setup of crypto vbdev.
- Updated unit tests.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: I1fab8771bbbc0cd2f359f0d105fec28fb86893b3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11631
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
- Added hexlify() and unhexlify() for key and key2. This is required
for keys that contain zero and non-ASCII characters. Since binary
keys may contain a zero character, strlen(key) cannot be used and
key_size and key2_size are used instead. Non-ASCII chars are not
allowed in JSON and using hexlified keys fixes this issue as well.
- Updated documentation to clearly state that hexlified keys must
be used.
- Updated test scripts.
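A minimal sketch of the encoding round trip described above. The signatures
here are assumptions made for illustration and may differ from the helpers
added by this patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Encode len bytes as a NUL-terminated lowercase hex string; caller frees. */
static char *
hexlify(const uint8_t *bin, size_t len)
{
	static const char digits[] = "0123456789abcdef";
	char *hex = malloc(len * 2 + 1);

	if (hex == NULL) {
		return NULL;
	}
	for (size_t i = 0; i < len; i++) {
		hex[2 * i] = digits[bin[i] >> 4];
		hex[2 * i + 1] = digits[bin[i] & 0x0f];
	}
	hex[len * 2] = '\0';
	return hex;
}

/* Map one hex digit to its value, or -1 if it is not a hex digit. */
static int
hex_value(char c)
{
	if (c >= '0' && c <= '9') {
		return c - '0';
	}
	if (c >= 'a' && c <= 'f') {
		return c - 'a' + 10;
	}
	if (c >= 'A' && c <= 'F') {
		return c - 'A' + 10;
	}
	return -1;
}

/*
 * Decode a hex string into a newly allocated buffer; caller frees.
 * The size is returned through out_len because the decoded key may contain
 * zero bytes. Returns NULL on odd length, invalid digit, or failed allocation.
 */
static uint8_t *
unhexlify(const char *hex, size_t *out_len)
{
	size_t n = strlen(hex);
	uint8_t *bin;

	if (n % 2 != 0) {
		return NULL;
	}
	bin = malloc(n / 2 + 1);	/* +1 so an empty string never requests zero bytes */
	if (bin == NULL) {
		return NULL;
	}
	for (size_t i = 0; i < n / 2; i++) {
		int hi = hex_value(hex[2 * i]);
		int lo = hex_value(hex[2 * i + 1]);

		if (hi < 0 || lo < 0) {
			free(bin);
			return NULL;
		}
		bin[i] = (uint8_t)((hi << 4) | lo);
	}
	*out_len = n / 2;
	return bin;
}
```

The hex string is plain ASCII and safe to carry in JSON, while the decoded key
is handled together with its explicit size rather than with strlen().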
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: I3fce7839f7eaa67d0307071eba80b4cea472d731
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11891
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Bdev modules must not access the internal bdev_io
structure, so add a new pointer in the public
section. The pointer in the internal section will be
used in the next patch.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ib631563015b3e5fa9300d22b7ae59d8db43c8275
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10421
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
SPDK has settled on what the optimal DSA configuration is, so let's
always use it.
Change-Id: I24b9b717709d553789285198b1aa391f4d7f0445
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11532
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Detach and stop are two different operations. This
->detach field was used to denote when the associated
discovery service should be stopped. So call the field
'stop' instead. That may trigger the currently
attached discovery controller to be detached.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I61c7fc860cd9dbcfab71eedfd223c06c51a41f27
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11771
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
We will now keep a list of the possible paths to
the discovery subsystem. One of them will be the
path we are currently connected to (which at service
start, is the path specified by the user).
Additional entries are added for discovery log page
entries referencing the discovery subsystem.
When the discovery service starts, we just have the
initial entry in the list - the discovery poller
tries to connect to it, and if the connect starts
successfully, removes it from the list and points
ctx->entry_ctx_in_use to it.
This will be useful later when we want to iterate
through the available paths to the discovery
subsystem if the current path fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5b18e0f20c4607e29ac0f12f27ba7eb169d0206d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11770
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
This reduces some code duplication since the same
function will be reused in an upcoming patch.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id6764171ff93c95de49792a4488f2c205b8eddb6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11769
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
We used to wait until the discovery service could
connect to the discovery subsystem before calling
the callback function provided by the caller (mainly
the start_discovery RPC).
Moving forward, we will be handling the case where
the discovery subsystem is unavailable temporarily.
For now, let's not fail the bdev_nvme_start_discovery
call if we cannot connect to the discovery subsystem.
This will keep the initial service start path the same
as the path where the discovery subsystem is temporarily
unavailable. In the future, we can consider adding
functionality to the start_discovery RPC that waits
up to X number of seconds to see if we were able to
connect and fail otherwise.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icb05523b9d59f508bfbc0233595c8bf58c10488f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11768
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For RDMA transport, adminq will find transport error first because
usually only adminq polls CM events.
Change-Id: I7b22cc8883bf02198f1a90d2654c1de6f2e736e6
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11331
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is preparation for the following patches.
Change-Id: I1bb0052c745d4f83ff621e4110907a8ac1f1d597
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11330
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
If a qpair is disconnected asynchronously, it takes time from detecting
a transport error to the qpair actually being disconnected. We should stop
using the path as soon as possible after detecting any transport error.
The poll group clears the I/O path cache if it finds a transport error and
avoids using the path that had the transport error.
These changes will reduce the failover time.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I00580159a84372a115ed5e62a6ce13eed4368999
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11329
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
spdk_nvme_ctrlr_disconnect() will be made asynchronous in the
following patches and so we will need to have some changes.
spdk_nvme_ctrlr_disconnect() disconnects adminq and ctrlr synchronously
now.
If spdk_nvme_ctrlr_disconnect() is made asynchronous,
spdk_nvme_ctrlr_process_admin_completions() will complete the disconnect of
adminq and ctrlr, and will return -ENXIO only if adminq is disconnected.
However even now spdk_nvme_ctrlr_process_admin_completions() returns
-ENXIO if adminq is disconnected.
So as a preparation, set a callback before calling spdk_nvme_ctrlr_disconnect()
and call the callback if it is set and spdk_nvme_ctrlr_process_admin_completions()
returns -ENXIO.
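The callback arrangement can be sketched as follows. All names are
hypothetical, and disarming the callback after it fires is an assumption of
this sketch rather than something stated in this message.

```c
#include <errno.h>
#include <stddef.h>

typedef void (*disconnected_fn)(void *ctx);

/* Hypothetical per-controller state; the field names are illustrative. */
struct ctrlr_state {
	disconnected_fn disconnected_cb;
	void *cb_ctx;
};

/* Example callback: counts how many times the disconnect completed. */
static void
count_disconnect(void *ctx)
{
	(*(int *)ctx)++;
}

/*
 * Called with the return code of the admin completion poll. When the adminq
 * reports -ENXIO and a callback was armed, disarm it and fire it once.
 */
static void
handle_admin_poll_rc(struct ctrlr_state *state, int rc)
{
	disconnected_fn cb = state->disconnected_cb;

	if (rc == -ENXIO && cb != NULL) {
		state->disconnected_cb = NULL;
		cb(state->cb_ctx);
	}
}
```

Arming the callback before the disconnect call keeps the same code path valid
whether the disconnect completes synchronously or on a later poll.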
Besides, fix the return value of bdev_nvme_poll_adminq() in this patch.
Change-Id: I2559f86bb8cf9a92b5b386ed816c00b08c9832df
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10950
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
As we do when deleting ctrlr_channel, disconnect and then free I/O
qpair in a ctrlr reset sequence.
Deleting ctrlr_channel and resetting ctrlr_channel may cause conflicts.
This patch processes such conflicts correctly.
If destroy_ctrlr_channel_cb() is executed between pending and executing
reset_destroy_qpair(), reset_destroy_qpair() is not executed because
ctrlr_channel is not found. In this case, destroy_qpair_channel()
starts disconnecting qpair and deletes ctrlr_channel. Then
disconnected_qpair_cb() releases a reference to poll group.
If destroy_ctrlr_channel_cb() is executed between executing reset_destroy_qpair()
and disconnected_qpair_cb(), destroy_ctrlr_channel_cb() skips
ctrlr_channel for a reset sequence.
Change-Id: I1f49f74b94aefbea178680aa53ded3a12876c676
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10766
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
At the moment MLX5 uses a different number of qp descriptors than the
other pmd crypto drivers. Add it to vbdev_crypto on init and reuse it
everywhere we need it.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Iea4d4787fc5fd91f27c4a70cf78c5660f09bc854
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11878
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Even released memory contains key and key2 until it is re-allocated
for other purposes. Zero out key and key2 when no longer needed.
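A minimal sketch of wiping key material before release. The helper names are
hypothetical, and the volatile store loop is one common way to keep the
compiler from eliding the wipe; it is not necessarily what this patch uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite a buffer through a volatile pointer so the stores are not elided. */
static void
zero_buf(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len-- > 0) {
		*p++ = 0;
	}
}

/* Wipe key material, then release it. The length is passed explicitly. */
static void
free_key(char *key, size_t key_size)
{
	if (key == NULL) {
		return;
	}
	zero_buf(key, key_size);
	free(key);
}
```

A plain memset() immediately before free() may be removed by an optimizing
compiler as a dead store, which is why the sketch avoids it.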
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: If80f3faeb98b5b5acab7f2f857f284909247d1ac
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11877
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Since IV length is the same for all pmd crypto drivers,
AES_CBC_IV_LENGTH is renamed to IV_LENGTH.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: If8769db119eb599a17c267e8950f18f5a0ea995b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11875
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When connection is disconnected, bdev_nvme will call
bdev_nvme_failover, and then reset the controller.
nvme_ctrlr->reset_start_tsc should be updated in function
bdev_nvme_failover, then bdev_nvme_check_xxx_timeout can
work well.
Change-Id: I99b639545e9dd4082cdc14696bb7872cb4917b1d
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11957
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Creates a JSON on scheduler side to return after .get_opts is called
and parses a JSON on .set_opts call.
The JSON passed to dynamic scheduler on .set_stats is a copy of a
pointer already available during RPC framework_set_scheduler call.
Getting and setting scheduler stats via RPC calls is going to be
implemented in the next patch in this series.
Change-Id: I62880a71066a140c74336a5725e7b10952008e5c
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11448
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
For the RDMA transport, the current synchronous qpair disconnect occupies
the CPU for a second when the qpair disconnect times out.
To remove this limitation, we will do the following:
- make spdk_nvme_ctrlr_disconnect_io_qpair() asynchronous,
- spdk_nvme_qpair_process_completions() returns -ENXIO only if the
qpair is actually disconnected.
Even at this patch, spdk_nvme_poll_group_process_completions() invokes
disconnected_qpair_cb only if a qpair is actually disconnected. This
behavior will be maintained.
To use the upcoming asynchronous qpair disconnect easily, when
deleting a ctrlr_channel, disconnect the qpair, and then free the qpair
and release a reference to the poll group when the qpair is actually
disconnected.
We need to delete a nvme_qpair asynchronously after the corresponding
nvme_ctrlr_channel is deleted and defer the deletion of the corresponding
nvme_ctrlr until the nvme_qpair is deleted. To satisfy this requirement,
utilize the reference count of the nvme_ctrlr.
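The reference counting described above can be sketched as follows. This is
an illustration with stand-in types, not the bdev_nvme structures; struct
ctrlr, struct qpair and every function name here are hypothetical.

```c
#include <stdbool.h>

/* Stand-in types, not the real nvme_ctrlr / nvme_qpair. */
struct ctrlr {
	int ref;
	bool destroyed;
};

struct qpair {
	struct ctrlr *ctrlr;
	bool deleted;
};

static void
ctrlr_get(struct ctrlr *c)
{
	c->ref++;
}

/* Drop one reference; the controller goes away only with the last one. */
static void
ctrlr_put(struct ctrlr *c)
{
	if (--c->ref == 0) {
		c->destroyed = true;
	}
}

/* Creating a qpair pins the controller. */
static void
qpair_create(struct qpair *q, struct ctrlr *c)
{
	q->ctrlr = c;
	q->deleted = false;
	ctrlr_get(c);
}

/* Meant to run once the qpair is actually disconnected: drop the pin. */
static void
qpair_delete(struct qpair *q)
{
	q->deleted = true;
	ctrlr_put(q->ctrlr);
}
```

With this shape the owner can drop its own reference at any time; the
controller is only torn down after the last qpair has been deleted.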
disconnected_qpair_cb() may call spdk_nvme_ctrlr_free_io_qpair() and
spdk_io_device_unregister() successively. The spdk_io_device_unregister()
will execute spdk_nvme_detach_async() from its callback.
spdk_nvme_ctrlr_free_io_qpair() has to complete before
spdk_nvme_detach_async() starts. spdk_nvme_ctrlr_free_io_qpair() is
executed after unwinding the stack. spdk_nvme_detach_async() is executed
after sending a message. Sending a message happens later than unwinding
the stack. Hence the requirement is satisfied naturally.
spdk_io_device_unregister() for the nvme_ctrlr is required to be called
on the nvme_ctrlr->thread. To satisfy this requirement, redirect
nvme_ctrlr_unregister() to the nvme_ctrlr->thread. This change is too
small to stand as an independent patch. So include the change in this
patch.
Change-Id: Id8c01966c40b1dae9c4ef17f1b0b3f60a0bd17d5
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10765
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is another preparation to disconnect qpair asynchronously.
Add nvme_qpair object and move the qpair and poll_group pointers and
the io_path_list list from nvme_ctrlr_channel to nvme_qpair. nvme_qpair
is allocated dynamically when creating nvme_ctrlr_channel, and
nvme_ctrlr_channel points to nvme_qpair.
We want to keep the number of references in the I/O path unchanged. Change
nvme_io_path to point to nvme_qpair instead of nvme_ctrlr_channel, and add
an nvme_ctrlr_channel pointer to nvme_qpair.
nvme_ctrlr_channel may be freed earlier than nvme_qpair. nvme_poll_group
lists nvme_qpair instead of nvme_ctrlr_channel and nvme_qpair has a
pointer to nvme_ctrlr.
By using the nvme_ctrlr pointer of the nvme_qpair, a helper function
nvme_ctrlr_channel_get_ctrlr() is not necessary any more. Remove it.
Change-Id: Ib3f579d3441f31b9db7d3844ec56c49e2bb53a5d
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11832
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The following patches will have the following changes.
Add nvme_qpair object and move qpair and poll_group pointers and
the io_path_list list from nvme_ctrlr_channel to nvme_qpair. nvme_qpair
is allocated dynamically when creating nvme_ctrlr_channel, and
nvme_ctrlr_channel points to nvme_qpair.
qpair is disconnected asynchronously and nvme_ctrlr_channel is
deleted asynchronously.
To make the following patches simpler, refactor two functions,
bdev_nvme_create_ctrlr_channel_cb() and
bdev_nvme_destroy_ctrlr_channel_cb(). The details are as follows.
Factor out nvme_qpair_create() from bdev_nvme_create_ctrlr_channel_cb()
and factor out nvme_qpair_delete() from
bdev_nvme_destroy_ctrlr_channel_cb(). Then reorder a few operations
in these.
Additionally, reorder an operation in _bdev_nvme_add_io_path().
Change-Id: Idf0328fa77a54f40fe52ca72c3842dde82d55972
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11831
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
In the following patches, spdk_nvme_ctrlr_disconnect_io_qpair() will
be changed to be asynchronous, spdk_nvme_ctrlr_disconnect_io_qpair()
will be called first and then spdk_nvme_ctrlr_free_io_qpair() after
the qpair is actually disconnected.
We will not be able to keep the current bdev_nvme_destroy_qpair()
function.
As a preparation, inline bdev_nvme_destroy_qpair() and remove it.
Additionally, this patch has the following changes.
Previously I/O qpair was freed and then I/O path caches were cleared.
Both are SPDK thread local. So there is no dependency for the ordering
of these two operations. However, it will reduce the size of the
following patches if we clear I/O path caches before freeing I/O qpair
when the qpair is disconnected. Hence we clear I/O path caches and then
free I/O qpair.
Remove DTRACE for bdev_nvme_destroy_qpair() for now.
It will be restored in the following patches.
Furthermore, fix a potential NULL pointer access in
bdev_nvme_create_qpair().
Change-Id: I0ab78ccb0d240e56b95b53179341afcd909a31f6
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10746
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add three options for I/O error resiliency to spdk_nvme_bdev_opts.
Then the RPC bdev_nvme_set_options can configure these.
These can be overridden if these are given by the RPC bdev_nvme_attach_controller.
Change-Id: If3ee23aeef8b7585fe0fb5ec4695df5866fc1e74
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11830
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is not in the fast path, so using INFOLOG
instead of DEBUGLOG allows these messages to be
enabled in release builds.
While here, set this flag in the discovery.sh
test script so that we get better information if
there are test failures.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1c0d087b5c0cb40118691f4a1bc16adc2fdaad9c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11932
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If the req's cb_fn closes the socket, continuing to access sock
afterwards causes a heap-use-after-free error.
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: I88c6adb9d25e52d94b08f53e8ccac611c4d29fff
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11855
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Currently shadow doorbell updates are not counted; add statistics for
those, and rename the other statistic for clarity.
Signed-off-by: John Levon <john.levon@nutanix.com>
Change-Id: I211a77902e38265c99b15862034c6d022dc582a0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11844
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Change dynamic scheduler parameters from #define to
global variables.
Change-Id: I5bbbf40ac66971bcc24fc8bf0ac5d13efdc7412f
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11447
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The following patches will enable us to specify I/O error resiliency
options per nvme_ctrlr as global options. To make this easier, move the
per controller options about I/O error resiliency into struct nvme_ctrlr_opts.
prchk_flags is not exactly for resiliency but move it into struct
nvme_ctrlr_opts too.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I85fd1738bb6e293cd804b086ade82274485f213d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11829
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The following patches will add options per struct nvme_ctrlr in the
NVMe bdev module. bdev_opts will be used for it.
Additionally, fabrics_connect_timeout_us is set directly to
spdk_nvme_ctrlr_opts. So remove it from the RPC request structure.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I981cda5e69375edc43a8581cd3b43497c38a3d56
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11827
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
If 0 - UINT32_MAX or UINT32_MAX - 0 is assigned to an int variable,
we do not get the expected result.
Fix the bug and add a unit test case to verify the fix.
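To illustrate the failure mode, here is a sketch that assumes the difference
is used as a three-way comparison result; the message does not say where the
subtraction lives, and both function names are hypothetical. The unsigned
difference wraps around, so its sign no longer reflects the ordering.

```c
#include <stdint.h>

/* Naive form: 0 - UINT32_MAX wraps to 1, so the result claims 0 > UINT32_MAX. */
static int
cmp_u32_naive(uint32_t a, uint32_t b)
{
	return (int)(a - b);
}

/* Wrap-free form: yields -1, 0 or 1 without subtracting the operands. */
static int
cmp_u32(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}
```

The second form never leaves the range of int, whatever the operands are.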
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib045273238753e16755328805b38569909c8b83a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11836
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
In vbdev_crypto_init_crypto_drivers(), when g_session_mp init failed it
was possible to jump to the cleanup label but return 0 instead of -ENOMEM.
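The general shape of this kind of bug, as a self-contained sketch:
init_pools() and its fail_second switch are hypothetical and stand in for
the real pool setup. The point is only that rc must be set before every
jump to the cleanup label.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical two-step init. fail_second simulates the second
 * allocation failing. */
static int
init_pools(int fail_second)
{
	void *first = NULL, *second = NULL;
	int rc = 0;

	first = malloc(64);
	if (first == NULL) {
		rc = -ENOMEM;
		goto cleanup;
	}

	second = fail_second ? NULL : malloc(64);
	if (second == NULL) {
		/* Without this assignment the stale rc == 0 would be returned. */
		rc = -ENOMEM;
		goto cleanup;
	}

cleanup:
	/* The sketch releases both buffers on every path so it does not leak. */
	free(second);
	free(first);
	return rc;
}
```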
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: I128968699b0d2dbb2f769ac5fd7bd53ab409562b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11659
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
_crypto_operation_complete(bdev_io) should not be called in
_crypto_operation() because it is done by caller function
on read or write.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Ie03412c72f41abf661b069d4b00eaf74f40261d6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11629
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
- Fixed missed spdk_bdev_module_release_bdev() during error handling.
- Fill the keys with zeros before releasing memory.
- Fixed issue with g_number_of_claimed_volumes that can become negative
because of invalid error handling.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: I4171f4326d87b1d8f886416bf53b0f2043ccbfe7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11628
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
This will be helpful in later patches, when we handle
detach not just at discovery service stop, but also
when a discovery controller is disconnected.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie62d62f73b328c6e058f6480c61fbdf91e854e2a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11767
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
If the path to a subsystem changes from one discovery
log to the next, we should add the new paths first,
and only then remove paths. This ensures we don't
remove the last path to a subsystem, causing associated
bdevs to get unregistered and reregistered.
This requires adding a new log_page member to
discovery_ctx, since we now need to walk the log
page to find removed paths after all the new paths
are attached.
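The add-first, remove-second ordering can be sketched like this. It is a
simplified model, not the discovery code: struct path_set, apply_log_page()
and the low_water field (the smallest path count seen during an update) are
hypothetical and exist only to make the ordering observable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 8

/* Stand-in for the set of attached paths to one subsystem. */
struct path_set {
	const char *addr[MAX_PATHS];
	size_t count;
	size_t low_water;
};

static bool
contains(const char *const *list, size_t n, const char *addr)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i], addr) == 0) {
			return true;
		}
	}
	return false;
}

static void
apply_log_page(struct path_set *set, const char *const *log, size_t n)
{
	size_t i;

	set->low_water = set->count;

	/* Pass 1: attach paths that are in the new log but not attached yet. */
	for (i = 0; i < n; i++) {
		if (!contains(set->addr, set->count, log[i]) &&
		    set->count < MAX_PATHS) {
			set->addr[set->count++] = log[i];
		}
	}

	/* Pass 2: walk the log again and drop paths that are no longer listed. */
	i = 0;
	while (i < set->count) {
		if (contains(log, n, set->addr[i])) {
			i++;
			continue;
		}
		set->addr[i] = set->addr[--set->count];
		if (set->count < set->low_water) {
			set->low_water = set->count;
		}
	}
}
```

When the only path changes from one address to another, the count goes
1, 2, 1 and never reaches 0. Swapping the two passes would take it through 0.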
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I99fc2e40e6f7e2e26d558ebe7bc5208fe474c0ea
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11766
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Specifically when a compress bdev already exists on the supplied
base. Before this you'd get a bunch of nasty messages providing
really no clue as to what was wrong.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I8cce8902909659fba0e9613891c7ef8ebe4b06d0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11806
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Plumbing for flags was added in prior patches. This patch
introduces and respects the relevant flags for use with PMEM
aka durable memory through the accel_fw, IDXD, IOAT and SW
modules.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I792f31459e061d220965feced60e0c236d819a68
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9455
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch is just plumbing the flags param. Use of it for PMEM
will come in upcoming patches.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I620df072aaad3f8062a0312bbea3da1bc3f911b9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9281
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Previously, required flags were hardcoded in the low level library.
By having the user pass them in, there is more flexibility and control.
This was driven by the need to add a new flag for pmem durability,
coming in a future patch in this series.
There is no change in functionality with this patch, just movement
of where flags are set and by whom and the plumbing of 'flags'.
Also note that some flags in scenarios that we know are required are
still set by the library.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I194278f9e3cec0886628585cf84bcc2eae635e0a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9449
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It is possible that physical address returned from spdk_vtophys() will
lie on the page boundary for the mbuf size we want. In this case we have
to allocate one more mbuf and setup its chaining with the original mbuf.
This holds true for src and dst mbufs, though reproduced only for dst.
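A rough sketch of the boundary check, under loud assumptions: it treats the
address as linear, assumes a 2 MiB page, and assumes the length never
exceeds one page. HUGEPAGE_SIZE, struct seg and split_at_boundary() are
hypothetical. The real code works with mbufs and the length reported by
spdk_vtophys(); this only shows how one request turns into two chained
segments.

```c
#include <stdint.h>

#define HUGEPAGE_SIZE (2ULL * 1024 * 1024) /* assumed page size */

struct seg {
	uint64_t addr;
	uint64_t len;
};

/* Split [addr, addr + len) at the first page boundary it crosses.
 * Returns the number of segments written to out (1 or 2).
 * Assumes len <= HUGEPAGE_SIZE, so at most one boundary is crossed. */
static int
split_at_boundary(uint64_t addr, uint64_t len, struct seg out[2])
{
	uint64_t to_boundary = HUGEPAGE_SIZE - (addr % HUGEPAGE_SIZE);

	out[0].addr = addr;
	if (len <= to_boundary) {
		out[0].len = len;
		return 1;
	}

	/* The second segment is what would go into the extra, chained mbuf. */
	out[0].len = to_boundary;
	out[1].addr = addr + to_boundary;
	out[1].len = len - to_boundary;
	return 2;
}
```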
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Ibf82a97fac2ee0217a906a7c6f8558bdc2eedda2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11626
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
If re-enqueue of pending crypto ops failed in crypto_dev_poller()
and DPDK reports errors then stop re-enqueue, remove the ops from
the re-submit queue and fail the IO.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: I258f7b8986f35fa70e4af25bc8ad2b3b26aa206b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11625
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
- Continue init of the other crypto devices (mlx5) after failure of
rte_vdev_init(AESNI_MB) in vbdev_crypto_init_crypto_drivers(). It
simply may not be enabled in DPDK because it requires IPSec_MB>=1.0
installed in the system. Reproduces with --with-dpdk=dpdk/install
option used, when the target DPDK is built without control of IPSec
version from the SPDK side.
- Updated crypto_ut to test the new behavior of error handling from
rte_vdev_init(AESNI_MB) in vbdev_crypto_init_crypto_drivers().
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Icd4db8877afe87db8166c40d6e7b414cd43c9c25
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11624
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
- Switched to using rte_mempool for mbufs instead of spdk_mempool. This
allows using rte pkt_mbuf API that properly handles mbuf fields we need
for mlx5 and we don't have to do it manually when sending crypto ops.
- Using rte_mempool *g_mbuf_mp in vbdev crypto ut and added the mocking
API code.
- crypto_ut update to follow pkt_mbuf API rules.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Ia5576c672ac2eebb260bfdbb528ddb9edcd8f036
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11623
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
- Properly rte_cryptodev_stop() and rte_cryptodev_close() device on
errors in create_vbdev_dev().
- Check for device id before removing its qp from the qp list.
- Maintain correct g_qat_total_qp counter if qat qp is removed on
errors.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: I088d7057eebff89ff0d995adcc2a05c724c3323b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11622
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
- Fixed bug in vbdev_crypto_config_json(). crypto_bdev->key was used
for "key2" json field.
- Fixed bug in vbdev_crypto_dump_info_json(). crypto_bdev->key was used
for "key2" json field.
Signed-off-by: Yuriy Umanets <yumanets@nvidia.com>
Change-Id: Iac441bc30b03234c96d646db14ee36ad56a546dc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11621
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
in bdev subsystem, if any of the bdev module fails to initialize in
bdev_modules_init(), this function just stops immediately. in general,
the non-zero rc is returned to the callback func passed to spdk_subsystem_init().
if spdk app is used for building the spdk application, it's very
likely that app_start_rpc() is used as this very callback func.
in this case, app_start_rpc() would just pass the `rc` to spdk_app_stop()
which tears down all subsystems one after another.
bdev tears itself down by calling all its modules' module_fini(),
including those whose .module_init never gets called. the problem is,
if a bdev module sets its `.async_fini` to true, and it calls
spdk_bdev_module_fini_done() only from the spdk_io_device_unregister()
callback, then a bdev module which fails to initialize would leave the
spdk application hanging.
a typical logging message sequence looks like:
[2022-02-27 20:47:13.766578] bdev.c:1438:spdk_bdev_initialize: *ERROR*: bdev modules init failed
[2022-02-27 20:47:13.766622] subsystem.c: 169:spdk_subsystem_init_next: *ERROR*: Init subsystem bdev failed
[2022-02-27 20:47:13.766638] app.c: 691:spdk_app_stop: *WARNING*: spdk_app_stop'd on non-zero
[2022-02-27 20:47:13.766658] thread.c:2050:spdk_io_device_unregister: *ERROR*: io_device 0x10d3c30 not found
this is exactly the case we could run into if a bdev module fails to
initialize and bdev_null is unable to call spdk_bdev_module_fini_done()
when being torn down, because spdk_io_device_unregister() just refuses
to call the callback if the I/O device is never registered.
since `g_null_read_buf` is set in bdev_null_initialize(), in this change,
this pointer is checked for zero before calling spdk_io_device_unregister(),
if it is NULL, spdk_bdev_module_fini_done() is called directly instead
of calling spdk_io_device_unregister(). this helps to address the
hanging issue.
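the control flow of the fix, as a sketch with stand-ins rather than the
bdev_null source: g_read_buf, device_unregister(), fini_done() and
module_fini() are hypothetical substitutes for g_null_read_buf,
spdk_io_device_unregister(), spdk_bdev_module_fini_done() and the module's
fini function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for module state; not the real bdev_null globals. */
static void *g_read_buf;          /* set only when init succeeded */
static bool g_device_registered;
static int g_fini_done_calls;

static void
fini_done(void)
{
	g_fini_done_calls++;
}

/* Mimics the behaviour described above: the callback is skipped when
 * the device was never registered. */
static void
device_unregister(void (*cb)(void))
{
	if (!g_device_registered) {
		return;
	}
	g_device_registered = false;
	cb();
}

static void
module_fini(void)
{
	if (g_read_buf == NULL) {
		/* Init never ran, so report completion directly. */
		fini_done();
		return;
	}
	g_read_buf = NULL;
	device_unregister(fini_done);
}
```

either way fini_done() runs exactly once per teardown, so the shutdown
sequence can continue.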
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Change-Id: I3a41fcd2f1c986e416dacecd5ca352dfd1e379b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11750
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
- Reduce the size of initial memory needed by OCF.
Number of allocator buffers equal to 16383 is tested to work
on 24 caches running IO of io_size=512 and io_depth=512, which
should be more than enough for any real life scenario.
This reduces initial OCF memory usage from 726 MiB to 392 MiB.
- Fix string handling for the name of the mempool.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Change-Id: I40063ab1897c479c25904ae4096c5dae3351f73b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10843
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
These macros are used to prefix the following to
any discovery-related DEBUGLOG or ERRLOG:
Discovery[127.0.0.1:8009]
Inside the brackets are the traddr and trsvcid of
the discovery service associated with that message.
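For illustration, the prefix could be built like this. The patch uses
macros around the existing log calls; this sketch uses a plain function
instead, and discovery_format() is a hypothetical name.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "Discovery[traddr:trsvcid] <message>" to out.
 * Returns the length of the formatted string, or a negative value on error. */
static int
discovery_format(char *out, size_t len, const char *traddr,
		 const char *trsvcid, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(out, len, "Discovery[%s:%s] ", traddr, trsvcid);
	if (n < 0 || (size_t)n >= len) {
		/* Error, or the prefix alone already filled the buffer. */
		return n;
	}

	va_start(ap, fmt);
	m = vsnprintf(out + n, len - (size_t)n, fmt, ap);
	va_end(ap);

	return m < 0 ? m : n + m;
}
```

Called with "127.0.0.1", "8009" and a message, the output starts with the
Discovery[127.0.0.1:8009] prefix shown above.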
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ib1991a13f550bb8c9aaf1194a56b218cbd71c96c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11733
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This is useful for adding trid details to discovery
related log messages in a later patch.
Future patches will update this trid if the
current discovery ctrlr fails and we need to fail
over to a different path to the discovery subsystem.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I51712bab2d891ae9c683f8716b4228741f64e7db
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11732
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
For now, just allocate entries and put them on a new TAILQ on
the discovery_ctx. Future patches will use these to try
to reattach to the discovery subsystem if the current discovery
ctrlr fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3cd841df2260bbe8a497bbbf36dea4a1081f25c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11731
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
It will be referenced in a second location in
an upcoming patch, so move its definition now to
reduce the size of that patch and avoid a forward
declaration.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iae12cc613190c03f0d48d71475df98384f8e47c7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11730
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
This name better describes the purpose of this structure.
Currently it is used to represent discovery log page entries
for NVM subsystems found by the discovery service. Upcoming
patches will also use this structure to represent discovery
log page entries for the discovery subsystem.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I84996c9968200c50c32427f0233cb707cdc2d54c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11547
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
For now, if the discovery service finds a discovery subsystem,
don't connect to it. Support for nested discovery controllers
will be coming soon, but for now we need to make sure we don't
try to connect to a discovery subsystem as if it was an NVM
subsystem.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I00234718b0e39eda6e1cb1b1150a4fadcf6d8b11
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
This is a bug fix. free() was called on an object allocated by
spdk_malloc().
Hence
free(): invalid pointer: 0x00002000146ece00
was printed.
This was found during multipath testing.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Icf6aa6dcdda728fef91b3acad7a1f1ee219c27af
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11710
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
A common pattern is:
    if (foo->thread == spdk_get_thread())
        cb(arg);
    else
        spdk_thread_send_msg(foo->thread, cb, arg);
for cases where it's important the callback runs on a particular thread,
but it doesn't matter if it's synchronous or asynchronous.
Add a new API to support this pattern, and convert over the current
instances.
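A self-contained model of such a helper, with stand-in types instead of the
SPDK thread library: struct thread, g_current, send_msg(), exec_on_thread()
and poll_thread() are all hypothetical, and the one-slot queue is only
there to keep the sketch short.

```c
#include <stddef.h>

typedef void (*msg_fn)(void *arg);

/* Stand-in for a thread with a one-slot message queue. */
struct thread {
	msg_fn pending_fn;
	void *pending_arg;
};

static struct thread *g_current; /* stand-in for "the calling thread" */

static void
send_msg(struct thread *t, msg_fn fn, void *arg)
{
	t->pending_fn = fn;
	t->pending_arg = arg;
}

/* Run fn inline if already on the target thread, otherwise queue it. */
static void
exec_on_thread(struct thread *t, msg_fn fn, void *arg)
{
	if (t == g_current) {
		fn(arg);
		return;
	}
	send_msg(t, fn, arg);
}

/* Deliver a queued message; meant to be called on the target thread. */
static void
poll_thread(struct thread *t)
{
	msg_fn fn = t->pending_fn;

	if (fn != NULL) {
		t->pending_fn = NULL;
		fn(t->pending_arg);
	}
}

/* Example callback: count invocations. */
static void
bump(void *arg)
{
	(*(int *)arg)++;
}
```

Callers get the same guarantee in both branches: the callback runs on the
target thread, either immediately or on that thread's next poll.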
Signed-off-by: John Levon <john.levon@nutanix.com>
Change-Id: Idfbf77c02c9321c52e07181ffd8b0c437e1ab335
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11503
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
The io_event structure defined in aio_abi.h has a
res member of type __s64, which is typically
mapped to long long int.
When we print the error message, the res member can be
treated as an error code.
In the following error message:
failed to complete aio: requested len is 4096, but
completed len is 18446744073709551611
the completed len, read as a signed integer, is -5, which is -EIO.
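The arithmetic can be checked with a small standalone sketch. The helper
names aio_res_as_signed() and format_aio_res() are hypothetical and are
not the functions touched by this patch. The unsigned-to-signed
conversion relies on two's complement, which holds on the platforms SPDK
targets.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a value that was printed as unsigned 64-bit as the signed
 * value stored in io_event.res. */
static int64_t aio_res_as_signed(uint64_t printed)
{
	return (int64_t)printed;
}

/* Format the result: a negative res is reported as an error code,
 * a non-negative res as a completed length. Returns the snprintf result. */
static int format_aio_res(char *buf, size_t len, int64_t res)
{
	assert(buf != NULL);
	if (res < 0) {
		return snprintf(buf, len, "aio failed with error %" PRId64, res);
	}
	return snprintf(buf, len, "aio completed %" PRId64 " bytes", res);
}
```

18446744073709551611 is 2^64 - 5, so aio_res_as_signed() returns -5 for
it, which matches -EIO on Linux.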
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reported-by: Anil <aniruddha080699@gmail.com>
Change-Id: I33b98d2118bbc9cace2d9da7cf9cd9bd06d784e6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11453
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
apply_firmware_complete bug fix: after each
firmware image download command finishes,
apply_firmware_complete is called, issues the next
firmware image download command, and gets another bdev_io.
After the last command, apply_firmware_complete_reset
only releases the last bdev_io, and the bdev_ios from the previous
commands are not released.
So after repeated rpc_bdev_nvme_apply_firmware calls,
the io pool is used up and an assert is triggered.
Signed-off-by: Gu, Zhimin <kookoo.gu@intel.com>
Change-Id: Icb1c722d85b1985521e5f25031ae70557b7ba84a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11586
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This parameter was ignored, and was a parameter to the
nvmf_set_config RPC.
For reference, this was deprecated in June 2020, commit
c37cf9fb.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I013f4d7cf874e7e26a8a1d299fdf9d8fa05da580
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11544
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In a test case, test/nvmf/host/failover.sh, we got an ANA error even
though the target did not enable ANA reporting.
We marked the corresponding namespace as ANA state updating but we had
no way to clear it.
Check if we can read the ANA log page before setting the flag.
If reading the ANA log page fails, disable the ANA feature until the
nvme_ctrlr is created again. In this operation, all ana_state_updating
flags are cleared.
Fixes #2335
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4e2608a35d9dfa0395ad74fceebae9faf8cd973c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11399
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
An I/O error occurs when running the NVMe over TCP fio test with the
uring socket. The bug is easy to reproduce with the following
configuration:
target with 1 core and 16 NVMe SSDs, 2 initiators each connected to 8
NVMe namespaces, each running fio with numjobs=3.
If in each round we insert the sock at the head of the
pending_recv list and then take max_events socks from the head of the
list to process, it is possible that some socks are never
processed.
There was already a strategy to cycle the pending_recv list to make
sure we do not poll things in the same order. For example, given a list
A B C D E F and max_events of 3, this strategy rearranges the list to
D E F A B C. But the strategy is not effective when using
TAILQ_INSERT_HEAD(&group->pending_recv, sock...).
Using TAILQ_INSERT_TAIL(&group->pending_recv, sock...) fixes it.
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: I8429b8eee29a9f9f820ad291d1b65ce2c2be22ea
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11154
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
To allow SO_MINOR updates on LTS for the whole year it is supported,
the major version for all components needs to be increased.
This prevents a scenario where two releases exist with matching
versions but conflicting ABI.
E.g. the next SPDK release adds an API call, increasing the minor version,
while LTS needs just a subset of those additions.
Increasing the major SO version after LTS allows future releases
to update versions as needed, while allowing LTS to increase the minor
version separately.
Disabled the test for increasing SO version without ABI change, as
that is the goal of this patch. This check shall be removed with the SPDK 22.05
release.
This patch:
- increases SO_VER by 1 for all components
- resets SO_MINOR to 0 for all components
- removes suppressions for ABI tests
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Id1a5358882dc496faa5b0b5c9a63b326c378c551
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11361
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Provides RPCs for the qpair error injection APIs to bdev_nvme.
These RPCs are useful in testing NVMeoF/NVMe behavior for various
error scenarios in production.
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Change-Id: I0db7995d7a712d4f8a60e643d564faa6908c3a55
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10992
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
It may take a long time to detect a network transport error
when e.g. a port is removed on the remote target. This timeout
depends on 2 parameters - retry_count and ack_timeout.
bdev_nvme_set_options supports configuration of retry_count
but transport_ack_timeout is missing. Note: this parameter
is used by the RDMA transport only.
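A rough sketch of how the two parameters combine. It assumes the value
is encoded the way InfiniBand encodes its local ACK timeout, i.e.
4.096 us * 2^value, and that the initial send and each retry wait one
full timeout. Both are assumptions for illustration: real hardware may
round the timeout up, and ack_timeout_ns() and detect_time_ns() are
hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Per-attempt ACK timeout in nanoseconds: 4.096 us * 2^exponent.
 * Exponent 0 is excluded here (assumed to mean "no timeout"). */
static uint64_t ack_timeout_ns(unsigned int exponent)
{
	assert(exponent >= 1 && exponent <= 31);
	return (uint64_t)4096 << exponent;
}

/* Rough estimate (assumption): the initial attempt plus retry_count
 * retries each wait one full ACK timeout before the error is reported. */
static uint64_t detect_time_ns(unsigned int exponent, unsigned int retry_count)
{
	return ack_timeout_ns(exponent) * ((uint64_t)retry_count + 1);
}
```

Under these assumptions a value of 14 gives about 67 ms per attempt, and
with 7 retries roughly 0.54 s before the error is detected.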
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I7c3090dc8e4078f64d444e2392a9e0a6ecdc31c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11175
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: <tanl12@chinatelecom.cn>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The nvmf subsystem cannot control which core its
threads get scheduled on. Even in the normal, default
case, the app thread has already been scheduled on the
first core, so the first nvmf thread will get
scheduled on the second core, etc.
So instead, always use a 0-based index for the names
of the nvmf threads.
Reported-by: Jacek Kalwas <jacek.kalwas@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8a0f161860b985f36920845de28b39dbae9fdca5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11351
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
This patch aligns namespace comparison with Linux kernel
implementation:
- UUID is optional and may be NULL
- command set (CSI) should be the same
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I8f889989f24cd51b104057217f87eb303b30fa68
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11312
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Add more dtrace probes to help with identifying issues
in production.
Change-Id: I8fb621a15c5e33ae94d75b4fc31135e2635dcfce
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10561
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is a workaround for #2338.
Ideally the fix should remove this define and use number of cores
from the application.
With a large number of QAT devices the following error can be observed:
compdev_isal_create():
ISA-L library version used: 2.30.0
vbdev_compress.c: 358:vbdev_init_compress_drivers: *NOTICE*: created virtual PMD compress_isal
EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE
RING: Cannot reserve memory
isal_comp_pmd_qp_setup(): Failed to create unique name for isal compression device
vbdev_compress.c: 268:create_compress_dev: *NOTICE*: FYI failed to setup a queue pair on compressdev 48 with error 4294967295 so limiting to 84 qpairs
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I689ab6bda991e3864da9f4135f57849e3c0c3986
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11179
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Historically only QAT_SYM devices for crypto
were supported. The DPDK submodule explicitly
disabled its compilation.
For details please see:
https://review.spdk.io/gerrit/c/spdk/dpdk/+/9217
Starting with DPDK 21.11 QAT_SYM and QAT_ASYM were
merged together, so it is no longer possible to
disable QAT_ASYM as it was before.
As vbdev_crypto didn't make use of it,
this driver is now skipped in preparation for
update to DPDK 21.11.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib606a4b450cd224d96bc21a64384297b2182967c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11178
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When the nvmf_tgt application shuts down, it stops all
subsystems, then destroys poll groups and then
destroys nvmf_tgt. Part of nvmf_tgt destruction is
destruction of subsystems and this process may require
cross thread communication but since poll groups and
threads are already destroyed, we may get segfaults.
One possible solution is to change the order and destroy
nvmf_tgt before destroying poll groups but it doesn't
work since nvmf_tgt is registered as io_device and
poll groups have its channel, so it can't be destroyed
while poll groups exist.
This patch adds a new state to nvmf_tgt state machine
which destroys all subsystems before destroying poll
groups and nvmf_tgt. It guarantees that all threads
exist when subsystems are destroyed.
Also rename state NVMF_TGT_FINI_FREE_RESOURCES to
NVMF_TGT_FINI_DESTROY_TARGET, the new name better
reflects the purpose of this state.
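The resulting shutdown order can be sketched as a small state
progression. Only the name NVMF_TGT_FINI_DESTROY_TARGET comes from this
patch. The enum below uses shortened, hypothetical names, and
fini_next() is an illustration rather than the real state machine.

```c
#include <assert.h>

/* Hypothetical, shortened state names for the shutdown half of the
 * state machine, listed in the order they are visited. */
enum fini_state {
	FINI_STOP_SUBSYSTEMS,
	FINI_DESTROY_SUBSYSTEMS,	/* new state: runs while threads still exist */
	FINI_DESTROY_POLL_GROUPS,
	FINI_DESTROY_TARGET,		/* formerly "free resources" */
	FINI_STOPPED
};

/* Return the state that follows s during shutdown. */
static enum fini_state fini_next(enum fini_state s)
{
	assert(s <= FINI_STOPPED);
	switch (s) {
	case FINI_STOP_SUBSYSTEMS:
		return FINI_DESTROY_SUBSYSTEMS;
	case FINI_DESTROY_SUBSYSTEMS:
		return FINI_DESTROY_POLL_GROUPS;
	case FINI_DESTROY_POLL_GROUPS:
		return FINI_DESTROY_TARGET;
	default:
		return FINI_STOPPED;
	}
}
```

The point of the ordering is that subsystem destruction, which may need
cross-thread messages, completes before the poll groups and their
threads go away.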
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I08971d78cc9ad70d43cd43c346fd74d35c8bda60
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9668
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
From SAM-4, section 5.13 (Sense Data):
“When a command terminates with a CHECK CONDITION status, sense data shall be returned
in the same I_T_L_Q nexus transaction (see 3.1.50) as the CHECK CONDITION status. After
the sense data is returned, it shall be cleared except when it is associated with a unit
attention condition and the UA_INTLCK_CTRL field in the Control mode page (see SPC-4)
contains 10b or 11b.”
SPDK does not set UA_INTLCK_CTRL to 10b or 11b, so we set the unit attention condition
immediately against a single IO or Admin IO after reporting it via a CHECK CONDITION.
Once the failed IO is received at the iSCSI initiator side, it will be retried. In the case of
a resize operation, if there is no IO from the iSCSI initiator side, reporting of the unit
attention condition is delayed until the first IO is received at the iSCSI target
side.
Meanwhile, we clear the resizing (newly added) flag on our SCSI LUN structure after
the first time we report the resize unit attention condition.
The kernel initiator won’t actually resize the corresponding block device automatically.
It will report a uevent, and then you can set up udev rules to trigger a rescan. SPDK
iSCSI initiator will automatically report the LUN size change.
Change-Id: Ifc85b8d4d3fbea13e76fb5d1faf1ac6c8f662e6c
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11086
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
We can just queue things up until we get -EBUSY and not track the queue
depth.
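A minimal sketch of the idea, using a hypothetical device with a fixed
number of slots. struct dev, dev_submit() and submit_many() are
illustrative names only. The caller keeps no queue depth counter of its
own and simply stops at the first -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RING_SIZE 4

/* Hypothetical device: the slot accounting lives inside the device,
 * not in the caller. */
struct dev {
	int in_flight;
};

/* Returns 0 on success or -EBUSY when no slot is free. */
static int dev_submit(struct dev *d)
{
	assert(d != NULL);
	if (d->in_flight == RING_SIZE) {
		return -EBUSY;
	}
	d->in_flight++;
	return 0;
}

/* Submit up to 'want' operations, stopping at the first -EBUSY.
 * Returns how many were submitted; the rest are retried later. */
static int submit_many(struct dev *d, int want)
{
	int done = 0;

	while (done < want && dev_submit(d) == 0) {
		done++;
	}
	return done;
}
```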
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I49d3bcae0e6705a322de54fa91c9e1c6dfaea0c2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11028
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Now that we have a much more robust retry framework,
set the default bdev_retry_count to 3. Users can
still override this default with the bdev_nvme_set_options
RPC as before. This ensures that by default, we will
retry I/O when possible.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I045bf4969d02be32b951e72a148ce6b6e251dec1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11107
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We can do all of the configuration in spdk_idxd_get_channel, and the
configuration step was always done immediately after getting the channel
anyway.
Change-Id: I9fef342e393261f0db6308cd5be4f49720420aa0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10349
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
The state of a nvme_ctrlr can be more fine grained than a boolean
and such state gives more information to end users for debug or
root cause analysis.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I3e2459f449e2dac73f04b155e38b696495f1a335
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10183
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
If ctrlr_loss_timeout_sec is set to -1, reconnect is tried repeatedly
indefinitely, and I/Os continue to be queued.
This patch adds another option, fast_io_fail_timeout_sec, and a flag,
fast_io_fail_timedout, to nvme_ctrlr.
If fast_io_fail_timeout_sec has passed since the reset started,
set fast_io_fail_timedout to true so that the path is not used for I/O submission.
fast_io_fail_timeout_sec is initialized to zero, the same as
ctrlr_loss_timeout_sec and reconnect_delay_sec.
The parameter name follows fast_io_fail_tmo of the well-known DM-multipath.
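The timer check can be sketched as below. This is a simplified model,
not the bdev_nvme code: struct ctrlr_state and both helpers are
hypothetical, time is kept in whole seconds, and treating a zero timeout
as "timer disabled" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-controller state. */
struct ctrlr_state {
	bool resetting;
	uint64_t reset_start_sec;		/* when the current reset began */
	uint32_t fast_io_fail_timeout_sec;	/* assumed: 0 disables the timer */
	bool fast_io_fail_timedout;
};

/* Set the flag once the timeout has elapsed since the reset started. */
static void update_fast_io_fail(struct ctrlr_state *c, uint64_t now_sec)
{
	assert(c != NULL);
	if (!c->resetting || c->fast_io_fail_timeout_sec == 0) {
		return;
	}
	if (now_sec - c->reset_start_sec >= c->fast_io_fail_timeout_sec) {
		c->fast_io_fail_timedout = true;
	}
}

/* A path whose fast-fail timer expired is not used for I/O submission. */
static bool path_usable_for_io(const struct ctrlr_state *c)
{
	return !c->fast_io_fail_timedout;
}
```

With a 5 second timeout and a reset that started at t=100, the path stays
usable at t=104 and is excluded from t=105 on.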
Change-Id: Ib870cf8e2fd29300c47f1df69617776f4e67bd8c
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10301
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Previously reconnect retry was not controlled and was repeated indefinitely.
This patch adds two options, ctrlr_loss_timeout_sec and reconnect_delay_sec,
to nvme_ctrlr and adds reset_start_tsc, reconnect_is_delayed, and
reconnect_delay_timer to nvme_ctrlr to control reconnect retry.
Both ctrlr_loss_timeout_sec and reconnect_delay_sec are initialized to
zero. This means reconnect is not throttled, as was the case before this patch.
A few more changes are added.
Change nvme_io_path_is_failed() to return false if reset is throttled
even if nvme_ctrlr is resetting or is to be reconnected.
spdk_nvme_ctrlr_reconnect_poll_async() may continue returning -EAGAIN
indefinitely. To detect such an exceptional case, use ctrlr_loss_timeout_sec.
Not only ctrlr reset but also non-multipath ctrlr failover is controlled.
So we need to include path failover into ctrlr reconnect.
When the active path is removed and switched to one of the alternative paths,
if ctrlr reconnect is scheduled, connecting to the alternative path is left
to the scheduled reconnect.
If resetting or reconnecting the ctrlr fails and a retry is scheduled,
switch the active path to one of the alternative paths.
Restore unit test cases removed in the previous patches.
Change-Id: Idec636c4eced39eb47ff4ef6fde72d6fd9fe4f85
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10128
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
This is a clean up as a preparation to the following patches.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ib8bc90e17f52086d4e887463e04f65273bb1079b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11068
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We noticed a difference between SPDK 21.10 and the latest master
in a test.
The simplified scenario is as follows:
1. Start SPDK NVMe-oF target
2. Run bdevperf for the target with -f parameter to suppress exit
on failure.
3. Kill the target after I/O started.
With SPDK 21.10, bdevperf retries failed I/Os and exits after
the test time is over.
With the latest SPDK master, bdevperf hangs and does not exit even
after the test time is over.
The cause was as follows:
ctrlr reset is repeated very quickly (once per 10ms by default) and hence
I/Os were queued indefinitely because nvme_io_path_is_failed() returned
false if nvme_ctrlr is resetting.
We should queue I/O when nvme_ctrlr is resetting only if reset is throttled
and fail-fast for the repeated failures is supported.
Hence in this patch, fix the regression and remove the related unit
test cases.
Reported-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I4047d42dc44488a05264c6a841d101a7c371358b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11062
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
config_param and config_file do not conflict when specifying rados
configurations, so supporting both together is more reasonable. After this patch,
users can choose one of three ways: config_param, config_file + key_file,
or config_param + config_file + key_file.
Signed-off-by: Tan Long <tanl12@chinatelecom.cn>
Change-Id: Ide17af72c4965df1e6541f4f50d4fa5309865486
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10679
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In practice, config_file and key_file are often used to connect
to a rados cluster. config_file includes "mon_host" and other rados
configurations like "rbd_cache", and key_file includes the secret key
and the access authority to each pool for the current user. This patch adds
the key_file option, so users can specify config_file and key_file, or only
config_param, to connect to a rados cluster. This gives users more
flexibility.
Signed-off-by: Tan Long <tanl12@chinatelecom.cn>
Change-Id: I6b49aad70b578bdeb3ac8ea9ca0fcbd931582025
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10485
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If acceleration tasks are exhausted, we can exit
the submission loop earlier. Also print the number of IOVs
for each R/W request.
Change-Id: Ia98ed43b0bb2be229b7c0054f3ade0ad39337b09
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10836
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This uses a batch with the fence flag for now. There are several other
implementation options that will be explored in the future.
Change-Id: I4f344d671400508de05f80b026d42f775c5b9588
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10289
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Compare two scattered memory regions
Change-Id: I6ce5c9e7bc1ee1ef0e9173c00e86628d43a1e41f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10287
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This RPC will stop the specified discovery service,
including detaching from any controllers that were
attached as part of that discovery service.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9222876457fc45e1acde680a7bd1925917c22308
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10832
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Requests that are completed immediately (i.e. those not using the accel
engine) are now queued and their completion is delayed to the completion
poller. It ensures that they're not completed from the context of a
submission, which gets rid of an spdk_thread_send_msg() call.
It significantly improves performance on some workloads. For instance,
4k zcopy reads (queue depth 128) on a malloc bdev exposed through
NVMe/TCP went from 204k IOPS to 485k IOPS.
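The deferral can be sketched with a per-channel list that a poller
drains. This is a simplified model with hypothetical names (struct task,
struct channel, submit(), completion_poller()), not the malloc bdev code
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct task {
	int completed;
	TAILQ_ENTRY(task) link;
};

TAILQ_HEAD(task_list, task);

/* Hypothetical per-thread channel holding tasks that await completion. */
struct channel {
	struct task_list completed_tasks;
};

static void channel_init(struct channel *ch)
{
	TAILQ_INIT(&ch->completed_tasks);
}

/* Submission path: the work is already done, but the completion is only
 * queued here and never invoked from the submission context. */
static void submit(struct channel *ch, struct task *t)
{
	assert(t != NULL);
	t->completed = 0;
	TAILQ_INSERT_TAIL(&ch->completed_tasks, t, link);
}

/* Poller: complete everything queued since the last run and return the
 * number of tasks completed. */
static int completion_poller(struct channel *ch)
{
	struct task *t;
	int n = 0;

	while ((t = TAILQ_FIRST(&ch->completed_tasks)) != NULL) {
		TAILQ_REMOVE(&ch->completed_tasks, t, link);
		t->completed = 1;
		n++;
	}
	return n;
}
```

Because the poller runs on the channel's own thread, no message needs to
be sent to get out of the submission context.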
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I196f55fc07d167f1ed117d2430e9c37f9d05f70d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10805
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The only thing these functions were doing was completing the IO, so it
could just be inlined.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I5fbd9df763dd68953b1bda9c7752c57ef9ee5dd6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10804
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This poller is registered on each IO channel and can be used to schedule
asynchronous completion of a request. This can be especially useful for
requests that can be completed immediately. For now, nothing enqueues
the requests to be completed through this poller - this will be changed
in the following patch.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If6b26541907bb46402fc0904216bff74dad57b88
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10803
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It'll allow the malloc bdev to store per-thread data. For now, it's
only used to keep the pointer to the accel library's IO channel, more
fields will be added in subsequent patches.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I604a38877ae8d6075b911f5a484d1793d4bc2ddb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10802
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch adds support for zero-copy operations in the delay bdev.
They use the same delay values as regular IO operations:
- (avg|p99)_read_latency for zcopy_start with populate=true,
- (avg|p99)_write_latency for zcopy_end with commit=true.
All other zcopy operations (e.g. zcopy_start with populate=false) are
not delayed.
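The selection rule above can be written as a small function. The enum
and function names are hypothetical and only mirror the rule as stated
here.

```c
#include <assert.h>
#include <stdbool.h>

enum delay_type { DELAY_NONE, DELAY_READ, DELAY_WRITE };
enum zcopy_op { ZCOPY_START, ZCOPY_END };

/* Pick which latency setting applies to a zcopy operation:
 * start with populate uses the read latency, end with commit uses the
 * write latency, everything else is not delayed. */
static enum delay_type zcopy_delay_type(enum zcopy_op op, bool populate, bool commit)
{
	assert(op == ZCOPY_START || op == ZCOPY_END);
	if (op == ZCOPY_START && populate) {
		return DELAY_READ;
	}
	if (op == ZCOPY_END && commit) {
		return DELAY_WRITE;
	}
	return DELAY_NONE;
}
```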
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I8b32c1d99f9f2b36b16617122881ea95d02ecc87
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10798
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If a path whose namespace is optimized is restored, the corresponding
I/O path cache should be cleared and the path should be chosen as the
optimal path.
This bug was found by a system test.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: Ibc3983dbff3418adb090a09df32c2a92a8910d05
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11004
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Rename a few functions for a full ctrlr reset sequence to
clarify what we do and make the following patches easier.
Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I051e3ab68c3cd77fd6040a2d069d50a700123ae6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10920
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We sort secondary trids to avoid using disconnected trids for failover.
However the sort had a bug.
This bug was found by running test/nvmf/host/multipath.sh in a loop.
Verify the fix by adding a unit test.
Fixes #2300
Signed-off-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Change-Id: I22b0ede4d2ef98b786c3e0d1f5337a2d568ba56d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10921
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This patch adds the framework for a discovery
service in the bdev/nvme module.
Users can specify an IP/port of a discovery service.
The bdev/nvme module will connect to a discovery
controller, get the discovery log page, and then
register for AERs. It will connect to each
subsystem specified in the initial log page.
AER completions will trigger fetching the log
page again, at which point new subsystems will
be connected to, or removed subsystems will be
detached.
This patch does the following:
* Adds the new start_discovery RPC
* Connects to the discovery controller
* Gets the discovery log page
* Registers for AERs
* Detaches from discovery controllers at shutdown
Subsequent patches in this series will:
* Connect to subsystems listed in discovery log page
* Detach from subsystems that were listed in earlier
discovery log pages but subsequently removed
* Add a stop_discovery RPC
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I54bfa896a48c5619676f156b5ea9f2d1f886c72f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10694
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Replace total_busy/idle_tsc by current ones for
decision making when tuning frequency.
Change-Id: I89524a9febfa963b14c3120433e5aff9de2a28dd
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10342
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
If qpair creation fails, ctrlr_ch remains in group->ctrlr_ch_list
but the memory for ctrlr_ch is freed. The next attempt to get the ctrlr's
io channel will modify data in already freed memory and may corrupt
another allocation.
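A minimal sketch of the safe error path, with hypothetical names
(struct ctrlr_ch here is a stand-in, and ch_create() with its qpair_ok
argument exists only for illustration): the entry is unlinked from the
list before its memory is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

struct ctrlr_ch {
	TAILQ_ENTRY(ctrlr_ch) tailq;
};

TAILQ_HEAD(ch_list, ctrlr_ch);

/* Create a channel and link it into the list. qpair_ok simulates whether
 * qpair creation succeeds. On failure the channel is removed from the
 * list before it is freed, so the list never points at freed memory. */
static struct ctrlr_ch *ch_create(struct ch_list *list, int qpair_ok)
{
	struct ctrlr_ch *ch;

	assert(list != NULL);
	ch = calloc(1, sizeof(*ch));
	if (ch == NULL) {
		return NULL;
	}
	TAILQ_INSERT_TAIL(list, ch, tailq);
	if (!qpair_ok) {
		TAILQ_REMOVE(list, ch, tailq);
		free(ch);
		return NULL;
	}
	return ch;
}
```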
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I85002f2e6ac86a0ffda6dabfa57e79b59074fb5a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10840
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
It is possible that the application calls
get_io_channel during nvme controller reset.
In that case IO qpair won't be created and the
application will get a NULL pointer.
It is possible to repeat get_io_channel later but
there is no such indication for the application,
so it can't distinguish between a real failure and
a "try again" case during controller reset.
This patch ignores IO qpair creation error if
controller is resetting. IO qpair will be created
when reset completes.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Id39202f5a6878453ff54e35df91d5dc49a5f046a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10828
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuheimatsumoto@gmail.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We will want to use bdev_nvme_create() to attach to
controllers identified through discovery. In this case,
we won't be reporting bdev names back to an RPC caller,
so there's no need to allocate an array of names to be
filled out since they won't be used.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia386d034df2c2d5a60f9aa18338ba415ec03d763
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10689
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We will need to add another step in the fini path for
stopping discovery pollers, so this patch prepares for
that.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ifecbbac60262f3aae7f7a7ced09b7a600df7c2e8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10590
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
One of the objects wasn't enclosed with spdk_json_write_object_end(),
causing the resulting configuration to be broken.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ib0311e002e43d4ad01c61feb6af54cb4212b477b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10755
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Not every transport requires an accept poller - the transport-specific
layer can have its own policy and way of handling new connections.
APIs to notify generic layer are already in place
- spdk_nvmf_poll_group_add
- spdk_nvmf_tgt_new_qpair
Having accept poller removed should simplify interrupt mode impl
in transport specific layer.
Fixes issue #1876
Change-Id: Ia6cac0c2da67a298e88956734c50fb6e6b7521f1
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7268
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Batching will be made available for DSA specifically through the new
idxd_perf tool.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ic51d9ad3692074805b1ffa705cea8be35737c778
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9846
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We set cb_ctx to NULL when calling spdk_nvme_probe_async(). It appears
that nvme_probe_ctx has not been used anywhere for a long time.
nvme_probe_ctx is not a public data structure.
Remove nvme_probe_ctx to simplify the code and make the following
patches easier.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7dd5f970a7fde1c9c189fae3c8f28f84d7aed991
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10554
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This refactoring will be helpful for the following patches to unify ctrlr
reset and failover, and to fail over the trid also when reconnecting ctrlr.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I4623a5dd310ac7516c270ccd3b0541c27cc880d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10443
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The failover_in_progress flag is used to decide the return value of
bdev_nvme_failover().
bdev_nvme_delete() calls bdev_nvme_failover() with remove=true to remove
nvme_ctrlr->active_path_id. However bdev_nvme_failover() returns zero
if nvme_ctrlr->failover_in_progress is true. bdev_nvme_failover() may
return zero even if it does not remove nvme_ctrlr->active_path_id.
The following will be better.
bdev_nvme_failover() returns -EBUSY if nvme_ctrlr->resetting is true,
and the caller repeats calling bdev_nvme_failover() until the target trid
becomes an alternative path or bdev_nvme_failover() returns zero.
To do that, the failover_in_progress flag is not necessary any more.
Removing the failover_in_progress will also simplify the following
patches to unify ctrlr reset and failover.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I57ab944beb1d06ea4def144c81c69705860de35f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10441
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Checking if nvme_ctrlr can be unregistered is not so simple and
a few changes will be added. So factoring out the check into a
helper function will be valuable.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I310c7e3ad2dae9583df4db575d342c2cb111f3f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10461
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When an I/O or admin passthrough fails, if the corresponding nvme_ctrlr
is not available, we should fail over to another path.
When no path is found, if there is at least one nvme_ctrlr which is
not failed, we should wait until it is recovered.
We should improve error recovery not only for multipath (multipath is
"multipath") but also for failover (multipath is omitted or "failover").
To do this easily, clarify the conditions of availability and failure of
nvme_ctrlr and realize them by helper functions.
Use new helper functions for other cases to improve readability too.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I716731f72811d2ec4dfc91f9eadb191d75739af6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10381
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We had not checked bit 0 of the Namespace Multipath I/O and
Namespace Sharing Capabilities (NMIC) field in the Identify Namespace
data structure.
If bit 0 of the NMIC is zero, it is likely that namespaces are not
identical.
We should check the value of the NMIC first, and this patch does so.
Additionally, it is unusual for bit 0 of the CMIC and bit 0 of
the NMIC not to match. So in unit tests rename the parameter multi_ctrlr
to multipath for ut_attach_ctrlr() and use it for the value of the NMIC.
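The NMIC check can be sketched as follows. This is an illustration only:
struct ns_data, nmic_can_share() and ns_is_same_candidate() are
hypothetical names, and the identifier comparison is reduced to a
boolean supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* NMIC bit 0: the namespace may be attached to two or more controllers. */
#define NMIC_CAN_SHARE 0x01u

/* Hypothetical subset of the Identify Namespace data structure. */
struct ns_data {
    uint8_t nmic;
};

/* True if the namespace reports that it may be shared. */
bool nmic_can_share(const struct ns_data *nsdata)
{
    assert(nsdata != NULL);
    return (nsdata->nmic & NMIC_CAN_SHARE) != 0;
}

/*
 * Check the NMIC first: only when both namespaces report that they may be
 * shared does the result of the identifier comparison matter.
 */
bool ns_is_same_candidate(const struct ns_data *existing,
                          const struct ns_data *added, bool ids_match)
{
    if (!nmic_can_share(existing) || !nmic_can_share(added)) {
        return false;
    }
    return ids_match;
}
```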
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6aa7cbcc99be2507dbf18930f7b585a9ea7d0f90
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10380
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_reset() deletes all qpairs, resets a ctrlr, and then creates
all qpairs. Any qpair may fail to be created, and then the reset
request may fail. However, already created qpairs were left in place.
Let's delete the already created qpairs and then fail the reset request.
This will make it easier for us to control reconnect, delay reconnect by
a few seconds, or stop reconnect after repeated failures and then
delete ctrlr.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I414e2281b4bf0cbd1cf461d8fc64a22f43d26d13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9896
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Replace the spdk_nvme_ctrlr_reset_async() and spdk_nvme_reset_poll_async()
calls by the spdk_nvme_ctrlr_disconnect(), spdk_nvme_ctrlr_reconnect_async(),
and spdk_nvme_ctrlr_reconnect_poll_async() calls in a reset ctrlr
sequence.
spdk_nvme_ctrlr_disconnect() can fail if ctrlr is already resetting or
removed. But neither case is possible here: reset is controlled, and the
hot remove callback is called when the ctrlr is hot removed. So we assume
spdk_nvme_ctrlr_disconnect() always succeeds.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1299e198597b2a2110f80b9a868e2dae015682ee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10092
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
spdk_for_each_channel() always passes status=0 to its completion callback
if each channel completes the requested function successfully.
bdev_nvme_reset_destroy_qpair() always succeeds.
Hence bdev_nvme_reset_ctrlr() does not have to check if the passed
status is not zero.
The following patches will aggregate multiple flags into a single
state for nvme_ctrlr. This change will simplify these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1c30c9b20c96886516029e69e90dc23d777a69b4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10077
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, we want to retry reconnect if reconnect failed
in a reset ctrlr sequence but we want to delay the retry. While
we wait for the delayed retry, we want to quiesce ctrlr completely.
As part of quiesce ctrlr operations, we want to pause adminq poller but
we need to do it on the nvme_ctrlr->thread.
If a reset ctrlr sequence runs on the nvme_ctrlr->thread, we can avoid
redirecting the pending destruct request at completion too.
So we redirect the reset ctrlr sequence into the nvme_ctrlr->thread.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I538b962e2a7b5cf00ebbac2a1e888482ddeeee61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10075
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For IPv6, cm->cmsg_level is SOL_IPV6 and cm->cmsg_type is
IPV6_RECVERR. However, this combination was not included.
To clarify the fix, check if the positive conditions are satisfied and
then reverse the result.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I675f4337f383d3526fed1b86794697f41113ed4c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10428
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Incorrect decode function used for the param "config_file" in
rpc_bdev_rbd_register_cluster
Signed-off-by: Tan Long <tanl12@chinatelecom.cn>
Change-Id: I6286c5d0d8396a1b548095975924087ba4ee92d2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10444
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback bdev_nvme_reset_io_continue() to
redirect to the orig_thread by using bio. Furthermore, use bio->cpl.cdw0
to store the completion status of the reset processing. bdev_nvme_reset()
does not use bio->cpl.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I361cc44494190ba83ad6e360788d78851416c46c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10074
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback rpc_bdev_nvme_reset_controller_cb
to redirect to the orig_thread by using a dynamically allocated context.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8ee61857ac034024d00190875740a675ef1db8b0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10073
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch supports admin passthrough retry, up to retry_count times,
when we get any error with DNR=0 other than ABORTED_BY_REQUEST.
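The retry condition can be sketched as below. This is a hedged
illustration rather than the module's code: struct cpl_status and
admin_passthru_should_retry() are hypothetical, and treating a limit of
-1 as infinite retry is an assumption borrowed from the
bdev_retry_count commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* NVMe generic command status values used by this sketch. */
#define SCT_GENERIC           0x0
#define SC_SUCCESS            0x00
#define SC_ABORTED_BY_REQUEST 0x07

/* Hypothetical, reduced view of an NVMe completion status. */
struct cpl_status {
    uint8_t sct; /* status code type */
    uint8_t sc;  /* status code */
    bool dnr;    /* do not retry */
};

bool admin_passthru_should_retry(const struct cpl_status *st,
                                 int retry_count, int max_retry_count)
{
    assert(st != NULL);

    if (st->sct == SCT_GENERIC && st->sc == SC_SUCCESS) {
        return false; /* not an error */
    }
    if (st->dnr) {
        return false; /* the controller asked us not to retry */
    }
    if (st->sct == SCT_GENERIC && st->sc == SC_ABORTED_BY_REQUEST) {
        return false; /* an explicit abort must not be retried */
    }
    if (max_retry_count == -1) {
        return true; /* assumed meaning: infinite retry */
    }
    return retry_count < max_retry_count;
}
```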
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1bf29570791fdbe8651fa70c4c8685bb740fb86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9944
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When resetting ctrlr, adminq is disconnected first. If adminq is disconnected,
admin passthrough request is rejected with -ENXIO.
But resetting ctrlr may succeed. If resetting ctrlr succeeds, adminq is
connected again, and admin passthrough request will be
submitted successfully.
On the other hand, if ctrlr is failed, admin passthrough request is
rejected with -ENXIO. But when resetting ctrlr, ctrlr is set to unfailed.
Hence bdev_nvme_admin_passthru() skips any ctrlr which is resetting
or failed, and calls bdev_nvme_admin_passthru_complete() with -ENXIO
if no available ctrlr is found.
bdev_nvme_admin_passthru_complete() queues the admin passthrough request
and retries it one second later if ctrlr is resetting.
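A rough sketch of this selection logic, using a plain array of
controllers. struct ctrlr, admin_passthru_pick_ctrlr() and
admin_passthru_should_queue_retry() are hypothetical names used only
for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced controller state. */
struct ctrlr {
    bool resetting;
    bool failed;
};

/* Return the index of the first usable ctrlr, or -ENXIO if there is none. */
int admin_passthru_pick_ctrlr(const struct ctrlr *ctrlrs, size_t n)
{
    assert(ctrlrs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Skip any ctrlr which is resetting or failed. */
        if (!ctrlrs[i].resetting && !ctrlrs[i].failed) {
            return (int)i;
        }
    }
    return -ENXIO;
}

/* After -ENXIO: queue the request for a later retry only if a reset is running. */
bool admin_passthru_should_queue_retry(const struct ctrlr *ctrlrs, size_t n)
{
    assert(ctrlrs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (ctrlrs[i].resetting) {
            return true;
        }
    }
    return false;
}
```

A request that finds no usable ctrlr is therefore failed immediately
only when no reset is in flight that could bring a path back.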
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic748dc4faf29ebf717ae5c29dcf7c55fe2ea9243
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9942
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Allow specifying an optimal IO boundary for the
malloc bdev. It can be used to test splitting
of IO requests in the generic bdev layer.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ic3529dc00cf852ea5cf40d0553d846a698fff6c7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10068
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Separate the admin passthrough case from bdev_nvme_io_complete_nvme_status()
into bdev_nvme_admin_passthru_complete_nvme_status() and from
bdev_nvme_io_complete() into bdev_nvme_admin_passthru_complete(),
respectively.
Then change the return type of bdev_nvme_admin_passthru() to void
by using bdev_nvme_admin_passthru_complete().
Besides, refactor bdev_nvme_admin_passthru() slightly.
These cleanups make the following patches simpler.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I79b89ee1b6360aa6ac6fc3c03f0469be99b0c1f2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9899
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The NVMe bdev module queues retried I/Os itself now.
bdev_nvme_abort() needs to check and abort the target I/O if it
is queued for retry.
This change will cover admin passthrough requests too because they
will be queued on the same thread as their callers and the public
API spdk_bdev_reset() is required to be submitted on the same thread
as the target I/O or admin passthrough requests.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If37e8188bd3875805cef436437439220698124b9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9913
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The completion status of spdk_bdev_abort() is SPDK_BDEV_IO_STATUS_SUCCESS
or SPDK_BDEV_IO_STATUS_FAILED if it is successfully submitted.
In the generic bdev layer, spdk_bdev_abort() does not update cdw0 but
just sets SPDK_BDEV_IO_STATUS_SUCCESS or SPDK_BDEV_IO_STATUS_FAILED.
In the NVMe bdev module, for the abort request, spdk_bdev_io_complete()
is called instead of spdk_bdev_io_complete_nvme_status() and the
completion status is SPDK_BDEV_IO_STATUS_SUCCESS or
SPDK_BDEV_IO_STATUS_FAILED.
So let's skip updating cdw0 and call spdk_bdev_io_complete() directly
with SPDK_BDEV_IO_STATUS_SUCCESS or SPDK_BDEV_IO_STATUS_FAILED if
bdev_nvme_abort() does not find the target I/O in any ctrlr.
The next patch will fix spdk_bdev_io_get_nvme_status() for the abort
I/O.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8fb5389cd27d7467cc6ae18e152bd5228f9437f7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9976
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
This patch enables each nvme_ctrlr_channel to access the underlying
nvme_bdev_channels. This change is used to maintain io_path cache
of nvme_bdev_channel.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I22cd3763da1642d4e68dee3a9273e9cc698a4ca8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9893
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Now that the multipath feature is supported, one bdev may have more than
one nvme ctrlr. For ease of viewing, display each ctrlr's trid info.
Moreover, rename nvme_bdev_ctrlr_get to nvme_bdev_ctrlr_get_by_name here
to keep it consistent with nvme_ctrlr_get_by_name.
Signed-off-by: Kai Li <lik271@chinatelecom.cn>
Change-Id: I417506699bbea6ed13dac0fee942749757d2ae47
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10129
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The current code prints only the last namespace of an nvme bdev. Fix the
printing to show all namespaces.
This patch also prepares for the next patch, which will show io path
status for multipath, such as which path is the primary or the backup,
and the old and current status, etc.
Signed-off-by: tanlong <948985618@qq.com>
Change-Id: I4fca154df52c929b8d046198934db0e58586c378
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10140
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
This will allow runtime observation of the
dynamic scheduler in a release build.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I7436b5dbcaa0df1529f828ef75f4e9335a092893
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9672
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
If an I/O gets an ANA error, the ANA state may be out of date. So in this
case read the ANA log page and update the ANA states. Mark nvme_ns as
updating to avoid using it while its ANA state is being updated.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia43d38b3a589c84d6d0479dedcced033e76fb194
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If an I/O failed with an ANA error, the corresponding ANA state might be
out of date. In the following patches, for this case, read the latest
ANA log page and update the ANA state. Such reads of the ANA log page may
be done on multiple threads concurrently, including for AER ANA change.
Hence protect the ANA log page by adding a new flag ana_log_page_updating
to struct nvme_ctrlr and using it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8bb84091d50a5fdc0d9893b585be972dfd31c0f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9526
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This will enable us to add more flags without creating any extra hole.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I166e2bd3d116c8cebf75bfe4f290b390d9e3888e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9851
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add bdev_retry_count to spdk_bdev_nvme_opts and retry_count to
nvme_bdev_io, respectively.
Set the type of both to int because we want to use -1 for infinite retry.
Set the default value of bdev_retry_count to zero for the backward
compatibility.
bdev_retry_count is configurable by the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9bc746fcea54aa8722c76f79c70c2ae2b375aa53
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9864
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
retry_count of struct spdk_bdev_nvme_opts controls the number of retries
in the transport layer, and is set to transport_retry_count of struct
spdk_nvme_ctrlr_opts.
The next patch will add bdev_retry_count to struct spdk_bdev_nvme_opts
to control the number of retries in the bdev layer.
For clarification, rename retry_count to transport_retry_count in
struct spdk_bdev_nvme_opts. Then deprecate the retry_count parameter
and add and use a new parameter transport_retry_count instead for
the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0689c54aa1c96ee99d24236e8ff1a594ad7208e4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9924
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We can't call spdk_io_channel_iter_get_channel() in the
completion callback of spdk_for_each_channel() because the value
is always NULL.
Change-Id: I65bc972da8a7ab309f3cab438432196a59f26bd4
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9983
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The SPDK nvmf target reports all listeners on all subsystems
in discovery pages, while the kernel target reports only subsystems
listening on the port where the discovery command is received.
The NVMe-oF specification allows specifying any addresses/
transport types. Ch 5: The set of Discovery Log entries should
include all applicable addresses on the same fabric as the
Discovery Service and may include addresses on other fabrics.
To align the behaviour of the SPDK and kernel targets, add filtering
rules to allow flexible configuration of what should be
listed in discovery log page entries.
Fixes #2082
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie981edebb29206793d3310940034dcbb22c52441
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9185
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
You can now detach specific paths based on the host parameters. This is
useful for two paths to the same target that use different local NICs.
Change-Id: I4858bfda7d940052ca77ffb0bbe764a688fb315d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9827
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
This patch solves the issue of two nvmf targets connecting to the same
rbd image for multipath.
The scenario is that a host wants to access the same rbd image via two
gateways. The host and gateways work as nvmf initiator and target, both
gateways connect to the rbd image, and I/O can switch to the other gateway
once one is broken. The targets of multipath must have the same uuid, so
this patch adds a new argument to bdev_rbd_create, as for the malloc bdev.
Signed-off-by: tanlong <948985618@qq.com>
Change-Id: I593fedb6c5d94f625f1b331fdc40e2db488f7fb7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9935
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Use dedicated OCF API functions to set default parameters during
startup configuration instead of manual and incomplete
initialization.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Change-Id: Ied1afa9c249a032451a266fd97ce09e6088a0f97
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9786
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The previous patch supported I/O retry when no available io_path
was found at submission.
This patch supports I/O retry when we get I/O path error at completion.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I93a1664944b15ab0a826a321e2ea7a2574263afe
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9850
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If ANA state is inaccessible or qpair is disconnected, I/O cannot
be submitted.
But if qpair is connected, ANA state may become accessible, or if
qpair is disconnected, it may become connected via resetting.
Hence even if find_io_path() returned NULL, queue I/O and retry it
one second later if qpair is connected or ctrlr is resetting.
Sort retried I/Os by expiration values in ticks, and activate a timed
poller per nvme_bdev_channel only if there is any retried I/O. So
the poller function bdev_nvme_retry_ios() always returns BUSY because
if the poller runs earlier than the closest retried I/O or runs when
there is no retried I/O, it is more like a bug of the framework.
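Keeping the retry queue sorted by expiration can be sketched like this,
assuming a TAILQ as the container. struct retry_io, retry_list_insert()
and retry_list_pop_due() are hypothetical names and the poller itself
is omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical queued I/O with its retry deadline in ticks. */
struct retry_io {
    uint64_t retry_ticks;
    TAILQ_ENTRY(retry_io) link;
};

TAILQ_HEAD(retry_io_list, retry_io);

/* Insert io so that the list stays sorted by ascending retry_ticks. */
void retry_list_insert(struct retry_io_list *list, struct retry_io *io)
{
    struct retry_io *tmp;

    assert(list != NULL && io != NULL);

    /* Walk from the tail because a new entry usually expires last. */
    TAILQ_FOREACH_REVERSE(tmp, list, retry_io_list, link) {
        if (tmp->retry_ticks <= io->retry_ticks) {
            TAILQ_INSERT_AFTER(list, tmp, io, link);
            return;
        }
    }
    TAILQ_INSERT_HEAD(list, io, link);
}

/* Remove and return the earliest I/O if it is due at 'now', else NULL. */
struct retry_io *retry_list_pop_due(struct retry_io_list *list, uint64_t now)
{
    struct retry_io *io;

    assert(list != NULL);
    io = TAILQ_FIRST(list);
    if (io == NULL || io->retry_ticks > now) {
        return NULL;
    }
    TAILQ_REMOVE(list, io, link);
    return io;
}
```

Because the head is always the closest deadline, a timed poller only
needs to look at the first entry to decide whether anything is due.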
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id28110a0d63ebc1c5772814e2ff8a47934df1644
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9830
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
- remove metadata updater
- handle 'zero' flag in mempool allocator
- adapt ocf_mngt_cache_start() to new OCF API
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Change-Id: I34afd856cc1306ffe305f71a445e7474c9b0a2d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
If NN is very large this saves a lot of memory. This lookup is
not generally used in the I/O path anyway.
Change-Id: I98e190006843ad5d0bac8483bf9feb800d4a665a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9884
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_find_io_path() selects an io_path whose qpair is connected
and ANA state is optimized or non-optimized.
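The selection rule can be sketched as below. The types and the function find_io_path() here are hypothetical simplifications, not the module's real definitions. Preferring an optimized path over a non-optimized one is an assumption of this sketch; the message above only states that both states are eligible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified ANA states; only the ones relevant to the sketch. */
enum ana_state {
	ANA_OPTIMIZED,
	ANA_NON_OPTIMIZED,
	ANA_INACCESSIBLE,
};

/* Hypothetical stand-in for an I/O path. */
struct io_path {
	bool qpair_connected;
	enum ana_state ana_state;
};

/* Return a usable path, or NULL if none is available.
 * A path is usable when its qpair is connected and its ANA state is
 * optimized or non-optimized. Assumption: optimized paths win. */
static struct io_path *
find_io_path(struct io_path *paths, size_t count)
{
	struct io_path *non_optimized = NULL;

	assert(paths != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (!paths[i].qpair_connected) {
			continue;
		}
		switch (paths[i].ana_state) {
		case ANA_OPTIMIZED:
			return &paths[i];
		case ANA_NON_OPTIMIZED:
			if (non_optimized == NULL) {
				non_optimized = &paths[i];
			}
			break;
		default:
			break;
		}
	}
	return non_optimized;
}
```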
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I79c978795562b606ee27aa43020684d8bcbf50c5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9405
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reset all controllers of a bdev controller sequentially. When resetting
a controller is completed, check whether there is a next controller and,
if so, start resetting it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I169a84b931c6b03b36bb971d73d5a05caabf8e65
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7274
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previously the NVMe bdev module had completed the outstanding reset and
then canceled pending resets. This was complex.
On the other hand, the generic bdev layer cancels pending resets
and then completes the outstanding reset.
Following the generic bdev layer simplifies the code and makes it easier
to control reset retries, delay a reset retry by a few seconds, or stop
retrying after repeated failures and then delete the ctrlr.
Update unit tests accordingly.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9a68422918ebcb052b3a281316ffba9b3450ecd4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9816
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previous patches (5363eb3c) tried to work around the
32-bit unmap and write_zeroes LBA counts by breaking
up larger operations into smaller chunks of at most
UINT32_MAX LBAs.
But some SSDs may just ignore unmap operations that
are not aligned to full physical block boundaries -
and a UINT32_MAX lba unmap on a 512B logical /
4KiB physical SSD would not be aligned. If the SSD
decided to ignore the unmap/deallocate (which it is
allowed to do according to NVMe spec), we could end
up with not unmapping *any* blocks. Probably SSDs
should always be trying hard to unmap as many
blocks as possible, but let's not try to depend on
that in blobstore.
So one option would be to break them into chunks
close to UINT32_MAX which are still aligned to
4KiB boundaries. But the better fix is to just
change the unmap and write_zeroes APIs to take
64-bit arguments, and then we can avoid the
chunking altogether.
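The alignment problem can be checked with a little arithmetic. The sketch below assumes the 512B logical / 4KiB physical layout mentioned above; align_down_lbas() is a hypothetical helper showing the rejected "aligned chunk" option, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round lba_count down to a multiple of lbas_per_phys_block. */
static uint64_t
align_down_lbas(uint64_t lba_count, uint64_t lbas_per_phys_block)
{
	assert(lbas_per_phys_block != 0);
	return lba_count - (lba_count % lbas_per_phys_block);
}

/* With 512B logical and 4KiB physical blocks there are 4096 / 512 = 8
 * LBAs per physical block. UINT32_MAX is 4294967295, and
 * 4294967295 % 8 == 7, so a chunk of UINT32_MAX LBAs ends 7 LBAs past
 * a physical block boundary. The largest aligned chunk that still fits
 * in 32 bits is 4294967288 LBAs. */
```

With 64-bit LBA counts in the API, no such rounding is needed because the whole range can be passed in a single call.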
Fixes issue #2190.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I23998e493a764d466927c3520c7a8c7f943000a6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9737
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Restore the previous nice idea and unify canceling pending resets
into bdev_nvme_complete_pending_resets().
cb_arg of spdk_for_each_channel() was reserved for a different
purpose but it was gone.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1c01051ae9d8eccc7fc3ec485dd344ff9f55087b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9815
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
All setters of reset_cb_fn eventually use a boolean as the result.
Using boolean as the result earlier makes the code simpler and the
following patches easier.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9ec4f670ed7a000a824ab7e261bab07a3dc21f43
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9814
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
bdev_nvme_admin_passthru() chooses the first ctrlr which is not failed.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If41a1d1e1bde4bddfa92e5a385509daa3f0ce4de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9525
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
bdev_nvme_admin_passthru(), bdev_nvme_reset_io(), and
bdev_nvme_abort() do not use io_path. So simply fall through even
if the optimal io_path is not found for these, and clear the
cached io_path for these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib26fcbf99c95bbfb6e825c1b7c6455241c198d92
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9675
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Even if the NVMe bdev module supports I/O retry, it will not retry reset or abort.
For clarification, bdev_nvme_reset_io() and bdev_nvme_abort() call
bdev_nvme_io_complete() themselves for error cases.
For bdev_nvme_abort(), we do not need to differentiate error processing
among return values. Simply complete the request with failure.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id5d51cbba47c6360a6177efd7d5f2e978c48ee9b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9674
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Even when I/O retry is supported, reset will not be retried. However,
bdev_nvme_io_complete() will process I/O retry. Hence inline
bdev_nvme_io_complete() into bdev_nvme_reset_io_complete() to exclude
reset from I/O retry. The result of reset is success or failure, so omit
the -ENOMEM case.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I667e74cbbac4a13cefb6896f898476ba48bcd0fa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9687
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Save the io_path to which the current I/O is submitted into struct
nvme_bdev_io to know which io_path caused I/O error, or to reuse it
to avoid repeated traversal.
Besides, add a simple helper function nvme_io_path_is_available() to
check if the io_path can be reused.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Idadd3a3188d9b333b335ea79e20a244aa0ce0bdd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9296
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We have the io_path structure now, and returning io_path rather than
ns and qpair matches the function name. The following patches will
cache the returned io_path into nvme_bdev_io.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I5d773da18591fc324667f6b5c489a38f497bf3d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9295
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch removes the critical limitation that ctrlrs which are
aggregated need to have no namespace. After this patch, we can
add multiple namespaces into a single nvme_bdev.
Such namespaces must satisfy the following conditions:
- they are in the same NVM subsystem,
- they are in different ctrlrs,
- they are identical.
Additionally, if we add one or more namespaces to an existing
nvme_bdev and there are active nvme_bdev_channels, the corresponding
I/O paths are added to these nvme_bdev_channels.
Even after this patch, ANA state is not utilized in I/O paths yet.
It will be done in the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I15db35451e640d4beb99b138a4762243bee0d0f4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8131
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Pointer 'ctx->req.multipath' returned from call to
function 'strdup' may be NULL. Reported by Klocwork.
Change-Id: Id4a188ec5340f02c9bd0643db0acb03409dd5829
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9843
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
_find_optimal_core was always consolidating idle threads to g_main_lcore.
Meanwhile, for active threads, lower lcore ids were prioritized over
higher ones.
As long as g_main_lcore can fit the active thread, it should be prioritized
over any other core, regardless of the lcore id.
Fix #2080
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I30b3a7353bcf243d4362b4db9dde5446c435d675
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9243
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
It can be much simpler when inlined.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ifad89d7b9557a45eee601fc2004066a0fd9289c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9826
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Specifying only a transport id is not enough. We need to be able to
describe the host parameters too.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Iadbea553aee4b38e7cacab0b486e7e5746d0d1ab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9825
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This is the currently active path identifier in a failover scenario. The
path is defined by more than just the transport identifier, so fix the
name.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I682c6f4c54f75307e2615bf80e70358180d99fe2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9576
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This defines a unique path between a host and a target.
Change-Id: Ia3d24c1b34199a8b596aaf17900ca9694a9da77d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9505
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The new path must differ in some way.
Change-Id: I98fd2b2cb3220b482efe0a19bfce94ec4e72bec2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9418
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This parameter may have the values "disable" or "failover". The default
is failover to match existing behavior. In the future we expect to
change the default behavior to disable.
Further, we expect to add an "enable" option soon to do full multipath.
Change-Id: Iebbdc9b95f23101f18d64e085933463498e627be
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9343
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We know that librbd's read/write operations will be handled by a
non-SPDK thread, so we can get rid of the epoll-based group
polling and directly use the async completion. This makes the code
simpler and easier to maintain.
We still need to keep the io_device registration for this module,
because the I/Os are async. We need the channel reference on "rbd_if"
in order to know which rbd disks are still active.
Change-Id: I1c140a4b286dbfe113ed2a67bd2875de605e8f24
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9335
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Previously, a regular round robin could result in strange performance
if an idxd device from another socket was used.
Signed-off-by: peluse <peluse@localhost.localdomain>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Id863c79067beabe73ef89d92b3fb3c436821b97a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9367
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Call rte_cryptodev_close() to free qpair memory instead of using
an internal function.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I1bd7f0dd86de83f278f6be3263cdf3fbd8e1c77f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9720
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
There's no need to keep polling with iscsi_bdev_conn_poll() when the request
list is empty, as new requests already restart the poller when needed.
Signed-off-by: John Levon <john.levon@nutanix.com>
Change-Id: Iec5553f8c14d32e894989c9f7a448b6817a821b1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9375
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It's not providing a lot of value, while being pretty problematic, as
the read is blocking and cannot be easily changed to be non-blocking, as
dump_info_json is a synchronous interface.
Now that dump_info_json isn't using any synchronous interfaces from the
NVMe driver, we can send a bdev_get_bdevs call in the async_init.sh
test to verify that.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ida31c8d1000a52b0782f698afe46b071ed4e41df
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9488
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch replaces the synchronous `spdk_nvme_detach()` calls with their
asynchronous counterparts in the controller unregister path.
An additional poller is introduced to periodically poll the NVMe driver
for detach completion. Once the detach is completed, the poller is
unregistered and the nvme_ctrlr is destroyed. The poller uses the same
period (1ms) as the async probe poller.
Since reset and detach cannot happen at the same time, reset_poller was
renamed to reset_detach_poller and it can now store the pointer either
to the reset or detach poller, depending on the circumstances.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I5eb2dd6383d98d25d1f9748af08c1a13d18acb0e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8729
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This is done in preparation for using the non-blocking versions of the
spdk_nvme_detach API, which will delay the controller's deletion until the
detach is completed asynchronously.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ia785408c9a94427e60bf239e6036a5e89d589f61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8727
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
It can match by any provided parameter to remove paths.
Change-Id: I5e7a87342bbb90943dc97fb52f142814fcf0acfa
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9453
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Instead of storing an spdk_nvme_transport_id, store the object that
contains it. This will make a few later patches easier.
Change-Id: I36b74889fe39af3b7ab2b900fb3ea4b3f39e1f83
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9484
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Rename it to _is_core_at_limit(). This function
currently returns true if the core is at the limit
(instead of over the limit) which is really the semantics
that we want - so just change the name of the function
to make it more precise.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idf815f67c71463c3b98bc00211aafdc291abdbd2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9582
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We have _is_core_over_limit() which determines if a core is
currently over its busy:total tsc ratio. We use this to determine
if we need to move threads off of a core that is too busy.
But when we pick a core to move a thread *to* we were allowing the
dst core to fill to 100%, rather than the SCHEDULER_CORE_LIMIT.
This patch fixes that, which has the nice effect of keeping
thread-to-core assignments much more stable when running
I/O workloads.
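The destination check can be sketched as below. All names are hypothetical, and the value 80 for SCHEDULER_CORE_LIMIT is an assumption of this sketch; the message above does not state it. The idea is that a thread only fits if the destination core's projected busy percentage stays below the limit, not merely below 100%.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value in percent; the commit message does not state it. */
#define SCHEDULER_CORE_LIMIT 80

/* Hypothetical per-core counters for one scheduling period. */
struct core_stats {
	uint64_t busy_tsc;
	uint64_t idle_tsc;
};

/* Busy share as an integer percentage (0 when no cycles were recorded). */
static uint8_t
busy_pct(uint64_t busy_tsc, uint64_t idle_tsc)
{
	uint64_t total = busy_tsc + idle_tsc;

	return total == 0 ? 0 : (uint8_t)(busy_tsc * 100 / total);
}

/* True if moving a thread with thread_busy_tsc busy cycles onto dst
 * keeps dst below SCHEDULER_CORE_LIMIT. */
static bool
dst_core_can_fit(const struct core_stats *dst, uint64_t thread_busy_tsc)
{
	assert(dst != NULL);
	if (thread_busy_tsc > dst->idle_tsc) {
		/* Not even enough idle cycles to absorb the thread. */
		return false;
	}
	return busy_pct(dst->busy_tsc + thread_busy_tsc,
			dst->idle_tsc - thread_busy_tsc) < SCHEDULER_CORE_LIMIT;
}
```

For a core that is 50% busy over 1000 cycles, a thread with 200 busy cycles fits (70%), while one with 300 busy cycles does not (80%, which is at the limit).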
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id98b08803939d2a25104082e6436bb8d4727d7c2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9578
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This will lead the scheduler to be quicker to move
threads to an unused core - favoring performance over
power savings.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ibaa5edc61a4bdca5550bd23a562c3645fded25e9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9551
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
If a core has a very high busy percentage, we should
not assume that moving a thread will gain that
thread's busy tsc as newly idle cycles for the
current core.
So if the current core's percentage is above
SCHEDULER_CORE_BUSY (95%), do not adjust the
current core's busy/idle tsc when moving a thread
off of it. If moving the thread does actually
result in some newly idle tsc, it will get adjusted
next scheduling period.
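A minimal sketch of this rule, using hypothetical names (struct core_stats, account_thread_moved_off()) rather than the scheduler's real ones. Only the 95% threshold comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCHEDULER_CORE_BUSY 95	/* percent, as stated in the message above */

/* Hypothetical per-core counters for one scheduling period. */
struct core_stats {
	uint64_t busy_tsc;
	uint64_t idle_tsc;
};

/* Busy share as an integer percentage (0 when no cycles were recorded). */
static uint8_t
core_busy_pct(const struct core_stats *core)
{
	uint64_t total = core->busy_tsc + core->idle_tsc;

	return total == 0 ? 0 : (uint8_t)(core->busy_tsc * 100 / total);
}

/* Update the source core's counters after a thread is moved off it.
 * If the core is above SCHEDULER_CORE_BUSY, leave the counters alone:
 * the freed cycles will most likely be consumed by the remaining
 * threads rather than become idle time. */
static void
account_thread_moved_off(struct core_stats *src, uint64_t thread_busy_tsc)
{
	uint64_t moved;

	assert(src != NULL);
	if (core_busy_pct(src) > SCHEDULER_CORE_BUSY) {
		return;
	}
	/* Never move more cycles than the core actually spent busy. */
	moved = thread_busy_tsc < src->busy_tsc ? thread_busy_tsc : src->busy_tsc;
	src->busy_tsc -= moved;
	src->idle_tsc += moved;
}
```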
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I26a0282cd8f8e821809289b80c979cf94335353d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9581
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For the src thread, add the busy_tsc of the thread
we are moving to the idle_tsc of the current core.
This is consistent with how we are accounting for the
cycles in the target core too.
We will disable the load_balancing.sh script for now.
We will reenable it later in this patch set once
a few other changes are made, along with some updates
to the load_balancing.sh script based on the changes
made in this patch set.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8af82610804e97dabf62ccd90f75a0e6e37d276f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9550
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This will be useful in some upcoming patches where we will
be calculating these percentages in more places.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If7d84c00fe1b666988fe06537836ba7b9cb161aa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9580
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Purpose: It is better to put this variable into the bdev_rbd_io structure
instead of on the stack. We will use this variable in the next patch
when the callback function is completed, so it must not live on the stack.
Change-Id: I11ff46ef07908084012bc1ce040eceb667334a40
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9334
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The purpose is that we will remove the group reaping of
rbd_io later, so we need to know the original thread info of the
rbd_io.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I69f60261447fdac0b0885fdb213e92c246439047
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9585
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Allow returning more than one memory domain.
This change aligns bdev and nvme API and provides
more flexibility for custom transports.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ica9b12ad8463c361be6cb62ee2c0513eec0b486d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Enable dump of transport stats in functional test.
Update unit tests to support the new statistics.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815aeea7d07bd33a915f19537d60611ba7101361
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8885
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch enables us to aggregate multiple ctrlrs in the same NVM
subsystem into a single bdev ctrlr to create multipath.
This patch has a critical limitation that ctrlrs which are aggregated
need to have no namespace. Hence no nvme bdev is created.
However, this limitation will be removed in the next patch.
The design is as follows.
A nvme_bdev_ctrlr is created to aggregate multiple nvme_ctrlrs in
the same NVM subsystem. The name of the nvme_ctrlr is changed to be
the name of the nvme_bdev_ctrlr.
The NVMe bdev module has both the failover feature and the multipath
feature now. To choose which of failover or multipath to use, add a new
parameter multipath to the RPC bdev_nvme_attach_controller.
When we attach a new trid to the existing nvme_bdev_ctrlr, we use the failover
feature if multipath is false, and the multipath feature if multipath is
true.
nvme_bdev_ctrlr has a list for nvme_ctrlr and it is guarded by the
global mutex. Callers can query nvme_ctrlrs from a nvme_bdev_ctrlr via
trid as a key. nvme_bdev_ctrlr is not registered as io_device.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I20571bf89a65d53a00fb77236ad1b193e88b8153
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8119
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously, if an I/O qpair was disconnected, we tried reconnecting
the qpair. However, this reconnect operation was very likely to fail
and does not match the upcoming asynchronous connect/reconnect
operation. We would need an extra callback to make this reconnect
operation asynchronous, but we do not want to add one.
Hence if an I/O qpair is disconnected, we free the I/O qpair and then
reset the corresponding nvme_ctrlr immediately. If the admin qpair is
also disconnected, the nvme_ctrlr is reset immediately. However this
event may never happen. So we do not wait for the error of the admin
qpair.
The NVMf host may disconnect connections by itself intentionally.
In this case, resetting the nvme_ctrlr will surely fail. But resetting
the nvme_ctrlr frees all I/O qpairs of the nvme_ctrlr and these I/O
qpairs are not created again until resetting the nvme_ctrlr succeeds.
Resetting the nvme_ctrlr at most once is more efficient than repeatedly
reconnecting the I/O qpair. So this change is valuable even for such
intentional disconnection. However, it is helpful to know when an
I/O qpair is disconnected. Hence change DEBUGLOG to NOTICELOG in the
disconnected callback. The disconnected callback is not repeated, and
we do not need to worry about NOTICELOG flooding.
Refine the unit test case to verify this change.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I376b749c2f55d010692bf916370e8bb4249b795f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9515
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is similar to how we name other module library
directories.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iadaf59231323180b48b5d0cf2e6acb3d8bfc9807
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9549
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Options modified on the sock after connecting are also moved to a
function.
Signed-off-by: Liu Qing <winglq@gmail.com>
Change-Id: I4c2a9ae9858c102764959d055bed208b4b0621d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9516
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This reverts commit 2cd948c4a6.
This commit caused a drop in performance tests.
More info in issue https://github.com/spdk/spdk/issues/2158
Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Change-Id: Id5d353535323c79e773e33377af388dae47238cb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9510
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Poll group has a list of the associated ctrlr channels to update
them when the corresponding I/O qpair is disconnected to do path
failover.
Another idea is that poll group has a list of the associated bdev
channels. However, two or more bdev channels may share a single
ctrlr channel and I/O qpair is per ctrlr channel. What we want to do
by this addition is to stop I/O submission to any failed I/O qpair
and choose alternative I/O qpair. Hence we take the first idea.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia287bd1b803313e66b8505a19694a40133b675f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8124
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously nvme_ctrlr_channel had a pointer to a nvme_ctrlr and used it
throughout. However, the use cases are not performance critical, so it
is better to convert from nvme_ctrlr_channel to nvme_ctrlr by
spdk_io_channel_get_ctx() and spdk_io_channel_get_io_device().
Provide a convenient macro nvme_ctrlr_channel_get_ctrlr() to do it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie22e8c318af88e09b95824c67b3e874d85425f1a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9195
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_find_admin_path() is used in only one place, and its role
is to find a non-failed ctrlr even after multipath is supported.
Inlining it into bdev_nvme_admin_passthru() is better.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If58507c49b43d047e1f3ef25bbdfb571c36a1956
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9194
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Previously bdev_nvme_reset() returned -EBUSY if ctrlr is being
destructed and returned -EAGAIN if ctrlr is being reset.
These did not match what spdk_nvme_ctrlr_reset() returned.
The reset operation will become more important than it is now once
multipath is supported and the reset operation is made asynchronous.
Hence change bdev_nvme_reset() to follow spdk_nvme_ctrlr_reset().
bdev_nvme_reset() returns -ENXIO if ctrlr is being destructed and
returns -EBUSY if ctrlr is being reset.
Additionally change the return value of bdev_nvme_failover()
accordingly. After the change bdev_nvme_failover() returns -ENXIO
if being destructed and returns -EBUSY if ctrlr is being reset.
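The return-value convention described above can be sketched as follows.
This is a simplified illustration under stated assumptions, not the
actual bdev_nvme code: struct mock_ctrlr, its two flags, and
mock_bdev_nvme_reset() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller state; not the real nvme_ctrlr. */
struct mock_ctrlr {
	bool destruct;   /* controller is being destructed */
	bool resetting;  /* a reset is already in progress */
};

/*
 * Follows the convention from the commit message:
 * -ENXIO while being destructed, -EBUSY while a reset is in progress.
 */
static int
mock_bdev_nvme_reset(struct mock_ctrlr *ctrlr)
{
	assert(ctrlr != NULL);

	if (ctrlr->destruct) {
		return -ENXIO;
	}
	if (ctrlr->resetting) {
		return -EBUSY;
	}

	/* Mark the reset as started; a real reset would complete later. */
	ctrlr->resetting = true;
	return 0;
}
```

A caller can then tell "the controller is going away" (-ENXIO) apart
from "a reset is already in progress" (-EBUSY).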
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie2c6f8601050b1043d83de9cf01490751784e4e5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8859
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Following the last patch, include hostid into ctrlr_opts rather than
passing it as a parameter for bdev_nvme_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0d04db1c5767ec76a9a7cd255c3a8d56b0b8f583
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9344
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
nvme_ctrlr_populate_namespace_done() needs nvme_ns, probe_ctx, and completion
status.
When multipath is supported, nvme_ctrlr_populate_namespace_done() will
be called from the spdk_for_each_channel() callback. The spdk_for_each_channel()
callback can have one context and one completion status.
Hence either nvme_ns or probe_ctx needs to hold a pointer to the other.
If an nvme_ctrlr has multiple namespaces, a probe_ctx is shared by
multiple nvme_ns.
So let nvme_ns have probe_ctx.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Iedaaab80616d34d01935f4ebc31e1f3b84e09e32
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9047
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
For multipath, when a new nvme_ns is added to an existing nvme_bdev,
we need to add the corresponding qpair to all existing nvme_bdev_channels
dynamically. The RPC bdev_nvme_attach_controller has to return after that.
Hence populating namespace needs to be asynchronous.
On the other hand, when an nvme_ns is removed from an existing nvme_bdev,
we need to remove the corresponding qpair from all existing nvme_bdev_channels
dynamically. Hence depopulating namespace needs to be asynchronous.
By the recent refactoring, two callbacks, nvme_ctrlr_populate_namespace_done()
and nvme_ctrlr_depopulate_namespace_done() were removed accidentally.
We need to restore these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I37ea2420588cef3a18648dec053f8bd2b884e86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9392
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
A few functions can be made private in bdev_nvme.c. Hence delete
these function declarations from bdev_nvme.h.
Besides, bdev_nvme_remove_trid() exists only as a function declaration.
Hence delete it as well in this patch.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7bf9ae6da332c6426f6eb40926e7c4130d9acd8a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9444
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
An error might occur after successful transport creation
when the new transport is added to nvmf poll groups, e.g.
in nvmf_transport_poll_group_create. In that case the
transport is not destroyed and poll groups are not fully
functional. To correct this behaviour, destroy transport if
spdk_nvmf_tgt_add_transport fails. Also update nvmf_tgt
initialization step to check that all poll groups were
created.
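The error path described above can be sketched roughly as below. This
is only an illustration: struct mock_transport and both functions are
hypothetical, and add_rc merely simulates the result of adding the
transport to the target's poll groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport object. */
struct mock_transport {
	bool destroyed;
};

static void
mock_transport_destroy(struct mock_transport *t)
{
	t->destroyed = true;
}

/*
 * add_rc simulates the result of adding the transport to the poll
 * groups (0 on success, negative errno on failure).
 */
static int
mock_create_and_add_transport(struct mock_transport *t, int add_rc)
{
	assert(t != NULL);
	t->destroyed = false;

	if (add_rc != 0) {
		/* Do not leave a half-registered transport behind. */
		mock_transport_destroy(t);
		return add_rc;
	}
	return 0;
}
```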
Change-Id: I116e6944729d846c1755c2844c77825f65db8c12
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9255
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This only existed to share code between OCSSD and regular NVM
namespaces. Now OCSSD is gone, so just merge the files into bdev_nvme.
Change-Id: Idb73cc05d67144de5dd20af8db24c8f6974d10a7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9337
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
nvme_ctrlr_populate_namespaces
Avoid relying on this number. Different targets have interpreted its
meaning in different ways and it cannot be used anymore in practice. It
may also be very, very large.
Change-Id: I94e8eae49d6ccdbd8be302b30a120d89242b6d39
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9316
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Try to use these accessors instead of directly using the namespaces
array. This will make changing the data structure easier later on.
Change-Id: I3367d0e0065894f3aa199ed1698d27976b4cbbb5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9315
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If the number of namespaces is very large, this can cause excessive
memory allocation. This is especially true because when the number of
namespaces is large, it is almost always very sparsely populated.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I27d94956c222ae3c49c6a7422164ae3a8ec8d963
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9302
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Do simple validity checks first, then check for duplicate controllers.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ic21d64574b37a3d9148e5cd6d5a7599449ea8fe1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9341
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
It is more flexible now, as it is possible to get the nvme ns handle to
do additional management or queries. If the nvme ctrlr handle is needed,
there is already a public nvme API for that with the nvme ns as an input.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I5493168ad31cc95687962288d57fb5457f2d7dd6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9357
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When preparing for a reset, use this new call to tell
the driver to avoid sending DELETE_CQ/SQ commands to a
PCIe controller when they aren't needed.
Fixes issue #2073.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9ebb7d5c3f7cbb1c3192f162f32edbbea41acde1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9250
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Matt Dumm <matt.dumm@hpe.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Added writing out JSON configuration for the scheduler
subsystem.
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I51b1f94b3f56d0bfb8a87127163c8e248d6846b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7119
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch moves scheduler and governor related API from
the internal event.h to public scheduler.h.
With this it is possible to create a subsystem responsible
for handling the schedulers.
Three schedulers and a governor were moved to scheduler modules
from event framework.
This will allow the next patch to add JSON RPC configuration
to the whole subsystem, and it makes adding other schedulers
easier.
Removed debug logs from gscheduler, as they serve little purpose.
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I98ca1ea4fb281beb71941656444267842a8875b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6995
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When we rotate the socket in the list, we did not check whether the uc pointer
is NULL, which causes a coredump when the uc pointer is used.
When we add a new socket (with a pollin event) to the pending_recv list,
we add it at the end of the list. This delays handling of the newly
added socket and can cause a timeout, especially when the socket
is in its connection phase.
So the purpose of this patch is:
1 Revise the rotation logic to handle two cases, i.e., (1) sock is at the
beginning of the list; (2) sock is at the end of the list. The purpose is
to avoid NULL pointer access and to handle the exceptional case efficiently.
2 When there is a new pollin event for a socket, add the socket at the beginning
of the list. This avoids starvation of the new socket.
The max poll event num from the upper layer is 32, so if we always put the new socket
at the end of the list, starvation will occur if there are many socket connection events.
If we add the new socket at the end of the pending list, we always handle the
existing socks first, and the later-arriving socket (with its pollin event) is always
handled late. The sock connection initialization phase then takes a relatively
long time, and the upper layer connection based on this socket times out, e.g.,
ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:
uuid:af56cce7-2008-408c-a8a0-9c710857febf from subsystem nqn.2019-02.io.spdk:cnode0 due to
keep alive timeout.
[2021-08-25 20:13:42.201139] ctrlr.c: 579:_nvmf_ctrlr_add_io_qpair:
*ERROR*: Unknown controller ID 0x1
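Point 2 above (queue a socket with a new pollin event at the head of the
list) can be illustrated with the TAILQ macros from <sys/queue.h>. This
is a minimal sketch, not the actual sock module code: struct mock_sock,
mock_pending_list and mock_pending_recv_add() are hypothetical names,
and the rotation logic from point 1 is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a socket tracked in the pending_recv list. */
struct mock_sock {
	int id;
	TAILQ_ENTRY(mock_sock) link;
};

TAILQ_HEAD(mock_pending_list, mock_sock);

/*
 * Queue a socket that just received a pollin event at the head of the
 * list, so it is handled before sockets that were already pending.
 */
static void
mock_pending_recv_add(struct mock_pending_list *list, struct mock_sock *sock)
{
	assert(list != NULL && sock != NULL);
	TAILQ_INSERT_HEAD(list, sock, link);
}
```

With tail insertion, a newly connected socket would sit behind every
socket already pending; with head insertion it is the first one seen by
the next pass over the list.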
Fixes #2097
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I171b83ffd800539e86660c7607538e120fbe1a91
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9223
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There's only one type now.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I8fbf330797e772b1c45a04970c95bf4894c26639
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9348
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
There's only one kind of namespace now that OCSSD is gone, so simplify
the whole path.
Change-Id: I721de11c3e7be8b3a13ada25b6d6163a67c6659f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
As far as we're aware, this is not in use by anyone. OCSSD has largely
been replaced by ZNS and no OCSSD drives made it to the market.
Change-Id: I020ee277da5292f8c4777f224acafd87586f8238
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9328
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The values returned weren't actually used, and instead
made the code unclear, especially in the error cases.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iee2661545a6831f6151a183d4ac723c835e84d13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9349
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Shouldn't really ever happen, but if somehow things get out of whack
and we get a busy back from the low level lib, we don't want to
increment our flow control counter.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I49176e8ad2bf6f658ac970efea5db89eebc747f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9232
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
bdev_nvme_create() is called only by a single caller and hostnqn is
just copied to ctrlr_opts even if it is passed separately.
Hence include hostnqn into ctrlr_opts rather than passing it as a
parameter for bdev_nvme_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I75b640bcecefa94950b0c19936fab0571c428125
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9332
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The NVMe bdev module will have two similar features, multipath and
failover, once it supports multipath.
Take as an example the case where we add two different trids with the
same name by the bdev_nvme_attach_controller RPC.
The failover feature adds a secondary trid to an existing nvme_ctrlr.
The multipath feature creates another nvme_ctrlr and adds it to the same
nvme_bdev_ctrlr which has an existing nvme_ctrlr.
We want to use bdev_nvme_attach_controller for both failover and multipath.
To do it cleanly, use separate callbacks to spdk_nvme_connect_async() for
creating a ctrlr and for setting failover.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id9bc175af6201cdd74e12d4903fc81afe4f91189
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9225
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We currently get ext_opts by getting spdk_bdev_io from nvme_bdev_io in
bdev_nvme_readv() or bdev_nvme_writev(). But this is not aligned
with the current design pattern and is not efficient.
We pass contents of bdev_io via parameters to bdev_nvme_readv() or
bdev_nvme_writev().
Do the same for ext_opts.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8a43f31934a36fa2d43800ec8bf17916edfca477
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9292
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
When an abort command completes successfully,
cdw0 bit 0 may be set to indicate that the controller
was unable to abort the specified cid. In that case
the bdev nvme abort completion callback needs
to reset the controller.
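The check described above amounts to testing bit 0 of the completion's
cdw0. A minimal sketch, assuming the caller has already confirmed that
the abort command itself completed successfully; the helper name
abort_cpl_needs_reset() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Given cdw0 of a successfully completed abort command, report whether
 * the controller was unable to abort the specified cid (bit 0 set), in
 * which case the caller should fall back to resetting the controller.
 */
static bool
abort_cpl_needs_reset(uint32_t cdw0)
{
	return (cdw0 & 1u) != 0;
}
```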
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I42a5e96e19e113c38dec67c2d8575a11f39d41a9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9320
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Converted a log message that is emitted too many times
during a hot remove to debug level, as it was filling up the log file.
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Change-Id: I08f7ff7c36c6388270878291df8a1e83646e8aee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9258
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When a bdev is being unregistered, after all
channels have been closed, the bdev layer calls
the module's destruct callback for the bdev before
calling the bdev unregister callback.
For the rbd module, the destruct callback is
bdev_rbd_destruct. This callback unregisters the
rbd io_device which is an asynchronous operation.
We need to return >0 from bdev_rbd_destruct to
inform the bdev layer that this is an asynchronous
operation, so that it does not immediately call
the bdev unregister callback. Once the rbd io_device
is unregistered, we can call spdk_bdev_destruct_done()
which will trigger the bdev layer to finally call
the bdev unregister callback.
Without this fix, deleting an rbd bdev would
complete before the backing cluster reference
had been released. This meant that even
if you had deleted all rbd bdevs, there might still
be cluster references in place for a short period of
time. It's better to wait to complete the delete
operation until the cluster reference has been
released to avoid this issue (which this patch
now does).
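The asynchronous destruct flow described above can be sketched with
mock types. Everything named mock_* below is hypothetical;
mock_destruct_done() only stands in for the role that
spdk_bdev_destruct_done() plays in the bdev layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bdev and its teardown state. */
struct mock_bdev {
	bool io_device_unregistered;
	bool unregister_cb_called;
};

/* Stands in for notifying the bdev layer that destruct has finished. */
static void
mock_destruct_done(struct mock_bdev *bdev)
{
	assert(bdev->io_device_unregistered);
	bdev->unregister_cb_called = true;
}

/* Stands in for the io_device unregister completion callback. */
static void
mock_io_device_unregistered(struct mock_bdev *bdev)
{
	bdev->io_device_unregistered = true;
	mock_destruct_done(bdev);
}

/*
 * Destruct callback: a real module would start the io_device
 * unregistration here. Returning > 0 signals that the operation
 * completes asynchronously.
 */
static int
mock_destruct(struct mock_bdev *bdev)
{
	assert(bdev != NULL);
	return 1;
}
```

The unregister callback fires only after the io_device completion has
run, which is the ordering the fix relies on.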
Fixes issue #2069.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8ac156c89d3e235a95ef196308cc349e6078bfd7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9115
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
The bdev layer doesn't call the destruct callback until
all channels have been released, but because the channel
delete callback passes message to the main thread, we can
end up with a complicated race condition. Currently we
have a deferred_free code path to handle this race, but
we can handle this a bit more cleanly by doing the
construct operation on the main_td as well.
This also simplifies the next patch which will
asynchronously destruct the bdev to fix an RPC bug.
Here's the race:
1) first channel was created on thread A, so disk->main_td = thread A
2) second channel was created on thread B
3) first channel is freed (but disk->main_td is still thread A)
4) spdk_bdev_unregister is called on thread C
5) bdev layer gives callback on thread B to upper layer
6) upper layer on thread B frees channel
7) bdev_rbd_destroy_cb runs on thread B and has to send msg to thread A
for processing
8) bdev layer calls bdev_rbd_destruct on thread C (since step #4 was on
thread C)
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I25ede2dc56e24dac0919aed05b9def2560823ee7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9158
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Klocwork reported that there is a dereference
of pointer 'lvol' before its NULL check.
Change-Id: I83a026e8762d1e004cf1a351585f48fb1e24ae41
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9208
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch makes use of async_fini_start flag to
make fini_start asynchronous.
During this time all lvol stores which have no open
lvols are unloaded. This is required, since lvs
holds claim on the underlying bdev.
Fixes #1630
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: If443cb087324d08a4a70df71c7afd930ab654f90
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9095
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The callback for bdev modules is called 'module_fini', but
after its execution bdev modules were expected to call
'spdk_bdev_module_finish_done()'.
This function name does not match the callback name, so it was
deprecated and replaced with 'spdk_bdev_module_fini_done()'.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I9a12dff746ea8b4b1570a3794470f7b24e29003e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9148
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
fini_start() is called for each bdev module before
iterating over all unclaimed bdevs to unregister them.
This allows bdev modules to behave differently during
each such unregister. Ex. unloading lvol store when
all lvol bdevs on it are unregistered.
Another use of this callback is to unclaim all bdevs
that can be unclaimed at that point, especially ones that will
not receive a callback because no bdev is registered.
Ex. offline raid bdev, when some underlying bdevs are missing.
fini_start() being synchronous does not help in cases
where an asynchronous operation is required to release the claim
on the bdev. Ex. lvol store with no bdevs present requires
async lvs unload to be called.
This patch adds async_fini_start flag for the bdev modules,
to be used when async fini_start is required. When done,
bdev module has to call spdk_bdev_module_finish_start_done().
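One way to picture the flag is the sketch below. It is not the bdev
layer's implementation: struct mock_module and the mock_* functions are
hypothetical, and mock_fini_start_done() only stands in for the role of
spdk_bdev_module_finish_start_done().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bdev module descriptor. */
struct mock_module {
	void (*fini_start)(struct mock_module *m);
	bool async_fini_start;    /* module finishes fini_start later */
	bool fini_start_pending;  /* framework is waiting for "done" */
};

/* Example callback that starts some work and returns immediately. */
static void
mock_noop_fini_start(struct mock_module *m)
{
	(void)m;
}

/*
 * Framework side: call fini_start and report whether the caller may
 * move on to the next module right away.
 */
static bool
mock_run_fini_start(struct mock_module *m)
{
	assert(m != NULL);

	if (m->fini_start == NULL) {
		return true;
	}
	m->fini_start_pending = m->async_fini_start;
	m->fini_start(m);
	return !m->fini_start_pending;
}

/* Module side: signal that the asynchronous fini_start work is done. */
static void
mock_fini_start_done(struct mock_module *m)
{
	assert(m->async_fini_start && m->fini_start_pending);
	m->fini_start_pending = false;
}
```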
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I63438b325d4cc53fd236bf9ff143abf6bdd81c49
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9094
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When removing a socket from a sock_group, the recv_task should be
cancelled last, because it can be sent out while cancelling other tasks
(if POLLERR is received). Otherwise, we could end up with outstanding
recv requests from a socket removed from a group.
Fixes #2112.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ic8e24c210541390dd8bdffe8d3bc4e7dd746d4b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9239
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Once bdev finish starts, bdev unregister is called on all
unclaimed bdevs. This means that for lvs with at least one
lvol present, there will be a corresponding bdev unregister.
Yet the vbdev_lvol module does not attempt to unload the lvs
once the last lvol from that lvs is unregistered, leaving
the base bdev for lvs claimed.
This patch fixes that by using fini_start callback from
bdev_module to mark when shutdown begins. After that, the
last lvol unregistered on an lvs will unload it.
Expanded struct lvol_bdev to contain lvol_store_bdev.
Closing the lvol will free spdk_lvol, so lvol->lvol_store cannot
be accessed.
Changed ut_lvol_destroy UT to ut_bdev_finish. Previous UT didn't
really test vbdev_lvol_destroy, but 'hotremove' of the lvol bdev.
In effect there is no hotremove of the lvol bdevs (only lvs bdev).
spdk_bdev_unregister() can only be called from within vbdev_lvol,
or during bdev module finish.
This UT will now check the bdev module finish.
Note that at this point lvs with no lvols will not trigger
lvs unload. Next patches in series will introduce async fini_start,
to allow for the unload.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I8f51e8c1fcfdc55a5d090a3bc84ccefda813aef8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9093
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
OCF creates vbdev with block size equal to that of the core device.
We need to ensure that cache's bdev block size is not bigger than
core's bdev block size, so there are no IO errors due to IO length
smaller than cache device's block size.
The reason why this is implemented late in the cache start and not
as soon as we want to construct OCF vbdev is that cache or core
device can be added later after OCF vbdev creation and only then
we are certain to have both devices to compare.
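The condition being enforced can be written as a one-line predicate.
This is a sketch only; mock_ocf_block_sizes_compatible() is a
hypothetical helper, and the real check runs during cache start once
both devices are known.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The cache bdev's block size must not be bigger than the core bdev's
 * block size; otherwise an I/O sized for the core device could be
 * shorter than one cache block.
 */
static bool
mock_ocf_block_sizes_compatible(uint32_t cache_blocklen, uint32_t core_blocklen)
{
	return cache_blocklen <= core_blocklen;
}
```

For example, a 512-byte cache under a 4096-byte core passes, while a
4096-byte cache under a 512-byte core is rejected.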
Fixes #2088
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Change-Id: I536c783ca71b52f212217c597b7997f2d2e89491
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9229
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The new API is used if bdev ext_opts is not NULL.
Change-Id: I414b5d19bff54114d6708efed89ba19b5955f56a
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6271
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Added struct lvol_bdev wrapping around spdk_bdev.
This will help with passing around any context.
For this patch only spdk_lvol is still used.
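A minimal sketch of the wrapping pattern, assuming the generic bdev is
embedded as a member of the wrapper. The struct bdev and struct lvol types
and their fields below are stand-ins for illustration, not the real SPDK
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real SPDK types; fields are illustrative only. */
struct bdev {
	const char *name;
};

struct lvol {
	int id;
};

struct lvol_bdev {
	struct bdev  bdev;  /* embedded generic bdev */
	struct lvol *lvol;  /* context carried alongside it */
};

/* Recover the wrapper from a pointer to its embedded bdev member. */
static struct lvol_bdev *
lvol_bdev_from_bdev(struct bdev *bdev)
{
	assert(bdev != NULL);

	return (struct lvol_bdev *)((char *)bdev - offsetof(struct lvol_bdev, bdev));
}
```

Any callback that only receives the generic bdev pointer can then reach the
extra context without a separate lookup.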
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I23f5be5edda526ad607ea8b45274c945a42d90db
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9147
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Same iterator was used in two places already, and new one
will be added in next patch.
Replaced _SAFE variant since entries cannot be removed
while iterating in this loop.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I5ee911a5b653cfbdf0b4a39d283bd5eee66c49d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9092
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The async_mode option is currently supported in the PCIe transport
layer to create an io qpair asynchronously. The user polls the io_qpair
for completions; after the create cq and create sq commands complete in
order, the pqpair is set to the READY state. I/O submitted before the
qpair is ready is queued internally. Currently other transports only
support synchronous io qpair creation.
Signed-off-by: Monica Kenguva <monica.kenguva@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ib2f9043872bd5602274e2508cf1fe9ff4211cabb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8911
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In order to not affect the loopback test.
Also create a sock_common.c file which can be used by the posix and uring
implementations. We do not put such code in sock.c because sock.c
is the general layer, and other users may bring their own user space
sock implementations. So put the common code in sock_common.c instead.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I983ec2313119539e6eed2d9f11ba1488c0ed6560
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8769
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch tries to add MSG zero copy feature.
Though io_uring supports buffer registration, it only
supports io_uring_prep_write_fixed. It means only one
registered buffer can be used. It does not satisfy our
current usage mode.
Given this situation, we still use the MSG_ZEROCOPY
flag in io_uring_prep_sendmsg.
Furthermore, this new feature depends on the kernel
version. The currently verified version is
kernel 5.12 rc3, so it is not enabled by default.
For example, if you want to use it on the target side, you can
use the following rpc to configure:
./scripts/rpc.py sock_impl_set_options -i uring --enable-zerocopy-send-server
Change-Id: Ie7bb828f466362add94891989ddf0950dccd9e80
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/957
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Sending the unmap in zone reset is optional (it's only a hint), so if
the base bdev doesn't support it, the reset is completed immediately.
Fixes #2064.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If2c57eadc20d352a71853d7023599503330e1252
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9154
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Parses and verifies hexadecimal cpu bit mask specified by the user.
Added verification to check for cpu cores range, making sure poll group cores
are assigned within the range of cpu cores allocated for the application.
RPC nvmf_set_config now takes an argument to configure ‘poll groups’,
a new parameter for NVMf subsystem. This parameter sets a CPU mask
to spawn threads which run an event loop for a ‘poll group’.
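The parse-and-verify step can be sketched as below. It is a simplification
under stated assumptions: masks are limited to 64 cores, and parse_cpumask()
and check_poll_group_mask() are hypothetical names rather than the functions
this patch adds.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal CPU mask such as "0x6" into *mask.
 * Returns 0 on success, -1 on malformed input or a value wider than
 * unsigned long long. */
static int
parse_cpumask(const char *str, uint64_t *mask)
{
	char *end = NULL;
	unsigned long long val;

	assert(str != NULL && mask != NULL);

	/* Reject empty strings, leading whitespace and signs up front. */
	if (!isxdigit((unsigned char)str[0])) {
		return -1;
	}

	errno = 0;
	val = strtoull(str, &end, 16);
	if (errno != 0 || end == str || *end != '\0') {
		return -1;
	}

	*mask = (uint64_t)val;
	return 0;
}

/* Accept the poll group mask only if it is non-empty and every core in it
 * is also present in the application's core mask. */
static int
check_poll_group_mask(const char *poll_mask_str, uint64_t app_mask)
{
	uint64_t poll_mask;

	if (parse_cpumask(poll_mask_str, &poll_mask) != 0 || poll_mask == 0) {
		return -1;
	}
	if ((poll_mask & ~app_mask) != 0) {
		return -1;
	}
	return 0;
}
```

For an application running on cores 0-3 (mask 0xF), "0x6" would be accepted
and "0x30" rejected because cores 4 and 5 are outside the application mask.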
Change-Id: Ied9081c2213715ec94de00a8b37153730b8ac2ed
Signed-off-by: Yuri <yuriy.kirichok@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5611
Community-CI: Mellanox Build Bot
Reviewed-by: Matt Dumm <matt.dumm@hpe.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Purpose: only open the image on one spdk_thread,
due to a limitation of the librbd module, in order to
eliminate the lock overhead among different
spdk_threads operating on the same image.
Change-Id: I64c62e8ae1c3324b92cfd953b44ec08af6688530
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6812
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This parameter is still part of API spdk_sock_impl_opts
structure but it is not used. Keep it to support ABI
compatibility since it is located in the middle of the
structure and removing it may break socket opts initialization
or parsing.
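Why a member in the middle of a structure cannot simply be dropped can be
shown with a small sketch. The structs and member names below are
hypothetical and much smaller than spdk_sock_impl_opts; read_enable_bar()
is an illustrative reader that copies only as many bytes as the caller
says it filled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout as shipped in an earlier release. */
struct opts_v1 {
	uint32_t recv_buf_size;
	uint32_t send_buf_size;
	bool     enable_foo;   /* the option being retired */
	bool     enable_bar;
};

/* Later release: the retired member stays as a placeholder so that every
 * following member keeps its offset. */
struct opts_v2 {
	uint32_t recv_buf_size;
	uint32_t send_buf_size;
	bool     enable_foo;   /* unused, kept for ABI compatibility */
	bool     enable_bar;
};

/* What the layout would look like if the member were simply deleted:
 * enable_bar moves one byte earlier. */
struct opts_broken {
	uint32_t recv_buf_size;
	uint32_t send_buf_size;
	bool     enable_bar;
};

/* Library side: read an option from a structure filled in by a caller that
 * may have been built against the older header. */
static bool
read_enable_bar(const void *caller_opts, size_t caller_size)
{
	struct opts_v2 o;

	assert(caller_opts != NULL);
	memset(&o, 0, sizeof(o));
	memcpy(&o, caller_opts, caller_size < sizeof(o) ? caller_size : sizeof(o));
	return o.enable_bar;
}
```

With opts_v2 an old caller's enable_bar is still read from the right byte;
with opts_broken the library would read the retired member instead.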
Change-Id: Ib641ad7d965d68bc9ebb65dba531408d88cf6fa1
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8914
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Revise the if case to avoid the assert issue.
Change-Id: I095f3d111423e17abaa1f951fe22efb3d2e851b7
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8872
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Implement bdev_nvme_reset_controller rpc, which allows the NVMe
controller to be reset over RPC. Implement bdev_nvme_reset_rpc()
which starts the reset of the controller and returns the result of
the controller reset via the callback function after it completes.
Signed-off-by: Jonathan Teh <jonathan.teh@mayadata.io>
Change-Id: Id98d5e56feb315b7e44e9bb5e5f495e9b1cd1de0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8456
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
In bdev_nvme_reset_ctrlr(), get a controller reset context and start
a poller that calls spdk_nvme_ctrlr_reset_poll_async() to perform the
controller reset asynchronously.
Signed-off-by: Jonathan Teh <jonathan.teh@mayadata.io>
Change-Id: I1e3ae42291c3b43b69c99ca56997dc1965c3ac59
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8454
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For clarity, this element was added when crc+copy API was
added so might as well have all the CRC related functions use
it instead of `dst` to avoid confusion.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ic43adbd0df51c1a349847701ef318f452306d0b3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8229
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Support in accel_perf is coming up in a later patch.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I63a1d3b9b1a3254fdca78e27c473b9b3468c93c8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8202
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Create a single nvme_bdev_channel for each nvme_bdev and each SPDK
thread. nvme_bdev_channel has a pair of nvme_ns and nvme_ctrlr_channel.
The pair of nvme_ns and nvme_ctrlr_channel will be aggregated by
nvme_ns_channel in the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I095a2d6afa4ea23a87e4452b2f9d4c7e0087abe0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6605
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
NVMe bdev module manages ANA log page itself now. So NVMe driver
should disable managing ANA log page.
Add a new option disable_read_ana_log_page to struct spdk_nvme_ctrlr_opts.
Then NVMe bdev module enables it when calling spdk_nvme_connect_async().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id5249efe90a4d50763c3a7eaa1eb9572f60fbc8c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8313
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
When ANA change event is notified, increment reference count, read
ANA log page, and parse it to update ANA states of namespaces.
Then remove the spdk_nvme_ns_get_ana_state() call and its stub in
unit tests.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I254ae6cb993694bf0d7f4fa4b1039b5f9243b5cb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8335
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If ctrlr supports ANA log page, nvme_ctrlr allocates a buffer for ANA
log page and reads ANA log page itself, and then each nvme_ns sets its
ANA state by parsing ANA log page.
Most code was brought from NVMe driver because NVMe driver already
supports ANA log page management. However it had a bug that assumed
each descriptor is 8-byte aligned. Fix the bug together in this
patch. Besides, the implementation in NVMe driver was synchronous.
NVMe bdev module reads ANA log page asynchronously instead.
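The alignment problem can be illustrated with a sketch that walks
variable-length descriptors by copying each one out of the buffer instead of
casting a pointer into it. The layout used here (a 32-byte fixed part
followed by 4-byte NSIDs) is an assumption based on a reading of the ANA
group descriptor format; the struct, the function name, the host-endian
field access and the omission of the log page header are all
simplifications, not the module's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed part of one group descriptor (32 bytes); NSIDs follow it. */
struct ana_desc_hdr {
	uint32_t group_id;
	uint32_t num_nsid;
	uint64_t change_count;
	uint8_t  ana_state;      /* low 4 bits carry the state */
	uint8_t  reserved[15];
};

struct ns_state {
	uint32_t nsid;
	uint8_t  state;
};

/* Walk num_desc descriptors packed back to back in buf (positioned just past
 * the log page header). A descriptor with an odd number of NSIDs leaves the
 * next one only 4-byte aligned, so every field is copied out with memcpy().
 * Returns 0 on success, -1 if the data or the output array runs out. */
static int
ana_parse(const uint8_t *buf, size_t len, uint16_t num_desc,
	  struct ns_state *out, size_t out_cap, size_t *out_cnt)
{
	size_t off = 0;

	assert(buf != NULL && out != NULL && out_cnt != NULL);
	assert(sizeof(struct ana_desc_hdr) == 32);
	*out_cnt = 0;

	for (uint16_t i = 0; i < num_desc; i++) {
		struct ana_desc_hdr hdr;

		if (len - off < sizeof(hdr)) {
			return -1;
		}
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		if ((len - off) / sizeof(uint32_t) < hdr.num_nsid) {
			return -1;
		}
		for (uint32_t j = 0; j < hdr.num_nsid; j++) {
			if (*out_cnt == out_cap) {
				return -1;
			}
			memcpy(&out[*out_cnt].nsid, buf + off, sizeof(uint32_t));
			out[*out_cnt].state = hdr.ana_state & 0xF;
			(*out_cnt)++;
			off += sizeof(uint32_t);
		}
	}
	return 0;
}
```

A first descriptor with a single NSID occupies 36 bytes, so the second one
starts at an offset that is not a multiple of 8; that is the case the
alignment assumption would get wrong.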
The next patch will support ANA log page update by AER handler.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib8eab887633b043b394a45702037859414b8e0a0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8318
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch adds support for users to configure
whether to use the kernel or userspace idxd library.
Change-Id: Ie159b897bc9595894ad8f333168efaea6c2a3d78
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7332
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch is used to improve the performance when
we need to reorder the list.
PS: Bring over the similar operations from the posix implementation.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I7031b35ddb597730ee160690e8ab9caf9b2b64b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8675
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Now that we've deprecated the RPCs for a release, we can remove the whole
library.
Change-Id: I0f1a357fcfb3404efac39aa021928841c2f22ff1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
bdev_nvme_reset() will be used by JSON RPC and we will have to call
the callback to JSON RPC at bdev_nvme_reset_complete(). To do it
easily, register the current completion function for nvme_bdev_io
in bdev_nvme_reset_complete() into nvme_ctrlr as a generic callback.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie59551dc343215a95bfa5b22f234fc153c9db1b5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8589
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
nvme_ctrlr will be registered as io_device even when multipath is
supported. Hence while spdk_for_each_channel() is executed in reset
processing, we can get nvme_ctrlr from both spdk_io_channel_iter_get_ctx()
and spdk_io_channel_iter_get_io_device(). This duplication is not
necessary. Use spdk_io_channel_iter_get_io_device() and pass NULL
to the context parameter of spdk_for_each_channel() for clarity.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ifdbd0af4274081c4be7ab0735eb8bf9ae10e3493
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8588
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The current nvme_ctrlr will be registered as io_device even when
multipath is supported. Then we do not have to differentiate completion
processing between reset from bdev_io and internal reset. Hence
inline bdev_nvme_reset_io_complete() into bdev_nvme_reset_complete().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ife2c4c93d423da3953174ac860485a6e095a28bd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8587
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When we support multipath, reset_io will hold the controller currently
being reset to reset all underlying controllers sequentially.
bdev_nvme_submit_request() basically passes nvme_bdev_io to each I/O
type, and we have a convenient helper function bdev_nvme_io_complete()
which has nvme_bdev_io as a parameter.
So revert the previous change to bring nvme_bdev_io as context
for reset processing.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I19697e8252505bab519a42889d1a88d967932f22
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8586
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reset requests from the upper layer will reset the underlying
ctrlrs of a bdev ctrlr but internal reset requests will reset only
the specified ctrlr.
To clarify this difference, rename bdev_nvme_reset() to
bdev_nvme_reset_io() and remove the underscore prefix from
_bdev_nvme_reset() and related functions.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9a2d124f6e2039bfecfdd6599827354d6c373f2e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8492
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
It was not possible to hold the mutex correctly around
nvme_bdev_first_ctrlr() and nvme_bdev_next_ctrlr(), and nvme_ctrlr_get()
and nvme_ctrlr_get_by_name() did not hold the mutex at all.
nvme_bdev_first_ctrlr() and nvme_bdev_next_ctrlr() were replaced by
nvme_ctrlr_for_each() in the last patch.
In this patch, add mutex to three helper functions, nvme_ctrlr_get(),
nvme_ctrlr_get_by_name(), and nvme_ctrlr_for_each().
Add mutex to nvme_ctrlr_create() too, but it will be removed in the
following patches because nvme_ctrlr will be added not to the global
ctrlr list but to a ctrlr list per nvme_bdev_ctrlr.
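A minimal sketch of why a for_each helper is easier to guard than a
first/next pair: the lock can be taken once and held across the whole
traversal inside the helper. The list, the struct ctrlr type and all
function names here are hypothetical and use plain pthreads, not the
module's real data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct ctrlr {
	const char   *name;
	struct ctrlr *next;
};

static struct ctrlr   *g_ctrlrs;  /* singly linked list head */
static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;

typedef void (*ctrlr_for_each_fn)(struct ctrlr *ctrlr, void *ctx);

static void
ctrlr_add(struct ctrlr *ctrlr)
{
	assert(ctrlr != NULL);

	pthread_mutex_lock(&g_mutex);
	ctrlr->next = g_ctrlrs;
	g_ctrlrs = ctrlr;
	pthread_mutex_unlock(&g_mutex);
}

static struct ctrlr *
ctrlr_get_by_name(const char *name)
{
	struct ctrlr *c;

	pthread_mutex_lock(&g_mutex);
	for (c = g_ctrlrs; c != NULL; c = c->next) {
		if (strcmp(c->name, name) == 0) {
			break;
		}
	}
	pthread_mutex_unlock(&g_mutex);
	return c;
}

/* The lock covers the whole walk. The callback must not call the locked
 * helpers above, because the mutex is not recursive. */
static void
ctrlr_for_each(ctrlr_for_each_fn fn, void *ctx)
{
	struct ctrlr *c;

	pthread_mutex_lock(&g_mutex);
	for (c = g_ctrlrs; c != NULL; c = c->next) {
		fn(c, ctx);
	}
	pthread_mutex_unlock(&g_mutex);
}

/* Example callback: count the entries visited. */
static void
ctrlr_count(struct ctrlr *ctrlr, void *ctx)
{
	(void)ctrlr;
	(*(int *)ctx)++;
}
```

With a first/next pair the caller owns the loop, so the helper has no single
place where it can take and release the lock around the full iteration.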
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ife27066d2dcac82db0616b0afeaf68e5705d7da1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8722
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Replace two helper functions, nvme_bdev_first_ctrlr() and
nvme_bdev_next_ctrlr(), with a new helper function nvme_ctrlr_for_each().
This will make it easier for us to guard the data structure correctly in
the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ibd81286e454fd6127fd150a7d48d8381bd1b89d3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8721
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This object aggregates multiple I/O qpairs for their completion
operations and may be a higher layer object. However, the
aggregation is only to poll completions efficiently. Hence if we
follow the new naming rule, nvme_poll_group is better than
nvme_ctrlr_poll_group and nvme_bdev_poll_group, so rename
nvme_bdev_poll_group to nvme_poll_group.
Besides, many functions in NVMe bdev module have a naming rule,
bdev_nvme + verb + objective
Follow this rule for a few functions related to nvme_poll_group.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I5e5cb6001d4a862c2121b7265cbbffe0c2109785
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8720
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This object is per I/O path and will be aggregated by a new upper
layer object.
Hence rename nvme_bdev_ctrlr to nvme_ctrlr. Then the following patches
will add nvme_bdev_ctrlr as a different upper layer object.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ieed634447785cc98140b3d49c52a2c753988ece7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8381
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This object is used for failover and is per I/O path. A controller may
have multiple objects of this kind. A controller is per path and may be
aggregated by a new object. Hence this object is a lower layer
object.
Based on the new naming rule, rename nvme_bdev_ctrlr_trid to
nvme_ctrlr_trid.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0d5e5812560a6947a0c25af05dea168e8745130e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8380
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
This object will be aggregated by the upper layer object nvme_bdev.
Hence based on the new naming rule, rename nvme_bdev_ns to nvme_ns.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I96a70213b29fb53437acd080a0787ec9f5a6759a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8379
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We will name the lower level objects starting with nvme_* and the
upper level objects starting with nvme_bdev_*.
This object is a channel per ctrlr and another new channel will be
added on top of this object.
Rename nvme_io_path to nvme_ctrlr_channel based on the new naming rule.
nvme_io_path will be used for a new object which is used to find an
optimal I/O path and to reset multiple ctrlrs sequentially when
multipath is supported.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1d4fa6d4625de3413d629a1ff412e00de12dfaf4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8378
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The NVMe specification defines namespace identification descriptors, i.e.
EUI64, NGUID, UUID.
BDEV abstracts NVMe specific details, which is why only UUID is exposed.
However, if NGUID is supported, it is preferred to identify the namespace
with NGUID over UUID.
If NGUID is not supported by the NVMe controller, then fall back to UUID.
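The selection order can be sketched as below. Treating an all-zero NGUID as
"not supported" is an assumption of this sketch, and choose_ns_id() is a
hypothetical name, not the function changed by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NS_ID_LEN 16

/* Assumption: an identifier that is all zeroes is not provided. */
static bool
id_is_set(const uint8_t *id)
{
	static const uint8_t zero[NS_ID_LEN];

	return memcmp(id, zero, NS_ID_LEN) != 0;
}

/* Copy the preferred 16-byte identifier into out: NGUID if set, otherwise
 * the UUID if one was reported (uuid may be NULL). Returns false when
 * neither identifier is available. */
static bool
choose_ns_id(const uint8_t *nguid, const uint8_t *uuid, uint8_t *out)
{
	assert(nguid != NULL && out != NULL);

	if (id_is_set(nguid)) {
		memcpy(out, nguid, NS_ID_LEN);
		return true;
	}
	if (uuid != NULL) {
		memcpy(out, uuid, NS_ID_LEN);
		return true;
	}
	return false;
}
```

Both identifiers are 16 bytes, so either one fits the single UUID slot that
the bdev layer exposes.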
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: If51889a3664c0daa7cbe983048231793e3c502e0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8627
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
spdk_nvme_ctrlr_free_io_qpair now always returns 0, so
update the code to account for that.
Fixes issue #2012.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I61c78459472573adbfeb28052ae3379d7880567c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8660
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There is a bug when entering the if case that orders the list, and the
original code does not address it.
It happens when there is a connection left in the socks_with_data list
between polls and there are enough new events detected that it would exceed the
maximal number of events. A connection is left on this list between polls if it isn't
fully drained via reads by the upper layer on each poll loop.
Currently, the maximal socket event num is 32, so we did not hit this issue
in our normal test cases. But when you test with the NVMe-oF tcp target as
described in #2105, there are more than 32 active sockets, which exceeds
the maximal num of events per poll (32), so the issue is triggered.
Fixes issue #2015
Change-Id: I9384476fdba8826f5fe55a5d2594e3f4ed3832ba
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8541
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Without it multiple threads can race and end up sharing a device
when the intention is sharing only after full round robin.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I29b854ff837d56078bc033802d3df244728a29aa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8187
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Recent work identified race conditions having to do with the
dynamic flow control mechanism for the idxd engine. In order
to both address the issue and simplify the code a new scheme
is now in place. Essentially every DSA device will be allowed
to accommodate 8 channels and each channel will get a fixed 1/8
of the number of work queue entries regardless of how many
channels there are. Assignment of channels to devices is round
robin and if/when no more channels can be accommodated the get
channel request will fail.
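The scheme can be sketched as follows. Everything here is illustrative:
the struct and function names are hypothetical, and the sketch is
single-threaded, whereas a real implementation needs a lock around the
device selection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS_PER_DEVICE 8

struct dsa_device {
	uint32_t total_wq_entries;
	uint32_t num_channels;     /* channels handed out so far */
};

struct dsa_channel {
	struct dsa_device *dev;
	uint32_t           max_outstanding;  /* fixed share of the work queue */
};

/* Hand out a channel on the next device in round-robin order that still has
 * a free slot. *cursor remembers where the previous call stopped.
 * Returns 0 on success, -1 when every device already has 8 channels. */
static int
dsa_get_channel(struct dsa_device *devs, size_t num_devs, size_t *cursor,
		struct dsa_channel *ch)
{
	assert(devs != NULL && cursor != NULL && ch != NULL);

	for (size_t tried = 0; tried < num_devs; tried++) {
		struct dsa_device *dev = &devs[*cursor % num_devs];

		*cursor = (*cursor + 1) % num_devs;
		if (dev->num_channels < CHANNELS_PER_DEVICE) {
			dev->num_channels++;
			ch->dev = dev;
			/* Always 1/8, no matter how many channels exist. */
			ch->max_outstanding = dev->total_wq_entries / CHANNELS_PER_DEVICE;
			return 0;
		}
	}
	return -1;
}
```

With two devices of 128 work queue entries each, the first sixteen requests
succeed with 16 entries apiece, alternating between the devices, and the
seventeenth fails.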
The performance tests also revealed another issue that was
masked before; the fix is a one-liner, so it is in this patch for
convenience. In the idxd poller we limit the number of completions
allowed during one run to keep the poller thread from starving other
threads, since as operations complete on this thread they are
immediately replaced up to the limit for the channel.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I913e809a934b562feb495815a9b9c605d622285c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8171
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In uring_sock_create(), we loop through all the addresses available.
If something is wrong, we should close(fd), set fd to -1, and
try the next address. Only when one fd satisfies all conditions
will we break the loop with the usable fd.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Change-Id: I22eada5437776fe90a6b57ab42cbad6dc4b0585c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8311
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
The reference count of nvme_bdev_ctrlr will be used to update ANA log
page safely, and nvme_bdev_ctrlr_destruct() can be used to decrement
reference count after completing ANA log page update.
However, nvme_bdev_ctrlr_destruct() is not a good name for this case.
Furthermore, nvme_bdev_ctrlr_destruct() does not set the destruct flag
to true, and the next patch will need nvme_bdev_ctrlr_acquire().
Hence rename nvme_bdev_ctrlr_destruct() to nvme_bdev_ctrlr_release().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I285b7ab0963d0f4ea4a7a9fd29bd026d37ba8460
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8334
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Factor out registering nvme_bdev_ctrlr as io_device and populating
namespaces after creating nvme_bdev_ctrlr into a helper function.
We extract spdk_io_device_register() from nvme_bdev_ctrlr_create()
because free(NULL) is correct but spdk_io_device_unregister(NULL) is
not allowed, and hence it is very simple if we call spdk_io_device_register()
only after nvme_bdev_ctrlr is successfully created.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia4d85ccf96f3ef62e51db9d08ec606d4100c7ebd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8317
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reorder a few operations and increment nvme_bdev_ctrlr->num_ns
after allocating nvme_bdev_ctrlr->namespaces[i] successfully.
Then unify the goto label for error cases to err and the err label
simply calls nvme_bdev_ctrlr_delete().
There is one noticeable change in this patch. Previously the
controller was not detached when creating nvme_bdev_ctrlr failed.
After this patch, the controller will be detached when creating
nvme_bdev_ctrlr fails. This is a reasonable change.
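The reordering and the single err label can be sketched like this. The
types and functions are simplified stand-ins with hypothetical names; the
point is only that the counter is bumped after each slot is populated, so
one delete routine can clean up a partially built object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct ns {
	uint32_t id;
};

struct ctrlr {
	uint32_t    num_ns;      /* number of valid entries in namespaces[] */
	struct ns **namespaces;
};

/* Safe on a partially built object: frees only what num_ns says exists. */
static void
ctrlr_delete(struct ctrlr *c)
{
	if (c == NULL) {
		return;
	}
	for (uint32_t i = 0; i < c->num_ns; i++) {
		free(c->namespaces[i]);
	}
	free(c->namespaces);
	free(c);
}

static struct ctrlr *
ctrlr_create(uint32_t want_ns)
{
	struct ctrlr *c = calloc(1, sizeof(*c));

	if (c == NULL) {
		return NULL;
	}
	if (want_ns > 0) {
		c->namespaces = calloc(want_ns, sizeof(*c->namespaces));
		if (c->namespaces == NULL) {
			goto err;
		}
	}
	for (uint32_t i = 0; i < want_ns; i++) {
		c->namespaces[i] = calloc(1, sizeof(struct ns));
		if (c->namespaces[i] == NULL) {
			goto err;
		}
		c->namespaces[i]->id = i + 1;
		c->num_ns++;  /* count the slot only after it is populated */
	}
	assert(c->num_ns == want_ns);
	return c;

err:
	ctrlr_delete(c);  /* one label, one cleanup path */
	return NULL;
}
```

Because every failure funnels through the same delete routine, whatever
that routine tears down (in the patch, this includes detaching the
controller) now also happens on the error path.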
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ifd8c4649036f1c5e5cd688f89727b2bd2e982735
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8316
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Consolidate populate_namespaces_cb() calls for error cases into
connect_attach_cb(). Then remove ctx parameter from
bdev_nvme_add_secondary_trid() because it is not necessary now.
The next patch will inline _nvme_bdev_ctrlr_create() into
nvme_bdev_ctrlr_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia94f456df160c1cc874acac4c70aad27102cb0b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8314
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In posix_sock_create(), we loop through all the available addresses.
If something goes wrong, we should close(fd), set fd to -1, and
try the next address. Only when one fd satisfies all conditions
do we break out of the loop with the usable fd.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Change-Id: Icbfc10246c92b95cacd6eb058e6e46cf8924fc4c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8310
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
In blobfs_fuse_start(), bfuse->bdev_name and bfuse->mountpoint
are allocated by calling strdup(), which may return NULL.
Here, we will go to err if strdup() returns NULL.
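The check described above can be sketched in plain C as follows. This is a minimal illustration, not the blobfs_fuse code: struct fuse_ctx, fuse_ctx_create() and fuse_ctx_destroy() are hypothetical names, and only the strdup() NULL handling mirrors the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the blobfs FUSE context; not the SPDK struct. */
struct fuse_ctx {
	char *bdev_name;
	char *mountpoint;
};

/* Returns a new context, or NULL if any allocation fails. */
struct fuse_ctx *
fuse_ctx_create(const char *bdev_name, const char *mountpoint)
{
	struct fuse_ctx *ctx;

	assert(bdev_name != NULL && mountpoint != NULL);

	ctx = calloc(1, sizeof(*ctx));
	if (ctx == NULL) {
		return NULL;
	}

	/* strdup() may return NULL, so check each copy before using it. */
	ctx->bdev_name = strdup(bdev_name);
	if (ctx->bdev_name == NULL) {
		goto err;
	}

	ctx->mountpoint = strdup(mountpoint);
	if (ctx->mountpoint == NULL) {
		goto err;
	}

	return ctx;

err:
	/* free(NULL) is a no-op, so a partially filled context is safe here. */
	free(ctx->mountpoint);
	free(ctx->bdev_name);
	free(ctx);
	return NULL;
}

void
fuse_ctx_destroy(struct fuse_ctx *ctx)
{
	if (ctx == NULL) {
		return;
	}
	free(ctx->mountpoint);
	free(ctx->bdev_name);
	free(ctx);
}
```

A single err label keeps the cleanup in one place, which works because free() accepts NULL.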
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Change-Id: I0599254b3436a310ddd26732312281f07a4972ec
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8303
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
After reading the code in detail, I think that we should
not set pipe_has_data and socket_has_data to true at the same time.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I8f9f96b16f4f0e0c585877a0dd687a240252a7cf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8283
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Set the nvme bdev physical block size based on the values of the NPWG and
NAWUPF namespace fields.
The logic to set bdev phys_blocklen is based on how the Linux nvme block driver
sets it. If the underlying nvme namespace supports NPWG/NAWUPF, then bdev
phys_blocklen is set to min(npwg, nawupf).
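A minimal sketch of the min(npwg, nawupf) rule follows. It assumes that both fields are 0's based counts of logical blocks, as in the NVMe specification, and converts them to bytes before comparing. The function name and the "supported" flag are hypothetical and stand in for whatever feature check the driver performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: npwg and nawupf are raw 0's based counts of logical blocks,
 * and "supported" is a hypothetical flag for the namespace feature check.
 */
uint32_t
phys_blocklen_from_ns(uint32_t blocklen, uint16_t npwg, uint16_t nawupf,
		      int supported)
{
	uint32_t pwg_bytes, awupf_bytes;

	assert(blocklen > 0);

	if (!supported) {
		/* Without NPWG/NAWUPF, fall back to the logical block size. */
		return blocklen;
	}

	pwg_bytes = blocklen * ((uint32_t)npwg + 1);
	awupf_bytes = blocklen * ((uint32_t)nawupf + 1);

	return pwg_bytes < awupf_bytes ? pwg_bytes : awupf_bytes;
}
```

For example, a 512-byte block size with npwg = 7 and nawupf = 7 gives a 4096-byte physical block size under these assumptions.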
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Change-Id: I6d254a9e730dccc230b9db4d1217bf7ab2f39b6c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8224
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Upcoming patches will add support for the accel_perf tool. Vectored
and batch versions will follow.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I43bd11b8efe40e6df0e2c8bd2995b9a9341f6457
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8142
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In order to avoid latency imbalances, the user can specify a cpu mask
on which the poll groups should run. This update adds data structures
to control the set of CPU cores.
Change-Id: Iaf69d75da2fc6fed350d97d11027ce09e9432210
Signed-off-by: Yuri <yuriy.kirichok@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5610
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Also add scripts/bpf/nvmf.bt to enable and log these
probes.
This patch also adds a script that can generate
a bpftrace script snippet with string maps for
needed enumerations (currently nvmf_tgt_state and
spdk_nvmf_subsystem_state). This allows us to
dynamically generate this from the source code, and
can be extended for other enums we may want to
add in the future.
Thanks to Michal Berger for converting my original
gen_enums.py script into gen_enums.sh!
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Michal Berger <michalx.berger@intel.com>
Change-Id: Iff34a6218aef40055ac14932eea5fc00e1c8bcf5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7194
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Adding iov to the spdk_bdev_zcopy_start function enables spdk_bdev_zcopy_start to
be used by transport layers, as the iov is owned by the transport command.
Signed-off-by: matthewb <matthew.burbridge@hpe.com>
Change-Id: I6d2be7f49566048bf25b7711ada8d2fb49fea6ee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6816
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We will print a notice log if the drive can support SECURITY
SEND/RECEIVE commands but not OPAL, so remove the error logs.
Change-Id: Ib26aa727ad1e703d53c387af8507b920606ea9c6
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8055
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
By the next patch, nvme_io_channel will be used as an I/O channel
to a single nvme_bdev. This channel is created to a single
nvme_bdev_ctrlr and has a corresponding I/O qpair. nvme_io_path
will be a better name especially when we support multipath.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic34162f3c383676c5249396a09173329fc6febce
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8095
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previously only a single thread could submit abort commands for admin
commands and it was the thread of the corresponding controller.
When we support multipath, we need to traverse the list of controllers
to which the target admin command is submitted. Threads of controllers
may be different.
On the other hand, the previous implementation made the I/O flow very
clean, but the I/O flow will not be clean if there are many controllers
and the subsystem does not have its thread.
This patch changes the policy so that any SPDK thread can submit abort
commands for admin commands.
Then when multipath is supported, we will be able to traverse the
list of controllers simply on the current thread to abort either I/O
command or admin command.
We already are able to submit any admin command on any thread anytime
including abort command. Hence this will not cause any issue.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib69de33f2e84b03861c7d95ce060035bdb589e4b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8121
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Managing namespaces by nvme_bdev is unlikely to become complicated.
Hence we do not need the helper function nvme_bdev_to_bdev_ns().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I77b4dcd12b2f2a219f58e5bc7b7e51dd10635da4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8118
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
We can hold bdev_io directly in nvme_bdev_ctrlr as an outstanding reset.
We can put spdk_bdev_io_from_ctx(bio) into a parameter for a few
functions because it is used only once in a function.
Passing nvme_bdev_io instead of spdk_bdev_io to bdev_nvme_verify_pi_error()
removes an unnecessary substitution.
This is a little more efficient and simplifies the implementation.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If49ad9fa42abf27decf3afcd8c994f55faa3bc70
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8094
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Following Linux NVMe host, add UUID and EUI64 comparison to
bdev_nvme_compare_ns().
Besides, previously the return value of memcmp() was used directly as
the return value of bdev_nvme_compare_ns(), and this was wrong.
Fix it in this patch as well.
Add unit test cases for bdev_nvme_compare_ns().
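One way this kind of bug arises is that memcmp() returns 0 when the buffers match, so returning its result as a truth value inverts the meaning. The sketch below shows the explicit comparison. ids_match() and ID_LEN are hypothetical names and not the bdev_nvme code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ID_LEN 16

/* Returns true when both identifiers hold the same ID_LEN bytes. */
bool
ids_match(const unsigned char *a, const unsigned char *b)
{
	assert(a != NULL && b != NULL);

	/*
	 * memcmp() returns 0 on equality, so compare against 0 explicitly
	 * instead of returning its result as a boolean.
	 */
	return memcmp(a, b, ID_LEN) == 0;
}
```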
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I069ab53e77741d6348b847d51e84a9338e2f3787
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7755
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Use the red-black tree macros provided by FreeBSD to speed up bdev
name lookup in spdk_bdev_get_by_name().
In the bdev_multi_allocation test, we can get 3x ~ 5x speed up when
creating multiple bdevs for various bdev nums.
Signed-off-by: Jiewei Ke <jiewei@smartx.com>
Change-Id: I49a2fbcccf06d4c36cbd445ce59e0b0dd4ada31d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7837
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Purpose: Let users know which Rados clusters are currently
registered, along with the related info.
Change-Id: I115c129ae6e4b0372579aad168fd88f8be136357
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7990
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This is useful for applications even if they elect not to use the SPDK
event framework.
This doesn't shift everything in one go - just the subsystem
initialization logic. Configuration file loading also needs to move
in a separate patch later.
Change-Id: Id419df1045442d416650ed90e5ee78adfdd623d7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6641
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Checked the definition of strncmp(). strncmp(s1, s2, len) returns 0
if the first len characters of s1 and s2 match, so s2 only has to be
a prefix of s1. It is therefore better to use strcmp(). Otherwise,
if s1 = "cluster1", s2 = "cluster", and len = strlen(s2),
strncmp() returns 0 although they are two different strings, and
hence two different cluster names.
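The difference can be shown with a small standalone sketch. The two helper names are hypothetical. Only strncmp(), strcmp() and strlen() from the C standard library are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix-only comparison: treats "cluster1" and "cluster" as equal. */
bool
name_matches_prefix(const char *s1, const char *s2)
{
	assert(s1 != NULL && s2 != NULL);
	return strncmp(s1, s2, strlen(s2)) == 0;
}

/* Exact comparison: both strings must match up to the terminating NUL. */
bool
name_matches_exact(const char *s1, const char *s2)
{
	assert(s1 != NULL && s2 != NULL);
	return strcmp(s1, s2) == 0;
}
```

With s1 = "cluster1" and s2 = "cluster", the prefix version reports a match while the exact version does not.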
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I15a06184d834cd1567b329d0322cd6bdea6fee4b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7991
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
bdev_nvme_compare_ns() will be used to check if all namespaces of
one nvme_bdev are identical, and this is the convenient location.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I3fa6072c1cceec53268e53bf398fa1e8f069035e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7169
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This will be helpful to simplify the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I66939f2953c66582bfcb79cfe187814280e89680
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7324
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When multipath is supported, a subsystem has multiple controllers and
bdev_nvme_create_cb() will create a channel per nvme_bdev_ctrlr by
iterating the list of nvme_bdev_ctrlrs, and will hold lock while
doing it.
If the code that accesses nvme_bdev_ctrlr is gathered in one place, the
following patches will be easier and smaller.
Hence reorder the code of bdev_nvme_create_cb() as a preparation.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2f2e66758c3374c678cc44bbb0116f4611c6753a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7754
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Factor out spdk_bdev_io_complete() calls into a helper function
bdev_nvme_io_complete().
This simplifies the code a little and will be helpful for the following
patches to retry I/Os. These are not performance critical but we
specify inline explicitly by following bdev_nvme_io_complete_nvme_status().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9fafacfd8571c037c3bc34382c251317309da334
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7497
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Mellanox Build Bot
Factor out spdk_bdev_io_complete_nvme_status() calls into a helper
function bdev_nvme_io_complete_nvme_status().
This simplifies the code a little and will be helpful for the following
patches to retry I/Os. Specify inline explicitly to avoid performance
regression.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1ac451486e1c6a4401842490411e986fac191d59
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7484
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_submit_request() calls spdk_bdev_io_complete() with a failed
status if bdev_nvme_reset() returns a negated rc other than -ENOMEM.
So let bdev_nvme_submit_request() process it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2569634ff0f18fb433cb685de1366e43abf5a9fe
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7524
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This is called in only one place. Inlining this into bdev_nvme_submit_request()
will simplify the following patches by removing unnecessary cast.
Besides, use -ENXIO if I/O path is not found. This will be better than -1.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib6c38f89db1c1e651941aad18d31dd0891f380de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7871
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
I/Os will be retried if spdk_bdev_io_complete() is called with
SPDK_BDEV_IO_STATUS_FAILED. To make this easier, consolidate the exit paths
of bdev_nvme_get_buf_cb().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0a67b88a107d616c5a5b0fc5ff963ad1402f5651
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7487
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Move from a single flag indicating that the socket is on the
pending_events list to two flags - pipe_has_data and socket_has_data. If
either flag is true, the socket is on the socks_with_data list.
This is necessary to track enough state to avoid doing extra recv()
system calls.
Change-Id: I65e5701dccb0a5bade19f266f164f26706b110d4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7595
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Revise bdev_rbd_create rpc call to add an optional
parameter "--cluster-name", e.g., "--cluster-name Rados".
Then users can create a rbd bdev with registered
Rados Cluster. This shared strategy can be used to
remove the thread creation overhead if multiple rbds
are connected to the same Ceph cluster.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ide5800f8fc6b2074805272a59731c666fe279b9a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7584
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch is used to add two rpc calls:
bdev_rbd_register_cluster
bdev_rbd_unregister_cluster
Then in the next patch, rbd bdev constructed on the same cluster object
can share the common Rados_t structure in order to remove the thread creation
overhead and improve the scalability.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I898cc4ffabb8e6721ba5bef099cbf948c64d2c98
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7551
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The bdev layer can now do the unmap split based on the backend
device. For now we use only one unmap descriptor and let the bdev
layer do the split for us.
Fix issue #1888
Change-Id: Iaf740bafd4f2bb4b108133fee2aafd2f53da9b2b
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7519
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Also free scsi_task data structure for the asynchronous
libiscsi APIs.
Change-Id: I0bff706bfb795e51a4b10c357913ae66493dca5d
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7513
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If no ioat devices are registered, we should return.
Change-Id: I03435946716ef653b230515da32e8ccbdf5a188a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7834
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The coredump info can be viewed in
https://github.com/spdk/spdk/issues/1935
We face this issue because no idxd device is attached, but we
still register the hw engine by spdk_accel_hw_engine_register
in accel_engine_idxd_init.
Fixes #1935
Change-Id: I537f06e2b2923faac7f2cd6a28903e77f1f6aaa5
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7832
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This makes use of newly added spdk_bdev_wait_for_examine(),
to only respond to RPC when bdev was fully examined.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: If82cd913ab6653e8cc0da38c639b384b6c0303ba
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5482
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This makes use of newly added spdk_bdev_wait_for_examine(),
to only respond to RPC when bdev was fully examined.
Fixes #1760
Issue above was triggered in DD tests where application
finished before the examine had a chance to fully finish.
This patch addresses it by making sure that nvme attach
RPC waits for completion of the examine.
A later patch in the series adds the bdev_wait_for_examine RPC
to multiple static configuration files, making sure similar
issues do not occur for bdev modules whose RPCs are not changed
as they are here.
The issue does not occur for JSON configs generated from apps,
see patch:
(e57bb1af) lib/bdev: build bdev_wait_for_examine into subsystem
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ie3ca2933af97a40ae01ecc3eefe2161d2d34c602
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5483
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch is designed to use a single Rados cluster
object within the same RBD bdev when multiple I/O channels are created.
It also prepares for the next patch, which shares
the same cluster among different RBD bdevs.
Change-Id: I1509f29a9c1088da308a3f88980f0c7fed26476f
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7601
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This module only requires the spdk_bdev_module
APIs, so use this #define to avoid having to
bump so version when spdk_bdev is modified.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I292c38e60cbf21d0d5a6583cb2a2b097c524f3d1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7811
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Michal Berger <michalx.berger@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
- Use consistent cache line size units in KiB across RPC calls
and config files. KiB units are much easier to use than
byte units and are more human readable.
- Properly handle cache start when cache line size is incorrect.
- Add test to check if cache line size value is reported correctly.
- Add cache line size info to JSON RPC documentation.
Fixes #1858
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Change-Id: Iec9ede85f6884b64605d2d112947b3f175cbd938
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7614
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
I recently checked spdk_io_device_unregister() and found that it
defers the free: the io_device may actually be freed later, in
put_io_channel().
This means that it is not safe to call
spdk_io_device_unregister(io_device, NULL);
and then immediately free the resources related to io_device,
because a channel may still be using those resources, and we could
then hit a NULL pointer access.
I found this issue in the bdev rbd module, and I think that the
same issue could happen in other modules. So it is better to free the
resources in the unregister callback function.
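The ordering problem can be illustrated with a toy model. This is not the SPDK implementation: struct toy_io_device and the toy_* functions are hypothetical and only mimic the idea that unregistration completes after the last channel is released, at which point the callback frees the resources.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_unregister_cb)(void *io_device);

/* Toy model of an io_device with reference-counted channels. */
struct toy_io_device {
	int channel_refs;
	int unregistered;
	toy_unregister_cb cb;
	int *resource;	/* stands in for resources that channels still use */
};

/* Run the callback once the device is unregistered and has no channels. */
static void
toy_try_finish(struct toy_io_device *dev)
{
	if (dev->unregistered && dev->channel_refs == 0 && dev->cb != NULL) {
		toy_unregister_cb cb = dev->cb;

		dev->cb = NULL;
		cb(dev);
	}
}

void
toy_get_channel(struct toy_io_device *dev)
{
	dev->channel_refs++;
}

void
toy_put_channel(struct toy_io_device *dev)
{
	assert(dev->channel_refs > 0);
	dev->channel_refs--;
	toy_try_finish(dev);
}

void
toy_unregister(struct toy_io_device *dev, toy_unregister_cb cb)
{
	dev->unregistered = 1;
	dev->cb = cb;
	toy_try_finish(dev);
}

/* Callback: the resources are freed only when no channel can use them. */
void
toy_free_resource(void *io_device)
{
	struct toy_io_device *dev = io_device;

	free(dev->resource);
	dev->resource = NULL;
}
```

In this model, calling toy_unregister() while a channel is still open leaves the resource in place, and the later toy_put_channel() triggers the callback that frees it.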
Change-Id: Icc1f86d72b672faefb3b7f416030b818a8cf45ce
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7646
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Passing cpu_mask hints that match only a single core was
useful to prevent any accidents when doing round-robin
with the 'static' scheduler.
In practice this is not required for the 'static' scheduler,
as the threads will be spread out over all reactors anyway.
It hinders other schedulers that try to respect the cpu_mask
hints, as they would not move the thread to any other reactor,
which prevents bunching up less used threads on a single reactor.
Drawback of this patch is that poll group names will not match
the cores they are on.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I5fb308362dd045228ea9fcca24f988388854c054
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7028
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Purpose: We will also support the kernel idxd driver, so we do not
need to export this feature in the module file.
Change-Id: I965e031497920f527962ba187bccd81de6977b8f
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7336
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
There's no good reason to reduce the capacity by aligning it to the
number of optimal open zones. If such alignment is required by the
users of the zone block bdev, it should be done on their own layer.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ic8614a82715e9f064619aa8fdb75d1a0b851490c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7656
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We cannot rely on DSM/DEALLOCATE as a write zeroes
alternative, even if DLFEAT reports that deallocated
blocks will be read as all zeroes. DEALLOCATE is
advisory, meaning that blocks may not actually be
deallocated. In cases where they are not deallocated,
they will not be read back later as zeroes.
QEMU 6.0 started reporting DLFEAT as returning zeroes
for deallocated blocks but for some of our write
zeroes tests, blocks aren't actually deallocated.
We may be able to add quirks in the future if we know
that a controller reliably deallocates blocks, but
for now we need to revert this completely.
Note that since bdev/nvme module now does not support
write zeroes in any cases, we need to disable the
write zeroes call in the unit tests.
Fixes issue #1932.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I79f0673774b621a9ffcc46891728cc7719e34cdb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7723
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
(Note: this patch was previously applied as b32cfc46 and then reverted
as 63642bef.)
Today the in-guest nvme device shows physical_block_size=512 even though
the backend iSCSI bdev supports physical_block_size=4K.
iSCSI targets expose the physical block size using
logical_block_per_physical_block_exponent in READ_CAPACITY_16.
NPWG is one of the ways to let the Linux nvme driver set
physical_block_size of the nvme block device.
This patch adds spdk_bdev.phys_blocklen which is updated if the iSCSI
backend exposes physical_block_size.
Later phys_blocklen is used in nvmf to set NPWG and NAWUPF to report
back during NS identity.
Linux driver uses min(nawupf, npwg) to set physical_block_size.
Similarly in scsi_bdev fill lbppbe in READ_CAP16 response
based on spdk_bdev.phys_blocklen.
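As a rough sketch, the exponent reported in READ CAPACITY (16) is log2 of the ratio between the physical and logical block sizes. The helper below is hypothetical, and its fallback to 0 (one logical block per physical block) for ratios that are not a power of two is an assumption, not necessarily what scsi_bdev does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns log2(phys_blocklen / blocklen), or 0 when the ratio is not a
 * power of two (assumed fallback: one logical block per physical block).
 */
uint8_t
lbppbe_from_blocklen(uint32_t blocklen, uint32_t phys_blocklen)
{
	uint32_t ratio;
	uint8_t exponent = 0;

	assert(blocklen > 0);

	if (phys_blocklen < blocklen || phys_blocklen % blocklen != 0) {
		return 0;
	}

	ratio = phys_blocklen / blocklen;
	if ((ratio & (ratio - 1)) != 0) {
		return 0;
	}

	while (ratio > 1) {
		ratio >>= 1;
		exponent++;
	}

	return exponent;
}
```

With a 512-byte logical block and a 4096-byte physical block, the ratio is 8 and the exponent is 3.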
Fixes #1884
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Change-Id: I0b6c81f1937e346d448f49c927eda8c79d2d75c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7739
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This reverts commit b32cfc467b.
This commit fails the ABI checks and only got through because the checks
were disabled until 21.04 hit.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Id26b8f8ba551193d99b1ccbd31b35378b4095a20
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7731
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Today the in-guest nvme device shows physical_block_size=512 even though
the backend iSCSI bdev supports physical_block_size=4K.
iSCSI targets expose the physical block size using
logical_block_per_physical_block_exponent in READ_CAPACITY_16.
NPWG is one of the ways to let the Linux nvme driver set
physical_block_size of the nvme block device.
This patch adds spdk_bdev.phys_blocklen which is updated if the iSCSI
backend exposes physical_block_size.
Later phys_blocklen is used in nvmf to set NPWG and NAWUPF to report
back during NS identity.
Linux driver uses min(nawupf, npwg) to set physical_block_size.
Similarly in scsi_bdev fill lbppbe in READ_CAP16 response
based on spdk_bdev.phys_blocklen.
Fixes #1884
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Change-Id: I0b6c81f1937e346d448f49c927eda8c79d2d75cf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7310
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When zero copy send is enabled and used by the initiator,
it can significantly increase latency for some payloads.
To enable more fine-grained configuration of the zero copy
send feature, add new parameters enable_zerocopy_send_server
and enable_zerocopy_send_client to spdk_sock_impl_opts to
enable/disable zcopy for specific types of sockets.
The existing enable_zerocopy_send parameter affects all types
of sockets.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I111c75608f8826980a56e210c076ab8ff16ddbdc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7457
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
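One way to read the interaction of the three zero copy flags above is
sketched below. The struct and enum are hypothetical stand-ins, not the real
spdk_sock_impl_opts layout, and the rule that the existing global flag gates
both socket types is an assumption drawn from "affects all types of sockets",
not a statement about the actual implementation.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three flags named in the commit message. */
struct sketch_sock_impl_opts {
	bool enable_zerocopy_send;        /* existing flag, all socket types */
	bool enable_zerocopy_send_server; /* new flag, server sockets */
	bool enable_zerocopy_send_client; /* new flag, client sockets */
};

enum sketch_sock_type {
	SKETCH_SOCK_SERVER,
	SKETCH_SOCK_CLIENT,
};

/* Decide whether zero copy send would be used for a socket of this type. */
static bool
sketch_use_zcopy(const struct sketch_sock_impl_opts *opts, enum sketch_sock_type type)
{
	/* Assumption: the global flag must be on for either type. */
	if (!opts->enable_zerocopy_send) {
		return false;
	}

	return type == SKETCH_SOCK_SERVER ?
	       opts->enable_zerocopy_send_server :
	       opts->enable_zerocopy_send_client;
}
```

Under this reading, a target could keep zero copy send on for server sockets
while an initiator that sees higher latency turns it off for client sockets only.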
On the host side, connections are created and then added to the thread's
poll group. Those connections could use different NIC queues underneath.
To route all connections of a poll group through a single queue, a unique
placement id is chosen as group_placement_id and each socket of the poll
group is marked with group_placement_id using the setsockopt(SO_MARK) option.
The driver could use the so_mark value of the skb to determine the queue to use.
Change-Id: I06bda777fe07a62133b80b2491fa7772150b3b5d
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6160
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
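Marking a socket with the group placement id described above could look
roughly like this on Linux. The function name sketch_mark_sock is
hypothetical and this is not the code from the patch. Setting SO_MARK is done
through setsockopt() and typically requires the CAP_NET_ADMIN capability.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: tag every packet sent on fd with the poll group's
 * placement id so a driver can pick the queue from the skb mark.
 * Returns 0 on success or a negative errno value on failure. */
static int
sketch_mark_sock(int fd, uint32_t group_placement_id)
{
#if defined(SO_MARK)
	if (setsockopt(fd, SOL_SOCKET, SO_MARK, &group_placement_id,
		       sizeof(group_placement_id)) != 0) {
		return -errno;
	}
	return 0;
#else
	/* SO_MARK is Linux specific; report it as unsupported elsewhere. */
	(void)fd;
	(void)group_placement_id;
	return -ENOTSUP;
#endif
}
```

A caller would apply the same id to every socket it adds to a given poll
group, so that all of the group's connections carry an identical mark.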
We do not want to do any further work on adding
the sock to the group if the epoll_ctl (or kevent)
fails.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I44b6dc86ce5676aa1b8d6c50b86f22758e4e37fa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7594
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
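The ordering described in the epoll_ctl fix above can be sketched as follows:
register the descriptor first and return before touching any group state if
registration fails. The struct and function names are hypothetical, and only
the Linux epoll path is shown (the kevent path is omitted).

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical, heavily reduced poll group: an epoll fd plus a counter
 * standing in for the group's real bookkeeping. */
struct sketch_group {
	int epfd;
	int num_socks;
};

/* Add fd to the group. Returns 0 on success or a negative errno value.
 * Bookkeeping is updated only after epoll_ctl() has succeeded. */
static int
sketch_group_add_sock(struct sketch_group *group, int fd)
{
	struct epoll_event ev = {0};

	ev.events = EPOLLIN;
	ev.data.fd = fd;

	if (epoll_ctl(group->epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
		/* Bail out early: no further work on a failed add. */
		return -errno;
	}

	group->num_socks++;
	return 0;
}
```

Returning early keeps the group's state consistent with what the kernel is
actually polling, so no cleanup is needed on the failure path.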