In the following patches, we want to retry reconnect if reconnect failed
in a reset ctrlr sequence but we want to delay the retry. While
we wait the delayed retry, we want to quiesce ctrlr completely.
As part of quiesce ctrlr operations, we want to pause adminq poller but
we need to do it on the nvme_ctrlr->thread.
If a reset ctrlr sequence runs on the nvme_ctrlr->thread, we can avoid
redirecting the pending destruct request at completion too.
So we redirect the reset ctrlr sequence into the nvme_ctrlr->thread.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I538b962e2a7b5cf00ebbac2a1e888482ddeeee61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10075
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For IPV6, cm->cmsg_level is SOL_IPV6 and cm->cmsg_type is
IPV6_RECVERR. However these combination was not included.
To clarify the fix check if positive conditions are satisfied and
then reverse the result.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I675f4337f383d3526fed1b86794697f41113ed4c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10428
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Incorrect decode function used for the param "config_file" in
rpc_bdev_rbd_register_cluster
Signed-off-by: Tan Long <tanl12@chinatelecom.cn>
Change-Id: I6286c5d0d8396a1b548095975924087ba4ee92d2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10444
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback bdev_nvme_reset_io_continue() to
redirect to the orig_thread by using bio. Furthermore, use bio->cpl.cdw0
to store the completion status of the reset processing. bdev_nvme_reset()
does not use bio->cpl.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I361cc44494190ba83ad6e360788d78851416c46c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10074
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback rpc_bdev_nvme_reset_controller_cb
to redirect to the orig_thread by using a dynamically allocated context.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8ee61857ac034024d00190875740a675ef1db8b0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10073
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch supports admin passthrough retry when we get any error
with DNR=0 but ABORTED_BY_REQUEST up to retry_count times.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1bf29570791fdbe8651fa70c4c8685bb740fb86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9944
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When resetting ctrlr, adminq is disconnected first. If adminq is disconnected,
admin passthrough request is rejected with -ENXIO.
But resetting ctrlr may succeed. If resetting ctrlr succeeds, adminq is
connected again, and admin passthrough request will be
submitted successfully.
On the other hand, if ctrlr is failed, admin passthrough request is
rejected with -ENXIO. But when resetting ctrlr, ctrlr is set to unfailed.
Hence bdev_nvme_admin_passthru() skips any ctrlr which is resetting
or failed, and calls bdev_nvme_admin_passthru_complete() with -ENXIO
if no available ctrlr is found.
bdev_nvme_admin_passthru_complete() queues admin passthrough request
and retry it one second later if ctrlr is resetting.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic748dc4faf29ebf717ae5c29dcf7c55fe2ea9243
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9942
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Allow to specify optimal IO boundary for
malloc bdev, it can be used to test split
of IO requests on generic bdev layer
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ic3529dc00cf852ea5cf40d0553d846a698fff6c7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10068
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Separate the admin passthrough case from bdev_nvme_io_complete_nvme_status()
into bdev_nvme_admin_passthru_complete_nvme_status() and from
bdev_nvme_io_complete() into bdev_nvme_admin_passthru_complete(),
respectively.
Then make the return type of bdev_nvme_admin_passthru() to void
by using bdev_nvme_admin_passthru_complete().
Besides, refactor bdev_nvme_admin_passthru() slightly.
These clean up make the following patches simpler.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I79b89ee1b6360aa6ac6fc3c03f0469be99b0c1f2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9899
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The NVMe bdev module queues retried I/Os itself now.
bdev_nvme_abort() needs to check and abort the target I/O if it
is queued for retry.
This change will cover admin passthrough requests too because they
will be queued on the same thread as their callers and the public
API spdk_bdev_reset() requires to be submitted on the same thread
as the target I/O or admin passthrough requests.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If37e8188bd3875805cef436437439220698124b9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9913
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The completion status of spdk_bdev_abort() is SPDK_BDEV_IO_STATUS_SUCCESS
or SPDK_BDEV_IO_STATUS_FAILED if it is successfully submitted.
In the generic bdev layer, spdk_bdev_abort() does not update cdw0 but
just set SPDK_BDEV_IO_STATUS_SUCCESS or SPDK_BDEV_IO_STATUS_FAILED.
In the NVMe bdev module, for the abort request, spdk_bdev_io_complete()
is called instead of spdk_bdev_io_complete_nvme_status() and the
completion status is SPDK_BDEV_IO_STATUS_SUCCESS or
SPDK_BDEV_IO_STATUS_FAILED.
So let's skip updating cdw0 and call spdk_bdev_io_complete() directly
with SPDK_BDEV_IO_STATUS_SUCCESS or SPDK_BDEV_IO_STATUS_FAILED if
bdev_nvme_abort() does not find the target I/O in any ctrlr.
The next patch will fix spdk_bdev_io_get_nvme_status() for the abort
I/O.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8fb5389cd27d7467cc6ae18e152bd5228f9437f7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9976
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
This patch enables each nvme_ctrlr_channel to access the underlying
nvme_bdev_channels. This change is used to maintain io_path cache
of nvme_bdev_channel.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I22cd3763da1642d4e68dee3a9273e9cc698a4ca8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9893
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
After multipath feature is supported, one bdev will have more than one
nvme ctrlr. Fore ease of view, display each ctrlr's trid info.
Moreover, rename nvme_bdev_ctrlr_get as nvme_bdev_ctrlr_get_by_name here
to keep consistent with nvme_ctrlr_get_by_name.
Signed-off-by: Kai Li <lik271@chinatelecom.cn>
Change-Id: I417506699bbea6ed13dac0fee942749757d2ae47
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10129
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Current code only print the last namespace of nvme bdev, fix the print
way to show all the namespace.
And this patch will be prepared for the next patch to show io path status for multipath, like: which one is the primary or the backup, and the old status and current status,etc.
Signed-off-by: tanlong <948985618@qq.com>
Change-Id: I4fca154df52c929b8d046198934db0e58586c378
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10140
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
This will allow runtime observation of the
dynamic scheduler in a release build.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I7436b5dbcaa0df1529f828ef75f4e9335a092893
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9672
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
If I/O got ANA error, ANA state may be out of date. So in this case
read ANA log page and update ANA states. Mark nvme_ns to be updating
to avoid using while updating ANA state.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia43d38b3a589c84d6d0479dedcced033e76fb194
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If an I/O failed by ANA error, the corresponding ANA state might be
out of date. In the following patches, for this case, read the latest
ANA log page and update the ANA state. Such reading ANA log page may be
done on multiple threads concurrently including AER ANA change.
Hence protect ANA log page by adding an new flag ana_log_page_updating
to struct nvme_ctrlr and using it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8bb84091d50a5fdc0d9893b585be972dfd31c0f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9526
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This will enable us to add more flags without creating any extra hole.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I166e2bd3d116c8cebf75bfe4f290b390d9e3888e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9851
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add bdev_retry_count to spdk_bdev_nvme_opts and retry_count to
nvme_bdev_io, respectively.
Set type of both to int because we want use -1 for infinite retry.
Set the default value of bdev_retry_count to zero for the backward
compatibility.
bdev_retry_count is configurable by the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9bc746fcea54aa8722c76f79c70c2ae2b375aa53
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9864
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
retry_count of struct spdk_bdev_nvme_opts controls the number of retries
in the transport layer, and is set to transport_retry_count of struct
spdk_nvme_ctrlr_opts.
The next patch will add bdev_retry_count to struct spdk_bdev_nvme_opts
to control the number of retries in the bdev layer.
For clarification, rename retry_count to transport_retry_count of
struct spdk_bdev_nvme_opts. Then deprecate the retry_count parameter
and add and use an new parameter transport_retry_count instead for
the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0689c54aa1c96ee99d24236e8ff1a594ad7208e4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9924
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We can't call spdk_io_channel_iter_get_channel() in the
completion callback of spdk_for_each_channel(), the value
is always NULL.
Change-Id: I65bc972da8a7ab309f3cab438432196a59f26bd4
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9983
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
SPDK nvmf target reports all listeners on all subsystems
in discovery pages, kernel target reports only subsystems
listening on a port where discovery command is received.
NVMEoF specification allows to specify any addresses/
transport types. Ch 5: The set of Discovery Log entries should
include all applicable addresses on the same fabric as the
Discovery Service and may include addresses on other fabrics.
To align SPDK and kernel targets behaviour, add filtering
rules to allow flexible configuration of what should be
listed in discovery log page entries.
Fixes#2082
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie981edebb29206793d3310940034dcbb22c52441
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9185
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
You can now detach specific paths based on the host parameters. This is
useful for two paths to the same target that use different local NICs.
Change-Id: I4858bfda7d940052ca77ffb0bbe764a688fb315d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9827
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
This patch is to solve the issue that two nvmf target connect the same rbd image and used for multipath.
The scenario is host wants to access the same rbd image via two gateways, host and gateways are working as nvmf ini and tgt, and two gateways connect with the rbd image, io can switch to another gateway once one is broken. The targets of multipath must have the same uuid, so this patch add a new argument for bdev_rbd_create, like malloc dev.
Signed-off-by: tanlong <948985618@qq.com>
Change-Id: I593fedb6c5d94f625f1b331fdc40e2db488f7fb7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9935
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Use dedicated OCF API functions to set default parameters during
startup configuration instead of manual and incomplete
initialization.
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Change-Id: Ied1afa9c249a032451a266fd97ce09e6088a0f97
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9786
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The previous patch supported I/O retry when no available io_path
was found at submission.
This patch supports I/O retry when we get I/O path error at completion.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I93a1664944b15ab0a826a321e2ea7a2574263afe
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9850
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If ANA state is inaccessible or qpair is disconnected, I/O cannot
be submitted.
But if qpair is connected, ANA state may become accessible, or if
qpair is disconnected, it may become connected via resetting.
Hence even if find_io_path() returned NULL, queue I/O and retry it
one second later if qpair is connected or ctrlr is resetting.
Sort retried I/Os by expiration values in ticks, and activate a timed
poller per nvme_bdev_channel only if there is any retried I/O. So
the poller function bdev_nvme_retry_ios() always returns BUSY because
if the poller runs earlier than the closest retried I/O or runs when
there is no retried I/O, it is more like a bug of the framework.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id28110a0d63ebc1c5772814e2ff8a47934df1644
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9830
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
- remove metadata updater
- handle 'zero' flag in mempool allocator
- adapt ocf_mngt_cache_start() to new OCF API
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Signed-off-by: Michal Mielewczyk <michal.mielewczyk@intel.com>
Change-Id: I34afd856cc1306ffe305f71a445e7474c9b0a2d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
If NN is very large this saves a lot of memory. This lookup is
not generally used in the I/O path anyway.
Change-Id: I98e190006843ad5d0bac8483bf9feb800d4a665a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9884
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_find_io_path() selects an io_path whose qpair is connected
and ANA state is optimized or non-optimized.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I79c978795562b606ee27aa43020684d8bcbf50c5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9405
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reset all controllers of a bdev controller sequentially. When resetting
a controller is completed, check if there is next controller, and
start resetting the controller.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I169a84b931c6b03b36bb971d73d5a05caabf8e65
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7274
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previously the NVMe bdev module had completed the outstanding reset and
then canceled pending resets. This was complex.
On the other hand, the generic bdev layer cancels pending resets
and then completes the outstanding reset.
Following the generic bdev layer simplifies the code and makes us easier
to control retry reset, delay retry reset by a few seconds, or stop retry
after repeated failures and then delete ctrlr.
Update unit tests accordingly.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9a68422918ebcb052b3a281316ffba9b3450ecd4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9816
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previous patches (5363eb3c) tried to work around the
32-bit unmap and write_zeroes LBA counts by breaking
up larger operations into smaller chunks of max size
UINT32_MAX lba chunks.
But some SSDs may just ignore unmap operations that
are not aligned to full physical block boundaries -
and a UINT32_MAX lba unmap on a 512B logical /
4KiB physical SSD would not be aligned. If the SSD
decided to ignore the unmap/deallocate (which it is
allowed to do according to NVMe spec), we could end
up with not unmapping *any* blocks. Probably SSDs
should always be trying hard to unmap as many
blocks as possible, but let's not try to depend on
that in blobstore.
So one option would be to break them into chunks
close to UINT32_MAX which are still aligned to
4KiB boundaries. But the better fix is to just
change the unmap and write_zeroes APIs to take
64-bit arguments, and then we can avoid the
chunking altogether.
Fixes issue #2190.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I23998e493a764d466927c3520c7a8c7f943000a6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9737
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Restore the previous nice idea and unify canceling pending resets
into bdev_nvme_complete_pending_resets().
cb_arg of spdk_for_each_channel() was reserved for a different
purpose but it was gone.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1c01051ae9d8eccc7fc3ec485dd344ff9f55087b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9815
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The all setters of the reset_cb_fn use boolean as the result eventually.
Using boolean as the result earlier makes the code simpler and the
following patches easier.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9ec4f670ed7a000a824ab7e261bab07a3dc21f43
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9814
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
bdev_nvme_admin_passthru() chooses the first ctrlr which is not failed.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If41a1d1e1bde4bddfa92e5a385509daa3f0ce4de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9525
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
bdev_nvme_admin_passthrough(), bdev_nvme_reset_io(), and
bdev_nvme_abort() do not use io_path. So simply fall through even
if the optimal io_path is not found for these, and clear the
cached io_path for these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib26fcbf99c95bbfb6e825c1b7c6455241c198d92
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9675
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Even if the NVMe bdev module supports I/O retry, it will not retry reset or abort.
For clarification, bdev_nvme_reset_io() and bdev_nvme_abort() call
bdev_nvme_io_complete() themselves for error cases.
For bdev_nvme_abort(), we do not need to differentiate error processing
among return values. Simply complete the request with failure.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id5d51cbba47c6360a6177efd7d5f2e978c48ee9b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9674
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Even when I/O retry is supported, reset will not be retried. However,
bdev_nvme_io_complete() will process I/O retry. Hence inline
bdev_nvme_io_complete() into bdev_nvme_reset_io_complete() to exclude
reset from I/O retry. The result of reset is success or failure, so omit
the -ENOMEM case.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I667e74cbbac4a13cefb6896f898476ba48bcd0fa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9687
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Save the io_path to which the current I/O is submitted into struct
nvme_bdev_io to know which io_path caused I/O error, or to reuse it
to avoid repeated traversal.
Besides, add a simple helper function nvme_io_path_is_available() to
check if the io_path can be reused.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Idadd3a3188d9b333b335ea79e20a244aa0ce0bdd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9296
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We have io_path structure now and returning io_path rather than
ns and qpair match the function name. The following patches will
cache the returned io_path into nvme_bdev_io.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I5d773da18591fc324667f6b5c489a38f497bf3d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9295
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch removes the critical limitation that ctrlrs which are
aggregated need to have no namespace. After this patch, we can
add multiple namespaces into a single nvme_bdev.
The conditions that such namespaces satisfy are,
- they are in the same NVM subsystem,
- they are in different ctrlrs,
- they are identical.
Additionally, if we add one or more namespaces to an existing
nvme_bdev and there are active nvme_bdev_channels, the corresponding
I/O paths are added to these nvme_bdev_channels.
Even after this patch, ANA state is not utilized in I/O paths yet.
It will be done in the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I15db35451e640d4beb99b138a4762243bee0d0f4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8131
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Pointer 'ctx->req.multipath' returned from call to
function 'strdup' may be NULL. Reported by Klocwork.
Change-Id: Id4a188ec5340f02c9bd0643db0acb03409dd5829
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9843
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
_find_optimal_core was always consolidating idle threads to g_main_lcore.
Meanwhile for active threads lower lcore id were prioritized over the
higher ones.
So long as g_main_lcore can fit the active thread, it should be prioritized
over any other. Regardless of the lcore id.
Fix#2080
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I30b3a7353bcf243d4362b4db9dde5446c435d675
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9243
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
It can be much simpler when inlined.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ifad89d7b9557a45eee601fc2004066a0fd9289c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9826
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Specifying only a transport id is not enough. We need to be able to
describe the host parameters too.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Iadbea553aee4b38e7cacab0b486e7e5746d0d1ab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9825
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This is the currently active path identifier in a failover scenario. The
path is defined by more than just the transport identifier, so fix the
name.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I682c6f4c54f75307e2615bf80e70358180d99fe2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9576
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This defines a unique path between a host and a target.
Change-Id: Ia3d24c1b34199a8b596aaf17900ca9694a9da77d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9505
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The new path must differ in some way.
Change-Id: I98fd2b2cb3220b482efe0a19bfce94ec4e72bec2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9418
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This parameter may have the values "disable" or "failover". The default
is failover to match existing behavior. In the future we expect to
change the default behavior to disable.
Further, we expect to add an "enable" option soon to do full multipath.
Change-Id: Iebbdc9b95f23101f18d64e085933463498e627be
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9343
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We know that the librbd's read/write operations will be handled by a
non spdk thread, so we can get rid of the epoll based group based
polling and directly use the async completion. This makes the code
is simple and easy to maintain.
And we still need to keep the io_device registration for this module,
because the I/Os are async. We need the channel reference on "rbd_if"
in order to know which rbd disks are still active.
Change-Id: I1c140a4b286dbfe113ed2a67bd2875de605e8f24
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9335
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Prior a regular round robin could result in strange performance
if an idxd device from another socket was used.
Signed-off-by: peluse <peluse@localhost.localdomain>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Id863c79067beabe73ef89d92b3fb3c436821b97a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9367
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Call rte_cryptodev_close() to free qpair memory instead of using
an internal function.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I1bd7f0dd86de83f278f6be3263cdf3fbd8e1c77f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9720
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
There's no need to keep polling with iscsi_bdev_conn_poll() when the request
list is empty, as new requests already restart the poller when needed.
Signed-off-by: John Levon <john.levon@nutanix.com>
Change-Id: Iec5553f8c14d32e894989c9f7a448b6817a821b1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9375
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It's not providing a lot of value, while being pretty problematic, as
the read is blocking and cannot be easily changed to be non-blocking, as
dump_info_json is a synchronous interface.
Now that dump_info_json isn't using any synchronous interfaces from the
NVMe driver, we can send a bdev_get_bdevs call in the async_init.sh
test to verify that.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ida31c8d1000a52b0782f698afe46b071ed4e41df
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9488
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch replaces the synchronous `spdk_nvme_detach()` calls with its
asynchronous counterparts in the controller unregister path.
An additional poller is introduced to periodically poll the NVMe driver
for detach completion. Once the detach is completed, the poller is
unregistered and the nvme_ctrlr is destroyed. The poller uses the same
period (1ms) as the async probe poller.
Since reset and detach cannot happen at the same time, reset_poller was
renamed to reset_detach_poller and it can now store the pointer either
to the reset or detach poller, depending on the circumstances.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I5eb2dd6383d98d25d1f9748af08c1a13d18acb0e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8729
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This is done in preparation for using the non-blocking versions of the
spdk_nvme_detach API, which will delay controller's delation until the
detach is completed asynchronously.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ia785408c9a94427e60bf239e6036a5e89d589f61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8727
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
It can match by any provided parameter to remove paths.
Change-Id: I5e7a87342bbb90943dc97fb52f142814fcf0acfa
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9453
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Instead of storing an spdk_nvme_transport_id, store the object that
contains it. This will make a few later patches easier.
Change-Id: I36b74889fe39af3b7ab2b900fb3ea4b3f39e1f83
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9484
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Rename it to _is_core_at_limit(). This function
currently returns true if the core is at the limit
(instead of over the limit) which is really the semantics
that we want - so just change the name of the function
to make it more precise.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idf815f67c71463c3b98bc00211aafdc291abdbd2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9582
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We have _is_core_over_limit() which determines if a core is
currently over its busy:total tsc ratio. We use this to determine
if we need to move threads off of a core that is too busy.
But when we pick a core to move a thread *to* we were allowing the
dst core to fill to 100%, rather than the SCHEDULER_CORE_LIMIT.
This patch fixes that, which has the nice effect of keeping
thread-to-core assignments much more stable when running
I/O workloads.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id98b08803939d2a25104082e6436bb8d4727d7c2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9578
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This will lead the scheduler to be quicker to move
threads to an unused core - favoring performance over
power savings.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ibaa5edc61a4bdca5550bd23a562c3645fded25e9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9551
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
If a core has a very high busy percentage, we should
not assume that moving a thread will gain that
thread's busy tsc as newly idle cycles for the
current core.
So if the current core's percentage is above
SCHEDULER_CORE_BUSY (95%), do not adjust the
current core's busy/idle tsc when moving a thread
off of it. If moving the thread does actually
result in some newly idle tsc, it will get adjusted
next scheduling period.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I26a0282cd8f8e821809289b80c979cf94335353d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9581
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For the src thread, add the busy_tsc of the thread
we are moving to the idle_tsc of the current core.
This is consistent with how are accounting for the
cycles in the target core too.
We will disable the load_balancing.sh script for now.
We will reenable it later in this patch set once
a few other changes are made, along with some updates
to the load_balancing.sh script based on the changes
made in this patch set.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8af82610804e97dabf62ccd90f75a0e6e37d276f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9550
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This will be useful in some upcoming patches where we will
be calculating these percentages in more places.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If7d84c00fe1b666988fe06537836ba7b9cb161aa
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9580
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Purpose: Better to put this varable into the bdev_rbd_io structure
instead of on the stack. And we will use this variable in next patch,
when the callback function is completed, so we should not put it on the stack.
Change-Id: I11ff46ef07908084012bc1ce040eceb667334a40
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9334
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The purpose is that we will remove the group reaping of
rbd_io later, so we need to know the original thread info of the
rbd_io.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I69f60261447fdac0b0885fdb213e92c246439047
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9585
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Allow to return more than one memory domain.
This change aligns bdev and nvme API and provides
more flexibility for custom transports.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ica9b12ad8463c361be6cb62ee2c0513eec0b486d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Enable dump of transport stats in functional test.
Update unit tests to support the new statistics
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815aeea7d07bd33a915f19537d60611ba7101361
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8885
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch enables us to aggrete multiple ctrlrs in the same NVM
subsystem into a single bdev ctrlr to create multipath.
This patch has a critical limitation that ctrlrs which are aggregated
need to have no namespace. Hence any nvme bdev is not created.
However it will be removed in the next patch.
The design is as follows.
A nvme_bdev_ctrlr is created to aggregate multiple nvme_ctrlrs in
the same NVM subsystem. The name of the nvme_ctrlr is changed to be
the name of the nvme_bdev_ctrlr.
NVMe bdev module has both the failover feature and the multipath
feature now. To choose which of failover or multipath to use, add an new
parameter multipath to the RPC bdev_nvme_attach_controller.
When we attach a new trid to the existing nvme_bdev_ctrlr, we use the failover
feature if multipath is false, we use the multipath feature if multipath is
false.
nvme_bdev_ctrlr has a list for nvme_ctrlr and it is guarded by the
global mutex. Callers can query nvme_ctrlrs from a nvme_bdev_ctrlr via
trid as a key. nvme_bdev_ctrlr is not registered as io_device.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I20571bf89a65d53a00fb77236ad1b193e88b8153
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8119
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously, if an I/O qpair is disconnected, we tried reconnecting
the qpair. However, this reconnect operation was very likely to fail
and will not match the upcoming asynchronous connect/reconnect
operation. We need an extra callback to make this reconnect operation
asynchronous, but we do not want to have it.
Hence if an I/O qpair is disconnected, we free the I/O qpair and then
reset the corresponding nvme_ctrlr immediately. If the admin qpair is
also disconnected, the nvme_ctrlr is reset immediately. However this
event may never happen. So we do not wait for the error of the admin
qpair.
The NVMf host may disconnect connections by itself intentionally.
In this case, resetting the nvme_ctrlr will surely fail. But resetting
the nvme_ctrlr frees all I/O qpairs of the nvme_ctrlr and these I/O
qpairs are not created again until resetting the nvme_ctrlr succeeds.
Resetting the nvme_ctrlr once at most is more efficient than repeating
reconnecting the I/O qpair. So this change is valuable even for such
intentional disconnection. However, it is helpful to know the event that
I/O qpair is disconnected. Hence change DEBUGLOG to NOTICELOG in the
disconnected callback. The disconnected callback is not repeated, and
we do not need to worry about NOTICELOG flooding.
Refine the unit test case to verify this change.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I376b749c2f55d010692bf916370e8bb4249b795f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9515
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is similar to how we name other module library
directories.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iadaf59231323180b48b5d0cf2e6acb3d8bfc9807
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9549
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Options modified on sock after connected is also moved to a
function.
Signed-off-by: Liu Qing <winglq@gmail.com>
Change-Id: I4c2a9ae9858c102764959d055bed208b4b0621d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9516
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This reverts commit 2cd948c4a6.
This commit caused drop in performance tests.
More info in issue https://github.com/spdk/spdk/issues/2158
Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Change-Id: Id5d353535323c79e773e33377af388dae47238cb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9510
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Poll group has a list of the associated ctrlr channels to update
them when the corresponding I/O qpair is disconnected to do path
failover.
Another idea is that poll group has a list of the associated bdev
channels. However, two or more bdev channels may share a single
ctrlr channel and I/O qpair is per ctrlr channel. What we want to do
by this addition is to stop I/O submission to any failed I/O qpair
and choose alternative I/O qpair. Hence we take the first idea.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia287bd1b803313e66b8505a19694a40133b675f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8124
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously nvme_ctrlr_channel had a pointer to a nvme_ctrlr and used it
throughout. However the use cases are not performance critical and
are better to convert from nvme_ctrlr_channel to nvme_ctrlr by
spdk_io_channel_get_ctx() and spdk_io_channel_get_io_device().
Provide a convenient macro nvme_ctrlr_channel_get_ctrlr() to do it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie22e8c318af88e09b95824c67b3e874d85425f1a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9195
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
bdev_nvme_find_admin_path() is used only in a place and it's role
is to find a non-failed ctrlr even after multipath is supported.
Inline it into bdev_nvme_admin_passthru() will be better.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If58507c49b43d047e1f3ef25bbdfb571c36a1956
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9194
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Previously bdev_nvme_reset() returned -EBUSY if ctrlr is being
destructed and returned -EAGAIN if ctrlr is being reset.
These did not match what spdk_nvme_ctrlr_reset() returned.
Reset operation will be more important than current when multipath
is supported and reset operation is made asynchronous.
Hence change bdev_nvme_reset() to follow spdk_nvme_ctrlr_reset().
bdev_nvme_reset() returns -ENXIO if ctrlr is being destructed and
returns -EBUSY if ctrlr is being reset.
Additionally change the return value of bdev_nvme_failover()
accordingly. After the change bdev_nvme_failover() returns -ENXIO
if being destructed and returns -EBUSY if ctrlr is being reset.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ie2c6f8601050b1043d83de9cf01490751784e4e5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8859
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Following the last patch, include hostid into ctrlr_opts rather than
passing it as a parameter for bdev_nvme_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I0d04db1c5767ec76a9a7cd255c3a8d56b0b8f583
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9344
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
nvme_ctrlr_populate_namespace_done() needs nvme_ns, probe_ctx, and completion
status.
When multipath is supported, nvme_ctrlr_populate_namespace_done() will
be called from the spdk_for_each_channel() callback. The spdk_for_each_channel()
callback can have one context and one completion status.
Hence nvme_ns or probe_ctx need to have another.
If an nvme_ctrlr has multiple namespaces, a probe_ctx is shared by
multiple nvme_ns.
So let nvme_ns has probe_ctx.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Iedaaab80616d34d01935f4ebc31e1f3b84e09e32
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9047
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
For multipath, when an new nvme_ns is added to an existing nvme_bdev,
we need to add the corresponding qpair to all existing nvme_bdev_channels
dynamically. The RPC bdev_nvme_attach_controller has to return after that.
Hence populating namespace needs to be asynchronous.
On the other hand, when an nvme_ns is removed from an existing nvme_bdev,
we need to remove the corresponding qpair from all existing nvme_bdev_channels
dynamically. Hence depopulating namespace needs to be asynchronous.
By the recent refactoring, two callbacks, nvme_ctrlr_populate_namespace_done()
and nvme_ctrlr_depopulate_namespace_done() were removed accidentally.
We need to restore these.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I37ea2420588cef3a18648dec053f8bd2b884e86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9392
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
A few functions can be made private in bdev_nvme.c. Hence delete
these function declarations from bdev_nvme.h.
Besides, bdev_nvme_remove_trid() is only a function declaration.
Hence delete it together in this patch.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7bf9ae6da332c6426f6eb40926e7c4130d9acd8a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9444
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
An error might occur after succesful transport creation
when the new transport is added to nvmf poll groups, e.g.
in nvmf_transport_poll_group_create. In that case
transport is not detroyed and poll groups are not fully
functional. To correct this behaviour, destroy transport if
spdk_nvmf_tgt_add_transport fails. Also update nvmf_tgt
initialization step to check that all poll groups were
created.
Change-Id: I116e6944729d846c1755c2844c77825f65db8c12
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9255
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This only existed to share code between OCSSD and regular NVM
namespaces. Now OCSSD is gone, so just merge the files into bdev_nvme.
Change-Id: Idb73cc05d67144de5dd20af8db24c8f6974d10a7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9337
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
nvme_ctrlr_populate_namespaces
Avoid relying on this number. Different targets have interpreted its
meaning in different ways and it cannot be used anymore in practice. It
may also be very, very large.
Change-Id: I94e8eae49d6ccdbd8be302b30a120d89242b6d39
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9316
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Try to use these accessors instead of directly using the namespaces
array. This will make changing the data structure easier later on.
Change-Id: I3367d0e0065894f3aa199ed1698d27976b4cbbb5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9315
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If the number of namespaces is very large, this can cause excessive
memory allocation. This is especially true because when the number of
namespaces is large, it is almost always very sparsely populated.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I27d94956c222ae3c49c6a7422164ae3a8ec8d963
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9302
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Do simple validity checks first, then check for duplicate controllers.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ic21d64574b37a3d9148e5cd6d5a7599449ea8fe1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9341
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
It is more flexible now as it is possible to get nvme ns handle to do
additional management or queries, however if nvme ctrlr handle is needed
there is already public nvme API for that with nvme ns as an input.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I5493168ad31cc95687962288d57fb5457f2d7dd6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9357
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When preparing for a reset, use this new call to tell
the driver to avoid sending DELETE_CQ/SQ commands to a
PCIe controller when they aren't needed.
Fixes issue #2073.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9ebb7d5c3f7cbb1c3192f162f32edbbea41acde1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9250
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Matt Dumm <matt.dumm@hpe.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Added writing out JSON configuration for the scheduler
subsystem.
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I51b1f94b3f56d0bfb8a87127163c8e248d6846b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7119
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This patch moves schedueler and governor related API from
the internal event.h to public scheduler.h.
With this it is possible to create subsystem responsible
for handling the schedulers.
Three schedulers and a governor were moved to scheduler modules
from event framework.
This will allow next patch to add JSON RPC configuration
to the whole subsystem.
Along with easier addition of other schedulers.
Removed debug logs from gscheduler, as they serve little purpose.
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I98ca1ea4fb281beb71941656444267842a8875b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6995
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When we rotate the socket in the list, we did not check whether the uc pointer
is NULL, then we cause the coredump when using uc pointer.
When we add a new socket (with pollin event) into the pending_recv list,
we add it into the end of the pending_recv list, then it delays the execution
of the newly socket, and it can cause the timeout especially when a socket
is in a connection phase.
So the purpose of this patch is:
1 Revise the rotation logic to handle the two cases, i.e., (1)sock is in the
beginning of the list; (2)sock is in the end of list. The purpose is
to avoid NULL pointer access, and efficently handle the exceptional case.
2 When there is new pollin event of the socket, we should add socket in the beginning
of the list. And this can avoid the new socket handling starvation.
Since max poll event num is 32 from upper layer and if we always put the new socket
in the end of the list, then starvation will occur if there are many socket connection events.
Because if we add the new socket into the end of the pending list, we will always handle the
existing socks first, then the later coming socket(with relatively pollin event) will always be
handled late. Then in the sock connection initialization phase, it will consume a relatively
long time, then the upper layer connection based on this socket will cause timeout,.e.g.,
ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host nqn.2014-08.org.nvmexpress:
uuid:af56cce7-2008-408c-a8a0-9c710857febf from subsystem nqn.2019-02.io.spdk:cnode0 due to
keep alive timeout.
[2021-08-25 20:13:42.201139] ctrlr.c: 579:_nvmf_ctrlr_add_io_qpair:
*ERROR*: Unknown controller ID 0x1
Fixes#2097
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I171b83ffd800539e86660c7607538e120fbe1a91
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9223
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There's only one type now.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I8fbf330797e772b1c45a04970c95bf4894c26639
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9348
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
There's only one kind of namespace now that OCSSD is gone, so simplify
those whole path.
Change-Id: I721de11c3e7be8b3a13ada25b6d6163a67c6659f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
As far as we're aware, this is not in use by anyone. OCSSD has largely
been replaced by ZNS and no OCSSD drives made it to the market.
Change-Id: I020ee277da5292f8c4777f224acafd87586f8238
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9328
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The values returned weren't actually used, and instead
make the code unclear especially in the error cases.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iee2661545a6831f6151a183d4ac723c835e84d13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9349
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Shouldn't really ever happen but is somehow things get out of whack
and we get a busy back from the low level lib we don't want to
increment our flow control counter.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I49176e8ad2bf6f658ac970efea5db89eebc747f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9232
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
bdev_nvme_create() is called only by a single caller and hostnqn is
just copied to ctrlr_opts even if it is passed separately.
Hence include hostnqn into ctrlr_opts rather than passing it as a
parameter for bdev_nvme_create().
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I75b640bcecefa94950b0c19936fab0571c428125
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9332
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The NVMe bdev module will have two similar features, multipath and
failover when it supports multipath.
Take a case that we add two different trids with the same name by the
bdev_nvme_attach_controller RPC as an example.
The failover adds secondary trid to an existing nvme_ctrlr. The multipath
feature creates another nvme_ctrlr and adds it to the same nvme_bdev_ctrlr
which has an existing nvme_ctrlr.
We want to use bdev_nvme_attach_controller for both failover and multipath.
To do it cleanly, separate callback to spdk_nvme_connect_async() between
creating ctrlr and setting failover.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id9bc175af6201cdd74e12d4903fc81afe4f91189
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9225
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We get ext_opts by getting spdk_bdev_io from nvme_bdev_io in
bdev_nvme_readv() or bdev_nvme_writev() now. But this is not aligned
with the current design pattern and not so efficient.
We pass contents of bdev_io via parameters to bdev_nvme_readv() or
bdev_nvme_writev().
Let's follow it about ext_opts.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8a43f31934a36fa2d43800ec8bf17916edfca477
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9292
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@gmail.com>
When an abort command completes successfully,
cdw0 bit 0 may be set to indicate that the controller
was unable to abort the specified cid. In that case
the bdev nvme abort completion callback needs
to reset the controller.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I42a5e96e19e113c38dec67c2d8575a11f39d41a9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9320
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Converted log message to debug that is called too many times
during a hot remove, filling up the log file.
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Change-Id: I08f7ff7c36c6388270878291df8a1e83646e8aee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9258
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When a bdev is being unregistered, after all
channels have been closed, the bdev layer calls
the module's destruct callback for the bdev before
calling the bdev unregister callback.
For the rbd module, the destruct callback is
bdev_rbd_destruct. This callback unregisters the
rbd io_device which is an asynchronous operation.
We need to return >0 from bdev_rbd_destruct to
inform the bdev layer that this is an asynchronous
operation, so that it does not immediately call
the bdev unregister callback. Once the rbd io_device
is unregistered, we can call spdk_bdev_destruct_done()
which will trigger the bdev layer to finally call
the bdev unregister callback.
Without this fix, deleting an rbd bdev would
complete before the backing cluster reference
had been released. This meant that even
if you had deleted all rbd bdevs, there might still
be cluster references in place for a short period of
time. It's better to wait to complete the delete
operation until the cluster reference has been
released to avoid this issue (which this patch
now does).
Fixes issue #2069.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8ac156c89d3e235a95ef196308cc349e6078bfd7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9115
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
The bdev layer doesn't call the destruct callback until
all channels have been released, but because the channel
delete callback passes message to the main thread, we can
end up with a complicated race condition. Currently we
have a deferred_free code path to handle this race, but
we can handle this a bit more cleanly by doing the
construct operation on the main_td as well.
This also simplifies the next patch which will
asynchronously destruct the bdev to fix an RPC bug.
Here's the race:
1) first channel was created on thread A, so disk->main_td = thread A
2) second channel was created on thread B
3) first channel is freed (but disk->main_td is still thread A)
4) spdk_bdev_unregister is called on thread C
5) bdev layer gives callback on thread B to upper layer
6) upper layer on thread B frees channel
7) bdev_rbd_destroy_cb runs on thread B and has to send msg to thread A
for processing
8) bdev layer calls bdev_rbd_destruct on thread C (since step #4 was on
thread C)
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I25ede2dc56e24dac0919aed05b9def2560823ee7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9158
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Klocwork filtered it out that there is a dereference
of pointer 'lvol' before its NULL check.
Change-Id: I83a026e8762d1e004cf1a351585f48fb1e24ae41
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9208
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch makes use of async_fini_start flag to
make fini_start asynchronous.
During this time all lvol stores which have no open
lvols are unloaded. This is required, since lvs
holds claim on the underlying bdev.
Fixes#1630
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: If443cb087324d08a4a70df71c7afd930ab654f90
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9095
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Callback for bdev modules is called 'module_fini',
meanwhile after its execution bdev modules were to call
'spdk_bdev_module_finish_done()'.
This function carries incorrect name, so it was deprecated
and replaced with 'spdk_bdev_module_fini_done()'.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I9a12dff746ea8b4b1570a3794470f7b24e29003e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9148
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
fini_start() is called for each bdev module before
iterating over all unclaimed bdevs to unregister them.
This allows bdev modules to behave differently during
each such unregister. Ex. unloading lvol store when
all lvol bdevs on it are unregistered.
Another use of this callback is to unclaim all bdevs
that can be at that point. Especially ones that will
not receive callback due to no bdev registered.
Ex. offline raid bdev, when some underlying bdevs are missing.
fini_start() being synchronous does not help in cases
where to release claim on the bdev, an asynchronous operation
is required. Ex. lvol store with no bdevs present, requires
async lvs unload to be called.
This patch adds async_fini_start flag for the bdev modules,
to be used when async fini_start is required. When done,
bdev module has to call spdk_bdev_module_finish_start_done().
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I63438b325d4cc53fd236bf9ff143abf6bdd81c49
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9094
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
When removing a socket from a sock_group, the recv_task should be
cancelled last, because it can be sent out while cancelling other tasks
(if POLLERR is received). Otherwise, we could end up with outstanding
recv requests from a socket removed from a group.
Fixes#2112.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: Ic8e24c210541390dd8bdffe8d3bc4e7dd746d4b7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9239
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Once bdev finish starts, bdev unregister is called on all
unclaimed bdevs. This means that for lvs with at least one
lvol present, there will be a corresponding bdev unregister.
Yet the vbdev_lvol module does not attempt to unload the lvs,
once last lvol from that lvs is unregistered. Leaving
the base bdev for lvs claimed.
This patch fixes that by using fini_start callback from
bdev_module to mark when shutdown begins. After that
last lvol unregistered on lvs will unload it.
Expanded struct lvol_bdev to contain lvol_store_bdev.
Closing the lvol will free spdk_lvol, so lvol->lvol_store cannot
be accessed.
Changed ut_lvol_destroy UT to ut_bdev_finish. Previous UT didn't
really test vbdev_lvol_destroy, but 'hotremove' of the lvol bdev.
In effect there is no hotremove of the lvol bdevs (only lvs bdev).
spdk_bdev_unregister() can only be called from within vbdev_lvol,
or during bdev module finish.
This UT will now check the bdev module finish.
Note that at this point lvs with no lvols will not trigger
lvs unload. Next patches in series will introduce async fini_start,
to allow for the unload.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I8f51e8c1fcfdc55a5d090a3bc84ccefda813aef8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9093
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
OCF creates vbdev with block size equal in size with a core device.
We need to ensure that cache's bdev block size is not bigger than
core's bdev block size, so there are no IO errors due to IO length
smaller than cache device's block size.
The reason why this is implemented late in the cache start and not
as soon as we want to construct OCF vbdev is that cache or core
device can be added later after OCF vbdev creation and only then
we are certain to have both devices to compare.
Fixes#2088
Signed-off-by: Rafal Stefanowski <rafal.stefanowski@intel.com>
Change-Id: I536c783ca71b52f212217c597b7997f2d2e89491
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9229
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The new API is used if bdev ext_opts is not NULL.
Change-Id: I414b5d19bff54114d6708efed89ba19b5955f56a
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6271
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Added struct lvol_bdev wrapping around spdk_bdev.
This will help with passing around any context.
For this patch only spdk_lvol is still used.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I23f5be5edda526ad607ea8b45274c945a42d90db
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9147
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Same iterator was used in two places already, and new one
will be added in next patch.
Replaced _SAFE variant since entries cannot be removed
while iterating in this loop.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I5ee911a5b653cfbdf0b4a39d283bd5eee66c49d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9092
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
async_mode option is currently supported in PCIe transport layer
to create io qpair asynchronously. User polls the io_qpair for
completions, after create cq and sq completes in order, pqpair
is set to READY state. I/O submitted before the qpair is ready
is queued internally. Currently other transports only support
synchronous io qpair creation.
Signed-off-by: Monica Kenguva <monica.kenguva@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Ib2f9043872bd5602274e2508cf1fe9ff4211cabb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8911
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In order to not affect the loopback test.
Also create a sock_common.c file which can be used by posix/uring
implementation. We do not put such code in sock.c. Because sock.c
is the general layer. Other users may include their own user space
sock impelmentations. So put those common code in sock_common.c instead.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I983ec2313119539e6eed2d9f11ba1488c0ed6560
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8769
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch tries to add MSG zero copy feature.
Though io_uring supports buffer registration, it only
support io_uring_prep_write_fixed. It means only one
registered buffer can be used. It does not satisfy our
current usage mode.
According to this situation, we still use the MSG_ZEROCOPY
flags in io_uring_prep_sendmsg.
Furthermore, this new feature is dependent on the kernel
version, The currently verified version is
kernel 5.12 rc3. So it is not enabled in the default manner.
For example, if you want to use it on the target side, you can
use the following rpc to configure:
./scripts/rpc.py sock_impl_set_options -i uring --enable-zerocopy-send-server
Change-Id: Ie7bb828f466362add94891989ddf0950dccd9e80
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/957
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Sending the unmap in zone reset is optional (it's only a hint), so if
the base bdev doesn't support it, the reset is completed immediately.
Fixes#2064.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If2c57eadc20d352a71853d7023599503330e1252
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9154
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Parses and verifies hexadecimal cpu bit mask specified by the user.
Added verification to check for cpu cores range, making sure poll groups cores
assigned within the range of cpu cores allocated for the application.
RPC nvmf_set_config now takes an argument to configure ‘poll groups’,
a new parameter for NVMf subsystem. This parameter sets a CPU mask
to spawn threads which run an event loop for a ‘poll group’.
Change-Id: Ied9081c2213715ec94de00a8b37153730b8ac2ed
Signed-off-by: Yuri <yuriy.kirichok@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5611
Community-CI: Mellanox Build Bot
Reviewed-by: Matt Dumm <matt.dumm@hpe.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Purpose: Only Open image on one spdk_thread
due to the limitation of librbd module in order to
eliminate the lock overhead among different
spdk_threads on operating on the same image.
Change-Id: I64c62e8ae1c3324b92cfd953b44ec08af6688530
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6812
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This parameter is still part of API spdk_sock_impl_opts
structure but it is not used. Keep it to support ABI
compatibility since it is located in the middle of the
structure and removing it may break socket opts initialization
or parsing.
Change-Id: Ib641ad7d965d68bc9ebb65dba531408d88cf6fa1
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8914
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Revise the if case to avoid the assert issue.
Change-Id: I095f3d111423e17abaa1f951fe22efb3d2e851b7
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8872
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Implement bdev_nvme_reset_controller rpc, which allows the NVMe
controller to be reset over RPC. Implement bdev_nvme_reset_rpc()
which starts the reset of the controller and returns the result of
the controller reset via the callback function after it completes.
Signed-off-by: Jonathan Teh <jonathan.teh@mayadata.io>
Change-Id: Id98d5e56feb315b7e44e9bb5e5f495e9b1cd1de0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8456
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
In bdev_nvme_reset_ctrlr(), get a controller reset context and start
a poller that calls spdk_nvme_ctrlr_reset_poll_async() to perform the
controller reset asynchronously.
Signed-off-by: Jonathan Teh <jonathan.teh@mayadata.io>
Change-Id: I1e3ae42291c3b43b69c99ca56997dc1965c3ac59
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8454
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For clarity, this element was added when crc+copy API was
added so might as well have all the CRC related functions use
it instead of `dst` to avoid confusion.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ic43adbd0df51c1a349847701ef318f452306d0b3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8229
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Support in accel_perf is coming up in a later patch.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I63a1d3b9b1a3254fdca78e27c473b9b3468c93c8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8202
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Create a single nvme_bdev_channel for each nvme_bdev and each SPDK
thread. nvme_bdev_channel has a pair of nvme_ns and nvme_ctrlr_channel.
The pair of nvme_ns and nvme_ctrlr_channel will be aggregated by
nvme_ns_channel in the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I095a2d6afa4ea23a87e4452b2f9d4c7e0087abe0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6605
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot