Not every transport requires accept poller - transport specific
layer can have its own policy and way of handling new connection.
APIs to notify generic layer are already in place
- spdk_nvmf_poll_group_add
- spdk_nvmf_tgt_new_qpair
Having accept poller removed should simplify interrupt mode impl
in transport specific layer.
Fixes issue #1876
Change-Id: Ia6cac0c2da67a298e88956734c50fb6e6b7521f1
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7268
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
If a subsystem has no listeners, then there is no need
to update the discovery log when adding a host, or setting
a subsystem to allow all hosts.
This eliminates some unnecessary discovery log update
notifications, especially when setting 'allow any hosts'
on a subsystem immediately after it is created (and before
it has any listeners).
Update unit test to check the adding a host to a
subsystem without listeners does not rev the genctr.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I63dab5df564269e574bb925890088f52063aa378
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10546
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The discovery log isn't updated when a subsystem is created
or deleted, it's only updated when a listener for a
subsystem is added or removed.
So remove the nvmf_update_discovery_log() in the subsystem
create and delete paths. They just generate extra AER
completions that potentially cause the host to do unneeded
work.
Note that if a subsystem is deleted with active listeners,
the subsystem delete path will remove each of the listeners
before deleting the subsystem itself. So the discovery log
will still get updated when those listeners are removed.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id01bbfa3b24d3e1279a614a2fd60be41387a03b1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10545
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Batching will be made available for DSA specifically through the new
idxd_perf tool.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Ic51d9ad3692074805b1ffa705cea8be35737c778
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9846
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The failover_in_progress flag is used to decide the return value of
bdev_nvme_failover().
bdev_nvme_delete() calls bdev_nvme_failover() with remove=true to remove
nvme_ctrlr->active_path_id. However bdev_nvme_failover() returns zero
if nvme_ctrlr->failover_in_progress is true. bdev_nvme_failover() may
return zero even if it does not remove nvme_ctrlr->active_path_id.
The following will be better.
bdev_nvme_failover() returns -EBUSY if nvme_ctrlr->resetting is true,
and the caller repeats calling bdev_nvme_failover() until the target trid
becomes alternative path or bdev_nvme_failover() returns zero.
To do that, the failover_in_progress flag is not necessary any more.
Removing the failover_in_progress will also simplify the following
patches to unify ctrlr reset and failover.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I57ab944beb1d06ea4def144c81c69705860de35f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10441
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
We had not checked the bit 0 of the Namespace Multipath I/O and
Namespace Sharing Capabilities (NMIC) field in the Identify Namespace
data structure.
If the bit 0 of the NMIC is zero, it is likely that namespaces are not
identical.
We should check if the value of the NMIC first, and do it in this patch.
Additionally, it is not usual if the bit 0 of the CMIC and the bit 0 of
the NMIC do not match. So in unit tests rename the parameter multi_ctrlr
by multipath for ut_attach_ctrlr() and use it for the value of the NMIC.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6aa7cbcc99be2507dbf18930f7b585a9ea7d0f90
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10380
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Replace the spdk_nvme_ctrlr_reset_async() and spdk_nvme_reset_poll_async()
calls by the spdk_nvme_ctrlr_disconnect(), spdk_nvme_ctrlr_reconnect_async(),
and spdk_nvme_ctrlr_reconnect_poll_async() calls in a reset ctrlr
sequence.
spdk_nvme_ctrlr_disconnect() can fail if ctrlr is already resetting or
removed. But both cases are not possible. reset is controlled and the callback
to the hot remove is called when the ctrlr is hot removed. So we assume
spdk_nvme_ctrlr_disconnect() always succeed.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1299e198597b2a2110f80b9a868e2dae015682ee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10092
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Also we don't treat exceptions when getting INTEL log pages
as a fatal error, the initialization will still contine.
Change-Id: Ic2fd2be510fde2679c1546482934d0a180266936
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10341
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When NVMe passthru command (IO or admin) fails on submission (e.g. it
is not supported), set DNR bit in completion status field. There is no
sense in retrying the command in this case.
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I55960c128bd9fc31f6defef0b9832259a71684b1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8578
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If NVMe admin passthru command is not supported by underlying bdev,
set status code in NVMe completion to INVALID_OPCODE.
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I29c4e1f8263b76b27c199cfd2d9b2474432ec70b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10517
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The originally detected problem is that SPDK NVMf target fails command
with invalid opcode with status code INTERNAL_DEVICE_ERROR instead of
INVALID_OPCODE. All unknown commands on IO queue are passed to
underlying block device layer as NVME_IO type. It is not checked if
this type of commands is supported and, when command fails,
INTERNAL_DEVICE_ERROR is set as status code. If command fails on
submission, status code is set to INVALID_OPCODE which is more
relevant.
This patch adds check if command type is supported to
bdev_nvme_*_passthru functions. If not supported, it is failed with
ENOTSUP.
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I4d7f7639da17dd3b1dc3eee7eb1b4a4f876117a2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8567
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Move the CONFIGURE_AER state before SET_KEEP_ALIVE to
make sure that we run the CONFIGURE_AER state for
discovery controllers.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia4e24f6507c43e3fece06b9161ff8e0b8fa0e97d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, we want to retry reconnect if reconnect failed
in a reset ctrlr sequence but we want to delay the retry. While
we wait the delayed retry, we want to quiesce ctrlr completely.
As part of quiesce ctrlr operations, we want to pause adminq poller but
we need to do it on the nvme_ctrlr->thread.
If a reset ctrlr sequence runs on the nvme_ctrlr->thread, we can avoid
redirecting the pending destruct request at completion too.
So we redirect the reset ctrlr sequence into the nvme_ctrlr->thread.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I538b962e2a7b5cf00ebbac2a1e888482ddeeee61
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10075
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Discovery services using the SPDK nvme driver may
use long-lasting connections that detect AER completions
to determine when there are changes in the discovery
log. This means that we still need to send keep alives
on discovery controller admin queues. So move the
SET_KEEP_ALIVE_TIMEOUT state immediately after
IDENTIFY, and run the SET_KEEP_ALIVE_TIMEOUT state
even for discovery controllers.
Note, we need the IDENTIFY's KAS value to properly
set the keep alive timeout, so we have to keep the
IDENTIFY state before SET_KEEP_ALIVE_TIMEOUT.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5c6403c28fb72d42629c5f9009a89c4bfd44d162
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10329
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the following patches, bdev_nvme_reset() will execute the reset ctrlr
operation on the nvme_ctrlr->thread until completion as bdev_nvme_admin_passthru()
does. Hence change the callback bdev_nvme_reset_io_continue() to
redirect to the orig_thread by using bio. Furthermore, use bio->cpl.cdw0
to store the completion status of the reset processing. bdev_nvme_reset()
does not use bio->cpl.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I361cc44494190ba83ad6e360788d78851416c46c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10074
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch supports admin passthrough retry when we get any error
with DNR=0 but ABORTED_BY_REQUEST up to retry_count times.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I1bf29570791fdbe8651fa70c4c8685bb740fb86b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9944
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When resetting ctrlr, adminq is disconnected first. If adminq is disconnected,
admin passthrough request is rejected with -ENXIO.
But resetting ctrlr may succeed. If resetting ctrlr succeeds, adminq is
connected again, and admin passthrough request will be
submitted successfully.
On the other hand, if ctrlr is failed, admin passthrough request is
rejected with -ENXIO. But when resetting ctrlr, ctrlr is set to unfailed.
Hence bdev_nvme_admin_passthru() skips any ctrlr which is resetting
or failed, and calls bdev_nvme_admin_passthru_complete() with -ENXIO
if no available ctrlr is found.
bdev_nvme_admin_passthru_complete() queues admin passthrough request
and retry it one second later if ctrlr is resetting.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic748dc4faf29ebf717ae5c29dcf7c55fe2ea9243
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9942
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
For DSM command, the NVMe drive may take a long time to finish it,
if we set a small timeout value for DSM command, the bdev/nvme module
will try to reset the IO queue pair when timeout happens,
in `spdk_nvme_ctrlr_free_io_qpair`, we will abort the outstanding
IO requests first, then in the `nvme_pcie_ctrlr_delete_io_qpair`,
we will poll the CQ for any requests that have been completed by
the NVMe controller, if there are NVMe completions in the CQ,
we will finish them again, thus double completions happened.
Here we rename `nvme_qpair_abort_reqs` to `nvme_qpair_abort_all_queued_reqs`,
so the common layer will just abort queued request, and let each
transport to abort outstanding requests case by case.
Fix#2233.
Change-Id: Icae6214239160c615418cb514fc51cfe77b59211
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10233
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The NOTICELOGs really clutter the output during
application start - it's better to make these DEBUGLOGs
instead.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3ae37d5d057d7b972017befbc0834de414b9710b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10191
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The NVMe bdev module queues retried I/Os itself now.
bdev_nvme_abort() needs to check and abort the target I/O if it
is queued for retry.
This change will cover admin passthrough requests too because they
will be queued on the same thread as their callers and the public
API spdk_bdev_reset() requires to be submitted on the same thread
as the target I/O or admin passthrough requests.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If37e8188bd3875805cef436437439220698124b9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9913
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The recent changes merged multiple Data-OUT PDUs within the same
sequence into a single subtask up to 64KB.
However, they were not enough.
For a large write operation, the hardware iSCSI HBA host sent an immediate
data whose size was not block size multiples and then more solicit
data through R2T exchanges.
One example for a 64KB write operation was as follows:
host sent SCSI Write with 5792 bytes and F = 1
target replied a R2T
host sent Data-OUT with 15880 bytes
host sent Data-OUT with 11536 bytes
host sent Data-OUT with 2848 bytes
host sent Data-OUT with 11536 bytes
host sent Data-OUT with 5744 bytes
host sent Data-OUT with 12200 bytes and F = 1
The hardware iSCSI HBA host can decide the size of the unsolicited data
but the SPDK iSCSI target can require the host to send the solicited data
whose size is block size multiples.
Hence we merge immediate data to the following R2T data if the immediate
data is not more than 64KB and more R2T data come.
Add another test case to check if the fix works for the above example.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I4906b4e1a8b61e08862f4ccc27a6caf165126530
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9708
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This was sometimes used as the maximum array index and sometimes as the
maximum count. Make it consistent everywhere and give it a better name.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I518efd99a7d36584624490b0b3497bb6e81ce9ac
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10101
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
After multipath feature is supported, one bdev will have more than one
nvme ctrlr. Fore ease of view, display each ctrlr's trid info.
Moreover, rename nvme_bdev_ctrlr_get as nvme_bdev_ctrlr_get_by_name here
to keep consistent with nvme_ctrlr_get_by_name.
Signed-off-by: Kai Li <lik271@chinatelecom.cn>
Change-Id: I417506699bbea6ed13dac0fee942749757d2ae47
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10129
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In the bdev-zone API, there are a few functions that takes a zone_id:
spdk_bdev_get_zone_info(), spdk_bdev_zone_management(), and the
spdk_bdev_zone_append() functions.
The way a zoned application is usually written is that it starts off
by getting the zone report for all zones (zone_id will be sent in as 0),
and then the application will keep the whole zone report in memory.
Therefore, an application usually have access to the zone_id/zslba for
all zones. However, there are cases, e.g. when getting an error on write,
where the completion callback will only have the lba of the write that
failed.
Add a helper function that can be used to get the zone_id/slba for a
given lba. Having this helper in bdev-zone will avoid SPDK applications
needing to provide their own implementation for this.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Change-Id: I978335f87f7d49bc33aed81afcaa6d9f0af8a1e4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10180
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When data segment size is 64KB and data digest is enabled, if
data segment and data digest are split into different two packets,
- pdu->mobj[0] became full first when reading data semgment,
- pdu->mobj[1] was allocated but unused and data digest was read.
In this case, two SCSI write tasks were submitted by mistake and
the second SCSI write task had no data.
Fix the bug in this patch.
When iscsi_pdu_payload_read() is called and pdu->mobj[0] is full,
allocate pdu->mobj[1] only if any of data segment remains to read.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9a0c36c05f90092c3c2122a7eb91e10976830b40
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9965
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This meant to zero the entire active namespace list.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I2da2293b53acd57d3480cf93b052eb1520de35d4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10028
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
verify_io() keeps track of a buf pointer, but the
buf pointer never actually gets used. So remove
this buf pointer.
Found by clang-13.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I79dfeac7f004b56f7d4404f41b2ff18b96968a20
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10056
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If I/O got ANA error, ANA state may be out of date. So in this case
read ANA log page and update ANA states. Mark nvme_ns to be updating
to avoid using while updating ANA state.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia43d38b3a589c84d6d0479dedcced033e76fb194
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If an I/O failed by ANA error, the corresponding ANA state might be
out of date. In the following patches, for this case, read the latest
ANA log page and update the ANA state. Such reading ANA log page may be
done on multiple threads concurrently including AER ANA change.
Hence protect ANA log page by adding an new flag ana_log_page_updating
to struct nvme_ctrlr and using it.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8bb84091d50a5fdc0d9893b585be972dfd31c0f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9526
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Add bdev_retry_count to spdk_bdev_nvme_opts and retry_count to
nvme_bdev_io, respectively.
Set type of both to int because we want use -1 for infinite retry.
Set the default value of bdev_retry_count to zero for the backward
compatibility.
bdev_retry_count is configurable by the RPC bdev_nvme_set_options.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9bc746fcea54aa8722c76f79c70c2ae2b375aa53
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9864
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
SPDK nvmf target reports all listeners on all subsystems
in discovery pages, kernel target reports only subsystems
listening on a port where discovery command is received.
NVMEoF specification allows to specify any addresses/
transport types. Ch 5: The set of Discovery Log entries should
include all applicable addresses on the same fabric as the
Discovery Service and may include addresses on other fabrics.
To align SPDK and kernel targets behaviour, add filtering
rules to allow flexible configuration of what should be
listed in discovery log page entries.
Fixes#2082
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ie981edebb29206793d3310940034dcbb22c52441
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9185
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
The qpair's state member is only 3 bits of a uint8_t,
and the in_completion_context bit is another bit in that
same uint8_t.
We know that the qpair's state is only ever updated by
one thread, but it is possible that the state could
be modified by one thread, while another thread
is modifying in_completion_context.
in_completion_context is only modified by the thread
that is polling the qpair (or the qpair's poll group).
But with async mode, another thread that has a qpair
on the same PCIe controller could poll its adminq and
reap the SQ completion for the qpair that's owned by
the other thread.
So do *not* set the generic qpair state to CONNECTED
from the SQ completion callback. Instead just set
the pcie_state to READY, and let the thread that owns
the qpair detect the qpair is READY and set the state
to CONNECTED itself.
Fixes issue #2157.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9efc0c954504f1841e1c3890ae78211ad0d1990e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9975
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Currently either HW Engine Channel or SW Engine Channel will be used.
In the case that HW Engine Channel is used while does not support related
operations like IOAT for CRC, it will shift back to the SW Engine's handle.
So that this is an issue that it still refers to the HW Engine Channel
while needs SW Eninge Channel to handle.
This patch introduces the SW Eninge Channel and always initializes there
in case that HW Engine does not support some operations.
Related UT also added to simulate the case the IOAT does not support CRC
and then SW Eninge needs to properly handle it.
Change-Id: I4ecdcd09ab669a616b37c567b45b1e6499800ec9
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9874
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
In some cases a single virtually contriguos memory
buffer can be translated to several chunks of memory.
To make such translation possible, update structure
spdk_memory_domain_translation_result to use a pointer
to iovec.
Add a single iov structure or cases where translation
is always 1:1, it will make easier translation callback
implementation. For RDMA transport translation of address
is always 1:1, so treat iovcnt other than 1 as an
error.
Change-Id: I65605575d43a490490eba72c1eb19f3a09d55ec6
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9779
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Instead of a union with domain type specific
parameters, store an opaque pointer to user
context. Depending on the memory domain type,
this context can be cast to a specific struct,
e.g. to spdk_memory_domain_rdma_ctx for RDMA
memory domains.
This change provides more flexibility to
applications to create and manage custom
memory domains
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Change-Id: Ib0a8297de80773d86edc9849beb4cbc693ef5414
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9778
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Push operation complements existing pull
operation and allows to implement read data
flow using memory domains.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Change-Id: I0a3ddcb88c433dff7a9c761a99838658c72c43fd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9701
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The previous patch supported I/O retry when no available io_path
was found at submission.
This patch supports I/O retry when we get I/O path error at completion.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I93a1664944b15ab0a826a321e2ea7a2574263afe
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9850
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If ANA state is inaccessible or qpair is disconnected, I/O cannot
be submitted.
But if qpair is connected, ANA state may become accessible, or if
qpair is disconnected, it may become connected via resetting.
Hence even if find_io_path() returned NULL, queue I/O and retry it
one second later if qpair is connected or ctrlr is resetting.
Sort retried I/Os by expiration values in ticks, and activate a timed
poller per nvme_bdev_channel only if there is any retried I/O. So
the poller function bdev_nvme_retry_ios() always returns BUSY because
if the poller runs earlier than the closest retried I/O or runs when
there is no retried I/O, it is more like a bug of the framework.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id28110a0d63ebc1c5772814e2ff8a47934df1644
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9830
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The new name suits better to the following "data push"
operation
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ic3249f65de203f375477f8e87b0749b9502d165c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9878
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
iovs are not needed in the callback
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I29718f1f2e65881628b72dea938e40c60348b85d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9877
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If NN is very large this saves a lot of memory. This lookup is
not generally used in the I/O path anyway.
Change-Id: I98e190006843ad5d0bac8483bf9feb800d4a665a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9884
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
In the SPDK NVMe driver, spdk_nvme_ctrlr_reset_async() sets
ctrlr->is_failed to false and spdk_nvme_ctrlr_reset_poll_async() sets
ctrlr->is_failed to true if it fails.
On the other hand, in the unit test for the NVMe bdev module,
the stub for spdk_nvme_ctrlr_reset_async() does nothing and
the stub for spdk_nvme_ctrlr_reset_poll_async() sets ctrlr->is_failed
to false if it succeeds.
This bug made us very difficult to write unit test for I/O retry.
Hence fix this bug.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ic0dcf1109ce543a53fca74708fc86c8c74a17692
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9829
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
bdev_nvme_find_io_path() selects an io_path whose qpair is connected
and ANA state is optimized or non-optimized.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I79c978795562b606ee27aa43020684d8bcbf50c5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9405
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reset all controllers of a bdev controller sequentially. When resetting
a controller is completed, check if there is next controller, and
start resetting the controller.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I169a84b931c6b03b36bb971d73d5a05caabf8e65
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7274
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Previously the NVMe bdev module had completed the outstanding reset and
then canceled pending resets. This was complex.
On the other hand, the generic bdev layer cancels pending resets
and then completes the outstanding reset.
Following the generic bdev layer simplifies the code and makes us easier
to control retry reset, delay retry reset by a few seconds, or stop retry
after repeated failures and then delete ctrlr.
Update unit tests accordingly.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9a68422918ebcb052b3a281316ffba9b3450ecd4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9816
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Otherwise compiler complains that this variable
is used uninitialized (scsi_dev_find_free_lun does
reference it).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8fce604f22e06f7007669510f8ba0ae27c44261a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9868
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Previous patches (5363eb3c) tried to work around the
32-bit unmap and write_zeroes LBA counts by breaking
up larger operations into smaller chunks of max size
UINT32_MAX lba chunks.
But some SSDs may just ignore unmap operations that
are not aligned to full physical block boundaries -
and a UINT32_MAX lba unmap on a 512B logical /
4KiB physical SSD would not be aligned. If the SSD
decided to ignore the unmap/deallocate (which it is
allowed to do according to NVMe spec), we could end
up with not unmapping *any* blocks. Probably SSDs
should always be trying hard to unmap as many
blocks as possible, but let's not try to depend on
that in blobstore.
So one option would be to break them into chunks
close to UINT32_MAX which are still aligned to
4KiB boundaries. But the better fix is to just
change the unmap and write_zeroes APIs to take
64-bit arguments, and then we can avoid the
chunking altogether.
Fixes issue #2190.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I23998e493a764d466927c3520c7a8c7f943000a6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9737
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
bdev_nvme_admin_passthru() chooses the first ctrlr which is not failed.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If41a1d1e1bde4bddfa92e5a385509daa3f0ce4de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9525
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We have io_path structure now and returning io_path rather than
ns and qpair match the function name. The following patches will
cache the returned io_path into nvme_bdev_io.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I5d773da18591fc324667f6b5c489a38f497bf3d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9295
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch removes the critical limitation that ctrlrs which are
aggregated need to have no namespace. After this patch, we can
add multiple namespaces into a single nvme_bdev.
The conditions that such namespaces satisfy are,
- they are in the same NVM subsystem,
- they are in different ctrlrs,
- they are identical.
Additionally, if we add one or more namespaces to an existing
nvme_bdev and there are active nvme_bdev_channels, the corresponding
I/O paths are added to these nvme_bdev_channels.
Even after this patch, ANA state is not utilized in I/O paths yet.
It will be done in the following patches.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I15db35451e640d4beb99b138a4762243bee0d0f4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8131
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Most SCSI hosts, Linux, Windows, VMware, supports 256 LUNs per
device now, and it is not easy to test even if any other non-free
OS or driver supports more than 256 LUNs.
Hence increase the macro constant SPDK_SCSI_DEV_MAX_LUN from 64 to
256. Then we do not need to expose it publicly now. So move it to
lib/scsi/scsi_internal.h.
Update the CHANGELOG together.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Iacde46c3854f326eebfb8befb47d41fce383b027
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9631
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Change a fixed size array to a linked list to manage LUNs per SCSI
device.
Keep the linked list sorted by LUN ID because this is necessary to
efficiently find the lowest free LUN ID or check the specified LUN is free.
To avoid traversing the linked list twice, change scsi_dev_find_free_lun()
to return the LUN which comes just before where we want to insert an new LUN.
Additionally, previously spdk_scsi_dev_add_lun_ext() had not checked if
the specified LUN ID was duplicated. Fix the bug in this patch.
Add unit test cases for the function scsi_dev_find_free_lun().
These changes will enable the following patches to increase
SPDK_SCSI_DEV_MAX_LUN from 64 to 256 without consuming additional memory.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7f6f070ddc680127cf86ae255055da2d1d29e4ec
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9630
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Specifying only a transport id is not enough. We need to be able to
describe the host parameters too.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Iadbea553aee4b38e7cacab0b486e7e5746d0d1ab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9825
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This is the currently active path identifier in a failover scenario. The
path is defined by more than just the transport identifier, so fix the
name.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I682c6f4c54f75307e2615bf80e70358180d99fe2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9576
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This defines a unique path between a host and a target.
Change-Id: Ia3d24c1b34199a8b596aaf17900ca9694a9da77d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9505
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
By this change, we will not need to traverse LUN list or tree in the
callback to hot remove.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ibe72fba824553d0189b9120884aa2113599a568d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9627
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add two public APIs spdk_scsi_dev_get_first_lun() and
spdk_scsi_dev_get_next_lun() to remove the dependency on the macro
constant SPDK_SCSI_DEV_MAX_LUN from lib/iscsi and lib/vhost.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6546697f823fe9f4fa34e1161f5c7fa912dd2d59
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9608
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Call rte_cryptodev_close() to free qpair memory instead of using
an internal function.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I1bd7f0dd86de83f278f6be3263cdf3fbd8e1c77f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9720
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch replaces the synchronous `spdk_nvme_detach()` calls with its
asynchronous counterparts in the controller unregister path.
An additional poller is introduced to periodically poll the NVMe driver
for detach completion. Once the detach is completed, the poller is
unregistered and the nvme_ctrlr is destroyed. The poller uses the same
period (1ms) as the async probe poller.
Since reset and detach cannot happen at the same time, reset_poller was
renamed to reset_detach_poller and it can now store the pointer either
to the reset or detach poller, depending on the circumstances.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I5eb2dd6383d98d25d1f9748af08c1a13d18acb0e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8729
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
According to the specification, we should also post an AER
error event for this error case.
Fix#2171.
Change-Id: Ifb2343453ea5e36ce244938a939537ee6ed1c4e1
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9584
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
It can match by any provided parameter to remove paths.
Change-Id: I5e7a87342bbb90943dc97fb52f142814fcf0acfa
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9453
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Instead of storing an spdk_nvme_transport_id, store the object that
contains it. This will make a few later patches easier.
Change-Id: I36b74889fe39af3b7ab2b900fb3ea4b3f39e1f83
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9484
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
If a core has a very high busy percentage, we should
not assume that moving a thread will gain that
thread's busy tsc as newly idle cycles for the
current core.
So if the current core's percentage is above
SCHEDULER_CORE_BUSY (95%), do not adjust the
current core's busy/idle tsc when moving a thread
off of it. If moving the thread does actually
result in some newly idle tsc, it will get adjusted
next scheduling period.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I26a0282cd8f8e821809289b80c979cf94335353d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9581
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For the src thread, add the busy_tsc of the thread
we are moving to the idle_tsc of the current core.
This is consistent with how are accounting for the
cycles in the target core too.
We will disable the load_balancing.sh script for now.
We will reenable it later in this patch set once
a few other changes are made, along with some updates
to the load_balancing.sh script based on the changes
made in this patch set.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8af82610804e97dabf62ccd90f75a0e6e37d276f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9550
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The values 100 and 200 are used a lot in this part of the
unit tests, many times for different reasons. So add
some more variables and use some of the existing ones more
often to make some of this more clear to the reader.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2196bb6a1ac4b86ab0ddd9a3b88863664116cca5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9625
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Refactor this part of the unit tests to make it a bit
easier to maintain as the dynamic scheduler itself is
modified.
For example, depending on the simulated thread loads,
we may need to pass extra events to cores for
purposes of setting interrupt mode. The important
thing to test here isn't how many events it takes to
do that, but what is the end result.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iad2e861cfa0bfd16c853332650e3ab3a9727f490
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9624
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
spdk.mock.unittest.mk contains platform specific definitions to wrap
syscalls. Allow SPDK_MOCK_SYSCALLS to be predefined before it is
included to extend the list of syscalls to be wrapped. Update rpc
Makefile to use this mechanism so that the platform specific definitions
are used.
Signed-off-by: Nick Connolly <nick.connolly@mayadata.io>
Change-Id: If51c0e7a31cf0eda45a844cb8cfa579efe173c42
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9621
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When data digest is enabled for a nvme tcp qpair, we can use accel_fw
to calculate the data crc32c. Then if there are multiple
c2h pdus are coming, we can use both CPU resource directly
and accel_fw framework to caculate the checksum. Then the datao value compare
will not match since we will not update "datao" in the pdu coming order.
For example, if we receive 4 pdus, named as A, B, C, D.
offset data_len (in bytes)
A: 0 8192
B: 8192 4096
C: 12288 8192
D: 20480 4096
For receving the pdu, we hope that we can continue exeution even if
we use the offloading engine in accel_fw. Then in this situation,
if Pdu(C) is offloaded by accel_fw. Then our logic will continue receving
PDU(D). And according to the logic in our code, this time we leverage CPU
to calculate crc32c (Because we only have one active pdu to receive data).
Then we find the expected data offset is still 12288. Because "datao" in tcp_req will
only be updated after calling nvme_tcp_c2h_data_payload_handle function. So
while we enter nvme_tcp_c2h_data_hdr_handle function, we will find the
expected datao value is not as expected compared with the data offset value
contained in Pdu(D).
So the solution is that we create a new variable "expected_datao"
in tcp_req to do the comparation because we want to comply with the tp8000 spec
and do the offset check.
We still need use "datao" to count whether we receive the whole data or not.
So we cannot reuse "datao" variable in an early way. Otherwise, we will
release tcp_req structure early and cause another bug.
PS: This bug was not found early because previously the sw path in accel_fw
directly calculated the crc32c and called the user callback. Now we use a list and the
poller to handle, then it triggers this issue. Definitely, it will be much easier to
trigger this issue if we use real hardware engine.
Fixes#2098
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I10f5938a6342028d08d90820b2c14e4260134d77
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9612
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Abort any queued admin requests once admin queue gets enabled. A request
can get queued if a controller is being reset and it gets submitted
while admin qpair is being reconnected. If these requests aren't
aborted, the init process will stall, as requests don't get resubmitted
while controller is resetting and subsequent admin commands required for
the initialization would be queued too.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If456a297d2d434b3cc741816cbfb13b01d37e963
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9324
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Allow to return more than one memory domain.
This change aligns bdev and nvme API and provides
more flexibility for custom transports.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ica9b12ad8463c361be6cb62ee2c0513eec0b486d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9546
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Enable dump of transport stats in functional test.
Update unit tests to support the new statistics
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815aeea7d07bd33a915f19537d60611ba7101361
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8885
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch enables us to aggrete multiple ctrlrs in the same NVM
subsystem into a single bdev ctrlr to create multipath.
This patch has a critical limitation that ctrlrs which are aggregated
need to have no namespace. Hence any nvme bdev is not created.
However it will be removed in the next patch.
The design is as follows.
A nvme_bdev_ctrlr is created to aggregate multiple nvme_ctrlrs in
the same NVM subsystem. The name of the nvme_ctrlr is changed to be
the name of the nvme_bdev_ctrlr.
NVMe bdev module has both the failover feature and the multipath
feature now. To choose which of failover or multipath to use, add an new
parameter multipath to the RPC bdev_nvme_attach_controller.
When we attach a new trid to the existing nvme_bdev_ctrlr, we use the failover
feature if multipath is false, we use the multipath feature if multipath is
false.
nvme_bdev_ctrlr has a list for nvme_ctrlr and it is guarded by the
global mutex. Callers can query nvme_ctrlrs from a nvme_bdev_ctrlr via
trid as a key. nvme_bdev_ctrlr is not registered as io_device.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I20571bf89a65d53a00fb77236ad1b193e88b8153
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8119
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Previously, if an I/O qpair is disconnected, we tried reconnecting
the qpair. However, this reconnect operation was very likely to fail
and will not match the upcoming asynchronous connect/reconnect
operation. We need an extra callback to make this reconnect operation
asynchronous, but we do not want to have it.
Hence if an I/O qpair is disconnected, we free the I/O qpair and then
reset the corresponding nvme_ctrlr immediately. If the admin qpair is
also disconnected, the nvme_ctrlr is reset immediately. However this
event may never happen. So we do not wait for the error of the admin
qpair.
The NVMf host may disconnect connections by itself intentionally.
In this case, resetting the nvme_ctrlr will surely fail. But resetting
the nvme_ctrlr frees all I/O qpairs of the nvme_ctrlr and these I/O
qpairs are not created again until resetting the nvme_ctrlr succeeds.
Resetting the nvme_ctrlr once at most is more efficient than repeating
reconnecting the I/O qpair. So this change is valuable even for such
intentional disconnection. However, it is helpful to know the event that
I/O qpair is disconnected. Hence change DEBUGLOG to NOTICELOG in the
disconnected callback. The disconnected callback is not repeated, and
we do not need to worry about NOTICELOG flooding.
Refine the unit test case to verify this change.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I376b749c2f55d010692bf916370e8bb4249b795f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9515
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is similar to how we name other module library
directories.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iadaf59231323180b48b5d0cf2e6acb3d8bfc9807
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9549
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Also make stub for spdk_mempool_get_bulk consistent with DPDK APIs.
Change-Id: I021378ea92651d75a73cc9f447df57c2f71680fa
Signed-off-by: Mao Jiang <maox.jiang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9356
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Moved frequenty used stack vars to globals and added setup and
teardown functions. Should be useful in upcoming patches as well
wrt code savings.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I468bec8856c354fcc954628e4e733594a6580104
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7013
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Renamed nvme_qpair_abort_reqs() to nvme_qpair_abort_reqs_with_cbarg() to
highlight the fact that it only aborts requests with specified cb_arg
and to distinguish it from _nvme_qpair_abort_reqs() which aborts all
requests immediately.
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I32fec5ab0501b1beb8605689d73ec42a6424fba5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9323
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>