The backend device module can report such capabilities
to the bdev layer, and we can split unmap request into
multiple children requests in the bdev layer.
Change-Id: I81daf7e58b797a2673ef102f2a037de20771092e
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7515
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
(Note: this patch was previously applied as b32cfc46 and then reverted
as 63642bef.)
Today the in-guest nvme device shows physical_block_size=512 even though
the backend iSCSI bdev supports physical_block_size=4K
iSCSI targets exposes physical block size using
logical_block_per_physical_block_exponent in READ_CAPACITY_16
NPWG is one of the way to let Linux nvme driver set
physical_block_size of the nvme block device.
This patch adds spdk_bdev.phys_blocklen which is updated if the iSCSI
backend exposes physical_block_size.
Later phys_blocklen is used in nvmf to set NPWG and NAWUPF to report
back during NS identity.
Linux driver uses min(nawupf, npwg) to set physical_block_size.
Similarly in scsi_bdev fill lbppbe in READ_CAP16 response
based on spdk_bdev.phys_blocklen.
Fixes#1884
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Change-Id: I0b6c81f1937e346d448f49c927eda8c79d2d75c0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7739
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We have some cases where libraries need the
spdk_bdev_module structure definition and a couple
of related APIs, but not everything else (i.e.
spdk_bdev, spdk_bdev_io), for purposes of avoiding
abidiff errors.
For example, nvmf creates a dummy spdk_bdev_module,
and then uses it with the spdk_bdev_module_claim_bdev
API to ensure multiple subsystems cannot add the same
bdev as a namespace.
But when nvmf includes bdev_module.h, it pulls in the
spdk_bdev structure definition as well. This means
when the spdk_bdev structure is modified, it requires
a major version bump since abidiff detects the
difference in the debug info.
Alternatives considered:
* We could add a specific suppression into our abidiff
script for nvmf and struct spdk_bdev, but it would be
a risk (albeit very very small one) that we could
add a real dependency on struct spdk_bdev in the future,
and the suppression would hide the difference.
* We could also break out bdev_module.h into multiple
header files, but the ways of doing that either result
in odd file naming, or modifying every bdev module to
include a new header.
* We could add a public bdev API to expose what the
bdev library needs, but that seemed even more intrusive
than this change. nvmf is kind of abusing the bdev_module
API here, and I'd prefer to not promote that kind of
usage by adding something to the bdev API.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie8fdef8ea294d005b9ae7934dde49c62748420d1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7737
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Move some of the bdev_module APIs immediately after
the bdev_module structure definition. This prepares
for an upcoming patch.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I78d534ba047048ec27d8d3159666584a52211de3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7736
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This reverts commit b32cfc467b.
This commit fails the ABI checks and only got through because the checks
were disabled until 21.04 hit.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: Id26b8f8ba551193d99b1ccbd31b35378b4095a20
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7731
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Today the in-guest nvme device shows physical_block_size=512 even though
the backend iSCSI bdev supports physical_block_size=4K
iSCSI targets exposes physical block size using
logical_block_per_physical_block_exponent in READ_CAPACITY_16
NPWG is one of the way to let Linux nvme driver set
physical_block_size of the nvme block device.
This patch adds spdk_bdev.phys_blocklen which is updated if the iSCSI
backend exposes physical_block_size.
Later phys_blocklen is used in nvmf to set NPWG and NAWUPF to report
back during NS identity.
Linux driver uses min(nawupf, npwg) to set physical_block_size.
Similarly in scsi_bdev fill lbppbe in READ_CAP16 response
based on spdk_bdev.phys_blocklen.
Fixes#1884
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Change-Id: I0b6c81f1937e346d448f49c927eda8c79d2d75cf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7310
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: <dongx.yi@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add support in bdev_zone.h for getting the maximum zone append data
transfer size.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Change-Id: I61203e64d51601232c6578a090fa52975364c1f3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6910
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The NVMe Zoned Namespace Command Set Specification has, in addition to a
Max Open Resources limit, a Max Active Resources limit.
An active resource is defined as zone being in zone state implicit open,
explicit open, or closed.
Create a function spdk_bdev_get_max_active_zones() in the generic SPDK
zone layer, so that this limit can be exposed to the user.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Change-Id: I6f61fc45e1dc38689dc54d5649c35fa9b91dbdfc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6908
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
spdk_vbdev_register() was deprecated in SPDK 19.04.
config_text field in spdk_bdev_module was deprecated in SPDK 20.10.
spdk_bdev_part_base_construct() was deprecated in SPDK 20.10.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib795ccdf61154c168032ccf8b81ea77e5e663851
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6628
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This helps user to locate whether bdev_io fails in
spdk bdev layer or inside Linux AIO.
SPDK_BDEV_IO_STATUS_AIO_ERROR indicates bdev_io fails
due to Linux AIO or its lower layer's failure.
New functions spdk_bdev_io_complete_aio_status and
spdk_bdev_io_get_aio_status can be used to report out
the errno from Linux AIO.
Change-Id: I32640e4a0459cca057278c02ea5a7522f3408a02
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5690
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The backend device such as virtio-blk or virtio-scsi
may support the SIZE_MAX and SEG_MAX. Then SPDK needs
to split the big IO. Add this feature in bdev.
Change-Id: I2442e14121ccf141682964425e96382fec482af3
Signed-off-by: Jin Yu <jin.yu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4600
Community-CI: Broadcom CI
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Currently only nvme bdev module implements this interface. Bdev module
context (in this case spdk_nvme_ctrlr opaque handle) allows for nvme
interface usage for additional management.
Signed-off-by: Jacek Kalwas <jacek.kalwas@intel.com>
Change-Id: I6302c9229d5f7f294a3c1472d9e8dc1519637ffb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4924
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This patch removes function for bdev modules to
present options of the bdevs.
blob_bdev.h refers to the spdk_bdev_module, so would need
to be bumped too.
At this time spdk_bdev_module is left unchanged to prevent
that.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I3cacb087c998d928c5d8c2722b7f041d82bb43f1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4748
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Add an new API spdk_bdev_part_base_construct_ext() to pass not bdev but
bdev_name to fix the race condition due to the time gap between
spdk_bdev_get_by_name() and spdk_bdev_open(). A pointer to a bdev is
valid only while the bdev is opened.
In the new API, spdk_bdev_get_by_name() is included in
spdk_bdev_part_base_construct_ext() and the caller has to know if
the bdev exists or not. Hence spdk_bdev_part_base_construct_ext()
returns return code and returns the created part object by the double
pointer.
Another critical change is that base is just freed if spdk_bdev_open_ext()
failed with -ENODEV. The reason is that if we call spdk_bdev_part_base_free()
for that case, the configuration is removed by the registered callback
and so bdev_examine() will not work.
The following patches will replace spdk_bdev_part_base_construct()
by spdk_bdev_part_base_construct_ext() for the corresponding bdev
modules.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2db027a159559c403cdfbd71800afba590b0f328
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4576
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
This is technically a function that will remain
static in the library where SPDK_BDEV_MODULE_REGISTER
is invoked. It was missed in earlier checks because it
is created via a macro.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I32a59f88bb7c112183763f222d4e1c8c864c545a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2794
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The I/O aborted by the abort command should be completed with SC = 07h, i.e.,
"Command Abort Requested". However, if the generic bdev layer or non-NVMe
bdev module aborted the I/O, the aborted I/O would complete with SC = 06h, i.e.,
"Internal Error". To fix this unexpected behavior, add an new I/O status
SPDK_BDEV_IO_STATUS_ABORTED and update spdk_bdev_io_get_nvme_status() to
set SC to 07h if the I/O status is SPDK_BDEV_IO_STATUS_ABORTED.
If the NVMe bdev module aborts the I/O, the I/O status is set to
SPDK_BDEV_IO_STATUS_NVME_ERROR and SC is set as expected.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ifc99a97248a8d54a8c8d2fab74a90c7ce99c2e6e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2582
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add spdk_bdev_abort function as a new public API.
This goes all the way down to the bdev driver module and attempts to
abort all I/Os which has bio_cb_arg as its callback argument.
We can separate when only a single I/O has bio_cb_arg and when multiple
I/Os have bio_cb_arg, but unify both by using parent - children I/O
relationship. To avoid confusion, return matched_ios by _bdev_abort() and
store it into split_outstanding by the caller.
Exclude any I/O submitted after this abort command because the same cb_arg
may be used by all I/Os and abort may never complete.
bdev_io needs to have both bio_cb_arg and bio_to_abort because bio_cb_arg
is used to continue abort processing when it is stopped due to the capacity
of bdev_io pool, and bio_to_abort is used to pass it to the underlying
bdev module at submission. Parent I/O is not submitted directly, and is
only used in the generic bdev layer, and parent I/O's bdev_io uses bio_cb_arg.
Hence add bio_cb_arg to bdev structure and add bio_to_abort to abort structure.
In the meantime of abort operation, target I/Os may be completed. Hence
check if the target I/O still exists at completion, and set the completion
status to false only if it still exists.
Upon completion of this, i.e., this returned zero, the status
SPDK_BDEV_IO_STATUS_SUCCESS indicates all I/Os were successfully aborted,
or the status SPDK_BDEV_IO_STATUS_FAILED indicates any I/O was failed to
abort by any reason.
spdk_bdev_abort() does not support aborting abort or reset request
due to the complexity for now.
Following patches will support I/O split case.
Add unit tests together to cover the basic paths.
Besides, ABI compatibility check required us to bump up SO version of
a few libraries or modules. Bump up SO version of blob bdev module simply
because it does not have any out-of-tree consumer, and suppress bumping
up SO version of lvol library because the affected struct spdk_lvol
is not part of public APIs.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I515da688503557615d491bf0bfb36322ce37df08
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2014
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Added new function for getting NVMe specific return code
for fused commands. Also changed one of the return codes
in fused commands so that we could distinguish error
cases.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: I86417ea4f5b8f3e6496162be3d6c6128076e35d4
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/481666
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
A new API was added `spdk_bdev_io_get_aux_buf` allowing the caller to request
an auxiliary buffer for its own private use. The API is used in the same manner that
`spdk_bdev_io_get_buf` is used and the length of the buffer is always the same as the
bdev_io primary buffer. 'spdk_bdev_io_put_aux_buf' is called to free the
auxiliary buffer.
The initial use case is crypto, in the next patch in series it is used. No UT were
added as the logic isn't that complicated and it is fully tested with each run
of crypto.
Fixed a comment typo also (not mine for once).
Signed-off-by: paul luse <paul.e.luse@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ib1939fcbc8e5db36fd909ef26771a725a551e8e6
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/478383
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Compare and write fused operation num_blocks should
not exceed value of 'atomic compare and write unit'.
In case of NVMe native support we should read this
value from 'namespace atomic compare and write unit'
if set in namespace identify data, otherwise from
'atomic and write unit' field in controller identify
data. If bdev does not support this natively we should
set this value to 1.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: Ib1ea02dbf9d1eed476d9dd0114ea96b1376e0c45
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/477911
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Added new field in bdev_io structure for tracking
number of IO retries. It will be used in future patches.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: I8e002e93f54c9ce39c7af0dd3a1960e6aea93580
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/479828
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We will only support a vectored variant of
compare-and-write for now.
This does no locking for now. Ii will be added
in a separate patch.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5bd075c912de60090e19cf8fced19c4879fcc900
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475941
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We can't allow overlapped locked ranges - otherwise
two different channels could be deadlocked.
So add a pending_locked_ranges to the bdev. When we
start a lock operation, check if the new range overlaps
one that's already locked. If so, put it on the pending
list. When an unlock operation completes, we will
check if any pending ranges can now be locked.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2e3113216a195887b954533495ff200df14fadc1
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/478537
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Keep a mutex protected list of the active locked ranges
in the bdev itself. This is only accessed when a new
channel is created, so that it can be populated with
the currently locked ranges.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id68311b46ad4983b6bc9b0e1a8664d121a7e9f8e
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/477871
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Media management event was introduced. It's sent out to notify that
some portion of the data needs to be rewritten (e.g. due to data
refresh, wear leveling, high error rate, etc.). This type of
notification is only utilized by devices exposing raw access to the
physical medium (e.g. Open Channel SSDs).
Change-Id: Ia30faa5866d71fd597009b441f69c609de974161
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471460
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Add a link to the bdev channel for linking all the bdev
IOs that were submitted to this channel so that we can
monitor each IO's consuming-time.
Change-Id: I1e425b2059f20fd7b158eb3d6b023ce8629e7a30
Signed-off-by: JinYu <jin.yu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469227
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This change allows setting of the NVMe completion queue
CDW0 in spdk_bdev_io_complete_nvme_status.
Before that change, handling of vendor specific NVMe IO
commands was limited since there wasn't a way to return
command specific info back to the initiator.
Change-Id: I250d5df3bd1e62ddb89a011503d42bd4c8390f9b
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470678
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
spdk_bdev_zone_management() allows to perform
management action on a zone. Zone is specified
by start logical block address. Available zone
actions: open, close, reset and finish.
Change-Id: Ie7eaed3e2cc7b9b49dd51ee2d6c28b4ef2f23eb9
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460647
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
spdk_bdev_get_zone_info() is used for retriving
information about zones inside zoned namespace.
Change-Id: I8f931505245e984c0b1ee35ed6592c978ee47544
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460643
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Added new public header for zoned bdev. Zoned bdev is an
extension of the bdev interface. Generic concept comes
from ATA/SCSI and is also being worked as an NVMe TP.
Zoned device logical blocks space is divided into fixed-sized
zones. Each zone is described by its start logical block address
and capacity. Writes to a single zone need to be sequential.
After zone is fully written it need to be reset to write to it
again. Such writing schema could be very beneficial in terms of
write amplification factor for NAND based devices.
SPDK Flash Translation Layer library will be consuming this
interface in the future.
Extending SPDK bdev interface will allow to use existing bdev
infrastructure for this new type of devices.
Zoned device have several properties defined in spdk_bdev
structure:
- zone_size: default size of each zone
- max_open_zone: maximum number of open zones
- optimal_open_zones: optimal number of open zones to get
best performance on writes
Single zone properties are defined in spdk_bdev_zone_info
structure:
- start_lba: first logical block of z zone
- write_pointer: logical block address in the zone at
which next write shall occur.
- capacity: maximum number of logical blocks that may
be written in the zone when zone is empty.
- state: zone state
Several zone states are defined: Empty, Open, Full, Closed,
Read Only and Offline.
To change zone state zone actions are defined: Close, Finish,
Open and Reset.
Change-Id: I5fcc22d548c15743329344cae96f5ff73e268504
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460642
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
This patch is entry point for extending bdev
interface to support devices with zoned namespace
semantics.
spdk_bdev_is_zoned() will allow user to check if
bdev is zoned bdev.
Change-Id: Id9ea9898d406d1d942bf3081b00ebcb574ac2b5e
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460641
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Added write_unit_size field to bdev structure.
It describes required number of logical blocks
for write operation. For legacy bdevs this value
will be equal to logical block size.
For bdevs working on top of Open Channel/Zoned
Namespace SSDs or RAID 5 volumes write size unit
could be greater than logical block size and
upper layer should perform write operations
with size of multiple of write unit size.
Change-Id: I55eb6687912a7d0d1157874b2778e7d6c2d44e37
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/463802
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
For devices that don't have a UUID, the UUID is generated at
registration time. That means that some devices will not have the same
UUID from run to run, but this seems no worse than having no UUID at
all.
Change-Id: Icf6b8517ffcffabafa2b73176dc03d896d0017fe
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/459604
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch adds *_with_md family of functions allowing for IO with
metadata being transferred in a separate buffer.
Change-Id: I842d5a00a532cf5d0b0f0738535ea46903674140
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/451465
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add a ZCOPY operation to obtain buffers that represent
data regions on the backing block device.
Change-Id: Ie941c16ee051d0009e3888b52b8f41773bba47b3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/386166
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We used to wait only for those descriptors which
specified the hotremove notification callback. The
bdev could've been removed before the descriptor
was closed and the subsequent spdk_bdev_close would
simply segfault.
This patch modifies spdk_bdev_unregister to always
wait for all descriptors to be closed before actually
unregistering the bdev. This consolidates the bdev
unregister behavior for descriptors with and without
the hotremove callback.
Change-Id: I9b358209c6abd301b6fe8660e27bc6fa4ef485d6
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450175
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This API had good intentions, but as more complicated
use cases came up where base bdevs could come and go,
we've realized that the bdev layer will need another
mechanism to query bdev modules on these types of
relationships between a virtual bdev and its base
bdevs. We removed all code related to tracking
the array of base bdevs a long time ago.
Change all existing callers to use spdk_bdev_register.
Document spdk_vbdev_register as deprecated for now,
and change its implementation to just call
spdk_bdev_register for simplicity sake.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3b40ed96480c0fa7184db42953a9f4e4c167fed1
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/450076
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Currently if module claims a bdev in examine_config,
it cannot start asynchronously. This patch changes this
behaviour by calling examine_disk on modules which previously
claimed bdev trough examine_config.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I85b603590c6dab50e59ef7e68f292cb9abc47d98
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/448132
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When the specified buffer size to spdk_bdev_io_get_buf() is greater
than the permitted maximum, spdk_bdev_io_get_buf() asserts simply and
doesn't call the specified callback function.
SPDK SCSI library doesn't allocate read buffer and specifies
expected read buffer size, and expects that it is allocated by
spdk_bdev_io_get_buf().
Bdev perf tool also doesn't allocate read buffer and specifies
expected read buffer size, and expects that it is allocated by
spdk_bdev_io_get_buf().
When we support DIF insert and strip in iSCSI target, the read
buffer size iSCSI initiator requests and the read buffer size iSCSI target
requests will become different.
Even after that, iSCSI initiator and iSCSI target will negotiate correctly
not to cause buffer overflow in spdk_bdev_io_get_buf(), but if iSCSI
initiator ignores the result of negotiation, iSCSI initiator can request
read buffer size larger than the permitted maximum, and can cause
failure in iSCSI target. This is very flagile and should be avoided.
This patch do the following
- Add the completion status of spdk_bdev_io_get_buf() to
spdk_bdev_io_get_buf_cb(),
- spdk_bdev_io_get_buf() calls spdk_bdev_io_get_buf_cb() by setting
success to false, and return.
- spdk_bdev_io_get_buf_cb() in each bdev module calls assert if success
is false.
Subsequent patches will process the case that success is false
in spdk_bdev_io_get_buf_cb().
Change-Id: I76429a86e18a69aa085a353ac94743296d270b82
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/446045
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Currently struct bdev_io holds io_channel that the I/O was submitted
on through bdev_io::bdev_channel, but bdev_io::bdev_channel is private
in bdev.c and cannot be referenced in other files.
Hence add an new API spdk_bdev_io_get_io_channel API to get io_channel
coniveniently.
Change-Id: Ic2e2fde845d324f7a1637e3c75080727a62de5ec
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443843
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Currently, the SPDK_BDEV_REGISTER_MODULE() macro uses __LINE__
to generate functions like spdk_bdev_module_register_187().
Typically, this is not a problem as these functions are not called directly
rather, they are only used as constructor functions to load the bdevs during
system startup.
There are languages however, (e.g rust) that require these functions to be
referenced explicitly to prevent them from being removed during the linking phase.
In order to reference them, having the names predictable (and potentially
changed per commit) makes things easier.
Change-Id: I15947ed9136912cfe2368db7e5bba833f1d94b15
Signed-off-by: gila <jeffry.molanus@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/443536
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch is for DIF check types.
Add enum spdk_dif_check_type to DIF library.
Add a field dif_check_flags to struct spdk_bdev and add
spdk_bdev_is_dif_check_enabled to bdev APIs.
Added enum is intended to improve usability. If no enum, the
caller will have to get raw data of flags and mask each bit.
Change-Id: Ia46a37a9684dc968dcc51963674f0a9963e0cd4d
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443339
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch is for DIF settings.
Add fields dif_type and dif_is_head_of_md to struct spdk_bdev and
add APIs spdk_bdev_get_dif_type and spdk_bdev_is_dif_head_of_md to
bdev APIs.
The fields dif_type and dif_is_head_of_md are added to the JSON
information dump.
Change-Id: I15db10cb170a76e77fc44a36a68224917d633160
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443184
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
To support DIF, bdev will need to expose the following information:
- Metadata format
- Block size
- Metadata size
- Metadata setting (interleave or separate)
- DIF settings
- DIF type 1, 2, or 3
- DIF location
- DIF check types
- Guard check
- Reference tag check
- Application tag check
This patch is for the metadata format. Subsequent patches will do for the DIF
setting and DIF check types.
Add fields, md_len and md_interleave, to struct spdk_bdev and add APIs,
spdk_bdev_get_md_size and spdk_bdev_is_md_interleaved, to bdev APIs.
The fields, md_len and md_interleave, are added to the bdev JSON infomation dump.
DIF will be used only in the NVMe bdev module and the upcoming virtual
DIF bdev module first. But additional required storage by md_len and md_interleave
will be very small and they are simple. Hence add them to struct spdk_bdev simply.
Change-Id: I4109f6a63e6f0576efe424feb0305a9a17b9b2e8
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443183
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Considering it's the part base object that's now accessible
in its remove callback, we can simplify the part API by making
it accept the part base object directly.
Change-Id: I87c3278929a063c115828d02e0def7fa536e6682
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/434835
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Now, that _spdk_bdev_io_get_buf offers allocating aligned buffers,
add possibility to store original buffer and replace it by aligned
one for the time of IO.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: If0ed306175631613c0f9310dccaae6615364fb49
Reviewed-on: https://review.gerrithub.io/429754
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch changes the name of the field. Following patches
will introduce logic that will guarantee that buffers
provided to bdev module will be aligned to value
specified in this field
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I5329b9fe26ef2417bc7beae86518cc643b263f97
Reviewed-on: https://review.gerrithub.io/430782
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Large read I/O will be typical in some use cases such as
web stream services. On the other hand, large write I/O
may not be typical but will be sufficiently probable.
Currently when large I/O is submitted to the RAID bdev,
the I/O will be divided by the strip size of it and then
divided I/Os are submitted sequentially.
This patch tries to improve the performance of the RAID bdev
in large I/Os. Besides, when the RAID bdev supports higher
levels of RAID (such as RAID5), it should issue multiple
I/Os to multiple base bdevs by batch fasion in the parity
update. Having experience in batched I/O will be helpful
in the future case too.
In this patch, submit split I/Os by batch until all child IOVs
are consumed or all data are submitted. If all child IOVs are
consumed before all data are submitted, wait until all batched
split I/Os complete and then submit again.
In this patch, test code is added too.
Change-Id: If6cd81cc0c306e3875a93c39dbe4288723b78937
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/424770
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
according to commit:
bdev: add spdk_bdev_queue_io_wait
This patch will make io_wait to support bdev_split
Change-Id: Ie5c57db3fbd9e777f69daf65b080f2659d85032a
Signed-off-by: Ni Xun <nixun@baidu.com>
Signed-off-by: Li Lin <lilin24@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Reviewed-on: https://review.gerrithub.io/426649
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie2b0e24000665fd95a7c21298c86f4889a03650a
Reviewed-on: https://review.gerrithub.io/424583
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Splitting a 1TB unmap into individual 64KB unmap commands
(for a RAID volume with 64KB strip size) would be awful -
the RAID module can be much smarter about this.
So back out the changes for splitting I/O without payload.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I24fe6d911f4e3c9db4b2cb5d66c7236a5596e0d9
Reviewed-on: https://review.gerrithub.io/424103
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
A number of modules (RAID, logical volumes) have logical
"stripes" that require splitting an I/O into several
child I/O. For example, on a RAID-0 with 128KB strip size,
an I/O that spans a 128KB boundary will require sending
one I/O for the portion that comes before the boundary to
one member disk, and another I/O for the portion that comes
after the boundary to another member disk. Logical volumes
are similar - data is allocated in clusters, so an I/O that
spans a cluster boundary may need to be split since the
clusters may not be contiguous on disk.
Putting the splitting logic in the common bdev layer ensures
bdev module authors don't have to always do this themselves.
This is especially helpful for cases like splitting an I/O
described by many iovs - we can simplify this a lot by
handling it in the common bdev layer.
Note that currently we will only submit one child I/O
at a time. This could be improved later to submit multiple
child I/O in parallel, but the complexity in the iov splitting
code also increases a lot.
Note: Some Intel NVMe SSDs have a similar characteristic.
We will not use this bdev stripe feature for NVMe though -
we want to primarily use the splitting functionality inside
of the NVMe driver itself to ensure it remains fully
functional. Many SPDK users use the NVMe driver without
the bdev layer.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ife804ecc56f6b2b55345a0d0ae9fda9e68632b3b
Reviewed-on: https://review.gerrithub.io/423024
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This will be needed for using this same descriptor when
splitting an I/O.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idec759df7ab27f8de567d3c8a4214e25dbe173f5
Reviewed-on: https://review.gerrithub.io/423022
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
So we don't need to allocate memory (maybe failed) just for free other
memory.
Change-Id: I2c83f6acc2aa6ed79455bff90f952a2e70b44d59
Signed-off-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-on: https://review.gerrithub.io/422203
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The intents of these arrays was to keep track in the
bdev layer of all base<->virtual bdev relationships -
i.e. which member disk bdevs make up a RAID bdev,
which logical volume bdevs are associated with a
bdev that contains an lvolstore, etc.
Currently none of this is used however. And trying
to keep track in the bdev layer instead of asking
the bdev modules for the relationships has a number
of complications. Early one, we tried to do this
with TAILQs - but that doesn't work since this can't
be done with a single TAILQ_ENTRY in the bdev
structures. So we moved to arrays - that works a bit
better, but then the pointer arrays have to be
realloc'd which isn't ideal.
The biggest problem though with these arrays is that
they held bdev pointers - not bdev descriptor pointers.
It's not really valid to access bdevs without a
descriptor - the descriptors are what make sure active
references are accounted for when a bdev is hotplugged.
Of course the bdev layer knows when a bdev is getting
removed and could go and do the updates to these
arrays separately - but that just seems very convoluted.
So for now just remove these arrays completely. If
there is a future need for the bdev layer to
understand relationships between bdevs, we can add
module APIs so that the generic layer can ask
the modules about the relationships.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I99ef1068240bff1262f64f234260cf2fb44df51d
Reviewed-on: https://review.gerrithub.io/420932
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
When an SPDK application shuts down, the bdev layer will
automatically unregister all of the bdevs to ensure they
are properly quiesced and cleaned up.
Some modules may want to perform different operations when
a bdev is destructed during normal runtime vs. shutdown.
For example, for lvol, when the last lvol is cleaned up,
it should unload the lvolstore, release and close the bdev
that contains the lvolstore. You never want to do this
during normal runtime though - it is perfectly valid to
have an lvolstore that contains no lvols. RAID and future
bdev modules such as multipath have similar use cases.
So add a new bdev module callback named "fini_start".
If a module specifies a function pointer for this callback,
the bdev layer will call it before it starts the bdev
unregistrations.
This enables some future patches to the bdev layer such
that it will always unregister block devices that are not
claimed (i.e. logical volumes) before block devices that
are claimed (i.e. the bdev containing an lvolstore).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I6e87f5c2b27f16731ea5def858f26e882a29495a
Reviewed-on: https://review.gerrithub.io/421175
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Currently, the bdev layer iterates over all of the
existing channels of a bdev to collect I/O
statistics. But this ignores statistics for
channels that are deleted.
Fix that by keeping an io_stat structure in the
bdev which accumulates statistics for deleted
channels. Use the bdev mutex to protect these
accumulations.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3103c0b8b55973c827d977765f47e5b9e7f58e5f
Reviewed-on: https://review.gerrithub.io/421029
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Two extra fields are added to the iostat rpc.
1. io_time. The amount of time since queue depth tracking was
enabled that has been spent on I/O processing.
2. weighted_io_time. Incremented each time this bdev's queue depth is
polled by the amount of time spent processing I/O since the last polling
event times the measured queue depth.
Change-Id: Ie70489ec24dee83f3eeac8f4f813ec7074ff458f
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/419031
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Daisuke Aoyama originally contributed to istgt, the
iSCSI target in FreeBSD. The SPDK iSCSI code is
originally derived from that. Due to copy and paste,
some incorrect copyright attributions have been added
to other files that do not derive from istgt, so
this patch removes those.
It is doubtful, at this point, that there is any code
whatsoever that remains from the original istgt, but
we can revisit that at a later time.
Change-Id: I207e1e6b99d271e2f12690be90a96f7d0c113af7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/420679
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Also make the iov_len always set to the used length, not
the total length of the buffer.
Change-Id: I7ebb5b63c6ca7570369f4af0131a23c520c1f7b0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/419025
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This change includes a function to enable this feature on a per-bdev
basis. The new information stored in the bdev includes the following:
measured_queue_depth: The aggregate of the outstanding oparations from
each channel associated with this bdev.
period: The period at which this bdev's measured_queue_depth is being
updated.
With this information, one could calculate the average queue depth and
the disk utilization of the device
Change-Id: Ie0623ee4796e33b125504fb0965d5ef348cbff7d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/418102
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This fills two holes and allows the structure to fit neatly into 4
cachelines.
Change-Id: Ie175fec2656ff0b60a8c6fcbc26b1a66a8bf5162
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/417604
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
A few open-coded sequences equivalent to SPDK_CONTAINEROF() were
scattered around; replace them with the macro from spdk/util.h.
Change-Id: I95c6e6838902f411420573399ced7c58c2e4ef84
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/418126
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
During spdk_bdev_init, examine_config is called.
This call can claim bdev synchronously, based on
configuration. On spdk_bdev_start if none module
claimed bdev, examine_disk is called and can
perform I/O before claiming bdev.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I1448dd368cf3a24a5daccab387d7af7c3d231127
Reviewed-on: https://review.gerrithub.io/413913
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Also make it correctly account for alignment and automatically
use the internal iov element if necessary.
Change-Id: I0b33ef9444f0693c2d6b0cdaf221c4a5b0ad2cc3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/416870
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This is potentially useful for more types of commands.
Change-Id: Ifbde7ae35294f581b8360891579836fd6f9573a6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/416869
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This variable is completely unused in the spdk repository. I believe
that it is a deprecated member of this struct.
Change-Id: Icf1ff1eaa67dc46d17f8ab6a913ff199558876ed
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/416442
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
I missed this one in the initial series.
Change-Id: Id4dc7574a04cd964455852f1a00084b65ab989b3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/416253
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This should be set from bdev.c and does not need to be accessed further
from bdev modules.
Change-Id: I2174ed2378d986cec291e7f29e64fe13a5f7df6d
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/416060
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
These submission related variables are not accessed from any of our
current bdev modules.
Change-Id: I69e21eea736273183dfeb48922890a4dc9a244cc
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/416058
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Move the status field into the internal structure.
Change-Id: Icf96436925dd829ee89d2491ef55e337823be6fb
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/416057
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The members of spdk_bdev_io which are associated with the data buffer
should only be modified by calling functions in bdev.c
Change-Id: Icacb7f7387d626cf6834480b572e2f31b48666e1
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/416054
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic8aa4d5b8c1edff6875bba38fa5936a6fb9950cb
Reviewed-on: https://review.gerrithub.io/415871
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Change-Id: I6babd4cf990bf19b510db88bdfb0ca81e29d9252
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/414700
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Madhu Pai <mpai@netapp.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
This begins the process of hiding data members in
spdk_bdev_io that don't need to be accessed from
within bdev modules.
One strategy would be to implement accessors for
every data member in the structure. However, that
may have negative performance effects. Instead,
create a new internal structure within the old
structure. This new structure will still be visible
for now, but at least makes clear which members
are accessible and which are not.
This patch shifts one data member to the new structure
as an example.
Change-Id: I68525db995325732fe9f5fc3f45b06920948309b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/412298
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The bdev module writer's guide at the top of bdev_module.h is outdated;
replace it with a link to the up-to-date documentation in
doc/bdev_module.md.
Change-Id: I36f745f70596b9e4c9a7495a019a03268460d2dd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/412539
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This will become the public interface for implementing
bdev modules. Right now the file exposes too much of
the guts of the bdev layer to modules, so it needs
to be stripped down.
Change-Id: Ie8b8c3271d51fdb8d0c24a80244b3f3e510c8790
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/412297
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>