Set the bdev->optimal_io_boundary to the strip size, and
set split_on_optimal_io_boundary = true. This will ensure
that all I/O submitted to the raid module do not cross
a strip boundary, meaning it does not need to be split
across multiple member disks.
This is a step towards removing the iovcnt == 1
limitation. Further improvements and simplifications
will be made in future patches before removing this
restriction.
Unit tests need to be adjusted here to not span
boundaries either.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I08943805def673288f552a1b7662a4fbe16f25eb
Reviewed-on: https://review.gerrithub.io/423323
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When a raid bdev is constructed by JSON-RPC construct_raid_bdev,
information about config and slot can be passed to
raid_bdev_add_base_device() and raid_bdev_can_claim() doesn't have
to be called.
Hence extract raid_bdev_can_claim_bdev() from raid_bdev_add_base_device()
and put it to raid_bdev_examine().
Change-Id: I92e02bf3661cb97b691246f32198ba946810d96c
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/423618
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
raid_base_bdev_config is suffciently descriptive as name and so
"name" can be used as the name of the base bdev.
Other structs have used "name" as the name of bdev.
Change-Id: I090ef541a84fd14d9ec3bfebbdc96ae649709551
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/423614
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
According to the purpose of the lists, state_link and global_link
will be enough to understand the purpose well.
Besides, in raid_bdev_remove_base_bdev(), if any entry is not removed
during iteration, TAILQ_FOREACH_SAFE() is not necessary. Hence
change TAILQ_FOREACH_SAFE to TAILQ_FOREACH for this case.
Change-Id: I3022c58faf96721df9241e07dbb5a06d7de89e70
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/423056
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
struct raid_bdev has a pointer raid_bdev_config to its configuration,
but raid_bdev is duplicated between raid_bdev and raid_bdev_config
and just config may be enough as a name.
Change-Id: I77f6a69b913cd540c22f05803c486a4caf103f24
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/422923
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
base_bdevs_io_channel is good but base_channel may be enough and
fit other bdev modules.
Change-Id: I67a1d224f1ef4ca1fc048b4325333f2552a37150
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/422921
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
raid_base_bdev_info is sufficiently descriptive as name and so
names of member variables in raid_base_bdev_info can be simplified.
Change-Id: I759a758ec0f3ac21de767ad379a902f1b04cc66d
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/422919
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
During IO submission, raid_bdev can be get by bdev_io->bdev->ctxt.
Hence holding raid_bdev in raid_bdev_io_channel is duplicated.
Change-Id: I722432718aca8c5846541816b6ecca56821d77f6
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/421182
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Locating spdk_bdev structure to the beginning of raid_bdev structure
will simplify the hierarchy and match other bdev modules.
Change-Id: I1bfbf773bc96a4f144e6bff772ade05bb42762e9
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/420818
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Use strdup for all strings allocated by JSON decoder and free them on
all paths.
Change-Id: Ia30d32c2fd6bd79f62588ca19bd3bb093d38a502
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/420325
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Kunal Sablok <kunal.sablok@intel.com>
For front-ends like iSCSI (and NVMe-oF in the future)
which want the backend to specify the data buffer, the
RAID module doesn't copy the pointer to the allocated
buffer from the child IO back to the parent IO. It
really can't copy the pointer - the child IO owns it
and will free it.
So the RAID module needs to allocate the buffer first
and then pass it down.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3b1eceac9b1cdd26130e59e1d400c9869a19f881
Reviewed-on: https://review.gerrithub.io/420677
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Raid module:
============
- SPDK raid bdev module is a new bdev module which is
responsible for striping various NVMe devices and expose the raid bdev
to bdev layer which would enhance the performance and capacity.
- It can support theoretically 256 base devices (currently it is being
tested max upto 8 base devices)
- Multiple strip sizes like 32KB, 64KB, 128KB, 256KB, 512KB etc is
supported. Most of the current testing is focused on 64KB strip size.
- New RPC commands like "create raid bdev", "destroy raid bdev" and "get raid bdevs"
are introduced to configure raid bdev dynamically in a running
SPDK system.
- Currently raid bdev configuration parameters are persisted in the
current SPDK configuration file for across reboot support. DDF will be
introduced later.
High level testing done:
=======================
- Raid bdev is created with 8 base NVMe devices via configuration
file and is exposed to initiator via existing methods. Initiator is
able to see a single NVMe namespace with capacity equal to sum of the
minimum capacities of 8 devices. Initiator was able to run raw
read/write workload, file system workload etc (tested with XFS file
system workload).
- Multiple raid bdevs are also created and exposed to initiator and
tested with file system and other workloads for read/write IO.
- LVS / LVOL are created over raid bdev and exposed to initiator.
Testing was done for raw read/write workloads and XFS file system
workloads.
- RPC testing is done where on the running SPDK system raid bdevs
are created out of NVMe base devices. These raid bdevs (and LVOLs
over raid bdevs) are then exposed to initiator and IO workload was
tested for raw read/write and XFS file system workload.
- RPC testing is done for delete raid bdevs where all raid bdevs
are deleted in running SPDK system.
- RPC testing is done for get raid bdevs where existing list of
raid bdev names is printed (it can be all raid bdevs or only
online or only configuring or only offline).
- RPC testing is done where raid bdevs and underlying NVMe devices
relationship was returned in JSON RPC commands
Change-Id: I10ae1266f8f2cca3c106e4df8c1c0993ddf435d8
Signed-off-by: Kunal Sablok <kunal.sablok@intel.com>
Reviewed-on: https://review.gerrithub.io/410484
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>