Compare commits

34 Commits

Author SHA1 Message Date
David Marchand
a26adb0d6c env_dpdk/pci: fix check on 20.11 EAL API change
RTE_DEV_ALLOWED is an enum and has no associated define, hence checking
for its presence will always be false.
We could test for RTE_DEV_WHITELISTED define, but this macro added for
deprecation warning will be dropped in the future.
Switch to a check on DPDK version.

Fixes: 10ed0eb755 ("env_dpdk/pci: adapt to 20.11 EAL changes")
Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5165 (master)

(cherry picked from commit 2e9cd9d7b0)
Change-Id: I75270977b580065b36c753266cbaa5fb73f99eb1
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5175
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-11-24 08:03:42 +00:00
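The pitfall described in a26adb0d6c can be reproduced in isolation. The sketch below is only an illustration: every `DEMO_`/`demo_` name is a hypothetical stand-in, not a DPDK or SPDK symbol, and the version macro merely imitates the usual "pack four numbers into one integer" scheme.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real definitions live in DPDK headers. */
#define DEMO_VERSION_NUM(a, b, c, d) (((a) << 24) | ((b) << 16) | ((c) << 8) | (d))
#define DEMO_VERSION DEMO_VERSION_NUM(20, 11, 0, 0)

enum demo_dev_policy {
    DEMO_DEV_ALLOWED, /* an enum constant, invisible to the preprocessor */
    DEMO_DEV_BLOCKED
};

/* Reports whether the preprocessor sees DEMO_DEV_ALLOWED as a macro. */
int demo_ifdef_sees_enum(void)
{
#ifdef DEMO_DEV_ALLOWED
    return 1;
#else
    return 0; /* always taken: no macro of that name exists */
#endif
}

/* Reports whether a version comparison selects the new API. */
int demo_version_check_passes(void)
{
#if DEMO_VERSION >= DEMO_VERSION_NUM(20, 11, 0, 0)
    return 1; /* taken when building against 20.11 or newer */
#else
    return 0;
#endif
}

/* Usage: the #ifdef test fails although the enum exists; the version test works. */
void demo_self_check(void)
{
    assert(demo_ifdef_sees_enum() == 0);
    assert(demo_version_check_passes() == 1);
}
```

A version comparison keeps working after the deprecation macro is dropped, which is the reason the commit message gives for preferring it.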
David Marchand
a73286870e env_dpdk/pci: adapt to 20.11 EAL changes
DPDK 20.11 renamed device and bus control enums [1].
This is a simple renaming, no change in semantics.

1: https://git.dpdk.org/dpdk/commit/?id=a65a34a85ebf

Signed-off-by: David Marchand <david.marchand@redhat.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5116 (master)

(cherry picked from commit 10ed0eb755)
Change-Id: Ia40bae750ad74f405eb700b47514fca021ffd052
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5135
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-11-18 08:16:05 +00:00
Tomasz Zawadzki
ad82ba685e version: 20.10.1 pre
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I105e102e9dff8719644819254c8a792027473aea
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4937
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-11-02 08:36:16 +00:00
Tomasz Zawadzki
e5d26ecc2a SPDK 20.10
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib56970df336aaccdca04b7b132294b732296903a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4936
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-10-30 19:33:45 +00:00
Karol Latecki
658d7df6a7 nvme/hotplug.sh: Copy external DPDK libs into test VM
Need to select proper path for passing libs into the VM
in case we're building with custom DPDK.

Signed-off-by: Karol Latecki <karol.latecki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4975 (master)

(cherry picked from commit 097e6e16b9)
Change-Id: I97d301c70adee31b727c6b6673eadac3cbde9817
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4984
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-30 19:33:35 +00:00
Alexey Marchuk
ba9c5abe86 sock/posix: Disable zcopy send by default
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4978 (master)

(cherry picked from commit 9b19abae3c)
Change-Id: I4825c681d742946dfcf5bdc209356194766a15cd
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4982
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-30 16:15:30 +00:00
Alexey Marchuk
8a0f9cf3a7 nvme_tcp: Fix icreq/icresp handling with zcopy enabled.
There is a problem with TCP zcopy enabled:
1. TCP initiator sends icreq and starts polling a qpair. Polling of the qpair
actively calls the nvme_tcp_read_pdu function
2. nvme_tcp_read_pdu: qpair is in NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_CH state,
it reads 8 bytes of common PDU header. It determines the type of the PDU
and finds the size of PDU_PSH header.
3. nvme_tcp_read_pdu: qpair is in NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_PSH state.
It should read 120 bytes of icresp PDU. The number of bytes which needs to be
read is pdu->psh_len - pdu->psh_valid_bytes. qpair receives 120 bytes
(the full PDU) and calls nvme_tcp_pdu_psh_handle -> nvme_tcp_icresp_handle.
Here we check that we haven't yet received buffer reclaim notification and
simply return from this function. At the same time we continue to poll the qpair.
4. nvme_tcp_read_pdu: qpair is in NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_PSH state
and tries to read data from a socket again. The number of bytes is
pdu->psh_len - pdu->psh_valid_bytes. But now pdu->psh_len == pdu->psh_valid_bytes,
so we call nvme_tcp_read_data with zero length.
readv with zero length is commonly used to check errors on the socket,
but in our case there are no errors and readv returns 0.
5. nvme_tcp_read_data treats zero as an error and returns NVME_TCP_CONNECTION_FATAL.

The fix is to handle icresp, but leave the qpair in INITIALIZING state until
we receive acknowledgement for icreqsend_ack. We also move the qpair to the
NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_READY recv_state so recv_pdu
will be zeroed and the qpair will try to read a common PDU header.
But since it is not initialized yet, it won't receive anything
from the target.

Fixes issue #1633

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4969 (master)

(cherry picked from commit d296fcd8d9)
Change-Id: I22cedefe530a8ac3b51495988ed6265d8fad15bb
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4976
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-10-30 14:51:12 +00:00
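Steps 3 to 5 of 8a0f9cf3a7 can be modelled in a few lines. This is a simplified sketch, not the SPDK code: `demo_read_data` and `demo_poll_await_psh` are hypothetical names, and the "nothing available yet returns 0" branch is an assumption made so the model has a non-fatal idle case.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CONNECTION_FATAL (-1)

struct demo_pdu {
    size_t psh_len;         /* bytes expected in the PDU-specific header */
    size_t psh_valid_bytes; /* bytes received so far */
};

/* Model of the read helper: a zero-length read yields 0 bytes, which is
 * taken for a closed connection and reported as fatal. */
int demo_read_data(size_t available, size_t len)
{
    if (len == 0) {
        return DEMO_CONNECTION_FATAL;
    }
    if (available == 0) {
        return 0; /* assumption: no data yet, caller polls again */
    }
    return (int)(len < available ? len : available);
}

/* Model of one poll while the qpair stays in the "await PSH" state. */
int demo_poll_await_psh(struct demo_pdu *pdu, size_t available)
{
    int rc;

    assert(pdu != NULL && pdu->psh_valid_bytes <= pdu->psh_len);
    rc = demo_read_data(available, pdu->psh_len - pdu->psh_valid_bytes);
    if (rc > 0) {
        pdu->psh_valid_bytes += (size_t)rc;
    }
    return rc;
}
```

Polling once with the full 120 bytes available completes the header; polling again in the same state requests zero bytes and hits the fatal path, which is why the fix moves the qpair out of that receive state.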
Simon A. F. Lund
2209decef9 examples/nvme_fio_plugin: update ZNS section of README
Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Change-Id: I99c82490811314201b53dde59e586538835e0840
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4950
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
(cherry picked from commit f875da4019)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4897
2020-10-30 12:52:55 +00:00
Simon A. F. Lund
81e12ff27e examples/nvme_fio_plugin: add plugin-option 'initial_zone_reset'
Added plugin-option 'initial_zone_reset', providing the option to reset
all zones on all namespaces with the Zoned Command Set enabled upon
initialization.
The default is not to reset. The option is exposed even when the ZBD
plumbing is not available. However, it will then inform the user that
ZBD/ZNS is not supported instead of resetting.

The plugin-option provides a short-term solution to an observed issue
with consecutive invocations of fio exhausting maximum-active-resources.
A longer-term solution would be to add a 'max_active_zones' limit in fio
and ensure that fio does not exceed that limit.

Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Change-Id: I65341c028a97657370b315fb298bf97651b9bffd
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4949
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
(cherry picked from commit aae84b1f0e)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4896
2020-10-30 12:52:55 +00:00
Simon A. F. Lund
3c42afe400 examples/nvme_fio_plugin: move completion-helpers
Preparation patch for the addition of the 'initial_zone_reset' plugin-option.

Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Change-Id: I768fc207b74cfa2a516009e10fc2a4646d06ba72
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4948
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
(cherry picked from commit 528ad3b3cf)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4895
2020-10-30 12:52:55 +00:00
Michael Haeuptle
034d7cc9d7 nvmf: Fixes double triggering of association timer
Fixes issue #1635.

Under rare circumstances, CC.en and CC.shn are both set,
which results in setting the association timer twice.
This scenario was observed during hot plug testing when the
initiator tries to reset the subsystem that contains the
removed device.
The end result is that when the ctrlr is destructed,
one of the timers can still fire and access freed memory.

Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4935 (master)

(cherry picked from commit 4409007906)
Change-Id: Ie5880ab325a28f19361f73712bdeb5b58894ee68
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4957
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:44:07 +00:00
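One way to avoid the double registration described in 034d7cc9d7 is to make the "start timer" step idempotent. The sketch below shows that idea only; the struct and function names are hypothetical, and the actual patch may guard the timer differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ctrlr {
    bool timer_armed;   /* true while an association timer is registered */
    int timers_started; /* how many timers were actually registered */
};

/* Registers the association timer unless one is already pending. */
void demo_start_association_timer(struct demo_ctrlr *ctrlr)
{
    assert(ctrlr != NULL);
    if (ctrlr->timer_armed) {
        return; /* already registered, do not register a second one */
    }
    ctrlr->timer_armed = true;
    ctrlr->timers_started++;
}

/* Model of a CC register write in which both bits may change at once. */
void demo_handle_cc_write(struct demo_ctrlr *ctrlr, bool en_changed, bool shn_changed)
{
    if (en_changed) {
        demo_start_association_timer(ctrlr);
    }
    if (shn_changed) {
        demo_start_association_timer(ctrlr);
    }
}
```

With the guard in place, a write that changes both bits leaves exactly one timer to cancel when the controller is destroyed.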
Michael Haeuptle
807019734e nvme: break completion loop when ctrlr is invalid
This fixes #1423 where the completion loop never
breaks when the NVMe ctrlr is no longer present.
This condition can happen during a hot remove.

Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4831 (master)

(cherry picked from commit 7fc48a5ffc)
Change-Id: Ia238c8aeae720832068de28ce4d34a9d233344fb
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4959
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:44:01 +00:00
Maciej Szwed
ede2227a7c changelog: Add information regarding scheduler implementation
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4857 (master)

(cherry picked from commit 67109ff88b)
Change-Id: Id389ffb2c6091add92fb2849fac21a0472c8a404
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4960
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:43:52 +00:00
Jim Harris
d3b788e9ab nvme: continue probing ctrlrs even if one fails
It is possible that a single probe_ctx could be used
to probe multiple newly attached nvme controllers.  If
one of those controllers is removed during this process,
the rest of the controllers do not get probed and can
even get stuck in a zombie state.

It is better to just continue with probing the rest of
the controllers.

Fixes issue #1611.

Signed-off-by: Jim Harris <james.r.harris@intel.com>

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4945 (master)

(cherry picked from commit ddf86600bb)
Change-Id: I4156ee8b50e8d52cfeee7224f210a58bb773e939
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4958
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:43:38 +00:00
Jim Harris
c64c931c52 env_dpdk: add rte_rcu library dependency
rte_hash depends on rte_rcu starting in upcoming
DPDK 20.11 release.  rte_rcu was only added in
DPDK 19.05 release, so we need to check if it
exists before linking it.

Fixes issue #1661.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4947 (master)

(cherry picked from commit f896aa6f10)
Change-Id: I7e343c6f964b03cc62484b57803a3bad00f80288
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4965
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:43:25 +00:00
Jim Harris
52d31fe78d dpdk: move submodule to build rte_rcu library
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4953 (master)

(cherry picked from commit f2196fcafb)
Change-Id: Ie38573774612ac7ec7cf23367d3233124acd1a66
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4964
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:43:25 +00:00
Jim Harris
3421560144 test/external_code: use variable to hold DPDK lib list
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4946 (master)

(cherry picked from commit d792356a5a)
Change-Id: Ie4a489926695cc56125f0e18796071f7730190f1
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4963
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-29 15:43:25 +00:00
Simon A. F. Lund
bfd7d22df5 examples/nvme_fio_plugin: fix log-msgs. in report_zones() and reset_wp()
In spdk_fio_report_zones(), log_err did not prefix messages with
"spdk/nvme", making it hard to determine who dumped the error-message.
In spdk_fio_reset_wp() log_err described the wrong function.

This change fixes the above.

Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4916 (master)

(cherry picked from commit 414500c9eb)
Change-Id: I41df6d451e88942806c8b5a3cf9a0902d98cb186
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4941
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Simon A. F. Lund <simon.lund@samsung.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:10:17 +00:00
Simon A. F. Lund
9c93d81be3 examples/nvme_fio_plugin: fix reset_wp()
When _reset_wp() received a range to reset, the loop kept resetting
the first zone in the range.
Also, the processing of command completions was reusing the same
'completion' state, so a previous completion would short-circuit
command completion such that it would never be processed.

This change fixes that.

Also, the reset loop assumes that the given offset is a valid zone-start
LBA; a check is added to verify that and return -EINVAL if it is not.

Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4915 (master)

(cherry picked from commit 12a44d4745)
Change-Id: I1a1e4be2e1f67c2d8fecb5fc36a211b2dbb5a921
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4940
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Simon A. F. Lund <simon.lund@samsung.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:10:17 +00:00
Simon A. F. Lund
3a3cfb3292 examples/nvme_fio_plugin: help the user with max_open_zones constraints
When a device has resource-limitations such as the
maximum-open-resources (mor) and this threshold is exceeded, then IO
will fail upon completion. Such behavior is not the most user-friendly
way to tell the user that they should provide a value for the
fio-parameter 'max_open_zones'.

This change provides an arguably more user-friendly approach by checking
whether the device is limited and in case it is:

* Provide a default value for 'max_open_zones', inform the user, and
continue
* Verify 'max_open_zones' and in case of error inform the user and
return error

Signed-off-by: Simon A. F. Lund <simon.lund@samsung.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4914 (master)

(cherry picked from commit 906c2adb86)
Change-Id: I76cb045d560b9ec5701d97b82a62947af11960b6
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4939
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Simon A. F. Lund <simon.lund@samsung.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:10:17 +00:00
Niklas Cassel
8c6c009fce examples/nvme/identify: print ZNS data structures if supported
Print the Zoned Namespace Command Set Specific data structures,
if the namespace/controller supports them.

spdk_nvme_zns_ctrlr_get_data() returns NULL for a controller
that does not support the ZNS specific controller data struct.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4839 (master)

(cherry picked from commit 00d197dba2)
Change-Id: I0acd2695976fc598b61591989f612db35ac821db
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4942
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:10:03 +00:00
Niklas Cassel
4e9a9c4a22 examples/nvme_fio_plugin: fix zone reporting
All zone management receive helper functions (including
spdk_nvme_zns_report_zones()) are implemented to match the parameters of
the zone management receive function in the ZNS specification.

The documentation for spdk_nvme_zns_report_zones() states:
"param partial_report If true, nr_zones field in the zone report indicates
the number of zone descriptors that were successfully written to the zone
report. If false, nr_zones field in the zone report indicates the number
of zone descriptors that match the report_opts criteria."
This matches the description of the "Partial Report" bit in the ZNS spec.

Since the FIO function parse_zone_info() calls the io_ops->report_zones()
function multiple times, until all zones have been reported, it expects
the return from this function to represent the number of zones that were
successfully reported.

By setting the partial_report bit to false, the controller will return
the total number of zones, and since spdk_fio_report_zones() loops while
idx < report->nr_zones, and writes to zbdz[idx], the current code will
overwrite heap memory, since idx will take on index values that are out
of bounds for the memory allocated by the FIO function parse_zone_info().

Therefore, spdk_fio_report_zones() has to set the partial_report bit to
true when calling the NVMe level function spdk_nvme_zns_report_zones().

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4871 (master)

(cherry picked from commit 1e83b640aa)
Change-Id: I8846711bfed4faadac0315b450158293cefa36f4
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4943
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:09:53 +00:00
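The effect of the partial_report bit described in 4e9a9c4a22 can be shown with a small model. The functions below are hypothetical and only mimic the meaning of nr_zones quoted from the documentation; they do not call any SPDK or fio API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of nr_zones in a report: `matching` zones match the criteria, but
 * the report buffer only has room for `capacity` descriptors. */
uint64_t demo_reported_nr_zones(uint64_t matching, uint64_t capacity, bool partial_report)
{
    uint64_t written = matching < capacity ? matching : capacity;

    return partial_report ? written : matching;
}

/* Model of the caller's loop, which trusts nr_zones as a bound for zbdz[]. */
uint64_t demo_fill_zones(uint64_t *zbdz, uint64_t capacity, uint64_t nr_zones)
{
    uint64_t idx;

    /* This invariant is what breaks when partial_report is false. */
    assert(zbdz != NULL && nr_zones <= capacity);
    for (idx = 0; idx < nr_zones; idx++) {
        zbdz[idx] = idx;
    }
    return idx;
}
```

With ten matching zones and room for four, the non-partial count is ten, so a loop bounded by it would write six entries past the buffer; the partial count is four and stays in bounds.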
Maciej Wawryk
fc9ae4f30f Modifications for using universal qcow2 image in tests
We want to replace a few qcow2 images with one universal image.
This commit contains:
 - change password in autotest.sh
 - change image path
 - change image name
 - use snapshot mode in hotplug.sh instead of copying base image

Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4551 (master)

(cherry picked from commit c4c37f1cf1)
Change-Id: I75c457fe75f005b0ab43ca909be7886529ed115b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4944
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:09:37 +00:00
Changpeng Liu
ea364da638 env/memory: use stack variable when unmapping the dma region
When the Werror compile option is enabled with a new kernel (v5.8), the
following error is reported due to a change in a <linux/vfio.h> data
structure (a uint8_t data[] member was added in the new kernel). We could
just put 'unmap' at the end of the data structure to fix the issue, but
I think it's better to use a stack variable instead.

CC lib/env_dpdk/memory.o
memory.c:63:36: error: field 'unmap' with variable sized type 'struct vfio_iommu_type1_dma_unmap' not
at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
struct vfio_iommu_type1_dma_unmap unmap;
^
1 error generated.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4925 (master)

(cherry picked from commit 2c9b5b5af5)
Change-Id: Icf73a3c48a301e74b92b9ae2e2d8715262b2d056
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4938
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-29 08:09:24 +00:00
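The stack-variable approach chosen in ea364da638 can be sketched as below. The struct is a hypothetical stand-in that only imitates a kernel struct ending in a flexible array member, and `demo_unmap_region` does not perform any ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a kernel struct that gained a trailing flexible array member. */
struct demo_dma_unmap {
    uint32_t argsz;
    uint32_t flags;
    uint64_t iova;
    uint64_t size;
    uint8_t data[]; /* flexible array member: the struct must now come last */
};

/* Embedding the struct in front of other members is what the compiler
 * rejects under -Werror:
 *
 *     struct demo_map { struct demo_dma_unmap unmap; int other; };
 */

/* Builds the request on the stack, only for the duration of the call. */
uint32_t demo_unmap_region(uint64_t iova, uint64_t size)
{
    struct demo_dma_unmap unmap;

    assert(size > 0);
    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = (uint32_t)sizeof(unmap);
    unmap.iova = iova;
    unmap.size = size;
    /* A real caller would hand &unmap to the VFIO unmap ioctl here. */
    return unmap.argsz;
}
```

Because the request is rebuilt per call, no long-lived structure has to contain the variable-sized type at all.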
yidong0635
0e70cb48ab rocksdb/env_spdk: Fix unused warning.
Function takes one parameter to print,
others are unused.

spdk/lib/rocksdb/env_spdk.cc: In function
 ‘void rocksdb::base_bdev_event_cb(spdk_bdev_event_type, spdk_bdev*, void*)’:
/spdk/lib/rocksdb/env_spdk.cc:666:70:
error: unused parameter ‘bdev’ [-Werror=unused-parameter]
666 | base_bdev_event_cb(enum spdk_bdev_event_type type, struct spdk_bdev *bdev,
      |                                                    ~~~~~~~~~~~~~~~~~~^~~~
/home/yidong/spdk/lib/rocksdb/env_spdk.cc:667:12:
error: unused parameter ‘event_ctx’ [-Werror=unused-parameter]
  667 |      void *event_ctx)
      |      ~~~~~~^~~~~~~~~
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4929 (master)

(cherry picked from commit a474889bc6)
Change-Id: Ic1cf45443ab1dcdf38d1b9c6bdea2905e94df19c
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4933
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2020-10-29 08:09:09 +00:00
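A common way to silence this class of warning is to mark the parameters as intentionally unused. The sketch below uses plain C and hypothetical names; the real callback lives in a C++ file, returns void, and may have been fixed differently (for example with an attribute or by omitting the parameter names).

```c
#include <assert.h>
#include <stddef.h>

enum demo_event_type { DEMO_EVENT_REMOVE, DEMO_EVENT_RESIZE };

/* Callback with a fixed signature that only needs its first argument.
 * It returns the event type so the example has an observable result. */
int demo_event_cb(enum demo_event_type type, void *bdev, void *event_ctx)
{
    (void)bdev;      /* intentionally unused */
    (void)event_ctx; /* intentionally unused */
    return (int)type;
}

/* Usage: the extra arguments can be anything, including NULL. */
void demo_event_cb_check(void)
{
    assert(demo_event_cb(DEMO_EVENT_RESIZE, NULL, NULL) == 1);
}
```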
Liu Xiaodong
494848e3a9 reactor: check calloc failure in gather_metrics
A round of _reactors_scheduler_gather_metrics should be stopped
when there is a calloc failure.

Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4882 (master)

(cherry picked from commit e2f773aafc)
Change-Id: Ic2220c561abb07a849ea37d3c88af3f6d5d1ffa1
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4923
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-28 15:03:17 +00:00
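The check added in 494848e3a9 follows the usual "bail out on allocation failure" pattern. The sketch below is a generic illustration with hypothetical names; the allocator is passed in as a function pointer only so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_core_info {
    unsigned threads_count;
};

typedef void *(*demo_calloc_fn)(size_t nmemb, size_t size);

/* Allocator that always fails, used to exercise the error path. */
void *demo_failing_calloc(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

/* Starts one metrics round; stops early if the per-core array is missing. */
int demo_gather_metrics(size_t core_count, demo_calloc_fn alloc,
                        struct demo_core_info **out)
{
    struct demo_core_info *infos;

    assert(alloc != NULL && out != NULL);
    infos = alloc(core_count, sizeof(*infos));
    if (infos == NULL) {
        *out = NULL;
        return -ENOMEM; /* stop this round instead of using a NULL buffer */
    }
    *out = infos;
    return 0;
}
```

On success the caller owns the returned array and is expected to free it when the round ends.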
Alexey Marchuk
623d5cc456 nvme: Don't log an error when we can't resubmit all requests
In the TCP NVMe initiator with zero copy enabled, requests might be
completed asynchronously, outside of the qpair_process_completions
context. We count the requests completed asynchronously so that the
generic NVMe layer can resubmit queued requests after calling
qpair_process_requests (or poll_group_process_requests).
But there is a time gap between an async request completing and
qpair_process_completions, and the user can submit new IO in that gap,
thereby decreasing the number of free TCP requests. That means
there might be fewer free requests than we expected when
we try to resubmit queued requests.
The solution is to change the ERRLOG to a DEBUG log, since this is not a
fatal case.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4859 (master)

(cherry picked from commit e385cafa72)
Change-Id: If045ecd331cc6693e8ef450d8e15432dfa5d8812
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4872
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-10-28 08:09:03 +00:00
Liu Xiaodong
0bcafaea56 lib/rocksdb: remove redundant linked blobfs_bdev
The blobfs_bdev lib is already added to BLOCKDEV_MODULES_LIST,
so it shouldn't be included by an application that already
uses BLOCKDEV_MODULES_LIST or ALL_MODULES_LIST.

Fixes issue: #1654

Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4876 (master)

(cherry picked from commit b45788036f)
Change-Id: I46a272e4593e19cf14c3ed8b2965797443c37a0d
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4922
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-28 08:08:49 +00:00
Allen Zhu
b5005497db rpc/nvmf.py: pass zero values to SPDK when allowed
in_capsule_data_size/buf_cache_size/sock_priority/max_namespaces can be 0,
which should be passed in nvmf_create_transport/nvmf_create_subsystem commands.

Signed-off-by: Allen Zhu <allenz@mellanox.com>
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3815 (master)

(cherry picked from commit 96ab62802c)
Change-Id: Ib557cf9f20f7ec2c0b3c31156cd79dbd670ce7e7
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4921
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Allen Zhu <allenz@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-28 08:08:37 +00:00
Alexey Marchuk
839af8867e nvmf/tcp: Align recv_buf_size to MIN_SOC_PIPE_SIZE
If the user decides to disable ICD, there are several side effects:
1. SPDK prints several warnings/errors
2. SPDK doesn't create the recv pipe and doesn't set the SO_RCVBUF socket option.

I think that we should not rely on ICD alone when we create the recv pipe or
set SO_RCVBUF, since data may be transferred in SGLs via R2T/H2C and
we still need recv_pipe and SO_RCVBUF for better performance.
An alternative option is to set recv_buf_size to the maximum of
ICD and io_unit_size.

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4755 (master)

(cherry picked from commit c1fbbfbe56)
Change-Id: Ida71ecc099f9a9355e4617f13315a341872d1cb3
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4920
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2020-10-28 08:08:37 +00:00
Alexey Marchuk
bf602f12ca nvmf/tcp: Support ICD for fabric/admin commands
According to the spec we should support up to 8192 bytes
of ICD for admin and fabric commands. The transport configuration
parameter in_capsule_data_size is applied to all qpair types,
admin and IO. Also, we allocate resources when we get a connection
request, so we don't know the qpair type at that moment.
Create a list of buffers in the TCP poll group to support ICD up
to 8192 bytes when the configured ICD is less than this value.
The number of elements in this pool is hardcoded; a new
configuration parameter is planned for later.

Fixes issue #1569
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4754 (master)

(cherry picked from commit 85fa43241b)
Change-Id: I8589e3e2ea95d515f5503c6de7c1ee40aaf7b6da
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4886
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
2020-10-28 08:08:37 +00:00
Liu Xiaodong
ad97082bd0 bdev_aio: fix interrupt mode notify error
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4865 (master)

(cherry picked from commit 47962dc7a3)
Change-Id: Ie4492aa33028e8090da96e2b592b20293d694120
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4873
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-10-27 15:44:16 +00:00
Jim Harris
05216cb7bf event: deprecate opts.config_file member
Just always put the config file name in json_config_file,
since we now only support JSON.

If the user specifies both -c and --json, it will just take
the latter of the two.  This is similar to the case where the
user specifies --json twice.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idc21d73acf0e190eda57a7b0c5d9bcfa14e87030
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4858
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
(cherry picked from commit c31ad66893)
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4894
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
2020-10-27 15:44:03 +00:00
Liu Xiaodong
69b16a000f thread: fix warning caused by intr
Fixes issue: #1650

Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4869 (master)

(cherry picked from commit a3c3c0b538)
Change-Id: I8935d439fb7d1d1c896ef297baa53db0d2cd538f
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4874
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
2020-10-27 15:43:46 +00:00
1997 changed files with 96029 additions and 258394 deletions

View File

@ -1,7 +1,4 @@
#!/bin/sh
# SPDX-License-Identifier: BSD-3-Clause
# All rights reserved.
#
# Verify what is about to be committed.
# Called by "git commit" with no arguments. The hook should

View File

@ -1,7 +1,4 @@
#!/bin/sh
# SPDX-License-Identifier: BSD-3-Clause
# All rights reserved.
# Verify what is about to be pushed. Called by "git
# push" after it has checked the remote status, but before anything has been
# pushed. If this script exits with a non-zero status nothing will be pushed.

View File

@ -1,5 +1,5 @@
---
name: Sighting report
name: Bug report
about: Create a report to help us improve. Please use the issue tracker only for reporting suspected issues.
title: ''
labels: 'Sighting'
@ -7,8 +7,6 @@ assignees: ''
---
# Sighting report
<!--- Provide a general summary of the issue in the Title above -->
## Expected Behavior
@ -21,12 +19,12 @@ assignees: ''
## Possible Solution
<!--- Not obligatory, but suggest a fix/reason for the potential issue, -->
<!--- Not obligatory, but suggest a fix/reason for the bug, -->
## Steps to Reproduce
<!--- Provide a link to a live example, or an unambiguous set of steps to -->
<!--- reproduce this sighting. Include code to reproduce, if relevant -->
<!--- reproduce this bug. Include code to reproduce, if relevant -->
1.
2.
3.

View File

@ -7,8 +7,6 @@ assignees: ''
---
# CI Intermittent Failure
<!--- Provide a [test_name] where the issue occurred and brief description in the Title above. -->
<!--- Name of the test can be found by last occurrence of: -->
<!--- ************************************ -->

View File

@ -1,11 +0,0 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
- package-ecosystem: "" # See documentation for possible values
directory: "/" # Location of package manifests
schedule:
interval: "weekly"

View File

@ -1,10 +0,0 @@
filters:
- true
commentBody: |
Thanks for your contribution! Unfortunately, we don't use GitHub pull
requests to manage code contributions to this repository. Instead, please
see https://spdk.io/development which provides instructions on how to
submit patches to the SPDK Gerrit instance.
addLabel: false

6
.gitignore vendored
View File

@ -2,17 +2,12 @@
*.a
*.cmd
*.d
*.dll
*.exe
*.gcda
*.gcno
*.kdev4
*.ko
*.lib
*.log
*.o
*.obj
*.pdb
*.pyc
*.so
*.so.*
@ -39,4 +34,3 @@ PYTHON_COMMAND
test_completions.txt
timing.txt
test/common/build_config.sh
.coredump_path

9
.gitmodules vendored
View File

@ -10,12 +10,3 @@
[submodule "ocf"]
path = ocf
url = https://github.com/Open-CAS/ocf.git
[submodule "libvfio-user"]
path = libvfio-user
url = https://github.com/nutanix/libvfio-user.git
[submodule "xnvme"]
path = xnvme
url = https://github.com/OpenMPDK/xNVMe.git
[submodule "isa-l-crypto"]
path = isa-l-crypto
url = https://github.com/intel/isa-l_crypto

View File

@ -1,130 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
SPDK core [maintainers](https://spdk.io/development/) are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
SPDK core maintainers have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported privately to any of the SPDK core maintainers. All complaints will be
reviewed and investigated promptly and fairly.
All SPDK core maintainers are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
SPDK core maintainers will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from SPDK core maintainers, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq

139
CONFIG
View File

@ -1,11 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
# All rights reserved.
# Copyright (c) 2021, 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2022 Dell Inc, or its subsidiaries.
#
# configure options: __CONFIGURE_OPTIONS__
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# Installation prefix
CONFIG_PREFIX="/usr/local"
@ -13,9 +37,6 @@ CONFIG_PREFIX="/usr/local"
# Target architecture
CONFIG_ARCH=native
# Destination directory for the libraries
CONFIG_LIBDIR=
# Prefix for cross compilation
CONFIG_CROSS_PREFIX=
@ -43,10 +64,6 @@ CONFIG_ASAN=n
# Build with Undefined Behavior Sanitizer enabled
CONFIG_UBSAN=n
# Build with LLVM fuzzing enabled
CONFIG_FUZZER=n
CONFIG_FUZZER_LIB=
# Build with Thread Sanitizer enabled
CONFIG_TSAN=n
@ -59,12 +76,6 @@ CONFIG_UNIT_TESTS=y
# Build examples
CONFIG_EXAMPLES=y
# Build apps
CONFIG_APPS=y
# Build with Control-flow Enforcement Technology (CET)
CONFIG_CET=n
# Directory that contains the desired SPDK environment library.
# By default, this is implemented using DPDK.
CONFIG_ENV=
@ -72,13 +83,6 @@ CONFIG_ENV=
# This directory should contain 'include' and 'lib' directories for your DPDK
# installation.
CONFIG_DPDK_DIR=
# Automatically set via pkg-config when bare --with-dpdk is set
CONFIG_DPDK_LIB_DIR=
CONFIG_DPDK_INC_DIR=
CONFIG_DPDK_PKG_CONFIG=n
# This directory should contain 'include' and 'lib' directories for WPDK.
CONFIG_WPDK_DIR=
# Build SPDK FIO plugin. Requires CONFIG_FIO_SOURCE_DIR set to a valid
# fio source code directory.
@ -93,7 +97,6 @@ CONFIG_FIO_SOURCE_DIR=/usr/src/fio
CONFIG_RDMA=n
CONFIG_RDMA_SEND_WITH_INVAL=n
CONFIG_RDMA_SET_ACK_TIMEOUT=n
CONFIG_RDMA_SET_TOS=n
CONFIG_RDMA_PROV=verbs
# Enable NVMe Character Devices.
@ -108,38 +111,18 @@ CONFIG_FC_PATH=
# Requires librbd development libraries
CONFIG_RBD=n
# Build DAOS support in bdev modules
# Requires daos development libraries
CONFIG_DAOS=n
CONFIG_DAOS_DIR=
# Build UBLK support
CONFIG_UBLK=n
# Build vhost library.
CONFIG_VHOST=y
# Build vhost initiator (Virtio) driver.
CONFIG_VIRTIO=y
# Build custom vfio-user transport for NVMf target and NVMe initiator.
CONFIG_VFIO_USER=n
CONFIG_VFIO_USER_DIR=
# Build with PMDK backends
CONFIG_PMDK=n
CONFIG_PMDK_DIR=
# Build with xNVMe
CONFIG_XNVME=n
# Enable the dependencies for building the DPDK accel compress module
CONFIG_DPDK_COMPRESSDEV=n
# Enable the dependencies for building the compress vbdev, includes the reduce library
CONFIG_VBDEV_COMPRESS=n
# Enable mlx5_pci dpdk compress PMD, enabled automatically if CONFIG_VBDEV_COMPRESS=y and libmlx5 exists
CONFIG_VBDEV_COMPRESS_MLX5=n
# Enable mlx5_pci dpdk crypto PMD, enabled automatically if CONFIG_CRYPTO=y and libmlx5 exists
CONFIG_CRYPTO_MLX5=n
# Enable the dependencies for building the compress vbdev
CONFIG_REDUCE=n
# Requires libiscsi development libraries.
CONFIG_ISCSI_INITIATOR=n
@ -150,10 +133,13 @@ CONFIG_CRYPTO=n
# Build spdk shared libraries in addition to the static ones.
CONFIG_SHARED=n
# Build with VTune support.
# Build with VTune suport.
CONFIG_VTUNE=n
CONFIG_VTUNE_DIR=
# Build the dpdk igb_uio driver
CONFIG_IGB_UIO_DRIVER=n
# Build Intel IPSEC_MB library
CONFIG_IPSEC_MB=n
@ -165,58 +151,17 @@ CONFIG_CUSTOMOCF=n
# Build ISA-L library
CONFIG_ISAL=y
# Build ISA-L-crypto library
CONFIG_ISAL_CRYPTO=y
# Build with IO_URING support
CONFIG_URING=n
# Build IO_URING bdev with ZNS support
CONFIG_URING_ZNS=n
# Path to custom built IO_URING library
CONFIG_URING_PATH=
# Path to custom built OPENSSL library
CONFIG_OPENSSL_PATH=
# Build with FUSE support
CONFIG_FUSE=n
# Build with RAID5f support
CONFIG_RAID5F=n
# Build with RAID5 support
CONFIG_RAID5=n
# Build with IDXD support
# In this mode, SPDK fully controls the DSA device.
CONFIG_IDXD=n
# Build with USDT support
CONFIG_USDT=n
# Build with IDXD kernel support.
# In this mode, SPDK shares the DSA device with the kernel.
CONFIG_IDXD_KERNEL=n
# arc4random is available in stdlib.h
CONFIG_HAVE_ARC4RANDOM=n
# uuid_generate_sha1 is available in uuid/uuid.h
CONFIG_HAVE_UUID_GENERATE_SHA1=n
# Is DPDK using libbsd?
CONFIG_HAVE_LIBBSD=n
# Is DPDK using libarchive?
CONFIG_HAVE_LIBARCHIVE=n
# Path to IPSEC_MB used by DPDK
CONFIG_IPSEC_MB_DIR=
# Generate Storage Management Agent's protobuf interface
CONFIG_SMA=n
# Build with Avahi support
CONFIG_AVAHI=n
# Setup DPDK's RTE_MAX_LCORES
CONFIG_MAX_LCORES=

52
LICENSE
View File

@ -1,30 +1,30 @@
The SPDK repo contains multiple git submodules each with its own
license info.
BSD LICENSE
Submodule license info:
dpdk: see dpdk/license
intel-ipsec-mb: see intel-ipsec-mb/LICENSE
isa-l: see isa-l/LICENSE
libvfio-user: see libvfio-user/LICENSE
ocf: see ocf/LICENSE
Copyright (c) Intel Corporation.
All rights reserved.
The rest of the SPDK repository uses the Open Source BSD-3-Clause
license. SPDK also uses SPDX Unique License Identifiers to eliminate
the need to copy the license text into each individual file.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
Any new file contributions to SPDK shall adhere to the BSD-3-Clause
license and use SPDX identifiers. Exceptions are subject to usual
review and must be listed in this file.
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
Exceptions:
* include/linux/* header files are BSD-3-Clause but do not use SPDX
identifier to keep them identical to the same header files in the
Linux kernel source tree.
* include/spdk/tree.h and include/spdk/queue_extras are BSD-2-Clause,
since there were primarily imported from FreeBSD. tree.h uses an SPDX
identifier but also the license text to reduce differences from the
FreeBSD source tree.
* lib/util/base64_neon.c is BSD-2-Clause.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@ -1,9 +1,36 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# Copyright (c) 2020, Mellanox Corporation.
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
S :=
@ -13,16 +40,11 @@ include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y += lib
DIRS-y += module
DIRS-$(CONFIG_SHARED) += shared_lib
DIRS-y += include
DIRS-y += app include
DIRS-$(CONFIG_EXAMPLES) += examples
DIRS-$(CONFIG_APPS) += app
DIRS-y += test
DIRS-$(CONFIG_IPSEC_MB) += ipsecbuild
DIRS-$(CONFIG_ISAL) += isalbuild
DIRS-$(CONFIG_ISAL_CRYPTO) += isalcryptobuild
DIRS-$(CONFIG_VFIO_USER) += vfiouserbuild
DIRS-$(CONFIG_SMA) += proto
DIRS-$(CONFIG_XNVME) += xnvmebuild
.PHONY: all clean $(DIRS-y) include/spdk/config.h mk/config.mk \
cc_version cxx_version .libs_only_other .ldflags ldflags install \
@ -34,20 +56,11 @@ export MAKE_PID := $(shell echo $$PPID)
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
ifeq ($(CURDIR)/dpdk/build,$(CONFIG_DPDK_DIR))
ifneq ($(SKIP_DPDK_BUILD),1)
ifneq ($(CONFIG_DPDK_PKG_CONFIG),y)
DPDKBUILD = dpdkbuild
DIRS-y += dpdkbuild
endif
endif
endif
endif
ifeq ($(OS),Windows)
ifeq ($(CURDIR)/wpdk/build,$(CONFIG_WPDK_DIR))
WPDK = wpdk
DIRS-y += wpdk
endif
endif
ifeq ($(CONFIG_SHARED),y)
LIB = shared_lib
@ -61,29 +74,18 @@ DPDK_DEPS += ipsecbuild
endif
ifeq ($(CONFIG_ISAL),y)
ISALBUILD = isalbuild
LIB += isalbuild
DPDK_DEPS += isalbuild
ifeq ($(CONFIG_ISAL_CRYPTO),y)
ISALCRYPTOBUILD = isalcryptobuild
LIB += isalcryptobuild
endif
endif
ifeq ($(CONFIG_VFIO_USER),y)
VFIOUSERBUILD = vfiouserbuild
LIB += vfiouserbuild
endif
ifeq ($(CONFIG_XNVME),y)
XNVMEBUILD = xnvmebuild
LIB += xnvmebuild
endif
all: mk/cc.mk $(DIRS-y)
clean: $(DIRS-y)
$(Q)rm -f include/spdk/config.h
$(Q)rm -rf build
$(Q)rm -rf build/bin
$(Q)rm -rf build/fio
$(Q)rm -rf build/examples
$(Q)rm -rf build/include
$(Q)find build/lib ! -name .gitignore -type f -delete
install: all
$(Q)echo "Installed to $(DESTDIR)$(CONFIG_PREFIX)"
@ -92,11 +94,10 @@ uninstall: $(DIRS-y)
$(Q)echo "Uninstalled spdk"
ifneq ($(SKIP_DPDK_BUILD),1)
dpdkdeps $(DPDK_DEPS): $(WPDK)
dpdkbuild: $(WPDK) $(DPDK_DEPS)
dpdkbuild: $(DPDK_DEPS)
endif
lib: $(WPDK) $(DPDKBUILD) $(VFIOUSERBUILD) $(XNVMEBUILD) $(ISALBUILD) $(ISALCRYPTOBUILD)
lib: $(DPDKBUILD)
module: lib
shared_lib: module
app: $(LIB)
@ -112,7 +113,7 @@ mk/cc.mk:
false
build_dir: mk/cc.mk
$(Q)mkdir -p build/lib/pkgconfig/tmp
$(Q)mkdir -p build/lib
$(Q)mkdir -p build/bin
$(Q)mkdir -p build/fio
$(Q)mkdir -p build/examples

View File

@ -2,11 +2,6 @@
[![Build Status](https://travis-ci.org/spdk/spdk.svg?branch=master)](https://travis-ci.org/spdk/spdk)
NOTE: The SPDK mailing list has moved to a new location. Please visit
[this URL](https://lists.linuxfoundation.org/mailman/listinfo/spdk) to subscribe
at the new location. Subscribers from the old location will not be automatically
migrated to the new location.
The Storage Performance Development Kit ([SPDK](http://www.spdk.io)) provides a set of tools
and libraries for writing high performance, scalable, user-mode storage
applications. It achieves high performance by moving all of the necessary
@ -23,7 +18,7 @@ The development kit currently includes:
* [vhost target](http://www.spdk.io/doc/vhost.html)
* [Virtio-SCSI driver](http://www.spdk.io/doc/virtio.html)
## In this readme
# In this readme
* [Documentation](#documentation)
* [Prerequisites](#prerequisites)
@ -134,9 +129,7 @@ Boolean (on/off) options are configured with a 'y' (yes) or 'n' (no). For
example, this line of `CONFIG` controls whether the optional RDMA (libibverbs)
support is enabled:
~~~{.sh}
CONFIG_RDMA?=n
~~~
CONFIG_RDMA?=n
To enable RDMA, this line may be added to `mk/config.mk` with a 'y' instead of
'n'. For the majority of options this can be done using the `configure` script.
@ -228,13 +221,6 @@ configuring 8192MB memory.
sudo HUGEMEM=8192 scripts/setup.sh
~~~
There are a lot of other environment variables that can be set to configure
setup.sh for advanced users. To see the full list, run:
~~~{.sh}
scripts/setup.sh --help
~~~
<a id="examples"></a>
## Example Code

View File

@ -1,4 +0,0 @@
# Security Policy
The SPDK community has a documented CVE process [here](https://spdk.io/cve_threat/) that describes
both how to report a potential security issue as well as who to contact for more information.

View File

@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@ -9,13 +37,11 @@ include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y += trace
DIRS-y += trace_record
DIRS-y += nvmf_tgt
DIRS-y += iscsi_top
DIRS-y += iscsi_tgt
DIRS-y += spdk_tgt
DIRS-y += spdk_lspci
ifneq ($(OS),Windows)
# TODO - currently disabled on Windows due to lack of support for curses
DIRS-y += spdk_top
endif
ifeq ($(OS),Linux)
DIRS-$(CONFIG_VHOST) += vhost
DIRS-y += spdk_dd

View File

@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2016 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@ -15,20 +43,17 @@ CFLAGS += -I$(SPDK_ROOT_DIR)/lib
C_SRCS := iscsi_tgt.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_iscsi
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += $(EVENT_BDEV_SUBSYSTEM) event_iscsi event_net event_scsi event
SPDK_LIB_LIST += jsonrpc json rpc bdev iscsi scsi accel trace conf
SPDK_LIB_LIST += thread util log net sock notify
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

View File

@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2016 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@ -9,6 +37,7 @@
#include "spdk/event.h"
#include "iscsi/iscsi.h"
#include "spdk/log.h"
#include "spdk/net.h"
static int g_daemon_mode = 0;
@ -46,7 +75,7 @@ main(int argc, char **argv)
int rc;
struct spdk_app_opts opts = {};
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.name = "iscsi";
if ((rc = spdk_app_parse_args(argc, argv, &opts, "b", NULL,
iscsi_parse_arg, iscsi_usage)) !=

1
app/iscsi_top/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
iscsi_top

46
app/iscsi_top/Makefile Normal file
View File

@ -0,0 +1,46 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = iscsi_top
SPDK_LIB_LIST = jsonrpc json rpc log util
CFLAGS += -I$(SPDK_ROOT_DIR)/lib
C_SRCS := iscsi_top.c
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk

277
app/iscsi_top/iscsi_top.c Normal file
View File

@ -0,0 +1,277 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/event.h"
#include "spdk/jsonrpc.h"
#include "spdk/rpc.h"
#include "spdk/string.h"
#include "spdk/trace.h"
#include "spdk/util.h"
#include "iscsi/conn.h"
static char *exe_name;
static int g_shm_id = 0;
struct spdk_jsonrpc_client *g_rpc_client;
static void usage(void)
{
fprintf(stderr, "usage:\n");
fprintf(stderr, " %s <option>\n", exe_name);
fprintf(stderr, " option = '-i' to specify the shared memory ID,"
" (required)\n");
fprintf(stderr, " -r <path> RPC listen address (default: /var/tmp/spdk.sock\n");
}
struct rpc_conn_info {
uint32_t id;
uint32_t cid;
uint32_t tsih;
uint32_t lcore_id;
char *initiator_addr;
char *target_addr;
char *target_node_name;
};
static struct rpc_conn_info g_conn_info[1024];
static const struct spdk_json_object_decoder rpc_conn_info_decoders[] = {
{"id", offsetof(struct rpc_conn_info, id), spdk_json_decode_uint32},
{"cid", offsetof(struct rpc_conn_info, cid), spdk_json_decode_uint32},
{"tsih", offsetof(struct rpc_conn_info, tsih), spdk_json_decode_uint32},
{"lcore_id", offsetof(struct rpc_conn_info, lcore_id), spdk_json_decode_uint32},
{"initiator_addr", offsetof(struct rpc_conn_info, initiator_addr), spdk_json_decode_string},
{"target_addr", offsetof(struct rpc_conn_info, target_addr), spdk_json_decode_string},
{"target_node_name", offsetof(struct rpc_conn_info, target_node_name), spdk_json_decode_string},
};
static int
rpc_decode_conn_object(const struct spdk_json_val *val, void *out)
{
struct rpc_conn_info *info = (struct rpc_conn_info *)out;
return spdk_json_decode_object(val, rpc_conn_info_decoders,
SPDK_COUNTOF(rpc_conn_info_decoders), info);
}
static void
print_connections(void)
{
struct spdk_jsonrpc_client_response *json_resp = NULL;
struct spdk_json_write_ctx *w;
struct spdk_jsonrpc_client_request *request;
int rc;
size_t conn_count, i;
struct rpc_conn_info *conn;
request = spdk_jsonrpc_client_create_request();
if (request == NULL) {
return;
}
w = spdk_jsonrpc_begin_request(request, 1, "iscsi_get_connections");
spdk_jsonrpc_end_request(request, w);
spdk_jsonrpc_client_send_request(g_rpc_client, request);
do {
rc = spdk_jsonrpc_client_poll(g_rpc_client, 1);
} while (rc == 0 || rc == -ENOTCONN);
if (rc <= 0) {
goto end;
}
json_resp = spdk_jsonrpc_client_get_response(g_rpc_client);
if (json_resp == NULL) {
goto end;
}
if (spdk_json_decode_array(json_resp->result, rpc_decode_conn_object, g_conn_info,
SPDK_COUNTOF(g_conn_info), &conn_count, sizeof(struct rpc_conn_info))) {
goto end;
}
for (i = 0; i < conn_count; i++) {
conn = &g_conn_info[i];
printf("Connection: %u CID: %u TSIH: %u Initiator Address: %s Target Address: %s Target Node Name: %s\n",
conn->id, conn->cid, conn->tsih, conn->initiator_addr, conn->target_addr, conn->target_node_name);
}
end:
spdk_jsonrpc_client_free_request(request);
}
int main(int argc, char **argv)
{
void *history_ptr;
struct spdk_trace_histories *histories;
struct spdk_trace_history *history;
const char *rpc_socket_path = SPDK_DEFAULT_RPC_ADDR;
uint64_t tasks_done, last_tasks_done[SPDK_TRACE_MAX_LCORE];
int delay, old_delay, history_fd, i, quit, rc;
int tasks_done_delta, tasks_done_per_sec;
int total_tasks_done_per_sec;
struct timeval timeout;
fd_set fds;
char ch;
struct termios oldt, newt;
char spdk_trace_shm_name[64];
int op;
exe_name = argv[0];
while ((op = getopt(argc, argv, "i:r:")) != -1) {
switch (op) {
case 'i':
g_shm_id = spdk_strtol(optarg, 10);
break;
case 'r':
rpc_socket_path = optarg;
break;
default:
usage();
exit(1);
}
}
g_rpc_client = spdk_jsonrpc_client_connect(rpc_socket_path, AF_UNIX);
if (!g_rpc_client) {
fprintf(stderr, "spdk_jsonrpc_client_connect() failed: %d\n", errno);
return 1;
}
snprintf(spdk_trace_shm_name, sizeof(spdk_trace_shm_name), "/iscsi_trace.%d", g_shm_id);
history_fd = shm_open(spdk_trace_shm_name, O_RDONLY, 0600);
if (history_fd < 0) {
fprintf(stderr, "Unable to open history shm %s\n", spdk_trace_shm_name);
usage();
exit(1);
}
history_ptr = mmap(NULL, sizeof(*histories), PROT_READ, MAP_SHARED, history_fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Unable to mmap history shm (%d).\n", errno);
exit(1);
}
histories = (struct spdk_trace_histories *)history_ptr;
memset(last_tasks_done, 0, sizeof(last_tasks_done));
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = spdk_get_per_lcore_history(histories, i);
last_tasks_done[i] = history->tpoint_count[TRACE_ISCSI_TASK_DONE];
}
delay = 1;
quit = 0;
tcgetattr(0, &oldt);
newt = oldt;
newt.c_lflag &= ~(ICANON);
tcsetattr(0, TCSANOW, &newt);
while (1) {
FD_ZERO(&fds);
FD_SET(0, &fds);
timeout.tv_sec = delay;
timeout.tv_usec = 0;
rc = select(2, &fds, NULL, NULL, &timeout);
if (rc > 0) {
if (read(0, &ch, 1) != 1) {
fprintf(stderr, "Read error on stdin\n");
goto cleanup;
}
printf("\b");
switch (ch) {
case 'd':
printf("Enter num seconds to delay (1-10): ");
old_delay = delay;
rc = scanf("%d", &delay);
if (rc != 1) {
fprintf(stderr, "Illegal delay value\n");
delay = old_delay;
} else if (delay < 1 || delay > 10) {
delay = 1;
}
break;
case 'q':
quit = 1;
break;
default:
fprintf(stderr, "'%c' not recognized\n", ch);
break;
}
if (quit == 1) {
break;
}
}
printf("\e[1;1H\e[2J");
print_connections();
printf("lcore tasks\n");
printf("=============\n");
total_tasks_done_per_sec = 0;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = spdk_get_per_lcore_history(histories, i);
tasks_done = history->tpoint_count[TRACE_ISCSI_TASK_DONE];
tasks_done_delta = tasks_done - last_tasks_done[i];
if (tasks_done_delta == 0) {
continue;
}
last_tasks_done[i] = tasks_done;
tasks_done_per_sec = tasks_done_delta / delay;
printf("%5d %7d\n", history->lcore, tasks_done_per_sec);
total_tasks_done_per_sec += tasks_done_per_sec;
}
printf("Total %7d\n", total_tasks_done_per_sec);
}
cleanup:
tcsetattr(0, TCSANOW, &oldt);
munmap(history_ptr, sizeof(*histories));
close(history_fd);
spdk_jsonrpc_client_close(g_rpc_client);
return (0);
}

@@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2016 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@@ -11,20 +39,24 @@ APP = nvmf_tgt
C_SRCS := nvmf_main.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_nvmf
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += $(EVENT_BDEV_SUBSYSTEM) event_nvmf event_net
SPDK_LIB_LIST += nvmf event log trace conf thread util bdev accel rpc jsonrpc json net sock
SPDK_LIB_LIST += notify
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
endif
ifeq ($(CONFIG_FC),y)
ifneq ($(strip $(CONFIG_FC_PATH)),)
SYS_LIBS += -L$(CONFIG_FC_PATH)
endif
SYS_LIBS += -lufc
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

@@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2017 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@@ -35,7 +63,7 @@ main(int argc, char **argv)
struct spdk_app_opts opts = {};
/* default value in opts */
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.name = "nvmf";
if ((rc = spdk_app_parse_args(argc, argv, &opts, "", NULL,
nvmf_parse_arg, nvmf_usage)) !=

@@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2017 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@@ -11,12 +39,9 @@ APP = spdk_dd
C_SRCS := spdk_dd.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_bdev
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += event_sock event_bdev event_accel event_vmd
SPDK_LIB_LIST += bdev accel event thread util conf trace \
log jsonrpc json rpc sock notify
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

File diff suppressed because it is too large

@@ -1,22 +1,51 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_lspci
C_SRCS := spdk_lspci.c
SPDK_LIB_LIST = $(SOCK_MODULES_LIST) nvme vmd
SPDK_LIB_LIST = $(SOCK_MODULES_LIST)
SPDK_LIB_LIST += nvme thread util log sock vmd jsonrpc json rpc
ifeq ($(CONFIG_RDMA),y)
SPDK_LIB_LIST += rdma
ifeq ($(CONFIG_RDMA_PROV),mlx5_dv)
SYS_LIBS += -lmlx5
endif
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

@@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2019 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@@ -21,7 +49,7 @@ pci_enum_cb(void *ctx, struct spdk_pci_device *dev)
}
static void
print_pci_dev(void *ctx, struct spdk_pci_device *dev)
print_pci_dev(struct spdk_pci_device *dev)
{
struct spdk_pci_addr pci_addr = spdk_pci_device_get_addr(dev);
char addr[32] = { 0 };
@@ -46,8 +74,9 @@ print_pci_dev(void *ctx, struct spdk_pci_device *dev)
int
main(int argc, char **argv)
{
int op, rc = 0;
int op;
struct spdk_env_opts opts;
struct spdk_pci_device *dev;
while ((op = getopt(argc, argv, "h")) != -1) {
switch (op) {
@@ -74,16 +103,21 @@ main(int argc, char **argv)
if (spdk_pci_enumerate(spdk_pci_nvme_get_driver(), pci_enum_cb, NULL)) {
printf("Unable to enumerate PCI nvme driver\n");
rc = 1;
goto exit;
return 1;
}
dev = spdk_pci_get_first_device();
if (!dev) {
printf("\nLack of PCI devices available for SPDK!\n");
}
printf("\nList of available PCI devices:\n");
spdk_pci_for_each_device(NULL, print_pci_dev);
while (dev) {
print_pci_dev(dev);
dev = spdk_pci_get_next_device(dev);
}
exit:
spdk_vmd_fini();
spdk_env_fini();
return rc;
return 0;
}

@@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2018 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@@ -13,23 +41,29 @@ C_SRCS := spdk_tgt.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += event event_iscsi event_nvmf
ifeq ($(OS),Linux)
ifeq ($(CONFIG_VHOST),y)
SPDK_LIB_LIST += vhost event_vhost
endif
endif
SPDK_LIB_LIST += $(EVENT_BDEV_SUBSYSTEM) event_iscsi event_net event_scsi event_nvmf event
SPDK_LIB_LIST += nvmf trace log conf thread util bdev iscsi scsi accel rpc jsonrpc json
SPDK_LIB_LIST += net sock notify
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
ifeq ($(CONFIG_UBLK),y)
SPDK_LIB_LIST += event_ublk
SPDK_LIB_LIST += event_nbd nbd
endif
ifeq ($(CONFIG_VHOST),y)
SPDK_LIB_LIST += event_vhost_blk event_vhost_scsi
endif
ifeq ($(CONFIG_VFIO_USER),y)
SPDK_LIB_LIST += event_vfu_tgt
ifeq ($(CONFIG_FC),y)
ifneq ($(strip $(CONFIG_FC_PATH)),)
SYS_LIBS += -L$(CONFIG_FC_PATH)
endif
SYS_LIBS += -lufc
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk

@@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2018 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@@ -81,7 +109,7 @@ main(int argc, char **argv)
struct spdk_app_opts opts = {};
int rc;
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.name = "spdk_tgt";
if ((rc = spdk_app_parse_args(argc, argv, &opts, g_spdk_tgt_get_opts_string,
NULL, spdk_tgt_parse_arg, spdk_tgt_usage)) !=

@@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@@ -10,13 +38,7 @@ APP = spdk_top
C_SRCS := spdk_top.c
SPDK_LIB_LIST = rpc
LIBS=-lpanel -lmenu -lncurses
SPDK_LIB_LIST = jsonrpc json rpc log util
LIBS=-lncurses -lpanel -lmenu
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

File diff suppressed because it is too large

@@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@@ -10,14 +38,6 @@ include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_trace
SPDK_NO_LINK_ENV = 1
SPDK_LIB_LIST += json trace_parser
CXX_SRCS := trace.cpp
include $(SPDK_ROOT_DIR)/mk/spdk.app_cxx.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

@@ -1,56 +1,91 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2016 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/json.h"
#include "spdk/likely.h"
#include "spdk/string.h"
#include "spdk/util.h"
#include <map>
extern "C" {
#include "spdk/trace_parser.h"
#include "spdk/trace.h"
#include "spdk/util.h"
}
static struct spdk_trace_parser *g_parser;
static const struct spdk_trace_flags *g_flags;
static struct spdk_json_write_ctx *g_json;
static struct spdk_trace_histories *g_histories;
static bool g_print_tsc = false;
/* This is a bit ugly, but we don't want to include env_dpdk in the app, while spdk_util, which we
* do need, uses some of the functions implemented there. We're not actually using the functions
* that depend on those, so just define them as no-ops to allow the app to link.
*/
extern "C" {
void *
spdk_realloc(void *buf, size_t size, size_t align)
{
assert(false);
return NULL;
}
void
spdk_free(void *buf)
{
assert(false);
}
uint64_t
spdk_get_ticks(void)
{
return 0;
}
} /* extern "C" */
static void usage(void);
struct entry_key {
entry_key(uint16_t _lcore, uint64_t _tsc) : lcore(_lcore), tsc(_tsc) {}
uint16_t lcore;
uint64_t tsc;
};
class compare_entry_key
{
public:
bool operator()(const entry_key &first, const entry_key &second) const
{
if (first.tsc == second.tsc) {
return first.lcore < second.lcore;
} else {
return first.tsc < second.tsc;
}
}
};
typedef std::map<entry_key, spdk_trace_entry *, compare_entry_key> entry_map;
entry_map g_entry_map;
struct object_stats {
std::map<uint64_t, uint64_t> start;
std::map<uint64_t, uint64_t> index;
std::map<uint64_t, uint64_t> size;
std::map<uint64_t, uint64_t> tpoint_id;
uint64_t counter;
object_stats() : start(), index(), size(), tpoint_id(), counter(0) {}
};
struct object_stats g_stats[SPDK_TRACE_MAX_OBJECT];
static char *g_exe_name;
static int g_verbose = 1;
static uint64_t g_tsc_rate;
static uint64_t g_first_tsc = 0x0;
static float
get_us_from_tsc(uint64_t tsc, uint64_t tsc_rate)
@@ -58,19 +93,10 @@ get_us_from_tsc(uint64_t tsc, uint64_t tsc_rate)
return ((float)tsc) * 1000 * 1000 / tsc_rate;
}
static const char *
format_argname(const char *name)
{
static char namebuf[16];
snprintf(namebuf, sizeof(namebuf), "%s: ", name);
return namebuf;
}
static void
print_ptr(const char *arg_string, uint64_t arg)
{
printf("%-7.7s0x%-14jx ", format_argname(arg_string), arg);
printf("%-7.7s0x%-14jx ", arg_string, arg);
}
static void
@@ -81,13 +107,14 @@ print_uint64(const char *arg_string, uint64_t arg)
* for FLUSH WRITEBUF when writev() returns -1 due to full
* socket buffer.
*/
printf("%-7.7s%-16jd ", format_argname(arg_string), arg);
printf("%-7.7s%-16jd ", arg_string, arg);
}
static void
print_string(const char *arg_string, const char *arg)
print_string(const char *arg_string, uint64_t arg)
{
printf("%-7.7s%-16.16s ", format_argname(arg_string), arg);
char *str = (char *)&arg;
printf("%-7.7s%.8s ", arg_string, str);
}
static void
@@ -101,46 +128,64 @@ print_size(uint32_t size)
}
static void
print_object_id(const struct spdk_trace_tpoint *d, struct spdk_trace_parser_entry *entry)
print_object_id(uint8_t type, uint64_t id)
{
/* Set size to 128 and 256 bytes to make sure we can fit all the characters we need */
char related_id[128] = {'\0'};
char ids[256] = {'\0'};
if (entry->related_type != OBJECT_NONE) {
snprintf(related_id, sizeof(related_id), " (%c%jd)",
g_flags->object[entry->related_type].id_prefix,
entry->related_index);
}
snprintf(ids, sizeof(ids), "%c%jd%s", g_flags->object[d->object_type].id_prefix,
entry->object_index, related_id);
printf("id: %-17s", ids);
printf("id: %c%-15jd ", g_histories->flags.object[type].id_prefix, id);
}
static void
print_float(const char *arg_string, float arg)
{
printf("%-7s%-16.3f ", format_argname(arg_string), arg);
printf("%-7s%-16.3f ", arg_string, arg);
}
static void
print_event(struct spdk_trace_parser_entry *entry, uint64_t tsc_rate, uint64_t tsc_offset)
print_arg(uint8_t arg_type, const char *arg_string, uint64_t arg)
{
struct spdk_trace_entry *e = entry->entry;
const struct spdk_trace_tpoint *d;
float us;
size_t i;
if (arg_string[0] == 0) {
printf("%24s", "");
return;
}
switch (arg_type) {
case SPDK_TRACE_ARG_TYPE_PTR:
print_ptr(arg_string, arg);
break;
case SPDK_TRACE_ARG_TYPE_INT:
print_uint64(arg_string, arg);
break;
case SPDK_TRACE_ARG_TYPE_STR:
print_string(arg_string, arg);
break;
}
}
static void
print_event(struct spdk_trace_entry *e, uint64_t tsc_rate,
uint64_t tsc_offset, uint16_t lcore)
{
struct spdk_trace_tpoint *d;
struct object_stats *stats;
float us;
d = &g_histories->flags.tpoint[e->tpoint_id];
stats = &g_stats[d->object_type];
if (d->new_object) {
stats->index[e->object_id] = stats->counter++;
stats->tpoint_id[e->object_id] = e->tpoint_id;
stats->start[e->object_id] = e->tsc;
stats->size[e->object_id] = e->size;
}
d = &g_flags->tpoint[e->tpoint_id];
us = get_us_from_tsc(e->tsc - tsc_offset, tsc_rate);
printf("%2d: %10.3f ", entry->lcore, us);
printf("%2d: %10.3f ", lcore, us);
if (g_print_tsc) {
printf("(%9ju) ", e->tsc - tsc_offset);
}
if (g_flags->owner[d->owner_type].id_prefix) {
printf("%c%02d ", g_flags->owner[d->owner_type].id_prefix, e->poller_id);
if (g_histories->flags.owner[d->owner_type].id_prefix) {
printf("%c%02d ", g_histories->flags.owner[d->owner_type].id_prefix, e->poller_id);
} else {
printf("%4s", " ");
}
@ -148,181 +193,94 @@ print_event(struct spdk_trace_parser_entry *entry, uint64_t tsc_rate, uint64_t t
printf("%-*s ", (int)sizeof(d->name), d->name);
print_size(e->size);
print_arg(d->arg1_type, d->arg1_name, e->arg1);
if (d->new_object) {
print_object_id(d, entry);
print_object_id(d->object_type, stats->index[e->object_id]);
} else if (d->object_type != OBJECT_NONE) {
if (entry->object_index != UINT64_MAX) {
us = get_us_from_tsc(e->tsc - entry->object_start, tsc_rate);
print_object_id(d, entry);
print_float("time", us);
if (stats->start.find(e->object_id) != stats->start.end()) {
us = get_us_from_tsc(e->tsc - stats->start[e->object_id],
tsc_rate);
print_object_id(d->object_type, stats->index[e->object_id]);
print_float("time:", us);
} else {
printf("id: N/A");
}
} else if (e->object_id != 0) {
print_ptr("object", e->object_id);
}
for (i = 0; i < d->num_args; ++i) {
switch (d->args[i].type) {
case SPDK_TRACE_ARG_TYPE_PTR:
print_ptr(d->args[i].name, (uint64_t)entry->args[i].pointer);
break;
case SPDK_TRACE_ARG_TYPE_INT:
print_uint64(d->args[i].name, entry->args[i].integer);
break;
case SPDK_TRACE_ARG_TYPE_STR:
print_string(d->args[i].name, entry->args[i].string);
break;
}
print_arg(SPDK_TRACE_ARG_TYPE_PTR, "object: ", e->object_id);
}
printf("\n");
}
static void
print_event_json(struct spdk_trace_parser_entry *entry, uint64_t tsc_rate, uint64_t tsc_offset)
process_event(struct spdk_trace_entry *e, uint64_t tsc_rate,
uint64_t tsc_offset, uint16_t lcore)
{
struct spdk_trace_entry *e = entry->entry;
const struct spdk_trace_tpoint *d;
size_t i;
d = &g_flags->tpoint[e->tpoint_id];
spdk_json_write_object_begin(g_json);
spdk_json_write_named_uint64(g_json, "lcore", entry->lcore);
spdk_json_write_named_uint64(g_json, "tpoint", e->tpoint_id);
spdk_json_write_named_uint64(g_json, "tsc", e->tsc);
if (g_flags->owner[d->owner_type].id_prefix) {
spdk_json_write_named_string_fmt(g_json, "poller", "%c%02d",
g_flags->owner[d->owner_type].id_prefix,
e->poller_id);
if (g_verbose) {
print_event(e, tsc_rate, tsc_offset, lcore);
}
if (e->size != 0) {
spdk_json_write_named_uint32(g_json, "size", e->size);
}
if (d->new_object || d->object_type != OBJECT_NONE || e->object_id != 0) {
char object_type;
spdk_json_write_named_object_begin(g_json, "object");
if (d->new_object) {
object_type = g_flags->object[d->object_type].id_prefix;
spdk_json_write_named_string_fmt(g_json, "id", "%c%" PRIu64, object_type,
entry->object_index);
} else if (d->object_type != OBJECT_NONE) {
object_type = g_flags->object[d->object_type].id_prefix;
if (entry->object_index != UINT64_MAX) {
spdk_json_write_named_string_fmt(g_json, "id", "%c%" PRIu64,
object_type,
entry->object_index);
spdk_json_write_named_uint64(g_json, "time",
e->tsc - entry->object_start);
}
}
spdk_json_write_named_uint64(g_json, "value", e->object_id);
spdk_json_write_object_end(g_json);
}
/* Print related objects array */
if (entry->related_index != UINT64_MAX) {
spdk_json_write_named_string_fmt(g_json, "related", "%c%" PRIu64,
g_flags->object[entry->related_type].id_prefix,
entry->related_index);
}
if (d->num_args > 0) {
spdk_json_write_named_array_begin(g_json, "args");
for (i = 0; i < d->num_args; ++i) {
switch (d->args[i].type) {
case SPDK_TRACE_ARG_TYPE_PTR:
spdk_json_write_uint64(g_json, (uint64_t)entry->args[i].pointer);
break;
case SPDK_TRACE_ARG_TYPE_INT:
spdk_json_write_uint64(g_json, entry->args[i].integer);
break;
case SPDK_TRACE_ARG_TYPE_STR:
spdk_json_write_string(g_json, entry->args[i].string);
break;
}
}
spdk_json_write_array_end(g_json);
}
spdk_json_write_object_end(g_json);
}
static void
process_event(struct spdk_trace_parser_entry *e, uint64_t tsc_rate, uint64_t tsc_offset)
{
if (g_json == NULL) {
print_event(e, tsc_rate, tsc_offset);
} else {
print_event_json(e, tsc_rate, tsc_offset);
}
}
static void
print_tpoint_definitions(void)
{
const struct spdk_trace_tpoint *tpoint;
size_t i, j;
/* We only care about these when printing JSON */
if (!g_json) {
return;
}
spdk_json_write_named_uint64(g_json, "tsc_rate", g_flags->tsc_rate);
spdk_json_write_named_array_begin(g_json, "tpoints");
for (i = 0; i < SPDK_COUNTOF(g_flags->tpoint); ++i) {
tpoint = &g_flags->tpoint[i];
if (tpoint->tpoint_id == 0) {
continue;
}
spdk_json_write_object_begin(g_json);
spdk_json_write_named_string(g_json, "name", tpoint->name);
spdk_json_write_named_uint32(g_json, "id", tpoint->tpoint_id);
spdk_json_write_named_bool(g_json, "new_object", tpoint->new_object);
spdk_json_write_named_array_begin(g_json, "args");
for (j = 0; j < tpoint->num_args; ++j) {
spdk_json_write_object_begin(g_json);
spdk_json_write_named_string(g_json, "name", tpoint->args[j].name);
spdk_json_write_named_uint32(g_json, "type", tpoint->args[j].type);
spdk_json_write_named_uint32(g_json, "size", tpoint->args[j].size);
spdk_json_write_object_end(g_json);
}
spdk_json_write_array_end(g_json);
spdk_json_write_object_end(g_json);
}
spdk_json_write_array_end(g_json);
}
static int
print_json(void *cb_ctx, const void *data, size_t size)
populate_events(struct spdk_trace_history *history, int num_entries)
{
ssize_t rc;
int i, num_entries_filled;
struct spdk_trace_entry *e;
int first, last, lcore;
while (size > 0) {
rc = write(STDOUT_FILENO, data, size);
if (rc < 0) {
fprintf(stderr, "%s: %s\n", g_exe_name, spdk_strerror(errno));
abort();
lcore = history->lcore;
e = history->entries;
num_entries_filled = num_entries;
while (e[num_entries_filled - 1].tsc == 0) {
num_entries_filled--;
}
size -= rc;
if (num_entries == num_entries_filled) {
first = last = 0;
for (i = 1; i < num_entries; i++) {
if (e[i].tsc < e[first].tsc) {
first = i;
}
if (e[i].tsc > e[last].tsc) {
last = i;
}
}
} else {
first = 0;
last = num_entries_filled - 1;
}
return 0;
/*
* We keep track of the highest first TSC out of all reactors.
* We will ignore any events that occurred before this TSC on any
* other reactors. This will ensure we only print data for the
* subset of time where we have data across all reactors.
*/
if (e[first].tsc > g_first_tsc) {
g_first_tsc = e[first].tsc;
}
i = first;
while (1) {
g_entry_map[entry_key(lcore, e[i].tsc)] = &e[i];
if (i == last) {
break;
}
i++;
if (i == num_entries_filled) {
i = 0;
}
}
return (0);
}
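The wrap-around walk in populate_events() can be sketched in isolation as follows. This is a minimal illustration, not the SPDK code: `entry` and `chronological_order` are hypothetical names, a `tsc` of 0 is assumed to mark a never-written slot (as in the function above), and the extra guard for an all-empty buffer is an addition that the original leaves to its caller.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for spdk_trace_entry; only the timestamp matters here.
struct entry {
	uint64_t tsc;
};

// Return the indices of the written entries in chronological order.
static std::vector<int>
chronological_order(const std::vector<entry> &e)
{
	std::vector<int> order;
	int filled = (int)e.size();

	// Trailing slots with tsc == 0 were never written.
	while (filled > 0 && e[filled - 1].tsc == 0) {
		filled--;
	}
	if (filled == 0) {
		return order; // assumption: an empty buffer yields no events
	}

	int first = 0, last = filled - 1;
	if (filled == (int)e.size()) {
		// Full buffer: it may have wrapped, so search for oldest and newest.
		last = 0;
		for (int i = 1; i < filled; i++) {
			if (e[i].tsc < e[first].tsc) {
				first = i;
			}
			if (e[i].tsc > e[last].tsc) {
				last = i;
			}
		}
	}
	assert(first >= 0 && first < filled);

	// Walk from the oldest entry to the newest, wrapping at the end.
	for (int i = first;; i = (i + 1) % filled) {
		order.push_back(i);
		if (i == last) {
			break;
		}
	}
	return order;
}
```

For a full buffer holding timestamps 50, 60, 30, 40, the walk starts at index 2 (the oldest entry) and wraps around to end at index 1 (the newest).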
static void
usage(void)
static void usage(void)
{
fprintf(stderr, "usage:\n");
fprintf(stderr, " %s <option> <lcore#>\n", g_exe_name);
fprintf(stderr, " option = '-q' to disable verbose mode\n");
fprintf(stderr, " '-c' to display single lcore history\n");
fprintf(stderr, " '-t' to display TSC offset for each event\n");
fprintf(stderr, " '-s' to specify spdk_trace shm name for a\n");
@ -333,25 +291,25 @@ usage(void)
fprintf(stderr, " -i or -p must be specified)\n");
fprintf(stderr, " '-f' to specify a tracepoint file name\n");
fprintf(stderr, " (-s and -f are mutually exclusive)\n");
fprintf(stderr, " '-j' to use JSON to format the output\n");
}
int
main(int argc, char **argv)
int main(int argc, char **argv)
{
struct spdk_trace_parser_opts opts;
struct spdk_trace_parser_entry entry;
void *history_ptr;
struct spdk_trace_history *history;
int fd, i, rc;
int lcore = SPDK_TRACE_MAX_LCORE;
uint64_t tsc_offset, entry_count;
uint64_t tsc_offset;
const char *app_name = NULL;
const char *file_name = NULL;
int op, i;
int op;
char shm_name[64];
int shm_id = -1, shm_pid = -1;
bool json = false;
uint64_t trace_histories_size;
struct stat _stat;
g_exe_name = argv[0];
while ((op = getopt(argc, argv, "c:f:i:jp:s:t")) != -1) {
while ((op = getopt(argc, argv, "c:f:i:p:qs:t")) != -1) {
switch (op) {
case 'c':
lcore = atoi(optarg);
@ -368,6 +326,9 @@ main(int argc, char **argv)
case 'p':
shm_pid = atoi(optarg);
break;
case 'q':
g_verbose = 0;
break;
case 's':
app_name = optarg;
break;
@ -377,9 +338,6 @@ main(int argc, char **argv)
case 't':
g_print_tsc = true;
break;
case 'j':
json = true;
break;
default:
usage();
exit(1);
@ -398,65 +356,107 @@ main(int argc, char **argv)
exit(1);
}
if (json) {
g_json = spdk_json_write_begin(print_json, NULL, 0);
if (g_json == NULL) {
fprintf(stderr, "Failed to allocate JSON write context\n");
exit(1);
}
}
if (!file_name) {
if (file_name) {
fd = open(file_name, O_RDONLY);
} else {
if (shm_id >= 0) {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.%d", app_name, shm_id);
} else {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.pid%d", app_name, shm_pid);
}
fd = shm_open(shm_name, O_RDONLY, 0600);
file_name = shm_name;
}
opts.filename = file_name;
opts.lcore = lcore;
opts.mode = app_name == NULL ? SPDK_TRACE_PARSER_MODE_FILE : SPDK_TRACE_PARSER_MODE_SHM;
g_parser = spdk_trace_parser_init(&opts);
if (g_parser == NULL) {
fprintf(stderr, "Failed to initialize trace parser\n");
exit(1);
if (fd < 0) {
fprintf(stderr, "Could not open %s.\n", file_name);
usage();
exit(-1);
}
g_flags = spdk_trace_parser_get_flags(g_parser);
if (!g_json) {
printf("TSC Rate: %ju\n", g_flags->tsc_rate);
} else {
spdk_json_write_object_begin(g_json);
print_tpoint_definitions();
spdk_json_write_named_array_begin(g_json, "entries");
rc = fstat(fd, &_stat);
if (rc < 0) {
fprintf(stderr, "Could not get size of %s.\n", file_name);
usage();
exit(-1);
}
if ((size_t)_stat.st_size < sizeof(*g_histories)) {
fprintf(stderr, "%s is not a valid trace file\n", file_name);
usage();
exit(-1);
}
for (i = 0; i < SPDK_TRACE_MAX_LCORE; ++i) {
if (lcore == SPDK_TRACE_MAX_LCORE || i == lcore) {
entry_count = spdk_trace_parser_get_entry_count(g_parser, i);
if (entry_count > 0) {
printf("Trace Size of lcore (%d): %ju\n", i, entry_count);
}
}
/* Map the header of trace file */
history_ptr = mmap(NULL, sizeof(*g_histories), PROT_READ, MAP_SHARED, fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap %s.\n", file_name);
usage();
exit(-1);
}
tsc_offset = spdk_trace_parser_get_tsc_offset(g_parser);
while (spdk_trace_parser_next_entry(g_parser, &entry)) {
if (entry.entry->tsc < tsc_offset) {
g_histories = (struct spdk_trace_histories *)history_ptr;
g_tsc_rate = g_histories->flags.tsc_rate;
if (g_tsc_rate == 0) {
fprintf(stderr, "Invalid tsc_rate %ju\n", g_tsc_rate);
usage();
exit(-1);
}
if (g_verbose) {
printf("TSC Rate: %ju\n", g_tsc_rate);
}
/* Remap the entire trace file */
trace_histories_size = spdk_get_trace_histories_size(g_histories);
munmap(history_ptr, sizeof(*g_histories));
if ((size_t)_stat.st_size < trace_histories_size) {
fprintf(stderr, "%s is not a valid trace file\n", file_name);
usage();
exit(-1);
}
history_ptr = mmap(NULL, trace_histories_size, PROT_READ, MAP_SHARED, fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap %s.\n", file_name);
usage();
exit(-1);
}
g_histories = (struct spdk_trace_histories *)history_ptr;
if (lcore == SPDK_TRACE_MAX_LCORE) {
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = spdk_get_per_lcore_history(g_histories, i);
if (history->num_entries == 0 || history->entries[0].tsc == 0) {
continue;
}
process_event(&entry, g_flags->tsc_rate, tsc_offset);
if (g_verbose && history->num_entries) {
printf("Trace Size of lcore (%d): %ju\n", i, history->num_entries);
}
if (g_json != NULL) {
spdk_json_write_array_end(g_json);
spdk_json_write_object_end(g_json);
spdk_json_write_end(g_json);
populate_events(history, history->num_entries);
}
} else {
history = spdk_get_per_lcore_history(g_histories, lcore);
if (history->num_entries > 0 && history->entries[0].tsc != 0) {
if (g_verbose && history->num_entries) {
printf("Trace Size of lcore (%d): %ju\n", lcore, history->num_entries);
}
spdk_trace_parser_cleanup(g_parser);
populate_events(history, history->num_entries);
}
}
tsc_offset = g_first_tsc;
for (entry_map::iterator it = g_entry_map.begin(); it != g_entry_map.end(); it++) {
if (it->first.tsc < g_first_tsc) {
continue;
}
process_event(it->second, g_tsc_rate, tsc_offset, it->first.lcore);
}
munmap(history_ptr, trace_histories_size);
close(fd);
return (0);
}
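The merge step in main() above orders events from all lcores by (tsc, lcore) and skips anything older than the highest first TSC seen on any lcore. A minimal sketch of that idea follows, under stated assumptions: `key`, `key_less`, and `merge_histories` are hypothetical names, each per-lcore list is assumed to be already in chronological order, and a `std::set` is used in place of the `std::map` from the diff because the sketch carries no payload pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical key type: same ordering rule as compare_entry_key above.
struct key {
	uint16_t lcore;
	uint64_t tsc;
};

struct key_less {
	bool operator()(const key &a, const key &b) const
	{
		// Order by timestamp first; break ties by lcore.
		return a.tsc != b.tsc ? a.tsc < b.tsc : a.lcore < b.lcore;
	}
};

// per_lcore[i] holds the timestamps recorded on lcore i, oldest first.
// Returns (lcore, tsc) pairs in global order, limited to the time window
// for which every non-empty lcore has data.
static std::vector<std::pair<uint16_t, uint64_t>>
merge_histories(const std::vector<std::vector<uint64_t>> &per_lcore)
{
	std::set<key, key_less> merged;
	uint64_t cutoff = 0;

	for (size_t lcore = 0; lcore < per_lcore.size(); lcore++) {
		assert(lcore <= UINT16_MAX);
		if (per_lcore[lcore].empty()) {
			continue;
		}
		// Track the highest first timestamp across all lcores.
		cutoff = std::max(cutoff, per_lcore[lcore].front());
		for (uint64_t tsc : per_lcore[lcore]) {
			merged.insert(key{(uint16_t)lcore, tsc});
		}
	}

	std::vector<std::pair<uint16_t, uint64_t>> out;
	for (const key &k : merged) {
		if (k.tsc < cutoff) {
			continue; // recorded before the common window starts
		}
		out.emplace_back(k.lcore, k.tsc);
	}
	return out;
}
```

With lcore 0 holding timestamps 10, 20, 30 and lcore 1 holding 15, 20, 25, the cutoff is 15, so the event at 10 is dropped and the tie at 20 is resolved in favour of lcore 0.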


@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@ -13,9 +41,3 @@ APP = spdk_trace_record
C_SRCS := trace_record.c
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)


@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2018 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@ -24,7 +52,6 @@ static uint64_t g_histories_size;
struct lcore_trace_record_ctx {
char lcore_file[TRACE_PATH_MAX];
int fd;
bool valid;
struct spdk_trace_history *in_history;
struct spdk_trace_history *out_history;
@ -95,15 +122,11 @@ input_trace_file_mmap(struct aggr_trace_record_ctx *ctx, const char *shm_name)
ctx->trace_histories = (struct spdk_trace_histories *)history_ptr;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
struct spdk_trace_history *history;
ctx->lcore_ports[i].in_history = spdk_get_per_lcore_history(ctx->trace_histories, i);
history = spdk_get_per_lcore_history(ctx->trace_histories, i);
ctx->lcore_ports[i].in_history = history;
ctx->lcore_ports[i].valid = (history != NULL);
if (g_verbose && history) {
if (g_verbose) {
printf("Number of trace entries for lcore (%d): %ju\n", i,
history->num_entries);
ctx->lcore_ports[i].in_history->num_entries);
}
}
@ -154,10 +177,6 @@ output_trace_files_prepare(struct aggr_trace_record_ctx *ctx, const char *aggr_p
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
if (!port_ctx->valid) {
continue;
}
port_ctx->fd = open(port_ctx->lcore_file, flags, 0600);
if (port_ctx->fd < 0) {
fprintf(stderr, "Could not open lcore file %s.\n", port_ctx->lcore_file);
@ -441,7 +460,6 @@ trace_files_aggregate(struct aggr_trace_record_ctx *ctx)
uint64_t lcore_offsets[SPDK_TRACE_MAX_LCORE + 1];
int rc, i;
ssize_t len = 0;
uint64_t current_offset;
uint64_t len_sum;
ctx->out_fd = open(ctx->out_file, flags, 0600);
@ -463,17 +481,11 @@ trace_files_aggregate(struct aggr_trace_record_ctx *ctx)
}
/* Update and append lcore offsets to the converged trace file */
current_offset = sizeof(struct spdk_trace_flags);
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx->lcore_ports[i];
if (lcore_port->valid) {
lcore_offsets[i] = current_offset;
current_offset += spdk_get_trace_history_size(lcore_port->num_entries);
} else {
lcore_offsets[i] = 0;
lcore_offsets[0] = sizeof(struct spdk_trace_flags);
for (i = 1; i < (int)SPDK_COUNTOF(lcore_offsets); i++) {
lcore_offsets[i] = spdk_get_trace_history_size(ctx->lcore_ports[i - 1].num_entries) +
lcore_offsets[i - 1];
}
}
lcore_offsets[SPDK_TRACE_MAX_LCORE] = current_offset;
rc = cont_write(ctx->out_fd, lcore_offsets, sizeof(lcore_offsets));
if (rc < 0) {
@ -485,10 +497,6 @@ trace_files_aggregate(struct aggr_trace_record_ctx *ctx)
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx->lcore_ports[i];
if (!lcore_port->valid) {
continue;
}
lcore_port->out_history->num_entries = lcore_port->num_entries;
rc = cont_write(ctx->out_fd, lcore_port->out_history, sizeof(struct spdk_trace_history));
if (rc < 0) {
@ -513,9 +521,6 @@ trace_files_aggregate(struct aggr_trace_record_ctx *ctx)
}
}
/* Clear rc so that the last cont_write() doesn't get interpreted as a failure. */
rc = 0;
if (len_sum != lcore_port->num_entries * sizeof(struct spdk_trace_entry)) {
fprintf(stderr, "Len of lcore trace file doesn't match number of entries for lcore\n");
}
@ -561,8 +566,7 @@ setup_exit_signal_handler(void)
return rc;
}
static void
usage(void)
static void usage(void)
{
printf("\n%s is used to record all SPDK generated trace entries\n", g_exe_name);
printf("from SPDK trace shared-memory to specified file.\n\n");
@ -578,8 +582,7 @@ usage(void)
printf(" '-h' to print usage information\n");
}
int
main(int argc, char **argv)
int main(int argc, char **argv)
{
const char *app_name = NULL;
const char *file_name = NULL;
@ -610,8 +613,6 @@ main(int argc, char **argv)
file_name = optarg;
break;
case 'h':
usage();
exit(EXIT_SUCCESS);
default:
usage();
exit(1);
@ -662,9 +663,6 @@ main(int argc, char **argv)
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx.lcore_ports[i];
if (!lcore_port->valid) {
continue;
}
rc = lcore_trace_record(lcore_port);
if (rc) {
break;


@ -1,7 +1,35 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2017 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@ -11,16 +39,16 @@ APP = vhost
C_SRCS := vhost.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_vhost_blk event_vhost_scsi event_nbd
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += vhost event_vhost
SPDK_LIB_LIST += $(EVENT_BDEV_SUBSYSTEM) event_net event_scsi event
SPDK_LIB_LIST += jsonrpc json rpc bdev scsi accel trace conf
SPDK_LIB_LIST += thread util log
SPDK_LIB_LIST += event_nbd nbd net sock notify
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)


@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2017 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@ -60,7 +88,7 @@ main(int argc, char *argv[])
struct spdk_app_opts opts = {};
int rc;
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.name = "vhost";
if ((rc = spdk_app_parse_args(argc, argv, &opts, "f:S:", NULL,


@ -1,14 +1,32 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
set -e
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $1 ]]; then
echo "ERROR: SPDK test configuration not specified"
exit 1
fi
rootdir=$(readlink -f $(dirname $0))
source "$rootdir/test/common/autobuild_common.sh"
source "$1"
source "$rootdir/test/common/autotest_common.sh"
out=$output_dir
if [ -n "$SPDK_TEST_NATIVE_DPDK" ]; then
scanbuild_exclude=" --exclude $(dirname $SPDK_RUN_EXTERNAL_DPDK)"
else
scanbuild_exclude="--exclude $rootdir/dpdk/"
fi
scanbuild="scan-build -o $output_dir/scan-build-tmp $scanbuild_exclude --status-bugs"
config_params=$(get_config_params)
trap '[[ -d $SPDK_WORKSPACE ]] && rm -rf "$SPDK_WORKSPACE"' 0
SPDK_WORKSPACE=$(mktemp -dt "spdk_$(date +%s).XXXXXX")
export SPDK_WORKSPACE
SPDK_TEST_AUTOBUILD=${SPDK_TEST_AUTOBUILD:-}
umask 022
cd $rootdir
@ -16,6 +34,232 @@ cd $rootdir
date -u
git describe --tags
function ocf_precompile() {
# We compile OCF sources ourselves
# They don't need to be checked with scanbuild and code coverage is not applicable
# So we precompile OCF now for further use as standalone static library
./configure $(echo $config_params | sed 's/--enable-coverage//g')
$MAKE $MAKEFLAGS include/spdk/config.h
CC=gcc CCAR=ar $MAKE $MAKEFLAGS -C lib/env_ocf exportlib O=$rootdir/build/ocf.a
# Set config to use precompiled library
config_params="$config_params --with-ocf=/$rootdir/build/ocf.a"
# need to reconfigure to avoid clearing ocf related files on future make clean.
./configure $config_params
}
function build_native_dpdk() {
local external_dpdk_dir
local external_dpdk_base_dir
external_dpdk_dir="$SPDK_RUN_EXTERNAL_DPDK"
external_dpdk_base_dir="$(dirname $external_dpdk_dir)"
if [[ ! -d "$external_dpdk_base_dir" ]]; then
sudo mkdir -p "$external_dpdk_base_dir"
sudo chown -R $(whoami) "$external_dpdk_base_dir"/..
fi
orgdir=$PWD
rm -rf "$external_dpdk_base_dir"
git clone --branch $SPDK_TEST_NATIVE_DPDK --depth 1 http://dpdk.org/git/dpdk "$external_dpdk_base_dir"
dpdk_cflags="-fPIC -g -Werror -fcommon"
dpdk_ldflags=""
# the drivers we use
DPDK_DRIVERS=("bus" "bus/pci" "bus/vdev" "mempool/ring")
# all possible DPDK drivers
DPDK_ALL_DRIVERS=($(find "$external_dpdk_base_dir/drivers" -mindepth 1 -type d | sed -n "s#^$external_dpdk_base_dir/drivers/##p"))
if [[ "$SPDK_TEST_CRYPTO" -eq 1 ]]; then
git clone --branch v0.54 --depth 1 https://github.com/intel/intel-ipsec-mb.git "$external_dpdk_base_dir/intel-ipsec-mb"
cd "$external_dpdk_base_dir/intel-ipsec-mb"
$MAKE $MAKEFLAGS all SHARED=y EXTRA_CFLAGS=-fPIC
DPDK_DRIVERS+=("crypto")
DPDK_DRIVERS+=("crypto/aesni_mb")
DPDK_DRIVERS+=("crypto/qat")
DPDK_DRIVERS+=("compress/qat")
DPDK_DRIVERS+=("common/qat")
dpdk_cflags+=" -I$external_dpdk_base_dir/intel-ipsec-mb"
dpdk_ldflags+=" -L$external_dpdk_base_dir/intel-ipsec-mb"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$external_dpdk_base_dir/intel-ipsec-mb
fi
if [[ "$SPDK_TEST_REDUCE" -eq 1 ]]; then
isal_dir="$external_dpdk_base_dir/isa-l"
git clone --branch v2.29.0 --depth 1 https://github.com/intel/isa-l.git "$isal_dir"
cd $isal_dir
./autogen.sh
./configure CFLAGS="-fPIC -g -O2" --enable-shared=yes --prefix="$isal_dir/build"
ln -s $PWD/include $PWD/isa-l
$MAKE $MAKEFLAGS all
$MAKE install
DPDK_DRIVERS+=("compress")
DPDK_DRIVERS+=("compress/isal")
DPDK_DRIVERS+=("compress/qat")
DPDK_DRIVERS+=("common/qat")
export PKG_CONFIG_PATH="$PKG_CONFIG_PATH:$isal_dir/build/lib/pkgconfig"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$isal_dir/build/lib"
fi
# Use difference between DPDK_ALL_DRIVERS and DPDK_DRIVERS as a set of DPDK drivers we don't want or
# don't need to build.
DPDK_DISABLED_DRIVERS=($(sort <(printf "%s\n" "${DPDK_DRIVERS[@]}") <(printf "%s\n" "${DPDK_ALL_DRIVERS[@]}") | uniq -u))
cd $external_dpdk_base_dir
if [ "$(uname -s)" = "Linux" ]; then
dpdk_cflags+=" -Wno-stringop-overflow"
# Fix for freeing a device when no kernel driver is configured.
# TODO: Remove once this is merged in upstream DPDK
if grep "20.08.0" $external_dpdk_base_dir/VERSION; then
wget https://github.com/spdk/dpdk/commit/64f1ced13f974e8b3d46b87c361a09eca68126f9.patch -O dpdk-pci.patch
wget https://github.com/spdk/dpdk/commit/c2c273d5c8fbf673623b427f8f4ab5af5ddf0e08.patch -O dpdk-qat.patch
else
wget https://github.com/karlatec/dpdk/commit/3219c0cfc38803aec10c809dde16e013b370bda9.patch -O dpdk-pci.patch
wget https://github.com/karlatec/dpdk/commit/adf8f7638de29bc4bf9ba3faf12bbdae73acda0c.patch -O dpdk-qat.patch
fi
git config --local user.name "spdk"
git config --local user.email "nomail@all.com"
git am dpdk-pci.patch
git am dpdk-qat.patch
fi
meson build-tmp --prefix="$external_dpdk_dir" --libdir lib \
-Denable_docs=false -Denable_kmods=false -Dtests=false \
-Dc_link_args="$dpdk_ldflags" -Dc_args="$dpdk_cflags" \
-Dmachine=native -Ddisable_drivers=$(printf "%s," "${DPDK_DISABLED_DRIVERS[@]}")
ninja -C "$external_dpdk_base_dir/build-tmp" $MAKEFLAGS
ninja -C "$external_dpdk_base_dir/build-tmp" $MAKEFLAGS install
# Save this path. If tests are run using autorun.sh, the autotest.sh
# script will be unaware of LD_LIBRARY_PATH and will fail tests.
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" > /tmp/spdk-ld-path
cd "$orgdir"
}
function make_fail_cleanup() {
if [ -d $out/scan-build-tmp ]; then
scanoutput=$(ls -1 $out/scan-build-tmp/)
mv $out/scan-build-tmp/$scanoutput $out/scan-build
rm -rf $out/scan-build-tmp
chmod -R a+rX $out/scan-build
fi
false
}
function scanbuild_make() {
pass=true
$scanbuild $MAKE $MAKEFLAGS > $out/build_output.txt && rm -rf $out/scan-build-tmp || make_fail_cleanup
xtrace_disable
rm -f $out/*files.txt
for ent in $(find app examples lib module test -type f | grep -vF ".h"); do
if [[ $ent == lib/env_ocf* ]]; then continue; fi
if file -bi $ent | grep -q 'text/x-c'; then
echo $ent | sed 's/\.cp\{0,2\}$//g' >> $out/all_c_files.txt
fi
done
xtrace_restore
grep -E "CC|CXX" $out/build_output.txt | sed 's/\s\s\(CC\|CXX\)\s//g' | sed 's/\.o//g' > $out/built_c_files.txt
cat $rootdir/test/common/skipped_build_files.txt >> $out/built_c_files.txt
sort -o $out/all_c_files.txt $out/all_c_files.txt
sort -o $out/built_c_files.txt $out/built_c_files.txt
# from comm manual:
# -2 suppress column 2 (lines unique to FILE2)
# -3 suppress column 3 (lines that appear in both files)
# comm may exit 1 if no lines were printed (undocumented, unreliable)
comm -2 -3 $out/all_c_files.txt $out/built_c_files.txt > $out/unbuilt_c_files.txt || true
if [ $(wc -l < $out/unbuilt_c_files.txt) -ge 1 ]; then
echo "missing files"
cat $out/unbuilt_c_files.txt
pass=false
fi
$pass
}
function porcelain_check() {
if [ $(git status --porcelain --ignore-submodules | wc -l) -ne 0 ]; then
echo "Generated files missing from .gitignore:"
git status --porcelain --ignore-submodules
exit 1
fi
}
# Check that header file dependencies are working correctly by
# capturing a binary's stat data before and after touching a
# header file and re-making.
function header_dependency_check() {
STAT1=$(stat $SPDK_BIN_DIR/spdk_tgt)
sleep 1
touch lib/nvme/nvme_internal.h
$MAKE $MAKEFLAGS
STAT2=$(stat $SPDK_BIN_DIR/spdk_tgt)
if [ "$STAT1" == "$STAT2" ]; then
echo "Header dependency check failed"
false
fi
}
function test_make_uninstall() {
# Create empty file to check if it is not deleted by target uninstall
touch "$SPDK_WORKSPACE/usr/lib/sample_xyz.a"
$MAKE $MAKEFLAGS uninstall DESTDIR="$SPDK_WORKSPACE" prefix=/usr
if [[ $(find "$SPDK_WORKSPACE/usr" -maxdepth 1 -mindepth 1 | wc -l) -ne 2 ]] || [[ $(find "$SPDK_WORKSPACE/usr/lib/" -maxdepth 1 -mindepth 1 | wc -l) -ne 1 ]]; then
ls -lR "$SPDK_WORKSPACE"
echo "Make uninstall failed"
exit 1
fi
}
function build_doc() {
$MAKE -C "$rootdir"/doc --no-print-directory $MAKEFLAGS &> "$out"/doxygen.log
if [ -s "$out"/doxygen.log ]; then
cat "$out"/doxygen.log
echo "Doxygen errors found!"
exit 1
fi
if hash pdflatex 2> /dev/null; then
$MAKE -C "$rootdir"/doc/output/latex --no-print-directory $MAKEFLAGS &>> "$out"/doxygen.log
fi
mkdir -p "$out"/doc
mv "$rootdir"/doc/output/html "$out"/doc
if [ -f "$rootdir"/doc/output/latex/refman.pdf ]; then
mv "$rootdir"/doc/output/latex/refman.pdf "$out"/doc/spdk.pdf
fi
$MAKE -C "$rootdir"/doc --no-print-directory $MAKEFLAGS clean &>> "$out"/doxygen.log
if [ -s "$out"/doxygen.log ]; then
rm "$out"/doxygen.log
fi
rm -rf "$rootdir"/doc/output
}
function autobuild_test_suite() {
run_test "autobuild_check_format" ./scripts/check_format.sh
run_test "autobuild_external_code" sudo -E --preserve-env=PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH $rootdir/test/external_code/test_make.sh $rootdir
if [ "$SPDK_TEST_OCF" -eq 1 ]; then
run_test "autobuild_ocf_precompile" ocf_precompile
fi
run_test "autobuild_check_so_deps" $rootdir/test/make/check_so_deps.sh $1
./configure $config_params --without-shared
run_test "scanbuild_make" scanbuild_make
run_test "autobuild_generated_files_check" porcelain_check
run_test "autobuild_header_dependency_check" header_dependency_check
run_test "autobuild_make_install" $MAKE $MAKEFLAGS install DESTDIR="$SPDK_WORKSPACE" prefix=/usr
run_test "autobuild_make_uninstall" test_make_uninstall
run_test "autobuild_build_doc" build_doc
}
if [ $SPDK_RUN_VALGRIND -eq 1 ]; then
run_test "valgrind" echo "using valgrind"
fi
if [ $SPDK_RUN_ASAN -eq 1 ]; then
run_test "asan" echo "using asan"
fi
@ -25,46 +269,23 @@ if [ $SPDK_RUN_UBSAN -eq 1 ]; then
fi
if [ -n "$SPDK_TEST_NATIVE_DPDK" ]; then
build_native_dpdk
run_test "build_native_dpdk" build_native_dpdk
fi
case "$SPDK_TEST_AUTOBUILD" in
full)
$rootdir/configure $config_params
echo "** START ** Info for Hostname: $HOSTNAME"
uname -a
$MAKE cc_version
$MAKE cxx_version
echo "** END ** Info for Hostname: $HOSTNAME"
;;
ext | tiny | "") ;;
*)
echo "ERROR: supported values for SPDK_TEST_AUTOBUILD are 'full', 'tiny' and 'ext'"
exit 1
;;
esac
./configure $config_params
echo "** START ** Info for Hostname: $HOSTNAME"
uname -a
$MAKE cc_version
$MAKE cxx_version
echo "** END ** Info for Hostname: $HOSTNAME"
if [[ $SPDK_TEST_OCF -eq 1 ]]; then
ocf_precompile
fi
if [[ $SPDK_TEST_FUZZER -eq 1 ]]; then
llvm_precompile
fi
if [[ -n $SPDK_TEST_AUTOBUILD ]]; then
autobuild_test_suite
elif [[ $SPDK_TEST_UNITTEST -eq 1 ]]; then
unittest_build
elif [[ $SPDK_TEST_SCANBUILD -eq 1 ]]; then
scanbuild_make
if [ "$SPDK_TEST_AUTOBUILD" -eq 1 ]; then
run_test "autobuild" autobuild_test_suite $1
else
if [[ $SPDK_TEST_FUZZER -eq 1 ]]; then
# if we are testing nvmf fuzz with llvm lib, --with-shared will cause lib link fail
$rootdir/configure $config_params
else
# if we aren't testing the unittests, build with shared objects.
$rootdir/configure $config_params --with-shared
if [ "$SPDK_TEST_OCF" -eq 1 ]; then
run_test "autobuild_ocf_precompile" ocf_precompile
fi
# if we aren't testing the unittests, build with shared objects.
./configure $config_params --with-shared
run_test "make" $MAKE $MAKEFLAGS
fi


@ -1,19 +1,25 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
set -xe
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $1 ]]; then
echo "ERROR: SPDK test configuration not specified"
exit 1
fi
source "$1"
rootdir=$(readlink -f $(dirname $0))
source "$rootdir/test/common/autobuild_common.sh"
source "$rootdir/test/common/autotest_common.sh"
out=$PWD
MAKEFLAGS=${MAKEFLAGS:--j16}
cd $rootdir
timing_enter porcelain_check
if [[ -e $rootdir/mk/config.mk ]]; then
$MAKE clean
fi
$MAKE clean
if [ $(git status --porcelain --ignore-submodules | wc -l) -ne 0 ]; then
echo make clean left the following files:
@ -22,12 +28,7 @@ if [ $(git status --porcelain --ignore-submodules | wc -l) -ne 0 ]; then
fi
timing_exit porcelain_check
if [[ $SPDK_TEST_RELEASE_BUILD -eq 1 ]]; then
build_packaging
$MAKE clean
fi
if [[ $RUN_NIGHTLY -eq 0 || $SPDK_TEST_UNITTEST -eq 0 ]]; then
if [[ $RUN_NIGHTLY -eq 0 && $SPDK_TEST_RELEASE_BUILD -eq 0 ]]; then
timing_finish
exit 0
fi
@ -36,15 +37,10 @@ timing_enter build_release
config_params="$(get_config_params | sed 's/--enable-debug//g')"
if [ $(uname -s) = Linux ]; then
# LTO needs a special compiler to work under clang. See detect_cc.sh for details.
if [[ $CC == *clang* ]]; then
LD=$(type -P ld.gold)
export LD
fi
$rootdir/configure $config_params --enable-lto
./configure $config_params --enable-lto
else
# LTO needs a special compiler to work on BSD.
$rootdir/configure $config_params
./configure $config_params
fi
$MAKE ${MAKEFLAGS}
$MAKE ${MAKEFLAGS} clean


@ -1,32 +1,21 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2016 Intel Corporation
# All rights reserved.
#
set -e
rootdir=$(readlink -f $(dirname $0))
default_conf=~/autorun-spdk.conf
conf=${1:-${default_conf}}
conf=~/autorun-spdk.conf
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $conf ]]; then
echo "ERROR: $conf doesn't exist"
exit 1
fi
source "$conf"
echo "Test configuration:"
cat "$conf"
# Runs agent scripts
$rootdir/autobuild.sh "$conf"
if ((SPDK_TEST_UNITTEST == 1 || SPDK_RUN_FUNCTIONAL_TEST == 1)); then
sudo -E $rootdir/autotest.sh "$conf"
fi
if [[ $SPDK_TEST_AUTOBUILD != 'tiny' ]]; then
$rootdir/autopackage.sh "$conf"
fi
sudo -E $rootdir/autotest.sh "$conf"
$rootdir/autopackage.sh "$conf"


@ -1,41 +1,21 @@
#!/usr/bin/python3
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2017 Intel Corporation.
# All rights reserved.
import shutil
import subprocess
import argparse
import itertools
import os
import sys
import glob
import re
import pandas as pd
def generateTestCompletionTableByTest(output_dir, data_table):
columns_to_group = ['Domain', 'Test', 'Agent']
total_tests_number = len(data_table.groupby('Test'))
has_agent = data_table['Agent'] != 'None'
data_table_with_agent = data_table[has_agent]
executed_tests = len(data_table_with_agent.groupby('Test'))
tests_executions = len(data_table_with_agent.groupby(columns_to_group))
pivot_by_test = pd.pivot_table(data_table, index=columns_to_group)
output_file = os.path.join(output_dir, 'post_process', 'completions_table_by_test.html')
with open(output_file, 'w') as f:
table_row = '<tr><td>{}</td><td>{}</td>\n'
f.write('<table>\n')
f.write(table_row.format('Total number of tests', total_tests_number))
f.write(table_row.format('Tests executed', executed_tests))
f.write(table_row.format('Number of test executions', tests_executions))
f.write('</table>\n')
f.write(pivot_by_test.to_html(None))
def highest_value(inp):
ret_value = False
for x in inp:
if x:
return True
else:
return False
def generateTestCompletionTables(output_dir, completion_table):
@ -45,12 +25,11 @@ def generateTestCompletionTables(output_dir, completion_table):
pivot_by_agent = pd.pivot_table(data_table, index=["Agent", "Domain", "Test"])
pivot_by_agent.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_agent.html'))
generateTestCompletionTableByTest(output_dir, data_table)
pivot_by_asan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With Asan"], aggfunc=any)
pivot_by_test = pd.pivot_table(data_table, index=["Domain", "Test", "Agent"])
pivot_by_test.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_test.html'))
pivot_by_asan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With Asan"], aggfunc=highest_value)
pivot_by_asan.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_asan.html'))
pivot_by_ubsan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With UBsan"], aggfunc=any)
pivot_by_ubsan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With UBsan"], aggfunc=highest_value)
pivot_by_ubsan.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_ubsan.html'))
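Review note on the `aggfunc` change in this hunk: `highest_value`, as it appears in the diff, returns as soon as it has looked at the first element, so it does not behave like the built-in `any`. The sketch below copies the function from the diff and compares the two on plain Python lists. The lists are a stand-in (an assumption made for illustration) for the per-group values that `pd.pivot_table` would pass to `aggfunc`.

```python
def highest_value(inp):
    # Copied from the diff: the loop returns on the first element,
    # so later elements are never inspected.
    for x in inp:
        if x:
            return True
        else:
            return False


# Hypothetical per-group "With Asan" flags, used only for illustration.
samples = [
    [True, False],
    [False, True],
    [False, False],
    [],
]

# Pair each helper result with what the built-in any() reports.
results = [(highest_value(s), any(s)) for s in samples]

for sample, (helper, builtin) in zip(samples, results):
    print(sample, helper, builtin)
```

The two agree only when the first element already decides the answer. For `[False, True]` the helper reports `False` while `any` reports `True`, and for an empty group the helper falls through and returns `None`.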
@ -62,39 +41,34 @@ def generateCoverageReport(output_dir, repo_dir):
if len(covfiles) == 0:
return
lcov_opts = [
'--rc', 'lcov_branch_coverage=1',
'--rc', 'lcov_function_coverage=1',
'--rc', 'genhtml_branch_coverage=1',
'--rc', 'genhtml_function_coverage=1',
'--rc', 'genhtml_legend=1',
'--rc', 'geninfo_all_blocks=1',
'--rc lcov_branch_coverage=1',
'--rc lcov_function_coverage=1',
'--rc genhtml_branch_coverage=1',
'--rc genhtml_function_coverage=1',
'--rc genhtml_legend=1',
'--rc geninfo_all_blocks=1',
]
# HACK: This is a workaround for some odd CI assumptions
details = '--show-details'
cov_total = os.path.abspath(os.path.join(output_dir, 'cov_total.info'))
coverage = os.path.join(output_dir, 'coverage')
lcov = ['lcov', *lcov_opts, '-q', *itertools.chain(*[('-a', f) for f in covfiles]), '-o', cov_total]
genhtml = ['genhtml', *lcov_opts, '-q', cov_total, '--legend', '-t', 'Combined', *details.split(), '-o', coverage]
lcov = 'lcov' + ' ' + ' '.join(lcov_opts) + ' -q -a ' + ' -a '.join(covfiles) + ' -o ' + cov_total
genhtml = 'genhtml' + ' ' + ' '.join(lcov_opts) + ' -q ' + cov_total + ' --legend' + ' -t "Combined" --show-details -o ' + coverage
try:
subprocess.check_call(lcov)
subprocess.check_call([lcov], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
print("lcov failed")
print(e)
return
with open(cov_total, 'r') as cov_total_file:
file_contents = cov_total_file.readlines()
cov_total_file = open(cov_total, 'r')
replacement = "SF:" + repo_dir
file_contents = cov_total_file.readlines()
cov_total_file.close()
os.remove(cov_total)
with open(cov_total, 'w+') as file:
for Line in file_contents:
Line = re.sub("^SF:.*/repo", replacement, Line)
file.write(Line + '\n')
try:
subprocess.check_call(genhtml)
subprocess.check_call([genhtml], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
print("genhtml failed")
print(e)
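Review note on the two ways of invoking `lcov` shown in this hunk: one variant builds an argument list and passes it to `subprocess.check_call` directly, the other concatenates a single string and runs it with `shell=True`. The sketch below rebuilds both forms without running anything. The helper names `build_lcov_argv` and `build_lcov_string` and the sample paths are hypothetical, and `shlex.split` is used only as an approximation of how a POSIX shell would split the string.

```python
import itertools
import shlex


def build_lcov_argv(lcov_opts, covfiles, cov_total):
    # List form: each coverage file stays a single argv entry.
    merge_args = itertools.chain.from_iterable(('-a', f) for f in covfiles)
    return ['lcov', *lcov_opts, '-q', *merge_args, '-o', cov_total]


def build_lcov_string(lcov_opts, covfiles, cov_total):
    # String form: the shell re-splits the whole line on whitespace.
    return ('lcov ' + ' '.join(lcov_opts) + ' -q -a '
            + ' -a '.join(covfiles) + ' -o ' + cov_total)


# Hypothetical inputs; the first path deliberately contains a space.
covfiles = ['/tmp/out dir/a.info', '/tmp/b.info']
total = '/tmp/cov_total.info'

argv = build_lcov_argv(['--rc', 'lcov_branch_coverage=1'], covfiles, total)
reparsed = shlex.split(
    build_lcov_string(['--rc lcov_branch_coverage=1'], covfiles, total))

print(argv)
print(reparsed)
```

With the list form the path containing a space reaches `lcov` as one argument. With the string form it is split in two, which is one reason the list form is usually the safer choice when paths are not under the script's control.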
@ -155,7 +129,7 @@ def getSkippedTests(repo_dir):
skipped_test_file = os.path.join(repo_dir, "test", "common", "skipped_tests.txt")
if not os.path.exists(skipped_test_file):
return []
else:
with open(skipped_test_file, "r") as skipped_test_data:
return [x.strip() for x in skipped_test_data.readlines() if "#" not in x and x.strip() != '']
@ -166,7 +140,7 @@ def confirmPerPatchTests(test_list, skiplist):
if len(missing_tests) > 0:
print("Not all tests were run. Failing the build.")
print(missing_tests)
sys.exit(1)
exit(1)
def aggregateCompletedTests(output_dir, repo_dir, skip_confirm=False):
@ -200,8 +174,6 @@ def aggregateCompletedTests(output_dir, repo_dir, skip_confirm=False):
if not skip_confirm:
confirmPerPatchTests(test_list, skipped_tests)
return 0
def main(output_dir, repo_dir, skip_confirm=False):
print("-----Begin Post Process Script------")


@ -1,8 +1,4 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
rootdir=$(readlink -f $(dirname $0))
@ -13,6 +9,9 @@ if [[ ! -f $1 ]]; then
exit 1
fi
# always test with SPDK shared objects.
export SPDK_LIB_DIR="$rootdir/build/lib"
# Autotest.sh, as part of autorun.sh, runs in a different
# shell process than autobuild.sh. Use helper file to pass
# over env variable containing libraries paths.
@ -31,13 +30,13 @@ fi
if [ $(uname -s) = Linux ]; then
old_core_pattern=$(< /proc/sys/kernel/core_pattern)
mkdir -p "$output_dir/coredumps"
# Set core_pattern to a known value to avoid ABRT, systemd-coredump, etc.
# Dump the $output_dir path to a file so collector can pick it up while executing.
# We don't set it in the core_pattern command line because of the string length limitation
# of 128 bytes. See 'man core 5' for details.
echo "|$rootdir/scripts/core-collector.sh %P %s %t" > /proc/sys/kernel/core_pattern
echo "$output_dir/coredumps" > "$rootdir/.coredump_path"
# set core_pattern to a known value to avoid ABRT, systemd-coredump, etc.
echo "core" > /proc/sys/kernel/core_pattern
# Make sure that the hugepage state for our VM is fresh so we don't fail
# hugepage allocation. Allow time for this action to complete.
echo 1 > /proc/sys/vm/drop_caches
sleep 3
# make sure nbd (network block device) driver is loaded if it is available
# this ensures that when tests need to use nbd, it will be fully initialized
@ -49,7 +48,7 @@ if [ $(uname -s) = Linux ]; then
fi
fi
trap "autotest_cleanup || :; exit 1" SIGINT SIGTERM EXIT
trap "process_core; autotest_cleanup; exit 1" SIGINT SIGTERM EXIT
timing_enter autotest
@ -59,14 +58,13 @@ src=$(readlink -f $(dirname $0))
out=$output_dir
cd $src
freebsd_update_contigmem_mod
freebsd_set_maxsock_buf
./scripts/setup.sh status
# lcov takes considerable time to process clang coverage.
# Disabling lcov allow us to do this.
# More information: https://github.com/spdk/spdk/issues/1693
CC_TYPE=$(grep CC_TYPE mk/cc.mk)
if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
freebsd_update_contigmem_mod
if hash lcov; then
# setup output dir for unittest.sh
export UT_COVERAGE=$out/ut_coverage
export LCOV_OPTS="
--rc lcov_branch_coverage=1
--rc lcov_function_coverage=1
@ -83,59 +81,81 @@ if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
fi
# Make sure the disks are clean (no leftover partition tables)
timing_enter pre_cleanup
timing_enter cleanup
# Remove old domain socket pathname just in case
rm -f /var/tmp/spdk*.sock
# Load the kernel driver
$rootdir/scripts/setup.sh reset
get_zoned_devs
if ((${#zoned_devs[@]} > 0)); then
# FIXME: For now make sure zoned devices are tested on-demand by
# a designated tests instead of falling into any other. The main
# concern here are fio workloads where specific configuration
# must be in place for it to work with the zoned device.
export PCI_BLOCKED="${zoned_devs[*]}"
export PCI_ZONED="${zoned_devs[*]}"
fi
# Delete all leftover lvols and gpt partitions
# Matches both /dev/nvmeXnY on Linux and /dev/nvmeXnsY on BSD
# Filter out nvme with partitions - the "p*" suffix
for dev in $(ls /dev/nvme*n* | grep -v p || true); do
# Skip zoned devices as non-sequential IO will always fail
[[ -z ${zoned_devs["${dev##*/}"]} ]] || continue
if ! block_in_use "$dev"; then
dd if=/dev/zero of="$dev" bs=1M count=1
fi
done
sync
if ! xtrace_disable_per_cmd reap_spdk_processes; then
echo "WARNING: Lingering SPDK processes were detected. Testing environment may be unstable" >&2
fi
./scripts/setup.sh reset
if [ $(uname -s) = Linux ]; then
run_test "setup.sh" "$rootdir/test/setup/test-setup.sh"
fi
# OCSSD device drivers don't support IO issued by the kernel, so
# detect OCSSD devices and blacklist them (unbind from any driver).
# If test scripts want to use this device it needs to do this explicitly.
#
# If some OCSSD device is bound to other driver than nvme we won't be able to
# discover if it is OCSSD or not so load the kernel driver first.
$rootdir/scripts/setup.sh status
while IFS= read -r -d '' dev; do
# Send Open Channel 2.0 Geometry opcode "0xe2" - not supported by NVMe device.
if nvme admin-passthru $dev --namespace-id=1 --data-len=4096 --opcode=0xe2 --read > /dev/null; then
bdf="$(basename $(readlink -e /sys/class/nvme/${dev#/dev/}/device))"
echo "INFO: blacklisting OCSSD device: $dev ($bdf)"
PCI_BLACKLIST+=" $bdf"
OCSSD_PCI_DEVICES+=" $bdf"
fi
done < <(find /dev -maxdepth 1 -regex '/dev/nvme[0-9]+' -print0)
export OCSSD_PCI_DEVICES
# Now, bind blacklisted devices to pci-stub module. This will prevent
# automatic grabbing these devices when we add device/vendor ID to
# proper driver.
if [[ -n "$PCI_BLACKLIST" ]]; then
# shellcheck disable=SC2097,SC2098
PCI_WHITELIST="$PCI_BLACKLIST" \
PCI_BLACKLIST="" \
DRIVER_OVERRIDE="pci-stub" \
./scripts/setup.sh
# Export our blacklist so it will take effect during next setup.sh
export PCI_BLACKLIST
fi
fi
if [[ $(uname -s) == Linux ]]; then
# Revert NVMe namespaces to default state
nvme_namespace_revert
fi
timing_exit pre_cleanup
# Delete all leftover lvols and gpt partitions
# Matches both /dev/nvmeXnY on Linux and /dev/nvmeXnsY on BSD
# Filter out nvme with partitions - the "p*" suffix
for dev in $(ls /dev/nvme*n* | grep -v p || true); do
dd if=/dev/zero of="$dev" bs=1M count=1
done
sync
timing_exit cleanup
# set up huge pages
timing_enter afterboot
$rootdir/scripts/setup.sh
./scripts/setup.sh
timing_exit afterboot
timing_enter nvmf_setup
rdma_device_init
timing_exit nvmf_setup
if [[ $SPDK_TEST_CRYPTO -eq 1 || $SPDK_TEST_REDUCE -eq 1 ]]; then
if grep -q '#define SPDK_CONFIG_IGB_UIO_DRIVER 1' $rootdir/include/spdk/config.h; then
./scripts/qat_setup.sh igb_uio
else
./scripts/qat_setup.sh
fi
fi
# Revert existing OPAL to factory settings that may have been left from earlier failed tests.
# This ensures we won't hit any unexpected failures due to NVMe SSDs being locked.
opal_revert_cleanup
@ -145,142 +165,88 @@ opal_revert_cleanup
#####################
if [ $SPDK_TEST_UNITTEST -eq 1 ]; then
run_test "unittest" $rootdir/test/unit/unittest.sh
run_test "unittest" ./test/unit/unittest.sh
run_test "env" test/env/env.sh
fi
if [ $SPDK_RUN_FUNCTIONAL_TEST -eq 1 ]; then
if [[ $SPDK_TEST_CRYPTO -eq 1 || $SPDK_TEST_VBDEV_COMPRESS -eq 1 ]]; then
if [[ $SPDK_TEST_USE_IGB_UIO -eq 1 ]]; then
$rootdir/scripts/qat_setup.sh igb_uio
else
$rootdir/scripts/qat_setup.sh
fi
fi
timing_enter lib
run_test "env" $rootdir/test/env/env.sh
run_test "rpc" $rootdir/test/rpc/rpc.sh
run_test "rpc_client" $rootdir/test/rpc_client/rpc_client.sh
run_test "json_config" $rootdir/test/json_config/json_config.sh
run_test "json_config_extra_key" $rootdir/test/json_config/json_config_extra_key.sh
run_test "alias_rpc" $rootdir/test/json_config/alias_rpc/alias_rpc.sh
run_test "spdkcli_tcp" $rootdir/test/spdkcli/tcp.sh
run_test "dpdk_mem_utility" $rootdir/test/dpdk_memory_utility/test_dpdk_mem_info.sh
run_test "event" $rootdir/test/event/event.sh
run_test "thread" $rootdir/test/thread/thread.sh
run_test "accel" $rootdir/test/accel/accel.sh
run_test "app_cmdline" $rootdir/test/app/cmdline.sh
run_test "rpc" test/rpc/rpc.sh
run_test "rpc_client" test/rpc_client/rpc_client.sh
run_test "json_config" ./test/json_config/json_config.sh
run_test "alias_rpc" test/json_config/alias_rpc/alias_rpc.sh
run_test "spdkcli_tcp" test/spdkcli/tcp.sh
run_test "dpdk_mem_utility" test/dpdk_memory_utility/test_dpdk_mem_info.sh
run_test "event" test/event/event.sh
if [ $SPDK_TEST_BLOCKDEV -eq 1 ]; then
run_test "blockdev_general" $rootdir/test/bdev/blockdev.sh
run_test "bdev_raid" $rootdir/test/bdev/bdev_raid.sh
run_test "bdevperf_config" $rootdir/test/bdev/bdevperf/test_config.sh
run_test "blockdev_general" test/bdev/blockdev.sh
run_test "bdev_raid" test/bdev/bdev_raid.sh
run_test "bdevperf_config" test/bdev/bdevperf/test_config.sh
if [[ $(uname -s) == Linux ]]; then
run_test "reactor_set_interrupt" $rootdir/test/interrupt/reactor_set_interrupt.sh
run_test "reap_unregistered_poller" $rootdir/test/interrupt/reap_unregistered_poller.sh
run_test "spdk_dd" test/dd/dd.sh
fi
fi
if [[ $(uname -s) == Linux ]]; then
if [[ $SPDK_TEST_BLOCKDEV -eq 1 || $SPDK_TEST_URING -eq 1 ]]; then
# The crypto job also includes the SPDK_TEST_BLOCKDEV in its configuration hence the
# dd tests are executed there as well. However, these tests can take a significant
# amount of time to complete (up to 4min) on a physical system leading to a potential
# job timeout. Avoid that by skipping these tests - this should not affect the coverage
# since dd tests are still run as part of the vg jobs.
if [[ $SPDK_TEST_CRYPTO -eq 0 ]]; then
run_test "spdk_dd" $rootdir/test/dd/dd.sh
fi
fi
if [ $SPDK_TEST_JSON -eq 1 ]; then
run_test "test_converter" test/config_converter/test_converter.sh
fi
if [ $SPDK_TEST_NVME -eq 1 ]; then
run_test "blockdev_nvme" $rootdir/test/bdev/blockdev.sh "nvme"
if [[ $(uname -s) == Linux ]]; then
run_test "blockdev_nvme_gpt" $rootdir/test/bdev/blockdev.sh "gpt"
fi
run_test "nvme" $rootdir/test/nvme/nvme.sh
if [[ $SPDK_TEST_NVME_PMR -eq 1 ]]; then
run_test "nvme_pmr" $rootdir/test/nvme/nvme_pmr.sh
fi
if [[ $SPDK_TEST_NVME_SCC -eq 1 ]]; then
run_test "nvme_scc" $rootdir/test/nvme/nvme_scc.sh
fi
if [[ $SPDK_TEST_NVME_BP -eq 1 ]]; then
run_test "nvme_bp" $rootdir/test/nvme/nvme_bp.sh
run_test "blockdev_nvme" test/bdev/blockdev.sh "nvme"
run_test "blockdev_nvme_gpt" test/bdev/blockdev.sh "gpt"
run_test "nvme" test/nvme/nvme.sh
if [[ $SPDK_TEST_NVME_CLI -eq 1 ]]; then
run_test "nvme_cli" test/nvme/spdk_nvme_cli.sh
fi
if [[ $SPDK_TEST_NVME_CUSE -eq 1 ]]; then
run_test "nvme_cuse" $rootdir/test/nvme/cuse/nvme_cuse.sh
run_test "nvme_cuse" test/nvme/cuse/nvme_cuse.sh
fi
if [[ $SPDK_TEST_NVME_CMB -eq 1 ]]; then
run_test "nvme_cmb" $rootdir/test/nvme/cmb/cmb.sh
fi
if [[ $SPDK_TEST_NVME_FDP -eq 1 ]]; then
run_test "nvme_fdp" test/nvme/nvme_fdp.sh
fi
if [[ $SPDK_TEST_NVME_ZNS -eq 1 ]]; then
run_test "nvme_zns" $rootdir/test/nvme/zns/zns.sh
fi
run_test "nvme_rpc" $rootdir/test/nvme/nvme_rpc.sh
run_test "nvme_rpc_timeouts" $rootdir/test/nvme/nvme_rpc_timeouts.sh
run_test "nvme_rpc" test/nvme/nvme_rpc.sh
# Only test hotplug without ASAN enabled. Since if it is
# enabled, it catches SEGV earlier than our handler which
# breaks the hotplug logic.
if [ $SPDK_RUN_ASAN -eq 0 ] && [ $(uname -s) = Linux ]; then
run_test "sw_hotplug" $rootdir/test/nvme/sw_hotplug.sh
fi
if [[ $SPDK_TEST_XNVME -eq 1 ]]; then
run_test "nvme_xnvme" $rootdir/test/nvme/xnvme/xnvme.sh
run_test "blockdev_xnvme" $rootdir/test/bdev/blockdev.sh "xnvme"
# Run ublk with xnvme since they have similar kernel dependencies
run_test "ublk" $rootdir/test/ublk/ublk.sh
if [ $SPDK_RUN_ASAN -eq 0 ]; then
run_test "nvme_hotplug" test/nvme/hotplug.sh root
fi
fi
if [ $SPDK_TEST_IOAT -eq 1 ]; then
run_test "ioat" $rootdir/test/ioat/ioat.sh
run_test "ioat" test/ioat/ioat.sh
fi
timing_exit lib
if [ $SPDK_TEST_ISCSI -eq 1 ]; then
run_test "iscsi_tgt" $rootdir/test/iscsi_tgt/iscsi_tgt.sh
run_test "spdkcli_iscsi" $rootdir/test/spdkcli/iscsi.sh
run_test "iscsi_tgt" ./test/iscsi_tgt/iscsi_tgt.sh
run_test "spdkcli_iscsi" ./test/spdkcli/iscsi.sh
# Run raid spdkcli test under iSCSI since blockdev tests run on systems that can't run spdkcli yet
run_test "spdkcli_raid" $rootdir/test/spdkcli/raid.sh
run_test "spdkcli_raid" test/spdkcli/raid.sh
fi
if [ $SPDK_TEST_BLOBFS -eq 1 ]; then
run_test "rocksdb" $rootdir/test/blobfs/rocksdb/rocksdb.sh
run_test "blobstore" $rootdir/test/blobstore/blobstore.sh
run_test "blobstore_grow" $rootdir/test/blobstore/blobstore_grow/blobstore_grow.sh
run_test "blobfs" $rootdir/test/blobfs/blobfs.sh
run_test "rocksdb" ./test/blobfs/rocksdb/rocksdb.sh
run_test "blobstore" ./test/blobstore/blobstore.sh
run_test "blobfs" ./test/blobfs/blobfs.sh
run_test "hello_blob" $SPDK_EXAMPLE_DIR/hello_blob \
examples/blob/hello_world/hello_blob.json
fi
if [ $SPDK_TEST_NVMF -eq 1 ]; then
export NET_TYPE
# The NVMe-oF run test cases are split out like this so that the parser that compiles the
# list of all tests can properly differentiate them. Please do not merge them into one line.
if [ "$SPDK_TEST_NVMF_TRANSPORT" = "rdma" ]; then
run_test "nvmf_rdma" $rootdir/test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_rdma" $rootdir/test/spdkcli/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "nvmf_rdma" ./test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_rdma" ./test/spdkcli/nvmf.sh
elif [ "$SPDK_TEST_NVMF_TRANSPORT" = "tcp" ]; then
run_test "nvmf_tcp" $rootdir/test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
if [[ $SPDK_TEST_URING -eq 0 ]]; then
run_test "spdkcli_nvmf_tcp" $rootdir/test/spdkcli/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "nvmf_identify_passthru" $rootdir/test/nvmf/target/identify_passthru.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
fi
run_test "nvmf_dif" $rootdir/test/nvmf/target/dif.sh
run_test "nvmf_abort_qd_sizes" $rootdir/test/nvmf/target/abort_qd_sizes.sh
run_test "nvmf_tcp" ./test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_tcp" ./test/spdkcli/nvmf.sh
run_test "nvmf_identify_passthru" test/nvmf/target/identify_passthru.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
elif [ "$SPDK_TEST_NVMF_TRANSPORT" = "fc" ]; then
run_test "nvmf_fc" $rootdir/test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_fc" $rootdir/test/spdkcli/nvmf.sh
run_test "nvmf_fc" ./test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_fc" ./test/spdkcli/nvmf.sh
else
echo "unknown NVMe transport, please specify rdma, tcp, or fc."
exit 1
@ -288,98 +254,82 @@ if [ $SPDK_RUN_FUNCTIONAL_TEST -eq 1 ]; then
fi
if [ $SPDK_TEST_VHOST -eq 1 ]; then
run_test "vhost" $rootdir/test/vhost/vhost.sh
fi
if [ $SPDK_TEST_VFIOUSER_QEMU -eq 1 ]; then
run_test "vfio_user_qemu" $rootdir/test/vfio_user/vfio_user.sh
run_test "vhost" ./test/vhost/vhost.sh
fi
if [ $SPDK_TEST_LVOL -eq 1 ]; then
run_test "lvol" $rootdir/test/lvol/lvol.sh
run_test "blob_io_wait" $rootdir/test/blobstore/blob_io_wait/blob_io_wait.sh
run_test "lvol" ./test/lvol/lvol.sh
run_test "blob_io_wait" ./test/blobstore/blob_io_wait/blob_io_wait.sh
fi
if [ $SPDK_TEST_VHOST_INIT -eq 1 ]; then
timing_enter vhost_initiator
run_test "vhost_blockdev" $rootdir/test/vhost/initiator/blockdev.sh
run_test "spdkcli_virtio" $rootdir/test/spdkcli/virtio.sh
run_test "vhost_shared" $rootdir/test/vhost/shared/shared.sh
run_test "vhost_fuzz" $rootdir/test/vhost/fuzz/fuzz.sh
run_test "vhost_blockdev" ./test/vhost/initiator/blockdev.sh
run_test "spdkcli_virtio" ./test/spdkcli/virtio.sh
run_test "vhost_shared" ./test/vhost/shared/shared.sh
run_test "vhost_fuzz" ./test/vhost/fuzz/fuzz.sh
timing_exit vhost_initiator
fi
if [ $SPDK_TEST_PMDK -eq 1 ]; then
run_test "blockdev_pmem" ./test/bdev/blockdev.sh "pmem"
run_test "pmem" ./test/pmem/pmem.sh -x
run_test "spdkcli_pmem" ./test/spdkcli/pmem.sh
fi
if [ $SPDK_TEST_RBD -eq 1 ]; then
run_test "blockdev_rbd" $rootdir/test/bdev/blockdev.sh "rbd"
run_test "spdkcli_rbd" $rootdir/test/spdkcli/rbd.sh
run_test "blockdev_rbd" ./test/bdev/blockdev.sh "rbd"
run_test "spdkcli_rbd" ./test/spdkcli/rbd.sh
fi
if [ $SPDK_TEST_OCF -eq 1 ]; then
run_test "ocf" $rootdir/test/ocf/ocf.sh
run_test "ocf" ./test/ocf/ocf.sh
fi
if [ $SPDK_TEST_FTL -eq 1 ]; then
run_test "ftl" $rootdir/test/ftl/ftl.sh
run_test "ftl" ./test/ftl/ftl.sh
fi
if [ $SPDK_TEST_VMD -eq 1 ]; then
run_test "vmd" $rootdir/test/vmd/vmd.sh
run_test "vmd" ./test/vmd/vmd.sh
fi
if [ $SPDK_TEST_VBDEV_COMPRESS -eq 1 ]; then
run_test "compress_compdev" $rootdir/test/compress/compress.sh "compdev"
run_test "compress_isal" $rootdir/test/compress/compress.sh "isal"
if [ $SPDK_TEST_REDUCE -eq 1 ]; then
run_test "compress_qat" ./test/compress/compress.sh "qat"
run_test "compress_isal" ./test/compress/compress.sh "isal"
fi
if [ $SPDK_TEST_OPAL -eq 1 ]; then
run_test "nvme_opal" $rootdir/test/nvme/nvme_opal.sh
run_test "nvme_opal" ./test/nvme/nvme_opal.sh
fi
if [ $SPDK_TEST_CRYPTO -eq 1 ]; then
run_test "blockdev_crypto_aesni" $rootdir/test/bdev/blockdev.sh "crypto_aesni"
run_test "blockdev_crypto_sw" $rootdir/test/bdev/blockdev.sh "crypto_sw"
run_test "blockdev_crypto_qat" $rootdir/test/bdev/blockdev.sh "crypto_qat"
run_test "chaining" $rootdir/test/bdev/chaining.sh
run_test "blockdev_crypto_aesni" ./test/bdev/blockdev.sh "crypto_aesni"
# Proceed with the test only if QAT devices are in place
if [[ $(lspci -d:37c8) ]]; then
run_test "blockdev_crypto_qat" ./test/bdev/blockdev.sh "crypto_qat"
fi
if [[ $SPDK_TEST_SCHEDULER -eq 1 ]]; then
run_test "scheduler" $rootdir/test/scheduler/scheduler.sh
fi
if [[ $SPDK_TEST_SMA -eq 1 ]]; then
run_test "sma" $rootdir/test/sma/sma.sh
fi
if [[ $SPDK_TEST_FUZZER -eq 1 ]]; then
run_test "llvm_fuzz" $rootdir/test/fuzz/llvm.sh
fi
if [[ $SPDK_TEST_RAID5 -eq 1 ]]; then
run_test "blockdev_raid5f" $rootdir/test/bdev/blockdev.sh "raid5f"
fi
fi
trap - SIGINT SIGTERM EXIT
timing_enter post_cleanup
timing_enter cleanup
autotest_cleanup
timing_exit post_cleanup
timing_exit cleanup
timing_exit autotest
chmod a+r $output_dir/timing.txt
[[ -f "$output_dir/udev.log" ]] && rm -f "$output_dir/udev.log"
trap - SIGINT SIGTERM EXIT
if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
# catch any stray core files
process_core
if hash lcov; then
# generate coverage data and combine with baseline
$LCOV -q -c -d $src -t "$(hostname)" -o $out/cov_test.info
$LCOV -q -a $out/cov_base.info -a $out/cov_test.info -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/dpdk/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '/usr/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/examples/vmd/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/app/spdk_lspci/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/app/spdk_top/*' -o $out/cov_total.info
owner=$(stat -c "%U" .)
sudo -u $owner git clean -f "*.gcda"
git clean -f "*.gcda"
rm -f cov_base.info cov_test.info OLD_STDOUT OLD_STDERR
fi

1
build/lib/.gitignore vendored Normal file

@ -0,0 +1 @@
# Placeholder

973
configure vendored

File diff suppressed because it is too large


@ -1,62 +0,0 @@
# Deprecation
## ABI and API Deprecation
This document details the policy for maintaining stability of SPDK ABI and API.
Major ABI version can change at most once for each quarterly SPDK release.
ABI versions are managed separately for each library and follow [Semantic Versioning](https://semver.org/).
API and ABI deprecation notices shall be posted in the next section.
Each entry must describe what will be removed and can suggest the future use or alternative.
Specific future SPDK release for the removal must be provided.
ABI cannot be removed without providing a deprecation notice for at least a single SPDK release.
Deprecated code paths must be registered with `SPDK_DEPRECATION_REGISTER()` and logged with
`SPDK_LOG_DEPRECATED()`. The tag used with these macros will appear in the SPDK
log at the warn level when `SPDK_LOG_DEPRECATED()` is called, subject to rate limits.
The tags can be matched with the level 4 headers below.
## Deprecation Notices
### PMDK
PMDK is no longer supported and integrations with it in SPDK are now deprecated, and will be removed in SPDK 23.05.
Please see: [UPDATE ON PMDK AND OUR LONG TERM SUPPORT STRATEGY](https://pmem.io/blog/2022/11/update-on-pmdk-and-our-long-term-support-strategy/).
### VTune
#### `vtune_support`
VTune integration is now deprecated and will be removed in SPDK 23.05.
### nvmf
#### `spdk_nvmf_qpair_disconnect`
Parameters `cb_fn` and `ctx` of `spdk_nvmf_qpair_disconnect` API are deprecated. These parameters
will be removed in 23.09 release.
### gpt
#### `old_gpt_guid`
Deprecated the SPDK partition type GUID `7c5222bd-8f5d-4087-9c00-bf9843c7b58c`. Partitions of this
type have bdevs created that are one block less than the actual size of the partition. Existing
partitions using the deprecated GUID can continue to use that GUID; support for the deprecated GUID
will remain in SPDK indefinitely, and will continue to exhibit the off-by-one bug so that on-disk
metadata layouts based on the incorrect size are not affected.
See GitHub issue [2801](https://github.com/spdk/spdk/issues/2801) for additional details on the bug.
New SPDK partition types should use GUID `6527994e-2c5a-4eec-9613-8f5944074e8b` which will create
a bdev of the correct size.
### lvol
#### `vbdev_lvol_rpc_req_size`
Param `size` in rpc commands `rpc_bdev_lvol_create` and `rpc_bdev_lvol_resize` is deprecated and
replaced by `size_in_mib`.
See GitHub issue [2346](https://github.com/spdk/spdk/issues/2346) for additional details.

3
doc/.gitignore vendored

@ -1,4 +1,3 @@
# changelog.md and deprecation.md is generated by Makefile
# changelog.md is generated by Makefile
changelog.md
deprecation.md
output/


@ -234,7 +234,7 @@ ALIASES =
# A mapping has the form "name=value". For example adding "class=itcl::class"
# will allow you to use the command class in the itcl::class meaning.
# TCL_SUBST =
TCL_SUBST =
# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
# only. Doxygen will then generate output that is more tailored for C. For
@ -746,7 +746,7 @@ WARN_IF_DOC_ERROR = YES
# parameter documentation, but not about the absence of documentation.
# The default value is: NO.
WARN_NO_PARAMDOC = YES
WARN_NO_PARAMDOC = NO
# If the WARN_AS_ERROR tag is set to YES then doxygen will immediately stop when
# a warning is encountered.
@ -795,7 +795,6 @@ INPUT += \
misc.md \
driver_modules.md \
tools.md \
ci_tools.md \
performance_reports.md \
# All remaining pages are listed here in alphabetical order by filename.
@ -813,8 +812,6 @@ INPUT += \
compression.md \
concurrency.md \
containers.md \
deprecation.md \
distributions.md \
event.md \
ftl.md \
gdb_macros.md \
@ -829,26 +826,18 @@ INPUT += \
memory.md \
notify.md \
nvme.md \
nvme_multipath.md \
nvme-cli.md \
nvme_spec.md \
nvmf.md \
nvmf_tgt_pg.md \
nvmf_tracing.md \
nvmf_multipath_howto.md \
overview.md \
peer_2_peer.md \
pkgconfig.md \
porting.md \
rpm.md \
scheduler.md \
shfmt.md \
sma.md \
spdkcli.md \
spdk_top.md \
ssd_internals.md \
system_configuration.md \
ublk.md \
usdt.md \
userspace.md \
vagrant.md \
vhost.md \
@ -1111,7 +1100,7 @@ ALPHABETICAL_INDEX = YES
# Minimum value: 1, maximum value: 20, default value: 5.
# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
# COLS_IN_ALPHA_INDEX = 5
COLS_IN_ALPHA_INDEX = 5
# In case all classes in a project start with a common prefix, all classes will
# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag
@ -1247,7 +1236,7 @@ HTML_COLORSTYLE_GAMMA = 80
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.
HTML_TIMESTAMP = NO
HTML_TIMESTAMP = YES
# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML
# documentation will contain sections that can be hidden and shown after the
@ -1519,6 +1508,17 @@ EXT_LINKS_IN_WINDOW = NO
FORMULA_FONTSIZE = 10
# Use the FORMULA_TRANPARENT tag to determine whether or not the images
# generated for formulas are transparent PNGs. Transparent PNGs are not
# supported properly for IE 6.0, but are supported on all modern browsers.
#
# Note that when changing this option you need to delete any form_*.png files in
# the HTML output directory before the changes have effect.
# The default value is: YES.
# This tag requires that the tag GENERATE_HTML is set to YES.
FORMULA_TRANSPARENT = YES
# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see
# http://www.mathjax.org) which uses client side Javascript for the rendering
# instead of using pre-rendered bitmaps. Use this if you do not have LaTeX
@ -1661,7 +1661,7 @@ EXTRA_SEARCH_MAPPINGS =
# If the GENERATE_LATEX tag is set to YES, doxygen will generate LaTeX output.
# The default value is: YES.
GENERATE_LATEX = NO
GENERATE_LATEX = YES
# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. If a
# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
@ -1797,6 +1797,16 @@ LATEX_BATCHMODE = YES
LATEX_HIDE_INDICES = NO
# If the LATEX_SOURCE_CODE tag is set to YES then doxygen will include source
# code with syntax highlighting in the LaTeX output.
#
# Note that which sources are shown also depends on other settings such as
# SOURCE_BROWSER.
# The default value is: NO.
# This tag requires that the tag GENERATE_LATEX is set to YES.
LATEX_SOURCE_CODE = NO
# The LATEX_BIB_STYLE tag can be used to specify the style to use for the
# bibliography, e.g. plainnat, or ieeetr. See
# http://en.wikipedia.org/wiki/BibTeX and \cite for more info.
@ -1869,6 +1879,16 @@ RTF_STYLESHEET_FILE =
RTF_EXTENSIONS_FILE =
# If the RTF_SOURCE_CODE tag is set to YES then doxygen will include source code
# with syntax highlighting in the RTF output.
#
# Note that which sources are shown also depends on other settings such as
# SOURCE_BROWSER.
# The default value is: NO.
# This tag requires that the tag GENERATE_RTF is set to YES.
RTF_SOURCE_CODE = NO
#---------------------------------------------------------------------------
# Configuration options related to the man page output
#---------------------------------------------------------------------------
@ -1958,6 +1978,15 @@ GENERATE_DOCBOOK = NO
DOCBOOK_OUTPUT = docbook
# If the DOCBOOK_PROGRAMLISTING tag is set to YES, doxygen will include the
# program listings (including syntax highlighting and cross-referencing
# information) to the DOCBOOK output. Note that enabling this will significantly
# increase the size of the DOCBOOK output.
# The default value is: NO.
# This tag requires that the tag GENERATE_DOCBOOK is set to YES.
DOCBOOK_PROGRAMLISTING = NO
#---------------------------------------------------------------------------
# Configuration options for the AutoGen Definitions output
#---------------------------------------------------------------------------
@ -2136,12 +2165,21 @@ EXTERNAL_PAGES = YES
# interpreter (i.e. the result of 'which perl').
# The default file (with absolute path) is: /usr/bin/perl.
# PERL_PATH = /usr/bin/perl
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------
# If the CLASS_DIAGRAMS tag is set to YES, doxygen will generate a class diagram
# (in HTML and LaTeX) for classes with base or super classes. Setting the tag to
# NO turns the diagrams off. Note that this option also works with HAVE_DOT
# disabled, but it is recommended to install and use dot, since it yields more
# powerful graphs.
# The default value is: YES.
CLASS_DIAGRAMS = YES
# You can define message sequence charts within doxygen comments using the \msc
# command. Doxygen will then run the mscgen tool (see:
# http://www.mcternan.me.uk/mscgen/)) to produce the chart and insert it in the
@ -2149,7 +2187,7 @@ EXTERNAL_PAGES = YES
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
# MSCGEN_PATH =
MSCGEN_PATH =
# You can include diagrams made with dia in doxygen documentation. Doxygen will
# then run dia to produce the diagram and insert it in the documentation. The
@ -2183,6 +2221,23 @@ HAVE_DOT = YES
DOT_NUM_THREADS = 0
# When you want a differently looking font in the dot files that doxygen
# generates you can specify the font name using DOT_FONTNAME. You need to make
# sure dot is able to find the font, which can be done by putting it in a
# standard location or by setting the DOTFONTPATH environment variable or by
# setting DOT_FONTPATH to the directory containing the font.
# The default value is: Helvetica.
# This tag requires that the tag HAVE_DOT is set to YES.
DOT_FONTNAME = Helvetica
# The DOT_FONTSIZE tag can be used to set the size (in points) of the font of
# dot graphs.
# Minimum value: 4, maximum value: 24, default value: 10.
# This tag requires that the tag HAVE_DOT is set to YES.
DOT_FONTSIZE = 10
# By default doxygen will tell dot to use the default font as specified with
# DOT_FONTNAME. If you specify a different font using DOT_FONTNAME you can set
# the path where dot can find it using this tag.
@ -2395,6 +2450,18 @@ DOT_GRAPH_MAX_NODES = 50
MAX_DOT_GRAPH_DEPTH = 2
# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent
# background. This is disabled by default, because dot on Windows does not seem
# to support this out of the box.
#
# Warning: Depending on the platform used, enabling this option may lead to
# badly anti-aliased labels on the edges of a graph (i.e. they become hard to
# read).
# The default value is: NO.
# This tag requires that the tag HAVE_DOT is set to YES.
DOT_TRANSPARENT = NO
# Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple output
# files in one run (i.e. multiple -o and -T options on the command line). This
# makes dot run faster, but since only newer versions of dot (>1.8.10) support


@ -1,8 +1,3 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
@ -13,10 +8,6 @@ all: doc
doc: output
deprecation.md: ../deprecation.md
$(Q)sed -e 's/^# Deprecation/# Deprecation {#deprecation}/' \
< $< > $@
changelog.md: ../CHANGELOG.md
$(Q)sed -e 's/^# Changelog/# Changelog {#changelog}/' \
-e 's/^##/#/' \
@ -24,9 +15,9 @@ changelog.md: ../CHANGELOG.md
-e '/# v..\...:/s/\./-/2' \
< $< > $@
output: Doxyfile changelog.md deprecation.md $(wildcard *.md) $(wildcard ../include/spdk/*.h)
output: Doxyfile changelog.md $(wildcard *.md) $(wildcard ../include/spdk/*.h)
$(Q)rm -rf $@
$(Q)doxygen Doxyfile
clean:
$(Q)rm -rf output changelog.md deprecation.md
$(Q)rm -rf output changelog.md
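
As a minimal sketch of what the `deprecation.md` rule in this Makefile does, the `sed` expression can be run on its own. The function name `add_doxygen_anchor` and the sample input are illustrative assumptions; only the `sed` expression itself comes from the rule.

```bash
add_doxygen_anchor() {
    # Same expression as the Makefile rule: tag the top-level heading with a
    # Doxygen page anchor so other pages can reference it.
    sed -e 's/^# Deprecation/# Deprecation {#deprecation}/'
}

# Sample input (made up for illustration); only the first line should change.
printf '# Deprecation\n## Deprecation Notices\n' | add_doxygen_anchor
```

Because the pattern is anchored with `^# `, second-level headings such as `## Deprecation Notices` are left untouched.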


@ -1,9 +1,11 @@
# SPDK Documentation
SPDK Documentation
==================
The current version of the SPDK documentation can be found online at
http://www.spdk.io/doc/
## Building the Documentation
Building the Documentation
==========================
To convert the documentation into HTML run `make` in the `doc`
directory. The output will be located in `doc/output/html`. Before


@ -9,29 +9,30 @@ exists to enable use of the framework in environments without hardware
acceleration capabilities. ISA/L is used for optimized CRC32C calculation within
the software module.
## Acceleration Framework Functions {#accel_functions}
The framework includes an API for getting the current capabilities of the
selected module. See [`spdk_accel_get_capabilities`](https://spdk.io/doc/accel__engine_8h.html) for more details. For the software module, all capabilities are reported as supported. For the hardware modules, only functions accelerated by hardware are reported; any function can still be called, but it is backed by software if it is not reported as a supported capability.
# Acceleration Framework Functions {#accel_functions}
Functions implemented via the framework can be found in the DoxyGen documentation of the
framework public header file here [accel.h](https://spdk.io/doc/accel_8h.html)
framework public header file here [accel_engine.h](https://spdk.io/doc/accel__engine_8h.html)
## Acceleration Framework Design Considerations {#accel_dc}
# Acceleration Framework Design Considerations {#accel_dc}
The general interface is defined by `/include/spdk/accel.h` and implemented
The general interface is defined by `/include/accel_engine.h` and implemented
in `/lib/accel`. These functions may be called by an SPDK application and in
most cases, except where otherwise documented, are asynchronous and follow the
standard SPDK model for callbacks with a callback argument.
If the acceleration framework is started without initializing a hardware module,
optimized software implementations of the operations will back the public API. All
operations supported by the framework have a backing software implementation in
the event that no hardware accelerators have been enabled for that operation.
optimized software implementations of the functions will back the public API.
Additionally, if any hardware module does not support a specific function and that
hardware module is initialized, the specific function will fall back to a software
optimized implementation. For example, IOAT does not support the dualcast function
in hardware but if the IOAT module has been initialized and the public dualcast API
is called, it will actually be done via software behind the scenes.
When multiple hardware modules are enabled the framework will assign each operation to
a module based on the order in which it was initialized. So, for example if two modules are
enabled, IOAT and software, the software module will be used for every operation except those
supported by IOAT.
## Acceleration Low Level Libraries {#accel_libs}
# Acceleration Low Level Libraries {#accel_libs}
Low level libraries provide only the most basic functions that are specific to
the hardware. Low level libraries are located in the '/lib' directory with the
@ -45,21 +46,9 @@ functions exposed by the individual low level libraries. Thus, code written this
way needs to be certain that the underlying hardware exists everywhere that it runs.
The low level library for IOAT is located in `/lib/ioat`. The low level library
for DSA and IAA is in `/lib/idxd` (IDXD stands for Intel(R) Data Acceleration Driver and
supports both DSA and IAA hardware accelerators). In the `/lib/idxd` folder, SPDK supports the ability
to use either user space or kernel space drivers. The following describes each usage scenario:
for DSA is in `/lib/idxd` (IDXD stands for Intel(R) Data Acceleration Driver).
Leveraging user space idxd driver: The DSA devices are managed by the SPDK user space
driver in a dedicated SPDK process, so the device cannot be shared with another
process. The benefit of this usage is that there is no kernel dependency.
Leveraging kernel space driver: The DSA devices are managed by kernel
space drivers, and the work queues inside the DSA device can be shared among
different processes, which suits cloud native scenarios. The drawback of
this usage is the kernel dependency, i.e., the idxd kernel driver must be supported and loaded
in the kernel.
## Acceleration Plug-In Modules {#accel_modules}
# Acceleration Plug-In Modules {#accel_modules}
Plug-in modules depend on low level libraries to interact with the hardware and
add additional functionality such as queueing during busy conditions or flow
@ -68,123 +57,51 @@ the complete implementation of the acceleration component. A module must be
selected via startup RPC when the application is started. Otherwise, if no startup
RPC is provided, the framework is available and will use the software plug-in module.
### IOAT Module {#accel_ioat}
## IOAT Module {#accel_ioat}
To use the IOAT module, use the RPC [`ioat_scan_accel_module`](https://spdk.io/doc/jsonrpc.html) before starting the application.
To use the IOAT engine, use the RPC [`ioat_scan_accel_engine`](https://spdk.io/doc/jsonrpc.html) before starting the application.
### DSA Module {#accel_dsa}
## IDXD Module {#accel_idxd}
The DSA module supports the DSA hardware and relies on the low level IDXD library.
To use the DSA engine, use the RPC [`idxd_scan_accel_engine`](https://spdk.io/doc/jsonrpc.html) with an optional parameter of `-c` and provide a configuration number of either 0 or 1. These pre-defined configurations determine how the DSA engine will be set up in terms
of work queues and engines. The DSA engine is very flexible, allowing for various configurations of these elements to either account for different quality of service requirements or to isolate hardware paths where the back end media is of varying latency (e.g. persistent memory vs DRAM). The pre-defined configurations are as follows:
To use the DSA module, use the RPC
[`dsa_scan_accel_module`](https://spdk.io/doc/jsonrpc.html). By default, this
will attempt to load the SPDK user-space idxd driver. To use the built-in
kernel driver on Linux, add the `-k` parameter. See the next section for
details on using the kernel driver.
0: Four separate work queues each backed with one DSA engine. This is a generic
configuration that provides 4 portals to submit operations to each with a
single engine behind it providing some level of isolation as operations are
submitted round-robin.
The DSA hardware supports a limited queue depth and channels. This means that
only a limited number of `spdk_thread`s will be able to acquire a channel.
Design software to deal with the inability to get a channel.
1: Two separate work queues each backed with two DSA engines. This is another
generic configuration that provides 2 portals to submit operations to and
lets the DSA hardware decide which engine to select based on loading.
#### How to use kernel idxd driver {#accel_idxd_kernel}
There are several other configurations that are possible that include quality
of service parameters on the work queues that are not currently utilized by
the module. Specialized use of DSA may require different configurations that
can be added to the module as needed.
There are several dependencies to leverage the Linux idxd driver for driving DSA devices.
## Software Module {#accel_sw}
1 Linux kernel support: You need to have a Linux kernel with the `idxd` driver
loaded. Further, add the following command line options to the kernel boot
commands:
```bash
intel_iommu=on,sm_on
```
2 User library dependency: Users need to install the developer version of the
`accel-config` library. This is often packaged, but the source is available on
[GitHub](https://github.com/intel/idxd-config). After the library is installed,
users can use the `accel-config` command to configure the work queues(WQs) of
the idxd devices managed by the kernel with the following steps:
Note: this library must be installed before you run `configure`
```bash
accel-config disable-wq dsa0/wq0.1
accel-config disable-device dsa0
accel-config config-wq --group-id=0 --mode=dedicated --wq-size=128 --type=user --name="MyApp1"
--priority=10 --block-on-fault=1 dsa0/wq0.1
accel-config config-engine dsa0/engine0.0 --group-id=0
accel-config config-engine dsa0/engine0.1 --group-id=0
accel-config config-engine dsa0/engine0.2 --group-id=0
accel-config config-engine dsa0/engine0.3 --group-id=0
accel-config enable-device dsa0
accel-config enable-wq dsa0/wq0.1
```
DSA can be configured in many ways, but the above configuration is needed for use with SPDK.
Before you can run using the kernel driver you need to make sure that the hardware is bound
to the kernel driver and not VFIO. By default when you run `setup.sh` DSA devices will be
bound to VFIO. To exclude DSA devices, pass a whitespace separated list of DSA devices BDF
using the PCI_BLOCKED parameter as shown below.
```bash
sudo PCI_BLOCKED="0000:04:00.0 0000:05:00.0" ./setup.sh
```
Note: you might need to run `sudo ./setup.sh reset` to unbind all drivers before performing
the step above.
### Software Module {#accel_sw}
The software module is enabled by default. If no hardware module is explicitly
The software module is enabled by default. If no hardware engine is explicitly
enabled via startup RPC as discussed earlier, the software module will use ISA-L
if available for functions such as CRC32C. Otherwise, standard glibc calls are
used to back the framework API.
### dpdk_cryptodev {#accel_dpdk_cryptodev}
## Batching {#batching}
The dpdk_cryptodev module uses DPDK CryptoDev API to implement crypto operations.
The following ciphers and PMDs are supported:
Batching is exposed by the acceleration framework and provides an interface to
batch sets of commands up and then submit them with a single command. The public
API is consistent with the implementation however each plug-in module behaves
differently depending on its capabilities.
- AES-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC,
RTE_CRYPTO_CIPHER_AES128_XTS
(Note: QAT is functional however is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)
- MLX5 Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES256_XTS, RTE_CRYPTO_CIPHER_AES512_XTS
The DSA engine has complete support for batching all supported commands together
into one submission. This is advantageous as it reduces the overhead incurred in
the submission process to the hardware.
To enable this module, use [`dpdk_cryptodev_scan_accel_module`](https://spdk.io/doc/jsonrpc.html).
This RPC is available in the STARTUP state, so the SPDK application needs to be run with the `--wait-for-rpc`
CLI parameter. To select a specific PMD, use [`dpdk_cryptodev_set_driver`](https://spdk.io/doc/jsonrpc.html).
The software engine supports batching only to be consistent with the framework API.
In software there is no savings by batching sets of commands versus submitting them
individually.
### Module to Operation Code Assignment {#accel_assignments}
When multiple modules are initialized, the accel framework will assign op codes to
modules by first assigning all op codes to the Software Module and then overriding
op code assignments to Hardware Modules in the order in which they were initialized.
The RPC `accel_get_opc_assignments` can be used at any time to see the current
assignment map including the names of valid operations. The RPC `accel_assign_opc`
can be used after initializing the desired Hardware Modules but before starting the
framework in the event that a specific override is desired. Note that to start an
application and send startup RPCs, use the `--wait-for-rpc` parameter and then use the
`framework_start_init` RPC to continue. For example, assume the DSA Module is initialized
but for some reason the desire is to have the Software Module handle copies instead.
The following RPCs would accomplish the copy override:
```bash
./scripts/rpc.py dsa_scan_accel_module
./scripts/rpc.py accel_assign_opc -o copy -m software
./scripts/rpc.py framework_start_init
./scripts/rpc.py accel_get_opc_assignments
{
"copy": "software",
"fill": "dsa",
"dualcast": "dsa",
"compare": "dsa",
"crc32c": "dsa",
"copy_crc32c": "dsa",
"compress": "software",
"decompress": "software"
}
```
To determine the name of available modules and their supported operations use the
RPC `accel_get_module_info`.
The IOAT engine supports batching but it is only beneficial for `memmove` and `memfill`
as these are supported by the hardware. All other commands can be batched and the
framework will manage all other commands via software.
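
The op code assignment order described in the Module to Operation Code Assignment text (software first, then hardware modules in initialization order) can be modelled with a small shell sketch. This is an illustrative model only: `module_for_op` and the `module:op,op` argument format are hypothetical and are not part of SPDK or `rpc.py`. It also assumes that a later-initialized module overrides an earlier one for op codes both support.

```bash
# Hypothetical model of op code assignment; not an SPDK API.
# Usage: module_for_op <op> [<module>:<op>,<op>,...]...
# Hardware modules are listed in the order they were initialized.
module_for_op() {
    op=$1
    shift
    result=software            # every op code starts on the software module
    for spec in "$@"; do
        module=${spec%%:*}     # text before the first ':'
        case ",${spec#*:}," in
            *",$op,"*) result=$module ;;   # this module overrides the assignment
        esac
    done
    echo "$result"
}

module_for_op fill "dsa:copy,fill,dualcast,compare,crc32c,copy_crc32c"
module_for_op compress "dsa:copy,fill,dualcast,compare,crc32c,copy_crc32c"
```

On a running application the authoritative map is the one returned by the `accel_get_opc_assignments` RPC; the sketch is only meant to make the ordering rule concrete.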


@ -29,20 +29,20 @@ Param | Long Param | Type | Default | Descript
-------- | ---------------------- | -------- | ---------------------- | -----------
-c | --config | string | | @ref cmd_arg_config_file
-d | --limit-coredump | flag | false | @ref cmd_arg_limit_coredump
-e | --tpoint-group | integer | | @ref cmd_arg_limit_tpoint_group_mask
-e | --tpoint-group-mask | integer | 0x0 | @ref cmd_arg_limit_tpoint_group_mask
-g | --single-file-segments | flag | | @ref cmd_arg_single_file_segments
-h | --help | flag | | show all available parameters and exit
-i | --shm-id | integer | | @ref cmd_arg_multi_process
-m | --cpumask | CPU mask | 0x1 | application @ref cpu_mask
-n | --mem-channels | integer | all channels | number of memory channels used for DPDK
-p | --main-core | integer | first core in CPU mask | main (primary) core for DPDK
-p | --master-core | integer | first core in CPU mask | master (primary) core for DPDK
-r | --rpc-socket | string | /var/tmp/spdk.sock | RPC listen address
-s | --mem-size | integer | all hugepage memory | @ref cmd_arg_memory_size
| | --silence-noticelog | flag | | disable notice level logging to `stderr`
-u | --no-pci | flag | | @ref cmd_arg_disable_pci_access.
| | --wait-for-rpc | flag | | @ref cmd_arg_deferred_initialization
-B | --pci-blocked | B:D:F | | @ref cmd_arg_pci_blocked_allowed.
-A | --pci-allowed | B:D:F | | @ref cmd_arg_pci_blocked_allowed.
-B | --pci-blacklist | B:D:F | | @ref cmd_arg_pci_blacklist_whitelist.
-W | --pci-whitelist | B:D:F | | @ref cmd_arg_pci_blacklist_whitelist.
-R | --huge-unlink | flag | | @ref cmd_arg_huge_unlink
| | --huge-dir | string | the first discovered | allocate hugepages from a specific mount
-L | --logflag | string | | @ref cmd_arg_log_flags
@ -61,7 +61,7 @@ to RLIM_INFINITY. Specifying `--limit-coredump` will not set the resource limit
SPDK has an experimental low overhead tracing framework. Tracepoints in this
framework are organized into tracepoint groups. By default, all tracepoint
groups are disabled. `--tpoint-group` can be used to enable a specific
groups are disabled. `--tpoint-group-mask` can be used to enable a specific
subset of tracepoint groups in the application.
Note: Additional documentation on the tracepoint framework is in progress.
@ -121,12 +121,12 @@ If SPDK is run with PCI access disabled it won't detect any PCI devices. This
includes primarily NVMe and IOAT devices. Also, the VFIO and UIO kernel modules
are not required in this mode.
### PCI address blocked and allowed lists {#cmd_arg_pci_blocked_allowed}
### PCI address blacklist and whitelist {#cmd_arg_pci_blacklist_whitelist}
If a blocked list is used, then all devices with the provided PCI address will be
ignored. If an allowed list is used, only allowed devices will be probed.
`-B` or `-A` can be used more than once, but cannot be mixed together. That is,
`-B` and `-A` cannot be used at the same time.
If a blacklist is used, then all devices with the provided PCI address will be
ignored. If a whitelist is used, only whitelisted devices will be probed.
`-B` or `-W` can be used more than once, but cannot be mixed together. That is,
`-B` and `-W` cannot be used at the same time.
### Unlink hugepage files after initialization {#cmd_arg_huge_unlink}
@ -151,7 +151,7 @@ Whenever the `CPU mask` is mentioned it is a string in one of the following form
The following CPU masks are equal and correspond to CPUs 0, 1, 2, 8, 9, 10, 11 and 12:
~~~bash
~~~
0x1f07
0x1F07
1f07


@ -1,11 +1,10 @@
# Block Device User Guide {#bdev}
## Target Audience {#bdev_ug_targetaudience}
# Target Audience {#bdev_ug_targetaudience}
This user guide is intended for software developers who have knowledge of block storage, storage drivers, issuing JSON-RPC
commands and storage services such as RAID, compression, crypto, and others.
This user guide is intended for software developers who have knowledge of block storage, storage drivers, issuing JSON-RPC commands and storage services such as RAID, compression, crypto, and others.
## Introduction {#bdev_ug_introduction}
# Introduction {#bdev_ug_introduction}
The SPDK block device layer, often simply called *bdev*, is a C library
intended to be equivalent to the operating system block storage layer that
@ -27,7 +26,7 @@ device underneath (please refer to @ref bdev_module for details). SPDK
also provides vbdev modules, which create block devices on top of existing bdevs. For
example, @ref bdev_ug_logical_volumes or @ref bdev_ug_gpt.
## Prerequisites {#bdev_ug_prerequisites}
# Prerequisites {#bdev_ug_prerequisites}
This guide assumes that you can already build the standard SPDK distribution
on your platform. The block device layer is a C library with a single public
@ -40,50 +39,25 @@ directly from SPDK application by running `scripts/rpc.py rpc_get_methods`.
Detailed help for each command can be displayed by adding `-h` flag as a
command parameter.
## Configuring Block Device Modules {#bdev_ug_general_rpcs}
# Configuring Block Device Modules {#bdev_ug_general_rpcs}
Block devices can be configured using JSON RPCs. A complete list of available RPC commands
with detailed information can be found on the @ref jsonrpc_components_bdev page.
## Common Block Device Configuration Examples
# Common Block Device Configuration Examples
## Ceph RBD {#bdev_config_rbd}
# Ceph RBD {#bdev_config_rbd}
The SPDK RBD bdev driver provides SPDK block layer access to Ceph RADOS block
devices (RBD). Ceph RBD devices are accessed via librbd and librados libraries
to access the RADOS block device exported by Ceph. To create Ceph bdev RPC
command `bdev_rbd_register_cluster` and `bdev_rbd_create` should be used.
SPDK provides two ways of creating a RBD bdev. One is to create a new Rados cluster object
for each RBD bdev. Another is to share the same Rados cluster object for multiple RBD bdevs.
Each Rados cluster object creates a small number of io_context_pool and messenger threads.
Ceph commands `ceph config help librados_thread_count` and `ceph config help ms_async_op_threads`
show information about these threads. You can also set the number of threads by
updating the ceph.conf file or using Ceph config commands. For more information, please refer to
[Ceph configuration](https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/).
One set of threads may not be enough to maximize performance with a large number of RBD bdevs,
but one set of threads per RBD bdev may add too much context switching. Therefore, performance
tuning on the number of RBD bdevs per cluster object and thread may be required.
command `bdev_rbd_create` should be used.
Example command
`rpc.py bdev_rbd_register_cluster rbd_cluster`
This command will register a cluster named rbd_cluster. Optional `--config-file` and
`--key-file` params are specified for the cluster.
To remove a registered cluster use the bdev_rbd_unregister_cluster command.
`rpc.py bdev_rbd_unregister_cluster rbd_cluster`
To create RBD bdev with a registered cluster.
`rpc.py bdev_rbd_create rbd foo 512 -c rbd_cluster`
`rpc.py bdev_rbd_create rbd foo 512`
This command will create a bdev that represents the 'foo' image from a pool called 'rbd'.
When `-c` is specified for `bdev_rbd_create`, RBD bdevs share the same rados cluster with
one connection to Ceph in the librbd module. Without `-c`, a new rados cluster with its own
cluster connection is created for every bdev.
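
A minimal sketch of the shared-cluster workflow, assuming several images live in the same pool. It only prints the `rpc.py` commands instead of running them; the helper name `rbd_shared_cluster_cmds` and the image name `bar` are made up for illustration, while the RPC names, the block size of 512 and the `-c` option are taken from the examples in this section.

```bash
# Hypothetical dry-run helper: print the rpc.py calls that register one
# cluster and create one RBD bdev per image on top of it.
# Usage: rbd_shared_cluster_cmds <cluster> <pool> <image>...
rbd_shared_cluster_cmds() {
    cluster=$1
    pool=$2
    shift 2
    echo "rpc.py bdev_rbd_register_cluster $cluster"
    for image in "$@"; do
        echo "rpc.py bdev_rbd_create $pool $image 512 -c $cluster"
    done
}

rbd_shared_cluster_cmds rbd_cluster rbd foo bar
```

Printing the commands first makes it easy to review how many bdevs end up on each cluster object before tuning the ratio.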
To remove a block device representation use the bdev_rbd_delete command.
@ -95,7 +69,7 @@ To resize a bdev use the bdev_rbd_resize command.
This command will resize the Rbd0 bdev to 4096 MiB.
## Compression Virtual Bdev Module {#bdev_config_compress}
# Compression Virtual Bdev Module {#bdev_config_compress}
The compression bdev module can be configured to provide compression/decompression
services for an underlying thinly provisioned logical volume. Although the underlying
@ -104,10 +78,10 @@ unless the data stored on disk is placed appropriately. The compression vbdev mo
relies on an internal SPDK library called `reduce` to accomplish this, see @ref reduce
for detailed information.
The compression bdev module leverages the [Acceleration Framework](https://spdk.io/doc/accel_fw.html) to
carry out the actual compression and decompression. The acceleration framework can be configured to use
ISA-L software optimized compression or the DPDK Compressdev module for hardware acceleration. To configure
the Compressdev module please see the `compressdev_scan_accel_module` documentation [here](https://spdk.io/doc/jsonrpc.html)
The vbdev module relies on the DPDK CompressDev Framework to provide all compression
functionality. The framework provides support for many different software only
compression modules as well as hardware assisted support for Intel QAT. At this
time the vbdev module supports the DPDK drivers for ISAL and QAT.
Persistent memory is used to store metadata associated with the layout of the data on the
backing device. SPDK relies on [PMDK](http://pmem.io/pmdk/) to interface persistent memory so any hardware
@ -130,6 +104,18 @@ created it cannot be separated from the persistent memory file that will be crea
the specified directory. If the persistent memory file is not available, the compression
vbdev will also not be available.
By default the vbdev module will choose the QAT driver if the hardware and drivers are
available and loaded. If not, it will fall back to the software-only ISAL driver. The driver
may be specified with the following command; however, the setting is not persistent, so it
must be applied either upon creation or before the underlying logical volume is loaded to
be honored. A value of `0` tells the vbdev module to use QAT if available and otherwise
use ISAL; this is the default, so the command is not required if that behavior is sufficient.
A value of `1` tells the module to use QAT, and if it is not available the vbdev should
fail to create or load. A value of `2`, as shown below, tells the module to use ISAL,
and if for some reason it is not available the vbdev should fail to create or load.
`rpc.py compress_set_pmd -p 2`
To remove a compression vbdev, use the following command which will also delete the PMEM
file. If the logical volume is deleted the PMEM file will not be removed and the
compression vbdev will not be available.
@ -142,100 +128,44 @@ all volumes, if used it will return the name or an error that the device does no
`rpc.py bdev_compress_get_orphans --name COMP_Nvme0n1`
## Crypto Virtual Bdev Module {#bdev_config_crypto}
# Crypto Virtual Bdev Module {#bdev_config_crypto}
The crypto virtual bdev module can be configured to provide at rest data encryption
for any underlying bdev. The module relies on the SPDK Accel Framework to provide
all cryptographic functionality.
One of the accel modules, dpdk_cryptodev, is implemented with the DPDK CryptoDev API.
It provides support for many different software-only cryptographic modules as well as hardware
assisted support for the Intel QAT board and NVIDIA crypto enabled NICs.
For reads, the buffer provided to the crypto block device will be used as the destination buffer
for unencrypted data. For writes, however, a temporary scratch buffer is used as the
destination buffer for encryption which is then passed on to the underlying bdev as the
write buffer. This is done to avoid encrypting the data in the original source buffer which
may cause problems in some use cases.
Below is information about accel modules which support crypto operations:
### dpdk_cryptodev accel module
Supports the following ciphers:
for any underlying bdev. The module relies on the DPDK CryptoDev Framework to provide
all cryptographic functionality. The framework provides support for many different software
only cryptographic modules as well as hardware assisted support for the Intel QAT board. The
framework also provides support for cipher, hash, authentication and AEAD functions. At this
time the SPDK virtual bdev module supports cipher only as follows:
- AESN-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC,
RTE_CRYPTO_CIPHER_AES128_XTS
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
(Note: QAT is functional however is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)
- MLX5 Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES256_XTS, RTE_CRYPTO_CIPHER_AES512_XTS
In order to support using the bdev block offset (LBA) as the initialization vector (IV),
the crypto module breaks up all I/O into crypto operations of a size equal to the block
size of the underlying bdev. For example, a 4K I/O to a bdev with a 512B block size
would result in 8 cryptographic operations.
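The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustrative model, not SPDK code: the function name `split_into_crypto_ops` and the starting LBA of 100 are hypothetical, and the sketch assumes the I/O is block-aligned.

```python
def split_into_crypto_ops(start_lba, io_size_bytes, block_size_bytes):
    """Return one (lba, length) pair per block covered by the I/O."""
    if io_size_bytes % block_size_bytes != 0:
        raise ValueError("I/O size must be a multiple of the block size")
    num_blocks = io_size_bytes // block_size_bytes
    # Each operation covers exactly one block, so its LBA can serve as the IV.
    return [(start_lba + i, block_size_bytes) for i in range(num_blocks)]


# A 4K I/O to a bdev with a 512B block size, starting at a hypothetical LBA 100.
ops = split_into_crypto_ops(start_lba=100, io_size_bytes=4096, block_size_bytes=512)
print(len(ops))
```

Each tuple pairs one block-sized operation with the LBA that would act as its IV, which is why the operation count scales with the I/O size divided by the block size.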
### SW accel module
For reads, the buffer provided to the crypto module will be used as the destination buffer
for unencrypted data. For writes, however, a temporary scratch buffer is used as the
destination buffer for encryption which is then passed on to the underlying bdev as the
write buffer. This is done to avoid encrypting the data in the original source buffer which
may cause problems in some use cases.
Supports the following ciphers:
Example command
- AES_XTS cipher with 128 or 256 bit keys implemented with ISA-L_crypto
`rpc.py bdev_crypto_create NVMe1n1 CryNvmeA crypto_aesni_mb 0123456789123456`
### General workflow
- Set the desired accel module to perform crypto operations using the `accel_assign_opc` RPC command.
- Create a named crypto key using the `accel_crypto_key_create` RPC command. The key will use the assigned accel
module. The set of parameters and supported ciphers may differ between accel modules.
- Create a virtual crypto block device, providing the base block device name and the crypto key name,
using the `bdev_crypto_create` RPC command.
#### Example
Example command which uses dpdk_cryptodev accel module
```
# start SPDK application with `--wait-for-rpc` parameter
rpc.py dpdk_cryptodev_scan_accel_module
rpc.py dpdk_cryptodev_set_driver crypto_aesni_mb
rpc.py accel_assign_opc -o encrypt -m dpdk_cryptodev
rpc.py accel_assign_opc -o decrypt -m dpdk_cryptodev
rpc.py framework_start_init
rpc.py accel_crypto_key_create -c AES_CBC -k 01234567891234560123456789123456 -n key_aesni_cbc_1
rpc.py bdev_crypto_create NVMe1n1 CryNvmeA -n key_aesni_cbc_1
```
These commands will create a crypto vbdev called 'CryNvmeA' on top of the NVMe bdev
'NVMe1n1' and will use a key named `key_aesni_cbc_1`. The key will work with the accel module which
has been assigned for encrypt operations, in this example it will be the dpdk_cryptodev.
### Crypto key format
Please make sure the keys are provided in hexlified format. This means the string passed to
rpc.py must be twice as long as the key length in binary form.
#### Example command
`rpc.py accel_crypto_key_create -c AES_XTS -k2 7859243a027411e581e0c40a35c8228f -k d16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b09 -n sample_key`
This command will create a key called `sample_key` with the AES key
'd16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b09' and the XTS key
'7859243a027411e581e0c40a35c8228f'. In other words, the compound AES_XTS key to be used is
'd16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b097859243a027411e581e0c40a35c8228f'
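As a rough check of the hexlified format, the following Python sketch decodes the two example keys, confirms that each hex string is twice as long as its binary form, and rebuilds the compound key. It uses only the Python standard library and does not call any SPDK API; the variable names are illustrative.

```python
# Example keys copied from the command above.
aes_key_hex = "d16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b09"
xts_key_hex = "7859243a027411e581e0c40a35c8228f"

# Decode the hexlified strings into their binary form.
aes_key = bytes.fromhex(aes_key_hex)
xts_key = bytes.fromhex(xts_key_hex)

# A hexlified string is twice as long as the binary key it encodes.
assert len(aes_key_hex) == 2 * len(aes_key)
assert len(xts_key_hex) == 2 * len(xts_key)

# The compound AES_XTS key is the AES key followed by the XTS key.
compound_hex = (aes_key + xts_key).hex()
print(len(aes_key), len(xts_key))
print(compound_hex)
```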
### Delete the virtual crypto block device
This command will create a crypto vbdev called 'CryNvmeA' on top of the NVMe bdev
'NVMe1n1' and will use the DPDK software driver 'crypto_aesni_mb' and the key
'0123456789123456'.
To remove the vbdev use the bdev_crypto_delete command.
`rpc.py bdev_crypto_delete CryNvmeA`
### dpdk_cryptodev mlx5_pci driver configuration
The mlx5_pci driver works with crypto enabled Nvidia NICs and requires special configuration of
DPDK environment to enable crypto function. It can be done via SPDK event library by configuring
`env_context` member of `spdk_app_opts` structure or by passing corresponding CLI arguments in
the following form: `--allow=BDF,class=crypto,wcs_file=/full/path/to/wrapped/credentials`, e.g.
`--allow=0000:01:00.0,class=crypto,wcs_file=/path/credentials.txt`.
## Delay Bdev Module {#bdev_config_delay}
# Delay Bdev Module {#bdev_config_delay}
The delay vbdev module is intended to apply a predetermined additional latency on top of a lower
level bdev. This enables the simulation of the latency characteristics of a device during the functional
@ -266,15 +196,15 @@ Example command:
`rpc.py bdev_delay_delete delay0`
## GPT (GUID Partition Table) {#bdev_config_gpt}
# GPT (GUID Partition Table) {#bdev_config_gpt}
The GPT virtual bdev driver is enabled by default and does not require any configuration.
It will automatically detect @ref bdev_ug_gpt on any attached bdev and will create
possibly multiple virtual bdevs.
### SPDK GPT partition table {#bdev_ug_gpt}
## SPDK GPT partition table {#bdev_ug_gpt}
The SPDK partition type GUID is `6527994e-2c5a-4eec-9613-8f5944074e8b`. Existing SPDK bdevs
The SPDK partition type GUID is `7c5222bd-8f5d-4087-9c00-bf9843c7b58c`. Existing SPDK bdevs
can be exposed as Linux block devices via NBD and then can be partitioned with
standard partitioning tools. After partitioning, the bdevs will need to be deleted and
attached again for the GPT bdev module to see any changes. NBD kernel module must be
@ -298,9 +228,9 @@ Example command
`rpc.py nbd_stop_disk -n /dev/nbd0`
### Creating a GPT partition table using NBD {#bdev_ug_gpt_create_part}
## Creating a GPT partition table using NBD {#bdev_ug_gpt_create_part}
~~~bash
~~~
# Expose bdev Nvme0n1 as kernel block device /dev/nbd0 by JSON-RPC
rpc.py nbd_start_disk Nvme0n1 /dev/nbd0
@ -312,7 +242,7 @@ parted -s /dev/nbd0 mkpart MyPartition '0%' '50%'
# Change the partition type to the SPDK GUID.
# sgdisk is part of the gdisk package.
sgdisk -t 1:6527994e-2c5a-4eec-9613-8f5944074e8b /dev/nbd0
sgdisk -t 1:7c5222bd-8f5d-4087-9c00-bf9843c7b58c /dev/nbd0
# Stop the NBD device (stop exporting /dev/nbd0).
rpc.py nbd_stop_disk /dev/nbd0
@ -322,7 +252,7 @@ rpc.py nbd_stop_disk /dev/nbd0
# Nvme0n1p1 in SPDK applications.
~~~
## iSCSI bdev {#bdev_config_iscsi}
# iSCSI bdev {#bdev_config_iscsi}
The SPDK iSCSI bdev driver depends on libiscsi and hence is not enabled by default.
In order to use it, build SPDK with an extra `--with-iscsi-initiator` configure option.
@ -335,7 +265,7 @@ with `iqn.2016-06.io.spdk:init` as the reported initiator IQN.
The URL is in the following format:
`iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn>/<lun>`
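To make the optional parts of the URL format concrete, here is a small Python sketch that assembles such a URL from its components. The helper name `build_iscsi_url` and the host, IQN, and credential values are hypothetical and only illustrate the format; the sketch does not validate IQNs or escape special characters.

```python
def build_iscsi_url(host, target_iqn, lun, port=None, username=None, password=None):
    """Assemble iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn>/<lun>."""
    if password is not None and username is None:
        raise ValueError("a password requires a username")
    auth = ""
    if username is not None:
        auth = username
        if password is not None:
            auth += "%" + password
        auth += "@"
    endpoint = host if port is None else f"{host}:{port}"
    return f"iscsi://{auth}{endpoint}/{target_iqn}/{lun}"


# Illustrative values only.
plain_url = build_iscsi_url("127.0.0.1", "iqn.2016-06.io.spdk:disk1", 0)
auth_url = build_iscsi_url("127.0.0.1", "iqn.2016-06.io.spdk:disk1", 0,
                           port=3260, username="user", password="secret")
print(plain_url)
print(auth_url)
```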
## Linux AIO bdev {#bdev_config_aio}
# Linux AIO bdev {#bdev_config_aio}
The SPDK AIO bdev driver provides SPDK block layer access to Linux kernel block
devices or a file on a Linux filesystem via Linux AIO. Note that O_DIRECT is
@ -358,7 +288,7 @@ To delete an aio bdev use the bdev_aio_delete command.
`rpc.py bdev_aio_delete aio0`
## OCF Virtual bdev {#bdev_config_cas}
# OCF Virtual bdev {#bdev_config_cas}
OCF virtual bdev module is based on [Open CAS Framework](https://github.com/Open-CAS/ocf) - a
high performance block storage caching meta-library.
@ -382,10 +312,12 @@ To remove `Cache1`:
During removal OCF-cache will be stopped and all cached data will be written to the core device.
Note that OCF has a per-device RAM requirement. More details can be found in the
[OCF documentation](https://open-cas.github.io/guide_system_requirements.html).
Note that OCF has a per-device RAM requirement
of about 56000 + _cache device size_ * 58 / _cache line size_ (in bytes).
To get more information on OCF
please visit [OCF documentation](https://open-cas.github.io/).
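The approximate RAM formula quoted above can be evaluated with a short Python sketch. It assumes that both the cache device size and the cache line size are given in bytes; the function name and the 1 GiB / 4 KiB inputs are illustrative, and the result is only as accurate as the "about" in the formula.

```python
def ocf_ram_requirement_bytes(cache_device_size, cache_line_size):
    """Estimate per-device RAM as 56000 + cache device size * 58 / cache line size."""
    # Assumption: both arguments are expressed in bytes.
    return 56000 + cache_device_size * 58 / cache_line_size


# Hypothetical 1 GiB cache device with 4 KiB cache lines.
estimate = ocf_ram_requirement_bytes(2**30, 4096)
print(f"{estimate / 2**20:.1f} MiB")
```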
## Malloc bdev {#bdev_config_malloc}
# Malloc bdev {#bdev_config_malloc}
Malloc bdevs are ramdisks. Because of their nature they are volatile. They are created from hugepage memory given to the SPDK
application.
@ -398,7 +330,7 @@ Example command for removing malloc bdev:
`rpc.py bdev_malloc_delete Malloc0`
## Null {#bdev_config_null}
# Null {#bdev_config_null}
The SPDK null bdev driver is a dummy block I/O target that discards all writes and returns undefined
data for reads. It is useful for benchmarking the rest of the bdev I/O stack with minimal block
@ -415,7 +347,7 @@ To delete a null bdev use the bdev_null_delete command.
`rpc.py bdev_null_delete Null0`
## NVMe bdev {#bdev_config_nvme}
# NVMe bdev {#bdev_config_nvme}
There are two ways to create a block device based on an NVMe device in SPDK. The first
is to connect a local PCIe drive and the second is to connect an NVMe-oF device.
@ -437,41 +369,31 @@ To remove an NVMe controller use the bdev_nvme_detach_controller command.
This command will remove NVMe bdev named Nvme0.
The SPDK NVMe bdev driver provides the multipath feature. Please refer to
@ref nvme_multipath for details.
## NVMe bdev character device {#bdev_config_nvme_cuse}
### NVMe bdev character device {#bdev_config_nvme_cuse}
This feature is considered as experimental. You must configure with --with-nvme-cuse
option to enable this RPC.
This feature is considered as experimental.
Example commands
`rpc.py bdev_nvme_cuse_register -n Nvme3`
`rpc.py bdev_nvme_cuse_register -n Nvme0 -p spdk/nvme0`
This command will register a character device under /dev/spdk associated with Nvme3
controller. If there are namespaces created on Nvme3 controller, a namespace
character device is also created for each namespace.
For example, the first controller registered will have a character device path of
/dev/spdk/nvmeX, where X is replaced with a unique integer to differentiate it from
other controllers. Note that this 'nvmeX' name here has no correlation to the name
associated with the controller in SPDK. Namespace character devices will have a path
of /dev/spdk/nvmeXnY, where Y is the namespace ID.
This command will register /dev/spdk/nvme0 character device associated with Nvme0
controller. If there are namespaces created on Nvme0 controller, for each namespace
device /dev/spdk/nvme0nX is created.
CUSE devices are removed from the system when the NVMe controller is detached or unregistered
with the command:
`rpc.py bdev_nvme_cuse_unregister -n Nvme0`
## Logical volumes {#bdev_ug_logical_volumes}
# Logical volumes {#bdev_ug_logical_volumes}
The Logical Volumes library is a flexible storage space management system. It allows
creating and managing virtual block devices with variable size on top of other bdevs.
The SPDK Logical Volume library is built on top of @ref blob. For detailed description
please refer to @ref lvol.
### Logical volume store {#bdev_ug_lvol_store}
## Logical volume store {#bdev_ug_lvol_store}
Before creating any logical volumes (lvols), an lvol store has to be created first on
selected block device. Lvol store is lvols vessel responsible for managing underlying
@ -510,7 +432,7 @@ Example commands
`rpc.py bdev_lvol_delete_lvstore -l lvs`
### Lvols {#bdev_ug_lvols}
## Lvols {#bdev_ug_lvols}
To create lvols on an existing lvol store, use the `bdev_lvol_create` RPC command.
Each created lvol will be represented by a new bdev.
@ -521,7 +443,7 @@ Example commands
`rpc.py bdev_lvol_create lvol2 25 -u 330a6ab2-f468-11e7-983e-001e67edf35d`
## Passthru {#bdev_config_passthru}
# Passthru {#bdev_config_passthru}
The SPDK Passthru virtual block device module serves as an example of how to write a
virtual block device module. It implements the required functionality of a vbdev module
@ -533,14 +455,50 @@ Example commands
`rpc.py bdev_passthru_delete pt`
## RAID {#bdev_ug_raid}
# Pmem {#bdev_config_pmem}
The SPDK pmem bdev driver uses pmemblk pool as the target for block I/O operations. For
details on Pmem memory please refer to PMDK documentation on http://pmem.io website.
First, user needs to configure SPDK to include PMDK support:
`configure --with-pmdk`
To create pmemblk pool for use with SPDK user should use `bdev_pmem_create_pool` RPC command.
Example command
`rpc.py bdev_pmem_create_pool /path/to/pmem_pool 25 4096`
To get information on created pmem pool file user can use `bdev_pmem_get_pool_info` RPC command.
Example command
`rpc.py bdev_pmem_get_pool_info /path/to/pmem_pool`
To remove pmem pool file user can use `bdev_pmem_delete_pool` RPC command.
Example command
`rpc.py bdev_pmem_delete_pool /path/to/pmem_pool`
To create a bdev based on a pmemblk pool file, use the `bdev_pmem_create` RPC
command.
Example command
`rpc.py bdev_pmem_create /path/to/pmem_pool -n pmem`
To remove a block device representation use the bdev_pmem_delete command.
`rpc.py bdev_pmem_delete pmem`
# RAID {#bdev_ug_raid}
RAID virtual bdev module provides functionality to combine any SPDK bdevs into
one RAID bdev. Currently SPDK supports only RAID 0. RAID metadata may be stored
on member disks if enabled when creating the RAID bdev, so user does not have to
recreate the RAID volume when restarting application. It is not enabled by
default for backward compatibility. User may specify member disks to create
RAID volume even if they do not exist yet - as the member disks are registered at
one RAID bdev. Currently SPDK supports only RAID 0. RAID functionality does not
store on-disk metadata on the member disks, so user must recreate the RAID
volume when restarting application. User may specify member disks to create RAID
volume even if they do not exist yet - as the member disks are registered at
a later time, the RAID module will claim them and will surface the RAID volume
after all of the member disks are available. It is allowed to use disks of
different sizes - the smallest disk size will be the amount of space used on
@ -554,7 +512,7 @@ Example commands
`rpc.py bdev_raid_delete Raid0`
## Split {#bdev_ug_split}
# Split {#bdev_ug_split}
The split block device module takes an underlying block device and splits it into
several smaller equal-sized virtual block devices. This serves as an example to create
@ -576,7 +534,7 @@ To remove the split bdevs, use the `bdev_split_delete` command with th
`rpc.py bdev_split_delete bdev_b0`
## Uring {#bdev_ug_uring}
# Uring {#bdev_ug_uring}
The uring bdev module issues I/O to kernel block devices using the io_uring Linux kernel API. This module requires liburing.
For more information on io_uring refer to kernel [IO_uring](https://kernel.dk/io_uring.pdf)
@ -585,10 +543,6 @@ The user needs to configure SPDK to include io_uring support:
`configure --with-uring`
Support for zoned devices is enabled by default in uring bdev. It can be explicitly disabled as follows:
`configure --with-uring --without-uring-zns`
To create a uring bdev with given filename, bdev name and block size use the `bdev_uring_create` RPC.
`rpc.py bdev_uring_create /path/to/device bdev_u0 512`
@ -597,27 +551,7 @@ To remove a uring bdev use the `bdev_uring_delete` RPC.
`rpc.py bdev_uring_delete bdev_u0`
## xnvme {#bdev_ug_xnvme}
The xnvme bdev module issues I/O to the underlying NVMe devices through various I/O mechanisms
such as libaio, io_uring, Asynchronous IOCTL using io_uring passthrough, POSIX aio, emulated aio etc.
This module requires xNVMe library.
For more information on xNVMe refer to [xNVMe](https://xnvme.io/docs/latest)
The user needs to configure SPDK to include xNVMe support:
`configure --with-xnvme`
To create a xnvme bdev with given filename, bdev name and I/O mechanism use the `bdev_xnvme_create` RPC.
`rpc.py bdev_xnvme_create /dev/ng0n1 bdev_ng0n1 io_uring_cmd`
To remove a xnvme bdev use the `bdev_xnvme_delete` RPC.
`rpc.py bdev_xnvme_delete bdev_ng0n1`
## Virtio Block {#bdev_config_virtio_blk}
# Virtio Block {#bdev_config_virtio_blk}
The Virtio-Block driver allows creating SPDK bdevs from Virtio-Block devices.
@ -638,7 +572,7 @@ Virtio-Block devices can be removed with the following command
`rpc.py bdev_virtio_detach_controller VirtioBlk0`
## Virtio SCSI {#bdev_config_virtio_scsi}
# Virtio SCSI {#bdev_config_virtio_scsi}
The Virtio-SCSI driver allows creating SPDK block devices from Virtio-SCSI LUNs.
@ -656,30 +590,3 @@ Virtio-SCSI devices can be removed with the following command
`rpc.py bdev_virtio_detach_controller VirtioScsi0`
Removing a Virtio-SCSI device will destroy all its bdevs.
## DAOS bdev {#bdev_config_daos}
DAOS bdev creates SPDK block device on top of DAOS DFS, the name of the bdev defines the file name in DFS namespace.
Note that DAOS container has to be POSIX type, e.g.: ` daos cont create --pool=test-pool --label=test-cont --type=POSIX`
To build SPDK with daos support, daos-devel package has to be installed, please see the setup [guide](https://docs.daos.io/v2.0/).
To enable the module, configure SPDK using `--with-daos` flag.
Running `daos_agent` service on the target machine is required for the SPDK DAOS bdev communication with a DAOS cluster.
The implementation uses the independent pool and container connections per device's channel for the best IO throughput, therefore,
running a target application with multiple cores (`-m [0-7]`, for example) is highly advisable.
Example command for creating daos bdev:
`rpc.py bdev_daos_create daosdev0 test-pool test-cont 64 4096`
Example command for removing daos bdev:
`rpc.py bdev_daos_delete daosdev0`
To resize a bdev use the bdev_daos_resize command.
`rpc.py bdev_daos_resize daosdev0 8192`
This command will resize the daosdev0 bdev to 8192 MiB.


@ -151,46 +151,13 @@ bdev module. Refer to test/external_code/README.md and @ref so_linking for furth
Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take the one additional step of claiming the bdev.
The module can open the underlying bdevs it wishes to route I/O to using
spdk_bdev_open_ext(), where the string name is provided by the user via an RPC.
To ensure that other consumers do not modify the underlying bdev in an unexpected
way, the virtual bdev should take a claim on the underlying bdev before
reading from or writing to the underlying bdev.
There are two slightly different APIs for taking and releasing claims. The
preferred interface uses `spdk_bdev_module_claim_bdev_desc()`. This method allows
claims that ensure there is a single writer with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE`, cooperating shared writers with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`, and shared readers that prevent any
writers with `SPDK_BDEV_CLAIM_READ_MANY_WRITE_NONE`. In all cases,
`spdk_bdev_open_ext()` may be used to open the underlying bdev read-only. If a
read-only bdev descriptor successfully claims a bdev with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE` or `SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`
the bdev descriptor is promoted to read-write.
Any claim that is obtained with `spdk_bdev_module_claim_bdev_desc()` is
automatically released upon closing the bdev descriptor used to obtain the
claim. Shared claims continue to block new incompatible claims and new writers
until the last claim is released.
The non-preferred interface for obtaining a claim allows the caller to obtain
an exclusive writer claim with `spdk_bdev_module_claim_bdev()`. It may
be released with `spdk_bdev_module_release_bdev()`. If a read-only bdev
descriptor is passed, it is promoted to read-write. NULL may be passed instead
of a bdev descriptor to avoid promotion and to block new writers. New code
should use `spdk_bdev_module_claim_bdev_desc()` with the claim type that is
tailored to the virtual bdev's needs.
The descriptor obtained from the successful spdk_bdev_open_ext() may be used
with spdk_bdev_get_io_channel() to obtain I/O channels for the bdev. This is
likely done in response to the virtual bdev's `get_io_channel` callback.
Channels may be obtained before and/or after claiming the underlying bdev, but
beware there may be other unknown writers until the underlying bdev has been
claimed.
When a virtual bdev module claims an underlying bdev from its `examine_config`
callback, it causes the `examine_disk` callback to only be called for this
module and any others that establish a shared claim. If no claims are taken by
`examine_config` callbacks, all virtual bdevs' `examine_disk` callbacks are
called.
bdevs, but take one additional step. The module can look up the underlying
bdevs it wishes to route I/O to using spdk_bdev_get_by_name(), where the string
name is provided by the user via an RPC. The module
then may proceed as normal by opening the bdev to obtain a descriptor, and
creating I/O channels for the bdev (probably in response to the
`get_io_channel` callback). The final step is to have the module use its open
descriptor to call spdk_bdev_module_claim_bdev(), indicating that it is
consuming the underlying bdev. This prevents other users from opening
descriptors with write permissions. This effectively 'promotes' the descriptor
to write-exclusive and is an operation only available to bdev modules.


@ -63,7 +63,7 @@ to tear down the bdev library, call spdk_bdev_finish().
All block devices have a simple string name. At any time, a pointer to the
device object can be obtained by calling spdk_bdev_get_by_name(), or the entire
set of bdevs may be iterated using spdk_bdev_first() and spdk_bdev_next() and
their variants or spdk_for_each_bdev() and its variant.
their variants.
Some block devices may also be given aliases, which are also string names.
Aliases behave like symlinks - they can be used interchangeably with the real


@ -1,23 +1,23 @@
# bdevperf {#bdevperf}
# Using bdevperf application {#bdevperf}
## Introduction
bdevperf is an SPDK application used for performance testing
bdevperf is an SPDK application that is used for performance testing
of block devices (bdevs) exposed by the SPDK bdev layer. It is an
alternative to the SPDK bdev fio plugin for benchmarking SPDK bdevs.
In some cases, bdevperf can provide lower overhead than the fio
plugin, resulting in better performance and efficiency for tests
using a limited number of CPU cores.
In some cases, bdevperf can provide much lower overhead than the fio
plugin, resulting in much better performance for tests using a limited
number of CPU cores.
bdevperf exposes a command line interface that allows specifying
SPDK framework options as well as testing options.
bdevperf also supports a configuration file format similar
Since SPDK 20.07, bdevperf supports configuration file that is similar
to FIO. It allows user to create jobs parameterized by
filename, cpumask, blocksize, queuesize, etc.
## Config file
bdevperf's config file format is similar to FIO.
Bdevperf's config file is similar to FIO's config file format.
Below is an example config file that uses all available parameters:
@ -73,7 +73,6 @@ length | 100% of bdev size | End I/O at `offset`+`length` on the bdev
rw | | Type of I/O pattern
Available rw types:
- read
- randread
- write


@ -1,6 +1,6 @@
# Blobstore Programmer's Guide {#blob}
## In this document {#blob_pg_toc}
# In this document {#blob_pg_toc}
* @ref blob_pg_audience
* @ref blob_pg_intro
@ -57,17 +57,56 @@ The Blobstore defines a hierarchy of storage abstractions as follows.
Blobstore owns the entire underlying device which is made up of a private Blobstore metadata region and the collection of
blobs as managed by the application.
```text
+-----------------------------------------------------------------+
| Blob |
| +-----------------------------+ +-----------------------------+ |
| | Cluster | | Cluster | |
| | +----+ +----+ +----+ +----+ | | +----+ +----+ +----+ +----+ | |
| | |Page| |Page| |Page| |Page| | | |Page| |Page| |Page| |Page| | |
| | +----+ +----+ +----+ +----+ | | +----+ +----+ +----+ +----+ | |
| +-----------------------------+ +-----------------------------+ |
+-----------------------------------------------------------------+
```
@htmlonly
<div id="blob_hierarchy"></div>
<script>
let elem = document.getElementById('blob_hierarchy');
let canvasWidth = 800;
let canvasHeight = 200;
var two = new Two({ width: 800, height: 200 }).appendTo(elem);
var blobRect = two.makeRectangle(canvasWidth / 2, canvasHeight / 2, canvasWidth, canvasHeight);
blobRect.fill = '#7ED3F7';
var blobText = two.makeText('Blob', canvasWidth / 2, 10, { alignment: 'center'});
for (var i = 0; i < 2; i++) {
let clusterWidth = 400;
let clusterHeight = canvasHeight;
var clusterRect = two.makeRectangle((clusterWidth / 2) + (i * clusterWidth),
clusterHeight / 2,
clusterWidth - 10,
clusterHeight - 50);
clusterRect.fill = '#00AEEF';
var clusterText = two.makeText('Cluster',
(clusterWidth / 2) + (i * clusterWidth),
35,
{ alignment: 'center', fill: 'white' });
for (var j = 0; j < 4; j++) {
let pageWidth = 100;
let pageHeight = canvasHeight;
var pageRect = two.makeRectangle((pageWidth / 2) + (j * pageWidth) + (i * clusterWidth),
pageHeight / 2,
pageWidth - 20,
pageHeight - 100);
pageRect.fill = '#003C71';
var pageText = two.makeText('Page',
(pageWidth / 2) + (j * pageWidth) + (i * clusterWidth),
pageHeight / 2,
{ alignment: 'center', fill: 'white' });
}
}
two.update();
</script>
@endhtmlonly
### Atomicity
@ -129,11 +168,6 @@ Channels are an SPDK-wide abstraction and with Blobstore the best way to think a
required in order to do IO. The application will perform IO to the channel and channels are best thought of as being
associated 1:1 with a thread.
With external snapshots (see @ref blob_pg_esnap_and_esnap_clone), a read from a blob may lead to
reading from the device containing the blobstore or an external snapshot device. To support this,
each blobstore IO channel maintains a tree of channels to be used when reading from external
snapshot devices.
### Blob Identifiers
When an application creates a blob, it does not provide a name as is the case with many other similar
@ -164,9 +198,6 @@ options and their defaults are:
Blobstore found here is appropriate to claim or not. The default is NULL and unless the application is being deployed in
an environment where multiple applications using the same disks are at risk of inadvertently using the wrong Blobstore, there
is no need to set this value. It can, however, be set to any valid set of characters.
* **External Snapshot Device Creation Callback**: If the blobstore supports external snapshots this function will be called
as a blob that clones an external snapshot (an "esnap clone") is opened so that the blobstore consumer can load the external
snapshot and register a blobstore device that will satisfy read requests. See @ref blob_pg_esnap_and_esnap_clone.
### Sub-page Sized Operations
@ -194,7 +225,7 @@ with SPDK API.
### Error Handling
Asynchronous Blobstore callbacks all include an error number that should be checked; non-zero values
indicate an error. Synchronous calls will typically return an error value if applicable.
indicate and error. Synchronous calls will typically return an error value if applicable.
### Asynchronous API
@ -264,23 +295,19 @@ contribute to the Blobstore effort itself.
The Blobstore owns the entire storage device. The device is divided into clusters starting from the beginning, such
that cluster 0 begins at the first logical block.
```text
LBA 0 LBA N
+-----------+-----------+-----+-----------+
| Cluster 0 | Cluster 1 | ... | Cluster N |
+-----------+-----------+-----+-----------+
```
LBA 0 LBA N
+-----------+-----------+-----+-----------+
| Cluster 0 | Cluster 1 | ... | Cluster N |
+-----------+-----------+-----+-----------+
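Because clusters are laid out back to back starting at the first logical block, the starting LBA of a cluster follows directly from its index. The Python sketch below illustrates this; the 1 MiB cluster size and 512 B block size are assumed example values rather than figures taken from this section, and the function name is hypothetical.

```python
def cluster_start_lba(cluster_index, cluster_size_bytes, block_size_bytes):
    """Return the first logical block of a cluster when clusters start at LBA 0."""
    if cluster_size_bytes % block_size_bytes != 0:
        raise ValueError("cluster size must be a multiple of the block size")
    blocks_per_cluster = cluster_size_bytes // block_size_bytes
    return cluster_index * blocks_per_cluster


# Assumed example geometry: 1 MiB clusters on a device with 512 B blocks.
CLUSTER_SIZE = 1024 * 1024
BLOCK_SIZE = 512

for index in range(4):
    print(index, cluster_start_lba(index, CLUSTER_SIZE, BLOCK_SIZE))
```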
Cluster 0 is special and has the following format, where page 0 is the first page of the cluster:
```text
+--------+-------------------+
| Page 0 | Page 1 ... Page N |
+--------+-------------------+
| Super | Metadata Region |
| Block | |
+--------+-------------------+
```
+--------+-------------------+
| Page 0 | Page 1 ... Page N |
+--------+-------------------+
| Super | Metadata Region |
| Block | |
+--------+-------------------+
The super block is a single page located at the beginning of the partition. It contains basic information about
the Blobstore. The metadata region is the remainder of cluster 0 and may extend to additional clusters. Refer
@ -310,152 +337,6 @@ when creating a blob.
Extents pointing to contiguous LBA are run-length encoded, including unallocated extents represented by 0.
Every new cluster allocation incurs serializing whole linked list of pages for the blob.
### Thin Blobs, Snapshots, and Clones
Each in-use cluster is allocated to blobstore metadata or to a particular blob. Once a cluster is
allocated to a blob it is considered owned by that blob and that particular blob's metadata
maintains a reference to the cluster as a record of ownership. Cluster ownership is transferred
during snapshot operations described later in @ref blob_pg_snapshots.
Through the use of thin provisioning, snapshots, and/or clones, a blob may be backed by clusters it
owns, clusters owned by another blob, or by a zeroes device. The behavior of reads and writes depends
on whether the operation targets blocks that are backed by a cluster owned by the blob or not.
* **read from blocks on an owned cluster**: The read is serviced by reading directly from the
appropriate cluster.
* **read from other blocks**: The read is passed on to the blob's *back device* and the back
device services the read. The back device may be another blob or it may be a zeroes device.
* **write to blocks on an owned cluster**: The write is serviced by writing directly to the
appropriate cluster.
* **write to thin provisioned cluster**: If the back device is the zeroes device and no cluster
is allocated to the blob the process described in @ref blob_pg_thin_provisioning is followed.
* **write to other blocks**: A copy-on-write operation is triggered. See @ref blob_pg_copy_on_write
for details.
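The five cases above reduce to a small decision function. The sketch below is only a restatement of the list in code; the enum and function names are hypothetical and do not correspond to Blobstore internals.

```c
#include <stdbool.h>

/* Hypothetical classification of a blob's back device. */
enum toy_back_dev {
	TOY_BACK_ZEROES,	/* thin provisioned blob backed by the zeroes device */
	TOY_BACK_OTHER		/* another blob (snapshot) or an external snapshot */
};

/* Hypothetical names for the IO paths listed above. */
enum toy_io_path {
	TOY_IO_DIRECT,		/* read or write served by the owned cluster */
	TOY_IO_BACK_DEV_READ,	/* read passed on to the back device */
	TOY_IO_THIN_ALLOCATE,	/* write that triggers thin provisioning allocation */
	TOY_IO_COPY_ON_WRITE	/* write that triggers copy-on-write */
};

enum toy_io_path
toy_classify_io(bool is_write, bool cluster_owned, enum toy_back_dev back)
{
	if (cluster_owned) {
		return TOY_IO_DIRECT;
	}
	if (!is_write) {
		return TOY_IO_BACK_DEV_READ;
	}
	if (back == TOY_BACK_ZEROES) {
		return TOY_IO_THIN_ALLOCATE;
	}
	return TOY_IO_COPY_ON_WRITE;
}
```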
External snapshots allow some external data source to act as a snapshot. This allows clones to be
created of data that resides outside of the blobstore containing the clone.
#### Thin Provisioning {#blob_pg_thin_provisioning}
As mentioned in @ref blob_pg_cluster_layout, a blob may be thin provisioned. A thin provisioned blob
starts out with no allocated clusters. Clusters are allocated as writes occur. A thin provisioned
blob's back device is a *zeroes device*. A read from a zeroes device fills the read buffer with
zeroes.
When a thin provisioned volume writes to a block that does not have an allocated cluster, the
following steps are performed:
1. Allocate a cluster.
2. Update blob metadata.
3. Perform the write.
#### Snapshots and Clones {#blob_pg_snapshots}
A snapshot is a read-only blob that may have clones. A snapshot may itself be a clone of one other
blob. While the interface gives the illusion of being able to create many snapshots of a blob, under
the covers this results in a chain of snapshots that are clones of the previous snapshot.
When blob1 is snapshotted, a new read-only blob is created and blob1 becomes a clone of this new
blob. That is:
| Step | Action | State |
| ---- | ------------------------------ | ------------------------------------------------- |
| 1 | Create blob1 | `blob1 (rw)` |
| 2 | Create snapshot blob2 of blob1 | `blob1 (rw) --> blob2 (ro)` |
| 2a | Write to blob1 | `blob1 (rw) --> blob2 (ro)` |
| 3 | Create snapshot blob3 of blob1 | `blob1 (rw) --> blob3 (ro) ---> blob2 (ro)` |
Supposing blob1 was not thin provisioned, step 1 would have allocated clusters needed to perform a
full write of blob1. As blob2 is created in step 2, the ownership of all of blob1's clusters is
transferred to blob2 and blob2 becomes blob1's back device. During step 2a, the writes to blob1 cause
one or more clusters to be allocated to blob1. When blob3 is created in step 3, the clusters
allocated in step 2a are given to blob3, blob3's back device becomes blob2, and blob1's back device
becomes blob3.
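The ownership transfer in the steps above can be modelled with a toy structure. This is a simplified sketch under stated assumptions: `struct toy_blob` and `toy_snapshot` are hypothetical, clusters are tracked only as a count, and a `NULL` back pointer stands for "no snapshot behind this blob".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified blob: only ownership count and back device. */
struct toy_blob {
	size_t owned_clusters;
	struct toy_blob *back;	/* NULL: no snapshot behind this blob */
	int read_only;
};

/* Snapshot `blob` into `snap`: the snapshot takes over the clusters and the
 * previous back device, and the original blob becomes a clone of the snapshot. */
void
toy_snapshot(struct toy_blob *blob, struct toy_blob *snap)
{
	assert(blob != NULL && snap != NULL);

	snap->owned_clusters = blob->owned_clusters;
	snap->back = blob->back;
	snap->read_only = 1;

	blob->owned_clusters = 0;
	blob->back = snap;
}
```

Replaying the table with this model (snapshot blob1 as blob2, allocate one cluster to blob1, snapshot blob1 as blob3) ends with `blob1 -> blob3 -> blob2`, with blob3 owning only the cluster allocated in step 2a.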
It is important to understand the chain above when considering strategies to use a golden image from
which many clones are made. The IO path is more efficient if one snapshot is cloned many times than
it is to create a new snapshot for every clone. The following illustrates the difference.
Using a single snapshot means the data originally referenced by the golden image is always one hop
away.
```text
create golden golden --> golden-snap
snapshot golden as golden-snap ^ ^ ^
clone golden-snap as clone1 clone1 ---+ | |
clone golden-snap as clone2 clone2 -----+ |
clone golden-snap as clone3 clone3 -------+
```
Using a snapshot per clone means that the chain of back devices grows with every new snapshot and
clone pair. Reading a block from clone3 may result in a read from clone3's back device (snap3), from
clone2's back device (snap2), then finally clone1's back device (snap1, the current owner of the
blocks originally allocated to golden).
```text
create golden
snapshot golden as snap1 golden --> snap3 -----> snap2 ----> snap1
clone snap1 as clone1 clone3----/ clone2 --/ clone1 --/
snapshot golden as snap2
clone snap2 as clone2
snapshot golden as snap3
clone snap3 as clone3
```
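The difference between the two strategies is the length of the back device chain. A minimal sketch that counts the hops, assuming a hypothetical `struct toy_chain_node` in which each blob only records its back device:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: a blob that only knows its back device (NULL if none). */
struct toy_chain_node {
	const struct toy_chain_node *back;
};

/* Number of back devices a read may have to traverse in the worst case. */
size_t
toy_back_device_hops(const struct toy_chain_node *blob)
{
	size_t hops = 0;

	assert(blob != NULL);

	for (const struct toy_chain_node *n = blob->back; n != NULL; n = n->back) {
		hops++;
	}
	return hops;
}
```

In the single-snapshot layout every clone is one hop from `golden-snap`. In the snapshot-per-clone layout, `clone3` is three hops from the data owned by `snap1`.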
A snapshot with no more than one clone can be deleted. When a snapshot with one clone is deleted,
the clone becomes a regular blob. The clusters owned by the snapshot are transferred to the clone or
freed, depending on whether the clone already owns a cluster for a particular block range.
Removal of the last clone leaves the snapshot in place. This snapshot continues to be read-only and
can serve as the snapshot for future clones.
#### Inflating and Decoupling Clones
A clone can remove its dependence on a snapshot with the following operations:
1. Inflate the clone. Clusters backed by any snapshot or a zeroes device are copied into newly
allocated clusters. The blob becomes a thick provisioned blob.
2. Decouple the clone. Clusters backed by the first back device snapshot are copied into newly
allocated clusters. If the clone's back device snapshot was itself a clone of another
snapshot, the clone remains a clone but is now a clone of a different snapshot.
3. Remove the snapshot. This is only possible if the snapshot has one clone. The end result is
usually the same as decoupling but ownership of clusters is transferred from the snapshot rather
than being copied. If the snapshot that was deleted was itself a clone of another snapshot, the
clone remains a clone, but is now a clone of a different snapshot.
#### External Snapshots and Esnap Clones {#blob_pg_esnap_and_esnap_clone}
A blobstore that is loaded with the `esnap_bs_dev_create` callback defined will support external
snapshots (esnaps). An external snapshot is not useful on its own: it needs to be cloned by a blob.
A clone of an external snapshot is referred to as an *esnap clone*. An esnap clone supports IO and
other operations just like any other clone.
An esnap clone can be recognized in various ways:
* **On disk**: the blob metadata has the `SPDK_BLOB_EXTERNAL_SNAPSHOT` (0x8) bit set in
`invalid_flags` and an internal XATTR with name `BLOB_EXTERNAL_SNAPSHOT_ID` ("EXTSNAP") exists.
* **In memory**: The `spdk_blob` structure contains the metadata read from disk, `blob->parent_id`
is set to `SPDK_BLOBID_EXTERNAL_SNAPSHOT`, and `blob->back_bs_dev` references a blobstore device
which is not a blob in the same blobstore nor a zeroes device.
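The on-disk check can be sketched as a flag test plus an XATTR name lookup. The flag value and the XATTR name are the ones quoted above; the `TOY_` macros, the function name and the representation of internal XATTRs as an array of strings are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values quoted in the text; the TOY_ names are local to this sketch. */
#define TOY_BLOB_EXTERNAL_SNAPSHOT	0x8ULL
#define TOY_EXTERNAL_SNAPSHOT_XATTR	"EXTSNAP"

/* Returns true if the flag bit is set and the internal XATTR is present. */
bool
toy_is_esnap_clone(uint64_t invalid_flags, const char *const *internal_xattrs, size_t count)
{
	assert(internal_xattrs != NULL || count == 0);

	if ((invalid_flags & TOY_BLOB_EXTERNAL_SNAPSHOT) == 0) {
		return false;
	}
	for (size_t i = 0; i < count; i++) {
		if (strcmp(internal_xattrs[i], TOY_EXTERNAL_SNAPSHOT_XATTR) == 0) {
			return true;
		}
	}
	return false;
}
```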
#### Copy-on-write {#blob_pg_copy_on_write}
A copy-on-write operation is somewhat expensive, with the cost being proportional to the cluster
size. Typical copy-on-write involves the following steps:
1. Allocate a cluster.
2. Allocate a cluster-sized buffer into which data can be read.
3. Trigger a full-cluster read from the back device into the cluster-sized buffer.
4. Write from the cluster-sized buffer into the newly allocated cluster.
5. Update the blob's on-disk metadata to record ownership of the newly allocated cluster. This
involves at least one page-sized write.
6. Write the new data to the just allocated and copied cluster.
If the source cluster is backed by a zeroes device, steps 2 through 4 are skipped. Alternatively, if
the blobstore resides on a device that can perform the copy on its own, steps 2 through 4 are
offloaded to the device. Neither of these optimizations is available when the back device is an
external snapshot.
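The six steps can be followed in a small in-memory model. This is a sketch under simplifying assumptions, not Blobstore code: a blob covers a single cluster of `TOY_CLUSTER_SIZE` bytes, the back device is a plain memory buffer (`NULL` stands for the zeroes device), and "updating metadata" is reduced to storing a pointer. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CLUSTER_SIZE 16u

/* Hypothetical single-cluster blob. */
struct toy_cow_blob {
	uint8_t *cluster;	/* NULL until the blob owns the cluster */
	const uint8_t *back;	/* back device data; NULL models the zeroes device */
};

/* Write `len` bytes at `offset`, performing copy-on-write if needed.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int
toy_cow_write(struct toy_cow_blob *blob, size_t offset, const uint8_t *data, size_t len)
{
	assert(blob != NULL && data != NULL);

	if (offset > TOY_CLUSTER_SIZE || len > TOY_CLUSTER_SIZE - offset) {
		return -1;
	}
	if (blob->cluster == NULL) {
		/* Step 1: allocate a cluster. calloc() zero-fills it, which is
		 * all that is needed when the back device is the zeroes device. */
		uint8_t *new_cluster = calloc(1, TOY_CLUSTER_SIZE);

		if (new_cluster == NULL) {
			return -1;
		}
		if (blob->back != NULL) {
			/* Steps 2-4: read the full cluster from the back device
			 * into a bounce buffer, then write it to the new cluster. */
			uint8_t buf[TOY_CLUSTER_SIZE];

			memcpy(buf, blob->back, TOY_CLUSTER_SIZE);
			memcpy(new_cluster, buf, TOY_CLUSTER_SIZE);
		}
		/* Step 5: record ownership of the newly allocated cluster. */
		blob->cluster = new_cluster;
	}
	/* Step 6: write the new data into the owned cluster. */
	memcpy(blob->cluster + offset, data, len);
	return 0;
}
```

The full-cluster copy in steps 2 through 4 is what makes the cost proportional to the cluster size, even when the write itself touches only a few bytes.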
### Sequences and Batches
Internally Blobstore uses the concepts of sequences and batches to submit IO to the underlying device in either
@ -470,13 +351,6 @@ of IO. They are an internal construct only and are pre-allocated on a per channe
earlier). They are removed from a channel associated linked list when the set (sequence or batch) is started and
then returned to the list when completed.
Each request set maintains a reference to a `channel` and a `back_channel`. The `channel` is used
for performing IO on the blobstore device. The `back_channel` is used for performing IO on the
blob's back device, `blob->back_bs_dev`. For blobs that are not esnap clones, `channel` and
`back_channel` reference an IO channel used with the device that contains the blobstore. For blobs
that are esnap clones, `channel` is the same as with any other blob and `back_channel` is an IO
channel for the external snapshot device.
### Key Internal Structures
`blobstore.h` contains many of the key structures for the internal workings of Blobstore. Only a few notable ones


@ -1,8 +1,8 @@
# BlobFS (Blobstore Filesystem) {#blobfs}
## BlobFS Getting Started Guide {#blobfs_getting_started}
# BlobFS Getting Started Guide {#blobfs_getting_started}
## RocksDB Integration {#blobfs_rocksdb}
# RocksDB Integration {#blobfs_rocksdb}
Clone and build the SPDK repository as per https://github.com/spdk/spdk
@ -14,30 +14,30 @@ make
~~~
Clone the RocksDB repository from the SPDK GitHub fork into a separate directory.
Make sure you check out the `6.15.fb` branch.
Make sure you check out the `spdk-v5.14.3` branch.
~~~{.sh}
cd ..
git clone -b 6.15.fb https://github.com/spdk/rocksdb.git
git clone -b spdk-v5.14.3 https://github.com/spdk/rocksdb.git
~~~
Build RocksDB. Only the `db_bench` benchmarking tool is integrated with BlobFS.
~~~{.sh}
cd rocksdb
make db_bench SPDK_DIR=relative_path/to/spdk
make db_bench SPDK_DIR=path/to/spdk
~~~
Alternatively, add `DEBUG_LEVEL=0` for a release build (this requires turning on `USE_RTTI`).
~~~{.sh}
export USE_RTTI=1 && make db_bench DEBUG_LEVEL=0 SPDK_DIR=relative_path/to/spdk
export USE_RTTI=1 && make db_bench DEBUG_LEVEL=0 SPDK_DIR=path/to/spdk
~~~
Create an NVMe section in the configuration file using SPDK's `gen_nvme.sh` script.
~~~{.sh}
scripts/gen_nvme.sh --json-with-subsystems > /usr/local/etc/spdk/rocksdb.json
scripts/gen_nvme.sh > /usr/local/etc/spdk/rocksdb.json
~~~
Verify the configuration file has specified the correct NVMe SSD.
@ -68,7 +68,7 @@ At this point, RocksDB is ready for testing with SPDK. Three `db_bench` paramet
SPDK has a set of scripts which will run `db_bench` against a variety of workloads and capture performance and profiling
data. The primary script is `test/blobfs/rocksdb/rocksdb.sh`.
## FUSE
# FUSE
BlobFS provides a FUSE plug-in to mount an SPDK BlobFS as a kernel filesystem for inspection or debug purposes.
The FUSE plug-in requires fuse3 and will be built automatically when fuse3 is detected on the system.
@ -79,7 +79,7 @@ test/blobfs/fuse/fuse /usr/local/etc/spdk/rocksdb.json Nvme0n1 /mnt/fuse
Note that the FUSE plug-in has some limitations - see the list below.
## Limitations
# Limitations
* BlobFS has primarily been tested with RocksDB so far, so any use cases different from how RocksDB uses a filesystem
may run into issues. BlobFS will be tested in a broader range of use cases after this initial release.


@ -1,7 +0,0 @@
# CI Tools {#ci_tools}
Section describing tools used by CI to verify integrity of the submitted
patches ([status](https://ci.spdk.io)).
- @subpage shfmt
- @subpage distributions


@ -88,7 +88,7 @@ In these examples, the value "X" will represent the special value (2^64-1) descr
### Initial Creation
```text
```
+--------------------+
Backing Device | |
+--------------------+
@ -123,7 +123,7 @@ In these examples, the value "X" will represent the special value (2^64-1) descr
store the 16KB of data.
* Write the chunk map index to entry 2 in the logical map.
```text
```
+--------------------+
Backing Device |01 |
+--------------------+
@ -157,7 +157,7 @@ In these examples, the value "X" will represent the special value (2^64-1) descr
* Write (2, X, X, X) to the chunk map.
* Write the chunk map index to entry 0 in the logical map.
```text
```
+--------------------+
Backing Device |012 |
+--------------------+
@ -205,7 +205,7 @@ In these examples, the value "X" will represent the special value (2^64-1) descr
* Free chunk map 1 back to the free chunk map list.
* Free backing IO unit 2 back to the free backing IO unit list.
```text
```
+--------------------+
Backing Device |01 34 |
+--------------------+


@ -1,6 +1,6 @@
# Message Passing and Concurrency {#concurrency}
## Theory
# Theory
One of the primary aims of SPDK is to scale linearly with the addition of
hardware. This can mean many things in practice. For instance, moving from one
@ -56,7 +56,7 @@ data isn't mutated very often, but is read very frequently, and is often
employed in the I/O path. This of course trades memory size for computational
efficiency, so it is used in only the most critical code paths.
## Message Passing Infrastructure
# Message Passing Infrastructure
SPDK provides several layers of message passing infrastructure. The most
fundamental libraries in SPDK, for instance, don't do any message passing on
@ -110,19 +110,7 @@ repeatedly call `spdk_thread_poll()` on each `spdk_thread()` that exists. This
makes SPDK very portable to a wide variety of asynchronous, event-based
frameworks such as [Seastar](https://www.seastar.io) or [libuv](https://libuv.org/).
## SPDK Spinlocks
There are some cases where locks are used. These should be limited in favor of
the message passing interface described above. When locks are needed,
SPDK spinlocks should be used instead of POSIX locks.
POSIX locks like `pthread_mutex_t` and `pthread_spinlock_t` do not properly
handle locking between SPDK's lightweight threads. SPDK's `spdk_spinlock`
is safe to use in SPDK libraries and applications. This safety comes from
imposing restrictions on when locks can be held. See
[spdk_spinlock](structspdk__spinlock.html) for details.
## The event Framework
# The event Framework
The SPDK project didn't want to officially pick an asynchronous, event-based
framework for all of the example applications it shipped with, in the interest
@ -134,7 +122,7 @@ signal handlers to cleanly shutdown, and basic command line option parsing.
Only established applications should consider directly integrating the lower
level libraries.
## Limitations of the C Language
# Limitations of the C Language
Message passing is efficient, but it results in asynchronous code.
Unfortunately, asynchronous code is a challenge in C. It's often implemented by
@ -152,7 +140,6 @@ function `foo` performs some asynchronous operation and when that completes
function `bar` is called, then function `bar` performs some operation that
calls function `baz` on completion, a good way to write it is as such:
```c
void baz(void *ctx) {
...
}
@ -164,7 +151,6 @@ calls function `baz` on completion, a good way to write it is as such:
void foo(void *ctx) {
async_op(bar, ctx);
}
```
Don't split these functions up - keep them as a nice unit that can be read from bottom to top.
@ -176,7 +162,6 @@ them in C we can still write them out by hand. As an example, here's a
callback chain that performs `foo` 5 times and then calls `bar` - effectively
an asynchronous for loop.
```c
enum states {
FOO_START = 0,
FOO_END,
@ -259,7 +244,6 @@ an asynchronous for loop.
run_state_machine(sm);
}
```
This is complex, of course, but the `run_state_machine` function can be read
from top to bottom to get a clear overview of what's happening in the code


@ -4,13 +4,26 @@ This is a living document as there are many ways to use containers with
SPDK. As new usages are identified and tested, they will be documented
here.
## In this document {#containers_toc}
# In this document {#containers_toc}
* @ref spdk_in_docker
* @ref spdk_docker_suite
* @ref kata_containers_with_spdk_vhost
* @ref spdk_in_docker
## Containerizing an SPDK Application for Docker {#spdk_in_docker}
# Using SPDK vhost target to provide volume service to Kata Containers and Docker {#kata_containers_with_spdk_vhost}
[Kata Containers](https://katacontainers.io) can build a secure container
runtime with lightweight virtual machines that feel and perform like
containers, but provide stronger workload isolation using hardware
virtualization technology as a second layer of defense.
From Kata Containers [1.11.0](https://github.com/kata-containers/runtime/releases/tag/1.11.0),
vhost-user-blk support is enabled in `kata-containers/runtime`. That is to say
SPDK vhost target can be used to provide volume service to Kata Containers directly.
In addition, a container manager like Docker can be configured easily to launch
a Kata container with an SPDK vhost-user block device. For operating details, visit
Kata containers use-case [Setup to run SPDK vhost-user devices with Kata Containers and Docker](https://github.com/kata-containers/documentation/blob/master/use-cases/using-SPDK-vhostuser-and-kata.md#host-setup-for-vhost-user-devices)
# Containerizing an SPDK Application for Docker {#spdk_in_docker}
There are no SPDK specific changes needed to run an SPDK based application in
a docker container, however this quick start guide should help you as you
@ -65,8 +78,7 @@ Your output should look something like this:
~~~{.sh}
$ sudo docker run --privileged -v //dev//hugepages://dev//hugepages hello:1.0
Starting SPDK v20.01-pre git sha1 80da95481 // DPDK 19.11.0 initialization...
[ DPDK EAL parameters: hello_world -c 0x1 --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa
--base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk0 --proc-type=auto ]
[ DPDK EAL parameters: hello_world -c 0x1 --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk0 --proc-type=auto ]
EAL: No available hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to 0000:06:00.0
@ -77,25 +89,3 @@ Initialization complete.
INFO: using host memory buffer for IO
Hello world!
~~~
## SPDK Docker suite {#spdk_docker_suite}
When considering how to generate SPDK docker container images formally,
deploy SPDK containers correctly, interact with SPDK container instances,
and orchestrate SPDK container instances, you can practice with and draw inspiration from
this SPDK docker-compose example:
[SPDK Docker suite](https://github.com/spdk/spdk/blob/master/docker/README.md).
## Using SPDK vhost target to provide volume service to Kata Containers and Docker {#kata_containers_with_spdk_vhost}
[Kata Containers](https://katacontainers.io) can build a secure container
runtime with lightweight virtual machines that feel and perform like
containers, but provide stronger workload isolation using hardware
virtualization technology as a second layer of defense.
From Kata Containers [1.11.0](https://github.com/kata-containers/runtime/releases/tag/1.11.0),
vhost-user-blk support is enabled in `kata-containers/runtime`. That is to say
SPDK vhost target can be used to provide volume service to Kata Containers directly.
In addition, a container manager like Docker can be configured easily to launch
a Kata container with an SPDK vhost-user block device. For operating details, visit
Kata containers use-case [Setup to run SPDK vhost-user devices with Kata Containers and Docker](https://github.com/kata-containers/documentation/blob/master/use-cases/using-SPDK-vhostuser-and-kata.md#host-setup-for-vhost-user-devices)


@ -1,69 +0,0 @@
# distributions {#distributions}
## In this document {#distros_toc}
* @ref distros_overview
* @ref linux_list
* @ref freebsd_list
## Overview {#distros_overview}
CI pool uses different flavors of `Linux` and `FreeBSD` distributions which are
used as a base for all the tests run against submitted patches. Below is the
listing which covers all currently supported versions and the related CI
jobs (see [status](https://ci.spdk.io) as a reference).
## Linux distributions {#linux_list}
* Fedora: Trying to follow new release as per the release cycle whenever possible.
```list
- autobuild-vg-autotest
- clang-vg-autotest
- iscsi*-vg-autotest
- nvme-vg-autotest
- nvmf*-vg-autotest
- scanbuild-vg-autotest
- unittest-vg-autotest
- vhost-initiator-vg-autotest
```
Jobs listed below are run on bare-metal systems where the version of
Fedora may vary. In the future these will be aligned with the
`vg` jobs.
```list
- BlobFS-autotest
- crypto-autotest
- nvme-phy-autotest
- nvmf*-phy-autotest
- vhost-autotest
```
* Ubuntu: Last two LTS releases. Currently `18.04` and `20.04`.
```list
- ubuntu18-vg-autotest
- ubuntu20-vg-autotest
```
* CentOS: Maintained releases. Currently `7.9`. CentOS 8.3 is only used for testing on 22.01.x branch.
```list
- centos7-vg-autotest
- centos8-vg-autotest
```
* Rocky Linux: Last release. Currently `8.6`. CentOS 8 replacement.
```list
- rocky8-vg-autotest
```
## FreeBSD distributions {#freebsd_list}
* FreeBSD: Production release. Currently `12.2`.
```list
- freebsd-vg-autotest
```


@ -14,7 +14,7 @@ concurrency.
The event framework public interface is defined in event.h.
## Event Framework Design Considerations {#event_design}
# Event Framework Design Considerations {#event_design}
Simple server applications can be written in a single-threaded fashion. This
allows for straightforward code that can maintain state without any locking or
@ -27,9 +27,9 @@ synchronization. Unfortunately, in many real-world cases, the connections are
not entirely independent and cross-thread shared state is necessary. SPDK
provides an event framework to help solve this problem.
## SPDK Event Framework Components {#event_components}
# SPDK Event Framework Components {#event_components}
### Events {#event_component_events}
## Events {#event_component_events}
To accomplish cross-thread communication while minimizing synchronization
overhead, the framework provides message passing in the form of events. The
@ -45,7 +45,7 @@ asynchronous operations to achieve concurrency. Asynchronous I/O may be issued
with a non-blocking function call, and completion is typically signaled using
a callback function.
### Reactors {#event_component_reactors}
## Reactors {#event_component_reactors}
Each reactor has a lock-free queue for incoming events to that core, and
threads from any core may insert events into the queue of any other core. The
@ -54,7 +54,7 @@ in first-in, first-out order as they are received. Event functions should
never block and should preferably execute very quickly, since they are called
directly from the event loop on the destination core.
### Pollers {#event_component_pollers}
## Pollers {#event_component_pollers}
The framework also defines another type of function called a poller. Pollers
may be registered with the spdk_poller_register() function. Pollers, like
@ -66,18 +66,10 @@ intended to poll hardware as a replacement for interrupts. Normally, pollers
are executed on every iteration of the main event loop. Pollers may also be
scheduled to execute periodically on a timer if low latency is not required.
### Application Framework {#event_component_app}
## Application Framework {#event_component_app}
The framework itself is bundled into a higher level abstraction called an "app". Once
spdk_app_start() is called, it will block the current thread until the application
terminates by calling spdk_app_stop() or an error condition occurs during the
initialization code within spdk_app_start(), itself, before invoking the caller's
supplied function.
### Custom shutdown callback {#event_component_shutdown}
When creating an SPDK-based application, the user may add a custom shutdown callback, which
will be called before the application framework starts the shutdown process.
To do that, set the shutdown_cb function callback in the spdk_app_opts structure passed
to spdk_app_start(). The custom shutdown callback should call spdk_app_stop() before
returning to continue the application shutdown process.


@ -1,28 +1,23 @@
# Flash Translation Layer {#ftl}
The Flash Translation Layer library provides efficient 4K block device access on top of devices
with >4K write unit size (e.g. raid5f bdev) or devices with large indirection units (some
capacity-focused NAND drives), which don't handle 4K writes well. It handles the logical to
physical address mapping and manages the garbage collection process.
The Flash Translation Layer library provides block device access on top of devices
implementing bdev_zone interface.
It handles the logical to physical address mapping, responds to the asynchronous
media management events, and manages the defragmentation process.
## Terminology {#ftl_terminology}
# Terminology {#ftl_terminology}
### Logical to physical address map {#ftl_l2p}
## Logical to physical address map
- Shorthand: `L2P`
* Shorthand: L2P
Contains the mapping of the logical addresses (LBA) to their on-disk physical location. The LBAs
are contiguous and in range from 0 to the number of surfaced blocks (the number of spare blocks
is calculated during device formation and is subtracted from the available address space). The
spare blocks account for zones going offline throughout the lifespan of the device as well as
provide necessary buffer for data [garbage collection](#ftl_reloc).
provide necessary buffer for data [defragmentation](#ftl_reloc).
Since the L2P would occupy a significant amount of DRAM (4B/LBA for drives smaller than 16TiB,
8B/LBA for bigger drives), FTL will, by default, store only the 2GiB of most recently used L2P
addresses in memory (the amount is configurable), and page them in and out of the cache device
as necessary.
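As a worked example of the sizes involved, the sketch below computes the full L2P size from the per-LBA entry sizes quoted above. It assumes a 4 KiB logical block (in line with the 4K block device access mentioned in the introduction); `toy_l2p_bytes` and the `TOY_` constants are hypothetical names used only here.

```c
#include <stdint.h>

#define TOY_FTL_BLOCK_SIZE	4096ULL		/* assumed logical block size */
#define TOY_TIB			(1ULL << 40)

/* Size in bytes of a fully resident L2P for a drive of the given capacity:
 * 4 bytes per LBA below 16 TiB, 8 bytes per LBA otherwise. */
uint64_t
toy_l2p_bytes(uint64_t capacity_bytes)
{
	uint64_t num_lbas = capacity_bytes / TOY_FTL_BLOCK_SIZE;
	uint64_t entry_size = capacity_bytes < 16 * TOY_TIB ? 4 : 8;

	return num_lbas * entry_size;
}
```

Under these assumptions an 8 TiB drive has 2^31 LBAs and needs 8 GiB for the whole table, while a 32 TiB drive has 2^33 LBAs and needs 64 GiB, which illustrates why only part of the L2P is kept in memory by default.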
### Band {#ftl_band}
## Band {#ftl_band}
A band describes a collection of zones, each belonging to a different parallel unit. All writes to
a band follow the same pattern - a batch of logical blocks is written to one zone, another batch
@ -32,7 +27,6 @@ well as their validity, as some of the data will be invalidated by subsequent wr
logical address. The L2P mapping can be restored from the SSD by reading this information in order
from the oldest band to the youngest.
```text
+--------------+ +--------------+ +--------------+
band 1 | zone 1 +--------+ zone 1 +---- --- --- --- --- ---+ zone 1 |
+--------------+ +--------------+ +--------------+
@ -48,157 +42,246 @@ from the oldest band to the youngest.
+--------------+ +--------------+ +--------------+
parallel unit 1 pu 2 pu n
```
The address map (`P2L`) is saved as a part of the band's metadata, at the end of each band:
The address map and valid map are, along with several other things (e.g. UUID of the device it's
part of, number of surfaced LBAs, band's sequence number, etc.), parts of the band's metadata. The
metadata is split in two parts:
```text
band's data tail metadata
head metadata band's data tail metadata
+-------------------+-------------------------------+------------------------+
|zone 1 |...|zone n |...|...|zone 1 |...| | ... |zone m-1 |zone m|
|block 1| |block 1| | |block x| | | |block y |block y|
+-------------------+-------------+-----------------+------------------------+
```
* the head part, containing information already known when opening the band (device's UUID, band's
sequence number, etc.), located at the beginning blocks of the band,
* the tail part, containing the address map and the valid map, located at the end of the band.
Bands are written sequentially (in a way that was described earlier). Before a band can be written
to, all of its zones need to be erased. During that time, the band is considered to be in a `PREP`
state. Then the band moves to the `OPEN` state and actual user data can be written to the
state. After that is done, the band transitions to the `OPENING` state, in which head metadata
is being written. Then the band moves to the `OPEN` state and actual user data can be written to the
band. Once the whole available space is filled, tail metadata is written and the band transitions to
`CLOSING` state. When that finishes the band becomes `CLOSED`.
### Non volatile cache {#ftl_nvcache}
## Ring write buffer {#ftl_rwb}
- Shorthand: `nvcache`
* Shorthand: RWB
Nvcache is a bdev that is used for buffering user writes and storing various metadata.
The nvcache data space is divided into chunks. Chunks are written sequentially.
When the number of free chunks falls below the assigned threshold, data from fully written chunks
is moved to base_bdev. This process is called chunk compaction.
```text
nvcache
+-----------------------------------------+
|chunk 1 |
| +--------------------------------- + |
| |blk 1 + md| blk 2 + md| blk n + md| |
| +----------------------------------| |
+-----------------------------------------+
Because the smallest write size the SSD may support can be a multiple of block size, in order to
support writes to a single block, the data needs to be buffered. The write buffer is the solution to
this problem. It consists of a number of pre-allocated buffers called batches, each of size allowing
for a single transfer to the SSD. A single batch is divided into block-sized buffer entries.
write buffer
+-----------------------------------+
|batch 1 |
| +-----------------------------+ |
| |rwb |rwb | ... |rwb | |
| |entry 1|entry 2| |entry n| |
| +-----------------------------+ |
+-----------------------------------+
| ... |
+-----------------------------------------+
+-----------------------------------------+
|chunk N |
| +--------------------------------- + |
| |blk 1 + md| blk 2 + md| blk n + md| |
| +----------------------------------| |
+-----------------------------------------+
```
+-----------------------------------+
|batch m |
| +-----------------------------+ |
| |rwb |rwb | ... |rwb | |
| |entry 1|entry 2| |entry n| |
| +-----------------------------+ |
+-----------------------------------+
### Garbage collection and relocation {#ftl_reloc}
When a write is scheduled, it needs to acquire an entry for each of its blocks and copy the data
onto this buffer. Once all blocks are copied, the write can be signalled as completed to the user.
In the meantime, the `rwb` is polled for filled batches and, if one is found, it's sent to the SSD.
After that operation is completed the whole batch can be freed. For the whole time the data is in
the `rwb`, the L2P points at the buffer entry instead of a location on the SSD. This allows for
servicing read requests from the buffer.
- Shorthand: gc, reloc
## Defragmentation and relocation {#ftl_reloc}
* Shorthand: defrag, reloc
Since a write to the same LBA invalidates its previous physical location, some of the blocks on a
band might contain old data that basically wastes space. As there is no way to overwrite an already
written block for a ZNS drive, this data will stay there until the whole zone is reset. This might create a
written block, this data will stay there until the whole zone is reset. This might create a
situation in which all of the bands contain some valid data and no band can be erased, so no writes
can be executed anymore. Therefore a mechanism is needed to move valid data and invalidate whole
bands, so that they can be reused.
```text
band band
+-----------------------------------+ +-----------------------------------+
| ** * * *** * *** * * | | |
|** * * * * * * *| +----> | |
|* *** * * * | | |
+-----------------------------------+ +-----------------------------------+
```
Valid blocks are marked with an asterisk '\*'.
The module responsible for data relocation is called `reloc`. When a band is chosen for garbage collection,
the appropriate blocks are marked as required to be moved. The `reloc` module takes a band that has
some of such blocks marked, checks their validity and, if they're still valid, copies them.
Another reason for data relocation might be an event from the SSD telling us that the data might
become corrupt if it's not relocated. This might happen due to its old age (if it was written a
long time ago) or due to read disturb (a media characteristic that causes corruption of neighbouring
blocks during a read operation).
Choosing a band for garbage collection depends on its validity ratio (proportion of valid blocks to all
user blocks). The lower the ratio, the higher the chance the band will be chosen for gc.
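A minimal sketch of such a selection follows. It is a simplification: the text only says that a lower ratio raises the chance of selection, whereas this sketch deterministically picks the band with the lowest ratio. `struct toy_band` and `toy_pick_gc_band` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-band counters. */
struct toy_band {
	size_t valid_blocks;
	size_t user_blocks;
};

/* Returns the index of the band with the lowest valid/user ratio,
 * or (size_t)-1 if no band has any user blocks. */
size_t
toy_pick_gc_band(const struct toy_band *bands, size_t count)
{
	size_t best = (size_t)-1;

	assert(bands != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (bands[i].user_blocks == 0) {
			continue;
		}
		/* Compare ratios by cross-multiplication to avoid floating point. */
		if (best == (size_t)-1 ||
		    bands[i].valid_blocks * bands[best].user_blocks <
		    bands[best].valid_blocks * bands[i].user_blocks) {
			best = i;
		}
	}
	return best;
}
```

For three bands with 90, 10 and 50 valid blocks out of 100 user blocks each, the second band is chosen, since relocating its 10 valid blocks frees the most space for the least copying.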
The module responsible for data relocation is called `reloc`. When a band is chosen for defragmentation
or a media management event is received, the appropriate blocks are marked as
required to be moved. The `reloc` module takes a band that has some of such blocks marked, checks
their validity and, if they're still valid, copies them.
## Metadata {#ftl_metadata}
Choosing a band for defragmentation depends on several factors: its valid ratio (1) (proportion of
valid blocks to all user blocks), its age (2) (when it was written) and its write count / wear level
index of its zones (3) (how many times the band was written to). The lower the ratio (1), the
higher its age (2) and the lower its write count (3), the higher the chance the band will be chosen
for defrag.
In addition to the [L2P](#ftl_l2p), FTL will store additional metadata both on the cache, as
well as on the base devices. The following types of metadata are persisted:
# Usage {#ftl_usage}
- Superblock - stores the global state of FTL; stored on cache, mirrored to the base device
## Prerequisites {#ftl_prereq}
- L2P - see the [L2P](#ftl_l2p) section for details
In order to use the FTL module, a device capable of zoned interface is required e.g. `zone_block`
bdev or OCSSD `nvme` bdev.
- Band - stores the state of bands - write pointers, their OPEN/FREE/CLOSE state; stored on cache, mirrored to a different section of the cache device
- Valid map - bitmask of all the valid physical addresses, used for improving [relocation](#ftl_reloc)
- Chunk - stores the state of chunks - write pointers, their OPEN/FREE/CLOSE state; stored on cache, mirrored to a different section of the cache device
- P2L - stores the address mapping (P2L, see [band](#ftl_band)) of currently open bands. This allows for the recovery of open
bands after dirty shutdown without needing VSS DIX metadata on the base device; stored on the cache device
- Trim - stores information about unmapped (trimmed) LBAs; stored on cache, mirrored to a different section of the cache device
## Dirty shutdown recovery {#ftl_dirty_shutdown}
After power failure, FTL needs to rebuild the whole L2P using the address maps (`P2L`) stored within each band/chunk.
This needs to be done, because while individual L2P pages may have been paged out and persisted to the cache device,
there's no way to tell which, if any, pages were dirty before the power failure occurred. The P2L consists of not only
the mapping itself, but also a sequence id (`seq_id`), which describes the relative age of a given logical block
(multiple writes to the same logical block would produce the same number of P2L entries, only the last one having the current data).
FTL will therefore rebuild the whole L2P by reading the P2L of all closed bands and chunks. For open bands, the P2L is stored on
the cache device, in a separate metadata region (see [the P2L section](#ftl_metadata)). Open chunks can be restored thanks to storing
the mapping in the VSS DIX metadata, which the cache device must be formatted with.
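
The role of `seq_id` during the rebuild can be sketched as follows. This is a simplified model under stated assumptions: P2L entries are represented as `(lba, physical_addr, seq_id)` tuples and `rebuild_l2p` is a hypothetical helper, not an FTL function. For each logical block, only the entry with the highest sequence id is kept.

```python
def rebuild_l2p(p2l_entries):
    """Rebuild a logical-to-physical map from P2L entries.

    p2l_entries: iterable of (lba, physical_addr, seq_id) tuples (assumed layout).
    """
    newest = {}  # lba -> (seq_id, physical_addr) of the most recent write seen so far
    for lba, phys, seq_id in p2l_entries:
        current = newest.get(lba)
        if current is None or seq_id > current[0]:
            newest[lba] = (seq_id, phys)
    # Drop the sequence ids; only the winning physical address is needed.
    return {lba: phys for lba, (_seq_id, phys) in newest.items()}


# LBA 10 was written three times; only the write with seq_id 3 is current.
entries = [(10, 0x100, 1), (11, 0x101, 1), (10, 0x200, 3), (10, 0x180, 2)]
l2p = rebuild_l2p(entries)
```

The order in which bands and chunks are read does not matter in this model, because the comparison of sequence ids decides which entry wins.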
### Shared memory recovery {#ftl_shm_recovery}
In order to shorten the recovery after crash of the target application, FTL also stores its metadata in shared memory (`shm`) - this
allows it to keep track of the dirty-ness state of individual pages and shortens the recovery time dramatically, as FTL will only
need to mark any potential L2P pages which were paging out at the time of the crash as dirty and reissue the writes. There's no need
to read the whole P2L in this case.
### Trim {#ftl_trim}
Due to metadata size constraints and the difficulty of maintaining consistent data returned before and after dirty shutdown, FTL
currently only allows for trims (unmaps) aligned to 4MiB (alignment concerns both the offset and length of the trim command).
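
A caller can check this constraint before issuing a trim. The sketch below assumes the offset and length are expressed in bytes; `is_trim_aligned` is a hypothetical helper and not part of the FTL API.

```python
TRIM_ALIGNMENT = 4 * 1024 * 1024  # 4 MiB, as stated in the text


def is_trim_aligned(offset_bytes, length_bytes):
    """Return True if both the offset and the length are multiples of 4 MiB."""
    return offset_bytes % TRIM_ALIGNMENT == 0 and length_bytes % TRIM_ALIGNMENT == 0


MIB = 1024 * 1024
aligned = is_trim_aligned(8 * MIB, 12 * MIB)   # both multiples of 4 MiB
bad_offset = is_trim_aligned(4096, 4 * MIB)    # offset is not aligned
bad_length = is_trim_aligned(4 * MIB, 1 * MIB) # length is not aligned
```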
## Usage {#ftl_usage}
### Prerequisites {#ftl_prereq}
In order to use the FTL module, a cache device formatted with VSS DIX metadata is required.
### FTL bdev creation {#ftl_create}
## FTL bdev creation {#ftl_create}
Similar to other bdevs, the FTL bdevs can be created either based on JSON config files or via RPC.
Both interfaces require the same arguments which are described by the `--help` option of the
`bdev_ftl_create` RPC call, which are:
- bdev's name
- base bdev's name
- cache bdev's name (cache bdev must support VSS DIX mode - could be emulated by providing SPDK_FTL_VSS_EMU=1 flag to make;
emulating VSS should be done for testing purposes only, it is not power-fail safe)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
- bdev's name
- base bdev's name (base bdev must implement bdev_zone API)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
## FTL bdev stack {#ftl_bdev_stack}
## FTL usage with OCSSD nvme bdev {#ftl_ocssd}
This option requires an Open Channel SSD, which can be emulated using QEMU.
The QEMU with the patches providing Open Channel support can be found on the SPDK's QEMU fork
on [spdk-3.0.0](https://github.com/spdk/qemu/tree/spdk-3.0.0) branch.
## Configuring QEMU {#ftl_qemu_config}
To emulate an Open Channel device, QEMU expects parameters describing the characteristics and
geometry of the SSD:
- `serial` - serial number,
- `lver` - version of the OCSSD standard (0 - disabled, 1 - "1.2", 2 - "2.0"), libftl only supports
2.0,
- `lba_index` - default LBA format. Possible values can be found in the table below (libftl only supports lba_index >= 3):
- `lnum_ch` - number of groups,
- `lnum_lun` - number of parallel units
- `lnum_pln` - number of planes (logical blocks from all planes constitute a chunk)
- `lpgs_per_blk` - number of pages (smallest programmable unit) per chunk
- `lsecs_per_pg` - number of sectors in a page
- `lblks_per_pln` - number of chunks in a parallel unit
- `laer_thread_sleep` - timeout in ms between asynchronous events requesting the host to relocate
the data based on media feedback
- `lmetadata` - metadata file
|lba_index| data| metadata|
|---------|-----|---------|
| 0 | 512B| 0B |
| 1 | 512B| 8B |
| 2 | 512B| 16B |
| 3 |4096B| 0B |
| 4 |4096B| 64B |
| 5 |4096B| 128B |
| 6 |4096B| 16B |
For more detailed description of the available options, consult the `hw/block/nvme.c` file in
the QEMU repository.
Example:
```
$ /path/to/qemu [OTHER PARAMETERS] -drive format=raw,file=/path/to/data/file,if=none,id=myocssd0 \
-device nvme,drive=myocssd0,serial=deadbeef,lver=2,lba_index=3,lnum_ch=1,lnum_lun=8,lnum_pln=4,\
lpgs_per_blk=1536,lsecs_per_pg=4,lblks_per_pln=512,lmetadata=/path/to/md/file
```
In the above example, a device is created with 1 channel, 8 parallel units, 512 chunks per parallel
unit, and 24576 (`lnum_pln` * `lpgs_per_blk` * `lsecs_per_pg`) logical blocks in each chunk, with each
logical block being 4096B. Therefore the data file needs to be at least 384G (8 * 512 * 24576 * 4096B)
in size and can be created with the following command:
```
fallocate -l 384G /path/to/data/file
```
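
The size calculation can be reproduced from the QEMU parameters in the example. The variable names below mirror those parameters; the 4096B block size corresponds to `lba_index=3` in the table above. This is only a worked check of the arithmetic, not part of any SPDK or QEMU tooling.

```python
# Geometry taken from the example QEMU command line.
lnum_ch = 1          # groups (channels)
lnum_lun = 8         # parallel units
lnum_pln = 4         # planes
lpgs_per_blk = 1536  # pages per chunk
lsecs_per_pg = 4     # sectors per page
lblks_per_pln = 512  # chunks per parallel unit
block_size = 4096    # bytes per logical block (lba_index=3)

# Logical blocks in a single chunk.
blocks_per_chunk = lnum_pln * lpgs_per_blk * lsecs_per_pg

# Total capacity the data file has to provide.
total_bytes = lnum_ch * lnum_lun * lblks_per_pln * blocks_per_chunk * block_size
total_gib = total_bytes // 2**30

print(blocks_per_chunk, total_gib)
```

Changing any of the geometry parameters changes the required file size accordingly, so the value passed to `fallocate` should be recomputed whenever the device definition is modified.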
## Configuring SPDK {#ftl_spdk_config}
To verify that the drive is emulated correctly, one can check the output of the NVMe identify app
(assuming that `scripts/setup.sh` was called before and the driver has been changed for that
device):
```
$ build/examples/identify
=====================================================
NVMe Controller at 0000:00:0a.0 [1d1d:1f1f]
=====================================================
Controller Capabilities/Features
================================
Vendor ID: 1d1d
Subsystem Vendor ID: 1af4
Serial Number: deadbeef
Model Number: QEMU NVMe Ctrl
... other info ...
Namespace OCSSD Geometry
=======================
OC version: maj:2 min:0
... other info ...
Groups (channels): 1
PUs (LUNs) per group: 8
Chunks per LUN: 512
Logical blks per chunk: 24576
... other info ...
```
In order to create FTL on top of an Open Channel SSD, the following steps are required:
1) Attach OCSSD NVMe controller
2) Create OCSSD bdev on the controller attached in step 1 (user could specify parallel unit range
and create multiple OCSSD bdevs on single OCSSD NVMe controller)
3) Create FTL bdev on top of bdev created in step 2
Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:0a.0 -t pcie
$ scripts/rpc.py bdev_ocssd_create -c nvme0 -b nvme0n1
nvme0n1
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1
{
"name": "ftl0",
"uuid": "3b469565-1fa5-4bfb-8341-747ec9fca9b9"
}
```
## FTL usage with zone block bdev {#ftl_zone_block}
Zone block bdev is a bdev adapter between regular `bdev` and `bdev_zone`. It emulates a zoned
interface on top of a regular block device.
In order to create FTL on top of a regular bdev:
1) Create regular bdev e.g. `bdev_nvme`, `bdev_null`, `bdev_malloc`
2) Create second regular bdev for nvcache
3) Create FTL bdev on top of bdev created in step 1 and step 2
2) Create zone block bdev on top of a regular bdev created in step 1 (user could specify zone capacity
and optimal number of open zones)
3) Create FTL bdev on top of bdev created in step 2
Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:05.0 -t pcie
nvme0n1
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme1 -a 00:06.0 -t pcie
nvme1n1
$ scripts/rpc.py bdev_zone_block_create -b zone1 -n nvme0n1 -z 4096 -o 32
zone1
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1 -c nvme1n1
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d zone1
{
"name": "ftl0",
"uuid": "3b469565-1fa5-4bfb-8341-747ec9f3a9b9"

View File

@ -1,6 +1,6 @@
# GDB Macros User Guide {#gdb_macros}
## Introduction
# Introduction
When debugging an spdk application using gdb we may need to view data structures
in lists, e.g. information about bdevs or threads.
@ -125,55 +125,7 @@ nqn "nqn.2016-06.io.spdk.umgmt:cnode1", '\000' <repeats 191 times>
ID 1
~~~
Printing SPDK spinlocks:
In this example, the spinlock has been initialized and locked but has never been unlocked.
After it is unlocked the first time the last unlocked stack will be present and the
`Locked by spdk_thread` line will say `not locked`.
~~~{.sh}
Breakpoint 2, spdk_spin_unlock (sspin=0x655110 <g_bdev_mgr+80>) at thread.c:2915
2915 struct spdk_thread *thread = spdk_get_thread();
(gdb) print *sspin
$2 = struct spdk_spinlock:
Locked by spdk_thread: 0x658080
Initialized at:
0x43e677 <spdk_spin_init+213> thread.c:2878
0x404feb <_bdev_init+16> /build/spdk/spdk-review-public/lib/bdev/bdev.c:116
0x44483d <__libc_csu_init+77>
0x7ffff62c9d18 <__libc_start_main+120>
0x40268e <_start+46>
Last locked at:
0x43e936 <spdk_spin_lock+436> thread.c:2909
0x40ca9c <bdev_name_add+129> /build/spdk/spdk-review-public/lib/bdev/bdev.c:3855
0x411a3c <bdev_register+641> /build/spdk/spdk-review-public/lib/bdev/bdev.c:6660
0x412e1e <spdk_bdev_register+24> /build/spdk/spdk-review-public/lib/bdev/bdev.c:7171
0x417895 <num_blocks_test+119> bdev_ut.c:878
0x7ffff7bc38cb <run_single_test.constprop+379>
0x7ffff7bc3b61 <run_single_suite.constprop+433>
0x7ffff7bc3f76 <CU_run_all_tests+118>
0x43351f <main+1439> bdev_ut.c:6295
0x7ffff62c9d85 <__libc_start_main+229>
0x40268e <_start+46>
Last unlocked at:
~~~
Print a single spinlock stack:
~~~{.sh}
(gdb) print sspin->internal.lock_stack
$1 = struct sspin_stack:
0x40c6a1 <spdk_spin_lock+436> /build/spdk/spdk-review-public/lib/thread/thread.c:2909
0x413f48 <spdk_spin+552> thread_ut.c:1831
0x7ffff7bc38cb <run_single_test.constprop+379>
0x7ffff7bc3b61 <run_single_suite.constprop+433>
0x7ffff7bc3f76 <CU_run_all_tests+118>
0x4148fa <main+547> thread_ut.c:1948
0x7ffff62c9d85 <__libc_start_main+229>
0x40248e <_start+46>
~~~
## Loading The gdb Macros
# Loading The gdb Macros
Copy the gdb macros to the host where you are about to debug.
It is best to copy the file either to somewhere within the PYTHONPATH, or to add
@ -194,7 +146,7 @@ the PYTHONPATH, so I had to manually add the directory to the path.
(gdb) spdk_load_macros
~~~
## Using the gdb Data Directory
# Using the gdb Data Directory
On most systems, the data directory is /usr/share/gdb. The python script should
be copied into the python/gdb/function (or python/gdb/command) directory under
@ -203,12 +155,12 @@ the data directory, e.g. /usr/share/gdb/python/gdb/function.
If the python script is in there, then the only thing you need to do when
starting gdb is type "spdk_load_macros".
## Using .gdbinit To Load The Macros
# Using .gdbinit To Load The Macros
.gdbinit can also be used to automatically run the manual steps
above prior to starting gdb.
Example .gdbinit:
Exmaple .gdbinit:
~~~{.sh}
source /opt/km/install/tools/gdb_macros/gdb_macros.py
@ -216,14 +168,14 @@ source /opt/km/install/tools/gdb_macros/gdb_macros.py
When starting gdb you still have to call spdk_load_macros.
## Why Do We Need to Explicitly Call spdk_load_macros
# Why Do We Need to Explicitly Call spdk_load_macros
The reason is that the macros need to use globals provided by spdk in order to
iterate the spdk lists and build iterable representations of the list objects.
This will result in errors if these are not available which is very possible if
gdb is used for reasons other than debugging spdk core dumps.
In the example below, I attempted to load the macros when the globals are not
In the example bellow, I attempted to load the macros when the globals are not
available causing gdb to fail loading the gdb_macros:
~~~{.sh}
@ -244,7 +196,7 @@ Error occurred in Python command: No symbol table is loaded. Use the "file"
command.
~~~
## Macros available
# Macros available
- spdk_load_macros: load the macros (use --reload in order to reload them)
- spdk_print_bdevs: information about bdevs
@ -253,7 +205,7 @@ command.
- spdk_print_nvmf_subsystems: information about nvmf subsystems
- spdk_print_threads: information about threads
## Adding New Macros
# Adding New Macros
The list iteration macros are usually built from 3 layers:

View File

@ -1,6 +1,5 @@
# General Information {#general}
- @subpage event
- @subpage scheduler
- @subpage logical_volumes
- @subpage accel_fw

View File

@ -1,12 +1,14 @@
# Getting Started {#getting_started}
## Getting the Source Code {#getting_started_source}
# Getting the Source Code {#getting_started_source}
~~~{.sh}
git clone https://github.com/spdk/spdk --recursive
git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init
~~~
## Installing Prerequisites {#getting_started_prerequisites}
# Installing Prerequisites {#getting_started_prerequisites}
The `scripts/pkgdep.sh` script will automatically install the bare minimum
dependencies required to build SPDK.
@ -22,7 +24,7 @@ Option --all will install all dependencies needed by SPDK features.
sudo scripts/pkgdep.sh --all
~~~
## Building {#getting_started_building}
# Building {#getting_started_building}
Linux:
@ -55,7 +57,7 @@ can enable it by doing the following:
make
~~~
## Running the Unit Tests {#getting_started_unittests}
# Running the Unit Tests {#getting_started_unittests}
It's always a good idea to confirm your build worked by running the
unit tests.
@ -68,7 +70,7 @@ You will see several error messages when running the unit tests, but they are
part of the test suite. The final message at the end of the script indicates
success or failure.
## Running the Example Applications {#getting_started_examples}
# Running the Example Applications {#getting_started_examples}
Before running an SPDK application, some hugepages must be allocated and
any NVMe and I/OAT devices must be unbound from the native kernel drivers.

View File

@ -1,23 +1,28 @@
# IDXD Driver {#idxd}
## Public Interface {#idxd_interface}
# Public Interface {#idxd_interface}
- spdk/idxd.h
## Key Functions {#idxd_key_functions}
# Key Functions {#idxd_key_functions}
Function | Description
--------------------------------------- | -----------
spdk_idxd_probe() | @copybrief spdk_idxd_probe()
spdk_idxd_batch_get_max() | @copybrief spdk_idxd_batch_get_max()
spdk_idxd_batch_create() | @copybrief spdk_idxd_batch_create()
spdk_idxd_batch_prep_copy() | @copybrief spdk_idxd_batch_prep_copy()
spdk_idxd_batch_submit() | @copybrief spdk_idxd_batch_submit()
spdk_idxd_submit_copy() | @copybrief spdk_idxd_submit_copy()
spdk_idxd_submit_compare() | @copybrief spdk_idxd_submit_compare()
spdk_idxd_submit_crc32c() | @copybrief spdk_idxd_submit_crc32c()
spdk_idxd_submit_dualcast() | @copybrief spdk_idxd_submit_dualcast()
spdk_idxd_submit_fill() | @copybrief spdk_idxd_submit_fill()
## Kernel vs User {#idxd_configs}
# Pre-defined configurations {#idxd_configs}
The low level library can be initialized directly via `spdk_idxd_set_config`.
Passing in a value of `true` indicates that the IDXD kernel driver is loaded and
that SPDK will use work queue(s) surfaced by the driver. Passing in `false` means
that the SPDK user space driver will be used to initialize the hardware.
The RPC `idxd_scan_accel_engine` is used to both enable IDXD and set its
configuration to one of two pre-defined configs:
Config #0: 4 groups, 1 work queue per group, 1 engine per group.
Config #1: 2 groups, 2 work queues per group, 2 engines per group.

View File

@ -1,673 +0,0 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
width="181.24mm"
height="79.375mm"
version="1.1"
viewBox="0 0 181.24 79.375"
id="svg172"
sodipodi:docname="lvol_esnap_clone.svg"
inkscape:version="1.2.2 (b0a8486541, 2022-12-01)"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<sodipodi:namedview
id="namedview174"
pagecolor="#ffffff"
bordercolor="#000000"
borderopacity="0.25"
inkscape:showpageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:document-units="mm"
showgrid="false"
inkscape:zoom="1.7926966"
inkscape:cx="338.59607"
inkscape:cy="148.93764"
inkscape:window-width="1351"
inkscape:window-height="930"
inkscape:window-x="762"
inkscape:window-y="134"
inkscape:window-maximized="0"
inkscape:current-layer="g170" />
<title
id="title2">Thin Provisioning</title>
<defs
id="defs28">
<marker
id="marker2036"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path4" />
</marker>
<marker
id="marker1960"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path7" />
</marker>
<marker
id="marker1890"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path10" />
</marker>
<marker
id="marker1826"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path13" />
</marker>
<marker
id="marker1816"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path16" />
</marker>
<marker
id="Arrow1Mend"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path19" />
</marker>
<marker
id="marker11771-4-9"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill="#f00"
fill-rule="evenodd"
stroke="#ff2a2a"
stroke-width="1pt"
id="path22" />
</marker>
<marker
id="marker1826-2-4-7-1-7"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill="#00f"
fill-rule="evenodd"
stroke="#00f"
stroke-width="1pt"
id="path25" />
</marker>
</defs>
<metadata
id="metadata30">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title>Thin Provisioning</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
transform="translate(2.6458 2.3956)"
id="g34">
<rect
x="-2.6458"
y="-2.3956"
width="181.24"
height="79.375"
fill="#fffffe"
stroke-width=".26458"
id="rect32" />
</g>
<g
transform="translate(-3.9688 -4.6356)"
id="g170">
<g
stroke="#000"
id="g52">
<g
stroke-width=".26458"
id="g48">
<rect
x="44.979"
y="32.417"
width="22.49"
height="6.6146"
fill="none"
stroke-dasharray="0.52916663, 0.52916663"
id="rect36" />
<rect
x="67.469"
y="32.417"
width="22.49"
height="6.6146"
fill="#d7d7f4"
id="rect38" />
<rect
x="89.958"
y="32.417"
width="22.49"
height="6.6146"
fill="#d7d7f4"
id="rect40" />
<rect
x="112.45"
y="32.417"
width="22.49"
height="6.6146"
fill="none"
stroke-dasharray="0.52916663, 0.52916663"
id="rect42" />
<rect
x="134.94"
y="32.417"
width="22.49"
height="6.6146"
fill="none"
stroke-dasharray="0.52916663, 0.52916663"
id="rect44" />
<rect
x="157.43"
y="32.417"
width="22.49"
height="6.6146"
fill="#d7d7f4"
id="rect46" />
</g>
<rect
x="44.979"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect50" />
</g>
<text
x="56.412949"
y="51.598957"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text56"><tspan
x="56.412949"
y="51.598957"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan54">26f9a7...</tspan></text>
<rect
x="67.469"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect58" />
<text
x="78.902527"
y="51.598961"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text62"><tspan
x="78.902527"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan60">b44ab3...</tspan></text>
<rect
x="89.958"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect64" />
<text
x="101.39211"
y="51.598961"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text68"><tspan
x="101.39211"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan66">ee5593...</tspan></text>
<rect
x="112.45"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect70" />
<text
x="123.88169"
y="51.598961"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text74"><tspan
x="123.88169"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan72">7a3bfe...</tspan></text>
<rect
x="134.94"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect76" />
<text
x="146.37128"
y="51.598957"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text80"><tspan
x="146.37128"
y="51.598957"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan78">8f4e15...</tspan></text>
<rect
x="157.43"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect82" />
<g
font-family="sans-serif"
letter-spacing="0px"
stroke-width=".26458"
word-spacing="0px"
id="g98">
<text
x="168.86086"
y="51.598961"
font-size="10.583px"
style="line-height:1.25"
xml:space="preserve"
id="text86"><tspan
x="168.86086"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan84">40c285...</tspan></text>
<text
x="6.6430736"
y="51.680019"
font-size="3.5278px"
style="line-height:1.25;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
xml:space="preserve"
id="text90"><tspan
x="6.6430736"
y="51.680019"
stroke-width="0.26458"
id="tspan88">read-only bdev</tspan></text>
<text
x="6.6296382"
y="12.539818"
font-size="3.5278px"
style="line-height:1.25;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
xml:space="preserve"
id="text96"><tspan
sodipodi:role="line"
id="tspan436"
x="6.6296382"
y="12.539818">esnap clone</tspan><tspan
sodipodi:role="line"
x="6.6296382"
y="16.949568"
id="tspan440">Volume</tspan><tspan
sodipodi:role="line"
id="tspan438"
x="6.6296382"
y="21.359318" /></text>
</g>
<g
stroke="#000"
id="g118">
<path
d="m6.6146 24.479 173.3 1e-6"
fill="none"
stroke-dasharray="1.59, 1.59"
stroke-width=".265"
id="path100" />
<g
fill="#f4d7d7"
stroke-dasharray="0.52916663, 0.26458332"
stroke-width=".26458"
id="g108">
<rect
x="44.979"
y="9.9271"
width="22.49"
height="6.6146"
id="rect102" />
<rect
x="112.45"
y="9.9271"
width="22.49"
height="6.6146"
id="rect104" />
<rect
x="134.94"
y="9.9271"
width="22.49"
height="6.6146"
id="rect106" />
</g>
<g
fill="#d7d7f4"
stroke-width=".26458"
id="g116">
<rect
x="67.469"
y="9.9271"
width="22.49"
height="6.6146"
id="rect110" />
<rect
x="89.958"
y="9.9271"
width="22.49"
height="6.6146"
id="rect112" />
<rect
x="157.43"
y="9.9271"
width="22.49"
height="6.6146"
id="rect114" />
</g>
</g>
<text
x="6.614583"
y="37.708332"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
stroke-width=".26458"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text122"><tspan
x="6.614583"
y="37.708332"
stroke-width=".26458"
id="tspan120">active clusters</tspan></text>
<rect
x="37.042"
y="7.2812"
width="145.52"
height="11.906"
ry="1.3229"
fill="none"
stroke="#999"
stroke-width=".5"
id="rect124" />
<rect
x="37.042"
y="29.771"
width="145.52"
height="26.458"
ry="1.3229"
fill="none"
stroke="#999"
stroke-width=".5"
id="rect126" />
<g
fill="#00f"
stroke="#00f"
id="g144">
<g
stroke-width=".26458"
id="g140">
<path
d="m78.052 16.542v15.875"
marker-end="url(#marker1960)"
id="path128" />
<path
d="m55.562 16.542v30.427"
marker-end="url(#marker2036)"
id="path130" />
<path
d="m100.54 16.542v15.875"
marker-end="url(#marker1890)"
id="path132" />
<path
d="m169.33 16.542v15.875"
marker-end="url(#Arrow1Mend)"
id="path134" />
<path
d="m124.35 16.542v30.427"
marker-end="url(#marker1826)"
id="path136" />
<path
d="m146.84 16.542v30.427"
marker-end="url(#marker1816)"
id="path138" />
</g>
<path
d="m132.29 61.521 10.583 1e-5"
marker-end="url(#marker1826-2-4-7-1-7)"
stroke-width=".265"
id="path142" />
</g>
<path
d="m132.29 66.813h10.583"
fill="#f00"
marker-end="url(#marker11771-4-9)"
stroke="#ff2a2a"
stroke-width=".265"
id="path146" />
<g
stroke-width=".26458"
id="g162">
<text
x="145.52083"
y="62.843975"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text150"><tspan
x="145.52083"
y="62.843975"
font-family="sans-serif"
font-size="2.8222px"
stroke-width=".26458"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal"
id="tspan148">read</tspan></text>
<text
x="145.52083"
y="68.135651"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text154"><tspan
x="145.52083"
y="68.135651"
font-family="sans-serif"
font-size="2.8222px"
stroke-width=".26458"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal"
id="tspan152">allocate and copy cluster</tspan></text>
<rect
x="132.29"
y="70.781"
width="10.583"
height="2.6458"
fill="none"
stroke="#000"
stroke-dasharray="0.52916664, 0.52916664"
id="rect156" />
<text
x="145.52083"
y="73.427307"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
word-spacing="0px"
style="line-height:1.25;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
xml:space="preserve"
id="text160"><tspan
x="145.52083"
y="73.427307"
font-family="sans-serif"
font-size="2.8222px"
stroke-width="0.26458"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan158">external snapshot cluster</tspan></text>
</g>
<rect
x="132.29"
y="76.073"
width="10.583"
height="2.6458"
fill="none"
stroke="#000"
stroke-width=".265"
id="rect164" />
<text
x="145.52083"
y="78.718971"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
stroke-width=".26458"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text168"><tspan
x="145.52083"
y="78.718971"
font-family="sans-serif"
font-size="2.8222px"
stroke-width=".26458"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal"
id="tspan166">allocated cluster</tspan></text>
</g>
</svg>

View File

@ -1,41 +0,0 @@
<?xml version="1.0"?>
<svg width="680" height="420" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">
<!-- Created with SVG-edit - https://github.com/SVG-Edit/svgedit-->
<g class="layer">
<title>Layer 1</title>
<rect fill="#ffffff" height="369" id="svg_1" stroke="#000000" width="635.87" x="22.74" y="26.61"/>
<rect fill="#aaffff" height="0" id="svg_2" stroke="#000000" width="0" x="191.24" y="101.36">Application A</rect>
<rect fill="#aaffff" height="88.96" id="svg_3" stroke="#000000" width="171" x="400.9" y="67.61">ublk Server</rect>
<line fill="none" id="svg_4" stroke="#000000" stroke-dasharray="5,5" stroke-width="2" x1="23.11" x2="660.11" y1="199.03" y2="198.03">ublk Server</line>
<text fill="#000000" font-family="Serif" font-size="21" font-weight="bold" id="svg_5" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="488.28" xml:space="preserve" y="122.24">ublk Server</text>
<rect fill="#aaffff" height="62" id="svg_6" stroke="#000000" transform="matrix(1 0 0 1 0 0)" width="161" x="384.38" y="311.2"/>
<text fill="#000000" font-family="Serif" font-size="21" font-weight="bold" id="svg_7" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="468.93" xml:space="preserve" y="349.7">ublk Driver</text>
<rect fill="#ffff00" height="32" id="svg_8" stroke="#000000" width="98" x="144.36" y="212.94"/>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_9" stroke="#000000" stroke-width="0" text-anchor="middle" x="194.36" xml:space="preserve" y="235.94">/dev/ublkb3</text>
<rect fill="#ffffff" height="0" id="svg_10" stroke="#000000" width="0" x="175.36" y="246.94"/>
<rect fill="#ffff00" height="33" id="svg_11" stroke="#000000" width="97" x="200.03" y="239.6"/>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_12" stroke="#000000" stroke-width="0" text-anchor="middle" x="249.36" xml:space="preserve" y="263.27">/dev/ublkb2</text>
<rect fill="#ffffff" height="0" id="svg_13" stroke="#000000" width="0" x="174.36" y="264.94"/>
<rect fill="#ffff00" height="33" id="svg_14" stroke="#000000" width="97" x="33.99" y="244.06"/>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_15" stroke="#000000" stroke-width="0" text-anchor="middle" x="82.99" xml:space="preserve" y="267.06">/dev/ublkb1</text>
<rect fill="#00ff00" height="32" id="svg_16" stroke="#000000" width="93" x="35.99" y="206.31">le/dev/ublkb1</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_17" stroke="#000000" stroke-width="0" text-anchor="middle" x="80.99" xml:space="preserve" y="226.31">Filesystem</text>
<path d="m383.94,359.38l-298.65,-1.66c0,0 -1.68,-79.96 -1.68,-79.96" fill="none" id="svg_22" stroke="#000000" stroke-linejoin="bevel" stroke-width="4"/>
<path d="m384.83,334.28l-148.14,-0.2c0,0 3.33,-62.12 3.33,-62.12" fill="none" id="svg_26" stroke="#000000" stroke-linejoin="bevel" stroke-width="4" transform="matrix(1 0 0 1 0 0)"/>
<path d="m384.69,347.33l-201.99,-0.22l0,-102.04" fill="none" id="svg_27" stroke="#000000" stroke-linejoin="bevel" stroke-width="4" transform="matrix(1 0 0 1 0 0)"/>
<path d="m454.33,155.75c0,0 0.48,154.94 0.32,154.69c-0.16,-0.25 -0.32,-154.69 -0.32,-154.69z" fill="none" id="svg_28" stroke="#000000" stroke-linejoin="bevel" stroke-width="3"/>
<path d="m468.6,156.42l0.18,155.99l-0.18,-155.99z" fill="none" id="svg_29" stroke="#000000" stroke-linejoin="bevel" stroke-width="3"/>
<path d="m482.69,157.08l-0.32,154.03l0.32,-154.03z" fill="none" id="svg_30" stroke="#000000" stroke-linecap="square" stroke-linejoin="bevel" stroke-width="3">ublk Server</path>
<rect fill="#aaffff" height="35.63" id="svg_40" stroke="#000000" width="109.37" x="65.74" y="91.86">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_41" stroke="#000000" stroke-width="0" style="cursor: move;" text-anchor="middle" x="119.36" xml:space="preserve" y="112.19">Application D</text>
<rect fill="#aaffff" height="30.63" id="svg_42" stroke="#000000" width="109.37" x="89.49" y="115.61">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_43" stroke="#000000" stroke-width="0" text-anchor="middle" x="143.11" xml:space="preserve" y="136.56">Application C</text>
<rect fill="#aaffff" height="31.25" id="svg_44" stroke="#000000" width="109.37" x="114.49" y="139.99">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_45" stroke="#000000" stroke-width="0" style="cursor: move;" text-anchor="middle" x="169.36" xml:space="preserve" y="160.31">Application B</text>
<rect fill="#aaffff" height="30.63" id="svg_46" stroke="#000000" width="109.37" x="145.74" y="164.99">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_47" stroke="#000000" stroke-width="0" text-anchor="middle" x="201.24" xml:space="preserve" y="186.56">Application A</text>
<text fill="#000000" font-family="Serif" font-size="21" font-weight="bold" id="svg_50" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="161.4" xml:space="preserve" y="82.24">ublk Workload</text>
<text fill="#000000" font-family="Serif" font-size="19" font-style="italic" font-weight="normal" id="svg_51" stroke="#000000" stroke-width="0" text-anchor="middle" x="602.65" xml:space="preserve" y="222.24">Kernel Space</text>
<text fill="#000000" font-family="Serif" font-size="19" font-style="italic" font-weight="normal" id="svg_52" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="602.03" xml:space="preserve" y="188.49">Userspace</text>
</g>
</svg>

View File

@ -1,41 +1,37 @@
# Storage Performance Development Kit {#mainpage}
# Storage Performance Development Kit {#index}
## Introduction
# Introduction
@copydoc intro
## Concepts
# Concepts
@copydoc concepts
## User Guides
# User Guides
@copydoc user_guides
## Programmer Guides
# Programmer Guides
@copydoc prog_guides
## General Information
# General Information
@copydoc general
## Miscellaneous
# Miscellaneous
@copydoc misc
## Driver Modules
# Driver Modules
@copydoc driver_modules
## Tools
# Tools
@copydoc tools
## CI Tools
@copydoc ci_tools
## Performance Reports
# Performance Reports
@copydoc performance_reports

View File

@ -4,5 +4,4 @@
- @subpage getting_started
- @subpage vagrant
- @subpage changelog
- @subpage deprecation
- [Source Code (GitHub)](https://github.com/spdk/spdk)

View File

@ -1,10 +1,10 @@
# I/OAT Driver {#ioat}
## Public Interface {#ioat_interface}
# Public Interface {#ioat_interface}
- spdk/ioat.h
## Key Functions {#ioat_key_functions}
# Key Functions {#ioat_key_functions}
Function | Description
--------------------------------------- | -----------

View File

@ -1,6 +1,6 @@
# iSCSI Target {#iscsi}
## iSCSI Target Getting Started Guide {#iscsi_getting_started}
# iSCSI Target Getting Started Guide {#iscsi_getting_started}
The Storage Performance Development Kit iSCSI target application is named `iscsi_tgt`.
This following section describes how to run iscsi from your cloned package.
@ -32,7 +32,7 @@ To ensure the SPDK iSCSI target has the best performance, place the NICs and the
same NUMA node and configure the target to run on CPU cores associated with that node. The following
command line option is used to configure the SPDK iSCSI target:
~~~bash
~~~
-m 0xF000000
~~~
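To make the core mask above easier to read, here is a minimal sketch that decodes a hexadecimal mask into the CPU core indices it selects. The helper name `cores_from_mask` is hypothetical and not part of SPDK; it only illustrates the bit arithmetic.

```python
def cores_from_mask(mask: int) -> list[int]:
    """Return the indices of the bits set in a CPU core mask."""
    cores = []
    bit = 0
    while mask:
        if mask & 1:          # lowest bit set: this core is selected
            cores.append(bit)
        mask >>= 1            # move on to the next core
        bit += 1
    return cores


# 0xF000000 has bits 24..27 set, so four cores are selected.
print(cores_from_mask(0xF000000))  # [24, 25, 26, 27]
```

Whether cores 24 to 27 share a NUMA node with the NICs and NVMe devices depends on the machine, so the mask has to be chosen per system.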
@ -45,35 +45,35 @@ The iSCSI target is configured via JSON-RPC calls. See @ref jsonrpc for details.
### Portal groups
- iscsi_create_portal_group -- Add a portal group.
- iscsi_delete_portal_group -- Delete an existing portal group.
- iscsi_target_node_add_pg_ig_maps -- Add initiator group to portal group mappings to an existing iSCSI target node.
- iscsi_target_node_remove_pg_ig_maps -- Delete initiator group to portal group mappings from an existing iSCSI target node.
- iscsi_get_portal_groups -- Show information about all available portal groups.
- iscsi_create_portal_group -- Add a portal group.
- iscsi_delete_portal_group -- Delete an existing portal group.
- iscsi_target_node_add_pg_ig_maps -- Add initiator group to portal group mappings to an existing iSCSI target node.
- iscsi_target_node_remove_pg_ig_maps -- Delete initiator group to portal group mappings from an existing iSCSI target node.
- iscsi_get_portal_groups -- Show information about all available portal groups.
~~~bash
~~~
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
~~~
### Initiator groups
- iscsi_create_initiator_group -- Add an initiator group.
- iscsi_delete_initiator_group -- Delete an existing initiator group.
- iscsi_initiator_group_add_initiators -- Add initiators to an existing initiator group.
- iscsi_get_initiator_groups -- Show information about all available initiator groups.
- iscsi_create_initiator_group -- Add an initiator group.
- iscsi_delete_initiator_group -- Delete an existing initiator group.
- iscsi_initiator_group_add_initiators -- Add initiators to an existing initiator group.
- iscsi_get_initiator_groups -- Show information about all available initiator groups.
~~~bash
~~~
/path/to/spdk/scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.2/32
~~~
### Target nodes
- iscsi_create_target_node -- Add an iSCSI target node.
- iscsi_delete_target_node -- Delete an iSCSI target node.
- iscsi_target_node_add_lun -- Add a LUN to an existing iSCSI target node.
- iscsi_get_target_nodes -- Show information about all available iSCSI target nodes.
- iscsi_create_target_node -- Add an iSCSI target node.
- iscsi_delete_target_node -- Delete an iSCSI target node.
- iscsi_target_node_add_lun -- Add a LUN to an existing iSCSI target node.
- iscsi_get_target_nodes -- Show information about all available iSCSI target nodes.
~~~bash
~~~
/path/to/spdk/scripts/rpc.py iscsi_create_target_node Target3 Target3_alias MyBdev:0 1:2 64 -d
~~~
@ -83,30 +83,30 @@ The Linux initiator is open-iscsi.
Installing open-iscsi package
Fedora:
~~~bash
~~~
yum install -y iscsi-initiator-utils
~~~
Ubuntu:
~~~bash
~~~
apt-get install -y open-iscsi
~~~
### Setup
Edit /etc/iscsi/iscsid.conf
~~~bash
~~~
node.session.cmds_max = 4096
node.session.queue_depth = 128
~~~
iscsid must be restarted or receive SIGHUP for changes to take effect. To send SIGHUP, run:
~~~bash
~~~
killall -HUP iscsid
~~~
Recommended changes to /etc/sysctl.conf
~~~bash
~~~
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 0
@ -124,14 +124,13 @@ net.core.netdev_max_backlog = 300000
### Discovery
Assume target is at 10.0.0.1
~~~bash
~~~
iscsiadm -m discovery -t sendtargets -p 10.0.0.1
~~~
### Connect to target
~~~bash
~~~
iscsiadm -m node --login
~~~
@ -140,13 +139,13 @@ they came up as.
### Disconnect from target
~~~bash
~~~
iscsiadm -m node --logout
~~~
### Deleting target node cache
~~~bash
~~~
iscsiadm -m node -o delete
~~~
@ -154,7 +153,7 @@ This will cause the initiator to forget all previously discovered iSCSI target n
### Finding /dev/sdX nodes for iSCSI LUNs
~~~bash
~~~
iscsiadm -m session -P 3 | grep "Attached scsi disk" | awk '{print $4}'
~~~
@ -166,19 +165,19 @@ After the targets are connected, they can be tuned. For example if /dev/sdc is
an iSCSI disk then the following can be done:
Set noop to scheduler
~~~bash
~~~
echo noop > /sys/block/sdc/queue/scheduler
~~~
Disable merging/coalescing (can be useful for precise workload measurements)
~~~bash
~~~
echo "2" > /sys/block/sdc/queue/nomerges
~~~
Increase requests for block queue
~~~bash
~~~
echo "1024" > /sys/block/sdc/queue/nr_requests
~~~
@ -192,34 +191,33 @@ Assuming we have one iSCSI Target server with portal at 10.0.0.1:3200, two LUNs
#### Configure iSCSI Target
Start iscsi_tgt application:
```bash
```
./build/bin/iscsi_tgt
```
Construct two 64MB Malloc block devices with 512B sector size "Malloc0" and "Malloc1":
```bash
```
./scripts/rpc.py bdev_malloc_create -b Malloc0 64 512
./scripts/rpc.py bdev_malloc_create -b Malloc1 64 512
```
Create new portal group with id 1, and address 10.0.0.1:3260:
```bash
```
./scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
```
Create one initiator group with id 2 to accept any connection from 10.0.0.2/32:
```bash
```
./scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.2/32
```
Finally construct one target using previously created bdevs as LUN0 (Malloc0) and LUN1 (Malloc1)
with a name "disk1" and alias "Data Disk1" using portal group 1 and initiator group 2.
```bash
```
./scripts/rpc.py iscsi_create_target_node disk1 "Data Disk1" "Malloc0:0 Malloc1:1" 1:2 64 -d
```
@ -227,14 +225,14 @@ with a name "disk1" and alias "Data Disk1" using portal group 1 and initiator gr
Discover target
~~~bash
~~~
$ iscsiadm -m discovery -t sendtargets -p 10.0.0.1
10.0.0.1:3260,1 iqn.2016-06.io.spdk:disk1
~~~
Connect to the target
~~~bash
~~~
iscsiadm -m node --login
~~~
@ -242,7 +240,7 @@ At this point the iSCSI target should show up as SCSI disks.
Check dmesg to see what they came up as. In this example it can look like below:
~~~bash
~~~
...
[630111.860078] scsi host68: iSCSI Initiator over TCP/IP
[630112.124743] scsi 68:0:0:0: Direct-Access INTEL Malloc disk 0001 PQ: 0 ANSI: 5
@ -265,37 +263,35 @@ Check dmesg to see what they came up as. In this example it can look like below:
You may also use simple bash command to find /dev/sdX nodes for each iSCSI LUN
in all logged iSCSI sessions:
~~~bash
~~~
$ iscsiadm -m session -P 3 | grep "Attached scsi disk" | awk '{print $4}'
sdd
sde
~~~
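The `grep`/`awk` pipeline above keeps the fourth whitespace-separated field of every line containing "Attached scsi disk". A rough Python equivalent is sketched below; the function name `attached_disks` and the sample text are assumptions for illustration, and the sample only mimics the lines that the pipeline matches.

```python
def attached_disks(session_output: str) -> list[str]:
    """Mimic: grep "Attached scsi disk" | awk '{print $4}'."""
    disks = []
    for line in session_output.splitlines():
        if "Attached scsi disk" in line:
            fields = line.split()      # awk-style whitespace splitting
            if len(fields) >= 4:
                disks.append(fields[3])  # awk's $4 is index 3 here
    return disks


# Illustrative input only; real `iscsiadm -m session -P 3` output has many more lines.
sample = """\
            Attached scsi disk sdd          State: running
            Attached scsi disk sde          State: running
"""
print(attached_disks(sample))
```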
## iSCSI Hotplug {#iscsi_hotplug}
# iSCSI Hotplug {#iscsi_hotplug}
At the iSCSI level, we provide the following support for Hotplug:
1. bdev/nvme:
At the bdev/nvme level, we start one hotplug monitor which will call
spdk_nvme_probe() periodically to get the hotplug events. We provide the
private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
we will create the block device base on the NVMe device attached, and for the
remove_cb, we will unregister the block device, which will also notify the
upper level stack (for iSCSI target, the upper level stack is scsi/lun) to
handle the hot-remove event.
At the bdev/nvme level, we start one hotplug monitor which will call
spdk_nvme_probe() periodically to get the hotplug events. We provide the
private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
we will create the block device base on the NVMe device attached, and for the
remove_cb, we will unregister the block device, which will also notify the
upper level stack (for iSCSI target, the upper level stack is scsi/lun) to
handle the hot-remove event.
2. scsi/lun:
When the LUN receive the hot-remove notification from block device layer,
the LUN will be marked as removed, and all the IOs after this point will
return with check condition status. Then the LUN starts one poller which will
wait for all the commands which have already been submitted to block device to
return back; after all the commands return back, the LUN will be deleted.
When the LUN receive the hot-remove notification from block device layer,
the LUN will be marked as removed, and all the IOs after this point will
return with check condition status. Then the LUN starts one poller which will
wait for all the commands which have already been submitted to block device to
return back; after all the commands return back, the LUN will be deleted.
@sa spdk_nvme_probe
## iSCSI Login Redirection {#iscsi_login_redirection}
# iSCSI Login Redirection {#iscsi_login_redirection}
The SPDK iSCSI target application supports iSCSI login redirection feature.
@ -316,7 +312,7 @@ portal group may optionally have a redirect portal for non-discovery logins for
each associated target. This redirect portal must be from a private portal group.
Initiators configure portals in public portal groups as target portals. When an
initiator logs in to a target through a portal in an associated public portal group,
initator logs in to a target through a portal in an associated public portal group,
the target sends a temporary redirection response with a redirect portal. Then the
initiator logs in to the target again through the redirect portal.

File diff suppressed because it is too large

View File

@ -1,8 +1,6 @@
# JSON-RPC Remote access {#jsonrpc_proxy}
SPDK provides a sample python script `rpc_http_proxy.py`, that provides http server which listens for JSON
objects from users. It uses HTTP POST method to receive JSON objects including methods and parameters
described in this chapter.
SPDK provides a sample python script `rpc_http_proxy.py`, that provides http server which listens for JSON objects from users. It uses HTTP POST method to receive JSON objects including methods and parameters described in this chapter.
## Parameters
@ -28,10 +26,9 @@ Status 200 with resultant JSON object included on success.
## Client side
Below is a sample python script acting as a client side. It sends `bdev_get_bdevs` method with optional `name`
parameter and prints JSON object returned from remote_rpc script.
Below is a sample python script acting as a client side. It sends `bdev_get_bdevs` method with optional `name` parameter and prints JSON object returned from remote_rpc script.
~~~python
~~~
import json
import requests
@ -48,10 +45,7 @@ if __name__ == '__main__':
Output:
~~~python
python client.py
[{u'num_blocks': 2621440, u'name': u'Malloc0', u'uuid': u'fb57e59c-599d-42f1-8b89-3e46dbe12641', u'claimed': True,
u'driver_specific': {}, u'supported_io_types': {u'reset': True, u'nvme_admin': False, u'unmap': True, u'read': True,
u'nvme_io': False, u'write': True, u'flush': True, u'write_zeroes': True}, u'qos_ios_per_sec': 0, u'block_size': 4096,
u'product_name': u'Malloc disk', u'aliases': []}]
~~~
python client.py
[{u'num_blocks': 2621440, u'name': u'Malloc0', u'uuid': u'fb57e59c-599d-42f1-8b89-3e46dbe12641', u'claimed': True, u'driver_specific': {}, u'supported_io_types': {u'reset': True, u'nvme_admin': False, u'unmap': True, u'read': True, u'nvme_io': False, u'write': True, u'flush': True, u'write_zeroes': True}, u'qos_ios_per_sec': 0, u'block_size': 4096, u'product_name': u'Malloc disk', u'aliases': []}]
~~~
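Since most of the client script is elided in this hunk, the following sketch shows only how the JSON object for `bdev_get_bdevs` with an optional `name` parameter could be assembled. It assumes a JSON-RPC 2.0 style body; `build_request` is a hypothetical helper, and sending the body with HTTP POST (for example via `requests`) is deliberately left out.

```python
import json


def build_request(method, params=None, request_id=1):
    """Assemble a JSON-RPC 2.0 style request object (hypothetical helper)."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params:
        # Parameters are optional; omit the key entirely when none are given.
        request["params"] = params
    return request


# Request all bdevs, or only the one named "Malloc0".
all_bdevs = build_request("bdev_get_bdevs")
one_bdev = build_request("bdev_get_bdevs", {"name": "Malloc0"})

# This string is what a client would send as the HTTP POST body.
print(json.dumps(one_bdev))
```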

View File

@ -9,7 +9,7 @@ mixing of SPDK event framework dependent code and lower level libraries. This do
is aimed at explaining the structure, naming conventions, versioning scheme, and use cases
of the libraries contained in these two directories.
## Directory Structure {#structure}
# Directory Structure {#structure}
The SPDK libraries are divided into two directories. The `lib` directory contains the base libraries that
compose SPDK. Some of these base libraries define plug-in systems. Instances of those plug-ins are called
@ -17,24 +17,24 @@ modules and are located in the `module` directory. For example, the `spdk_sock`
`lib` directory while the implementations of socket abstractions, `sock_posix` and `sock_uring`
are contained in the `module` directory.
### lib {#lib}
## lib {#lib}
The libraries in the `lib` directory can be readily divided into four categories:
- Utility Libraries: These libraries contain basic, commonly used functions that make more complex
libraries easier to implement. For example, `spdk_log` contains macro definitions that provide a
consistent logging paradigm and `spdk_json` is a general purpose JSON parsing library.
libraries easier to implement. For example, `spdk_log` contains macro definitions that provide a
consistent logging paradigm and `spdk_json` is a general purpose JSON parsing library.
- Protocol Libraries: These libraries contain the building blocks for a specific service. For example,
`spdk_nvmf` and `spdk_vhost` each define the storage protocols after which they are named.
`spdk_nvmf` and `spdk_vhost` each define the storage protocols after which they are named.
- Storage Service Libraries: These libraries provide a specific abstraction that can be mapped to somewhere
between the physical drive and the filesystem level of your typical storage stack. For example `spdk_bdev`
provides a general block device abstraction layer, `spdk_lvol` provides a logical volume abstraction,
`spdk_blobfs` provides a filesystem abstraction, and `spdk_ftl` provides a flash translation layer
abstraction.
between the physical drive and the filesystem level of your typical storage stack. For example `spdk_bdev`
provides a general block device abstraction layer, `spdk_lvol` provides a logical volume abstraction,
`spdk_blobfs` provides a filesystem abstraction, and `spdk_ftl` provides a flash translation layer
abstraction.
- System Libraries: These libraries provide system level services such as a JSON based RPC service
(see `spdk_jsonrpc`) and thread abstractions (see `spdk_thread`). The most notable library in this category
is the `spdk_env_dpdk` library which provides a shim for the underlying Data Plane Development Kit (DPDK)
environment and provides services like memory management.
(see `spdk_jsonrpc`) and thread abstractions (see `spdk_thread`). The most notable library in this category
is the `spdk_env_dpdk` library which provides a shim for the underlying Data Plane Development Kit (DPDK)
environment and provides services like memory management.
The one library in the `lib` directory that doesn't fit into the above classification is the `spdk_event` library.
This library defines a framework used by the applications contained in the `app` and `example` directories. Much
@ -48,7 +48,7 @@ Much like the `spdk_event` library, the `spdk_env_dpdk` library has been archite
can be readily replaced by an alternate environment shim. More information on replacing the `spdk_env_dpdk`
module and the underlying `dpdk` environment can be found in the [environment](#env_replacement) section.
### module {#module}
## module {#module}
The component libraries in the `module` directory represent specific implementations of the base libraries in
the `lib` directory. As with the `lib` directory, much care has been taken to avoid dependencies on the
@ -58,10 +58,10 @@ There are seven sub-directories in the `module` directory which each hold a diff
sub-directories can be divided into two types.
- plug-in libraries: These libraries are explicitly tied to one of the libraries in the `lib` directory and
are registered with that library at runtime by way of a specific constructor function. The parent library in
the `lib` directory then manages the module directly. These types of libraries each implement a function table
defined by their parent library. The following table shows these directories and their corresponding parent
libraries:
are registered with that library at runtime by way of a specific constructor function. The parent library in
the `lib` directory then manages the module directly. These types of libraries each implement a function table
defined by their parent library. The following table shows these directories and their corresponding parent
libraries:
<center>
| module directory | parent library | dependent on event library |
@ -73,15 +73,15 @@ sub-directories can be divided into two types.
</center>
- Free libraries: These libraries are highly dependent upon a library in the `lib` directory but are not
explicitly registered to that library via a constructor. The libraries in the `blob`, `blobfs`, and `env_dpdk`
directories fall into this category. None of the libraries in this category depend explicitly on the
`spdk_event` library.
explicitly registered to that library via a constructor. The libraries in the `blob`, `blobfs`, and `env_dpdk`
directories fall into this category. None of the libraries in this category depend explicitly on the
`spdk_event` library.
## Library Conventions {#conventions}
# Library Conventions {#conventions}
The SPDK libraries follow strict conventions for naming functions, logging, versioning, and header files.
### Headers {#headers}
## Headers {#headers}
All public SPDK header files exist in the `include` directory of the SPDK repository. These headers
are divided into two sub-directories.
@ -105,7 +105,7 @@ Other header files contained directly in the `lib` and `module` directories are
by source files of their corresponding library. Any symbols intended to be used across libraries need to be
included in a header in the `include/spdk_internal` directory.
### Naming Conventions {#naming}
## Naming Conventions {#naming}
All public types and functions in SPDK libraries begin with the prefix `spdk_`. They are also typically
further namespaced using the spdk library name. The rest of the function or type name describes its purpose.
@ -114,15 +114,15 @@ There are no internal library functions that begin with the `spdk_` prefix. This
enforced by the SPDK continuous Integration testing. Functions not intended for use outside of their home
library should be namespaced with the name of the library only.
### Map Files {#map}
## Map Files {#map}
SPDK libraries can be built as both static and shared object files. To facilitate building libraries as shared
objects, each one has a corresponding map file (e.g. `spdk_nvmf` relies on `spdk_nvmf.map`). SPDK libraries
not exporting any symbols rely on a blank map file located at `mk/spdk_blank.map`.
## SPDK Shared Objects {#shared_objects}
# SPDK Shared Objects {#shared_objects}
### Shared Object Versioning {#versioning}
## Shared Object Versioning {#versioning}
SPDK shared objects follow a semantic versioning pattern with a major and minor version. Any changes which
break backwards compatibility (symbol removal or change) will cause a shared object major increment and
@ -141,7 +141,7 @@ Shared objects are versioned independently of one another. This means that `libs
with the same suffix are not necessarily compatible with each other. It is important to source all of your
SPDK libraries from the same repository and version to ensure inter-library compatibility.
### Linking to Shared Objects {#so_linking}
## Linking to Shared Objects {#so_linking}
Shared objects in SPDK are created on a per-library basis. There is a top level `libspdk.so` object
which is a linker script. It simply contains references to all of the other spdk shared objects.
@ -149,15 +149,15 @@ which is a linker script. It simply contains references to all of the other spdk
There are essentially two ways of linking to SPDK libraries.
1. An application can link to the top level shared object library as follows:
~~~{.sh}
~~~{.sh}
gcc -o my_app ./my_app.c -lspdk -lspdk_env_dpdk -ldpdk
~~~
~~~
2. An application can link to only a subset of libraries by linking directly to the ones it relies on:
~~~{.sh}
~~~{.sh}
gcc -o my_app ./my_app.c -lpassthru_external -lspdk_event_bdev -lspdk_bdev -lspdk_bdev_malloc
-lspdk_log -lspdk_thread -lspdk_util -lspdk_event -lspdk_env_dpdk -ldpdk
~~~
~~~
In the second instance, please note that applications need only link to the libraries upon which they
directly depend. All SPDK libraries have their dependencies specified at object compile time. This means
@ -172,7 +172,7 @@ itself need to be supplied to the linker. In the examples above, these are `spdk
respectively. This was intentional and allows one to easily swap out both the environment and the
environment shim.
### Replacing the env abstraction {#env_replacement}
## Replacing the env abstraction {#env_replacement}
SPDK depends on an environment abstraction that provides crucial pinned memory management and PCIe
bus management operations. The interface for this environment abstraction is defined in the
@ -184,7 +184,6 @@ modifications to the spdk source directly.
Any environment can replace the `spdk_env_dpdk` environment by implementing the `include/env.h` header
file. The environment can either be implemented wholesale in a single library or as a two-part
shim/implementation library system.
~~~{.sh}
# single library
gcc -o my_app ./my_app.c -lspdk -lcustom_env_implementation
@ -192,23 +191,3 @@ shim/implementation library system.
# two libraries
gcc -o my_app ./my_app.c -lspdk -lcustom_env_shim -lcustom_env_implementation
~~~
## SPDK Static Objects {#static_objects}
SPDK static objects are compiled by default even when no parameters are supplied to the build system.
Unlike SPDK shared objects, the filename does not contain any versioning semantics. Linking against
static objects is similar to shared objects but will always require the use of `-Wl,--whole-archive`
as argument. This is due to the use of constructor functions in SPDK such as those to register
NVMe transports.
Due to the lack of versioning semantics, it is not recommended to install static libraries system wide.
Instead the path to these static libraries should be added as argument at compile time using
`-L/path/to/static/libs`. The use of static objects instead of shared objects can also be forced
through `-Wl,-Bstatic`, otherwise some compilers might prefer to use the shared objects if both
are available.
~~~{.sh}
gcc -o my_app ./my_app.c -L/path/to/static/libs -Wl,--whole-archive -Wl,-Bstatic -lpassthru_external
-lspdk_event_bdev -lspdk_bdev -lspdk_bdev_malloc -lspdk_log -lspdk_thread -lspdk_util -lspdk_event
-lspdk_env_dpdk -Wl,--no-whole-archive -Wl,-Bdynamic -pthread -ldpdk
~~~

View File

@ -1,48 +1,38 @@
# Logical Volumes {#logical_volumes}
The Logical Volumes library is a flexible storage space management system. It provides creating and managing virtual
block devices with variable size. The SPDK Logical Volume library is built on top of @ref blob.
The Logical Volumes library is a flexible storage space management system. It provides creating and managing virtual block devices with variable size. The SPDK Logical Volume library is built on top of @ref blob.
## Terminology {#lvol_terminology}
# Terminology {#lvol_terminology}
### Logical volume store {#lvs}
## Logical volume store {#lvs}
* Shorthand: lvolstore, lvs
* Type name: struct spdk_lvol_store
A logical volume store uses the super blob feature of blobstore to hold uuid (and in future other metadata).
Blobstore types are implemented in blobstore itself, and saved on disk. An lvolstore will generate a UUID on
creation, so that it can be uniquely identified from other lvolstores.
By default when creating lvol store data region is unmapped. Optional --clear-method parameter can be passed
on creation to change that behavior to writing zeroes or performing no operation.
A logical volume store uses the super blob feature of blobstore to hold uuid (and in future other metadata). Blobstore types are implemented in blobstore itself, and saved on disk. An lvolstore will generate a UUID on creation, so that it can be uniquely identified from other lvolstores.
By default when creating lvol store data region is unmapped. Optional --clear-method parameter can be passed on creation to change that behavior to writing zeroes or performing no operation.
### Logical volume {#lvol}
## Logical volume {#lvol}
* Shorthand: lvol
* Type name: struct spdk_lvol
A logical volume is implemented as an SPDK blob created from an lvolstore. An lvol is uniquely identified by
its UUID. Lvol additional can have alias name.
A logical volume is implemented as an SPDK blob created from an lvolstore. An lvol is uniquely identified by its UUID. Lvol additional can have alias name.
### Logical volume block device {#lvol_bdev}
## Logical volume block device {#lvol_bdev}
* Shorthand: lvol_bdev
* Type name: struct spdk_lvol_bdev
Representation of an SPDK block device (spdk_bdev) with an lvol implementation.
A logical volume block device translates generic SPDK block device I/O (spdk_bdev_io) operations into the
equivalent SPDK blob operations. Combination of lvol name and lvolstore name gives lvol_bdev alias name in
a form "lvs_name/lvol_name". block_size of the created bdev is always 4096, due to blobstore page size.
Cluster_size is configurable by parameter. Size of the new bdev will be rounded up to nearest multiple of
cluster_size. By default lvol bdevs claim part of lvol store equal to their set size. When thin provision
option is enabled, no space is taken from lvol store until data is written to lvol bdev.
By default when deleting lvol bdev or resizing down, allocated clusters are unmapped. Optional --clear-method
parameter can be passed on creation to change that behavior to writing zeroes or performing no operation.
A logical volume block device translates generic SPDK block device I/O (spdk_bdev_io) operations into the equivalent SPDK blob operations. Combination of lvol name and lvolstore name gives lvol_bdev alias name in a form "lvs_name/lvol_name". block_size of the created bdev is always 4096, due to blobstore page size. Cluster_size is configurable by parameter.
Size of the new bdev will be rounded up to nearest multiple of cluster_size.
By default lvol bdevs claim part of lvol store equal to their set size. When thin provision option is enabled, no space is taken from lvol store until data is written to lvol bdev.
By default when deleting lvol bdev or resizing down, allocated clusters are unmapped. Optional --clear-method parameter can be passed on creation to change that behavior to writing zeroes or performing no operation.
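The rounding rule mentioned above (the bdev size is rounded up to the nearest multiple of cluster_size) can be sketched as follows. The function name is hypothetical, and the 4 MiB cluster size is only an example value chosen for the illustration.

```python
MIB = 1024 * 1024


def round_up_to_cluster(size_bytes: int, cluster_size: int) -> int:
    """Round size_bytes up to the nearest multiple of cluster_size."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    clusters = -(-size_bytes // cluster_size)  # ceiling division
    return clusters * cluster_size


# With an assumed 4 MiB cluster size, a 10 MiB request occupies 3 clusters.
print(round_up_to_cluster(10 * MIB, 4 * MIB) // MIB)  # 12
```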
### Thin provisioning {#lvol_thin_provisioning}
## Thin provisioning {#lvol_thin_provisioning}
Thin provisioned lvols rely on dynamic cluster allocation (e.g. when the first write operation on a cluster is performed), only space
required to store data is used and unallocated clusters are obtained from underlying device (e.g. zeroes_dev).
Thin provisioned lvols rely on dynamic cluster allocation (e.g. when the first write operation on a cluster is performed), only space required to store data is used and unallocated clusters are obtained from underlying device (e.g. zeroes_dev).
Sample write operations of thin provisioned blob are shown on the diagram below:
@ -52,13 +42,11 @@ Sample read operations and the structure of thin provisioned blob are shown on t
![Reading clusters from thin provisioned blob](lvol_thin_provisioning.svg)
### Snapshots and clone {#lvol_snapshots}
## Snapshots and clone {#lvol_snapshots}
Logical volumes support snapshots and clones functionality. User may at any given time create snapshot of existing
logical volume to save a backup of current volume state. When creating snapshot original volume becomes thin provisioned
and saves only incremental differences from its underlying snapshot. This means that every read from unallocated cluster
is actually a read from the snapshot and every write to unallocated cluster triggers new cluster allocation and data copy
from corresponding cluster in snapshot to the new cluster in logical volume before the actual write occurs.
Logical volumes support snapshots and clones functionality. User may at any given time create snapshot of existing logical volume to save a backup of current volume state.
When creating snapshot original volume becomes thin provisioned and saves only incremental differences from its underlying snapshot. This means that every read from unallocated cluster is actually a read from the snapshot and
every write to unallocated cluster triggers new cluster allocation and data copy from corresponding cluster in snapshot to the new cluster in logical volume before the actual write occurs.
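The read and write behaviour described above can be sketched with a toy in-memory model. This is not the blobstore implementation: the `toy_vol` structure, the tiny cluster size, and both function names are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CLUSTER_BYTES 8u	/* unrealistically small, for illustration */
#define TOY_NUM_CLUSTERS  4u

/* Hypothetical volume: cluster i is allocated when present[i] is true. */
struct toy_vol {
	const struct toy_vol *snapshot;	/* backing snapshot, or NULL */
	bool present[TOY_NUM_CLUSTERS];
	uint8_t data[TOY_NUM_CLUSTERS][TOY_CLUSTER_BYTES];
};

/* Read a cluster: fall through to the snapshot chain when unallocated. */
static inline void
toy_read_cluster(const struct toy_vol *vol, size_t idx, uint8_t *out)
{
	assert(idx < TOY_NUM_CLUSTERS);
	for (; vol != NULL; vol = vol->snapshot) {
		if (vol->present[idx]) {
			memcpy(out, vol->data[idx], TOY_CLUSTER_BYTES);
			return;
		}
	}
	/* No volume in the chain holds the cluster: it reads as zeroes. */
	memset(out, 0, TOY_CLUSTER_BYTES);
}

/* Write: allocate and copy from the snapshot first, then apply the write. */
static inline void
toy_write(struct toy_vol *vol, size_t idx, size_t off, const uint8_t *buf, size_t len)
{
	assert(idx < TOY_NUM_CLUSTERS);
	assert(off <= TOY_CLUSTER_BYTES && len <= TOY_CLUSTER_BYTES - off);
	if (!vol->present[idx]) {
		toy_read_cluster(vol->snapshot, idx, vol->data[idx]);
		vol->present[idx] = true;
	}
	memcpy(vol->data[idx] + off, buf, len);
}
```

In this model the snapshot is never modified: a write to the volume touches only the volume's own copy of the cluster.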
The read operation is performed as shown in the diagram below:
![Reading cluster from clone](lvol_clone_snapshot_read.svg)
@ -66,50 +54,32 @@ The read operation is performed as shown in the diagram below:
The write operation is performed as shown in the diagram below:
![Writing cluster to the clone](lvol_clone_snapshot_write.svg)
User may also create clone of existing snapshot that will be thin provisioned and it will behave in the same way as logical volume
from which snapshot is created. There is no limit of clones and snapshots that may be created as long as there is enough space on
logical volume store. Snapshots are read only. Clones may be created only from snapshots or read only logical volumes.
User may also create clone of existing snapshot that will be thin provisioned and it will behave in the same way as logical volume from which snapshot is created.
There is no limit of clones and snapshots that may be created as long as there is enough space on logical volume store. Snapshots are read only. Clones may be created only from snapshots or read only logical volumes.
A snapshot can be removed only if there is a single clone on top of it. The relation chain will be updated accordingly.
The cluster map of clone and snapshot will be merged and entries for unallocated clusters in the clone will be updated with
addresses from the snapshot cluster map. The entire operation modifies metadata only - no data is copied during this process.
A snapshot can be removed only if there is a single clone on top of it. The relation chain will be updated accordingly. The cluster map of clone and snapshot will be merged and entries for unallocated clusters in the clone
will be updated with addresses from the snapshot cluster map. The entire operation modifies metadata only - no data is copied during this process.
### External Snapshots
## Inflation {#lvol_inflation}
With the external snapshots feature, clones can be made of any bdev. These clones are commonly called *esnap clones*.
Esnap clones work very similarly to thin provisioning. Rather than the back device being a zeroes device, the external snapshot
bdev is used as the back device.
![Clone of External Snapshot](lvol_esnap_clone.svg)
A bdev that is used as an external snapshot cannot be opened for writing by anything else so long as an esnap clone exists.
A bdev may have multiple esnap clones and esnap clones can themselves be snapshotted and cloned.
### Inflation {#lvol_inflation}
Blobs can be inflated to copy data from backing devices (e.g. snapshots) and allocate all remaining clusters. As a result of this
operation all dependencies for the blob are removed.
Blobs can be inflated to copy data from backing devices (e.g. snapshots) and allocate all remaining clusters. As a result of this operation all dependencies for the blob are removed.
![Removing backing blob and bdevs relations using inflate call](lvol_inflate_clone_snapshot.svg)
### Decoupling {#lvol_decoupling}
## Decoupling {#lvol_decoupling}
Blobs can be decoupled from their parent blob by copying data from backing devices (e.g. snapshots) for all allocated clusters.
Remaining unallocated clusters are kept thin provisioned.
Note: When decouple is performed, only single dependency is removed. To remove all dependencies in a chain of blobs depending
on each other, multiple calls need to be issued.
Blobs can be decoupled from their parent blob by copying data from backing devices (e.g. snapshots) for all allocated clusters. Remaining unallocated clusters are kept thin provisioned.
Note: When decouple is performed, only single dependency is removed. To remove all dependencies in a chain of blobs depending on each other, multiple calls need to be issued.
## Configuring Logical Volumes
# Configuring Logical Volumes
There is no static configuration available for logical volumes. All configuration is done through RPC. Information about
logical volumes is kept on block devices.
There is no static configuration available for logical volumes. All configuration is done through RPC. Information about logical volumes is kept on block devices.
## RPC overview {#lvol_rpc}
# RPC overview {#lvol_rpc}
RPC regarding lvolstore:
```bash
```
bdev_lvol_create_lvstore [-h] [-c CLUSTER_SZ] bdev_name lvs_name
Constructs lvolstore on specified bdev with specified name. During
construction bdev is unmapped at initialization and all data is
@ -141,7 +111,7 @@ bdev_lvol_rename_lvstore [-h] old_name new_name
RPC regarding lvol and spdk bdev:
```bash
```
bdev_lvol_create [-h] [-u UUID] [-l LVS_NAME] [-t] [-c CLEAR_METHOD] lvol_name size
Creates lvol with specified size and name on lvolstore specified by its uuid
or name. Then constructs spdk bdev on top of that lvol and presents it as spdk bdev.
@ -150,12 +120,6 @@ bdev_lvol_create [-h] [-u UUID] [-l LVS_NAME] [-t] [-c CLEAR_METHOD] lvol_name s
optional arguments:
-h, --help show help
-c, --clear-method specify data clusters clear method "none", "unmap" (default), "write_zeroes"
bdev_lvol_get_lvols [-h] [-u LVS_UUID] [-l LVS_NAME]
Display logical volume list, including those that do not have associated bdevs.
optional arguments:
-h, --help show help
-u LVS_UUID, --lvs_uuid UUID show volumes only in the specified lvol store
-l LVS_NAME, --lvs_name LVS_NAME show volumes only in the specified lvol store
bdev_get_bdevs [-h] [-b NAME]
User can view created bdevs using this call including those created on top of lvols.
optional arguments:
@ -173,10 +137,6 @@ bdev_lvol_clone [-h] snapshot_name clone_name
Create a clone with clone_name of a given lvol snapshot.
optional arguments:
-h, --help show help
bdev_lvol_clone_bdev [-h] bdev_name_or_uuid lvs_name clone_name
Create a clone with clone_name of a bdev. The bdev must not be an lvol in the lvs_name lvstore.
optional arguments:
-h, --help show help
bdev_lvol_rename [-h] old_name new_name
Change lvol bdev name
optional arguments:
@ -197,12 +157,4 @@ bdev_lvol_decouple_parent [-h] name
Decouple parent of a logical volume
optional arguments:
-h, --help show help
bdev_lvol_set_xattr [-h] name xattr_name xattr_value
Set xattr for lvol bdev
optional arguments:
-h, --help show help
bdev_lvol_get_xattr [-h] name xattr_name
Get xattr for lvol bdev
optional arguments:
-h, --help show help
```


@ -92,7 +92,7 @@ SPDK must be allocated using spdk_dma_malloc() or its siblings. The buffers
must be allocated specifically so that they are pinned and so that physical
addresses are known.
## IOMMU Support
# IOMMU Support
Many platforms contain an extra piece of hardware called an I/O Memory
Management Unit (IOMMU). An IOMMU is much like a regular MMU, except it


@ -2,4 +2,3 @@
- @subpage peer_2_peer
- @subpage containers
- @subpage rpms


@ -9,32 +9,32 @@ do not poll frequently enough, events may be lost. All events are identified by
monotonically increasing integer, so missing events may be detected, although
not recovered.
## Register event types {#notify_register}
# Register event types {#notify_register}
During initialization the sender library should register its own event types using
`spdk_notify_type_register(const char *type)`. Parameter 'type' is the name of
notification type.
## Get info about events {#notify_get_info}
# Get info about events {#notify_get_info}
A consumer can get information about the available event types during runtime using
`spdk_notify_foreach_type`, which iterates over registered notification types and
calls a callback on each of them, so that user can produce detailed information
about notification.
## Get new events {#notify_listen}
# Get new events {#notify_listen}
A consumer can get events by calling function `spdk_notify_foreach_event`.
The caller should specify last received event and the maximum number of invocations.
There might be multiple consumers of each event. The event bus is implemented as a
circular buffer, so older events may be overwritten by newer ones.
## Send events {#notify_send}
# Send events {#notify_send}
When an event occurs, a library can invoke `spdk_notify_send` with two strings.
One containing the type of the event, like "spdk_bdev_register", second with context,
for example "Nvme0n1"
## RPC Calls {#rpc_calls}
# RPC Calls {#rpc_calls}
See [JSON-RPC documentation](jsonrpc.md/#rpc_notify_get_types)

96
doc/nvme-cli.md Normal file

@ -0,0 +1,96 @@
# nvme-cli {#nvme-cli}
# nvme-cli with SPDK Getting Started Guide
Now nvme-cli can support both kernel driver and SPDK user mode driver for most of its available commands and
Intel specific commands.
1. Clone the nvme-cli repository from the SPDK GitHub fork. Make sure you check out the spdk-1.6 branch.
~~~{.sh}
git clone -b spdk-1.6 https://github.com/spdk/nvme-cli.git
~~~
2. Clone the SPDK repository from https://github.com/spdk/spdk under the nvme-cli folder.
3. Refer to the "README.md" under SPDK folder to properly build SPDK.
4. Refer to the "README.md" under nvme-cli folder to properly build nvme-cli.
5. Execute "<spdk_folder>/scripts/setup.sh" with the "root" account.
6. Update the "spdk.conf" file under nvme-cli folder to properly configure the SPDK. Notes as following:
~~~{.sh}
spdk=1
Indicates whether or not to use spdk. Can be 0 (off) or 1 (on).
Defaults to 1 which assumes that you have run "<spdk_folder>/scripts/setup.sh", unbinding your drives from the kernel.
core_mask=0x1
A bitmask representing which core(s) to use for nvme-cli operations.
Defaults to core 0.
mem_size=512
The amount of reserved hugepage memory to use for nvme-cli (in MB).
Defaults to 512MB.
shm_id=0
Indicates the shared memory ID for the spdk application with which your NVMe drives are associated,
and should be adjusted accordingly.
Defaults to 0.
~~~
7. Run the "./nvme list" command to get the domain:bus:device.function for each found NVMe SSD.
8. Run the other nvme commands with domain:bus:device.function instead of "/dev/nvmeX" for the specified device.
~~~{.sh}
Example: ./nvme smart-log 0000:01:00.0
~~~
9. Run the "./nvme intel" commands for Intel specific commands against Intel NVMe SSD.
~~~{.sh}
Example: ./nvme intel internal-log 0000:08:00.0
~~~
10. Execute "<spdk_folder>/scripts/setup.sh reset" with the "root" account and update "spdk=0" in spdk.conf to
use the kernel driver if wanted.
## Use scenarios
### Run as the only SPDK application on the system
1. Set "spdk=1" in spdk.conf. If the system has fewer cores or less memory, update the spdk.conf accordingly.
### Run together with other running SPDK applications on shared NVMe SSDs
1. For the other running SPDK application, start with the parameter like "-i 1" to have the same "shm_id".
2. Use the default spdk.conf setting where "shm_id=1" to start the nvme-cli.
3. If other SPDK applications run with different shm_id parameter, update the "spdk.conf" accordingly.
### Run with other running SPDK applications on non-shared NVMe SSDs
1. Properly configure the other running SPDK applications.
~~~{.sh}
a. Only access the NVMe SSDs it wants.
b. Allocate a fixed amount of memory instead of all available memory.
~~~
2. Properly configure the spdk.conf setting for nvme-cli.
~~~{.sh}
a. Not access the NVMe SSDs from other SPDK applications.
b. Change the mem_size to a proper size.
~~~
## Note
1. To run the newly built nvme-cli, either invoke it explicitly as "./nvme" or add it to $PATH, so that
another already installed version is not invoked by mistake.
2. To run the newly built nvme-cli with SPDK support in arbitrary directory, copy "spdk.conf" to that
directory from the nvme cli folder and update the configuration as suggested.


@ -1,18 +1,17 @@
# NVMe Driver {#nvme}
## In this document {#nvme_toc}
# In this document {#nvme_toc}
- @ref nvme_intro
- @ref nvme_examples
- @ref nvme_interface
- @ref nvme_design
- @ref nvme_fabrics_host
- @ref nvme_multi_process
- @ref nvme_hotplug
- @ref nvme_cuse
- @ref nvme_led
* @ref nvme_intro
* @ref nvme_examples
* @ref nvme_interface
* @ref nvme_design
* @ref nvme_fabrics_host
* @ref nvme_multi_process
* @ref nvme_hotplug
* @ref nvme_cuse
## Introduction {#nvme_intro}
# Introduction {#nvme_intro}
The NVMe driver is a C library that may be linked directly into an application
that provides direct, zero-copy data transfer to and from
@ -30,23 +29,23 @@ devices via NVMe over Fabrics. Users may now call spdk_nvme_probe() on both
local PCI busses and on remote NVMe over Fabrics discovery services. The API is
otherwise unchanged.
## Examples {#nvme_examples}
# Examples {#nvme_examples}
### Getting Started with Hello World {#nvme_helloworld}
## Getting Started with Hello World {#nvme_helloworld}
There are a number of examples provided that demonstrate how to use the NVMe
library. They are all in the [examples/nvme](https://github.com/spdk/spdk/tree/master/examples/nvme)
directory in the repository. The best place to start is
[hello_world](https://github.com/spdk/spdk/blob/master/examples/nvme/hello_world/hello_world.c).
### Running Benchmarks with Fio Plugin {#nvme_fioplugin}
## Running Benchmarks with Fio Plugin {#nvme_fioplugin}
SPDK provides a plugin to the very popular [fio](https://github.com/axboe/fio)
tool for running some basic benchmarks. See the fio start up
[guide](https://github.com/spdk/spdk/blob/master/examples/nvme/fio_plugin/)
for more details.
### Running Benchmarks with Perf Tool {#nvme_perf}
## Running Benchmarks with Perf Tool {#nvme_perf}
NVMe perf utility in the [examples/nvme/perf](https://github.com/spdk/spdk/tree/master/examples/nvme/perf)
is one of the examples which also can be used for performance tests. The fio
@ -80,7 +79,7 @@ perf -q 1 -o 4096 -w write -r 'trtype:PCIe traddr:0000:04:00.0' -t 300 -e 'PRACT
perf -q 1 -o 4096 -w read -r 'trtype:PCIe traddr:0000:04:00.0' -t 200 -e 'PRACT=0,PRCKH=GUARD'
~~~
## Public Interface {#nvme_interface}
# Public Interface {#nvme_interface}
- spdk/nvme.h
@ -104,9 +103,9 @@ spdk_nvme_ctrlr_process_admin_completions() | @copybrief spdk_nvme_ctrlr_process
spdk_nvme_ctrlr_cmd_io_raw() | @copybrief spdk_nvme_ctrlr_cmd_io_raw()
spdk_nvme_ctrlr_cmd_io_raw_with_md() | @copybrief spdk_nvme_ctrlr_cmd_io_raw_with_md()
## NVMe Driver Design {#nvme_design}
# NVMe Driver Design {#nvme_design}
### NVMe I/O Submission {#nvme_io_submission}
## NVMe I/O Submission {#nvme_io_submission}
I/O is submitted to an NVMe namespace using nvme_ns_cmd_xxx functions. The NVMe
driver submits the I/O request as an NVMe submission queue entry on the queue
@ -118,21 +117,21 @@ spdk_nvme_qpair_process_completions().
@sa spdk_nvme_ns_cmd_read, spdk_nvme_ns_cmd_write, spdk_nvme_ns_cmd_dataset_management,
spdk_nvme_ns_cmd_flush, spdk_nvme_qpair_process_completions
#### Fused operations {#nvme_fuses}
### Fused operations {#nvme_fuses}
To "fuse" two commands, the first command should have the SPDK_NVME_IO_FLAGS_FUSE_FIRST
io flag set, and the next one should have the SPDK_NVME_IO_FLAGS_FUSE_SECOND.
In addition, the following rules must be met to execute two commands as an atomic unit:
- The commands shall be inserted next to each other in the same submission queue.
- The LBA range should be the same for the two commands.
- The commands shall be inserted next to each other in the same submission queue.
- The LBA range should be the same for the two commands.
E.g. To send fused compare and write operation user must call spdk_nvme_ns_cmd_compare
followed with spdk_nvme_ns_cmd_write and make sure no other operations are submitted
in between on the same queue, like in example below:
~~~c
~~~
rc = spdk_nvme_ns_cmd_compare(ns, qpair, cmp_buf, 0, 1, nvme_fused_first_cpl_cb,
NULL, SPDK_NVME_CMD_FUSE_FIRST);
if (rc != 0) {
@ -150,7 +149,7 @@ The NVMe specification currently defines compare-and-write as a fused operation.
Support for compare-and-write is reported by the controller flag
SPDK_NVME_CTRLR_COMPARE_AND_WRITE_SUPPORTED.
#### Scaling Performance {#nvme_scaling}
### Scaling Performance {#nvme_scaling}
NVMe queue pairs (struct spdk_nvme_qpair) provide parallel submission paths for
I/O. I/O may be submitted on multiple queue pairs simultaneously from different
@ -183,7 +182,7 @@ require that data should be done by sending a request to the owning thread.
This results in a message passing architecture, as opposed to a locking
architecture, and will result in superior scaling across CPU cores.
### NVMe Driver Internal Memory Usage {#nvme_memory_usage}
## NVMe Driver Internal Memory Usage {#nvme_memory_usage}
The SPDK NVMe driver provides a zero-copy data transfer path, which means that
there are no data buffers for I/O commands. However, some Admin commands have
@ -203,12 +202,12 @@ Each submission queue entry (SQE) and completion queue entry (CQE) consumes 64 b
and 16 bytes respectively. Therefore, the maximum memory used for each I/O queue
pair is (MQES + 1) * (64 + 16) Bytes.
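As a worked instance of the formula above, the following sketch computes the upper bound for a given MQES value. The function name is hypothetical; the only inputs are the entry sizes quoted above and MQES, which is a zero-based 16-bit field in the controller capabilities register.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SQE_BYTES 64u	/* submission queue entry size */
#define TOY_CQE_BYTES 16u	/* completion queue entry size */

/*
 * Hypothetical helper: maximum bytes of queue memory for one I/O queue pair.
 * mqes is zero-based, so the queue holds (mqes + 1) entries.
 */
static inline uint64_t
toy_io_qpair_max_bytes(uint32_t mqes)
{
	assert(mqes <= UINT16_MAX);
	return ((uint64_t)mqes + 1) * (TOY_SQE_BYTES + TOY_CQE_BYTES);
}
```

For a controller reporting MQES = 1023, this gives 1024 * 80 = 81920 bytes per I/O queue pair.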
## NVMe over Fabrics Host Support {#nvme_fabrics_host}
# NVMe over Fabrics Host Support {#nvme_fabrics_host}
The NVMe driver supports connecting to remote NVMe-oF targets and
interacting with them in the same manner as local NVMe SSDs.
### Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}
## Specifying Remote NVMe over Fabrics Targets {#nvme_fabrics_trid}
The method for connecting to a remote NVMe-oF target is very similar
to the normal enumeration process for local PCIe-attached NVMe devices.
@ -229,11 +228,11 @@ single NVM subsystem directly, the NVMe library will call `probe_cb`
for just that subsystem; this allows the user to skip the discovery step
and connect directly to a subsystem with a known address.
### RDMA Limitations
## RDMA Limitations
Please refer to NVMe-oF target's @ref nvmf_rdma_limitations
## NVMe Multi Process {#nvme_multi_process}
# NVMe Multi Process {#nvme_multi_process}
This capability enables the SPDK NVMe driver to support multiple processes accessing the
same NVMe device. The NVMe driver allocates critical structures from shared memory, so
@ -244,7 +243,7 @@ The primary motivation for this feature is to support management tools that can
to long running applications, perform some maintenance work or gather information, and
then detach.
### Configuration {#nvme_multi_process_configuration}
## Configuration {#nvme_multi_process_configuration}
DPDK EAL allows different types of processes to be spawned, each with different permissions
on the hugepage memory used by the applications.
@ -270,7 +269,7 @@ Example: identical shm_id and non-overlapping core masks
./perf -q 8 -o 131072 -w write -c 0x10 -t 60 -i 1
~~~
### Limitations {#nvme_multi_process_limitations}
## Limitations {#nvme_multi_process_limitations}
1. Two processes sharing memory may not share any cores in their core mask.
2. If a primary process exits while secondary processes are still running, those processes
@ -281,7 +280,7 @@ Example: identical shm_id and non-overlapping core masks
@sa spdk_nvme_probe, spdk_nvme_ctrlr_process_admin_completions
## NVMe Hotplug {#nvme_hotplug}
# NVMe Hotplug {#nvme_hotplug}
At the NVMe driver level, we provide the following support for Hotplug:
@ -301,9 +300,9 @@ At the NVMe driver level, we provide the following support for Hotplug:
@sa spdk_nvme_probe
## NVMe Character Devices {#nvme_cuse}
# NVMe Character Devices {#nvme_cuse}
### Design
This feature is considered experimental.
![NVMe character devices processing diagram](nvme_cuse.svg)
@ -323,78 +322,14 @@ nvme_io_msg_process().
Ioctls that request information attained when attaching NVMe controller receive an
immediate response, without passing them through the ring.
This interface reserves one additional qpair for sending down the I/O for each controller.
This interface reserves one qpair for sending down the I/O for each controller.
### Usage
## Enabling cuse support for NVMe
#### Enabling cuse support for NVMe
Cuse support is disabled by default. To enable support for NVMe devices SPDK
must be compiled with "./configure --with-nvme-cuse".
Cuse support is disabled by default. To enable support for NVMe-CUSE devices first
install required dependencies
~~~{.sh}
sudo scripts/pkgdep.sh --fuse
~~~
Then compile SPDK with "./configure --with-nvme-cuse".
#### Creating NVMe-CUSE device
First make sure to prepare the environment (see @ref getting_started).
This includes loading CUSE kernel module.
Any NVMe controller attached to a running SPDK application can be
exposed via NVMe-CUSE interface. When closing SPDK application,
the NVMe-CUSE devices are unregistered.
~~~{.sh}
$ sudo scripts/setup.sh
$ sudo modprobe cuse
$ sudo build/bin/spdk_tgt
# Continue in another session
$ sudo scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:82:00.0
Nvme0n1
$ sudo scripts/rpc.py bdev_nvme_get_controllers
[
{
"name": "Nvme0",
"trid": {
"trtype": "PCIe",
"traddr": "0000:82:00.0"
}
}
]
$ sudo scripts/rpc.py bdev_nvme_cuse_register -n Nvme0
$ ls /dev/spdk/
nvme0 nvme0n1
~~~
#### Example of using nvme-cli
Most nvme-cli commands can point to specific controller or namespace by providing a path to it.
This can be leveraged to issue commands to the SPDK NVMe-CUSE devices.
~~~{.sh}
sudo nvme id-ctrl /dev/spdk/nvme0
sudo nvme smart-log /dev/spdk/nvme0
sudo nvme id-ns /dev/spdk/nvme0n1
~~~
Note: `nvme list` command does not display SPDK NVMe-CUSE devices,
see nvme-cli [PR #773](https://github.com/linux-nvme/nvme-cli/pull/773).
#### Examples of using smartctl
smartctl tool recognizes device type based on the device path. If none of expected
patterns match, SCSI translation layer is used to identify device.
To use smartctl '-d nvme' parameter must be used in addition to full path to
the NVMe device.
~~~{.sh}
smartctl -d nvme -i /dev/spdk/nvme0
smartctl -d nvme -H /dev/spdk/nvme1
...
~~~
### Limitations
## Limitations
NVMe namespaces are created as character devices and their use may be limited for
tools expecting block devices.
@ -408,11 +343,16 @@ with SPDK NVMe CUSE.
SCSI to NVMe Translation Layer is not implemented. Tools that are using this layer to
identify, manage or operate device might not work properly or their use may be limited.
## NVMe LED management {#nvme_led}
### Examples of using smartctl
It is possible to use the ledctl(8) utility to control the state of LEDs in systems supporting
NPEM (Native PCIe Enclosure Management), even when the NVMe devices are controlled by SPDK.
However, in this case it is necessary to determine the slot device number because the block device
is unavailable. The [ledctl.sh](https://github.com/spdk/spdk/tree/master/scripts/ledctl.sh) script
can be used to help with this. It takes the name of the nvme bdev and invokes ledctl with
appropriate options.
smartctl tool recognizes device type based on the device path. If none of expected
patterns match, SCSI translation layer is used to identify device.
To use smartctl '-d nvme' parameter must be used in addition to full path to
the NVMe device.
~~~{.sh}
smartctl -d nvme -i /dev/spdk/nvme0
smartctl -d nvme -H /dev/spdk/nvme1
...
~~~


@ -1,166 +0,0 @@
# NVMe Multipath {#nvme_multipath}
## Introduction
The NVMe bdev module supports two modes: failover and multipath. In failover mode, only one
active connection is maintained and alternate paths are connected only during the switch-over.
This can lead to delays and failed I/O reported to upper layers, but it does reduce the number
of active connections at any given time. In multipath, active connections are maintained for
every path and used based on a policy of either active-passive or active-active. The multipath
mode also supports Asymmetric Namespace Access (ANA) and uses that to make policy decisions.
## Design
### Multipath Mode
A user may establish connections on multiple independent paths to the same NVMe-oF subsystem
for NVMe bdevs by calling the `bdev_nvme_attach_controller` RPC multiple times with the same NVMe
bdev controller name. Additionally, the `multipath` parameter for this RPC must be set to
"multipath" when connecting the second or later paths.
For each path created by the `bdev_nvme_attach_controller` RPC, an NVMe-oF controller is created.
Then the set of namespaces presented by that controller are discovered. For each namespace found,
the NVMe bdev module attempts to match it with an existing NVMe bdev. If it finds a match, it adds
the given namespace as an alternate path. If it does not find a match, it creates a new NVMe bdev.
I/O and admin qpairs are necessary to access an NVMe-oF controller. A single admin qpair is created
and is shared by all SPDK threads. To submit I/O without taking locks, for each SPDK thread, an I/O
qpair is created as a dynamic context of an I/O channel for an NVMe-oF controller.
For each SPDK thread, the NVMe bdev module creates an I/O channel for an NVMe bdev and provides it to
the upper layer. The I/O channel for the NVMe bdev has an I/O path for each namespace. I/O path is
an additional abstraction to submit I/O to a namespace, and consists of an I/O qpair context and a
namespace. If an NVMe bdev has multiple namespaces, an I/O channel for the NVMe bdev has a list of
multiple I/O paths. The I/O channel for the NVMe bdev has a retry I/O list and has a path selection
policy.
### Path Error Recovery
If the NVMe driver detects an error on a qpair, it disconnects the qpair and reports the error to
the NVMe bdev module. The NVMe bdev module then starts resetting the corresponding NVMe-oF controller.
The NVMe-oF controller reset consists of the following steps: 1) disconnect and delete all I/O qpairs,
2) disconnect admin qpair, 3) connect admin qpair, 4) configure the NVMe-oF controller, and
5) create and connect all I/O qpairs.
If step 3, 4, or 5 fails, the reset reverts to step 3 and is retried after
`reconnect_delay_sec` seconds. The NVMe-oF controller is deleted automatically if it is not
recovered within `ctrlr_loss_timeout_sec` seconds. If `ctrlr_loss_timeout_sec` is -1, it retries
indefinitely.
By default, error detection on a qpair is very slow for TCP and RDMA transports. For fast error
detection, a global option, `transport_ack_timeout`, is useful.
### Path Selection
Multipath mode supports two path selection policies, active-passive or active-active.
For both path selection policies, only ANA optimal I/O paths are used unless there are no ANA
optimal I/O paths available.
For active-passive policy, each I/O channel for an NVMe bdev has a cache to store the first found
I/O path which is connected and optimal from ANA and use it for I/O submission. Some users may want
to specify the preferred I/O path manually. They can dynamically set the preferred I/O path using
the `bdev_nvme_set_preferred_path` RPC. Such assignment is realized naturally by moving the
I/O path to the head of the I/O path list. By default, if the preferred I/O path is restored,
failback to it is done automatically. The automatic failback can be disabled by a global option
`disable_auto_failback`. In this case, the `bdev_nvme_set_preferred_path` RPC can be used
to do manual failback.
The active-active policy uses the round-robin algorithm and submits an I/O to each I/O path in
circular order.
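The selection rules above can be sketched as a round-robin walk that prefers ANA optimal paths. This is an illustration only: the `toy_path` structure and `toy_select_path` are hypothetical, and the fallback to a connected non-optimal path is an assumption about what happens when no optimal path exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-path state. */
struct toy_path {
	bool connected;
	bool ana_optimal;
};

/*
 * Pick the next path after index `last` in circular order (pass -1 for the
 * first selection). Prefers connected ANA optimal paths; if none exist,
 * returns the first connected path found (assumed fallback), or -1.
 */
static inline int
toy_select_path(const struct toy_path *paths, int n, int last)
{
	int fallback = -1;

	assert(paths != NULL && n > 0 && last >= -1 && last < n);
	for (int i = 1; i <= n; i++) {
		int idx = (last + i) % n;

		if (!paths[idx].connected) {
			continue;
		}
		if (paths[idx].ana_optimal) {
			return idx;
		}
		if (fallback < 0) {
			fallback = idx;
		}
	}
	return fallback;
}
```

Calling the function repeatedly with the previously returned index rotates through the optimal paths, which is the active-active behaviour; an active-passive policy would instead cache one result and keep using it.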
### I/O Retry
The NVMe bdev module has a global option, `bdev_retry_count`, to control the number of retries when
an I/O is returned with error. Each I/O has a retry count. If the retry count of an I/O is less than
the `bdev_retry_count`, the I/O is allowed to retry and the retry count is incremented.
NOTE: The `bdev_retry_count` is not directly used but is required to be non-zero for the process
of multipath mode failing over to a different path because the retry count is checked first always
when an I/O is returned with error.
Each I/O has a timer to schedule an I/O retry at a particular time in the future. Each I/O channel
for an NVMe bdev has a sorted I/O retry list. Retried I/Os are inserted into the I/O retry list.
If an I/O is returned with error, the I/O completion handler in the NVMe bdev module executes the
following steps:
1. If the DNR (Do Not Retry) bit is set or the retry count exceeds the limit, then complete the
I/O with the returned error.
2. If the error is a path error, insert the I/O to the I/O retry list with no delay.
3. Otherwise, insert the I/O to the I/O retry list with the delay reported by the CRD (Command
Retry Delay).
Then the I/O retry poller is scheduled to the closest expiration. If there is no retried I/O,
the I/O retry poller is stopped.
When submitting an I/O, there may be no available I/O path. If there is any I/O path which is
recovering, the I/O is inserted to the I/O retry list with one second delay. This may result in
queueing many I/Os indefinitely. To avoid such indefinite queueing, per NVMe-oF controller option,
`fast_io_fail_timeout_sec`, is added. If the corresponding NVMe-oF controller is not recovered
within `fast_io_fail_timeout_sec` seconds, the I/O is not queued to wait for the recovery but is returned
with an I/O error to the upper layer.
### Asymmetric Namespace Access (ANA) Handling
If an I/O is returned with an ANA error or an ANA change notice event is received, the ANA log page
may be changed. In this case, the NVMe bdev module reads the ANA log page to check the ANA state
changes.
As described before, only ANA optimal I/O paths will be used unless there are no ANA optimal paths
available.
If an I/O path is in ANA transition, i.e., its namespace reports the ANA inaccessible state or the ANA
change state, the NVMe bdev module queues I/Os to wait until the namespace becomes accessible again.
The ANA transition should end within the ANATT (ANA Transition Time) seconds. If the namespace does
not report the ANA optimized state or the ANA accessible state within the ANATT seconds, I/Os are
returned with an I/O error to the upper layer.
### I/O Timeout
The NVMe driver supports I/O timeout for submitted I/Os. The NVMe bdev module provides three
actions when an I/O timeout is notified from the NVMe driver, ABORT, RESET, or NONE. Users can
choose one of the actions as a global option, `action_on_timeout`. Users can set different timeout
values for I/O commands and admin commands by global options, `timeout_us` and `timeout_admin_us`.
For ABORT, the NVMe bdev module tries aborting the timed out I/O, and if the abort fails, it starts the
NVMe-oF controller reset. For RESET, the NVMe bdev module starts the NVMe-oF controller reset.
## Usage
The following is an example to attach two NVMe-oF controllers and aggregate these into a single
NVMe bdev controller `Nvme0`.
```bash
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t rdma -a 192.168.100.8 -s 4420 -f ipv4 -n nqn.2016-06.io.spdk:cnode1 -l -1 -o 20
./scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t rdma -a 192.168.100.9 -s 4420 -f ipv4 -n nqn.2016-06.io.spdk:cnode1 -l -1 -o 20 -x multipath
```
In this example, if these two NVMe-oF controllers have a shared namespace whose namespace ID is 1,
a single NVMe bdev `Nvme0n1` is created. For the NVMe bdev module, the default value of
`bdev_retry_count` is 3 and I/O retry is enabled by default. `ctrlr_loss_timeout_sec` is set to -1
(`-l -1`) and `reconnect_delay_sec` is set to 20 (`-o 20`). Hence, the NVMe-oF controller reconnect is
retried once every 20 seconds until it succeeds.
To confirm that multipath is configured correctly, two RPCs, `bdev_get_bdevs` and
`bdev_nvme_get_controllers`, are available.
```bash
./scripts/rpc.py bdev_get_bdevs -b Nvme0n1
./scripts/rpc.py bdev_nvme_get_controllers -n Nvme0
```
To monitor the current multipath state, an RPC, `bdev_nvme_get_io_paths`, is available.
```bash
./scripts/rpc.py bdev_nvme_get_io_paths -n Nvme0n1
```
## Limitations
SPDK NVMe multipath is transport protocol independent. Heterogeneous multipath configuration (e.g.,
TCP and RDMA) is supported. However, in this type of configuration, memory domains are not yet
available because only the RDMA transport currently supports them.
The RPCs `bdev_get_iostat` and `bdev_nvme_get_transport_statistics` display I/O statistics, but
neither is aware of multipath.


@ -20,8 +20,8 @@ registers involved that are called doorbells.
An I/O is submitted to an NVMe device by constructing a 64 byte command, placing
it into the submission queue at the current location of the submission queue
tail index, and then writing the new index of the submission queue tail to the
submission queue tail doorbell register. It's actually valid to copy a whole set
head index, and then writing the new index of the submission queue head to the
submission queue head doorbell register. It's actually valid to copy a whole set
of commands into open slots in the ring and then write the doorbell just one
time to submit the whole batch.
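The batching idea can be illustrated with plain ring arithmetic, following the tail-doorbell wording of this paragraph (in NVMe, the host advances the submission queue tail). This is only a sketch: `QUEUE_SIZE`, `sq_tail`, and `submit_batch` are hypothetical names, and the "doorbell write" is just a printed line rather than an MMIO register write.

```bash
# Illustrative ring arithmetic only; no hardware is touched.
QUEUE_SIZE=8      # hypothetical number of slots in the submission queue
sq_tail=6         # hypothetical current tail index

# submit_batch N: place N commands into the ring, then ring the doorbell once.
submit_batch() {
    local n=$1 i
    for (( i = 0; i < n; i++ )); do
        # a real driver would copy a 64 byte command into slot $sq_tail here
        sq_tail=$(( (sq_tail + 1) % QUEUE_SIZE ))
    done
    # a single doorbell write carries the new tail index for the whole batch
    echo "doorbell <- $sq_tail"
}
```

With the values above, `submit_batch 3` fills slots 6, 7, and 0, and reports a single doorbell write with the new tail index 1.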


@ -3,7 +3,7 @@
@sa @ref nvme_fabrics_host
@sa @ref nvmf_tgt_tracepoints
## NVMe-oF Target Getting Started Guide {#nvmf_getting_started}
# NVMe-oF Target Getting Started Guide {#nvmf_getting_started}
The SPDK NVMe over Fabrics target is a user space application that presents block devices over a fabric
such as Ethernet, InfiniBand or Fibre Channel. SPDK currently supports RDMA and TCP transports.
@ -106,14 +106,20 @@ using 1GB hugepages or by pre-reserving memory at application startup with `--me
option. All pre-reserved memory will be registered as a single region, but won't be returned to the
system until the SPDK application is terminated.
Another known issue occurs when using the E810 NICs in RoCE mode. Specifically, the NVMe-oF target
sometimes cannot destroy a qpair because its posted work requests don't get flushed. This can
prevent the NVMe-oF target application from terminating cleanly.
## TCP transport support {#nvmf_tcp_transport}
The transport is built into the nvmf_tgt by default, and it does not need any special libraries.
## Configuring the SPDK NVMe over Fabrics Target {#nvmf_config}
An NVMe over Fabrics target can be configured using JSON RPCs.
The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about
working with NVMe over Fabrics specific RPCs can be found on the @ref jsonrpc_components_nvmf_tgt RPC page.
Using .ini style configuration files for configuration of the NVMe-oF target is deprecated and should
be replaced with JSON based RPCs. .ini style configuration files can be converted to json format by way
of the new script `scripts/config_converter.py`.
## FC transport support {#nvmf_fc_transport}
To build nvmf_tgt with the FC transport, there is an additional FC LLD (Low Level Driver) code dependency.
@ -130,33 +136,29 @@ After cloning SPDK repo and initialize submodules, FC LLD library is built which
the fc transport.
~~~{.sh}
git clone https://github.com/spdk/spdk --recursive
git clone https://github.com/spdk/spdk spdk
git clone https://github.com/ecdufcdrvr/bcmufctdrvr fc
cd fc
cd spdk
git submodule update --init
cd ../fc
make DPDK_DIR=../spdk/dpdk/build SPDK_DIR=../spdk
cd ../spdk
./configure --with-fc=../fc/build
make
~~~
## Configuring the SPDK NVMe over Fabrics Target {#nvmf_config}
An NVMe over Fabrics target can be configured using JSON RPCs.
The basic RPCs needed to configure the NVMe-oF subsystem are detailed below. More information about
working with NVMe over Fabrics specific RPCs can be found on the @ref jsonrpc_components_nvmf_tgt RPC page.
### Using RPCs {#nvmf_config_rpc}
Start the nvmf_tgt application with elevated privileges. Once the target is started,
the nvmf_create_transport rpc can be used to initialize a given transport. Below is an
example where the target is started and configured with two different transports.
The RDMA transport is configured with an I/O unit size of 8192 bytes, max I/O size 131072 and an
in capsule data size of 8192 bytes. The TCP transport is configured with an I/O unit size of
The RDMA transport is configured with an I/O unit size of 8192 bytes, 4 max qpairs per controller,
and an in capsule data size of 0 bytes. The TCP transport is configured with an I/O unit size of
16384 bytes, 8 max qpairs per controller, and an in capsule data size of 8192 bytes.
~~~{.sh}
build/bin/nvmf_tgt
scripts/rpc.py nvmf_create_transport -t RDMA -u 8192 -i 131072 -c 8192
scripts/rpc.py nvmf_create_transport -t RDMA -u 8192 -m 4 -c 0
scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
~~~
@ -184,8 +186,7 @@ Basic Types
year = 4 * digit ;
month = '01' | '02' | '03' | '04' | '05' | '06' | '07' | '08' | '09' | '10' | '11' | '12' ;
digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
hex digit = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | '0' |
'1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
hex digit = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
NQN Definition
NVMe Qualified Name = ( NVMe-oF Discovery NQN | NVMe UUID NQN | NVMe Domain NQN ), '\0' ;
@ -268,8 +269,3 @@ nvme disconnect -n "nqn.2016-06.io.spdk:cnode1"
SPDK has a tracing framework for capturing low-level event information at runtime.
@ref nvmf_tgt_tracepoints enable analysis of both performance and application crashes.
## Enabling NVMe-oF Multipath
The SPDK NVMe-oF target and initiator support multiple independent paths to the same NVMe-oF subsystem.
For step-by-step instructions for configuring and switching between paths, see @ref nvmf_multipath_howto .


@ -1,103 +0,0 @@
# NVMe-oF Multipath HOWTO {#nvmf_multipath_howto}
This HOWTO provides step-by-step instructions for setting up a simple SPDK deployment and testing multipath.
It demonstrates configuring path preferences with Asymmetric Namespace Access (ANA), as well as round-robin
path load balancing.
## Build SPDK on both the initiator and target servers
Clone the repo:
~~~{.sh}
git clone https://github.com/spdk/spdk --recursive
~~~
Configure and build SPDK:
~~~{.sh}
cd spdk/
./configure
make -j16
~~~
## Setup hugepages
This should be run once on each server (and after reboots):
~~~{.sh}
cd spdk/
./scripts/setup.sh
~~~
## On target: start and configure SPDK
Start the target in the background and configure it:
~~~{.sh}
cd spdk/
./build/bin/nvmf_tgt -m 0x3 &
./scripts/rpc.py nvmf_create_transport -t tcp -o -u 8192
~~~
Create a subsystem, with `-r` to enable ANA reporting feature:
~~~{.sh}
./scripts/rpc.py nvmf_create_subsystem nqn.2022-02.io.spdk:cnode0 -a -s SPDK00000000000001 -r
~~~
Create and add a malloc block device:
~~~{.sh}
./scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
./scripts/rpc.py nvmf_subsystem_add_ns nqn.2022-02.io.spdk:cnode0 Malloc0
~~~
Add two listeners, each with a different `IP:port` pair:
~~~{.sh}
./scripts/rpc.py nvmf_subsystem_add_listener -t tcp -a 172.17.1.13 -s 4420 nqn.2022-02.io.spdk:cnode0
./scripts/rpc.py nvmf_subsystem_add_listener -t tcp -a 172.18.1.13 -s 5520 nqn.2022-02.io.spdk:cnode0
~~~
## On initiator: start and configure bdevperf
Launch the bdevperf process in the background:
~~~{.sh}
cd spdk/
./build/examples/bdevperf -m 0x4 -z -r /tmp/bdevperf.sock -q 128 -o 4096 -w verify -t 90 &> bdevperf.log &
~~~
Configure bdevperf and add two paths:
~~~{.sh}
./scripts/rpc.py -s /tmp/bdevperf.sock bdev_nvme_set_options -r -1
./scripts/rpc.py -s /tmp/bdevperf.sock bdev_nvme_attach_controller -b Nvme0 -t tcp -a 172.17.1.13 -s 4420 -f ipv4 -n nqn.2022-02.io.spdk:cnode0 -l -1 -o 10
./scripts/rpc.py -s /tmp/bdevperf.sock bdev_nvme_attach_controller -b Nvme0 -t tcp -a 172.18.1.13 -s 5520 -f ipv4 -n nqn.2022-02.io.spdk:cnode0 -x multipath -l -1 -o 10
~~~
## Launch a bdevperf test
Connect to the RPC socket of the bdevperf process and start the test:
~~~{.sh}
PYTHONPATH=$PYTHONPATH:/root/src/spdk/python ./examples/bdev/bdevperf/bdevperf.py -t 1 -s /tmp/bdevperf.sock perform_tests
~~~
The RPC command will return, leaving the test to run for 90 seconds in the background. On the target server,
observe that only the first path (port) is receiving packets by checking the queues with `ss -t`.
You can view the paths available to the initiator with:
~~~{.sh}
./scripts/rpc.py -s /tmp/bdevperf.sock bdev_nvme_get_io_paths -n Nvme0n1
~~~
## Switching paths
This can be done on the target server by setting the first path's ANA to `non_optimized`:
~~~{.sh}
./scripts/rpc.py nvmf_subsystem_listener_set_ana_state nqn.2022-02.io.spdk:cnode0 -t tcp -a 172.17.1.13 -s 4420 -n non_optimized
~~~
Use `ss -t` to verify that the traffic has switched to the second path.
## Use round-robin (active_active) path load balancing
First, ensure the ANA for both paths is configured as `optimized` on the target. Then, change the
multipath policy on the initiator to `active_active` (multipath policy is per bdev, so
`bdev_nvme_set_multipath_policy` must be called after `bdev_nvme_attach_controller`):
~~~{.sh}
./scripts/rpc.py -s /tmp/bdevperf.sock bdev_nvme_set_multipath_policy -b Nvme0n1 -p active_active
~~~
Observe with `ss -t` that both connections are receiving traffic (queues build up).


@ -68,7 +68,7 @@ system. This is used for access control.
A user of the NVMe-oF target library begins by creating a target using
spdk_nvmf_tgt_create(), setting up a set of addresses on which to accept
connections by calling spdk_nvmf_tgt_listen_ext(), then creating a subsystem
connections by calling spdk_nvmf_tgt_listen(), then creating a subsystem
using spdk_nvmf_subsystem_create().
Subsystems begin in an inactive state and must be activated by calling
@ -78,7 +78,7 @@ calling spdk_nvmf_subsystem_pause() and resumed by calling
spdk_nvmf_subsystem_resume().
Namespaces may be added to the subsystem by calling
spdk_nvmf_subsystem_add_ns_ext() when the subsystem is inactive or paused.
spdk_nvmf_subsystem_add_ns() when the subsystem is inactive or paused.
Namespaces are bdevs. See @ref bdev for more information about the SPDK bdev
layer. A bdev may be obtained by calling spdk_bdev_get_by_name().


@ -1,81 +1,36 @@
# NVMe-oF Target Tracepoints {#nvmf_tgt_tracepoints}
## Introduction {#tracepoints_intro}
# Introduction {#tracepoints_intro}
SPDK has a tracing framework for capturing low-level event information at runtime.
Tracepoints provide a high-performance tracing mechanism that is accessible at runtime.
They are implemented as a circular buffer in shared memory that is accessible from other
processes. The NVMe-oF target is instrumented with tracepoints to enable analysis of
both performance and application crashes and it has to be configured beforehand using
this [guide](https://spdk.io/doc/nvmf.html).
(Note: the SPDK tracing framework should still be considered experimental.
Work to formalize and document the framework is in progress.)
both performance and application crashes. (Note: the SPDK tracing framework should still
be considered experimental. Work to formalize and document the framework is in progress.)
## Enabling Tracepoints {#enable_tracepoints}
# Enabling Tracepoints {#enable_tracepoints}
Tracepoints are placed in groups. They are enabled and disabled as a group or individually
inside a group.
Tracepoints are placed in groups. They are enabled and disabled as a group. To enable
the instrumentation of all the tracepoints group in an SPDK target application, start the
target with -e parameter set to 0xFFFF:
### Enabling Tracepoints in Groups
To enable the instrumentation of all the tracepoints groups in an SPDK target
application, start the target with `-e` parameter set to `0xFFFF` or `all`:
~~~bash
~~~
build/bin/nvmf_tgt -e 0xFFFF
~~~
or
To enable the instrumentation of just the NVMe-oF RDMA tracepoints in an SPDK target
application, start the target with the -e parameter set to 0x10:
~~~bash
build/bin/nvmf_tgt -e all
~~~
To enable the instrumentation of just the `NVMe-oF RDMA` tracepoints in an SPDK target
application, start the target with the `-e` parameter set to `0x10`:
~~~bash
build/bin/nvmf_tgt -e 0x10
~~~
### Enabling Individual Tracepoints
To enable individual tracepoints inside a group:
~~~bash
build/bin/nvmf_tgt -e 0x10:B
~~~
or
~~~bash
build/bin/nvmf_tgt -e nvmf_rdma:B
~~~
where `:` is a separator and `B` is the tracepoint mask. This will enable only the first, second and fourth (binary: 1011) tracepoint inside `NVMe-oF RDMA` group.
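To double-check which tracepoints a given mask selects before starting the target, the hexadecimal mask can be decoded bit by bit. The helper below is a small sketch for that purpose. The name `tpoints_in_mask` is hypothetical and the function is not part of SPDK.

```bash
# tpoints_in_mask HEXMASK: print the 1-based positions of the set bits,
# i.e. which tracepoints inside a group the mask would enable.
tpoints_in_mask() {
    local hex=${1#0x}                 # accept the mask with or without a 0x prefix
    local mask=$(( 16#$hex ))
    local pos=1
    local -a out=()
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            out+=("$pos")             # bit is set: this tracepoint is enabled
        fi
        mask=$(( mask >> 1 ))
        pos=$(( pos + 1 ))
    done
    echo "${out[*]}"
}
```

For the mask `B` used above, `tpoints_in_mask B` prints `1 2 4`, matching the first, second and fourth tracepoints.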
### Combining Tracepoint Masks
It is also possible to combine enabling whole groups of tracepoints and individual ones:
~~~bash
build/bin/nvmf_tgt -e 0x10:2,0x400
~~~
This will enable the second tracepoint inside `NVMe-oF RDMA` group (0x10) and all of the tracepoints defined by the `thread` group (0x400).
### Tracepoint Group Values
iscsi (0x2), scsi (0x4), bdev (0x8), nvmf_rdma (0x10), nvmf_tcp (0x20), ftl (0x40), blobfs (0x80), nvmf_fc (0x100),
idxd (0x200), thread (0x400), nvme_pcie (0x800)
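Because each group value is a single bit, a mask that enables several whole groups is the bitwise OR of their values. The sketch below computes such a mask from group names, using only the values listed above. The names `TPOINT_GROUPS` and `group_mask` are hypothetical, and the table should be checked against the SPDK version in use, since group values may differ between releases.

```bash
# Group values copied from the list above (requires bash 4+ for associative arrays).
declare -A TPOINT_GROUPS=(
    [iscsi]=0x2 [scsi]=0x4 [bdev]=0x8 [nvmf_rdma]=0x10 [nvmf_tcp]=0x20
    [ftl]=0x40 [blobfs]=0x80 [nvmf_fc]=0x100 [idxd]=0x200 [thread]=0x400
    [nvme_pcie]=0x800
)

# group_mask NAME...: OR together the masks of the named groups.
group_mask() {
    local mask=0 name
    for name in "$@"; do
        if [[ -z ${TPOINT_GROUPS[$name]:-} ]]; then
            echo "unknown group: $name" >&2
            return 1
        fi
        mask=$(( mask | ${TPOINT_GROUPS[$name]} ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, `group_mask nvmf_rdma thread` prints `0x410`, which could then be passed as `-e 0x410`.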
### Starting the SPDK Target
When the target starts, a message is logged with the information you need to view
the tracepoints in a human-readable format using the spdk_trace application. The target
will also log information about the shared memory file.
~~~bash
~~~{.sh}
app.c: 527:spdk_app_setup_trace: *NOTICE*: Tracepoint Group Mask 0xFFFF specified.
app.c: 531:spdk_app_setup_trace: *NOTICE*: Use 'spdk_trace -s nvmf -p 24147' to capture a snapshot of events at runtime.
app.c: 533:spdk_app_setup_trace: *NOTICE*: Or copy /dev/shm/nvmf_trace.pid24147 for offline analysis/debug.
@ -86,20 +41,20 @@ exits. This ensures the file can be used for analysis after the application exi
shared memory files are in /dev/shm, and can be deleted manually to free shm space if needed. A system
reboot will also free all of the /dev/shm files.
## Capturing a snapshot of events {#capture_tracepoints}
# Capturing a snapshot of events {#capture_tracepoints}
Send I/Os to the SPDK target application to generate events. The following is
an example usage of perf to send I/Os to the NVMe-oF target over an RDMA network
interface for 10 minutes.
~~~bash
./perf -q 128 -o 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420'
~~~
./perf -q 128 -s 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420'
~~~
The spdk_trace program can be found in the app/trace directory. To analyze the tracepoints on the same
system running the NVMe-oF target, simply execute the command line shown in the log:
~~~bash
~~~{.sh}
build/bin/spdk_trace -s nvmf -p 24147
~~~
@ -107,13 +62,13 @@ To analyze the tracepoints on a different system, first prepare the tracepoint f
tracepoint file can be large, but usually compresses very well. This step can also be used to prepare
a tracepoint file to attach to a GitHub issue for debugging NVMe-oF application crashes.
~~~bash
~~~{.sh}
bzip2 -c /dev/shm/nvmf_trace.pid24147 > /tmp/trace.bz2
~~~
After transferring the /tmp/trace.bz2 tracepoint file to a different system:
~~~bash
~~~{.sh}
bunzip2 /tmp/trace.bz2
build/bin/spdk_trace -f /tmp/trace
~~~
@ -122,7 +77,7 @@ The following is sample trace capture showing the cumulative time that each
I/O spends at each RDMA state. All the trace captures with the same id are for
the same I/O.
~~~bash
~~~
28: 6026.658 ( 12656064) RDMA_REQ_NEED_BUFFER id: r3622 time: 0.019
28: 6026.694 ( 12656140) RDMA_REQ_RDY_TO_EXECUTE id: r3622 time: 0.055
28: 6026.820 ( 12656406) RDMA_REQ_EXECUTING id: r3622 time: 0.182
@ -169,7 +124,7 @@ the same I/O.
28: 6033.056 ( 12669500) RDMA_REQ_COMPLETED id: r3564 time: 100.211
~~~
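Since the `time` column is cumulative per I/O, the time spent between two consecutive states of one I/O is the difference between adjacent entries with the same id. The following sketch computes those differences with awk from text in the format shown above. The function name `state_deltas` is hypothetical, and the field layout is assumed to match this sample output.

```bash
# state_deltas ID: read spdk_trace text on stdin and print, for one I/O id,
# each state together with the time elapsed since the previous state.
state_deltas() {
    awk -v id="$1" '
        {
            # locate the "id:" and "time:" labels instead of relying on fixed columns
            cur = ""; t = ""
            for (i = 2; i < NF; i++) {
                if ($i == "id:")   { cur = $(i + 1); state = $(i - 1) }
                if ($i == "time:") { t = $(i + 1) }
            }
            if (cur == id && t != "") {
                printf "%s %.3f\n", state, t - prev
                prev = t
            }
        }'
}
```

For example, `build/bin/spdk_trace -f /tmp/trace | state_deltas r3622` would list the states recorded for I/O `r3622` with the per-state time instead of the cumulative time.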
## Capturing sufficient trace events {#capture_trace_events}
# Capturing sufficient trace events {#capture_trace_events}
Since the tracepoint file generated directly by the SPDK application is a circular buffer in shared memory,
the trace events captured by it may be insufficient for further analysis.
@ -178,34 +133,33 @@ spdk_trace_record is used to poll the spdk tracepoint shared memory, record new
and store all entries into specified output file at its shutdown on SIGINT or SIGTERM.
After SPDK nvmf target is launched, simply execute the command line shown in the log:
~~~bash
~~~{.sh}
build/bin/spdk_trace_record -q -s nvmf -p 24147 -f /tmp/spdk_nvmf_record.trace
~~~
Next, send I/Os to the SPDK target application to generate events, using the previous perf example for 10 minutes.
~~~bash
./perf -q 128 -o 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420'
~~~{.sh}
./perf -q 128 -s 4096 -w randread -t 600 -r 'trtype:RDMA adrfam:IPv4 traddr:192.168.100.2 trsvcid:4420'
~~~
After the perf example completes, shut down spdk_trace_record by sending SIGINT (Ctrl + C).
To analyze the tracepoint output file from spdk_trace_record, run the spdk_trace program:
~~~bash
~~~{.sh}
build/bin/spdk_trace -f /tmp/spdk_nvmf_record.trace
~~~
## Adding New Tracepoints {#add_tracepoints}
# Adding New Tracepoints {#add_tracepoints}
SPDK applications and libraries provide several tracepoints. You can add new
tracepoints to the existing trace groups. For example, to add new tracepoints
to the SPDK RDMA library (lib/nvmf/rdma.c) trace group TRACE_GROUP_NVMF_RDMA,
define the tracepoints and assign each a unique ID using the SPDK_TPOINT_ID macro:
~~~c
~~~
#define TRACE_GROUP_NVMF_RDMA 0x4
#define TRACE_RDMA_REQUEST_STATE_NEW SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, 0x0)
#define TRACE_RDMA_REQUEST_STATE_NEED_BUFFER SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, 0x1)
...
#define NEW_TRACE_POINT_NAME SPDK_TPOINT_ID(TRACE_GROUP_NVMF_RDMA, UNIQUE_ID)
~~~
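Note that the group ID `0x4` defined here is not the same number as the `0x10` value passed to `-e` for the RDMA group. Judging from the values in this document, the `-e` mask appears to set bit number "group ID", i.e. `1 << 0x4 = 0x10`. The one-line helper below illustrates that assumed relationship. The name `group_id_to_mask` is hypothetical.

```bash
# group_id_to_mask ID: mask bit for a trace group ID (assumes mask = 1 << ID).
group_id_to_mask() {
    printf '0x%x\n' $(( 1 << $1 ))
}
```

Under that assumption, `group_id_to_mask 0x4` prints `0x10`.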
@ -214,17 +168,17 @@ You also need to register the new trace points in the SPDK_TRACE_REGISTER_FN mac
within the application/library using the spdk_trace_register_description function
as shown below:
~~~c
SPDK_TRACE_REGISTER_FN(nvmf_trace, "nvmf_rdma", TRACE_GROUP_NVMF_RDMA)
~~~
SPDK_TRACE_REGISTER_FN(nvmf_trace)
{
spdk_trace_register_object(OBJECT_NVMF_RDMA_IO, 'r');
spdk_trace_register_description("RDMA_REQ_NEW", TRACE_RDMA_REQUEST_STATE_NEW,
OWNER_NONE, OBJECT_NVMF_RDMA_IO, 1,
SPDK_TRACE_ARG_TYPE_PTR, "qpair");
spdk_trace_register_description("RDMA_REQ_NEED_BUFFER", TRACE_RDMA_REQUEST_STATE_NEED_BUFFER,
OWNER_NONE, OBJECT_NVMF_RDMA_IO, 0,
SPDK_TRACE_ARG_TYPE_PTR, "qpair");
spdk_trace_register_description("RDMA_REQ_NEW", "",
TRACE_RDMA_REQUEST_STATE_NEW,
OWNER_NONE, OBJECT_NVMF_RDMA_IO, 1, 1, "cmid: ");
...
spdk_trace_register_description("NEW_RDMA_REQ_NAME", "",
NEW_TRACE_POINT_NAME,
OWNER_NONE, OBJECT_NVMF_RDMA_IO, 0, 1, "cmid: ");
}
~~~
@ -233,19 +187,19 @@ application/library to record the current trace state for the new trace points.
The following example shows the usage of the spdk_trace_record function to
record the current trace state of several tracepoints.
~~~c
~~~
case RDMA_REQUEST_STATE_NEW:
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEW, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair);
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEW, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair->cm_id);
...
break;
case RDMA_REQUEST_STATE_NEED_BUFFER:
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEED_BUFFER, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair);
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_NEED_BUFFER, 0, 0, (uintptr_t)rdma_req, (uintptr_t)rqpair->cm_id);
...
break;
case RDMA_REQUEST_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING:
spdk_trace_record(RDMA_REQUEST_STATE_DATA_TRANSFER_TO_CONTROLLER_PENDING, 0, 0,
(uintptr_t)rdma_req, (uintptr_t)rqpair);
case RDMA_REQUEST_STATE_TRANSFER_PENDING_HOST_TO_CONTROLLER:
spdk_trace_record(TRACE_RDMA_REQUEST_STATE_TRANSFER_PENDING_HOST_TO_CONTROLLER, 0, 0,
(uintptr_t)rdma_req, (uintptr_t)rqpair->cm_id);
...
~~~
All the tracing functions are documented in the [Tracepoint library documentation](https://spdk.io/doc/trace_8h.html)
All the tracing functions are documented in the [Tracepoint library documentation](https://www.spdk.io/doc/trace_8h.html)


@ -1,6 +1,6 @@
# SPDK Structural Overview {#overview}
## Overview {#dir_overview}
# Overview {#dir_overview}
SPDK is composed of a set of C libraries residing in `lib` with public interface
header files in `include/spdk`, plus a set of applications built out of those
@ -77,13 +77,13 @@ directory and include the headers by prefixing `spdk/` like this:
Most of the headers here correspond with a library in the `lib` directory. There
are a few headers that stand alone, however. They are:
- `assert.h`
- `barrier.h`
- `endian.h`
- `fd.h`
- `mmio.h`
- `queue.h` and `queue_extras.h`
- `string.h`
- `assert.h`
- `barrier.h`
- `endian.h`
- `fd.h`
- `mmio.h`
- `queue.h` and `queue_extras.h`
- `string.h`
There is also an `spdk_internal` directory that contains header files widely included
by libraries within SPDK, but that are not part of the public API and would not be


@ -3,14 +3,14 @@
Please note that the functionality discussed in this document is
currently tagged as experimental.
## In this document {#p2p_toc}
# In this document {#p2p_toc}
* @ref p2p_overview
* @ref p2p_nvme_api
* @ref p2p_cmb_copy
* @ref p2p_issues
## Overview {#p2p_overview}
# Overview {#p2p_overview}
Peer-2-Peer (P2P) is the concept of DMAing data directly from one PCI
End Point (EP) to another without using a system memory buffer. The
@ -22,7 +22,7 @@ In this section of documentation we outline how to perform P2P
operations in SPDK and outline some of the issues that can occur when
performing P2P operations.
## The P2P API for NVMe {#p2p_nvme_api}
# The P2P API for NVMe {#p2p_nvme_api}
The functions that provide access to the NVMe CMBs for P2P
capabilities are given in the table below.
@ -33,7 +33,7 @@ spdk_nvme_ctrlr_map_cmb() | @copybrief spdk_nvme_ctrlr_map_cmb
spdk_nvme_ctrlr_unmap_cmb() | @copybrief spdk_nvme_ctrlr_unmap_cmb()
spdk_nvme_ctrlr_get_regs_cmbsz() | @copybrief spdk_nvme_ctrlr_get_regs_cmbsz()
## Determining device support {#p2p_support}
# Determining device support {#p2p_support}
SPDK's identify example application displays whether a device has a controller
memory buffer and which operations it supports. Run it as follows:
@ -42,7 +42,7 @@ memory buffer and which operations it supports. Run it as follows:
./build/examples/identify -r traddr:<pci id of ssd>
~~~
## cmb_copy: An example P2P Application {#p2p_cmb_copy}
# cmb_copy: An example P2P Application {#p2p_cmb_copy}
Run the cmb_copy example application.
@ -53,7 +53,7 @@ This should copy a single LBA (LBA 0) from namespace 1 on the read
NVMe SSD to LBA 0 on namespace 1 on the write SSD using the CMB as the
DMA buffer.
## Issues with P2P {#p2p_issues}
# Issues with P2P {#p2p_issues}
* In some systems when performing peer-2-peer DMAs between PCIe EPs
that are directly connected to the Root Complex (RC) the DMA may


@ -1,73 +1,5 @@
# Performance Reports {#performance_reports}
## Release 23.01
- [SPDK 23.01 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2301.pdf)
- [SPDK 23.01 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2301.pdf)
- [SPDK 23.01 NVMe-oF TCP Performance Report (Mellanox ConnectX-5)](https://ci.spdk.io/download/performance-reports/SPDK_tcp_mlx_perf_report_2301.pdf)
- [SPDK 23.01 NVMe-oF TCP Performance Report (Intel E810-CQDA2)](https://ci.spdk.io/download/performance-reports/SPDK_tcp_cvl_perf_report_2301.pdf)
- [SPDK 23.01 NVMe-oF RDMA Performance Report (Mellanox ConnectX-5)](https://ci.spdk.io/download/performance-reports/SPDK_rdma_mlx_perf_report_2301.pdf)
- [SPDK 23.01 NVMe-oF RDMA Performance Report (Intel E810-CQDA2 iWARP)](https://ci.spdk.io/download/performance-reports/SPDK_rdma_cvl_iwarp_perf_report_2301.pdf)
- [SPDK 23.01 NVMe-oF RDMA Performance Report (Intel E810-CQDA2 RoCEv2)](https://ci.spdk.io/download/performance-reports/SPDK_rdma_cvl_roce_perf_report_2301.pdf)
## Release 22.09
- [SPDK 22.09 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2209.pdf)
- [SPDK 22.09 NVMe-oF TCP Performance Report (Mellanox ConnectX-5)](https://ci.spdk.io/download/performance-reports/SPDK_tcp_mlx_perf_report_2209.pdf)
- [SPDK 22.09 NVMe-oF TCP Performance Report (Intel E810-CQDA2)](https://ci.spdk.io/download/performance-reports/SPDK_tcp_cvl_perf_report_2209.pdf)
- [SPDK 22.09 NVMe-oF RDMA Performance Report (Mellanox ConnectX-5)](https://ci.spdk.io/download/performance-reports/SPDK_rdma_mlx_perf_report_2209.pdf)
- [SPDK 22.09 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2209.pdf)
## Release 22.05
- [SPDK 22.05 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2205.pdf)
- [SPDK 22.05 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2205.pdf)
- [SPDK 22.05 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2205.pdf)
- [SPDK 22.05 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2205.pdf)
## Release 22.01
- [SPDK 22.01 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2201.pdf)
- [SPDK 22.01 NVMe Bdev PCIe Gen4 Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_gen4_perf_report_2201.pdf)
- [SPDK 22.01 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2201.pdf)
- [SPDK 22.01 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2201.pdf)
- [SPDK 22.01 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2201.pdf)
## Release 21.10
- [SPDK 21.10 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2110.pdf)
- [SPDK 21.10 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2110.pdf)
- [SPDK 21.10 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2110.pdf)
- [SPDK 21.10 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2110.pdf)
## Release 21.07
- [SPDK 21.07 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2107.pdf)
- [SPDK 21.07 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2107.pdf)
- [SPDK 21.07 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2107.pdf)
- [SPDK 21.07 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2107.pdf)
## Release 21.04
- [SPDK 21.04 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2104.pdf)
- [SPDK 21.04 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2104.pdf)
- [SPDK 21.04 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2104.pdf)
- [SPDK 21.04 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2104.pdf)
## Release 21.01
- [SPDK 21.01 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2101.pdf)
- [SPDK 21.01 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2101.pdf)
- [SPDK 21.01 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2101.pdf)
- [SPDK 21.01 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2101.pdf)
## Release 20.10
- [SPDK 20.10 NVMe Bdev Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2010.pdf)
- [SPDK 20.10 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2010.pdf)
- [SPDK 20.10 NVMe-oF RDMA Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_rdma_perf_report_2010.pdf)
- [SPDK 20.10 Vhost Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_vhost_perf_report_2010.pdf)
## Release 20.07
- [SPDK 20.07 NVMe-oF TCP Performance Report](https://ci.spdk.io/download/performance-reports/SPDK_tcp_perf_report_2007.pdf)


@ -1,56 +0,0 @@
# Linking SPDK applications with pkg-config {#pkgconfig}
The SPDK build system generates pkg-config files to facilitate linking
applications with the correct set of SPDK and DPDK libraries. Using pkg-config
in your build system will ensure you do not need to make modifications
when SPDK adds or modifies library dependencies.
If your application is using the SPDK nvme library, you would use the following
to get the list of required SPDK libraries:
~~~bash
PKG_CONFIG_PATH=/path/to/spdk/build/lib/pkgconfig pkg-config --libs spdk_nvme
~~~
To get the list of required SPDK and DPDK libraries to use the DPDK-based
environment layer:
~~~bash
PKG_CONFIG_PATH=/path/to/spdk/build/lib/pkgconfig pkg-config --libs spdk_env_dpdk
~~~
When linking with static libraries, the dependent system libraries must also be
specified. To get the list of required system libraries:
~~~bash
PKG_CONFIG_PATH=/path/to/spdk/build/lib/pkgconfig pkg-config --libs spdk_syslibs
~~~
Note that SPDK libraries use constructor functions liberally, so you must surround
the library list with extra linker options to ensure these functions are not dropped
from the resulting application binary. With shared libraries this is achieved through
the `-Wl,--no-as-needed` parameters while with static libraries `-Wl,--whole-archive`
is used. Here is an example Makefile snippet that shows how to use pkg-config to link
an application that uses the SPDK nvme shared library:
~~~bash
PKG_CONFIG_PATH = $(SPDK_DIR)/build/lib/pkgconfig
SPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_nvme)
DPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_env_dpdk)
app:
	$(CC) -o app app.o -pthread -Wl,--no-as-needed $(SPDK_LIB) $(DPDK_LIB) -Wl,--as-needed
~~~
If using the SPDK nvme static library:
~~~bash
PKG_CONFIG_PATH = $(SPDK_DIR)/build/lib/pkgconfig
SPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_nvme)
DPDK_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs spdk_env_dpdk)
SYS_LIB := $(shell PKG_CONFIG_PATH="$(PKG_CONFIG_PATH)" pkg-config --libs --static spdk_syslibs)
app:
	$(CC) -o app app.o -pthread -Wl,--whole-archive $(SPDK_LIB) $(DPDK_LIB) -Wl,--no-whole-archive \
		$(SYS_LIB)
~~~


@ -18,6 +18,4 @@ a new version of the *env* library. The new implementation can be
integrated into the SPDK build by updating the following line
in CONFIG:
```bash
CONFIG_ENV?=$(SPDK_ROOT_DIR)/lib/env_dpdk
```
CONFIG_ENV?=$(SPDK_ROOT_DIR)/lib/env_dpdk


@ -1,67 +0,0 @@
# RPMs {#rpms}
## In this document {#rpms_toc}
* @ref building_rpms
* @ref dpdk_devel
## Building SPDK RPMs {#building_rpms}
To build a basic set of RPM packages out of the SPDK repo, simply run:
~~~{.sh}
# rpmbuild/rpm.sh
~~~
Additional configuration options can be passed directly as arguments:
~~~{.sh}
# rpmbuild/rpm.sh --with-shared --with-dpdk=/path/to/dpdk/build
~~~
There are several options that may also be passed via the environment:
- DEPS - Install all dependencies needed for building RPM packages.
  Default: "yes"
- MAKEFLAGS - Flags passed to make
- RPM_RELEASE - Target release version of the RPM packages. Default: 1
- REQUIREMENTS - Extra set of RPM dependencies, if needed
- SPDK_VERSION - SPDK version. Default: currently checked out tag
- GEN_SPEC - Tells rpm.sh to only generate a valid .spec and print
  it on stdout. The content of the .spec is determined
  mainly by the ./configure cmdline passed to rpm.sh.
- USE_DEFAULT_DIRS - Normally, rpm.sh tells rpmbuild to build under
  a customizable set of directories. Since this may not be
  desired, especially when used together with GEN_SPEC,
  this option preserves the default set of directories.
~~~{.sh}
# DEPS=no MAKEFLAGS="-d -j1" rpmbuild/rpm.sh --with-shared
~~~
By default, all RPM packages are created under the $HOME directory of the
target user:
~~~{.sh}
# printf '%s\n' /root/rpmbuild/RPMS/x86_64/*
/root/rpmbuild/RPMS/x86_64/spdk-devel-v21.01-1.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/spdk-dpdk-libs-v21.01-1.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/spdk-libs-v21.01-1.x86_64.rpm
/root/rpmbuild/RPMS/x86_64/spdk-v21.01-1.x86_64.rpm
#
~~~
- spdk - provides all the binaries, common tooling, etc.
- spdk-devel - provides development files
- spdk-libs - provides target lib, .pc files (--with-shared)
- spdk-dpdk-libs - provides dpdk lib files (--with-shared|--with-dpdk)
## Special case for dpdk-devel {#dpdk_devel}
When rpm.sh finds a bare --with-dpdk argument on the cmdline it will try to
adjust the behavior of the rpmbuild to make sure only SPDK RPMs are built.
Since this argument requests SPDK to be built against installed DPDK (e.g.
dpdk-devel package) the spdk-dpdk-libs RPM won't be included. Moreover, the
.spec will be armed with a build requirement to make sure dpdk-devel is
present on the building system. The minimum required version of dpdk-devel
is set to 19.11.

@ -1,110 +0,0 @@
# Scheduler {#scheduler}
SPDK's event/application framework (`lib/event`) now supports scheduling of
lightweight threads. Schedulers are provided as plugins, called
implementations. A default implementation is provided, but users may wish to
write their own scheduler to integrate into broader code frameworks or meet
their performance needs.
This feature should be considered experimental and is disabled by default. When
enabled, the scheduler framework gathers data for each spdk thread and reactor
and passes it to a scheduler implementation to perform one of the following
actions.
## Actions
### Move a thread
`spdk_thread`s can be moved to another reactor. Schedulers can examine the
suggested cpu_mask value for each lightweight thread to see if the user has
requested specific reactors, or choose a reactor using whatever algorithm they
deem fit.
### Switch reactor mode
Reactors by default run in a mode that constantly polls for new actions for the
most efficient processing. Schedulers can switch a reactor into a mode that
instead waits for an event on a file descriptor. On Linux, this is implemented
using epoll. This results in reduced CPU usage but may be less responsive when
events occur. A reactor cannot enter this mode if any `spdk_thread`s are
currently scheduled to it. This limitation is expected to be lifted in the
future, allowing `spdk_thread`s to enter interrupt mode.
### Set frequency of CPU core
The frequency of CPU cores can be modified by the scheduler in response to
load. Only CPU cores that match the application cpu_mask may be modified. The
mechanism for controlling CPU frequency is pluggable and the default provided
implementation is called `dpdk_governor`, based on the `rte_power` library from
DPDK.
#### Known limitation
When SMT (Hyperthreading) is enabled, the two logical CPU cores sharing a single
physical CPU core must run at the same frequency. If one of the two logical
CPU cores is outside the application cpu_mask, the policy and frequency on that
core have to be managed by the administrator.
## Scheduler implementations
The scheduler in use may be controlled by JSON-RPC. Please use the
[framework_set_scheduler](jsonrpc.html#rpc_framework_set_scheduler) RPC to
switch between schedulers or change their options. Currently only the dynamic
scheduler supports changing its parameters.
[spdk_top](spdk_top.html#spdk_top) is a useful tool to observe the behavior of
schedulers in different scenarios and workloads.
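As a rough illustration of what a scheduler switch looks like on the wire, the sketch below prints a JSON-RPC 2.0 request for `framework_set_scheduler`. The helper name `scheduler_request` and the request `id` are assumptions made for this example; how the request is delivered to the application (for instance through SPDK's RPC tooling) is left out.

```bash
# Hypothetical helper: print a JSON-RPC 2.0 request that selects a scheduler.
# The "name" parameter carries the scheduler name; the id value is arbitrary.
scheduler_request() {
	printf '{"jsonrpc": "2.0", "id": 1, "method": "framework_set_scheduler", "params": {"name": "%s"}}\n' "$1"
}

scheduler_request dynamic
```

Refer to the JSON-RPC documentation linked above for the authoritative parameter list.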
### static [default]
The `static` scheduler is the default scheduler and does no dynamic scheduling.
Lightweight threads are distributed round-robin among reactors, respecting
their requested cpu_mask, only at application startup, and then they are never
moved. This is equivalent to the previous behavior of the SPDK event/application
framework.
The `static` scheduler cannot be re-enabled after a different scheduler has been
selected, because currently there is no way to save original SPDK thread distribution
configuration.
### dynamic
The `dynamic` scheduler is designed for power saving and reduction of CPU
utilization, especially in cases where workloads show large variations over
time. In SPDK, thread and core workloads are measured in CPU ticks. The busy
ticks are compared with all the ticks elapsed since the last check, which
yields the `busy time`:
`busy time = busy ticks / (busy ticks + idle ticks) * 100 %`
A thread is considered active if its busy time is over the `load limit`
parameter.
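The formula above can be worked through with a short shell sketch. The function names and the sample numbers are made up for illustration; this is not how the scheduler itself is implemented, and integer division rounds the percentage down.

```bash
# busy_time_pct BUSY_TICKS IDLE_TICKS -> busy time in percent (rounded down)
busy_time_pct() {
	busy=$1
	idle=$2
	total=$((busy + idle))
	# No ticks elapsed since the last check: report 0% rather than divide by zero.
	if [ "$total" -eq 0 ]; then
		echo 0
		return
	fi
	echo $((busy * 100 / total))
}

# is_active BUSY_TICKS IDLE_TICKS LOAD_LIMIT -> exit status 0 if the thread is active
is_active() {
	[ "$(busy_time_pct "$1" "$2")" -gt "$3" ]
}

# 300 busy ticks out of 1000 total is 30%; with an example load limit of 20
# the thread counts as active.
busy_time_pct 300 700
is_active 300 700 20 && echo active
```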
Active threads are distributed equally among reactors, taking cpu_mask into
account. All idle threads are moved to the main core. Once an idle thread becomes
active, it is redistributed again. The dynamic scheduler monitors core workloads and
redistributes SPDK threads across cores so that none of them is over the `core limit`.
If a core's utilization surpasses this threshold, the scheduler should move threads
off it until this condition no longer applies. A core may also be in an overloaded
state, which indicates that moving threads off this core will not bring its
utilization under the `core limit` and that its threads are unable to process all the I/O
they are capable of, because they share CPU ticks with other threads. The threshold
used to decide whether a core is overloaded is called `core busy`. Note that threads residing
on an overloaded core will not perform as well as other threads, because the CPU ticks
intended for them are limited by other threads on the same core.
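The relationship between the two thresholds can be pictured with a minimal sketch. It assumes utilization is expressed in percent and that `core busy` is set no lower than `core limit`; the function name, the state labels and the sample values are hypothetical, and the real scheduler's decision logic is more involved than this three-way comparison.

```bash
# classify_core UTILIZATION CORE_LIMIT CORE_BUSY (all in percent)
# Prints one of: ok, over_limit, overloaded
classify_core() {
	util=$1
	core_limit=$2
	core_busy=$3
	if [ "$util" -gt "$core_busy" ]; then
		# Threads on this core compete for CPU ticks.
		echo overloaded
	elif [ "$util" -gt "$core_limit" ]; then
		# Candidate for moving threads to other cores.
		echo over_limit
	else
		echo ok
	fi
}

# Example thresholds chosen only for illustration.
classify_core 85 80 95
```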
When a reactor has no scheduled `spdk_thread`s it is switched into interrupt
mode and stops actively polling. After enough threads become active, the
reactor is switched back into poll mode and threads are assigned to it again.
The main core can contain active threads only when their execution time does
not exceed the sum of all idle threads. When no active threads are present on
the main core, the frequency of that CPU core will decrease as the load
decreases. All CPU cores corresponding to the other reactors remain at maximum
frequency.
The dynamic scheduler is currently the only one that allows manual setting of
its parameters.
Current values of scheduler parameters can be displayed by using the
[framework_get_scheduler](jsonrpc.html#rpc_framework_get_scheduler) RPC.

@ -1,149 +0,0 @@
# shfmt {#shfmt}
## In this document {#shfmt_toc}
* @ref shfmt_overview
* @ref shfmt_usage
* @ref shfmt_installation
* @ref shfmt_examples
## Overview {#shfmt_overview}
The majority of tests (and scripts overall) in the SPDK repo are written
in Bash (with a quite significant emphasis on "Bashism"), thus a style
formatter, shfmt, was introduced to help keep the .sh code consistent
across the entire repo. For more details on the tool itself, please see
[shfmt](https://github.com/mvdan/sh).
We also advise using Bash 4.4 as the minimum version to make sure scripts
across the whole repo work as intended.
## Usage {#shfmt_usage}
On the CI pool, shfmt is run against all updated .sh files that
have been committed but not yet merged. Additionally, shfmt picks up
all .sh files present in the staging area when run locally from our pre-commit
hook (via check_format.sh). If any style errors are detected, a
patch with the needed changes is generated and either the build (CI)
or the commit is aborted. The patch can then be easily applied:
~~~{.sh}
# Run from the root of the SPDK repo
patch --merge -p0 <shfmt-3.1.0.patch
~~~
The name of the patch is derived from the version of shfmt that is
currently in use (3.1.0 is currently supported).
Please see ./scripts/check_format.sh for all the arguments shfmt
is run with. Additionally, @ref shfmt_examples has more details on how
each of the arguments behaves.
## Installation {#shfmt_installation}
shfmt can be easily installed via pkgdep.sh:
~~~{.sh}
./scripts/pkgdep.sh -d
~~~
This will install all the developer tools, including shfmt, on the
local system. The precompiled binary will be saved, by default, to
/opt/shfmt and then linked under /usr/bin. Both paths can be changed
by setting SHFMT_DIR and SHFMT_DIR_OUT in the environment. Example:
~~~{.sh}
SHFMT_DIR=/keep_the_binary_here \
SHFMT_DIR_OUT=/and_link_it_here \
./scripts/pkgdep.sh -d
~~~
## Examples {#shfmt_examples}
~~~{.sh}
#######################################
if foo=$(bar); then
  echo "$foo"
fi
exec "$foo" \
  --bar \
  --foo
# indent_style = tab
if foo=$(bar); then
	echo "$foo"
fi
exec "$foo" \
	--bar \
	--foo
######################################
if foo=$(bar); then
	echo "$foo" && \
	echo "$(bar)"
fi
# binary_next_line = true
if foo=$(bar); then
	echo "$foo" \
		&& echo "$(bar)"
fi
# Note that each break line is also being indented:
if [[ -v foo ]] \
&& [[ -v bar ]] \
&& [[ -v foobar ]]; then
	echo "This is foo"
fi
# ->
if [[ -v foo ]] \
	&& [[ -v bar ]] \
	&& [[ -v foobar ]]; then
	echo "This is foo"
fi
# Currently, newlines are being escaped even if syntax-wise
# they are not needed, thus watch for the following:
if [[ -v foo
	&& -v bar
	&& -v foobar ]]; then
	echo "This is foo"
fi
# ->
if [[ -v foo && -v \
	bar && -v \
	foobar ]]; then
	echo "This is foo"
fi
# This, unfortunately, also breaks the -bn behavior.
# See https://github.com/mvdan/sh/issues/565 for details.
######################################
case "$FOO" in
BAR)
	echo "$FOO" ;;
esac
# switch_case_indent = true
case "$FOO" in
	BAR)
		echo "$FOO"
		;;
esac
######################################
exec {foo}>bar
:>foo
exec {bar}<foo
# -sr
exec {foo}> bar
: > foo
exec {bar}< foo
######################################
# miscellaneous, enforced by shfmt
(( no_spacing_at_the_beginning & ~and_no_spacing_at_the_end ))
: $(( no_spacing_at_the_beginning & ~and_no_spacing_at_the_end ))
# ->
((no_spacing_at_the_beginning & ~and_no_spacing_at_the_end))
: $((no_spacing_at_the_beginning & ~and_no_spacing_at_the_end))
~~~
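For experimenting with these options locally, the sketch below maps the settings illustrated above to shfmt command-line flags and prints a diff instead of rewriting files. The variable and function names are made up for this example, and the flag set is an approximation; ./scripts/check_format.sh remains the authoritative source for the arguments used in the repo.

```bash
# Flags assumed to correspond to the options illustrated above:
#   -i 0  indent with tabs                        (indent_style = tab)
#   -bn   binary operators may start a line       (binary_next_line = true)
#   -ci   indent switch cases                     (switch_case_indent = true)
#   -sr   redirect operators followed by a space
SHFMT_ARGS="-i 0 -bn -ci -sr"

# Print a diff of the changes shfmt would make to the given files.
# Fails with a notice when shfmt is not installed.
shfmt_diff() {
	if ! command -v shfmt > /dev/null 2>&1; then
		echo "shfmt not found; see the Installation section" >&2
		return 127
	fi
	# Word splitting of SHFMT_ARGS is intentional here.
	shfmt -d $SHFMT_ARGS "$@"
}

# Usage (path is a placeholder):
#   shfmt_diff path/to/script.sh
```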

Some files were not shown because too many files have changed in this diff