Commit Graph

9784 Commits

Author SHA1 Message Date
Konrad Sztyber
cff39ee7d5 nvme: add missing \n in ctrlr init fail log
Additionally, print the string representation of the ctrlr state, as it
makes debugging init failures much easier.

Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: I572ef3d6f7d5bbd52039a8872733578c92be4c4a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-11-08 08:20:26 +00:00
Richael Zhuang
cabbb25d5d bdev: add API to get submit tsc of a bdev I/O
Add API spdk_bdev_io_get_submit_tsc to get submit tsc of a bdev I/O,
which can be used in bdev modules to avoid calling expensive
spdk_get_ticks().

Change-Id: Ifbcecb1bc663344997c5e73b72a1dfb5d0422946
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14989
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-11-04 10:15:46 +00:00
Denis Nagorny
c273513401 nvme/rdma: Allows to use PCI Express Relaxed Ordering
This fix allows to use relaxed ordering feature where it is
supported. libibversb checks with the driver if relaxed ordering
access flag is supported and ignores it if not.

Experiments show that set by default it doesn't spoil performance but
allows to reach desired one on AMD EPYC systems. For example fio read
test (ConnectX-6, AMD EPYC 7763, two jobs, queue depth 32, block size
32K) can starve down to 6-7 GiB/s without it. Enabling this option
allows to get bandwidth more than 21 GiB/s.

Change-Id: I5983aed5d1f38ee7bec9c310597731c9a6a329da
Signed-off-by: Denis Nagorny <denisn@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14885
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-11-04 10:15:31 +00:00
Thanos Makatos
b8fc75c36e nvmf/vfio-user: ensure BAR5 isn't 0
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Change-Id: I60a39c8a311879b7d6c7c82df0abd7a69f9a2778
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14933
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-11-04 10:10:33 +00:00
Thanos Makatos
bad452d25e nvmf/vfio-user: calculate doorbells based on number of queue pairs
It doesn't make sense to have the size of the doorbells fixed and then
calculate the maximum number of queue pairs based on it, do it the other
way round. Also, add some sanity checks based on the spec.

Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Change-Id: I17e3509fb0a011128ca089ce78b7a296262e6f8e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14932
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-11-04 10:10:33 +00:00
Alexey Marchuk
0fec09fc50 bdev/part: Call bdev*_with_md even if md is NULL
The bdev*_with_md APIs now allow to pass NULL md
pointer, so calling this function without checking
for metadata simplifies code

Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com>
Change-Id: I364a646630bd36120231ea87a41fea05df51befb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15090
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
2022-11-03 14:54:41 +00:00
Shuhei Matsumoto
d683d7b792 bdev/part: Modify spdk_bdev_part_submit_request() to use custom completion callback
In the following patches, we will add a feature to inject data
corruption to the error bdev module. For read I/O, we will have
to inject data corruption at completion. However, if we use
spdk_bdev_part_submit_request(), it will not be possible because we
cannot add any custom operation into the completion callback.
To fix the issue, modify spdk_+bdev_part_submit_request() and
rename it to spdk_bdev_part_submit_request_ext().
Fortunately, we can use stored_user_cb in struct spdk_bdev_io.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I46d3c40ea88a3fedd8a8fef6b68ee417c814a7a1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15002
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-11-03 14:54:28 +00:00
Changpeng Liu
fabf6a83cc lib/vhost: remove session initialized flag
Session in vhost means an active socket connection from
client(e.g: QEMU or SPDK vhost initiator), but the device
state could be `started` or `stopped` because users may
remove the driver of the device in VM, so in
`foreach_session` we can always call the callback function
without checking the session state, and the callback function
may check the device state if necessary.

Change-Id: Id0fc8c7f6f0915a55a738f0c87ebe6539f7fb2db
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15038
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
9da4e15c5c lib/vhost: start device asynchronously
Now we will start the device(virtio-blk and virtio-scsi) when
there is a valid I/O queue(VRING_KICK message), the backend
device `start_session` callback will ensure this check, so
when processing VRING_KICK messages for each vring, we can
just call `new_device` if `started` is false, and if `started`
is true, it means the device is already started, it's safe
for us to add one more vring even the device is started.

With this change, we don't need to wait for the return value
of `start_session` in synchronous mode, just return is OK.

Fix #2518.

Change-Id: I92ba3d4e5c38422d7697c1d13180a4a48f0dd4cd
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14981
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
23baa6761d lib/vhost: don't restart device multiple times
We will stop/start the device multiple times when a new
vring is added, and also stop/start the device when
set vring's callfd, actually we only need to start
the device after a I/O queue is enabled, DPDK rte_vhost
will not help us to start the device in some scenarios,
so this is controlled in SPDK.

Now we improve the workaround to make it consistent with
vhost-user specification.

For each SET_VRING_KICK message, we will setup the new
added vring, and then we try to start the device.

For each SET_VRING_CALL message, we will add one more
interrupt count, previously this is done when enable
the vring, which is not accurate.

For each GET_VRING_BASE message, we will stop the
device before the first message.

With above changes, we will start/stop the device once,
any new added vrings after starting the device will be
polled in next `vdev_worker` poller.

Change-Id: I5a87c73d34ce7c5f96db7502a68c5fa2cb2e4f74
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14928
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
b7facb30f8 lib/vhost_scsi: don't start device before a valid I/O queue is enabled
Change-Id: I407c62df2117069ad1d8f6aba18cf316a3cf47bf
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14980
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
9cdd1a8a2c lib/vhost: remove vhost_session_used_signal function
`vdev_worker` in vhost-scsi is used to process request queues,
and `vdev_mgmt_worker` is used to process the event and control
queue, so we don't need to call `vhost_session_used_signal` in
`vdev_worker`, just remove it.

Change-Id: I86f3e90890e6defba69b01fec131afe1adad3a49
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14927
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
7fcbd0220e lib/vhost: alloc VQ tasks in VQ setting function
Currently we will allocate all VQ's tasks when starting
the device, it will not allow us to add new VQ after
starting the device, so here, we move it to VQ setting
function.

Change-Id: I59cfc393d66779ab8a0eb704bc73bcede3f0a2a0
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14926
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
d55bf60a89 lib/vhost: move vq settings into a function
With this change, then we can call vq settings after the
VRING_KICK message, currently we will stop/start device
multiple times when a new vq is added.

Change-Id: Icba3132f269b5b073eaafaa276ceb405f6f17f2a
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14925
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
a1cd28c6f3 lib/vhost: get negotiated features after SET_FEATURES message
Feature negotiation is done after SET_FEATURES message, here we
move it in this message context, so that we can use the negotiated
features before starting the device.

Change-Id: Ic6388dbcebd72bc5ef182e65798d34c07f6fc35c
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14924
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
835490b1d5 lib/vhost: check memory table earlier
Before starting a device, the memory table is already
there, so we can check it earlier.

Change-Id: I4996705501577cfa78c89621f7081eb0c3d4dd78
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14923
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
Changpeng Liu
d941d138ad lib/vhost: merge vq settings into a single loop
Change-Id: I5a9ef59adcd383e2fae746a434dda10893a3b84a
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14922
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-11-03 14:53:55 +00:00
GangCao
7f7b468b48 lib/bdev: new __io_ch_to_bdev_ch and __io_ch_to_bdev_mgmt_ch utilities
Change-Id: Ie7d818a9a648e28cd191588164420173149af38b
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15167
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-11-02 15:25:21 +00:00
GangCao
cb55e8493f Lib/Bdev: update calling to spdk_bdev_for_each_channel
Change-Id: I541ccffc90e7dc54b416da385e862e952d9db71d
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14638
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-11-02 15:25:21 +00:00
Jim Harris
5497616e8f env_dpdk: add support for DPDK 22.11
DPDK has merged changes which hide remove some DPDK
object such as rte_device and rte_driver from the
public API.

So we add copies of the necessary header files into
our tree, along with a 22.11-specific pci_dpdk
implementation.

These files are copied over exactly, except for one
#include which needs to change from <> to "" so that
it picks up the header in our tree instead of looking
for it in system headers.

Longer-term we may want to look at ways to automated
checking and updating of these header files.  DPDK 22.11
isn't officially released yet, so the header files could
change, but we want to get this in now since without
it SPDK cannot build against DPDK tip at all.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I89ffd0abab52c404cfff911c1c9b0cd9e889241d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14570
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-11-02 10:50:23 +00:00
Evgeniy Kochetov
8c3590a983 bdev: Add copy IO statistics
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: Id51ac80bce33a27a8ccea273c076f39019b98339
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14348
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
2022-11-02 10:33:00 +00:00
Evgeniy Kochetov
a383a15fb1 bdev/part: Add copy IO type support
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I9e2dcf29794fdb9535a4f0282b3046602f09188e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14385
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
2022-11-02 10:33:00 +00:00
Evgeniy Kochetov
d14afd5000 bdev: Add copy IO type
Copy operation is defined by source and destination LBAs and LBA count
to copy. For destiantion LBA and LBA count we reuse exiting fields
`offset_blocks` and `num_blocks` in `struct spdk_bdev_io`. For source
LBA new field `src_offset_blocks` was added.

`spdk_bdev_get_max_copy()` function can be used to retrieve maximum
possible unsplit copy size. Zero values means unlimited. It is allowed
to submit larger copy size but it will be split into several bdev IOs.

Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I2ad56294b6c062595c026ffcf9b435f0100d3d7e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
2022-11-02 10:33:00 +00:00
GangCao
e28e247954 RPC/Bdev: display the per channel IO statistics for required Bdev
Add a new parameter "-c" to display the per channel IO statistics
for required Bdev

./scripts/rpc.py bdev_get_iostat -b Malloc0 -h
usage: rpc.py [options] bdev_get_iostat [-h] [-b NAME] [-c]

optional arguments:
  -h, --help            show this help message and exit
  -b NAME, --name NAME  Name of the Blockdev. Example: Nvme0n1
  -c, --per-channel     Display per channel IO stats for specified device

This could give more intuitive information on each channel's processing
of the IOs with the associated thread on the same Bdev.

Please also be aware that the IO statistics are collected from SPDK
thread's related channel's information. So that it is more relating
to the SPDK thread. And in the dynamic scheduling case, different
SPDK thread could be running on the same Core.

In this case, any seperate channel's IO statistics are returned to
the RPC call and if needed, further parse of the data is needed to
get the per Core information although usually there is one thread
per Core.

On the other hand, user could run the framework_get_reactors RPC
method to get the relationship of the thread and CPU Cores so as
to get the precise information of IO runnings on each thread and
each Core for the same Bdev.

Change-Id: I39d6a2c9faa868e3c1d7fd0fb6e7c020df982585
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13011
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-10-28 06:51:19 +00:00
GangCao
f0494649e3 Lib/Bdev: add the new API spdk_bdev_for_each_channel
And also related function pointers and APIs:
	spdk_bdev_for_each_channel_msg;
	spdk_bdev_for_each_channel_done;
	spdk_bdev_for_each_channel_continue;

Change-Id: I52f0f6f27717d53c238faf2f998810c9c5ee45d4
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14614
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
2022-10-28 06:51:19 +00:00
Shuhei Matsumoto
6a5ecb3276 bdev/part: Consolidate all I/O types into bdev_part_complete_io()
The following patches will allow the caller to specify a custom
completion callback to spdk_bdev_part_submit_request(). To do it
easily, consolidate completions of all I/O types into
bdev_part_complete_io().

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I083695189daa7e5271787c50947e428d01a83677
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15001
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-10-28 06:49:40 +00:00
Shuhei Matsumoto
ab839831f1 nvme_rdma: Remove workaround for Soft RoCE's bug from cq_process_completions()
We do not support Soft RoCE anymore. Remove a workaround for Soft RoCE's
bug that we amy receive a completion without error status after qpair is
disconnected/destroyed. Then add a assert to check if rdma_req->req is
not NULL. This will simplify the code and the following patches.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I80c349053adc0f79679eaf8a5d7265d555d3c2b0
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14909
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-10-28 06:27:19 +00:00
Shuhei Matsumoto
1439f9c773 nvme_rdma: Pass poller instead of poll_group to cq_process_completions()
The following patches will support SRQ and SRQ will be per poller.
We will need SRQ in nvme_rdma_cq_process_completions().

It is not possible to identify poller if poll_group is passed to
nvme_rdma_cq_process_completions().

Based on these thoughts, add poll_group pointer to poller and pass
poller to nvme_rdma_cq_process_completions() instead of poll_group.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Signed-off-by: Denis Nagorny <denisn@nvidia.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I322a7a0cc08bdcc8e87e720ad65dd8f0b6ae9112
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14282
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-10-28 06:27:19 +00:00
Shuhei Matsumoto
194047249b nvme_rdma: Get qpair from poll group using WC
NVMe-RDMA target has a helper function get_rdma_qpair_from_wc() and
uses it to identify a qpair from a WC.

NVMe-RDMA initiator has a similar function
nvme_rdma_poll_group_get_qpair_by_id().

NVMe-RDMA initiator will support SRQ in the following patches, and
it will want to identify a qpair from a WC.

get_rdma_qpair_from_wc() of NVMe-RDMA target uses wc->qp_num internally
anyway.

However, the upcoming custom transport for RDMA will have to use other
variables of WC.

Hence, it will be convenient to pass WC instead of qp_num if we consider
future enhancements.

Based on these thoughts, for NVMe-RDMA initiator rename
nvme_rdma_poll_group_get_qpair_by_id() by get_rdma_qpair_from_wc().
remove unnecessary declaration, and pass WC instead of qp_num.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Signed-off-by: Denis Nagorny <denisn@nvidia.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I01ead4730207e2c6ac53b83f151bd5f977a11465
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14279
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-10-28 06:27:19 +00:00
Shuhei Matsumoto
6ea9de5fc8 nvme_rdma: Factor out poller destroy operation
Poller will have more shared resources when SRQ is supported.
This is a preparation.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Signed-off-by: Denis Nagorny <denisn@nvidia.com>
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: Ic3d1cb93dde3f53653a9536a103e5518cebd58e1
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14173
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-10-28 06:27:19 +00:00
Shuhei Matsumoto
6a59daad2b nvme_rdma: Poll disconnect until completion if async mode is disabled
nvme_rdma_ctrlr_disconnect_qpair() does not poll the qpair until it is
actually disconnected if it is in a poll group even if its async mode
is disabled. Hence, spdk_nvme_ctrlr_free_io_qpair() removes the qpair
from a poll group when it is being disconnected.

On the other hand, I/O qpair is destroyed after it is actually
disconnected.

When SRQ is enabled and used, a SRQ is destroyed if the corresponding
poller does not have any I/O qpair after an I/O qpair is removed from
the poller.

In particular, if we use spdk_nvme_ctrlr_free_io_qpair(), a SRQ is
destroyed before the corresponding I/O qpairs are destroyed.
Destroying a SRQ failed because it is still referenced by I/O qpairs.

This bug was found when running the SPDK NVMe perf tool with SRQ.

The reason was we had nvme_rdma_poll_group_process_completions() to call
disconnected_qpair_cb after the qpair is actually disconnected.
However, it is ensured that nvme_rdma_poll_group_process_completions()
calls disconnected_qpair_cb for any disconnected qpair.

Hence, remove a check if qpair->poll_group is not NULL from
nvme_rdma_ctrlr_disconnect_qpair() and update the comment.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I0fde0d827eec3280e1cc5a0fce34d163a6069bc4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14908
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-10-28 06:27:19 +00:00
Vasuki Manikarnike
3fcee8ddcc lib/nvme: Do not submit queued aborts if adminq is in failed state.
With RDMA, the admin poller can experience a remote disconnect when
processing completions. The admin qpair will be disconnected to handle
this. The disconnect code path will manually complete queued aborts.
However, the completion callback for the abort will attempt to resubmit
other queued aborts from the queue, which will result in a very large
stack and can eventually cause a segfault.
The fix is to not resubmit queued aborts if the admin qpair is in any
kind of failed state.

Change-Id: I4a6f959232c8a1bd30c87ca50459014e556cbaa0
Signed-off-by: Vasuki Manikarnike <vasuki.manikarnike@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15114
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
2022-10-28 06:26:20 +00:00
Szulik, Maciej
51ae6d4002 nvme/tcp: add max_completion exit condition to loop inside read_pdu
A loop inside 'nvme_tcp_qpair_process_completions' makes
'max_completions' actually behaving like a minimum:
	do {
		rc = nvme_tcp_read_pdu(tqpair, &reaped);
		[...]
	} while (reaped < max_completions);

Before this change 'max_completion' constraint, in its true sense,
was actually not respected and a loop inside 'nvme_tcp_read_pdu'
could be executed indefinitely as long as a recv state changed.

To prevent this behavior, max_completion must be passed to
'nvme_tcp_read_pdu' and used as an additional exit condition.

Signed-off-by: Szulik, Maciej <maciej.szulik@intel.com>
Change-Id: I28da962f4a62f08ddb51915b5d0dae9611a82dee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15136
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-10-26 07:35:21 +00:00
John Levon
36dfcca2b4 nvmf/vfio-user: switch from shadow doorbells when freeing
Some reset/disable paths are freeing the shadow doorbells without
switching the SQs back to BAR0. Fix this up, and add a small cleanup
when initializing the shadow doorbells.

Signed-off-by: John Levon <john.levon@nutanix.com>
Change-Id: Ia5e5b91b7dc696a558eb0ad59cc554abced47cca
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14901
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-10-26 07:32:54 +00:00
John Levon
64db53f1aa nvmf/vfio-user: support multiple poll groups in interrupt mode
To support SQs allocated to a poll group other than the controller's
main poll group, we need to make sure to poll those SQs when we wake up
and handle the controller interrupt. As they will be running in a
separate SPDK thread, we will arrange for all poll groups to wake up
when we receive an interrupt corresponding to a vfio-user message
arriving.

This can mean needless wakeups: we don't (yet) have a mechanism to only
wake up the poll groups that correspond to a particular SQ write.

Additionally, as we don't have any notion of a poll group per
controller, this ends up polling all SQs in the entire poll group, not
just the ones corresponding to the controller we were handling.

As this has potential performance issues in many cases, it defaults to
disabled.

Signed-off-by: John Levon <john.levon@nutanix.com>
Change-Id: I3d9f32625529455f8d55578ae9cd7b84265f67ab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14120
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-10-26 07:32:54 +00:00
liu.darong
7e17de3d81 bdev/trace: add support to trace with bdev name
Fixes #2585

Signed-off-by: liu.darong <liu.darong@xsky.com>
Change-Id: I3f9b6d4719b5eed004f383e86db8a17b8b0287f5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13823
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-10-25 07:12:52 +00:00
Anton
7ba33f49f0 lib/idxd: fix use after free due to stale crc_dst in chained ops
When crc32c is invoked with a multiple entry input iov,
only the last op has crc_dst set in order to write the final
crc value into the user supplied location.

spdk_idxd_process_events() for every successfully completed
CRC op writes the value into *op->crc_dst
UNLESS it is NULL.

The problem is that _idxd_prep_batch_cmd() that allocates
new ops left op->crc_dst uninitialized.

This results in a memory corruption (use after free)
in the following scenario:
1) op A is allocated an crc_dst is set to point to user memory X.
2) Op A is compeleted
3) User memory X is freed.
4) Ops B and C are allocated (chained), C has crc_dst set.
   => B reused op A memory and crc_dst still points to the
   now stale user location (1)
5) B is complered, spdk_idxd_process_events() writes into X
   as B->crc_dst = X.

Fix: _idxd_prep_batch_cmd() should initialize crc_dst to NULL.

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Change-Id: I9e7d57ec43a8fbcb3750906015a5cb7291278c35
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15115
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-10-25 07:10:55 +00:00
paul luse
13597fd4f1 accel_sw: add extra check on compression
We were missing a check when ISAL uses the complete output buffer
on compression to determine whether it was s perfect fit or if
simply not enough buffer was provided.

Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I73532666f50cb9fbef3c42f6bfb25fc5c7de01c6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14874
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
2022-10-25 07:09:37 +00:00
Krzysztof Karas
a74c8c2e8c scheduler: prevent user from switching back to static
Prevent user from switching back to static scheduler after
different scheduler has been selected. Currently we
do not have a way to save initial thread distribution
configuration, so each time user switches from dynamic
scheduler back to static, the SPDK threads may end up on
different reactors. This would cause discrepancy in
performance statistics of SPDK managed by static scheduler.

Change-Id: Ic17a6be55eaea0e1a748f92e01f7075540403637
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15055
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-10-21 07:33:06 +00:00
Jim Harris
a9be4f2c2f trace: add likely/unlikely hints to _spdk_trace_record
This helps generate slightly better code in this function,
which can have a noticeable impact for high trace
event workloads.

Tested with bdevperf, single malloc or null bdev,
qd=32, 512B randreads on a single Xeon core.
Specify "-e bdev" to enable bdev trace events.

Null:
Before: 8.09M/s (123ns per IO)
After: 8.68M/s (115ns per IO)

Malloc:
Before: 4.21M/s (237ns per IO)
After: 4.34M/s (230ns per IO)

Note that each bdev I/O generates two trace events (START
and END) - meaning this change removes 7-8ns of overhead
for every 2 trace events, at least on my system.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I7021b7f9e28b4a7cb16f8a97b4d4004ae165efd2
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15096
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
2022-10-21 07:18:37 +00:00
Alexey Marchuk
c77b537786 accel: Save overridden options in json config file
Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com>
Change-Id: Ida2c6f1c460c2b66d2d4159d225036377e488e62
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14856
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
2022-10-19 07:47:58 +00:00
Anton Eidelman
c2c8b4ebc7 lib/idxd: fix bug in crc32c with chained ops
When spdk_idxd_submit_crc32c() handles input
with multiple iovs (or multiple ops are generated
due to physically discontinuous buffers),
the first op has the original seed, while the
subsequent ops instruct the hardware to
to fetch the seed from the output of the previous op
(op->hw.crc32c_val):
        void *prev_crc;
        ...
        desc->flags |= IDXD_FLAG_FENCE | IDXD_FLAG_CRC_READ_CRC_SEED;
        desc->crc32c.addr = (uint64_t)prev_crc;  <<< virtual addr

The problem is the prev_crc is a virtual address,
so the hardware (at least with no IOMMU configured)
reports: DSA_COMP_HW_ERR1
spdk_idxd_process_events: Completion status 0x20

Solution:
Set crc32c.addr to the physical address of
the crc32c_val field in the previous desc.
Since desc->completion_addr already holds the physical address
of the dsa_hw_comp_record, we use this with the crc32c_val offset.

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Change-Id: I330e98c2f3fd6da5cb4fc03d0745df09a9ff0e0c
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14954
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-10-18 07:24:55 +00:00
Konrad Sztyber
1f3a6b0398 rpc: use rw access when creating RPC lock file
It allows the users to specify the path to the RPC socket on a NFS
mounted filesystem.  This is necessary, because flock(2) on NFS requires
write access to place an exclusive lock.

Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Change-Id: If197498ed5bdcb4e02c5f2f2b2c1ef388872c457
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14993
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
2022-10-18 07:23:28 +00:00
GangCao
f20b99bbb3 lib/nvme/vfio: destruct ctrlr in failed cases
Change-Id: Ie7d7ab25055c26ea1c2ae4997bf7197a170de989
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15005
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-10-17 12:52:55 +00:00
Szulik, Maciej
dcf30711ef build: add explicit vars init to silence LTO related warning
When Link Time Optimization is enabled, compiler can sometimes produce
additional warnings saying that some variables may be uninitialized.

To supress the warning it is enough to add explicit initialization
of the variable causing the issue, in this case '*module_name = NULL'
and "*writer = NULL".

Signed-off-by: Szulik, Maciej <maciej.szulik@intel.com>
Change-Id: I30492115b28a18554b08a6f575cbcc9538f3b848
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14849
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-10-05 10:24:53 +00:00
GangCao
8afb3d0037 lib/bdev: return error when failing to get resource
To fix issue: 2719

Change-Id: I983ef607fad154608fff9bb9355645968caf0c5a
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14746
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
2022-10-04 07:07:04 +00:00
Tomasz Zawadzki
f98ac63ea7 reactor: do not switch mode for threads in non interrupt tgt
Fixes #2693

spdk threads should not be placed in interrupt mode
if the application does not have interrupt mode enabled.

This resulted in race condition, while reactor was placed
in interrupt mode, thread was scheduled on it.
Such operation is a valid one, but never should be attempt
to change the threads mode in this case.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I10b0bbacac1df812badb91b37064528f66743e51
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14815
Reviewed-by: Michal Berger <michal.berger@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
2022-09-30 16:14:10 +00:00
Tomasz Zawadzki
c34f15e09c env_dpdk: keep DPDK 20.11 compatiblity
Patch below added copies of pci realted headers to keep
compatiblity with <= DPDK 22.07.
(1eb35ac) env_dpdk: add copies of 22.07 pci-related header files

Unfortunetly the rte_bus/bus_pci/dev headers from DPDK 22.07 are
not compatibile going back to DPDK 20.11.

The issues are:
- lack of RTE_TAILQ_ENTRY defined in rte_os.h
- rte_intr_handle being part of rte_pci_device rather than pointer

pci_dpdk_2207.c even before this patch is not binary compatible with
DPDK 20.11 - see pci_device_*_interrupt_2207() functions.
There would need to be another copy of headers matching that version
of DPDK to resolve this issue.

SPDK supports up to two latest LTS releases. Which right now includes
DPDK 20.11, but soon will be dropped due to DPDK 22.11 release.

Having compile time defines here, keeps the older DPDK working.
Meanwhile backwards compatiblity in SPDK is no worse than before.
The recent changes to env_dpdk, are aiming to improve support
with newer versions of DPDK.

Change-Id: If4dc601cb03e18c2cad61f3a93080e8265ca5fcc
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14795
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2022-09-30 15:56:33 +00:00
Artur Paszkiewicz
a51649faf6 bdev: use write_unit_size for acwu and write_zeroes
Change-Id: Idbcfc110c153a62082f84f3304f1e245f2fc3daf
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14716
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2022-09-29 22:52:45 +00:00
Artur Paszkiewicz
69c448a30e lib/util: add ISA-L accelerated xor generation
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Change-Id: I3ef9dadb4c68e92760c8426f0fffb7b249829e2b
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12080
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
2022-09-29 22:52:45 +00:00