ivampiresp/Spdk - Spdk - Leaflow Developers

Author	SHA1	Message	Date
Konrad Sztyber	cff39ee7d5	nvme: add missing \n in ctrlr init fail log Additionally, print the string representation of the ctrlr state, as it makes debugging init failures much easier. Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com> Change-Id: I572ef3d6f7d5bbd52039a8872733578c92be4c4a Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15305 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-11-08 08:20:26 +00:00
Richael Zhuang	cabbb25d5d	bdev: add API to get submit tsc of a bdev I/O Add API spdk_bdev_io_get_submit_tsc to get submit tsc of a bdev I/O, which can be used in bdev modules to avoid calling expensive spdk_get_ticks(). Change-Id: Ifbcecb1bc663344997c5e73b72a1dfb5d0422946 Signed-off-by: Richael Zhuang <richael.zhuang@arm.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14989 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-11-04 10:15:46 +00:00
Denis Nagorny	c273513401	nvme/rdma: Allows to use PCI Express Relaxed Ordering This fix allows the relaxed ordering feature to be used where it is supported. libibverbs checks with the driver whether the relaxed ordering access flag is supported and ignores it if not. Experiments show that enabling it by default does not hurt performance and allows the expected bandwidth to be reached on AMD EPYC systems. For example, a fio read test (ConnectX-6, AMD EPYC 7763, two jobs, queue depth 32, block size 32K) can drop to 6-7 GiB/s without it. Enabling this option yields a bandwidth of more than 21 GiB/s. Change-Id: I5983aed5d1f38ee7bec9c310597731c9a6a329da Signed-off-by: Denis Nagorny <denisn@nvidia.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14885 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-11-04 10:15:31 +00:00
Thanos Makatos	b8fc75c36e	nvmf/vfio-user: ensure BAR5 isn't 0 Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com> Change-Id: I60a39c8a311879b7d6c7c82df0abd7a69f9a2778 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14933 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-11-04 10:10:33 +00:00
Thanos Makatos	bad452d25e	nvmf/vfio-user: calculate doorbells based on number of queue pairs It doesn't make sense to have the size of the doorbells fixed and then calculate the maximum number of queue pairs based on it, do it the other way round. Also, add some sanity checks based on the spec. Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com> Change-Id: I17e3509fb0a011128ca089ce78b7a296262e6f8e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14932 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-11-04 10:10:33 +00:00
Alexey Marchuk	0fec09fc50	bdev/part: Call bdev_with_md even if md is NULL The bdev_with_md APIs now allow to pass NULL md pointer, so calling this function without checking for metadata simplifies code Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com> Change-Id: I364a646630bd36120231ea87a41fea05df51befb Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15090 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com>	2022-11-03 14:54:41 +00:00
Shuhei Matsumoto	d683d7b792	bdev/part: Modify spdk_bdev_part_submit_request() to use custom completion callback In the following patches, we will add a feature to inject data corruption to the error bdev module. For read I/O, we will have to inject data corruption at completion. However, if we use spdk_bdev_part_submit_request(), it will not be possible because we cannot add any custom operation into the completion callback. To fix the issue, modify spdk_bdev_part_submit_request() and rename it to spdk_bdev_part_submit_request_ext(). Fortunately, we can use stored_user_cb in struct spdk_bdev_io. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I46d3c40ea88a3fedd8a8fef6b68ee417c814a7a1 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15002 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-11-03 14:54:28 +00:00
Changpeng Liu	fabf6a83cc	lib/vhost: remove session `initialized` flag A session in vhost means an active socket connection from a client (e.g. QEMU or the SPDK vhost initiator), but the device state could be `started` or `stopped` because users may remove the driver of the device in the VM. So in `foreach_session` we can always call the callback function without checking the session state, and the callback function may check the device state if necessary. Change-Id: Id0fc8c7f6f0915a55a738f0c87ebe6539f7fb2db Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15038 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	9da4e15c5c	lib/vhost: start device asynchronously Now we will start the device (virtio-blk and virtio-scsi) when there is a valid I/O queue (VRING_KICK message). The backend device `start_session` callback will ensure this check, so when processing VRING_KICK messages for each vring, we can just call `new_device` if `started` is false. If `started` is true, the device is already started, and it is safe for us to add one more vring even if the device is started. With this change, we don't need to wait for the return value of `start_session` in synchronous mode; just returning is OK. Fix #2518. Change-Id: I92ba3d4e5c38422d7697c1d13180a4a48f0dd4cd Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14981 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	23baa6761d	lib/vhost: don't restart device multiple times We will stop/start the device multiple times when a new vring is added, and also stop/start the device when setting a vring's callfd. Actually, we only need to start the device after an I/O queue is enabled. DPDK rte_vhost will not help us to start the device in some scenarios, so this is controlled in SPDK. Now we improve the workaround to make it consistent with the vhost-user specification. For each SET_VRING_KICK message, we will set up the newly added vring, and then we try to start the device. For each SET_VRING_CALL message, we will add one more interrupt count; previously this was done when enabling the vring, which is not accurate. For each GET_VRING_BASE message, we will stop the device before the first message. With the above changes, we will start/stop the device once, and any newly added vrings after starting the device will be polled in the next `vdev_worker` poller. Change-Id: I5a87c73d34ce7c5f96db7502a68c5fa2cb2e4f74 Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14928 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	b7facb30f8	lib/vhost_scsi: don't start device before a valid I/O queue is enabled Change-Id: I407c62df2117069ad1d8f6aba18cf316a3cf47bf Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14980 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	9cdd1a8a2c	lib/vhost: remove `vhost_session_used_signal` function `vdev_worker` in vhost-scsi is used to process request queues, and `vdev_mgmt_worker` is used to process the event and control queue, so we don't need to call `vhost_session_used_signal` in `vdev_worker`, just remove it. Change-Id: I86f3e90890e6defba69b01fec131afe1adad3a49 Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14927 Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	7fcbd0220e	lib/vhost: alloc VQ tasks in VQ setting function Currently we will allocate all VQ's tasks when starting the device, it will not allow us to add new VQ after starting the device, so here, we move it to VQ setting function. Change-Id: I59cfc393d66779ab8a0eb704bc73bcede3f0a2a0 Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14926 Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	d55bf60a89	lib/vhost: move vq settings into a function With this change, then we can call vq settings after the VRING_KICK message, currently we will stop/start device multiple times when a new vq is added. Change-Id: Icba3132f269b5b073eaafaa276ceb405f6f17f2a Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14925 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	a1cd28c6f3	lib/vhost: get negotiated features after SET_FEATURES message Feature negotiation is done after SET_FEATURES message, here we move it in this message context, so that we can use the negotiated features before starting the device. Change-Id: Ic6388dbcebd72bc5ef182e65798d34c07f6fc35c Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14924 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	835490b1d5	lib/vhost: check memory table earlier Before starting a device, the memory table is already there, so we can check it earlier. Change-Id: I4996705501577cfa78c89621f7081eb0c3d4dd78 Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14923 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
Changpeng Liu	d941d138ad	lib/vhost: merge vq settings into a single loop Change-Id: I5a9ef59adcd383e2fae746a434dda10893a3b84a Signed-off-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14922 Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-11-03 14:53:55 +00:00
GangCao	7f7b468b48	lib/bdev: new __io_ch_to_bdev_ch and __io_ch_to_bdev_mgmt_ch utilities Change-Id: Ie7d818a9a648e28cd191588164420173149af38b Signed-off-by: GangCao <gang.cao@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15167 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-11-02 15:25:21 +00:00
GangCao	cb55e8493f	Lib/Bdev: update calling to spdk_bdev_for_each_channel Change-Id: I541ccffc90e7dc54b416da385e862e952d9db71d Signed-off-by: GangCao <gang.cao@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14638 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-11-02 15:25:21 +00:00
Jim Harris	5497616e8f	env_dpdk: add support for DPDK 22.11 DPDK has merged changes which hide some DPDK objects, such as rte_device and rte_driver, from the public API. So we add copies of the necessary header files into our tree, along with a 22.11-specific pci_dpdk implementation. These files are copied over exactly, except for one #include which needs to change from <> to "" so that it picks up the header in our tree instead of looking for it in system headers. Longer-term we may want to look at ways to automate checking and updating of these header files. DPDK 22.11 isn't officially released yet, so the header files could change, but we want to get this in now since without it SPDK cannot build against DPDK tip at all. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I89ffd0abab52c404cfff911c1c9b0cd9e889241d Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14570 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-11-02 10:50:23 +00:00
Evgeniy Kochetov	8c3590a983	bdev: Add copy IO statistics Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com> Change-Id: Id51ac80bce33a27a8ccea273c076f39019b98339 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14348 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Community-CI: Mellanox Build Bot	2022-11-02 10:33:00 +00:00
Evgeniy Kochetov	a383a15fb1	bdev/part: Add copy IO type support Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com> Change-Id: I9e2dcf29794fdb9535a4f0282b3046602f09188e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14385 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Community-CI: Mellanox Build Bot	2022-11-02 10:33:00 +00:00
Evgeniy Kochetov	d14afd5000	bdev: Add copy IO type Copy operation is defined by source and destination LBAs and LBA count to copy. For destination LBA and LBA count we reuse existing fields `offset_blocks` and `num_blocks` in `struct spdk_bdev_io`. For source LBA a new field `src_offset_blocks` was added. `spdk_bdev_get_max_copy()` function can be used to retrieve maximum possible unsplit copy size. A zero value means unlimited. It is allowed to submit a larger copy size but it will be split into several bdev IOs. Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com> Change-Id: I2ad56294b6c062595c026ffcf9b435f0100d3d7e Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14344 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Community-CI: Mellanox Build Bot	2022-11-02 10:33:00 +00:00
GangCao	e28e247954	RPC/Bdev: display the per channel IO statistics for required Bdev Add a new parameter "-c" to display the per channel IO statistics for required Bdev ./scripts/rpc.py bdev_get_iostat -b Malloc0 -h usage: rpc.py [options] bdev_get_iostat [-h] [-b NAME] [-c] optional arguments: -h, --help show this help message and exit -b NAME, --name NAME Name of the Blockdev. Example: Nvme0n1 -c, --per-channel Display per channel IO stats for specified device This could give more intuitive information on each channel's processing of the IOs with the associated thread on the same Bdev. Please also be aware that the IO statistics are collected from the information of the channel related to each SPDK thread, so they are more closely related to the SPDK thread. In the dynamic scheduling case, different SPDK threads could be running on the same Core. In this case, each separate channel's IO statistics are returned to the RPC call and, if needed, further parsing of the data is needed to get the per Core information, although usually there is one thread per Core. On the other hand, the user could run the framework_get_reactors RPC method to get the relationship of the threads and CPU Cores so as to get precise information on the IOs running on each thread and each Core for the same Bdev. Change-Id: I39d6a2c9faa868e3c1d7fd0fb6e7c020df982585 Signed-off-by: GangCao <gang.cao@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13011 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-10-28 06:51:19 +00:00
GangCao	f0494649e3	Lib/Bdev: add the new API spdk_bdev_for_each_channel And also related function pointers and APIs: spdk_bdev_for_each_channel_msg; spdk_bdev_for_each_channel_done; spdk_bdev_for_each_channel_continue; Change-Id: I52f0f6f27717d53c238faf2f998810c9c5ee45d4 Signed-off-by: GangCao <gang.cao@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14614 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Community-CI: Mellanox Build Bot	2022-10-28 06:51:19 +00:00
Shuhei Matsumoto	6a5ecb3276	bdev/part: Consolidate all I/O types into bdev_part_complete_io() The following patches will allow the caller to specify a custom completion callback to spdk_bdev_part_submit_request(). To do it easily, consolidate completions of all I/O types into bdev_part_complete_io(). Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I083695189daa7e5271787c50947e428d01a83677 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15001 Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-10-28 06:49:40 +00:00
Shuhei Matsumoto	ab839831f1	nvme_rdma: Remove workaround for Soft RoCE's bug from cq_process_completions() We do not support Soft RoCE anymore. Remove a workaround for Soft RoCE's bug that we may receive a completion without error status after qpair is disconnected/destroyed. Then add an assert to check if rdma_req->req is not NULL. This will simplify the code and the following patches. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I80c349053adc0f79679eaf8a5d7265d555d3c2b0 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14909 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-10-28 06:27:19 +00:00
Shuhei Matsumoto	1439f9c773	nvme_rdma: Pass poller instead of poll_group to cq_process_completions() The following patches will support SRQ and SRQ will be per poller. We will need SRQ in nvme_rdma_cq_process_completions(). It is not possible to identify poller if poll_group is passed to nvme_rdma_cq_process_completions(). Based on these thoughts, add poll_group pointer to poller and pass poller to nvme_rdma_cq_process_completions() instead of poll_group. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Signed-off-by: Denis Nagorny <denisn@nvidia.com> Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com> Change-Id: I322a7a0cc08bdcc8e87e720ad65dd8f0b6ae9112 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14282 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-10-28 06:27:19 +00:00
Shuhei Matsumoto	194047249b	nvme_rdma: Get qpair from poll group using WC NVMe-RDMA target has a helper function get_rdma_qpair_from_wc() and uses it to identify a qpair from a WC. NVMe-RDMA initiator has a similar function nvme_rdma_poll_group_get_qpair_by_id(). NVMe-RDMA initiator will support SRQ in the following patches, and it will want to identify a qpair from a WC. get_rdma_qpair_from_wc() of NVMe-RDMA target uses wc->qp_num internally anyway. However, the upcoming custom transport for RDMA will have to use other variables of WC. Hence, it will be convenient to pass WC instead of qp_num if we consider future enhancements. Based on these thoughts, for NVMe-RDMA initiator rename nvme_rdma_poll_group_get_qpair_by_id() by get_rdma_qpair_from_wc(). remove unnecessary declaration, and pass WC instead of qp_num. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Signed-off-by: Denis Nagorny <denisn@nvidia.com> Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com> Change-Id: I01ead4730207e2c6ac53b83f151bd5f977a11465 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14279 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-10-28 06:27:19 +00:00
Shuhei Matsumoto	6ea9de5fc8	nvme_rdma: Factor out poller destroy operation Poller will have more shared resources when SRQ is supported. This is a preparation. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Signed-off-by: Denis Nagorny <denisn@nvidia.com> Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com> Change-Id: Ic3d1cb93dde3f53653a9536a103e5518cebd58e1 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14173 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-10-28 06:27:19 +00:00
Shuhei Matsumoto	6a59daad2b	nvme_rdma: Poll disconnect until completion if async mode is disabled nvme_rdma_ctrlr_disconnect_qpair() does not poll the qpair until it is actually disconnected if it is in a poll group even if its async mode is disabled. Hence, spdk_nvme_ctrlr_free_io_qpair() removes the qpair from a poll group when it is being disconnected. On the other hand, I/O qpair is destroyed after it is actually disconnected. When SRQ is enabled and used, a SRQ is destroyed if the corresponding poller does not have any I/O qpair after an I/O qpair is removed from the poller. In particular, if we use spdk_nvme_ctrlr_free_io_qpair(), a SRQ is destroyed before the corresponding I/O qpairs are destroyed. Destroying a SRQ failed because it is still referenced by I/O qpairs. This bug was found when running the SPDK NVMe perf tool with SRQ. The reason was we had nvme_rdma_poll_group_process_completions() to call disconnected_qpair_cb after the qpair is actually disconnected. However, it is ensured that nvme_rdma_poll_group_process_completions() calls disconnected_qpair_cb for any disconnected qpair. Hence, remove a check if qpair->poll_group is not NULL from nvme_rdma_ctrlr_disconnect_qpair() and update the comment. Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Change-Id: I0fde0d827eec3280e1cc5a0fce34d163a6069bc4 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14908 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-10-28 06:27:19 +00:00
Vasuki Manikarnike	3fcee8ddcc	lib/nvme: Do not submit queued aborts if adminq is in failed state. With RDMA, the admin poller can experience a remote disconnect when processing completions. The admin qpair will be disconnected to handle this. The disconnect code path will manually complete queued aborts. However, the completion callback for the abort will attempt to resubmit other queued aborts from the queue, which will result in a very large stack and can eventually cause a segfault. The fix is to not resubmit queued aborts if the admin qpair is in any kind of failed state. Change-Id: I4a6f959232c8a1bd30c87ca50459014e556cbaa0 Signed-off-by: Vasuki Manikarnike <vasuki.manikarnike@hpe.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15114 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>	2022-10-28 06:26:20 +00:00
Szulik, Maciej	51ae6d4002	nvme/tcp: add max_completion exit condition to loop inside read_pdu A loop inside 'nvme_tcp_qpair_process_completions' makes 'max_completions' actually behaving like a minimum: do { rc = nvme_tcp_read_pdu(tqpair, &reaped); [...] } while (reaped < max_completions); Before this change 'max_completion' constraint, in its true sense, was actually not respected and a loop inside 'nvme_tcp_read_pdu' could be executed indefinitely as long as a recv state changed. To prevent this behavior, max_completion must be passed to 'nvme_tcp_read_pdu' and used as an additional exit condition. Signed-off-by: Szulik, Maciej <maciej.szulik@intel.com> Change-Id: I28da962f4a62f08ddb51915b5d0dae9611a82dee Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15136 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-10-26 07:35:21 +00:00
John Levon	36dfcca2b4	nvmf/vfio-user: switch from shadow doorbells when freeing Some reset/disable paths are freeing the shadow doorbells without switching the SQs back to BAR0. Fix this up, and add a small cleanup when initializing the shadow doorbells. Signed-off-by: John Levon <john.levon@nutanix.com> Change-Id: Ia5e5b91b7dc696a558eb0ad59cc554abced47cca Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14901 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-10-26 07:32:54 +00:00
John Levon	64db53f1aa	nvmf/vfio-user: support multiple poll groups in interrupt mode To support SQs allocated to a poll group other than the controller's main poll group, we need to make sure to poll those SQs when we wake up and handle the controller interrupt. As they will be running in a separate SPDK thread, we will arrange for all poll groups to wake up when we receive an interrupt corresponding to a vfio-user message arriving. This can mean needless wakeups: we don't (yet) have a mechanism to only wake up the poll groups that correspond to a particular SQ write. Additionally, as we don't have any notion of a poll group per controller, this ends up polling all SQs in the entire poll group, not just the ones corresponding to the controller we were handling. As this has potential performance issues in many cases, it defaults to disabled. Signed-off-by: John Levon <john.levon@nutanix.com> Change-Id: I3d9f32625529455f8d55578ae9cd7b84265f67ab Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14120 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-10-26 07:32:54 +00:00
liu.darong	7e17de3d81	bdev/trace: add support to trace with bdev name Fixes #2585 Signed-off-by: liu.darong <liu.darong@xsky.com> Change-Id: I3f9b6d4719b5eed004f383e86db8a17b8b0287f5 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13823 Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-10-25 07:12:52 +00:00
Anton	7ba33f49f0	lib/idxd: fix use after free due to stale crc_dst in chained ops When crc32c is invoked with a multiple entry input iov, only the last op has crc_dst set in order to write the final crc value into the user supplied location. spdk_idxd_process_events() for every successfully completed CRC op writes the value into *op->crc_dst UNLESS it is NULL. The problem is that _idxd_prep_batch_cmd() that allocates new ops left op->crc_dst uninitialized. This results in a memory corruption (use after free) in the following scenario: 1) Op A is allocated and crc_dst is set to point to user memory X. 2) Op A is completed. 3) User memory X is freed. 4) Ops B and C are allocated (chained), C has crc_dst set. => B reused op A memory and crc_dst still points to the now stale user location (1) 5) B is completed, spdk_idxd_process_events() writes into X as B->crc_dst = X. Fix: _idxd_prep_batch_cmd() should initialize crc_dst to NULL. Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Change-Id: I9e7d57ec43a8fbcb3750906015a5cb7291278c35 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15115 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-10-25 07:10:55 +00:00
paul luse	13597fd4f1	accel_sw: add extra check on compression We were missing a check when ISAL uses the complete output buffer on compression to determine whether it was a perfect fit or if simply not enough buffer was provided. Signed-off-by: paul luse <paul.e.luse@intel.com> Change-Id: I73532666f50cb9fbef3c42f6bfb25fc5c7de01c6 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14874 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>	2022-10-25 07:09:37 +00:00
Krzysztof Karas	a74c8c2e8c	scheduler: prevent user from switching back to static Prevent user from switching back to static scheduler after different scheduler has been selected. Currently we do not have a way to save initial thread distribution configuration, so each time user switches from dynamic scheduler back to static, the SPDK threads may end up on different reactors. This would cause discrepancy in performance statistics of SPDK managed by static scheduler. Change-Id: Ic17a6be55eaea0e1a748f92e01f7075540403637 Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15055 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-10-21 07:33:06 +00:00
Jim Harris	a9be4f2c2f	trace: add likely/unlikely hints to _spdk_trace_record This helps generate slightly better code in this function, which can have a noticeable impact for high trace event workloads. Tested with bdevperf, single malloc or null bdev, qd=32, 512B randreads on a single Xeon core. Specify "-e bdev" to enable bdev trace events. Null: Before: 8.09M/s (123ns per IO) After: 8.68M/s (115ns per IO) Malloc: Before: 4.21M/s (237ns per IO) After: 4.34M/s (230ns per IO) Note that each bdev I/O generates two trace events (START and END) - meaning this change removes 7-8ns of overhead for every 2 trace events, at least on my system. Signed-off-by: Jim Harris <james.r.harris@intel.com> Change-Id: I7021b7f9e28b4a7cb16f8a97b4d4004ae165efd2 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15096 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>	2022-10-21 07:18:37 +00:00
Alexey Marchuk	c77b537786	accel: Save overridden options in json config file Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com> Change-Id: Ida2c6f1c460c2b66d2d4159d225036377e488e62 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14856 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>	2022-10-19 07:47:58 +00:00
Anton Eidelman	c2c8b4ebc7	lib/idxd: fix bug in crc32c with chained ops When spdk_idxd_submit_crc32c() handles input with multiple iovs (or multiple ops are generated due to physically discontinuous buffers), the first op has the original seed, while the subsequent ops instruct the hardware to fetch the seed from the output of the previous op (op->hw.crc32c_val): void *prev_crc; ... desc->flags |= IDXD_FLAG_FENCE | IDXD_FLAG_CRC_READ_CRC_SEED; desc->crc32c.addr = (uint64_t)prev_crc; <<< virtual addr The problem is that prev_crc is a virtual address, so the hardware (at least with no IOMMU configured) reports: DSA_COMP_HW_ERR1 spdk_idxd_process_events: Completion status 0x20 Solution: Set crc32c.addr to the physical address of the crc32c_val field in the previous desc. Since desc->completion_addr already holds the physical address of the dsa_hw_comp_record, we use this with the crc32c_val offset. Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Change-Id: I330e98c2f3fd6da5cb4fc03d0745df09a9ff0e0c Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14954 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Paul Luse <paul.e.luse@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: John Kariuki <John.K.Kariuki@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-10-18 07:24:55 +00:00
Konrad Sztyber	1f3a6b0398	rpc: use rw access when creating RPC lock file It allows the users to specify the path to the RPC socket on a NFS mounted filesystem. This is necessary, because flock(2) on NFS requires write access to place an exclusive lock. Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com> Change-Id: If197498ed5bdcb4e02c5f2f2b2c1ef388872c457 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14993 Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot	2022-10-18 07:23:28 +00:00
GangCao	f20b99bbb3	lib/nvme/vfio: destruct ctrlr in failed cases Change-Id: Ie7d7ab25055c26ea1c2ae4997bf7197a170de989 Signed-off-by: GangCao <gang.cao@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15005 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-10-17 12:52:55 +00:00
Szulik, Maciej	dcf30711ef	build: add explicit vars init to silence LTO related warning When Link Time Optimization is enabled, the compiler can sometimes produce additional warnings saying that some variables may be uninitialized. To suppress the warning it is enough to add explicit initialization of the variable causing the issue, in this case 'module_name = NULL' and "writer = NULL". Signed-off-by: Szulik, Maciej <maciej.szulik@intel.com> Change-Id: I30492115b28a18554b08a6f575cbcc9538f3b848 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14849 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-10-05 10:24:53 +00:00
GangCao	8afb3d0037	lib/bdev: return error when failing to get resource To fix issue: 2719 Change-Id: I983ef607fad154608fff9bb9355645968caf0c5a Signed-off-by: GangCao <gang.cao@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14746 Reviewed-by: Changpeng Liu <changpeng.liu@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot	2022-10-04 07:07:04 +00:00
Tomasz Zawadzki	f98ac63ea7	reactor: do not switch mode for threads in non interrupt tgt Fixes #2693 spdk threads should not be placed in interrupt mode if the application does not have interrupt mode enabled. This resulted in a race condition: while the reactor was placed in interrupt mode, a thread was scheduled on it. Such an operation is valid, but there should never be an attempt to change the thread's mode in this case. Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Change-Id: I10b0bbacac1df812badb91b37064528f66743e51 Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14815 Reviewed-by: Michal Berger <michal.berger@intel.com> Reviewed-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>	2022-09-30 16:14:10 +00:00
Tomasz Zawadzki	c34f15e09c	env_dpdk: keep DPDK 20.11 compatibility The patch below added copies of pci related headers to keep compatibility with <= DPDK 22.07. (`1eb35ac`) env_dpdk: add copies of 22.07 pci-related header files Unfortunately the rte_bus/bus_pci/dev headers from DPDK 22.07 are not compatible going back to DPDK 20.11. The issues are: - lack of RTE_TAILQ_ENTRY defined in rte_os.h - rte_intr_handle being part of rte_pci_device rather than pointer pci_dpdk_2207.c even before this patch is not binary compatible with DPDK 20.11 - see pci_device_*_interrupt_2207() functions. There would need to be another copy of headers matching that version of DPDK to resolve this issue. SPDK supports up to two latest LTS releases. That right now includes DPDK 20.11, but it will soon be dropped due to the DPDK 22.11 release. Having compile time defines here keeps the older DPDK working. Meanwhile backwards compatibility in SPDK is no worse than before. The recent changes to env_dpdk are aiming to improve support with newer versions of DPDK. Change-Id: If4dc601cb03e18c2cad61f3a93080e8265ca5fcc Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14795 Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com> Reviewed-by: Jim Harris <james.r.harris@intel.com>	2022-09-30 15:56:33 +00:00
Artur Paszkiewicz	a51649faf6	bdev: use write_unit_size for acwu and write_zeroes Change-Id: Idbcfc110c153a62082f84f3304f1e245f2fc3daf Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14716 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Ben Walker <benjamin.walker@intel.com>	2022-09-29 22:52:45 +00:00
Artur Paszkiewicz	69c448a30e	lib/util: add ISA-L accelerated xor generation Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Change-Id: I3ef9dadb4c68e92760c8426f0fffb7b249829e2b Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12080 Tested-by: SPDK CI Jenkins <sys_sgci@intel.com> Community-CI: Mellanox Build Bot Reviewed-by: Ben Walker <benjamin.walker@intel.com> Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>	2022-09-29 22:52:45 +00:00
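
Note on a9be4f2c2f (trace: add likely/unlikely hints): the following is a minimal sketch of how branch hints of this kind are commonly written with the GCC/Clang builtin `__builtin_expect`. The macro names, the `trace_state` struct, and `record_event()` are hypothetical and exist only for illustration; they are not the code changed by that commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hint macros built on the GCC/Clang builtin __builtin_expect. */
#define my_likely(cond)   __builtin_expect(!!(cond), 1)
#define my_unlikely(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical stand-in for per-process trace state. */
struct trace_state {
	bool     enabled;   /* tracing switched on at all */
	uint64_t mask;      /* bit set per enabled event group */
	uint64_t recorded;  /* number of events accepted so far */
};

/* Returns 1 if the event was recorded, 0 if it was filtered out. */
static int
record_event(struct trace_state *t, uint64_t group_bit)
{
	/* Rare path: no trace state or tracing disabled. */
	if (my_unlikely(t == NULL || !t->enabled)) {
		return 0;
	}

	/* Assumed common path in a trace-heavy workload: the group is enabled. */
	if (my_likely((t->mask & group_bit) != 0)) {
		t->recorded++;
		return 1;
	}

	return 0;
}
```

The hints do not change behavior; they only tell the compiler which side of each branch to lay out as the fall-through path, which is where a per-event saving of a few nanoseconds could plausibly come from.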


9784 Commits
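
Note on c2c8b4ebc7 (lib/idxd: fix bug in crc32c with chained ops): that commit describes the first op using the original seed and each later op taking its seed from the previous op's result. A software-only sketch of the same chaining idea follows. It does not model DSA descriptors or physical addresses; `crc32c_update()`, `crc32c_chain()`, and `struct buf_seg` are hypothetical names, and the seed convention (invert on entry and exit) is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor, similar in spirit to struct iovec. */
struct buf_seg {
	const void *base;
	size_t      len;
};

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Passing the previous return value as `crc` continues the checksum. */
static uint32_t
crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++) {
			/* Shift out one bit; apply the polynomial if that bit was set. */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		}
	}
	return ~crc;
}

/* Chain over segments: each step is seeded with the previous step's output. */
static uint32_t
crc32c_chain(const struct buf_seg *segs, size_t nsegs, uint32_t seed)
{
	uint32_t crc = seed;

	for (size_t i = 0; i < nsegs; i++) {
		crc = crc32c_update(crc, segs[i].base, segs[i].len);
	}
	return crc;
}
```

With this convention, chaining over the segments "1234" and "56789" with seed 0 gives the same value as a single pass over "123456789", which is the standard CRC-32C check value 0xE3069283.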