This was implemented as 3 separate functions but
it is simpler as 1.
Also, this wasn't previously freeing the buffer pools.
Change-Id: Ic1b2b3a0596e745a223099cb2a79bea6ef5c69cc
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This was broken into three functions, but it is
a lot simpler as one.
Change-Id: If58ad50fe7d4f65c598b62f24e9e1ce7a64fdd8e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is better organizationally, but also will serve as
an io_device in the future.
Change-Id: I6d65cf39df59e874d13f5fccc5a489720e86c48f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Return types should be on a separate line for definitions.
Change-Id: Iaa38dd00042359fc6640fc67053bd69ebbb7af03
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Make the buffer allocation work for all types of
commands, not just read.
Change-Id: I72d8f67a724566630e7c4a74759fcb08449f7de4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Eliminate rte_memcpy dependency by replacing it with
regular memcpy. This may impact performance, but the only
use of rte_memcpy was in the malloc bdev which is for
testing only.
Change-Id: I3e8592cb08262272518ec3d29ea165b4e8f48a5c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Blockdevs already indicate support for unmap via
spdk_bdev_io_type_supported(bdev, SPDK_BDEV_IO_TYPE_UNMAP).
Change-Id: I634f27a281fd900bb3a6da2e4ff8a74e43579578
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
We plan to use these buffers for more than just reads.
Change-Id: I8fa6cb432a6cfe4406fbf240cd3aa2ae4ab5f3d5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The user can get there via the bdev, so this didn't
have a purpose.
Change-Id: I7f85bb71d5ee238d37ba3624d0ac68a161c95e49
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Older kernel headers don't have the definition of this macro, so define
it if necessary.
This is the same workaround as used in rte_vhost/vhost.h.
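For illustration, the workaround pattern looks roughly like the sketch below. EXAMPLE_FEATURE_BIT and its value are hypothetical stand-ins, not the macro this patch actually defines.

```c
#include <stdint.h>

/*
 * EXAMPLE_FEATURE_BIT is a hypothetical name used only for this sketch.
 * If an older header did not provide it, supply a fallback definition.
 */
#ifndef EXAMPLE_FEATURE_BIT
#define EXAMPLE_FEATURE_BIT 32
#endif

/* Return 1 if the feature bit is set in the negotiated feature mask. */
static int
example_has_feature(uint64_t features)
{
	return (features & (1ULL << EXAMPLE_FEATURE_BIT)) != 0;
}
```

Because the definition is guarded by #ifndef, newer headers that already provide the macro take precedence.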
Change-Id: I01e0661db05de517adf8e24a47c63d32853cd385
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The vhost_net.c file is not needed and fails scan-build, so remove it.
Change-Id: I5817201373f7253cc8bc1a9bdc5884197e166a14
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
guest_pages is allocated in vhost_setup_mem_table() and reallocated
in add_one_guest_page(), but never freed. This patch fixes that memory
leak.
Change-Id: Ie381c43bafea5cdea2ac9f057c0282044a340dce
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
This prevents the user device from being destroyed and recreated while a
vring is in an "incomplete" state. virtio_is_ready() was returning true
for devices with vrings that did not have a valid callfd (their
VHOST_USER_SET_VRING_CALL message had not arrived yet).
Change-Id: Idc4b41efd544ff5c6b093a5a48798b41c55bbe06
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
vhost-net devices might keep track of the last descriptor indices by
themselves, assuming they initially start at 0, but that is not the
case for vhost-scsi. The initial last descriptor indices are set via the
VHOST_USER_SET_VRING_BASE message, and we cannot predict what they
will be. Setting these to vqueue->used->idx is also not an option,
because there might be some not yet processed requests between these and
the actual last_idx. This patch adds an API for getting/setting the last
descriptor indices of vrings, so that they can be synchronized between
the user device and rte_vhost.
The last_idx flow could be as follows:
* vhost start
* received SET_VRING_BASE msg, last_idx is set on rte_vhost side
* created user-device, last_idx pulled from rte_vhost
* requests are being processed by user-device, last_idx changes
* destroyed user-device, last_idx pushed to rte_vhost
* *at this point, vrings could be recreated and another SET_VRING_BASE
message could arrive, so last_idx would be set*
* recreated user-device, last_idx pulled from rte_vhost
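The flow above can be sketched with a small model. All names below (example_vring_state, example_device_create, and so on) are hypothetical and are not the API added by this patch; the sketch only shows the pull-on-create and push-on-destroy pattern.

```c
#include <stdint.h>

/* State kept on the rte_vhost side (hypothetical model). */
struct example_vring_state {
	uint16_t last_idx;
};

/* State kept by the user device while it is running (hypothetical model). */
struct example_user_vring {
	uint16_t last_idx;
};

/* SET_VRING_BASE arrived: record the index on the rte_vhost side. */
static void
example_set_vring_base(struct example_vring_state *state, uint16_t base)
{
	state->last_idx = base;
}

/* User device created: pull last_idx from rte_vhost. */
static void
example_device_create(struct example_user_vring *user,
		      const struct example_vring_state *state)
{
	user->last_idx = state->last_idx;
}

/* User device processed n requests: its private last_idx advances. */
static void
example_device_process(struct example_user_vring *user, uint16_t n)
{
	user->last_idx = (uint16_t)(user->last_idx + n);
}

/* User device destroyed: push last_idx back to rte_vhost. */
static void
example_device_destroy(const struct example_user_vring *user,
		       struct example_vring_state *state)
{
	state->last_idx = user->last_idx;
}
```

With this split, a device that is destroyed and recreated without a new SET_VRING_BASE message resumes where it left off, while a new message simply overwrites the stored index before the next create.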
Change-Id: I247ba4e461a2a2b524ccade364f5b7bf260f7538
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
There is an issue when QEMU sets a new memory table just after the guest
OS starts booting. If the guest OS then tries to issue any I/O to the
device (e.g. using BIOS INT13h - EDD), it will get stuck because the
previous addresses of mmaped memory might change.
To fix this issue, defer using the new mem table until after we receive
the first SET_VRING_ADDR message. SET_VRING_ADDR will be sent by QEMU
when the guest OS virtio (e.g. virtio-scsi) driver starts initialization.
At this point it is safe to invalidate the old mem tables because there
will be no more outstanding I/O.
Change-Id: I24772be87a8b6c8781868b9b7773317761499748
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
For now DPDK assumes that callfd, kickfd and last_idx are being set just
once during vring initialization and device cannot be running while DPDK
receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
However, that assumption is wrong. For vhost-scsi, these messages might
arrive at any time, possibly multiple times, one after another.
QEMU issues SET_VRING_CALL once during device initialization, then again
during device start. The second message will close previous callfd,
which is still being used by the user-implementation of vhost device.
This results in writing to invalid (closed) callfd.
This patch destroys vhost device before setting callfd, kickfd and last
vring indices. It will be recreated right after (with updated vring
data).
Change-Id: I293bd91106f53f6c2f65d8b8a41f47ae7548cddc
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
This will be decoupled from the build to start. Next
patches will modify this code to prepare it for use with
SPDK vhost-scsi. The final patch will replace the existing
v17.02-based code with this version, and make the necessary
SPDK vhost changes to use it.
This makes it easier to track the differences between upstream
DPDK and our internal copy, while not breaking the build at
any point in the git history.
While here, expand the POSIX include file check to exclude
any directory starting with lib/vhost/rte_vhost (which would
include this new directory).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Icf1202c1b7a898edff12aa226943a08b578cf962
Scan the source for POSIX includes outside of the
allowed locations in check_format.sh. This only
tests for POSIX headers - not Linux Standards Base.
Also, fix one bug that was caught by this addition.
Change-Id: Ib0ca93fe6ac552dc49d95b27b4803e40282027e8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
LOG_DEBUG is a symbol defined by POSIX, so if sys/log.h
is included the symbols conflict.
We'll need to push this patch to upstream DPDK too.
Change-Id: Ib263731864aca4791226ea6e3abb5ddfe42e97d8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
FOREACH_DEVICE_ON_PCIBUS macro has been defined since rc2.
Change-Id: Iad61401520735dfde4e5715c32e74a54a2dff7da
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Replace it with a check of the returned req
using the spdk_unlikely macro.
Change-Id: I1202b3955af9a68496d8ced7cf66c20cf26f7fff
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The SCSI layer always passes task->iovs to spdk_bdev_readv(), so there
is no way for task->iovs != bdev_io->u.read.iovs to be true.
Change-Id: I4c0a2075c6e50e4304d62707a29bededa37b4e5c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The SCSI task bdev I/O should never be pending when spdk_scsi_task_put()
is called, and just setting the status to failed is not correct (when
the bdev eventually completes the I/O, it will write into the now-freed
bdev_io, which may be reused by someone else).
Change-Id: Iaad6ce9ab41539652abc40147fed47c5012109dc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The SCSI layer was not using the task ID for anything; the iSCSI layer
was using it to store the task tag, so move it there and rename it to
"tag" to make its purpose clear.
Change-Id: Ibda4f4e215056116b9be4a3a0264f98bc4c29535
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The SCSI layer doesn't use subtasks; these are an iSCSI layer concept.
Change-Id: I83871f02362f10fd4ecd4b2a1544eb76bfa53595
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
In the pattern set by spdk_bdev_io_complete_nvme_status(), allow
blockdev modules to complete a bdev_io with a SCSI status code.
Also move it to the internal bdev header file, since only bdev modules
should be setting bdev_io status codes.
Change-Id: I8b6afad2c02d7c010c5e60f06a7c7e0785eb87ca
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move the scsi_nvme translation code from the SCSI library into bdev, and
provide a generic way to translate any bdev_io status into a SCSI
status.
Change-Id: Ib61a6209387c24543e31574e2b5ca249e2ac8b74
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Since we keep a copy of the DPDK vhost library, the header file does not
depend on the DPDK vhost library.
Change-Id: I14d48e10227633547231e4f429e7375ffa76128d
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
The code will look clearer if those definitions are put into the tree.h header.
Change-Id: Ib1a34f19d9849acd7ea979eb0a6e153b0e8e39de
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Since DPDK 17.05, the rte_eal_device_insert API is only used for
virtual device scan and initialization. For PCI devices,
which use Domain:Bus:Dev:Function addressing, this API is no longer
valid.
Change-Id: I1ab63dfc3af188d01836e67cd8db745e035fc450
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
These channels can handle generic bdev context.
Change-Id: I61f41884ddf4cf86fa156e9051421b354bbb349d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
When destroying a connection, we need to check if we got to
full feature phase before freeing any io channels. This is because
the io channels are only allocated as part of a successful login.
The Calsoft iSCSI test suite has tests which will fail login.
The test system was just using a malloc backend with memcpy,
so even though a channel was NULL in some cases, it was never used
since the memcpy engine doesn't need it.
This prepares for some future patches which extend the use
of io channels in the bdev layer.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2fb7b18a781caa0aadca319aa1e61a6ccf2c55fd
This allows astyle to format the cast of address-of operations
correctly.
Change-Id: I9c8a4545c44601e769acc712ec7acf3a96f45ebb
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The ISID field is a 6-byte field in network (big-endian) byte order.
The previous code was casting the uint8_t isid[6] value into uint64_t,
which was actually casting the address of the first byte of isid (not
the contents of the array), and it was also not correctly converting
byte order.
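A minimal sketch of the byte-wise conversion described above. isid_to_u64() is a hypothetical helper name and is not necessarily how this patch implements the fix.

```c
#include <stdint.h>

/*
 * Pack a 6-byte big-endian (network order) field into a host-order
 * uint64_t by reading the array contents one byte at a time. This
 * avoids both the pointer cast and any dependence on host endianness.
 */
static inline uint64_t
isid_to_u64(const uint8_t isid[6])
{
	uint64_t value = 0;
	int i;

	for (i = 0; i < 6; i++) {
		value = (value << 8) | isid[i];
	}

	return value;
}
```

The first byte of the array ends up in the most significant position of the 48-bit result, and the upper 16 bits of the return value are always zero.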
Change-Id: Idd114e06d30040cf28931d7da7ffdc8d6c45e82a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Avoid allocating a large amount of stack space when increasing
NVME_MAX_CONTROLLERS.
Change-Id: I7017e5ed9f4d4f5c860dac608c3e5ae3c35864e7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The contents of struct spdk_scsi_lun don't need to be part of the public
API.
Change-Id: I101b77871054557380610fd901ab38bada463202
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The VirtualBox emulated NVMe device will intermittently
hang on the first read/write command after an I/O
qpair has been allocated. The frequency of the hang
diminishes if a delay is added after allocating the I/O
qpair - until it disappears completely with a 100us delay.
So add a quirk to insert this delay.
Note - the 100us delay was tested by running
the hello_world example app 50000 times.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I237e31b1b8a1a1e28262851ae0a21cd7345f0f1a
Fixes a scan-build warning about using qpairs after they have been
freed.
Change-Id: I263eabd6b784acf540c66136965f7705ef110a78
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Fix up the existing comment blocks misaligned in the first column.
Also add line numbers to the comment checks.
Change-Id: I9d28c365271df36e7013d74cbb02d0023ab4f581
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch assigns the correct value to page control.
Now that the page control value is correctly taken from the CDB,
an error is reported via sense data when processing "saved values".
"Changeable values" are not supported, so all parameters
are reported as not changeable when requested.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I41378c96b1e8c716b5d0ce4b72777065fb122228
Fix up all existing spacing errors in comments and add an automated
check for patterns like /*comment*/.
Change-Id: I28f61c93612dc0f8aed66bd509da78e91ea9737e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Vhost needs to register the memory given by the guest in the VFIO
container to be able to do any DMA using this memory.
Currently DPDK doesn't provide any interface to handle guest memory, so
for now let's find the container fd in the /proc/self/fd/ directory and
provide some VFIO internal API that should eventually extend the DPDK API.
Change-Id: Iee9d496367ccd61219068fc0eadc17e786ff0731
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
The new format is: domain.bus.device.function
Since this format uses '.' as the separator, only the
following forms are supported to avoid ambiguity:
1. domain.bus.device.function (4 values provided)
2. bus.device.function (3 values provided, with domain = 0)
3. bus.device (2 values provided, with domain = 0 and function = 0)
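A rough sketch of how the three accepted forms could be told apart. The struct and function names are hypothetical, and parsing the fields as hexadecimal is an assumption of this sketch, not something taken from the patch.

```c
#include <stdio.h>

/* Hypothetical container for a parsed address. */
struct example_pci_addr {
	unsigned int domain;
	unsigned int bus;
	unsigned int dev;
	unsigned int func;
};

/*
 * Parse "domain.bus.device.function", "bus.device.function" or
 * "bus.device". Returns 0 on success, -1 on any other shape.
 */
static int
example_pci_addr_parse(struct example_pci_addr *addr, const char *str)
{
	unsigned int a, b, c, d;
	char extra;
	int n;

	/* The trailing %c only matches if there is unexpected extra input. */
	n = sscanf(str, "%x.%x.%x.%x%c", &a, &b, &c, &d, &extra);

	switch (n) {
	case 4:
		addr->domain = a;
		addr->bus = b;
		addr->dev = c;
		addr->func = d;
		return 0;
	case 3:
		addr->domain = 0;
		addr->bus = a;
		addr->dev = b;
		addr->func = c;
		return 0;
	case 2:
		addr->domain = 0;
		addr->bus = a;
		addr->dev = b;
		addr->func = 0;
		return 0;
	default:
		return -1;
	}
}
```

The number of converted fields selects the form, so a lone value or a string with trailing characters is rejected rather than guessed at.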
Change-Id: Ide03db38b4ac7802cf36f0e536e8b997101d6cd3
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
According to the SCSI standard, all ASCII data fields "may be
terminated with one or more ASCII null (00h) characters"
[7.6.10, 4.4.1]. Windows SCSI Compliance tests expect a null terminator
there, so let's include it.
Change-Id: I18fa35295233a163cea711a5c4ff8e3d3e80c4f1
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
A 100us delay is so small that applying the quirk to the specific
SSDs that require the delay is more trouble than it is worth.
So remove the quirk and always wait 100us before re-enabling
the NVMe SSD during initialization.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id6a8cc6e35d103fffdf135580301fc3e5b27e722
Also avoid an spdk_get_ticks() call in the default
case where a timeout_cb_fn is not defined.
On my Intel(R) Xeon(R) E5-2699 v3 system with an
Intel(R) P3700 SSD, these modifications reduce software
overhead per I/O by 3-5% (as measured by the SPDK
overhead tool).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5bb5a87747b15d9e27655fabcd2bc1a40b0b990e
Remove the "Nvme" from several field names. The parser
will still accept the old name for backward compatibility.
Change-Id: I6fa86ec359b23fb63960d0aa479a845b36a0977a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The user can now not only specify an optional timeout for
commands, but also the action to take when a timeout is
detected.
Change-Id: I7d7cdd846d580e0b3a5f733d398ee9b19d6fe034
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Queue aborts that would exceed the abort command limit
in software as a convenience for the user.
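One way to picture the software queueing is the sketch below. The limit, queue depth, and all names are hypothetical, chosen only to illustrate the idea of holding excess aborts until an outstanding one completes.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ABORT_LIMIT 2	/* hypothetical controller abort limit */
#define EXAMPLE_QUEUE_DEPTH 8	/* hypothetical software queue depth */

struct example_abort_ctx {
	uint16_t queued[EXAMPLE_QUEUE_DEPTH];	/* command IDs waiting */
	size_t head;
	size_t count;
	size_t outstanding;	/* aborts currently at the controller */
};

/* Returns 0 if sent immediately, 1 if queued in software, -1 if full. */
static int
example_abort_submit(struct example_abort_ctx *ctx, uint16_t cid)
{
	if (ctx->outstanding < EXAMPLE_ABORT_LIMIT) {
		ctx->outstanding++;
		return 0;
	}

	if (ctx->count == EXAMPLE_QUEUE_DEPTH) {
		return -1;
	}

	ctx->queued[(ctx->head + ctx->count) % EXAMPLE_QUEUE_DEPTH] = cid;
	ctx->count++;
	return 1;
}

/*
 * Called when an abort completes. Returns 1 and stores the released
 * command ID in *next_cid if a queued abort was sent, otherwise 0.
 */
static int
example_abort_complete(struct example_abort_ctx *ctx, uint16_t *next_cid)
{
	ctx->outstanding--;

	if (ctx->count == 0) {
		return 0;
	}

	*next_cid = ctx->queued[ctx->head];
	ctx->head = (ctx->head + 1) % EXAMPLE_QUEUE_DEPTH;
	ctx->count--;
	ctx->outstanding++;
	return 1;
}
```

The caller never has to track the limit itself: each completion frees a slot and the oldest queued abort takes it.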
Change-Id: I8c1f0380984cc6c0cdb453db961939a7f571b336
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Instead, pass NULL when an ADMIN command times out.
We don't expose the admin queue to the user.
Change-Id: If0768d329a689f6f7c3734c9d419e680d7378ed1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
For each command that times out, call the timeout
callback one time if the user registered one.
Change-Id: Iaad39a886468e89bef63fe292c5cad1dce97a57c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Instead, they register some internal structure of
their choosing.
Change-Id: Id1f8c563d0a2c6f1066d741f86b8aa6fe09b6319
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Some calls were passing bdev->ctxt, some calls just
bdev. In most of our implementations those are the
same pointer, but they aren't necessarily.
Change-Id: If2d19f9eef059aded10a917ffb270c1dc4a8dc41
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The DPDK mask and the reactor mask are always the same.
Change-Id: I83d3ab87cdfb405574f6472cfc222d3f311abdb1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Changed spdk_iscsi_portal_grp_create_from_portal_list so that it fails
if any given portal is invalid.
Change-Id: I708621a538a52abfed4dce01668d26602a5ada59
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
It has been discovered that some devices require
a very small delay before writing CC.EN to 1 after
CSTS.RDY goes to 0.
Change-Id: I73d31726d17ebf5bbec7ee528e2f98fcd05234dd
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This isn't the indentation pattern I would have chosen, but
it's a complicated negotiation between what I want and what
astyle will let me get away with.
Change-Id: I4909587823931842ac3f227134e1d05e7d80da74
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Any Intel device reporting device ID 0x0953 needs this quirk.
Change-Id: I690b01ecf05105df00ec8cf6f2da7f7c0a601aa8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Also add a message when a controller is attached and assigned a name.
Change-Id: I54f2d711d55ba7ae99913fdfea652770b1f8931d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
VFIO does not work with the vhost library, so print a warning during
vhost initialization.
Change-Id: Iaa31808c7007f1840a6a441e2591f0a3986b0c29
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
This fixes the multiple SCSI reset issue.
This patch does not remove sleep in iSCSI tests.
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I5e9f3705e5dc34004b9d1b9e40fbdcb04a3bee4e
This prevents a deadlock if the user immediately
calls spdk_nvme_detach.
Change-Id: I79f28abe163cbbf184bea907692c44aa4e1c8893
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Some intermittent issues are still observed with multiple
resets in quick succession. Reverting for now while the
issue is more fully root caused.
This reverts commit 7fa7f91ee3.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I493b564e8a110bbfb7a6cc47107d53d6eca40053
The first step is to not destroy an existing device in
vhost_user_set_mem_table(). This is because we may
still be processing I/O via INT13 while QEMU is setting
up the mem tables for OS boot.
The primary part of this patch, though, is to defer
using the new mem table until after we receive the
first SET_VRING_ADDR message. SET_VRING_ADDR will be
sent by QEMU when the guest OS virtio-scsi driver starts
initialization. At this point it is safe to invalidate
the old mem tables because there will be no more
INT13 I/O.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I45fb5910f45e7fd2cf4a325341ad105a57d8ea40
Make sure the name will not exceed the length of SPDK_BDEV_MAX_NAME_LENGTH.
Change-Id: I33a3f10c836e650fdcb578c7d9e58169d9bb766a
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
The descriptor type must be 0 to break out of the loop,
so we need to initialize this.
Change-Id: I5fdb24dcfece01332c487364d5694c4fb8412e1b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Fixed a double free in spdk_rpc_add_portal_group().
spdk_iscsi_portal_create() now takes string arguments as const char *
and makes internal copies of them.
This patch also fixes a potential memory leak when id == NULL.
Change-Id: I4d0efb101471fb2368ceb8ceecb0e40614e3585d
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Or rather, at least assert that the allocation failed.
This is not a recoverable error in general.
Change-Id: I9bc325066e829fc311ce84ce83536e9933ac5473
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Make sure that we have space for the terminating '\0' character.
Change-Id: Iaebdad3b4278ee322bd78247acc7f0997c3f4b44
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
According to the analysis, the largest name size is
24, not including '\0' (NVMF_RDMA_WRITE_COMPLETE),
so change the size of name. Also add a check
to avoid the string exceeding our defined name size.
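A minimal sketch of such a length check, assuming a 25-byte buffer (24 characters plus the terminator). EXAMPLE_NAME_SIZE and example_set_name() are hypothetical names, not the identifiers used in the patch.

```c
#include <stdio.h>

#define EXAMPLE_NAME_SIZE 25	/* 24 characters plus '\0' (assumed) */

/*
 * Copy src into dst and report truncation instead of silently
 * cutting the name short. Returns 0 on success, -1 if src does
 * not fit together with its terminating '\0'.
 */
static int
example_set_name(char *dst, size_t dst_size, const char *src)
{
	int rc;

	rc = snprintf(dst, dst_size, "%s", src);
	if (rc < 0 || (size_t)rc >= dst_size) {
		return -1;
	}

	return 0;
}
```

snprintf() returns the length the full string would have needed, so comparing it against the buffer size detects names that were too long.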
Change-Id: Iddf2cb52a3f5358306a59fc66bb997fa8098cde0
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This avoids a corner case where a buffer gets allocated on the 100th
try.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If65053d539d458d9a53c8850bbb4cbe4ee84f604
This patch fixes a potential core dump when
an NVMe device is hot-inserted.
Change-Id: Idac255f25f42b4746c2d3ae6dfc57a19b7001160
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This is the initial commit for "blobfs", a lightweight
filesystem built on top of the SPDK blobstore.
Also included in this patch:
1) a shim for using SPDK bdevs as the backing store for
SPDK blobstore/blobfs
2) documentation for using blobfs as the storage engine
with RocksDB
3) scripts for running a set of workloads and collecting
profiling data with RocksDB and blobfs
See doc/blobfs/getting_started.md included in this commit
for more details on blobfs, including some of the current
limitations.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2a6d3d4b87236730051228ed62c0c04e04c42c73
Avoid division by zero in the event mempool cache size calculation.
Change-Id: Ic117ef2dc3a798fb0a57572f1178233e83e73849
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It was causing segfaults and infinite looping.
Change-Id: I4c19b5d3af1ba1360250cd5f6aa573a27003409f
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
This enables the vhost library to build on systems missing the (fairly
recent) linux/virtio_scsi.h header.
Change-Id: I680863b26961ec3cbe4ad4e575555454f6461bbf
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If we do not do a bounds check, this can run off the end
of an array.
Change-Id: I43cc4848fca7d68218e507db20e33823f8b550e4
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Attempting to add a listen address for an unavailable transport will
fail with a better error message.
Change-Id: If4cf5b66c16dadcb6e0f0b28cea4aa510ba6a9fc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rather than failing silently, let the user know why the listen address
failed.
Change-Id: I41c2a51c6071ee739b282a1a39198a2887a73c4d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The message about the uevent socket is not a fatal error; it just means
that hotplug monitoring will not work.
Change-Id: I29f6a253e96a86420c0fde9e19135f9f1d229bb9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is the initial commit for the "blobstore", a lightweight,
highly parallel, persistent, power-fail safe block allocator.
Documentation will be added in future patches.
Change-Id: I20a4daf899f1215d396f7931c3ec9a2e2bb269d0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The user now must choose the name for each AIO bdev. This
provides consistency for names across restarts.
Change-Id: I13ced1d02bb28c51d314512d60f739499b0c7d8d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Simplify code that previously needed to check for subsystem type by
factoring out the discovery controller operations into a new ops
instance.
Change-Id: Id87b498e4623451993fe779ffb765be5a6743fd9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
No functional change, just rearranging code.
Change-Id: I28328dfefd7de269d326834c484f2c2fca4e6c1f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
When data needs to be transferred from the controller
to the host, do a single ibv_post_send containing
both the data and the completion.
Change-Id: I072c545b31593e0e324c97ed700b42c6a4c358e1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This call had been reduced to a simple wrapper
around the ibv call. Delete it.
Change-Id: I42926d123db262617119a9cff77bc0d0eb1e8f31
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
These functions were only called from one place and
their functionality has been reduced to a wrapper
around the underlying ibv call. Remove them.
Change-Id: I65182012dbe6393b9d57f4191fd327bcd025a6c8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This keeps all SGL handling in the prep_data function.
Change-Id: I9bfeed3748c1b329288350b85aa87bd604cfce4e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Now that all of the SGL mappings are static,
this function just called ibv_post_recv. Delete
the function and call ibv_post_recv directly.
Change-Id: I45216170a157709249b08c4cb0ebdb1adb906049
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This patch makes create_vhost_scsi_controller check whether the given
file is a socket before deleting it.
Change-Id: I7a37c12913b461f779732e724c85e2f7b5d67442
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
For an NVMe read, send the completion immediately
following the RDMA WRITE, without waiting for
the acknowledgement. RDMA is strictly ordered,
so the WRITE will arrive before the completion.
Change-Id: I7e4e01d7a02c2130b655ef90f5fdaec992d9361a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Except for a CONNECT capsule, always use the central data
pool for RDMA READ/WRITE operations. The in-capsule
data buffer is associated with the receive operation
while the pool data buffers are associated with the
completion, and using the in-capsule data buffer
causes a lifetime mismatch.
Change-Id: Ieb45e521d78daa7c706078a3dd5c5a146f8dc1d6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
After commit b654e9b, this is no longer required.
Change-Id: I0cf1a7059d7fba0303aca5ad5a15afe3890b4172
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The RDMA protocol this module uses is strictly ordered,
which means messages are delivered in exactly the order
they are sent. However, we have detected a number of
cases where the acknowledgements for those messages
arrive out of order. This patch attempts to handle
that case.
Separate the data required to post a recv from the
data required to send a response. If a recv arrives
when no response object is available, queue the
recv.
Change-Id: I2d6f2f8636b820d0c746505e5a5e3d3442ce5ba4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Names for the NVMe bdevs are now assigned by the user.
This means the same name will always be assigned to the
same device, even across restarts.
Change-Id: If9825ec9abcb5236b4671bc44a825e4f0d704fe3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Remove the unnecessary rte_eal_pci_probe_one() call in
spdk_pci_device_detach(). It could cause an error message when the
application terminates, and it does not make sense to probe a device
right after detaching it. When a specific device address is given, we
can call spdk_pci_nvme_device_attach() instead of
spdk_pci_nvme_enumerate(); DPDK will then scan the device and add it
back to the PCI device list.
Change-Id: I35f5bb412249bb20da57394f0531c10a49691906
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This clarifies the relation between the values assigned to sg_list and
num_sge (no functional change).
Change-Id: I8e81d47dd97a033b17cd3b813b06e4887127146c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All devices must be specified by BDF. Add support for scripts
to use lspci to grab the available NVMe device BDFs for the
current machine.
Change-Id: I4a53b335e3d516629f050ae1b2ab7aff8dd7f568
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The mappings are all static, so it isn't interesting
to print them out on each I/O.
Change-Id: I85301b4518d4523a7c031f6ca9ff678d91428504
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This allows pipelining of READ/WRITE with completion.
Change-Id: Ib3ab5bffb8e3e5de8cbae7a3b2fff7d9f6646d2d
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This allows static initialization of the scatter
gather list as well as future optimizations
around pipelining commands with data.
Change-Id: I8af8f3e3425610bc720677c9bc84f163cfb6278a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The first version of the Linux kernel NVMe-oF initiator had
a bug when reporting queue size where it was off by 1. We
had a workaround to deal with this. Now that the kernel
has been fixed, remove the workaround.
Change-Id: I0ad4a5c6db68cfa9683ab93e6f5210772c713b55
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Move the claimed flag to struct spdk_scsi_lun and remove the RPC call
that allows a SCSI LUN to be deleted by the user.
Change-Id: I0fe57d33ab017816ab4799bce259807735e0c783
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Register all spdk_malloc() memory regions as ibv_mr in a spdk_mem_map
so we can look up the RDMA key for the user's buffer and pass it in the SGL
directly, rather than copying through a pre-registered bounce buffer.
Change-Id: I7340bc2020b5256750c95dbd24ba67961404e5e7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The extended LBA format flag should be initialized after namespace
capability flag.
Change-Id: Iad479b454bb4e31120c17d40ae23937a099c6f8f
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Change the SCSI device configuration format from "DevX LUN0" to
"Dev X LUN0". This allows checking the configuration for silly errors,
such as a device number that is out of range.
Also assert that exactly one LUN is given.
Change-Id: Idccd6878119282fc51947b092bdda7ae06aa94ad
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
The send completions must be processed prior to the
recv completions. However, if the completion queues
are separate, this leaves a small window where
a send and a recv completion arrive between polling
the send_cq and the recv_cq, resulting in the code
seeing the recv completion prior to the send
completion.
Combining the completion queues eliminates
any potential gap. The send completion will always
be processed before the recv completion.
Change-Id: I06bfef6af48559d0b9e00524ebc10f1a102e7387
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The sq_head handling is already done in
spdk_nvmf_rdma_request_send_completion, so it does not need to
be done again.
Change-Id: I527ff8adfcbdf43ac79794cb5c7777c0e8ef6973
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The env layer already understands that shm_id < 0 means that
multi-process is not enabled. Leave shm_id defaulted to -1 so that
other code can detect when it is not set.
Change-Id: Ifd1667598d55c216f95f13561dc2a550677db5f4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These options are only necessary for applications that intend to be used
in a multi-process configuration.
Change-Id: I3e1fa0682611d92267d0ad1b3f2016dc926b96b6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, if the maximum number of virtual namespaces had already been
reached, adding a bdev to a subsystem would claim it without actually
adding it to the ns_list array.
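The ordering issue can be illustrated with a small model. The types, the limit, and the function name below are hypothetical; the point is only that the capacity check must come before the bdev is claimed.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_MAX_NS 4	/* hypothetical namespace limit */

struct example_bdev {
	bool claimed;
};

struct example_subsystem {
	struct example_bdev *ns_list[EXAMPLE_MAX_NS];
	size_t num_ns;
};

/* Returns 0 on success, -1 if the list is full or the bdev is taken. */
static int
example_subsystem_add_ns(struct example_subsystem *ss, struct example_bdev *bdev)
{
	/* Check capacity first so a full list never leaves a stray claim. */
	if (ss->num_ns >= EXAMPLE_MAX_NS) {
		return -1;
	}

	if (bdev->claimed) {
		return -1;
	}

	bdev->claimed = true;
	ss->ns_list[ss->num_ns++] = bdev;
	return 0;
}
```

With the check first, a failed add leaves the bdev unclaimed and still available to other users.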
Change-Id: Iab68ad1a75748c0e88232240185695aac08d71d2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
They are not used outside of their respective files.
Change-Id: I754834e7354caec877cd2fe193e56854e5a34e20
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch fixes an issue where a large I/O failed:
when handling a read command that needs to be split, make
sure all of the subtasks are handled even if one of them fails,
so that the command has a chance to be returned to the initiator.
Change-Id: I0c01e1a34c6179fce37ab52c8121268b6ee31102
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
The actual uses of intrinsics are already guarded by feature-specific
ifdefs in nvme_pcie_copy_command(), but the header itself should also
only be included when it will actually be needed.
Change-Id: Ife65d6432b8dfd9d9db80fe4e385ab76491874c0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
SPDK_COUNTOF works like sizeof, except it returns the number of elements
in an array instead of the number of bytes.
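A sketch of the usual way such a macro is written. EXAMPLE_COUNTOF is a stand-in name for illustration; the actual definition of SPDK_COUNTOF is in the patch itself.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define EXAMPLE_COUNTOF(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int example_values[] = { 10, 20, 30, 40 };

/* Returns the element count of example_values. */
static size_t
example_count(void)
{
	return EXAMPLE_COUNTOF(example_values);
}
```

Like sizeof, this only works on arrays whose size is known at the point of use. Applied to a pointer, it silently yields the wrong result.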
Change-Id: I38ff4dd3485ed9b630cc5660ff84851d0031911f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>