This function adds the ability to check whether any operations are
scheduled on a particular thread.
The return value of spdk_thread_poll() will be used for load balancing and
to signal whether any work was performed during a single iteration.
A poller can return 0 and still be registered.
This helps especially in fio_plugin, which only checked active_pollers or
messages via spdk_thread_poll().
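A minimal sketch of the distinction described above, using a hypothetical
toy_thread model rather than the real SPDK thread structures. The names
toy_thread, toy_thread_poll and toy_thread_is_idle are assumptions made for
illustration only. It shows that a poll iteration can report no work while
a poller is still registered, so "no work done" and "nothing scheduled"
are separate questions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread; not the SPDK structure. */
struct toy_thread {
    int (*pollers[4])(void *ctx); /* registered pollers */
    size_t num_pollers;
    size_t pending_msgs;          /* queued messages */
};

/* One poll iteration: returns the amount of work done (0 means no work this round). */
static int toy_thread_poll(struct toy_thread *t)
{
    int work = 0;

    if (t->pending_msgs > 0) {
        work += (int)t->pending_msgs;
        t->pending_msgs = 0;
    }
    for (size_t i = 0; i < t->num_pollers; i++) {
        work += t->pollers[i](NULL);
    }
    return work;
}

/* True only when nothing at all is scheduled on the thread. */
static bool toy_thread_is_idle(const struct toy_thread *t)
{
    return t->num_pollers == 0 && t->pending_msgs == 0;
}

/* A poller that is registered but found nothing to do. */
static int quiet_poller(void *ctx)
{
    (void)ctx;
    return 0;
}
```

A caller that only looks at the poll return value would treat this thread
as finished, while the idle check still reports scheduled work.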
Change-Id: Id6237278eb3b4bd4922b2abaa3c8ebd5e434d45d
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445915
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This function adds the ability to check whether any pollers are registered
on a particular thread.
Change-Id: I80af06a10c5c1b54fed5bb28a3aa769a52d8a206
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446624
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
The nsdata assignment is strangely aligned with some
variable declarations - fix it to make it clearer.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I43b1a6d5a69ca035a21f3996e8f859a45bd10b9c
Reviewed-on: https://review.gerrithub.io/c/446447
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
* Remove unneeded include files. Some of them belong in the .c file instead.
* Use create/delete_aio_bdev naming, removing aio_disk names
* Make some similar changes in the bdev_aio.c file for the associated ctx.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie325f4761f0419e9cc4e6556ab551fe606cd0d6c
Reviewed-on: https://review.gerrithub.io/c/446567
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This RPC was deprecated a couple of releases ago.
bdev modules now each have their own RPC for deleting
bdevs. Due to how bdevs are created differently on
different modules, it is simply not possible to
have one delete_bdev RPC that would work for all bdev
types.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia46c95dce6e35f7557e8d41c41b8fea382924547
Reviewed-on: https://review.gerrithub.io/c/442615
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There is a conflict in handling NVMf subsystem shutdown.
If a shutdown request arrives (e.g., Ctrl+C) while subsystems are
still being initialized, subsystem finalization and subsystem
initialization can run at the same time (e.g., NVMf subsystem fini and
initialization together), which leads to a coredump
like the one in issue #682.
If subsystem initialization is interrupted,
the following should happen:
1 Do not initialize the next subsystem.
2 Release the resources of each subsystem via the
spdk_subsystem_fini related functions.
This patch implements this general behavior, but does not address the
detailed interrupt policy of each subsystem.
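The two steps above can be sketched as follows. This is a toy model, not
the SPDK subsystem code: toy_state, toy_init_all and toy_fini_all are
hypothetical names, and finalizing in reverse order is an assumption of
the sketch rather than something stated in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_SUBSYSTEMS 3

struct toy_state {
    bool shutdown_requested;
    bool initialized[TOY_NUM_SUBSYSTEMS];
};

/*
 * Initialize subsystems in order. interrupt_before simulates a shutdown
 * request arriving just before that subsystem would be initialized.
 * Returns the number of subsystems that were initialized.
 */
static size_t toy_init_all(struct toy_state *s, size_t interrupt_before)
{
    size_t i;

    for (i = 0; i < TOY_NUM_SUBSYSTEMS; i++) {
        if (i == interrupt_before) {
            s->shutdown_requested = true;
        }
        if (s->shutdown_requested) {
            break; /* step 1: do not initialize the next subsystem */
        }
        s->initialized[i] = true;
    }
    return i;
}

/* Step 2: release only what was initialized (reverse order is assumed here). */
static size_t toy_fini_all(struct toy_state *s)
{
    size_t released = 0;

    for (size_t i = TOY_NUM_SUBSYSTEMS; i-- > 0;) {
        if (s->initialized[i]) {
            s->initialized[i] = false;
            released++;
        }
    }
    return released;
}
```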
Change-Id: I2438b4a2462acb05d8c8e06dfff3da3d388d4b70
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446189
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Other io_types, such as FLUSH, resemble UNMAP: they carry a
range description (offset and length) but no data payload.
So the processing for the UNMAP io_type
can be extended to io_types like FLUSH.
Change-Id: I9467dfc3cc4fc1431b79359b0c477807ec138ac7
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446491
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Piotr Pelpliński <piotr.pelplinski@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
In the error path, we were first decrementing a variable and then
asserting that it must be >0. These operations should occur in the
opposite order.
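A minimal sketch of the corrected ordering, using a hypothetical
toy_counter rather than the actual variable touched by this patch. With
the old order, a valid count of 1 would be decremented to 0 first and
then fail the ">0" assertion.

```c
#include <assert.h>

/* Hypothetical counter of outstanding items. */
struct toy_counter {
    int outstanding;
};

static void toy_release(struct toy_counter *c)
{
    /* Check the precondition first, then decrement. */
    assert(c->outstanding > 0);
    c->outstanding--;
}
```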
Change-Id: I6cec544faf17bb75cbfca3d3a3c173dc5db14f99
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446440
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: yidong0635 <dongx.yi@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When the decision was made to decouple the number of shared buffers from
the queue depth and let users choose it themselves, the default
was also significantly lowered, which caused some issues when trying
to run performance tests (see https://github.com/spdk/spdk/issues/699).
While this is a user-modifiable variable, it is still best to keep the
higher default value.
The original value was equivalent to max_queue_depth *
SPDK_NVMF_MAX_SGL_ENTRIES * 2, with the defaults for max_queue_depth and
max_sgl_entries being 128 and 16 respectively, hence 4096.
fixes: 0b20f2e552
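The arithmetic behind the 4096 default, written out as a small sketch.
The TOY_ constants are illustrative stand-ins for the defaults quoted
above, not the actual SPDK definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults quoted in this message (stand-in names). */
enum {
    TOY_MAX_QUEUE_DEPTH = 128,
    TOY_MAX_SGL_ENTRIES = 16
};

/* max_queue_depth * max_sgl_entries * 2 */
static uint32_t toy_default_shared_buffers(void)
{
    return TOY_MAX_QUEUE_DEPTH * TOY_MAX_SGL_ENTRIES * 2;
}
```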
Change-Id: I809e97a10973093a2b485b85bca7160091166f70
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446525
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The default 'unmap' option stays as it was.
'write_zeroes' is useful when one wants to make sure
that lvol bdevs read back as zeroes right after creation.
'none' will be used for performance tests,
where the whole device is preconditioned before creating the lvol store,
instead of preconditioning each lvol bdev after its creation.
Change-Id: Ic5a5985e42a84f038a882bbe6f881624ae96242c
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442881
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
base_bdev_io_expected can be used when an IO requires
multiple base bdevs and the exact number
varies.
Change-Id: I912400f839c02c95606bc94e7c8ad4946e90b6bf
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446009
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This feature was added to DPDK by Jim to avoid the failures that can
come from splitting a buffer over memory regions in RDMA.
Change-Id: I13b646e22a4e2a4ccf915b0274061d31d02c03f7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446166
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The core info is already checked in the _spdk_subsystem_fini_next
function.
Change-Id: I6ab28d8fb11a7a07ae8c14c27357db236bf51b3e
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446190
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: qun wan <qun.wan@intel.com>
If success is false in each bdev module's spdk_bdev_io_get_buf_cb,
call spdk_bdev_io_complete with SPDK_BDEV_IO_STATUS_FAILED, and
then return.
Change-Id: I6f106d8d39a3616f7305201fa2efc4805d4d00ee
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/446046
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Break out the failure handling code to a separate
function.
Change-Id: Ic530bb4d33c19edb62360e06afe3946b963445b1
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446008
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
When the buffer size passed to spdk_bdev_io_get_buf() is greater
than the permitted maximum, spdk_bdev_io_get_buf() simply asserts and
doesn't call the specified callback function.
The SPDK SCSI library doesn't allocate a read buffer; it specifies the
expected read buffer size and expects the buffer to be allocated by
spdk_bdev_io_get_buf().
The bdev perf tool likewise doesn't allocate a read buffer; it specifies
the expected read buffer size and expects the buffer to be allocated by
spdk_bdev_io_get_buf().
When we support DIF insert and strip in the iSCSI target, the read
buffer size the iSCSI initiator requests and the read buffer size the
iSCSI target requests will differ.
Even then, the iSCSI initiator and iSCSI target will negotiate correctly
so as not to cause a buffer overflow in spdk_bdev_io_get_buf(), but if the
iSCSI initiator ignores the result of the negotiation, it can request
a read buffer size larger than the permitted maximum and cause a
failure in the iSCSI target. This is very fragile and should be avoided.
This patch does the following:
- Add the completion status of spdk_bdev_io_get_buf() to
spdk_bdev_io_get_buf_cb().
- On failure, spdk_bdev_io_get_buf() calls spdk_bdev_io_get_buf_cb() with
success set to false, and returns.
- spdk_bdev_io_get_buf_cb() in each bdev module asserts if success
is false.
Subsequent patches will handle the case where success is false
in spdk_bdev_io_get_buf_cb().
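A toy sketch of the callback contract described above. None of these
names are the SPDK API: toy_get_buf, toy_get_buf_cb, TOY_BUF_MAX_SIZE and
the toy_io structure are assumptions for illustration. The module
callback here already takes the follow-up step of failing the I/O when
success is false, rather than asserting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUF_MAX_SIZE 4096u /* hypothetical permitted maximum */

enum toy_status { TOY_PENDING, TOY_SUCCESS, TOY_FAILED };

struct toy_io {
    enum toy_status status;
    size_t len;
};

typedef void (*toy_get_buf_cb)(struct toy_io *io, bool success);

/* Report an oversized request through the callback instead of asserting. */
static void toy_get_buf(struct toy_io *io, toy_get_buf_cb cb, size_t len)
{
    if (len > TOY_BUF_MAX_SIZE) {
        cb(io, false);
        return;
    }
    io->len = len;
    cb(io, true);
}

/* Module-side callback: complete the I/O as failed when no buffer was provided. */
static void toy_module_cb(struct toy_io *io, bool success)
{
    if (!success) {
        io->status = TOY_FAILED;
        return;
    }
    io->status = TOY_SUCCESS;
}
```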
Change-Id: I76429a86e18a69aa085a353ac94743296d270b82
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/446045
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
VMWare Workstation NVMe emulation does not seem to write the
SHST_COMPLETE bit within 10 seconds, resulting in an ERRLOG
during detach/shutdown. So add a quirk to cover these VMWare
SSDs. But rather than squashing the ERRLOG completely for
these SSDs, just add a message instead indicating this is
somewhat expected on these VMWare emulated SSDs.
Fixes issue #676.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3dfcb631feda639926fd712f1f41abb66cbf2096
Reviewed-on: https://review.gerrithub.io/c/445942
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Adapted our custom rte_vhost APIs to the upstream DPDK
version which has independently added similar APIs.
This will potentially allow us to remove our internal
rte_vhost copy.
rte_vhost_set_vhost_vring_last_idx() was renamed to
rte_vhost_set_vring_base() and the last vring indices
have to be acquired with a newly introduced rte_vhost_get_vring_base()
rather than rte_vhost_get_vhost_vring().
This is only a refactor, no functionality is changed.
Change-Id: I1ca2c1216635c117832c9d9c784d5661145c04cd
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/446081
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
The elements and functions used for raid reset IO
can also be used for other raid IO requests that
involve multiple base bdevs.
Change-Id: Ide7ea190fdbd29da9f9fa22862a0a7c162509697
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441308
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Make modification of the global allocator index thread safe
by using an atomic operation.
This patch also changes the mempool size to be 2^n - 1,
which makes it more efficient.
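A minimal sketch of both ideas using C11 atomics. The names
toy_next_index, toy_alloc_index and toy_is_pow2_minus_one are
hypothetical; the actual patch may use different primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global allocator index shared between threads. */
static atomic_uint toy_next_index;

/* Atomically take the current index and advance it by one. */
static unsigned toy_alloc_index(void)
{
    return atomic_fetch_add(&toy_next_index, 1u);
}

/* True when n has the form 2^k - 1 (for example 1023 or 4095). */
static bool toy_is_pow2_minus_one(uint32_t n)
{
    return n != 0 && (n & (n + 1)) == 0;
}
```

Because the fetch and the increment happen as one atomic step, two
threads calling toy_alloc_index() concurrently cannot receive the same
index.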
Change-Id: I5b7426f2feef31471d3a4e6c6d2c7f7474200d68
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442695
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Removal of a band from the "free list" is moved from the FTL_BAND_STATE_OPENING
to the FTL_BAND_STATE_PREP state change actions.
This fixes a race condition that occurs when one band is being prepared
(erased) and the write pointer is trying to get the next active band.
Change-Id: I9e4fe9482a01ee732271736e4a0e6fcedf2582d8
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445118
Reviewed-by: Jakub Radtke <jakub.radtke@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
ENOMEM is expected when an nvme_qpair runs out of resources.
In that case ENOMEM shall be propagated so that the upper (bdev)
layer can handle it properly.
Change-Id: Ie647c2d3efff24a8de949a22ac42a31dfd0e78b7
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445580
Reviewed-by: Jakub Radtke <jakub.radtke@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When an operation fails, we shouldn't pass a handle or
a 'valid' blob ID to the caller's completion function.
The caller *should* ignore it when bserrno != 0, but
it's best to not take that chance.
Fixes #685.
Note: #685 seems to have a broader issue related to
a possibly locked NVMe SSD in the submitter's system.
This only fixes the assert() that was hit.
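A toy sketch of the idea: on failure, the completion callback receives an
explicitly invalid ID instead of whatever value was in flight.
TOY_ID_INVALID, toy_create_cb and toy_create_complete are hypothetical
names and do not reproduce the blobstore API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ID_INVALID UINT64_MAX /* hypothetical "no such ID" marker */

typedef void (*toy_create_cb)(void *ctx, uint64_t id, int err);

/* Never hand a plausible-looking ID to the caller when the operation failed. */
static void toy_create_complete(toy_create_cb cb, void *ctx, uint64_t id, int err)
{
    if (err != 0) {
        id = TOY_ID_INVALID;
    }
    cb(ctx, id, err);
}

/* Example completion callback that records what it was given. */
struct toy_result {
    uint64_t id;
    int err;
};

static void toy_record(void *ctx, uint64_t id, int err)
{
    struct toy_result *r = ctx;

    r->id = id;
    r->err = err;
}
```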
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3fb3368ccfe0580f0c505285d4b1e9aca797b6a6
Reviewed-on: https://review.gerrithub.io/c/445941
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
In some cases a virtual bdev opens and closes
the device, and QoS is disabled at the last close.
When a new bdev open operation then arrives,
QoS needs to be enabled again.
Change-Id: I792e610f4592bad1cac55c6c55261d4946c6b3e2
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442953
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
The SPDK ring size used for the write buffer submission queue
must be increased if the required number of batches is a
power of two.
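A sketch of one way to size such a ring, under two assumptions that are
not stated in this message: the ring size must be a power of two, and a
ring of size N can hold at most N - 1 entries. The helper
toy_ring_size_for is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest power of two strictly greater than entries, so that a ring
 * holding size - 1 items still fits every entry. When entries is itself
 * a power of two, the result is the next one up.
 */
static uint32_t toy_ring_size_for(uint32_t entries)
{
    uint32_t size = 1;

    assert(entries < (UINT32_C(1) << 31)); /* keep the shift from overflowing */
    while (size <= entries) {
        size <<= 1;
    }
    return size;
}
```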
Change-Id: I9b9f885064cf6f0f5fe94b0ed4f9d49a4e5c0cd0
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445721
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
For real PCIe drives, if we remove one drive, the existing hotplug
monitor triggers the remove callback twice, because there is a
workaround for vfio-attached device hot remove detection that
also triggers the hot removal callback. For now we add
a check in the bdev_nvme layer so that a coredump does not happen.
Fixes issue #606.
Change-Id: I0605fbdf391fed20c4aa9a2d54b4f059f29dc483
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445642
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
It seems like DPDK 19.02 has split the "session mempool"
into two separate mempools but this isn't really described
in the DPDK release notes, so this patch only makes our
crypto code behave just like DPDK crypto examples.
rte_cryptodev_queue_pair_setup() no longer accepts
a separate mempool parameter but instead requires it
to be passed through a new field in struct
rte_cryptodev_qp_conf, which is also passed as a param
to rte_cryptodev_queue_pair_setup(). It's referred to as
"session private mempool" instead of "session mempool",
which makes some sense since we already use
rte_cryptodev_sym_get_private_session_size() (with the
word "private" in name) to calculate its size.
The other mempool - "session mempool" - now has to be
allocated with rte_cryptodev_sym_session_pool_create()
instead of regular rte_mempool_create().
Change-Id: I3bc6185855988b864ca59bc1972beaf4f7ea8925
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443738
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
I think this simplifies the process a little bit.
Change-Id: Icc87a59c9f6fd965ef35531975b7036d85c4bc95
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445916
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We were only using one value from this array to tell us if the qpair was
idle or not. Remove this array and all of the functions that are no
longer needed after it is removed.
This series is aimed at reverting
fdec444aa8 which has been tied to
performance decreases on master.
Change-Id: Ia3627c1abd15baee8b16d07e436923d222e17ffe
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445336
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Since we no longer rely on the state queues for draining qpairs, we can
get rid of most of them. We can keep just a few, and since we don't ever
remove arbitrary elements, we can use stailqs to perform those
operations. Operations on stailqs carry about half the overhead of
operations on tailqs.
Change-Id: I8f184e6269db853619a3581d387d97a795034798
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445332
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Users should not access the internal probe context fields when
using the asynchronous probe API, so change spdk_nvme_probe_async()
so that it only returns the probe context pointer.
Change-Id: I0413c2d8db6cbe4539ad80919ed34dd621a9df70
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445870
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Allow the user to add a seed value for guard computation to the DIF context.
This avoids the guard being zero when the data is all zeroes.
NVMe controllers don't explicitly support a seed value for guard
computation, and hence if we want to use such a seed value with an
NVMe controller, we have to format it with more than 8 bytes of metadata
and add the seed value into the reserved metadata field.
But some popular iSCSI/FC HBAs and SAS controllers support a
seed value for guard computation, so supporting a seed value
in the SPDK DIF library is very helpful for some use cases.
Hence this patch makes it possible to specify a seed
value in the DIF library for those use cases.
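A sketch of why a seed helps, assuming the seed is used as the initial
value of the CRC-16 guard computation (T10-DIF polynomial 0x8BB7). The
bitwise routine toy_crc16_t10dif is written for illustration and is not
the SPDK implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x8BB7; seed is the initial CRC value. */
static uint16_t toy_crc16_t10dif(uint16_t seed, const uint8_t *buf, size_t len)
{
    uint16_t crc = seed;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

With a seed of 0, an all-zero block produces a guard of 0. With a
non-zero seed, the guard of the same block is non-zero.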
Change-Id: I7e9e87cb441bf263e64605c7820409fdc22dd977
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444334
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Older versions of QEMU (<= 2.11) expose the VGA BIOS
hole (0xA0000-0xBFFFF) by specifying two separate memory
regions - one before and one after the hole. This results
in the "size" not being a 2MB multiple. But the underlying
memory is still mmaped at a 2MB multiple - so that's what
we should be checking to ensure the memory is hugepage backed.
Fixes #673.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1644bb6d8a8fb1fd51a548ae7a17da061c18c669
Reviewed-on: https://review.gerrithub.io/c/445764
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
spdk_env_opts->env_context may now contain a DPDK-specific
string that will be appended directly into rte_eal_init().
It can be used to e.g. override the default EAL loglevel,
which was hardcoded to RTE_LOG_NOTICE so far.
This is primarily meant to be used during development.
As a test for this feature, the vtophys test app will now
set the highest possible EAL loglevel which will give us
a ton of additional debug logs.
Note: the opts->env_context field is implementation-specific
and hence the vtophys app needs to check if it's run with
our env_dpdk. As SPDK_CONFIG_ENV is a raw text not even
surrounded with quotation marks, the vtophys app needs to
do a bit of #define magic to make it a string.
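The "#define magic" is most likely the usual two-level stringification
idiom, sketched below. TOY_CONFIG_ENV and its value are hypothetical
stand-ins for an unquoted config value; the macro names are not taken
from the vtophys app.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical raw, unquoted config value, as a build system might emit it. */
#define TOY_CONFIG_ENV lib/env_dpdk

/*
 * Two levels are needed: the outer macro expands TOY_CONFIG_ENV first,
 * then the inner macro turns the expanded tokens into a string literal.
 */
#define TOY_STR_INNER(x) #x
#define TOY_STR(x) TOY_STR_INNER(x)

/* Non-zero when the configured environment looks like env_dpdk. */
static int toy_env_is_dpdk(void)
{
    return strstr(TOY_STR(TOY_CONFIG_ENV), "env_dpdk") != NULL;
}
```

Using TOY_STR_INNER directly on TOY_CONFIG_ENV would produce the string
"TOY_CONFIG_ENV" instead of the configured value.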
Change-Id: I0b2196770e5b59a6c33d0170337c34f9f8b8466e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445111
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When we were trying to push a newly allocated string
into the arg array and the array realloc() failed,
the string we were about to insert was leaked.
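A sketch of the leak and its fix on a hypothetical argument array
(toy_args and toy_push_arg are illustrative names). The push helper takes
ownership of the string, so it must free it when the array cannot grow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_args {
    char **argv;
    size_t argc;
};

/*
 * Append a heap-allocated string to the array. Ownership of arg passes to
 * this function: on realloc() failure the string is freed here instead of
 * being leaked. Returns 0 on success, -1 on failure.
 */
static int toy_push_arg(struct toy_args *a, char *arg)
{
    char **tmp;

    if (arg == NULL) {
        return -1;
    }
    tmp = realloc(a->argv, (a->argc + 1) * sizeof(*tmp));
    if (tmp == NULL) {
        free(arg); /* without this, the new string would leak */
        return -1;
    }
    a->argv = tmp;
    a->argv[a->argc++] = arg;
    return 0;
}
```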
Change-Id: I31ccd5a09956d5407b2938792ecc9b482b2419d1
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445149
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This patch exposes the backend bdev's PI setting to the corresponding
NVMe-oF initiator via the Identify command, and removes the check that
the block size is a multiple of 512.
These changes enable the NVMe-oF initiator to send extended LBA payloads.
Change-Id: Ia7aa8332d36f056872a515b6da90c83112edb909
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/445056
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Move req->submit_tick assignments from specific transports to generic
qpair code.
Check whether submit_tick has been assigned before doing the actual
assignment, because a request may be submitted several times and the
original submit_tick shouldn't be overwritten.
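A minimal sketch of the guarded assignment, assuming a submit_tick of 0
means "not yet assigned". toy_request and toy_submit are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

struct toy_request {
    uint64_t submit_tick; /* 0 is assumed to mean "not assigned yet" */
};

/* Record the tick of the first submission only; resubmissions keep it. */
static void toy_submit(struct toy_request *req, uint64_t now)
{
    if (req->submit_tick == 0) {
        req->submit_tick = now;
    }
}
```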
Change-Id: I2de8018dc21763eb5a19bb9d48dfbdef764b036e
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/444702
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In iSCSI, SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH was an alias
of SPDK_BDEV_LARGE_BUF_MAX_SIZE.
iSCSI had used both interchangeably.
SPDK_BDEV_LARGE_BUF_MAX_SIZE means the buffer size of the large
buffer pool in generic bdev layer, and will be changed to be
configurable.
SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH had been used to negotiate
MaxRecvDataSegmentLength with the iSCSI initiator and to split large
read data, but both are determined not by the iSCSI target but by the
generic bdev layer.
Hence this patch replaces SPDK_ISCSI_MAX_SEND_DATA_SEGMENT_LENGTH
by SPDK_BDEV_LARGE_BUF_MAX_SIZE.
Change-Id: I822a5203a5092fe8b2d1ca3f93423f1acbfc782e
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444539
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This macro constant is not related to data size and should be moved
to a separate location.
Change-Id: I73b337f5750c39d1f87591c2e372664019e50b95
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444545
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
If the current recv_state of the qpair is the same as the state to be set,
an error message is printed. After reviewing the current code,
we add a check to avoid this.
Change-Id: I49334f637c48e565e785d1fe6d0f000e18b2048a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445653
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add a memory barrier for arm64 to prevent possible reordering
of tracker and cpl access,
because arm64 has less strict memory ordering behavior than x86.
Change-Id: I0a8716f7bfeffb0bbce27ee3174e214c8e4566b4
Signed-off-by: heyang <heyang18@huawei.com>
Reviewed-on: https://review.gerrithub.io/c/442964
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If users didn't set the "HotplugPollRate" field, the value
will be set to NVME_HOTPLUG_POLL_PERIOD_MAX, which isn't
aligned with our design purpose.
Change-Id: I9795d7a16a1cc44ed4de7c40f376c563d977b455
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/445077
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Purpose: fix the coredump that occurs when the buffer is
returned later in spdk_nvmf_tcp_request_free_buffers.
If this statement is kept, we cannot return the buffer
to the polling group.
Change-Id: Ib5c95ba54b37540950e654110fe6317cab507076
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445435
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Purpose: make the timeout work for the NVMe TCP transport;
this was missing for the TCP transport.
Change-Id: Iab4af988cc4796b4d6d98430453f3dbce1fcf313
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445117
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch refactors driver init and in doing so eliminates the mem
leak described in the GitHub issue. Also it is now consistent with
how the pending compression driver does init.
Fixes #633
Change-Id: Ia2d55d9e98fb9470ff8f9b34aeb4ee9f3d0478f5
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442896
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We should not add an additional check, since the timeout_cb function
already has this option; the additional check is unnecessary.
Change-Id: I77c89303155e0c14072a1838994f9e76a0ffc0f4
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/445319
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch implements this function.
Since we need to call nvme_tcp_req_complete in this
function, we need to adjust the location of the
nvme_tcp_req_complete function.
Change-Id: I5fc3693aec8dc166ac1eb03babcd2d73d7b00e63
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/444489
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
In this patch series, spdk_bdev_scsi_read and spdk_bdev_scsi_write
became almost identical. Hence squash them into spdk_bdev_scsi_read_write.
Change-Id: Ibbaddf74c1bf2dac37a0133eac27086af650a061
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444780
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This is part of an effort to consolidate SCSI read and write I/O
for the upcoming transparent DIF support.
Previously, conversion between bytes and blocks was done in both the
SCSI layer and the BDEV layer. After the patch series, conversion is
consolidated into the SCSI layer.
Change-Id: Ib964a41ec22757f2a09cea22f398903f78d0781f
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444779
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This is part of an effort to consolidate SCSI read and write I/O
for the upcoming transparent DIF support.
Previously, conversion between bytes and blocks was done in both the
SCSI layer and the BDEV layer. After this patch series, the conversion
is consolidated into the SCSI layer.
For the conversion from bytes to blocks, we do not expose a bdev API
spdk_bdev_bytes_to_blocks. Instead we create a private helper function
_bytes_to_blocks, because it will use the data block size rather than
the block size once the transparent DIF feature is supported.
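A minimal sketch of such a private helper, for illustration only. The
name bytes_to_blocks_sketch and its signature are assumptions and are not
taken from the patch. It converts a byte offset and a byte length into
blocks and reports whether either value is misaligned:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the patch's code: convert a byte offset and a
 * byte length into block units. Returns 0 on success and nonzero if either
 * value is not a multiple of block_size.
 */
static uint64_t
bytes_to_blocks_sketch(uint32_t block_size, uint64_t offset_bytes, uint64_t *offset_blocks,
		       uint64_t num_bytes, uint64_t *num_blocks)
{
	assert(block_size != 0);

	*offset_blocks = offset_bytes / block_size;
	*num_blocks = num_bytes / block_size;

	/* Any remainder means the request is not block aligned. */
	return (offset_bytes % block_size) | (num_bytes % block_size);
}
```

Keeping the helper private lets the caller choose which size to pass as
block_size, which is what allows switching to the data block size later.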
Change-Id: I37169c673479c92e027e2507a0e54a1e414b43e1
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444778
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
The last parameter xfer_len of spdk_bdev_scsi_read is not used,
and in spdk_bdev_scsi_write it is used only to check task->transfer_len.
Hence remove the last parameter xfer_len from spdk_bdev_scsi_read/write,
extract the check from spdk_bdev_scsi_write, and insert
it into spdk_bdev_scsi_read_write.
Additionally, remove a debug log because xfer_len is no longer passed to
spdk_bdev_scsi_write. Hopefully, this will not degrade
maintainability.
Factoring out the byte-to-block conversion in
spdk_bdev_scsi_read/write will be done on top of this patch.
Change-Id: I35faca269a9c4a7f15d27e8e61b6a1b809a36b3f
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This helps ensure it gets inlined in the spdk_vtophys
code path, now that spdk_vtophys is defined in the same
compilation module.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I0d0d9bba4295f0d9a7c0657834aa5d39f3b682d8
Reviewed-on: https://review.gerrithub.io/c/445354
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
CPU profiling on workloads with intensive vtophys
operations (i.e. very small CB-DMA transfers) exposed
overhead introduced by spdk_vtophys having to call
spdk_mem_map_translate in a different compilation
unit. Let's just move the vtophys.c contents into
memory.c so that spdk_vtophys can inline
spdk_mem_map_translate and avoid this extra overhead.
This of course breaks the memory and vtophys unit
tests, so some additional changes are needed there
to keep everything linking correctly.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I295ed5f441d3eec7abdbc9d881c49d2174ec9f48
Reviewed-on: https://review.gerrithub.io/c/444975
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Previously, we could pass -p with a hex value (e.g., 0x1) to assign the
master core and start the NVMe-oF or iSCSI target app.
However, this is no longer supported and prints an error. I checked
the code: it only supports conversion from decimal format,
so change the base to 0 to support other formats as well.
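The effect of the base argument can be sketched with plain strtol. The
helper name parse_core_sketch is hypothetical and is not the function
touched by this patch. With base 10 the string "0x1" is rejected, while
base 0 lets strtol detect the 0x prefix:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a non-negative core number with the given
 * strtol base. Returns -1 if the string is not fully consumed, is out of
 * range, or is negative.
 */
static long
parse_core_sketch(const char *arg, int base)
{
	char *end = NULL;
	long val;

	assert(arg != NULL);

	errno = 0;
	val = strtol(arg, &end, base);
	if (errno != 0 || end == arg || *end != '\0' || val < 0) {
		return -1;
	}
	return val;
}
```

Note that base 0 also treats a leading 0 as octal, so "010" parses as 8.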
Change-Id: I82510ba0cef47b5593484b4fd3490f85c93cf6a5
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/444830
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Do not set the completion_update bit except on
the last descriptor built before the dmacount doorbell
is written. This allows much better batching of
completions (to match batching of the submissions).
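As an illustration only, the rule can be sketched on a simplified
descriptor array. The struct and function names are assumptions. Only the
completion_update field name comes from the text above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a hardware descriptor. */
struct desc_sketch {
	bool completion_update;
};

/*
 * Prepare a batch that will be published with one doorbell write: only
 * the last descriptor in the batch requests a completion update.
 */
static void
finalize_batch_sketch(struct desc_sketch *descs, size_t count)
{
	size_t i;

	assert(descs != NULL);

	for (i = 0; i < count; i++) {
		descs[i].completion_update = (i + 1 == count);
	}
}
```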
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idd0281fb2e9e1ad2eb0f65f097c54fc051dfd935
Reviewed-on: https://review.gerrithub.io/c/444974
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Add spdk_ioat_build_copy and spdk_ioat_build_fill
which mirror the existing spdk_ioat_submit_copy
and spdk_ioat_submit_fill. These new functions
*only* build the descriptors in the ring - they do
not write the doorbell. This enables batching
which can significantly improve performance by
reducing the number of MMIO writes.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia3539f936924b7f833f4a7b963d06ffefa68379f
Reviewed-on: https://review.gerrithub.io/c/444973
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This will enable batching of doorbell writes in
future commits. For now, just make the API public.
This is the first in a series of patches that
drastically improves performance for high queue
depth CB-DMA workloads. Some basic tests on
my Xeon E5-v3 platform show about a 4x improvement
for 512B transfers.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia8d28a63f5020ae8644c1efdec7f68740bb6920c
Reviewed-on: https://review.gerrithub.io/c/444972
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
To enable the timeout function.
Change-Id: Id5c40848957743683b6a5c2d085e7f777f14497d
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/444803
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Use NBD_SET_SOCK to check whether the nbd device has been set up
by another process and whether the nbd kernel module is ready
before any other nbd ioctl operations. This avoids
interfering with an nbd device set up by another process.
Change-Id: Ic12acbfddb8c4388e25731c39159b1ce559b8f23
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444805
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The ioctl NBD_SET_SOCK can return EBUSY not only when
the kernel module has not finished loading, but
also when the nbd device has been set up by another process, which
makes the poller poll forever.
With this patch, we wait only 1 second if the device is busy.
Change-Id: I8b1cfab725cba180f774a57ced3fa4ba81da2037
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444804
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
There is no need to lock g_ftl_bdev_lock when unregistering an ftl_bdev.
Besides, the destructor of ftl_bdev will lock it again.
Change-Id: I99870483183879d9422584dbac6e154f605daea8
Signed-off-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-on: https://review.gerrithub.io/c/444794
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add a check before write submission to detect whether the
LBA was updated in the meantime. In that case, don't set the band's
metadata or the rwb entry cache bit. The previous implementation
invalidated such an address during write completion, which could
cause an inconsistent LBA map to be stored to disk.
Change-Id: I4353d9f96c53132ca384aeca43caef8d11f07fa4
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444403
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We assumed io_channel allocation always succeeds, but
that's not true. Doing I/O to any vhost session that
failed to allocate an io_channel would most likely
cause a crash.
We'll now detect io_channel allocation failure and
print a proper error message. The SCSI target for
which the channel allocation failed simply won't be
visible to the vhost master. All I/O to that target
will be rejected.
We should probably report the error to the upper
layer and either prevent the device from starting
or fail the SCSI target hotplug request. But for now
let's just prevent the crash.
Change-Id: I735dfb930d8905f70636a236b4fa94288d0aaf3a
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444874
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
nvme_ctrlr_submit_admin_request() accesses the admin queue, so we
should hold ctrl->ctrlr_lock when accessing it.
Change-Id: Iff576fe5e14e854eb38dbc64d6c6d9ec1ba17056
Signed-off-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-on: https://review.gerrithub.io/c/444793
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Also use the same style condition check for secondary process
with PCIE type.
Change-Id: I93c83126145255887914ef5efea1a493c8f7f767
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444492
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The helper function spdk_get_data_out_buffer_size() is a little
confusing because it only returns the macro constant
SPDK_ISCSI_MAX_RECV_DATA_SEGMENT_LENGTH.
The macro constant will become configurable, so the helper function
is not sustainable.
Simply replace the helper function with the macro constant.
Change-Id: I4ec300f61783da7bb712512603c2dd80987ec702
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444537
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When the hotplug feature is enabled in the NVMe driver, users may
call the delete_nvme_controller() RPC to delete a controller.
However, the hotplug monitor will probe this controller
automatically and attach it back to the NVMe driver. Add
a skip list for user-deleted controllers so that the
NVMe driver will not attach them again.
Fix issue #602.
Change-Id: Ibbe21ff8a021f968305271acdae86207e6228e20
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444323
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The error logging in nvmf_rdma_dump_request can trigger an error report
about an address pointing to the zero page. Add a check and return
early in that case. This issue occurs during heavy-load fio testing.
Change-Id: I50302be88b3af53f718e3800aa16df7c506ca4e8
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441110
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Users can create a probe context to probe and attach controllers
asynchronously. The controllers are first added to the context's
list, and users can then poll the context until the list
becomes empty.
Change-Id: I3a96e2d8a9724332ff15542f78f9553fdab505e2
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442664
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The existing NVMe driver uses a global list, g_nvme_init_ctrlrs,
to track controllers during initialization, and an internal
function starts each controller in the list one by one
until the list is empty. Introduce a probe context
and move the global list into it. With the context
we can enable an asynchronous probe API in the next patch; this
also makes a parallel probe feature possible.
Change-Id: I538537abe8c1a4a82fb168ca8055de42caa6e4f9
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/426304
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Previously, spdk_nvme_probe_internal() probed
NVMe controllers and then brought the probed controllers
into the ready state. Split these two parts
into a probe stage and a start stage; this will help us introduce
a probe context in the next patch.
Change-Id: Ie0c55a6a5463fb437f84349b0b2b33a217ba63e0
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/426303
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
It iterates over the list and polls each one. However,
in practice the list still contains just one thread for
now.
Change-Id: I9bac7eb5ebf9b4edc6409caaf26747470b65e336
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440763
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This is the inverse of spdk_thread_get_ctx.
Change-Id: I81541ff1687cfea358cb7046caf69982c38f6a38
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444455
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Schedulers can use this region to store required information.
Change-Id: I93efb44f1a534596f6285bbe014579311fe011e7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444454
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This is much simpler and avoids the problems with requiring
it to run on a thread.
Change-Id: I811444c5a15d292356703beccc17e505d55d7678
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443645
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The thread scheduling mechanism is being rewritten and this
won't be used in the new system.
Change-Id: I829e8118ed0a10480bd86934b45e68fcb810931a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444453
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Although SPDK already provides an API for users to
handle NVMe commands that time out at runtime, it is
nice to have another API that the SPDK NVMe driver can
use to break an endless wait. Also use the API
first in the initialization process, because we don't
want to add another initialization state for log pages
that only Intel supports.
Change-Id: Ibe7cadbc59033a299a1fcf02a66e98fc4eca8100
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/444353
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
From TP8000 spec 7.4.7,
"In response to a C2HTermReq PDU, the host shall terminate the connection.
If the host does not terminate the connection in an implementation specific
period that does not exceed 30 seconds, the controller may terminate the
connection on its own".
This means the timeout is designed for the case where the target
sends a C2HTermReq: if the host does not terminate the connection,
the target should terminate it.
PS: Detecting a malicious connection that sends no response
(such as no response to an R2T PDU) should be handled in another patch.
Change-Id: I586dbb235d99aeab5d748a19b9128cd8b0cef183
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/440831
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Currently, SPDK does not support vfio in no-IOMMU mode. However, it
seems quite easy to extend the vtophys code to add support for this.
vfio in no-IOMMU mode does not support DMA remapping. This implies that
physical DMA addresses are used instead of IOVAs.
This patch checks whether the vfio no-IOMMU mode is enabled using
function rte_vfio_noiommu_is_enabled() from the DPDK RTE vfio interface.
In this case, physical addresses are used for the DMA mappings. This is
the same code path for the DMA translations as when the uio is used as a
kernel driver.
Change-Id: I6fb3c849a345c6f2f2b4141dddb8c17be2581495
Signed-off-by: Nikos Dragazis <ndragazis@arrikto.com>
Reviewed-on: https://review.gerrithub.io/c/441061
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Keep all of the thread library interactions in one file.
Change-Id: Iecb20d3767190b5da105a29670ead9e192d03257
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440761
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This prevents issues where spdk_thread_poll may report
that it did no useful work (for the one poller it ran),
causing the system thread to go to sleep.
Change-Id: I7a4842d5e399758c19268aee343a001ccfc88a3a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440598
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
The persistence feature cannot be supported for now, but since these
features are mandatory for reservations, add the two functions here.
We can enable them in future patches for the power loss persist feature.
Change-Id: Ic358eda00058809bbfd6984b0861f8b6b5aabecd
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/438213
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When this structure was brought up to the generic layer, the tcp
transport was using max_io_size and the rdma transport was using
io_unit_size. In the interest of conserving memory, we should use
io_unit_size instead of max_io_size.
Change-Id: I2633306fcbfd8c3d557445959c745cb2d9a0999e
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442778
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We should never be going over these limits in the respective transports,
but add asserts to check this during testing.
Change-Id: Ifcaa82ccf58546a38020b31df54ee5d1d9822b8b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442777
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
It is possible for spdk_nvmf_poll_group_add to fail. In this case we
need to tear down the qpair in the same way that we do in the new_qpair
function.
Change-Id: I17abdec2646d2b7f9ed07c9b9b3e74d3d0991903
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443472
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This intermediate state is unused and meaningless. The qpair transitions
into this state right before calling a synchronous operation and then
transitions to active as soon as that operation completes successfully.
If the operation did not complete successfully, we were leaving qpairs
in this weird intermediate state when for all intents and purposes they
had reverted to an uninitialized state. Keeping qpairs in the
uninitialized state until they have been added to a poll group creates a
meaningful distinction between states that can be actionable from the
transport level.
Change-Id: I6de9bc424b393b6fff221aa2f4212aaa91488629
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443471
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Connections in the uninitialized state haven't been added to a poll
group yet, so submitting dummy requests to them will be pointless since
they will never be polled. We need to reject the connection and destroy
the qpair immediately.
Change-Id: Id5dd711882e1ae7c13ae32c06da2285186b00a1b
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443470
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Since there are multiple events/conditions that can trigger a qpair
disconnection, we need to funnel them to a single point of entry. If
more than one of these events occurs, we can ignore all but the first
since once a disconnect starts, it can't be stopped.
Change-Id: I749c9087a25779fcd5e3fe6685583a610ad983d3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443305
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
For devices that support fewer SGE elements than our default values, we
need to adjust the I/O unit size so that we don't ever try to submit
more SGLs than we are allowed to.
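One way to express such an adjustment is sketched below. The formula is
an assumption chosen for illustration and the patch may compute the value
differently. The idea is that the largest I/O must fit in at most max_sge
buffers of io_unit_size bytes:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical calculation: return an I/O unit size such that a transfer
 * of max_io_size bytes needs at most max_sge buffers.
 */
static uint32_t
adjust_io_unit_size_sketch(uint32_t max_io_size, uint32_t io_unit_size, uint32_t max_sge)
{
	uint32_t needed;

	assert(io_unit_size != 0 && max_sge != 0);

	/* Buffers needed for the largest I/O at the current unit size. */
	needed = (max_io_size + io_unit_size - 1) / io_unit_size;
	if (needed <= max_sge) {
		return io_unit_size;
	}

	/* Round up so that max_sge buffers always cover max_io_size. */
	return (max_io_size + max_sge - 1) / max_sge;
}
```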
Change-Id: I316d88459380f28009cc8a3d9357e9c67b08e871
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This prevents us from overrunning the send queue.
Change-Id: I6afbd9e2ba0ff266eb8fee2ae0361ac89fad7f81
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443476
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This value was not being decremented when we got SEND completions for
write operations because we were using the recv send to indicate when we
had completed all writes associated with the request. I also erroneously
assumed that spdk_nvmf_rdma_request_parse_sgl would properly
reset this value to zero for all requests. However, for requests that
return SPDK_NVME_DATA_NONE from spdk_nvmf_rdma_request_get_xfer, this
function is skipped and the value is never reset. This can cause a
coherency issue on admin queues when we request multiple log files. When
the keep_alive request is resent, it can pick up an old rdma_req which
reports the wrong number of outstanding_wrs and it will permanently
increment the qpair's curr_send_depth.
This change decrements num_outstanding_data_wrs on writes, and also
resets that value when the request is freed to ensure that this problem
doesn't occur again.
Change-Id: I5866af97c946a0a58c30507499b43359fb6d0f64
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Pass IO flags to NVMe write IO and verify the PI error when one
occurs.
To find the location that caused the PI error, a checked read with
PRCHK disabled is necessary and is used in this patch.
Change-Id: Id90fb90c4b3ca95840785a4443ff98d637ceb247
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443189
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Currently struct bdev_io holds the io_channel that the I/O was submitted
on through bdev_io::bdev_channel, but bdev_io::bdev_channel is private
to bdev.c and cannot be referenced from other files.
Hence add a new API, spdk_bdev_io_get_io_channel, to get the io_channel
conveniently.
Change-Id: Ic2e2fde845d324f7a1637e3c75080727a62de5ec
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443843
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Pass IO flags to NVMe write IO and verify the PI error when one
is detected.
For write I/O, the PI error will already be contained in the write data
buffer, so no extra I/O is necessary.
Change-Id: I2f2359c4201aded7abccb182c39c00b25ff0bd5f
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443188
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Include bdev_module.h and use spdk_bdev_unregister_cb instead of
spdk_delete_passthru_complete, following the other bdev modules.
This patch doesn't change any behavior.
Change-Id: Ia236ea37ae22ed5c7740b02d1c5bd37491b9cf9a
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444166
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Using the passthru virtual bdev's name instead of the base bdev's name as
the io_device's name is more meaningful. This patch doesn't change any
behavior.
Change-Id: I33f7aa78c60cd1d9f6a7b36280441bc559f44857
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444165
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Subsequent patches will implement PI verification when PI error occurs,
but PI verification will be different between read and write.
Subsequent patches will set IO flags for normal read and write but
will not set IO flags for checked read.
Current nesting stack,
bdev_nvme_readv/writev
-> bdev_nvme_queue_cmd
-> spdk_nvme_ns_cmd_readv/writev
-> bdev_nvme_queued_done
makes these changes difficult.
Hence this patch inlines bdev_nvme_queue_cmd into bdev_nvme_readv/writev,
adds separate completion function bdev_nvme_readv/writev_done, and
removes enum direction.
This patch doesn't cause any functional change.
Change-Id: I2f97ff21245539c690490d0fc4134d2e0049eddd
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443187
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
PI check flags are not set automatically on NVMe controllers created by
the hotplug handler. Document this behavior for clarification.
Change-Id: I9590d0cb7f53a24c33afd706e222065893d23cb4
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444012
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Add "prchk:reftag|guard" to the 3rd item of the TransportID row
in [Nvme] section.
apptag is not supported yet, the same as for JSON RPC.
These two patches cannot control hot-added NVMe controllers, but
we should not set prchk options on hot-added NVMe controllers
automatically. Hence the next patch will document this behavior
explicitly.
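Parsing such an option string can be sketched as follows. The function
name and the bit values are hypothetical and are not the driver's
constants or helpers. The sketch accepts reftag and guard and rejects
everything else, including apptag:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bit values, not the driver's constants. */
#define PRCHK_REFTAG_SKETCH (1u << 0)
#define PRCHK_GUARD_SKETCH  (1u << 1)

/*
 * Parse a string such as "prchk:reftag|guard" into a bitmask.
 * Returns 0 on success and -1 on a malformed or unsupported option.
 */
static int
prchk_parse_sketch(const char *str, uint32_t *flags)
{
	static const char prefix[] = "prchk:";
	const char *p;
	size_t len;

	assert(str != NULL && flags != NULL);

	*flags = 0;
	if (strncmp(str, prefix, sizeof(prefix) - 1) != 0) {
		return -1;
	}
	p = str + sizeof(prefix) - 1;

	while (*p != '\0') {
		/* Length of the current token, up to '|' or end of string. */
		len = strcspn(p, "|");
		if (len == 6 && strncmp(p, "reftag", 6) == 0) {
			*flags |= PRCHK_REFTAG_SKETCH;
		} else if (len == 5 && strncmp(p, "guard", 5) == 0) {
			*flags |= PRCHK_GUARD_SKETCH;
		} else {
			return -1;
		}
		p += len;
		if (*p == '|') {
			p++;
		}
	}
	return 0;
}
```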
Change-Id: I74a73ac52779aa50c5b45e20ffb61002e95f33ef
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443835
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
The next patch will use the string "prchk:reftag|apptag" as
per-controller prchk options for .INI config file.
Hence add helper functions for them beforehand.
Change-Id: I58c225cc36cc84bf594f108e611028996b5eedb9
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443834
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Add prchk_reftag and prchk_guard to construct_nvme_bdev RPC.
In spdk_rpc_construct_nvme_bdev, create prchk_flags based on them
and pass it to spdk_bdev_nvme_create, and in spdk_bdev_nvme_create,
pass it to create_ctrlr.
A single option enable_prchk may be enough but add separate options
for reftag and guard to clarify that apptag is not supported yet.
The next patch will make per-controller PRCHK options configurable
by .INI config file.
Change-Id: I370ebbe984ee83d133b7f50bdc648ea746c8d42d
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443833
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Add prchk_flags to struct nvme_ctrlr, set it when the
corresponding controller is created, and copy it to each bdev of the
controller.
Change-Id: Ie971a0c1539b5419de9e5168ed47ac0e579be2c5
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443186
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The bdev layer doesn't support APIs that pass metadata separately from
(not interleaved with) logical block data. So, for now, return an error
explicitly when creating an NVMe bdev with separate metadata.
Change-Id: I0776e72232c8e7758ad11b405e7e4914e779d131
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/444011
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Metadata location and DIF type are set only if there is metadata, and
DIF location is set only if DIF is enabled.
Change-Id: Ib684b54332820446ff1a0b609f5b4e0b3d42f2f9
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443344
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch fixes the following issue:
https://github.com/spdk/spdk/issues/638
Reason: The implementation of nvme_tcp_pdu_set_data_buf is not
correct when SGLs are used: the translation is wrong for
in-capsule data. To avoid redoing the translation by calling
the SGL function again, we use a variable to store the buf.
Change-Id: I580d266d85a1a805b5f168271acac25e5fd60190
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/444066
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Currently, the SPDK_BDEV_REGISTER_MODULE() macro uses __LINE__
to generate functions like spdk_bdev_module_register_187().
Typically, this is not a problem because these functions are not called
directly; they are only used as constructor functions to load the bdevs
during system startup.
However, some languages (e.g. Rust) require these functions to be
referenced explicitly to prevent them from being removed during the
linking phase. Referencing them is easier when the names are predictable
rather than potentially changing with every commit.
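The difference between the two naming schemes can be sketched as
follows. This is a simplified illustration, not the actual SPDK macro:
the EX_* macros, struct ex_module, and ex_module_list_add are
hypothetical names, and the constructor attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified module descriptor. */
struct ex_module {
	const char *name;
};

static const struct ex_module *g_ex_modules[8];
static size_t g_ex_module_count;

static void
ex_module_list_add(const struct ex_module *m)
{
	assert(m != NULL);
	if (g_ex_module_count < 8) {
		g_ex_modules[g_ex_module_count++] = m;
	}
}

/* Line-based style: the constructor is named after the source line,
 * e.g. ex_register_187, so the name moves whenever the file changes.
 * Shown for contrast only; it is not used below. */
#define EX_CONCAT_(a, b) a##b
#define EX_CONCAT(a, b) EX_CONCAT_(a, b)
#define EX_REGISTER_BY_LINE(mod) \
	__attribute__((constructor)) static void \
	EX_CONCAT(ex_register_, __LINE__)(void) { ex_module_list_add(mod); }

/* Name-based style: the constructor name is derived from an identifier
 * chosen by the caller, so other code can reference it explicitly. */
#define EX_REGISTER_BY_NAME(name, mod) \
	__attribute__((constructor)) void \
	ex_module_register_##name(void) { ex_module_list_add(mod); }

static const struct ex_module ex_null_module = { .name = "null" };
EX_REGISTER_BY_NAME(null, &ex_null_module)
```

With the name-based form, a foreign-language binding can refer to
ex_module_register_null by name to keep the symbol alive at link time.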
Change-Id: I15947ed9136912cfe2368db7e5bba833f1d94b15
Signed-off-by: gila <jeffry.molanus@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/443536
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
spdk_thread_poll()
This is an optimization if the calling function already knows the
current time.
Change-Id: I1645e08e7475ba6345a44e0f9d4b297a79f6c3c2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443634
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
strip_size as an RPC param is now deprecated and can be removed
in a future release. Either strip_size or strip_size_kb can be
used, but only one of them may be given or the RPC will fail.
Internally we maintain both fields because the strip size always
comes in as KB but we convert it to blocks, so having both elements
makes it clear to developers what they're looking at.
JSON output includes both strip_size and strip_size_kb.
Fixes #550
Change-Id: I5dc51e8af22eae3d56af8f8d37a564dbaae228fa
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/c/437873
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
DPDK 19.02 requires this mempool to be allocated via
crypto-specific function which returns rte_mempool.
To keep the amount of #ifs minimal, we'll use rte_mempool
unconditionally.
Change-Id: I3a09de41e237e168580bb92b574854e291e68a74
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443785
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We set up the qpairs on module init but never
released them. Some memory was leaked, although since
it was allocated with rte_malloc() it couldn't be
picked up by ASAN.
The rte_cryptodev API offers rte_cryptodev_queue_pair_setup()
to set up a qpair, but there's no equivalent function to
release it. We have to access the rte_cryptodev structure
directly and call a qpair release function ptr that's
stored inside. It seems very hacky, but the entire
rte_cryptodev structure is a part of the public API and
the global array of all such devices is an exported
symbol.
Change-Id: I17ac73d1098ca9a92d2dfd52e0f905e2c2b5488f
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443561
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The typical rdma qpair disconnect function goes through the function
_nvmf_rdma_disconnect_retry. When this function was introduced, it was
discovered that we could receive a qpair disconnect event for a given
qpair before that qpair had been assigned to a poll group. In order to
ensure that the disconnect procedure completed properly, we waited on
the current thread in _nvmf_rdma_disconnect_retry for the qpair to be
assigned a poll group before we finally disconnected. See rdma.c:2250.
Since _nvmf_rdma_disconnect_retry was not necessarily called from the
poll group's thread, we relied upon the assumption that the group
variable would never be set back to NULL. See the comment at
rdma.c:2243.
However, in _spdk_nvmf_qpair_destroy we were setting the group back to
NULL. This operation can result in the following set of operations
across multiple threads that prevent a qpair from ever being fully
destroyed.
1. thread 1: receive a disconnect event - call nvmf_rdma_disconnect
2. thread 1: from nvmf_rdma_disconnect call
spdk_nvmf_rdma_qpair_inc_refcnt - setting rqpair->refcnt to 1.
3. thread 2: call spdk_nvmf_rdma_poller_poll.
4. thread 2: in spdk_nvmf_rdma_poller_poll reap a completion with an
error status which causes us to call spdk_nvmf_qpair_disconnect -
rdma:2846
5. thread 2: spdk_nvmf_qpair_disconnect calls _spdk_nvmf_qpair_destroy which sets
qpair->group = NULL
6. thread 1: from nvmf_rdma_disconnect we call
_nvmf_rdma_disconnect_retry which checks if qpair->group == NULL. If
that is the case, we assume that the qpair has not been assigned a group
yet and send ourselves a message to call _nvmf_rdma_disconnect_retry
again. See rdma.c:2253
7. thread 2: from _spdk_nvmf_qpair_destroy we call
spdk_nvmf_transport_qpair_fini which results in a call to
spdk_nvmf_rdma_close_qpair, which sends dummy send and recv requests to
the qpair.
8. thread 2: we call poller_poll and get completions for both the send
and recv dummy requests. This results in a call to
spdk_nvmf_rdma_qpair_destroy.
9. thread 2: spdk_nvmf_rdma_qpair_destroy checks rqpair->refcnt and when
it sees that it is not 0 (see step 2 above) it returns without
freeing the resources. See rdma.c:629
10. thread 1: we keep churning in _nvmf_rdma_disconnect_retry sending
ourselves messages because rqpair->group is going to be null. Thread 1
never reaches line 2257 where it sends a message to call
_nvmf_rdma_qpair_disconnect. _nvmf_rdma_qpair_disconnect is the function
that decreases the rqpair->refcnt and allows us to make forward progress
on destroying the qpair.
I encountered this issue while trying to disconnect from our target
using the kernel initiator with an x722 NIC. I think the timing of this
bug comes out with that specific configuration because some of the calls
in the disconnect path on thread 1 fail, causing it to take longer and
giving the second thread a chance to delete the qpair.
There are really two issues at play here. We don't have a single point
of entry for disconnecting RDMA qpairs, and we rely on the qpair->group
variable never being set back to NULL. This patch addresses the second
issue, and the next patch in the series addresses the first.
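The stuck sequence in steps 1-10 above can be modeled in a few lines of
single-threaded C. This is only a toy model: every ex_* name is
hypothetical, the two threads are replaced by explicit call ordering,
and the clear_group parameter merely models a destroy path that does or
does not reset the group pointer. It is not a description of how the
patch is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_group {
	int id;
};

struct ex_qpair {
	struct ex_group *group;
	int refcnt;
	bool freed;
};

/* Step 2: the disconnect path takes a reference. */
static void
ex_disconnect_begin(struct ex_qpair *q)
{
	assert(q != NULL);
	q->refcnt++;
}

/* Steps 6 and 10: returns false when the caller has to retry later
 * because a NULL group is read as "not assigned to a group yet". */
static bool
ex_disconnect_retry(struct ex_qpair *q)
{
	assert(q != NULL);
	if (q->group == NULL) {
		return false;
	}
	q->refcnt--; /* the real disconnect drops the reference */
	return true;
}

/* Step 5: start of the destroy path on the other thread. */
static void
ex_qpair_destroy_start(struct ex_qpair *q, bool clear_group)
{
	assert(q != NULL);
	if (clear_group) {
		q->group = NULL;
	}
}

/* Step 9: resources are only freed once the refcnt is back to 0. */
static bool
ex_qpair_destroy_finish(struct ex_qpair *q)
{
	assert(q != NULL);
	if (q->refcnt != 0) {
		return false;
	}
	q->freed = true;
	return true;
}
```

When the group pointer is cleared between ex_disconnect_begin and
ex_disconnect_retry, the retry never succeeds and the final destroy
never frees the qpair. When the pointer is left set, both complete.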
Change-Id: I65395d0bbb67edfa7bad2ddc70906606c3d83781
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443304
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Includes the required DPDK dependencies for SPDK block Reduce aka
Compression.
Change-Id: Ic1ea3cbeb9373a7700f6f0c2a3194d65d6a34a41
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/c/429523
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch is for DIF check types.
Add enum spdk_dif_check_type to DIF library.
Add a field dif_check_flags to struct spdk_bdev and add
spdk_bdev_is_dif_check_enabled to bdev APIs.
The added enum is intended to improve usability. Without the enum, the
caller would have to get the raw flags and mask each bit itself.
Change-Id: Ia46a37a9684dc968dcc51963674f0a9963e0cd4d
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443339
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This patch is for DIF settings.
Add fields dif_type and dif_is_head_of_md to struct spdk_bdev and
add APIs spdk_bdev_get_dif_type and spdk_bdev_is_dif_head_of_md to
bdev APIs.
The fields dif_type and dif_is_head_of_md are added to the JSON
information dump.
Change-Id: I15db10cb170a76e77fc44a36a68224917d633160
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443184
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The next patch will introduce enum spdk_dif_check_type so that users can
easily tell whether checking of a DIF field is enabled.
This patch renames bitmask macros from SPDK_DIF_*_CHECK to
SPDK_DIF_FLAGS_*_CHECK to avoid misinterpretation.
Using FLAGS was derived from SPDK_NVME_IO_FLAGS_PRCHK_* in
include/spdk/nvme_spec.h.
Change-Id: I89e155d047352f54091c14b9251464cd3a72a162
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443338
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
To support DIF, bdev will need to expose the following information:
- Metadata format
- Block size
- Metadata size
- Metadata setting (interleave or separate)
- DIF settings
- DIF type 1, 2, or 3
- DIF location
- DIF check types
- Guard check
- Reference tag check
- Application tag check
This patch is for the metadata format. Subsequent patches will handle the DIF
settings and DIF check types.
Add fields md_len and md_interleave to struct spdk_bdev and add APIs
spdk_bdev_get_md_size and spdk_bdev_is_md_interleaved to the bdev APIs.
The fields md_len and md_interleave are added to the bdev JSON information dump.
DIF will be used only in the NVMe bdev module and the upcoming virtual
DIF bdev module at first. But the additional storage required by md_len and
md_interleave is very small and the fields are simple, so add them to
struct spdk_bdev directly.
Change-Id: I4109f6a63e6f0576efe424feb0305a9a17b9b2e8
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/443183
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The timeout is set to 0, so it never waits anyway. But
this should be 0.
Change-Id: I8b4058017a91b647ea9324f1474a732921c389f0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443647
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
This doesn't fix any bug, but it makes more sense to leave the qpair
in the NVME_TCP_PDU_RECV_STATE_AWAIT_PDU_READY state until it
receives at least one byte.
Change-Id: Ic5f34a733a80b58f65a1334fae7e07dbded2b3d0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441811
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The `len` field wasn't used at all and `reserved` is
no longer needed after we removed the paddr in the
previous patch.
This effectively cuts down spdk_mobj struct size by half.
Change-Id: Ica39f3a30e14ec1275a87d827dc41df5df9cf623
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443483
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The physical addresses in iSCSI are completely unused
as iSCSI does not perform any DMA on its own.
Change-Id: I350037b708a9f36f423e6ca6f7c822d8b6b95116
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443482
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We explicitly checked for one of the strings in the
parsed RPC request even though it's required for the
entire request to parse successfully. The extra check
is now removed.
Change-Id: I19c446786e4ac88b88f14e18dc5258f31b1a87f1
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443317
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Since we no longer use external events and we access
all vhost devices synchronously, we no longer need
to dynamically allocate our RPC request contexts. They
can be put just on the stack.
Change-Id: Ie887607b67451aba4f3404c4b9551e6424335beb
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440380
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Removed their various usages inside the core vhost code
together with the external events themselves. External
events were completely replaced by spdk_vhost_lock()
and spdk_vhost_dev_find().
Change-Id: I1f9d0268c27a06e2eecab9e7d179b1fd54d4223d
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440379
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Replaced them with inline code that performs exactly
the same but is shorter and easier to follow. External
events were replaced by spdk_vhost_lock() and
spdk_vhost_dev_find().
Change-Id: Id46a619c592c20a573664b54efc097489e9bb893
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440378
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Currently, infrequent cases in the request completion path are marked as
unlikely. This patch applies the same to the submission path.
These cases are infrequent and are marked using the unlikely macro:
a. The sq tail reaches the end of the queue.
b. The sq tail equals the sq head. (never happens if the FW runs correctly)
c. The qpair is the admin queue.
Change-Id: I8b873a18615788f2efbf7c683aad710c7007a082
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/443451
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The management channel was used in the RDMA transport prior
to the introduction of poll groups and made its way over to
the TCP transport when it was written. Eliminate it in favor
of just using the poll group.
Change-Id: Icde631dd97a6a29190c4a4a6a10a0cb7c4f07a0e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442432
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
This was only temporarily required for polling. With
a per-group aio ctx, it isn't needed anymore.
Change-Id: Ie59b50a4700f0f99dea470f857d187ac656dd229
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443467
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
We only need one aio context for the entire set of channels
sharing a thread.
Change-Id: I1143247901586efe50530b28323ddb923bc6b242
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443314
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This is marginally more convenient.
Change-Id: I9989d687b80051ccb2e07edc5e1efdbca75e8716
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443313
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
This will be used later.
Change-Id: I12b07756a13d03a34c9705306d720c1db7ecb15c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443312
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
This wasn't actually necessary. The next patch in this series will
change the way aio is used such that only one aio context is
polled for the entire group of channels on a single thread.
Change-Id: I05c4d824d9c63a51c8a2d608d84c184f249f66d7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443311
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This isn't used just yet, but will be necessary temporarily
during this patch series.
Change-Id: I7f04426c27e3fe0417e2f60bac28217fa44c0cb2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443310
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Move it next to the other channel definition.
Change-Id: I9ec33c135836d3dc326abe4ce7588e7a2eff77d4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443309
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
These didn't need to be visible.
Change-Id: I337a02802cac4431b4abd9a922408d4147801565
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443308
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Small static function only called from one place, so
just inline it.
Change-Id: Ibc54f790da55dd1635d81181208b1d506550ca9c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443307
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It does not need to be in the header file.
Change-Id: I5c489de81e48b11d02b66cbdd6d9ac05eae16429
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443306
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
max_read_depth should be based on max_qp_init_read_atomic, or the
maximum number of read values that the initiator will accept as
outstanding.
The device attributes object contains values for both the initiator
(remote side) and the target (local side). All attributes with the name
init in them are meant to correspond to the initiator. The
qp_read_atomic value represents the number of reads and atomic
operations that can have this device as the target. qp_init_read_atomic
represents how many read operations the initiator has said that we can
have outstanding that have the initiator's rdma device as the target.
Since this number represents how many outstanding reads we will send to
the initiator at once, we should use the qp_init_read_atomic value.
Change-Id: Iacc044e8321080de8accd9128ac3777bbb948afc
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442409
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
ftl_process_reloc should process free_queue first
(this will start read operations) and then process the write queue.
Change-Id: I3a44b3651cc1526f8a024330472f94aa8d818193
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443403
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id44f9de4500ec2be45aa4203c5945b1501fbdb21
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443236
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This function gets used as a function pointer, which
seems to keep the compiler from trying to inline the
function. Stack manipulation was showing up in the
perf profile pointing to this. Marking the function
as inline gets it actually inlined in the hot I/O
path.
Improves bdevperf microbenchmark from 78M to 85M IO/s.
Cores are virtually identical - 11.4M on core 0 and
10.4-10.6M on remaining cores.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iadced071dfc07fc09db6da3571c930988b2dc3fd
Reviewed-on: https://review.gerrithub.io/c/443278
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This keeps the hottest structures at the head of the
cache and helps improve performance.
Improves microbenchmark (8 null bdevs on 8 lcores,
bdevperf seq read with qd=1) from 67M to 78M on my
Xeon E5-v3 system. Core 0 performance remains about
the same (10.7-10.8M) but other cores improve from
around 8.0M each to 9.4M.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia3ccf94ab39b6f911127f0bd1016e352027b11fc
Reviewed-on: https://review.gerrithub.io/c/443277
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2bad16b6649c279448a3c662ab7b035dbe0a4bfb
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443251
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The OCSSD spec and a build-time check already ensure that
sizeof(struct spdk_ocssd_geometry_data) is 4096, so we can use
struct spdk_ftl_dev::geo as the buffer directly.
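The idea can be sketched with a compile-time size check. The struct
layouts below are hypothetical placeholders (the real geometry
structure has many more fields); only the 4096-byte size comes from
this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4096-byte geometry structure. */
struct ex_geometry_data {
	uint8_t major_ver;
	uint8_t minor_ver;
	uint8_t reserved[4094];
};

/* Build-time check: compilation fails if the size ever changes. */
_Static_assert(sizeof(struct ex_geometry_data) == 4096,
	       "geometry data must be exactly 4096 bytes");

struct ex_dev {
	struct ex_geometry_data geo;
};

/* Because the size is guaranteed, the embedded struct can be handed
 * out as the command buffer without a separate allocation. */
static void *
ex_geometry_buffer(struct ex_dev *dev, size_t *len)
{
	assert(dev != NULL && len != NULL);
	*len = sizeof(dev->geo);
	return &dev->geo;
}
```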
Change-Id: Id7a52f978d80284fe941d9f5d7bc7219518871e8
Signed-off-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-on: https://review.gerrithub.io/c/443069
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
In the current implementation, the functions called by
bdev_ftl_init_bdev() do not invoke the callback if they return an errno.
Besides, the callers of bdev_ftl_init_bdev() (e.g.
spdk_rpc_construct_ftl_bdev()) don't expect the callback to be called if
the callee returns an errno.
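The convention described above can be sketched like this. All ex_*
names are hypothetical; the point is only that a synchronous error is
reported through the return value and the callback is then never
invoked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*ex_init_cb)(void *ctx, int status);

/* Hypothetical init options. */
struct ex_init_opts {
	const char *name;
};

static ex_init_cb g_ex_pending_cb;
static void *g_ex_pending_ctx;

/*
 * Callee: on a synchronous failure return a negative errno and do NOT
 * call cb. On success remember cb so it can be called on completion.
 */
static int
ex_dev_init(const struct ex_init_opts *opts, ex_init_cb cb, void *ctx)
{
	if (opts == NULL || opts->name == NULL) {
		return -EINVAL;
	}
	g_ex_pending_cb = cb;
	g_ex_pending_ctx = ctx;
	return 0;
}

/* Called later when the asynchronous part of init has finished. */
static void
ex_dev_init_complete(int status)
{
	ex_init_cb cb = g_ex_pending_cb;

	g_ex_pending_cb = NULL;
	if (cb != NULL) {
		cb(g_ex_pending_ctx, status);
	}
}

static void
ex_count_cb(void *ctx, int status)
{
	(void)status;
	(*(int *)ctx)++;
}

/* Caller: handles the synchronous error itself and relies on the
 * callback only when the call returned 0. */
static int
ex_rpc_construct(const struct ex_init_opts *opts, int *cb_calls)
{
	assert(cb_calls != NULL);
	return ex_dev_init(opts, ex_count_cb, cb_calls);
}
```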
Change-Id: I5f36d5332ac66db65bb2090e9625a73b1107306b
Signed-off-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-on: https://review.gerrithub.io/c/443068
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
There is no need to hold g_ftl_bdev_lock when calling bdev_ftl_create.
Besides, the functions called by bdev_ftl_create (e.g. bdev_ftl_add_ctrlr)
take g_ftl_bdev_lock again.
Change-Id: I74751822364e16c58a3065dc78f8a4dce157e925
Signed-off-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-on: https://review.gerrithub.io/c/443066
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Vhost external events no longer do any asynchronous
calls, they only lock the vhost mutex and directly
call the provided function. The mutex encapsulation
isn't worth the additional complexity of splitting
each vdev-handling code into multiple functions, so
we expose low-level APIs that should eventually
replace external events entirely.
Instead of:
```
static int do_something_cb(struct spdk_vhost_dev *vdev, void *arg)
{
	struct my_data *ctx = arg;

	/* access the vdev and ctx */
	free(ctx);
	return 0;
}

struct my_data *ctx = calloc(...);
rc = spdk_vhost_call_external_event("my_vdev", do_something_cb, ctx);
if (rc != 0) { /* err handling */ }
```
We can now do just:
```
spdk_vhost_lock();
vdev = spdk_vhost_dev_find("my_vdev");
if (vdev == NULL) { /* err handling */ }
/* access the vdev and any context data */
spdk_vhost_unlock();
```
Change-Id: I06e1e149d6dd006720b021d3bef8d9b7bfaeceaa
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440377
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
_allocate_bit_arrays() needs vol->backing_dev set which was being done
after the call.
Change-Id: Ic8c36c98aee94fbd8230273638011b948cd95675
Signed-off-by: paul luse <paul.e.luse@intel.com>
Reviewed-on: https://review.gerrithub.io/c/443048
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is only needed within the c file. It doesn't
need to be in the public header.
Change-Id: I0e072ea5eddc6edc84faecee9ef50fb2c20dbb24
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442426
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is a holdover from before poll groups were introduced.
We just need a per-thread context for a set of connections,
so now that a poll group exists we can use that instead.
Change-Id: I1a91abf52dac6e77ea8505741519332548595c57
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442430
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The READ and ATOMIC in the comment above are capitalized, so
make this all caps too.
Change-Id: I49fae2ceb826b22953d9b26d42b95f17e2dac617
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442427
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
request.c didn't have much code, so let's collapse
it into ctrlr.c and make that the place where all
software emulation of the NVMe controller, including
request handling, is done.
Change-Id: Id7c98010cb222a414a5aa0b78bfb299a0ffc418f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440592
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Previously, all I/O commands were implemented by simply
passing them to the bdev layer. Now, some I/O commands will
be emulated. Prepare for that by moving the code for this
function to ctrlr.c, where the emulation will occur.
Change-Id: Id34e5549e5ce216d602fb347b4506fbd324eed4e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440591
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This was previously very unmap specific. Make at least the top level
DSM call more general purpose by eliminating the unmap_ctx.
Change-Id: I9c044263e9b7e4ce7613badc36b51d00b6957d3a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440590
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
These are left over from the removal of virtual mode over a year ago.
Change-Id: Ia797c4570bf9090346ff22ab9c7d719a78d023d0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/440589
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This was only used by the target, and it didn't actually need it.
Change-Id: Ibcef410165efdc16077da24419580ed51b087d70
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442440
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This type was actually two entirely different types for
the initiator and the target, so just make it void.
Change-Id: I15512d9d4efd790dce0fa4323b7230de66144bc6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442438
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
After passing the protective MBR check, there is a high probability that
this bdev is in GPT format. If parsing the primary table fails, read the
secondary table and try to get partition info from it. If the secondary
table parses successfully, log a warning to notify users that the primary
table is broken.
Change-Id: I4f16edcdd57b9cde8d8cc74ec88ba95b97bd6b63
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/441201
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Modify the existing code for parsing the primary partition table to support
parsing the secondary table.
The main difference between the two tables is that they have inverse buffer
layouts. In the primary table, the header is in front of the partition
entries. In the secondary table, the header is after the partition entries.
So add helper functions to extract the header and partition entries buffer
regions from the primary or secondary table based on the current parse phase.
Split the exported function spdk_gpt_parse into two functions,
spdk_gpt_parse_mbr and spdk_gpt_parse_partition_table, so that
spdk_gpt_parse_partition_table can be used to parse both the primary and
secondary tables.
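The inverse layout can be pictured with a small sketch. It assumes a buffer
that holds exactly one header block plus the entries array; `gpt_phase`,
`gpt_region` and both helpers are hypothetical names for illustration, not
the functions added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parse phases: which copy of the table the buffer holds. */
enum gpt_phase { GPT_PRIMARY, GPT_SECONDARY };

/* A byte range inside the table buffer. */
struct gpt_region {
    size_t offset;
    size_t len;
};

/* Header region: first block for the primary table, last block for the secondary. */
struct gpt_region
gpt_header_region(enum gpt_phase phase, size_t buf_len, size_t block_len)
{
    struct gpt_region r;

    assert(block_len > 0 && buf_len >= block_len);
    r.len = block_len;
    r.offset = (phase == GPT_PRIMARY) ? 0 : buf_len - block_len;
    return r;
}

/* Entries region: everything after the header (primary) or before it (secondary). */
struct gpt_region
gpt_entries_region(enum gpt_phase phase, size_t buf_len, size_t block_len)
{
    struct gpt_region r;

    assert(block_len > 0 && buf_len >= block_len);
    r.len = buf_len - block_len;
    r.offset = (phase == GPT_PRIMARY) ? block_len : 0;
    return r;
}
```

With helpers of this shape, the same parsing routine can be pointed at either
copy of the table by passing a different phase.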
Change-Id: I7f7827e0ee7e3f1b2e88c56607ee5b702fb2490c
Signed-off-by: lorneli <lorneli@163.com>
Reviewed-on: https://review.gerrithub.io/c/441200
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Unmap, discard, and write zeros will be sent down from the
upper layers of the stack. Exclude these I/Os from the QoS limit.
Change-Id: Ieb3cc19f31c43f8ddf8f8d2fd338f442ef48b679
Signed-off-by: GangCao <gang.cao@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442673
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When a connection goes to close and has no I/O outstanding,
the current_recv_depth was being decremented beyond 0 and rolling over.
If the poll group then finds a successful receive completion on the next
poll (for a command that arrived prior to starting the disconnect but
hadn't been processed yet), it would trip the max queue depth check
added recently and start another disconnect process. If only one command
arrives in this window, everything actually works out ok.
However, if there are two receive completions sitting in the completion
queue after the disconnect process is started, the first one does the
double disconnect and the second one does another disconnect which ends
up dereferencing a null pointer.
Since there is always a special reserved slot for the dummy recv, don't
do decrements or increments of the current_recv_depth for the dummy
recv. This allows the code to still enforce the actual max_queue_depth
on recvs without underflowing or overflowing the counter.
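A minimal sketch of the accounting described above, assuming an unsigned
16-bit counter. The struct and function names are hypothetical and much
simpler than the real transport code; the point is only that the dummy recv
never touches the counter, so it can neither wrap below zero nor exceed the
limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-connection state. */
struct conn {
    uint16_t current_recv_depth;
    uint16_t max_queue_depth;
};

/*
 * Account for a recv completion. The dummy recv has its own reserved slot,
 * so it is never counted. Returns false if the depth limit would be exceeded.
 */
bool
conn_recv_completed(struct conn *c, bool is_dummy)
{
    if (is_dummy) {
        return true;
    }
    if (c->current_recv_depth >= c->max_queue_depth) {
        return false;
    }
    c->current_recv_depth++;
    return true;
}

/* Account for a recv being handed back. Skipping the dummy recv avoids 0 - 1 wrapping to 65535. */
void
conn_recv_released(struct conn *c, bool is_dummy)
{
    if (is_dummy) {
        return;
    }
    assert(c->current_recv_depth > 0);
    c->current_recv_depth--;
}
```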
Change-Id: I56c95b2424e956a3b007b25c50cbf47262245b8f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442642
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell5141@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
trace_record is used to poll the spdk trace shm file
and store new entries from it to another specified trace file.
This helps retain trace entries that would otherwise be overwritten
when the trace circular buffer wraps around.
Note:
* trace_record reads the input tracefile into process-local
memory and writes trace entries to the output file only at shutdown.
* trace_record can be shut down with a SIGINT or SIGTERM signal.
A usage sample is:
./spdk_trace_record -s bdev_svc -p <spdk app pid> -f trace.tmp -q
Change-Id: If073a05022ec9c1b45923c38ba407a873be8741b
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/433385
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
This RPC doesn't really work in some cases - for example,
trying to delete one NVMe namespace bdev from a controller
with multiple namespaces, or just one virtio SCSI device
from a virtio-scsi controller. We've previously kept it
and marked it as "debugging only" - but every bdev module
has its own RPC method now for deleting what it constructed,
so keeping the generic delete_bdev RPC is asking for
trouble in some of the cases mentioned above. We'll remove
it in the 19.04 release.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I639254b32a3e1c840a4e9ae2658c42f4f321b676
Reviewed-on: https://review.gerrithub.io/c/442616
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This was marked deprecated in the v18.10 release, so
remove it now before v19.01 is tagged.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I57673a5ab475b97c812bebcefd77ff90d9305d1c
Reviewed-on: https://review.gerrithub.io/c/442412
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This includes properly detecting when a key's name
extends past the end of the valid data.
Note that the unit tests were using sizeof() instead
of strlen() since some of the strings contain
NULL characters. This means that we should be
subtracting one to account for the implicit null
character at the end of the string. Note that the
iSCSI spec only says that the key/value pair has to
end with a null character - a key/value pair that
is split across two PDUs will not have a NULL character
at the end of the first PDU.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie95d6dd3b9ffa6a3902a31771ac4edb482418cce
Reviewed-on: https://review.gerrithub.io/c/442450
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Since params are parsed directly from the PDU's data
buffer, we need to know the end of the valid data. Otherwise
previous PDUs that used this same data buffer may have left
non-zero characters just after the end of the text associated
with a LOGIN or TEXT PDU.
Found this bug while debugging an intermittent Calsoft test
failure. Added a unit test to reproduce the original issue,
which now verifies that it is fixed.
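The idea of bounding the parse by the valid data length can be sketched as
follows. `count_params` is a hypothetical helper, not the SPDK parser; it
only shows that every search stops at the end of the valid data, so stale
bytes left in the buffer by an earlier PDU are never examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count "key=value" pairs in data[0..len). Each pair normally ends with a
 * null character, but the last one may be cut off by the end of the valid
 * data (for example, when it continues in the next PDU).
 * Returns the number of pairs whose key and '=' lie inside the valid data,
 * or -1 if a key runs past the end of the valid data.
 */
int
count_params(const char *data, size_t len)
{
    size_t pos = 0;
    int count = 0;

    assert(data != NULL || len == 0);
    while (pos < len) {
        size_t remaining = len - pos;
        const char *pair = data + pos;
        /* Bounded search: never look past the valid data. */
        const char *end = memchr(pair, '\0', remaining);
        size_t pair_len = end ? (size_t)(end - pair) : remaining;

        if (pair_len == 0) {
            /* Empty string: skip a padding null character. */
            pos++;
            continue;
        }
        if (memchr(pair, '=', pair_len) == NULL) {
            return -1;
        }
        count++;
        pos += pair_len + 1;
    }
    return count;
}
```

Passing the full buffer size instead of the valid data length would make the
leftover bytes look like one more parameter, which is the class of bug
described above.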
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic3706639ff6c4f8f344fd58c88ec11e247ea654c
Reviewed-on: https://review.gerrithub.io/c/442449
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
For the number of trace entries, change strtoull to spdk_strtoll;
the change does not cause any issues.
Besides, getopt guarantees that if an option character is followed by
a colon in optstring, its optarg is not NULL. spdk_app_parse_args()
had unnecessary NULL pointer checks related to this. Hence
remove those NULL pointer checks too.
Change-Id: I33d0328205d1765f70f70fc734d0d8b4165fef5e
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/441641
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Without this check valgrind complains that we are using an
uninitialized variable.
Change-Id: I5cb73d10e167004f6e4df9e3621ec3b35ec2448d
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442519
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In SPDK, we will build isa-l with no shared option
and then integrate it into SPDK. And we do not need
to install isa-l in the system libraries.
Note: ocf build in autobuild.sh now needs to build
include/spdk/config.h before building the ocf library,
to ensure that header is available in a clean build
environment.
Change-Id: I3f0ce6932b386de17a77cf5bfdfd738b22417e2d
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Signed-off-by: paul luse <paul.e.luse@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441279
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Chunyang Hui <chunyang.hui@intel.com>
The io_device associated with the aio bdev was only
getting unregistered when the aio bdev was explicitly
deleted - not in the implicit deletion path at shutdown.
Move the io_device_unregister into the destruct_cb -
this makes sure the io_device is always unregistered, whether
the bdev is getting unregistered via an explicit RPC or
implicitly in the shutdown path.
Fixes #618.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I44b77f5c38339f4cf97b02c0ee4002bf5fcc9998
Reviewed-on: https://review.gerrithub.io/c/442119
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Verify that the namespace used is formatted with a supported LBA format
(4K block size).
Change-Id: I59e2ed71354e8530d9fa0e3f6b323ded83097afa
Signed-off-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441881
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Some existing vhost-nvme specific vhost socket messages
get information from the backend target before the start_session
call, so we should look up the associated NVMe controller by vid
rather than by session.
Fix issue #628.
Change-Id: Ia400bf33895a0feee0058a870f26b0ff72b7556f
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442498
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Marked as deprecated in 18.10.
Change-Id: I40d0e6103623aee6e6a0b9fa6e82f7b826ca1fe6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442420
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Error checking for strtol is left to its users, but some uses
of strtol in SPDK do not check errors sufficiently yet.
For example, strtol returns 0 if there were no digits at all.
Adding sufficient error checking to each use of strtol separately
should be avoided.
Hence spdk_strtol and spdk_strtoll do additional error checking
according to the strtol manual page.
Besides, there is no use case for negative numbers now, so to keep
things simple, spdk_strtol and spdk_strtoll accept only strings that
represent a positive number or zero.
As a result of this policy, callers only have to check that
the return value is not negative.
Subsequent patches will replace atoi with spdk_strtol because atoi
does not do any error checking.
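A minimal sketch of a wrapper following this policy is shown below.
`checked_strtol` is a hypothetical name and this is not the SPDK
implementation; the specific error codes returned are assumptions made for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: parse a non-negative integer.
 * Returns the value on success or a negative errno value on failure,
 * so callers only need to check for a negative result.
 */
long
checked_strtol(const char *s, int base)
{
    char *end;
    long val;

    assert(s != NULL);
    errno = 0;   /* strtol() reports range errors only through errno */
    val = strtol(s, &end, base);

    if (errno == ERANGE) {
        return -ERANGE;   /* value does not fit in a long */
    }
    if (errno != 0) {
        return -errno;    /* any other failure, e.g. an unsupported base */
    }
    if (end == s || *end != '\0') {
        return -EINVAL;   /* no digits at all, or trailing characters */
    }
    if (val < 0) {
        return -ERANGE;   /* negative input is rejected by policy */
    }
    return val;
}
```

A caller can then write `if ((n = checked_strtol(arg, 10)) < 0) { /* handle error */ }`
instead of repeating the errno and end-pointer checks at every call site.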
Change-Id: If3d549970595e53b1141674e47710fe4dd062bc5
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-on: https://review.gerrithub.io/c/441626
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: wuzhouhui <wuzhouhui@kingsoft.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
This was marked deprecated in 18.10
Change-Id: Id47e770b0388c935fe684aeef7a9824f24cef47f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/442416
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Add rte_pause to waiting while loop
This commit also adds spdk_pause as interface for rte_pause
Change-Id: I56e1023731e2e78febaa4f45808d6f07656d290f
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/436494
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Use mempool for allocating OCF requests with constant size
Previous method was using mallocs which is significantly slower
Change-Id: I539ff22efc18fbd353ceb2687ea211d2baaa7523
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/439680
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
One of the messages we send on memory hotplug event is
SET_VRING_BASE, which tells vhost e.g. the position in
a vring it should start processing requests from. Sending
this message with any outstanding I/O could cause that
I/O to never be processed, as it could be at a vring
position that won't be polled in practice.
To fix the above, we don't send SET_VRING_BASE message
on memory hotplug event anymore since it's completely
unnecessary. It was sent together with a couple other
messages that would reinitialize the vring, but we know
vrings occupy a memory buffer that won't be hotremoved
during vring lifetime. We also know that vring GPAs will
never change. Hence we can initialize the vrings just
once on device start now.
We still need to send SET_VRING_ADDR after updating the
memory table, as rte_vhost depends on it to apply that
new memory table. Luckily, this single message doesn't
cause us any trouble.
Change-Id: I2125099f1cf3f8c76e8160ec819bd1a9a3e7823c
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/439436
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
We assumed the second descriptor in an I/O descriptor
chain will always point to a payload buffer, but in case
there is no payload, the second descriptor will point to
a response buffer. The vhost code doesn't provide proper
checks to handle such case, so to avoid various errors
down the stack, we just fail all requests with no
payload.
Change-Id: I6785c2843d6db4fc17e68e03562c2a1530bb469b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/437187
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: <dstepanov.src@gmail.com>
This ensures that SPDK will detect descriptor chains
that are too long.
The additional check in vhost block stands as an
optimization and makes us fail the corrupted I/O early.
Change-Id: Icceaa0dd938dca96a1872e5ee96bf6a151fdd9e7
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Signed-off-by: Dima Stepanov <dstepanov.src@gmail.com>
Reviewed-on: https://review.gerrithub.io/c/433641
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
SPDK doesn't provide sufficient runtime checks to properly
handle clients with memory sizes that aren't 2MB multiples
and could potentially segfault during I/O processing.
That's why we'll reject such clients now.
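The size test itself is a simple mask check. The helper below is a
hypothetical illustration, not the code added by this patch; treating a
zero size as invalid is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIZE_2MB (2ULL * 1024 * 1024)

/* True if a memory region size is a non-zero multiple of 2MB. */
bool
mem_size_is_2mb_multiple(uint64_t size)
{
    /* 2MB is a power of two, so the low 21 bits must all be zero. */
    return size != 0 && (size & (SIZE_2MB - 1)) == 0;
}
```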
Change-Id: I34e85be5b5c6df863371d0ad688f228ed44107ff
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/433640
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add new RPC method for OCF bdev: get_ocf_bdevs
It is useful for OCF bdevs that are not registered,
which do not appear in the standard get_bdevs call
Change-Id: I8a5fc86a880b04c47d5f139aa5fa4d07ca39c853
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441655
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add basic handling of base devices hotremove
When either core or caching device gets unregistered,
the vbdev_ocf does so as well
Change-Id: I05769f714bf22cb320558fed86adc8c3d8a0a185
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/435729
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Since we have different requirements for submitting RDMA read and write
operations, we should track them separately so that we don't block
writes when the device does not have enough resources for read
operations.
Change-Id: I5d6424c0e26f2f5362866d1bb21eb46700c245da
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441794
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Before, the number of WRs and the number of RDMA requests were linked by
a constant multiple. This is no longer the case so we need to make sure
that we don't overshoot the limit of WRs for the qpair.
Change-Id: I0eac75e96c25d78d0656e4b22747f15902acdab7
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/439573
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add OCF module based on OCF meta-library
Open CAS Framework (OCF) is a high performance block storage
caching meta-library
It is open-source, published at https://github.com/Open-CAS/ocf
With this patch OCF-enabled device is represented in SPDK
as virtual bdev having core and caching devices as its base devices
This patch includes implementation of:
* OCF top adapter (vbdev_ocf.c)
* OCF bottom adapter (dobj.c, data.c)
* Adaptation layer for OCF (env/)
* OCF context abstractions (ctx.c)
Adaptation layer and context abstractions are not dependent on SPDK bdev
OCF bdev supports reads and writes, configured at startup
Other features will be added with separate patches
Change-Id: Ic2dcab378c8238d16f1e4b64d4374bdf257565bc
Signed-off-by: Vitaliy Mysak <vitaliy.mysak@intel.com>
Reviewed-on: https://review.gerrithub.io/c/435708
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
_spdk_bdev_io_submit uses the bdev_io->internal.in_submit_request
flag to ensure we unwind in cases where the I/O is completed
inline (i.e. malloc or null bdevs). But when an I/O gets queued
for QoS, and then we iterate through the queued I/O in
_spdk_bdev_qos_io_submit(), this flag was not getting set
when those I/O would get submitted to the underlying bdev. This
would allow for _spdk_bdev_qos_io_submit recursion, resulting
in all kinds of different types of memory corruption.
Fixes #613.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I29263f4e7b2ead60f08b60474d210defa803348c
Reviewed-on: https://review.gerrithub.io/c/442127
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Liang Yan <liang.z.yan@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
It is perfectly valid for a bdev to not support the
unmap command - there's no need to print an ERRLOG
when a SCSI INQUIRY 0xB2 (LOGICAL BLOCK PROVISIONING)
command is sent to query if the LUN supports it.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I18389df4d55a1ac186707d624ddea292a5470e80
Reviewed-on: https://review.gerrithub.io/c/442104
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Technically this check is correct, but the Linux kernel
target doesn't have it, and older versions of libiscsi
have a bug which results in stale ExpStatSN getting sent
resulting in terminated connections with the SPDK iSCSI
target at high queue depths.
Fixes #600.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I76eaf9dee2d733bfa3f8d43b86528de6b556cbd6
Reviewed-on: https://review.gerrithub.io/c/441981
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Those cases should never occur. Klocwork pointed out
possible dereference based on the returns later in
the functions.
Change-Id: I282a56f3f415f85c38e9c451cbb10bc80fc6176b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441546
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This gives us more realistic control over the number of requests we can
submit.
Change-Id: Ie717912685eaa56905c32d143c7887b636c1a9e9
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441606
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
rw_depth was a misinterpretation of the spec. It is based on the value
of max_qp_rd_atom, which only governs the number of read and atomic
operations. However, we were using rw_depth to block both read and write
operations, which is an unnecessary restriction. Write operations should
only be governed by the number of Work Requests posted to the send
queue. We currently guarantee that we will never overshoot the queue
depth for Work Requests since they are embedded in the requests and
limited to a size of max_queue_depth.
Change-Id: Ib945ade4ef9a63420afce5af7e4852932345a460
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441165
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This will be necessary later on when we need to throttle send and recv
requests in software.
Change-Id: Ifb25eaabd15e101fbfc2959a08a321f80857b280
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/441604
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>