Names for the NVMe bdevs are now assigned by the user.
This means the same name will always be assigned to the
same device, even across restarts.
Change-Id: If9825ec9abcb5236b4671bc44a825e4f0d704fe3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Remove the unnecessary rte_eal_pci_probe_one() call in
spdk_pci_device_detach(). It could cause an error message when the
application terminates, and it makes no sense to probe a device right
after detaching it. When a specific device address is given, we can call
spdk_pci_nvme_device_attach() instead of spdk_pci_nvme_enumerate();
DPDK will then scan the device and add it back to the PCI device list.
Change-Id: I35f5bb412249bb20da57394f0531c10a49691906
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This clarifies the relation between the values assigned to sg_list and
num_sge (no functional change).
Change-Id: I8e81d47dd97a033b17cd3b813b06e4887127146c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All devices must be specified by BDF. Add support for scripts
to use lspci to grab the available NVMe device BDFs for the
current machine.
Change-Id: I4a53b335e3d516629f050ae1b2ab7aff8dd7f568
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The mappings are all static, so it isn't interesting
to print them out on each I/O.
Change-Id: I85301b4518d4523a7c031f6ca9ff678d91428504
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This allows pipelining of READ/WRITE with completion.
Change-Id: Ib3ab5bffb8e3e5de8cbae7a3b2fff7d9f6646d2d
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This allows static initialization of the scatter
gather list as well as future optimizations
around pipelining commands with data.
Change-Id: I8af8f3e3425610bc720677c9bc84f163cfb6278a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The first version of the Linux kernel NVMe-oF initiator had an
off-by-one bug when reporting the queue size. We had a workaround to
deal with this. Now that the kernel has been fixed, remove the
workaround.
Change-Id: I0ad4a5c6db68cfa9683ab93e6f5210772c713b55
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Move the claimed flag to struct spdk_scsi_lun and remove the RPC call that
allows a SCSI LUN to be deleted by the user.
Change-Id: I0fe57d33ab017816ab4799bce259807735e0c783
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Register all spdk_malloc() memory regions as ibv_mr in a spdk_mem_map
so we can look up the RDMA key for the user's buffer and pass it in the SGL
directly, rather than copying through a pre-registered bounce buffer.
Change-Id: I7340bc2020b5256750c95dbd24ba67961404e5e7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The extended LBA format flag should be initialized after the namespace
capability flag.
Change-Id: Iad479b454bb4e31120c17d40ae23937a099c6f8f
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Change the SCSI device configuration format from "DevX LUN0" to "Dev X LUN0".
This allows the configuration to be checked for simple errors, such as a
device number that is out of range.
Also assert that exactly one LUN is given.
Change-Id: Idccd6878119282fc51947b092bdda7ae06aa94ad
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
The send completions must be processed before the
recv completions. However, if the completion queues
are separate, there is a small window in which
a send and a recv completion can arrive between polling
the send_cq and the recv_cq, causing the code to
see the recv completion before the send
completion.
Combining the completion queues eliminates
this gap. The send completion will always
be processed before the recv completion.
Change-Id: I06bfef6af48559d0b9e00524ebc10f1a102e7387
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The sq_head handling is already done in
spdk_nvmf_rdma_request_send_completion, so there is no need to
do it again here.
Change-Id: I527ff8adfcbdf43ac79794cb5c7777c0e8ef6973
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The env layer already understands that shm_id < 0 means that
multi-process is not enabled. Leave shm_id defaulted to -1 so that
other code can detect when it is not set.
Change-Id: Ifd1667598d55c216f95f13561dc2a550677db5f4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These options are only necessary for applications that intend to be used
in a multi-process configuration.
Change-Id: I3e1fa0682611d92267d0ad1b3f2016dc926b96b6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, if the maximum number of virtual namespaces had already been
reached, adding a bdev to a subsystem would claim it without actually
adding it to the ns_list array.
Change-Id: Iab68ad1a75748c0e88232240185695aac08d71d2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
They are not used outside of their respective files.
Change-Id: I754834e7354caec877cd2fe193e56854e5a34e20
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch fixes an issue when a large I/O fails:
when handling a read command that needs to be split, we must make
sure all of the subtasks are handled even if one of them fails.
This ensures that the command has a chance to be returned to the initiator.
Change-Id: I0c01e1a34c6179fce37ab52c8121268b6ee31102
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
The actual uses of intrinsics are already guarded by feature-specific
ifdefs in nvme_pcie_copy_command(), but the header itself should also
only be included when it will actually be needed.
Change-Id: Ife65d6432b8dfd9d9db80fe4e385ab76491874c0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
SPDK_COUNTOF works like sizeof, except it returns the number of elements
in an array instead of the number of bytes.
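A minimal sketch of such a macro, assuming the common sizeof-ratio
definition. The name COUNTOF and the example array are hypothetical;
the real macro lives in the SPDK headers. It only works on true arrays,
not on pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array: total bytes / bytes per element.
 * Passing a pointer instead of an array gives a meaningless result. */
#define COUNTOF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Example array used to contrast sizeof() with COUNTOF(). */
static const unsigned short example_values[] = { 10, 20, 30 };
```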
Change-Id: I38ff4dd3485ed9b630cc5660ff84851d0031911f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch adds a library, application and test scripts for extending
SPDK to present virtio-scsi controllers to QEMU-based VMs and
process I/O submitted to devices attached to those controllers.
This functionality is dependent on QEMU patches to enable
vhost-scsi in userspace - those patches are currently working their
way through the QEMU mailing list, but temporary patches to enable
this functionality in QEMU will be made available shortly through the
SPDK github repository.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Krzysztof Jakimiak <krzysztof.jakimiak@intel.com>
Signed-off-by: Michal Kosciowski <michal.kosciowski@intel.com>
Signed-off-by: Karol Latecki <karolx.latecki@intel.com>
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I138e4021f0ac4b1cd9a6e4041783cdf06e6f0efb
This avoids registering PMDs that are not used by a given
application. For example, an app may wish to *not* use
ioat - in this case, the ioat PMD would not be registered with
DPDK, and we would not waste time probing these devices
when probing other devices like NVMe.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If378e40bde9057c7808603aa1918bcfe80fa0e9d
These allocations need to be from memory registered with the SPDK env
library to allow future work on automatic ibverbs memory registration.
Change-Id: I6ec6999ecd6d6bf6ba4ab159630f7d01f3d46154
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is no longer used now that AER handling holds the request until it is
triggered.
Change-Id: I71a75e86f82bc06f677cf26defa701e60b9aa1bd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will allow returning a different default value per mem map.
Change-Id: I94d3de197acfb2e6ad40092ab0588ba4e951af80
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a top-level structure that can be reused for other kinds of memory
address translations.
Change-Id: I046f98b76b4e98087d90095d6e9dea5cd6ab7898
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch makes spdk_scsi_task_process_null_lun() public and completes
the task immediately once the iSCSI layer receives it.
Change-Id: I4ada027d3a324dce8ef0d0f7706dbc14184ead96
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
AERL in our session is 0, so only 1 AER can be outstanding;
for now, we only handle the single-AER case.
When an AER event is triggered by a real NVMe device owned
by the subsystem, it notifies all sessions belonging to
the subsystem.
Change-Id: Ia80fb0f03e893c20d8dd14afbed8db10db38301c
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
By the time the module is unloaded, the reactors
have already stopped. That means the event will never
actually fire. Simply remove it.
Change-Id: I4fe371ae7a679d51254d0267fbbbf74c3e9cf477
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The wr_id should never be NULL - it will always correspond to a request
we previously posted. Convert the check to an assert() so we notice if
this ever happens (which would indicate a programming error somewhere
else).
While we're here, add a more robust check to make sure the request is
actually in the correct array of requests for the connection being
polled (also in an assert, since this should never fail in normal
execution).
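The two checks described above could look roughly like this sketch.
struct conn, struct request, req_belongs_to_conn() and req_from_wr_id()
are hypothetical stand-ins, not the actual RDMA transport types; the
request pointer is assumed to be carried in the 64-bit wr_id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct request { int id; };

/* Hypothetical connection owning a fixed array of requests. */
struct conn {
	struct request *reqs;
	size_t num_reqs;
};

/* True if req points exactly at one of conn's array elements. Comparing
 * as uintptr_t avoids relational comparison of unrelated pointers, and
 * the modulo check rejects pointers into the middle of an element. */
static bool
req_belongs_to_conn(const struct conn *c, const struct request *req)
{
	uintptr_t base = (uintptr_t)c->reqs;
	uintptr_t p = (uintptr_t)req;

	if (p < base || p >= base + c->num_reqs * sizeof(*c->reqs)) {
		return false;
	}
	return (p - base) % sizeof(*c->reqs) == 0;
}

/* Both conditions indicate a programming error, so they are asserts
 * rather than runtime error paths. */
static struct request *
req_from_wr_id(const struct conn *c, uint64_t wr_id)
{
	struct request *req = (struct request *)(uintptr_t)wr_id;

	assert(req != NULL);
	assert(req_belongs_to_conn(c, req));
	return req;
}
```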
Change-Id: I855763d7d827fb8cf00a775c7bc2ccb579db8d0f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The kernel NVMe-oF host keeps trying to connect to the NVMe-oF target
as long as we do not issue an nvme disconnect command. As a result, we
hit an rdma_create_qp failure: we call rdma_listen too early, and the
event retrieved from rdma_get_cm_event arrives too late.
This patch fixes the issue.
Change-Id: I153a8aea7420a86a236301dad9bd54af97f60865
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The function spdk_iscsi_transfer_in will handle the task if the
status is not SPDK_SCSI_STATUS_GOOD.
Change-Id: I61155ffa056b3eac551f215d50e1808e5389fdb5
Signed-off-by: cunyinch <cunyin.chang@intel.com>
Calling spdk_nvmf_request_complete to complete spdk_nvmf_request
causes some fields in the completion queue entry to not be set correctly.
Calling spdk_nvmf_request_complete fixes the problem.
This can be used for issuing an abort for the timed-out command.
Change-Id: I3c5727fdddc156cd7c8f99afbc3e6da8e73bba56
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make sure the compiler arranges the fast path as the fallthrough case by
annotating the checks in spdk_vtophys().
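A minimal sketch of this kind of annotation, using branch-hint macros
built on the GCC/Clang __builtin_expect() builtin. The macro names,
TRANSLATION_ERROR and lookup() are hypothetical and are not the actual
spdk_vtophys() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-prediction hints; !! normalizes any nonzero value to 1. */
#define SKETCH_LIKELY(cond)   __builtin_expect(!!(cond), 1)
#define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)

#define TRANSLATION_ERROR UINT64_MAX

/* The successful table lookup is the fast path, so the error checks are
 * marked unlikely and the compiler can lay out the hit as fallthrough. */
static uint64_t
lookup(const uint64_t *table, uint64_t nentries, uint64_t idx)
{
	if (SKETCH_UNLIKELY(table == NULL)) {
		return TRANSLATION_ERROR;
	}
	if (SKETCH_UNLIKELY(idx >= nentries)) {
		return TRANSLATION_ERROR;
	}
	return table[idx];
}
```

The hints only affect code layout and static prediction; the function
returns the same results with or without them.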
Change-Id: If0fc3149297131894b5c7a94bff31bf8ee40326e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Now that all DPDK memory is registered at startup, spdk_vtophys() never
needs to add new translations to the vtophys map. This means that any
lookup that fails to find an allocated map_1gb will always return
SPDK_VTOPHYS_ERROR rather than trying to allocate it and then failing
the lookup anyway.
Change-Id: I7e6f7af183199651f5808a17810a17970b0e3331
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
vtophys_get_paddr() and vtophys_get_dpdk_paddr() are doing similar
things; combine them into one function that works for all DPDK
memory addresses.
Part of the vtophys test is temporarily disabled until the next commit,
which will register all DPDK memory at startup and stop looking up
addresses at runtime.
Change-Id: I91312837aa1e6170bacaf3b0d2adbdc4391d3afa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This just moves the lookup of the physical address up one level - now
_spdk_vtophys_register_one() is only responsible for filling out the
mapping table, not looking up the translation.
Change-Id: I9fd5b85da623e403fda0563b6bdebd4aaaf42864
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rather than storing the page frame number, just store the full physical
address of each 2 MB page. This simplifies the lookup code and makes
the map generic (values are inserted and retrieved without any
modification) for future uses.
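A simplified sketch of a generic two-level map like this, assuming
48-bit virtual addresses, a top level indexed by 1 GB region and a
bottom level with one 64-bit value per 2 MB page. All names (struct
mem_map, mem_map_set(), mem_map_get(), mem_map_translate()) are
hypothetical, and reference counting and locking are omitted. Values are
stored unmodified, and lookups never allocate: an unallocated region
simply returns the per-map default.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SHIFT_2MB   21
#define SHIFT_1GB   30
#define SHIFT_256TB 48  /* assumes 48-bit user virtual addresses */
#define MASK_2MB    ((1ULL << SHIFT_2MB) - 1)
#define ENTRIES_PER_1GB (1ULL << (SHIFT_1GB - SHIFT_2MB))   /* 512 */
#define NUM_1GB_REGIONS (1ULL << (SHIFT_256TB - SHIFT_1GB)) /* 262144 */

/* Bottom level: one value per 2 MB page in a 1 GB region. */
struct map_1gb {
	uint64_t value_2mb[ENTRIES_PER_1GB];
};

/* Top level: lazily allocated 1 GB regions plus a per-map default. */
struct mem_map {
	uint64_t default_value;
	struct map_1gb *regions[NUM_1GB_REGIONS];
};

static struct mem_map *
mem_map_alloc(uint64_t default_value)
{
	struct mem_map *map = calloc(1, sizeof(*map));

	if (map != NULL) {
		map->default_value = default_value;
	}
	return map;
}

static void
mem_map_free(struct mem_map *map)
{
	for (uint64_t i = 0; i < NUM_1GB_REGIONS; i++) {
		free(map->regions[i]);
	}
	free(map);
}

/* Registration path: may allocate a bottom-level table. */
static int
mem_map_set(struct mem_map *map, uint64_t vaddr, uint64_t value)
{
	struct map_1gb *region;

	if (vaddr >> SHIFT_256TB) {
		return -1;
	}
	region = map->regions[vaddr >> SHIFT_1GB];
	if (region == NULL) {
		region = malloc(sizeof(*region));
		if (region == NULL) {
			return -1;
		}
		for (uint64_t i = 0; i < ENTRIES_PER_1GB; i++) {
			region->value_2mb[i] = map->default_value;
		}
		map->regions[vaddr >> SHIFT_1GB] = region;
	}
	region->value_2mb[(vaddr >> SHIFT_2MB) & (ENTRIES_PER_1GB - 1)] = value;
	return 0;
}

/* Lookup path: never allocates; a missing region means "default". */
static uint64_t
mem_map_get(const struct mem_map *map, uint64_t vaddr)
{
	const struct map_1gb *region;

	if (vaddr >> SHIFT_256TB) {
		return map->default_value;
	}
	region = map->regions[vaddr >> SHIFT_1GB];
	if (region == NULL) {
		return map->default_value;
	}
	return region->value_2mb[(vaddr >> SHIFT_2MB) & (ENTRIES_PER_1GB - 1)];
}

/* vtophys-style use: stored 2 MB page address plus offset in the page. */
static uint64_t
mem_map_translate(const struct mem_map *map, uint64_t vaddr)
{
	uint64_t base = mem_map_get(map, vaddr);

	return base == map->default_value ? base : base + (vaddr & MASK_2MB);
}
```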
Change-Id: Ib1081513a0682f6b8b908f3401c00d87b00f484c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
No need to build a whitelist and scan anymore - the NVMe
driver can directly attach to a specified device.
Change-Id: Ie60c09b6ab37a7f068c496f0cad53bfdc8617349
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Move the ibv_recv_wr initialization into
nvme_rdma_alloc_rsps to save some
CPU time.
Change-Id: Id449b2684290431f8b3ba97ec4058171d34038bf
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
We do not need to set it for submission since the contents
are the same.
Change-Id: I345094e2e8a858b318be73d28f09393566587d95
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
When running limit iSCSI tests with 128 target nodes and 1 connection
per target node, the iSCSI target sometimes reports that it has run out
of tasks in the task pool for I/O larger than 256KiB. With all iSCSI
parameters at their default values, each connection consumes at most
189 tasks, but the task pool is hardcoded to 16384 entries, and
189 * 128 connections = 24192 exceeds 16384. Increasing the default
from 16384 to 32768 fixes the issue.
With a 1MiB block size and a queue depth of 128 per connection, there
will be 64 outstanding iSCSI commands in the iSCSI target. For Writes,
the maximum number of R2Ts is 4, so the maximum number of tasks for the
4 R2T commands is (1 + 16) * 4 = 68: 1 task for the 8KiB first burst and
16 for the data segments. For Reads, the maximum of 64 Data-In segments
can be used by 4 iSCSI Read commands. The remaining 56 iSCSI commands
cost 56 tasks, so the total is 56 + 64 + 68 = 188, plus 1 additional
task for the NOP task.
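The arithmetic above can be checked with a small sketch. The constants
restate the numbers from this message (64 outstanding commands, 4 R2T
Writes of 1 + 16 tasks, 4 Reads of 16 Data-In tasks, 1 NOP);
tasks_per_connection() is a hypothetical helper, not code from the
iSCSI target.

```c
#include <assert.h>

enum {
	OUTSTANDING_CMDS = 64,
	R2T_WRITES = 4,
	TASKS_PER_R2T_WRITE = 1 + 16,  /* first burst + 16 data segments */
	SPLIT_READS = 4,
	TASKS_PER_SPLIT_READ = 16,     /* 64 Data-In segments / 4 reads */
	NOP_TASKS = 1,
};

/* Worst-case number of tasks one connection can hold at once. */
static int
tasks_per_connection(void)
{
	int other_cmds = OUTSTANDING_CMDS - R2T_WRITES - SPLIT_READS; /* 56 */

	return other_cmds +
	       SPLIT_READS * TASKS_PER_SPLIT_READ +  /* 64 */
	       R2T_WRITES * TASKS_PER_R2T_WRITE +    /* 68 */
	       NOP_TASKS;                            /* 1 */
}
```

With 128 connections this gives 24192 tasks: more than the old 16384
pool, but within the new 32768 default.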
Change-Id: I945871cbe3076139f08c2ef647af2d9c84601dcb
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
This is consistent with the rest of the RPC calls that report a number
of blocks, and it matches the field in the split_disk structure.
Change-Id: Ie25534617112d65979c317fe13e05a6c32520a15
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The driver_specific object should contain a single object with the
blockdev driver's name so that the user can determine how to interpret
it. This matches the NVMe blockdev driver.
Change-Id: I434b910a95dd527363af78469dc900e9d19ec12e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Now that namespace splitting support has been removed from the NVMe bdev
in commit efccac8 ("bdev/nvme: remove NvmeLunsPerNs and LunSizeInMB"),
the block_size and total_size fields in the NVMe bdev's driver_specific
config data are redundant. The generic get_bdevs num_blocks and
block_size fields provide the same information.
Change-Id: I080d2017d608716a593bb553ee667e9c4017ffb7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move cb_arg to the first argument to match the other NVMe callback
function signatures.
Change-Id: I4e699c8071dcb7ba4ce3cdb82ee985600208204c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Whether an NVMe command transfers data cannot be determined solely
from its opcode. For the Set Features command, some features do not
require a data transfer.
Change spdk_nvmf_request_prep_data to fix this issue.
This has been reported for a number of different device
types. We suspect these devices are technically out of
spec, but they work with most other available NVMe
drivers by accident.
Change-Id: I529cfc03fc314cbab2a1cd40620bf1dd5b54182d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reason: the 4 fields of struct ibv_recv_wr are already
set in the following 4 lines.
Change-Id: I97437ee2e4c6e944154813bb48b1740b182220df
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The responsibility for writing the pid file should lie with the init
system, not the application itself.
This was also broken by the recent instance ID/shared memory ID rework;
the pid file was named based on the pid, making it fairly worthless.
Change-Id: Ifb4f2d3ce5cf132f2c2e8bd3d0ba605ff8ccd8fe
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This was added by mistake in commit 18d26e42a3 ("env: Move DPDK
intialization into the env library."). It is always dead code, because
shm_id is set to getpid() right above it, and it will never be -1.
Change-Id: I19c798a87bf7a3b12547d772b981b038857abcaa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch makes spdk_scsi_lun_construct behave as documented:
spdk_scsi_lun_construct will return only a newly created LUN.
If a LUN with that name already exists, NULL will be returned.
The relevant unit test is changed to show that this functionality
now works.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I002903d6f96555c638aba3fa99cc2c2504ced603
This is necessary to prevent claiming the same LUN twice
and to properly clean up in case of an error during spdk_scsi_dev_construct.
This patch addresses three issues:
- spdk_scsi_lun_claim error is correctly handled in spdk_scsi_dev_add_lun
- on error when constructing scsi dev, it is now correctly removed along with attached luns
- spdk_scsi_dev_destruct not only unclaims, but calls spdk_scsi_lun_destruct on each lun in dev
Unit tests relevant to this behaviour are changed to show this functionality is now working.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I111c320f875e5003e3f1f7748a2630097301ce1b
This patch adds two new unit tests for scsi device:
- creating two different devices, each containing the same lun
- creating one device, with the same lun twice
As noted in the code, three asserts are deliberately set to the current
(incorrect) results, reflecting functionality that does not work yet.
The next patch in the series implements that functionality and changes
the asserts in the unit tests.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I2645401fee4f2cd986458e0a4db108ce4e1bf9db
By default, SPDK applications do not share memory.
To share memory, start the applications with the same
shared memory ID.
Change-Id: Ib6180369ef0ed12d05983a21d7943e467402b21a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Funnel all of the return paths in the main parsing routine through the
code that sets the *end pointer so that all error cases set it.
Change-Id: I0565913f7b9488470ede79dc1af84eb4b9a03225
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The direct and virtual mode code is identical; move it to session.c like
the other virtualized get/set features.
Change-Id: I0a0e2dd795197c142ad5d9d0e4ddedb2aa5c8c2a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Even for direct mode, each session should use its own
async event configuration like virtual mode instead of
passthrough.
Change-Id: I9c1175f3677c672c0cad684341b8a46a575d753e
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Instance ID is too overloaded and the uses are beginning
to conflict. Separate the RPC configuration out.
Change-Id: I712731130339fee4fc8de4dc2d0fea7040773c58
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The host and port output parameters point into the (non-const) char *ip,
so it makes more sense for them to be non-const as well.
This allows the flexibility to pass non-const char pointers as the
output parameters, which will be used in the nvmf_tgt/conf.c parsing
code.
Change-Id: I1d5b102fc389c06d36432904e4fda944437b659e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For namespaces with end-to-end protection information, metadata size
of exactly 8 bytes, and extended LBA configured, the NVMe driver would
calculate the size of the data block incorrectly. The NVMe spec has a
special provision for this specific case (8-byte metadata only) and
PRACT = 1 that requires that the host does not send the metadata as part
of the host memory buffer.
To fix this, clean up the calculation of the per-block data transfer
size by adding a new extended_lba_size field in the namespace, which
represents the total size of data to be transferred per block based on
the namespace's configured metadata size and whether it transfers
metadata as part of the data buffer. Then add the special case for
PRACT = 1 and PI configured and extended LBA in the R/W helper
functions.
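A simplified sketch of the per-block size calculation described above.
struct ns_fmt, extended_lba_size() and rw_block_xfer_size() are
hypothetical stand-ins for the driver's namespace fields and R/W helpers;
only the special case named in this message (PRACT = 1, PI enabled,
extended LBA, exactly 8 bytes of metadata) is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified namespace format description. */
struct ns_fmt {
	uint32_t sector_size;  /* data bytes per block */
	uint32_t md_size;      /* metadata bytes per block */
	bool extended_lba;     /* metadata interleaved with data */
	bool pi_enabled;       /* end-to-end protection information */
};

/* Bytes per block carried in the host data buffer. */
static uint32_t
extended_lba_size(const struct ns_fmt *ns)
{
	return ns->extended_lba ? ns->sector_size + ns->md_size : ns->sector_size;
}

/* Per-block transfer size for a R/W command. With PRACT = 1, PI,
 * extended LBA and exactly 8 bytes of metadata, the host does not send
 * the metadata, so only the data portion is transferred. */
static uint32_t
rw_block_xfer_size(const struct ns_fmt *ns, bool pract)
{
	if (pract && ns->pi_enabled && ns->extended_lba && ns->md_size == 8) {
		return ns->sector_size;
	}
	return extended_lba_size(ns);
}
```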
Change-Id: I0b383a58c773cac06e6c018858b57129064c6059
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These were repeated a few different places, so pull them into a common
header file.
Change-Id: Id807fa2cfec0de2e0363aeb081510fb801781985
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This removes one addition from the submission path (negligible, but a
nice side effect), but also opens up the possibility of reporting the
total time an I/O took - since we are always tracking the submission
time anyway, there is no extra cost to report it in the completion
callback.
Change-Id: I7129e7c09d20da8082042a7622d045846461dd9c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The symptom is that we cannot shut down the nvmf tgt.
The solution is to adjust the shutdown order of the
nvmf tgt subsystem and the RDMA transport layer.
Change-Id: Ie39657370b1574960e0ee7cf604cc5872db0bed3
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This function parses in place by inserting null terminators.
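As a hypothetical illustration of in-place parsing, the sketch below
splits "host", "host:port" or "[v6addr]:port" by writing '\0' bytes into
the caller's buffer, so the host and port outputs are non-const pointers
into that buffer. parse_host_port() is an assumed name, not the actual
function changed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits ip in place. On success, *host and *port point into ip
 * (*port is NULL if no port was given). Returns 0 or -1. */
static int
parse_host_port(char *ip, char **host, char **port)
{
	char *p;

	*port = NULL;
	if (ip[0] == '[') {
		/* Bracketed IPv6 address: "[addr]" or "[addr]:port" */
		p = strchr(ip, ']');
		if (p == NULL) {
			return -1;
		}
		*p++ = '\0';
		*host = ip + 1;
		if (*p == '\0') {
			return 0;
		}
		if (*p != ':') {
			return -1;
		}
		*port = p + 1;
		return 0;
	}

	*host = ip;
	p = strchr(ip, ':');
	if (p != NULL) {
		if (strchr(p + 1, ':') != NULL) {
			return -1; /* unbracketed IPv6 is not handled in this sketch */
		}
		*p = '\0';
		*port = p + 1;
	}
	return 0;
}
```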
Change-Id: I61cb97b87ec05d0183fbaa993fd3d7580a188bde
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Split the ref_count field of the bottom level of the vtophys map tree to
a separate array so that the pfn_2mb field can be expanded to a full 64
bits again. This doesn't change behavior for the current use as a page
frame number; it is setup work for storing an arbitrary 64-bit pointer
value in the bottom level.
Change-Id: I0bc44df3edc9df4a479229d69c2f3884d43a340d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Since the io_channel will be passed to the underlying bdev's
read/write/... functions later, we need to also acquire an io_channel
for the underlying bdev, not for the virtual bdev.
Change-Id: Ica13076973fef875ea636770fce8eb27017aa1c3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These strings are not modified by the functions they are passed to, so
they can be const char *.
Change-Id: I11532f232990a305d706c14aac1b0f8f93b8f576
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For infinite timeout states, interpret UINT64_MAX as "no timeout"
instead of printing it as a decimal number.
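A minimal sketch of the formatting rule, assuming timeouts are kept in
milliseconds; format_timeout_ms() is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* UINT64_MAX is the "infinite" sentinel and is shown as "no timeout"
 * rather than as an 18446744073709551615 ms value. */
static int
format_timeout_ms(char *buf, size_t len, uint64_t timeout_ms)
{
	if (timeout_ms == UINT64_MAX) {
		return snprintf(buf, len, "no timeout");
	}
	return snprintf(buf, len, "%" PRIu64 " ms", timeout_ms);
}
```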
Change-Id: I579f5857f96286734940ab5f493261e60354c4fe
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The PCIe transport initializes the quirks directly, so the generic hook
to get PCI ID is no longer necessary. This path was dead code.
Change-Id: I25bdaa598db53e4312a264d9d8356d1b416696e5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The logic to fail queue pairs when the controller is failed should be
handled in the generic code, not in the individual transports.
This also allows nvme_qpair_fail() to be private to nvme_qpair.c.
Change-Id: I6194576dceb35073b9af8847e59314900028637c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Avoid a runtime check for the rte_ring type - we know that the event
ring is multi-producer/single-consumer at compile time.
Change-Id: I5d42aee9c635db86e545b661361a68818d80961d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Avoid accessing the internals of the bdev_io from outside of the bdev
library.
Change-Id: I01dfc38b2520353ad42bcd8587b90f197eadf101
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is more CPU efficient than only grabbing one
completion per call to ibv_poll_cq.
Change-Id: I0c70d33639f0f345482d9e7c810f9c6723937058
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This virtual block device takes an underlying block device and splits it
into several smaller equal-sized block devices.
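A sketch of the address arithmetic such a split device needs, assuming
each split covers base_blocks / split_count consecutive blocks and any
remainder at the end of the base device is left unused. struct
split_part, split_layout() and split_remap() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one split of a base device. */
struct split_part {
	uint64_t offset_blocks;  /* first base-device block of this split */
	uint64_t num_blocks;     /* size of this split in blocks */
};

/* Compute the layout of split idx out of split_count equal parts. */
static int
split_layout(uint64_t base_blocks, uint32_t split_count, uint32_t idx,
	     struct split_part *out)
{
	uint64_t per_split;

	if (split_count == 0 || idx >= split_count) {
		return -1;
	}
	per_split = base_blocks / split_count;
	if (per_split == 0) {
		return -1;
	}
	out->offset_blocks = (uint64_t)idx * per_split;
	out->num_blocks = per_split;
	return 0;
}

/* Translate an I/O on a split into a base-device LBA, rejecting I/O
 * that would run past the end of the split. */
static int
split_remap(const struct split_part *part, uint64_t lba, uint64_t count,
	    uint64_t *base_lba)
{
	if (lba >= part->num_blocks || count > part->num_blocks - lba) {
		return -1;
	}
	*base_lba = part->offset_blocks + lba;
	return 0;
}
```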
Change-Id: I6f6e686c1177b2e4885f7e88809ad329caae55bd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These were only intended for testing and should be replaced by a virtual
blockdev that can be layered on top of any kind of bdev.
Change-Id: I3ba2cc94630a6c6748d96e3401fee05aaabe20e0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The work item queueing code was replaced with the current reactor/event
model, but the block comment above _spdk_reactor_run() wasn't updated to
match. Replace the pseudo-code with something resembling the current
behavior, and delete the outdated paragraph below it.
Change-Id: If0686c6a5d063f56d8ea3df9bf3a1e98eef40207
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Repeatedly adding and deleting the same subsystem
causes the add to fail. For example:
1. Add subsystem A
2. Delete subsystem A
3. Add subsystem A (fails in this step)
The cause is that we did not correctly free
the listener resources of subsystems; this patch
fixes the issue.
Change-Id: I6765a306a3f10c9a0f38c95dbba12e2a4073e705
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Detect whether the specified DPDK directory contains static or shared
libraries, and use the appropriate extension when building the library
list. Static libraries are still preferred.
Change-Id: I78c68fd38fba1ea42dd605fb77209651f8cdca75
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The $(ENV_LIBS) variable was including system library linker arguments
like '-ldl', but $(ENV_LIBS) is intended to be used as a dependency for
other Makefile targets, and those arguments don't belong there.
Add the system library linker arguments to ENV_LINKER_ARGS instead.
Change-Id: I247264d287047f1423365806042982b492eec311
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, in some exceptional cases we did not acknowledge
the event obtained via rdma_get_cm_event. As a result, the
code could block in this statement:
rdma_destroy_id(rqpair->cm_id);
This patch fixes the issue.
Change-Id: Iddb6fb5356a5ee0ed04e261a040ba53042fca302
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This makes sure that a qpair failure can be initiated from the upper-level application.
Change-Id: I7e04fe36929cc634ddf0078db96fbc40afb38f8c
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Instead, check them every 5 iterations by default.
Change-Id: I9c42922868f8e965a0c801109e59e06aff5adf62
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Simplify and remove a direct call to a DPDK function.
Change-Id: I08eaf86a48df67e3248eeaa764ae924b784d9277
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Store each reactor's per-socket event mempool in the spdk_reactor
structure to avoid calling rte_lcore_to_socket_id() on every iteration,
and make the function definition an internal, inlineable version
that takes the reactor pointer directly.
Change-Id: I841f7d7594308d7c572f5b7f609913c428bd13d7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Minimize the number of times spdk_get_ticks() is called
because it is expensive.
Change-Id: I2f34ca724ec28f42866b76d224dacbe1f31e7a41
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
64k sessions over the lifetime of a single target is something
that really could happen, so handle this case.
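Assuming the concern here is exhausting the 16-bit controller ID
(cntlid) space, a wrap-around allocator could look like this sketch.
The valid range 1 through 0xFFEF is an assumption of the sketch, as are
all of the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_CNTLID 0x0001
#define MAX_CNTLID 0xFFEF

struct cntlid_alloc {
	uint16_t next;               /* next candidate ID */
	bool in_use[MAX_CNTLID + 1]; /* indexed by cntlid */
};

static void
cntlid_init(struct cntlid_alloc *a)
{
	for (uint32_t i = 0; i <= MAX_CNTLID; i++) {
		a->in_use[i] = false;
	}
	a->next = MIN_CNTLID;
}

/* Returns a free ID, wrapping after MAX_CNTLID and skipping IDs still
 * held by live sessions, or 0 if every ID is in use. */
static uint16_t
cntlid_get(struct cntlid_alloc *a)
{
	for (uint32_t tries = 0; tries < MAX_CNTLID - MIN_CNTLID + 1; tries++) {
		uint16_t id = a->next;

		a->next = (id == MAX_CNTLID) ? MIN_CNTLID : id + 1;
		if (!a->in_use[id]) {
			a->in_use[id] = true;
			return id;
		}
	}
	return 0;
}

static void
cntlid_put(struct cntlid_alloc *a, uint16_t id)
{
	a->in_use[id] = false;
}
```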
Change-Id: Iaed92b9ff6cd078fcd7c1efe88cf0c860c77c4ac
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
For iSCSI read/write commands with an expected_data_xfer_len
of 0, dxfer_dir is set to SPDK_SCSI_DIR_NONE, but the SCSI
layer can still see a read/write operation.
This patch fixes this issue.
Change-Id: I950e163fffb06fefaf8a913d1f6de29c96a52264
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
g_thread_mmio_ctrlr should not be a NULL pointer when the handler
function is entered.
Change-Id: I45dba601c672b16e2c6feafd9059bafde0d8f1b4
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
If a namespace is formatted with per-LBA metadata and end-to-end protection
disabled, the host could not use the per-extended-LBA metadata area.
Signed-off-by: Zhihao Zhang <thomas.zzh@alibaba-inc.com>
If the user asked for a specific PCI address in spdk_nvme_probe(), we
need to return 1, not 0, for the other PCI addresses that don't match
when enumerating. 0 means to attach the PCI driver, whereas 1 means to
continue enumerating.
With the previous behavior of returning 0, all NVMe devices would be
attached to the DPDK PCI driver, even if the user did not request that
they be probed, and further calls to spdk_nvme_probe() would not find
any devices.
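The 0/1 return convention described above can be sketched as a filter
callback. struct pci_addr, struct probe_filter and enum_cb() are
hypothetical; the real enumeration callback has a different signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PCI address (domain:bus:device.function). */
struct pci_addr {
	uint32_t domain;
	uint8_t bus, dev, func;
};

struct probe_filter {
	bool has_addr;          /* user asked for one specific device */
	struct pci_addr addr;
};

static bool
pci_addr_equal(const struct pci_addr *a, const struct pci_addr *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->dev == b->dev && a->func == b->func;
}

/* 0 = attach the driver to this device, 1 = skip it and keep enumerating. */
static int
enum_cb(const struct probe_filter *f, const struct pci_addr *dev)
{
	if (f->has_addr && !pci_addr_equal(&f->addr, dev)) {
		return 1; /* not the requested device: leave it unclaimed */
	}
	return 0;
}
```

Returning 1 for the non-matching devices leaves them available to later
probe calls.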
Change-Id: Ifbbcd7d1abe8ab535b6957855172e66a3e69fbe4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is not actually optional - it contains required
information for setting up the connection.
Change-Id: I21136de12794a0f4f5c14c5d3e2e3f2306c5c102
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This isn't used anywhere yet, but it will be for
NVMe-oF 1.1.
Change-Id: Ieae0688e6ad5b7a44568e5760382b5716b02e6f0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The code doesn't actually use this property of cntlid
for anything yet, but we will need it later.
Change-Id: I5fd514d75b903cc8769e7b9f196a4624e9cf876c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is necessary to process asynchronous events, as well as keep-alive
support for NVMe over Fabrics connections.
Based on a patch by Edward Yang <eyang@us.fujitsu.com>
Change-Id: I3e81f3d5061f75b12b625fa1a06629c6dc3dc61b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There is only a single device ID for all channels on the SKX
implementation of I/OAT.
Change-Id: I90ee79b1b673a199754f1ca4c9e38e934294e261
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This prevents the need for bdev users and modules to manipulate the
internal bdev_io error.nvme fields.
For now, all non-NVMe error types are treated as a generic device error,
but translation from SCSI to NVMe could be added in the future.
Change-Id: I4e831b26a2f41bf2f405c7576d5019bb898d4d1b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
We currently use the PCI functions provided by DPDK,
which identify the device by class ID rather than
by PCI BDF, so add filtering by pci_addr in the
pcie_nvme_enum_cb function.
Change-Id: I5942e98853f00fc10fa6aae5c113517653d1b357
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Since nvme_ns_cmd.c now walks the SGL, some of the test code
needs to also be updated to initialize and return correct values
such as ctrlr->flags and sge_length.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I521213695def35d0897aabf57a0638a6c347632e
Convert the number parsing function into a linear sequence with a goto
label for each state, rather than a single loop with a state variable.
This makes the code easier to read and also improves speed (better
branch prediction and smaller inner loops for the common case).
On my test system, jsoncat citylots.json > /dev/null improves from
~1.7s to ~1.2s.
This changes behavior of some number parsing test cases: inputs matching
the number grammar as defined by JSON will be returned even if there is
trailing garbage, consistent with the rest of the parser. For example,
the input 01 will be parsed as a valid number 0 followed by trailing 1.
This only makes any difference when the full input is a single
number value, since if the value was nested in an object or array, the
trailing garbage will not match the expected syntax and the whole parse
will fail with SPDK_JSON_PARSE_INVALID (e.g. [00 will parse the first 0
as a number and then fail on the second 0, since only a comma or right
square bracket would be accepted).
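A sketch of the goto-per-state structure, written from the JSON number
grammar rather than taken from the SPDK parser. scan_json_number() is
hypothetical and only measures the length of the number without
converting it. As described above, "01" stops after the first 0 and
leaves the trailing 1 for the caller to reject.

```c
#include <assert.h>
#include <stddef.h>

static int
is_digit(char c)
{
	return c >= '0' && c <= '9';
}

/* Grammar: -? (0 | [1-9][0-9]*) (.[0-9]+)? ([eE][+-]?[0-9]+)?
 * Returns the byte length of the number at the start of [start, end),
 * or -1 if the input does not begin with a valid number. */
static ptrdiff_t
scan_json_number(const char *start, const char *end)
{
	const char *p = start;

	/* Optional minus sign */
	if (p != end && *p == '-') {
		p++;
	}

	/* Integer part: a lone 0, or a nonzero digit followed by digits */
	if (p == end) {
		return -1;
	}
	if (*p == '0') {
		p++;
		goto frac;
	}
	if (!is_digit(*p)) {
		return -1;
	}
	while (p != end && is_digit(*p)) {
		p++;
	}

frac:
	if (p == end || *p != '.') {
		goto exp;
	}
	p++;
	if (p == end || !is_digit(*p)) {
		return -1;
	}
	while (p != end && is_digit(*p)) {
		p++;
	}

exp:
	if (p == end || (*p != 'e' && *p != 'E')) {
		goto done;
	}
	p++;
	if (p != end && (*p == '+' || *p == '-')) {
		p++;
	}
	if (p == end || !is_digit(*p)) {
		return -1;
	}
	while (p != end && is_digit(*p)) {
		p++;
	}

done:
	return p - start;
}
```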
Change-Id: Ifabfaed611219b3e0a06c8677190a28b87e8a13b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This improves output speed significantly, especially if the write
callback is expensive (e.g. issues a syscall or takes a lock).
On my test system, jsoncat citylots.json > /dev/null improves from
~2.8s to ~1.7s.
citylots.json: https://github.com/zemirco/sf-city-lots-json (~181 MiB)
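A minimal sketch of buffering in front of a write callback, assuming a
fixed 4 KiB buffer. struct buf_writer, bw_write(), bw_flush() and the
counting callback are hypothetical and not the SPDK JSON writer API.
Small writes are copied into the buffer, and the callback runs only
when the buffer fills or on an explicit flush; writes larger than the
buffer bypass it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*write_cb)(void *ctx, const void *data, size_t size);

#define WBUF_SIZE 4096

struct buf_writer {
	write_cb cb;
	void *ctx;
	size_t len;
	char buf[WBUF_SIZE];
};

static void
bw_init(struct buf_writer *w, write_cb cb, void *ctx)
{
	w->cb = cb;
	w->ctx = ctx;
	w->len = 0;
}

static int
bw_flush(struct buf_writer *w)
{
	int rc = 0;

	if (w->len > 0) {
		rc = w->cb(w->ctx, w->buf, w->len);
		w->len = 0;
	}
	return rc;
}

static int
bw_write(struct buf_writer *w, const void *data, size_t size)
{
	if (size > sizeof(w->buf) - w->len) {
		if (bw_flush(w) != 0) {
			return -1;
		}
		if (size > sizeof(w->buf)) {
			return w->cb(w->ctx, data, size); /* too large to buffer */
		}
	}
	memcpy(w->buf + w->len, data, size);
	w->len += size;
	return 0;
}

/* Example callback that only counts invocations and bytes. */
struct counter {
	size_t calls, bytes;
};

static int
count_cb(void *ctx, const void *data, size_t size)
{
	struct counter *c = ctx;

	(void)data;
	c->calls++;
	c->bytes += size;
	return 0;
}
```

In this sketch, 1000 writes of 7 bytes reach the callback only once
before the final flush, instead of 1000 times.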
Change-Id: I7d411ce92366712ed87ad5fc6e9b64828541db4d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If a blockdev module calls spdk_bdev_io_complete() within its
submit_request function, and the user's completion callback issues a new
I/O, it is possible to cause infinite recursion, consuming all available
stack space.
To avoid this, track whether a bdev_io is being processed by
submit_request, and if io_complete() is called in this case, defer the
completion via an event.
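A minimal single-threaded sketch of the idea, using hypothetical names. An `in_submit` flag marks an I/O that is still inside its module's submit path. A completion that arrives during that window is pushed onto a deferred list instead of running the user's callback on the same stack. The list stands in for the event the commit uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct io;
typedef void (*io_cb)(struct io *io, void *ctx);

struct io {
	bool in_submit;			/* true while the backend's submit runs */
	io_cb cb;
	void *cb_ctx;
	struct io *next_deferred;
};

static struct io *g_deferred_head;	/* stand-in for an event queue */

static void io_complete(struct io *io)
{
	if (io->in_submit) {
		/* Completing inline could recurse if cb submits a new I/O. */
		io->next_deferred = g_deferred_head;
		g_deferred_head = io;
		return;
	}
	io->cb(io, io->cb_ctx);
}

/* Hypothetical backend that completes synchronously inside submit. */
static void backend_submit(struct io *io)
{
	io_complete(io);
}

static void io_submit(struct io *io)
{
	io->in_submit = true;
	backend_submit(io);
	io->in_submit = false;
}

/* Event loop step: run deferred completions from a fresh stack frame. */
static void poll_deferred(void)
{
	while (g_deferred_head != NULL) {
		struct io *io = g_deferred_head;

		g_deferred_head = io->next_deferred;
		io->cb(io, io->cb_ctx);
	}
}

/* Demo callback that resubmits the I/O and tracks callback nesting depth. */
struct resubmit_ctx { int remaining; int depth; int max_depth; };

static void resubmit_cb(struct io *io, void *ctx)
{
	struct resubmit_ctx *c = ctx;

	if (++c->depth > c->max_depth) {
		c->max_depth = c->depth;
	}
	if (--c->remaining > 0) {
		io_submit(io);		/* new I/O issued from the completion callback */
	}
	c->depth--;
}
```

Even with 100000 chained resubmissions, the callbacks never nest more than one level deep.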
Change-Id: I6ccdb8ed4ee0d5738e6c9840d35431de52bd5fa2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, we only supported probing the NVMf target
via discovery info; now we can also connect to it
directly.
Change-Id: I08ce1d95de6744286357e68b48c97b773b902ac8
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
I do not see any reason not to use this channel. If there is one,
it should be explained in a comment in the file; otherwise, we need to add it.
Change-Id: I56ad491c67a23831befc8c761ad0a02e721a15a4
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Because of the addition of io_channel support to the bdev layer, there
is no longer a need to re-run a completed I/O through the submission
event pipeline; it can be freed directly.
Change-Id: I2b9163c87293345acf0e85f6d0c1032f30209659
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Include timer-based pollers in the active/idle check that uses
last_action to determine when a reactor last executed an action.
Change-Id: Ib8f1253675b57aeb59206d099c6257f6d07f5acf
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
One microsecond is not really long enough to detect an idle condition
where calling the OS usleep() makes sense. Increase the minimum time
spent spin-waiting on events and pollers from one microsecond to one
millisecond.
Change-Id: I678118e357330f133251f4cfada8ff27e10158a5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
When a connection enters full-feature phase and is assigned to an lcore,
we need to increment the counter for the new lcore, not the connection's
existing lcore.
Change-Id: Idced4090b6e8ac35a767fd223fbd81ba824615d3
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Claim the block devices used by iSCSI LUNs and NVMe-oF subsystems so
they can't accidentally be reused.
This will also be used by virtual block devices to allow layering of
bdevs.
Change-Id: I5384923fbf24f13f4ce720a797c5a628053d49f4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
(1) Add nvme_rdma_build_sgl_request function
(2) Merge nvme_rdma_pre/post_copy_mem into nvme_rdma_copy_mem
Change-Id: I86abab821b32b4da0aa9489a6b9f7dc430333159
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Use a plain function pointer + callback context for the bdev I/O
completion callback. This is possible now because each I/O channel will
be polled on the core that submitted the I/O.
Change-Id: I29ee8e4a3430df11c74845adab840395b9bc5010
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
An old prototype SPDK AHCI driver would return
TASK_SET_FULL if all NCQ slots were full on a given
disk. This would kick the SCSI task back to the LUN
to be retried later. Since then, we have pushed
responsibility onto the bdev modules themselves
to handle this kind of queueing/retry logic.
Removing this logic allows us to make some additional
changes that enable tasks to get completed inline without
an extra event callback to handle completion. We also
no longer need to worry about checking if pending tasks
need to be executed in the complete_task() routine, since
the execute() routine will now always exhaust the pending_task
list.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If2dc3ab017e0dbc225c8f627e1f87c5a8e9b1e3e
Now that the hotplug code is isolated in nvme_pcie.c, it can call the
PCIe transport attach function directly.
Change-Id: I2df3b9168473b537cc9b13367e06d3d3b6fa22be
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The reactor structures are allocated in a contiguous array, and each
reactor is accessed from a different core, so align the reactor
structure to avoid false sharing.
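A minimal C11 sketch, assuming a 64-byte cache line. That size is common on x86 but not universal. `struct reactor_sketch` is hypothetical. It shows how aligning the element type also pads each array slot, so every slot gets its own cache line. GCC/Clang's `__attribute__((aligned(64)))` is an equivalent alternative to `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64	/* assumption: 64-byte cache lines */

/* Hypothetical per-core state. Aligning the first member raises the struct's
 * alignment, and sizeof rounds up to a multiple of it, so adjacent array
 * elements never share a cache line. */
struct reactor_sketch {
	alignas(CACHE_LINE_SIZE) uint64_t events_processed;
	uint32_t lcore;
};

static struct reactor_sketch g_reactors[4];	/* one slot per core */
```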
Change-Id: I95162620ccb58fae060b2d95e47a38621dfbd140
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is private to lib/event/reactor.c and does not need to be exposed in
the global namespace.
Change-Id: Idfff0365a0afdd90a0567825d520adf61d99fd2b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, we did not maintain the reference count for the LUN.
Change-Id: If2b7bc7d129e7efd994a7987ae2c421048969acb
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
An SGE could be for a payload that is greater than the NVMe
device's MDTS (e.g. 128 KiB), but that SGE may not be aligned
on a sector-size boundary. We can safely assume that each
iov is individually physically contiguous - the DPDK
mempools for example guarantee this.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8143ed01814c3154d0a06b8bbc548484437c1e88
The spdk_nvme_qpair::num_entries value is never used in the common code,
so move it to the individual transport qpairs to make it clear that it
is a transport-specific implementation detail.
Change-Id: I5c8f0de4fcd808912ba6d248cf5cee816079fd32
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The 'next' event pointer was never used in the entire code base (always
NULL).
Change-Id: I75f999d3a2e10512d86edec1a5a46ef263e2635b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use 'struct spdk_event *' directly for consistency with the rest of the
API.
Change-Id: Ib41a9bf47f5b18f4aebf5f4dee055455cb12ef7d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This allows the elimination of the spdk_event_get_arg1() and
spdk_event_get_arg2() macros, which accessed the event structure
directly; this was preventing the event structure definition from being
moved out of the public API header.
Change-Id: I74eced799ad7df61ff0b1390c63fb533e3fae8eb
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The public API user is supposed to retrieve the defaults via the
spdk_app_opts_init() function.
Change-Id: Ie2bd6e809b2d47dbd5d62d396e8715f89f4052d9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The spdk_poller_register() function provides a way to pass an event to
call once the poller is registered, but it is always NULL in the current
code base.
Change-Id: I459bf40ae4d050589577d113b7984f1563aaa9cc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The event->next field can be accessed directly from within the event
library implementation, and public API users should not be using it.
Change-Id: I98a1f0017e03e951d0c4eee3c7989b04324e57d1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is only used within bdev.c and can be static.
Change-Id: Id6e2cd9e5dd61a3ef1e1a27993d7a5ea7728bff2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is consistent with the other internal-only API headers.
Change-Id: I2c4748977d38a6c173311d26197d6273c168da7d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The definition of SPDK_UNREACHABLE uses the build-time DEBUG definition,
which is not available in the public API.
Change-Id: I1862c99fa5c85ccd3483f94e9c35de531da57f3c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Instead of passing the work completion, just pass the
response index. This keeps the work completions localized
to the polling function.
Change-Id: I0e6a1d8564200b5ac3aa43dfd58ae152d439bbd8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This eliminates an if statement, since the two callers
of this function know the desired queue size.
Change-Id: I28fabac8613f7b8fc7d96cf95b085b6e4dcf985f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Just call the regular qpair create function instead.
Change-Id: Ic35b1eb6fcdf0d82733ea573a493f583dd63d5bd
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Use the num_entries value in the generic qpair instead. These
values had to match anyway.
Change-Id: Ia6400fbaba97df3ef6db4dc07a2ab95af1e5143f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change the reactor name, replacing spaces with underscores.
File names containing spaces are discouraged on Linux, and when a
reactor crashed, the core dump file name had a space in it.
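A minimal sketch of the renaming. `spaces_to_underscores()` and the `reactor %u` input format are hypothetical. The helper replaces every space with an underscore so the resulting name is safe to embed in a core dump file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: replace every space in name with an underscore. */
static void spaces_to_underscores(char *name)
{
	for (char *p = name; *p != '\0'; p++) {
		if (*p == ' ') {
			*p = '_';
		}
	}
}
```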
Change-Id: Iba36ba7903c95db09a9decbc023a01e5e6ab18b4
Signed-off-by: Liang Yan <liang.z.yan@intel.com>
Avoid an extra level of pointer chasing when we are filling out the NVMe
SGL.
Change-Id: I1a40af16fda80f7480c419524876bfb1a1902eb8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will allow it to be better reused by some future patches
enabling splitting of non-PRP-compliant SGL-based requests.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ica38fd6cf191f72baa524bcc4896b3c9939ab762
This intermediate function is no longer needed.
Change-Id: I3523cc6d8f3b290165a953d42cca8b76eda762c5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Sending the fabric connect command is part of establishing
a connection, so move it into the main connection-establishing
function.
Change-Id: I55e7ffdd16b576c81b51d7d3910203f9afc1f4c2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This function initializes the members of an existing
qpair struct. It doesn't construct one from scratch.
Change-Id: I0b9afac1ad25cfb217efd146702f693c74f5f697
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
No need to allocate all of the requests and responses until
we know a connection can be established.
Change-Id: I072a10aadfd7ced773634448f7d7e788622d0a4c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The code is clearer if this function is incorporated
into its only caller.
Change-Id: I33901cddf80ae27896b2acfd1b9e7d212f21f5f3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is resolving the address and route to the target, not
binding a socket to an address.
Change-Id: I80055481ed2e020410a1e186a4e7371b60faaee9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
They were very close to the same already, so finish the job.
Change-Id: Ifba9e3b2d11a3e70cbfbe46f57a67552db2757ed
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
We should be sending the bounce buffer's remote key to the target so it
can put it into an RDMA SGE on the remote side.
Change-Id: Icded155ad2292c67baa722f001c9c07178bc2754
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There is no particular reason for this to be 127; make it 128 to at
least be consistent with the PCIe transport.
Change-Id: I60500e0044d3549ba6350e1f35f09d624848bd21
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This function was only called from one place and saved no
lines of code.
Change-Id: If5e653732df57c1f2c93e20cf4f286eac31df91c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This matches the behavior of nvme_pcie, which queues a request if no
tracker is available.
Change-Id: Idbf6c951c89451cfea22ec6bc553ff46f988f818
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make the caller pre-allocate an rdma_req and change req_init() so it
only does initialization, not allocation.
This is necessary to distinguish between rdma_req allocation failure and
other types of failures, which will become important in future patches
when requests will be queued if rdma_req allocation fails.
Change-Id: Ie6edebc1b5f05001b42fc959a29ce0ea6875e41e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Simplify the control flow and match the name of the function to its
purpose.
Change-Id: I65bad7e3b2ef710ca29eff9799b8dcaae3999315
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make the qpair construct functions private to the transports; they
don't need to be called from generic code.
Change-Id: I5f730a4bcf60ce231fe27bc8f4c3c39cb647dd2d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a transport callback to return the maximum queue size, and enforce
it in the generic nvme_ctrlr layer.
This allows the user to tell what io_queue_size was actually selected by
the transport via the ctrlr_opts returned during attach_cb.
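A minimal sketch of the enforcement step, using hypothetical names: `struct ctrlr_opts_sketch`, the `get_max_io_queue_size` callback, and `clamp_io_queue_size()` stand in for the real transport callback and `ctrlr_opts`. The generic layer asks the transport for its limit and writes the clamped value back into the options. The caller can then read back the value that was actually selected.

```c
#include <assert.h>
#include <stdint.h>

struct ctrlr_opts_sketch {
	uint32_t io_queue_size;
};

/* Hypothetical transport ops table with a "max queue size" callback. */
struct transport_sketch {
	uint32_t (*get_max_io_queue_size)(void);
};

static uint32_t example_transport_max(void)
{
	return 128;	/* hypothetical transport limit */
}

/* Generic layer: enforce the transport's limit in the user-visible opts. */
static void clamp_io_queue_size(const struct transport_sketch *t,
				struct ctrlr_opts_sketch *opts)
{
	uint32_t max = t->get_max_io_queue_size();

	if (opts->io_queue_size > max) {
		opts->io_queue_size = max;
	}
}
```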
Change-Id: I8a51332cc01c6655e2a3a171bb92877fe48ea267
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Equivalent to commit 6ab28a201b except now
for commands instead of responses.
Change-Id: Ibe4382dc0f65c1b90c2cee2ad285bbdd21b96a89
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The only field of bb_sgl that was actually used is lkey, and that is
already stored in bb_mr.
Change-Id: I790369a06ce223f88e356df20a9d9a74a93ff225
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This also changes the default listen address from 0.0.0.0 (accept any
connection) to 127.0.0.1 (accept only connections from the local host).
Change-Id: I3de09c582c95126d240795550a56be7aedea639c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Generate the full discovery log page in a memory buffer, then copy just
the requested part of it for each Get Log Page call.
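A minimal sketch of the copy step, assuming the whole log page has already been generated into `page`/`page_len`. `copy_log_page_range()` is hypothetical. Zero-filling requested bytes past the end of the generated page is an assumption of this sketch, not something the message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the [offset, offset + dst_len) window of a pre-generated page. */
static void copy_log_page_range(const uint8_t *page, size_t page_len,
				uint64_t offset, uint8_t *dst, size_t dst_len)
{
	size_t copy_len = 0;

	if (offset < page_len) {
		copy_len = page_len - (size_t)offset;
		if (copy_len > dst_len) {
			copy_len = dst_len;
		}
		memcpy(dst, page + offset, copy_len);
	}
	/* Zero any part of the request beyond the generated page. */
	memset(dst + copy_len, 0, dst_len - copy_len);
}
```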
Change-Id: I12730c59c0395cdac57aaab96337e938952e3011
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Refactor the discovery log page processing into a loop that calls a
function for each log page entry. This sets us up to add support for
multiple Get Log Page calls to handle larger discovery service lists.
Change-Id: I85676ada375d0dadda2a3f4ab6331123ac7aaf60
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Note that the offset is not actually used yet, just sanity checked.
Change-Id: I9464dc934e94e3d38ac0d474fce876552650f92b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This allows hosts to determine when the discovery log page has changed
when reading it across multiple Get Log Page calls.
Change-Id: I3c3459959c6246a88938e4f82e3e0046419e7d00
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This keeps the existing subsystem list (and therefore the discovery
service log page) in order when new subsystems are added dynamically.
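A minimal sketch using the BSD `<sys/queue.h>` macros, which glibc provides. It assumes the subsystem list is a `TAILQ`; `struct subsystem_sketch` and `add_subsystem()` are hypothetical. Appending at the tail keeps existing entries in creation order. Inserting at the head would shift every existing entry, and with it each entry's position in the discovery log page.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct subsystem_sketch {
	int id;
	TAILQ_ENTRY(subsystem_sketch) link;
};

static TAILQ_HEAD(, subsystem_sketch) g_subsystems =
	TAILQ_HEAD_INITIALIZER(g_subsystems);

static void add_subsystem(struct subsystem_sketch *s)
{
	/* Append so entries already in the list keep their positions. */
	TAILQ_INSERT_TAIL(&g_subsystems, s, link);
}
```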
Change-Id: I071639be0fef4139f8f017b433185c786ae55378
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This isn't used yet in the NVMe library, but it will be necessary later
for supporting non-IPv4 addresses.
Change-Id: I167ce63ad25b0e0c9aa192b12d764c8d078e67f9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This better describes what the field controls (it does not affect the
admin queue size).
Change-Id: I851ae46fb4ed0fce819af07ae235824e0fc817e6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For requests split in _nvme_ns_cmd_split_request(), the payload offset is
set after the children are created via recursive calls to _nvme_ns_cmd_rw().
This makes it impossible to reset the SGL to the proper offset in
upcoming patches that split non-PRP-compliant SGL requests.
To change this, the payload offset is now set in _nvme_ns_cmd_rw() right
after each request is allocated, not in _nvme_ns_cmd_split_request().
Change-Id: I9d3b2e3bbd9d93a4c8a37e1db8c4e01276e2cacb
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
This is preparation for handling non-PRP compliant SGL.
Change-Id: I445790f9802292971256cf821d9730814c95a073
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
This is preparation for handling non-PRP compliant SGL.
Change-Id: I49c3745498411c5ff9e17cd08f181d4d434c2d08
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Comply with the definition format used by other bdev
modules
Change-Id: Iac108bac540687b32fea4bb70374c22534c60aa0
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The vendor ID for Intel should be "INTEL", according
to the following page:
http://www.t10.org/lists/vid-alph.htm#VID_V
Change-Id: Ib9611e5604c8b5e3eaec8101548aaf4a3c45597a
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Don't leave garbage from previous discover entries in the trid we are
returning to the user.
Change-Id: I60ae5932db4a95cedb8df1ff98a2479220b55ce4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The other simplifications to probe_info and trid made the
trtype argument redundant.
Change-Id: Ie7bea4e2204e690dc4909eeacd065e0722b53272
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The probe_info was reduced to just containing a
transport_id, so remove probe_info entirely.
Change-Id: Ica9a22d126cd14e282decd3eea1a0afe0460f099
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This can be obtained by parsing traddr into a pci_addr,
then getting a handle to the pci_dev and asking for all
of the pci information.
Change-Id: I1948cbd3ec65611293192ef5558ace19dd444d4c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This function will return a device handle from a pci
address.
Change-Id: I323d92c71014ef571f3df9f19c2ec887844707e8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This bug was preventing multiple calls to spdk_nvme_probe() from
working, since the first call would return 0 from all of the DPDK driver
init callbacks and prevent other devices from ever being enumerated in
subsequent calls.
Reported-by: Tsuyoshi Uchida <tuchida@us.fujitsu.com>
Change-Id: I871aa170bbd03be111604eeabe3a7a7a4f40ce89
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the standard quirk mechanism to specify which devices
need software-assisted striping.
Change-Id: Id8156876a90b4caf9d687637e14c7ad4a66ceda6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The emulated NVMe controller exposed by virtual subsystems does not
provide the Intel vendor-specific commands and behaviors, so it should
not use the 0x8086 vendor ID.
Change-Id: Iab4f0513d30f610feb62b1899da1b6316f11691c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This way, all new controllers discovered will be initialized
in parallel.
Change-Id: Iebedb3905eb2787a3708f74425afae40ca31253d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
If the first call to spdk_nvme_probe probes a device and
the driver elects not to take it, still call the probe
callback for that device on subsequent calls to
spdk_nvme_probe.
Change-Id: If06467cf6796c827a0bbfba6e36d5b91534526fc
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Move this down a level so it happens on all paths.
Change-Id: Iea9913f0e102353882466c8dea4ee39abb857520
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Scanning the transport may result in both new
devices and removed devices, so pass the callback
for both operations.
Change-Id: I6f73dbe6fd7cf61575c354b43f8ae3e2a01e2965
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Simplify the arguments to nvme_transport_ctrlr_scan to take
a transport id that identifies the discovery service (or
NULL to scan PCIe).
Further, separate scan into two functions - scan and attach.
Scan is for scanning an entire bus, attach is for a specific
device.
Change-Id: I464f351a02a04bc5a45096dcf5dc8fc5ac489041
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Instead of repeating the fields, just embed a transport_id.
Change-Id: I282704c9d59784abd5f7c93be4e47c673fcf6dde
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is a small step toward making discovery more like
scanning a local PCI bus.
Change-Id: Ie7149ad060f2eeb56939b1241187bdf09681f2aa
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This patch is preparation for adding readv/writev support
to nvme_rdma.
Change-Id: I25ff0df61d0346f22560d011158d7f80e72007ea
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
We cannot quit the process when the user has not logged out of the session,
because the active connection count is always greater than zero, so the user
cannot use Ctrl+C to quit the SPDK iSCSI target. Add a new connection state
to avoid destructing a connection more than once.
Change-Id: I8efa79aa47534bd6ead965713769f751d9802e47
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Drop the complicated buffer size/strlen math and just split the version
string formatting into two cases depending on whether the tertiary
version is set.
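A minimal sketch of the two-case approach. `format_version()` and its `major.minor` / `major.minor.tertiary` output format are hypothetical, since the message does not show the exact format. Each case is a single `snprintf()` call, so no manual length arithmetic is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: omit the tertiary component when it is zero. */
static int format_version(char *buf, size_t len, unsigned int major,
			  unsigned int minor, unsigned int tertiary)
{
	if (tertiary != 0) {
		return snprintf(buf, len, "%u.%u.%u", major, minor, tertiary);
	}
	return snprintf(buf, len, "%u.%u", major, minor);
}
```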
Change-Id: I4b4983cb8805e8734c408f473dd8c592ec8e8138
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The printf # specifier adds 0x for %x values, but the field width then
includes the 0x part, so for example printf("%#04x", 0x1) prints "0x01"
rather than the intended "0x0001".
Rather than increasing the field width, just manually insert the 0x in
the format string and drop # for less confusion.
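A small demonstration of the difference, using only standard `snprintf()`. `format_hex16()` is a hypothetical helper that writes both variants into separate buffers for comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format v with the '#' flag and with an explicit "0x" prefix. */
static void format_hex16(unsigned int v, char *alt, char *manual, size_t len)
{
	snprintf(alt, len, "%#04x", v);		/* width 4 includes "0x" */
	snprintf(manual, len, "0x%04x", v);	/* always 4 hex digits after "0x" */
}
```

The `#` flag adds the `0x` prefix only for nonzero values, so `%#04x` formats zero as `0000`. This makes the explicit prefix more predictable as well.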
Change-Id: Ie6044619a22b51b39562bfa5c0c0239933bf38c8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
NUMDU was added with NVMe 1.2.1 and allows a larger log page size to be
described.
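A minimal sketch of how a transfer size could be split across the two fields. It is based on my reading of the NVMe 1.2.1 Get Log Page layout: NUMD is a 0's based dword count, NUMDL occupies CDW10 bits 31:16, and NUMDU occupies CDW11 bits 15:00. `build_get_log_page_cdws()` is hypothetical, and the bit positions should be checked against the specification before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode LID and NUMD into CDW10/CDW11.
 * len_bytes must be a nonzero multiple of 4. */
static void build_get_log_page_cdws(uint8_t lid, uint32_t len_bytes,
				    uint32_t *cdw10, uint32_t *cdw11)
{
	uint32_t numd = len_bytes / 4 - 1;	/* 0's based dword count */
	uint32_t numdl = numd & 0xFFFFu;	/* lower 16 bits */
	uint32_t numdu = (numd >> 16) & 0xFFFFu;	/* upper 16 bits */

	*cdw10 = (numdl << 16) | lid;
	*cdw11 = numdu;
}
```

Only transfers larger than 65536 dwords (256 KiB) need a nonzero NUMDU.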
Change-Id: I1a4ac42393c1a21175b3564980d56b6e7a6ae80d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe over Fabrics transports should already be setting this in the
initial admin queue Connect command, so setting it again is not useful.
The kernel NVMe over Fabrics target additionally has a bug in the Set
Features - Keep Alive Timeout handler (it is extracting the KATO value
from the wrong offset in the command), so this works around the kernel
bug by not sending the Set Features command at all.
Change-Id: I0d7f09b71fcea116acf8810c5880157bb9315a04
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The reason is that the kernel nvmf target checks the
value; if it is not set, later commands will fail.
The kernel nvmf target checks the CC value even for
the discovery controller.
Change-Id: I998327f91ba96281d261952878eb84d648a823da
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
It's not the whole transport - it's just an enum for the
type of transport.
Change-Id: Ia435a21792f221ddf50ddf4f0923c6152622eccb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
When the ERL1 configuration is enabled, the DATAIN task release process
first queues the task to the SNACK list and then removes it from the list
when the ACK is received from the initiator, but in this path the
reference count of the primary task was not released correctly.
Change-Id: Ic5959cf644c74f676be0b84c5650292dc426b2d8
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Change it to match the spec so that we can test
against the kernel nvmf target.
Change-Id: Ica98dd40503a40c0f0de8efaefb1f6f67a89cde8
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change the PCI enumeration API to individual functions per device type
so that only the drivers that are actually in use get linked into the
final executable. All of the common code is still shared internally in
the env_dpdk library.
Change-Id: I2ba83afe59202a510f999a0674e23e60b6581221
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is not necessary, and it prevents the linker from removing unused
object files.
Fix the iscsi_tgt Makefile's library order so that env is added at the
end after the libraries that use it.
Change-Id: I241eb46703c12691444037a350be65143259e82e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add the following information:
- PCI Address
- Vendor ID
- Model Number
- Serial Number
- Firmware Revision
- NVMe spec version
- Namespace sector size
- Namespace total size
The user's remove_cb should detach the NVMe controller when it can
ensure that it is no longer in use. In the interim (between remove_cb
and spdk_nvme_detach()), the controller will remain in a failed state,
so any new I/O submissions will return an error code but not crash.
examples/nvme/hotplug is not yet updated for this change, but that will
be done in a separate patch.
Change-Id: I8827ba36f9688ccb734e7871f20f11ec11e88f96
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
While we're here, fix up typos and add error logs for all error exits
in nvme_rdma_qpair_connect().
Change-Id: I236fe6571c2012ca047aa8a447638d9227454c2f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This version of multi-process support requires DPDK 16.11 to be built in.
Change-Id: I3352944516f327800b4bd640347afc6127d82ed4
Signed-off-by: GangCao <gang.cao@intel.com>
The discover and probe 'nqn' fields are subsystem NQNs, so name them
subnqn to be consistent with the spec and the rest of the code and to
distinguish them from host NQNs.
Change-Id: I4a80fbc1f4b037c8a4f91c8f28d2a96e47c66c47
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Allow the host NQN to be overridden when connecting to NVMe over Fabrics
controllers.
Change-Id: I8fcf2e89ae7d9722677e834f76a8fe805c52f91b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This makes the function and file/line info actually useful (instead of
pointing to the helper function itself).
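A minimal sketch of the technique, with hypothetical names. The helper takes the caller's location as explicit arguments, and a macro wrapper supplies `__FILE__`, `__LINE__`, and `__func__` where it is expanded. The reported location is therefore the call site rather than a line inside the helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct log_site { const char *file; int line; const char *func; };

static struct log_site g_last_site;	/* records the most recent call site */

static void log_error_impl(const char *file, int line, const char *func,
			   const char *msg)
{
	g_last_site.file = file;
	g_last_site.line = line;
	g_last_site.func = func;
	fprintf(stderr, "%s:%d:%s: %s\n", file, line, func, msg);
}

/* Expands at the caller, so __LINE__ and __func__ describe the caller. */
#define LOG_ERROR(msg) log_error_impl(__FILE__, __LINE__, __func__, (msg))

static int caller_line;

static void some_caller(void)
{
	caller_line = __LINE__ + 1;
	LOG_ERROR("something failed");
}
```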
Change-Id: I22bac68827115880a49d456706a7eaecdc12e9b5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Each transport should handle its own qpair cleanup internally.
Change-Id: I7dd737be820ea6bad686f4aad7d74044fad58a47
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Let the transport access the controller options during
ctrlr_construct().
Change-Id: I83590c111e75c843685dd9315f0f08416168356d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_rdma_req_get() is an internal function, and its only caller already
checks for a valid rqpair, so the NULL check is unnecessary.
Also clean up the redundant STAILQ_EMPTY/STAILQ_FIRST logic and use
STAILQ_REMOVE_HEAD.
Change-Id: Ic3828e8b5e881879173cb59350e39c5fac90e6ef
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_rdma_pre_copy_mem() does not have any failure cases, so remove its
return value and remove the never-taken branch in its only caller,
nvme_rdma_qpair_submit_request().
Change-Id: I91011734ed0c20f8db691d62172fe1a3021dd3a1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_rdma_req_put() is an internal nvme_rdma.c function, and all of the
callers already have the rqpair, so pass it directly. We also already
verify that all of the callers have a valid rqpair and req before
calling nvme_rdma_req_put(), so it doesn't need to check for NULL
pointers.
This also means that spdk_nvme_rdma_req doesn't need to hold a pointer
to its rqpair anymore.
Change-Id: I893a46a9074f0a843e379d10c123f9292eb3b1a4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The only place where outstanding_reqs was checked was in
nvme_rdma_req_put(), but the error case there could only happen if some
kind of internal programming error occurred (e.g. calling
nvme_rdma_req_put() on an invalid request).
Change-Id: I71e40ce562a8720dfaf70437ffd4c6493327c091
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_rdma_ibv_send_wr_init() was only called in one place, so just move
its contents into nvme_rdma_qpair_submit_request() since it allows
simplification of the code:
- req was always NULL, so remove the code that used req entirely.
- wr and sg_list are never NULL, so remove the checks for those.
Change-Id: I12a4f3502219d3681607686945e343f6808c0d2f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
We currently don't handle discovery service referrals, so skip those, as
well as any other unknown subsystem type.
Change-Id: I64f889e9272fb57b5cf9bb5467b3abca3955baf5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
QEMU's virtual NVMe controller device does not support the AER Set
Feature, so ignore its failure and continue.
Change-Id: I8b5c217a3112edabb6f76ec3e5f4ef774981a1d7
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Catch SIGBUS and handle it by remapping new memory into the
location where the BAR previously was.
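A minimal Linux-only sketch of the technique, using hypothetical names; it is not the SPDK handler. A shared file mapping stands in for the BAR. Truncating the file makes later accesses raise SIGBUS, much as a surprise-removed device would. The handler then maps anonymous memory over the same range with `MAP_FIXED`, and the faulting access is retried against ordinary zeroed memory. `mmap()` is not on POSIX's async-signal-safe list, so this relies on Linux behavior.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *g_bar;		/* hypothetical: base of the mapped "BAR" */
static size_t g_bar_len;

static void sigbus_handler(int sig, siginfo_t *info, void *uctx)
{
	char *addr = info->si_addr;

	(void)sig;
	(void)uctx;
	if (addr < (char *)g_bar || addr >= (char *)g_bar + g_bar_len) {
		_exit(EXIT_FAILURE);	/* fault outside our mapping: give up */
	}
	/* Replace the dead mapping with zeroed anonymous memory in place. */
	if (mmap(g_bar, g_bar_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
		_exit(EXIT_FAILURE);
	}
}

/* Map a one-page temp file, "remove" it, then read through the mapping. */
static int read_after_surprise_removal(void)
{
	char path[] = "/tmp/bar_sketchXXXXXX";
	int fd = mkstemp(path);

	if (fd < 0) {
		return -1;
	}
	unlink(path);
	g_bar_len = (size_t)sysconf(_SC_PAGESIZE);
	if (ftruncate(fd, (off_t)g_bar_len) != 0) {
		close(fd);
		return -1;
	}
	g_bar = mmap(NULL, g_bar_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (g_bar == MAP_FAILED) {
		close(fd);
		return -1;
	}
	((volatile unsigned char *)g_bar)[0] = 0xAB;

	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, NULL);

	if (ftruncate(fd, 0) != 0) {	/* backing store disappears */
		close(fd);
		return -1;
	}
	close(fd);

	/* Raises SIGBUS; after the handler remaps, the read returns 0. */
	int v = ((volatile unsigned char *)g_bar)[0];
	munmap(g_bar, g_bar_len);
	return v;
}
```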
Change-Id: Ie8d00a60a0bbe7f7ec57a5c39c0a63c5d9443206
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
These functions will attach or detach from a PCI device. Attaching
typically means mapping the BAR.
Change-Id: Iaaf59010b8a0366d32ec80bb90c1c277ada7cfe7
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This patch makes sure the connection is in the normal state before any
further operation is performed on it.
Change-Id: I776740b5b33b1de6707990c09d9131c385adf556
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
spdk_nvme_probe frees ctrlr when nvme_ctrlr_process_init fails, but
ctrlr has already been freed by nvme_ctrlr_destruct, so
spdk_nvme_probe doesn't need to free ctrlr.
The generic NVMe library controller initialization process already
handles enabling the controller; the RDMA transport should not need to
set EN itself.
For now, the discovery controller is cheating and not using the normal
initialization process, so move the EN = 1 hack to the discovery
controller bringup until it is overhauled to use the full
nvme_ctrlr_process_init() path.
The previous code where CC.EN was set to 1 before going through the
controller init process would cause an EN = 1 to EN = 0 transition,
which triggers a controller level reset.
This change stops us from causing a reset during the controller
startup sequence, which is defined by the NVMe over Fabrics spec as
terminating the host/controller association (breaking the connection).
Our NVMe over Fabrics target does not yet implement this correctly, but
we should still do the right thing in preparation for a full reset
implementation.
This patch also reverts the NVMe over Fabrics target reset
handling hack that was added as part of the NVMe over Fabrics host
commit to its previous state of just printing an error message.
Change-Id: I0aedd73dfd2dd1168e7b13b79575cc387737d4f0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Most of the NOTICE level messages should have been TRACE.
Change-Id: Icbc4d398ab2580cf3a2349be11441b7a09603020
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Verify that qpair is not NULL before doing pointer math on it.
The NULL check after calling nvme_rdma_qpair(qpair) would not
trigger if qpair was NULL.
Fixes a crash if the Connect command failed, causing
nvme_rdma_ctrlr_create_qpair() to return NULL.
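A minimal sketch of why the order matters, using hypothetical structs. Suppose the generic qpair is embedded at a nonzero offset inside the transport's struct. Converting a NULL generic pointer then yields a non-NULL, invalid pointer (and is technically undefined behavior), so a NULL check placed after the conversion never fires. Checking the input first avoids this. The sketch uses `offsetof` in the style of a `container_of` macro.

```c
#include <assert.h>
#include <stddef.h>

struct qpair_sketch { int id; };

struct rdma_qpair_sketch {
	int transport_state;		/* places qpair at a nonzero offset */
	struct qpair_sketch qpair;	/* embedded generic qpair */
};

/* Convert a generic qpair to the transport-specific one, NULL-checking first. */
static struct rdma_qpair_sketch *to_rdma_qpair(struct qpair_sketch *qpair)
{
	if (qpair == NULL) {
		return NULL;	/* must happen before the subtraction below */
	}
	return (struct rdma_qpair_sketch *)((char *)qpair -
		offsetof(struct rdma_qpair_sketch, qpair));
}
```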
Change-Id: I158a5b1752892a7d5a72a9ac20c0c5b2cd781a81
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This allows us to print better error messages when connecting to a
subsystem that exists but does not allow a specific host.
Additionally, we can now return the correct error code for a host that
is not allowed.
Change-Id: I16cd4ac2745cf50bb54601b464b0d23954f86fda
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Official installs of DPDK place headers in a 'dpdk'
subdirectory under include, so detect that.
Change-Id: If64421c84c91cae31688994484c22fce398dc622
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The status.done flag polled by nvme_ctrlr_set_keep_alive_timeout()
was never initialized.
Change-Id: I323fae5f4ce12209a9699965ce07894bc3c6205a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The code in virtual.c and direct.c was identical - move it to session.c
to share it.
Change-Id: Ic6e4e9238e8ffacb212e76293c440109aa839f8c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move the current Virtual mode implementation to session.c and use it for
Direct as well.
Change-Id: I3f0ac93b4247b93d158b0dcb77e257b4b91be129
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Store the host identifier from the Connect command and report it via Get
Features.
Change-Id: I79bc27e05c5944549e7986aadb919c19748e7474
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Also return Invalid Field rather than Invalid Opcode to be more
accurate. The spec doesn't seem to define any more specific error code
for this case.
Change-Id: I992c6cca3020ff80b8495c71170222bc75316800
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
None of the log pages are actually implemented yet, but at the very
least, we don't want to leak random bits of uninitialized data.
Change-Id: Ic889260eb18d49122f2f250b645bdc5be3561dc5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the NVMe over Fabrics spec definitions for TRTYPE rather than the
internal library transport type.
Change-Id: Idead559a8f8d95274fc580d10e82033822e6eda8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These need to be available for the lifetime of the probe_info structure,
so they can't be pointing at e.g. temporary buffers on the stack.
Change-Id: I5aaa898acf9314aab51600dd756f966965d37fd0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
-Wformat-nonliteral needs to be disabled since clang triggers it on the
call to vsnprintf() now that it is nested two calls deep.
Change-Id: I228b9d099cfc2b65181941cbb4798b7f8eae3baa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is a counterpart to spdk_strcpy_pad() which determines the length
of a string in a fixed-size buffer that may be right-padded with a
specific character.
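A minimal sketch of the behavior described, assuming the function returns the buffer length with trailing pad characters trimmed. `strlen_pad_sketch()` is a hypothetical name and signature, not necessarily what the commit adds.

```c
#include <assert.h>
#include <stddef.h>

/* Length of str[0..size) after trimming trailing pad characters. */
static size_t strlen_pad_sketch(const void *str, size_t size, int pad)
{
	const unsigned char *s = str;
	unsigned char pad_byte = (unsigned char)pad;

	/* Walk backwards over the padding; what remains is the string. */
	while (size > 0 && s[size - 1] == pad_byte) {
		size--;
	}
	return size;
}
```

This fits fields such as the NVMe Identify serial and model number strings, which are space-padded rather than NUL-terminated.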
Change-Id: I2dab8d218ee9d55f7c264daa3956c2752d9fc7f7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It always points to the same internal RDMA request complete function, so
just call that function directly.
Change-Id: Ic1fb6236bf43eaad62413df77d43be9ab855e5c7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
We can't transfer more than the bounce buffer size in a single command,
so report that rather than some bogus value.
Change-Id: I39b147916dcc2ee478470917298763a239a6a35a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Record the user-provided asynchronous event configuration set via Set
Features, and return it in Get Features.
This value is not actually used, since AER is not implemented yet in the
virtual controller model, but it at least implements the mandatory
Set/Get Features.
This allows the hack in the NVMe host code that ignored the Set Features
failure to be reverted.
Change-Id: I2ac639eb8b069ef8e87230a21fa77225f32aedde
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Fill in the cached copy of CAP in the generic NVMe controller to match
the PCIe transport.
This is not really early enough, since CAP is used during the reset
process to determine the reset timeout, but that will have to be fixed
separately by rearranging some of the transport callbacks.
Change-Id: Ia8e20dbb8f21c2871afb9e00db56d0730e597331
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make sure the entire NQN field is zero-padded, rather than using
strlen() on the input.
Change-Id: Icee68bd033feed057813beeb30cec102ed90840e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
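A minimal sketch of filling a fixed-size NQN field so that every unused byte is zero, instead of copying only strlen() bytes and leaving the rest of the field untouched. The 256-byte field size and the name set_nqn_field() are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define NQN_FIELD_SIZE 256 /* assumed field size for this sketch */

/* Copy nqn into a fixed-size field, zero-filling all unused bytes.
 * Returns false if nqn plus its NUL terminator does not fit. */
static bool
set_nqn_field(char field[NQN_FIELD_SIZE], const char *nqn)
{
	size_t len = strlen(nqn);

	/* Keep at least one NUL byte inside the field. */
	if (len >= NQN_FIELD_SIZE) {
		return false;
	}

	/* Zero the whole field first so no stale bytes follow the string. */
	memset(field, 0, NQN_FIELD_SIZE);
	memcpy(field, nqn, len);
	return true;
}
```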
This fixes a compiler warning about unhandled enum cases in a switch.
Change-Id: Icecb56b47a05c13f390f03b877f8eae243b481a6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
- add SPDK_NVME_OPC_KEEP_ALIVE to admin_opcode
- add SPDK_NVME_SC_INVALID_SGL_OFFSET,
SPDK_NVME_SC_HOSTID_INCONSISTENT_FORMAT, SPDK_NVME_SC_KEEP_ALIVE_EXPIRED
and SPDK_NVME_SC_KEEP_ALIVE_INVALID to generic_status
Make it easier to use SPDK libraries by putting them all in a single
directory that can be added with -L rather than scattered around the
source tree.
Change-Id: I5c0f5dd6e7058b5f92fa9bc41548190ffc064761
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Track the maximum copy task size as modules are registered rather than
recalculating it every time spdk_copy_task_size() is called.
Change-Id: I141aca61e7075402dac41915080d1b43faee32ce
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
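The registration-time tracking could look roughly like the sketch below. The struct and function names are hypothetical stand-ins for the real copy engine code; the point is that the maximum is updated once per module registration, so the size query becomes a constant-time read.

```c
#include <stddef.h>

/* Hypothetical module descriptor: each module reports the per-task
 * context size it needs. */
struct copy_module {
	const char *name;
	size_t ctx_size;
};

/* Largest context size among all registered modules. */
static size_t g_max_copy_task_size;

static void
copy_module_register(const struct copy_module *mod)
{
	/* Update the running maximum once, at registration time. */
	if (mod->ctx_size > g_max_copy_task_size) {
		g_max_copy_task_size = mod->ctx_size;
	}
}

/* No need to walk the module list on every call. */
static size_t
copy_task_size(void)
{
	return g_max_copy_task_size;
}
```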
Make the public API clearer - if the user wants to allocate a
spdk_copy_task directly, they need to allocate spdk_copy_task_size()
bytes.
Also change the return type to size_t for consistency.
Change-Id: I0f3757056757c510421d680c5b4532edd9bc2561
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The SPDK_TRACELOG macro depends on a CONFIG setting (DEBUG), so it
should not be part of the public API.
Create a new include/spdk_internal directory for headers that should
only be used within SPDK, not exported for public use.
Change-Id: I39b90ce57da3270e735ba32210c4b3a3468c460b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Remove usage of the conf structs so they can be moved out of the public
API header.
Change-Id: I1c7375ec7708b323f50af09aeb7b2b2c9c770df4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Because a process can be terminated unexpectedly (e.g. by Ctrl+C, the
kill command, or a memory fault), the reference count is tracked in the
per-process structure spdk_nvme_controller_process. Whenever another
process attaches to or detaches from the controller, a scan is issued
to clean up any processes that exited unexpectedly.
Change-Id: Ib4f974f567a865748d42da4ead49edd383dfc752
Signed-off-by: GangCao <gang.cao@intel.com>
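One common way to detect processes that exited without detaching is to probe each recorded PID with kill(pid, 0), which performs the existence and permission checks without sending a signal. The sketch below assumes that approach; struct ctrlr_proc and cleanup_dead_procs() are hypothetical and do not reflect the actual spdk_nvme_controller_process layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-process record: which process holds how many refs. */
struct ctrlr_proc {
	pid_t pid;
	int ref;
	bool active;
};

/* kill() with signal 0 only checks whether the target could be signaled. */
static bool
process_is_alive(pid_t pid)
{
	if (kill(pid, 0) == 0) {
		return true;
	}
	/* EPERM: the process exists but we are not allowed to signal it. */
	return errno == EPERM;
}

/* Drop the references held by processes that are gone and return the
 * total reference count of the processes that remain. */
static int
cleanup_dead_procs(struct ctrlr_proc *procs, int nprocs)
{
	int total = 0;

	for (int i = 0; i < nprocs; i++) {
		if (!procs[i].active) {
			continue;
		}
		if (!process_is_alive(procs[i].pid)) {
			procs[i].active = false;
			procs[i].ref = 0;
			continue;
		}
		total += procs[i].ref;
	}
	return total;
}
```

PIDs can be reused by the operating system, so a production implementation may need additional checks beyond a bare existence test.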
These APIs can be used to register/unregister regions
of pinned, huge page memory that are separate from
huge page memory allocated by the default DPDK
allocations. These APIs will be used by an upcoming
SPDK vhost-scsi target to enable SPDK to target
NVMe DMA operations directly to VM memory that has
been allocated by QEMU using pinned huge pages.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I649a4adeeb758b29bd29cd42c8872eed3d5d6ce9
Now that the env PCI framework already requires enumerating devices
based on an enum of specific device types, it is not useful to query the
class code of a PCI device handle.
It is currently unused and does not work in its current form on FreeBSD
(it reads a file from /sys). This lets us drop a big chunk of file
reading and parsing code.
Change-Id: I1d720398416ba3d6f91e077b807ec11a6de562cf
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The details of the structure were removed earlier, but
now remove all references even to a pointer to the
structure. The user can refer to transports by their
string name.
Change-Id: I273356f46329ea5372dcd951eda6f14767477d69
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is a step toward abstracting away the definition
of the subsystem.
Change-Id: I88b2aa107b27152620f51a1ca2a153792b4c85e9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Remove the 4k allocation size in spdk_scsi_task_alloc_data(). From now
on, all commands must obey the allocation length.
Change-Id: Ica9384c62d431483ae1d0bd2e6fdee18b570861f
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
This refactors the MODE SENSE 6 and 10 related functions to respect the
buffer size parameter.
Change-Id: I03bad456bac0554a8bf7b56f69d1f9cf5b1991f6
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
This patch prepares for fixing the alloc_len overrun in SENSE 6/10 and
READCAP 6/10. To simplify the code, forbid use of iov outside of
scsi/task.c.
This also drops the SPDK_SCSI_TASK_ALLOC_BUFFER flag, which obfuscated
the code. As a replacement, assume that a nonzero alloc_len field means
that iov.buffer was allocated internally. The functions
spdk_scsi_task_free_data(), spdk_scsi_task_set_data() and
spdk_scsi_task_alloc_data() manage this field.
Change-Id: Ife357a5bc36121f93a4c5d259b9a5a01559e7708
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Use the len field from the generic spdk_bdev_io instead of duplicating
it in blockdev_rbd_io.
Change-Id: I3ebfab8dd1303add83bc2206fc87319ba7d605b3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This function needs to check for SGEs that straddle a
2MB page boundary, and ensure it does not return
a length that will cross that boundary.
This cannot happen in practice currently with SPDK
since all buffers are allocated using rte_malloc(),
but an upcoming vhost-scsi target may produce
SGEs from a guest VM's physical memory that span
a 2MB boundary.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8b83c7c39c4cf33815abb22ff2ebc90941b21e28
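The clamping can be done with simple mask arithmetic on the address. This is a sketch assuming 2MB pages; sge_len_within_2mb() is a hypothetical name, not the function changed by this commit.

```c
#include <stdint.h>

#define VALUE_2MB (UINT64_C(1) << 21)
#define MASK_2MB  (VALUE_2MB - 1)

/* Hypothetical helper: clamp an SGE so that the range
 * [vaddr, vaddr + returned length) stays within a single 2MB page. */
static uint64_t
sge_len_within_2mb(uint64_t vaddr, uint64_t len)
{
	/* Bytes remaining from vaddr to the end of its 2MB page. */
	uint64_t to_boundary = VALUE_2MB - (vaddr & MASK_2MB);

	return len < to_boundary ? len : to_boundary;
}
```

A caller would translate the returned piece, advance the address by that many bytes, and repeat for the remainder of the SGE, since physical contiguity is only guaranteed within one huge page.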
No functional change, but removes a few assumptions
that will be invalid in a future patch that fixes a
bug in this function. Primarily we no longer assume
that this function will always increment the
iovpos and reset iov_offset to 0.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I770f2f24c37626063e113af850a2af792aed332a
The bdev function table should not be part of the public API.
Change-Id: I5d6f40d1b37c4471041c1c9d6253a3f92e9e9701
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It was written but never read (and the I/O channel is already stored in
the generic spdk_bdev_io).
Change-Id: Id33392e9d3940b2c1439e9fed2553aa091ecedf8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
No need to duplicate the bdev-defined I/O type.
Change-Id: I15cb68c3c68b3f25b286b04500b53081ed5e7881
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The status field in blockdev_rbd_io was only used within
blockdev_rbd_io_poll(), so replace it with a local variable.
Change-Id: I3629225f28b752a3acc7521699c33bc98f1e4b7b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Instead of the next_sge callback returning the physical address
directly, make it return the virtual address and convert to physical
address inside the NVMe library.
This is necessary for NVMe over Fabrics host support, since the RDMA
userspace API requires virtual addresses rather than physical addresses.
It is also more consistent with the normal non-SGL NVMe functions that
already take virtual addresses.
Change-Id: I79a7af64ead987535f6bf3057b2b22aef3171c5b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Remove the complex list management for pool_name and just strdup() it
directly. It is not worth the trouble to save a few bytes.
Change-Id: I8a4f7eeea619bd824ea593854423e317041c540e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Remove a DPDK dependency from generic code.
Change-Id: I8e3e2c0a36d980b426a1967ed1f88fb8b855c382
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
With this patch, custom bdev modules can return any SCSI status and SCSI
sense information to the host. This is useful when a custom bdev module
detects an error and needs to return meaningful information to the
host.
Function pointers will not work for the DPDK multi-process model (they
can have different addresses in different processes), so define a
transport enum and dispatch functions that switch on the transport type
instead.
Change-Id: Ic16866786eba5e523ce533e56e7a5c92672eb2a5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a wrapper that lets SPDK call a function without thread affinity, and
use this wrapper to open the rbd image.
Change-Id: Iadc87a948f43632abf497f88165483a0e269ba54
This enables using SPDK within a larger process that
is SPDK-centric. In this case the process may start
SPDK and then wish to stop it explicitly (without a
signal).
While here, remove an incorrect comment - DPDK mempools
can be used from non-DPDK threads. Also set the
g_shutdown_event to NULL after it is called. After the
event executes, the event is freed and is no longer valid.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie4f07bee7d05fae683c72f6680cb3bcce2d4a119
The initialization of dev_addr was replaced with probe_info.pci_addr,
but its use in spdk_pci_addr_compare() wasn't replaced to match.
Fixes commit fcb00f3780 (nvme: expand
probe information to a struct).
Change-Id: Ic4c273d2aa0bf1f9e3e1527f3ab09d3c019158cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Since we are usually going to be removing multiple events from the queue
at once, use the DPDK burst dequeue interface to improve efficiency.
Also rework the event queue runner to always process a fixed maximum
number of events per timeslice for simplicity. This removes the
rte_ring_count() call from the hot path and improves fairness between
events and pollers.
Now that events are dequeued in bulk, we can also put the event objects
back into the mempool in bulk. Add an env wrapper around
rte_mempool_put_bulk() and use it to free all of the events at once.
Basic performance benchmark using test/lib/event/event/event -t 10
is improved: previously ~40 million events per second, now ~46 million
events per second.
Change-Id: I432e8a48774a087eec2be3a64c38c339608af42a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It always returns NULL in the current DPDK env implementation and was
not used outside of a few ioat examples where it is not particularly
informative.
Change-Id: I14b237c33bc25ddebc6b36bfbd6a4edf6762e3ca
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This removes the 2 bytes of SenseLength from the beginning of the SCSI
sense_data buffer, so now the offsets within sense.data match up to the
expected values from the SCSI spec.
Change-Id: I9188560096a9ec5a8fcf83bec95201521b127494
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
spdk_nvme_probe() will now provide a struct spdk_nvme_probe_info to the
probe and attach callbacks in place of the PCI device pointer.
This struct contains the useful information that could be retrieved from
the PCI device during probe.
The goal of this change is to allow expansion of the probe information
in the future when other transports (specifically, NVMe over Fabrics)
are added that do not necessarily use PCI addressing or device IDs.
Change-Id: I59a2a9e874e248ce5fa1d7f4b57c8056962ff3cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a helper function that converts a PCI address from a string into a
struct spdk_pci_addr and use it in place of the various sscanf()
invocations throughout SPDK.
Change-Id: Id2749723f76db741567e01b4bcb0fffb0e425fcd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Print the error information when the kernel RNIC driver did not load
properly, and fix the cleanup logic for the exceptional exit.
Change-Id: I97a45e73d830280b994818f3defc491bc2b6b020
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Now that each subsystem can support multiple sessions, the host uses the
cntlid field when creating I/O queues. With the current code, if two
different hosts connect to the same subsystem, both use a cntlid of 0
when creating their I/O queues.
Change-Id: I6fd437892e8eb3146f62f4b211c0baadd70b505e
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Add an RPC interface to list all blockdevs and their properties.
Change-Id: I50db730d5eff8cffcbe8fe5df6b3461457e8581e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The PCI device claim function does not need the whole spdk_pci_device
structure, just the address.
Change-Id: If59df512043ee062cf9f759bdc104fc522625ba8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe over Fabrics target was storing the PCI device pointer for each
direct-mode controller, but it only really needs the PCI address, which
is exposed via the get_nvmf_subsystems RPC.
Also update the same code path to use the new spdk_pci_device_get_addr()
function for brevity.
Change-Id: I0708b3331b7c279c1a86f0d7459b5deb40dd7c89
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the new public PCI ID structure in the NVMe library to replace the
previously private struct pci_id.
Change-Id: I267d343917f60bdae949a824bc0fe67457cbbc0d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
- Split the part that gets a PCI device's address into its own function,
spdk_pci_device_get_addr(). This is useful outside of the comparison
function and is orthogonal to comparing addresses.
- Make the comparison function take two addresses instead of a device
and an address. The more general form will be useful with addresses
that are not directly associated with a device. Because of this, also
rename the function from spdk_pci_device_compare_addr() to
spdk_pci_addr_compare().
- Return a signed value similar to strcmp() so that addresses can be
ordered, not just compared for equality.
Change-Id: Idf304454af09ea57f1e1d5dc3a39b077378cecad
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
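A strcmp()-style comparison can order addresses field by field, from most to least significant. The struct layout and the function body below are a hypothetical sketch, not the SPDK implementation; explicit comparisons are used instead of subtraction so that the 32-bit unsigned domain field cannot wrap around.

```c
#include <stdint.h>

/* Hypothetical address layout for this sketch. */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t dev;
	uint8_t func;
};

/* -1, 0 or 1 for a single field, without unsigned subtraction. */
#define CMP_FIELD(a, b) ((a) < (b) ? -1 : ((a) > (b) ? 1 : 0))

/* Order by domain, then bus, then device, then function. */
static int
pci_addr_compare(const struct pci_addr *a1, const struct pci_addr *a2)
{
	int rc;

	if ((rc = CMP_FIELD(a1->domain, a2->domain)) != 0) {
		return rc;
	}
	if ((rc = CMP_FIELD(a1->bus, a2->bus)) != 0) {
		return rc;
	}
	if ((rc = CMP_FIELD(a1->dev, a2->dev)) != 0) {
		return rc;
	}
	return CMP_FIELD(a1->func, a2->func);
}
```

Because the result is signed, the same function can back a qsort() comparator to sort a list of addresses, not just test two of them for equality.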
Rename the construct_rbd_bdev "size" parameter to block_size so that it
is consistent with other bdev construct RPCs.
Change-Id: I88f8ed35444495ffce9550dc224fbcbd58231787
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
When creating a bdev via the RPC interface, there was no way to know
what name it was assigned (other than predicting it based on the
numbering scheme). Change all of the relevant RPC interfaces to return
an array of bdev names so they can be used to construct LUNs/subsystems
dynamically in scripts.
Change-Id: I8e03349bdc81afd3d69247396a20df5fcf050f40
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a field to struct spdk_nvme_ctrlr_opts that allows the user to
specify a keep alive timeout, and add automatic submission of Keep Alive
commands to spdk_nvme_ctrlr_process_admin_completions().
Change-Id: Ib282299a571d8edc59c7933418751bc3a6c98b40
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
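Submitting Keep Alive from the admin completion path amounts to checking, on each call, whether the next command is due. The sketch below is hypothetical: the struct and function names are invented, ticks are in whatever units the caller's clock uses, and sending at half the timeout is an assumption chosen so a command reaches the controller well before the timeout expires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-controller keep-alive state. */
struct keep_alive_state {
	uint64_t interval_ticks; /* 0 means keep alive is disabled */
	uint64_t next_ticks;     /* time at which the next Keep Alive is due */
};

static void
keep_alive_init(struct keep_alive_state *ka, uint64_t timeout_ticks, uint64_t now)
{
	/* Assumption: send at half the timeout. */
	ka->interval_ticks = timeout_ticks / 2;
	if (timeout_ticks != 0 && ka->interval_ticks == 0) {
		ka->interval_ticks = 1;
	}
	ka->next_ticks = now + ka->interval_ticks;
}

/* Called while processing admin completions; true means submit a
 * Keep Alive command now. */
static bool
keep_alive_due(struct keep_alive_state *ka, uint64_t now)
{
	if (ka->interval_ticks == 0 || now < ka->next_ticks) {
		return false;
	}
	ka->next_ticks = now + ka->interval_ticks;
	return true;
}
```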
Specify SPDK_JSON_WRITE_FLAG_FORMATTED when creating a write context to
output more human-readable JSON.
Change-Id: Ie1f0451496aae7e36e4cdb1f05edb4bc4963be17
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If the RDMA transport failed to initialize, g_rdma.event_channel may be
NULL.
Change-Id: I4510ee5893389f244f0fbaa1cd4a182868939b25
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For iWARP devices, buffers that are intended to be the
target of an RDMA read initiated by the target must additionally
have IBV_ACCESS_REMOTE_WRITE permission. This is because iWARP's
RDMA read path essentially requests the remote side to do
an RDMA write.
This is unfortunate because there is no way to differentiate between
memory that the remote side can do an RDMA write to and memory
that will only be the target of RDMA reads initiated by the
target. There is nothing we can do about this serious deficiency in
the specification, however, so we have to live with it.
Change-Id: I3d2f2814ce0cb1df4e5347296ef371db4d16be21
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This patch adds support for spdk_bdev_readv in the SCSI layer.
It also fixes write so that it uses multiple iovs instead of one.
Currently, only task->iov (for single-vector operations) or task->iovs
(for multi-vector operations) should be used.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: Ia3b2f6d18fd212b11d7b63b11dc46ec5bbc74788
Make the quirks mechanism generic in preparation for quirks for devices
from other vendors.
Change-Id: Ic003b020a38f1b966021db30e3f2bce9cf6a1a0d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch removes a redundant field in spdk_scsi_task and
changes all logic to use iov.iov_base.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: Ie2fa1e2357b6383c118d05aec9206d1c60537d40
Previously, if spdk_rpc_setup() returned early due to the RPC service
being disabled in the configuration file, it would leave itself
registered as a poller and continue to run for the life of the app.
Change-Id: I0532fe23a732b87d68f83847b2db7627f87e9a1c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
I believe this is required for NICs to report, but handle
the case where it isn't reported.
Change-Id: I38d10c3590d1df8bb902ab312af0f9e01b9e5032
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This makes it consistent with the way connections and
requests work.
Change-Id: Ifb97499ba72f7dfd02ac54ba1b622726d266262c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The shared memory pool for a session is associated with
a particular RNIC via the protection domain. New connections
attempting to join a session that came in on a different RNIC
can't use that memory, so must be rejected.
Change-Id: Ibd79fe90566a231f76b7472e5e9b484c3e528454
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Rearrange the functions in rdma.c to match the order
of the function pointers in the transport. No other
code changes.
Change-Id: I9dbc68912ecd5dfdf53f20b4807d4116933a3c3a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Use the lower level registration functions. The RDMA-CM
examples use the ibv_* versions, so who knows if the
rdma_reg_* wrappers are even well tested.
Change-Id: I8e8250ab09a1401e636aebe2fc04a60806f7a827
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Add a transport function to get the max data transfer size to break the
dependency on NVME_MAX_XFER_SIZE.
Change-Id: I846d12878bdd8b80903ca1b1b49b3bb8e2be98bb
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move the PCIe-specific admin queue setup to nvme_pcie_ctrlr_enable.
Change-Id: Ic3f5625fa804f719040ba86b7fc3bf82fcc057c0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, we used a mix of free() and spdk_nvmf_rdma_conn_destroy() to
free an allocated spdk_nvmf_rdma_conn structure, which may not have
released all of its resources.
Change-Id: I2917b442c34d63ba5c014add58f429ae4b831595
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The RDMA API doesn't say whether the wr is copied, so be
safe and allocate it on the heap.
Change-Id: I091af50aa031e1861333f19d864eb52335d6b756
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The status member of the spdk_bdev_io structure is set after the if
block, so the status parameter should be checked instead of the status
member.
Change-Id: I4030a7fcdb36d9c589802ec5b4e424591dc2a3b6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The value of CAP should not change during the lifetime of a controller,
so read it once during ctrlr_construct and store it in the ctrlr.
Change-Id: I089d4141b4e0c9aae6c53abf9bb0ef6577dabe0b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rather than embedding adminq directly in the spdk_nvme_ctrlr structure,
change it to a pointer to a spdk_nvme_qpair. This is necessary to allow
the transport to extend the qpair structure.
Change-Id: I041685d5037088cf56d046fe99bf204edcfc57b1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, we directly assigned the pool_name and rbd_name pointers,
which is not safe: after running the RPC test, we found that the string
values were incorrect. Use strdup() instead.
Change-Id: Ibadc57d3cb5b9869b7db5a22c2459769e92edebd
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This requires a couple of related changes:
- I/O queue IDs are now allocated by using a bit array of free queue IDs
instead of keeping an array of pre-initialized qpair structures.
- The "create I/O qpair" function has been split into two: one to create
the queue pair at startup, and one to reinitialize an existing qpair
structure after a reset.
Change-Id: I4ff3bf79b40130044428516f233b07c839d1b548
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
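A bit array of free queue IDs can be sketched as below, with queue ID 0 reserved for the admin queue. MAX_IO_QUEUES and the function names are hypothetical, and a simple bit scan is used rather than a compiler builtin to keep the example portable.

```c
#include <stdint.h>
#include <string.h>

#define MAX_IO_QUEUES 128 /* hypothetical upper bound for this sketch */
#define QID_WORDS ((MAX_IO_QUEUES + 64) / 64) /* bits for qid 0..MAX_IO_QUEUES */

/* A set bit means the queue ID is free. Queue ID 0 is the admin queue. */
struct qid_pool {
	uint64_t free_bits[QID_WORDS];
};

static void
qid_pool_init(struct qid_pool *pool, unsigned int num_io_queues)
{
	if (num_io_queues > MAX_IO_QUEUES) {
		num_io_queues = MAX_IO_QUEUES;
	}
	memset(pool, 0, sizeof(*pool));
	for (unsigned int qid = 1; qid <= num_io_queues; qid++) {
		pool->free_bits[qid / 64] |= UINT64_C(1) << (qid % 64);
	}
}

/* Return the lowest free I/O queue ID and mark it used, or 0 if none is free. */
static unsigned int
qid_alloc(struct qid_pool *pool)
{
	for (unsigned int w = 0; w < QID_WORDS; w++) {
		if (pool->free_bits[w] == 0) {
			continue;
		}
		for (unsigned int bit = 0; bit < 64; bit++) {
			if (pool->free_bits[w] & (UINT64_C(1) << bit)) {
				pool->free_bits[w] &= ~(UINT64_C(1) << bit);
				return w * 64 + bit;
			}
		}
	}
	return 0;
}

/* Return a queue ID to the pool. */
static void
qid_free(struct qid_pool *pool, unsigned int qid)
{
	if (qid != 0 && qid <= MAX_IO_QUEUES) {
		pool->free_bits[qid / 64] |= UINT64_C(1) << (qid % 64);
	}
}
```

Compared with a pre-initialized array of qpair structures, this only costs one bit per possible queue and lets qpairs be allocated and freed independently.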
Make the transport ctrlr_construct callback responsible for allocating
its own controller.
Change-Id: I5102ee233df23e27349410ed063cde8bfdce4c67
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Four read/write functions share the same code for checking
I/O length and offset. Extract this code into a separate function.
Change-Id: I40f0021e70a60c591b048ad3a70b22eaa07af3b4
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
These are specific to local NVMe PCIe devices, so move them out of the
generic NVMe code into the PCIe transport.
Change-Id: Iea2056a4c438b7d3a303b4b5e977ce7aa9e58c05
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This allows the entire transport structure definition
to become private.
Change-Id: I9ca19edbfc3cfb75b9b113a89bb2b90bc499ab16
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This changes as little code as possible while still creating
a single public API header. This enables future clean up
of the public API and clarification of the exposed
concepts.
Change-Id: I780e7a5a9afd27acf0276516bd71b896ad301c50
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Only call spdk_bdev_io_complete() when an I/O error is seen.
Change-Id: I829e4c589dbcb47017e810035837a4c61c3428f9
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
This will allow factoring out PCIe-specific code into a swappable
transport so that NVMe over Fabrics host support can be added.
Change-Id: I4df74dd268d655e3b36e8d6114ebe7d79a24844d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For large writes that require multiple SCSI tasks (one for immediate
data, then one or more for R2T-solicited data), we bump the refcount
for the task associated with the initial immediate data PDU, to
ensure it does not get freed until all of the child tasks are
completed. However, in some cases this initial immediate data PDU can
complete after all of the R2T-solicited data PDUs. The completion code
did not handle this case correctly, which resulted in the iSCSI
connection thinking it still had outstanding SCSI tasks when the
connection closed.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9f9c5322755462d1918fde0075c87c84295cb10c
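The key property of the fix is that the primary task's lifetime depends only on its reference count, never on which PDU completes last. The following is a simplified, hypothetical model of that idea, not the actual iSCSI target structures: the primary holds one reference for its own completion plus one per child, and each child drops a primary reference when it completes.

```c
#include <stdbool.h>

/* Hypothetical task model for this sketch. */
struct task {
	int ref;             /* outstanding references */
	struct task *parent; /* primary (immediate data) task, or NULL */
	bool freed;          /* stand-in for returning the task to its pool */
};

static void
task_put(struct task *t)
{
	if (--t->ref == 0) {
		t->freed = true;
	}
}

/* Called when any task (primary or R2T child) completes. */
static void
task_complete(struct task *t)
{
	struct task *primary = t->parent ? t->parent : t;

	/* A child releases itself first... */
	if (t != primary) {
		task_put(t);
	}
	/* ...then every completion drops one reference on the primary. */
	task_put(primary);
}
```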
This patch enables vector operation for bdev drivers aio, malloc and
nvme.
The rbd driver still handles only one vector.
Change-Id: Ie2c1f6853bfd54ebd8039df9a0305854ca3297b9
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Make the function reentrant in preparation for the RPC method.
Change-Id: Ie5230e4ac6c9a750e8e779c5e0b67134729c07e3
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This prepares for future scatter-gather support.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie21c4d86c1e932dcaf63cf13d7a7198890595d79
Return void in main I/O path, and have functions
explicitly complete the I/O back to the bdev layer
if any failures are encountered.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia729b0af555f87c2fb36b92e79a47d19a325de7a
Add a public function that can be used by the RPC method.
Change-Id: Id9d2938801e0acdf0f9827ef2990a54c75aec22a
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This patch removes the lock in the RBD module. It requires a librbd
library that supports the rbd_poll_io_events() function.
Change-Id: I040a7d8369ab4f69f41d1d0233115f885168f019
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This patch makes each lun_name:lun_id pair a single object,
and does the same for pg_tag:ig_tag.
Change-Id: Ib08450d12bde9b8388d4ae41e214cc0ba64c8b1e
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
For the nvme readv/writev APIs, the PRP checking logic was
incorrectly failing single SGE payloads that were larger
than 4KB. This patch adds a test case for this scenario,
and fixes the PRP checking logic.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I6357d620599666046d2cb74d7923dac1f75418c5
Use standard GCC style atomic operations instead of
the DPDK calls. The DPDK calls end up translating
to the gcc standard inline calls in the generic
case anyway.
Change-Id: I0ea760c4e23c3660b082a803bbc174de7250f365
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
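For example, a reference counter can use the GCC/Clang __sync builtins directly, which per the message above is what the DPDK calls translate to in the generic case anyway. The wrapper names below are hypothetical.

```c
#include <stdint.h>

/* Atomically add 1 and return the new value (GCC/Clang builtin). */
static uint32_t
counter_inc(volatile uint32_t *counter)
{
	return __sync_add_and_fetch(counter, 1);
}

/* Atomically subtract 1 and return the new value. */
static uint32_t
counter_dec(volatile uint32_t *counter)
{
	return __sync_sub_and_fetch(counter, 1);
}

/* Set *counter to desired only if it currently equals expected.
 * Returns nonzero on success. */
static int
counter_cas(volatile uint32_t *counter, uint32_t expected, uint32_t desired)
{
	return __sync_bool_compare_and_swap(counter, expected, desired);
}
```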
Remove #includes for all DPDK headers that weren't
necessary.
Change-Id: Ib02522e0f04e64a1c98afceb7508cc0e8d931a9d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This was only used for debugging. Everywhere else
used the spdk_memzone abstraction.
Change-Id: I8a828ea3c7abccb66c8a027cb13de43c560ff7a1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This converts some, but not all, usage of rte_mempool
to spdk_mempool. The remaining rte_mempools use features
we elected not to expose through spdk_mempool such as
constructors, so that will need to be revisited.
Change-Id: I6528809a864ab466b8d19431789bf0f976b648b6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change the type from int to bool and change the name
from data_ref to data_from_mempool.
Change-Id: If1fc11761e63561443ed44d6a0860e416e424df8
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This is required since pollers are now directly removed
(rather than scheduling an event) when the unregister call
is made on the poller's lcore.
Without this change, if a poller is registered and then
immediately unregistered, the unregistration will segfault,
since the event that adds the poller has not executed
yet.
Also add a test case that exhibits the sequence of events
described in this commit message.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5c6ba0ee224ac1f8f3ebb8e7571714e718bd42db
Use the env library to perform all memory allocations
that previously called DPDK directly.
Change-Id: I6d33e85bde99796e0c85277d6d4880521c34f10d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Enforce exactly one trailing \n, and fix all of the existing cases.
Change-Id: I6218e4700e90aeb647eaee78089530c79993c8c8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch enables vector operation for bdev drivers aio, malloc and
nvme.
The rbd driver still handles only one vector.
Change-Id: I5f401527c2717011ecc21116363bbb722e804112
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
'virtual' is a keyword in C++, so avoid using it in variable
and structure names in case any files are eventually
included from a C++ project.
Change-Id: I2122750445def63038af68a3000758e33b937f9d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
All completion queues for the same listen address
now share a common completion queue channel.
Change-Id: I42c149fe7e221951e8a3826b1713482c37a265b8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
These 4 callbacks can be condensed into two callbacks, which
simplifies the API.
Change-Id: I069da00de34b252753cdc8961439e13a75d1cc68
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Validate the number of unmap descriptors in the generic bdev layer
before calling the blockdev-specific unmap function.
Change-Id: Ib24e7ec63f782f23f2ee3e63393aa8463123fdb4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All timers have been converted to SPDK pollers.
If an app requires rte_timer support, it should register its own poller
that calls rte_timer_manage().
Change-Id: I8a827a357b344deac76d42357a5a84ac2daabbf8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This allows users to swap their PCI library from
libpciaccess/dpdk to another mechanism using the standard
method for swapping out the env library.
Change-Id: Ib2248f8b43754a540de2ec01897e571f0302b667
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This allows users to swap out SPDK's third party
libraries for an implementation based on their own
framework.
Change-Id: Ia0b7384ce5e31acba5ad0d7002dec9e95b759c52
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The new env library will wrap all third-party library
calls and be easily swappable with alternate implementations
at build time. For now, it's just the memory library
renamed.
Change-Id: I26a70933289f8137107208ba75f7520fd7a33da0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Increased printed data width, added data offset indices.
Change-Id: I44f81396e33870109c2bece5e152657f8a24a56a
Signed-off-by: Krzysztof Jakimiak <krzysztof.jakimiak@intel.com>
Preparation for SGL support (readv/writev).
Change-Id: I14a116d764ebc582ea0a0077cc5a0d0bac638cb0
Signed-off-by: Krzysztof Jakimiak <krzysztof.jakimiak@intel.com>
This patch also drops support for automatically unbinding
devices from the kernel - run scripts/setup.sh first.
Our generic pci interface is now hidden behind include/spdk/pci.h
and implemented in lib/util/pci.c. We no longer wrap the calls
in nvme_impl.h or ioat_impl.h. The implementation now only uses
DPDK and the libpciaccess dependency has been removed. If using
a version of DPDK earlier than 16.07, enumerating devices
by class code isn't available and only Intel SSDs will be
discovered. DPDK 16.07 adds enumeration by class code and all
NVMe devices will be correctly discovered.
Change-Id: I0e8bac36b5ca57df604a2b310c47342c67dc9f3c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Provide a convenience wrapper for general purpose dataset
management commands. The previous wrapper for deallocate
was difficult to use correctly and only for deallocate.
Note that the name is "dataset_management" as opposed to
"data_set_management" to match the NVMe specification.
It's questionable whether "dataset" is valid English, but
it is best to match the specification.
Change-Id: Ifc03d66dbabeabe8146968cf8a09f7ac3446ad68
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Namespaces can be allocated but inactive, which causes
the identify namespace command to fail. Handle this
case so that attaching to the controller does not fail.
Change-Id: I9d692f8e7841a9315a737b0a5e44d9b4e4484a13
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Modify the spdk_poller_unregister() function so that it works correctly
when unregistering a poller from its own callback function.
Change-Id: I57fa5ebd8a8bad522e34f597b406a4726f1b76ad
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch adds a new bdev module, rbd.
It allows a Ceph RBD image to be used as the backend of the
iSCSI target.
Change-Id: Id5eb3b159ee607052e3c33a2e59d721739fd9977
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
These offsets are passed to the bdev I/O functions, which take uint64_t
offsets.
Change-Id: I1d597d066dfb64b6c7658906e7ee8e6fb2f8e4db
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The offset variable is used to store the result of a uint64_t * uint32_t
multiplication; a signed integer is not the correct type for the result.
Change-Id: If1fb22314ba7e3cec91808cc051678f809c9e58b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This code calculates the difference between two pointers and then stores
it in an off_t, which is intended for file offsets.
In this particular case, the offset will never be large enough to
overflow off_t, but use the correct ptrdiff_t type anyway.
Change-Id: I6b159bf0286a7f5962d08b9894538f4d99c8647b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
off_t is problematic for use as a file/block offset: it is signed, and
on 32-bit platforms, it can be 32 bits (depending on the settings of
_FILE_OFFSET_BITS and _LARGEFILE_SOURCE).
The blockdev layer already uses uint64_t to represent offsets; replace
uses of off_t in the internal functions of the blockdev modules with
uint64_t to match.
Change-Id: I77a2e594572c56f1cd8a7a080f985ea5b27c35f3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
spdk_io_device_unregister() was not freeing dev; add the missing
free.
Change-Id: Id212344dfcde2ae4780c631e3443f530ef25cfd1
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
In process_read_task_completion(), add a
new parameter and remove the duplicated code,
since this function is on the critical path.
Change-Id: I6a56327def717ee965c701383f01d6745a8c6988
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This feature should only be used if clients are coordinating
with one another.
Change-Id: I89a437441a7e3fbcc1e5f6efa1c8e970ade7c2ec
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
We already require the assert header from the C standard library,
so use that instead of RTE_VERIFY to further isolate DPDK
dependencies.
Change-Id: I4a718af858c88aff6080e33e6c3dd533c077b8f4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Some subsystems may wish to create unique I/O channels
which are not shared across all users of the same I/O
device on the same thread.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3ade3675d57338cf85b6a301285e6f392bd6cd2e
Return -1 when there are still outstanding tasks. Based on how the
return value is used, returning 1 is wrong.
Change-Id: Ibf1b53e0be92818c73590c0b4211d34332073c74
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Fix the existing cases (all were missing void in empty parameter lists) and enable
the warning to prevent new ones from being introduced.
Change-Id: Ieaf00b3dfd5daf1e21fcbefb124514882e8996c9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch aims to avoid running out of large rbufs for read commands.
Change-Id: Ibc42b2216e929f8dfa59cba1b32ae8d52a1a345e
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
The bdev and copy modules no longer have check_io functions;
all polling is now done via pollers registered when
I/O channels are created.
Other default resources are also removed - for example,
a qpair is no longer allocated and assigned per bdev
exposed by the nvme driver - the qpairs are only allocated
via I/O channels. Similar principle also applies to the
aio driver.
ioat channels are no longer allocated and assigned to
lcores - they are dynamically allocated and assigned
to I/O channels when needed. If no ioat channel is
available for an I/O channel, the copy engine framework
will revert to using memcpy/memset instead.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I99435a75fe792a2b91ab08f25962dfd407d6402f
However, I/O channels are not actually used for I/O yet.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iaa3774ecacc7ec206c7c0c66e6b2f2d10c8fa785
This will start testing the I/O channel allocation paths. However,
I/O channels are not actually used for submitting I/O yet.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I901402633248170324db1e2fc8fb813f7629c2b0
This will help catch any cases where I/O channels are not
released during shutdown.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I96cf93218026b9ef319abcf0662fe258bf75174d
Also implement these functions for all of the bdev drivers in
the tree.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idea97743d601150044b1fe2d9d76e922d46d3ee1
This patch adds a basic framework for creating I/O channels
for I/O devices. An spdk_io_channel represents a one-to-one
mapping between a calling thread (represented by spdk_thread)
and an I/O device that the thread will perform I/O operations
on.
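A minimal sketch of that one-to-one mapping, assuming a per-thread list of channels with a reference count, so repeated requests from the same thread for the same device share one channel. All names here (struct io_device, struct thread_ctx, get_io_channel(), put_io_channel()) are hypothetical and do not reflect the actual SPDK structures or API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the SPDK structures. */
struct io_device {
	int id;
};

struct io_channel {
	const struct io_device *dev;   /* device this channel talks to */
	unsigned int ref;              /* users of this channel on this thread */
	struct io_channel *next;
};

struct thread_ctx {
	struct io_channel *channels;   /* at most one channel per device */
};

/* Return this thread's channel for dev, creating it on first use. */
struct io_channel *get_io_channel(struct thread_ctx *t, const struct io_device *dev)
{
	struct io_channel *ch;

	for (ch = t->channels; ch != NULL; ch = ch->next) {
		if (ch->dev == dev) {
			ch->ref++;
			return ch;
		}
	}
	ch = calloc(1, sizeof(*ch));
	if (ch == NULL) {
		return NULL;
	}
	ch->dev = dev;
	ch->ref = 1;
	ch->next = t->channels;
	t->channels = ch;
	return ch;
}

/* Drop one reference; unlink and free the channel when the last user releases it. */
void put_io_channel(struct thread_ctx *t, struct io_channel *ch)
{
	struct io_channel **pp;

	if (--ch->ref > 0) {
		return;
	}
	for (pp = &t->channels; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ch) {
			*pp = ch->next;
			break;
		}
	}
	free(ch);
}
```

Because each thread only touches its own list in this sketch, the lookup path needs no locking.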
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I658ab7f995cc962f4e2a204e058cdd3ad3fd735d
Change the return type from void to int so that the result of
spdk_subsystem_fini() can be reported.
Change-Id: I811c25513e41573ca0c9cb111512d7705d107f66
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Instead of polling for only 1 completion at a time,
poll for batches of 32.
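A sketch of batched polling, assuming a simple array-backed completion queue; struct cq, cq_poll() and process_completions() are hypothetical stand-ins for whatever the real code polls. Each poller invocation fetches up to POLL_BATCH (32) completions at once instead of one.

```c
#include <assert.h>
#include <stddef.h>

#define POLL_BATCH 32   /* batch size from the commit message */

/* Hypothetical completion queue: a flat array without wraparound, for brevity. */
struct cq {
	int entries[128];
	size_t head;   /* next completion to consume */
	size_t tail;   /* one past the last completion produced */
};

/* Copy up to max ready completions into out[]; returns how many were copied. */
size_t cq_poll(struct cq *q, int *out, size_t max)
{
	size_t n = 0;

	while (n < max && q->head < q->tail) {
		out[n++] = q->entries[q->head++];
	}
	return n;
}

/* One poller invocation: fetch a batch of up to POLL_BATCH and handle each entry. */
size_t process_completions(struct cq *q, int *handled_sum)
{
	int batch[POLL_BATCH];
	size_t n = cq_poll(q, batch, POLL_BATCH);
	size_t i;

	for (i = 0; i < n; i++) {
		*handled_sum += batch[i];   /* placeholder for real completion handling */
	}
	return n;
}
```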
Change-Id: I5ef99a270489e7b3d2a58cb765915f187775a93e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Purpose: To make the function definition style consistent
Change-Id: I7ade943881aa5076fdd419958e386ae3c3661da6
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
SPDK_NOTICELOG() can already do printf()-style formatting, so there is
no need to use snprintf() on a temporary buffer first.
Change-Id: Iffb5369b74f27fb2c4b3ac07ea0cdeab52258ba1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch aims to avoid running out of large rbufs for read commands.
Change-Id: If10f45292da5d5a26c2e338f1ddeafccedb88a4c
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
There are some error paths that can get to spdk_iscsi_conn_stop_poller()
before the conn's session is set. So check whether the session is NULL
before trying to check its session type.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I352a2aa513541ba630ace368137433e509700e32
1 In our nvmf tgt implementation, we use the async
mode to delete the nvmf subsystem. However, when
parsing the nvmf subsystem configuration, we need to use
the sync function to delete the nvmf subsystem: if
there is an error, we call spdk_app_stop, so the
async functions would never be executed. This was
verified in my local test.
2 Add debug info in spdk_nvmf_delete_subsystem
Change-Id: Ia8ecd6eee1bbd25cb3e1ceeb0e2146f3f03be228
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This protects against races when an existing session to a target node
stalls, causing the initiator to create a new session. The new
session's connection may get migrated to a different core than the
core of the stalled session.
In normal operation this does not happen, but it is a common occurrence
when debugging the iSCSI target using gdb.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1864c2ca0c330dc4faeeb1312adac7a02c8281dc
This enables some future changes which will use per-thread
nvme_qpairs.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I1efcacfa6aedc970656633c9ce1393dc9b4fdbcc
This breaks out the resources needed to perform
aio-based I/O into a separate data structure, as a step
towards future patches that will enable per-thread
resources to allow parallel I/O without synchronization.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I84b95713133f9c411863ff0aeef8f886a08e0857
While here, also break out a new ioat_poll() function which
takes the ioat_channel as a parameter. This will be reused
for some future refactoring.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I9c03577e8d90d9bbd4d7adb9c186f21f54b85e82
This moves towards a single pair of functions where code can be placed
that must execute on the polling thread before the poller starts execution
and after the poller stops execution.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I2df7bacaa7b173f495c41c7cc79bafae53a57729
This list was originally intended to ensure blockdev I/O operations
with a malloc backend would not be completed until after the blockdev
I/O submission routine completed. This is no longer necessary, since
blockdev I/O completion operations are now handled by events. Removing
this simplifies the memcpy copy engine implementation significantly.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I4d318bed996694e49946d67baa3c2403d4bbef7a
ibv_poll_cq is a relatively expensive call, so take
steps to begin minimizing the number of times it is called.
Change-Id: I6fc64979604220eb8cacd612b46e3a3b1bca0924
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The out-of-bounds case in the bit array accessors should not happen
normally, so help the compiler order the basic blocks correctly so that
the in-bounds case is the fallthrough path.
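One way to express this hint with GCC or Clang is __builtin_expect(). The local unlikely() macro and the bit array layout below are assumptions made for a self-contained sketch, not the actual SPDK code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local branch hint (GCC/Clang builtin): tells the compiler cond is usually false. */
#define unlikely(cond) __builtin_expect(!!(cond), 0)

/* Hypothetical fixed-capacity bit array (up to 256 bits in this sketch). */
struct bit_array {
	uint32_t num_bits;
	uint64_t words[4];
};

bool bit_array_get(const struct bit_array *ba, uint32_t bit)
{
	if (unlikely(bit >= ba->num_bits)) {
		return false;   /* out of bounds: cold path, placed out of line */
	}
	return (ba->words[bit / 64] >> (bit % 64)) & 1;
}

int bit_array_set(struct bit_array *ba, uint32_t bit)
{
	if (unlikely(bit >= ba->num_bits)) {
		return -1;
	}
	ba->words[bit / 64] |= UINT64_C(1) << (bit % 64);
	return 0;
}
```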
Change-Id: Id778e724b3a58c17c728b8544c2653c60d90a6ba
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
My previous PDU leak fix broke the handling of
large reads; this patch fixes it.
Change-Id: Ic3f654527f7addd4ee45aad53a752de72a84edfd
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This matches the general order (LBA start then LBA count) for
the NVMe API.
While here, fix a copy/paste error in a debug message (write
instead of writev).
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ice326af5d6025867dffed4d1f6c7b81fb9eba5eb
Set status code to invalid opcode when opcode is not supported
in nvmf_process_discovery_cmd.
Change-Id: Ibab8097e536f26f16c322d5f539277688906cfc3
Signed-off-by: Liang Yan <liang.z.yan@intel.com>
Rather than forcing the NVMe library user to pass a specially-allocated
block of memory (e.g. rte_malloc() in the case of the default
nvme_impl.h), just make the NVMe library allocate a suitable buffer
itself and copy to/from the user buffer as needed.
The fast path I/O functions still require special rte_malloc()
allocations, since we don't want to add an allocation and copy to the
I/O critical path.
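A simplified, synchronous sketch of the bounce-buffer approach for a non-fast-path command. dma_alloc()/dma_free() stand in for the DMA-safe allocator (rte_malloc() in the default nvme_impl.h), fake_get_log_page() stands in for the device, and get_log_page() is likewise a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the DMA-safe allocator; plain malloc() is used only for the sketch. */
static void *dma_alloc(size_t size) { return malloc(size); }
static void dma_free(void *buf) { free(buf); }

/* Hypothetical device operation that writes into a DMA-safe buffer. */
static void fake_get_log_page(void *dma_buf, size_t len)
{
	memset(dma_buf, 0xAB, len);
}

/* Library-side wrapper: the caller may pass any ordinary buffer. */
int get_log_page(void *user_buf, size_t len)
{
	void *bounce = dma_alloc(len);

	if (bounce == NULL) {
		return -1;
	}
	fake_get_log_page(bounce, len);   /* device fills DMA-safe memory */
	memcpy(user_buf, bounce, len);    /* copy the result to the caller's buffer */
	dma_free(bounce);
	return 0;
}
```

The extra allocation and copy are acceptable here only because, as noted above, these commands are off the I/O critical path.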
Change-Id: I7fe88c0ba60c859a33bbe95b7713f423c6bf1ea8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The spec does not define NQNs as case-insensitive, so replace the
strcasecmp() matching of NQNs with strcmp().
Change-Id: I5946d9ee8e1d0aa5966e9b1b3c6f14f3f5119aec
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This fixes a PDU memory leak. The cause was that
we did not correctly handle the read PDU task.
Change-Id: I719c87fe7825537b9c77f5ee7e0816671de4c051
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
1 Rename this function to make its name more meaningful, since
spdk_nvmf_session_connect already exists and is used to link a
connection to the session.
2 Split spdk_nvmf_session_destruct.
Change-Id: I150df7ccdf4de3428d8cecbb286d5f7944510a8c
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Fix copy-and-paste errors - when polling the recv CQ, we should print
"Recv" instead of "Send" in log messages.
Signed-off-by: Roland Dreier <roland@purestorage.com>
This can just directly assign the completion instead
of calling memcpy.
Change-Id: I07819c824eba45245b00fa3538a99bc81bcb9fcc
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This function always shows up as one of the hottest functions when
profiling. I believe it is the memset that is expensive, so instead
use default initialization when the wr is declared on the stack
and just set the members that need to be updated in the function.
Also make the function inline for good measure.
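A sketch of the pattern, using a hypothetical struct send_wr rather than the real struct ibv_send_wr: the caller zero-initializes the work request once at declaration, and the inline helper writes only the per-request members. Whether this is measurably faster than memset() depends on the compiler and struct size; the commit message itself only suspects the memset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical work request, standing in for struct ibv_send_wr. */
struct send_wr {
	uint64_t wr_id;
	void *sg_list;
	int num_sge;
	int opcode;
	unsigned int send_flags;
	uint64_t reserved[8];   /* members this path never changes */
};

/* Set only the members that differ per request. The caller is expected to
 * have zero-initialized *wr at declaration (struct send_wr wr = {0};),
 * so no memset() is done here. */
static inline void fill_send_wr(struct send_wr *wr, uint64_t id, void *sgl, int opcode)
{
	wr->wr_id = id;
	wr->sg_list = sgl;
	wr->num_sge = 1;
	wr->opcode = opcode;
}
```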
Change-Id: I29e24cdd375311fa033b5a6df772ff4f73e35302
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Free the session resources if an error occurs
while creating a new session.
Change-Id: I7c4f3e779e0b30e213e02b8676d93bd2fe9bf851
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The application is now entirely responsible for scheduling subsystem
pollers and sending events between threads.
Change-Id: I88da1f53b5e8852c7c4acd6f0a7a1e2219fbed41
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reason: acceptor_poller_unregistered_event directly
calls spdk_nvmf_check_pools and spdk_app_stop,
which causes the memory check to fail.
In addition, nvmf_delete_subsystem_poller_unreg will
never be called, since spdk_app_stop has already been called.
Change-Id: I3ffa30c87b149a66cee1d87d1bb81d4dc8cc96b9
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Flush the data in the PDUs to the client if the PDUs are ready and sequential.
Change-Id: Idf0ec0c7f6058790a85407dff324900fd36c9527
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Our SCSI translation layer only fills 4 version descriptors,
meaning the last 30 bytes of the 96-byte standard inquiry
data format are not used. Some compliance tests expect
the full 96 bytes to be returned, even if they are unused.
So zero the remaining bytes (up to 96) if those bytes were
allocated.
This fixes a regression introduced by recent commit d3b58c006.
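The arithmetic behind the 30-byte figure: version descriptors 1-8 occupy bytes 58-73 of standard INQUIRY data, two bytes each, and bytes 74-95 are reserved. Filling four descriptors writes bytes 58-65, so 66 bytes are filled and 96 - 66 = 30 bytes remain. A minimal sketch of the zero-fill, where inquiry_zero_tail() is a hypothetical helper and filled is the number of bytes already written:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STD_INQUIRY_LEN 96   /* full standard INQUIRY data size */

/* Zero the unused tail of the INQUIRY data, never past 96 bytes and never
 * past the allocated buffer. Returns the number of valid bytes afterwards. */
size_t inquiry_zero_tail(uint8_t *data, size_t filled, size_t alloc_len)
{
	size_t end = alloc_len < STD_INQUIRY_LEN ? alloc_len : STD_INQUIRY_LEN;

	if (filled < end) {
		memset(data + filled, 0, end - filled);
		return end;
	}
	return filled;
}
```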
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Id61614b904b5dff39f034b7ba4da624be1b25bae
The translation code currently cheats a bit - it allocates a full 4KB
buffer for any DATA_IN command that is not a READ, and then the
different SCSI commands that fall into this category (INQUIRY,
READ_CAPACITY, MODE_SENSE, etc.) can write as much data as they
want without having to worry about a buffer overrun. Code higher
up the stack makes sure we only send the correct amount of data back
to the iSCSI initiator.
This patch fixes this behavior for standard INQUIRY (EVPD = 0).
Future patches will fix the behavior for other non-READ DATA_IN
commands, at which point we can remove the 4KB allocation and
only allocate the amount of data specified in the CDB.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If5e4a10eeba9851e2d91cab71228d2fc2d5baad0
The "+" is incorrect; it should be "-". Currently,
the bug does not show up because the offset is 0,
so + and - give the same result. But if we change the location
of spdk_nvmf_conn or spdk_nvmf_request, this bug
will be exposed.
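The commit message does not show the expression, but this symptom is typical of a container_of()-style computation that recovers an enclosing struct from a pointer to one of its members; that is an assumption here. The member's offset must be subtracted, and with an offset of 0, adding it yields the same address, which hid the bug. The CONTAINER_OF macro and both structs below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to its member: subtract the offset. */
#define CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct req_at_zero { int req; int other; };   /* offsetof(req) == 0 */
struct req_moved   { int other; int req; };   /* offsetof(req) != 0 */
```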
Change-Id: Ib358dc729da901a69442d0402a6089989f49b05c
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The table of bdev function pointers should not need to be modified at
runtime.
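A sketch of the idea with a hypothetical function table (not the real SPDK bdev structure): declaring the instance const lets the compiler place it in read-only data, so direct assignments are rejected at compile time and a stray write through a cast typically faults instead of silently changing the I/O path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function table; not the real SPDK bdev structure. */
struct bdev_fn_table {
	int (*destruct)(void *ctx);
	int (*submit_request)(void *ctx, int io_type);
};

static int null_destruct(void *ctx)
{
	(void)ctx;
	return 0;
}

static int null_submit_request(void *ctx, int io_type)
{
	(void)ctx;
	return io_type;   /* placeholder: echo the request type back */
}

/* const: eligible for the read-only data section, and an assignment such as
 * null_fn_table.destruct = NULL; fails to compile. */
static const struct bdev_fn_table null_fn_table = {
	.destruct = null_destruct,
	.submit_request = null_submit_request,
};
```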
Change-Id: I3e8876fc83df9296ce528231269b1a905c96072c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If a bdev doesn't need to be polled, allow it to specify NULL for the
check_io function pointer to indicate that no poller needs to be
registered.
This will be useful for virtual blockdevs that don't have any associated
hardware to poll.
Change-Id: I0ef8f848587b0c200296805ccc710340dde683b5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
When an I/O with children is being freed, also free its child I/O
requests that were allocated via spdk_bdev_get_child_io().
Change-Id: I2d44aed845c1035ae8f8cb07c5992da855f1dc99
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This callback was only used for freeing buffers, but the buffers are now
managed by the bdev core, so none of the free_request callbacks actually
do anything.
Change-Id: Icfe2e6169e829159dda5e3d75a27d8f040de07c6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add unmap support to the ramdisk block device for testing purposes.
Change-Id: Ibeb5530b2b5a31603d09d2d1de07760f32dea0f8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The bdev layer can be used independently of iSCSI, so fix the
misleading names.
Change-Id: I3fd5b113403acdd7578ce93234dde0fd4f148e96
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Check that the number of blocks/ranges in the command fits within the
length specified by the SGL.
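A sketch of the two checks, assuming the NVMe conventions that the Read/Write NLB field (CDW12 bits 15:0) and the Dataset Management NR field (CDW10 bits 7:0) are 0-based, and that each Dataset Management range is 16 bytes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSM_RANGE_SIZE 16u   /* bytes per Dataset Management range */

/* NLB is 0-based: 0 means one block. */
bool rw_len_ok(uint16_t nlb, uint32_t block_size, uint32_t sgl_len)
{
	return ((uint64_t)nlb + 1) * block_size <= sgl_len;
}

/* NR is 0-based as well: 0 means one range. */
bool dsm_len_ok(uint8_t nr, uint32_t sgl_len)
{
	return ((uint32_t)nr + 1) * DSM_RANGE_SIZE <= sgl_len;
}
```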
Change-Id: I21aded797dc1f1e752fe0bc9cec27310a4fb106a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The Dataset Management command allows several operations to be specified
at once; the virtual controller only supports deallocate for now, but it
should just ignore the other bits in order to be spec compliant: "If the
Dataset Management command is supported, all combinations of attributes
[...] may be set".
The spec also explicitly states that it is acceptable for controllers to
choose to take no action based on information provided, so not
implementing the other attributes is fine.
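Honoring only the Deallocate attribute while ignoring the others can be as simple as testing one bit. The CDW11 bit positions below (IDR bit 0, IDW bit 1, AD bit 2) are from the NVMe specification; dsm_should_deallocate() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* CDW11 attribute bits of the Dataset Management command. */
#define DSM_ATTR_IDR (1u << 0)   /* integral dataset for read */
#define DSM_ATTR_IDW (1u << 1)   /* integral dataset for write */
#define DSM_ATTR_AD  (1u << 2)   /* deallocate */

/* Returns 1 if a deallocate should be performed, 0 if the command can simply
 * complete successfully. Other attribute bits are ignored, not rejected. */
int dsm_should_deallocate(uint32_t cdw11)
{
	return (cdw11 & DSM_ATTR_AD) != 0;
}
```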
Change-Id: Ia989dc1faa9c852660bf1299ea18fa8e7bdf4053
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Also add a diagnostic message if the requested log page ID is not
supported.
Change-Id: I7551b5905d5ebc29356839f0f9153dc86f237106
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rather than comparing the bdev name against "NVMe", use the new I/O type
supported API to query whether the unmap operation is supported.
Change-Id: I62c7a1ea5529366ff2ae4723b62f24ea78aa8193
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Bdev modules need a separate interface from the one used by
public consumers of the blockdevs.
Change-Id: I581ee493570c114f7e96b31a425bc077a791c71e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This compilation unit depends on bdev.h definitions, but
was only getting them due to #include ordering elsewhere.
Change-Id: I4fcbdb2582a40836bcabc3539cc558614fbfacfd
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Some block devices do not support the unmap operation, and we may add
other optional I/O types in the future. Add a method to check which I/O
types a specific block device supports.
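A sketch of such a query, assuming a per-module callback; enum io_type, struct bdev and both functions are hypothetical names, not the SPDK API. In this sketch a module that provides no callback is treated as supporting nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical I/O types and module hook; names are illustrative only. */
enum io_type { IO_TYPE_READ, IO_TYPE_WRITE, IO_TYPE_UNMAP, IO_TYPE_FLUSH };

struct bdev {
	const char *name;
	bool (*io_type_supported)(const struct bdev *bdev, enum io_type type);
};

/* A backend without unmap support reports only the types it handles. */
static bool basic_io_type_supported(const struct bdev *bdev, enum io_type type)
{
	(void)bdev;
	switch (type) {
	case IO_TYPE_READ:
	case IO_TYPE_WRITE:
	case IO_TYPE_FLUSH:
		return true;
	default:
		return false;
	}
}

/* Generic query used by consumers instead of checking the bdev name. */
bool bdev_io_type_supported(const struct bdev *bdev, enum io_type type)
{
	return bdev->io_type_supported != NULL && bdev->io_type_supported(bdev, type);
}
```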
Change-Id: I6e6414bf6b6482ea0224022d8326b252bd363c7f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Switch from the non-portable <sys/endian.h> functions (htobeXX/beXXtoh)
to the SPDK endian conversion functions.
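For illustration, byte-wise helpers like the ones below are portable because they never depend on host byte order or on <sys/endian.h>. The names store_be32()/load_be32() are hypothetical and are not the SPDK functions this commit switches to.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian order, one byte at a time. */
static inline void store_be32(void *out, uint32_t v)
{
	uint8_t *p = out;

	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a big-endian 32-bit value, independent of host byte order. */
static inline uint32_t load_be32(const void *in)
{
	const uint8_t *p = in;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```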
Change-Id: Id49b87f2e536c68f0d5d567e78e1990c0a37ef14
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Intel DC P3*** NVMe devices specify a desired stripe size, which was
used for splitting I/O. Not all devices, however, specify a desired
stripe size (such as the Intel DC D3*** line), and for these devices
only, there was a logic mistake that overwrote the maximum I/O
size with a 2MB default. This patch corrects that error.
Change-Id: I94b72a3a3dd1dfa18bd638daf7e01a592eb6ed17
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Move the NQN validation into the subsystem creation function, and fix the
allowed size to match the spec.
The spec is not clear about the allowed NQN size; for now, interpret it
as 223 bytes, including the null terminator (222 bytes of actual NQN
plus one terminator byte).
Change-Id: If9743ab2fe009d9d852e8b03317d9b38d8af18dc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
DPDK 16.07 introduced a new PCI ID field for matching by class code
instead of vendor/device ID. Use it to match all NVMe devices instead of
explicitly listing vendor and device ID pairs.
Change-Id: Ib2a5cc6833bf2b793d37d77caab97207f365df8f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
SUBNQN is a UTF-8 null terminated string according to the NVMe base
spec, so pad it with zeroes using strncpy().
Change-Id: I486161b26d91f3ea1fd17428e220b9f20a874732
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These are specified as "ASCII string", which means they should be
left-aligned and padded with spaces, according to the NVMe base
specification.
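A sketch of a fixed-field copy helper whose parameter order follows strncpy() (destination, source, size) plus a pad byte: ' ' for "ASCII string" fields such as the serial or model number, '\0' for null-terminated fields such as SUBNQN. strcpy_pad() is a hypothetical name here.

```c
#include <assert.h>
#include <string.h>

/* Copy src into a fixed-size field of size bytes, filling the rest with pad.
 * If src is size bytes or longer, it is truncated and no terminator is
 * written (as with strncpy()). */
void strcpy_pad(void *dst, const char *src, size_t size, int pad)
{
	size_t len = strlen(src);

	if (len < size) {
		memcpy(dst, src, len);
		memset((char *)dst + len, pad, size - len);
	} else {
		memcpy(dst, src, size);
	}
}
```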
Change-Id: I25babe0ca417c2e16137b0bfc41fc7834277114e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will be useful outside of the SCSI code, so put it in the common
string utility file.
Also reorder the parameters so they match the order used in strncpy().
Change-Id: I9e25a59b64e4bedf04e5a96de463b1d8aa0ddac3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Clean up the poller and only then free the associated subsystem's
memory. This prepares for future dynamic subsystem creation/deletion.
Change-Id: I9e56cbf8822814930fdbb662095c51b6ad40fbc4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Currently the NVMf target listens for new connections on any address.
Instead, listen only on the addresses specified by the user.
Change-Id: Idb6d37c422e442fc70a8673bd3fcfb9c27b57828
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
assert is part of the C standard library and is available
on any platform we'd consider porting to. Don't put a
wrapper around it.
Change-Id: I0acfdd6a8a269d6c37df38fb7ddf4f1227630223
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
pthreads are widely supported and are available on any
platform we currently foresee porting to. Use that API
instead of attempting to abstract it away to simplify
the code.
Change-Id: I822f9c10910020719e94cce6fca4e1600a2d9f2a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
pthreads are widely supported and are available on any
platform we currently foresee porting to. Use that API
instead of attempting to abstract it away to simplify
the code.
Change-Id: I28123d427ea8da07c6329b0233f0702f2d85c2a0
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Comments are not allowed in the JSON RFC, but some JSON libraries accept
JavaScript-style comments.
Add a flag that enables non-spec-compliant comment parsing.
Change-Id: I9dfb66bb46ecff1a22d8af5a9c50620686a4707c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the event framework's new delay parameter to allow
for idle cores to sleep for up to 1ms at a time.
Change-Id: I665f38e590c07338418892afe0e75b0b2c79706e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
It is no longer needed, since the nvmf_tgt app handles initialization
and shutdown.
Change-Id: I051afe2b4fcbd09b32998386c63f591a0ab343c2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The user can now specify a maximum delay, in microseconds, that
defines the maximum amount of time a reactor will sleep for
between polling for new events. By default, the time is 0
which means the reactor will never sleep.
Change-Id: I94cddb69c832524878cad97b66673daa4bd5c721
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This will be used in future patches outside the library.
Change-Id: I1fcf5709944a884e161e5a6a9eaec033a995a812
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe over Fabrics target library now exposes a simple function call
that polls the acceptor once, and the application handles registration
of the poller.
Also rename the transport function pointers related to the acceptor so
they better reflect their purpose.
Change-Id: I5fa0d516586bf17e73afeb88ff3c2d5b0d46794d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will become more important when other transports are added.
For now, it is also useful to be able to start nvmf_tgt on systems
without RDMA hardware.
Change-Id: I6b9002cc7711f928c4e6b73adcd9b677349ebdd6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
spdk_shutdown_nvmf_subsystems() was removing the subsystem from the
list, but nvmf_delete_subsystem() also wants to remove it, so drop the
extra removal.
Also rewrite the shutdown loop as a TAILQ_FOREACH_SAFE() to make the
static analyzer happy (and make it more obvious that the loop will
terminate).
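A sketch of the corrected shutdown loop. glibc's <sys/queue.h> does not provide TAILQ_FOREACH_SAFE, so the BSD definition is supplied when it is missing. struct subsystem, delete_subsystem() and shutdown_subsystems() are hypothetical stand-ins, with delete_subsystem() as the only place that unlinks an entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* BSD definition: tvar saves the next element before var can be freed. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = TAILQ_FIRST((head));				\
	     (var) && ((tvar) = TAILQ_NEXT((var), field), 1);		\
	     (var) = (tvar))
#endif

struct subsystem {
	int id;
	TAILQ_ENTRY(subsystem) link;
};

TAILQ_HEAD(subsystem_list, subsystem);

/* Stand-in for nvmf_delete_subsystem(): unlinks the entry, then frees it. */
static void delete_subsystem(struct subsystem_list *list, struct subsystem *s)
{
	TAILQ_REMOVE(list, s, link);
	free(s);
}

/* The loop itself does not remove anything; tmp is read before s is freed. */
void shutdown_subsystems(struct subsystem_list *list)
{
	struct subsystem *s, *tmp;

	TAILQ_FOREACH_SAFE(s, list, link, tmp) {
		delete_subsystem(list, s);
	}
}
```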
Change-Id: Iccadafa77d9cd3e26be21c0f11e62cfc1ef0197c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Verify that the record format is the one we support (only 0 is defined
by the spec for now).
Change-Id: Iddf038b381e540134abf572e0545c97a0ef71d5f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The spec requires that NQNs are null terminated and maximum of 223 bytes
long, despite the Connect command fields being larger (256 bytes), so
add checks for both subsystem NQN and host NQN before using them as null
terminated strings.
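A minimal sketch of that check: the NQN fields are 256 bytes, but a valid NQN must contain its terminator within the first 223 bytes (at most 222 characters plus the null byte). nqn_is_valid() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NQN_FIELD_LEN 256   /* size of each NQN field in the Connect data */
#define NQN_MAX_LEN   223   /* maximum NQN size, including the terminator */

/* True if a NUL appears within the first NQN_MAX_LEN bytes, so the field
 * can safely be used as a C string. */
bool nqn_is_valid(const char *field)
{
	return memchr(field, '\0', NQN_MAX_LEN) != NULL;
}
```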
Change-Id: I343d9e44a09ab4d0f6654feba460b31e976c4e56
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe device is bound to the UIO driver to keep the native kernel
NVMe driver from claiming it, but INTx interrupts are still enabled
for the admin queue. Since all admin queue completions are processed
in user space, INTx interrupts are no longer needed.
Change-Id: Ife5b3e410ae95690ed0f3f9a2f2dfaf55a7797b5
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Users can specify the core on which each subsystem and the acceptor listen
routine run, so they can be placed on different cores for performance.
Change-Id: I4bd1a96f39194c870863b4b778e6ea7cf8fc1a2d
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
This is causing issues during shutdown because the poller removal is not
synchronized with the rest of the cleanup path.
This reverts commit 7dfc5e922d.
Change-Id: If95c4b72c5d120f18bdc3db6d7d532ad1aada642
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The lcore_id field in the get_iscsi_connections RPC was removed in
commit 5d8c94536a7d1d4c1f0ee3349188bf0e7e8c9e74; add a field to
spdk_iscsi_conn to track the lcore so this can be re-added.
Change-Id: I6c9574829466b168880728f4620401987fc7dd3c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This should enhance performance, since the hardware admin queue poll
function takes a mutex and should not be in the performance path.
Change-Id: I7e4acde0337aaf7079811612cba5348acf0a467d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This leaves more flexibility for future changes to the poller
representation without requiring API changes (after this one).
It also prevents the user from accidentally using poller fields in a
non-thread-safe way, since they can't be accessed directly anymore.
Change-Id: I7677d5b93668665d29ae39c5e0ba74333ad3f878
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Replace other critical rte_zmalloc() sites that actually depend on the
memory being zeroed.
Change-Id: If6856ad44a4c50869811d3ce9411c993ce88018d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The Linux block layer uses the maximum transfer length field to
split I/Os larger than this value. Set the field according to
the iSCSI target's limitations.
Change-Id: I03ee35bb96f0949418bb976a6c8013f88622a324
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Allow the tables to be in the read-only data section.
Change-Id: I58199a86d4d44dbad7baed397b2e148c45b3a3de
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
rte_zmalloc() is broken and does not actually return zeroed memory on at
least DPDK 16.07 on FreeBSD, so do it ourselves.
Change-Id: If8da93ead0b3911c8bca24aa27ed90dc00b8a9a4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For VPD pages 0xB1 and 0xB2, the SCSI target did not return the correct
page length to the initiator; return the correct value.
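For reference, the PAGE LENGTH field of a VPD page (bytes 2-3, big-endian) counts only the bytes after the 4-byte header, so a 64-byte page reports 0x003C. A sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Fill PAGE LENGTH (bytes 2-3, big-endian) for a VPD page whose total size,
 * header included, is total_len bytes. */
void vpd_set_page_length(uint8_t *page, uint16_t total_len)
{
	uint16_t len;

	assert(total_len >= 4);   /* the 4-byte header is not counted */
	len = (uint16_t)(total_len - 4);
	page[2] = (uint8_t)(len >> 8);
	page[3] = (uint8_t)(len & 0xFF);
}
```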
Change-Id: Ic17d804ca00d490fd6a2f833db5c9b73ce8dc160
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
This value was incremented and decremented, but it was never used
otherwise.
Change-Id: I6e83a504cf2ef4043363ca04b77556c612068658
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
In files that don't otherwise use DPDK, switch to the standard C library
assert().
Change-Id: I79756908ecf9a2e141b036321e42309db30b5e0f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe submission queue head wraparound point can be determined in the
generic NVMe over Fabrics layer; it should not be using the RDMA
connection queue depth.
Change-Id: I9da8f09e4f057f8fdc1ff4c6cc5f48cea7123e11
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Report the maximum admin queue size correctly.
Change-Id: I52cad654bf59806e0abb8d869c22973647056617
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the max_queue_depth parameter rather than rdma_conn->max_queue_depth
so that we can start to eliminate rdma_conn->max_queue_depth.
Change-Id: I1670c634e6d12aa004fb5a10338b7624850fbc4a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There were two unchecked allocations in the nvmf library. Check
for allocation failures.
Change-Id: Ic6b3104d825dba1ee6bd1748fa99e132702f300c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This fixes a static analysis warning for unsigned/signed
mismatch.
Change-Id: I49bd8d6d195f13b402e14a85503a5de6114f5b7f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is the size of a logical block in bytes; 4 GB is more than enough.
Also allows cleaning up casts to uint32_t in the SCSI translation layer.
Change-Id: I3ec2e2f41fd378f1a83f31aac25c46ef780f63e9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The large buffer pool allocation was using the per-connection queue
depth, whereas the RDMA memory region registration was using the global
RDMA max queue depth. These sizes need to match, so use the global RDMA
max queue depth for both calls.
Change-Id: Iae161b719e09e19ca3e81df6593b68a4a2e86614
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is a step towards enabling sharing SPDK NVMe
device access from multiple processes using DPDK's
multi-process framework.
Change-Id: I57d5eec158b42addc1036bd2583596471a467a95
Signed-off-by: GangCao <gang.cao@intel.com>
Similar to our NVMf target, this is an iSCSI target that
can interoperate with the Linux and Windows standard iSCSI
initiators.
Change-Id: I6961c5ef99f7b161c396330ed5b543ea29b0ca7b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is a useful abstraction when you want to plug in
a userspace networking layer instead of using the kernel.
Change-Id: I7039d2987e6abad9dcd1987fa105282b1598e2f5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The public header file was missing some required definitions.
Change-Id: Ic4f8028367b1e21ea00c02660ca36be28da54e37
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Use the new timer-based poller functionality to replace rte_timer.
Change-Id: Ic40653306cc73b40139fe18e06bab29b35721a43
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Allow pollers to be scheduled to be run periodically every N
microseconds instead of every iteration of the reactor loop.
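A sketch of one way to schedule such a poller, assuming a monotonic tick counter of known frequency (for example the TSC); struct timed_poller and both functions are hypothetical. The period is converted to ticks once, and each reactor iteration only compares the current tick against the next deadline. The multiplication in the conversion can overflow for extremely large periods, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct timed_poller {
	uint64_t period_ticks;    /* 0 means run on every reactor iteration */
	uint64_t next_run_tick;
};

void timed_poller_init(struct timed_poller *p, uint64_t period_us,
		       uint64_t ticks_hz, uint64_t now)
{
	p->period_ticks = period_us * ticks_hz / 1000000;
	p->next_run_tick = now + p->period_ticks;
}

/* Called once per reactor iteration; returns true if the poller should run now. */
bool timed_poller_due(struct timed_poller *p, uint64_t now)
{
	if (now < p->next_run_tick) {
		return false;
	}
	p->next_run_tick = now + p->period_ticks;
	return true;
}
```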
Change-Id: Iaea3e98965d81044e6dc5ce5f406bcb7a455289e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Just getting a reference to a bdev should not claim it.
Change-Id: I21e07160662490ec95b52fa31ea1d2ae93a21f09
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Combine the necessary functionality with the main bdev file.
Change-Id: I96d796bc87ac2a8688cdf1fd3c16d2a7c8aef730
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The rte_ring used for pollers is already single-producer and
single-consumer, so it is not providing any thread safety guarantees.
All modifications to the active_pollers ring are done from the core that
is running the reactor (via events). This means the rte_ring can be
replaced with a simpler intrusive linked list.
This simplifies the removal of pollers in the middle of the list and
avoids extra allocations for the ring.
Change-Id: Ica149b7a1668a8af1e6ca8f741c48f2217f6f9bf
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
We report virtualized NVMe devices through the NVMe over Fabrics specification,
with NVMe version 1.2.1. In direct mode, the NVMe device may implement a lower
version, such as 1.0, which does not support the identify namespace list, so
add a helper function to emulate such commands from the
initiator.
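A sketch of such emulation for the active namespace ID list (Identify CNS 02h): per the NVMe specification it returns up to 1024 NSIDs greater than the NSID given in the command, in increasing order, with unused entries zeroed. The assumption that namespaces 1..num_ns are all active, and the function name, are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NS_LIST_ENTRIES 1024   /* 4096-byte list of 32-bit NSIDs */

/* Assumes namespaces 1..num_ns are all active (illustration only). */
void emulate_active_ns_list(uint32_t *list, uint32_t num_ns, uint32_t start_nsid)
{
	uint32_t nsid;
	uint32_t i = 0;

	memset(list, 0, NS_LIST_ENTRIES * sizeof(*list));   /* unused entries are zero */
	if (start_nsid >= num_ns) {
		return;   /* no active NSID is greater than start_nsid */
	}
	/* nsid != 0 stops the loop if the counter wraps around. */
	for (nsid = start_nsid + 1; nsid != 0 && nsid <= num_ns && i < NS_LIST_ENTRIES; nsid++) {
		list[i++] = nsid;
	}
}
```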
Change-Id: I226f4f34bf61017f538d2dd80332f1d054a501f1
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Allow higher queue depths by allowing many more send/recv
operations than read/write.
Change-Id: I66c424a6463e5e09be6d5463667241ce9271404b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The target can only provide updates to sq_head inside
of completions. Therefore, we must update sq_head prior
to sending the completion or we'll incorrectly get into
queue full scenarios.
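A sketch of the ordering: advance sq_head (wrapping at the queue size) before filling the completion, so the completion carries the updated head. struct queue_state, struct completion and complete_cmd() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct queue_state {
	uint16_t sq_head;
	uint16_t sq_size;   /* number of submission queue entries */
};

struct completion {
	uint16_t sqhd;      /* SQ head pointer reported to the host */
	uint16_t cid;
};

/* Consume one SQ entry and report the updated head in the completion, so the
 * host learns about the freed slot as soon as it sees this completion. */
void complete_cmd(struct queue_state *q, struct completion *cpl, uint16_t cid)
{
	q->sq_head = (uint16_t)((q->sq_head + 1) % q->sq_size);
	cpl->sqhd = q->sq_head;
	cpl->cid = cid;
	/* ... only after this would cpl be handed to the transport ... */
}
```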
Change-Id: If2925d39570bbc247801219f352e690d33132a2d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This allows the target to poll for internal completions
at higher priority.
Change-Id: I895c33a594a7d7c0545aa3a8405a296be3c106fb
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This ensures that the data buffers are not in use
when we go to send the completion.
Change-Id: I30467b3e3964001150f81b21e5b695dcd0974b0c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is useful for holding session-wide buffer pools.
Change-Id: I7024da24b210a2205bf1e159d5935e0093b81120
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
For small SGLs, even if they are keyed and not inline, use the
buffer we allocated for inline data.
Change-Id: I5051c43aabacb20a4247b2feaf2af801dba5f5a9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Read/Write depth is much lower than Send/Recv depth.
Calculate them separately to prepare for supporting
a larger number of receives than read/writes.
Currently, the target still only exposes a queue depth
equal to the read/write depth.
Change-Id: I08a7434d4ace8d696ae7e1eee241047004de7cc5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
These don't actually work quite yet, but pipe the
configuration file data through to where it will
be needed.
Change-Id: I95512d718d45b936fa85c03c0b80689ce3c866bc
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
For each connection, allocate a single buffer for each
of the following: requests, inline data buffers, commands,
and completions.
Change-Id: Ie235a3c0c37a3242831311fa595c8135813ae49e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This can be used to release requests that don't
require a completion to be sent.
Change-Id: I8fb932ea8569bf3c45342d9fa4e270af5510c60c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Port IDs indicate hardware failure domains according
to the NVMf specification, which means they should
indicate which transport addresses are on the same
NIC. Unfortunately, that doesn't really make sense for
IP-based fabrics because IP addresses can move. The
safest way to present this is to show all IP addresses
as part of different subsystem ports.
Change-Id: I056a50c69be70b4fbf1f896e684ce65bd792241e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The NVMe over Fabrics 1.0 spec corresponds to the NVMe base spec version
1.2.1, so we should pretend to be at least that new.
Change-Id: I36fc44c780de01d6c666e87b803cd47dba0e74c5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
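For reference, the NVMe Version (VS) register packs the major version in bits 31:16, the minor version in bits 15:8 and the tertiary version in bits 7:0, so 1.2.1 is 0x00010201. A small sketch of "pretend to be at least that new"; `nvme_vs()` and `nvmf_reported_vs()` are hypothetical helper names:

```c
#include <stdint.h>

/* Pack an NVMe Version (VS) register value:
 * major in bits 31:16, minor in bits 15:8, tertiary in bits 7:0. */
uint32_t nvme_vs(uint16_t major, uint8_t minor, uint8_t ter)
{
	return ((uint32_t)major << 16) | ((uint32_t)minor << 8) | ter;
}

/* Report at least 1.2.1, the NVMe base version that NVMe over Fabrics 1.0
 * corresponds to; newer backing versions are passed through unchanged. */
uint32_t nvmf_reported_vs(uint32_t backing_vs)
{
	uint32_t min_vs = nvme_vs(1, 2, 1);

	return backing_vs > min_vs ? backing_vs : min_vs;
}
```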
These belong in nvme_spec.h anyway and are not used.
Change-Id: I889dfebee523dc5ae503fd0370bb800f1d17fb5d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is a leftover from a previous controller numbering scheme that is
no longer used.
Change-Id: I3058802f0324b0e38708111634ee993c6e884087
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move the ctrlr and io_qpair out of spdk_nvmf_subsystem, package them
as a new data structure. Union the direct and virtual mode namespaces.
Change-Id: I839aee3372c6c57aa03a0be76f8aaeb5045ecdaf
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
CAP.CQR indicates whether contiguous queues are required; this is
meaningless in NVMe over Fabrics, since queue creation is handled
implicitly for each connection, but the spec requires it to be set to 1.
Change-Id: I6b05954eefa6928beecd7a640bbbdbd835c6b69a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the size of the applicable structs directly.
Change-Id: I4a65de548d409c9962b11a75d3fde2bfe434a3ec
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvmf_create_subsystem() already copies the name, so the strdup() in the
caller is unnecessary.
Change-Id: I225f0f077fee30051b197a4b1d7276b113ec6b01
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It isn't actually necessary to drain the cq before
destroying it.
Change-Id: I6f77ae578176a14b5de935274a14cfd165229ec5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This logically belongs inside the session handling code, not
in the transport-specific layer.
Change-Id: I93b2271f38dbfc742162c98c40acb153c7e9022a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Track and print out the currently outstanding I/O in debug
mode with rdma tracing enabled.
Change-Id: I0a1f0cd6e22dbf21e18ca0ec7d0c2c6d194509e3
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Instead of reimplementing handling for checking the
completion queue, nvmf_rdma_accept can now call
the general purpose poller.
Change-Id: Id2c899d1e500a8cb8491e51cc101a1bf0e167764
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
AER breaks our current model of request/completion pairs.
Temporarily handle it by immediately re-posting the
capsule while we work on a real solution.
Change-Id: Ie7a4d88030b6fff5a11c4697eec0f024f9737f27
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Inline this code into the places that called it. These two
spots will be combined into a single path in a later patch.
Change-Id: Ice2f009ad56b783dc28ebbf1abbb877ce6000293
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is an RDMA-specific operation, so hide it inside
the transport-specific layer.
Change-Id: Iaa097e8dde78d820547b3a39e9717c992581340b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
These can be done at the same time now that the queue depth
is known ahead of time.
Change-Id: I7ecef30ebb4311e0a1c88f37461d34534f8600bf
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Calculate queue depth into a local variable without
touching the rdma_conn.
Change-Id: Ie804ed39ddecbf59015a4e4f7aa127f1381d9080
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Make sure the trace history that is exported via shared memory is always
the same size, regardless of DPDK configuration.
Also removes the necessity of including DPDK headers from spdk/trace.h
(so we have to fix up other files to include what they use).
Change-Id: I32f88921fd95c64a9d1f4ba768ae75e2ca5d91da
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is not currently configurable, but this will allow us to make the
discovery subsystem have config options (e.g. which lcore to run on).
Change-Id: I788a64ba4462b023453191e509ce8de59fd90ae4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is a much simpler approach and is only slightly
less efficient.
Change-Id: I909de376d576a74156c1be447e90e7dbc240f025
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Drop the redundant controller ready check.
nvmf_process_io_cmd() was checking CSTS.RDY, but this is not necessary,
since its only caller, spdk_nvmf_request_exec(), is already checking
CC.EN, which always matches RDY in our virtual controller
implementation.
The initialization of status is a dead store -
nvmf_complete_cmd() always writes the full response, and the only other
branch is the return immediately below the call, which also sets status.
Change-Id: I1ec2b8a225a91c4b2997d8ab4f45d050cc216de3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
No reason to use DPDK in this file just for an equivalent to assert().
Change-Id: Ic6932a16d0a36cd1a3cb25c8cc5e295c59f3e2db
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Temporarily set the in-capsule data size to the maximum data transfer
length. This should actually be updated by the transport layer, but for
now, the only transport (RDMA) supports the full bounce buffer size.
Also drop the check that prevents admin connections from using
in-capsule data; the host may send in-capsule data for the Connect on an
I/O queue, and we don't know the type of connection until after Connect
is processed.
Fixes: 828dca7 ("nvmf: Move some stray session init code to the right place")
Change-Id: I369ee5497247d7e875ad0b6f0aaf6c47c1d3887c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make sure no response fields are left over from the previous command in
the spdk_nvmf_request.
Change-Id: I42937e991d9dd6550fd4bc9b6d0dd66b44c6b83e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The kernel driver unloading/loading code is Linux specific; replace it
with stubs on FreeBSD for now.
Change-Id: Ic67c1d89b2fb9a65e9ce5b88d27b6cd6af5554a7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
spdk_nvmf_request_complete() always sets CID to the value in the
command, so there is no need to set it in the command execution
functions.
Change-Id: Ibbe745b862e27fff7c55e553758ef093e3ef7f6d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the passthrough command for all Identify commands except Identify
Controller.
Also only check the CNS field of CDW10 and use the new enumerated names
instead of magic numbers.
Change-Id: Ia94f820ac85a2d6b2d0ae02659e73c53f1b1a4cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Drop the special-case preprocessor definition for PCI access library now
that config.h is available with an equivalent SPDK_CONFIG_PCIACCESS
define.
Change-Id: I4891d0f2fd7d3eea51b767df9e594555b36265ea
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If the initiator connects to the same subsystem twice, the NVMf target
rejects the second connection; however, the first connection is also
affected, because we destroy the connection ID before acknowledging the
disconnect event.
Change-Id: Ib597cc68a7823524460693053898f4d6e5499eb4
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
There is no need to handle Read and Write commands separately; the
generic raw I/O command case can handle them just as well.
Change-Id: I8475eed0a20bd809c447ed2ccac0b99f6c2a9b4d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Replace use of the newly-deprecated rte_mempool_count() with the new
name, rte_mempool_avail_count().
Also add a compatibility wrapper so that builds against older DPDK
versions still work.
Change-Id: If3c44bdef4bbcf7a456a1dfa272348ccc6f35261
Reported-by: Jay Sternberg <jay.e.sternberg@intel.com>
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The host is not allowed to send normal admin or I/O commands until the
controller is enabled (via the Fabric Property Set command).
Change-Id: Ib62be3a3792fc0b36bace28b4c9afdf78dad3bcd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Only allow Connect on a new connection (one that has no associated
session yet), and only allow Property Set/Get on admin queues.
Change-Id: Iae22379ee47b095333372e6d151a7a1509acf654
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
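A minimal sketch of these admission rules. The FCTYPE values (Property Set = 0x00, Connect = 0x01, Property Get = 0x04) come from the NVMe over Fabrics 1.0 specification; `struct conn` and `fabric_cmd_allowed()` are hypothetical and far simpler than the real connection state:

```c
#include <stdbool.h>
#include <stdint.h>

/* Fabrics command types (FCTYPE) from NVMe over Fabrics 1.0. */
enum fctype {
	FCTYPE_PROPERTY_SET = 0x00,
	FCTYPE_CONNECT      = 0x01,
	FCTYPE_PROPERTY_GET = 0x04,
};

/* Hypothetical, simplified connection state. */
struct conn {
	bool has_session; /* set once Connect succeeds */
	bool is_admin;    /* queue ID 0, known only after Connect */
};

/* Return true if this Fabrics command may be processed on the connection. */
bool fabric_cmd_allowed(const struct conn *c, uint8_t fctype)
{
	switch (fctype) {
	case FCTYPE_CONNECT:
		/* Only on a brand-new connection with no session yet. */
		return !c->has_session;
	case FCTYPE_PROPERTY_SET:
	case FCTYPE_PROPERTY_GET:
		/* Only on an established admin queue. */
		return c->has_session && c->is_admin;
	default:
		return false;
	}
}
```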
The NVMe spec requires that the I/O queue entry size values in CC are
set before any I/O queues may be created.
Change-Id: I4f0c9a9c20411223d281993745c85a8431197961
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Track each individual bit in the Set Property handler for CC, and fail
the request if any unhandled bits are modified.
Also add handlers for IOSQES and IOCQES (I/O submission and completion
queue entry size).
Change-Id: I374dc3c15197e029ba07fd9ee1cff0e38a0a884d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
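One way to "fail the request if any unhandled bits are modified" is to XOR the old and new CC values and compare the changed bits against a mask of handled fields. The field positions (EN bit 0, SHN bits 15:14, IOSQES bits 19:16, IOCQES bits 23:20) are from the NVMe specification; the macro names and `cc_set()` are hypothetical, and which fields count as "handled" is an assumption of this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* CC (Controller Configuration) field masks. */
#define CC_EN_MASK      (0x1u << 0)
#define CC_SHN_MASK     (0x3u << 14)
#define CC_IOSQES_MASK  (0xFu << 16)
#define CC_IOCQES_MASK  (0xFu << 20)

/* Bits this sketch knows how to change; every other bit must stay put. */
#define CC_HANDLED_MASK \
	(CC_EN_MASK | CC_SHN_MASK | CC_IOSQES_MASK | CC_IOCQES_MASK)

/* Apply a Property Set of CC. Returns false and leaves *cc unchanged
 * if the host tries to modify any bit that is not handled. */
bool cc_set(uint32_t *cc, uint32_t new_cc)
{
	uint32_t changed = *cc ^ new_cc;

	if (changed & ~CC_HANDLED_MASK) {
		return false;
	}
	*cc = new_cc;
	return true;
}
```

For example, a host that sets IOSQES = 6 (64-byte entries), IOCQES = 4 (16-byte entries) and EN = 1 writes 0x00460001, which is accepted; also flipping a CSS bit (bits 6:4) is rejected.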
It is not implemented yet, but add a message to remind us to write it
later.
Change-Id: Ic1c35a0d35f728bc63b38c334d9c622493bee967
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Property Set of CC.SHN is not supposed to terminate the session - remove
the commented-out code that was attempting to do this.
Change-Id: I1db230df9be549764287a8fd45ccdebea1d22a8b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Set CSTS.SHST = 10b to indicate that shutdown is complete, and
CSTS.RDY = 0 to match the state of CC.EN.
Change-Id: Ia651c34427526a38f22cba3910df2cf7d4bedd92
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Explicitly include spdk.common.mk at the top of all lib Makefiles so
that CONFIG options and other predefined variables are set.
Change-Id: I1e560c294fe8242602e45191a280f4295533ae44
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There is no need to allocate ibv_sge structures within the RDMA request;
we can just fill them out on the stack right before submitting each
request.
Change-Id: I438ff0be2f6d07ffa933255c92c4ec964aa1b235
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Just return success or failure - the actual count was not used.
Change-Id: I26e7c4c6319af444d221d9b0f313fb7071733619
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All of the WC events that we handle map back to a request, so look it up
before checking the opcode.
Change-Id: I1b70a773374f64387df0a21a4f7fd64b26534b14
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make sure all tracelogs in rdma.c use SPDK_TRACE_RDMA.
Change-Id: Idc3d3b6654215b5ab3ee84a106e46ffd3019cc7a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These NVMf spec structure definitions are the same as the equivalent
NVMe structs.
Change-Id: I21c45973b7843e3767c48f97ec42e7b446df296f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There was only one function and a structure declaration
left.
Change-Id: I63277b4182120e7a76a925ed0bf7378ec7c23f20
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
These can be simplified and merged into the subsystem.
Remove the concept of mappings from subsystems and replace
it with a list of hosts and ports. The host is optional -
not specifying a host means any host can connect.
Change-Id: Ib3786acb40a34b7e10935af55f4b6756d40cc906
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Make the transport responsible for filling out the fabric-specific
details in the discovery log entry.
Change-Id: I41d871c605becd557dca18f8ef7e80da66950257
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make the core NVMf to transport interface generic and allow for multiple
transport types to be registered.
Change-Id: I0a2767a47d55999c45f788ae1318bb50af60ab4e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Change the Port configuration file entries to a new format:
[Port1]
Listen <transport> <address>:<service>
Initially, this still only supports RDMA, but the new format will allow
specifying other transports once they are added.
Change-Id: Iadfd19b91db57b571064379368dbe77204ccecbb
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
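A sketch of parsing the value of one `Listen` line. `struct listen_addr` and `parse_listen()` are hypothetical, and the field sizes are arbitrary. The address is split from the service at the last ':' so that the service is always the trailing component:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of "Listen <transport> <address>:<service>". */
struct listen_addr {
	char transport[16];
	char address[64];
	char service[16];
};

/* Parse e.g. "RDMA 192.168.1.10:4420". Returns 0 on success, -1 on error. */
int parse_listen(const char *value, struct listen_addr *out)
{
	char addr_svc[96];
	char *colon;

	if (sscanf(value, "%15s %95s", out->transport, addr_svc) != 2) {
		return -1;
	}
	colon = strrchr(addr_svc, ':');
	if (colon == NULL || colon == addr_svc || colon[1] == '\0') {
		return -1; /* missing address or service */
	}
	*colon = '\0';
	if (strlen(addr_svc) >= sizeof(out->address) ||
	    strlen(colon + 1) >= sizeof(out->service)) {
		return -1; /* would not fit in the output fields */
	}
	strcpy(out->address, addr_svc);
	strcpy(out->service, colon + 1);
	return 0;
}
```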
Each subsystem will run on a single core, which is more than enough
to fully saturate a device and a NIC. For now, all subsystems
run on the master lcore.
Change-Id: I95340a262d70fd346fa81fe519e7d4190a369e64
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Instead of starting the connection poller immediately upon
the connect event, wait for the first connect capsule to
start the poller.
This builds toward associating all connections with the same
session with the same lcore.
Change-Id: I7f08b2dd34585d093ad36a4ebca63c5f782dcf14
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
It can be different per fabric interface within a single port.
Change-Id: If13590d7f12291499ccfd705efaf6d2b1b1d7003
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The type is already stored in the fabric_intf.
Change-Id: Icd33dd29f2fa1313329b4053892693c7ff90945d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For now, it just contains RDMA, plus a raw byte array to allow generic
copying.
Change-Id: I02fe11f99dd8b49000de0dba991cd34c99fd7a4a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Pull out the duplicated min checks against the ibdev_attr values.
Change-Id: I774c355ba669486afde5c05c55a4ed653723db98
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Set a status code in the response capsule for each possible error case.
Also enforce CC.EN == 1 before I/O connect.
The NVMf spec requires that the controller is enabled before any I/O
queue Connect commands are allowed.
Change-Id: If56d6b4d6bedad00e9e845e77f05f715e3969f8b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Drop the debug print in conn.c that was the only user.
We still have the connect data structure when determining the connection
type, and after that point, the queue ID is not needed.
Change-Id: Ida9e170099f977ec6b84478874863c40d6f7d8a1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMf target is being refactored to split the RDMA transport-specific
code into its own file. Once this is complete, we should be able to
plug in other transports and build the NVMf target without any RDMA
dependency if desired.
To enable this, change the CONFIG option to RDMA; it still controls
whether the whole NVMf target is built for now, but once the RDMA
dependency is actually made optional, we will be able to build the
generic NVMf target code without libibverbs installed.
Change-Id: I8cd90a9aaa85dcefcc9b0f8f2e7b6af21958b2a8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move the configuration file parsing for subsystems
into the configuration file parsing file.
Change-Id: Ie16e73cdc65fae7f2f3c3b22f9cba7f167024fa1
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The code for parsing the configuration file still
referred to a host as an init_grp, so fix it.
Change-Id: Ifa250b09de495dd7d393ccc3557fd6d56a54e790
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This never really made sense, so replace it with a list of
subsystems.
Change-Id: Ie7a9400083c091ac7142d01c23948200f515bdf7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is just extra complication for no real benefit.
Change-Id: I528af98e799d0641e753390fe35ff561fa3d7d76
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This allows the custom UIO hotplug driver to be used as well as the
uio_pci_generic driver.
Change-Id: Ica3316ed716827ad305eb4a146d0864d61ff190f
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Linking this new library allows applications using
the framework to be killed via an RPC call.
This only works if the RPC subsystem is loaded.
Change-Id: Ifcf91c212add620fe410589eba5490337c635776
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Linking this library into an application makes the log
configurable via RPC. This only works if the rpc subsystem
is also loaded.
Change-Id: If1340cf2a845ef159290232c26f341150c98fb9a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Use the number of devices returned by ibv_get_device_list() instead of
stopping at 4.
While we're here, drop the unused MAX_SESSIONS_PER_DEVICE definition
too.
Change-Id: I21ca6c6c95b7f2cccc1de4d0a34b95217a522bfc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is the only file that calls it, so it can be static.
Change-Id: I47573b7b38b40ad37e758234245eedbe94ae0a12
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These were internal-only APIs; initialize just checks to see that the
pool was initialized (which is already checked internally), and shutdown
just called spdk_nvmf_shutdown_nvme(), which we can call directly.
Change-Id: I95e1b912d61a38fa9934f58df7b1512678303452
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These can be isolated in rdma.c rather than being part of the generic
transport API.
Change-Id: Idc2b969a2f7685420cda2f7c4aa12495ffc3fcbc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Just calculate the required number of requests once and store it in a
global variable.
Change-Id: Iffeb637a3ac5f69ec89989b84f03699bac483b6e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There can be only one session per subsystem.
Change-Id: I8ba85a5ebd11dd71fda2a4bafa97a0935609379f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is just a duplicate of the NVMe library request_mempool.
Change-Id: I2a5484e5d515b965503b2cfcd8d85ccfcb0dee05
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Clean up everything that isn't strictly necessary in rdma.h.
Change-Id: Ied9acbed5f5b64860eae39816cdcb74620009a79
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This essentially turns the current nesting (of RDMA conn inside NVMf
conn) inside out. Now the transport owns the connection structure and
allocates it when necessary.
Change-Id: Ib5ca84e2a57b16741d84943a5b858e9c3297d44b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This sets up the RDMA layer to be able to embed the NVMf conn inside the
RDMA conn.
Change-Id: I5e3714ac8503826504d78d06fb5eaafabd025bb8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The whole cleanup process is now started by
spdk_shutdown_nvmf_subsystems(). Each subsystem will clean up its
session, if any, and each session will clean up its connections.
Change-Id: I9915d4547751ed4ffc4baa2c45c628698dd0b881
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The per-lcore connection counter was incremented and decremented, but it
is no longer actually read. The lcore allocation should happen at the
session level instead.
Change-Id: I7bdf1b521bfda4892304338d43fad3ed5123c494
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Nothing actually maps the shared memory region, so there is no need to
allocate the array of connections that way.
Change-Id: I3d5eca748f892e37fbb0ec52942f1c510e9f9dc8
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There is only one controller per subsystem, so
there can be 0 or 1 sessions. Change the list of sessions
to a pointer that can be NULL if no session exists.
Change-Id: I2c0d042d9cecacae93da3e806093faf0155ddd6e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Subsystems only have one controller, so cntlid
is always 0.
Change-Id: I690a1793ad3a696adbaefca856e559dd0177b11a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This was intended to track the number of NVMe device
queues per session, but there is only one hardware
queue per session. It was conflated with the number
of RDMA queues in several places as well.
Change-Id: I74a1c56a5d395dea8bee4778882821e904cebcf9
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Everything can be done when the session is created.
Change-Id: I7cb38c093b2b1b69460cabba465828eed0cec432
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The cntlid is inside the session, so no need for
duplicate data.
Change-Id: I5669ee6393807959506dfec36a7583af77386fc4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Since we only allocate workers to the master lcore,
remove the logic that places I/O conns on the same
lcore as the admin conn.
The "right" logic would be to place the I/O conn
on the same lcore as the whole session, and this
patch builds toward that.
Change-Id: I8983b56de41062ec834b0a169ba0fa61326c466d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Temporarily, only run on the master lcore. This makes
some temporary refactoring possible that is required
to move to a truly scalable threading model.
Change-Id: I13a2e03107a27f8ec18b023b15f653d374a137b5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
A connection function was initializing some session data, so
move that code to the function that initializes the session.
Change-Id: I5f2d4349585cb97985a7bbd9fb8d6c66eeaa7d4e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
There was an extra layer of indirection complicating
things for no reason. This removes it.
Change-Id: I8d4e654eb17f8f6ec028d775329794f0745fb0f7
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The NVMf target now sets the maximum data transfer size (MDTS) to a default
of 128 KiB. The initiator driver reads this value and passes it to the
block layer, so no command sent from the initiator will exceed 128 KiB.
Change-Id: I1d4f259e887b2fc70c7f1c5406c07c58f7fc9b8d
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
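MDTS is not stored in bytes: Identify Controller reports it as a power of two in units of the minimum memory page size, 2^(12 + CAP.MPSMIN) bytes. With 4 KiB pages (MPSMIN = 0), 128 KiB / 4 KiB = 32 = 2^5, so MDTS = 5. A small sketch of that arithmetic; `nvme_mdts()` is a hypothetical helper:

```c
#include <stdint.h>

/* Convert a maximum transfer size in bytes to the Identify Controller MDTS
 * encoding (log2 of the size in minimum-page-size units). Assumes max_xfer
 * is a power-of-two multiple of the page size larger than one page: an
 * MDTS of 0 means "no limit", so a one-page limit cannot be expressed. */
uint8_t nvme_mdts(uint32_t max_xfer, uint8_t cap_mpsmin)
{
	uint32_t page = 4096u << cap_mpsmin;
	uint32_t units = max_xfer / page;
	uint8_t mdts = 0;

	while (units > 1) {
		units >>= 1;
		mdts++;
	}
	return mdts;
}
```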
If any completion indicates an error, we need to close the connection.
Change-Id: I50b30aa692ae121932f1baec32f713422ff415ed
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
NVMf does not have the concept of subsystem groups; the (former)
subsystem_grp files really contain structures and functions related to
individual subsystems.
Change-Id: I4b3a64de799fffb29f8685ea4908d754516815cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For a single poll of the completion queue, if the user
submits I/O from within their completion callback and their
completion callback is particularly slow to execute, the loop
could potentially continue forever. To prevent this, we
need to limit the number of completions we process
in one batch.
Change-Id: If6bae47e52b36347dbe5622ace68c866ee88a0b2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
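A toy model of the problem and the fix. `struct fake_cq`, `cq_pop()`, `resubmitting_cb()` and the batch size of 128 are all hypothetical, chosen for illustration only. The callback models the worst case: every completion immediately produces another one, so an unbounded loop would never return, while the bounded loop stops after at most `max_completions`:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_COMPLETIONS_PER_POLL 128 /* hypothetical batch limit */

/* Toy completion queue for illustration only. */
struct fake_cq {
	int pending; /* completions ready to be reaped */
};

bool cq_pop(struct fake_cq *cq)
{
	if (cq->pending == 0) {
		return false;
	}
	cq->pending--;
	return true;
}

/* Worst-case user callback: it resubmits I/O that completes right away,
 * so the queue never drains. */
void resubmitting_cb(struct fake_cq *cq)
{
	cq->pending++;
}

/* Reap at most max_completions (0 means the default batch size) per call.
 * Anything left over is handled on the next poll instead of spinning here. */
size_t process_completions(struct fake_cq *cq, size_t max_completions)
{
	size_t n = 0;

	if (max_completions == 0 || max_completions > MAX_COMPLETIONS_PER_POLL) {
		max_completions = MAX_COMPLETIONS_PER_POLL;
	}
	while (n < max_completions && cq_pop(cq)) {
		resubmitting_cb(cq);
		n++;
	}
	return n;
}
```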
Create a list of valid properties with get and set callbacks (set is
optional to allow read-only fields).
Remove handling for fields declared as "reserved" in the NVMe over
Fabrics 1.0 specification.
Also simplify the vcprop structure to only contain the required fields.
Change-Id: I14d3ddfd008c62b75fce8e64d193c87fb6f7b5ad
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
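A minimal sketch of a property table with per-entry get/set callbacks, where a NULL setter marks a read-only property. The register offsets (CAP 0x00, VS 0x08, CC 0x14, CSTS 0x1C) are the standard NVMe controller register offsets; `struct vctrlr`, `struct prop`, `prop_get()` and `prop_set()` are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical virtual controller register state. */
struct vctrlr {
	uint64_t cap;
	uint32_t vs;
	uint32_t cc;
	uint32_t csts;
};

struct prop {
	uint32_t ofst;                              /* register offset */
	uint64_t (*get)(const struct vctrlr *v);
	int (*set)(struct vctrlr *v, uint64_t val); /* NULL: read-only */
};

static uint64_t get_cap(const struct vctrlr *v)  { return v->cap; }
static uint64_t get_vs(const struct vctrlr *v)   { return v->vs; }
static uint64_t get_cc(const struct vctrlr *v)   { return v->cc; }
static uint64_t get_csts(const struct vctrlr *v) { return v->csts; }
static int set_cc(struct vctrlr *v, uint64_t val) { v->cc = (uint32_t)val; return 0; }

static const struct prop props[] = {
	{ 0x00, get_cap,  NULL },   /* CAP */
	{ 0x08, get_vs,   NULL },   /* VS */
	{ 0x14, get_cc,   set_cc }, /* CC: the only writable entry here */
	{ 0x1C, get_csts, NULL },   /* CSTS */
};

static const struct prop *find_prop(uint32_t ofst)
{
	for (size_t i = 0; i < sizeof(props) / sizeof(props[0]); i++) {
		if (props[i].ofst == ofst) {
			return &props[i];
		}
	}
	return NULL;
}

/* Both return 0 on success, -1 for an unknown or read-only property. */
int prop_get(const struct vctrlr *v, uint32_t ofst, uint64_t *val)
{
	const struct prop *p = find_prop(ofst);

	if (p == NULL) {
		return -1;
	}
	*val = p->get(v);
	return 0;
}

int prop_set(struct vctrlr *v, uint32_t ofst, uint64_t val)
{
	const struct prop *p = find_prop(ofst);

	if (p == NULL || p->set == NULL) {
		return -1;
	}
	return p->set(v, val);
}
```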
This is intended to be used for examples/nvme/identify and similar
diagnostic utilities.
Change-Id: Ib2f941e9af7a3fb7555865ef253742e30ccad2b5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Multiple NVMe controllers within a subsystem do not work correctly,
since we would need to virtualize the controller data, namespace IDs,
and so on. For now, only allow pass-through mapping of a single NVMe
controller per subsystem.
Change-Id: Ib2d3576d2856c46a086f38eb6bec56f3e7a73575
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, we used cap_lo and cap_hi to represent the 32-bit halves of
the full CAP register. However, it is simpler to keep them in a single
64-bit structure, and is no less efficient on 64-bit platforms.
Also name the NSSRS field from NVMe 1.2, which was previously reserved.
Change-Id: I1d5d9b0dccbb12373b4aed3db29c883881d43223
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
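For reference, the fields of the single 64-bit CAP value can be read with shifts and masks. This is shown as an alternative to C bit-fields, whose layout is compiler-dependent, and is not necessarily how SPDK defines the structure. Bit positions follow the NVMe specification (CQR is bit 16, NSSRS is bit 36); `cap_field()`, the `CAP_*` macros and `make_cap()` are hypothetical. `make_cap()` sets CQR = 1, which NVMe over Fabrics requires:

```c
#include <stdint.h>

/* Extract a field of `width` bits starting at bit `shift` of CAP. */
static inline uint64_t cap_field(uint64_t cap, unsigned shift, unsigned width)
{
	return (cap >> shift) & ((1ull << width) - 1);
}

#define CAP_MQES(cap)   cap_field((cap), 0, 16)  /* max queue entries, 0's based */
#define CAP_CQR(cap)    cap_field((cap), 16, 1)  /* contiguous queues required */
#define CAP_TO(cap)     cap_field((cap), 24, 8)  /* ready timeout, 500 ms units */
#define CAP_DSTRD(cap)  cap_field((cap), 32, 4)  /* doorbell stride */
#define CAP_NSSRS(cap)  cap_field((cap), 36, 1)  /* NVM subsystem reset supported */
#define CAP_MPSMIN(cap) cap_field((cap), 48, 4)  /* min page size = 2^(12 + MPSMIN) */
#define CAP_MPSMAX(cap) cap_field((cap), 52, 4)  /* max page size = 2^(12 + MPSMAX) */

/* Build a CAP value with the given MQES and TO, and CQR = 1. */
uint64_t make_cap(uint16_t mqes, uint8_t to)
{
	return (uint64_t)mqes | (1ull << 16) | ((uint64_t)to << 24);
}
```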
The requirement that bb_sgl follow recv_sgl makes the logic obscure.
Change-Id: I8d47477986efd8f2d4ed964ab9373b7f157af274
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Make sure the reactor mask in the profile takes effect.
Change-Id: Ia471b2b88a711f05738cf93068c4f3a8c9a3039d
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Admin commands technically don't allow inline data,
but there is nothing preventing us from posting
a recv buffer that could handle inline data. It just
won't be used for incoming admin capsules.
Change-Id: I3e7e4406e01ab870654a166d52221c11fc0ac683
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
We need to bind to each port declared in the config file; there is not a
single global port number.
Change-Id: I41c315588078d131c32cb145d22314047505c95c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The access to the NVMf IOCCSZ (I/O Queue Command Capsule Supported Size)
field in the Identify Controller data was incorrect.
Change-Id: I23b0aa175de8e5d8a0220e9c35e0cb6868121cb5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The maximum in-capsule data size is determined by the I/O queue bounce
buffer size, and there is no point in limiting it beyond that, so remove
the need to configure it.
Change-Id: I64806516b847e819f57ac9f62a162f7a04805b57
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
4420 is the officially assigned IP port from IANA for NVMe over Fabrics.
Change-Id: I433a5ed0780d1ffd7ca6512617759d59fa5e8def
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The queue type and queue depth are not known until
the connect capsule is processed. Delay allocating more
than 1 recv wqe until then.
Change-Id: I0e68c24bc3d6f37043946de6c2cbcb3198cd5d1b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Currently, the recv wqe is re-posted immediately. This
closes a small window where we could get more I/O
than we could handle.
Change-Id: I9b0b1f0cc526069033b9e04f170195c4fb130e37
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is going to be used elsewhere in the code, so
name it according to the public naming convention
and make it public.
Change-Id: Id5fd57e78e146f3235741a251bb30244d6530f2c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This is going to be used elsewhere in the code, so
name it according to the public naming convention
and make it public.
Change-Id: I0dcb88e902c5e609fe6acd06ad06743203fcaa60
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Break out the code to allocate a single rdma request
to be used elsewhere.
Change-Id: I687ce5ec862831fed5300157bfb4bf980d22c782
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
When debugging is not enabled, SPDK_TRACELOG does nothing,
so cmd_type becomes an unused variable and triggers a
compiler warning. This patch fixes that warning.
Change-Id: I821f7601a16c98e514227aee2e18fbfa61928bea
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The queue depth allowed for incoming commands is set
such that we can do the maximum number of RDMA reads
necessary. There is never a case where a READ will need
to be queued anymore.
Change-Id: I4f7e7f4a59f6358065b82f36a5e22744af210d07
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
There were 4 variables tracking queue depths. In reality,
only one is needed once the minimum is computed correctly.
Change-Id: I9bb890e92a33a3c7bd6e27cbd31d6bee7ca0cf3d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
NVMe over Fabrics defines its own NVMe Qualified Name (NQN) format; it
does not use iSCSI Qualified Names.
Also change the default node base for nvmf_tgt to "nqn.2016-06.io.spdk".
Change-Id: I2b73c1426ef1d8c83cc2df499d79228ea61257cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
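A loose sketch of recognizing the NQN form used by the new default node base, "nqn.yyyy-mm.<reverse domain>". The 223-byte limit is the NVMe specification's maximum NQN length. `nqn_looks_valid()` is hypothetical and only checks the prefix and length, not the full naming rules:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define NQN_MAX_LEN 223 /* maximum NQN length in bytes */

/* Check for "nqn." + "yyyy-mm" + "." + at least one more character. */
bool nqn_looks_valid(const char *nqn)
{
	size_t len = strlen(nqn);

	if (len > NQN_MAX_LEN || len <= 12 || strncmp(nqn, "nqn.", 4) != 0) {
		return false;
	}
	/* Characters 4..10 must be "yyyy-mm" (digits, with '-' at index 8). */
	for (int i = 4; i < 11; i++) {
		bool ok = (i == 8) ? nqn[i] == '-' : isdigit((unsigned char)nqn[i]);
		if (!ok) {
			return false;
		}
	}
	return nqn[11] == '.';
}
```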
Fix the sizes of the UUID fields to match RFC 4122.
Change-Id: I1458a22579f455cde0a67ee3ce616e78d5c810c2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
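For reference, RFC 4122 (section 4.1.2) lays a UUID out as a 4-byte time_low, 2-byte time_mid, 2-byte time_hi_and_version, two 1-byte clock sequence fields and a 6-byte node, for 16 bytes in total. A sketch with compile-time size checks; `struct uuid_rfc4122` is a hypothetical name, and multi-byte fields are big-endian on the wire, which this struct does not enforce:

```c
#include <stddef.h>
#include <stdint.h>

/* UUID field sizes from RFC 4122, section 4.1.2. */
struct uuid_rfc4122 {
	uint32_t time_low;
	uint16_t time_mid;
	uint16_t time_hi_and_version;
	uint8_t clock_seq_hi_and_reserved;
	uint8_t clock_seq_low;
	uint8_t node[6];
};

_Static_assert(sizeof(struct uuid_rfc4122) == 16, "UUID must be 16 bytes");
_Static_assert(offsetof(struct uuid_rfc4122, node) == 10, "node starts at byte 10");
```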
This will allow removal notifications to be propagated to the library
user (e.g. for hotplug).
The callback is currently unused, but this at least prepares the API for
the future hotplug support.
Based on a patch by Dave Jiang <dave.jiang@intel.com>
Change-Id: I20b1c2dbf5e084e0b45a7e51205aba4514ee9a95
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use the knowledge that both the source and destination of
nvme_copy_command() are aligned to emit the aligned variants of the
SSE2/AVX mov instructions.
Change-Id: I0a7e32a3bb10b9a1920cd85691b79fa7172eecb3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
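A sketch of the idea for the SSE2 case only (the AVX variant is analogous with 32-byte loads and stores). `struct cmd64` and `copy_command()` are hypothetical; the key assumption is that both the source and the destination are 16-byte aligned, which lets the aligned load/store intrinsics (movdqa) be used:

```c
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* 64-byte NVMe submission queue entry, forced to 16-byte alignment. */
struct cmd64 {
	_Alignas(16) uint8_t bytes[64];
};

/* Copy one command with aligned 16-byte loads/stores when SSE2 is available,
 * falling back to memcpy otherwise. The aligned instructions fault on
 * misaligned pointers, so both arguments must be 16-byte aligned. */
void copy_command(struct cmd64 *dst, const struct cmd64 *src)
{
#ifdef __SSE2__
	__m128i *d = (__m128i *)dst->bytes;
	const __m128i *s = (const __m128i *)src->bytes;

	for (int i = 0; i < 4; i++) {
		_mm_store_si128(&d[i], _mm_load_si128(&s[i]));
	}
#else
	memcpy(dst, src, sizeof(*dst));
#endif
}
```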
The call to spdk_nvmf_check_pools() can be moved
directly into nvmf.c: the pool is created by the NVMf
subsystem, so it should also be released by that
subsystem.
Change-Id: I49e49bcb56079fc25d26b1f5078a1808c2f8e189
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Drop the RDMA-specific fields from spdk_nvmf_request and get them
directly from the command SGL in the transport-specific read function.
Change-Id: Icd06a9018a8c341213fbc8d26d3d7cbf2fb32d30
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The connection will be closed in these cases anyway, so just let the
normal connection cleanup deal with the active tx_desc.
Change-Id: I96c68d5802e189bb82b180cc3c7d7c3f4135be1f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If the transport poll routine fails, we need to close the connection.
Change-Id: Ie534b0f05e6642c31e0450865e309a784abbe744
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If spdk_nvmf_request_exec() fails, the connection will be closed anyway,
so just leave the tx_desc in the active array; it will be cleaned up in
the normal connection cleanup path.
Change-Id: Ie4f60bd6001658403dd7e1c6a47d40be756ef6f2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If an invalid SGL is specified, send a response with a status code
indicating what the error was rather than silently dropping the command.
Change-Id: I12d1fd847d3bc0ea8de7698e934626c2586a7452
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make all command processing functions return a bool to indicate
asynchronous (false) or synchronous (true) completion.
Change-Id: I7c2e4d28fa473b36ff26c902e4bb69f38b64d18d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Only use SPDK_TRACE_RDMA within the RDMA transport code.
Change-Id: Ie15fd24bb142a68f3661929267ebe396b556c351
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The error case could only be reached with tx_desc != NULL in one case,
so move the cleanup code there and drop the goto.
Change-Id: I7aace6b40dd75ef8d86fb173f9d58110e929b082
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Also split the generic nvmf_trace_command() function out of
the RDMA-specific handler and move it to request.c
Change-Id: If29b89db33c5e080c9816977ae5b18b90884e775
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Also finish up the req_state -> req conversion.
Change-Id: I131dd52dcd36a790b942e06f0207a3274cc04ffc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The overflow condition can't happen unless there is a programming error
in the nvmf_tgt library; we can only possibly receive command capsules
(sq entries from the point of view of the host) if we have posted an RDMA
Recv for the command capsule memory region.
This means that we also don't need to track sq_tail in the NVMf library.
Change-Id: I101509080c744528871e72fa46d188e2850c928a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The "immediate completion" cases in spdk_nvmf_request_exec() already
call spdk_nvmf_request_complete(), so the ret == 1 case in nvmf_recv()
is bogus.
Also fix a couple of spdk_nvmf_request_complete() calls in
nvmf_process_admin_cmd() that should be handled by its caller.
Change-Id: I41b865d5e6e7fec08087faf9c6f3da3b057a5fb2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These are not supported (and did not actually function) in NVMe over
Fabrics. Queue creation is handled automatically when new connections
are initiated.
Change-Id: If3a10e5df2f0625537b2c453cd8c835e570fa31e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move toward making request.c transport agnostic.
Change-Id: I25fbe74fff21a5c23138e1a6e2d40bc6a4a984ec
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make nvmf_post_rdma_read() interface generic (don't require a tx_desc).
Change-Id: I331a93eed4bb1912a47a88bb904cf392fcc364c6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This fixes an oversight that allowed in-capsule data block SGLs to
potentially refer to more than the received in-capsule data size.
It also makes spdk_nvmf_request_prep_data() less dependent on the
RDMA-specific rx_desc/tx_desc structures.
Change-Id: I34d61aca4cf5ba033849673116d16ec90488dcd4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is the same as spdk_nvme_cpl, aside from reserved fields.
Change-Id: I62b0718dd58c998b4d26a0d1b44ee16d37eff25d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The RDMA read and write commands can determine the desired length based
on the nvmf_request length field.
Change-Id: I97b63289556e7de3c19c5a17ecbacbbbdfc10425
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Replace the generic "msg_buf" naming with command and response.
Change-Id: I19baff43b41a5eb7db9be9d7feec33d17112e320
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The mempool functionality is never used at runtime: all bounce buffers
are immediately assigned to an rx_desc.
Change-Id: Ie2195059858e34b30b07e104739f046c13abc335
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The RDMA tx_desc and rx_desc pools were only used at startup; all
descriptors are immediately allocated and put into a queue, and the
mempool functionality was never used at runtime.
Change-Id: I2882274962550191a555c8483b8f7be2854b32ec
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is an implementation detail of the RDMA layer.
Change-Id: Ib97d6fbd593789eed0b6e746972b8882a3320995
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This code is operating on a list owned by the RDMA
connection, so move it to rdma.c.
Change-Id: I8b81f9d1ffc1df489c9b698969725ed0d1db6a06
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
These are an implementation detail of RDMA, so move
them into the RDMA portion of the connection.
Change-Id: I68d146019c5d78fbf5e9968abfd7baed2a54a2ed
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Separate out the RDMA connection from the
NVMf connection. For now, the RDMA connection
is just embedded in the NVMf connection, but
eventually they will have different lifetimes.
Change-Id: I9407d94891e22090bff90b8415d88a4ac8c3e95e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This structure will be expanded in future patches.
Change-Id: Ibb04917134243560e09a2a255844739eb33fab65
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The request needs to know which connection specifically
it is associated with.
Change-Id: I492b9968b4d2e307b5af44edee0778478b32d2ba
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
They each only had 1 function left that belonged
in the session.c file.
Change-Id: I405902b02e9316d2dc02d3732d8bc085c2b84d31
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Only move nvmf_request definitions from nvmf_internal.h
for now. Subsequent patches will move more.
Change-Id: If47472542515fd050cc78d95540eb25beee59d2a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Fabric commands were skipping a step, so unify all
types of requests through the same completion path.
Change-Id: I5f38a7e1cdcdf33baf71486d5ddae9f5a6157fac
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The nvmf_request structure holds the pair of pointers
for rx_desc and tx_desc.
Change-Id: I3e735979bbdcdc0e70ad78762e289849d41158ba
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This moves some definitions from nvmf_spec.h to
nvme_spec.h based on the latest publication.
Change-Id: I51b0abd16f7d034696239894aea5089f8ac70c40
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Swap the order of checks in the failure check - if rc is not 0, addr may
be garbage.
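A minimal sketch of the corrected check order, assuming a hypothetical
lookup_addr() that writes *addr only when it returns 0. Because ||
short-circuits, addr is only read once rc is known to be 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup: writes *addr only on success (returns 0). */
static int lookup_addr(uint64_t key, uint64_t *addr)
{
	if (key == 0) {
		return -1; /* failure: *addr is left untouched (garbage) */
	}
	*addr = key * 4096;
	return 0;
}

/* Returns nonzero if the lookup failed or yielded a zero address. */
static int lookup_failed(uint64_t key)
{
	uint64_t addr; /* not initialized on the failure path */
	int rc = lookup_addr(key, &addr);

	/* Check rc first: || short-circuits, so addr is never read when
	 * rc != 0. The reverse order (addr == 0 || rc != 0) reads garbage. */
	return rc != 0 || addr == 0;
}
```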
Change-Id: I110710efd00397c777d59ac8b219ba3cc2156596
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The nvmf_request object is generic and is mapped 1:1 with rx_desc.
Change-Id: I397224a3859c3c93d6eca99f7ba7c53ce7963f57
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Instead of searching the global list of connections to find a matching
cm_id, we can just store the pointer back to the spdk_nvmf_conn in the
rdma_cm_id context field.
Change-Id: I39ea16be6a633a1136d65743747b63b600f20e63
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All of the variables are private to conn.c, so they don't need global
visibility.
Change-Id: I7c24cfc6249a9f8164b162b4b8de0e24c452e0df
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is always set to nvmf_process_async_completion and is only used
within the library.
Also rename nvmf_process_async_completion to spdk_nvmf_request_complete
to clarify its purpose.
Change-Id: Ie737fb60688329bfe329a8553c4a40ff2e5f8f1d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Create spdk_nvmf_request_prep_data(), which handles SGL processing and
data transfer for all command types, and spdk_nvmf_request_exec(), which
executes a command after data transfer has completed.
Change-Id: I51c2196260dd0686db8acca4d8f7c93e17618c2f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The pending type can be determined based on the command opcode.
This also moves the "issue pending RDMA reads" case out of the I/O queue
handling into the generic continuation code; this should not make any
difference for the current case, since the Fabrics Connect command is
the only other continuation case currently, and there cannot be any
pending RDMA reads in that case.
Change-Id: Idddfa496b6e5b7e6da772aa3ab1b9d1a5344771f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvmf_connect_continue() no longer needs the RDMA-specific tx_desc.
Change-Id: I95f6938063e9853aa7dcd419f488b91422ff9b60
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Let the calling function handle the tx_desc if nvmf_connect_continue()
fails.
Change-Id: I25a8cbc4c3be0608bcec8db2fb8c50e55fbe3e8c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Everything necessary for processing an admin command is now stored in
nvmf_request.
Change-Id: I74e75a5b7bb3b406ad167c2b31cab1af7a1f270a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Everything necessary for processing an I/O is now stored in
nvmf_request.
Change-Id: I3f390707ebe83ea66a116dcfda4d0388a6823629
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Keep a pointer to the local bounce buffer in the transport-agnostic
struct nvmf_request rather than groveling in tx_desc/rx_desc to get it.
Change-Id: Ic328d8e2b3a15759ccb149a89fb3562e928ca500
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Now that we have xfer to track data direction, the length field can be
populated correctly for all transfers (including in-capsule data).
Change-Id: I7b2228f3fac80aab983a4103ba095c7bc38e0b21
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This field is used to decide whether data needs to be transferred back
to the host after a command is completed.
Previously, this was determined using the length field, and length was
cleared to 0 after a transfer was completed. However, length will be
used in future patches after a host to controller transfer completes, so
we need some other way to tell what kind of data transfer is required.
Change-Id: I6b27cf7816908394735fc95c15bd5eb40a7c0157
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
vtophys() was checking for out-of-range addresses incorrectly: vfn_2mb
is already shifted to account for 2 MB hugepages, but it was being
compared with a mask that did not account for the shift. This would
allow out-of-bounds access to the 128tb map array for certain invalid
addresses (it had no effect on addresses within the valid userspace
range).
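The bug can be sketched as follows, assuming 2 MB pages (shift of 21) and
a 128 TB (2^47-byte) userspace range; the macro and function names are
hypothetical, not the SPDK source. The buggy check masks an already-shifted
frame number with an unshifted mask, so it accepts every 64-bit address;
shifting the mask by the same amount fixes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHIFT_2MB   21 /* 2 MB hugepages */
#define SHIFT_128TB 47 /* assumed 128 TB userspace range */
#define MASK_128TB  ((1ULL << SHIFT_128TB) - 1)

/* Buggy: vfn_2mb is already shifted right by 21, but the mask is not,
 * so bits 47 and up of vfn_2mb are always zero and the check always passes. */
static bool vfn_in_range_buggy(uint64_t vaddr)
{
	uint64_t vfn_2mb = vaddr >> SHIFT_2MB;

	return (vfn_2mb & ~MASK_128TB) == 0;
}

/* Fixed: shift the mask the same way as the frame number. */
static bool vfn_in_range(uint64_t vaddr)
{
	uint64_t vfn_2mb = vaddr >> SHIFT_2MB;

	return (vfn_2mb & ~(MASK_128TB >> SHIFT_2MB)) == 0;
}
```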
Change-Id: Ida7455595e586494c9025f9ba65d050abb16b1b9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Other subsystems can depend on this one and then use
SPDK_RPC_REGISTER to register RPC functions.
Change-Id: I557f774331ce7146d299d06b3f81426e2103a11f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Since we use a u64 to mask CPU cores, at most 64 CPUs can be represented,
but the default RTE_MAX_LCORE in DPDK is 128. In some cases (e.g. when
nr_io_queues > 4), we can get the wrong lcore ID.
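A minimal sketch of a bounds-checked mask test; lcore_in_mask() is a
hypothetical helper, not an SPDK or DPDK function. A uint64_t can only
describe lcores 0..63, and in C, shifting a 64-bit value by 64 or more is
undefined behavior, so larger lcore IDs must be rejected explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is lcore set in a 64-bit core mask? */
static bool lcore_in_mask(uint64_t core_mask, unsigned int lcore)
{
	if (lcore >= 64) {
		return false; /* not representable in a u64 mask */
	}
	return (core_mask >> lcore) & 1;
}
```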
Change-Id: Icc334b1bf5b068a310839118be341e61071cff65
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
A transfer using less than the total bounce buffer size is a normal
occurrence and not worthy of a tracelog. Also drop the pointless
conditional.
Change-Id: Ibcdcf693fea439d5034fa51b08b3fbd8fd7df8f2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Avoid using the RDMA-specific tx_desc when the transport-agnostic
nvmf_request will suffice.
Change-Id: Id35bbdfb353cb72e0feb4f5af19e5bd5c86d3ff4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
tx_desc should be set to NULL here. Otherwise, when
nvmf_post_rdma_recv(conn, rx_desc) fails,
we would deactivate tx_desc a second time, which
is wrong.
Change-Id: Ieabc7e3864b7f124b003d052f66ab8799a1d632e
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The variable cur could be NULL, so using a do-while loop
would cause a segmentation fault.
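The difference can be sketched with a hypothetical singly linked list: a
do-while body runs once before its condition is tested, so it dereferences
cur even when the list is empty, while a plain while loop checks first.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int value;
	struct node *next;
};

/* Buggy pattern: crashes when cur == NULL (empty list). */
static int sum_do_while(struct node *cur)
{
	int sum = 0;

	do {
		sum += cur->value;
		cur = cur->next;
	} while (cur != NULL);
	return sum;
}

/* Fixed: cur is checked before the first dereference. */
static int sum_list(struct node *cur)
{
	int sum = 0;

	while (cur != NULL) {
		sum += cur->value;
		cur = cur->next;
	}
	return sum;
}
```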
Change-Id: I19ec26e88948a0c3fd957e03e717b68750f40c62
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
By convention, errno values are returned as negative numbers; following
that convention means callers need no special handling for this NVMe library.
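A minimal sketch of the negative-errno convention, using a hypothetical
alloc_buf() helper: 0 means success, and a failure returns -EINVAL or
-ENOMEM rather than a positive errno or a bare -1, so a caller can test
rc < 0 and recover the errno value as -rc.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 0 on success or a negative errno. */
static int alloc_buf(size_t size, void **out)
{
	if (size == 0) {
		return -EINVAL;
	}
	*out = malloc(size);
	if (*out == NULL) {
		return -ENOMEM;
	}
	return 0;
}
```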
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
If we just pass NULL to rdma_create_qp, it will do
the right thing.
Change-Id: I9621a5110ace6237a1e47c6e5defb4cac3afc4ae
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The wrappers are much simpler to use than the low
level ib verbs calls.
Change-Id: I4b09a96a60020bc27df9396d40d955733f618837
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
They are only ever called in sequence and do related
operations.
Change-Id: I825abe08deba1dafb405757bb4f2d52062a801ca
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This enables SPDK_NVMF_BUILD_ETC to be moved out of the library as well,
since only authfile was using it before.
Change-Id: I10d1145881f9a0358d7effe2d2d9851899413e1b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
SPDK_NVMF_BUILD_ETC will be cleaned up in another commit; it is
currently used both in the lib and in nvmf_tgt.
Change-Id: Ibc5f15cc4341f9d52b29c84defcd332bec4a4d09
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Most of the #include statements in nvmf.h aren't part of the public API.
Change-Id: I0d43dd542a28744a91a4fd0c4c806a991d1e194e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is not part of the NVMf library's public API.
Change-Id: I665d5713343c9185cbdadaef4fedfdc83b8232d6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
There is only a single global g_nvmf_tgt that can be passed to this
function, so remove the parameter and use the global directly.
Change-Id: Ia1a2a1e6cd3801101ddeb4de5526dd115fa7ef8f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The section is really defining a subsystem as defined
by the NVMf specification. There does not appear to be
any need for a group of subsystems.
This change only updates the configuration file. It does
not remove all references to a subsystem group from
the code.
Change-Id: I38e62735a5ac924dcafacb3c9a332a103d751d4a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The specification refers to this concept as a Host,
so use that term. This only changes the configuration
file usage. Initiator groups are still referenced in
the code and will be removed later.
Change-Id: I897f4dbdfb65d94da1e5a77434fc07a2c18bcdc2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Eliminate the misconception that the reactor is a subsystem; it is not.
Change-Id: I63ea46f0dfa34661f16526a71c47e8fba9813474
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
The index should be 0 for fabricintf.
Moreover, when no fabricintf is found, an error should
be returned.
Change-Id: I3aa04566a5a318b8c921dd37c8573ed075254266
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
There is no logical split between nvmf.c and framework.c, so combine
them and drop nvmf.c.
Change-Id: I91230c01ed7f171bfed04456b0bfcf0e7ddbc263
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The mutex is initialized, but otherwise is unused.
Change-Id: Ia68adbd430fad391cc465c07dd6e937e90dd2c5c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The code that actually removed items from the list was removed in
addition to the free() call, which caused a hang on shutdown.
Change-Id: If0e843d0d0ebfa28638b12104da880e70b3e548a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, there was no way to determine what namespace ID was assigned
when a namespace was created via the NVMe library interface.
Also drop the incorrect comment about calling
spdk_nvme_ctrlr_process_admin_completions(), since
spdk_nvme_ctrlr_create_ns() checks the admin queue internally.
Change-Id: If90a6e9fc773aefa220ebbf6effc2d033c9f20cc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This lets spdk_nvme_qpair fit in 128 bytes exactly.
Change-Id: I7c42582f22ece72a7f1d651468e63d4fe05babd6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Avoid potential double free cases.
This fixes a clang warning during scan-build.
Change-Id: I487d6fcd485d1f8ebb96b6f8cb54511628461f39
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All trackers in outstanding_tr should have a non-NULL request. Add an
assert to verify this.
Fixes a clang warning during scan-build.
Change-Id: I0ac4d2bad17449f684808cbb98777627d890b65b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Document the requirement that qpair and cmd can't be NULL.
This placates clang, which previously generated a warning
during scan-build.
Change-Id: Ic2d5e808faee0028c890ce1312444fb3dc95f223
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These allow linear searches of the configuration file
sections.
Change-Id: I8d8b9594bc8a974c16d999689a6195434c1efac8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
The user can determine whether submission queues will be placed in the
controller memory buffer by checking the controller options use_cmb_sqs
flag in the attach callback.
Change-Id: I8a925ef99a48665a0e2ffaa90d9ff2b79b90b2fa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a special "all" trace flag name to set or clear all registered trace
flags.
Change-Id: Ib579df7c41ce4aca72174e04734df20f2752035c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The D3700/D3600 series support the Controller Memory Buffer (CMB) feature.
The CMB can hold submission queues; for controllers that support
submission queues in the CMB, the user can choose whether
to enable it.
Change-Id: I8b0dc9e28dd6f5bb01bee99a532087212c04e492
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
When recursively calling spdk_json_write_val() on another object or
array, the child call will handle printing out the whole
subobject/array, so the parent call should skip over all of its values.
Also fix the return value for the array/object case - if we get to the
end of the array or object, we should return 0 for success.
Change-Id: I1da80c88ab8759620114c1ab141baaaaf9f0023a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Resolve relative paths before using them to clean up command lines.
This should also help shorten the overall command line length that gets
embedded in the binary and used when locating the executable from a
coredump.
Change-Id: Ibff9849ede198bb04313496c8b7131485ffaf14f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch adds support for an Intel-specific log page:
the marketing description page.
Change-Id: I87bccb2af286279598c9dd3c870094b384a0d2f7
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
For existing \file markers, move them to the top of the header and tweak
the wording for consistency.
Change-Id: Icce748effe4dbe97d79a8c87d31caf0ee5797058
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Function to trim leading and trailing whitespace from a string.
Originally based on code imported from istgt.
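A minimal sketch of an in-place trim, assuming the function returns a
pointer into the original buffer; str_trim() is a hypothetical name, and
the actual helper's signature may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical: trims whitespace in place; returns the first non-space char. */
static char *str_trim(char *s)
{
	char *end;

	if (s == NULL) {
		return NULL;
	}
	while (isspace((unsigned char)*s)) {
		s++; /* skip leading whitespace */
	}
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1])) {
		end--; /* back up over trailing whitespace */
	}
	*end = '\0';
	return s;
}
```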
Change-Id: I87abe584130bdf4930098fadb8e57291f18eda7f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Parsing function for delimited strings with embedded quotes.
Originally based on code imported from istgt.
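A minimal strsep()-style sketch of quote-aware tokenizing; the name
parse_quoted_token() and its quoting rules (double quotes only, quotes kept
in the token) are assumptions, and the real helper may behave differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: like strsep(), but delimiters inside "..." do not split. */
static char *parse_quoted_token(char **stringp, const char *delim)
{
	char *start = *stringp;
	char *p;
	int in_quotes = 0;

	if (start == NULL) {
		return NULL;
	}
	for (p = start; *p != '\0'; p++) {
		if (*p == '"') {
			in_quotes = !in_quotes;
		} else if (!in_quotes && strchr(delim, *p) != NULL) {
			*p = '\0';          /* terminate this token */
			*stringp = p + 1;   /* resume after the delimiter */
			return start;
		}
	}
	*stringp = NULL; /* last token: no more input */
	return start;
}
```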
Change-Id: I448feb53ea232048ed8c68738e12bc3660eb4235
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add function to convert string to lowercase in place.
Originally based on code imported from istgt.
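A minimal sketch of in-place lowercasing; str_lower() is a hypothetical
name. The cast to unsigned char avoids undefined behavior in tolower() for
negative char values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: converts s to lowercase in place and returns s. */
static char *str_lower(char *s)
{
	char *p;

	if (s == NULL) {
		return NULL;
	}
	for (p = s; *p != '\0'; p++) {
		*p = (char)tolower((unsigned char)*p);
	}
	return s;
}
```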
Change-Id: Ica9fe2208e6ee09b22c9a652a33c5affe5be23cc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For controllers that support the end-to-end data protection
feature, add support in the driver layer.
Change-Id: Ifac3dd89dec9860773c850416a6116113a6ce22a
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
When using DPDK PCI support via VFIO, the PCI device is reset
immediately before calling the PCI driver's init function. In some
cases, the device seems to not be ready to handle MMIO accesses right
away. Until the cause of this issue is fully understood, add a 500 ms
sleep as a workaround.
Change-Id: Ic893080a6f34d57eee80df3e6aa68c220c08df3e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
All children must be removed from a parent request before the parent is
freed.
Change-Id: I073ff0e9c5bcdd6181d90b918bfe4cce054f6c0b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rename nvme_remove_child_request() to nvme_request_remove_child() and
move it next to nvme_request_add_child() to make the symmetry clear.
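A simplified sketch of the add/remove symmetry and of freeing children
before the parent, using <sys/queue.h> TAILQ macros; the struct layout and
function names are hypothetical, not the actual nvme_request.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified request with parent/child links. */
struct request {
	struct request *parent;
	unsigned int num_children;
	TAILQ_HEAD(, request) children;
	TAILQ_ENTRY(request) child_tailq;
};

static void request_add_child(struct request *parent, struct request *child)
{
	if (parent->num_children == 0) {
		TAILQ_INIT(&parent->children); /* only set up once a split occurs */
	}
	parent->num_children++;
	TAILQ_INSERT_TAIL(&parent->children, child, child_tailq);
	child->parent = parent;
}

static void request_remove_child(struct request *parent, struct request *child)
{
	assert(parent->num_children > 0);
	parent->num_children--;
	TAILQ_REMOVE(&parent->children, child, child_tailq);
}

/* Remove and free every child before freeing the parent, so no child
 * request is leaked (e.g. on an error path). */
static void request_free(struct request *req)
{
	while (req->num_children > 0) {
		struct request *child = TAILQ_FIRST(&req->children);

		request_remove_child(req, child);
		request_free(child);
	}
	free(req);
}
```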
Change-Id: I78747c44ab3db1a656b33555a45f634dc5a55b31
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe specification recommends destroying all I/O submission and
completion queues before setting CC.SHN.
Change-Id: Iad71dd3fe03d897858034f3ca6ee02e0c55cc2b0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The NVMe specification recommends that orderly shutdown should just
write CC.SHN while the controller is still enabled rather than writing
CC.EN = 0 first.
This also allows removal of the now-unused nvme_ctrlr_disable() and
nvme_ctrlr_wait_for_ready() functions.
Change-Id: I4702ffda153f218ebb8ed92f0e36144b7ceded93
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This can happen if the controller is still resetting as the SPDK NVMe
driver takes control.
Change-Id: I263ae8f2e7b271e0448450557452a115c90c4fb6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch fixes a memory leak when
a parent nvme_request is freed. Previously,
the nvme_requests allocated for the children
were not freed in the error case.
Change-Id: Iabd1f1c3594af60c38e74e3d96c14f78d1aa1aed
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This patch adds a helper function to remove a child
from an nvme_request.
Change-Id: I1e5bb228d53333ca3601f4ae30fcd801ea39e532
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Drop the "data_transfered" variable and just update length, since length
was not used otherwise after this point in the loop.
Change-Id: Icd2991e4e85de7e8c951ba14c441434e871ea4ef
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If only one descriptor is needed, there is a special case in the spec for
SGL1 using the Data Block descriptor type. Add a comment to make it
clear what is going on.
Also tweak the SGL1 setup to copy from the first SGL descriptor list
element instead of relying on the last value from the loop above, since
that could be easily broken by accident.
Change-Id: I49ef97fe5bf18d2bf1d86b4310a7d3abdfd03e57
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
tr->u.sgl is already a struct spdk_nvme_sgl_descriptor pointer.
Change-Id: Ie2c8c052fc28e6369d1d095b8d566acae47975d1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
_nvme_qpair_build_hw_sgl_request() will only be called for payload_size
!= 0, so every SGL will have at least one segment. Drop the 'else' that
was handling nseg == 0, and add an assert to document the payload_size
requirement.
Change-Id: I48e2a862a7657ba85605c0d35c0b65dfac072167
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The assert is checking a variable that is invariant within the loop, so
move the assert up to the top of the function.
Change-Id: Iee7eea1736bc7f953665feb390c3d6340dbeffbc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This makes it easier to find the larger doc comments that produce separate
pages.
It also allows removing the lib/nvme directory from the Doxyfile, so
only the public API headers are used to generate documentation.
Change-Id: I8c46edb8067a91dda5b23fb0864efd3dd8aaeba5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
spdk_nvme_qpair_process_completions() is already documented in
spdk/nvme.h, so merge the doc comment from nvme_qpair.c into the public
header.
Change-Id: Id7722d99d209852ee64286e0a3fa127b863e10aa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Don't allow the user to request more than the valid maximum number of
I/O queues (65535) or 0 I/O queues, since this can't be encoded.
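A minimal sketch of the range check, assuming the count is sent in the
16-bit, 0's based NSQR/NCQR fields of the Set Features Number of Queues
command; encode_num_io_queues() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: encodes a queue count as a 0's based 16-bit value.
 * Returns 0 on success or -1 if the count cannot be encoded. */
static int encode_num_io_queues(uint32_t num_queues, uint16_t *encoded)
{
	if (num_queues == 0 || num_queues > 65535) {
		return -1; /* 0 is meaningless; 65535 is the valid maximum */
	}
	*encoded = (uint16_t)(num_queues - 1); /* 0's based */
	return 0;
}
```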
Change-Id: I2d6e0bba03476085842bad683b273cdf9d6e6d5e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Per the NVMe spec, SGL segments must be Qword (8-byte) aligned. Add a
static assert to make sure this is true for the sgl member of struct
nvme_tracker (assuming the whole nvme_tracker is at least 8-byte aligned).
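A sketch of such a compile-time check against a hypothetical tracker
layout (field names are illustrative, not struct nvme_tracker). If the
tracker itself is 8-byte aligned, it is enough that the sgl member's
offset is a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte SGL descriptor (simplified field split). */
struct sgl_descriptor {
	uint64_t address;
	uint32_t length;
	uint8_t reserved[3];
	uint8_t type_subtype;
};

/* Hypothetical tracker layout. */
struct tracker {
	uint64_t cid_and_flags;
	void *req;
	struct sgl_descriptor sgl[4];
};

_Static_assert(sizeof(struct sgl_descriptor) == 16,
	       "SGL descriptors are 16 bytes");
_Static_assert(offsetof(struct tracker, sgl) % 8 == 0,
	       "sgl member must be Qword (8-byte) aligned");
```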
Change-Id: I827aa40b56de648d83f524a4f1e79c3202b676be
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
If the controller is failed, attempting to submit additional I/O is
futile - it will be immediately failed using the completion callback,
which can result in infinite recursion if the application code resubmits
I/Os on failure.
Instead, provide a way for request submission to indicate failure, and
use it to exit early if the controller is failed; this can only happen
when a reset failed (timed out).
If a request is submitted directly by the user when the controller has
failed, we can return an error code directly. For the case where I/O
was queued and is being resubmitted after a reset, we still need to call
the completion handler via _nvme_fail_request_ctrlr_failed().
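The two failure paths can be sketched as follows; the structures, function
names, and the choice of -ENXIO are assumptions for illustration. A direct
submission returns the error to its caller, while a queued request being
resubmitted has no caller to return to, so it is completed with an error
status through its callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct ctrlr {
	bool is_failed;
};

struct request {
	void (*cb_fn)(struct request *req, int status);
	int status;
};

/* Hypothetical submit path: fail fast instead of invoking the callback,
 * which could recurse if the application resubmits on error. */
static int submit_request(struct ctrlr *ctrlr, struct request *req)
{
	if (ctrlr->is_failed) {
		return -ENXIO;
	}
	(void)req; /* a real driver would queue req to the hardware here */
	return 0;
}

/* Hypothetical resubmission after a reset: report failure via callback. */
static void resubmit_queued_request(struct ctrlr *ctrlr, struct request *req)
{
	int rc = submit_request(ctrlr, req);

	if (rc != 0) {
		req->cb_fn(req, rc);
	}
}

/* Example completion callback that just records the status. */
static void record_status(struct request *req, int status)
{
	req->status = status;
}
```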
Change-Id: I9e144328d524b25db2acf48e923b584746e8d0b6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Provide a new structure, spdk_nvme_ctrlr_opts, to let the user modify
the default controller initialization options during probe/attach.
Currently, only the number of queue pairs can be modified in this way;
other options will be added later.
Change-Id: Ie27b9429291d93a9353c0d820f0ad467d3b0e7cb
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a parameter to each I/OAT library function that requires a channel
instead of implicitly using the thread-local channel registration model.
I/OAT channels are already reported by the spdk_ioat_probe() attach
callback, so no infrastructure for channel allocation is necessary.
Change-Id: I8731126fcaea9fe2bafc41a3f75c969a100ef8f0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Replace the previous code that allocated each tracker individually with
one large allocation per queue pair.
struct nvme_tracker is now explicitly padded to reach exactly 4096 bytes
to allow normal array indexing to work correctly while maintaining the
alignment requirement that ensures each tracker's PRP list does not
cross a page boundary.
This also allows removal of the act_tr array, since the tr array can be
indexed directly now, and each tracker can store its own active state.
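A minimal sketch of the padding and single-allocation approach, assuming a
4 KB page and a hypothetical tracker layout (a 16-byte header plus a PRP
list); the real struct has different fields. With sizeof() exactly one page
and a page-aligned base, tr[i] starts on a page boundary, so its PRP list
cannot cross one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TRACKER_SIZE 4096 /* assumed page size */

/* Hypothetical tracker padded to exactly one page. */
struct tracker {
	uint16_t cid;
	uint8_t active;    /* per-tracker active state */
	uint8_t pad0[5];
	uint64_t prp_bus_addr;
	uint64_t prp[(TRACKER_SIZE - 16) / 8];
};

_Static_assert(sizeof(struct tracker) == TRACKER_SIZE,
	       "tracker must be exactly one page");

/* One page-aligned allocation per queue pair instead of one per tracker. */
static struct tracker *alloc_trackers(size_t n)
{
	if (n == 0) {
		return NULL;
	}
	/* aligned_alloc() requires size to be a multiple of the alignment. */
	return aligned_alloc(TRACKER_SIZE, n * sizeof(struct tracker));
}
```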
Change-Id: Ia7c51735b96594d12f7f478cefcc4aedc84207ad
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The previous method for registering I/O queues did not allow the user
to specify queue priority for weighted round robin arbitration, and it
limited the application to one queue per controller per thread.
Change the API to require explicit allocation of each queue for each
controller using the new function spdk_nvme_ctrlr_alloc_io_qpair().
Each function that submits a command on an I/O queue now takes an
explicit qpair parameter rather than implicitly using the thread-local
queue.
This also allows the application to allocate different numbers of
threads per controller; previously, the number of queues was capped at
the smallest value supported by any attached controller.
Weighted round robin arbitration is not supported yet; additional
changes to the controller startup process are required to enable
alternate arbitration methods.
Change-Id: Ia33be1050a6953bc5a3cca9284aefcd95b01116e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Since the nvme_tracker struct was extended to allow space for 253 SGL
descriptors at 16 bytes each, we can use the same amount of space in the
other branch of the union to store 506 PRP list entries at 8 bytes each.
This increases the maximum supported I/O size for PRP-only devices from
128 KB to slightly under 2 MB.
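The arithmetic can be checked with a short sketch. It assumes 4 KB pages
and one page per PRP list entry, and ignores the page addressed by PRP1 in
the command itself, so the result is an approximation of the limit.

```c
#include <assert.h>

#define SGL_DESCRIPTORS 253  /* from the text */
#define SGL_DESC_SIZE   16   /* bytes per SGL descriptor */
#define PRP_ENTRY_SIZE  8    /* bytes per PRP list entry */
#define PAGE_BYTES      4096 /* assumed memory page size */

/* PRP entries that fit in the space reserved for SGL descriptors. */
static unsigned int prp_entries(void)
{
	return (SGL_DESCRIPTORS * SGL_DESC_SIZE) / PRP_ENTRY_SIZE;
}

/* Approximate transfer limit with one page per PRP list entry. */
static unsigned long max_prp_xfer_bytes(void)
{
	return (unsigned long)prp_entries() * PAGE_BYTES;
}
```

Here 253 * 16 = 4048 bytes holds 506 entries, and 506 * 4096 = 2,072,576
bytes, which is below 2 MB (2,097,152 bytes).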
Change-Id: I2b9905be41343ff360b4cdaccca87ea6f753e89c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Also add a compile-time assert to make sure this doesn't accidentally
break again in the future.
Change-Id: I4d18cfbf21392291e1bdd76eff055429009d28d6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_ctrlr_cmd_create_io_cq() was using the queue ID as the
IV (Interrupt Vector) field in the Create I/O Completion Queue command.
Since the SPDK NVMe driver does not enable interrupts, this is
misleading at best.
Change-Id: I3ea53701fdb9f21d9dc8d8fe20ccf2833b76cfbf
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch implements the format progress indicator (FPI) feature.
The NVMe devices currently available don't support FPI.
Change-Id: Ie937591fb1720d8a062354322aabcc95ff14b2d3
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This field is write-only in the current code; the NVMe library does
not track timeouts on requests.
Change-Id: I50e53bb3c299bf16912c48be8aad3eec829154af
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
When an I/O spans a stripe boundary, the driver splits the request into
multiple requests, so a single memory segment larger than the stripe
size must also be split.
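The boundary arithmetic can be sketched with a hypothetical
split_on_stripe() helper that computes child request lengths in blocks; a
memory segment that spans the boundary has to be cut at the same offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: splits lba_count blocks starting at lba so that no child
 * crosses a stripe boundary (stripe_blocks > 0). Stores up to max_children
 * lengths in child_len and returns the number of children needed. */
static unsigned int split_on_stripe(uint64_t lba, uint32_t lba_count,
				    uint32_t stripe_blocks,
				    uint32_t *child_len, unsigned int max_children)
{
	unsigned int n = 0;

	while (lba_count > 0) {
		/* Blocks left before the next stripe boundary. */
		uint32_t room = stripe_blocks - (uint32_t)(lba % stripe_blocks);
		uint32_t len = lba_count < room ? lba_count : room;

		if (n < max_children) {
			child_len[n] = len;
		}
		n++;
		lba += len;
		lba_count -= len;
	}
	return n;
}
```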
Change-Id: I22ea5734d7066865a57a3c90fe18d5f76f373f1d
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
For NVMe controllers whose firmware supports the SGL feature,
use SGLs for scattered payloads.
Change-Id: If688e6494ed62e8cba1d55fc6372c6e162cc09c3
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
nvme_spec.h already has a structure with the correct bitfields for the
CSTS register, so use it in struct spdk_nvme_registers.
Change-Id: Id0663aee2611fb5195f9012a3176799e32701bb0
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will be exposed in the public API. This rename is in a separate
commit to ease review.
Change-Id: I1b7fef36f85265db27935ac4d22ceef3c7282502
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Set up the infrastructure for creating I/O submission queues with
variable queue priority (QPRIO in Create I/O SQ command).
Currently, this is unused, since we always use the default arbitration
method (round robin), but it will allow reinitializing submission queues
with the correct priority once weighted round robin is supported.
Change-Id: I425003879e624cfcc9687bdc495b5c1726b5a8af
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This function no longer exists and was not part of the public API.
Change-Id: I94fd066b63e812367687d11bc00aa11ab88d4671
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Many of the internal controller initialization functions did not check
for allocation failure; add return codes and check them where
applicable.
Change-Id: Id1b33bb06fca84035369d8b7ecd4c36b8ba7134c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This function returns true if the namespace is active or false if it is
inactive (e.g. no namespace has been attached to the specified namespace
ID yet).
Also use the new function to add checks in the examples and tests where
applicable.
Change-Id: I35465b315ae1a1677c5a82191ad9b1da1c216d50
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Prepare for qpair to be exposed as part of the public API.
Change-Id: Ia63e863e95554adceeade20c829f12fe346375d5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
When multiple NVMe controllers are being initialized during
spdk_nvme_probe(), we can overlap the hardware resets of all controllers
to improve startup time.
Rewrite the initialization sequence as a polling function,
nvme_ctrlr_process_init(), that maintains a per-controller state machine
to determine which initialization step is underway. Each step also has
a timeout to ensure the process will terminate if the hardware is hung.
Currently, only the hardware reset (toggling of CC.EN and waiting for
CSTS.RDY) is done in parallel; the rest of initialization is done
sequentially in nvme_ctrlr_start() as before. These steps could also be
parallelized in a similar framework if measurements indicate that they
take a significant amount of time.
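A minimal sketch of a non-blocking, per-controller init state machine with
a per-state timeout. The states, fields, and tick-based clock are
hypothetical; csts_rdy and cc_en stand in for the CSTS.RDY and CC.EN
register bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum init_state {
	INIT_DISABLE_WAIT_RDY_0, /* CC.EN = 0 written; wait for CSTS.RDY = 0 */
	INIT_ENABLE_WAIT_RDY_1,  /* CC.EN = 1 written; wait for CSTS.RDY = 1 */
	INIT_READY,
	INIT_ERROR,
};

struct ctrlr {
	enum init_state state;
	uint64_t deadline; /* timeout for the current state, in ticks */
	bool csts_rdy;     /* stand-in for CSTS.RDY */
	bool cc_en;        /* stand-in for CC.EN */
};

static void set_state(struct ctrlr *c, enum init_state s,
		      uint64_t now, uint64_t timeout)
{
	c->state = s;
	c->deadline = now + timeout;
}

/* Advance one controller by at most one step without blocking.
 * Returns true once it has finished (ready or error). */
static bool process_init(struct ctrlr *c, uint64_t now, uint64_t timeout)
{
	switch (c->state) {
	case INIT_DISABLE_WAIT_RDY_0:
		if (!c->csts_rdy) {
			c->cc_en = true; /* write CC.EN = 1 */
			set_state(c, INIT_ENABLE_WAIT_RDY_1, now, timeout);
			return false;
		}
		break;
	case INIT_ENABLE_WAIT_RDY_1:
		if (c->csts_rdy) {
			c->state = INIT_READY;
			return true;
		}
		break;
	default:
		return true;
	}
	if (now > c->deadline) {
		c->state = INIT_ERROR; /* hardware hung: give up */
		return true;
	}
	return false;
}
```

A probe loop would call process_init() on each controller in turn until
all of them return true, which lets the hardware resets overlap.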
Change-Id: I02ce5863f1b5c13ad65ccd8be571085528d98bd5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This check was dead code, since both places that called
nvme_ctrlr_wait_for_ready() could only ever have cc.en = 1.
Remove the original nvme_ctrlr_wait_for_ready() wrapper and rename
_nvme_ctrlr_wait_for_ready() without the underscore to replace it.
Change-Id: I6c9aa6a5b93606fb89d168c23f6735fcf3a84eaa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
In nvme_ctrlr_hw_reset(), if we encounter a controller whose CC.EN bit
is already 0 (controller is disabled), the previous code would enable
the controller just so that it could be disabled to get a full reset
(transition from CC.EN = 1 to CC.EN = 0). However, it is a safe
assumption that if CC.EN is already 0, the controller has just been
reset, so we don't need to reset it again.
This saves a significant amount of time (2+ seconds per controller with
Intel SSD DC P3700) during initialization for devices that were disabled
on startup.
Change-Id: I552b1f0f185a84a8a0ce57a93b012d9d5fe096f3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
cleanup.sh and unbind.sh have been combined into a single
setup.sh that takes one optional parameter (reset). If no
parameter is given, the script will automatically bind
all NVMe and IOAT devices to either uio_pci_generic
or vfio-pci, as appropriate based on IOMMU settings. If
the reset parameter is given, the devices will be bound back
to the appropriate kernel drivers.
Change-Id: I25db3234f1ecfb352a281e5093f4c1aa455152ae
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Explicitly include system headers for types that are used in public
headers.
These were being pulled in by example code, so SPDK itself would build,
but other apps that did not include stdbool.h would fail to compile when
including spdk/nvme.h.
Also include nvme.h first in nvme_internal.h so this case gets tested
during normal compilation.
Change-Id: I8ed0fc4e0dcf71551738c461b4b825cc2ee1d233
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
payload here is a pointer to the buffer, not a struct nvme_payload.
Use nvme_allocate_request_contig() and pass the length in bytes rather
than dwords.
Change-Id: Idbbb3614b1d69148fe041d26e0c148bd9ce53724
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The parent field is in the cache line of nvme_request that is only
supposed to be accessed for split (child) I/Os. All accesses to parent
are done from child-specific calls now, so it does not need to be
initialized in the common case of a non-split I/O.
nvme_request_add_child() will set parent when splitting occurs.
Change-Id: Ib86c16ba1ea2ce32f62079831101da2a099047af
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Allows simplification of _nvme_qpair_build_sgl_request(), which does not
need to know whether a request is a child or not.
This also removes a read of req->parent for non-split I/Os; the parent
field is in the section of nvme_request that is only intended to be
initialized for split I/Os, which should be detected by looking at
num_children.
Additionally, this fixes a potential problem if requests were nested
more than one level deep (e.g. req->parent was not the original user
request).
Change-Id: I3ea1dc134bbb1e3b8c6b5a479f5d760bd97ea848
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch makes the following fixes in ioat_impl.h
for when CONFIG_PCIACCESS is set to n:
1. Add the missing definition of ioat_pcicfg_unmap_bar.
2. Define a macro to remove the duplicated vendor code
in the ioat_driver_id enumeration.
Change-Id: I4011098ac296d1ec320bffb5ffa6e098b70a5545
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Building the tip of SPDK master against DPDK 2.2.0 fails with an
inappropriate RTE_CACHE_LINE_SIZE error. Simply reversing the order of
the RTE includes below fixed it for me.
Change-Id: I8782b7ee21d7f185e6e678f874fbdab9403117a5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch removes duplicated code: the structure in the
#ifdef and #else branches was identical.
Change-Id: I1717ce3dcc14134ac59c165d801e5e811b987be5
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
This patch adds support for byte- and word-sized
spdk_pci_device_cfg reads and writes.
Change-Id: I49084e231bd6b5f5b22180a3eb36ddad4430b3a4
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Move toward collecting PCI IDs, class codes, etc. in pci_ids.h instead
of individual device-specific headers.
Change-Id: Icff162d48ac663db71d0576ceee16a9bd7a751cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
PCI_VENDOR_ID_INTEL -> SPDK_PCI_VID_INTEL
Also change the inclusion guard macro to be consistent with the other
SPDK headers.
Change-Id: I29346267172cb8c07cc4289eed4eca2d55e942d6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This doesn't need to be part of the public API. It is only used by the
NVMe quirk lookup tables.
Change-Id: I7662e333c70b7c5f814bd7c8a528b6bff1f0732e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rename all functions to use an spdk_ prefix, and provide enough of an API
to avoid apps needing to #include <pciaccess.h>.
The opaque type used in the public API for a PCI device is now
struct spdk_pci_device *.
Change-Id: I1e7a09bbc5328c624bec8cf5c8a69ab0ea8e8254
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is a step toward abstracting PCI access so that libpciaccess can be
swapped out more easily.
Change-Id: I5491459460cbfbd0be471f70f9d07a7eb3175234
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Instead of writing the completion doorbell once per completion,
just write it once at the end of the completion while loop.
This reduces the number of mmio writes by coalescing several
writes into one when we get multiple completions at a time.
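A minimal sketch of the coalesced doorbell write, assuming a simplified completion queue where each entry carries only a command ID and a phase tag. The struct names and CQ_ENTRIES are hypothetical, and the doorbell is a plain volatile field with a counter standing in for the MMIO register. All new completions are consumed first, then the head doorbell is written once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CQ_ENTRIES 8 /* hypothetical queue depth for this sketch */

struct cpl_sketch {
	uint16_t cid;
	bool phase; /* phase tag; flips each time the controller wraps */
};

struct cq_sketch {
	struct cpl_sketch entries[CQ_ENTRIES];
	uint32_t head;
	bool phase;                 /* phase value expected for new entries */
	volatile uint32_t doorbell; /* stand-in for the MMIO CQ head doorbell */
	unsigned doorbell_writes;   /* counts simulated MMIO writes */
};

/* Consume every new completion, then write the head doorbell once. */
static unsigned process_completions_sketch(struct cq_sketch *cq)
{
	unsigned n = 0;

	while (cq->entries[cq->head].phase == cq->phase) {
		/* ... complete the request identified by entries[head].cid ... */
		n++;
		if (++cq->head == CQ_ENTRIES) {
			cq->head = 0;
			cq->phase = !cq->phase;
		}
	}

	if (n > 0) {
		cq->doorbell = cq->head; /* one write covers all n completions */
		cq->doorbell_writes++;
	}
	return n;
}
```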
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I3cc3864dcfe43186bec51be1a732e84ef3be05ae
Similar to the NVMe API change, this allows better abstraction of the
PCI subsystem.
Change-Id: I2b84d9c3c498a08d4451b4ff27d0865f0456c210
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This patch switches to the PCI-related functions
provided by DPDK.
Change-Id: I263b79f1b42868ef0c1efcf1bc392a4b3a328e93
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
With CONFIG_PCIACCESS=y in the CONFIG file, the
libpciaccess library is used; with CONFIG_PCIACCESS=n
in the CONFIG file, the PCI access functions provided
by DPDK are used instead.
Change-Id: I786c5589b8e7909ba2e59d222938dd5ba45bf92d
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The new probing API will find all NVMe devices on the system and ask the
caller whether to attach to each one. The caller will then receive a
callback once each controller has finished initializing and has been
attached to the driver.
This will enable cleanup of the PCI abstraction layer (enabling us to
use DPDK PCI functionality) as well as allowing future work on parallel
NVMe controller startup and PCIe hotplug support.
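The probe/attach flow above can be sketched as a plain callback loop. Everything here is hypothetical (dev_sketch, probe_sketch, and the example callbacks are illustrative names, not the actual driver API), and controller initialization is reduced to setting a flag. The caller decides per device in the probe callback and is notified in the attach callback once the device is ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device descriptor; the real driver finds NVMe
 * controllers by PCI enumeration. */
struct dev_sketch {
	const char *bdf; /* e.g. "0000:01:00.0" */
	bool initialized;
};

typedef bool (*probe_cb_sketch)(void *cb_ctx, const struct dev_sketch *dev);
typedef void (*attach_cb_sketch)(void *cb_ctx, struct dev_sketch *dev);

/* Ask the caller about each device; initialize and report the ones it wants. */
static int probe_sketch(struct dev_sketch *devs, size_t n, void *cb_ctx,
			probe_cb_sketch probe_cb, attach_cb_sketch attach_cb)
{
	int attached = 0;

	for (size_t i = 0; i < n; i++) {
		if (!probe_cb(cb_ctx, &devs[i])) {
			continue; /* caller declined this device */
		}
		devs[i].initialized = true; /* stand-in for controller init */
		attach_cb(cb_ctx, &devs[i]);
		attached++;
	}
	return attached;
}

/* Example caller state and callbacks: attach only one chosen BDF. */
struct probe_ctx_sketch {
	const char *want_bdf;
	int attach_calls;
};

static bool probe_one_bdf(void *cb_ctx, const struct dev_sketch *dev)
{
	const struct probe_ctx_sketch *ctx = cb_ctx;

	return strcmp(dev->bdf, ctx->want_bdf) == 0;
}

static void count_attach(void *cb_ctx, struct dev_sketch *dev)
{
	struct probe_ctx_sketch *ctx = cb_ctx;

	(void)dev;
	ctx->attach_calls++;
}
```

Splitting the decision (probe) from the notification (attach) is what leaves room for initializing several controllers in parallel before any attach callback fires.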
Change-Id: I3cdde7bfab0bc0bea1993dd549b9b0e8d36db9be
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
For Crystal Beach DMA channels that support the block fill capability,
add a fill API that can zero out pages or fill them with a
fixed pattern.
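As a rough CPU-side reference for what "fill with a fixed pattern" means, the sketch below repeats a 64-bit pattern across a buffer. It does not model the hardware path. The name fill_pattern_sketch is hypothetical, and the handling of a trailing partial chunk (copying the leading bytes of the pattern) is this sketch's own choice, not a statement about the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Repeat an 8-byte pattern across dst; a trailing partial chunk gets
 * the first bytes of the pattern in memory order. Zeroing a page is
 * the special case pattern == 0. */
static void fill_pattern_sketch(void *dst, uint64_t pattern, size_t nbytes)
{
	uint8_t *p = dst;

	while (nbytes >= sizeof(pattern)) {
		memcpy(p, &pattern, sizeof(pattern));
		p += sizeof(pattern);
		nbytes -= sizeof(pattern);
	}
	memcpy(p, &pattern, nbytes); /* nbytes is now < 8 (possibly 0) */
}
```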
Change-Id: I8a57337702b951c703d494004b111f6d206279fb
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
This patch adds an Intel NVMe device list and overrides the
supported log pages according to the quirk list.
In particular, the READ_CMD_LATENCY and WRITE_CMD_LATENCY pages are
supported on Intel DC P3x00 devices despite not being listed in the
Intel vendor-specific log page directory.
Change-Id: I3a2b6a5fa142c6e9c93567df65e85980bd3c7cc0
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Also add a space between Copyright and (c).
The copyright year can be determined using git metadata.
Also remove the duplicated "All rights reserved." - every instance of
this line already has a corresponding "All rights reserved" immediately
below it, except for examples/ioat/kperf/kmod/dma_perf.c, where I have
added it manually.
Performed using this command:
git ls-files | xargs sed -i -e 's/Copyright(c) \(.*\) Intel Corporation. All rights reserved./Copyright (c) Intel Corporation./'
Change-Id: I3779f404966800709024eb1eb66a50068af2716c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
1. Add a new API, nvme_ctrlr_is_feature_supported().
2. Add a unit test for the new API.
Change-Id: Ia6d8710755c3b13984fca9d56700efe043be1402
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
This will allow replacing these _nvme_fail_request_bad_vtophys() calls
with the correct error later. vtophys is not actually used within the
SGL request builder, so this is the wrong error.
Change-Id: Ibc2a3b029a8abad1d563b9df200325d7d64498da
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
No code change, just moved into a function for readability.
Change-Id: I883443c06d961c6dbeffed1a6fb153177e6e3fcd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This cleans up the I/O splitting code somewhat.
It also moves the SGL payload function pointers up into the hot cache
section of struct nvme_request without pushing the other important
members past the cacheline boundary (because payload is now a union).
Change-Id: I14a5c24f579d57bb84d845147d03aa53bb4bb209
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
A namespace indicates support for reservations by reporting a non-zero
value in the Reservation Capabilities field in the Identify Namespace
data structure, and a controller indicates support for reservations in the
Identify Controller data structure. Here we use the namespace field as the
support flag.
Change-Id: I0e1e29548aa3fc8b6d3bbeb4149ec4864316f092
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Support for the Force Unit Access and Limited Retry
bits on reads and writes.
Change-Id: I9860848358377d63a967a4ba6ee9c061faf284d4
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
This patch wraps the internal use of libpciaccess
in preparation for exposing the same interface
to applications in a future patch.
Change-Id: I4d40fae0bd86b451ed38dbfd9bcc015f9bfc8436
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
NVMe reservations provide capabilities that may be used by two or more
hosts to coordinate access to a shared namespace. Add the four
reservation commands: Reservation Register, Acquire, Release, and Report.
Change-Id: Ib03ae2120a57dd14aa64311a6ffeb39fda73018c
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
To support different types of scattered input payloads, such as iovecs
or scatter-gather lists, define a common method in the NVMe driver;
users implement their own functions to iterate over each memory
segment.
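A minimal sketch of the callback approach for a payload described by an iovec array. The callback shapes (reset to a byte offset, then fetch the next segment) and all names here are hypothetical illustrations, not the driver's actual typedefs. The consumer side, total_payload_bytes(), only sees the function pointers, which is what lets the driver stay agnostic to the caller's payload representation.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical callback shapes: reset to a byte offset, then iterate. */
typedef void (*reset_sgl_fn_sketch)(void *cb_arg, uint32_t offset);
typedef int (*next_sge_fn_sketch)(void *cb_arg, void **address, uint32_t *length);

/* Caller-owned iteration state over an iovec array. */
struct iov_ctx {
	struct iovec *iovs;
	int iovcnt;
	int idx;         /* current iovec */
	uint32_t offset; /* byte offset into iovs[idx] */
};

static void iov_reset_sgl(void *cb_arg, uint32_t offset)
{
	struct iov_ctx *ctx = cb_arg;

	ctx->idx = 0;
	/* Skip whole iovecs until the offset lands inside one. */
	while (ctx->idx < ctx->iovcnt && offset >= ctx->iovs[ctx->idx].iov_len) {
		offset -= (uint32_t)ctx->iovs[ctx->idx].iov_len;
		ctx->idx++;
	}
	ctx->offset = offset;
}

static int iov_next_sge(void *cb_arg, void **address, uint32_t *length)
{
	struct iov_ctx *ctx = cb_arg;
	struct iovec *iov;

	if (ctx->idx >= ctx->iovcnt) {
		return -1; /* no more segments */
	}
	iov = &ctx->iovs[ctx->idx];
	*address = (uint8_t *)iov->iov_base + ctx->offset;
	*length = (uint32_t)(iov->iov_len - ctx->offset);
	ctx->idx++;
	ctx->offset = 0;
	return 0;
}

/* Driver-side view: walk the payload using only the callbacks. */
static uint32_t total_payload_bytes(reset_sgl_fn_sketch reset, next_sge_fn_sketch next,
				    void *cb_arg, uint32_t offset)
{
	void *addr;
	uint32_t len, total = 0;

	reset(cb_arg, offset);
	while (next(cb_arg, &addr, &len) == 0) {
		total += len;
	}
	return total;
}
```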
Change-Id: Id2765747296a66997518281af0db04888ffc4b53
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Change nvme_ctrlr_is_log_page_supported() to match
nvme_ctrlr_cmd_get_log_page().
Change-Id: I4c8a1f11044b083f8f8990ef40a4f789fa3c24e3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Request allocation may fail, so we need a way to indicate failure to the
caller.
Change-Id: I278c3f42e4d2fa1902bb0ab33ad3bf7c7007fd0d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
1. Add a supported log pages data structure.
2. Build up the supported log pages when the NVMe controller starts.
3. Provide a unified API for getting log pages.
4. Optimize the unit test suite based on the above modifications.
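A minimal sketch of a supported-log-pages table, assuming one flag per 8-bit log page identifier that is filled once at start-up. Log pages 01h (Error Information), 02h (SMART / Health Information), and 03h (Firmware Slot Information) come from the NVMe specification. The struct, function names, and the vendor-specific ID 0xC1 used in the test are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table: one flag per possible 8-bit log page identifier. */
struct ctrlr_log_pages_sketch {
	bool supported[256];
};

/* Log page identifiers defined by the NVMe specification. */
enum {
	LOG_ERROR         = 0x01,
	LOG_HEALTH_INFO   = 0x02,
	LOG_FIRMWARE_SLOT = 0x03,
};

/* Build the table once when the controller starts. */
static void build_supported_log_pages(struct ctrlr_log_pages_sketch *c,
				      const uint8_t *extra_ids, int n_extra)
{
	c->supported[LOG_ERROR] = true;
	c->supported[LOG_HEALTH_INFO] = true;
	c->supported[LOG_FIRMWARE_SLOT] = true;
	/* Vendor-specific or quirk-driven additions. */
	for (int i = 0; i < n_extra; i++) {
		c->supported[extra_ids[i]] = true;
	}
}

/* Unified lookup used before issuing a Get Log Page command. */
static bool is_log_page_supported(const struct ctrlr_log_pages_sketch *c, uint8_t log_page)
{
	return c->supported[log_page];
}
```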
Change-Id: I03cdb93f5c94e6897510d7f19bc7d9f4e70f9222
Signed-off-by: Cunyin Chang <cunyin.chang@intel.com>
Use /sys/bus/pci/devices/.../driver to determine which driver is loaded
for a particular device.
Change-Id: I5859a776e524033e1c6d6ec3796b7e11bdcf0bc4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This more accurately represents what function it performs.
Also remove pci_device_has_uio_driver() from the public API. Callers
should use pci_device_has_non_uio_driver() instead.
Change-Id: I9623fe1345b43e981d5823804e33d01ac0d3bb1c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_ctrlr_process_io_completions() and
nvme_ctrlr_process_admin_completions() now return the number of
completions processed.
This also adds the possibility of returning an error from the
process_*_completions functions (currently unused, but this at least
gets the API ready in case error conditions are added later).
Change-Id: I1b32ee4f2f3c1c474d646fa2d6b8b7bbb769785f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This ensures that any uninitialized fields are 0/NULL so if
ioat_channel_start() fails, ioat_channel_destruct() will not try to free
bogus pointers.
Change-Id: I99278c9fa280cbcdf3f7448e77db3ac98b59cdd6
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Previously, if nvme_allocate_request() failed in
nvme_ctrlr_construct_and_submit_aer(), there was no error checking, so a
NULL pointer would be dereferenced.
Add a return value to nvme_ctrlr_construct_and_submit_aer() so we can
signal failure to the caller. This can only really be reasonably
handled during initialization; when resubmitting a completed AER later,
there is nowhere to report failure, so the AER will just remain
unsubmitted.
Change-Id: I413eb6c21be01cd9a61e67f62f2d0b7170eabaa3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The ioat library currently only supports DMA copy operations, but the
hardware can do other types of transfers. Add a union of the hardware
descriptor structures to enable support for the other operations in the
future.
Also add a generic hardware descriptor type to allow access to the parts
of the descriptor that are common between all types.
Change-Id: I3b54421ce771f58b78910e790b53026f311f918e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
According to the specification, the Dataset Management command with the
Deallocate attribute can specify up to 256 ranges, so use uint16_t
instead of uint8_t for the ranges parameter (uint8_t can hold at most 255).
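In the NVMe specification, the Dataset Management Number of Ranges field (CDW10 bits 7:0) is a 0's based value. So 256 ranges still fits the 8-bit command field (as 0xFF), even though the count itself does not fit in a uint8_t. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define DSM_MAX_RANGES 256 /* maximum ranges per Dataset Management command */

/* Encode the 0's based Number of Ranges field for CDW10 bits 7:0.
 * Returns -1 if num_ranges is 0 or exceeds the spec limit. */
static int dsm_cdw10_nr(uint16_t num_ranges)
{
	if (num_ranges == 0 || num_ranges > DSM_MAX_RANGES) {
		return -1;
	}
	return (num_ranges - 1) & 0xFF;
}
```

With a uint8_t parameter, a caller passing 256 would silently pass 0 instead.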
Change-Id: Ibacc00da8b4b9e2b2f3454d382aadf7ad353ff31
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The parent field is no longer used in the normal (non-split) I/O path,
so move it down to the default-uninitialized part of struct nvme_request
that is only touched for parent/child I/O.
This also puts it closer to other related fields (children,
child_tailq, parent_status) for improved readability.
Change-Id: I120df1df0c967d2f74daa6e97c0bc83626e3be7f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_qpair_submit_tracker() and nvme_qpair_manual_complete_request() are
only used from within nvme_qpair.c, so they can be static.
nvme_qpair_submit_tracker() is moved up to avoid needing a declaration
(no other code change).
nvme_ctrlr_hw_reset() is only used from within nvme_ctrlr.c, so it can
be static.
Change-Id: I9a7953d7baaec76e875dd535daf557ea24bef801
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These delays are left over from early development. They are completely
unnecessary and not based on anything in the NVMe spec.
Startup time should be slightly improved (on the order of 100 ms in
normal cases).
Change-Id: I9068b1a0f42feabcfe656d68be91e05a56cc53a3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This lets us signal an error if the channel is halted in
ioat_process_channel_events().
Change-Id: Iffaf4fd1e27d1254f9d95a37d732ae4a5f3a0465
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rather than individually allocating each ring entry, use two large
allocations, one for the hardware descriptors and one for the software
descriptor contexts.
This allows the use of simple array indexing on the rings and also
allows the removal of most of the software descriptor structure,
since the necessary information can be retrieved based on the ring
index now.
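A minimal sketch of the two-allocation ring, with hypothetical descriptor layouts and names. Plain calloc() stands in for the DMA-capable memory the real hardware descriptors need. Because the ring size is a power of two, a free-running index maps to a slot with a mask, and the same slot number selects both the hardware descriptor and its software context.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor layouts for illustration only. */
struct hw_desc_sketch { uint64_t src, dst; uint32_t size, ctrl; };
struct sw_desc_sketch { void (*cb)(void *); void *cb_arg; };

struct ring_sketch {
	struct hw_desc_sketch *hw; /* one allocation for all hw descriptors */
	struct sw_desc_sketch *sw; /* one allocation for all sw contexts */
	uint32_t size_order;       /* ring holds 1 << size_order entries */
};

static int ring_init(struct ring_sketch *r, uint32_t size_order)
{
	uint32_t n = 1u << size_order;

	r->hw = calloc(n, sizeof(*r->hw)); /* real code needs DMA-safe memory */
	r->sw = calloc(n, sizeof(*r->sw));
	r->size_order = size_order;
	if (r->hw == NULL || r->sw == NULL) {
		free(r->hw);
		free(r->sw);
		return -1;
	}
	return 0;
}

/* Power-of-two size lets a mask replace a modulo. */
static uint32_t ring_slot(const struct ring_sketch *r, uint32_t index)
{
	return index & ((1u << r->size_order) - 1);
}

static void ring_free(struct ring_sketch *r)
{
	free(r->hw);
	free(r->sw);
}
```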
Change-Id: I73ef24450f69ca0fc35e350286282c6b1c77a207
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
NVMe doesn't require the specific 64-bit MMIO ordering on 32-bit
platforms performed in spdk_mmio_read_8(), but it doesn't hurt.
We have to pick one of the two possible orderings, so pick the one
required by I/OAT.
Change-Id: I2b909d64d0c077b797d0f64a11d78d1ecc55eec7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The ioat driver supports DMA engine copy offload hardware available on
Intel Xeon platforms.
Change-Id: Ida0b17b25816576948ddb1b0443587e0f09574d4
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Use $(CLEAN_C) throughout the Makefiles to clean up a consistent set of
generated files.
This also adds coverage files to the list of cleaned files.
Change-Id: Iceb922935a45c9eecbf2f3443bd0ee4f5c966825
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Most devices today support far fewer than 1024, but this is a
more reasonable default upper limit than the spec-defined 64K.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia8a6d80c3a5aa181f27c8354758c6ca468013d92
lib/memory was already using this pattern; extend it to lib/util and
lib/nvme.
Change-Id: I84a633d7805522fc94d8fc11ad5486ce552702e5
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The .o files are always kept anyway, so there is no need for an explicit
rule.
Change-Id: Id1687ba89daabfda5802e4328deb127403277928
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add common $(COMPILE_C), $(LINK_C), and $(LIB_C) variables that contain
the commands to build a .o from a .c, an app from objects and libraries,
and a library from objects, respectively.
Change-Id: Ie2eaa13156b8bf3db7a4ffa66161382d829aef07
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_ctrlr_process_io_completions() now takes a second parameter,
max_completions, to let the user limit the number of I/Os completed on
each poll.
If there are many I/Os waiting to be completed, the
nvme_ctrlr_process_io_completions() function could run for a long time
before returning control to the user, so the max_completions parameter
lets the user have more control of latency.
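A minimal sketch of a bounded completion poll. The queue is reduced to a counter of pending completions. The names are hypothetical, and treating max_completions == 0 as "no limit" is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue state: number of completions waiting to be reaped. */
struct qpair_sketch {
	uint32_t pending;
};

/* Reap at most max_completions (0 means no limit in this sketch) and
 * return the number processed. */
static int32_t process_io_completions_sketch(struct qpair_sketch *q, uint32_t max_completions)
{
	uint32_t done = 0;

	while (q->pending > 0) {
		if (max_completions != 0 && done == max_completions) {
			break; /* return control to the caller to bound latency */
		}
		q->pending--; /* stand-in for completing one request */
		done++;
	}
	return (int32_t)done;
}
```

A poller that also services other queues would typically pass a small bound so that a burst on one queue cannot starve the rest.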
Change-Id: I3173059d94ec1cc5dbb636fc0ffd3dc09f3bfe4b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
After converting is_resetting to bool, it is smaller and can be packed
more efficiently with is_failed and reordered after the larger fields
used in the I/O path.
Change-Id: Ifa2301eb61ce8d38eb5412cca61d2a91b1474101
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It was previously uint32_t because it was accessed with special
uint32_t-only atomic read/write helper functions, but that was replaced
with normal variable accesses protected by a mutex.
Change-Id: I304a7ef8c723cb33fd08110b697f848823a163e7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Set SPDK_ROOT_DIR explicitly in each Makefile so that make from a
subdirectory will work (assuming all dependencies from the upper
directory have already been built). This allows partial rebuilds of the
source tree, as well as building the unit tests without requiring DPDK.
Change-Id: I3f65b805d490b40ff5ec53cceb61df542ce814f1
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This helps weed out functions that should be static, functions that are
not declared in public header files, and .c files that don't include
their .h interface headers.
Change-Id: Ie39f83ad4b320847e4a938bd1d4d0b4fa21c2ffa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Fix all of the uses of __thread so that it appears at the beginning of
the declaration (similar to e.g. static).
Don't actually enable -Wold-style-declaration, since clang doesn't
understand that.
Change-Id: I0dcbb758143eab90fc978334c8f256c6602cc4cd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
The ioctl() calls in dev_get_blocklen() were checking for != 0 instead
of == 0, so the default path (512) was always being taken.
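The corrected check can be sketched as follows. ioctl() returns 0 on success and -1 on failure, so the reported size should be used only when the call returns 0. This sketch assumes Linux and the BLKSSZGET request. The function name is hypothetical, and it does not reproduce which ioctls the original dev_get_blocklen() used.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/fs.h> /* BLKSSZGET (Linux only) */

/* Use the device-reported logical block size when ioctl() succeeds
 * (returns 0); fall back to 512 only when it fails. */
static int dev_get_blocklen_sketch(int fd)
{
	int blocklen = 0;

	if (ioctl(fd, BLKSSZGET, &blocklen) == 0) {
		return blocklen;
	}
	return 512;
}
```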
Change-Id: Ib0b016b1d453fb94d408063417b7485ff24ed220
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rename the nvme_free_request macro to nvme_dealloc_request to match
nvme_alloc_request and add a wrapper function to nvme.c so that the
macro contents are only expanded once.
The DPDK nvme_impl.h uses rte_mempool_put(), which generates a large
amount of code inline. Moving this macro expansion to a wrapper
function avoids inlining it in the multiple places nvme_free_request()
gets called, most of which are error handling cases that are not in the
hot I/O path.
Change-Id: I64ea9c39ba47e26672eee8d5058f1489e07eee5b
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Like sprintf() with automatic buffer allocation.
This should help to avoid fixed-size buffers in
non-performance-sensitive code that formats strings.
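A common way to build such a helper, shown here under the hypothetical name sprintf_alloc_sketch, is to call vsnprintf() once with a NULL buffer to measure the output, allocate length + 1 bytes, and format again into the new buffer. va_copy() is needed because a va_list cannot be reused after vsnprintf() has consumed it. The caller frees the result.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a freshly malloc'd buffer; returns NULL on error.
 * The caller must free() the result. */
static char *sprintf_alloc_sketch(const char *fmt, ...)
{
	va_list args, args_copy;
	int len;
	char *buf;

	va_start(args, fmt);
	va_copy(args_copy, args);
	len = vsnprintf(NULL, 0, fmt, args); /* first pass: measure */
	va_end(args);
	if (len < 0) {
		va_end(args_copy);
		return NULL;
	}

	buf = malloc((size_t)len + 1);
	if (buf != NULL) {
		vsnprintf(buf, (size_t)len + 1, fmt, args_copy); /* second pass: write */
	}
	va_end(args_copy);
	return buf;
}
```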
Change-Id: I35209ae84014ed5daf41baa5b03af8a5f6b02b8e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is the only place nvme_request_add_child() is used, so move it
nearby and make it static to allow the compiler to inline it.
Change-Id: If4a7e17fde0b0272e1d4432c1dcedbec27c25371
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Move dependency includes into a new spdk.deps.mk file,
then include it at the end of Makefiles that build
source files.
Also add a test to autobuild.sh to confirm that
binaries are regenerated if we make after touching a
header file.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If6a1905706a840f92cbdf3ace7fbdb27fe2de213
Pull the almost-identical request splitting code for driver-assisted
striping and maximum I/O size into its own function,
_nvme_ns_cmd_split_request().
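A minimal sketch of the shared splitting loop for the stripe-boundary case. The names are hypothetical, and child requests are reduced to (LBA, block count) pairs. Splitting by maximum I/O size would follow the same loop but measure boundaries from the start of the request instead of from LBA 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical child descriptor: starting LBA and block count. */
struct child_sketch { uint64_t lba; uint32_t count; };

/* Split [lba, lba + count) so no child crosses a multiple of `boundary`
 * blocks. Returns the number of children, or -1 on bad input or if
 * max_children is too small. */
static int split_request_sketch(uint64_t lba, uint32_t count, uint32_t boundary,
				struct child_sketch *children, int max_children)
{
	int n = 0;

	if (boundary == 0) {
		return -1;
	}
	while (count > 0) {
		uint32_t to_boundary = boundary - (uint32_t)(lba % boundary);
		uint32_t this_count = count < to_boundary ? count : to_boundary;

		if (n == max_children) {
			return -1;
		}
		children[n].lba = lba;
		children[n].count = this_count;
		n++;
		lba += this_count;
		count -= this_count;
	}
	return n;
}
```

For example, 20 blocks starting at LBA 5 with an 8-block stripe split into children of 3, 8, 8, and 1 blocks.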
Change-Id: I3c15ac2073f8f5aec721c427199c8fb1a5d6a1fc
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This helps enable FreeBSD, where pciaccess pci_device_has_kernel_driver()
is not functional. The function will return 0 if there is no driver
attached, or the Linux uio or FreeBSD nic_uio driver is attached. It will
return 1 otherwise.
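A rough sketch of the return-value rule above, split into two parts. The first is a classifier that returns 0 for no driver or a uio-style driver and 1 otherwise. The second is a Linux-only reader that takes the driver name from the /sys/bus/pci/devices/<bdf>/driver symlink. Both function names are hypothetical. Treating any driver name that contains "uio" (uio_pci_generic, igb_uio, nic_uio) as a uio driver is a simplifying heuristic of this sketch, and the FreeBSD lookup is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* 0: no driver, or a uio-style driver (heuristic: name contains "uio").
 * 1: any other driver is bound. */
static int driver_name_is_non_uio(const char *driver)
{
	if (driver == NULL) {
		return 0;
	}
	return strstr(driver, "uio") == NULL ? 1 : 0;
}

/* Linux: resolve the driver symlink and return its last path component,
 * or NULL if no driver is bound. buflen must be at least 1. */
static const char *sysfs_driver_name(const char *bdf, char *buf, size_t buflen)
{
	char path[256];
	ssize_t n;
	const char *slash;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/driver", bdf);
	n = readlink(path, buf, buflen - 1);
	if (n < 0) {
		return NULL;
	}
	buf[n] = '\0';
	slash = strrchr(buf, '/');
	return slash != NULL ? slash + 1 : buf;
}
```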
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I0921e61c9040b1e0411b5dc40b36fc7f2721c8c5
The changes are minor:
- remove unneeded error.h
- replace PATH_MAX with a suitable local #define
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I5aecf8b53e0ac7582f394c71b4668888a6c6292f
The Linux pagemap-based implementation obviously does not
work on FreeBSD. DPDK has data structures describing the huge
pages it has allocated, so use that instead when we need to
populate new 2MB mappings in our tables.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I924e104f42891aaa2f931159aabba2779f239e91
GCC generates a series of 64-bit MOV instructions for the memcpy() into
the submission queue. We can do better with 128-bit SSE2 instructions.
DPDK already has a memcpy implementation that is optimized for small
inline copies, so use it instead of memcpy.
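The change itself relies on DPDK's rte_memcpy. The sketch below only illustrates the underlying idea with SSE2 intrinsics: a 64-byte command is copied with four unaligned 128-bit loads and stores instead of eight 64-bit moves. The struct and function names are hypothetical, and the memcpy() fallback covers non-SSE2 builds.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 64-byte submission queue entry. */
struct sqe_sketch { unsigned char bytes[64]; };

/* Copy one 64-byte command using four 128-bit SSE2 loads/stores. */
static void copy_cmd_sketch(struct sqe_sketch *dst, const struct sqe_sketch *src)
{
#if defined(__SSE2__)
	__m128i *d = (__m128i *)dst;
	const __m128i *s = (const __m128i *)src;

	_mm_storeu_si128(d + 0, _mm_loadu_si128(s + 0));
	_mm_storeu_si128(d + 1, _mm_loadu_si128(s + 1));
	_mm_storeu_si128(d + 2, _mm_loadu_si128(s + 2));
	_mm_storeu_si128(d + 3, _mm_loadu_si128(s + 3));
#else
	memcpy(dst, src, sizeof(*dst)); /* portable fallback */
#endif
}
```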
Change-Id: I5f09259b4d5cb089ace4a8ea6d2078c03fee84f3
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
No change in behavior, just a simplification.
We already have a check for retry, so pull the cb_fn check out and put
it under the !retry branch.
This makes it clearer that requests that are going to be retried will
not get their callbacks called.
Change-Id: I70c7067e550c7fca78b0441b5474833f73863315
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
It is an empty function, so it can be deleted.
Add nvme_assert to fix issue reported by scan-build.
Change-Id: Ia0e8f656e1dac0da7ec72f8404469ea1b0dcb40e
Signed-off-by: Liang Yan <liangx.yan@intel.com>
This is the only place that was using printf directly in the NVMe
library. Replace it with the official nvme_printf logging mechanism.
Change-Id: I689a7c0854b5e47eb357150f814e347cd44be79c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
In nvme_qpair_complete_tracker, make sure we got a valid request in the
tracker that is being completed.
This should never occur in practice, but safeguard against it in case of
programmer error. Fixes a scan-build warning about potential NULL
dereference.
Change-Id: Id82af604d2a5ed5de0aeccf3affa1900f6712ebe
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Document the assumptions made by qpair_construct using asserts.
These values can't actually be 0 in practice due to the way they are
derived, but scan-build can't see that. It is also useful to have these
asserts in case of future modifications.
Change-Id: I546c057f5cbe7ccc62acd90b595e423cd450d86a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
nvme_dump_command is totally unused aside from the unit test.
nvme_dump_completion was used in qpair, but it can be replaced with the
equivalent nvme_qpair_print_completion.
Also added the missing nvme_completion fields to nvme_qpair_print_completion
that had been printed by nvme_dump_command.
Change-Id: Ia5ee66f3553df06febe8f465d42e49a84c555dd2
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This is calculated elsewhere now, so remove the comments around
nvme_qpair_construct calls.
Change-Id: I2dc4956a9e250b88e62038bc55cdd315940ad391
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
rc is reinitialized before it is ever read.
Change-Id: I9abbc256fb06022f3024b0aa3827be02a273f20a
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>