We were passing a map structure to unmap ioctl,
where we should obviously pass an unmap structure
instead.
Due to how map/unmap structures are, we were
unmapping much more memory than necessary on pci
hotremove.
Change-Id: Iabbea4b277228dc3de300849f751202f42bded2b
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/421693
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This brings DPDK 18.05 support and introduces
dynamic hugepage memory allocation.
The following is now possible:
./spdk_tgt -s 32
rpc.py construct_malloc_bdev 128 512
or even:
./spdk_tgt -s 0
Note that if no -s param is given, DPDK will still
allocate all available hugepage memory.
This has been tested with DPDK 18.05-rc6.
Fixes#281
Change-Id: Ic9521484c2871eb5b2a56445f1177f305b147707
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/410540
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
We've had cases (especially with vhost) in the past where we have
a valid vaddr but the backing page was not assigned yet. DPDK used
to return 0 as the phys addr in these cases but now it returns
RTE_BAD_IOVA. Unfortunately we don't have any tests currently
in the test pool that hit this condition, but at least one user
has an environment which hits it and this patch fixes their
problem.
Make sure we still work with older versions of DPDK as well.
Fixes issue #260.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie3c0ef54a3e34153bd0850ecfb2be4fcb92455b1
Reviewed-on: https://review.gerrithub.io/410071
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
DPDK commit 9df1ae8a888d ("eal: rename and move PCI resource structure")
introduced the new struct rte_mem_resource name; before that point, the
equivalent type was struct rte_pci_resource.
Since this is an easy fix, we can go ahead and patch around it; however,
users are highly recommended to use newer DPDK releases.
Change-Id: I27637136fa932f10032f5f76248da07120fa02a9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/408743
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Replace the use of the private rte_pci_bus list with our own internal
list of PCI devices inside SPDK. This fixes linking against the shared
library version of DPDK.
Change-Id: Ia69555e4e7caa1a40974b7969d48773e36ae0fd7
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/405937
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
DPDK 17.11 added a public API, rte_vfio_is_enabled(), that we can use
instead of declaring and using pci_vfio_is_enabled(). This removes one
of the remaining non-public DPDK symbols we are currently using, getting
us closer to building against the shared library version of DPDK.
Change-Id: Idf4ee66d4868cf542521fa2896ed8c609d42ee29
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/405921
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
If those are encountered, SPDK will get physical
addreses by itself (either IOVAs from vfio or
real physical addreses from /proc/self/pagemap).
Change-Id: I321892f7dfb26054087a86cd24502efff05883ea
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/404138
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add a new virtual to physical (vtophys) method for
spdk_vtophys_notify() that works for PCI memory (namely NVMe CMBs and
PMRs). This new method searches all the BARs on all the detected PCI
devices to see if the vaddr resides inside any of them.
Change-Id: I68afbeffd958cf40c1e8652e13da5531811b522b
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Reviewed-on: https://review.gerrithub.io/398872
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
At least on some ARM platforms, virtual address map
uses lower 48 bits of address space, compared to just
47 on x86-64.
This does double the size of the top level map from
1MB to 2MB.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I18f95a10b4913dca0256cbbd8d05f3baa5ab69fa
Reviewed-on: https://review.gerrithub.io/396318
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Modifies spdk_env_init() and spdk_mem_map_init() such that
they return on failure instead of terminating with exit()
or abort().
Change-Id: I054c1d9b2e46516ff53d845328ab9547f54bdbc4
Signed-off-by: Lance Hartmann <lance.hartmann@oracle.com>
Reviewed-on: https://review.gerrithub.io/393987
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
VFIO requires at least one IOMMU group to be added to the
VFIO container to be able to perform any IOMMU operations
on that container. [1] Without any groups added, VFIO_IOMMU_MAP_DMA
would always respond with errno 22 (Invalid argument).
Also, if the last IOMMU group is removed from the container
(device hotremove), all the IOMMU mappings are lost.
In both cases we need to remap vfio memory as soon as the
first IOMMU group is attached. The attach is done inside
DPDK during device attach and we can't hook into it directly.
Instead, this patch hooks into our PCI init/fini callbacks.
There's now a PCI device ref counter in our vfio manager and
a history of all registered memory pages. When the refcount
is increased from 0 to 1, the vtophys will remap all vfio
dma memory.
[1] https://www.kernel.org/doc/Documentation/vfio.txt
"On its own, the container provides little functionality,
with all but a couple version and extension query interfaces
locked away. The user needs to add a group into the container
for the next level of functionality. [...] With a group
(or groups) attached to a container, the remaining ioctls
become available, enabling access to the VFIO IOMMU
interfaces."
Change-Id: I744e07043dbe7ffd433fc95d604dad39647675f4
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/390655
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Require braces around all conditional statements, e.g.:
if (cond)
statement();
becomes:
if (cond) {
statement();
}
This is the style used through most of the SPDK code, but several
exceptions crept in over time. Add the astyle option to make sure we
are consistent.
Change-Id: I5a71980147fe8dfb471ff42e8bc06db2124a1a7f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/390914
Reviewed-by: <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
If the IOMMU is enabled, automatically register memory
added by the user through spdk_mem_register().
Change-Id: Ie02c7bf445314da23e2efee9de9c187ed0773a9f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375249
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This makes the separation between the vtophys map
and the generic memory map code clearer.
Change-Id: I3e8686e432a4594339008698de156d3978e9768a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375640
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Previously, if the user registered memory twice the
mem maps would get notified both times. Instead,
reference count the registrations.
Change-Id: I3f5cbf9e9320c2f4491ec8a5892c1f83cce41706
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375639
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Change-Id: Iffce947b4ccd13fd0747cdb9372fdb7587b1f5a2
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375828
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
This is in preparation for needing to know if the
address is in the DPDK memsegs on unregister.
Change-Id: Ie2febecae789808ae8ae63ce6889b9bbf3a72aab
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375248
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Both the register and unregister paths did the same splitting
on 2MB boundaries, so do that first and then switch on
the action.
Change-Id: I532c42a698c2d423d3ecb48bc0d964e766cf742b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375247
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
This code can't use our standard logging mechanism, but the
if DEBUG statements were making it hard to read. Add a
basic macro to clean it up.
Change-Id: I1d5c87df60d212ffe2b2455cc2169036dcb9e807
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375246
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
spdk_vtophys_register_one and spdk_vtophys_unregister_one
were both only called from one spot and are tiny functions.
Remove the extra layer of indirection for code clarity.
Change-Id: I4cf2698d6c7df7e09bfe05d786e39e1fb063a973
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375245
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This function was only called from one place, so
inline it there. It's easier to follow the code
without so much jumping around.
Change-Id: I3bb11dda321af5f266d23aa32f1a79c8b361595b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375224
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When running IOMMU enabled the user is actually
interacting with IO addresses - not physical
addresses. The value in the DPDK memseg array is
the correct one in this case, and should be used
at a higher priority than attempting to look up
the address in /proc/self/pagemap (i.e. what
rte_mem_virt2phy() does).
Further, rte_mem_virt2phy() will always fail when
running as an unprivileged user, but scanning
the memory segments will work.
Change-Id: I576e685111b7f9f848337134b7b89a3cf7c85402
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375208
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
There was already handling for a NULL map pointer pointer, but the inner
pointer wasn't checked. Treat pointer to NULL as a no-op as well to
simplify calling code.
Change-Id: I0a213233c021957ab2f4b6c1dd304034ca98b868
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/367433
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
SPDK_COUNTOF works like sizeof, except it returns the number of elements
in an array instead of the number of bytes.
Change-Id: I38ff4dd3485ed9b630cc5660ff84851d0031911f
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This will allow returning a different default value per mem map.
Change-Id: I94d3de197acfb2e6ad40092ab0588ba4e951af80
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Add a top-level structure that can be reused for other kinds of memory
address translations.
Change-Id: I046f98b76b4e98087d90095d6e9dea5cd6ab7898
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make sure the compiler arranges the fast path as the fallthrough case by
annotating the checks in spdk_vtophys().
Change-Id: If0fc3149297131894b5c7a94bff31bf8ee40326e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Now that all DPDK memory is registered at startup, spdk_vtophys() never
needs to add new translations to the vtophys map. This means that any
lookup that fails to find an allocated map_1gb will always return
SPDK_VTOPHYS_ERROR rather than trying to allocate it and then failing
the lookup anyway.
Change-Id: I7e6f7af183199651f5808a17810a17970b0e3331
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
vtophys_get_paddr() and vtophys_get_dpdk_paddr() are doing similar
things; combine them into one function that works for all DPDK
memory addresses.
Part of the vtophys test is temporarily disabled until the next commit,
which will register all DPDK memory at startup and stop lookiing up
addresses at runtime.
Change-Id: I91312837aa1e6170bacaf3b0d2adbdc4391d3afa
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
This just moves the lookup of the physical address up one level - now
_spdk_vtophys_register_one() is only responsible for filling out the
mapping table, not looking up the translation.
Change-Id: I9fd5b85da623e403fda0563b6bdebd4aaaf42864
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Rather than storing the page frame number, just store the full physical
address of each 2 MB page. This simplifies the lookup code and makes
the map generic (values are inserted and retrieved without any
modification) for future uses.
Change-Id: Ib1081513a0682f6b8b908f3401c00d87b00f484c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Split the ref_count field of the bottom level of the vtophys map tree to
a separate array so that the pfn_2mb field can be expanded to a full 64
bits again. This doesn't change behavior for the current use as a page
frame number; it is setup work for storing an arbitrary 64-bit pointer
value in the bottom level.
Change-Id: I0bc44df3edc9df4a479229d69c2f3884d43a340d
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
These APIs can be used to register/unregister regions
of pinned, huge page memory that are separate from
huge page memory allocated by the default DPDK
allocations. These APIs will be used by an upcoming
SPDK vhost-scsi target to enable SPDK to target
NVMe DMA operations directly to VM memory that has
been allocated by QEMU using pinned huge pages.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I649a4adeeb758b29bd29cd42c8872eed3d5d6ce9