This should have always been the case with spdk_mem_map_translate. For
some memory maps (like RDMA) this doesn't matter, but for others like
our virtual to physical map, this is critical for retrieving valid
translations.
This behavior change will only affect maps that have a registered
contiguous memory callback.
Change-Id: I67517667f01d974702d7daa7c81238281aae0cf6
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/436562 (master)
Reviewed-on: https://review.gerrithub.io/437202
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
This keeps us from having to deal with ALLOC and FREE events
for mismatching regions - which necessitated splitting new
regions into individual pages. This caused all kinds of
problems with NVMe-oF - for example, buffers that spanned
memory regions, or bumping up against MR limits on RDMA
NICs.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I18dcdae148436b55d4481bb9fb8799f4832c7de1
Reviewed-on: https://review.gerrithub.io/434895 (master)
Reviewed-on: https://review.gerrithub.io/435692
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We assumed spdk_mem_map_translate() translates only 2MB-aligned
addresses, but that's not true. Both vtophys and NVMf can use it
with any user-provided address and that breaks our contiguous memory
length calculations. Right now each buffer appeared to have the
first n * 2MB of memory always contiguous.
This is a bugfix for NVMf which does check the mapping length
internally. It will also become handy when adding the similar
functionality to spdk_vtophys().
Change-Id: I3bc8e0b2b8d203cb90320a79264effb7ea7037a7
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/433076 (master)
Reviewed-on: https://review.gerrithub.io/435691
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Minor cleanup.
Change-Id: I9554163ae22836b50b954ec27ed27bcb848cb193
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428883
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
The problem with registering the entire hotplugged memory
region is that it won't necessarily be unregistered in one
go. Registering each hugepage separately solves that
problem.
This puts a limitation on the number of pages that can
be allocated when using RDMA. We'll hopefully lift this
limitation sometime in future - probably levereging
ibv_rereg_mr, but for now we'll have to resort to either:
a) using 1GB hugepages
b) preallocating memory (with [-s|--mem-size <size>] app
param) as it will be registered as just one region no
matter what size it is. This memory won't be returned
to the system until the SPDK app exits.
Change-Id: I6de997fb4901b772730ba6fe995dcc0640b85749
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428716
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
All mem_maps will still receive separate unregister
notification for each registered region, but the
public memory unregister API is more flexible now.
This follows the VFIO_TYPE1v2_IOMMU interface, which
allows the same.
Change-Id: Ifc008afdc6bff39d9b3b4c892c379ade10c3098e
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428715
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Added sanity checks to prevent unregistering a memory
range that wasn't registered as a one, complete region.
Change-Id: I819b57560b2e48b0802113ffff9f72949d7a148a
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/425556
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Removed the reference count from the registrations map.
Although technically supported, registering a single memory
region more than once had a lot of unhandled cases and could
easily lead to a segfault.
RDMA maps require all memory to be unregistered in the same
chunks the memory was registered, which is often impossible
to achieve if a region was registered more than once:
1. register region 0x0 - 0x3 -> it gets mapped to
a single ibv_mr
2. register region 0x1 - 0x2 -> nothing happens, this region
is already registered
3. unregister region 0x0 - 0x3 -> 0x0-0x1 gets unregistered as
one region. 0x2-0x3 gets
unregistered as another
(leading to segfault in the
the current RDMA implementation)
The problem is that the last two regions share the same ibv_mr,
which SPDK tries to free twice. The second free causes a segfault.
vtophys map handles this case by registering each 2MB chunk
separately, but this solution cannot be applied for RDMA, as
NICs put a limitation (~2048) on the number of regions registered.
Another option is to keep a refcount of each ibv_mr allocated,
and free it only when the entire region was unregistered from the
SPDK mem map. This is however very tricky and RDMAmojo mentions
that freeing a memory buffer before unregistering its ibv_mr
may lead to a segfault.
Change-Id: I545c56e24ffa55bda211dea22aeb8a55d9631fe5
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/426085
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Prevents us from unregistering two or more separately
registered regions with a single notification.
This fixes an ibv_mr leak in RDMA. When multiple registrations
were unregistered with a single notification, only the first
ibv_mr one would be freed and the remaining memory would
possibly still remain DMA-able.
As of now, unregistering multiple complete regions with
a single unregister call is not possible. It will be
implemented later, after the rest of the code is cleaned up.
Change-Id: I7d61867fa61fd7a4a8a644ff45cab17125d63e1b
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/425555
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Newly registered mem maps weren't notified at vaddr 0
as we used value "0" to denote an unitialized, unset
variable. Since we *do* register vaddr 0 in our memory
unit tests let's switch unitialized vaddr value to
something different - like UINT64_MAX.
Change-Id: I9c902165e76155e068642abb9a656f3ae8ca1105
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428713
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In spdk_mem_map_alloc() we only do the memory walk when
notify_cb is provided, but spdk_mem_map_free() does the
memory walk undonditionally. Not anymore.
Change-Id: Ic8dfdc5cb2c99dc58e62ab0523cf5a18ba8691cc
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/428722
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
If during mem_map creation we managed to register only a part
of the total registered memory and then failed in the middle,
we will now undo our registrations and gracefully return
an error code leading to a spdk_mem_map_alloc() failure().
Change-Id: Id549f35109bd381318adc676392d9ee26d035359
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/423538
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This function will now check for whether or not a memory region is
contiguous accross 2MB map entries and return the total length of that
contiguous buffer up to the size specified by the user.
Also includes unittests
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: I2ce582427d451be5a317808d0825c770e12e9a69
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/425329
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: I90da6d4d31c669a3bf046f7721923dd743c5ef21
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/425328
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We recently increased the top level map from 128TB
(47 bits) to 256TB (48 bits) but missed fixing one
comment that still referred to the old number of bits.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ic993bd136cd09789ea2a65b480151805d231c3d5
Reviewed-on: https://review.gerrithub.io/426002
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This struct will hold the unique operations for the mem_map.
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: Ifdd82497f238d99345033f2615c718802a591438
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/425327
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The function now takes a pointer as it's last argument, and copies the
size of the memory region for which the translation is validinto that
pointer.
For now, that will always be 2MB. However that behavior can change in
the future.
This series of changes is aimed at enabling spdk_mem_map_translate to
report back to the user the length of the valid mem_map up to the
function that requested the translation.
This will be useful when retrieving memory regions associated with I/O
buffers in NVMe-oF. For large I/O it will be possible that the buffer is
split over multiple MRs and the I/O will have to be split into multiple
SGLs.
Change-Id: I8686c166ec956507f5ae55cf602341281482cb89
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/424888
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add a check to prevent spdk_mem_map_set_translation() or
spdk_mem_map_clear_translation() calls that start within the valid
address range but specify a size that would access parts of the mem map
outside of the valid region.
spdk_mem_map_translate() is safe without any extra checks since it only
accesses the first entry regardless of size, and the MASK_256TB check
catches out-of-range accesses to that entry.
Change-Id: Ie1437e57b5158363bb98a6b42a26fb41a089bbad
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/418106
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
The arrays for both the map_256tb and map_1gb structures were twice
as large as necessary; fix the sizes and add unit tests for the boundary
conditions to verify that the fix works.
Change-Id: I66bce463f234f54e69cf2a697db9f806d398ca1e
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/418105
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ia413a913857dad00ae091b8cea02a647b8996a5c
Reviewed-on: https://review.gerrithub.io/418108
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This brings DPDK 18.05 support and introduces
dynamic hugepage memory allocation.
The following is now possible:
./spdk_tgt -s 32
rpc.py construct_malloc_bdev 128 512
or even:
./spdk_tgt -s 0
Note that if no -s param is given, DPDK will still
allocate all available hugepage memory.
This has been tested with DPDK 18.05-rc6.
Fixes#281
Change-Id: Ic9521484c2871eb5b2a56445f1177f305b147707
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/410540
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
At least on some ARM platforms, virtual address map
uses lower 48 bits of address space, compared to just
47 on x86-64.
This does double the size of the top level map from
1MB to 2MB.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I18f95a10b4913dca0256cbbd8d05f3baa5ab69fa
Reviewed-on: https://review.gerrithub.io/396318
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Modifies spdk_env_init() and spdk_mem_map_init() such that
they return on failure instead of terminating with exit()
or abort().
Change-Id: I054c1d9b2e46516ff53d845328ab9547f54bdbc4
Signed-off-by: Lance Hartmann <lance.hartmann@oracle.com>
Reviewed-on: https://review.gerrithub.io/393987
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Memory is now reference counted at a higher level.
Change-Id: I61b24db7b92a129686775eddbff3a48814c842fe
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375644
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
This makes the separation between the vtophys map
and the generic memory map code clearer.
Change-Id: I3e8686e432a4594339008698de156d3978e9768a
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/375640
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>