When bs_create_blob() is creating the internal xattr for the esnap ID,
it errors out if the ID is too long. This error path neglected to set
the return value. It now returns -EINVAL in this case.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I6d756da47f41fb554cd6782add63378e81735118
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/17292
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
From the issue report in #2507
that comparing blob maybe be NULL.
So add assert, that in CI may catch this issue.
And other funtions add this also.
Signed-off-by: yidong0635 <dongx.yi@intel.com>
Change-Id: I98179ec76f2b6785b6921c37373204021c0669b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12737
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
As an lvstore is opening it calls spdk_bs_load(), which briefly opens
each blob and has no use for external snapshots. Since there is no point
in opening them at this time, don't open them. Once the blobstore has
been loaded, update lvs->load_esnaps so that external snapshots are
opened as the lvols open their blobs.
Change-Id: Ib16c8474300ff4b106aad0baa5b8b38332c23b01
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16424
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The blob's parent_id and allocate_all examined and/or modified in a
two places bs_inflate_blob_open_cpl(). This transforms the two if
statements scattered around the function into a switch statement to make
it easier to understand how these two values are related.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I2cff2d07a0089b52678035b2ece60db6a5f67a8e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/17178
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When an esnap clone blob's external snapshot arrives after the blob is
opened, it can now be hot-added to the blob. Presumably the new device
replaces a place-holder device that did not really atteempt IO.
Change-Id: I622feb84efa66628debf44f7e7cb88b6a012db6d
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16232
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This adds the ability to abort IOs as esnap bs_dev channels are being
destroyed.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: Ia63d4cbef5cd4c84dc8d5e2e9e407bacd961385f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16423
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
As the blobstore is being unlaoded, async esnap channel destructions may
be in flight. In such a case, spdk_bs_unload() needs to defer the unload
of the blobstore until channel destructions are complete.
The following commands lead to the illustrated states.
bdev_malloc_create -b malloc0
bdev_lvol_clone_bdev lvs1 malloc0 eclone
.---------. .--------.
| malloc0 |<--| eclone |
`---------' `--------'
bdev_lvol_snapshot lvs1/eclone snap
.---------. .------. .--------.
| malloc0 |<--| snap |<--| eclone |
`---------' `------' `--------'
bdev_lvol_clone lvs1/snap eclone
.--------.
,-| eclone |
.---------. .------.<-' `--------'
| malloc0 |<--| snap |
`---------' `------'<-. .-------.
`-| clone |
`-------'
As the blobstore is preparing to be unloaded spdk_blob_unload(snap) is
called once for eclone, once for clone, and once for snap. The last of
these calls happens just before spdk_bs_unload() is called.
spdk_blob_unload() needs to destroy channels on each thread. During this
thread iteration, spdk_bs_unload() starts. The work performed in the
iteration maintains a reference to the blob, and as such it
spdk_bs_unload() cannot do its work until the iteration is complete.
Change-Id: Id9b92ad73341fb3437441146110055c84ee6dc52
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14975
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This adds support for inflate and decouple for esnap clones. Since there
are no immediate consumers that will provide back_bs_dev->is_zeroes()
that can return true, a shortcut is taken in that inflate and decouple
of esnap clones are the same.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I4d2e6565126991acd650f073ce876466334e986d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11574
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
An esnap clone needs special handling as snapshots are created and
removed. In particular: the following must exist on the blob that
directly references the external snapshot and must be removed from
others:
- Ensure SPDK_BLOB_EXTERNAL_SNAPSHOT invalid flag exists only on the
esnap clone.
- Ensure BLOB_EXTERNAL_SNAPSHOT_ID internal xattr exists only on the
esnap clone.
- Clean up any esnap IO channels on a blob that is no longer an esnap
clone due to snapshot creation or removal.
See the diagrams and description in blob_esnap_clone_snapshot() in
blob_ut.c for details.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: Ie4125d64d5bac9cfa7d6c7cc9a543d72a169f6ee
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11573
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
The channel passed to blob IO operations is useful for tracking
operations within the blobstore and the bs_dev that the blobstore
resides on. Esnap clone blobs perform reads from other bs_devs and
require per-thread, per-bs_dev channels.
This commit augments struct spdk_bs_channel with a tree containing
channels for the external snapshot bs_devs. The tree is indexed by blob
ID. These "esnap channels" are lazily created on the first read from an
external snapshot via each bs_channel. They are removed as bs_channels
are destroyed and blobs are closed.
Change-Id: I97aebe5a2f3584bfbf3a10ede8f3128448d30d6e
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14974
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add an API to easily determine if a blob is an esnap clone, similar to
what already exists for snapshot, clone, and thin_provisioned.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: Ie07cd09b30513893e82f1c85e94a24a93c79d71e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16862
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
When a sequence is used to perform IO on an esnap clone, differenent
channels will be needed for the blobstore device and the esnap device.
No special esnap handling is required when a sequence is used to perform
IO directly on the blobstore device.
This commit splits bs_sequence_start() into bs_sequence_start_bs() and
bs_sequence_start_blob() to handle these two scenarios. A later commit
introduces special handling of ensap clone blobs.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I3a6f46640cdb7fdc380bf557736638f1b39f05e3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/17172
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
External snapshots have a slightly more complicated cleanup of
back_bs_dev. This moves all calls to back_bs_dev->destroy() into a
function so that this more complicated cleanup can have a single
implementation.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I78460aa3877481788118e2b0b76931dcf5c56338
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14972
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When consumers open a blob with spdk_bs_open_blob_ext(), they can set
esnap_ctx in struct spdk_blob_open_opts to have that context passed
to bs->external_bs_dev_create().
Change-Id: I0c1a9cec0e5aed5ef2a7143103e822cbe400aabb
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14971
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
During resize, we correctly determine if we have enough
md_pages for new extent pages, before proceeding with
actually allocating clusters and associated extent
pages.
But during actual allocation, we were incrementing
the lfmd output parameter, which was incorrect.
Technically we should increment it any time
bs_allocate_cluster() allocated an md_page. But
it's also fine to just not increment it at the
call site at all - worst case, we just check that
bit index again which isn't going to cause a
performance problem.
Also add a unit test that demonstrated the original
problem, and works fine with this patch.
Fixes issue #2932.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iba177a66e880fb99363944ee44d3d060a44a03a4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/17150
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: 阿克曼 <lilei.777@bytedance.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Community-CI: Mellanox Build Bot
If blob_freeze_io() is called twice in a row,
and the second time occurs before the for_each_channel
for the first completes, the second caller will
receive its callback too soon.
Instead just simplify the whole process, always do
the for_each_channel and don't try to optimize it
at all. These are infrequent operations - correctness
and simplicity are in order.
A few additional changes:
1) Make same changes for unfreeze path.
2) Add blob_verify_md_op() calls, just to be sure
these are only called from md_thread. This was
already checked in calling functions, but as these
functions get called from new code paths (i.e.
esnap clones) it can't hurt to add additional
checks.
3) Add unit test that failed with original code, but
passes with this patch.
Fixes issue #2935.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ibefba554547ddf3e26aaabfa4288c8073d3c04ff
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/17148
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Mike Gerdts <mgerdts@nvidia.com>
Community-CI: Mellanox Build Bot
When a blobstore consumer creates or loads a blobstore, it should be
able to set a per-blobstore context pointer that will be passed back to
the consumer via bs->esnap_bs_dev_create().
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I59c0ebe21eaf65c3d79a4ac3469715283f56313a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14970
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is the beginning of support for external snapshots. An external
snapshot is a read-only blobstore device (struct spdk_bs_dev) that can
be used as a blob's back device. Normally a blob will have no back
device (a normal blob), a zeroes back device (a thin provisioned blob),
or a blob back device (a clone blob). When a blob has an external
snapshot ("esnap") as its back device, it is called an esnap clone.
With this patch, esnap clones can be created but they are not yet
useful. Subsequent patches in the series will plumb the IO path, enable
various features, and allow lvol bdevs to be esnap clones.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I29206b628a2b03b6386a88532565e228df988e0e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14969
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When blobstore is shutdown unexpectedly the super block should
already be marked as dirty. Only when proper blobstore unload
happens, the super block is marked as clean.
Super block is not marked as dirty on blobstore load,
but on first action that starts to modify the metadata.
At this time it only happens through blob persist,
which is fine for creation/deletion of blobs or
their modification (resize/xattr).
It works for cluster allocation with extent_table disabled,
and when extent page needs to be allocated.
Yet it fails for cases when no new extent page is required.
It will result in not marking blobstore as dirty and
then fail when loading a particular blob due to mismatch
between used_clusters and contents of extent page.
To fix that, the blobstore is now marked dirty on a very first
extent page update since blobstore load.
Fixes#2830
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ied37ecf90d46e1bc51b22c323dce278a0fa88f72
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16179
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This patch makes marking blobstore as dirty, separate from
persist process. Adding a context strucutre and new single
callback once completed.
It is being as part of refactor to allow marking blobstore
dirty when writing out extent page - see #2830.
Change-Id: Ie2e9cc32860697e0e747939842ab04f48fbff49b
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16328
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Patch https://review.spdk.io/gerrit/c/spdk/spdk/+/16328,
introduced a refactor for persist path. No functional change
should occur with it, but the code layout after compilation
(path length) might have changed.
It resulted in unrelated scan-build failure:
https://ci.spdk.io/results/autotest-per-patch/builds/95746/archive/scanbuild-vg-autotest/scan-build/report-d08e76.html#EndPath
Tried to replicate the issue without the above patch,
by increasing maxloop or -analyze-headers in scan-build.
Didn't result in any new failures in blobstore.
This seemed like a false positive, so it was verified:
https://review.spdk.io/gerrit/c/spdk/spdk/+/16333/
With no other options, an assert is added only to the
function where the false positive occured.
scan-build log for posterity:
blobstore.c:1062:58: warning: Division by zero [core.DivideZero]
desc_extent_rle->extents[extent_idx].cluster_idx = lba /
lba_per_cluster;
~~~~^~~~~~~~~~~~~~~~~
blobstore.c:1079:58: warning: Division by zero [core.DivideZero]
desc_extent_rle->extents[extent_idx].cluster_idx = lba /
lba_per_cluster;
~~~~^~~~~~~~~~~~~~~~~
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I9dd729fa13ce1c9bbcb91e4326658e2b4e326e6d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16335
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The FIELD_OK macro in blob_open_opts_copy() should consider offsets in
struct spdk_blob_open_opts, not struct spdk_blob_opts.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I62e22acbe7dfb994453a379c92f78b7e9bc7fc13
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14962
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Blob IDs are sequentially assigned starting at 0x100000000. When
debugging with a small number of blob IDs, it is much more intuitive to
see blob ID 0x100000000 rather than blob ID 4294967296.
In commit 76a577b082 a similar change was
made to blobcli.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: Ic5321a83b57cf8c9f8df48cd424a926b6fec4ba8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14963
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Many parts of the blobstore.c seem to have gone with the assumption that
blob creation, deletion, etc. all happen on the md thread. This
assumption would allow modification of the bs->used_md_pages and
bs->used_clusters bit arrays without holding a lock. Placing
"assert(spdk_get_thread() == bs->md_thread)" in bs_claim_md_page() and
bs_claim_cluster() show that each of these functions are called on other
threads due writes to thin provisioned volumes.
This problem was first seen in the wild with this failed assertion:
bs_claim_md_page: Assertion
`spdk_bit_array_get(bs->used_md_pages, page) == false' failed.
This commit adds "assert(spdk_spin_held(&bs->used_lock))" in those
places where bs->used_md_pages and bs->used_lock are modified, then
holds bs->used_lock in the places needed to satisfy these assertions.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I0523dd343ec490d994352932b2a73379a80e36f4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15953
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
A future commit will add to the complexity when returning with a
non-zero value. Rather than further complicating the several error
return locations, all affected error returns are handled after the error
label.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I56e8e338b0560f849399c085d0bb07efb7df26fb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15983
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
A future commit may need to release a lock before returning. This
refactors blob_resize() to always return at end of the function using an
out label and goto.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I671fbdbe0e3b766c264c45589dad3a864ba1f192
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15982
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
Convert bs->used_lock to a spinlock. This is being done to help with
the debugging and fixing of a race that has led to a failed assertion in
bs_claim_md_page.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I11b80096de022f79a217c65d787ee57ca54240f9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15952
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
The bs->used_clusters_mutex protects used_md_pages, used_clusters, and
num_free_clusters. A more generic name is appropraite. The next patch in
this series will convert it from a mutex to a spinlock and having
"mutex" or "spin" in the name is of little help to maintainers, so a
more generic name is used.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I5ce7b85b84fdec2a0c5d2ac959e0109e1d80c7f5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15981
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
Found with misspell-fixer.
Signed-off-by: Michal Berger <michal.berger@intel.com>
Change-Id: If062df0189d92e4fb2da3f055fb981909780dc04
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15207
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Copy-on-write happens when cluster is written for the first time for
thin provisioned volume. Currently it is implemented as two separate
requests to underlying bdev: read of the whole cluster to bounce
buffer and then write of this buffer to the new location on the same
underlying bdev.
This patch improves copy-on-write flow by utilizing copy command of
underlying bdev if it is supported. In this case we have just one
request to bdev and don't need the bounce buffer.
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I92552e0f18f7a41820d589e7bb1e86160c69183f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14351
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
per Intel policy to include file commit date using git cmd
below. The policy does not apply to non-Intel (C) notices.
git log --follow -C90% --format=%ad --date default <file> | tail -1
and then pull just the 4 digit year from the result.
Intel copyrights were not added to files where Intel either had
no contribution ot the contribution lacked substance (ie license
header updates, formatting changes, etc). Contribution date used
"--follow -C95%" to get the most accurate date.
Note that several files in this patch didn't end the license/(c)
block with a blank comment line so these were added as the vast
majority of files do have this last blank line. Simply there for
consistency.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Id5b7ce4f658fe87132f14139ead58d6e285c04d4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15192
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
These functions start from a given offset and seek for first
io_unit belonging to an allocated cluster or first io_unit
belonging to an unallocated cluster
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
Change-Id: I0c632e2b3dfd2e96aa22e21796e25a36f2f55f9f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14360
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
Writing to unallocated cluster triggers copy-on-write sequence. If
this cluster is backed by zeroes device we can skip the copy part. For
a simple thin provisioned volume copy this shortcut is already
implemented because `blob->parent_id == SPDK_BLOBID_INVALID`. But this
will not work for thin provisioned volumes created from snapshot. In
this case we need to traverse the whole stack of underlying
`spdk_bs_dev` devices for specific cluster to check if it is zeroes
backed.
This patch adds `is_zeroes` operation to `spdk_bs_dev`. For zeroes
device it always returns 'true', for real bdev (`blob_bs_dev`) always
returns false, for another layer of `blob_bs_dev` does lba conversion
and forwards to backing device.
In blobstore's cluster copy flow we check if cluster is backed by
zeroes device and skip copy part if it is.
Signed-off-by: Evgeniy Kochetov <evgeniik@nvidia.com>
Change-Id: I640773ac78f8f466b96e96a34c3a6c3c91f87dab
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13446
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
When decoupling a snapshot from its parent, we need to clear its parent.
So we should remove the xattr BLOB_SNAPSHOT. Modifying the xattrs of a blob
only works if its metadata are not in read-only mode.
By default, a snapshot is in read-only mode so this operation fails. When we
later want to delete the snapshot, we will see that it has a parent, so we will
try to remove the snapshot from its parent's clones list. This will cause a
crash.
The fix is to remove the BLOB_SNAPSHOT xattr only after setting the snapshot's
metadata in rw mode.
Signed-off-by: Alex Michon <amichon@kalrayinc.com>
Change-Id: I80efa6dd3dcb38b4c738ce2e97aa2ffc281cefa5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13723
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
The bdev_lvol_grow_lvstore will grow the lvstore size if the undering
bdev size is increased. It invokes spdk_bs_grow internally. The
spdk_bs_grow will extend the used_clusters bitmap. If there is no
enough space resereved for the used_clusters bitmap, the api will
fail. The reserved space was calculated according to the num_md_pages
at blobstore creating time.
Signed-off-by: Peng Yu <yupeng0921@gmail.com>
Change-Id: If6e8c0794dbe4eaa7042acf5031de58138ce7bca
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9730
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reserve space for used_cluster bitmap. The reserved space is calculated
according to the num_md_pages. The reserved space would be used when
the blobstore is extended in the future.
Add the num_md_pages_per_cluster_ratio parameter to the
bdev_lvol_create_lvstore API. Then calculate the num_md_pages
according to the num_md_pages_per_cluster_ratio and bdev total size, then
pass the num_md_pages to the blobstore.
Signed-off-by: Peng Yu <yupeng0921@gmail.com>
Change-Id: I61a28a3c931227e0fd3e1ef6b145fc18a3657751
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/9517
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
In SPDK, declarations have the return type on the same line. Definitions
have the return type on a separate line. Astyle has an option for
enforcing this. Unfortunately, it seems to have two bugs:
1) It doesn't work correctly at all on C++ files.
2) It often fails on functions that return enums, or long type names
Deal with 1) by adjusting the check_format.sh script to only tell astyle
to fix return type line breaks for C files and not C++. Deal with 2) by
adding a few typedefs to work around the problem.
Change-Id: Idf28281466cab8411ce252d5f02ab384166790c6
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/13437
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Many open source projects have moved to using SPDX identifiers
to specify license information, reducing the amount of
boilerplate code in every source file. This patch replaces
the bulk of SPDK .c, .cpp and Makefiles with the BSD-3-Clause
identifier.
Almost all of these files share the exact same license text,
and this patch only modifies the files that contain the
most common license text. There can be slight variations
because the third clause contains company names - most say
"Intel Corporation", but there are instances for Nvidia,
Samsung, Eideticom and even "the copyright holder".
Used a bash script to automate replacement of the license text
with SPDX identifier which is checked into scripts/spdx.sh.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iaa88ab5e92ea471691dc298cfe41ebfb5d169780
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12904
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: <qun.wan@intel.com>
When a new cluster is added to a thin provisioned blob,
md_page is allocated to update extents in base dev
This memory allocation reduces perfromance, it can
take 250usec - 1 msec on ARM platform.
Since we may have only 1 outstainding cluster
allocation per io_channel, we can preallcoate md_page
on each channel and remove dynamic memory allocation.
With this change blob_write_extent_page() expects
that md_page is given by the caller. Sicne this function
is also used during snapshot deletion, this patch also
updates this process. Now we allocate a single page
and reuse it for each extent in the snapshot.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815a4c8c69bd38d8eff4f45c088e5d05215b9e57
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12129
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
These function accept optional spdk_blob_ext_io_opts
structure. If this structure is provided by the user
then readv/writev_ext ops of base dev will be used
in data path
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I370dd43f8c56f5752f7a52d0780bcfe3e3ae2d9e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11371
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
When snapshot is created, the new blob is loaded and
examined for BLOB_SNAPSHOT xattr in blob_load_backing_dev
function. At this step there is no such xattr, so zeroes
back_bs_dev is created. Later snapshot inherits back_bs_dev
from original blob, so previously created back_bs_dev can
be lost.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I90cc9b02f56598d8c5c7fe00409f571fba0aa91a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11384
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Prior to this patch bs_user_op_abort() always
returned EIO back to the bdev layer.
This is not sufficient for ENOMEM cases where
the I/O should be resubmitted by the bdev layer.
ENOMEM for bs_sequence_start() in bs_allocate_and_copy_cluster()
specifically addresses issue #2306.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Icfb0ce9ca20e1c4dd1668ba77d121f7091acb044
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11764
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This confirms that the error path can return more efficient
without memcpy such as xattr->name.
Signed-off-by: Dong Yi <dongx.yi@intel.com>
Change-Id: Ic2ed28121ed76eda9d7b24ed6c4c95b0588817de
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11654
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
When printing metadata pages, blobcli could print the start LBA to aid
someone that needs to debug with dd and od.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I380bd923dfcd1149e3f705dd0ec0ab46b1000019
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11260
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When blobcli is printing blob metadata, extent tables are now printed.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: Ie748a2f2b3fbc3e6e5ee06a0f2eb9bd491bfed46
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11259
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When printing blob metadata via blobcli, descriptor types that do not
have full dump support should not be silently ignored. This prints a
message that indicates an unsupported descriptor type was encountered
so that the person debugging with blobcli knows that there is more
metadata present.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: Id30b671fd9dee1ec12e10625eb2af4c1e43eda27
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11258
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When blobcli prints blob metadata, it will now Print invalid_flags,
data_ro_flags, and md_ro_flags when printing blob metadata. The
complete mask is printed as well as the meaning of each bit or set of
bits. If unknown bits are set, that will be indicated in the output
as well.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I743a843a5d23b0e81c04482304515ab3c3b4c7bc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11257
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
In some scenarios, a split IO can immediately complete. For
example, a very large unmap operation to a newly thin-provisioned
blob has no operations to perform, so the batch for its operation
immediately completes.
But if it immediately completes, we can't recursively submit
the next split IO. So use variables in the context structure
to detect when an operation immediately completes, to allow
it to unwind and submit the next operation without recursing.
Fixes issue #2347.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I8e4c121190c7d08152aa8de20cf6abc55b5edc46
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11388
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
No functional change here, this only prepares this function for
some functional changes in the next patch. By adding the
do/while loop here we reduce the amount of whitespace changes
in the next patch.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I09d64fd1fb69ee232af1d298619c762e562fdc79
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11387
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>