As the blobstore is being unlaoded, async esnap channel destructions may
be in flight. In such a case, spdk_bs_unload() needs to defer the unload
of the blobstore until channel destructions are complete.
The following commands lead to the illustrated states.
bdev_malloc_create -b malloc0
bdev_lvol_clone_bdev lvs1 malloc0 eclone
.---------. .--------.
| malloc0 |<--| eclone |
`---------' `--------'
bdev_lvol_snapshot lvs1/eclone snap
.---------. .------. .--------.
| malloc0 |<--| snap |<--| eclone |
`---------' `------' `--------'
bdev_lvol_clone lvs1/snap eclone
.--------.
,-| eclone |
.---------. .------.<-' `--------'
| malloc0 |<--| snap |
`---------' `------'<-. .-------.
`-| clone |
`-------'
As the blobstore is preparing to be unloaded spdk_blob_unload(snap) is
called once for eclone, once for clone, and once for snap. The last of
these calls happens just before spdk_bs_unload() is called.
spdk_blob_unload() needs to destroy channels on each thread. During this
thread iteration, spdk_bs_unload() starts. The work performed in the
iteration maintains a reference to the blob, and as such it
spdk_bs_unload() cannot do its work until the iteration is complete.
Change-Id: Id9b92ad73341fb3437441146110055c84ee6dc52
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14975
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
The channel passed to blob IO operations is useful for tracking
operations within the blobstore and the bs_dev that the blobstore
resides on. Esnap clone blobs perform reads from other bs_devs and
require per-thread, per-bs_dev channels.
This commit augments struct spdk_bs_channel with a tree containing
channels for the external snapshot bs_devs. The tree is indexed by blob
ID. These "esnap channels" are lazily created on the first read from an
external snapshot via each bs_channel. They are removed as bs_channels
are destroyed and blobs are closed.
Change-Id: I97aebe5a2f3584bfbf3a10ede8f3128448d30d6e
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14974
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When a blobstore consumer creates or loads a blobstore, it should be
able to set a per-blobstore context pointer that will be passed back to
the consumer via bs->esnap_bs_dev_create().
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I59c0ebe21eaf65c3d79a4ac3469715283f56313a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14970
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is the beginning of support for external snapshots. An external
snapshot is a read-only blobstore device (struct spdk_bs_dev) that can
be used as a blob's back device. Normally a blob will have no back
device (a normal blob), a zeroes back device (a thin provisioned blob),
or a blob back device (a clone blob). When a blob has an external
snapshot ("esnap") as its back device, it is called an esnap clone.
With this patch, esnap clones can be created but they are not yet
useful. Subsequent patches in the series will plumb the IO path, enable
various features, and allow lvol bdevs to be esnap clones.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I29206b628a2b03b6386a88532565e228df988e0e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14969
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Convert bs->used_lock to a spinlock. This is being done to help with
the debugging and fixing of a race that has led to a failed assertion in
bs_claim_md_page.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I11b80096de022f79a217c65d787ee57ca54240f9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15952
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
The bs->used_clusters_mutex protects used_md_pages, used_clusters, and
num_free_clusters. A more generic name is appropraite. The next patch in
this series will convert it from a mutex to a spinlock and having
"mutex" or "spin" in the name is of little help to maintainers, so a
more generic name is used.
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Change-Id: I5ce7b85b84fdec2a0c5d2ac959e0109e1d80c7f5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15981
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
per Intel policy to include file commit date using git cmd
below. The policy does not apply to non-Intel (C) notices.
git log --follow -C90% --format=%ad --date default <file> | tail -1
and then pull just the 4 digit year from the result.
Intel copyrights were not added to files where Intel either had
no contribution ot the contribution lacked substance (ie license
header updates, formatting changes, etc). Contribution date used
"--follow -C95%" to get the most accurate date.
Note that several files in this patch didn't end the license/(c)
block with a blank comment line so these were added as the vast
majority of files do have this last blank line. Simply there for
consistency.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: Id5b7ce4f658fe87132f14139ead58d6e285c04d4
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/15192
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Mellanox Build Bot
This function returns the number of io_units per cluster
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
Change-Id: I8f33d24a63876a0a918830b9eeaa69a91ff21193
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/14431
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Community-CI: Mellanox Build Bot
Many open source projects have moved to using SPDX identifiers
to specify license information, reducing the amount of
boilerplate code in every source file. This patch replaces
the bulk of SPDK .c, .cpp and Makefiles with the BSD-3-Clause
identifier.
Almost all of these files share the exact same license text,
and this patch only modifies the files that contain the
most common license text. There can be slight variations
because the third clause contains company names - most say
"Intel Corporation", but there are instances for Nvidia,
Samsung, Eideticom and even "the copyright holder".
Used a bash script to automate replacement of the license text
with SPDX identifier which is checked into scripts/spdx.sh.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Iaa88ab5e92ea471691dc298cfe41ebfb5d169780
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12904
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: <qun.wan@intel.com>
When a new cluster is added to a thin provisioned blob,
md_page is allocated to update extents in base dev
This memory allocation reduces perfromance, it can
take 250usec - 1 msec on ARM platform.
Since we may have only 1 outstainding cluster
allocation per io_channel, we can preallcoate md_page
on each channel and remove dynamic memory allocation.
With this change blob_write_extent_page() expects
that md_page is given by the caller. Sicne this function
is also used during snapshot deletion, this patch also
updates this process. Now we allocate a single page
and reuse it for each extent in the snapshot.
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Change-Id: I815a4c8c69bd38d8eff4f45c088e5d05215b9e57
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12129
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
bs_back_dev_lba_to_io_unit() and bs_num_pages_to_cluster_boundary() are
unused inline functions. The last consumer (by the earlier _spdk_* name)
was removed in commit 6609b776.
Change-Id: Ib1babfed8002fb44451b337aa0db66c15a6805d2
Signed-off-by: Mike Gerdts <mgerdts@nvidia.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11561
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If blobs held in a blobstore are opened a lot, lookup
by RB_TREE will be much more efficient.
Change-Id: I7075b95c597a958e7bb10890f803191309532021
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10917
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
We find a few files to get the size of a member of a struct. How to
do it is a little complex. So add a macro to do it will be helpful
to read the current code and develop new features.
lib/dif had used member_size() internally but Linux use sizeof_member()
as the macro. Besides, SPDK have used upper case letters for similar
macros, SPDK_CONTAINEROF() and SPDK_COUNTOF(). Hence spdk_member_size()
may be good but propose SPDK_SIZEOF_MEMBER() as the macro.
Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I2179c845a3b75fb71aa039075cc4dfd30617b898
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/8738
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
When blob persist starts, there can already be multiple
of such requests pending. It is possible to complete
a set of persists at once, if blob state after their
execution would be the same. This is the case when
persists are already pending when a particular persist
request is started.
This patch implements such mechanism by introducing
persists_to_complete queue, containing entries that
were previously queued up before starting the current
persist request. If there are any entries in this queue,
further requests are put into pending_persists.
When first request from persists_to_complete is persisted,
completions are issued for all requests on that queue at once.
If at that point there are any new entries on pending_persists,
all of them are put into persists_to_complete. Persist process is started
again with the first request from that queue.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I10063e55d6f821b1863de016d3148da6a719a422
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/7643
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
We still need to be able to explicitly set specific
bits in the cluster array during initialization and
loading (especially recovery), so we use a bit_array
during load, and then convert it to a bit_pool just
before calling the user's cmopletion callback.
This gives a roughly 300% improvement over baseline
on a benchmark which does continuous resize operations.
The benefit is primarily from saving the lowest free
bit rather than having to always start at bit 0. We
may be able to further improve this by saving extents
in the bit pool as well, although after this patch,
the benchmark shows other hot spots different from the
bit search.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Idb1d75d8348bc50560b1f42d49dbe4d79d024619
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3975
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This can speed up the check for whether a blob is already open
significantly.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: If32b0b1f168fcdb58e61df6281d7b7520725a195
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2781
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Operation of locating right lba from cluster map
is done on I/O path. Instead of division and multiplication,
perform bit shift operation.
Bit shift is only used when pages per cluster is power of 2.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ic3ed7ec0a82867a8a4bc6391785b9d40c800aacb
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1724
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
We have an unofficial naming convention that the
spdk_ namespace is reserved for public API functions only.
This patch is attempting to bring the blob library into compliance
with that naming convention.
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Ie298e41d1b741dae01744826c208378ee60f9d0a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1700
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Community-CI: Broadcom CI
It is possible for multiple blob persists to affect one another.
Either by blob->state changes or blob mutable data.
Safe way to prevent that is to queue up the persists.
Next persist will be executed only after previous one completes.
Fixes#1170Fixes#960
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Iaf95d9238510100b629050bc0d5c2c96c982a60c
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
With recent changes to extent on-disk metadata format,
new format (Extent Pages) is not backwards compatible.
Meanwhile old format (Extent RLE) is backwards
compatible with older SPDK applications.
Summing up:
Blobstore created pre SPDK 20.01 can only use Extent RLE.
Blobstore created starting with SPDK 20.01 can use both,
Extent Pages and Extent RLE specified by use_extent_table opts.
When use_extent_table is set to true, invalid flag for it is set.
SPDK application pre 20.01, will not load such blob.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: If14ebd03f19eb581d71dcb46191e099336655189
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483220
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Size of a blob (thus size of clusters array in mutable data)
is known from extent table descriptor.
Extent pages were read sequentially in order they were
placed in extent table. This meant that cluster
array could have been filled up from beginning to end.
Yet reading extent pages in any other order,
would result in incorrect placement of clusters.
This patch adds first cluster index that is contained within
each extent page. This will allow to read/write
multiple extent pages in parallel, since
we will know where in clusters array to put the cluster idxs.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib6b9332111cd93f990d057dc60624152907dd87f
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/482701
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is more adequate name, since this value if first read from
Extent Table descriptor. Then decreased when iterating over entries in
extent table and extent pages are read.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ib188c524b8488b38d4de063a9970dcfdf49c9acd
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/482600
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Right now output from _spdk_bs_cluster_to_extent_page()
is used to determine whether the exten_table is used at all.
If NULL pointer was returned this meant that extent table
was not allocated, even if the code might suggest just
checking if we overran the array.
To make it more obvious, the _spdk_bs_cluster_to_extent_page()
now only asserts the extent_table_id.
blob->use_extent_table is now always used to determine the
serialization path.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I9d2630645213539bae5cd1d72e5f9b878f53c2bc
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/482599
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Force number of Extents to fit into Extent Page to
be power of 2, in order to simplify calculations
on cluster allocations.
At this time SPDK_BS_PAGE_SIZE is 4k, which would
results in SPDK_EXTENTS_PER_EP to be 512.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I7e09d92b00dfe5c12d7dd10ac0fc5a9a10d526ac
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/472041
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Similar to EXTENT_RLE, this descriptor holds LBA of clusters.
Difference is that EXTENT is kept in separate md pages,
and only single EXTENT will be updated on cluster allocation.
This patch adds the EXTENT processing, which is not used
until following patch.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ifbac23db7ca3e7c8c91cee01018f20071f0d5160
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470014
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I3e49c398d9bdf9f4eacba65061cc7fe4b300fb56
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/479963
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
With this patch extent pages array will change it size accordingly
to size of the blob. Similar to clusters, only resizing up
is done on blob resize. Shrinking is done on persisting the blob.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Id7f7c81efbd96af414fce9fc4045cbb476cc93a6
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/479962
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Added new descriptor SPDK_MD_DESCRIPTOR_TYPE_EXTENT_TABLE.
Extent Table will hold md page offsets for new Extent Page descriptor.
Entries in Extent Table are run-length encoded 0's as unallocated
Extent Page descriptors.
Additionally total number of clusters is persisted in each Extent
Table descriptor. This is because there is no guarantee that
last Extent Page of a blob will be allocated.
Even if number of Extents per Extent Page is always the same,
Extent Page can hold less Extents than that.
This patch does not add more metadata on disk right now.
Only added descriptor parsing/serialization and applicable fields
to store it in run time.
Following patches are going to implement TODO's added in this patch.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Iac5d8f00ddfc655c507bc26d69d7adf8495074e9
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466920
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Since further patches will be adding new descriptors
that are related to cluster layout throughout the blobstore,
add description for existing descriptor too.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I722eb633445685789d5185ed59dfc910f76b109f
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/481724
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This is an additional option that can be passed when creating
a blob.
When opts->enable_extent_pages is set to false (current default),
only EXTENT_RLE should be persisted on sync.
During blob load, when EXTENT_RLE is present in md,
blob->extent_rle_found is set to true.
When opts->enable_extent_pages is set to true,
only EXTENT_TABLE and EXTENT_PAGES should be persisted on sync.
During blob load, when EXTENT_TABLE is present in md,
blob->extent_table_found is set to true.
It is possible to find neither EXTENT_* descriptor when loading a blob.
This means that blob length is 0 and EXTENT_RLE was supposed to be used.
Yet none were persisted due to lack of clusters.
In such case blob->use_extent_table is set to true after finishing
blob load.
When parsing metadata ends, if extent_table_found is set - then
support for extent_table is enabled. All other cases disable it.
At this time path for Extent Pages is not implemented, so it should
not be used.
Later in the series, it will become the default path for serialization.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I2146da6130a0645e686ab02a3b5d2d86a7d35a1f
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/479853
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Accept a clear method option on blob create by adding clear_method
to the opts structure passed in to _spdk_bs_create_blob(). Store
these 2 bits in md_ro_flags so that earlier versions without an
understanding of these bits can not alter metadata.
The new metadata values will be used later in the series.
Signed-off-by: paul luse <paul.e.luse@intel.com>
Change-Id: I5440645ca20b426778d13b2e544b65dc2b3b83c7
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/472204
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The _spdk_bs_page_to_lba() [without 'md'] is only for translating the
pages on the blobstore to lba they are at. Those pages start at
the begining of the device and cover all of it. Thus simple
math is enough to translate those.
It is used to calculate lba_count for set of pages as well.
Meanwhile there are 'md_pages' which are the same pages as for
the above, but their count start at bs->md_start.
Which is right after super_block and couple pages for bit masks.
This patch creates new _spdk_bs_md_page_to_lba() that is more
explicit in what page number is passed. Hopefully avoiding
confusion when reading which page number refers to which
'type' of page.
Exception to the that is _spdk_bs_dump_read_md_page(), where
blobstore is not actually loaded (md_start from super block
is not copied to bs structure).
Additionaly providing assert to catch errors on debug builds.
Making the check in _spdk_blob_load_cpl() for max_md_lba obsolete.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I66bbca55b5ca3d6794c462d50177e6037ddbefa6
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/479017
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
In future patches new type of extents will be added,
for compatibility the current extent type will be still
handled in the code.
To signify the difference between those two types,
current type is renamed to SPDK_MD_DESCRIPTOR_TYPE_EXTENT_RLE.
Along with any variables throughout the code,
to make it clear which ones are used.
There are no functional changes in this patch.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I7186ccc452d200036188abf1dcea9660dcedee72
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/468230
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Length of xattr descriptor is equal to length of xattr struct,
xattr name and the len of stored value.
There is no limit to how much can be stored in memory for xattr.
On disk xattr size is limited to single page and within that to
max descriptors that can fit in it.
This size is known at compile time.
Before this patch it was possible to add xattr exceeding
what was possible to be written to disk. This caused issues
when serializing the metadata during spdk_blob_sync_md()
or spdk_blob_close(). Making those fail without specific info
to the user and not actually writting such descriptor.
Since maximum length of xattr descriptor is known at compile time,
this patch compares against this value when setting the xattr.
It will immediately report back to user with error, and will
not store xattr in memory (thus not serialize it).
This patch should not affect any backward compatibility for blobs.
Too large xattrs weren't written to disk before,
API for blobstore stays the same - only reporting ENOMEM
when it should.
Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I6f4af4d079e47f084e20d7a4969d9a78ec1f8610
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460450
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
In some cases user may want to flag blob for removal
then do some operations (before removing it) and while
it happens there might be power failure. In such cases
we should remove this blob on next blobstore load.
Example of such usage is delete snapshot functionality
that will be introduced in upcoming patch.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: I85f396b73762d2665ba8aec62528bb224acace74
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/453835
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
This is a part of future changes to block blob operations
that may cause race conditions between each other.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: Ia728d1fc207375ddcb3b70b5081ddcffa9f99027
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/449789
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Some users require to do write zeroes operation when
erasing data on lvol. Currently the default method is
unmap. This patch adds flag to spdk_rpc_construct_lvol_bdev
call that changes default erase method. This is also a base
implementation for possible future function for erasing
data on lvol bdev.
Signed-off-by: Maciej Szwed <maciej.szwed@intel.com>
Change-Id: I8964f170b13c2268fe3c18104f7956c32be96040
Reviewed-on: https://review.gerrithub.io/c/441527
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I994b5d46faffd34430cb39e66225929c4cba90ba
Reviewed-on: https://review.gerrithub.io/414935
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
It will save the space of spdk_xattr when put uint16_t after
uint32_t
Change-Id: Ie0712d8c3b16d90fc354847509fd87e1ffd93916
Signed-off-by: Ziye Yang <optimistyzy@gmail.com>
Reviewed-on: https://review.gerrithub.io/419453
Chandler-Test-Pool: SPDK Automated Test System <sys_sgsw@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Save the original size of the disk to metadata when it is first created.
On load verify that the disk did not change size.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I535940ee188425ee3b394effd99653cc073d541e
Reviewed-on: https://review.gerrithub.io/410896
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
4KiB page size * UINT32_MAX = 16TiB - so we must use
a uint64_t for any blobstores on backing devices of
16TiB or greater.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ief13cf06d413477dc8ab4f9fe0ff4c0631566c00
Reviewed-on: https://review.gerrithub.io/416448
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Make sure we don't truncate the LBA when using it to serialize the
cluster array into an extent list.
We also need to add an explicit cast in _spdk_bs_cluster_to_lba
to ensure the conversion doesn't get truncated. While here, do
the same cast for _spdk_bs_cluster_to_page.
Change-Id: If4e65ed86550e39dfa39826930dfafac158d519c
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.gerrithub.io/416231
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I23c34d4dcb542aa9ab3fa8cb734cf9cc0e0fc5da
Reviewed-on: https://review.gerrithub.io/409144
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I6182eb3a77d23db7088703492d71349e3a4b6460
Reviewed-on: https://review.gerrithub.io/399366
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The patch disables writing dirty bit during blobstore loading.
Instead, dirty bit is written prior to the first metadata update.
Signed-off-by: Piotr Pelplinski <piotr.pelplinski@intel.com>
Change-Id: I7be81009a99f09048bf23749c8f6ef5e9f7b3751
Reviewed-on: https://review.gerrithub.io/410884
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>