doc: Update FTL documentation

Update FTL documentation according to recent changes in API.

Change-Id: I86de8c115f2dedaff5f281d17a35bce34c35cef0
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/481807
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This commit is contained in:
parent 49d0baee65
commit aa44b69a52

doc/ftl.md (135 lines changed)
@@ -1,8 +1,9 @@

# Flash Translation Layer {#ftl}

The Flash Translation Layer library provides block device access on top of devices
implementing the bdev_zone interface.
It handles the logical to physical address mapping, responds to the asynchronous
media management events, and manages the defragmentation process.

# Terminology {#ftl_terminology}
@@ -10,32 +11,32 @@

* Shorthand: L2P

Contains the mapping of the logical addresses (LBA) to their on-disk physical location. The LBAs
are contiguous and in range from 0 to the number of surfaced blocks (the number of spare blocks
are calculated during device formation and are subtracted from the available address space). The
spare blocks account for zones going offline throughout the lifespan of the device as well as
provide necessary buffer for data [defragmentation](#ftl_reloc).
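The L2P can be pictured as a flat array indexed by LBA, sized to the surfaced range only. A minimal sketch under that reading (all names and numbers here are hypothetical, not the library's actual structures):

```python
# Hypothetical sketch of an L2P map; the real FTL keeps this in
# library-internal structures, not in Python.
INVALID_PPA = -1  # marker for an LBA that has never been written


def make_l2p(total_blocks, spare_blocks):
    """The addressable range excludes spare blocks reserved for
    zones going offline and for the defragmentation buffer."""
    surfaced = total_blocks - spare_blocks
    return [INVALID_PPA] * surfaced


l2p = make_l2p(total_blocks=1024, spare_blocks=128)
l2p[42] = 777           # LBA 42 now maps to physical location 777
assert len(l2p) == 896  # only the surfaced range is addressable
```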
## Band {#ftl_band}

A band describes a collection of zones, each belonging to a different parallel unit. All writes to
a band follow the same pattern - a batch of logical blocks is written to one zone, another batch
to the next one and so on. This ensures the parallelism of the write operations, as they can be
executed independently on different zones. Each band keeps track of the LBAs it consists of, as
well as their validity, as some of the data will be invalidated by subsequent writes to the same
logical address. The L2P mapping can be restored from the SSD by reading this information in order
from the oldest band to the youngest.

            +--------------+        +--------------+                        +--------------+
    band 1  |    zone 1    +--------+    zone 1    +---- --- --- --- --- ---+    zone 1    |
            +--------------+        +--------------+                        +--------------+
    band 2  |    zone 2    +--------+    zone 2    +---- --- --- --- --- ---+    zone 2    |
            +--------------+        +--------------+                        +--------------+
    band 3  |    zone 3    +--------+    zone 3    +---- --- --- --- --- ---+    zone 3    |
            +--------------+        +--------------+                        +--------------+
            |     ...      |        |     ...      |                        |     ...      |
            +--------------+        +--------------+                        +--------------+
    band m  |    zone m    +--------+    zone m    +---- --- --- --- --- ---+    zone m    |
            +--------------+        +--------------+                        +--------------+
            |     ...      |        |     ...      |                        |     ...      |
            +--------------+        +--------------+                        +--------------+
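The batched, round-robin write pattern described above can be sketched as follows (a hypothetical helper for illustration only; batch sizes and zone counts are made up):

```python
# Sketch: distribute consecutive logical blocks across a band's zones in
# batches, one batch per zone, wrapping around. Illustration only - not the
# library's actual write path.
def band_write_order(num_blocks, num_zones, batch=4):
    placement = [[] for _ in range(num_zones)]
    for start in range(0, num_blocks, batch):
        zone = (start // batch) % num_zones
        placement[zone].extend(range(start, min(start + batch, num_blocks)))
    return placement


placement = band_write_order(num_blocks=16, num_zones=2, batch=4)
# zone 0 receives blocks 0-3 and 8-11; zone 1 receives 4-7 and 12-15,
# so the two zones can be written in parallel.
assert placement[0] == [0, 1, 2, 3, 8, 9, 10, 11]
assert placement[1] == [4, 5, 6, 7, 12, 13, 14, 15]
```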
@@ -51,15 +52,15 @@

             head metadata               band's data                tail metadata
        +-------------------+-------------------------------+------------------------+
        |zone 1 |...|zone n |...|...|zone 1 |...|   | ...   |zone m-1    |zone m     |
        |block 1|   |block 1|   |   |block x|   |   |       |block y     |block y    |
        +-------------------+-------------+-----------------+------------------------+

Bands are written sequentially (in a way that was described earlier). Before a band can be written
to, all of its zones need to be erased. During that time, the band is considered to be in a `PREP`
state. After that is done, the band transitions to the `OPENING` state, in which head metadata
is being written. Then the band moves to the `OPEN` state and actual user data can be written to the
band. Once the whole available space is filled, tail metadata is written and the band transitions to
`CLOSING` state. When that finishes the band becomes `CLOSED`.
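The band lifecycle above is a simple linear state machine. A sketch (state names mirror the doc; the transition table is illustrative, not the library's implementation):

```python
# Band lifecycle sketch: PREP -> OPENING -> OPEN -> CLOSING -> CLOSED.
# Each transition corresponds to a step described in the text above.
TRANSITIONS = {
    "PREP": "OPENING",    # zones erased -> start writing head metadata
    "OPENING": "OPEN",    # head metadata written -> accept user data
    "OPEN": "CLOSING",    # band full -> write tail metadata
    "CLOSING": "CLOSED",  # tail metadata written -> band closed
}


def advance(state):
    return TRANSITIONS[state]


state = "PREP"
history = [state]
while state != "CLOSED":
    state = advance(state)
    history.append(state)
assert history == ["PREP", "OPENING", "OPEN", "CLOSING", "CLOSED"]
```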
@@ -103,7 +104,7 @@

Since a write to the same LBA invalidates its previous physical location, some of the blocks on a
band might contain old data that basically wastes space. As there is no way to overwrite an already
written block, this data will stay there until the whole zone is reset. This might create a
situation in which all of the bands contain some valid data and no band can be erased, so no writes
can be executed anymore. Therefore a mechanism is needed to move valid data and invalidate whole
bands, so that they can be reused.
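The invalidation effect described above can be sketched in a few lines (hypothetical structures; the real FTL tracks validity per band, not in a Python set):

```python
# Sketch: rewriting an LBA invalidates the old physical block, but the stale
# copy keeps occupying space until its whole zone is reset.
l2p = {}       # LBA -> current physical location
valid = set()  # physical locations holding current (valid) data


def write(lba, new_loc):
    old = l2p.get(lba)
    if old is not None:
        valid.discard(old)  # previous location now wastes space
    l2p[lba] = new_loc
    valid.add(new_loc)


write(7, 100)
write(7, 200)            # same LBA rewritten to a new location
assert l2p[7] == 200
assert 100 not in valid  # stale block: unreachable, yet still occupies the zone
```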
@@ -123,13 +124,13 @@

blocks during a read operation).

The module responsible for data relocation is called `reloc`. When a band is chosen for
defragmentation or a media management event is received, the appropriate blocks are marked as
required to be moved. The `reloc` module takes a band that has some such blocks marked, checks
their validity and, if they're still valid, copies them.

Choosing a band for defragmentation depends on several factors: its valid ratio (1) (the proportion
of valid blocks to all user blocks), its age (2) (when it was written) and the write count / wear
level index of its zones (3) (how many times the band was written to). The lower the ratio (1), the
higher its age (2) and the lower its write count (3), the higher the chance the band will be chosen
for defrag.
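The selection criteria can be combined into a score; the sketch below uses made-up weights and a made-up formula - the doc only states the direction of each factor, not how they are weighed:

```python
# Illustrative scoring only: lower valid ratio, higher age and lower write
# count all raise a band's chance of being chosen for defragmentation.
def defrag_score(valid_ratio, age, write_count, max_age, max_writes):
    # Normalize each factor to [0, 1]; a higher score means a better candidate.
    return ((1.0 - valid_ratio)                   # (1) few valid blocks left
            + age / max_age                       # (2) written long ago
            + (1.0 - write_count / max_writes))   # (3) low wear


old_sparse = defrag_score(valid_ratio=0.2, age=90, write_count=3,
                          max_age=100, max_writes=100)
fresh_full = defrag_score(valid_ratio=0.9, age=5, write_count=80,
                          max_age=100, max_writes=100)
assert old_sparse > fresh_full  # the old, mostly-invalid band is picked first
```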
@@ -137,9 +138,24 @@

## Prerequisites {#ftl_prereq}

In order to use the FTL module, a device exposing a zoned interface is required, e.g. a
`zone_block` bdev or an OCSSD `nvme` bdev.

## FTL bdev creation {#ftl_create}

Similar to other bdevs, the FTL bdevs can be created either based on JSON config files or via RPC.
Both interfaces require the same arguments, which are described by the `--help` option of the
`bdev_ftl_create` RPC call:
- bdev's name
- base bdev's name (the base bdev must implement the bdev_zone API)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
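For the JSON config file route, the same arguments would sit in a bdev subsystem entry. A sketch only - the bdev names are placeholders and the exact parameter names should be verified against `bdev_ftl_create --help` for the SPDK version in use:

```json
{
  "subsystems": [
    {
      "subsystem": "bdev",
      "config": [
        {
          "method": "bdev_ftl_create",
          "params": {
            "name": "ftl0",
            "base_bdev": "zone1",
            "uuid": "00000000-0000-0000-0000-000000000000"
          }
        }
      ]
    }
  ]
}
```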

## FTL usage with OCSSD nvme bdev {#ftl_ocssd}

This option requires an Open Channel SSD, which can be emulated using QEMU.

The QEMU with the patches providing Open Channel support can be found on the SPDK's QEMU fork
on the [spdk-3.0.0](https://github.com/spdk/qemu/tree/spdk-3.0.0) branch.

## Configuring QEMU {#ftl_qemu_config}
@@ -223,39 +239,48 @@

```

In order to create FTL on top of an Open Channel SSD, the following steps are required:
1) Attach the OCSSD NVMe controller
2) Create an OCSSD bdev on the controller attached in step 1 (the user can specify a parallel
unit range and create multiple OCSSD bdevs on a single OCSSD NVMe controller)
3) Create an FTL bdev on top of the bdev created in step 2

Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:0a.0 -t pcie

$ scripts/rpc.py bdev_ocssd_create -c nvme0 -b nvme0n1
nvme0n1

$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1
{
	"name": "ftl0",
	"uuid": "3b469565-1fa5-4bfb-8341-747ec9fca9b9"
}
```
## FTL usage with zone block bdev {#ftl_zone_block}

Zone block bdev is a bdev adapter between a regular `bdev` and `bdev_zone`. It emulates a zoned
interface on top of a regular block device.

In order to create FTL on top of a regular bdev:
1) Create a regular bdev, e.g. `bdev_nvme`, `bdev_null`, `bdev_malloc`
2) Create a zone block bdev on top of the regular bdev created in step 1 (the user can specify
the zone capacity and the optimal number of open zones)
3) Create an FTL bdev on top of the bdev created in step 2

Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:05.0 -t pcie
nvme0n1

$ scripts/rpc.py bdev_zone_block_create -b zone1 -n nvme0n1 -z 4096 -o 32
zone1

$ scripts/rpc.py bdev_ftl_create -b ftl0 -d zone1
{
	"name": "ftl0",
	"uuid": "3b469565-1fa5-4bfb-8341-747ec9f3a9b9"
}
```