doc: Update FTL documentation

Update FTL documentation according to recent changes in API.

Change-Id: I86de8c115f2dedaff5f281d17a35bce34c35cef0
Signed-off-by: Wojciech Malikowski <wojciech.malikowski@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/481807
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
This commit is contained in:
parent 49d0baee65
commit aa44b69a52

135 doc/ftl.md
# Flash Translation Layer {#ftl}

The Flash Translation Layer library provides block device access on top of devices
implementing the bdev_zone interface.
It handles the logical to physical address mapping, responds to the asynchronous
media management events, and manages the defragmentation process.
# Terminology {#ftl_terminology}
* Shorthand: L2P
Contains the mapping of the logical addresses (LBA) to their on-disk physical location. The LBAs
are contiguous and in range from 0 to the number of surfaced blocks (the number of spare blocks
is calculated during device formation and subtracted from the available address space). The
spare blocks account for zones going offline throughout the lifespan of the device, as well as
providing the necessary buffer for data [defragmentation](#ftl_reloc).
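As a rough illustration (plain bash, not SPDK code), the L2P map behaves like a table keyed by LBA: overwriting an LBA simply repoints its entry, leaving the previously written physical block stale. All numbers below are made up.

```shell
# Toy model of the L2P map; the addresses are hypothetical.
declare -A l2p          # LBA -> physical block address

l2p[5]=100              # first write of LBA 5 lands at physical block 100
l2p[5]=220              # an overwrite goes to a new location; block 100 is
                        # now stale until its whole zone is reset

echo "LBA 5 -> physical block ${l2p[5]}"
```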
## Band {#ftl_band}
A band describes a collection of zones, each belonging to a different parallel unit. All writes to
a band follow the same pattern - a batch of logical blocks is written to one zone, another batch
to the next one, and so on. This ensures the parallelism of the write operations, as they can be
executed independently on different zones. Each band keeps track of the LBAs it consists of, as
well as their validity, as some of the data will be invalidated by subsequent writes to the same
logical address. The L2P mapping can be restored from the SSD by reading this information in order
from the oldest band to the youngest.

                +--------------+        +--------------+                        +--------------+
        band 1  |    zone 1    +--------+    zone 1    +---- --- --- --- --- ---+    zone 1    |
                +--------------+        +--------------+                        +--------------+
        band 2  |    zone 2    +--------+    zone 2    +---- --- --- --- --- ---+    zone 2    |
                +--------------+        +--------------+                        +--------------+
        band 3  |    zone 3    +--------+    zone 3    +---- --- --- --- --- ---+    zone 3    |
                +--------------+        +--------------+                        +--------------+
                |     ...      |        |     ...      |                        |     ...      |
                +--------------+        +--------------+                        +--------------+
        band m  |    zone m    +--------+    zone m    +---- --- --- --- --- ---+    zone m    |
                +--------------+        +--------------+                        +--------------+
                |     ...      |        |     ...      |                        |     ...      |
                +--------------+        +--------------+                        +--------------+
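The striping pattern above can be sketched with a few lines of bash (illustrative only, not SPDK code; the zone count and batch size are made-up values):

```shell
num_zones=4   # zones in the band, one per parallel unit (hypothetical)
batch=2       # logical blocks written to one zone before moving to the next

# Round-robin placement: consecutive batches land on consecutive zones,
# so writes can proceed in parallel across the parallel units.
for lbk in 0 1 2 3 4 5 6 7; do
  zone=$(( (lbk / batch) % num_zones ))
  echo "logical block $lbk -> zone $zone"
done
```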
         head metadata               band's data               tail metadata
        +-------------------+-------------------------------+------------------------+
        |zone 1 |...|zone n |...|...|zone 1 |...|   | ...   |zone m-1  |zone m       |
        |block 1|   |block 1|   |   |block x|   |   |       |block y   |block y      |
        +-------------------+-------------+-----------------+------------------------+

Bands are written sequentially (in a way that was described earlier). Before a band can be written
to, all of its zones need to be erased. During that time, the band is considered to be in a `PREP`
state. After that is done, the band transitions to the `OPENING` state, in which head metadata
is being written. Then the band moves to the `OPEN` state and actual user data can be written to the
band. Once the whole available space is filled, tail metadata is written and the band transitions to
the `CLOSING` state. When that finishes, the band becomes `CLOSED`.
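The lifecycle above amounts to a small state machine, sketched here in bash (illustrative only, not SPDK code):

```shell
# Band state transitions as described above.
next_state() {
  case "$1" in
    PREP)    echo OPENING ;;   # all zones erased
    OPENING) echo OPEN    ;;   # head metadata written
    OPEN)    echo CLOSING ;;   # band full, tail metadata being written
    CLOSING) echo CLOSED  ;;   # tail metadata complete
    *)       echo "$1"    ;;   # CLOSED is terminal
  esac
}

state=PREP
while [ "$state" != CLOSED ]; do
  next=$(next_state "$state")
  echo "$state -> $next"
  state=$next
done
```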
Since a write to the same LBA invalidates its previous physical location, some of the blocks on a
band might contain old data that basically wastes space. As there is no way to overwrite an already
written block, this data will stay there until the whole zone is reset. This might create a
situation in which all of the bands contain some valid data and no band can be erased, so no writes
can be executed anymore. Therefore a mechanism is needed to move valid data and invalidate whole
bands, so that they can be reused.
blocks during a read operation).

The module responsible for data relocation is called `reloc`. When a band is chosen for
defragmentation or a media management event is received, the appropriate blocks are marked as
required to be moved. The `reloc` module takes a band that has some of such blocks marked, checks
their validity and, if they're still valid, copies them.

Choosing a band for defragmentation depends on several factors: its valid ratio (1) (proportion of
valid blocks to all user blocks), its age (2) (when it was written) and the write count / wear level
index of its zones (3) (how many times the band was written to). The lower the ratio (1), the
higher its age (2) and the lower its write count (3), the higher the chance the band will be chosen
for defrag.
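As a rough sketch of factor (1) (plain bash, not SPDK code; the band names and numbers are invented), the candidate with the lowest valid ratio sorts first:

```shell
# Columns: valid user blocks, total user blocks, per band (made-up data).
printf '%s\n' \
  "band0 900 1000" \
  "band1 300 1000" \
  "band2 600 1000" |
while read -r band valid total; do
  echo "$(( 100 * valid / total )) $band"
done | sort -n | head -n1
# The line printed has the lowest valid ratio, making that band the
# strongest defrag candidate on factor (1); age (2) and wear (3) would
# further weight the choice.
```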
## Prerequisites {#ftl_prereq}

In order to use the FTL module, a device exposing a zoned interface is required, e.g. a
`zone_block` bdev or an OCSSD `nvme` bdev.

## FTL bdev creation {#ftl_create}

Similar to other bdevs, the FTL bdevs can be created either based on JSON config files or via RPC.
Both interfaces require the same arguments, which are described by the `--help` option of the
`bdev_ftl_create` RPC call:
- bdev's name
- base bdev's name (the base bdev must implement the bdev_zone API)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
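For instance, restoring a previously created FTL device uses the same call with its UUID (a sketch with hypothetical names; the base bdev must already exist and a running SPDK target is assumed):

```shell
# Hypothetical: recreate ftl0 from the metadata stored on base bdev nvme0n1,
# using the UUID reported when the device was first created.
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1 -u e9825835-b03c-49d7-bc3e-5827cbde8a88
```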
## FTL usage with OCSSD nvme bdev {#ftl_ocssd}

This option requires an Open Channel SSD, which can be emulated using QEMU.

The QEMU with the patches providing Open Channel support can be found on the SPDK's QEMU fork
on the [spdk-3.0.0](https://github.com/spdk/qemu/tree/spdk-3.0.0) branch.
## Configuring QEMU {#ftl_qemu_config}
In order to create FTL on top of an Open Channel SSD, the following steps are required:
1) Attach the OCSSD NVMe controller.
2) Create an OCSSD bdev on the controller attached in step 1 (the user can specify a parallel unit
range and create multiple OCSSD bdevs on a single OCSSD NVMe controller).
3) Create an FTL bdev on top of the bdev created in step 2.

Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:0a.0 -t pcie

$ scripts/rpc.py bdev_ocssd_create -c nvme0 -b nvme0n1
nvme0n1

$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1
{
        "name": "ftl0",
        "uuid": "3b469565-1fa5-4bfb-8341-747ec9fca9b9"
}
```
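When the FTL bdev is no longer needed, it can be removed again (a sketch assuming the `ftl0` name from the example above and that the `bdev_ftl_delete` RPC is available in this SPDK version):

```shell
$ scripts/rpc.py bdev_ftl_delete -b ftl0
```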
## FTL usage with zone block bdev {#ftl_zone_block}

The zone block bdev is a bdev adapter between a regular `bdev` and `bdev_zone`. It emulates a zoned
interface on top of a regular block device.

In order to create FTL on top of a regular bdev:
1) Create a regular bdev, e.g. `bdev_nvme`, `bdev_null`, `bdev_malloc`.
2) Create a zone block bdev on top of the regular bdev created in step 1 (the user can specify the
zone capacity and the optimal number of open zones).
3) Create an FTL bdev on top of the bdev created in step 2.

Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:05.0 -t pcie
nvme0n1

$ scripts/rpc.py bdev_zone_block_create -b zone1 -n nvme0n1 -z 4096 -o 32
zone1

$ scripts/rpc.py bdev_ftl_create -b ftl0 -d zone1
{
        "name": "ftl0",
        "uuid": "3b469565-1fa5-4bfb-8341-747ec9f3a9b9"
}
```
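The resulting device can then be inspected like any other bdev (a sketch assuming the `ftl0` name from the example above and a running SPDK target):

```shell
$ scripts/rpc.py bdev_get_bdevs -b ftl0
```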