Compare commits


13 Commits

Author SHA1 Message Date
Daniel Verkamp
87036cef6f vhost: fix build with DPDK 17.02 and older
__rte_always_inline was added in DPDK 17.05; replace the single use with
a regular 'inline' to restore compatibility with older DPDK versions.

Change-Id: Ifa92ab781e9b597fca0c9a92f562027fec6b5337
Fixes: 867063e463 ("rte_vhost: introduce safe API for GPA translation")
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/409076
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2018-04-25 22:32:06 +00:00
Dariusz Stojaczyk
ea938b9a88 vhost_scsi: support initiators without eventq/controlq
Backported from 18.04. (1d74fea)

Change-Id: Iaf10a3a8a6e728540ebd8f6a1cf473901f98625f
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/409073
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2018-04-25 21:34:25 +00:00
Daniel Verkamp
18f65e5a8a version: v18.01.2-pre
Change-Id: I03f31d9f520031d6380cdd7b1550f211d56db9bb
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/409075
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2018-04-25 21:34:14 +00:00
Jim Harris
2a280f2fdc CHANGELOG.md: document v18.01.1 release
While here, bump version.h for the v18.01.1 release

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie09fcf5b86a816564ac8728b8b216cf2c3cce539
Reviewed-on: https://review.gerrithub.io/408727
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
2018-04-23 17:58:50 +00:00
Dariusz Stojaczyk
466eb99e9f vhost: switch to the new rte_vhost API for GPA translation
DPDK will deprecate the old API soon.

Change-Id: I0522d47d9cc0b80fb0e2ceb9cc47c45ff51a5077
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/408719
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2018-04-23 17:52:27 +00:00
Dariusz Stojaczyk
8760280eb3 rte_vhost: ensure all range is mapped when translating QVAs
This patch ensures that the entire address range is mapped
when translating addresses from the master's address space
(e.g. QEMU host addresses) to process VAs.

Change-Id: If141670951064a8d2b4b7343bf4cc9ca93fe2e6d
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/408718
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2018-04-23 17:52:27 +00:00
Dariusz Stojaczyk
867063e463 rte_vhost: introduce safe API for GPA translation
This new rte_vhost_va_from_guest_pa API takes an extra len parameter,
used to specify the size of the range to be mapped.
Effective mapped range is returned via len parameter.

Change-Id: Ib3830e1da9e0cb477d99860a03684c665bb3f6ec
Reported-by: Yongji Xie <xieyongji@baidu.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Reviewed-on: https://review.gerrithub.io/408717
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
2018-04-23 17:52:27 +00:00
Jonathan Richardson
e5cfae172d nvme: Remove calls to getpid() when submitting nvme requests
As of glibc version 2.3.4, getpid() is no longer cached. SPDK
calls it in nvme_allocate_request(), which is invoked for each
NVMe request received. This results in system calls up to millions of
times per second, which slows down NVMe submissions. Since the pid never
changes, it only needs to be read once per process during initialization.
This improves the performance of nvme_allocate_request() significantly.

Backported from master commit ce70f29662

Change-Id: I81b3d8d7f298db25c3f6c3e237e5f9d290c1f126
Signed-off-by: Jonathan Richardson <jonathan.richardson@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-on: https://review.gerrithub.io/407599
Reviewed-by: Scott Branden <sbranden@gmail.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.gerrithub.io/408406
2018-04-20 13:28:47 -04:00
Daniel Verkamp
1050ace333 test/bdev: disable blockdev tests when ASAN is on
This test doesn't pass on the current automated test pool now that ASAN
is enabled on some test machines.  The fixes needed to make this work
are too large to backport to a maintenance branch, so for now, just
disable this test.

Users are encouraged to move to master or the upcoming v18.04 release in
order to get all of the blockdev unregister bug fixes.

Change-Id: Iabfc7d59dba12654f7411b99b90cb78cdbb62fcd
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/408507
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2018-04-20 13:09:38 -04:00
Seth Howell
34ccb2d7d9 test/blockdev.sh: filter out partitions in nbd test
Change-Id: Id32350e6a4bfaa31b785fe10efea170b82f20497
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/400192
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/408415
2018-04-20 12:00:59 -04:00
Seth Howell
5ac01ab53e test: set all coremasks to use 8 or fewer cores.
This will allow us to reduce core assignments for build pool vms to 8.

Change-Id: Iba5f6beb387742df2c30b48e22be1961e82af0cf
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/403559
Reviewed-by: John Kariuki <John.K.Kariuki@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-on: https://review.gerrithub.io/408413
Tested-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-04-20 12:00:59 -04:00
Seth Howell
3bc1b71100 examples/kmod: add cache.mk to git ignore
Change-Id: I4fbe941be2791eb6e4927ecba0fe060eaefedeab
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/404048
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-on: https://review.gerrithub.io/408411
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: Daniel Verkamp <daniel.verkamp@intel.com>
2018-04-20 12:00:59 -04:00
Daniel Verkamp
73fee9c732 version: v18.01.1-pre
Change-Id: Ia79a90aa00921f50d276280a8900a032f67976b9
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-on: https://review.gerrithub.io/397645
Tested-by: SPDK Automated Test System <sys_sgsw@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
2018-02-01 11:04:44 -05:00
2241 changed files with 95938 additions and 502536 deletions


@ -1,31 +0,0 @@
#!/bin/sh
# SPDX-License-Identifier: BSD-3-Clause
# All rights reserved.
#
# Verify what is about to be committed.
# Called by "git commit" with no arguments. The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
rc=0
# Redirect output to stderr.
exec 1>&2
# If there are formatting errors, print the offending file names and fail.
if [ -x "./scripts/check_format.sh" ]; then
	echo "Running check_format.sh ..."
	"./scripts/check_format.sh" > check_format.log 2>&1
	rc=$?
	if [ $rc -ne 0 ]; then
		cat check_format.log
		echo ""
		echo "ERROR check_format.sh returned errors!"
		echo "ERROR Fix the problem and use 'git add' to update your changes."
		echo "ERROR See `pwd`/check_format.log for more information."
		echo ""
	fi
fi
exit $rc


@ -1,84 +0,0 @@
#!/bin/sh
# SPDX-License-Identifier: BSD-3-Clause
# All rights reserved.
# Verify what is about to be pushed. Called by "git
# push" after it has checked the remote status, but before anything has been
# pushed. If this script exits with a non-zero status nothing will be pushed.
#
# This hook is called with the following parameters:
#
# $1 -- Name of the remote to which the push is being done
# $2 -- URL to which the push is being done
#
# If pushing without using a named remote those arguments will be equal.
# <local ref> <local sha1> <remote ref> <remote sha1>
#
rc=0
SYSTEM=`uname -s`
# Redirect output to stderr.
exec 1>&2
if [ "$SYSTEM" = "FreeBSD" ]; then
	MAKE="gmake MAKE=gmake -j $(sysctl -a | grep -E -i 'hw.ncpu' | awk '{print $2}')"
	COMP="clang"
else
	MAKE="make -j $(nproc)"
	COMP="gcc"
fi

echo "Running make with $COMP ..."
echo "${MAKE} clean " > make.log
$MAKE clean >> make.log 2>&1
echo "${MAKE} CONFIG_DEBUG=n CONFIG_WERROR=y " >> make.log
$MAKE CONFIG_DEBUG=n CONFIG_WERROR=y >> make.log 2>&1
rc=$?
if [ $rc -ne 0 ]; then
	tail -20 make.log
	echo ""
	echo "ERROR make returned errors!"
	echo "ERROR Fix the problem and use 'git commit' to update your changes."
	echo "ERROR See `pwd`/make.log for more information."
	echo ""
	exit $rc
fi

echo "${MAKE} SKIP_DPDK_BUILD=1 clean " >> make.log
$MAKE clean SKIP_DPDK_BUILD=1 >> make.log 2>&1
echo "${MAKE} CONFIG_DEBUG=y CONFIG_WERROR=y SKIP_DPDK_BUILD=1 " >> make.log
$MAKE CONFIG_DEBUG=y CONFIG_WERROR=y SKIP_DPDK_BUILD=1 >> make.log 2>&1
rc=$?
if [ $rc -ne 0 ]; then
	tail -20 make.log
	echo ""
	echo "ERROR make returned errors!"
	echo "ERROR Fix the problem and use 'git commit' to update your changes."
	echo "ERROR See `pwd`/make.log for more information."
	echo ""
	exit $rc
fi

echo "Running unittest.sh ..."
echo "./test/unit/unittest.sh" >> make.log
"./test/unit/unittest.sh" >> make.log 2>&1
rc=$?
if [ $rc -ne 0 ]; then
	tail -20 make.log
	echo ""
	echo "ERROR unittest returned errors!"
	echo "ERROR Fix the problem and use 'git commit' to update your changes."
	echo "ERROR See `pwd`/make.log for more information."
	echo ""
	exit $rc
fi

echo "$MAKE clean " >> make.log
$MAKE clean >> make.log 2>&1

echo "Pushing to $1 $2"

exit $rc


@ -1,37 +0,0 @@
---
name: Sighting report
about: Create a report to help us improve. Please use the issue tracker only for reporting suspected issues.
title: ''
labels: 'Sighting'
assignees: ''
---
# Sighting report
<!--- Provide a general summary of the issue in the Title above -->
## Expected Behavior
<!--- Tell us what should happen -->
## Current Behavior
<!--- Tell us what happens instead of the expected behavior -->
## Possible Solution
<!--- Not obligatory, but suggest a fix/reason for the potential issue, -->
## Steps to Reproduce
<!--- Provide a link to a live example, or an unambiguous set of steps to -->
<!--- reproduce this sighting. Include code to reproduce, if relevant -->
1.
2.
3.
4.
## Context (Environment including OS version, SPDK version, etc.)
<!--- Providing context helps us come up with a solution that is most useful in the real world -->


@ -1,8 +0,0 @@
blank_issues_enabled: false
contact_links:
  - name: SPDK Community
    url: https://spdk.io/community/
    about: Please ask and answer questions here.
  - name: SPDK Common Vulnerabilities and Exposures (CVE) Process
    url: https://spdk.io/cve_threat/
    about: Please follow CVE process to responsibly disclose security vulnerabilities.


@ -1,25 +0,0 @@
---
name: CI Intermittent Failure
about: Create a report with CI failure unrelated to the patch tested.
title: '[test_name] Failure description'
labels: 'Intermittent Failure'
assignees: ''
---
# CI Intermittent Failure
<!--- Provide a [test_name] where the issue occurred and brief description in the Title above. -->
<!--- Name of the test can be found by last occurrence of: -->
<!--- ************************************ -->
<!--- START TEST [test_name] -->
<!--- ************************************ -->
## Link to the failed CI build
<!--- Please provide a link to the failed CI build -->
## Execution failed at
<!--- Please provide the first failure in the test. Pointed to by the first occurrence of: -->
<!--- ========== Backtrace start: ========== -->


@ -1,11 +0,0 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
  - package-ecosystem: "" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"


@ -1,10 +0,0 @@
filters:
  - true
commentBody: |
  Thanks for your contribution! Unfortunately, we don't use GitHub pull
  requests to manage code contributions to this repository. Instead, please
  see https://spdk.io/development which provides instructions on how to
  submit patches to the SPDK Gerrit instance.
addLabel: false

.gitignore

@ -2,41 +2,28 @@
*.a
*.cmd
*.d
*.dll
*.exe
*.gcda
*.gcno
*.kdev4
*.ko
*.lib
*.log
*.map
*.o
*.obj
*.pdb
*.pyc
*.so
*.so.*
*.swp
*.DS_Store
build/
ut_coverage/
tags
cscope.out
dpdk-*
CUnit-Memory-Dump.xml
include/spdk/config.h
config.h
CONFIG.local
*VC.db
.vscode
.project
.cproject
.settings
.gitreview
mk/cc.mk
mk/config.mk
mk/cc.flags.mk
PYTHON_COMMAND
test_completions.txt
timing.txt
test/common/build_config.sh
.coredump_path

.gitmodules

@ -1,21 +1,3 @@
[submodule "dpdk"]
	path = dpdk
	url = https://github.com/spdk/dpdk.git
[submodule "intel-ipsec-mb"]
	path = intel-ipsec-mb
	url = https://github.com/spdk/intel-ipsec-mb.git
[submodule "isa-l"]
	path = isa-l
	url = https://github.com/spdk/isa-l.git
[submodule "ocf"]
	path = ocf
	url = https://github.com/Open-CAS/ocf.git
[submodule "libvfio-user"]
	path = libvfio-user
	url = https://github.com/nutanix/libvfio-user.git
[submodule "xnvme"]
	path = xnvme
	url = https://github.com/OpenMPDK/xNVMe.git
[submodule "isa-l-crypto"]
	path = isa-l-crypto
	url = https://github.com/intel/isa-l_crypto

.travis.yml

@ -0,0 +1,35 @@
language: c

compiler:
  - gcc
  - clang

dist: trusty
sudo: false

addons:
  apt:
    packages:
      - libcunit1-dev
      - libaio-dev
      - libssl-dev
      - uuid-dev
      - libnuma-dev

before_script:
  - git submodule update --init
  - export MAKEFLAGS="-j$(nproc)"
  - if [ "$CC" = gcc ]; then
      wget https://downloads.sourceforge.net/project/astyle/astyle/astyle%203.0/astyle_3.0_linux.tar.gz;
      tar xf astyle_3.0_linux.tar.gz;
      pushd astyle/build/gcc;
      make;
      export PATH=$PWD/bin:$PATH;
      popd;
    fi

script:
  - ./scripts/check_format.sh
  - ./configure --enable-werror
  - make
  - ./unittest.sh

File diff suppressed because it is too large


@ -1,130 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
SPDK core [maintainers](https://spdk.io/development/) are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
SPDK core maintainers have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported privately to any of the SPDK core maintainers. All complaints will be
reviewed and investigated promptly and fairly.
All SPDK core maintainers are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
SPDK core maintainers will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from SPDK core maintainers, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq

CONFIG

@ -1,222 +1,91 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
# Copyright (c) 2021, 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2022 Dell Inc, or its subsidiaries.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# configure options: __CONFIGURE_OPTIONS__
# Installation prefix
CONFIG_PREFIX="/usr/local"
# Target architecture
CONFIG_ARCH=native
# Destination directory for the libraries
CONFIG_LIBDIR=
# Prefix for cross compilation
CONFIG_CROSS_PREFIX=
CONFIG_PREFIX?=/usr/local
# Build with debug logging. Turn off for performance testing and normal usage
CONFIG_DEBUG=n
CONFIG_DEBUG?=n
# Treat warnings as errors (fail the build on any warning).
CONFIG_WERROR=n
CONFIG_WERROR?=n
# Build with link-time optimization.
CONFIG_LTO=n
# Generate profile guided optimization data.
CONFIG_PGO_CAPTURE=n
# Use profile guided optimization data.
CONFIG_PGO_USE=n
CONFIG_LTO?=n
# Build with code coverage instrumentation.
CONFIG_COVERAGE=n
CONFIG_COVERAGE?=n
# Build with Address Sanitizer enabled
CONFIG_ASAN=n
CONFIG_ASAN?=n
# Build with Undefined Behavior Sanitizer enabled
CONFIG_UBSAN=n
# Build with LLVM fuzzing enabled
CONFIG_FUZZER=n
CONFIG_FUZZER_LIB=
CONFIG_UBSAN?=n
# Build with Thread Sanitizer enabled
CONFIG_TSAN=n
# Build functional tests
CONFIG_TESTS=y
# Build unit tests
CONFIG_UNIT_TESTS=y
# Build examples
CONFIG_EXAMPLES=y
# Build apps
CONFIG_APPS=y
# Build with Control-flow Enforcement Technology (CET)
CONFIG_CET=n
CONFIG_TSAN?=n
# Directory that contains the desired SPDK environment library.
# By default, this is implemented using DPDK.
CONFIG_ENV=
CONFIG_ENV?=$(SPDK_ROOT_DIR)/lib/env_dpdk
# This directory should contain 'include' and 'lib' directories for your DPDK
# installation.
CONFIG_DPDK_DIR=
# Automatically set via pkg-config when bare --with-dpdk is set
CONFIG_DPDK_LIB_DIR=
CONFIG_DPDK_INC_DIR=
CONFIG_DPDK_PKG_CONFIG=n
# installation. Alternatively you can specify this on the command line
# with 'make DPDK_DIR=/path/to/dpdk'. This is only a valid entry
# when using the default SPDK environment library.
CONFIG_DPDK_DIR?=$(SPDK_ROOT_DIR)/dpdk/build
# This directory should contain 'include' and 'lib' directories for WPDK.
CONFIG_WPDK_DIR=
# Build SPDK FIO plugin. Requires CONFIG_FIO_SOURCE_DIR set to a valid
# Build SPDK FIO plugin. Requires FIO_SOURCE_DIR set to a valid
# fio source code directory.
CONFIG_FIO_PLUGIN=n
CONFIG_FIO_PLUGIN?=n
# This directory should contain the source code directory for fio
# which is required for building the SPDK FIO plugin.
CONFIG_FIO_SOURCE_DIR=/usr/src/fio
FIO_SOURCE_DIR?=/usr/src/fio
# Enable RDMA support for the NVMf target.
# Requires ibverbs development libraries.
CONFIG_RDMA=n
CONFIG_RDMA_SEND_WITH_INVAL=n
CONFIG_RDMA_SET_ACK_TIMEOUT=n
CONFIG_RDMA_SET_TOS=n
CONFIG_RDMA_PROV=verbs
# Enable NVMe Character Devices.
CONFIG_NVME_CUSE=n
# Enable FC support for the NVMf target.
# Requires FC low level driver (from FC vendor)
CONFIG_FC=n
CONFIG_FC_PATH=
CONFIG_RDMA?=n
# Build Ceph RBD support in bdev modules
# Requires librbd development libraries
CONFIG_RBD=n
# Build DAOS support in bdev modules
# Requires daos development libraries
CONFIG_DAOS=n
CONFIG_DAOS_DIR=
# Build UBLK support
CONFIG_UBLK=n
CONFIG_RBD?=n
# Build vhost library.
CONFIG_VHOST=y
CONFIG_VHOST?=y
# Build vhost initiator (Virtio) driver.
CONFIG_VIRTIO=y
CONFIG_VIRTIO?=y
# Build custom vfio-user transport for NVMf target and NVMe initiator.
CONFIG_VFIO_USER=n
CONFIG_VFIO_USER_DIR=
# Build with xNVMe
CONFIG_XNVME=n
# Enable the dependencies for building the DPDK accel compress module
CONFIG_DPDK_COMPRESSDEV=n
# Enable the dependencies for building the compress vbdev, includes the reduce library
CONFIG_VBDEV_COMPRESS=n
# Enable mlx5_pci dpdk compress PMD, enabled automatically if CONFIG_VBDEV_COMPRESS=y and libmlx5 exists
CONFIG_VBDEV_COMPRESS_MLX5=n
# Enable mlx5_pci dpdk crypto PMD, enabled automatically if CONFIG_CRYPTO=y and libmlx5 exists
CONFIG_CRYPTO_MLX5=n
# Requires libiscsi development libraries.
CONFIG_ISCSI_INITIATOR=n
# Enable the dependencies for building the crypto vbdev
CONFIG_CRYPTO=n
# Build spdk shared libraries in addition to the static ones.
CONFIG_SHARED=n
# Build with VTune support.
CONFIG_VTUNE=n
CONFIG_VTUNE_DIR=
# Build Intel IPSEC_MB library
CONFIG_IPSEC_MB=n
# Enable OCF module
CONFIG_OCF=n
CONFIG_OCF_PATH=
CONFIG_CUSTOMOCF=n
# Build ISA-L library
CONFIG_ISAL=y
# Build ISA-L-crypto library
CONFIG_ISAL_CRYPTO=y
# Build with IO_URING support
CONFIG_URING=n
# Build IO_URING bdev with ZNS support
CONFIG_URING_ZNS=n
# Path to custom built IO_URING library
CONFIG_URING_PATH=
# Path to custom built OPENSSL library
CONFIG_OPENSSL_PATH=
# Build with FUSE support
CONFIG_FUSE=n
# Build with RAID5f support
CONFIG_RAID5F=n
# Build with IDXD support
# In this mode, SPDK fully controls the DSA device.
CONFIG_IDXD=n
# Build with USDT support
CONFIG_USDT=n
# Build with IDXD kernel support.
# In this mode, SPDK shares the DSA device with the kernel.
CONFIG_IDXD_KERNEL=n
# arc4random is available in stdlib.h
CONFIG_HAVE_ARC4RANDOM=n
# uuid_generate_sha1 is available in uuid/uuid.h
CONFIG_HAVE_UUID_GENERATE_SHA1=n
# Is DPDK using libbsd?
CONFIG_HAVE_LIBBSD=n
# Is DPDK using libarchive?
CONFIG_HAVE_LIBARCHIVE=n
# Path to IPSEC_MB used by DPDK
CONFIG_IPSEC_MB_DIR=
# Generate Storage Management Agent's protobuf interface
CONFIG_SMA=n
# Build with Avahi support
CONFIG_AVAHI=n
# Setup DPDK's RTE_MAX_LCORES
CONFIG_MAX_LCORES=
# Build with NVML backends
CONFIG_NVML?=n

LICENSE

@ -1,30 +1,30 @@
The SPDK repo contains multiple git submodules each with its own
license info.
BSD LICENSE
Submodule license info:
dpdk: see dpdk/license
intel-ipsec-mb: see intel-ipsec-mb/LICENSE
isa-l: see isa-l/LICENSE
libvfio-user: see libvfio-user/LICENSE
ocf: see ocf/LICENSE
Copyright (c) Intel Corporation.
All rights reserved.
The rest of the SPDK repository uses the Open Source BSD-3-Clause
license. SPDK also uses SPDX Unique License Identifiers to eliminate
the need to copy the license text into each individual file.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
Any new file contributions to SPDK shall adhere to the BSD-3-Clause
license and use SPDX identifiers. Exceptions are subject to usual
review and must be listed in this file.
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
Exceptions:
* include/linux/* header files are BSD-3-Clause but do not use SPDX
identifier to keep them identical to the same header files in the
Linux kernel source tree.
* include/spdk/tree.h and include/spdk/queue_extras are BSD-2-Clause,
since there were primarily imported from FreeBSD. tree.h uses an SPDX
identifier but also the license text to reduce differences from the
FreeBSD source tree.
* lib/util/base64_neon.c is BSD-2-Clause.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Makefile

@ -1,147 +1,77 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
# Copyright (c) 2020, Mellanox Corporation.
# Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
S :=
SPDK_ROOT_DIR := $(CURDIR)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y += lib
DIRS-y += module
DIRS-$(CONFIG_SHARED) += shared_lib
DIRS-y += include
DIRS-$(CONFIG_EXAMPLES) += examples
DIRS-$(CONFIG_APPS) += app
DIRS-y += test
DIRS-$(CONFIG_IPSEC_MB) += ipsecbuild
DIRS-$(CONFIG_ISAL) += isalbuild
DIRS-$(CONFIG_ISAL_CRYPTO) += isalcryptobuild
DIRS-$(CONFIG_VFIO_USER) += vfiouserbuild
DIRS-$(CONFIG_SMA) += proto
DIRS-$(CONFIG_XNVME) += xnvmebuild
DIRS-y += lib test examples app include
.PHONY: all clean $(DIRS-y) include/spdk/config.h mk/config.mk \
cc_version cxx_version .libs_only_other .ldflags ldflags install \
uninstall
.PHONY: all clean $(DIRS-y) config.h CONFIG.local mk/cc.mk
# Workaround for ninja. See dpdkbuild/Makefile
export MAKE_PID := $(shell echo $$PPID)
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
ifeq ($(CURDIR)/dpdk/build,$(CONFIG_DPDK_DIR))
ifneq ($(SKIP_DPDK_BUILD),1)
ifneq ($(CONFIG_DPDK_PKG_CONFIG),y)
DPDKBUILD = dpdkbuild
DIRS-y += dpdkbuild
endif
endif
endif
endif
ifeq ($(OS),Windows)
ifeq ($(CURDIR)/wpdk/build,$(CONFIG_WPDK_DIR))
WPDK = wpdk
DIRS-y += wpdk
endif
endif
ifeq ($(CONFIG_SHARED),y)
LIB = shared_lib
else
LIB = module
endif
ifeq ($(CONFIG_IPSEC_MB),y)
LIB += ipsecbuild
DPDK_DEPS += ipsecbuild
endif
ifeq ($(CONFIG_ISAL),y)
ISALBUILD = isalbuild
LIB += isalbuild
DPDK_DEPS += isalbuild
ifeq ($(CONFIG_ISAL_CRYPTO),y)
ISALCRYPTOBUILD = isalcryptobuild
LIB += isalcryptobuild
endif
endif
ifeq ($(CONFIG_VFIO_USER),y)
VFIOUSERBUILD = vfiouserbuild
LIB += vfiouserbuild
endif
ifeq ($(CONFIG_XNVME),y)
XNVMEBUILD = xnvmebuild
LIB += xnvmebuild
endif
all: mk/cc.mk $(DIRS-y)
all: $(DIRS-y)
clean: $(DIRS-y)
$(Q)rm -f include/spdk/config.h
$(Q)rm -rf build
$(Q)rm -f mk/cc.mk
$(Q)rm -f config.h
install: all
$(Q)echo "Installed to $(DESTDIR)$(CONFIG_PREFIX)"
uninstall: $(DIRS-y)
$(Q)echo "Uninstalled spdk"
ifneq ($(SKIP_DPDK_BUILD),1)
dpdkdeps $(DPDK_DEPS): $(WPDK)
dpdkbuild: $(WPDK) $(DPDK_DEPS)
endif
lib: $(WPDK) $(DPDKBUILD) $(VFIOUSERBUILD) $(XNVMEBUILD) $(ISALBUILD) $(ISALCRYPTOBUILD)
module: lib
shared_lib: module
app: $(LIB)
test: $(LIB)
examples: $(LIB)
lib: $(DPDKBUILD)
app: lib
test: lib
examples: lib
pkgdep:
sh ./scripts/pkgdep.sh
$(DIRS-y): mk/cc.mk build_dir include/spdk/config.h
$(DIRS-y): mk/cc.mk config.h
mk/cc.mk:
$(Q)echo "Please run configure prior to make"
false
build_dir: mk/cc.mk
$(Q)mkdir -p build/lib/pkgconfig/tmp
$(Q)mkdir -p build/bin
$(Q)mkdir -p build/fio
$(Q)mkdir -p build/examples
$(Q)mkdir -p build/include/spdk
include/spdk/config.h: mk/config.mk scripts/genconfig.py
$(Q)echo "#ifndef SPDK_CONFIG_H" > $@.tmp; \
echo "#define SPDK_CONFIG_H" >> $@.tmp; \
scripts/genconfig.py $(MAKEFLAGS) >> $@.tmp; \
echo "#endif /* SPDK_CONFIG_H */" >> $@.tmp; \
$(Q)scripts/detect_cc.sh --cc=$(CC) --cxx=$(CXX) --lto=$(CONFIG_LTO) > $@.tmp; \
cmp -s $@.tmp $@ || mv $@.tmp $@ ; \
rm -f $@.tmp
cc_version: mk/cc.mk
$(Q)echo "SPDK using CC=$(CC)"; $(CC) -v
cxx_version: mk/cc.mk
$(Q)echo "SPDK using CXX=$(CXX)"; $(CXX) -v
.libs_only_other:
$(Q)echo -n '$(SYS_LIBS) '
$(Q)if [ "$(CONFIG_SHARED)" = "y" ]; then \
echo -n '-lspdk '; \
fi
.ldflags:
$(Q)echo -n '$(LDFLAGS) '
ldflags: .ldflags .libs_only_other
$(Q)echo ''
config.h: CONFIG CONFIG.local scripts/genconfig.py
$(Q)PYCMD=$$(cat PYTHON_COMMAND 2>/dev/null) ; \
test -z "$$PYCMD" && PYCMD=python ; \
$$PYCMD scripts/genconfig.py $(MAKEFLAGS) > $@.tmp; \
cmp -s $@.tmp $@ || mv $@.tmp $@ ; \
rm -f $@.tmp
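Both generation rules above use the same cmp/mv idiom: write to a temp file, then replace the target only if the content actually differs, so an unchanged header keeps its timestamp and make does not rebuild its dependents. A minimal standalone sketch of the pattern (`gen` and `out.h` are illustrative stand-ins for genconfig.py/detect_cc.sh and the generated header):

```shell
# Illustrative names: gen stands in for the generator script,
# out.h for the generated header.
gen() { printf '#define SPDK_CONFIG_EXAMPLE 1\n'; }

gen > out.h.tmp
# Replace out.h only when the generated content differs; cmp -s is
# silent and exits non-zero on a difference or a missing file.
cmp -s out.h.tmp out.h || mv out.h.tmp out.h
rm -f out.h.tmp
cat out.h
```

Running it a second time leaves `out.h` (and its mtime) untouched, which is exactly why the Makefile rules bother with the temp file.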
include $(SPDK_ROOT_DIR)/mk/spdk.subdirs.mk

README.md

@@ -2,11 +2,6 @@
[![Build Status](https://travis-ci.org/spdk/spdk.svg?branch=master)](https://travis-ci.org/spdk/spdk)
NOTE: The SPDK mailing list has moved to a new location. Please visit
[this URL](https://lists.linuxfoundation.org/mailman/listinfo/spdk) to subscribe
at the new location. Subscribers from the old location will not be automatically
migrated to the new location.
The Storage Performance Development Kit ([SPDK](http://www.spdk.io)) provides a set of tools
and libraries for writing high performance, scalable, user-mode storage
applications. It achieves high performance by moving all of the necessary
@@ -15,7 +10,6 @@ interrupts, which avoids kernel context switches and eliminates interrupt
handling overhead.
The development kit currently includes:
* [NVMe driver](http://www.spdk.io/doc/nvme.html)
* [I/OAT (DMA engine) driver](http://www.spdk.io/doc/ioat.html)
* [NVMe over Fabrics target](http://www.spdk.io/doc/nvmf.html)
@@ -23,7 +17,7 @@ The development kit currently includes:
* [vhost target](http://www.spdk.io/doc/vhost.html)
* [Virtio-SCSI driver](http://www.spdk.io/doc/virtio.html)
## In this readme
# In this readme:
* [Documentation](#documentation)
* [Prerequisites](#prerequisites)
@@ -31,9 +25,7 @@ The development kit currently includes:
* [Build](#libraries)
* [Unit Tests](#tests)
* [Vagrant](#vagrant)
* [AWS](#aws)
* [Advanced Build Options](#advanced)
* [Shared libraries](#shared)
* [Hugepages and Device Binding](#huge)
* [Example Code](#examples)
* [Contributing](#contributing)
@@ -58,9 +50,6 @@ git submodule update --init
## Prerequisites
The dependencies can be installed automatically by `scripts/pkgdep.sh`.
The `scripts/pkgdep.sh` script will automatically install the bare minimum
dependencies required to build SPDK.
Use `--help` to see information on installing dependencies for optional components
~~~{.sh}
./scripts/pkgdep.sh
@@ -90,7 +79,7 @@ gmake
## Unit Tests
~~~{.sh}
./test/unit/unittest.sh
./unittest.sh
~~~
You will see several error messages when running the unit tests, but they are
@@ -102,43 +91,32 @@ success or failure.
A [Vagrant](https://www.vagrantup.com/downloads.html) setup is also provided
to create a Linux VM with a virtual NVMe controller to get up and running
quickly. Currently this has been tested on MacOS, Ubuntu 16.04.2 LTS and
Ubuntu 18.04.3 LTS with the VirtualBox and Libvirt provider.
The [VirtualBox Extension Pack](https://www.virtualbox.org/wiki/Downloads)
or [Vagrant Libvirt](https://github.com/vagrant-libvirt/vagrant-libvirt) must
quickly. Currently this has only been tested on MacOS and Ubuntu 16.04.2 LTS
with the [VirtualBox](https://www.virtualbox.org/wiki/Downloads) provider. The
[VirtualBox Extension Pack](https://www.virtualbox.org/wiki/Downloads) must
also be installed in order to get the required NVMe support.
Details on the Vagrant setup can be found in the
[SPDK Vagrant documentation](http://spdk.io/doc/vagrant.html).
<a id="aws"></a>
## AWS
The following setup is known to work on AWS:
Image: Ubuntu 18.04
Before running `setup.sh`, run `modprobe vfio-pci`
then: `DRIVER_OVERRIDE=vfio-pci ./setup.sh`
<a id="advanced"></a>
## Advanced Build Options
Optional components and other build-time configuration are controlled by
settings in the Makefile configuration file in the root of the repository. `CONFIG`
contains the base settings for the `configure` script. This script generates a new
file, `mk/config.mk`, that contains final build settings. For advanced configuration,
there are a number of additional options to `configure` that may be used, or
`mk/config.mk` can simply be created and edited by hand. A description of all
possible options is located in `CONFIG`.
settings in two Makefile fragments in the root of the repository. `CONFIG`
contains the base settings. Running the `configure` script generates a new
file, `CONFIG.local`, that contains overrides to the base `CONFIG` file. For
advanced configuration, there are a number of additional options to `configure`
that may be used, or `CONFIG.local` can simply be created and edited by hand. A
description of all possible options is located in `CONFIG`.
Boolean (on/off) options are configured with a 'y' (yes) or 'n' (no). For
example, this line of `CONFIG` controls whether the optional RDMA (libibverbs)
support is enabled:
~~~{.sh}
CONFIG_RDMA?=n
~~~
To enable RDMA, this line may be added to `mk/config.mk` with a 'y' instead of
To enable RDMA, this line may be added to `CONFIG.local` with a 'y' instead of
'n'. For the majority of options this can be done using the `configure` script.
For example:
@@ -146,7 +124,7 @@ For example:
./configure --with-rdma
~~~
Additionally, `CONFIG` options may also be overridden on the `make` command
Additionally, `CONFIG` options may also be overrriden on the `make` command
line:
~~~{.sh}
@@ -154,10 +132,8 @@ make CONFIG_RDMA=y
~~~
Users may wish to use a version of DPDK different from the submodule included
in the SPDK repository. Note, this includes the ability to build not only
from DPDK sources, but also just with the includes and libraries
installed via the dpdk and dpdk-devel packages. To specify an alternate DPDK
installation, run configure with the --with-dpdk option. For example:
in the SPDK repository. To specify an alternate DPDK installation, run
configure with the --with-dpdk option. For example:
Linux:
@@ -174,40 +150,10 @@ gmake
~~~
The options specified on the `make` command line take precedence over the
values in `mk/config.mk`. This can be useful if you, for example, generate
a `mk/config.mk` using the `configure` script and then have one or two
options (i.e. debug builds) that you wish to turn on and off frequently.
<a id="shared"></a>
## Shared libraries
By default, the build of the SPDK yields static libraries against which
the SPDK applications and examples are linked.
Configure option `--with-shared` provides the ability to produce SPDK shared
libraries, in addition to the default static ones. Use of this flag also
results in the SPDK executables linked to the shared versions of libraries.
SPDK shared libraries by default, are located in `./build/lib`. This includes
the single SPDK shared lib encompassing all of the SPDK static libs
(`libspdk.so`) as well as individual SPDK shared libs corresponding to each
of the SPDK static ones.
In order to start a SPDK app linked with SPDK shared libraries, make sure
to do the following steps:
- run ldconfig specifying the directory containing SPDK shared libraries
- provide proper `LD_LIBRARY_PATH`
If DPDK shared libraries are used, you may also need to add DPDK shared
libraries to `LD_LIBRARY_PATH`
Linux:
~~~{.sh}
./configure --with-shared
make
ldconfig -v -n ./build/lib
LD_LIBRARY_PATH=./build/lib/:./dpdk/build/lib/ ./build/bin/spdk_tgt
~~~
default values in `CONFIG` and `CONFIG.local`. This can be useful if you, for
example, generate a `CONFIG.local` using the `configure` script and then have
one or two options (i.e. debug builds) that you wish to turn on and off
frequently.
<a id="huge"></a>
## Hugepages and Device Binding
@@ -228,13 +174,6 @@ configuring 8192MB memory.
sudo HUGEMEM=8192 scripts/setup.sh
~~~
There are a lot of other environment variables that can be set to configure
setup.sh for advanced users. To see the full list, run:
~~~{.sh}
scripts/setup.sh --help
~~~
<a id="examples"></a>
## Example Code
@@ -249,5 +188,5 @@ vfio.
## Contributing
For additional details on how to get more involved in the community, including
[contributing code](http://www.spdk.io/development) and participating in discussions and other activities, please
[contributing code](http://www.spdk.io/development) and participating in discussions and other activiites, please
refer to [spdk.io](http://www.spdk.io/community)


@@ -1,4 +0,0 @@
# Security Policy
The SPDK community has a documented CVE process [here](https://spdk.io/cve_threat/) that describes
both how to report a potential security issue as well as who to contact for more information.


@@ -1,24 +1,45 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
DIRS-y += trace
DIRS-y += trace_record
DIRS-y += nvmf_tgt
DIRS-y += iscsi_top
DIRS-y += iscsi_tgt
DIRS-y += spdk_tgt
DIRS-y += spdk_lspci
ifneq ($(OS),Windows)
# TODO - currently disabled on Windows due to lack of support for curses
DIRS-y += spdk_top
endif
ifeq ($(OS),Linux)
DIRS-$(CONFIG_VHOST) += vhost
DIRS-y += spdk_dd
endif
.PHONY: all clean $(DIRS-y)


@@ -1,10 +1,39 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2016 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = iscsi_tgt
@@ -15,20 +44,25 @@ CFLAGS += -I$(SPDK_ROOT_DIR)/lib
C_SRCS := iscsi_tgt.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_iscsi
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
SPDK_LIB_LIST = event_bdev event_copy event_iscsi event_net event_scsi
SPDK_LIB_LIST += jsonrpc json rpc bdev_rpc bdev iscsi scsi net copy trace conf
SPDK_LIB_LIST += util log log_rpc event app_rpc
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS)
LIBS += $(SPDK_LIB_LINKER_ARGS) -lcrypto
LIBS += $(ENV_LINKER_ARGS)
install: $(APP)
$(INSTALL_APP)
all : $(APP)
uninstall:
$(UNINSTALL_APP)
$(APP) : $(OBJS) $(SPDK_LIB_FILES) $(ENV_LIBS) $(BLOCKDEV_MODULES_FILES) $(COPY_MODULES_FILES)
$(LINK_C)
clean :
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@@ -1,6 +1,34 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2016 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
@@ -9,9 +37,25 @@
#include "spdk/event.h"
#include "iscsi/iscsi.h"
#include "spdk/log.h"
#include "spdk/net.h"
static int g_daemon_mode = 0;
static void
spdk_sigusr1(int signo __attribute__((__unused__)))
{
char *config_str = NULL;
if (spdk_app_get_running_config(&config_str, "iscsi.conf") < 0) {
fprintf(stderr, "Error getting config\n");
} else {
fprintf(stdout, "============================\n");
fprintf(stdout, " iSCSI target running config\n");
fprintf(stdout, "=============================\n");
fprintf(stdout, "%s", config_str);
}
free(config_str);
}
static void
iscsi_usage(void)
{
@@ -19,7 +63,7 @@ iscsi_usage(void)
}
static void
spdk_startup(void *arg1)
spdk_startup(void *arg1, void *arg2)
{
if (getenv("MEMZONE_DUMP") != NULL) {
spdk_memzone_dump(stdout);
@@ -27,7 +71,7 @@ spdk_startup(void *arg1)
}
}
static int
static void
iscsi_parse_arg(int ch, char *arg)
{
switch (ch) {
@@ -35,9 +79,9 @@ iscsi_parse_arg(int ch, char *arg)
g_daemon_mode = 1;
break;
default:
return -EINVAL;
assert(false);
break;
}
return 0;
}
int
@@ -46,28 +90,24 @@ main(int argc, char **argv)
int rc;
struct spdk_app_opts opts = {};
spdk_app_opts_init(&opts, sizeof(opts));
spdk_app_opts_init(&opts);
opts.config_file = SPDK_ISCSI_DEFAULT_CONFIG;
opts.name = "iscsi";
if ((rc = spdk_app_parse_args(argc, argv, &opts, "b", NULL,
iscsi_parse_arg, iscsi_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
exit(rc);
}
spdk_app_parse_args(argc, argv, &opts, "b", iscsi_parse_arg, iscsi_usage);
if (g_daemon_mode) {
if (daemon(1, 0) < 0) {
SPDK_ERRLOG("Start iscsi target daemon failed.\n");
SPDK_ERRLOG("Start iscsi target daemon faild.\n");
exit(EXIT_FAILURE);
}
}
opts.shutdown_cb = NULL;
opts.usr1_handler = spdk_sigusr1;
printf("Using net framework %s\n", spdk_net_framework_get_name());
/* Blocks until the application is exiting */
rc = spdk_app_start(&opts, spdk_startup, NULL);
if (rc) {
SPDK_ERRLOG("Start iscsi target daemon: spdk_app_start() returned non-zero\n");
}
rc = spdk_app_start(&opts, spdk_startup, NULL, NULL);
spdk_app_fini();
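The old-side code above registers `spdk_sigusr1` as `opts.usr1_handler`, so the target prints its running config when it receives SIGUSR1. A self-contained stand-in for that signal flow (against a real target one would instead signal the running `iscsi_tgt` process; the `pidof` usage in the comment is illustrative):

```shell
# Same flow as opts.usr1_handler above, demonstrated in shell: install a
# SIGUSR1 handler, then deliver the signal. Against a live target:
#   kill -USR1 "$(pidof iscsi_tgt)"    # pidof usage is illustrative
trap 'echo "=== iSCSI target running config ==="' USR1
kill -USR1 $$   # signal this shell; the trap handler runs after kill returns
```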

app/iscsi_top/.gitignore

@@ -0,0 +1 @@
iscsi_top

app/iscsi_top/Makefile

@@ -0,0 +1,52 @@
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
CXXFLAGS += $(ENV_CXXFLAGS)
CXXFLAGS += -I$(SPDK_ROOT_DIR)/lib
CXX_SRCS = iscsi_top.cpp
APP = iscsi_top
all: $(APP)
$(APP) : $(OBJS)
$(LINK_CXX)
clean:
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk

app/iscsi_top/iscsi_top.cpp

@@ -0,0 +1,251 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include <algorithm>
#include <map>
#include <vector>
extern "C" {
#include "spdk/trace.h"
#include "iscsi/conn.h"
}
static char *exe_name;
static int g_shm_id = 0;
static void usage(void)
{
fprintf(stderr, "usage:\n");
fprintf(stderr, " %s <option>\n", exe_name);
fprintf(stderr, " option = '-i' to specify the shared memory ID,"
" (required)\n");
}
static bool
conns_compare(struct spdk_iscsi_conn *first, struct spdk_iscsi_conn *second)
{
if (first->lcore < second->lcore) {
return true;
}
if (first->lcore > second->lcore) {
return false;
}
if (first->id < second->id) {
return true;
}
return false;
}
static void
print_connections(void)
{
std::vector<struct spdk_iscsi_conn *> v;
std::vector<struct spdk_iscsi_conn *>::iterator iter;
size_t conns_size;
struct spdk_iscsi_conn *conns, *conn;
void *conns_ptr;
int fd, i;
char shm_name[64];
snprintf(shm_name, sizeof(shm_name), "/spdk_iscsi_conns.%d", g_shm_id);
fd = shm_open(shm_name, O_RDONLY, 0600);
if (fd < 0) {
fprintf(stderr, "Cannot open shared memory: %s\n", shm_name);
usage();
exit(1);
}
conns_size = sizeof(*conns) * MAX_ISCSI_CONNECTIONS;
conns_ptr = mmap(NULL, conns_size, PROT_READ, MAP_SHARED, fd, 0);
if (conns_ptr == MAP_FAILED) {
fprintf(stderr, "Cannot mmap shared memory\n");
exit(1);
}
conns = (struct spdk_iscsi_conn *)conns_ptr;
for (i = 0; i < MAX_ISCSI_CONNECTIONS; i++) {
if (!conns[i].is_valid) {
continue;
}
v.push_back(&conns[i]);
}
stable_sort(v.begin(), v.end(), conns_compare);
for (iter = v.begin(); iter != v.end(); iter++) {
conn = *iter;
printf("lcore %2d conn %3d T:%-8s I:%s (%s)\n",
conn->lcore, conn->id,
conn->target_short_name, conn->initiator_name,
conn->initiator_addr);
}
printf("\n");
munmap(conns, conns_size);
close(fd);
}
int main(int argc, char **argv)
{
void *history_ptr;
struct spdk_trace_histories *histories;
struct spdk_trace_history *history;
uint64_t tasks_done, last_tasks_done[SPDK_TRACE_MAX_LCORE];
int delay, old_delay, history_fd, i, quit, rc;
int tasks_done_delta, tasks_done_per_sec;
int total_tasks_done_per_sec;
struct timeval timeout;
fd_set fds;
char ch;
struct termios oldt, newt;
char spdk_trace_shm_name[64];
int op;
exe_name = argv[0];
while ((op = getopt(argc, argv, "i:")) != -1) {
switch (op) {
case 'i':
g_shm_id = atoi(optarg);
break;
default:
usage();
exit(1);
}
}
snprintf(spdk_trace_shm_name, sizeof(spdk_trace_shm_name), "/iscsi_trace.%d", g_shm_id);
history_fd = shm_open(spdk_trace_shm_name, O_RDONLY, 0600);
if (history_fd < 0) {
fprintf(stderr, "Unable to open history shm %s\n", spdk_trace_shm_name);
usage();
exit(1);
}
history_ptr = mmap(NULL, sizeof(*histories), PROT_READ, MAP_SHARED, history_fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Unable to mmap history shm\n");
exit(1);
}
histories = (struct spdk_trace_histories *)history_ptr;
memset(last_tasks_done, 0, sizeof(last_tasks_done));
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = &histories->per_lcore_history[i];
last_tasks_done[i] = history->tpoint_count[TRACE_ISCSI_TASK_DONE];
}
delay = 1;
quit = 0;
tcgetattr(0, &oldt);
newt = oldt;
newt.c_lflag &= ~(ICANON);
tcsetattr(0, TCSANOW, &newt);
while (1) {
FD_ZERO(&fds);
FD_SET(0, &fds);
timeout.tv_sec = delay;
timeout.tv_usec = 0;
rc = select(2, &fds, NULL, NULL, &timeout);
if (rc > 0) {
if (read(0, &ch, 1) != 1) {
fprintf(stderr, "Read error on stdin\n");
goto cleanup;
}
printf("\b");
switch (ch) {
case 'd':
printf("Enter num seconds to delay (1-10): ");
old_delay = delay;
rc = scanf("%d", &delay);
if (rc != 1) {
fprintf(stderr, "Illegal delay value\n");
delay = old_delay;
} else if (delay < 1 || delay > 10) {
delay = 1;
}
break;
case 'q':
quit = 1;
break;
default:
fprintf(stderr, "'%c' not recognized\n", ch);
break;
}
if (quit == 1) {
break;
}
}
printf("\e[1;1H\e[2J");
print_connections();
printf("lcore tasks\n");
printf("=============\n");
total_tasks_done_per_sec = 0;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = &histories->per_lcore_history[i];
tasks_done = history->tpoint_count[TRACE_ISCSI_TASK_DONE];
tasks_done_delta = tasks_done - last_tasks_done[i];
if (tasks_done_delta == 0) {
continue;
}
last_tasks_done[i] = tasks_done;
tasks_done_per_sec = tasks_done_delta / delay;
printf("%5d %7d\n", history->lcore, tasks_done_per_sec);
total_tasks_done_per_sec += tasks_done_per_sec;
}
printf("Total %7d\n", total_tasks_done_per_sec);
}
cleanup:
tcsetattr(0, TCSANOW, &oldt);
munmap(history_ptr, sizeof(*histories));
close(history_fd);
return (0);
}
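The refresh loop above turns the cumulative TRACE_ISCSI_TASK_DONE counters into a per-interval rate: `rate = (tasks_done - last_tasks_done) / delay`. A stand-in for that arithmetic with made-up counter values, plus a typical invocation assuming the target was started with shared memory ID 0:

```shell
# Typical invocation (assumes an iscsi_tgt running with shm ID 0):
#   ./app/iscsi_top/iscsi_top -i 0
# Rate arithmetic used in the refresh loop, with made-up counters:
last=1000      # tasks done at the previous refresh
now=1600       # cumulative tasks done at this refresh
delay=2        # seconds between refreshes
rate=$(( (now - last) / delay ))
echo "tasks/sec: $rate"
```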


@@ -1,30 +1,63 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2016 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = nvmf_tgt
C_SRCS := nvmf_main.c
C_SRCS := conf.c nvmf_main.c nvmf_tgt.c nvmf_rpc.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_nvmf
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
SPDK_LIB_LIST = event_bdev event_copy
SPDK_LIB_LIST += nvmf event log trace conf util bdev copy rpc jsonrpc json
SPDK_LIB_LIST += app_rpc log_rpc bdev_rpc
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
SPDK_LIB_LIST += event_nbd nbd
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS) \
$(SPDK_LIB_LINKER_ARGS) $(ENV_LINKER_ARGS)
install: $(APP)
$(INSTALL_APP)
all : $(APP)
uninstall:
$(UNINSTALL_APP)
$(APP) : $(OBJS) $(SPDK_LIB_FILES) $(SPDK_WHOLE_LIBS) $(BLOCKDEV_MODULES_FILES) $(LINKER_MODULES) $(ENV_LIBS)
$(LINK_C)
clean :
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk

app/nvmf_tgt/conf.c

@@ -0,0 +1,424 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "nvmf_tgt.h"
#include "spdk/conf.h"
#include "spdk/log.h"
#include "spdk/bdev.h"
#include "spdk/nvme.h"
#include "spdk/nvmf.h"
#include "spdk/string.h"
#include "spdk/util.h"
#define MAX_LISTEN_ADDRESSES 255
#define MAX_HOSTS 255
#define MAX_NAMESPACES 255
#define PORTNUMSTRLEN 32
#define SPDK_NVMF_DEFAULT_SIN_PORT ((uint16_t)4420)
#define ACCEPT_TIMEOUT_US 10000 /* 10ms */
struct spdk_nvmf_probe_ctx {
struct spdk_nvmf_subsystem *subsystem;
bool any;
bool found;
struct spdk_nvme_transport_id trid;
};
#define MAX_STRING_LEN 255
struct spdk_nvmf_tgt_conf g_spdk_nvmf_tgt_conf;
static int
spdk_add_nvmf_discovery_subsystem(void)
{
struct spdk_nvmf_subsystem *subsystem;
subsystem = nvmf_tgt_create_subsystem(SPDK_NVMF_DISCOVERY_NQN, SPDK_NVMF_SUBTYPE_DISCOVERY, 0);
if (subsystem == NULL) {
SPDK_ERRLOG("Failed creating discovery nvmf library subsystem\n");
return -1;
}
spdk_nvmf_subsystem_set_allow_any_host(subsystem, true);
return 0;
}
static void
spdk_nvmf_read_config_file_params(struct spdk_conf_section *sp,
struct spdk_nvmf_tgt_opts *opts)
{
int max_queue_depth;
int max_queues_per_sess;
int in_capsule_data_size;
int max_io_size;
int acceptor_poll_rate;
max_queue_depth = spdk_conf_section_get_intval(sp, "MaxQueueDepth");
if (max_queue_depth >= 0) {
opts->max_queue_depth = max_queue_depth;
}
max_queues_per_sess = spdk_conf_section_get_intval(sp, "MaxQueuesPerSession");
if (max_queues_per_sess >= 0) {
opts->max_qpairs_per_ctrlr = max_queues_per_sess;
}
in_capsule_data_size = spdk_conf_section_get_intval(sp, "InCapsuleDataSize");
if (in_capsule_data_size >= 0) {
opts->in_capsule_data_size = in_capsule_data_size;
}
max_io_size = spdk_conf_section_get_intval(sp, "MaxIOSize");
if (max_io_size >= 0) {
opts->max_io_size = max_io_size;
}
acceptor_poll_rate = spdk_conf_section_get_intval(sp, "AcceptorPollRate");
if (acceptor_poll_rate >= 0) {
g_spdk_nvmf_tgt_conf.acceptor_poll_rate = acceptor_poll_rate;
}
}
static int
spdk_nvmf_parse_nvmf_tgt(void)
{
struct spdk_conf_section *sp;
struct spdk_nvmf_tgt_opts opts;
int rc;
spdk_nvmf_tgt_opts_init(&opts);
g_spdk_nvmf_tgt_conf.acceptor_poll_rate = ACCEPT_TIMEOUT_US;
sp = spdk_conf_find_section(NULL, "Nvmf");
if (sp != NULL) {
spdk_nvmf_read_config_file_params(sp, &opts);
}
g_tgt.tgt = spdk_nvmf_tgt_create(&opts);
if (!g_tgt.tgt) {
SPDK_ERRLOG("spdk_nvmf_tgt_create() failed\n");
return -1;
}
rc = spdk_add_nvmf_discovery_subsystem();
if (rc != 0) {
SPDK_ERRLOG("spdk_add_nvmf_discovery_subsystem failed\n");
return rc;
}
return 0;
}
static int
spdk_nvmf_parse_subsystem(struct spdk_conf_section *sp)
{
const char *nqn, *mode;
size_t i;
int ret;
int lcore;
int num_listen_addrs;
struct rpc_listen_address listen_addrs[MAX_LISTEN_ADDRESSES] = {};
char *listen_addrs_str[MAX_LISTEN_ADDRESSES] = {};
int num_hosts;
char *hosts[MAX_HOSTS];
bool allow_any_host;
const char *sn;
size_t num_ns;
struct spdk_nvmf_ns_params ns_list[MAX_NAMESPACES] = {};
struct spdk_nvmf_subsystem *subsystem;
nqn = spdk_conf_section_get_val(sp, "NQN");
mode = spdk_conf_section_get_val(sp, "Mode");
lcore = spdk_conf_section_get_intval(sp, "Core");
/* Mode is no longer a valid parameter, but print out a nice
* message if it exists to inform users.
*/
if (mode) {
SPDK_NOTICELOG("Mode present in the [Subsystem] section of the config file.\n"
"Mode was removed as a valid parameter.\n");
if (strcasecmp(mode, "Virtual") == 0) {
SPDK_NOTICELOG("Your mode value is 'Virtual' which is now the only possible mode.\n"
"Your configuration file will work as expected.\n");
} else {
SPDK_NOTICELOG("Please remove Mode from your configuration file.\n");
return -1;
}
}
/* Core is no longer a valid parameter, but print out a nice
* message if it exists to inform users.
*/
if (lcore >= 0) {
SPDK_NOTICELOG("Core present in the [Subsystem] section of the config file.\n"
"Core was removed as an option. Subsystems can now run on all available cores.\n");
SPDK_NOTICELOG("Please remove Core from your configuration file. Ignoring it and continuing.\n");
}
/* Parse Listen sections */
num_listen_addrs = 0;
for (i = 0; i < MAX_LISTEN_ADDRESSES; i++) {
listen_addrs[num_listen_addrs].transport =
spdk_conf_section_get_nmval(sp, "Listen", i, 0);
if (!listen_addrs[num_listen_addrs].transport) {
break;
}
listen_addrs_str[i] = spdk_conf_section_get_nmval(sp, "Listen", i, 1);
if (!listen_addrs_str[i]) {
break;
}
listen_addrs_str[i] = strdup(listen_addrs_str[i]);
ret = spdk_parse_ip_addr(listen_addrs_str[i], &listen_addrs[num_listen_addrs].traddr,
&listen_addrs[num_listen_addrs].trsvcid);
if (ret < 0) {
SPDK_ERRLOG("Unable to parse listen address '%s'\n", listen_addrs_str[i]);
free(listen_addrs_str[i]);
listen_addrs_str[i] = NULL;
continue;
}
if (strchr(listen_addrs[num_listen_addrs].traddr, ':')) {
listen_addrs[num_listen_addrs].adrfam = "IPv6";
} else {
listen_addrs[num_listen_addrs].adrfam = "IPv4";
}
num_listen_addrs++;
}
/* Parse Host sections */
for (i = 0; i < MAX_HOSTS; i++) {
hosts[i] = spdk_conf_section_get_nval(sp, "Host", i);
if (!hosts[i]) {
break;
}
}
num_hosts = i;
allow_any_host = spdk_conf_section_get_boolval(sp, "AllowAnyHost", false);
sn = spdk_conf_section_get_val(sp, "SN");
num_ns = 0;
for (i = 0; i < SPDK_COUNTOF(ns_list); i++) {
char *nsid_str;
ns_list[i].bdev_name = spdk_conf_section_get_nmval(sp, "Namespace", i, 0);
if (!ns_list[i].bdev_name) {
break;
}
nsid_str = spdk_conf_section_get_nmval(sp, "Namespace", i, 1);
if (nsid_str) {
char *end;
unsigned long nsid_ul = strtoul(nsid_str, &end, 0);
if (*end != '\0' || nsid_ul == 0 || nsid_ul >= UINT32_MAX) {
SPDK_ERRLOG("Invalid NSID %s\n", nsid_str);
return -1;
}
ns_list[i].nsid = (uint32_t)nsid_ul;
} else {
/* Automatically assign the next available NSID. */
ns_list[i].nsid = 0;
}
num_ns++;
}
subsystem = spdk_nvmf_construct_subsystem(nqn,
num_listen_addrs, listen_addrs,
num_hosts, hosts, allow_any_host,
sn,
num_ns, ns_list);
for (i = 0; i < MAX_LISTEN_ADDRESSES; i++) {
free(listen_addrs_str[i]);
}
return (subsystem != NULL) ? 0 : -1;
}
static int
spdk_nvmf_parse_subsystems(void)
{
int rc = 0;
struct spdk_conf_section *sp;
sp = spdk_conf_first_section(NULL);
while (sp != NULL) {
if (spdk_conf_section_match_prefix(sp, "Subsystem")) {
rc = spdk_nvmf_parse_subsystem(sp);
if (rc < 0) {
return -1;
}
}
sp = spdk_conf_next_section(sp);
}
return 0;
}
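For reference, a config file fragment of the kind the parsers above consume. Section and key names are taken from the code ([Nvmf] options read by spdk_nvmf_read_config_file_params(), [Subsystem] keys read by spdk_nvmf_parse_subsystem()); all values here are made up:

```
[Nvmf]
  MaxQueueDepth 128
  MaxQueuesPerSession 4
  InCapsuleDataSize 4096
  MaxIOSize 131072
  AcceptorPollRate 10000

[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode1
  Listen RDMA 192.168.1.10:4420
  Host nqn.2016-06.io.spdk:host1
  AllowAnyHost No
  SN SPDK00000000000001
  Namespace Malloc0 1
```

The second Listen token is split by spdk_parse_ip_addr() into traddr and trsvcid, and the optional second Namespace token is the NSID.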
int
spdk_nvmf_parse_conf(void)
{
int rc;
/* NVMf section */
rc = spdk_nvmf_parse_nvmf_tgt();
if (rc < 0) {
return rc;
}
/* Subsystem sections */
rc = spdk_nvmf_parse_subsystems();
if (rc < 0) {
return rc;
}
return 0;
}
struct spdk_nvmf_subsystem *
spdk_nvmf_construct_subsystem(const char *name,
int num_listen_addresses, struct rpc_listen_address *addresses,
int num_hosts, char *hosts[], bool allow_any_host,
const char *sn, size_t num_ns, struct spdk_nvmf_ns_params *ns_list)
{
struct spdk_nvmf_subsystem *subsystem;
int i, rc;
size_t j;
struct spdk_bdev *bdev;
if (name == NULL) {
SPDK_ERRLOG("No NQN specified for subsystem\n");
return NULL;
}
if (num_listen_addresses > MAX_LISTEN_ADDRESSES) {
SPDK_ERRLOG("invalid number of listen addresses\n");
return NULL;
}
if (num_hosts > MAX_HOSTS) {
SPDK_ERRLOG("invalid number of hosts\n");
return NULL;
}
subsystem = nvmf_tgt_create_subsystem(name, SPDK_NVMF_SUBTYPE_NVME, num_ns);
if (subsystem == NULL) {
SPDK_ERRLOG("Subsystem creation failed\n");
return NULL;
}
/* Parse Listen sections */
for (i = 0; i < num_listen_addresses; i++) {
struct spdk_nvme_transport_id trid = {};
if (spdk_nvme_transport_id_parse_trtype(&trid.trtype, addresses[i].transport)) {
SPDK_ERRLOG("Missing listen address transport type\n");
goto error;
}
if (spdk_nvme_transport_id_parse_adrfam(&trid.adrfam, addresses[i].adrfam)) {
trid.adrfam = SPDK_NVMF_ADRFAM_IPV4;
}
snprintf(trid.traddr, sizeof(trid.traddr), "%s", addresses[i].traddr);
snprintf(trid.trsvcid, sizeof(trid.trsvcid), "%s", addresses[i].trsvcid);
rc = spdk_nvmf_tgt_listen(g_tgt.tgt, &trid);
if (rc) {
SPDK_ERRLOG("Failed to listen on transport %s, adrfam %s, traddr %s, trsvcid %s\n",
addresses[i].transport,
addresses[i].adrfam,
addresses[i].traddr,
addresses[i].trsvcid);
goto error;
}
spdk_nvmf_subsystem_add_listener(subsystem, &trid);
}
/* Parse Host sections */
for (i = 0; i < num_hosts; i++) {
spdk_nvmf_subsystem_add_host(subsystem, hosts[i]);
}
spdk_nvmf_subsystem_set_allow_any_host(subsystem, allow_any_host);
if (sn == NULL) {
SPDK_ERRLOG("Subsystem %s: missing serial number\n", name);
goto error;
}
if (spdk_nvmf_subsystem_set_sn(subsystem, sn)) {
SPDK_ERRLOG("Subsystem %s: invalid serial number '%s'\n", name, sn);
goto error;
}
for (j = 0; j < num_ns; j++) {
struct spdk_nvmf_ns_params *ns_params = &ns_list[j];
if (!ns_params->bdev_name) {
SPDK_ERRLOG("Namespace missing bdev name\n");
goto error;
}
bdev = spdk_bdev_get_by_name(ns_params->bdev_name);
if (bdev == NULL) {
SPDK_ERRLOG("Could not find namespace bdev '%s'\n", ns_params->bdev_name);
goto error;
}
if (spdk_nvmf_subsystem_add_ns(subsystem, bdev, ns_params->nsid) == 0) {
goto error;
}
SPDK_NOTICELOG("Attaching block device %s to subsystem %s\n",
spdk_bdev_get_name(bdev), spdk_nvmf_subsystem_get_nqn(subsystem));
}
return subsystem;
error:
spdk_nvmf_subsystem_destroy(subsystem);
return NULL;
}

@@ -1,31 +1,54 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "nvmf_tgt.h"
#include "spdk/event.h"
#include "spdk/log.h"
#define SPDK_NVMF_BUILD_ETC "/usr/local/etc/nvmf"
#define SPDK_NVMF_DEFAULT_CONFIG SPDK_NVMF_BUILD_ETC "/nvmf.conf"
static void
nvmf_usage(void)
{
}
static void
nvmf_parse_arg(int ch, char *arg)
{
}
static void
nvmf_tgt_started(void *arg1)
{
if (getenv("MEMZONE_DUMP") != NULL) {
spdk_memzone_dump(stdout);
fflush(stdout);
}
}
int
@@ -35,16 +58,13 @@ main(int argc, char **argv)
struct spdk_app_opts opts = {};
/* default value in opts */
spdk_app_opts_init(&opts);
opts.name = "nvmf";
opts.config_file = SPDK_NVMF_DEFAULT_CONFIG;
opts.max_delay_us = 1000; /* 1 ms */
spdk_app_parse_args(argc, argv, &opts, "", nvmf_parse_arg, nvmf_usage);
/* Blocks until the application is exiting */
rc = spdk_app_start(&opts, nvmf_tgt_started, NULL);
spdk_app_fini();
return rc;
}

app/nvmf_tgt/nvmf_rpc.c (new file, 495 lines)

@@ -0,0 +1,495 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/bdev.h"
#include "spdk/log.h"
#include "spdk/rpc.h"
#include "spdk/env.h"
#include "spdk/nvme.h"
#include "spdk/nvmf.h"
#include "spdk/util.h"
#include "nvmf_tgt.h"
static void
dump_nvmf_subsystem(struct spdk_json_write_ctx *w, struct spdk_nvmf_subsystem *subsystem)
{
struct spdk_nvmf_host *host;
struct spdk_nvmf_listener *listener;
spdk_json_write_object_begin(w);
spdk_json_write_name(w, "nqn");
spdk_json_write_string(w, spdk_nvmf_subsystem_get_nqn(subsystem));
spdk_json_write_name(w, "subtype");
if (spdk_nvmf_subsystem_get_type(subsystem) == SPDK_NVMF_SUBTYPE_NVME) {
spdk_json_write_string(w, "NVMe");
} else {
spdk_json_write_string(w, "Discovery");
}
spdk_json_write_name(w, "listen_addresses");
spdk_json_write_array_begin(w);
for (listener = spdk_nvmf_subsystem_get_first_listener(subsystem); listener != NULL;
listener = spdk_nvmf_subsystem_get_next_listener(subsystem, listener)) {
const struct spdk_nvme_transport_id *trid;
const char *trtype;
const char *adrfam;
trid = spdk_nvmf_listener_get_trid(listener);
spdk_json_write_object_begin(w);
trtype = spdk_nvme_transport_id_trtype_str(trid->trtype);
if (trtype == NULL) {
trtype = "unknown";
}
adrfam = spdk_nvme_transport_id_adrfam_str(trid->adrfam);
if (adrfam == NULL) {
adrfam = "unknown";
}
/* NOTE: "transport" is kept for compatibility; new code should use "trtype" */
spdk_json_write_name(w, "transport");
spdk_json_write_string(w, trtype);
spdk_json_write_name(w, "trtype");
spdk_json_write_string(w, trtype);
spdk_json_write_name(w, "adrfam");
spdk_json_write_string(w, adrfam);
spdk_json_write_name(w, "traddr");
spdk_json_write_string(w, trid->traddr);
spdk_json_write_name(w, "trsvcid");
spdk_json_write_string(w, trid->trsvcid);
spdk_json_write_object_end(w);
}
spdk_json_write_array_end(w);
spdk_json_write_name(w, "allow_any_host");
spdk_json_write_bool(w, spdk_nvmf_subsystem_get_allow_any_host(subsystem));
spdk_json_write_name(w, "hosts");
spdk_json_write_array_begin(w);
for (host = spdk_nvmf_subsystem_get_first_host(subsystem); host != NULL;
host = spdk_nvmf_subsystem_get_next_host(subsystem, host)) {
spdk_json_write_object_begin(w);
spdk_json_write_name(w, "nqn");
spdk_json_write_string(w, spdk_nvmf_host_get_nqn(host));
spdk_json_write_object_end(w);
}
spdk_json_write_array_end(w);
if (spdk_nvmf_subsystem_get_type(subsystem) == SPDK_NVMF_SUBTYPE_NVME) {
struct spdk_nvmf_ns *ns;
spdk_json_write_name(w, "serial_number");
spdk_json_write_string(w, spdk_nvmf_subsystem_get_sn(subsystem));
spdk_json_write_name(w, "namespaces");
spdk_json_write_array_begin(w);
for (ns = spdk_nvmf_subsystem_get_first_ns(subsystem); ns != NULL;
ns = spdk_nvmf_subsystem_get_next_ns(subsystem, ns)) {
spdk_json_write_object_begin(w);
spdk_json_write_name(w, "nsid");
spdk_json_write_int32(w, spdk_nvmf_ns_get_id(ns));
spdk_json_write_name(w, "bdev_name");
spdk_json_write_string(w, spdk_bdev_get_name(spdk_nvmf_ns_get_bdev(ns)));
/* NOTE: "name" is kept for compatibility only - new code should use bdev_name. */
spdk_json_write_name(w, "name");
spdk_json_write_string(w, spdk_bdev_get_name(spdk_nvmf_ns_get_bdev(ns)));
spdk_json_write_object_end(w);
}
spdk_json_write_array_end(w);
}
spdk_json_write_object_end(w);
}
static void
spdk_rpc_get_nvmf_subsystems(struct spdk_jsonrpc_request *request,
const struct spdk_json_val *params)
{
struct spdk_json_write_ctx *w;
struct spdk_nvmf_subsystem *subsystem;
if (params != NULL) {
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS,
"get_nvmf_subsystems requires no parameters");
return;
}
w = spdk_jsonrpc_begin_result(request);
if (w == NULL) {
return;
}
spdk_json_write_array_begin(w);
subsystem = spdk_nvmf_subsystem_get_first(g_tgt.tgt);
while (subsystem) {
dump_nvmf_subsystem(w, subsystem);
subsystem = spdk_nvmf_subsystem_get_next(subsystem);
}
spdk_json_write_array_end(w);
spdk_jsonrpc_end_result(request, w);
}
SPDK_RPC_REGISTER("get_nvmf_subsystems", spdk_rpc_get_nvmf_subsystems)
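The dump_nvmf_subsystem() writer above emits output along these lines for an NVMe-type subsystem; all values are illustrative. Note the intentionally duplicated "transport"/"trtype" and "bdev_name"/"name" fields kept for compatibility:

```json
[
  {
    "nqn": "nqn.2016-06.io.spdk:cnode1",
    "subtype": "NVMe",
    "listen_addresses": [
      {
        "transport": "RDMA",
        "trtype": "RDMA",
        "adrfam": "IPv4",
        "traddr": "192.168.1.10",
        "trsvcid": "4420"
      }
    ],
    "allow_any_host": false,
    "hosts": [
      { "nqn": "nqn.2016-06.io.spdk:host1" }
    ],
    "serial_number": "SPDK00000000000001",
    "namespaces": [
      { "nsid": 1, "bdev_name": "Malloc0", "name": "Malloc0" }
    ]
  }
]
```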
#define RPC_MAX_LISTEN_ADDRESSES 255
#define RPC_MAX_HOSTS 255
#define RPC_MAX_NAMESPACES 255
struct rpc_listen_addresses {
size_t num_listen_address;
struct rpc_listen_address addresses[RPC_MAX_LISTEN_ADDRESSES];
};
static const struct spdk_json_object_decoder rpc_listen_address_decoders[] = {
/* NOTE: "transport" is kept for compatibility; new code should use "trtype" */
{"transport", offsetof(struct rpc_listen_address, transport), spdk_json_decode_string, true},
{"trtype", offsetof(struct rpc_listen_address, transport), spdk_json_decode_string, true},
{"adrfam", offsetof(struct rpc_listen_address, adrfam), spdk_json_decode_string, true},
{"traddr", offsetof(struct rpc_listen_address, traddr), spdk_json_decode_string},
{"trsvcid", offsetof(struct rpc_listen_address, trsvcid), spdk_json_decode_string},
};
static int
decode_rpc_listen_address(const struct spdk_json_val *val, void *out)
{
struct rpc_listen_address *req = (struct rpc_listen_address *)out;
if (spdk_json_decode_object(val, rpc_listen_address_decoders,
SPDK_COUNTOF(rpc_listen_address_decoders),
req)) {
SPDK_ERRLOG("spdk_json_decode_object failed\n");
return -1;
}
return 0;
}
static int
decode_rpc_listen_addresses(const struct spdk_json_val *val, void *out)
{
struct rpc_listen_addresses *listen_addresses = out;
return spdk_json_decode_array(val, decode_rpc_listen_address, &listen_addresses->addresses,
RPC_MAX_LISTEN_ADDRESSES,
&listen_addresses->num_listen_address, sizeof(struct rpc_listen_address));
}
struct rpc_hosts {
size_t num_hosts;
char *hosts[RPC_MAX_HOSTS];
};
static int
decode_rpc_hosts(const struct spdk_json_val *val, void *out)
{
struct rpc_hosts *rpc_hosts = out;
return spdk_json_decode_array(val, spdk_json_decode_string, rpc_hosts->hosts, RPC_MAX_HOSTS,
&rpc_hosts->num_hosts, sizeof(char *));
}
struct rpc_namespaces {
size_t num_ns;
struct spdk_nvmf_ns_params ns_params[RPC_MAX_NAMESPACES];
};
static const struct spdk_json_object_decoder rpc_ns_params_decoders[] = {
{"nsid", offsetof(struct spdk_nvmf_ns_params, nsid), spdk_json_decode_uint32, true},
{"bdev_name", offsetof(struct spdk_nvmf_ns_params, bdev_name), spdk_json_decode_string},
};
static void
free_rpc_namespaces(struct rpc_namespaces *r)
{
size_t i;
for (i = 0; i < r->num_ns; i++) {
free(r->ns_params[i].bdev_name);
}
}
static int
decode_rpc_ns_params(const struct spdk_json_val *val, void *out)
{
struct spdk_nvmf_ns_params *ns_params = out;
return spdk_json_decode_object(val, rpc_ns_params_decoders,
SPDK_COUNTOF(rpc_ns_params_decoders),
ns_params);
}
static int
decode_rpc_namespaces(const struct spdk_json_val *val, void *out)
{
struct rpc_namespaces *namespaces = out;
char *names[RPC_MAX_NAMESPACES]; /* old format - array of strings (bdev names) */
size_t i;
int rc;
/* First try to decode namespaces as an array of objects (new format) */
if (spdk_json_decode_array(val, decode_rpc_ns_params, namespaces->ns_params,
SPDK_COUNTOF(namespaces->ns_params),
&namespaces->num_ns, sizeof(*namespaces->ns_params)) == 0) {
return 0;
}
/* If that fails, try to decode namespaces as an array of strings (old format) */
free_rpc_namespaces(namespaces);
memset(namespaces, 0, sizeof(*namespaces));
rc = spdk_json_decode_array(val, spdk_json_decode_string, names,
SPDK_COUNTOF(names),
&namespaces->num_ns, sizeof(char *));
if (rc == 0) {
/* Decoded old format - copy to ns_params (new format) */
for (i = 0; i < namespaces->num_ns; i++) {
namespaces->ns_params[i].bdev_name = names[i];
}
return 0;
}
/* Failed to decode - don't leave dangling string pointers around */
for (i = 0; i < namespaces->num_ns; i++) {
free(names[i]);
}
return rc;
}
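decode_rpc_namespaces() accepts two encodings of the "namespaces" parameter: the new array-of-objects form tried first, then the legacy array-of-strings form. A sketch of both; the wrapper keys here are only labels for the two cases, not part of the wire format:

```json
{
  "new_format": {
    "namespaces": [
      { "nsid": 1, "bdev_name": "Malloc0" },
      { "bdev_name": "Malloc1" }
    ]
  },
  "old_format": {
    "namespaces": [ "Malloc0", "Malloc1" ]
  }
}
```

In the new form "nsid" is optional (0 means auto-assign); the old form carries only bdev names.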
static void
free_rpc_listen_addresses(struct rpc_listen_addresses *r)
{
size_t i;
for (i = 0; i < r->num_listen_address; i++) {
free(r->addresses[i].transport);
free(r->addresses[i].adrfam);
free(r->addresses[i].traddr);
free(r->addresses[i].trsvcid);
}
}
static void
free_rpc_hosts(struct rpc_hosts *r)
{
size_t i;
for (i = 0; i < r->num_hosts; i++) {
free(r->hosts[i]);
}
}
struct rpc_subsystem {
int32_t core;
char *mode;
char *nqn;
struct rpc_listen_addresses listen_addresses;
struct rpc_hosts hosts;
bool allow_any_host;
char *pci_address;
char *serial_number;
struct rpc_namespaces namespaces;
};
static void
free_rpc_subsystem(struct rpc_subsystem *req)
{
free(req->mode);
free(req->nqn);
free(req->serial_number);
free_rpc_namespaces(&req->namespaces);
free_rpc_listen_addresses(&req->listen_addresses);
free_rpc_hosts(&req->hosts);
}
static void
spdk_rpc_nvmf_subsystem_started(struct spdk_nvmf_subsystem *subsystem,
void *cb_arg, int status)
{
struct spdk_jsonrpc_request *request = cb_arg;
struct spdk_json_write_ctx *w;
w = spdk_jsonrpc_begin_result(request);
if (w == NULL) {
return;
}
spdk_json_write_bool(w, true);
spdk_jsonrpc_end_result(request, w);
}
static const struct spdk_json_object_decoder rpc_subsystem_decoders[] = {
{"core", offsetof(struct rpc_subsystem, core), spdk_json_decode_int32, true},
{"mode", offsetof(struct rpc_subsystem, mode), spdk_json_decode_string, true},
{"nqn", offsetof(struct rpc_subsystem, nqn), spdk_json_decode_string},
{"listen_addresses", offsetof(struct rpc_subsystem, listen_addresses), decode_rpc_listen_addresses},
{"hosts", offsetof(struct rpc_subsystem, hosts), decode_rpc_hosts, true},
{"allow_any_host", offsetof(struct rpc_subsystem, allow_any_host), spdk_json_decode_bool, true},
{"serial_number", offsetof(struct rpc_subsystem, serial_number), spdk_json_decode_string, true},
{"namespaces", offsetof(struct rpc_subsystem, namespaces), decode_rpc_namespaces, true},
};
static void
spdk_rpc_construct_nvmf_subsystem(struct spdk_jsonrpc_request *request,
const struct spdk_json_val *params)
{
struct rpc_subsystem req = {};
struct spdk_nvmf_subsystem *subsystem;
req.core = -1; /* Explicitly set the core as the uninitialized value */
if (spdk_json_decode_object(params, rpc_subsystem_decoders,
SPDK_COUNTOF(rpc_subsystem_decoders),
&req)) {
SPDK_ERRLOG("spdk_json_decode_object failed\n");
goto invalid;
}
/* Mode is no longer a valid parameter, but print out a nice
* message if it exists to inform users.
*/
if (req.mode) {
SPDK_NOTICELOG("Mode present in the construct NVMe-oF subsystem RPC.\n"
"Mode was removed as a valid parameter.\n");
if (strcasecmp(req.mode, "Virtual") == 0) {
SPDK_NOTICELOG("Your mode value is 'Virtual' which is now the only possible mode.\n"
"Your RPC will work as expected.\n");
} else {
SPDK_NOTICELOG("Please remove 'mode' from the RPC.\n");
goto invalid;
}
}
/* Core is no longer a valid parameter, but print out a nice
* message if it exists to inform users.
*/
if (req.core != -1) {
SPDK_NOTICELOG("Core present in the construct NVMe-oF subsystem RPC.\n"
"Core was removed as an option. Subsystems can now run on all available cores.\n");
SPDK_NOTICELOG("Ignoring it and continuing.\n");
}
subsystem = spdk_nvmf_construct_subsystem(req.nqn,
req.listen_addresses.num_listen_address,
req.listen_addresses.addresses,
req.hosts.num_hosts, req.hosts.hosts, req.allow_any_host,
req.serial_number,
req.namespaces.num_ns, req.namespaces.ns_params);
if (!subsystem) {
goto invalid;
}
free_rpc_subsystem(&req);
spdk_nvmf_subsystem_start(subsystem,
spdk_rpc_nvmf_subsystem_started,
request);
return;
invalid:
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
free_rpc_subsystem(&req);
}
SPDK_RPC_REGISTER("construct_nvmf_subsystem", spdk_rpc_construct_nvmf_subsystem)
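A construct_nvmf_subsystem request matching the decoders above might look like this; values are illustrative, and the deprecated "core" and "mode" parameters are omitted:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "construct_nvmf_subsystem",
  "params": {
    "nqn": "nqn.2016-06.io.spdk:cnode1",
    "serial_number": "SPDK00000000000001",
    "allow_any_host": false,
    "hosts": [ "nqn.2016-06.io.spdk:host1" ],
    "listen_addresses": [
      { "trtype": "RDMA", "adrfam": "IPv4", "traddr": "192.168.1.10", "trsvcid": "4420" }
    ],
    "namespaces": [
      { "nsid": 1, "bdev_name": "Malloc0" }
    ]
  }
}
```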
struct rpc_delete_subsystem {
char *nqn;
};
static void
free_rpc_delete_subsystem(struct rpc_delete_subsystem *r)
{
free(r->nqn);
}
static void
spdk_rpc_nvmf_subsystem_stopped(struct spdk_nvmf_subsystem *subsystem,
void *cb_arg, int status)
{
struct spdk_jsonrpc_request *request = cb_arg;
struct spdk_json_write_ctx *w;
spdk_nvmf_subsystem_destroy(subsystem);
w = spdk_jsonrpc_begin_result(request);
if (w == NULL) {
return;
}
spdk_json_write_bool(w, true);
spdk_jsonrpc_end_result(request, w);
}
static const struct spdk_json_object_decoder rpc_delete_subsystem_decoders[] = {
{"nqn", offsetof(struct rpc_delete_subsystem, nqn), spdk_json_decode_string},
};
static void
spdk_rpc_delete_nvmf_subsystem(struct spdk_jsonrpc_request *request,
const struct spdk_json_val *params)
{
struct rpc_delete_subsystem req = {};
struct spdk_nvmf_subsystem *subsystem;
if (spdk_json_decode_object(params, rpc_delete_subsystem_decoders,
SPDK_COUNTOF(rpc_delete_subsystem_decoders),
&req)) {
SPDK_ERRLOG("spdk_json_decode_object failed\n");
goto invalid;
}
if (req.nqn == NULL) {
SPDK_ERRLOG("missing name param\n");
goto invalid;
}
subsystem = spdk_nvmf_tgt_find_subsystem(g_tgt.tgt, req.nqn);
if (!subsystem) {
goto invalid;
}
free_rpc_delete_subsystem(&req);
spdk_nvmf_subsystem_stop(subsystem,
spdk_rpc_nvmf_subsystem_stopped,
request);
return;
invalid:
spdk_jsonrpc_send_error_response(request, SPDK_JSONRPC_ERROR_INVALID_PARAMS, "Invalid parameters");
free_rpc_delete_subsystem(&req);
}
SPDK_RPC_REGISTER("delete_nvmf_subsystem", spdk_rpc_delete_nvmf_subsystem)

app/nvmf_tgt/nvmf_tgt.c (new file, 361 lines)

@@ -0,0 +1,361 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "nvmf_tgt.h"
#include "spdk/bdev.h"
#include "spdk/event.h"
#include "spdk/io_channel.h"
#include "spdk/log.h"
#include "spdk/nvme.h"
#include "spdk/util.h"
struct nvmf_tgt_poll_group {
struct spdk_nvmf_poll_group *group;
};
struct nvmf_tgt g_tgt = {};
static struct nvmf_tgt_poll_group *g_poll_groups = NULL;
static size_t g_num_poll_groups = 0;
static size_t g_active_poll_groups = 0;
static struct spdk_poller *g_acceptor_poller = NULL;
static void nvmf_tgt_advance_state(void *arg1, void *arg2);
static void
_spdk_nvmf_shutdown_cb(void *arg1, void *arg2)
{
/* Still in initialization state, defer shutdown operation */
if (g_tgt.state < NVMF_TGT_RUNNING) {
spdk_event_call(spdk_event_allocate(spdk_env_get_current_core(),
_spdk_nvmf_shutdown_cb, NULL, NULL));
return;
} else if (g_tgt.state > NVMF_TGT_RUNNING) {
/* Already in Shutdown status, ignore the signal */
return;
}
g_tgt.state = NVMF_TGT_FINI_STOP_ACCEPTOR;
nvmf_tgt_advance_state(NULL, NULL);
}
static void
spdk_nvmf_shutdown_cb(void)
{
printf("\n=========================\n");
printf(" NVMF shutdown signal\n");
printf("=========================\n");
/* Always let the first core handle the shutdown */
if (spdk_env_get_current_core() != spdk_env_get_first_core()) {
spdk_event_call(spdk_event_allocate(spdk_env_get_first_core(),
_spdk_nvmf_shutdown_cb, NULL, NULL));
} else {
_spdk_nvmf_shutdown_cb(NULL, NULL);
}
}
struct spdk_nvmf_subsystem *
nvmf_tgt_create_subsystem(const char *name, enum spdk_nvmf_subtype subtype, uint32_t num_ns)
{
struct spdk_nvmf_subsystem *subsystem;
if (spdk_nvmf_tgt_find_subsystem(g_tgt.tgt, name)) {
SPDK_ERRLOG("Subsystem already exists\n");
return NULL;
}
subsystem = spdk_nvmf_subsystem_create(g_tgt.tgt, name, subtype, num_ns);
if (subsystem == NULL) {
SPDK_ERRLOG("Subsystem creation failed\n");
return NULL;
}
SPDK_NOTICELOG("allocated subsystem %s\n", name);
return subsystem;
}
int
nvmf_tgt_shutdown_subsystem_by_nqn(const char *nqn)
{
struct spdk_nvmf_subsystem *subsystem;
subsystem = spdk_nvmf_tgt_find_subsystem(g_tgt.tgt, nqn);
if (!subsystem) {
return -EINVAL;
}
spdk_nvmf_subsystem_destroy(subsystem);
return 0;
}
static void
nvmf_tgt_poll_group_add(void *arg1, void *arg2)
{
struct spdk_nvmf_qpair *qpair = arg1;
struct nvmf_tgt_poll_group *pg = arg2;
spdk_nvmf_poll_group_add(pg->group, qpair);
}
static void
new_qpair(struct spdk_nvmf_qpair *qpair)
{
struct spdk_event *event;
struct nvmf_tgt_poll_group *pg;
uint32_t core;
core = g_tgt.core;
g_tgt.core = spdk_env_get_next_core(core);
if (g_tgt.core == UINT32_MAX) {
g_tgt.core = spdk_env_get_first_core();
}
pg = &g_poll_groups[core];
assert(pg != NULL);
event = spdk_event_allocate(core, nvmf_tgt_poll_group_add, qpair, pg);
spdk_event_call(event);
}
static void
acceptor_poll(void *arg)
{
struct spdk_nvmf_tgt *tgt = arg;
spdk_nvmf_tgt_accept(tgt, new_qpair);
}
static void
nvmf_tgt_destroy_poll_group_done(void *ctx)
{
g_tgt.state = NVMF_TGT_FINI_FREE_RESOURCES;
nvmf_tgt_advance_state(NULL, NULL);
}
static void
nvmf_tgt_destroy_poll_group(void *ctx)
{
struct nvmf_tgt_poll_group *pg;
pg = &g_poll_groups[spdk_env_get_current_core()];
assert(pg != NULL);
spdk_nvmf_poll_group_destroy(pg->group);
pg->group = NULL;
assert(g_active_poll_groups > 0);
g_active_poll_groups--;
}
static void
nvmf_tgt_create_poll_group_done(void *ctx)
{
g_tgt.state = NVMF_TGT_INIT_START_SUBSYSTEMS;
nvmf_tgt_advance_state(NULL, NULL);
}
static void
nvmf_tgt_create_poll_group(void *ctx)
{
struct nvmf_tgt_poll_group *pg;
pg = &g_poll_groups[spdk_env_get_current_core()];
assert(pg != NULL);
pg->group = spdk_nvmf_poll_group_create(g_tgt.tgt);
if (pg->group == NULL) {
SPDK_ERRLOG("Failed to create poll group for core %u\n", spdk_env_get_current_core());
}
g_active_poll_groups++;
}
static void
nvmf_tgt_subsystem_started(struct spdk_nvmf_subsystem *subsystem,
void *cb_arg, int status)
{
subsystem = spdk_nvmf_subsystem_get_next(subsystem);
if (subsystem) {
spdk_nvmf_subsystem_start(subsystem, nvmf_tgt_subsystem_started, NULL);
return;
}
g_tgt.state = NVMF_TGT_INIT_START_ACCEPTOR;
nvmf_tgt_advance_state(NULL, NULL);
}
static void
nvmf_tgt_subsystem_stopped(struct spdk_nvmf_subsystem *subsystem,
void *cb_arg, int status)
{
subsystem = spdk_nvmf_subsystem_get_next(subsystem);
if (subsystem) {
spdk_nvmf_subsystem_stop(subsystem, nvmf_tgt_subsystem_stopped, NULL);
return;
}
g_tgt.state = NVMF_TGT_FINI_DESTROY_POLL_GROUPS;
nvmf_tgt_advance_state(NULL, NULL);
}
static void
nvmf_tgt_advance_state(void *arg1, void *arg2)
{
enum nvmf_tgt_state prev_state;
int rc = -1;
do {
prev_state = g_tgt.state;
switch (g_tgt.state) {
case NVMF_TGT_INIT_NONE: {
uint32_t core;
g_tgt.state = NVMF_TGT_INIT_PARSE_CONFIG;
/* Find the maximum core number */
SPDK_ENV_FOREACH_CORE(core) {
g_num_poll_groups = spdk_max(g_num_poll_groups, core + 1);
}
assert(g_num_poll_groups > 0);
g_poll_groups = calloc(g_num_poll_groups, sizeof(*g_poll_groups));
if (g_poll_groups == NULL) {
g_tgt.state = NVMF_TGT_ERROR;
rc = -ENOMEM;
break;
}
g_tgt.core = spdk_env_get_first_core();
break;
}
case NVMF_TGT_INIT_PARSE_CONFIG:
rc = spdk_nvmf_parse_conf();
if (rc < 0) {
SPDK_ERRLOG("spdk_nvmf_parse_conf() failed\n");
g_tgt.state = NVMF_TGT_ERROR;
rc = -EINVAL;
break;
}
g_tgt.state = NVMF_TGT_INIT_CREATE_POLL_GROUPS;
break;
case NVMF_TGT_INIT_CREATE_POLL_GROUPS:
/* Send a message to each thread and create a poll group */
spdk_for_each_thread(nvmf_tgt_create_poll_group,
NULL,
nvmf_tgt_create_poll_group_done);
break;
case NVMF_TGT_INIT_START_SUBSYSTEMS: {
struct spdk_nvmf_subsystem *subsystem;
subsystem = spdk_nvmf_subsystem_get_first(g_tgt.tgt);
if (subsystem) {
spdk_nvmf_subsystem_start(subsystem, nvmf_tgt_subsystem_started, NULL);
} else {
g_tgt.state = NVMF_TGT_INIT_START_ACCEPTOR;
}
break;
}
case NVMF_TGT_INIT_START_ACCEPTOR:
g_acceptor_poller = spdk_poller_register(acceptor_poll, g_tgt.tgt,
g_spdk_nvmf_tgt_conf.acceptor_poll_rate);
SPDK_NOTICELOG("Acceptor running\n");
g_tgt.state = NVMF_TGT_RUNNING;
break;
case NVMF_TGT_RUNNING:
if (getenv("MEMZONE_DUMP") != NULL) {
spdk_memzone_dump(stdout);
fflush(stdout);
}
break;
case NVMF_TGT_FINI_STOP_ACCEPTOR:
spdk_poller_unregister(&g_acceptor_poller);
g_tgt.state = NVMF_TGT_FINI_STOP_SUBSYSTEMS;
break;
case NVMF_TGT_FINI_STOP_SUBSYSTEMS: {
struct spdk_nvmf_subsystem *subsystem;
subsystem = spdk_nvmf_subsystem_get_first(g_tgt.tgt);
if (subsystem) {
spdk_nvmf_subsystem_stop(subsystem, nvmf_tgt_subsystem_stopped, NULL);
} else {
g_tgt.state = NVMF_TGT_FINI_DESTROY_POLL_GROUPS;
}
break;
}
case NVMF_TGT_FINI_DESTROY_POLL_GROUPS:
/* Send a message to each thread and destroy the poll group */
spdk_for_each_thread(nvmf_tgt_destroy_poll_group,
NULL,
nvmf_tgt_destroy_poll_group_done);
break;
case NVMF_TGT_FINI_FREE_RESOURCES:
spdk_nvmf_tgt_destroy(g_tgt.tgt);
g_tgt.state = NVMF_TGT_STOPPED;
break;
case NVMF_TGT_STOPPED:
spdk_app_stop(0);
return;
case NVMF_TGT_ERROR:
spdk_app_stop(rc);
return;
}
} while (g_tgt.state != prev_state);
}
int
spdk_nvmf_tgt_start(struct spdk_app_opts *opts)
{
int rc;
opts->shutdown_cb = spdk_nvmf_shutdown_cb;
/* Blocks until the application is exiting */
rc = spdk_app_start(opts, nvmf_tgt_advance_state, NULL, NULL);
spdk_app_fini();
return rc;
}

app/nvmf_tgt/nvmf_tgt.h Normal file

@ -0,0 +1,101 @@
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef NVMF_TGT_H
#define NVMF_TGT_H
#include "spdk/stdinc.h"
#include "spdk/nvmf.h"
#include "spdk/queue.h"
#include "spdk/event.h"
struct rpc_listen_address {
char *transport;
char *adrfam;
char *traddr;
char *trsvcid;
};
struct spdk_nvmf_tgt_conf {
uint32_t acceptor_poll_rate;
};
enum nvmf_tgt_state {
NVMF_TGT_INIT_NONE = 0,
NVMF_TGT_INIT_PARSE_CONFIG,
NVMF_TGT_INIT_CREATE_POLL_GROUPS,
NVMF_TGT_INIT_START_SUBSYSTEMS,
NVMF_TGT_INIT_START_ACCEPTOR,
NVMF_TGT_RUNNING,
NVMF_TGT_FINI_STOP_ACCEPTOR,
NVMF_TGT_FINI_DESTROY_POLL_GROUPS,
NVMF_TGT_FINI_STOP_SUBSYSTEMS,
NVMF_TGT_FINI_FREE_RESOURCES,
NVMF_TGT_STOPPED,
NVMF_TGT_ERROR,
};
struct nvmf_tgt {
enum nvmf_tgt_state state;
struct spdk_nvmf_tgt *tgt;
uint32_t core; /* Round-robin tracking of cores for qpair assignment */
};
extern struct spdk_nvmf_tgt_conf g_spdk_nvmf_tgt_conf;
extern struct nvmf_tgt g_tgt;
int spdk_nvmf_parse_conf(void);
struct spdk_nvmf_subsystem *nvmf_tgt_create_subsystem(const char *name,
enum spdk_nvmf_subtype subtype, uint32_t num_ns);
struct spdk_nvmf_ns_params {
char *bdev_name;
uint32_t nsid;
};
struct spdk_nvmf_subsystem *spdk_nvmf_construct_subsystem(const char *name,
int num_listen_addresses, struct rpc_listen_address *addresses,
int num_hosts, char *hosts[], bool allow_any_host,
const char *sn, size_t num_ns, struct spdk_nvmf_ns_params *ns_list);
int
nvmf_tgt_shutdown_subsystem_by_nqn(const char *nqn);
int spdk_nvmf_tgt_start(struct spdk_app_opts *opts);
#endif


@ -1 +0,0 @@
spdk_dd


@ -1,22 +0,0 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2017 Intel Corporation.
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_dd
C_SRCS := spdk_dd.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_bdev
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)

File diff suppressed because it is too large


@ -1 +0,0 @@
spdk_lspci


@ -1,22 +0,0 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_lspci
C_SRCS := spdk_lspci.c
SPDK_LIB_LIST = $(SOCK_MODULES_LIST) nvme vmd
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)


@ -1,89 +0,0 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2019 Intel Corporation.
* All rights reserved.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/vmd.h"
static void
usage(void)
{
printf("Usage: spdk_lspci\n");
printf("Print available SPDK PCI devices supported by NVMe driver.\n");
}
static int
pci_enum_cb(void *ctx, struct spdk_pci_device *dev)
{
return 0;
}
static void
print_pci_dev(void *ctx, struct spdk_pci_device *dev)
{
struct spdk_pci_addr pci_addr = spdk_pci_device_get_addr(dev);
char addr[32] = { 0 };
spdk_pci_addr_fmt(addr, sizeof(addr), &pci_addr);
printf("%s (%x %x)", addr,
spdk_pci_device_get_vendor_id(dev),
spdk_pci_device_get_device_id(dev));
if (strcmp(spdk_pci_device_get_type(dev), "vmd") == 0) {
printf(" (NVMe disk behind VMD) ");
}
if (dev->internal.driver == spdk_pci_vmd_get_driver()) {
printf(" (VMD) ");
}
printf("\n");
}
int
main(int argc, char **argv)
{
int op, rc = 0;
struct spdk_env_opts opts;
while ((op = getopt(argc, argv, "h")) != -1) {
switch (op) {
case 'h':
usage();
return 0;
default:
usage();
return 1;
}
}
spdk_env_opts_init(&opts);
opts.name = "spdk_lspci";
if (spdk_env_init(&opts) < 0) {
printf("Unable to initialize SPDK env\n");
return 1;
}
if (spdk_vmd_init()) {
printf("Failed to initialize VMD. Some NVMe devices may be unavailable.\n");
}
if (spdk_pci_enumerate(spdk_pci_nvme_get_driver(), pci_enum_cb, NULL)) {
printf("Unable to enumerate PCI nvme driver\n");
rc = 1;
goto exit;
}
printf("\nList of available PCI devices:\n");
spdk_pci_for_each_device(NULL, print_pci_dev);
exit:
spdk_vmd_fini();
spdk_env_fini();
return rc;
}


@ -1 +0,0 @@
spdk_tgt


@ -1,41 +0,0 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2018 Intel Corporation.
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_tgt
C_SRCS := spdk_tgt.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST)
SPDK_LIB_LIST += event event_iscsi event_nvmf
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
ifeq ($(OS),Linux)
SPDK_LIB_LIST += event_nbd
ifeq ($(CONFIG_UBLK),y)
SPDK_LIB_LIST += event_ublk
endif
ifeq ($(CONFIG_VHOST),y)
SPDK_LIB_LIST += event_vhost_blk event_vhost_scsi
endif
ifeq ($(CONFIG_VFIO_USER),y)
SPDK_LIB_LIST += event_vfu_tgt
endif
endif
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)


@ -1,96 +0,0 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2018 Intel Corporation.
* All rights reserved.
*/
#include "spdk/stdinc.h"
#include "spdk/config.h"
#include "spdk/env.h"
#include "spdk/event.h"
#include "spdk/vhost.h"
#ifdef SPDK_CONFIG_VHOST
#define SPDK_VHOST_OPTS "S:"
#else
#define SPDK_VHOST_OPTS
#endif
static const char *g_pid_path = NULL;
static const char g_spdk_tgt_get_opts_string[] = "f:" SPDK_VHOST_OPTS;
static void
spdk_tgt_usage(void)
{
printf(" -f <file> pidfile save pid to file under given path\n");
#ifdef SPDK_CONFIG_VHOST
printf(" -S <path> directory where to create vhost sockets (default: pwd)\n");
#endif
}
static void
spdk_tgt_save_pid(const char *pid_path)
{
FILE *pid_file;
pid_file = fopen(pid_path, "w");
if (pid_file == NULL) {
fprintf(stderr, "Couldn't create pid file '%s': %s\n", pid_path, strerror(errno));
exit(EXIT_FAILURE);
}
fprintf(pid_file, "%d\n", getpid());
fclose(pid_file);
}
static int
spdk_tgt_parse_arg(int ch, char *arg)
{
switch (ch) {
case 'f':
g_pid_path = arg;
break;
#ifdef SPDK_CONFIG_VHOST
case 'S':
spdk_vhost_set_socket_path(arg);
break;
#endif
default:
return -EINVAL;
}
return 0;
}
static void
spdk_tgt_started(void *arg1)
{
if (g_pid_path) {
spdk_tgt_save_pid(g_pid_path);
}
if (getenv("MEMZONE_DUMP") != NULL) {
spdk_memzone_dump(stdout);
fflush(stdout);
}
}
int
main(int argc, char **argv)
{
struct spdk_app_opts opts = {};
int rc;
spdk_app_opts_init(&opts, sizeof(opts));
opts.name = "spdk_tgt";
if ((rc = spdk_app_parse_args(argc, argv, &opts, g_spdk_tgt_get_opts_string,
NULL, spdk_tgt_parse_arg, spdk_tgt_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
return rc;
}
rc = spdk_app_start(&opts, spdk_tgt_started, NULL);
spdk_app_fini();
return rc;
}


@ -1 +0,0 @@
spdk_top


@ -1,22 +0,0 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
APP = spdk_top
C_SRCS := spdk_top.c
SPDK_LIB_LIST = rpc
LIBS=-lpanel -lmenu -lncurses
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)


@ -1,74 +0,0 @@
Contents
========
- Overview
- Installation
- Usage
Overview
========
This application provides live SPDK statistics on the usage of cores,
threads, and pollers, their execution times, and the relations between
them. All data is gathered from SPDK via RPC calls. The application
consists of three selectable tabs, each providing statistics on one of
three main topics:
- Threads
- Pollers
- Cores
Installation
============
spdk_top requires the Ncurses library (it can be installed by running
spdk/scripts/pkgdep.sh) and is compiled by default when SPDK is built.
Usage
=====
To run spdk_top:
sudo spdk_top [options]
options:
-r <path> RPC listen address (optional, default: /var/tmp/spdk.sock)
-h show help message
The application consists of:
- Tabs list (on top)
- Statistics window (main windows in the middle)
- Options window (below statistics window)
- Page indicator / Error status
The tabs list shows the available tabs and highlights the currently
selected one. The statistics window displays current statistics; which
statistics are available depends on the selected tab. All time and run
counter related statistics are relative - they show the elapsed time /
number of runs since the previous data refresh. The options window
provides a list of hotkeys for changing application settings. Available
options are:
- [q] Quit - quit the application
- [1-3] TAB selection - select tab to be displayed
- [PgUp] Previous page - go to previous page
- [PgDown] Next page - go to next page
- [c] Columns - select which columns should be visible / hidden:
Use arrow up / down and space / enter keys to select which columns
should be visible. Select 'CLOSE' to confirm changes and close
the window.
- [s] Sorting - change data sorting:
Use arrow up / down to select based on which column data should be
sorted. Use enter key to confirm or esc key to exit without
changing current sorting scheme.
- [r] Refresh rate - change data refresh rate:
Enter a new data refresh rate value. The refresh rate accepts values
between 0 and 255 seconds. Use the enter key to apply or the escape
key to cancel.
The page indicator shows the current data page. An error status can be
displayed in the bottom right corner of the screen when the application
encounters an error.

File diff suppressed because it is too large


@ -1,23 +1,50 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = spdk_trace
SPDK_NO_LINK_ENV = 1
SPDK_LIB_LIST += json trace_parser
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
CXX_SRCS := trace.cpp
include $(SPDK_ROOT_DIR)/mk/spdk.app_cxx.mk
APP = spdk_trace
install: $(APP)
$(INSTALL_APP)
all: $(APP)
uninstall:
$(UNINSTALL_APP)
$(APP): $(OBJS) $(SPDK_LIBS)
$(LINK_CXX)
clean:
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@ -1,56 +1,91 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2016 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/json.h"
#include "spdk/likely.h"
#include "spdk/string.h"
#include "spdk/util.h"
#include <map>
extern "C" {
#include "spdk/trace_parser.h"
#include "spdk/util.h"
#include "spdk/trace.h"
}
static struct spdk_trace_parser *g_parser;
static const struct spdk_trace_flags *g_flags;
static struct spdk_json_write_ctx *g_json;
static bool g_print_tsc = false;
/* This is a bit ugly, but we don't want to include env_dpdk in the app, while spdk_util, which we
* do need, uses some of the functions implemented there. We're not actually using the functions
* that depend on those, so just define them as no-ops to allow the app to link.
*/
extern "C" {
void *
spdk_realloc(void *buf, size_t size, size_t align)
{
assert(false);
return NULL;
}
void
spdk_free(void *buf)
{
assert(false);
}
uint64_t
spdk_get_ticks(void)
{
return 0;
}
} /* extern "C" */
static struct spdk_trace_histories *g_histories;
static void usage(void);
static char *g_exe_name;
struct entry_key {
entry_key(uint16_t _lcore, uint64_t _tsc) : lcore(_lcore), tsc(_tsc) {}
uint16_t lcore;
uint64_t tsc;
};
class compare_entry_key
{
public:
bool operator()(const entry_key &first, const entry_key &second) const
{
if (first.tsc == second.tsc) {
return first.lcore < second.lcore;
} else {
return first.tsc < second.tsc;
}
}
};
typedef std::map<entry_key, spdk_trace_entry *, compare_entry_key> entry_map;
entry_map g_entry_map;
struct object_stats {
std::map<uint64_t, uint64_t> start;
std::map<uint64_t, uint64_t> index;
std::map<uint64_t, uint64_t> size;
std::map<uint64_t, uint64_t> tpoint_id;
uint64_t counter;
object_stats() : start(), index(), size(), tpoint_id(), counter(0) {}
};
struct object_stats g_stats[SPDK_TRACE_MAX_OBJECT];
static char *exe_name;
static int verbose = 1;
static int g_fudge_factor = 20;
static uint64_t tsc_rate;
static uint64_t first_tsc = 0x0;
static uint64_t last_tsc = -1ULL;
static float
get_us_from_tsc(uint64_t tsc, uint64_t tsc_rate)
@ -58,19 +93,10 @@ get_us_from_tsc(uint64_t tsc, uint64_t tsc_rate)
return ((float)tsc) * 1000 * 1000 / tsc_rate;
}
static const char *
format_argname(const char *name)
{
static char namebuf[16];
snprintf(namebuf, sizeof(namebuf), "%s: ", name);
return namebuf;
}
static void
print_ptr(const char *arg_string, uint64_t arg)
{
printf("%-7.7s0x%-14jx ", format_argname(arg_string), arg);
printf("%-7.7s0x%-14jx ", arg_string, arg);
}
static void
@ -81,13 +107,7 @@ print_uint64(const char *arg_string, uint64_t arg)
* for FLUSH WRITEBUF when writev() returns -1 due to full
* socket buffer.
*/
printf("%-7.7s%-16jd ", format_argname(arg_string), arg);
}
static void
print_string(const char *arg_string, const char *arg)
{
printf("%-7.7s%-16.16s ", format_argname(arg_string), arg);
printf("%-7.7s%-16jd ", arg_string, arg);
}
static void
@ -101,46 +121,60 @@ print_size(uint32_t size)
}
static void
print_object_id(const struct spdk_trace_tpoint *d, struct spdk_trace_parser_entry *entry)
print_object_id(uint8_t type, uint64_t id)
{
/* Set size to 128 and 256 bytes to make sure we can fit all the characters we need */
char related_id[128] = {'\0'};
char ids[256] = {'\0'};
if (entry->related_type != OBJECT_NONE) {
snprintf(related_id, sizeof(related_id), " (%c%jd)",
g_flags->object[entry->related_type].id_prefix,
entry->related_index);
}
snprintf(ids, sizeof(ids), "%c%jd%s", g_flags->object[d->object_type].id_prefix,
entry->object_index, related_id);
printf("id: %-17s", ids);
printf("id: %c%-15jd ", g_histories->flags.object[type].id_prefix, id);
}
static void
print_float(const char *arg_string, float arg)
{
printf("%-7s%-16.3f ", format_argname(arg_string), arg);
printf("%-7s%-16.3f ", arg_string, arg);
}
static void
print_event(struct spdk_trace_parser_entry *entry, uint64_t tsc_rate, uint64_t tsc_offset)
print_arg(bool arg_is_ptr, const char *arg_string, uint64_t arg)
{
struct spdk_trace_entry *e = entry->entry;
const struct spdk_trace_tpoint *d;
float us;
size_t i;
if (arg_string[0] == 0) {
return;
}
if (arg_is_ptr) {
print_ptr(arg_string, arg);
} else {
print_uint64(arg_string, arg);
}
}
static void
print_event(struct spdk_trace_entry *e, uint64_t tsc_rate,
uint64_t tsc_offset, uint16_t lcore)
{
struct spdk_trace_tpoint *d;
struct object_stats *stats;
float us;
d = &g_histories->flags.tpoint[e->tpoint_id];
stats = &g_stats[d->object_type];
if (d->new_object) {
stats->index[e->object_id] = stats->counter++;
stats->tpoint_id[e->object_id] = e->tpoint_id;
stats->start[e->object_id] = e->tsc;
stats->size[e->object_id] = e->size;
}
if (d->arg1_is_alias) {
stats->index[e->arg1] = stats->index[e->object_id];
stats->start[e->arg1] = stats->start[e->object_id];
stats->size[e->arg1] = stats->size[e->object_id];
}
d = &g_flags->tpoint[e->tpoint_id];
us = get_us_from_tsc(e->tsc - tsc_offset, tsc_rate);
printf("%2d: %10.3f ", entry->lcore, us);
if (g_print_tsc) {
printf("(%9ju) ", e->tsc - tsc_offset);
}
if (g_flags->owner[d->owner_type].id_prefix) {
printf("%c%02d ", g_flags->owner[d->owner_type].id_prefix, e->poller_id);
printf("%2d: %10.3f (%9ju) ", lcore, us, e->tsc - tsc_offset);
if (g_histories->flags.owner[d->owner_type].id_prefix) {
printf("%c%02d ", g_histories->flags.owner[d->owner_type].id_prefix, e->poller_id);
} else {
printf("%4s", " ");
}
@ -149,209 +183,140 @@ print_event(struct spdk_trace_parser_entry *entry, uint64_t tsc_rate, uint64_t t
print_size(e->size);
if (d->new_object) {
print_object_id(d, entry);
print_arg(d->arg1_is_ptr, d->arg1_name, e->arg1);
print_object_id(d->object_type, stats->index[e->object_id]);
} else if (d->object_type != OBJECT_NONE) {
if (entry->object_index != UINT64_MAX) {
us = get_us_from_tsc(e->tsc - entry->object_start, tsc_rate);
print_object_id(d, entry);
print_float("time", us);
if (stats->start.find(e->object_id) != stats->start.end()) {
struct spdk_trace_tpoint *start_description;
us = get_us_from_tsc(e->tsc - stats->start[e->object_id],
tsc_rate);
print_object_id(d->object_type, stats->index[e->object_id]);
print_float("time:", us);
start_description = &g_histories->flags.tpoint[stats->tpoint_id[e->object_id]];
if (start_description->short_name[0] != 0) {
printf(" (%.4s)", start_description->short_name);
}
} else {
printf("id: N/A");
}
} else if (e->object_id != 0) {
print_ptr("object", e->object_id);
}
for (i = 0; i < d->num_args; ++i) {
switch (d->args[i].type) {
case SPDK_TRACE_ARG_TYPE_PTR:
print_ptr(d->args[i].name, (uint64_t)entry->args[i].pointer);
break;
case SPDK_TRACE_ARG_TYPE_INT:
print_uint64(d->args[i].name, entry->args[i].integer);
break;
case SPDK_TRACE_ARG_TYPE_STR:
print_string(d->args[i].name, entry->args[i].string);
break;
}
} else {
print_arg(d->arg1_is_ptr, d->arg1_name, e->arg1);
}
printf("\n");
}
static void
print_event_json(struct spdk_trace_parser_entry *entry, uint64_t tsc_rate, uint64_t tsc_offset)
process_event(struct spdk_trace_entry *e, uint64_t tsc_rate,
uint64_t tsc_offset, uint16_t lcore)
{
struct spdk_trace_entry *e = entry->entry;
const struct spdk_trace_tpoint *d;
size_t i;
d = &g_flags->tpoint[e->tpoint_id];
spdk_json_write_object_begin(g_json);
spdk_json_write_named_uint64(g_json, "lcore", entry->lcore);
spdk_json_write_named_uint64(g_json, "tpoint", e->tpoint_id);
spdk_json_write_named_uint64(g_json, "tsc", e->tsc);
if (g_flags->owner[d->owner_type].id_prefix) {
spdk_json_write_named_string_fmt(g_json, "poller", "%c%02d",
g_flags->owner[d->owner_type].id_prefix,
e->poller_id);
if (verbose) {
print_event(e, tsc_rate, tsc_offset, lcore);
}
if (e->size != 0) {
spdk_json_write_named_uint32(g_json, "size", e->size);
}
if (d->new_object || d->object_type != OBJECT_NONE || e->object_id != 0) {
char object_type;
spdk_json_write_named_object_begin(g_json, "object");
if (d->new_object) {
object_type = g_flags->object[d->object_type].id_prefix;
spdk_json_write_named_string_fmt(g_json, "id", "%c%" PRIu64, object_type,
entry->object_index);
} else if (d->object_type != OBJECT_NONE) {
object_type = g_flags->object[d->object_type].id_prefix;
if (entry->object_index != UINT64_MAX) {
spdk_json_write_named_string_fmt(g_json, "id", "%c%" PRIu64,
object_type,
entry->object_index);
spdk_json_write_named_uint64(g_json, "time",
e->tsc - entry->object_start);
}
}
spdk_json_write_named_uint64(g_json, "value", e->object_id);
spdk_json_write_object_end(g_json);
}
/* Print related objects array */
if (entry->related_index != UINT64_MAX) {
spdk_json_write_named_string_fmt(g_json, "related", "%c%" PRIu64,
g_flags->object[entry->related_type].id_prefix,
entry->related_index);
}
if (d->num_args > 0) {
spdk_json_write_named_array_begin(g_json, "args");
for (i = 0; i < d->num_args; ++i) {
switch (d->args[i].type) {
case SPDK_TRACE_ARG_TYPE_PTR:
spdk_json_write_uint64(g_json, (uint64_t)entry->args[i].pointer);
break;
case SPDK_TRACE_ARG_TYPE_INT:
spdk_json_write_uint64(g_json, entry->args[i].integer);
break;
case SPDK_TRACE_ARG_TYPE_STR:
spdk_json_write_string(g_json, entry->args[i].string);
break;
}
}
spdk_json_write_array_end(g_json);
}
spdk_json_write_object_end(g_json);
}
static void
process_event(struct spdk_trace_parser_entry *e, uint64_t tsc_rate, uint64_t tsc_offset)
{
if (g_json == NULL) {
print_event(e, tsc_rate, tsc_offset);
} else {
print_event_json(e, tsc_rate, tsc_offset);
}
}
static void
print_tpoint_definitions(void)
{
const struct spdk_trace_tpoint *tpoint;
size_t i, j;
/* We only care about these when printing JSON */
if (!g_json) {
return;
}
spdk_json_write_named_uint64(g_json, "tsc_rate", g_flags->tsc_rate);
spdk_json_write_named_array_begin(g_json, "tpoints");
for (i = 0; i < SPDK_COUNTOF(g_flags->tpoint); ++i) {
tpoint = &g_flags->tpoint[i];
if (tpoint->tpoint_id == 0) {
continue;
}
spdk_json_write_object_begin(g_json);
spdk_json_write_named_string(g_json, "name", tpoint->name);
spdk_json_write_named_uint32(g_json, "id", tpoint->tpoint_id);
spdk_json_write_named_bool(g_json, "new_object", tpoint->new_object);
spdk_json_write_named_array_begin(g_json, "args");
for (j = 0; j < tpoint->num_args; ++j) {
spdk_json_write_object_begin(g_json);
spdk_json_write_named_string(g_json, "name", tpoint->args[j].name);
spdk_json_write_named_uint32(g_json, "type", tpoint->args[j].type);
spdk_json_write_named_uint32(g_json, "size", tpoint->args[j].size);
spdk_json_write_object_end(g_json);
}
spdk_json_write_array_end(g_json);
spdk_json_write_object_end(g_json);
}
spdk_json_write_array_end(g_json);
}
static int
print_json(void *cb_ctx, const void *data, size_t size)
populate_events(struct spdk_trace_history *history)
{
ssize_t rc;
int i, entry_size, history_size, num_entries, num_entries_filled;
struct spdk_trace_entry *e;
int first, last, lcore;
while (size > 0) {
rc = write(STDOUT_FILENO, data, size);
if (rc < 0) {
fprintf(stderr, "%s: %s\n", g_exe_name, spdk_strerror(errno));
abort();
lcore = history->lcore;
entry_size = sizeof(history->entries[0]);
history_size = sizeof(history->entries);
num_entries = history_size / entry_size;
e = history->entries;
num_entries_filled = num_entries;
while (e[num_entries_filled - 1].tsc == 0) {
num_entries_filled--;
}
size -= rc;
if (num_entries == num_entries_filled) {
first = last = 0;
for (i = 1; i < num_entries; i++) {
if (e[i].tsc < e[first].tsc) {
first = i;
}
if (e[i].tsc > e[last].tsc) {
last = i;
}
}
return 0;
first += g_fudge_factor;
if (first >= num_entries) {
first -= num_entries;
}
static void
usage(void)
last -= g_fudge_factor;
if (last < 0) {
last += num_entries;
}
} else {
first = 0;
last = num_entries_filled - 1;
}
/*
* We keep track of the highest first TSC out of all reactors and
* the lowest last TSC out of all reactors. We will ignore any
* events outside the range of these two TSC values. This will
* ensure we only print data for the subset of time where we have
* data across all reactors.
*/
if (e[first].tsc > first_tsc) {
first_tsc = e[first].tsc;
}
if (e[last].tsc < last_tsc) {
last_tsc = e[last].tsc;
}
i = first;
while (1) {
g_entry_map[entry_key(lcore, e[i].tsc)] = &e[i];
if (i == last) {
break;
}
i++;
if (i == num_entries_filled) {
i = 0;
}
}
return (0);
}
static void usage(void)
{
fprintf(stderr, "usage:\n");
fprintf(stderr, " %s <option> <lcore#>\n", g_exe_name);
fprintf(stderr, " %s <option> <lcore#>\n", exe_name);
fprintf(stderr, " option = '-q' to disable verbose mode\n");
fprintf(stderr, " '-s' to specify spdk_trace shm name\n");
fprintf(stderr, " '-c' to display single lcore history\n");
fprintf(stderr, " '-t' to display TSC offset for each event\n");
fprintf(stderr, " '-s' to specify spdk_trace shm name for a\n");
fprintf(stderr, " currently running process\n");
fprintf(stderr, " '-f' to specify number of events to ignore at\n");
fprintf(stderr, " beginning and end of trace (default: 20)\n");
fprintf(stderr, " '-i' to specify the shared memory ID\n");
fprintf(stderr, " '-p' to specify the trace PID\n");
fprintf(stderr, " (If -s is specified, then one of\n");
fprintf(stderr, " -i or -p must be specified)\n");
fprintf(stderr, " '-f' to specify a tracepoint file name\n");
fprintf(stderr, " (-s and -f are mutually exclusive)\n");
fprintf(stderr, " '-j' to use JSON to format the output\n");
fprintf(stderr, " (One of -i or -p must be specified)\n");
}
int
main(int argc, char **argv)
int main(int argc, char **argv)
{
struct spdk_trace_parser_opts opts;
struct spdk_trace_parser_entry entry;
void *history_ptr;
struct spdk_trace_history *history_entries, *history;
int fd, i;
int lcore = SPDK_TRACE_MAX_LCORE;
uint64_t tsc_offset, entry_count;
const char *app_name = NULL;
const char *file_name = NULL;
int op, i;
uint64_t tsc_offset;
const char *app_name = "ids";
int op;
char shm_name[64];
int shm_id = -1, shm_pid = -1;
bool json = false;
g_exe_name = argv[0];
while ((op = getopt(argc, argv, "c:f:i:jp:s:t")) != -1) {
exe_name = argv[0];
while ((op = getopt(argc, argv, "c:f:i:p:qs:")) != -1) {
switch (op) {
case 'c':
lcore = atoi(optarg);
@ -362,101 +327,95 @@ main(int argc, char **argv)
exit(1);
}
break;
case 'f':
g_fudge_factor = atoi(optarg);
break;
case 'i':
shm_id = atoi(optarg);
break;
case 'p':
shm_pid = atoi(optarg);
break;
case 'q':
verbose = 0;
break;
case 's':
app_name = optarg;
break;
case 'f':
file_name = optarg;
break;
case 't':
g_print_tsc = true;
break;
case 'j':
json = true;
break;
default:
usage();
exit(1);
}
}
if (file_name != NULL && app_name != NULL) {
fprintf(stderr, "-f and -s are mutually exclusive\n");
usage();
exit(1);
}
if (file_name == NULL && app_name == NULL) {
fprintf(stderr, "One of -f and -s must be specified\n");
usage();
exit(1);
}
if (json) {
g_json = spdk_json_write_begin(print_json, NULL, 0);
if (g_json == NULL) {
fprintf(stderr, "Failed to allocate JSON write context\n");
exit(1);
}
}
if (!file_name) {
if (shm_id >= 0) {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.%d", app_name, shm_id);
} else {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.pid%d", app_name, shm_pid);
}
file_name = shm_name;
fd = shm_open(shm_name, O_RDONLY, 0600);
if (fd < 0) {
fprintf(stderr, "Could not open shm %s.\n", shm_name);
usage();
exit(-1);
}
opts.filename = file_name;
opts.lcore = lcore;
opts.mode = app_name == NULL ? SPDK_TRACE_PARSER_MODE_FILE : SPDK_TRACE_PARSER_MODE_SHM;
g_parser = spdk_trace_parser_init(&opts);
if (g_parser == NULL) {
fprintf(stderr, "Failed to initialize trace parser\n");
exit(1);
history_ptr = mmap(NULL, sizeof(*g_histories), PROT_READ, MAP_SHARED, fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap shm %s.\n", shm_name);
usage();
exit(-1);
}
g_flags = spdk_trace_parser_get_flags(g_parser);
if (!g_json) {
printf("TSC Rate: %ju\n", g_flags->tsc_rate);
} else {
spdk_json_write_object_begin(g_json);
print_tpoint_definitions();
spdk_json_write_named_array_begin(g_json, "entries");
g_histories = (struct spdk_trace_histories *)history_ptr;
tsc_rate = g_histories->flags.tsc_rate;
if (tsc_rate == 0) {
fprintf(stderr, "Invalid tsc_rate %ju\n", tsc_rate);
usage();
exit(-1);
}
for (i = 0; i < SPDK_TRACE_MAX_LCORE; ++i) {
if (lcore == SPDK_TRACE_MAX_LCORE || i == lcore) {
entry_count = spdk_trace_parser_get_entry_count(g_parser, i);
if (entry_count > 0) {
printf("Trace Size of lcore (%d): %ju\n", i, entry_count);
}
}
if (verbose) {
printf("TSC Rate: %ju\n", tsc_rate);
}
tsc_offset = spdk_trace_parser_get_tsc_offset(g_parser);
while (spdk_trace_parser_next_entry(g_parser, &entry)) {
if (entry.entry->tsc < tsc_offset) {
history_entries = (struct spdk_trace_history *)malloc(sizeof(g_histories->per_lcore_history));
if (history_entries == NULL) {
goto cleanup;
}
memcpy(history_entries, g_histories->per_lcore_history,
sizeof(g_histories->per_lcore_history));
if (lcore == SPDK_TRACE_MAX_LCORE) {
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
history = &history_entries[i];
if (history->entries[0].tsc == 0) {
continue;
}
process_event(&entry, g_flags->tsc_rate, tsc_offset);
populate_events(history);
}
} else {
history = &history_entries[lcore];
if (history->entries[0].tsc != 0) {
populate_events(history);
}
}
if (g_json != NULL) {
spdk_json_write_array_end(g_json);
spdk_json_write_object_end(g_json);
spdk_json_write_end(g_json);
tsc_offset = first_tsc;
for (entry_map::iterator it = g_entry_map.begin(); it != g_entry_map.end(); it++) {
if (it->first.tsc < first_tsc || it->first.tsc > last_tsc) {
continue;
}
process_event(it->second, tsc_rate, tsc_offset, it->first.lcore);
}
spdk_trace_parser_cleanup(g_parser);
free(history_entries);
cleanup:
munmap(history_ptr, sizeof(*g_histories));
close(fd);
return (0);
}


@ -1 +0,0 @@
spdk_trace_record


@ -1,21 +0,0 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation.
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
SPDK_LIB_LIST = util log
APP = spdk_trace_record
C_SRCS := trace_record.c
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
install: $(APP)
$(INSTALL_APP)
uninstall:
$(UNINSTALL_APP)


@ -1,706 +0,0 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2018 Intel Corporation.
* All rights reserved.
*/
#include "spdk/stdinc.h"
#include "spdk/env.h"
#include "spdk/string.h"
#include "spdk/trace.h"
#include "spdk/util.h"
#include "spdk/barrier.h"
#define TRACE_FILE_COPY_SIZE (32 * 1024)
#define TRACE_PATH_MAX 2048
static char *g_exe_name;
static int g_verbose = 1;
static uint64_t g_tsc_rate;
static uint64_t g_utsc_rate;
static bool g_shutdown = false;
static uint64_t g_histories_size;
struct lcore_trace_record_ctx {
char lcore_file[TRACE_PATH_MAX];
int fd;
bool valid;
struct spdk_trace_history *in_history;
struct spdk_trace_history *out_history;
/* Next entry index that has already been recorded to file */
uint64_t rec_next_entry;
/* First and last recorded TSC, for the summary report */
uint64_t first_entry_tsc;
uint64_t last_entry_tsc;
/* Total number of entries in lcore trace file */
uint64_t num_entries;
};
struct aggr_trace_record_ctx {
const char *out_file;
int out_fd;
int shm_fd;
struct lcore_trace_record_ctx lcore_ports[SPDK_TRACE_MAX_LCORE];
struct spdk_trace_histories *trace_histories;
};
static int
input_trace_file_mmap(struct aggr_trace_record_ctx *ctx, const char *shm_name)
{
void *history_ptr;
int i;
ctx->shm_fd = shm_open(shm_name, O_RDONLY, 0);
if (ctx->shm_fd < 0) {
fprintf(stderr, "Could not open %s.\n", shm_name);
return -1;
}
/* Map the header of trace file */
history_ptr = mmap(NULL, sizeof(struct spdk_trace_histories), PROT_READ, MAP_SHARED, ctx->shm_fd,
0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not mmap shm %s.\n", shm_name);
close(ctx->shm_fd);
return -1;
}
ctx->trace_histories = (struct spdk_trace_histories *)history_ptr;
g_tsc_rate = ctx->trace_histories->flags.tsc_rate;
g_utsc_rate = g_tsc_rate / 1000;
if (g_tsc_rate == 0) {
fprintf(stderr, "Invalid tsc_rate %ju\n", g_tsc_rate);
munmap(history_ptr, sizeof(struct spdk_trace_histories));
close(ctx->shm_fd);
return -1;
}
if (g_verbose) {
printf("TSC Rate: %ju\n", g_tsc_rate);
}
/* Remap the entire trace file */
g_histories_size = spdk_get_trace_histories_size(ctx->trace_histories);
munmap(history_ptr, sizeof(struct spdk_trace_histories));
history_ptr = mmap(NULL, g_histories_size, PROT_READ, MAP_SHARED, ctx->shm_fd, 0);
if (history_ptr == MAP_FAILED) {
fprintf(stderr, "Could not remap shm %s.\n", shm_name);
close(ctx->shm_fd);
return -1;
}
ctx->trace_histories = (struct spdk_trace_histories *)history_ptr;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
struct spdk_trace_history *history;
history = spdk_get_per_lcore_history(ctx->trace_histories, i);
ctx->lcore_ports[i].in_history = history;
ctx->lcore_ports[i].valid = (history != NULL);
if (g_verbose && history) {
printf("Number of trace entries for lcore (%d): %ju\n", i,
history->num_entries);
}
}
return 0;
}
static int
output_trace_files_prepare(struct aggr_trace_record_ctx *ctx, const char *aggr_path)
{
int flags = O_CREAT | O_EXCL | O_RDWR;
struct lcore_trace_record_ctx *port_ctx;
int name_len;
int i, rc;
/* Assign file names for related trace files */
ctx->out_file = aggr_path;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
/* Get the length of trace file name for each lcore with format "%s-%d" */
name_len = snprintf(port_ctx->lcore_file, TRACE_PATH_MAX, "%s-%d", ctx->out_file, i);
if (name_len >= TRACE_PATH_MAX) {
fprintf(stderr, "Length of lcore trace file path (%s) exceeds TRACE_PATH_MAX.\n",
aggr_path);
goto err;
}
}
/* If output trace file already exists, try to unlink it together with its temporary files */
if (access(ctx->out_file, F_OK) == 0) {
rc = unlink(ctx->out_file);
if (rc) {
goto err;
}
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
if (access(port_ctx->lcore_file, F_OK) == 0) {
rc = unlink(port_ctx->lcore_file);
if (rc) {
goto err;
}
}
}
}
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
if (!port_ctx->valid) {
continue;
}
port_ctx->fd = open(port_ctx->lcore_file, flags, 0600);
if (port_ctx->fd < 0) {
fprintf(stderr, "Could not open lcore file %s.\n", port_ctx->lcore_file);
goto err;
}
if (g_verbose) {
printf("Create tmp lcore trace file %s for lcore %d\n", port_ctx->lcore_file, i);
}
port_ctx->out_history = calloc(1, sizeof(struct spdk_trace_history));
if (port_ctx->out_history == NULL) {
fprintf(stderr, "Failed to allocate memory for out_history.\n");
goto err;
}
}
return 0;
err:
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
free(port_ctx->out_history);
if (port_ctx->fd > 0) {
close(port_ctx->fd);
}
}
return -1;
}
static void
output_trace_files_finish(struct aggr_trace_record_ctx *ctx)
{
struct lcore_trace_record_ctx *port_ctx;
int i;
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
port_ctx = &ctx->lcore_ports[i];
free(port_ctx->out_history);
close(port_ctx->fd);
unlink(port_ctx->lcore_file);
if (g_verbose) {
printf("Remove tmp lcore trace file %s for lcore %d\n", port_ctx->lcore_file, i);
}
}
}
static int
cont_write(int fildes, const void *buf, size_t nbyte)
{
int rc;
int _nbyte = nbyte;
const char *p = buf;
while (_nbyte) {
rc = write(fildes, p, _nbyte);
if (rc < 0) {
if (errno != EINTR) {
return -1;
}
continue;
}
/* Advance past the bytes already written so a partial
 * write does not resend the same data. */
p += rc;
_nbyte -= rc;
}
return nbyte;
}
static int
cont_read(int fildes, void *buf, size_t nbyte)
{
int rc;
int _nbyte = nbyte;
char *p = buf;
while (_nbyte) {
rc = read(fildes, p, _nbyte);
if (rc == 0) {
return nbyte - _nbyte;
} else if (rc < 0) {
if (errno != EINTR) {
return -1;
}
continue;
}
/* Advance past the bytes already read so a partial
 * read does not overwrite them. */
p += rc;
_nbyte -= rc;
}
return nbyte;
}
static int
lcore_trace_last_entry_idx(struct spdk_trace_history *in_history, int cir_next_idx)
{
int last_idx;
if (cir_next_idx == 0) {
last_idx = in_history->num_entries - 1;
} else {
last_idx = cir_next_idx - 1;
}
return last_idx;
}
static int
circular_buffer_padding_backward(int fd, struct spdk_trace_history *in_history,
int cir_start, int cir_end)
{
int rc;
if (cir_end <= cir_start) {
fprintf(stderr, "Incorrect use of circular_buffer_padding_backward\n");
return -1;
}
rc = cont_write(fd, &in_history->entries[cir_start],
sizeof(struct spdk_trace_entry) * (cir_end - cir_start));
if (rc < 0) {
fprintf(stderr, "Failed to append entries into lcore file\n");
return rc;
}
return 0;
}
static int
circular_buffer_padding_across(int fd, struct spdk_trace_history *in_history,
int cir_start, int cir_end)
{
int rc;
int num_entries = in_history->num_entries;
if (cir_end > cir_start) {
fprintf(stderr, "Incorrect use of circular_buffer_padding_across\n");
return -1;
}
rc = cont_write(fd, &in_history->entries[cir_start],
sizeof(struct spdk_trace_entry) * (num_entries - cir_start));
if (rc < 0) {
fprintf(stderr, "Failed to append entries into lcore file backward\n");
return rc;
}
if (cir_end == 0) {
return 0;
}
rc = cont_write(fd, &in_history->entries[0], sizeof(struct spdk_trace_entry) * cir_end);
if (rc < 0) {
fprintf(stderr, "Failed to append entries into lcore file forward\n");
return rc;
}
return 0;
}
static int
circular_buffer_padding_all(int fd, struct spdk_trace_history *in_history,
int cir_end)
{
return circular_buffer_padding_across(fd, in_history, cir_end, cir_end);
}
static int
lcore_trace_record(struct lcore_trace_record_ctx *lcore_port)
{
struct spdk_trace_history *in_history = lcore_port->in_history;
uint64_t rec_next_entry = lcore_port->rec_next_entry;
uint64_t rec_num_entries = lcore_port->num_entries;
int fd = lcore_port->fd;
uint64_t shm_next_entry;
uint64_t num_cir_entries;
uint64_t shm_cir_next;
uint64_t rec_cir_next;
int rc;
int last_idx;
shm_next_entry = in_history->next_entry;
/* Ensure all entries of spdk_trace_history up to next_entry are visible before reading them */
spdk_smp_rmb();
if (shm_next_entry == rec_next_entry) {
/* There is no update */
return 0;
} else if (shm_next_entry < rec_next_entry) {
/* Error branch */
fprintf(stderr, "Trace porting error in lcore %d, trace rollback occurs.\n", in_history->lcore);
fprintf(stderr, "shm_next_entry is %ju, record_next_entry is %ju.\n", shm_next_entry,
rec_next_entry);
return -1;
}
num_cir_entries = in_history->num_entries;
shm_cir_next = shm_next_entry & (num_cir_entries - 1);
/* Record the first entry's tsc and copy the corresponding entries on the first recording pass. */
if (lcore_port->first_entry_tsc == 0) {
if (shm_next_entry < num_cir_entries) {
/* Updates have not wrapped around the circular buffer yet.
* The first entry in shared memory is the oldest one.
*/
lcore_port->first_entry_tsc = in_history->entries[0].tsc;
lcore_port->num_entries += shm_cir_next;
rc = circular_buffer_padding_backward(fd, in_history, 0, shm_cir_next);
} else {
/* Updates have already wrapped around the circular buffer.
* The oldest entry in shared memory is the one at shm_cir_next.
*/
lcore_port->first_entry_tsc = in_history->entries[shm_cir_next].tsc;
lcore_port->num_entries += num_cir_entries;
rc = circular_buffer_padding_all(fd, in_history, shm_cir_next);
}
goto out;
}
if (shm_next_entry - rec_next_entry > num_cir_entries) {
/* Updates were missed: the writer wrapped past the reader */
fprintf(stderr, "Trace-record missed %ju trace entries\n",
shm_next_entry - rec_next_entry - num_cir_entries);
lcore_port->num_entries += num_cir_entries;
rc = circular_buffer_padding_all(fd, in_history, shm_cir_next);
} else if (shm_next_entry - rec_next_entry == num_cir_entries) {
/* All circular buffer is updated */
lcore_port->num_entries += num_cir_entries;
rc = circular_buffer_padding_all(fd, in_history, shm_cir_next);
} else {
/* Part of circular buffer is updated */
rec_cir_next = rec_next_entry & (num_cir_entries - 1);
if (shm_cir_next > rec_cir_next) {
/* Updates are not across circular buffer */
lcore_port->num_entries += shm_cir_next - rec_cir_next;
rc = circular_buffer_padding_backward(fd, in_history, rec_cir_next, shm_cir_next);
} else {
/* Updates are across circular buffer */
lcore_port->num_entries += num_cir_entries - rec_cir_next + shm_cir_next;
rc = circular_buffer_padding_across(fd, in_history, rec_cir_next, shm_cir_next);
}
}
out:
if (rc) {
return rc;
}
if (g_verbose) {
printf("Append %ju trace_entry for lcore %d\n", lcore_port->num_entries - rec_num_entries,
in_history->lcore);
}
/* Update tpoint_count info */
memcpy(lcore_port->out_history, lcore_port->in_history, sizeof(struct spdk_trace_history));
/* Update last_entry_tsc to align with appended entries */
last_idx = lcore_trace_last_entry_idx(in_history, shm_cir_next);
lcore_port->last_entry_tsc = in_history->entries[last_idx].tsc;
lcore_port->rec_next_entry = shm_next_entry;
return rc;
}
static int
trace_files_aggregate(struct aggr_trace_record_ctx *ctx)
{
int flags = O_CREAT | O_EXCL | O_RDWR;
struct lcore_trace_record_ctx *lcore_port;
char copy_buff[TRACE_FILE_COPY_SIZE];
uint64_t lcore_offsets[SPDK_TRACE_MAX_LCORE + 1];
int rc, i;
ssize_t len = 0;
uint64_t current_offset;
uint64_t len_sum;
ctx->out_fd = open(ctx->out_file, flags, 0600);
if (ctx->out_fd < 0) {
fprintf(stderr, "Could not open aggregation file %s.\n", ctx->out_file);
return -1;
}
if (g_verbose) {
printf("Create trace file %s for output\n", ctx->out_file);
}
/* Write the history flags at the head of the converged trace file, excluding the per-lcore offsets */
rc = cont_write(ctx->out_fd, ctx->trace_histories,
sizeof(struct spdk_trace_histories) - sizeof(lcore_offsets));
if (rc < 0) {
fprintf(stderr, "Failed to write trace header into trace file\n");
goto out;
}
/* Compute per-lcore offsets and append them to the converged trace file */
current_offset = sizeof(struct spdk_trace_flags);
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx->lcore_ports[i];
if (lcore_port->valid) {
lcore_offsets[i] = current_offset;
current_offset += spdk_get_trace_history_size(lcore_port->num_entries);
} else {
lcore_offsets[i] = 0;
}
}
lcore_offsets[SPDK_TRACE_MAX_LCORE] = current_offset;
rc = cont_write(ctx->out_fd, lcore_offsets, sizeof(lcore_offsets));
if (rc < 0) {
fprintf(stderr, "Failed to write lcore offsets into trace file\n");
goto out;
}
/* Append each lcore trace file into converged trace file */
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx->lcore_ports[i];
if (!lcore_port->valid) {
continue;
}
lcore_port->out_history->num_entries = lcore_port->num_entries;
rc = cont_write(ctx->out_fd, lcore_port->out_history, sizeof(struct spdk_trace_history));
if (rc < 0) {
fprintf(stderr, "Failed to write lcore trace header into trace file\n");
goto out;
}
/* Move file offset to the start of trace_entries */
rc = lseek(lcore_port->fd, 0, SEEK_SET);
if (rc != 0) {
fprintf(stderr, "Failed to lseek lcore trace file\n");
goto out;
}
len_sum = 0;
while ((len = cont_read(lcore_port->fd, copy_buff, TRACE_FILE_COPY_SIZE)) > 0) {
len_sum += len;
rc = cont_write(ctx->out_fd, copy_buff, len);
if (rc != len) {
fprintf(stderr, "Failed to write lcore trace entries into trace file\n");
goto out;
}
}
/* Clear rc so that the last cont_write() doesn't get interpreted as a failure. */
rc = 0;
if (len_sum != lcore_port->num_entries * sizeof(struct spdk_trace_entry)) {
fprintf(stderr, "Length of lcore trace file doesn't match number of entries for lcore\n");
}
}
printf("All lcores trace entries are aggregated into trace file %s\n", ctx->out_file);
out:
close(ctx->out_fd);
return rc;
}
static void
__shutdown_signal(int signo)
{
g_shutdown = true;
}
static int
setup_exit_signal_handler(void)
{
struct sigaction sigact;
int rc;
memset(&sigact, 0, sizeof(sigact));
sigemptyset(&sigact.sa_mask);
/* Install the same handler for SIGINT and SIGTERM */
sigact.sa_handler = __shutdown_signal;
rc = sigaction(SIGINT, &sigact, NULL);
if (rc < 0) {
fprintf(stderr, "sigaction(SIGINT) failed\n");
return rc;
}
rc = sigaction(SIGTERM, &sigact, NULL);
if (rc < 0) {
fprintf(stderr, "sigaction(SIGTERM) failed\n");
}
return rc;
}
static void
usage(void)
{
printf("\n%s is used to record all SPDK generated trace entries\n", g_exe_name);
printf("from SPDK trace shared-memory to specified file.\n\n");
printf("usage:\n");
printf(" %s <option>\n", g_exe_name);
printf(" option = '-q' to disable verbose mode\n");
printf(" '-s' to specify spdk_trace shm name for a\n");
printf(" currently running process\n");
printf(" '-i' to specify the shared memory ID\n");
printf(" '-p' to specify the trace PID\n");
printf(" (one of -i or -p must be specified)\n");
printf(" '-f' to specify output trace file name\n");
printf(" '-h' to print usage information\n");
}
int
main(int argc, char **argv)
{
const char *app_name = NULL;
const char *file_name = NULL;
int op;
char shm_name[64];
int shm_id = -1, shm_pid = -1;
int rc = 0;
int i;
struct aggr_trace_record_ctx ctx = {};
struct lcore_trace_record_ctx *lcore_port;
g_exe_name = argv[0];
while ((op = getopt(argc, argv, "f:i:p:qs:h")) != -1) {
switch (op) {
case 'i':
shm_id = spdk_strtol(optarg, 10);
break;
case 'p':
shm_pid = spdk_strtol(optarg, 10);
break;
case 'q':
g_verbose = 0;
break;
case 's':
app_name = optarg;
break;
case 'f':
file_name = optarg;
break;
case 'h':
usage();
exit(EXIT_SUCCESS);
default:
usage();
exit(1);
}
}
if (file_name == NULL) {
fprintf(stderr, "-f must be specified\n");
usage();
exit(1);
}
if (app_name == NULL) {
fprintf(stderr, "-s must be specified\n");
usage();
exit(1);
}
if (shm_id == -1 && shm_pid == -1) {
fprintf(stderr, "-i or -p must be specified\n");
usage();
exit(1);
}
if (shm_id >= 0) {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.%d", app_name, shm_id);
} else {
snprintf(shm_name, sizeof(shm_name), "/%s_trace.pid%d", app_name, shm_pid);
}
rc = setup_exit_signal_handler();
if (rc) {
exit(1);
}
rc = input_trace_file_mmap(&ctx, shm_name);
if (rc) {
exit(1);
}
rc = output_trace_files_prepare(&ctx, file_name);
if (rc) {
exit(1);
}
printf("Start to poll trace shm file %s\n", shm_name);
while (!g_shutdown && rc == 0) {
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx.lcore_ports[i];
if (!lcore_port->valid) {
continue;
}
rc = lcore_trace_record(lcore_port);
if (rc) {
break;
}
}
}
if (rc) {
exit(1);
}
printf("Start to aggregate lcore trace files\n");
rc = trace_files_aggregate(&ctx);
if (rc) {
exit(1);
}
/* Summary report */
printf("TSC Rate: %ju\n", g_tsc_rate);
for (i = 0; i < SPDK_TRACE_MAX_LCORE; i++) {
lcore_port = &ctx.lcore_ports[i];
if (lcore_port->num_entries == 0) {
continue;
}
printf("Recorded %ju trace entries for lcore (%d) in %ju usec\n",
lcore_port->num_entries, i,
(lcore_port->last_entry_tsc - lcore_port->first_entry_tsc) / g_utsc_rate);
}
munmap(ctx.trace_histories, g_histories_size);
close(ctx.shm_fd);
output_trace_files_finish(&ctx);
return 0;
}


@ -1,26 +1,61 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2017 Intel Corporation.
#
# BSD LICENSE
#
# Copyright (c) Intel Corporation.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in
# the documentation and/or other materials provided with the
# distribution.
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/../..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
include $(SPDK_ROOT_DIR)/mk/spdk.modules.mk
APP = vhost
C_SRCS := vhost.c
SPDK_LIB_LIST = $(ALL_MODULES_LIST) event event_vhost_blk event_vhost_scsi event_nbd
SPDK_LIB_LIST = event_bdev event_copy event_net event_scsi event_vhost
SPDK_LIB_LIST += jsonrpc json rpc bdev_rpc bdev scsi net copy trace conf
SPDK_LIB_LIST += util log log_rpc event app_rpc
SPDK_LIB_LIST += vhost rte_vhost event_nbd nbd
ifeq ($(SPDK_ROOT_DIR)/lib/env_dpdk,$(CONFIG_ENV))
SPDK_LIB_LIST += env_dpdk_rpc
endif
LIBS += $(BLOCKDEV_MODULES_LINKER_ARGS) \
$(COPY_MODULES_LINKER_ARGS)
LIBS += $(SPDK_LIB_LINKER_ARGS)
LIBS += $(ENV_LINKER_ARGS)
include $(SPDK_ROOT_DIR)/mk/spdk.app.mk
all : $(APP)
install: $(APP)
$(INSTALL_APP)
$(APP) : $(OBJS) $(SPDK_LIB_FILES) $(ENV_LIBS) $(BLOCKDEV_MODULES_FILES) $(COPY_MODULES_FILES)
$(LINK_C)
uninstall:
$(UNINSTALL_APP)
clean :
$(CLEAN_C) $(APP)
include $(SPDK_ROOT_DIR)/mk/spdk.deps.mk


@ -1,21 +1,66 @@
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright (C) 2017 Intel Corporation.
/*-
* BSD LICENSE
*
* Copyright (c) Intel Corporation.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
* * Neither the name of Intel Corporation nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "spdk/stdinc.h"
#include "spdk/log.h"
#include "spdk/conf.h"
#include "spdk/event.h"
#include "spdk/vhost.h"
#define SPDK_VHOST_DEFAULT_CONFIG "/usr/local/etc/spdk/vhost.conf"
#define SPDK_VHOST_DEFAULT_ENABLE_COREDUMP true
#define SPDK_VHOST_DEFAULT_MEM_SIZE 1024
static const char *g_socket_path = NULL;
static const char *g_pid_path = NULL;
static void
vhost_app_opts_init(struct spdk_app_opts *opts)
{
spdk_app_opts_init(opts);
opts->name = "vhost";
opts->config_file = SPDK_VHOST_DEFAULT_CONFIG;
opts->mem_size = SPDK_VHOST_DEFAULT_MEM_SIZE;
}
static void
vhost_usage(void)
{
printf(" -f <path> save pid to file under given path\n");
printf(" -S <path> directory where to create vhost sockets (default: pwd)\n");
printf(" -f pidfile save pid to file under given path\n");
printf(" -S dir directory where to create vhost sockets (default: pwd)\n");
}
static void
@ -33,7 +78,7 @@ save_pid(const char *pid_path)
fclose(pid_file);
}
static int
static void
vhost_parse_arg(int ch, char *arg)
{
switch (ch) {
@ -41,17 +86,9 @@ vhost_parse_arg(int ch, char *arg)
g_pid_path = arg;
break;
case 'S':
spdk_vhost_set_socket_path(arg);
g_socket_path = arg;
break;
default:
return -EINVAL;
}
return 0;
}
static void
vhost_started(void *arg1)
{
}
int
@ -60,21 +97,18 @@ main(int argc, char *argv[])
struct spdk_app_opts opts = {};
int rc;
spdk_app_opts_init(&opts, sizeof(opts));
opts.name = "vhost";
vhost_app_opts_init(&opts);
if ((rc = spdk_app_parse_args(argc, argv, &opts, "f:S:", NULL,
vhost_parse_arg, vhost_usage)) !=
SPDK_APP_PARSE_ARGS_SUCCESS) {
exit(rc);
}
spdk_app_parse_args(argc, argv, &opts, "f:S:", vhost_parse_arg, vhost_usage);
if (g_pid_path) {
save_pid(g_pid_path);
}
opts.shutdown_cb = spdk_vhost_shutdown_cb;
/* Blocks until the application is exiting */
rc = spdk_app_start(&opts, vhost_started, NULL);
rc = spdk_app_start(&opts, spdk_vhost_startup, (void *)g_socket_path, NULL);
spdk_app_fini();


@ -1,70 +1,100 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
set -e
rootdir=$(readlink -f $(dirname $0))
source "$rootdir/scripts/autotest_common.sh"
source "$rootdir/test/common/autobuild_common.sh"
out=$PWD
SPDK_TEST_AUTOBUILD=${SPDK_TEST_AUTOBUILD:-}
umask 022
cd $rootdir
# Print some test system info out for the log
date -u
git describe --tags
if [ $SPDK_RUN_ASAN -eq 1 ]; then
run_test "asan" echo "using asan"
fi
timing_enter autobuild
if [ $SPDK_RUN_UBSAN -eq 1 ]; then
run_test "ubsan" echo "using ubsan"
fi
./configure $config_params
if [ -n "$SPDK_TEST_NATIVE_DPDK" ]; then
build_native_dpdk
timing_enter check_format
if [ $SPDK_RUN_CHECK_FORMAT -eq 1 ]; then
./scripts/check_format.sh
fi
timing_exit check_format
case "$SPDK_TEST_AUTOBUILD" in
full)
$rootdir/configure $config_params
echo "** START ** Info for Hostname: $HOSTNAME"
uname -a
$MAKE cc_version
$MAKE cxx_version
echo "** END ** Info for Hostname: $HOSTNAME"
;;
ext | tiny | "") ;;
*)
echo "ERROR: supported values for SPDK_TEST_AUTOBUILD are 'full', 'tiny' and 'ext'"
timing_enter build_kmod
if [ $SPDK_BUILD_IOAT_KMOD -eq 1 ]; then
./scripts/build_kmod.sh build
fi
timing_exit build_kmod
scanbuild=''
if [ $SPDK_RUN_SCANBUILD -eq 1 ] && hash scan-build; then
scanbuild="scan-build -o $out/scan-build-tmp --status-bugs"
fi
echo $scanbuild
$MAKE $MAKEFLAGS clean
timing_enter scanbuild_make
fail=0
time $scanbuild $MAKE $MAKEFLAGS || fail=1
if [ $fail -eq 1 ]; then
if [ -d $out/scan-build-tmp ]; then
scanoutput=$(ls -1 $out/scan-build-tmp/)
mv $out/scan-build-tmp/$scanoutput $out/scan-build
rm -rf $out/scan-build-tmp
chmod -R a+rX $out/scan-build
fi
exit 1
;;
esac
if [[ $SPDK_TEST_OCF -eq 1 ]]; then
ocf_precompile
fi
if [[ $SPDK_TEST_FUZZER -eq 1 ]]; then
llvm_precompile
fi
if [[ -n $SPDK_TEST_AUTOBUILD ]]; then
autobuild_test_suite
elif [[ $SPDK_TEST_UNITTEST -eq 1 ]]; then
unittest_build
elif [[ $SPDK_TEST_SCANBUILD -eq 1 ]]; then
scanbuild_make
else
if [[ $SPDK_TEST_FUZZER -eq 1 ]]; then
# if we are testing nvmf fuzz with llvm lib, --with-shared will cause lib link fail
$rootdir/configure $config_params
else
# if we aren't testing the unittests, build with shared objects.
$rootdir/configure $config_params --with-shared
rm -rf $out/scan-build-tmp
fi
run_test "make" $MAKE $MAKEFLAGS
timing_exit scanbuild_make
# Check for generated files that are not listed in .gitignore
if [ `git status --porcelain | wc -l` -ne 0 ]; then
echo "Generated files missing from .gitignore:"
git status --porcelain
exit 1
fi
# Check that header file dependencies are working correctly by
# capturing a binary's stat data before and after touching a
# header file and re-making.
STAT1=`stat examples/nvme/identify/identify`
sleep 1
touch lib/nvme/nvme_internal.h
$MAKE $MAKEFLAGS
STAT2=`stat examples/nvme/identify/identify`
if [ "$STAT1" == "$STAT2" ]; then
echo "Header dependency check failed"
exit 1
fi
# Test 'make install'
rm -rf /tmp/spdk
mkdir /tmp/spdk
$MAKE $MAKEFLAGS install DESTDIR=/tmp/spdk prefix=/usr
ls -lR /tmp/spdk
rm -rf /tmp/spdk
timing_enter doxygen
if [ $SPDK_BUILD_DOC -eq 1 ] && hash doxygen; then
(cd "$rootdir"/doc; $MAKE $MAKEFLAGS) &> "$out"/doxygen.log
if hash pdflatex; then
(cd "$rootdir"/doc/output/latex && $MAKE $MAKEFLAGS) &>> "$out"/doxygen.log
fi
mkdir -p "$out"/doc
mv "$rootdir"/doc/output/html "$out"/doc
if [ -f "$rootdir"/doc/output/latex/refman.pdf ]; then
mv "$rootdir"/doc/output/latex/refman.pdf "$out"/doc/spdk.pdf
fi
(cd "$rootdir"/doc; $MAKE $MAKEFLAGS clean) &>> "$out"/doxygen.log
rm -rf "$rootdir"/doc/output
fi
timing_exit doxygen
timing_exit autobuild


@ -1,54 +1,54 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
set -xe
rootdir=$(readlink -f $(dirname $0))
source "$rootdir/test/common/autobuild_common.sh"
source "$rootdir/scripts/autotest_common.sh"
out=$PWD
MAKEFLAGS=${MAKEFLAGS:--j16}
cd $rootdir
timing_enter porcelain_check
if [[ -e $rootdir/mk/config.mk ]]; then
$MAKE clean
fi
timing_enter autopackage
if [ $(git status --porcelain --ignore-submodules | wc -l) -ne 0 ]; then
$MAKE clean
if [ `git status --porcelain | wc -l` -ne 0 ]; then
echo make clean left the following files:
git status --porcelain --ignore-submodules
git status --porcelain
exit 1
fi
timing_exit porcelain_check
if [[ $SPDK_TEST_RELEASE_BUILD -eq 1 ]]; then
build_packaging
$MAKE clean
spdk_pv=spdk-$(date +%Y_%m_%d)
spdk_tarball=${spdk_pv}.tar
dpdk_pv=dpdk-$(date +%Y_%m_%d)
dpdk_tarball=${dpdk_pv}.tar
find . -iname "spdk-*.tar* dpdk-*.tar*" -delete
git archive HEAD^{tree} --prefix=${spdk_pv}/ -o ${spdk_tarball}
# Build from packaged source
tmpdir=$(mktemp -d)
echo "tmpdir=$tmpdir"
tar -C "$tmpdir" -xf $spdk_tarball
if [ -z "$WITH_DPDK_DIR" ]; then
cd dpdk
git archive HEAD^{tree} --prefix=dpdk/ -o ../${dpdk_tarball}
cd ..
tar -C "$tmpdir/${spdk_pv}" -xf $dpdk_tarball
fi
if [[ $RUN_NIGHTLY -eq 0 || $SPDK_TEST_UNITTEST -eq 0 ]]; then
timing_finish
exit 0
fi
(
cd "$tmpdir"/spdk-*
# use $config_params to get the right dependency options, but disable coverage and ubsan
# explicitly since they are not needed for this build
./configure $config_params --disable-debug --enable-werror --disable-coverage --disable-ubsan
time $MAKE ${MAKEFLAGS}
)
rm -rf "$tmpdir"
timing_enter build_release
config_params="$(get_config_params | sed 's/--enable-debug//g')"
if [ $(uname -s) = Linux ]; then
# LTO needs a special compiler to work under clang. See detect_cc.sh for details.
if [[ $CC == *clang* ]]; then
LD=$(type -P ld.gold)
export LD
fi
$rootdir/configure $config_params --enable-lto
else
# LTO needs a special compiler to work on BSD.
$rootdir/configure $config_params
fi
$MAKE ${MAKEFLAGS}
$MAKE ${MAKEFLAGS} clean
timing_exit build_release
timing_exit autopackage
timing_finish


@ -1,32 +1,12 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2016 Intel Corporation
# All rights reserved.
#
set -e
rootdir=$(readlink -f $(dirname $0))
default_conf=~/autorun-spdk.conf
conf=${1:-${default_conf}}
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $conf ]]; then
echo "ERROR: $conf doesn't exist"
exit 1
fi
source "$conf"
echo "Test configuration:"
cat "$conf"
conf=~/autorun-spdk.conf
# Runs agent scripts
$rootdir/autobuild.sh "$conf"
if ((SPDK_TEST_UNITTEST == 1 || SPDK_RUN_FUNCTIONAL_TEST == 1)); then
sudo -E $rootdir/autotest.sh "$conf"
fi
if [[ $SPDK_TEST_AUTOBUILD != 'tiny' ]]; then
sudo $rootdir/autotest.sh "$conf"
$rootdir/autopackage.sh "$conf"
fi


@ -1,214 +1,70 @@
#! /usr/bin/python3
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2017 Intel Corporation.
# All rights reserved.
import shutil
import subprocess
import argparse
import itertools
import os
import sys
import glob
import re
import pandas as pd
def generateTestCompletionTableByTest(output_dir, data_table):
columns_to_group = ['Domain', 'Test', 'Agent']
total_tests_number = len(data_table.groupby('Test'))
has_agent = data_table['Agent'] != 'None'
data_table_with_agent = data_table[has_agent]
executed_tests = len(data_table_with_agent.groupby('Test'))
tests_executions = len(data_table_with_agent.groupby(columns_to_group))
pivot_by_test = pd.pivot_table(data_table, index=columns_to_group)
output_file = os.path.join(output_dir, 'post_process', 'completions_table_by_test.html')
with open(output_file, 'w') as f:
table_row = '<tr><td>{}</td><td>{}</td>\n'
f.write('<table>\n')
f.write(table_row.format('Total number of tests', total_tests_number))
f.write(table_row.format('Tests executed', executed_tests))
f.write(table_row.format('Number of test executions', tests_executions))
f.write('</table>\n')
f.write(pivot_by_test.to_html(None))
def generateTestCompletionTables(output_dir, completion_table):
data_table = pd.DataFrame(completion_table, columns=["Agent", "Domain", "Test", "With Asan", "With UBsan"])
data_table.to_html(os.path.join(output_dir, 'completions_table.html'))
os.makedirs(os.path.join(output_dir, "post_process"), exist_ok=True)
pivot_by_agent = pd.pivot_table(data_table, index=["Agent", "Domain", "Test"])
pivot_by_agent.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_agent.html'))
generateTestCompletionTableByTest(output_dir, data_table)
pivot_by_asan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With Asan"], aggfunc=any)
pivot_by_asan.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_asan.html'))
pivot_by_ubsan = pd.pivot_table(data_table, index=["Domain", "Test"], values=["With UBsan"], aggfunc=any)
pivot_by_ubsan.to_html(os.path.join(output_dir, "post_process", 'completions_table_by_ubsan.html'))
def generateCoverageReport(output_dir, repo_dir):
with open(os.path.join(output_dir, 'coverage.log'), 'w+') as log_file:
coveragePath = os.path.join(output_dir, '**', 'cov_total.info')
covfiles = [os.path.abspath(p) for p in glob.glob(coveragePath, recursive=True)]
covfiles = glob.glob(coveragePath, recursive=True)
for f in covfiles:
print(f)
print(f, file=log_file)
if len(covfiles) == 0:
return
lcov_opts = [
'--rc', 'lcov_branch_coverage=1',
'--rc', 'lcov_function_coverage=1',
'--rc', 'genhtml_branch_coverage=1',
'--rc', 'genhtml_function_coverage=1',
'--rc', 'genhtml_legend=1',
'--rc', 'geninfo_all_blocks=1',
'--rc lcov_branch_coverage=1',
'--rc lcov_function_coverage=1',
'--rc genhtml_branch_coverage=1',
'--rc genhtml_function_coverage=1',
'--rc genhtml_legend=1',
'--rc geninfo_all_blocks=1',
]
# HACK: This is a workaround for some odd CI assumptions
details = '--show-details'
cov_total = os.path.abspath(os.path.join(output_dir, 'cov_total.info'))
cov_total = os.path.join(output_dir, 'cov_total.info')
coverage = os.path.join(output_dir, 'coverage')
lcov = ['lcov', *lcov_opts, '-q', *itertools.chain(*[('-a', f) for f in covfiles]), '-o', cov_total]
genhtml = ['genhtml', *lcov_opts, '-q', cov_total, '--legend', '-t', 'Combined', *details.split(), '-o', coverage]
lcov = 'lcov' + ' ' + ' '.join(lcov_opts) + ' -q -a ' + ' -a '.join(covfiles) + ' -o ' + cov_total
genhtml = 'genhtml' + ' ' + ' '.join(lcov_opts) + ' -q ' + cov_total + ' --legend' + ' -t "Combined" --show-details -o ' + coverage
try:
subprocess.check_call(lcov)
subprocess.check_call([lcov], shell=True, stdout=log_file, stderr=log_file)
except subprocess.CalledProcessError as e:
print("lcov failed")
print(e)
print("lcov failed", file=log_file)
print(e, file=log_file)
return
with open(cov_total, 'r') as cov_total_file:
file_contents = cov_total_file.readlines()
cov_total_file = open(cov_total, 'r')
replacement = "SF:" + repo_dir
file_contents = cov_total_file.readlines()
cov_total_file.close()
os.remove(cov_total)
with open(cov_total, 'w+') as file:
for Line in file_contents:
Line = re.sub("^SF:.*/repo", replacement, Line)
file.write(Line + '\n')
try:
subprocess.check_call(genhtml)
subprocess.check_call([genhtml], shell=True, stdout=log_file, stderr=log_file)
except subprocess.CalledProcessError as e:
print("genhtml failed")
print(e)
for f in covfiles:
os.remove(f)
print("genhtml failed", file=log_file)
print(e, file=log_file)
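The tracefile post-processing above rewrites absolute `SF:` (source file) records in the combined lcov output so they point at the local repository checkout. That substitution can be sketched on its own; the `/repo` CI prefix and the sample paths below are illustrative:

```python
import re

def rewrite_sf_records(lines, repo_dir):
    """Rewrite lcov 'SF:' records so source paths point at repo_dir."""
    replacement = "SF:" + repo_dir
    return [re.sub(r"^SF:.*/repo", replacement, line) for line in lines]

records = ["TN:", "SF:/var/ci/workspace/repo/lib/nvme/nvme.c", "DA:1,1"]
print(rewrite_sf_records(records, "/home/user/spdk"))
# -> ['TN:', 'SF:/home/user/spdk/lib/nvme/nvme.c', 'DA:1,1']
```

Only lines starting with `SF:` and containing the CI prefix are touched; coverage data records (`DA:`, `TN:`) pass through unchanged.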
def collectOne(output_dir, dir_name):
dirs = glob.glob(os.path.join(output_dir, '*', dir_name))
dirs.sort()
if len(dirs) == 0:
def prepDocumentation(output_dir, repo_dir):
# Find one instance of 'doc' output directory and move it to the top level
docDirs = glob.glob(os.path.join(output_dir, '*', 'doc'))
docDirs.sort()
if len(docDirs) == 0:
return
# Collect first instance of dir_name and move it to the top level
collect_dir = dirs.pop(0)
shutil.move(collect_dir, os.path.join(output_dir, dir_name))
# Delete all other instances
for d in dirs:
shutil.rmtree(d)
print("docDirs: ", docDirs)
docDir = docDirs[0]
print("docDir: ", docDir)
shutil.move(docDir, os.path.join(output_dir, 'doc'))
def getCompletions(completionFile, test_list, test_completion_table):
agent_name = os.path.basename(os.path.dirname(completionFile))
with open(completionFile, 'r') as completionList:
completions = completionList.read()
asan_enabled = "asan" in completions
ubsan_enabled = "ubsan" in completions
for line in completions.splitlines():
try:
domain, test_name = line.strip().split()
test_list[test_name] = (True, asan_enabled | test_list[test_name][1], ubsan_enabled | test_list[test_name][2])
test_completion_table.append([agent_name, domain, test_name, asan_enabled, ubsan_enabled])
try:
test_completion_table.remove(["None", "None", test_name, False, False])
except ValueError:
continue
except KeyError:
continue
def printList(header, test_list, index, condition):
print("\n\n-----%s------" % header)
executed_tests = [x for x in sorted(test_list) if test_list[x][index] is condition]
print(*executed_tests, sep="\n")
def printListInformation(table_type, test_list):
printList("%s Executed in Build" % table_type, test_list, 0, True)
printList("%s Missing From Build" % table_type, test_list, 0, False)
printList("%s Missing ASAN" % table_type, test_list, 1, False)
printList("%s Missing UBSAN" % table_type, test_list, 2, False)
def getSkippedTests(repo_dir):
skipped_test_file = os.path.join(repo_dir, "test", "common", "skipped_tests.txt")
if not os.path.exists(skipped_test_file):
return []
with open(skipped_test_file, "r") as skipped_test_data:
return [x.strip() for x in skipped_test_data.readlines() if "#" not in x and x.strip() != '']
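Note that `getSkippedTests` drops any line containing a `#` anywhere, so a trailing comment disqualifies the whole entry. A sketch of that filter with made-up test names:

```python
def parse_skipped(lines):
    """Keep non-empty lines that contain no '#' (comments and blanks are dropped)."""
    return [x.strip() for x in lines if "#" not in x and x.strip() != ""]

data = ["# tests we never run per-patch", "nvme_cuse", "", "  ftl  ", "sma  # nightly only"]
print(parse_skipped(data))  # -> ['nvme_cuse', 'ftl']
```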
def confirmPerPatchTests(test_list, skiplist):
missing_tests = [x for x in sorted(test_list) if test_list[x][0] is False
and x not in skiplist]
if len(missing_tests) > 0:
print("Not all tests were run. Failing the build.")
print(missing_tests)
sys.exit(1)
def aggregateCompletedTests(output_dir, repo_dir, skip_confirm=False):
test_list = {}
test_completion_table = []
testFiles = glob.glob(os.path.join(output_dir, '**', 'all_tests.txt'), recursive=True)
completionFiles = glob.glob(os.path.join(output_dir, '**', 'test_completions.txt'), recursive=True)
if len(testFiles) == 0:
print("Unable to perform test completion aggregator. No input files.")
return 0
with open(testFiles[0], 'r') as raw_test_list:
for line in raw_test_list:
try:
test_name = line.strip()
except Exception:
print("Failed to parse a test type.")
return 1
test_list[test_name] = (False, False, False)
test_completion_table.append(["None", "None", test_name, False, False])
for completionFile in completionFiles:
getCompletions(completionFile, test_list, test_completion_table)
printListInformation("Tests", test_list)
generateTestCompletionTables(output_dir, test_completion_table)
skipped_tests = getSkippedTests(repo_dir)
if not skip_confirm:
confirmPerPatchTests(test_list, skipped_tests)
return 0
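In `getCompletions` each test starts as `(False, False, False)` (not executed, no ASAN, no UBSAN) and every agent's completion file ORs its sanitizer flags in, so a single ASAN run on any agent marks the test ASAN-covered. A standalone sketch of that aggregation (agent and test names are illustrative):

```python
def aggregate(test_names, completion_runs):
    """completion_runs: list of (executed_test_names, asan_enabled, ubsan_enabled) per agent."""
    tests = {name: (False, False, False) for name in test_names}
    for executed, asan, ubsan in completion_runs:
        for name in executed:
            if name in tests:
                done, a, u = tests[name]
                # OR the flags so coverage from any one agent counts.
                tests[name] = (True, a | asan, u | ubsan)
    return tests

runs = [(["unittest", "nvme"], True, False), (["nvme"], False, True)]
print(aggregate(["unittest", "nvme", "vhost"], runs))
# -> {'unittest': (True, True, False), 'nvme': (True, True, True), 'vhost': (False, False, False)}
```

A test left at `(False, ...)` after all files are processed is what `confirmPerPatchTests` reports as missing (unless it is on the skip list).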
def main(output_dir, repo_dir, skip_confirm=False):
print("-----Begin Post Process Script------")
def main(output_dir, repo_dir):
generateCoverageReport(output_dir, repo_dir)
collectOne(output_dir, 'doc')
collectOne(output_dir, 'ut_coverage')
aggregateCompletedTests(output_dir, repo_dir, skip_confirm)
prepDocumentation(output_dir, repo_dir)
if __name__ == "__main__":
@ -217,7 +73,5 @@ if __name__ == "__main__":
help="The location of your build's output directory")
parser.add_argument("-r", "--repo_directory", type=str, required=True,
help="The location of your spdk repository")
parser.add_argument("-s", "--skip_confirm", required=False, action="store_true",
help="Do not check if all autotest.sh tests were executed.")
args = parser.parse_args()
main(args.directory_location, args.repo_directory, args.skip_confirm)
main(args.directory_location, args.repo_directory)


@ -1,72 +1,34 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
rootdir=$(readlink -f $(dirname $0))
# In autotest_common.sh all tests are disabled by default.
# If the configuration of tests is not provided, no tests will be carried out.
if [[ ! -f $1 ]]; then
echo "ERROR: SPDK test configuration not specified"
exit 1
fi
# Autotest.sh, as part of autorun.sh, runs in a different
# shell process than autobuild.sh. Use helper file to pass
# over env variable containing libraries paths.
if [[ -e /tmp/spdk-ld-path ]]; then
source /tmp/spdk-ld-path
fi
source "$1"
source "$rootdir/test/common/autotest_common.sh"
source "$rootdir/scripts/autotest_common.sh"
source "$rootdir/test/nvmf/common.sh"
set -xe
if [ $EUID -ne 0 ]; then
echo "$0 must be run as root"
exit 1
fi
if [ $(uname -s) = Linux ]; then
old_core_pattern=$(< /proc/sys/kernel/core_pattern)
mkdir -p "$output_dir/coredumps"
# Set core_pattern to a known value to avoid ABRT, systemd-coredump, etc.
# Dump the $output_dir path to a file so collector can pick it up while executing.
# We don't set it in the core_pattern command line because of the string length limitation
# of 128 bytes. See 'man core 5' for details.
echo "|$rootdir/scripts/core-collector.sh %P %s %t" > /proc/sys/kernel/core_pattern
echo "$output_dir/coredumps" > "$rootdir/.coredump_path"
# make sure nbd (network block device) driver is loaded if it is available
# this ensures that when tests need to use nbd, it will be fully initialized
modprobe nbd || true
if udevadm=$(type -P udevadm); then
"$udevadm" monitor --property &> "$output_dir/udev.log" &
udevadm_pid=$!
fi
# set core_pattern to a known value to avoid ABRT, systemd-coredump, etc.
echo "core" > /proc/sys/kernel/core_pattern
fi
trap "autotest_cleanup || :; exit 1" SIGINT SIGTERM EXIT
trap "process_core; $rootdir/scripts/setup.sh reset; exit 1" SIGINT SIGTERM EXIT
timing_enter autotest
create_test_list
src=$(readlink -f $(dirname $0))
out=$output_dir
out=$PWD
cd $src
freebsd_update_contigmem_mod
freebsd_set_maxsock_buf
./scripts/setup.sh status
# lcov takes considerable time to process clang coverage.
# Disabling lcov allows us to avoid this.
# More information: https://github.com/spdk/spdk/issues/1693
CC_TYPE=$(grep CC_TYPE mk/cc.mk)
if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
if hash lcov; then
# setup output dir for unittest.sh
export UT_COVERAGE=$out/ut_coverage
export LCOV_OPTS="
--rc lcov_branch_coverage=1
--rc lcov_function_coverage=1
@ -76,310 +38,188 @@ if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
--rc geninfo_all_blocks=1
"
export LCOV="lcov $LCOV_OPTS --no-external"
# Print lcov version to log
$LCOV -v
# zero out coverage data
$LCOV -q -c -i -t "Baseline" -d $src -o $out/cov_base.info
$LCOV -q -c -i -t "Baseline" -d $src -o cov_base.info
fi
# Make sure the disks are clean (no leftover partition tables)
timing_enter pre_cleanup
timing_enter cleanup
# Remove old domain socket pathname just in case
rm -f /var/tmp/spdk*.sock
if [ $(uname -s) = Linux ]; then
# Load the kernel driver
$rootdir/scripts/setup.sh reset
./scripts/setup.sh reset
get_zoned_devs
# Let the kernel discover any filesystems or partitions
sleep 10
if ((${#zoned_devs[@]} > 0)); then
# FIXME: For now make sure zoned devices are tested on-demand by
# designated tests instead of falling into any other. The main
# concern here is fio workloads, where a specific configuration
# must be in place for them to work with zoned devices.
export PCI_BLOCKED="${zoned_devs[*]}"
export PCI_ZONED="${zoned_devs[*]}"
fi
# Delete all leftover lvols and gpt partitions
# Matches both /dev/nvmeXnY on Linux and /dev/nvmeXnsY on BSD
# Filter out nvme with partitions - the "p*" suffix
for dev in $(ls /dev/nvme*n* | grep -v p || true); do
# Skip zoned devices as non-sequential IO will always fail
[[ -z ${zoned_devs["${dev##*/}"]} ]] || continue
if ! block_in_use "$dev"; then
dd if=/dev/zero of="$dev" bs=1M count=1
fi
# Delete all partitions on NVMe devices
devs=`lsblk -l -o NAME | grep nvme | grep -v p` || true
for dev in $devs; do
parted -s /dev/$dev mklabel msdos
done
sync
if ! xtrace_disable_per_cmd reap_spdk_processes; then
echo "WARNING: Lingering SPDK processes were detected. Testing environment may be unstable" >&2
# Load RAM disk driver if available
modprobe brd || true
fi
if [ $(uname -s) = Linux ]; then
run_test "setup.sh" "$rootdir/test/setup/test-setup.sh"
fi
$rootdir/scripts/setup.sh status
if [[ $(uname -s) == Linux ]]; then
# Revert NVMe namespaces to default state
nvme_namespace_revert
fi
timing_exit pre_cleanup
timing_exit cleanup
# set up huge pages
timing_enter afterboot
$rootdir/scripts/setup.sh
./scripts/setup.sh
timing_exit afterboot
# Revert existing OPAL to factory settings that may have been left from earlier failed tests.
# This ensures we won't hit any unexpected failures due to NVMe SSDs being locked.
opal_revert_cleanup
timing_enter nvmf_setup
rdma_device_init
timing_exit nvmf_setup
if [ $SPDK_TEST_RBD -eq 1 ]; then
timing_enter rbd_setup
rbd_setup
timing_exit rbd_setup
fi
#####################
# Unit Tests
#####################
if [ $SPDK_TEST_UNITTEST -eq 1 ]; then
run_test "unittest" $rootdir/test/unit/unittest.sh
timing_enter unittest
run_test ./unittest.sh
timing_exit unittest
fi
if [ $SPDK_RUN_FUNCTIONAL_TEST -eq 1 ]; then
if [[ $SPDK_TEST_CRYPTO -eq 1 || $SPDK_TEST_VBDEV_COMPRESS -eq 1 ]]; then
if [[ $SPDK_TEST_USE_IGB_UIO -eq 1 ]]; then
$rootdir/scripts/qat_setup.sh igb_uio
else
$rootdir/scripts/qat_setup.sh
fi
fi
timing_enter lib
run_test "env" $rootdir/test/env/env.sh
run_test "rpc" $rootdir/test/rpc/rpc.sh
run_test "rpc_client" $rootdir/test/rpc_client/rpc_client.sh
run_test "json_config" $rootdir/test/json_config/json_config.sh
run_test "json_config_extra_key" $rootdir/test/json_config/json_config_extra_key.sh
run_test "alias_rpc" $rootdir/test/json_config/alias_rpc/alias_rpc.sh
run_test "spdkcli_tcp" $rootdir/test/spdkcli/tcp.sh
run_test "dpdk_mem_utility" $rootdir/test/dpdk_memory_utility/test_dpdk_mem_info.sh
run_test "event" $rootdir/test/event/event.sh
run_test "thread" $rootdir/test/thread/thread.sh
run_test "accel" $rootdir/test/accel/accel.sh
run_test "app_cmdline" $rootdir/test/app/cmdline.sh
if [ $SPDK_TEST_BLOCKDEV -eq 1 ]; then
run_test "blockdev_general" $rootdir/test/bdev/blockdev.sh
run_test "bdev_raid" $rootdir/test/bdev/bdev_raid.sh
run_test "bdevperf_config" $rootdir/test/bdev/bdevperf/test_config.sh
if [[ $(uname -s) == Linux ]]; then
run_test "reactor_set_interrupt" $rootdir/test/interrupt/reactor_set_interrupt.sh
run_test "reap_unregistered_poller" $rootdir/test/interrupt/reap_unregistered_poller.sh
# NOTE: disabled on SPDK v18.01.x branch when ASAN is enabled
if [ $SPDK_RUN_ASAN -eq 0 ]; then
run_test test/lib/bdev/blockdev.sh
fi
fi
if [[ $(uname -s) == Linux ]]; then
if [[ $SPDK_TEST_BLOCKDEV -eq 1 || $SPDK_TEST_URING -eq 1 ]]; then
# The crypto job also includes the SPDK_TEST_BLOCKDEV in its configuration hence the
# dd tests are executed there as well. However, these tests can take a significant
# amount of time to complete (up to 4min) on a physical system leading to a potential
# job timeout. Avoid that by skipping these tests - this should not affect the coverage
# since dd tests are still run as part of the vg jobs.
if [[ $SPDK_TEST_CRYPTO -eq 0 ]]; then
run_test "spdk_dd" $rootdir/test/dd/dd.sh
fi
fi
if [ $SPDK_TEST_EVENT -eq 1 ]; then
run_test test/lib/event/event.sh
fi
if [ $SPDK_TEST_NVME -eq 1 ]; then
run_test "blockdev_nvme" $rootdir/test/bdev/blockdev.sh "nvme"
if [[ $(uname -s) == Linux ]]; then
run_test "blockdev_nvme_gpt" $rootdir/test/bdev/blockdev.sh "gpt"
fi
run_test "nvme" $rootdir/test/nvme/nvme.sh
if [[ $SPDK_TEST_NVME_PMR -eq 1 ]]; then
run_test "nvme_pmr" $rootdir/test/nvme/nvme_pmr.sh
fi
if [[ $SPDK_TEST_NVME_SCC -eq 1 ]]; then
run_test "nvme_scc" $rootdir/test/nvme/nvme_scc.sh
fi
if [[ $SPDK_TEST_NVME_BP -eq 1 ]]; then
run_test "nvme_bp" $rootdir/test/nvme/nvme_bp.sh
fi
if [[ $SPDK_TEST_NVME_CUSE -eq 1 ]]; then
run_test "nvme_cuse" $rootdir/test/nvme/cuse/nvme_cuse.sh
fi
if [[ $SPDK_TEST_NVME_CMB -eq 1 ]]; then
run_test "nvme_cmb" $rootdir/test/nvme/cmb/cmb.sh
fi
if [[ $SPDK_TEST_NVME_FDP -eq 1 ]]; then
run_test "nvme_fdp" test/nvme/nvme_fdp.sh
fi
if [[ $SPDK_TEST_NVME_ZNS -eq 1 ]]; then
run_test "nvme_zns" $rootdir/test/nvme/zns/zns.sh
fi
run_test "nvme_rpc" $rootdir/test/nvme/nvme_rpc.sh
run_test "nvme_rpc_timeouts" $rootdir/test/nvme/nvme_rpc_timeouts.sh
run_test test/lib/nvme/nvme.sh
# Only test hotplug without ASAN enabled. Since if it is
# enabled, it catches SEGV earlier than our handler which
# breaks the hotplug logic.
if [ $SPDK_RUN_ASAN -eq 0 ] && [ $(uname -s) = Linux ]; then
run_test "sw_hotplug" $rootdir/test/nvme/sw_hotplug.sh
# breaks the hotplug logic
if [ $SPDK_RUN_ASAN -eq 0 ]; then
run_test test/lib/nvme/hotplug.sh intel
fi
fi
if [[ $SPDK_TEST_XNVME -eq 1 ]]; then
run_test "nvme_xnvme" $rootdir/test/nvme/xnvme/xnvme.sh
run_test "blockdev_xnvme" $rootdir/test/bdev/blockdev.sh "xnvme"
# Run ublk with xnvme since they have similar kernel dependencies
run_test "ublk" $rootdir/test/ublk/ublk.sh
fi
fi
run_test test/lib/env/env.sh
if [ $SPDK_TEST_IOAT -eq 1 ]; then
run_test "ioat" $rootdir/test/ioat/ioat.sh
run_test test/lib/ioat/ioat.sh
fi
timing_exit lib
if [ $SPDK_TEST_ISCSI -eq 1 ]; then
run_test "iscsi_tgt" $rootdir/test/iscsi_tgt/iscsi_tgt.sh
run_test "spdkcli_iscsi" $rootdir/test/spdkcli/iscsi.sh
# Run raid spdkcli test under iSCSI since blockdev tests run on systems that can't run spdkcli yet
run_test "spdkcli_raid" $rootdir/test/spdkcli/raid.sh
run_test ./test/iscsi_tgt/iscsi_tgt.sh
fi
if [ $SPDK_TEST_BLOBFS -eq 1 ]; then
run_test "rocksdb" $rootdir/test/blobfs/rocksdb/rocksdb.sh
run_test "blobstore" $rootdir/test/blobstore/blobstore.sh
run_test "blobstore_grow" $rootdir/test/blobstore/blobstore_grow/blobstore_grow.sh
run_test "blobfs" $rootdir/test/blobfs/blobfs.sh
run_test "hello_blob" $SPDK_EXAMPLE_DIR/hello_blob \
examples/blob/hello_world/hello_blob.json
run_test ./test/blobfs/rocksdb/rocksdb.sh
run_test ./test/blobstore/blobstore.sh
fi
if [ $SPDK_TEST_NVMF -eq 1 ]; then
export NET_TYPE
# The NVMe-oF run test cases are split out like this so that the parser that compiles the
# list of all tests can properly differentiate them. Please do not merge them into one line.
if [ "$SPDK_TEST_NVMF_TRANSPORT" = "rdma" ]; then
run_test "nvmf_rdma" $rootdir/test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_rdma" $rootdir/test/spdkcli/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
elif [ "$SPDK_TEST_NVMF_TRANSPORT" = "tcp" ]; then
run_test "nvmf_tcp" $rootdir/test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
if [[ $SPDK_TEST_URING -eq 0 ]]; then
run_test "spdkcli_nvmf_tcp" $rootdir/test/spdkcli/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "nvmf_identify_passthru" $rootdir/test/nvmf/target/identify_passthru.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
fi
run_test "nvmf_dif" $rootdir/test/nvmf/target/dif.sh
run_test "nvmf_abort_qd_sizes" $rootdir/test/nvmf/target/abort_qd_sizes.sh
elif [ "$SPDK_TEST_NVMF_TRANSPORT" = "fc" ]; then
run_test "nvmf_fc" $rootdir/test/nvmf/nvmf.sh --transport=$SPDK_TEST_NVMF_TRANSPORT
run_test "spdkcli_nvmf_fc" $rootdir/test/spdkcli/nvmf.sh
else
echo "unknown NVMe transport, please specify rdma, tcp, or fc."
exit 1
fi
run_test ./test/nvmf/nvmf.sh
fi
if [ $SPDK_TEST_VHOST -eq 1 ]; then
run_test "vhost" $rootdir/test/vhost/vhost.sh
timing_enter vhost
timing_enter negative
run_test ./test/vhost/spdk_vhost.sh --negative
timing_exit negative
if [ $RUN_NIGHTLY -eq 1 ]; then
timing_enter integrity_blk
run_test ./test/vhost/spdk_vhost.sh --integrity-blk
timing_exit integrity_blk
timing_enter integrity
run_test ./test/vhost/spdk_vhost.sh --integrity
timing_exit integrity
timing_enter readonly
run_test ./test/vhost/spdk_vhost.sh --readonly
timing_exit readonly
timing_enter fs_integrity_scsi
run_test ./test/vhost/spdk_vhost.sh --fs-integrity-scsi
timing_exit fs_integrity_scsi
timing_enter fs_integrity_blk
run_test ./test/vhost/spdk_vhost.sh --fs-integrity-blk
timing_exit fs_integrity_blk
timing_enter integrity_lvol_scsi_nightly
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-scsi-nightly
timing_exit integrity_lvol_scsi_nightly
timing_enter integrity_lvol_blk_nightly
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-blk-nightly
timing_exit integrity_lvol_blk_nightly
fi
if [ $SPDK_TEST_VFIOUSER_QEMU -eq 1 ]; then
run_test "vfio_user_qemu" $rootdir/test/vfio_user/vfio_user.sh
timing_enter integrity_lvol_scsi
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-scsi
timing_exit integrity_lvol_scsi
timing_enter integrity_lvol_blk
run_test ./test/vhost/spdk_vhost.sh --integrity-lvol-blk
timing_exit integrity_lvol_blk
timing_exit vhost
fi
if [ $SPDK_TEST_LVOL -eq 1 ]; then
run_test "lvol" $rootdir/test/lvol/lvol.sh
run_test "blob_io_wait" $rootdir/test/blobstore/blob_io_wait/blob_io_wait.sh
timing_enter lvol
test_cases="1,50,51,52,53,100,101,102,250,251,252,253,255,"
test_cases+="300,301,450,451,452,550,600,601,650,651,700"
run_test ./test/lvol/lvol.sh --test-cases=$test_cases
timing_exit lvol
fi
if [ $SPDK_TEST_VHOST_INIT -eq 1 ]; then
timing_enter vhost_initiator
run_test "vhost_blockdev" $rootdir/test/vhost/initiator/blockdev.sh
run_test "spdkcli_virtio" $rootdir/test/spdkcli/virtio.sh
run_test "vhost_shared" $rootdir/test/vhost/shared/shared.sh
run_test "vhost_fuzz" $rootdir/test/vhost/fuzz/fuzz.sh
timing_exit vhost_initiator
run_test ./test/vhost/initiator/blockdev.sh
fi
if [ $SPDK_TEST_NVML -eq 1 ]; then
run_test ./test/pmem/pmem.sh
fi
timing_enter cleanup
if [ $SPDK_TEST_RBD -eq 1 ]; then
run_test "blockdev_rbd" $rootdir/test/bdev/blockdev.sh "rbd"
run_test "spdkcli_rbd" $rootdir/test/spdkcli/rbd.sh
rbd_cleanup
fi
if [ $SPDK_TEST_OCF -eq 1 ]; then
run_test "ocf" $rootdir/test/ocf/ocf.sh
./scripts/setup.sh reset
if [ $SPDK_BUILD_IOAT_KMOD -eq 1 ]; then
./scripts/build_kmod.sh clean
fi
if [ $SPDK_TEST_FTL -eq 1 ]; then
run_test "ftl" $rootdir/test/ftl/ftl.sh
fi
if [ $SPDK_TEST_VMD -eq 1 ]; then
run_test "vmd" $rootdir/test/vmd/vmd.sh
fi
if [ $SPDK_TEST_VBDEV_COMPRESS -eq 1 ]; then
run_test "compress_compdev" $rootdir/test/compress/compress.sh "compdev"
run_test "compress_isal" $rootdir/test/compress/compress.sh "isal"
fi
if [ $SPDK_TEST_OPAL -eq 1 ]; then
run_test "nvme_opal" $rootdir/test/nvme/nvme_opal.sh
fi
if [ $SPDK_TEST_CRYPTO -eq 1 ]; then
run_test "blockdev_crypto_aesni" $rootdir/test/bdev/blockdev.sh "crypto_aesni"
run_test "blockdev_crypto_sw" $rootdir/test/bdev/blockdev.sh "crypto_sw"
run_test "blockdev_crypto_qat" $rootdir/test/bdev/blockdev.sh "crypto_qat"
run_test "chaining" $rootdir/test/bdev/chaining.sh
fi
if [[ $SPDK_TEST_SCHEDULER -eq 1 ]]; then
run_test "scheduler" $rootdir/test/scheduler/scheduler.sh
fi
if [[ $SPDK_TEST_SMA -eq 1 ]]; then
run_test "sma" $rootdir/test/sma/sma.sh
fi
if [[ $SPDK_TEST_FUZZER -eq 1 ]]; then
run_test "llvm_fuzz" $rootdir/test/fuzz/llvm.sh
fi
if [[ $SPDK_TEST_RAID5 -eq 1 ]]; then
run_test "blockdev_raid5f" $rootdir/test/bdev/blockdev.sh "raid5f"
fi
fi
trap - SIGINT SIGTERM EXIT
timing_enter post_cleanup
autotest_cleanup
timing_exit post_cleanup
timing_exit cleanup
timing_exit autotest
chmod a+r $output_dir/timing.txt
[[ -f "$output_dir/udev.log" ]] && rm -f "$output_dir/udev.log"
trap - SIGINT SIGTERM EXIT
if hash lcov && ! [[ "$CC_TYPE" == *"clang"* ]]; then
# catch any stray core files
process_core
if hash lcov; then
# generate coverage data and combine with baseline
$LCOV -q -c -d $src -t "$(hostname)" -o $out/cov_test.info
$LCOV -q -a $out/cov_base.info -a $out/cov_test.info -o $out/cov_total.info
$LCOV -q -c -d $src -t "$(hostname)" -o cov_test.info
$LCOV -q -a cov_base.info -a cov_test.info -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/dpdk/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '/usr/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/examples/vmd/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/app/spdk_lspci/*' -o $out/cov_total.info
$LCOV -q -r $out/cov_total.info '*/app/spdk_top/*' -o $out/cov_total.info
owner=$(stat -c "%U" .)
sudo -u $owner git clean -f "*.gcda"
git clean -f "*.gcda"
rm -f cov_base.info cov_test.info OLD_STDOUT OLD_STDERR
fi

build/lib/.gitignore

@ -0,0 +1 @@
# Placeholder

configure

File diff suppressed because it is too large


@ -1,62 +0,0 @@
# Deprecation
## ABI and API Deprecation
This document details the policy for maintaining stability of SPDK ABI and API.
Major ABI version can change at most once for each quarterly SPDK release.
ABI versions are managed separately for each library and follow [Semantic Versioning](https://semver.org/).
API and ABI deprecation notices shall be posted in the next section.
Each entry must describe what will be removed and can suggest the future use or alternative.
Specific future SPDK release for the removal must be provided.
ABI cannot be removed without providing a deprecation notice for at least one SPDK release.
Deprecated code paths must be registered with `SPDK_DEPRECATION_REGISTER()` and logged with
`SPDK_LOG_DEPRECATED()`. The tag used with these macros will appear in the SPDK
log at the warn level when `SPDK_LOG_DEPRECATED()` is called, subject to rate limits.
The tags can be matched with the level 4 headers below.
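The rate-limited behavior described above can be modeled in a few lines. This is a conceptual Python sketch of a registered deprecation tag, not the actual SPDK C macros (whose signatures are not shown here); the tag, description, and rate limit are illustrative:

```python
import time

class Deprecation:
    """Conceptual model of a registered deprecation tag with rate-limited warnings."""

    def __init__(self, tag, description, remove_release, rate_limit_s):
        self.tag, self.description = tag, description
        self.remove_release = remove_release
        self.rate_limit_s = rate_limit_s
        self.hits = 0
        self._last_warn = None

    def log(self, now=None):
        """Count every hit, but emit a warning at most once per rate_limit_s."""
        now = time.monotonic() if now is None else now
        self.hits += 1
        if self._last_warn is None or now - self._last_warn >= self.rate_limit_s:
            self._last_warn = now
            return f"WARN: {self.tag}: {self.description} (removal: {self.remove_release})"
        return None  # suppressed by the rate limit

dep = Deprecation("vtune_support", "VTune integration is deprecated", "23.05", 60)
print(dep.log(now=0.0))   # warning emitted
print(dep.log(now=10.0))  # within the rate limit -> None
print(dep.log(now=70.0))  # limit elapsed, warning emitted again
```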
## Deprecation Notices
### PMDK
PMDK is no longer supported and integrations with it in SPDK are now deprecated, and will be removed in SPDK 23.05.
Please see: [UPDATE ON PMDK AND OUR LONG TERM SUPPORT STRATEGY](https://pmem.io/blog/2022/11/update-on-pmdk-and-our-long-term-support-strategy/).
### VTune
#### `vtune_support`
VTune integration is now deprecated and will be removed in SPDK 23.05.
### nvmf
#### `spdk_nvmf_qpair_disconnect`
Parameters `cb_fn` and `ctx` of `spdk_nvmf_qpair_disconnect` API are deprecated. These parameters
will be removed in 23.09 release.
### gpt
#### `old_gpt_guid`
Deprecated the SPDK partition type GUID `7c5222bd-8f5d-4087-9c00-bf9843c7b58c`. Partitions of this
type have bdevs created that are one block less than the actual size of the partition. Existing
partitions using the deprecated GUID can continue to use that GUID; support for the deprecated GUID
will remain in SPDK indefinitely, and will continue to exhibit the off-by-one bug so that on-disk
metadata layouts based on the incorrect size are not affected.
See GitHub issue [2801](https://github.com/spdk/spdk/issues/2801) for additional details on the bug.
New SPDK partition types should use GUID `6527994e-2c5a-4eec-9613-8f5944074e8b` which will create
a bdev of the correct size.
### lvol
#### `vbdev_lvol_rpc_req_size`
Param `size` in rpc commands `rpc_bdev_lvol_create` and `rpc_bdev_lvol_resize` is deprecated and
replaced by `size_in_mib`.
See GitHub issue [2346](https://github.com/spdk/spdk/issues/2346) for additional details.
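For illustration, a request using the replacement parameter might look like the sketch below. Only the `size` → `size_in_mib` rename comes from the notice above; the lvol store name, lvol name, and size value are made up:

```python
import json

# Hypothetical bdev_lvol_create request using the new parameter name.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "bdev_lvol_create",
    "params": {"lvs_name": "lvs0", "lvol_name": "lvol0", "size_in_mib": 128},
}
print(json.dumps(request))
```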

doc/.gitignore

@ -1,4 +1,2 @@
# changelog.md and deprecation.md is generated by Makefile
# changelog.md is generated by Makefile
changelog.md
deprecation.md
output/


@ -234,7 +234,7 @@ ALIASES =
# A mapping has the form "name=value". For example adding "class=itcl::class"
# will allow you to use the command class in the itcl::class meaning.
# TCL_SUBST =
TCL_SUBST =
# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources
# only. Doxygen will then generate output that is more tailored for C. For
@ -730,7 +730,7 @@ WARNINGS = YES
# will automatically be disabled.
# The default value is: YES.
WARN_IF_UNDOCUMENTED = NO
WARN_IF_UNDOCUMENTED = YES
# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for
# potential errors in the documentation, such as not documenting some parameters
@ -746,7 +746,7 @@ WARN_IF_DOC_ERROR = YES
# parameter documentation, but not about the absence of documentation.
# The default value is: NO.
WARN_NO_PARAMDOC = YES
WARN_NO_PARAMDOC = NO
# If the WARN_AS_ERROR tag is set to YES then doxygen will immediately stop when
# a warning is encountered.
@ -782,79 +782,29 @@ WARN_LOGFILE =
INPUT = ../include/spdk \
index.md \
# This list contains the top level pages listed in index.md. This list should
# remain in the same order as the contents of index.md. The order here also
# determines the order of these sections in the left-side navigation bar.
INPUT += \
intro.md \
concepts.md \
user_guides.md \
prog_guides.md \
general.md \
misc.md \
driver_modules.md \
tools.md \
ci_tools.md \
performance_reports.md \
# All remaining pages are listed here in alphabetical order by filename.
INPUT += \
about.md \
accel_fw.md \
applications.md \
bdev.md \
bdevperf.md \
bdev_module.md \
bdev_pg.md \
changelog.md \
concurrency.md \
directory_structure.md \
getting_started.md \
memory.md \
porting.md \
blob.md \
blobfs.md \
changelog.md \
compression.md \
concurrency.md \
containers.md \
deprecation.md \
distributions.md \
bdev.md \
event.md \
ftl.md \
gdb_macros.md \
getting_started.md \
idxd.md \
ioat.md \
iscsi.md \
jsonrpc.md \
jsonrpc_proxy.md \
libraries.md \
lvol.md \
memory.md \
notify.md \
nvme.md \
nvme_multipath.md \
nvme_spec.md \
nvme-cli.md \
nvmf.md \
nvmf_tgt_pg.md \
nvmf_tracing.md \
nvmf_multipath_howto.md \
overview.md \
peer_2_peer.md \
pkgconfig.md \
porting.md \
rpm.md \
scheduler.md \
shfmt.md \
sma.md \
spdkcli.md \
spdk_top.md \
ssd_internals.md \
system_configuration.md \
ublk.md \
usdt.md \
userspace.md \
vagrant.md \
vhost.md \
vhost_processing.md \
virtio.md \
vmd.md
virtio.md
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses
@@ -951,7 +901,7 @@ EXAMPLE_RECURSIVE = NO
# that contain images that are to be included in the documentation (see the
# \image command).
IMAGE_PATH = img
IMAGE_PATH =
# The INPUT_FILTER tag can be used to specify a program that doxygen should
# invoke to filter for each input file. Doxygen will invoke the filter program
@@ -1111,7 +1061,7 @@ ALPHABETICAL_INDEX = YES
# Minimum value: 1, maximum value: 20, default value: 5.
# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.
# COLS_IN_ALPHA_INDEX = 5
COLS_IN_ALPHA_INDEX = 5
# In case all classes in a project start with a common prefix, all classes will
# be put under the same header in the alphabetical index. The IGNORE_PREFIX tag
@@ -1208,7 +1158,7 @@ HTML_EXTRA_STYLESHEET = stylesheet.css
# files will be copied as-is; there are no commands or markers available.
# This tag requires that the tag GENERATE_HTML is set to YES.
HTML_EXTRA_FILES = two.min.js
HTML_EXTRA_FILES =
# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen
# will adjust the colors in the style sheet and background images according to
@@ -1247,7 +1197,7 @@ HTML_COLORSTYLE_GAMMA = 80
# The default value is: NO.
# This tag requires that the tag GENERATE_HTML is set to YES.
HTML_TIMESTAMP = NO
HTML_TIMESTAMP = YES
# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML
# documentation will contain sections that can be hidden and shown after the
@@ -1519,6 +1469,17 @@ EXT_LINKS_IN_WINDOW = NO
FORMULA_FONTSIZE = 10
# Use the FORMULA_TRANPARENT tag to determine whether or not the images
# generated for formulas are transparent PNGs. Transparent PNGs are not
# supported properly for IE 6.0, but are supported on all modern browsers.
#
# Note that when changing this option you need to delete any form_*.png files in
# the HTML output directory before the changes have effect.
# The default value is: YES.
# This tag requires that the tag GENERATE_HTML is set to YES.
FORMULA_TRANSPARENT = YES
# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see
# http://www.mathjax.org) which uses client side Javascript for the rendering
# instead of using pre-rendered bitmaps. Use this if you do not have LaTeX
@@ -1587,7 +1548,7 @@ MATHJAX_CODEFILE =
# The default value is: YES.
# This tag requires that the tag GENERATE_HTML is set to YES.
SEARCHENGINE = YES
SEARCHENGINE = NO
# When the SERVER_BASED_SEARCH tag is enabled the search engine will be
# implemented using a web server instead of a web client using Javascript. There
@@ -1661,7 +1622,7 @@ EXTRA_SEARCH_MAPPINGS =
# If the GENERATE_LATEX tag is set to YES, doxygen will generate LaTeX output.
# The default value is: YES.
GENERATE_LATEX = NO
GENERATE_LATEX = YES
# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. If a
# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
@@ -1797,6 +1758,16 @@ LATEX_BATCHMODE = YES
LATEX_HIDE_INDICES = NO
# If the LATEX_SOURCE_CODE tag is set to YES then doxygen will include source
# code with syntax highlighting in the LaTeX output.
#
# Note that which sources are shown also depends on other settings such as
# SOURCE_BROWSER.
# The default value is: NO.
# This tag requires that the tag GENERATE_LATEX is set to YES.
LATEX_SOURCE_CODE = NO
# The LATEX_BIB_STYLE tag can be used to specify the style to use for the
# bibliography, e.g. plainnat, or ieeetr. See
# http://en.wikipedia.org/wiki/BibTeX and \cite for more info.
@@ -1869,6 +1840,16 @@ RTF_STYLESHEET_FILE =
RTF_EXTENSIONS_FILE =
# If the RTF_SOURCE_CODE tag is set to YES then doxygen will include source code
# with syntax highlighting in the RTF output.
#
# Note that which sources are shown also depends on other settings such as
# SOURCE_BROWSER.
# The default value is: NO.
# This tag requires that the tag GENERATE_RTF is set to YES.
RTF_SOURCE_CODE = NO
#---------------------------------------------------------------------------
# Configuration options related to the man page output
#---------------------------------------------------------------------------
@@ -1958,6 +1939,15 @@ GENERATE_DOCBOOK = NO
DOCBOOK_OUTPUT = docbook
# If the DOCBOOK_PROGRAMLISTING tag is set to YES, doxygen will include the
# program listings (including syntax highlighting and cross-referencing
# information) to the DOCBOOK output. Note that enabling this will significantly
# increase the size of the DOCBOOK output.
# The default value is: NO.
# This tag requires that the tag GENERATE_DOCBOOK is set to YES.
DOCBOOK_PROGRAMLISTING = NO
#---------------------------------------------------------------------------
# Configuration options for the AutoGen Definitions output
#---------------------------------------------------------------------------
@@ -2136,12 +2126,21 @@ EXTERNAL_PAGES = YES
# interpreter (i.e. the result of 'which perl').
# The default file (with absolute path) is: /usr/bin/perl.
# PERL_PATH = /usr/bin/perl
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------
# If the CLASS_DIAGRAMS tag is set to YES, doxygen will generate a class diagram
# (in HTML and LaTeX) for classes with base or super classes. Setting the tag to
# NO turns the diagrams off. Note that this option also works with HAVE_DOT
# disabled, but it is recommended to install and use dot, since it yields more
# powerful graphs.
# The default value is: YES.
CLASS_DIAGRAMS = YES
# You can define message sequence charts within doxygen comments using the \msc
# command. Doxygen will then run the mscgen tool (see:
# http://www.mcternan.me.uk/mscgen/)) to produce the chart and insert it in the
@@ -2149,7 +2148,7 @@ EXTERNAL_PAGES = YES
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
# MSCGEN_PATH =
MSCGEN_PATH =
# You can include diagrams made with dia in doxygen documentation. Doxygen will
# then run dia to produce the diagram and insert it in the documentation. The
@@ -2183,6 +2182,23 @@ HAVE_DOT = YES
DOT_NUM_THREADS = 0
# When you want a differently looking font in the dot files that doxygen
# generates you can specify the font name using DOT_FONTNAME. You need to make
# sure dot is able to find the font, which can be done by putting it in a
# standard location or by setting the DOTFONTPATH environment variable or by
# setting DOT_FONTPATH to the directory containing the font.
# The default value is: Helvetica.
# This tag requires that the tag HAVE_DOT is set to YES.
DOT_FONTNAME = Helvetica
# The DOT_FONTSIZE tag can be used to set the size (in points) of the font of
# dot graphs.
# Minimum value: 4, maximum value: 24, default value: 10.
# This tag requires that the tag HAVE_DOT is set to YES.
DOT_FONTSIZE = 10
# By default doxygen will tell dot to use the default font as specified with
# DOT_FONTNAME. If you specify a different font using DOT_FONTNAME you can set
# the path where dot can find it using this tag.
@@ -2395,6 +2411,18 @@ DOT_GRAPH_MAX_NODES = 50
MAX_DOT_GRAPH_DEPTH = 2
# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent
# background. This is disabled by default, because dot on Windows does not seem
# to support this out of the box.
#
# Warning: Depending on the platform used, enabling this option may lead to
# badly anti-aliased labels on the edges of a graph (i.e. they become hard to
# read).
# The default value is: NO.
# This tag requires that the tag HAVE_DOT is set to YES.
DOT_TRANSPARENT = NO
# Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple output
# files in one run (i.e. multiple -o and -T options on the command line). This
# makes dot run faster, but since only newer versions of dot (>1.8.10) support

doc/Makefile

@@ -1,32 +1,19 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (C) 2015 Intel Corporation
# All rights reserved.
#
SPDK_ROOT_DIR := $(abspath $(CURDIR)/..)
include $(SPDK_ROOT_DIR)/mk/spdk.common.mk
all: doc
@:
.PHONY: all doc clean
doc: output
deprecation.md: ../deprecation.md
$(Q)sed -e 's/^# Deprecation/# Deprecation {#deprecation}/' \
< $< > $@
changelog.md: ../CHANGELOG.md
$(Q)sed -e 's/^# Changelog/# Changelog {#changelog}/' \
sed -e 's/^# Changelog/# Changelog {#changelog}/' \
-e 's/^##/#/' \
-e 's/^# \(\(v..\...\):.*\)/# \1 {#changelog-\2}/' \
-e '/# v..\...:/s/\./-/2' \
< $< > $@
output: Doxyfile changelog.md deprecation.md $(wildcard *.md) $(wildcard ../include/spdk/*.h)
$(Q)rm -rf $@
$(Q)doxygen Doxyfile
output: Doxyfile changelog.md
rm -rf $@
doxygen Doxyfile
clean:
$(Q)rm -rf output changelog.md deprecation.md
rm -rf output changelog.md
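A quick way to see what the changelog rule's sed script does is to run the same substitutions over a sample heading (plain sed, runnable anywhere; the sample heading below is made up for illustration):

```bash
# Run the Makefile's changelog.md substitutions over one sample heading:
# demote "##" to "#", then append Doxygen anchors of the form {#changelog-vXX-YY}.
echo '## v18.01: New features' | sed \
    -e 's/^# Changelog/# Changelog {#changelog}/' \
    -e 's/^##/#/' \
    -e 's/^# \(\(v..\...\):.*\)/# \1 {#changelog-\2}/' \
    -e '/# v..\...:/s/\./-/2'
# prints: # v18.01: New features {#changelog-v18-01}
```

The last expression replaces only the second `.` on the line, so the release number in the heading text keeps its dot while the anchor gets a dash.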

doc/README.md

@@ -1,12 +1,11 @@
# SPDK Documentation
SPDK Documentation
==================
The current version of the SPDK documentation can be found online at
http://www.spdk.io/doc/
## Building the Documentation
Building the Documentation
==========================
To convert the documentation into HTML run `make` in the `doc`
directory. The output will be located in `doc/output/html`. Before
running `make` ensure all pre-requisites are installed. See
[Installing Prerequisites](http://www.spdk.io/doc/getting_started.html)
for more details.
To convert the documentation into HTML, install Doxygen and mscgen and run `make` in the `doc`
directory. The output will be located in `doc/output/html`.

doc/about.md

@@ -1,4 +1,4 @@
# What is SPDK {#about}
# What is SPDK? {#about}
The Storage Performance Development Kit (SPDK) provides a set of tools and
libraries for writing high performance, scalable, user-mode storage

doc/accel_fw.md

@@ -1,190 +0,0 @@
# Acceleration Framework {#accel_fw}
SPDK provides a framework for abstracting general acceleration capabilities
that can be implemented through plug-in modules and low-level libraries. These
plug-in modules include support for hardware acceleration engines such as
the Intel(R) I/O Acceleration Technology (IOAT) engine and the Intel(R) Data
Streaming Accelerator (DSA) engine. Additionally, a software plug-in module
exists to enable use of the framework in environments without hardware
acceleration capabilities. ISA-L is used for optimized CRC32C calculation within
the software module.
## Acceleration Framework Functions {#accel_functions}
Functions implemented via the framework are documented in the Doxygen documentation of the
framework's public header file, [accel.h](https://spdk.io/doc/accel_8h.html).
## Acceleration Framework Design Considerations {#accel_dc}
The general interface is defined by `/include/spdk/accel.h` and implemented
in `/lib/accel`. These functions may be called by an SPDK application and in
most cases, except where otherwise documented, are asynchronous and follow the
standard SPDK model for callbacks with a callback argument.
If the acceleration framework is started without initializing a hardware module,
optimized software implementations of the operations will back the public API. All
operations supported by the framework have a backing software implementation in
the event that no hardware accelerators have been enabled for that operation.
When multiple hardware modules are enabled, the framework will assign each operation to
a module based on the order in which it was initialized. For example, if two modules are
enabled, IOAT and software, the software module will be used for every operation except those
supported by IOAT.
## Acceleration Low Level Libraries {#accel_libs}
Low level libraries provide only the most basic functions that are specific to
the hardware. Low level libraries are located in the '/lib' directory with the
exception of the software implementation which is implemented as part of the
framework itself. The software low level library does not expose a public API.
Applications may choose to interact directly with a low level library if there are
specific needs/considerations not met via accessing the library through the
framework/module. Note that when using the low level libraries directly, the
framework abstracted interface is bypassed as the application will call the public
functions exposed by the individual low level libraries. Thus, code written this
way needs to be certain that the underlying hardware exists everywhere that it runs.
The low level library for IOAT is located in `/lib/ioat`. The low level library
for DSA and IAA is in `/lib/idxd` (IDXD stands for Intel(R) Data Acceleration Driver and
supports both DSA and IAA hardware accelerators). In the `/lib/idxd` folder, SPDK supports
the use of either user space or kernel space drivers. The following describes each usage scenario:
Leveraging the user space idxd driver: The DSA devices are managed by the SPDK user space
driver in a dedicated SPDK process, so a device cannot be shared by another
process. The benefit of this usage is that there is no kernel dependency.
Leveraging the kernel space driver: The DSA devices are managed by kernel
space drivers, and the work queues inside a DSA device can be shared among
different processes, which makes this mode suitable for cloud-native scenarios. The drawback of
this usage is the kernel dependency: the idxd kernel driver must be supported and loaded
in the kernel.
## Acceleration Plug-In Modules {#accel_modules}
Plug-in modules depend on low level libraries to interact with the hardware and
add additional functionality such as queueing during busy conditions or flow
control in some cases. The framework in turn depends on the modules to provide
the complete implementation of the acceleration component. A module must be
selected via startup RPC when the application is started. Otherwise, if no startup
RPC is provided, the framework is available and will use the software plug-in module.
### IOAT Module {#accel_ioat}
To use the IOAT module, use the RPC [`ioat_scan_accel_module`](https://spdk.io/doc/jsonrpc.html) before starting the application.
### DSA Module {#accel_dsa}
The DSA module supports the DSA hardware and relies on the low level IDXD library.
To use the DSA module, use the RPC
[`dsa_scan_accel_module`](https://spdk.io/doc/jsonrpc.html). By default, this
will attempt to load the SPDK user-space idxd driver. To use the built-in
kernel driver on Linux, add the `-k` parameter. See the next section for
details on using the kernel driver.
The DSA hardware supports a limited queue depth and number of channels. This means that
only a limited number of `spdk_thread`s will be able to acquire a channel.
Design software to deal with the inability to get a channel.
#### How to use kernel idxd driver {#accel_idxd_kernel}
There are several dependencies to leverage the Linux idxd driver for driving DSA devices.
1. Linux kernel support: You need to have a Linux kernel with the `idxd` driver
loaded. Further, add the following command line options to the kernel boot
commands:
```bash
intel_iommu=on,sm_on
```
2. User library dependency: Users need to install the developer version of the
`accel-config` library. This is often packaged, but the source is available on
[GitHub](https://github.com/intel/idxd-config). After the library is installed,
users can use the `accel-config` command to configure the work queues (WQs) of
the idxd devices managed by the kernel with the following steps:
Note: this library must be installed before you run `configure`
```bash
accel-config disable-wq dsa0/wq0.1
accel-config disable-device dsa0
accel-config config-wq --group-id=0 --mode=dedicated --wq-size=128 --type=user --name="MyApp1" \
--priority=10 --block-on-fault=1 dsa0/wq0.1
accel-config config-engine dsa0/engine0.0 --group-id=0
accel-config config-engine dsa0/engine0.1 --group-id=0
accel-config config-engine dsa0/engine0.2 --group-id=0
accel-config config-engine dsa0/engine0.3 --group-id=0
accel-config enable-device dsa0
accel-config enable-wq dsa0/wq0.1
```
DSA can be configured in many ways, but the above configuration is needed for use with SPDK.
Before you can run using the kernel driver you need to make sure that the hardware is bound
to the kernel driver and not VFIO. By default when you run `setup.sh` DSA devices will be
bound to VFIO. To exclude DSA devices, pass a whitespace-separated list of DSA device BDFs
using the PCI_BLOCKED parameter as shown below.
```bash
sudo PCI_BLOCKED="0000:04:00.0 0000:05:00.0" ./setup.sh
```
Note: you might need to run `sudo ./setup.sh reset` to unbind all drivers before performing
the step above.
### Software Module {#accel_sw}
The software module is enabled by default. If no hardware module is explicitly
enabled via startup RPC as discussed earlier, the software module will use ISA-L
if available for functions such as CRC32C. Otherwise, standard glibc calls are
used to back the framework API.
### dpdk_cryptodev {#accel_dpdk_cryptodev}
The dpdk_cryptodev module uses DPDK CryptoDev API to implement crypto operations.
The following ciphers and PMDs are supported:
- AES-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC,
RTE_CRYPTO_CIPHER_AES128_XTS
(Note: QAT is functional however is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)
- MLX5 Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES256_XTS, RTE_CRYPTO_CIPHER_AES512_XTS
To enable this module, use [`dpdk_cryptodev_scan_accel_module`](https://spdk.io/doc/jsonrpc.html).
This RPC is available in the STARTUP state, so the SPDK application needs to be run with the `--wait-for-rpc`
CLI parameter. To select a specific PMD, use [`dpdk_cryptodev_set_driver`](https://spdk.io/doc/jsonrpc.html).
### Module to Operation Code Assignment {#accel_assignments}
When multiple modules are initialized, the accel framework will assign op codes to
modules by first assigning all op codes to the Software Module and then overriding
op code assignments to Hardware Modules in the order in which they were initialized.
The RPC `accel_get_opc_assignments` can be used at any time to see the current
assignment map including the names of valid operations. The RPC `accel_assign_opc`
can be used after initializing the desired Hardware Modules but before starting the
framework in the event that a specific override is desired. Note that to start an
application and send startup RPCs, use the `--wait-for-rpc` parameter and then use the
`framework_start_init` RPC to continue. For example, assume the DSA Module is initialized
but for some reason the desire is to have the Software Module handle copies instead.
The following RPCs would accomplish the copy override:
```bash
./scripts/rpc.py dsa_scan_accel_module
./scripts/rpc.py accel_assign_opc -o copy -m software
./scripts/rpc.py framework_start_init
./scripts/rpc.py accel_get_opc_assignments
{
"copy": "software",
"fill": "dsa",
"dualcast": "dsa",
"compare": "dsa",
"crc32c": "dsa",
"copy_crc32c": "dsa",
"compress": "software",
"decompress": "software"
}
```
To determine the name of available modules and their supported operations use the
RPC `accel_get_module_info`.

doc/applications.md

@@ -1,160 +0,0 @@
# An Overview of SPDK Applications {#app_overview}
SPDK is primarily a development kit that delivers libraries and header files for
use in other applications. However, SPDK also contains a number of applications.
These applications are primarily used to test the libraries, but many are full
featured and high quality. The major applications in SPDK are:
- @ref iscsi
- @ref nvmf
- @ref vhost
- SPDK Target (a unified application combining the above three)
There are also a number of tools and examples in the `examples` directory.
The SPDK targets are all based on a common framework so they have much in
common. The framework defines a concept called a `subsystem` and all
functionality is implemented in various subsystems. Subsystems have a unified
initialization and teardown path.
# Configuring SPDK Applications {#app_config}
## Command Line Parameters {#app_cmd_line_args}
The SPDK application framework defines a set of base command line flags for all
applications that use it. Specific applications may implement additional flags.
Param | Long Param | Type | Default | Description
-------- | ---------------------- | -------- | ---------------------- | -----------
-c | --config | string | | @ref cmd_arg_config_file
-d | --limit-coredump | flag | false | @ref cmd_arg_limit_coredump
-e | --tpoint-group | integer | | @ref cmd_arg_limit_tpoint_group_mask
-g | --single-file-segments | flag | | @ref cmd_arg_single_file_segments
-h | --help | flag | | show all available parameters and exit
-i | --shm-id | integer | | @ref cmd_arg_multi_process
-m | --cpumask | CPU mask | 0x1 | application @ref cpu_mask
-n | --mem-channels | integer | all channels | number of memory channels used for DPDK
-p | --main-core | integer | first core in CPU mask | main (primary) core for DPDK
-r | --rpc-socket | string | /var/tmp/spdk.sock | RPC listen address
-s | --mem-size | integer | all hugepage memory | @ref cmd_arg_memory_size
| | --silence-noticelog | flag | | disable notice level logging to `stderr`
-u | --no-pci | flag | | @ref cmd_arg_disable_pci_access.
| | --wait-for-rpc | flag | | @ref cmd_arg_deferred_initialization
-B | --pci-blocked | B:D:F | | @ref cmd_arg_pci_blocked_allowed.
-A | --pci-allowed | B:D:F | | @ref cmd_arg_pci_blocked_allowed.
-R | --huge-unlink | flag | | @ref cmd_arg_huge_unlink
| | --huge-dir | string | the first discovered | allocate hugepages from a specific mount
-L | --logflag | string | | @ref cmd_arg_log_flags
### Configuration file {#cmd_arg_config_file}
SPDK applications are configured using a JSON RPC configuration file.
See @ref jsonrpc for details.
### Limit coredump {#cmd_arg_limit_coredump}
By default, an SPDK application will set resource limits for core file sizes
to RLIM_INFINITY. Specifying `--limit-coredump` tells SPDK not to set those resource limits.
### Tracepoint group mask {#cmd_arg_limit_tpoint_group_mask}
SPDK has an experimental low overhead tracing framework. Tracepoints in this
framework are organized into tracepoint groups. By default, all tracepoint
groups are disabled. `--tpoint-group` can be used to enable a specific
subset of tracepoint groups in the application.
Note: Additional documentation on the tracepoint framework is in progress.
### Deferred initialization {#cmd_arg_deferred_initialization}
SPDK applications progress through a set of states beginning with `STARTUP` and
ending with `RUNTIME`.
If the `--wait-for-rpc` parameter is provided SPDK will pause just before starting
framework initialization. This state is called `STARTUP`. The JSON RPC server is
ready but only a small subset of commands are available to set up initialization
parameters. Those parameters can't be changed after the SPDK application enters
`RUNTIME` state. When the client finishes configuring the SPDK subsystems it
needs to issue the @ref rpc_framework_start_init RPC command to begin the
initialization process. After `rpc_framework_start_init` returns `true` SPDK
will enter the `RUNTIME` state and the list of available commands becomes much
larger.
To see which RPC methods are available in the current state, issue
`rpc_get_methods` with the parameter `current` set to `true`.
For more details see @ref jsonrpc documentation.
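For illustration, a raw JSON-RPC 2.0 request for that query could look like the following (a sketch; the `id` value and the transport framing are chosen by the client):

```json
{
  "jsonrpc": "2.0",
  "method": "rpc_get_methods",
  "params": { "current": true },
  "id": 1
}
```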
### Create just one hugetlbfs file {#cmd_arg_single_file_segments}
Instead of creating one hugetlbfs file per page, this option makes SPDK create
one file per hugepage size per socket. This is needed for @ref virtio to be used
with more than 8 hugepages. See @ref virtio_2mb.
### Multi process mode {#cmd_arg_multi_process}
When `--shm-id` is specified, the application is started in multi-process mode.
Applications using the same shm-id share their memory and
[NVMe devices](@ref nvme_multi_process). The first app to start with a given id
becomes a primary process, with the rest, called secondary processes, only
attaching to it. When the primary process exits, the secondary ones continue to
operate, but no new processes can be attached at this point. All processes within
the same shm-id group must use the same
[--single-file-segments setting](@ref cmd_arg_single_file_segments).
### Memory size {#cmd_arg_memory_size}
Total size of the hugepage memory to reserve. If DPDK env layer is used, it will
reserve memory from all available hugetlbfs mounts, starting with the one with
the highest page size. This option accepts a number of bytes with a possible
binary prefix, e.g. 1024, 1024M, 1G. The default unit is megabyte.
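The size suffixes use binary (IEC) multiples, so `1024M` and `1G` describe the same amount of memory. Coreutils `numfmt` (not an SPDK tool; used here purely as a sanity check) shows the byte values the suffixes expand to:

```bash
# 1G and 1024M both mean 2^30 bytes under binary (IEC) prefixes.
numfmt --from=iec 1G      # 1073741824
numfmt --from=iec 1024M   # 1073741824
```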
Starting with DPDK 18.05.1, it's possible to reserve hugepages at runtime, meaning
that SPDK application can be started with 0 pre-reserved memory. Unlike hugepages
pre-reserved at the application startup, the hugepages reserved at runtime will be
released to the system as soon as they're no longer used.
### Disable PCI access {#cmd_arg_disable_pci_access}
If SPDK is run with PCI access disabled it won't detect any PCI devices. This
includes primarily NVMe and IOAT devices. Also, the VFIO and UIO kernel modules
are not required in this mode.
### PCI address blocked and allowed lists {#cmd_arg_pci_blocked_allowed}
If blocked list is used, then all devices with the provided PCI address will be
ignored. If an allowed list is used, only allowed devices will be probed.
`-B` or `-A` can be used more than once, but cannot be mixed together. That is,
`-B` and `-A` cannot be used at the same time.
### Unlink hugepage files after initialization {#cmd_arg_huge_unlink}
By default, each DPDK-based application tries to remove any orphaned hugetlbfs
files during its initialization. This option removes hugetlbfs files of the current
process as soon as they're created, but is not compatible with `--shm-id`.
### Log flag {#cmd_arg_log_flags}
Enable a specific log type. This option can be used more than once. A list of
all available types is provided in the `--help` output, with `--logflag all`
enabling all of them. This option additionally enables the debug print level in debug builds of SPDK.
## CPU mask {#cpu_mask}
Whenever the `CPU mask` is mentioned it is a string in one of the following formats:
- Case insensitive hexadecimal string with or without "0x" prefix.
- Comma separated list of CPUs or list of CPU ranges. Use '-' to define range.
### Example
The following CPU masks are equal and correspond to CPUs 0, 1, 2, 8, 9, 10, 11 and 12:
~~~bash
0x1f07
0x1F07
1f07
[0,1,2,8-12]
[0, 1, 2, 8, 9, 10, 11, 12]
~~~
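To convince yourself that the hexadecimal and list forms above are equal, the mask can be expanded bit by bit with plain shell arithmetic (not an SPDK tool):

```bash
# Expand mask 0x1f07 into the CPU ids whose bits are set (bits 0-2 and 8-12).
mask=$((0x1f07))
cpus=""
cpu=0
while [ "$cpu" -le 15 ]; do
    if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then
        cpus="$cpus $cpu"
    fi
    cpu=$((cpu + 1))
done
echo "cpus:$cpus"
# prints: cpus: 0 1 2 8 9 10 11 12
```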

doc/bdev.md

@@ -1,11 +1,6 @@
# Block Device User Guide {#bdev}
# Block Device Layer {#bdev}
## Target Audience {#bdev_ug_targetaudience}
This user guide is intended for software developers who have knowledge of block storage, storage drivers, issuing JSON-RPC
commands and storage services such as RAID, compression, crypto, and others.
## Introduction {#bdev_ug_introduction}
# Introduction {#bdev_getting_started}
The SPDK block device layer, often simply called *bdev*, is a C library
intended to be equivalent to the operating system block storage layer that
@@ -14,295 +9,203 @@ storage stack. Specifically, this library provides the following
functionality:
* A pluggable module API for implementing block devices that interface with different types of block storage devices.
* Driver modules for NVMe, malloc (ramdisk), Linux AIO, virtio-scsi, Ceph RBD, Pmem and Vhost-SCSI Initiator and more.
* Driver modules for NVMe, malloc (ramdisk), Linux AIO, virtio-scsi, Ceph RBD, and more.
* An application API for enumerating and claiming SPDK block devices and then performing operations (read, write, unmap, etc.) on those devices.
* Facilities to stack block devices to create complex I/O pipelines, including logical volume management (lvol) and partition support (GPT).
* Configuration of block devices via JSON-RPC.
* Configuration of block devices via JSON-RPC and a configuration file.
* Request queueing, timeout, and reset handling.
* Multiple, lockless queues for sending I/O to block devices.
The bdev module creates an abstraction layer that provides a common API for all devices.
Users can use the available bdev modules or create their own module with any type of
device underneath (please refer to @ref bdev_module for details). SPDK
also provides vbdev modules which create block devices on existing bdevs, for
example @ref bdev_ug_logical_volumes or @ref bdev_ug_gpt.
# Configuring block devices {#bdev_config}
## Prerequisites {#bdev_ug_prerequisites}
The block device layer is a C library with a single public header file named
bdev.h. Upon initialization, the library will read in a configuration file that
defines the block devices it will expose. The configuration file is a text
format containing sections denoted by square brackets followed by keys with
optional values. It is often passed as a command line argument to the
application. Refer to the help facility of your application for more details.
This guide assumes that you can already build the standard SPDK distribution
on your platform. The block device layer is a C library with a single public
header file named bdev.h. All SPDK configuration described in the following
chapters is done using JSON-RPC commands. SPDK provides a Python-based
command line tool for sending RPC commands, located at `scripts/rpc.py`. Users
can list the available commands by running this script with the `-h` or `--help` flag.
Additionally, users can retrieve the currently supported set of RPC commands
directly from an SPDK application by running `scripts/rpc.py rpc_get_methods`.
Detailed help for each command can be displayed by adding the `-h` flag as a
command parameter.
## NVMe {#bdev_config_nvme}
## Configuring Block Device Modules {#bdev_ug_general_rpcs}
The SPDK nvme bdev driver provides SPDK block layer access to NVMe SSDs via the SPDK userspace
NVMe driver. The nvme bdev driver binds only to devices explicitly specified. These devices
can be either locally attached SSDs or remote NVMe subsystems via NVMe-oF.
Block devices can be configured using JSON RPCs. A complete list of available RPC commands
with detailed information can be found on the @ref jsonrpc_components_bdev page.
~~~
[Nvme]
# NVMe Device Whitelist
# Users may specify which NVMe devices to claim by their transport id.
# See spdk_nvme_transport_id_parse() in spdk/nvme.h for the correct format.
# The devices will be assigned names in the format <YourName>nY, where YourName is the
# name specified at the end of the TransportId line and Y is the namespace id, which starts at 1.
TransportID "trtype:PCIe traddr:0000:00:00.0" Nvme0
TransportID "trtype:RDMA adrfam:IPv4 subnqn:nqn.2016-06.io.spdk:cnode1 traddr:192.168.100.1 trsvcid:4420" Nvme1
~~~
## Common Block Device Configuration Examples
This exports block devices for all namespaces attached to the two controllers. Block devices
for namespaces attached to the first controller will be in the format Nvme0nY, where Y is
the namespace ID. Most NVMe SSDs have a single namespace with ID=1. Block devices attached to
the second controller will be in the format Nvme1nY.
## Malloc {#bdev_config_malloc}
The SPDK malloc bdev driver allocates a buffer of memory in userspace as the target for block I/O
operations. This effectively serves as a userspace ramdisk target.
Configuration file syntax:
~~~
[Malloc]
NumberOfLuns 4
LunSizeInMB 64
~~~
This exports 4 malloc block devices, named Malloc0 through Malloc3. Each malloc block device will
be 64MB in size.
## Pmem {#bdev_config_pmem}
The SPDK pmem bdev driver uses a pmemblk pool as the target for block I/O operations.
First, you need to compile SPDK with NVML:
~~~
./configure --with-nvml
~~~
To create a pmemblk pool for use with SPDK, use the pmempool tool included with NVML:

~~~
Usage: pmempool create [<args>] <blk|log|obj> [<bsize>] <file>
~~~
Example:
~~~
./nvml/src/tools/pmempool/pmempool create -s 32000000 blk 512 /path/to/pmem_pool
~~~
Pmem pool management is also included in the SPDK RPC; it consists of three calls:

- create_pmem_pool - Creates a pmem pool file
- delete_pmem_pool - Deletes a pmem pool file
- pmem_pool_info - Shows whether the specified file is a valid pmem pool file, plus details about the pool such as block size and number of blocks
Example:
~~~
./scripts/rpc.py create_pmem_pool /path/to/pmem_pool
~~~
It is possible to create pmem bdev using SPDK RPC:
~~~
./scripts/rpc.py construct_pmem_bdev -n bdev_name /path/to/pmem_pool
~~~
## Ceph RBD {#bdev_config_rbd}
The SPDK RBD bdev driver provides SPDK block layer access to Ceph RADOS block
devices (RBD). Ceph RBD devices are accessed via librbd and librados libraries
to access the RADOS block device exported by Ceph. To create Ceph bdev RPC
command `bdev_rbd_register_cluster` and `bdev_rbd_create` should be used.
SPDK provides two ways of creating a RBD bdev. One is to create a new Rados cluster object
for each RBD bdev. Another is to share the same Rados cluster object for multiple RBD bdevs.
Each Rados cluster object creates a small number of io_context_pool and messenger threads.
The Ceph commands `ceph config help librados_thread_count` and `ceph config help ms_async_op_threads`
show information about these threads. You can also specify the number of threads by
updating the ceph.conf file or by using Ceph config commands. For more information, please refer to
[Ceph configuration](https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/)
One set of threads may not be enough to maximize performance with a large number of RBD bdevs,
but one set of threads per RBD bdev may add too much context switching. Therefore, performance
tuning on the number of RBD bdevs per cluster object and thread may be required.
Example command

`rpc.py bdev_rbd_register_cluster rbd_cluster`
This command will register a cluster named rbd_cluster. Optional `--config-file` and
`--key-file` params are specified for the cluster.
To remove a registered cluster use the bdev_rbd_unregister_cluster command.

`rpc.py bdev_rbd_unregister_cluster rbd_cluster`

To create an RBD bdev with a registered cluster:
`rpc.py bdev_rbd_create rbd foo 512 -c rbd_cluster`
This command will create a bdev that represents the 'foo' image from a pool called 'rbd'.
When `-c` is specified, `bdev_rbd_create` attaches the bdev to an existing Rados cluster object,
so multiple RBD bdevs share a single Ceph connection in the librbd module. Without `-c`, a new
Rados cluster object with its own cluster connection is created for every bdev.
To remove a block device representation use the bdev_rbd_delete command.
`rpc.py bdev_rbd_delete Rbd0`
To resize a bdev use the bdev_rbd_resize command.
`rpc.py bdev_rbd_resize Rbd0 4096`
This command will resize the Rbd0 bdev to 4096 MiB.
## Compression Virtual Bdev Module {#bdev_config_compress}
The compression bdev module can be configured to provide compression/decompression
services for an underlying thinly provisioned logical volume. Although the underlying
module can be anything (e.g. an NVMe bdev) the overall compression benefits will not be realized
unless the data stored on disk is placed appropriately. The compression vbdev module
relies on an internal SPDK library called `reduce` to accomplish this, see @ref reduce
for detailed information.
The compression bdev module leverages the [Acceleration Framework](https://spdk.io/doc/accel_fw.html) to
carry out the actual compression and decompression. The acceleration framework can be configured to use
ISA-L software optimized compression or the DPDK Compressdev module for hardware acceleration. To configure
the Compressdev module please see the `compressdev_scan_accel_module` documentation [here](https://spdk.io/doc/jsonrpc.html)
Persistent memory is used to store metadata associated with the layout of the data on the
backing device. SPDK relies on [PMDK](http://pmem.io/pmdk/) to interface with persistent memory, so any hardware
supported by PMDK should work. If the directory for PMEM supplied upon vbdev creation does
not point to persistent memory (i.e. a regular filesystem) performance will be severely
impacted. The vbdev module and reduce libraries were designed to use persistent memory for
any production use.
Example command
`rpc.py bdev_compress_create -p /pmem_files -b myLvol`
In this example, a compression vbdev is created using persistent memory that is mapped to
the directory `pmem_files` on top of the existing thinly provisioned logical volume `myLvol`.
The resulting compression bdev will be named `COMP_LVS/myLvol` where LVS is the name of the
logical volume store that `myLvol` resides on.
The logical volume is referred to as the backing device and once the compression vbdev is
created it cannot be separated from the persistent memory file that will be created in
the specified directory. If the persistent memory file is not available, the compression
vbdev will also not be available.
To remove a compression vbdev, use the following command which will also delete the PMEM
file. If the logical volume is deleted the PMEM file will not be removed and the
compression vbdev will not be available.
`rpc.py bdev_compress_delete COMP_LVS/myLvol`
To list compression volumes that are only available for deletion because their PMEM file
was missing use the following. The name parameter is optional and if not included will list
all volumes, if used it will return the name or an error that the device does not exist.
`rpc.py bdev_compress_get_orphans --name COMP_Nvme0n1`
## Crypto Virtual Bdev Module {#bdev_config_crypto}
The crypto virtual bdev module can be configured to provide at rest data encryption
for any underlying bdev. The module relies on the SPDK Accel Framework to provide
all cryptographic functionality.
One of the accel modules, dpdk_cryptodev, is implemented with the DPDK CryptoDev API.
It provides support for many different software-only cryptographic modules as well as
hardware-assisted support for the Intel QAT board and NVIDIA crypto-enabled NICs.
For reads, the buffer provided to the crypto block device will be used as the destination buffer
for unencrypted data. For writes, however, a temporary scratch buffer is used as the
destination buffer for encryption which is then passed on to the underlying bdev as the
write buffer. This is done to avoid encrypting the data in the original source buffer which
may cause problems in some use cases.
Below is information about accel modules which support crypto operations:
### dpdk_cryptodev accel module
Supports the following ciphers:
- AESN-NI Multi Buffer Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC
- Intel(R) QuickAssist (QAT) Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES128_CBC,
RTE_CRYPTO_CIPHER_AES128_XTS
(Note: QAT is functional however is marked as experimental until the hardware has
been fully integrated with the SPDK CI system.)
- MLX5 Crypto Poll Mode Driver: RTE_CRYPTO_CIPHER_AES256_XTS, RTE_CRYPTO_CIPHER_AES512_XTS
In order to support using the bdev block offset (LBA) as the initialization vector (IV),
the crypto module breaks up all I/O into crypto operations of a size equal to the block
size of the underlying bdev. For example, a 4K I/O to a bdev with a 512B block size,
would result in 8 cryptographic operations.
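The block-by-block split described above can be sketched as follows. This is a hypothetical illustration of the arithmetic, not SPDK code; the helper name is invented.

```python
def crypto_ops_for_io(io_offset_blocks: int, io_size_bytes: int, block_size: int):
    """Return (lba, iv) pairs, one crypto operation per underlying block.

    Each operation covers exactly one block of the underlying bdev, and
    its initialization vector (IV) is simply that block's LBA.
    """
    assert io_size_bytes % block_size == 0
    num_ops = io_size_bytes // block_size
    return [(io_offset_blocks + i, io_offset_blocks + i) for i in range(num_ops)]

# A 4 KiB I/O to a bdev with a 512 B block size -> 8 crypto operations.
ops = crypto_ops_for_io(io_offset_blocks=100, io_size_bytes=4096, block_size=512)
print(len(ops))          # 8
print(ops[0], ops[-1])   # (100, 100) (107, 107)
```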
### SW accel module
Supports the following ciphers:
- AES_XTS cipher with 128 or 256 bit keys implemented with ISA-L_crypto
### General workflow
- Set desired accel module to perform crypto operations, that can be done with `accel_assign_opc` RPC command
- Create a named crypto key using `accel_crypto_key_create` RPC command. The key will use the assigned accel
module. Set of parameters and supported ciphers may be different in each accel module.
- Create virtual crypto block device providing the base block device name and the crypto key name
using `bdev_crypto_create` RPC command
#### Example
Example command which uses dpdk_cryptodev accel module
```
# start SPDK application with `--wait-for-rpc` parameter
rpc.py dpdk_cryptodev_scan_accel_module
rpc.py dpdk_cryptodev_set_driver crypto_aesni_mb
rpc.py accel_assign_opc -o encrypt -m dpdk_cryptodev
rpc.py accel_assign_opc -o decrypt -m dpdk_cryptodev
rpc.py framework_start_init
rpc.py accel_crypto_key_create -c AES_CBC -k 01234567891234560123456789123456 -n key_aesni_cbc_1
rpc.py bdev_crypto_create NVMe1n1 CryNvmeA -n key_aesni_cbc_1
```
These commands will create a crypto vbdev called 'CryNvmeA' on top of the NVMe bdev
'NVMe1n1' and will use a key named `key_aesni_cbc_1`. The key will work with the accel module which
has been assigned for encrypt operations, in this example it will be the dpdk_cryptodev.
### Crypto key format
Please make sure the keys are provided in hexlified format. This means the string passed to
rpc.py must be twice as long as the key length in binary form.
#### Example command
`rpc.py accel_crypto_key_create -c AES_XTS -k2 7859243a027411e581e0c40a35c8228f -k d16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b09 -n sample_key`
This command will create a key called `sample_key` from the AES key
'd16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b09' and the XTS key
'7859243a027411e581e0c40a35c8228f'. In other words, the compound AES_XTS key to be used is
'd16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b097859243a027411e581e0c40a35c8228f'.
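The "twice as long" length relationship can be checked with a few lines of Python; the key values are the ones from the example command above.

```python
import binascii

# The two hex-encoded key halves from the example command.
xts_key_hex = "7859243a027411e581e0c40a35c8228f"
aes_key_hex = "d16a2f3a9e9f5b32daefacd7f5984f4578add84425be4a0baa489b9de8884b09"

# The hexlified form is twice as long as the binary key it encodes.
assert len(binascii.unhexlify(xts_key_hex)) == len(xts_key_hex) // 2   # 16 bytes
assert len(binascii.unhexlify(aes_key_hex)) == len(aes_key_hex) // 2   # 32 bytes

# The compound AES_XTS key is the AES key followed by the XTS key.
compound = aes_key_hex + xts_key_hex
print(len(compound))  # 96 hex characters -> a 48-byte compound key
```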
### Delete the virtual crypto block device
To remove the vbdev use the bdev_crypto_delete command.
`rpc.py bdev_crypto_delete CryNvmeA`
### dpdk_cryptodev mlx5_pci driver configuration
The mlx5_pci driver works with crypto enabled Nvidia NICs and requires special configuration of
DPDK environment to enable crypto function. It can be done via SPDK event library by configuring
`env_context` member of `spdk_app_opts` structure or by passing corresponding CLI arguments in
the following form: `--allow=BDF,class=crypto,wcs_file=/full/path/to/wrapped/credentials`, e.g.
`--allow=0000:01:00.0,class=crypto,wcs_file=/path/credentials.txt`.
## Delay Bdev Module {#bdev_config_delay}
The delay vbdev module is intended to apply a predetermined additional latency on top of a lower
level bdev. This enables the simulation of the latency characteristics of a device during the functional
or scalability testing of an SPDK application. For example, to simulate the effect of drive latency when
processing I/Os, one could configure a NULL bdev with a delay bdev on top of it.
The delay bdev module is not intended to provide a high fidelity replication of a specific NVMe drive's latency;
instead, its main purpose is to provide a "big picture" understanding of how a generic latency affects a given
application.
A delay bdev is created using the `bdev_delay_create` RPC. This rpc takes 6 arguments, one for the name
of the delay bdev and one for the name of the base bdev. The remaining four arguments represent the following
latency values: average read latency, average write latency, p99 read latency, and p99 write latency.
Within the context of the delay bdev, p99 latency means that one percent of the I/O will be delayed by at
least the value of the p99 latency before being completed to the upper level protocol. All of the latency values
are measured in microseconds.
Example command:
`rpc.py bdev_delay_create -b Null0 -d delay0 -r 10 --nine-nine-read-latency 50 -w 30 --nine-nine-write-latency 90`
This command will create a delay bdev with average read and write latencies of 10 and 30 microseconds and p99 read
and write latencies of 50 and 90 microseconds respectively.
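The p99 semantics can be sketched numerically. This is a simplified, hypothetical model of the behavior described above, not the module's actual implementation.

```python
import random

def delay_for_read(avg_us=10, p99_us=50):
    """Pick the delay applied to one read I/O: roughly 1% of I/Os get at
    least the p99 latency, the rest get the average latency."""
    return p99_us if random.random() < 0.01 else avg_us

random.seed(0)
delays = [delay_for_read() for _ in range(100_000)]
slow = sum(1 for d in delays if d >= 50)
print(slow / len(delays))  # roughly 0.01, i.e. one percent of the I/O
```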
A delay bdev can be deleted using the `bdev_delay_delete` RPC
Example command:
`rpc.py bdev_delay_delete delay0`
## GPT (GUID Partition Table) {#bdev_config_gpt}
The GPT virtual bdev driver is enabled by default and does not require any configuration.
It will automatically detect @ref bdev_ug_gpt on any attached bdev and will create
possibly multiple virtual bdevs.
### SPDK GPT partition table {#bdev_ug_gpt}
The SPDK partition type GUID is `6527994e-2c5a-4eec-9613-8f5944074e8b`. Existing SPDK bdevs
can be exposed as Linux block devices via NBD and then can be partitioned with
standard partitioning tools. After partitioning, the bdevs will need to be deleted and
attached again for the GPT bdev module to see any changes. NBD kernel module must be
loaded first. To create NBD bdev user should use `nbd_start_disk` RPC command.
Example command

`rpc.py nbd_start_disk Malloc0 /dev/nbd0`

This will expose an SPDK bdev `Malloc0` under the `/dev/nbd0` block device.
To remove an NBD device use the `nbd_stop_disk` RPC command.

Example command

`rpc.py nbd_stop_disk /dev/nbd0`

To display the full or a filtered NBD device list use the `nbd_get_disks` RPC command.

Example command

`rpc.py nbd_get_disks -n /dev/nbd0`
### Creating a GPT partition table using NBD {#bdev_ug_gpt_create_part}
~~~bash
# Expose bdev Nvme0n1 as kernel block device /dev/nbd0 by JSON-RPC
rpc.py nbd_start_disk Nvme0n1 /dev/nbd0

# Create GPT partition table.
parted -s /dev/nbd0 mklabel gpt

# Add a partition consuming 50% of the available space.
parted -s /dev/nbd0 mkpart MyPartition '0%' '50%'

# Change the partition type to the SPDK GUID.
# sgdisk is part of the gdisk package.
sgdisk -t 1:6527994e-2c5a-4eec-9613-8f5944074e8b /dev/nbd0

# Stop the NBD device (stop exporting /dev/nbd0).
rpc.py nbd_stop_disk /dev/nbd0

# Now Nvme0n1 is configured with a GPT partition table, and
# the first partition will be automatically exposed as
# Nvme0n1p1 in SPDK applications.
~~~
## iSCSI bdev {#bdev_config_iscsi}
The SPDK iSCSI bdev driver depends on libiscsi and hence is not enabled by default.
In order to use it, build SPDK with an extra `--with-iscsi-initiator` configure option.
The following command creates an `iSCSI0` bdev from a single LUN exposed at given iSCSI URL
with `iqn.2016-06.io.spdk:init` as the reported initiator IQN.
`rpc.py bdev_iscsi_create -b iSCSI0 -i iqn.2016-06.io.spdk:init --url iscsi://127.0.0.1/iqn.2016-06.io.spdk:disk1/0`
The URL is in the following format:
`iscsi://[<username>[%<password>]@]<host>[:<port>]/<target-iqn>/<lun>`
## Linux AIO bdev {#bdev_config_aio}
The SPDK AIO bdev driver provides SPDK block layer access to Linux kernel block
devices or a file on a Linux filesystem via Linux AIO. Note that O_DIRECT is
used and thus bypasses the Linux page cache. This mode is probably as close to
a typical kernel based target as a user space target can get without using a
user-space driver. To create AIO bdev RPC command `bdev_aio_create` should be
used.
Example commands
`rpc.py bdev_aio_create /dev/sda aio0`
This command will create `aio0` device from /dev/sda.
`rpc.py bdev_aio_create /tmp/file file 4096`
This command will create `file` device with block size 4096 from /tmp/file.
To delete an aio bdev use the bdev_aio_delete command.
`rpc.py bdev_aio_delete aio0`
## OCF Virtual bdev {#bdev_config_cas}
OCF virtual bdev module is based on [Open CAS Framework](https://github.com/Open-CAS/ocf) - a
high performance block storage caching meta-library.
To enable the module, configure SPDK using `--with-ocf` flag.
OCF bdev can be used to enable caching for any underlying bdev.
Below is an example command for creating OCF bdev:
`rpc.py bdev_ocf_create Cache1 wt Malloc0 Nvme0n1`
This command will create new OCF bdev `Cache1` having bdev `Malloc0` as caching-device
and `Nvme0n1` as core-device and initial cache mode `Write-Through`.
`Malloc0` will be used as cache for `Nvme0n1`, so data written to `Cache1` will be present
on `Nvme0n1` eventually.
By default, OCF will be configured with a cache line size equal to 4KiB
and non-volatile metadata will be disabled.
To remove `Cache1`:
`rpc.py bdev_ocf_delete Cache1`
During removal OCF-cache will be stopped and all cached data will be written to the core device.
Note that OCF has a per-device RAM requirement. More details can be found in the
[OCF documentation](https://open-cas.github.io/guide_system_requirements.html).
## Malloc bdev {#bdev_config_malloc}
Malloc bdevs are ramdisks. Because of their nature they are volatile. They are created from hugepage memory
given to the SPDK application.
Example command for creating malloc bdev:
`rpc.py bdev_malloc_create -b Malloc0 64 512`
Example command for removing malloc bdev:
`rpc.py bdev_malloc_delete Malloc0`
## Null {#bdev_config_null}
The SPDK null bdev driver is a dummy block I/O target that discards all writes and returns undefined
data for reads. It is useful for benchmarking the rest of the bdev I/O stack with minimal block
device overhead and for testing configurations that can't easily be created with the Malloc bdev.
To create Null bdev RPC command `bdev_null_create` should be used.
Example command
`rpc.py bdev_null_create Null0 8589934592 4096`
This command will create an 8 petabyte `Null0` device with block size 4096.
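The size arithmetic behind this example, using the values from the command above:

```python
size_mib = 8589934592          # size argument from the RPC command, in MiB
block_size = 4096              # block size argument, in bytes

total_bytes = size_mib * 1024 * 1024
print(total_bytes)                    # 9007199254740992 = 2**53 bytes
print(total_bytes // (1024 ** 5))     # 8, i.e. 8 PiB
print(total_bytes // block_size)      # number of 4 KiB blocks exposed
```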
To delete a null bdev use the bdev_null_delete command.
`rpc.py bdev_null_delete Null0`
## NVMe bdev {#bdev_config_nvme}
There are two ways to create a block device based on an NVMe device in SPDK. The first
way is to connect a local PCIe drive and the second one is to connect to an NVMe-oF device.
In both cases the user should use the `bdev_nvme_attach_controller` RPC command to achieve that.
Example commands
`rpc.py bdev_nvme_attach_controller -b NVMe1 -t PCIe -a 0000:01:00.0`
This command will create NVMe bdev of physical device in the system.
`rpc.py bdev_nvme_attach_controller -b Nvme0 -t RDMA -a 192.168.100.1 -f IPv4 -s 4420 -n nqn.2016-06.io.spdk:cnode1`
This command will create NVMe bdev of NVMe-oF resource.
To remove an NVMe controller use the bdev_nvme_detach_controller command.
`rpc.py bdev_nvme_detach_controller Nvme0`
This command will remove NVMe bdev named Nvme0.
The SPDK NVMe bdev driver provides the multipath feature. Please refer to
@ref nvme_multipath for details.
### NVMe bdev character device {#bdev_config_nvme_cuse}
This feature is considered experimental. You must configure SPDK with the `--with-nvme-cuse`
option to enable this RPC.
Example commands
`rpc.py bdev_nvme_cuse_register -n Nvme3`
This command will register a character device under /dev/spdk associated with Nvme3
controller. If there are namespaces created on Nvme3 controller, a namespace
character device is also created for each namespace.
For example, the first controller registered will have a character device path of
/dev/spdk/nvmeX, where X is replaced with a unique integer to differentiate it from
other controllers. Note that this 'nvmeX' name here has no correlation to the name
associated with the controller in SPDK. Namespace character devices will have a path
of /dev/spdk/nvmeXnY, where Y is the namespace ID.
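The naming scheme described above can be sketched in a few lines; the helper is hypothetical and only mirrors the documented /dev/spdk/nvmeX and /dev/spdk/nvmeXnY pattern.

```python
def cuse_paths(ctrlr_index, namespace_ids):
    """Build CUSE device paths: nvmeX for the controller (X is just a
    unique integer, unrelated to the SPDK controller name), and
    nvmeXnY for each namespace Y."""
    ctrl = f"/dev/spdk/nvme{ctrlr_index}"
    return ctrl, [f"{ctrl}n{ns}" for ns in namespace_ids]

ctrl, namespaces = cuse_paths(0, [1, 2])
print(ctrl)        # /dev/spdk/nvme0
print(namespaces)  # ['/dev/spdk/nvme0n1', '/dev/spdk/nvme0n2']
```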
CUSE devices are removed from the system when the NVMe controller is detached or unregistered
with the command:
`rpc.py bdev_nvme_cuse_unregister -n Nvme0`
## Logical volumes {#bdev_ug_logical_volumes}
The Logical Volumes library is a flexible storage space management system. It allows
creating and managing virtual block devices with variable size on top of other bdevs.
The SPDK Logical Volume library is built on top of @ref blob. For detailed description
please refer to @ref lvol.
### Logical volume store {#bdev_ug_lvol_store}
Before creating any logical volumes (lvols), an lvol store has to be created first on the
selected block device. The lvol store is the container for lvols, responsible for managing
the assignment of underlying bdev space to lvol bdevs and for storing metadata. To create
an lvol store, use the `bdev_lvol_create_lvstore` RPC command.
Example command
`rpc.py bdev_lvol_create_lvstore Malloc2 lvs -c 4096`
This will create an lvol store named `lvs` with cluster size 4096, built on top of the
`Malloc2` bdev. In response the user will be provided with a uuid, which is the unique lvol store
identifier.
User can get list of available lvol stores using `bdev_lvol_get_lvstores` RPC command (no
parameters available).
Example response
~~~
{
"uuid": "330a6ab2-f468-11e7-983e-001e67edf35d",
"base_bdev": "Malloc2",
"free_clusters": 8190,
"cluster_size": 8192,
"total_data_clusters": 8190,
"block_size": 4096,
"name": "lvs"
}
~~~
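From such a response, the usable free space can be derived directly. This is illustrative arithmetic only, using the fields from the example response above.

```python
# Fields taken from the example bdev_lvol_get_lvstores response.
free_clusters = 8190
cluster_size = 8192            # bytes

free_bytes = free_clusters * cluster_size
print(free_bytes)                    # 67092480 bytes
print(free_bytes / (1024 * 1024))    # ~63.98 MiB free in the lvol store
```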
To delete lvol store user should use `bdev_lvol_delete_lvstore` RPC command.
Example commands
`rpc.py bdev_lvol_delete_lvstore -u 330a6ab2-f468-11e7-983e-001e67edf35d`
`rpc.py bdev_lvol_delete_lvstore -l lvs`
### Lvols {#bdev_ug_lvols}
To create lvols on existing lvol store user should use `bdev_lvol_create` RPC command.
Each created lvol will be represented by new bdev.
Example commands
`rpc.py bdev_lvol_create lvol1 25 -l lvs`
`rpc.py bdev_lvol_create lvol2 25 -u 330a6ab2-f468-11e7-983e-001e67edf35d`
## Passthru {#bdev_config_passthru}
The SPDK Passthru virtual block device module serves as an example of how to write a
virtual block device module. It implements the required functionality of a vbdev module
and demonstrates some other basic features such as the use of per I/O context.
Example commands
`rpc.py bdev_passthru_create -b aio -p pt`
`rpc.py bdev_passthru_delete pt`
## RAID {#bdev_ug_raid}
RAID virtual bdev module provides functionality to combine any SPDK bdevs into
one RAID bdev. Currently SPDK supports only RAID 0. RAID metadata may be stored
on member disks if enabled when creating the RAID bdev, so user does not have to
recreate the RAID volume when restarting application. It is not enabled by
default for backward compatibility. User may specify member disks to create
RAID volume even if they do not exist yet - as the member disks are registered at
a later time, the RAID module will claim them and will surface the RAID volume
after all of the member disks are available. It is allowed to use disks of
different sizes - the smallest disk size will be the amount of space used on
each member disk.
Example commands
`rpc.py bdev_raid_create -n Raid0 -z 64 -r 0 -b "lvol0 lvol1 lvol2 lvol3"`
`rpc.py bdev_raid_get_bdevs`
`rpc.py bdev_raid_delete Raid0`
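The capacity rule stated above (the smallest member determines the space used on every member) can be sketched as follows; the helper name is invented for illustration.

```python
def raid0_capacity(member_sizes_mib):
    """RAID 0 usable capacity: the smallest member size is the amount of
    space used on each member, so capacity = min(size) * member count."""
    return min(member_sizes_mib) * len(member_sizes_mib)

# Four members of different sizes: only 100 MiB of each member is used.
print(raid0_capacity([100, 120, 150, 200]))  # 400
```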
## Split {#bdev_ug_split}
The split block device module takes an underlying block device and splits it into
several smaller equal-sized virtual block devices. This serves as an example to create
more vbdevs on a given base bdev for user testing.
Example commands
To create four split bdevs with base bdev_b0 use the `bdev_split_create` command.
Each split bdev will be one fourth the size of the base bdev.
`rpc.py bdev_split_create bdev_b0 4`
The `split_size_mb`(-s) parameter restricts the size of each split bdev.
The total size of all split bdevs must not exceed the base bdev size.
`rpc.py bdev_split_create bdev_b0 4 -s 128`
To remove the split bdevs, use the `bdev_split_delete` command with the base bdev name.
`rpc.py bdev_split_delete bdev_b0`
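The two sizing modes of `bdev_split_create` can be checked with simple arithmetic. This is a hypothetical helper mirroring the rules above, not SPDK code.

```python
def split_sizes(base_mib, count, split_size_mb=None):
    """Return the size (MiB) of each split bdev.

    Without split_size_mb, the base bdev is divided into equal parts;
    with it, each split gets that size, and the total must not exceed
    the base bdev size."""
    if split_size_mb is None:
        each = base_mib // count
    else:
        each = split_size_mb
        if each * count > base_mib:
            raise ValueError("total split size exceeds base bdev size")
    return [each] * count

print(split_sizes(1024, 4))          # [256, 256, 256, 256]
print(split_sizes(1024, 4, 128))     # [128, 128, 128, 128]
```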
## Uring {#bdev_ug_uring}
The uring bdev module issues I/O to kernel block devices using the io_uring Linux kernel API. This module requires liburing.
For more information on io_uring refer to [io_uring](https://kernel.dk/io_uring.pdf)
The user needs to configure SPDK to include io_uring support:
`configure --with-uring`
Support for zoned devices is enabled by default in uring bdev. It can be explicitly disabled as follows:
`configure --with-uring --without-uring-zns`
To create a uring bdev with given filename, bdev name and block size use the `bdev_uring_create` RPC.
`rpc.py bdev_uring_create /path/to/device bdev_u0 512`
To remove a uring bdev use the `bdev_uring_delete` RPC.
`rpc.py bdev_uring_delete bdev_u0`
## xnvme {#bdev_ug_xnvme}
The xnvme bdev module issues I/O to the underlying NVMe devices through various I/O mechanisms
such as libaio, io_uring, Asynchronous IOCTL using io_uring passthrough, POSIX aio, emulated aio etc.
This module requires xNVMe library.
For more information on xNVMe refer to [xNVMe](https://xnvme.io/docs/latest)
The user needs to configure SPDK to include xNVMe support:
`configure --with-xnvme`
To create an xnvme bdev with a given filename, bdev name and I/O mechanism use the `bdev_xnvme_create` RPC.
`rpc.py bdev_xnvme_create /dev/ng0n1 bdev_ng0n1 io_uring_cmd`
To remove a xnvme bdev use the `bdev_xnvme_delete` RPC.
`rpc.py bdev_xnvme_delete bdev_ng0n1`
## Virtio Block {#bdev_config_virtio_blk}
The Virtio-Block driver allows creating SPDK bdevs from Virtio-Block devices.
The following command creates a Virtio-Block device named `VirtioBlk0` from a vhost-user
socket `/tmp/vhost.0` exposed directly by SPDK @ref vhost. Optional `vq-count` and
`vq-size` params specify number of request queues and queue depth to be used.
`rpc.py bdev_virtio_attach_controller --dev-type blk --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioBlk0`
The driver can also be used inside QEMU-based VMs. The following command creates a Virtio
Block device named `VirtioBlk1` from a Virtio PCI device at address `0000:01:00.0`.
The entire configuration will be read automatically from the PCI Configuration Space. It will
reflect all parameters passed to QEMU's vhost-user-blk-pci device.
`rpc.py bdev_virtio_attach_controller --dev-type blk --trtype pci --traddr 0000:01:00.0 VirtioBlk1`
Virtio-Block devices can be removed with the following command
`rpc.py bdev_virtio_detach_controller VirtioBlk0`
## Virtio SCSI {#bdev_config_virtio_scsi}
The Virtio-SCSI driver allows creating SPDK block devices from Virtio-SCSI LUNs.
Virtio-SCSI bdevs are created the same way as Virtio-Block ones.
`rpc.py bdev_virtio_attach_controller --dev-type scsi --trtype user --traddr /tmp/vhost.0 --vq-count 2 --vq-size 512 VirtioScsi0`
`rpc.py bdev_virtio_attach_controller --dev-type scsi --trtype pci --traddr 0000:01:00.0 VirtioScsi0`
Each Virtio-SCSI device may export up to 64 block devices named VirtioScsi0t0 ~ VirtioScsi0t63,
one LUN (LUN0) per SCSI device. The above 2 commands will output names of all exposed bdevs.
Virtio-SCSI devices can be removed with the following command
`rpc.py bdev_virtio_detach_controller VirtioScsi0`
Removing a Virtio-SCSI device will destroy all its bdevs.
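The bdev naming scheme can be illustrated with a short snippet; it simply mirrors the VirtioScsi0t0 ~ VirtioScsi0t63 convention described above.

```python
controller = "VirtioScsi0"
# One block device per SCSI target, up to 64 targets (LUN 0 on each).
bdev_names = [f"{controller}t{target}" for target in range(64)]
print(bdev_names[0], bdev_names[-1])  # VirtioScsi0t0 VirtioScsi0t63
print(len(bdev_names))                # 64
```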
## DAOS bdev {#bdev_config_daos}
DAOS bdev creates SPDK block device on top of DAOS DFS, the name of the bdev defines the file name in DFS namespace.
Note that DAOS container has to be POSIX type, e.g.: ` daos cont create --pool=test-pool --label=test-cont --type=POSIX`
To build SPDK with daos support, daos-devel package has to be installed, please see the setup [guide](https://docs.daos.io/v2.0/).
To enable the module, configure SPDK using `--with-daos` flag.
Running `daos_agent` service on the target machine is required for the SPDK DAOS bdev communication with a DAOS cluster.
The implementation uses independent pool and container connections per device's channel for the best I/O throughput; therefore,
running a target application with multiple cores (`-m [0-7]`, for example) is highly advisable.
Example command for creating daos bdev:
`rpc.py bdev_daos_create daosdev0 test-pool test-cont 64 4096`
Example command for removing daos bdev:
`rpc.py bdev_daos_delete daosdev0`
To resize a bdev use the bdev_daos_resize command.
`rpc.py bdev_daos_resize daosdev0 8192`
This command will resize the daosdev0 bdev to 8192 MiB.
# Writing a Custom Block Device Module {#bdev_module}
## Target Audience
This programming guide is intended for developers authoring their own block
device modules to integrate with SPDK's bdev layer. For a guide on how to use
the bdev layer, see @ref bdev_pg.
## Introduction
A block device module is SPDK's equivalent of a device driver in a traditional
operating system. The module provides a set of function pointers that are
called to service block device I/O requests. SPDK provides a number of block
device modules including NVMe, RAM-disk, and Ceph RBD. However, some users
will want to write their own to interact with either custom hardware or to an
existing storage software stack. This guide is intended to demonstrate exactly
how to write a module.
## Creating A New Module
Block device modules are located in subdirectories under module/bdev today. It is not
currently possible to place the code for a bdev module elsewhere, but updates
to the build system could be made to enable this in the future. To create a
module, add a new directory with a single C file and a Makefile. A great
starting point is to copy the existing 'null' bdev module.
The primary interface that bdev modules will interact with is in
include/spdk/bdev_module.h. In that header a macro is defined that registers
a new bdev module - SPDK_BDEV_MODULE_REGISTER. This macro takes as its argument a
pointer to an spdk_bdev_module structure that is used to register the new bdev module.
The spdk_bdev_module structure describes the module properties like
initialization (`module_init`) and teardown (`module_fini`) functions,
the function that returns context size (`get_ctx_size`) - scratch space that
will be allocated in each I/O request for use by this module, and a callback
that will be called each time a new bdev is registered by another module
(`examine_config` and `examine_disk`). Please check the documentation of
struct spdk_bdev_module for more details.
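The registration pattern can be illustrated with a plain-C sketch. This is a hypothetical mini framework, not the SPDK API: a module fills out a descriptor of function pointers and links it into a global list, which is what SPDK_BDEV_MODULE_REGISTER arranges for the real spdk_bdev_module structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini framework illustrating the registration pattern.
 * All names here are illustrative, not part of the SPDK API. */
struct mini_bdev_module {
	int (*module_init)(void);      /* initialization callback */
	void (*module_fini)(void);     /* teardown callback */
	int (*get_ctx_size)(void);     /* per-I/O scratch space size */
	struct mini_bdev_module *next;
};

static struct mini_bdev_module *g_modules; /* head of registered-module list */

static void mini_module_register(struct mini_bdev_module *m)
{
	m->next = g_modules;
	g_modules = m;
}

/* A trivial module, loosely analogous to the 'null' bdev module. */
static int null_init(void) { return 0; }
static void null_fini(void) { }
static int null_ctx_size(void) { return 64; }

static struct mini_bdev_module g_null_module = {
	.module_init = null_init,
	.module_fini = null_fini,
	.get_ctx_size = null_ctx_size,
};

/* Walk the list and initialize every registered module. */
static int init_all_modules(void)
{
	for (struct mini_bdev_module *m = g_modules; m != NULL; m = m->next) {
		int rc = m->module_init();
		if (rc != 0) {
			return rc;
		}
	}
	return 0;
}
```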
## Creating Bdevs
New bdevs are created within the module by calling spdk_bdev_register(). The
module must allocate a struct spdk_bdev, fill it out appropriately, and pass
it to the register call. The most important field to fill out is `fn_table`,
which points at this data structure:
~~~{.c}
/*
* Function table for a block device backend.
*
* The backend block device function table provides a set of APIs to allow
* communication with a backend. The main commands are read/write API
* calls for I/O via submit_request.
*/
struct spdk_bdev_fn_table {
/* Destroy the backend block device object */
int (*destruct)(void *ctx);
/* Process the IO. */
void (*submit_request)(struct spdk_io_channel *ch, struct spdk_bdev_io *);
/* Check if the block device supports a specific I/O type. */
bool (*io_type_supported)(void *ctx, enum spdk_bdev_io_type);
/* Get an I/O channel for the specific bdev for the calling thread. */
struct spdk_io_channel *(*get_io_channel)(void *ctx);
/*
* Output driver-specific configuration to a JSON stream. Optional - may be NULL.
*
* The JSON write context will be initialized with an open object, so the bdev
* driver should write a name (based on the driver name) followed by a JSON value
* (most likely another nested object).
*/
int (*dump_config_json)(void *ctx, struct spdk_json_write_ctx *w);
/* Get spin-time per I/O channel in microseconds.
* Optional - may be NULL.
*/
uint64_t (*get_spin_time)(struct spdk_io_channel *ch);
};
~~~
The bdev module must implement these function callbacks.
The `destruct` function is called to tear down the device when the system no
longer needs it. What `destruct` does is up to the module - it may just be
freeing memory or it may be shutting down a piece of hardware.
The `io_type_supported` function returns whether a particular I/O type is
supported. The available I/O types are:
~~~{.c}
/** bdev I/O type */
enum spdk_bdev_io_type {
SPDK_BDEV_IO_TYPE_INVALID = 0,
SPDK_BDEV_IO_TYPE_READ,
SPDK_BDEV_IO_TYPE_WRITE,
SPDK_BDEV_IO_TYPE_UNMAP,
SPDK_BDEV_IO_TYPE_FLUSH,
SPDK_BDEV_IO_TYPE_RESET,
SPDK_BDEV_IO_TYPE_NVME_ADMIN,
SPDK_BDEV_IO_TYPE_NVME_IO,
SPDK_BDEV_IO_TYPE_NVME_IO_MD,
SPDK_BDEV_IO_TYPE_WRITE_ZEROES,
};
~~~
For the simplest bdev modules, only `SPDK_BDEV_IO_TYPE_READ` and
`SPDK_BDEV_IO_TYPE_WRITE` are necessary. `SPDK_BDEV_IO_TYPE_UNMAP` is often
referred to as "trim" or "deallocate", and is a request to mark a set of
blocks as no longer containing valid data. `SPDK_BDEV_IO_TYPE_FLUSH` is a
request to make all previously completed writes durable. Many devices do not
require flushes. `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` is just like a regular
write, but does not provide a data buffer (it would have just contained all
0's). If it isn't supported, the generic bdev code is capable of emulating it
by sending regular write requests.
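The emulation of `SPDK_BDEV_IO_TYPE_WRITE_ZEROES` amounts to allocating a zeroed buffer and issuing an ordinary write. A minimal sketch, using a toy RAM-backed "device" and hypothetical names rather than the actual generic bdev code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 512

static uint8_t g_disk[BLOCK_SIZE * 16]; /* toy 16-block RAM disk */

/* The regular write path. */
static void ram_write(uint64_t lba, const void *buf, uint64_t num_blocks)
{
	memcpy(&g_disk[lba * BLOCK_SIZE], buf, num_blocks * BLOCK_SIZE);
}

/* Emulated write-zeroes: the caller supplies no data buffer; a zeroed
 * buffer is allocated and sent down the regular write path instead. */
static int ram_write_zeroes(uint64_t lba, uint64_t num_blocks)
{
	void *zeroes = calloc(num_blocks, BLOCK_SIZE);
	if (zeroes == NULL) {
		return -1;
	}
	ram_write(lba, zeroes, num_blocks);
	free(zeroes);
	return 0;
}
```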
`SPDK_BDEV_IO_TYPE_RESET` is a request to abort all I/O and return the
underlying device to its initial state. Do not complete the reset request
until all I/O has been completed in some way.
`SPDK_BDEV_IO_TYPE_NVME_ADMIN`, `SPDK_BDEV_IO_TYPE_NVME_IO`, and
`SPDK_BDEV_IO_TYPE_NVME_IO_MD` are all mechanisms for passing raw NVMe
commands through the SPDK bdev layer. They're strictly optional, and it
probably only makes sense to implement those if the backing storage device is
capable of handling NVMe commands.
The `get_io_channel` function should return an I/O channel. For a detailed
explanation of I/O channels, see @ref concurrency. The generic bdev layer will
call `get_io_channel` one time per thread, cache the result, and pass that
result to `submit_request`. It will use the corresponding channel for the
thread it calls `submit_request` on.
The `submit_request` function is called to actually submit I/O requests to the
block device. Once the I/O request is completed, the module must call
spdk_bdev_io_complete(). The I/O does not have to finish within the calling
context of `submit_request`.
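The deferred-completion contract can be sketched in plain C. This is a hypothetical mini API, not SPDK's: `submit_request` queues the request and returns immediately, and only a later event (a poller in this sketch) completes it and fires the callback, as spdk_bdev_io_complete() does in the real layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mini I/O request; names are illustrative. */
struct mini_io {
	void (*complete_cb)(struct mini_io *io, bool success);
	bool done;
	bool success;
};

static struct mini_io *g_pending; /* one outstanding request, for brevity */

/* Analogous to submit_request: queue and return without blocking. */
static void mini_submit(struct mini_io *io)
{
	g_pending = io;
}

/* Called later, e.g. from a poller, when the backend finishes the I/O;
 * only now is the completion callback invoked. */
static void mini_backend_poll(void)
{
	if (g_pending != NULL) {
		struct mini_io *io = g_pending;
		g_pending = NULL;
		io->complete_cb(io, true);
	}
}

static void on_complete(struct mini_io *io, bool success)
{
	io->done = true;
	io->success = success;
}
```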
Integrating a new bdev module into the build system requires updates to various
files in the /mk directory.
## Creating Bdevs in an External Repository
A user can build their own bdev module and application on top of existing SPDK libraries. The example in
test/external_code serves as a template for creating, building and linking an external
bdev module. Refer to test/external_code/README.md and @ref so_linking for further information.
## Creating Virtual Bdevs
Block devices are considered virtual if they handle I/O requests by routing
the I/O to other block devices. The canonical example would be a bdev module
that implements RAID. Virtual bdevs are created in the same way as regular
bdevs, but take the one additional step of claiming the bdev.
The module can open the underlying bdevs it wishes to route I/O to using
spdk_bdev_open_ext(), where the string name is provided by the user via an RPC.
To ensure that other consumers do not modify the underlying bdev in an unexpected
way, the virtual bdev should take a claim on the underlying bdev before
reading from or writing to the underlying bdev.
There are two slightly different APIs for taking and releasing claims. The
preferred interface uses `spdk_bdev_module_claim_bdev_desc()`. This method allows
claims that ensure there is a single writer with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE`, cooperating shared writers with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`, and shared readers that prevent any
writers with `SPDK_BDEV_CLAIM_READ_MANY_WRITE_NONE`. In all cases,
`spdk_bdev_open_ext()` may be used to open the underlying bdev read-only. If a
read-only bdev descriptor successfully claims a bdev with
`SPDK_BDEV_CLAIM_READ_MANY_WRITE_ONE` or `SPDK_BDEV_CLAIM_READ_MANY_WRITE_SHARED`
the bdev descriptor is promoted to read-write.
Any claim that is obtained with `spdk_bdev_module_claim_bdev_desc()` is
automatically released upon closing the bdev descriptor used to obtain the
claim. Shared claims continue to block new incompatible claims and new writers
until the last claim is released.
The non-preferred interface for obtaining a claim allows the caller to obtain
an exclusive writer claim with `spdk_bdev_module_claim_bdev()`. It may
be released with `spdk_bdev_module_release_bdev()`. If a read-only bdev
descriptor is passed, it is promoted to read-write. NULL may be passed instead
of a bdev descriptor to avoid promotion and to block new writers. New code
should use `spdk_bdev_module_claim_bdev_desc()` with the claim type that is
tailored to the virtual bdev's needs.
The descriptor obtained from the successful spdk_bdev_open_ext() may be used
with spdk_bdev_get_io_channel() to obtain I/O channels for the bdev. This is
likely done in response to the virtual bdev's `get_io_channel` callback.
Channels may be obtained before and/or after claiming the underlying bdev, but
beware there may be other unknown writers until the underlying bdev has been
claimed.
When a virtual bdev module claims an underlying bdev from its `examine_config`
callback, it causes the `examine_disk` callback to only be called for this
module and any others that establish a shared claim. If no claims are taken by
`examine_config` callbacks, all virtual bdevs' `examine_disk` callbacks are
called.

# Block Device Layer Programming Guide {#bdev_pg}
## Target Audience
This programming guide is intended for developers authoring applications that
use the SPDK bdev library to access block devices.
## Introduction
A block device is a storage device that supports reading and writing data in
fixed-size blocks. These blocks are usually 512 or 4096 bytes. The
devices may be logical constructs in software or correspond to physical
devices like NVMe SSDs.
The block device layer consists of a single generic library in `lib/bdev`,
plus a number of optional modules (as separate libraries) that implement
various types of block devices. The public header file for the generic library
is bdev.h, which is the entirety of the API needed to interact with any type
of block device. This guide will cover how to interact with bdevs using that
API. For a guide to implementing a bdev module, see @ref bdev_module.
The bdev layer provides a number of useful features in addition to providing a
common abstraction for all block devices:
- Automatic queueing of I/O requests in response to queue full or out-of-memory conditions
- Hot remove support, even while I/O traffic is occurring.
- I/O statistics such as bandwidth and latency
- Device reset support and I/O timeout tracking
## Basic Primitives
Users of the bdev API interact with a number of basic objects.
struct spdk_bdev, which this guide will refer to as a *bdev*, represents a
generic block device. struct spdk_bdev_desc, heretofore called a *descriptor*,
represents a handle to a given block device. Descriptors are used to establish
and track permissions to use the underlying block device, much like a file
descriptor on UNIX systems. Requests to the block device are asynchronous and
represented by spdk_bdev_io objects. Requests must be submitted on an
associated I/O channel. The motivation and design of I/O channels is described
in @ref concurrency.
Bdevs can be layered, such that some bdevs service I/O by routing requests to
other bdevs. This can be used to implement caching, RAID, logical volume
management, and more. Bdevs that route I/O to other bdevs are often referred
to as virtual bdevs, or *vbdevs* for short.
## Initializing The Library
The bdev layer depends on the generic message passing infrastructure
abstracted by the header file include/spdk/thread.h. See @ref concurrency for a
full description. Most importantly, calls into the bdev library may only be
made from threads that have been allocated with SPDK by calling
spdk_thread_create().
From an allocated thread, the bdev library may be initialized by calling
spdk_bdev_initialize(), which is an asynchronous operation. Until the completion
callback is called, no other bdev library functions may be invoked. Similarly,
to tear down the bdev library, call spdk_bdev_finish().
## Discovering Block Devices
All block devices have a simple string name. At any time, a pointer to the
device object can be obtained by calling spdk_bdev_get_by_name(), or the entire
set of bdevs may be iterated using spdk_bdev_first() and spdk_bdev_next() and
their variants or spdk_for_each_bdev() and its variant.
Some block devices may also be given aliases, which are also string names.
Aliases behave like symlinks - they can be used interchangeably with the real
name to look up the block device.
## Preparing To Use A Block Device
In order to send I/O requests to a block device, it must first be opened by
calling spdk_bdev_open_ext(). This will return a descriptor. Multiple users may have
a bdev open at the same time, and coordination of reads and writes between
users must be handled by some higher level mechanism outside of the bdev
layer. Opening a bdev with write permission may fail if a virtual bdev module
has *claimed* the bdev. Virtual bdev modules implement logic like RAID or
logical volume management and forward their I/O to lower level bdevs, so they
mark these lower level bdevs as claimed to prevent outside users from issuing
writes.
When a block device is opened, a callback and context must be provided that
will be called with appropriate spdk_bdev_event_type enum as an argument when
the bdev triggers asynchronous event such as bdev removal. For example,
the callback will be called on each open descriptor for a bdev backed by
a physical NVMe SSD when the NVMe SSD is hot-unplugged. In this case
the callback can be thought of as a request to close the open descriptor so
other memory may be freed. A bdev cannot be torn down while open descriptors
exist, so it is required that a callback is provided.
When a user is done with a descriptor, they may release it by calling
spdk_bdev_close().
Descriptors may be passed to and used from multiple threads simultaneously.
However, for each thread a separate I/O channel must be obtained by calling
spdk_bdev_get_io_channel(). This will allocate the necessary per-thread
resources to submit I/O requests to the bdev without taking locks. To release
a channel, call spdk_put_io_channel(). A descriptor cannot be closed until
all associated channels have been destroyed.
## Sending I/O
Once a descriptor and a channel have been obtained, I/O may be sent by calling
the various I/O submission functions such as spdk_bdev_read(). These calls each
take a callback as an argument which will be called some time later with a
handle to an spdk_bdev_io object. In response to that completion, the user
must call spdk_bdev_free_io() to release the resources. Within this callback,
the user may also use the functions spdk_bdev_io_get_nvme_status() and
spdk_bdev_io_get_scsi_status() to obtain error information in the format of
their choosing.
I/O submission is performed by calling functions such as spdk_bdev_read() or
spdk_bdev_write(). These functions take as an argument a pointer to a region of
memory or a scatter gather list describing memory that will be transferred to
the block device. This memory must be allocated through spdk_dma_malloc() or
its variants. For a full explanation of why the memory must come from a
special allocation pool, see @ref memory. Where possible, data in memory will
be *directly transferred to the block device* using
[Direct Memory Access](https://en.wikipedia.org/wiki/Direct_memory_access).
That means it is not copied.
All I/O submission functions are asynchronous and non-blocking. They will not
block or stall the thread for any reason. However, the I/O submission
functions may fail in one of two ways. First, they may fail immediately and
return an error code. In that case, the provided callback will not be called.
Second, they may fail asynchronously. In that case, the associated
spdk_bdev_io will be passed to the callback and it will report error
information.
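The two failure paths can be sketched with a hypothetical mini API (not SPDK's): an immediate error return, after which the callback must never be expected, versus an asynchronous error delivered through the callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool g_queue_full; /* simulates a queue-full condition */

/* Hypothetical async read: returns an error code immediately on failure
 * (callback never fires), otherwise reports status via the callback.
 * In a real stack the callback would fire later, from a poller. */
static int async_read(void (*cb)(int status, void *ctx), void *ctx)
{
	if (g_queue_full) {
		return -ENOMEM; /* immediate failure: no callback */
	}
	cb(0, ctx); /* asynchronous completion path */
	return 0;
}

static void read_done(int status, void *ctx)
{
	*(int *)ctx = status; /* record the completion status */
}
```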
Some I/O request types are optional and may not be supported by a given bdev.
To query a bdev for the I/O request types it supports, call
spdk_bdev_io_type_supported().
## Resetting A Block Device
In order to handle unexpected failure conditions, the bdev library provides a
mechanism to perform a device reset by calling spdk_bdev_reset(). This will pass
a message to every other thread for which an I/O channel exists for the bdev,
pause it, then forward a reset request to the underlying bdev module and wait
for completion. Upon completion, the I/O channels will resume and the reset
will complete. The specific behavior inside the bdev module is
module-specific. For example, NVMe devices will delete all queue pairs,
perform an NVMe reset, then recreate the queue pairs and continue. Most
importantly, regardless of device type, *all I/O outstanding to the block
device will be completed prior to the reset completing.*

# bdevperf {#bdevperf}
## Introduction
bdevperf is an SPDK application used for performance testing
of block devices (bdevs) exposed by the SPDK bdev layer. It is an
alternative to the SPDK bdev fio plugin for benchmarking SPDK bdevs.
In some cases, bdevperf can provide lower overhead than the fio
plugin, resulting in better performance and efficiency for tests
using a limited number of CPU cores.
bdevperf exposes a command line interface that allows specifying
SPDK framework options as well as testing options.
bdevperf also supports a configuration file format similar
to FIO. It allows the user to create jobs parameterized by
filename, cpumask, blocksize, queuesize, etc.
## Config file
bdevperf's config file format is similar to FIO.
Below is an example config file that uses all available parameters:
~~~{.ini}
[global]
filename=Malloc0:Malloc1
bs=1024
iosize=256
rw=randrw
rwmixread=90
[A]
cpumask=0xff
[B]
cpumask=[0-128]
filename=Malloc1
[global]
filename=Malloc0
rw=write
[C]
bs=4096
iosize=128
offset=1000000
length=1000000
~~~
Jobs `[A]`, `[B]`, and `[C]` inherit default values from the `[global]`
section residing above them. So in the example, job `[A]` inherits the
`filename` value and uses both `Malloc0` and `Malloc1` bdevs as targets,
job `[B]` overrides its `filename` value and uses `Malloc1`, and
job `[C]` inherits the value `Malloc0` for its `filename`.
Interaction with CLI arguments is not the same as in FIO, however.
If bdevperf receives a CLI argument, it overrides the value
of the corresponding parameter for all `[global]` sections of the config file.
So if the example config is used, specifying the `-q` argument
will make jobs `[A]` and `[B]` use its value.
Below is a full list of supported parameters with descriptions.
Param | Default | Description
--------- | ----------------- | -----------
filename | | Bdevs to use, separated by ":"
cpumask | Maximum available | CPU mask. Format is defined at @ref cpu_mask
bs | | Block size (io size)
iodepth | | Queue depth
rwmixread | `50` | Percentage of a mixed workload that should be reads
offset | `0` | Start I/O at the provided offset on the bdev
length | 100% of bdev size | End I/O at `offset`+`length` on the bdev
rw | | Type of I/O pattern
Available rw types:
- read
- randread
- write
- randwrite
- verify
- reset
- unmap
- write_zeroes
- flush
- rw
- randrw

# Blobstore Programmer's Guide {#blob}
# Blobstore {#blob}
## In this document {#blob_pg_toc}
## Introduction
* @ref blob_pg_audience
* @ref blob_pg_intro
* @ref blob_pg_theory
* @ref blob_pg_design
* @ref blob_pg_examples
* @ref blob_pg_config
* @ref blob_pg_component
The blobstore is a persistent, power-fail safe block allocator designed to be
used as the local storage system backing a higher level storage service,
typically in lieu of a traditional filesystem. These higher level services can
be local databases or key/value stores (MySQL, RocksDB), they can be dedicated
appliances (SAN, NAS), or distributed storage systems (ex. Ceph, Cassandra). It
is not designed to be a general purpose filesystem, however, and it is
intentionally not POSIX compliant. To avoid confusion, no reference to files or
objects will be made at all, instead using the term 'blob'. The blobstore is
designed to allow asynchronous, uncached, parallel reads and writes to groups
of blocks on a block device called 'blobs'. Blobs are typically large,
measured in at least hundreds of kilobytes, and are always a multiple of the
underlying block size.
## Target Audience {#blob_pg_audience}
The blobstore is designed primarily to run on "next generation" media, which
means the device supports fast random reads _and_ writes, with no required
background garbage collection. However, in practice the design will run well on
NAND too. Absolutely no attempt will be made to make this efficient on spinning
media.
The programmer's guide is intended for developers authoring applications that utilize the SPDK Blobstore. It is
intended to supplement the source code in providing an overall understanding of how to integrate Blobstore into
an application as well as provide some high level insight into how Blobstore works behind the scenes. It is not
intended to serve as a design document or an API reference and in some cases source code snippets and high level
sequences will be discussed; for the latest source code reference refer to the [repo](https://github.com/spdk).
## Design Goals
## Introduction {#blob_pg_intro}
The blobstore is intended to solve a number of problems that local databases
have when using traditional POSIX filesystems. These databases are assumed to
'own' the entire storage device, to not need to track access times, and to
require only a very simple directory hierarchy. These assumptions allow
significant design optimizations over a traditional POSIX filesystem and block
stack.
Blobstore is a persistent, power-fail safe block allocator designed to be used as the local storage system
backing a higher level storage service, typically in lieu of a traditional filesystem. These higher level services
can be local databases or key/value stores (MySQL, RocksDB), they can be dedicated appliances (SAN, NAS), or
distributed storage systems (ex. Ceph, Cassandra). It is not designed to be a general purpose filesystem, however,
and it is intentionally not POSIX compliant. To avoid confusion, we avoid references to files or objects instead
using the term 'blob'. The Blobstore is designed to allow asynchronous, uncached, parallel reads and writes to
groups of blocks on a block device called 'blobs'. Blobs are typically large, measured in at least hundreds of
kilobytes, and are always a multiple of the underlying block size.
Asynchronous I/O can be an order of magnitude or more faster than synchronous
I/O, and so solutions like
[libaio](https://git.fedorahosted.org/cgit/libaio.git/) have become popular.
However, libaio is [not actually
asynchronous](http://www.scylladb.com/2016/02/09/qualifying-filesystems/) in
all cases. The blobstore will provide truly asynchronous operations in all
cases without any hidden locks or stalls.
The Blobstore is designed primarily to run on "next generation" media, which means the device supports fast random
reads and writes, with no required background garbage collection. However, in practice the design will run well on
NAND too.
With the advent of NVMe, storage devices now have a hardware interface that
allows for highly parallel I/O submission from many threads with no locks.
Unfortunately, placement of data on a device requires some central coordination
to avoid conflicts. The blobstore will separate operations that require
coordination from operations that do not, and allow users to explicitly
associate I/O with channels. Operations on different channels happen in
parallel, all the way down to the hardware, with no locks or coordination.
## Theory of Operation {#blob_pg_theory}
As media access latency improves, strategies for in-memory caching are changing
and often the kernel page cache is a bottleneck. Many databases have moved to
opening files only in O_DIRECT mode, avoiding the page cache entirely, and
writing their own caching layer. With the introduction of next generation media
and its additional expected latency reductions, this strategy will become far
more prevalent. To support this, the blobstore will perform no in-memory
caching of data at all, essentially making all blob operations conceptually
equivalent to O_DIRECT. This means the blobstore has similar restrictions to
O_DIRECT where data can only be read or written in units of pages (4KiB),
although memory alignment requirements are much less strict than O_DIRECT (the
pages can even be composed of scattered buffers). We fully expect that DRAM
caching will remain critical to performance, but leave the specifics of the
cache design to higher layers.
### Abstractions
Storage devices pull data from host memory using a DMA engine, and those DMA
engines operate on physical addresses and often introduce alignment
restrictions. Further, to avoid data corruption, the data must not be paged out
by the operating system while it is being transferred to disk. Traditionally,
operating systems solve this problem either by copying user data into special
kernel buffers that were allocated for this purpose and the I/O operations are
performed to/from there, or taking locks to mark all user pages as locked and
unmovable. Historically, the time to perform the copy or locking was
inconsequential relative to the I/O time at the storage device, but that is
simply no longer the case. The blobstore will instead provide zero copy,
lockless read and write access to the device. To do this, memory to be used for
blob data must be registered with the blobstore up front, preferably at
application start and out of the I/O path, so that it can be pinned, the
physical addresses can be determined, and the alignment requirements can be
verified.
The Blobstore defines a hierarchy of storage abstractions as follows.
Hardware devices are necessarily limited to some maximum queue depth. For NVMe
devices that can be quite large (the spec allows up to 64k!), but is typically
much smaller (128 - 1024 per queue). Under heavy load, databases may generate
enough requests to exceed the hardware queue depth, which requires queueing in
software. For operating systems this is often done in the generic block layer
and may cause unexpected stalls or require locks. The blobstore will avoid this
by simply failing requests with an appropriate error code when the queue is
full. This allows the blobstore to easily stick to its commitment to never
block, but may require the user to provide their own queueing layer.
* **Logical Block**: Logical blocks are exposed by the disk itself and are numbered from 0 to N, where N is the
number of blocks in the disk. A logical block is typically either 512B or 4KiB.
* **Page**: A page is defined to be a fixed number of logical blocks defined at Blobstore creation time. The logical
blocks that compose a page are always contiguous. Pages are also numbered from the beginning of the disk such
that the first page worth of blocks is page 0, the second page is page 1, etc. A page is typically 4KiB in size,
so this is either 8 or 1 logical blocks in practice. The SSD must be able to perform atomic reads and writes of
at least the page size.
* **Cluster**: A cluster is a fixed number of pages defined at Blobstore creation time. The pages that compose a cluster
are always contiguous. Clusters are also numbered from the beginning of the disk, where cluster 0 is the first cluster
worth of pages, cluster 1 is the second grouping of pages, etc. A cluster is typically 1MiB in size, or 256 pages.
* **Blob**: A blob is an ordered list of clusters. Blobs are manipulated (created, sized, deleted, etc.) by the application
and persist across power failures and reboots. Applications use a Blobstore provided identifier to access a particular blob.
Blobs are read and written in units of pages by specifying an offset from the start of the blob. Applications can also
store metadata in the form of key/value pairs with each blob which we'll refer to as xattrs (extended attributes).
* **Blobstore**: An SSD which has been initialized by a Blobstore-based application is referred to as "a Blobstore." A
Blobstore owns the entire underlying device which is made up of a private Blobstore metadata region and the collection of
blobs as managed by the application.
## The Basics
```text
+-----------------------------------------------------------------+
| Blob |
| +-----------------------------+ +-----------------------------+ |
| | Cluster | | Cluster | |
| | +----+ +----+ +----+ +----+ | | +----+ +----+ +----+ +----+ | |
| | |Page| |Page| |Page| |Page| | | |Page| |Page| |Page| |Page| | |
| | +----+ +----+ +----+ +----+ | | +----+ +----+ +----+ +----+ | |
| +-----------------------------+ +-----------------------------+ |
+-----------------------------------------------------------------+
```
The blobstore defines a hierarchy of three units of disk space. The smallest are
the *logical blocks* exposed by the disk itself, which are numbered from 0 to N,
where N is the number of blocks in the disk. A logical block is typically
either 512B or 4KiB.
### Atomicity
The blobstore defines a *page* to be a fixed number of logical blocks defined
at blobstore creation time. The logical blocks that compose a page are
contiguous. Pages are also numbered from the beginning of the disk such that
the first page worth of blocks is page 0, the second page is page 1, etc. A
page is typically 4KiB in size, so this is either 8 or 1 logical blocks in
practice. The device must be able to perform atomic reads and writes of at
least the page size.
For all Blobstore operations regarding atomicity, there is a dependency on the underlying device to guarantee atomic
operations of at least one page size. Atomicity here can refer to multiple operations:
The largest unit is a *cluster*, which is a fixed number of pages defined at
blobstore creation time. The pages that compose a cluster are contiguous.
Clusters are also numbered from the beginning of the disk, where cluster 0 is
the first cluster worth of pages, cluster 1 is the second grouping of pages,
etc. A cluster is typically 1MiB in size, or 256 pages.
* **Data Writes**: For the case of data writes, the unit of atomicity is one page. Therefore if a write operation of
greater than one page is underway and the system suffers a power failure, the data on media will be consistent at a page
size granularity (if a single page were in the middle of being updated when power was lost, the data at that page location
will be as it was prior to the start of the write operation following power restoration.)
* **Blob Metadata Updates**: Each blob has its own set of metadata (xattrs, size, etc). For performance reasons, a copy of
this metadata is kept in RAM and only synchronized with the on-disk version when the application makes an explicit call to
do so, or when the Blobstore is unloaded. Therefore, setting an xattr, for example, is not consistent until the call to
synchronize it (covered later), which is, however, performed atomically.
* **Blobstore Metadata Updates**: Blobstore itself has its own metadata which, like per blob metadata, has a copy in both
RAM and on-disk. Unlike the per blob metadata, however, the Blobstore metadata region is not made consistent via a blob
synchronization call, it is only synchronized when the Blobstore is properly unloaded via API. Therefore, if the Blobstore
metadata is updated (blob creation, deletion, resize, etc.) and not unloaded properly, it will need to perform some extra
steps the next time it is loaded which will take a bit more time than it would have if shutdown cleanly, but there will be
no inconsistencies.
On top of these three basic units, the blobstore defines three primitives. The
most fundamental is the blob, where a blob is an ordered list of clusters plus
an identifier. Blobs persist across power failures and reboots. The set of all
blobs described by shared metadata is called the blobstore. I/O operations on
blobs are submitted through a channel. Channels are tied to threads, but
multiple threads can simultaneously submit I/O operations to the same blob on
their own channels.
### Callbacks
Blobs are read and written in units of pages by specifying an offset in the
virtual blob address space. This offset is translated by first determining
which cluster(s) are being accessed, and then translating to a set of logical
blocks. This translation is done trivially using only basic math - there is no
mapping data structure. Unlike read and write, blobs are resized in units of
clusters.
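The trivial translation described above can be written out directly. This is a sketch under assumed example geometry from the text (512B logical blocks, 4KiB pages of 8 blocks, 1MiB clusters of 256 pages); the blob's cluster list maps a virtual cluster index to a physical cluster, and everything else is plain arithmetic with no mapping data structure:

```c
#include <assert.h>
#include <stdint.h>

/* Example geometry; illustrative values, fixed at blobstore creation. */
#define BLOCKS_PER_PAGE   8
#define PAGES_PER_CLUSTER 256

/* Translate a page offset in the blob's virtual address space to a
 * starting logical block address (LBA) on the device. */
static uint64_t blob_page_to_lba(const uint64_t *cluster_list, uint64_t page_offset)
{
	uint64_t cluster_idx = page_offset / PAGES_PER_CLUSTER;     /* which cluster */
	uint64_t page_in_cluster = page_offset % PAGES_PER_CLUSTER; /* page within it */
	uint64_t physical_cluster = cluster_list[cluster_idx];      /* cluster list lookup */

	return (physical_cluster * PAGES_PER_CLUSTER + page_in_cluster) * BLOCKS_PER_PAGE;
}
```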
Blobstore is callback driven; in the event that any Blobstore API is unable to make forward progress, it will
not block, but instead return control at that point and invoke the callback function provided in the API, along with its
arguments, when the original call is completed. The callback will be made on the same thread that the call was made from; more on
threads later. Some APIs, however, offer no callback arguments; in these cases the calls are fully synchronous. Examples of
asynchronous calls that utilize callbacks include those that involve disk IO, for example, where some amount of polling
is required before the IO is completed.
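The callback pattern above can be illustrated with a small self-contained sketch. The callback shape mirrors Blobstore's completion callbacks (a caller-supplied `cb_arg` plus an errno-style status), but `fake_blob_write`, `write_ctx`, and the immediate invocation are hypothetical stand-ins; in a real application the callback fires later, from the same thread, once polling completes the IO.

```c
#include <assert.h>
#include <stddef.h>

/* Completion callback shape: the cb_arg supplied by the caller plus an
 * errno-style status (0 on success). */
typedef void (*blob_op_complete)(void *cb_arg, int bserrno);

/* Hypothetical async operation; invokes the callback immediately purely to
 * demonstrate the control flow. */
static void fake_blob_write(blob_op_complete cb_fn, void *cb_arg)
{
    int rc = 0; /* pretend the IO succeeded */
    cb_fn(cb_arg, rc);
}

/* Caller-side context, filled in by the completion callback. */
struct write_ctx { int done; int bserrno; };

static void write_done(void *cb_arg, int bserrno)
{
    struct write_ctx *ctx = cb_arg;
    ctx->done = 1;
    ctx->bserrno = bserrno;
}
```

The caller passes `write_done` and a pointer to its context, then learns of completion only when the callback fires.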
Blobs are described by their metadata which consists of a discontiguous set of
pages stored in a reserved region on the disk. Each page of metadata is
referred to as a *metadata page*. Blobs do not share metadata pages with other
blobs, and in fact the design relies on the backing storage device supporting
an atomic write unit greater than or equal to the page size. Most devices
backed by NAND and next generation media support this atomic write capability,
but often magnetic media does not.
### Backend Support
The metadata region is fixed in size and defined upon creation of the
blobstore. The size is configurable, but by default one page is allocated for
each cluster. For 1MiB clusters and 4KiB pages, that results in 0.4% metadata
overhead.
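The quoted overhead follows directly from the defaults: one 4 KiB metadata page per 1 MiB cluster. A quick check (the helper name is hypothetical):

```c
#include <assert.h>

/* Metadata overhead as a percentage: one metadata page per cluster, so the
 * ratio is simply page_size / cluster_size. 4096 / 1048576 = 0.390625%,
 * i.e. the ~0.4% quoted above. */
static double metadata_overhead_pct(double page_size, double cluster_size)
{
    return 100.0 * page_size / cluster_size;
}
```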
Blobstore requires a backing storage device that can be integrated using the `bdev` layer, or by directly integrating a
device driver with Blobstore. The blobstore performs operations on a backing block device by calling function pointers
supplied to it at initialization time. For convenience, an implementation of these function pointers that routes I/O
to the bdev layer is available in `bdev_blob.c`. Alternatively, the SPDK NVMe driver, for example, may be integrated directly,
bypassing a small amount of `bdev` layer overhead. These options will be discussed further in the upcoming section on examples.
## Conventions
### Metadata Operations
Data formats on the device are specified in [Backus-Naur
Form](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form). All data is
stored on media in little-endian format. Unspecified data must be zeroed.
Because Blobstore is designed to be lock-free, metadata operations need to be isolated to a single
thread to avoid taking locks on the in-memory data structures that describe the layout of blobs (along
with other data). In Blobstore this thread is known as `the metadata thread`, defined as the thread on which the
application makes metadata-related calls. It is up to the application to set up a separate thread for these calls
and to ensure that it does not mix conflicting IO operations with metadata operations, even if they are on separate threads.
This will be discussed further in the Design Considerations section.
## Media Format
### Threads
The blobstore owns the entire storage device. The device is divided into
clusters starting from the beginning, such that cluster 0 begins at the first
logical block.
An application using Blobstore with the SPDK NVMe driver, for example, can support a variety of thread scenarios.
The simplest would be a single threaded application where the application, the Blobstore code and the NVMe driver share a
single core. In this case, the single thread would be used to submit both metadata operations as well as IO operations and
it would be up to the application to ensure that only one metadata operation is issued at a time, and that it is not
intermingled with IO operations that it affects.
### Channels
Channels are an SPDK-wide abstraction and with Blobstore the best way to think about them is that they are
required in order to do IO. The application will perform IO to the channel and channels are best thought of as being
associated 1:1 with a thread.
With external snapshots (see @ref blob_pg_esnap_and_esnap_clone), a read from a blob may lead to
reading from the device containing the blobstore or an external snapshot device. To support this,
each blobstore IO channel maintains a tree of channels to be used when reading from external
snapshot devices.
### Blob Identifiers
When an application creates a blob, it does not provide a name, as is the case with many other similar
storage systems. Instead, the Blobstore returns a unique identifier that the application must use in subsequent API calls to
perform operations on that blob.
## Design Considerations {#blob_pg_design}
### Initialization Options
When the Blobstore is initialized, there are multiple configuration options to consider. The
options and their defaults are:
* **Cluster Size**: By default, this value is 1MB. The cluster size is required to be a multiple of page size and should be
selected based on the application's usage model in terms of allocation. Recall that blobs are made up of clusters, so when
a blob is allocated/deallocated or changes in size, disk LBAs will be manipulated in groups of cluster size. If the
application expects to deal with mainly very large (always multiple GB) blobs, then it may make sense to change the
cluster size to 1GB, for example.
* **Number of Metadata Pages**: By default, Blobstore assumes there can be as many metadata pages as clusters,
which is the worst-case scenario in terms of metadata usage. This can be overridden here; however, the space
savings are not significant.
* **Maximum Simultaneous Metadata Operations**: Determines how many internally pre-allocated memory structures are set
aside for performing metadata operations. It is unlikely that changes to this value (default 32) would be desirable.
* **Maximum Simultaneous Operations Per Channel**: Determines how many internally pre-allocated memory structures are set
aside for channel operations. Changes to this value would be application dependent and best determined by both a knowledge
of the typical usage model, an understanding of the types of SSDs being used and empirical data. The default is 512.
* **Blobstore Type**: This field is a character array to be used by applications that need to identify whether the
Blobstore found here is appropriate to claim or not. The default is NULL and unless the application is being deployed in
an environment where multiple applications using the same disks are at risk of inadvertently using the wrong Blobstore, there
is no need to set this value. It can, however, be set to any valid set of characters.
* **External Snapshot Device Creation Callback**: If the blobstore supports external snapshots this function will be called
as a blob that clones an external snapshot (an "esnap clone") is opened so that the blobstore consumer can load the external
snapshot and register a blobstore device that will satisfy read requests. See @ref blob_pg_esnap_and_esnap_clone.
### Sub-page Sized Operations
Blobstore is only capable of doing page-sized read/write operations. If the application
requires finer granularity, it will have to accommodate that itself.
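One way an application can accommodate sub-page writes is a read-modify-write of the enclosing page. This is a simplified sketch under assumed names; the in-memory `page_store` and the `page_read`/`page_write` helpers stand in for actual blob IO on a channel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096u

/* Hypothetical single-page backing store standing in for blob reads/writes. */
static uint8_t page_store[PAGE_SZ];
static void page_read(uint64_t page, uint8_t *buf)
{
    (void)page;
    memcpy(buf, page_store, PAGE_SZ);
}
static void page_write(uint64_t page, const uint8_t *buf)
{
    (void)page;
    memcpy(page_store, buf, PAGE_SZ);
}

/* To update fewer than PAGE_SZ bytes, the application reads the enclosing
 * page, patches it, and writes the whole page back; Blobstore will not do
 * this on its behalf. */
static void write_sub_page(uint64_t byte_off, const uint8_t *data, size_t len)
{
    uint8_t page[PAGE_SZ];
    uint64_t page_idx = byte_off / PAGE_SZ;

    page_read(page_idx, page);
    memcpy(page + (byte_off % PAGE_SZ), data, len);
    page_write(page_idx, page);
}
```

Note that concurrent sub-page writers to the same page would also need coordination by the application.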
### Threads
As mentioned earlier, Blobstore can share a single thread with an application or the application
can define any number of threads, within resource constraints, that makes sense. The basic considerations that must be
followed are:
* Metadata operations (APIs with MD in the name) should be isolated from each other as there is no internal locking on the
memory structures affected by these APIs.
* Metadata operations should be isolated from conflicting IO operations (an example of a conflicting IO would be one that is
reading/writing to an area of a blob that a metadata operation is deallocating).
* Asynchronous callbacks will always take place on the calling thread.
* No assumptions about IO ordering can be made regardless of how many or which threads were involved in the issuing.
### Data Buffer Memory
As with all SPDK based applications, Blobstore requires memory used for data buffers to be allocated
with SPDK API.
### Error Handling
Asynchronous Blobstore callbacks all include an error number that should be checked; non-zero values
indicate an error. Synchronous calls will typically return an error value if applicable.
### Asynchronous API
Asynchronous calls return control not immediately, but at the point in execution where no
more forward progress can be made without blocking. Therefore, no assumptions can be made about the progress of
an asynchronous call until the callback has completed.
### Xattrs
Setting and removing xattrs in Blobstore is a metadata operation; xattrs are stored in per blob metadata.
Therefore, xattrs are not persisted until a blob synchronization call is made and completed. Making persistence of per
blob metadata a separate step allows applications to perform batches of xattr updates, for example, with only one
more expensive call to synchronize and persist the values.
### Synchronizing Metadata
As described earlier, there are two types of metadata in Blobstore, per blob and one global
metadata for the Blobstore itself. Only the per blob metadata can be explicitly synchronized via API. The global
metadata will be inconsistent during run-time and only synchronized on proper shutdown. The implication, however, of
an improper shutdown is only a performance penalty on the next startup as the global metadata will need to be rebuilt
based on a parsing of the per blob metadata. For consistent start times, it is important to always close down the Blobstore
properly via API.
### Iterating Blobs
Multiple examples of how to iterate through the blobs are included in the sample code and tools.
Note, however, that when walking through the existing blobs via the iter API, if your application finds the blob it is
looking for, it must either explicitly close it (because it was opened internally by the Blobstore) or complete walking
the full list.
### The Super Blob
The super blob is simply a single blob ID that can be stored as part of the global metadata to act
as sort of a "root" blob. The application may choose to use this blob to store any information that it needs or finds
relevant in understanding any kind of structure for what is on the Blobstore.
## Examples {#blob_pg_examples}
There are multiple examples of Blobstore usage in the [repo](https://github.com/spdk/spdk):
* **Hello World**: Actually named `hello_blob.c`, this is a very basic example of a single threaded application that
does nothing more than demonstrate the very basic API. Although Blobstore is optimized for NVMe, this example uses
a RAM disk (malloc) back-end so that it can be executed easily in any development environment. The malloc back-end
is a `bdev` module thus this example uses not only the SPDK Framework but the `bdev` layer as well.
* **CLI**: The `blobcli.c` example is a command-line utility intended not only to serve as example code but also as a test
and development tool for Blobstore itself. It is also a simple single threaded application that relies on both the
SPDK Framework and the `bdev` layer but offers multiple modes of operation to accomplish some real-world tasks. In
command mode, it accepts single-shot commands which can be a little time consuming if there are many commands to
get through as each one will take a few seconds waiting for DPDK initialization. It therefore has a shell mode that
allows the developer to get to a `blob>` prompt and then very quickly interact with Blobstore with simple commands
that include the ability to import/export blobs from/to regular files. Lastly there is a scripting mode to automate
a series of tasks, again, handy for development and/or test type activities.
## Configuration {#blob_pg_config}
Blobstore configuration options are described in the initialization options section under @ref blob_pg_design.
## Component Detail {#blob_pg_component}
The information in this section is not necessarily relevant to designing an application for use with Blobstore, but
understanding a little more about the internals may be interesting and is also included here for those wanting to
contribute to the Blobstore effort itself.
### Media Format
The Blobstore owns the entire storage device. The device is divided into clusters starting from the beginning, such
that cluster 0 begins at the first logical block.
```text
LBA 0 LBA N
+-----------+-----------+-----+-----------+
| Cluster 0 | Cluster 1 | ... | Cluster N |
+-----------+-----------+-----+-----------+
```
Cluster 0 is special and has the following format, where page 0 is the first page of the cluster:
Or in formal notation:
```text
<media-format> ::= <cluster0> <cluster>*
```
```text
+--------+-------------------+
| Page 0 | Page 1 ... Page N |
+--------+-------------------+
| Super | Metadata Region |
| Block | |
+--------+-------------------+
```
The super block is a single page located at the beginning of the partition. It contains basic information about
the Blobstore. The metadata region is the remainder of cluster 0 and may extend to additional clusters. Refer
to the latest source code for complete structural details of the super block and metadata region.
Or formally:
Each blob is allocated a non-contiguous set of pages inside the metadata region for its metadata. These pages
form a linked list. The first page in the list will be written in place on update, while all other pages will
be written to fresh locations. This requires the backing device to support an atomic write size greater than
or equal to the page size to guarantee that the operation is atomic. See the section on atomicity for details.
```text
<cluster0> ::= <super-block> <metadata-region>
```
### Blob cluster layout {#blob_pg_cluster_layout}
Each blob is an ordered list of clusters, where the starting LBA of a cluster is called an extent. A blob can be
thin provisioned, resulting in no extent for some of its clusters. When the first write operation occurs
to an unallocated cluster, a new extent is chosen. This information is stored both in RAM and on disk.
```text
<super-block> ::= <sb-version> <sb-len> <sb-super-blob> <sb-params>
                  <sb-md-start> <sb-md-len>
                  <sb-blobid-start> <sb-blobid-len> <crc>
<sb-version> ::= u32
<sb-len> ::= u32 # Length of this super block, in bytes. Starts from the
                 # beginning of this structure.
<sb-super-blob> ::= u64 # Special blobid set by the user that indicates where
                        # their starting metadata resides.
```
There are two extent representations on disk, depending on the `use_extent_table` option (default: true) used
when creating a blob.
```text
<sb-md-start> ::= u64 # Metadata start location, in pages
<sb-md-len> ::= u64 # Metadata length, in pages
<sb-blobid-start> ::= u32 # Start of bitmask of valid blobids (in pages)
<sb-blobid-len> ::= u32 # Length of bitmask of valid blobids (in pages)
<crc> ::= u32 # Crc for super block
```
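The grammar above can be mirrored as a C structure. This is purely illustrative; the authoritative on-disk layout is `struct spdk_bs_super_block` in `blobstore.h`, which may contain additional fields, padding, and a different field order. All integers are stored little-endian on media.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of <sb-params>; not the authoritative definition. */
struct sb_params {
    uint32_t page_size;    /* multiple of the logical block size; 4 KiB today */
    uint32_t cluster_size; /* multiple of the page size */
    char     bs_type[16];  /* blobstore type tag */
};

/* Illustrative mirror of <super-block>. */
struct super_block {
    uint32_t version;
    uint32_t length;       /* length of this super block, in bytes */
    uint64_t super_blob;   /* blobid of the user's "root" blob, if set */
    struct sb_params params;
    uint64_t md_start;     /* metadata region start, in pages */
    uint64_t md_len;       /* metadata region length, in pages */
    uint32_t blobid_start; /* start of valid-blobid bitmask, in pages */
    uint32_t blobid_len;   /* length of valid-blobid bitmask, in pages */
    uint32_t crc;          /* CRC covering the super block */
};
```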
* **use_extent_table=true**: The EXTENT_PAGE descriptor is not part of the linked list of pages. It contains extents
that are not run-length encoded. Each extent page is referenced by an EXTENT_TABLE descriptor, which is serialized
as part of the linked list of pages. The extent table run-length encodes all unallocated extent pages.
Every new cluster allocation updates a single extent page when that extent page was previously allocated;
otherwise it additionally incurs serializing the whole linked list of pages for the blob.
The `<sb-params>` data contains parameters specified by the user when the blob
store was initially formatted.
* **use_extent_table=false**: The EXTENT_RLE descriptor is serialized as part of the linked list of pages.
Extents pointing to contiguous LBAs are run-length encoded, including unallocated extents, which are represented by 0.
Every new cluster allocation incurs serializing the whole linked list of pages for the blob.
```text
<sb-params> ::= <sb-page-size> <sb-cluster-size> <sb-bs-type>
<sb-page-size> ::= u32 # Page size, in bytes.
                       # Must be a multiple of the logical block size.
                       # The implementation today requires this to be 4KiB.
<sb-cluster-size> ::= u32 # Cluster size, in bytes.
                          # Must be a multiple of the page size.
<sb-bs-type> ::= char[16] # Blobstore type
```
### Thin Blobs, Snapshots, and Clones
Each in-use cluster is allocated to blobstore metadata or to a particular blob. Once a cluster is
allocated to a blob it is considered owned by that blob and that particular blob's metadata
maintains a reference to the cluster as a record of ownership. Cluster ownership is transferred
during snapshot operations described later in @ref blob_pg_snapshots.
Each page is defined as:
Through the use of thin provisioning, snapshots, and/or clones, a blob may be backed by clusters it
owns, clusters owned by another blob, or by a zeroes device. The behavior of reads and writes depend
on whether the operation targets blocks that are backed by a cluster owned by the blob or not.
```text
<metadata-page> ::= <blob-id> <blob-sequence-num> <blob-descriptor>*
                    <blob-next> <blob-crc>
<blob-id> ::= u64 # The blob guid
<blob-sequence-num> ::= u32 # The sequence number of this page in the linked
                            # list.
```
* **read from blocks on an owned cluster**: The read is serviced by reading directly from the
appropriate cluster.
* **read from other blocks**: The read is passed on to the blob's *back device* and the back
device services the read. The back device may be another blob or it may be a zeroes device.
* **write to blocks on an owned cluster**: The write is serviced by writing directly to the
appropriate cluster.
* **write to thin provisioned cluster**: If the back device is the zeroes device and no cluster
is allocated to the blob the process described in @ref blob_pg_thin_provisioning is followed.
* **write to other blocks**: A copy-on-write operation is triggered. See @ref blob_pg_copy_on_write
for details.
```text
<blob-descriptor> ::= <blob-descriptor-type> <blob-descriptor-length>
                      <blob-descriptor-data>
<blob-descriptor-type> ::= u8 # 0 means padding, 1 means "extent", 2 means
                              # xattr, 3 means flags. The type
                              # describes how to interpret the descriptor data.
<blob-descriptor-length> ::= u32 # Length of the entire descriptor
```
External snapshots allow some external data source to act as a snapshot. This allows clones to be
created of data that resides outside of the blobstore containing the clone.
```text
<blob-descriptor-data-padding> ::= u8
```
#### Thin Provisioning {#blob_pg_thin_provisioning}
```text
<blob-descriptor-data-extent> ::= <extent-cluster-id> <extent-cluster-count>
<extent-cluster-id> ::= u32 # The cluster id where this extent starts
<extent-cluster-count> ::= u32 # The number of clusters in this extent
```
As mentioned in @ref blob_pg_cluster_layout, a blob may be thin provisioned. A thin provisioned blob
starts out with no allocated clusters. Clusters are allocated as writes occur. A thin provisioned
blob's back device is a *zeroes device*. A read from a zeroes device fills the read buffer with
zeroes.
```text
<blob-descriptor-data-xattr> ::= <xattr-name-length> <xattr-value-length>
                                 <xattr-name> <xattr-value>
<xattr-name-length> ::= u16
<xattr-value-length> ::= u16
<xattr-name> ::= u8*
<xattr-value> ::= u8*
```
When a thin provisioned volume writes to a block that does not have an allocated cluster, the
following steps are performed:
```text
<blob-descriptor-data-flags> ::= <flags-invalid> <flags-data-ro> <flags-md-ro>
```
1. Allocate a cluster.
2. Update blob metadata.
3. Perform the write.
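The three steps above can be sketched as a tiny allocate-on-write simulation. The structure and allocator are hypothetical simplifications (a real implementation would not use LBA 0 as the "unallocated" sentinel and would persist the metadata update before completing the write):

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CLUSTERS 8
#define UNALLOCATED  0 /* simplified sentinel for a thin-provisioned hole */

/* Hypothetical blob: extents[i] is the starting LBA of virtual cluster i,
 * or UNALLOCATED when the cluster is still backed by the zeroes device. */
struct thin_blob { uint64_t extents[NUM_CLUSTERS]; };

static uint64_t next_free_lba = 1024; /* fake cluster allocator state */

/* Allocate-on-write, following the three steps from the text. */
static uint64_t write_cluster(struct thin_blob *b, int cluster)
{
    if (b->extents[cluster] == UNALLOCATED) {
        uint64_t lba = next_free_lba;  /* 1. allocate a cluster   */
        next_free_lba += 2048;
        b->extents[cluster] = lba;     /* 2. update blob metadata */
    }
    return b->extents[cluster];        /* 3. perform the write at this LBA */
}
```

A second write to the same cluster finds an extent already recorded and skips the allocation and metadata update.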
```text
<flags-invalid> ::= u64
<flags-data-ro> ::= u64
<flags-md-ro> ::= u64
```
#### Snapshots and Clones {#blob_pg_snapshots}
```text
<blob-next> ::= u32 # The offset into the metadata region that contains the
                    # next page of metadata. 0 means no next page.
<blob-crc> ::= u32 # CRC of the entire page
```
A snapshot is a read-only blob that may have clones. A snapshot may itself be a clone of one other
blob. While the interface gives the illusion of being able to create many snapshots of a blob, under
the covers this results in a chain of snapshots that are clones of the previous snapshot.
When blob1 is snapshotted, a new read-only blob is created and blob1 becomes a clone of this new
blob. That is:
Descriptors cannot span metadata pages.
| Step | Action | State |
| ---- | ------------------------------ | ------------------------------------------------- |
| 1 | Create blob1 | `blob1 (rw)` |
| 2 | Create snapshot blob2 of blob1 | `blob1 (rw) --> blob2 (ro)` |
| 2a | Write to blob1 | `blob1 (rw) --> blob2 (ro)` |
| 3 | Create snapshot blob3 of blob1 | `blob1 (rw) --> blob3 (ro) ---> blob2 (ro)` |
## Atomicity
Supposing blob1 was not thin provisioned, step 1 would have allocated clusters needed to perform a
full write of blob1. As blob2 is created in step 2, the ownership of all of blob1's clusters is
transferred to blob2 and blob2 becomes blob1's back device. During step 2a, the writes to blob1 cause
one or more clusters to be allocated to blob1. When blob3 is created in step 3, the clusters
allocated in step 2a are given to blob3, blob3's back device becomes blob2, and blob1's back device
becomes blob3.
Metadata in the blobstore is cached and must be explicitly synced by the user.
Data is not cached, however, so when a write completes the data can be
considered durable if the metadata is synchronized. Metadata does not often
change, and in fact only must be synchronized after these explicit operations:
It is important to understand the chain above when considering strategies to use a golden image from
which many clones are made. The IO path is more efficient if one snapshot is cloned many times than
it is to create a new snapshot for every clone. The following illustrates the difference.
* resize
* set xattr
* remove xattr
Using a single snapshot means the data originally referenced by the golden image is always one hop
away.
Any other operation will not dirty the metadata. Further, the metadata for each
blob is independent of all of the others, so a synchronization operation is
only needed on the specific blob that is dirty.
```text
create golden golden --> golden-snap
snapshot golden as golden-snap ^ ^ ^
clone golden-snap as clone1 clone1 ---+ | |
clone golden-snap as clone2 clone2 -----+ |
clone golden-snap as clone3 clone3 -------+
```
Using a snapshot per clone means that the chain of back devices grows with every new snapshot and
clone pair. Reading a block from clone3 may result in a read from clone3's back device (snap3), from
snap3's back device (snap2), then finally snap2's back device (snap1, the current owner of the
blocks originally allocated to golden).
```text
create golden
snapshot golden as snap1 golden --> snap3 -----> snap2 ----> snap1
clone snap1 as clone1 clone3----/ clone2 --/ clone1 --/
snapshot golden as snap2
clone snap2 as clone2
snapshot golden as snap3
clone snap3 as clone3
```
A snapshot with no more than one clone can be deleted. When a snapshot with one clone is deleted,
the clone becomes a regular blob. The clusters owned by the snapshot are transferred to the clone or
freed, depending on whether the clone already owns a cluster for a particular block range.
Removal of the last clone leaves the snapshot in place. This snapshot continues to be read-only and
can serve as the snapshot for future clones.
#### Inflating and Decoupling Clones
A clone can remove its dependence on a snapshot with the following operations:
1. Inflate the clone. Clusters backed by any snapshot or a zeroes device are copied into newly
allocated clusters. The blob becomes a thick provisioned blob.
2. Decouple the clone. Clusters backed by the first back device snapshot are copied into newly
allocated clusters. If the clone's back device snapshot was itself a clone of another
snapshot, the clone remains a clone but is now a clone of a different snapshot.
3. Remove the snapshot. This is only possible if the snapshot has one clone. The end result is
usually the same as decoupling but ownership of clusters is transferred from the snapshot rather
than being copied. If the snapshot that was deleted was itself a clone of another snapshot, the
clone remains a clone, but is now a clone of a different snapshot.
#### External Snapshots and Esnap Clones {#blob_pg_esnap_and_esnap_clone}
A blobstore that is loaded with the `esnap_bs_dev_create` callback defined will support external
snapshots (esnaps). An external snapshot is not useful on its own: it needs to be cloned by a blob.
A clone of an external snapshot is referred to as an *esnap clone*. An esnap clone supports IO and
other operations just like any other clone.
An esnap clone can be recognized in various ways:
* **On disk**: the blob metadata has the `SPDK_BLOB_EXTERNAL_SNAPSHOT` (0x8) bit set in
`invalid_flags`, and an internal XATTR with name `BLOB_EXTERNAL_SNAPSHOT_ID` ("EXTSNAP") exists.
* **In memory**: The `spdk_blob` structure contains the metadata read from disk, `blob->parent_id`
is set to `SPDK_BLOBID_EXTERNAL_SNAPSHOT`, and `blob->back_bs_dev` references a blobstore device
which is not a blob in the same blobstore nor a zeroes device.
#### Copy-on-write {#blob_pg_copy_on_write}
A copy-on-write operation is somewhat expensive, with the cost being proportional to the cluster
size. Typical copy-on-write involves the following steps:
1. Allocate a cluster.
2. Allocate a cluster-sized buffer into which data can be read.
3. Trigger a full-cluster read from the back device into the cluster-sized buffer.
4. Write from the cluster-sized buffer into the newly allocated cluster.
5. Update the blob's on-disk metadata to record ownership of the newly allocated cluster. This
involves at least one page-sized write.
6. Write the new data to the just allocated and copied cluster.
If the source cluster is backed by a zeroes device, steps 2 through 4 are skipped. Alternatively, if
the blobstore resides on a device that can perform the copy on its own, steps 2 through 4 are
offloaded to the device. Neither of these optimizations are available when the back device is an
external snapshot.
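The copy-on-write steps can be sketched as follows. This is a simplified simulation under assumed names: `back_dev` and `new_cluster` stand in for the back device and a freshly allocated cluster, and steps 1 and 5 (allocation and the metadata write) are reduced to comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CL 16 /* tiny cluster size, purely for illustration */

/* Hypothetical devices: the back device (e.g. a snapshot) and the blob's
 * newly allocated cluster. */
static uint8_t back_dev[CL];
static uint8_t new_cluster[CL];

/* Copy-on-write for a write of `len` bytes at `off` within one cluster,
 * following steps 2-6 of the text (step 1, cluster allocation, is implicit). */
static void cow_write(uint64_t off, const uint8_t *data, size_t len)
{
    uint8_t buf[CL];                      /* 2. cluster-sized buffer          */
    memcpy(buf, back_dev, CL);            /* 3. full-cluster read, back dev   */
    memcpy(new_cluster, buf, CL);         /* 4. write into the new cluster    */
    /* 5. on-disk metadata update recording ownership would happen here */
    memcpy(new_cluster + off, data, len); /* 6. write the new data            */
}
```

After the operation, the new cluster holds the back device's data everywhere except the freshly written range, matching the cost argument: the work is proportional to the cluster size, not the write size.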
### Sequences and Batches
Internally Blobstore uses the concepts of sequences and batches to submit IO to the underlying device in either
a serial fashion or in parallel, respectively. Both are defined using the following structure:
~~~{.sh}
struct spdk_bs_request_set;
~~~
These request sets are basically bookkeeping mechanisms that help Blobstore efficiently deal with related groups
of IO. They are an internal construct only and are pre-allocated on a per channel basis (channels were discussed
earlier). They are removed from a channel-associated linked list when the set (sequence or batch) is started and
then returned to the list when completed.
Each request set maintains a reference to a `channel` and a `back_channel`. The `channel` is used
for performing IO on the blobstore device. The `back_channel` is used for performing IO on the
blob's back device, `blob->back_bs_dev`. For blobs that are not esnap clones, `channel` and
`back_channel` reference an IO channel used with the device that contains the blobstore. For blobs
that are esnap clones, `channel` is the same as with any other blob and `back_channel` is an IO
channel for the external snapshot device.
### Key Internal Structures
`blobstore.h` contains many of the key structures for the internal workings of Blobstore. Only a few notable ones
are reviewed here. Note that `blobstore.h` is an internal header file; the header file for Blobstore that defines
the public API is `blob.h`.
~~~{.sh}
struct spdk_blob
~~~
This is an in-memory data structure that contains key elements like the blob identifier, its current state and two
copies of the mutable metadata for the blob; one copy is the current metadata and the other is the last copy written
to disk.
~~~{.sh}
struct spdk_blob_mut_data
~~~
This is a per blob structure, included in the `struct spdk_blob` structure, that actually defines the blob itself. It holds the
specific information on the size and makeup of the blob (i.e., how many clusters are allocated for this blob and which ones).
~~~{.sh}
struct spdk_blob_store
~~~
This is the main in-memory structure for the entire Blobstore. It defines the global on disk metadata region and maintains
information relevant to the entire system - initialization options such as cluster size, etc.
~~~{.sh}
struct spdk_bs_super_block
~~~
The super block is an on-disk structure that contains all of the relevant information that's in the in-memory Blobstore
structure just discussed along with other elements one would expect to see here such as signature, version, checksum, etc.
### Code Layout and Common Conventions
In general, `blobstore.c` is laid out with groups of related functions blocked together with descriptive comments. For
example,
~~~{.sh}
/* START spdk_bs_md_delete_blob */
< relevant functions to accomplish the deletion of a blob >
/* END spdk_bs_md_delete_blob */
~~~
And for the most part the following conventions are followed throughout:
* functions beginning with an underscore are called internally only
* functions or variables with the letters `cpl` are related to set or callback completions
The metadata consists of a linked list of pages. Updates to the metadata are
done by first writing page 2 through N to a new location, writing page 1 in
place to atomically update the chain, and then erasing the remainder of the old
chain. The vast majority of the time, blobs consist of just a single metadata
page and so this operation is very efficient. For this scheme to work the write
to the first page must be atomic, which requires hardware support from the
backing device. For most, if not all, NVMe SSDs, an atomic write unit of 4KiB
can be expected. Devices specify their atomic write unit in their NVMe identify
data - specifically in the AWUN field.


@@ -1,8 +1,8 @@
# BlobFS (Blobstore Filesystem) {#blobfs}
## BlobFS Getting Started Guide {#blobfs_getting_started}
# BlobFS Getting Started Guide {#blobfs_getting_started}
## RocksDB Integration {#blobfs_rocksdb}
# RocksDB Integration {#blobfs_rocksdb}
Clone and build the SPDK repository as per https://github.com/spdk/spdk
@@ -14,30 +14,32 @@ make
~~~
Clone the RocksDB repository from the SPDK GitHub fork into a separate directory.
Make sure you check out the `6.15.fb` branch.
Make sure you check out the `spdk-v5.6.1` branch.
~~~{.sh}
cd ..
git clone -b 6.15.fb https://github.com/spdk/rocksdb.git
git clone -b spdk-v5.6.1 https://github.com/spdk/rocksdb.git
~~~
Build RocksDB. Only the `db_bench` benchmarking tool is integrated with BlobFS.
(Note: add `DEBUG_LEVEL=0` for a release build.)
~~~{.sh}
cd rocksdb
make db_bench SPDK_DIR=relative_path/to/spdk
make db_bench SPDK_DIR=path/to/spdk
~~~
Or you can also add `DEBUG_LEVEL=0` for a release build (need to turn on `USE_RTTI`).
Copy `etc/spdk/rocksdb.conf.in` from the SPDK repository to `/usr/local/etc/spdk/rocksdb.conf`.
~~~{.sh}
export USE_RTTI=1 && make db_bench DEBUG_LEVEL=0 SPDK_DIR=relative_path/to/spdk
cd ../spdk
cp etc/spdk/rocksdb.conf.in /usr/local/etc/spdk/rocksdb.conf
~~~
Create an NVMe section in the configuration file using SPDK's `gen_nvme.sh` script.
Append an NVMe section to the configuration file using SPDK's `gen_nvme.sh` script.
~~~{.sh}
scripts/gen_nvme.sh --json-with-subsystems > /usr/local/etc/spdk/rocksdb.json
scripts/gen_nvme.sh >> /usr/local/etc/spdk/rocksdb.conf
~~~
Verify the configuration file has specified the correct NVMe SSD.
@@ -54,7 +56,7 @@ HUGEMEM=5120 scripts/setup.sh
Create an empty SPDK blobfs for testing.
~~~{.sh}
test/blobfs/mkfs/mkfs /usr/local/etc/spdk/rocksdb.json Nvme0n1
test/lib/blobfs/mkfs/mkfs /usr/local/etc/spdk/rocksdb.conf Nvme0n1
~~~
At this point, RocksDB is ready for testing with SPDK. Three `db_bench` parameters are used to configure SPDK:
@@ -66,20 +68,20 @@ At this point, RocksDB is ready for testing with SPDK. Three `db_bench` paramet
Default is 4096 (4GB). (Optional)
SPDK has a set of scripts which will run `db_bench` against a variety of workloads and capture performance and profiling
data. The primary script is `test/blobfs/rocksdb/rocksdb.sh`.
data. The primary script is `test/blobfs/rocksdb/run_tests.sh`.
## FUSE
# FUSE
BlobFS provides a FUSE plug-in to mount an SPDK BlobFS as a kernel filesystem for inspection or debug purposes.
The FUSE plug-in requires fuse3 and will be built automatically when fuse3 is detected on the system.
~~~{.sh}
test/blobfs/fuse/fuse /usr/local/etc/spdk/rocksdb.json Nvme0n1 /mnt/fuse
test/lib/blobfs/fuse/fuse /usr/local/etc/spdk/rocksdb.conf Nvme0n1 /mnt/fuse
~~~
Note that the FUSE plug-in has some limitations - see the list below.
## Limitations
# Limitations
* BlobFS has primarily been tested with RocksDB so far, so any use cases different from how RocksDB uses a filesystem
may run into issues. BlobFS will be tested in a broader range of use cases after this initial release.
@@ -1,7 +0,0 @@
# CI Tools {#ci_tools}
Section describing tools used by CI to verify integrity of the submitted
patches ([status](https://ci.spdk.io)).
- @subpage shfmt
- @subpage distributions
@@ -1,286 +0,0 @@
# SPDK "Reduce" Block Compression Algorithm {#reduce}
## Overview
The SPDK "reduce" block compression scheme is based on using SSDs for storing compressed blocks of
storage and persistent memory for metadata. This metadata includes mappings of logical blocks
requested by a user to the compressed blocks on SSD. The scheme described in this document
is generic and not tied to any specific block device framework such as the SPDK block device (bdev)
framework. This algorithm will be implemented in a library called "libreduce". Higher-level
software modules can be built on top of this library to create and present block devices in a
specific block device framework. For SPDK, a bdev_reduce module will serve as a wrapper around
the libreduce library, to present the compressed block devices as SPDK bdevs.
This scheme only describes how compressed blocks are stored on an SSD and the metadata for tracking
those compressed blocks. It relies on the higher-level software module to perform the compression
algorithm itself. For SPDK, the bdev_reduce module will utilize the DPDK compressdev framework
to perform compression and decompression on behalf of the libreduce library.
(Note that in some cases, blocks of storage may not be compressible, or cannot be compressed enough
to realize savings from the compression. In these cases, the data may be stored uncompressed on
disk. The phrase "compressed blocks of storage" includes these uncompressed blocks.)
A compressed block device is a logical entity built on top of a similarly-sized backing storage
device. The backing storage device must be thin-provisioned to realize any savings from
compression for reasons described later in this document. This algorithm has no direct knowledge
of the implementation of the backing storage device, except that it will always use the
lowest-numbered blocks available on the backing storage device. This will ensure that when this
algorithm is used on a thin-provisioned backing storage device, blocks will not be allocated until
they are actually needed.
The backing storage device must be sized for the worst case scenario, where no data can be
compressed. In this case, the size of the backing storage device would be the same as the
compressed block device. Since this algorithm ensures atomicity by never overwriting data
in place, some additional backing storage is required to temporarily store data for writes in
progress before the associated metadata is updated.
Storage from the backing storage device will be allocated, read, and written to in 4KB units for
best NVMe performance. These 4KB units are called "backing IO units". They are indexed from 0 to N-1
with the indices called "backing IO unit indices". At start, the full set of indices represents the
"free backing IO unit list".
A compressed block device compresses and decompresses data in units of chunks, where a chunk is a
multiple of at least two 4KB backing IO units. The number of backing IO units per chunk determines
the chunk size and is specified when the compressed block device is created. A chunk
consumes a number of 4KB backing IO units between 1 and the number of 4KB units in the chunk. For
example, a 16KB chunk consumes 1, 2, 3 or 4 backing IO units. The number of backing IO units depends on how
much the chunk was able to be compressed. The blocks on disk associated with a chunk are stored in a
"chunk map" in persistent memory. Each chunk map consists of N 64-bit values, where N is the maximum
number of backing IO units in the chunk. Each 64-bit value corresponds to a backing IO unit index. A
special value (for example, 2^64-1) is used for backing IO units not needed due to compression. The
number of chunk maps allocated is equal to the size of the compressed block device divided by its chunk
size, plus some number of extra chunk maps. These extra chunk maps are used to ensure atomicity on
writes and will be explained later in this document. At start, all of the chunk maps represent the
"free chunk map list".
Finally, the logical view of the compressed block device is represented by the "logical map". The
logical map is a mapping of chunk offsets into the compressed block device to the corresponding
chunk map. Each entry in the logical map is a 64-bit value, denoting the associated chunk map.
A special value (UINT64_MAX) is used if there is no associated chunk map. The mapping is
determined by dividing the byte offset by the chunk size to get an index, which is used as an
array index into the array of chunk map entries. At start, all entries in the logical map have no
associated chunk map. Note that while access to the backing storage device is in 4KB units, the
logical view may allow 4KB or 512B unit access and should perform similarly.
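The logical-map lookup described above can be sketched in a few lines of C. The names below are illustrative only, not the actual libreduce API; the special value is spelled out as `UINT64_MAX` as in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of the logical map described above; these names are
 * hypothetical, not the actual libreduce API. */
#define REDUCE_EMPTY_ENTRY UINT64_MAX	/* the special "no chunk map" value */

/* The logical map index is simply the byte offset divided by the chunk size. */
uint64_t logical_map_index(uint64_t byte_offset, uint64_t chunk_size)
{
	return byte_offset / chunk_size;
}

/* Look up the chunk map entry for a byte offset, returning
 * REDUCE_EMPTY_ENTRY if the chunk has never been written. */
uint64_t lookup_chunk_map(const uint64_t *logical_map, uint64_t byte_offset,
			  uint64_t chunk_size)
{
	return logical_map[logical_map_index(byte_offset, chunk_size)];
}
```

A read or write first performs this lookup; an empty entry means the chunk has no data on the backing device yet.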
## Example
To illustrate this algorithm, we will use a real example at a very small scale.
The size of the compressed block device is 64KB, with a chunk size of 16KB. This will
realize the following:
* "Backing storage" will consist of an 80KB thin-provisioned logical volume. This
corresponds to the 64KB size of the compressed block device, plus an extra 16KB to handle
additional write operations under a worst-case compression scenario.
* "Free backing IO unit list" will consist of indices 0 through 19 (inclusive). These represent
the 20 4KB IO units in the backing storage.
* A "chunk map" will be 32 bytes in size. This corresponds to 4 backing IO units per chunk
(16KB / 4KB), and 8B (64b) per backing IO unit index.
* 5 chunk maps will be allocated in 160B of persistent memory. This corresponds to 4 chunk maps
for the 4 chunks in the compressed block device (64KB / 16KB), plus an extra chunk map for use
when overwriting an existing chunk.
* "Free chunk map list" will consist of indices 0 through 4 (inclusive). These represent the
5 allocated chunk maps.
* The "logical map" will be allocated in 32B of persistent memory. This corresponds to
4 entries for the 4 chunks in the compressed block device and 8B (64b) per entry.
In these examples, the value "X" will represent the special value (2^64-1) described above.
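The sizing arithmetic behind these numbers can be written out directly. This is a sketch with hypothetical names, not the libreduce API; it just encodes the 4KB unit size and 8B-per-entry rules stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the persistent-memory sizing math from the example above.
 * Names are illustrative, not the actual libreduce API. */
#define BACKING_IO_UNIT_SIZE 4096	/* 4KB backing IO units */
#define ENTRY_SIZE 8			/* 8B (64b) per entry */

/* One 64-bit entry per backing IO unit in the chunk. */
uint64_t chunk_map_size(uint64_t chunk_size)
{
	return (chunk_size / BACKING_IO_UNIT_SIZE) * ENTRY_SIZE;
}

/* One chunk map per chunk, plus some extras for in-flight overwrites. */
uint64_t num_chunk_maps(uint64_t vol_size, uint64_t chunk_size, uint64_t extra)
{
	return vol_size / chunk_size + extra;
}

/* One 64-bit entry per chunk in the compressed block device. */
uint64_t logical_map_size(uint64_t vol_size, uint64_t chunk_size)
{
	return (vol_size / chunk_size) * ENTRY_SIZE;
}
```

For the 64KB volume with 16KB chunks, these reproduce the 32B chunk map, 5 chunk maps in 160B, and 32B logical map quoted above.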
### Initial Creation
```text
+--------------------+
Backing Device | |
+--------------------+
Free Backing IO Unit List 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | | | | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 0, 1, 2, 3, 4
+---+---+---+---+
Logical Map | X | X | X | X |
+---+---+---+---+
```
### Write 16KB at Offset 32KB
* Find the corresponding index into the logical map. Offset 32KB divided by the chunk size
(16KB) is 2.
* Entry 2 in the logical map is "X". This means no part of this 16KB has been written to yet.
* Allocate a 16KB buffer in memory
* Compress the incoming 16KB of data into this allocated buffer
* Assume this data compresses to 6KB. This requires 2 4KB backing IO units.
* Allocate 2 blocks (0 and 1) from the free backing IO unit list. Always use the lowest numbered
entries in the free backing IO unit list - this ensures that unnecessary backing storage
is not allocated in the thin-provisioned logical volume holding the backing storage.
* Write the 6KB of data to backing IO units 0 and 1.
* Allocate a chunk map (0) from the free chunk map list.
* Write (0, 1, X, X) to the chunk map. This represents that only 2 backing IO units were used to
store the 16KB of data.
* Write the chunk map index to entry 2 in the logical map.
```text
+--------------------+
Backing Device |01 |
+--------------------+
Free Backing IO Unit List 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | 0 1 X X | | | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 1, 2, 3, 4
+---+---+---+---+
Logical Map | X | X | 0 | X |
+---+---+---+---+
```
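The compression step in this walkthrough determines how many backing IO units a chunk consumes: the compressed size rounded up to whole 4KB units. A minimal sketch of that rounding, with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: how many 4KB backing IO units a compressed chunk consumes.
 * Illustrative only, not the actual libreduce API. */
#define BACKING_IO_UNIT_SIZE 4096

uint32_t backing_io_units_needed(uint32_t compressed_bytes)
{
	/* round up to whole 4KB units */
	return (compressed_bytes + BACKING_IO_UNIT_SIZE - 1) / BACKING_IO_UNIT_SIZE;
}
```

This matches the walkthrough: 6KB of compressed data needs 2 units, and an incompressible 16KB chunk would need all 4.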
### Write 4KB at Offset 8KB
* Find the corresponding index into the logical map. Offset 8KB divided by the chunk size is 0.
* Entry 0 in the logical map is "X". This means no part of this 16KB has been written to yet.
* The write is not for the entire 16KB chunk, so we must allocate a 16KB chunk-sized buffer for
source data.
* Copy the incoming 4KB data to offset 8KB of this 16KB buffer. Zero the rest of the 16KB buffer.
* Allocate a 16KB destination buffer.
* Compress the 16KB source data buffer into the 16KB destination buffer
* Assume this data compresses to 3KB. This requires 1 4KB backing IO unit.
* Allocate 1 block (2) from the free backing IO unit list.
* Write the 3KB of data to block 2.
* Allocate a chunk map (1) from the free chunk map list.
* Write (2, X, X, X) to the chunk map.
* Write the chunk map index to entry 0 in the logical map.
```text
+--------------------+
Backing Device |012 |
+--------------------+
Free Backing IO Unit List 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | 0 1 X X | 2 X X X | | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 2, 3, 4
+---+---+---+---+
Logical Map | 1 | X | 0 | X |
+---+---+---+---+
```
### Read 16KB at Offset 16KB
* Offset 16KB maps to index 1 in the logical map.
* Entry 1 in the logical map is "X". This means no part of this 16KB has been written to yet.
* Since no data has been written to this chunk, return all 0's to satisfy the read I/O.
### Write 4KB at Offset 4KB
* Offset 4KB maps to index 0 in the logical map.
* Entry 0 in the logical map is "1". Since we are not overwriting the entire chunk, we must
do a read-modify-write.
* Chunk map 1 only specifies one backing IO unit (2). Allocate a 16KB buffer and read block
2 into it. This will be called the compressed data buffer. Note that 16KB is allocated
instead of 4KB so that we can reuse this buffer to hold the compressed data that will
be written later back to disk.
* Allocate a 16KB buffer for the uncompressed data for this chunk. Decompress the data from
the compressed data buffer into this buffer.
* Copy the incoming 4KB of data to offset 4KB of the uncompressed data buffer.
* Compress the 16KB uncompressed data buffer into the compressed data buffer.
* Assume this data compresses to 5KB. This requires 2 4KB backing IO units.
* Allocate blocks 3 and 4 from the free backing IO unit list.
* Write the 5KB of data to blocks 3 and 4.
* Allocate chunk map 2 from the free chunk map list.
* Write (3, 4, X, X) to chunk map 2. Note that at this point, the chunk map is not referenced
by the logical map. If there were a power failure at this point, the previous data for this chunk
would still be fully valid.
* Write chunk map 2 to entry 0 in the logical map.
* Free chunk map 1 back to the free chunk map list.
* Free backing IO unit 2 back to the free backing IO unit list.
```text
+--------------------+
Backing Device |01 34 |
+--------------------+
Free Backing IO Unit List 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
+------------+------------+------------+------------+------------+
Chunk Maps | 0 1 X X | | 3 4 X X | | |
+------------+------------+------------+------------+------------+
Free Chunk Map List 1, 3, 4
+---+---+---+---+
Logical Map | 2 | X | 0 | X |
+---+---+---+---+
```
### Operations that span across multiple chunks
Operations that span a chunk boundary are logically split into multiple operations, each of
which is associated with a single chunk.
Example: 20KB write at offset 4KB
In this case, the write operation is split into a 12KB write at offset 4KB (affecting only
chunk 0 in the logical map) and an 8KB write at offset 16KB (affecting only chunk 1 in the
logical map). Each write is processed independently using the algorithm described above.
Completion of the 20KB write does not occur until both operations have completed.
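The splitting described above can be sketched as follows (illustrative names, not the libreduce API): walk the range, clamping each piece to the end of the chunk that contains its starting offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of splitting an I/O that spans chunk boundaries into per-chunk
 * pieces, as described above. Names are illustrative. */
struct io_piece {
	uint64_t offset;
	uint64_t len;
};

size_t split_by_chunk(uint64_t offset, uint64_t len, uint64_t chunk_size,
		      struct io_piece *out, size_t max_pieces)
{
	size_t n = 0;

	while (len > 0 && n < max_pieces) {
		/* bytes remaining in the chunk containing 'offset' */
		uint64_t in_chunk = chunk_size - (offset % chunk_size);
		uint64_t piece_len = len < in_chunk ? len : in_chunk;

		out[n].offset = offset;
		out[n].len = piece_len;
		n++;
		offset += piece_len;
		len -= piece_len;
	}
	return n;
}
```

For the 20KB write at offset 4KB this yields exactly the two pieces described: 12KB at 4KB and 8KB at 16KB.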
### Unmap Operations
Unmap operations on an entire chunk are achieved by removing the chunk map entry (if any) from
the logical map. The chunk map is returned to the free chunk map list, and any backing IO units
associated with the chunk map are returned to the free backing IO unit list.
Unmap operations that affect only part of a chunk can be treated as writing zeroes to that
region of the chunk. If the entire chunk is unmapped through several such operations, this
can be detected by checking whether the chunk's uncompressed data is all zeroes. When this
occurs, the chunk map entry may be removed from the logical map.
After an entire chunk has been unmapped, subsequent reads to the chunk will return all zeroes.
This is similar to the "Read 16KB at offset 16KB" example above.
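The all-zeroes detection mentioned above might look like the following (a sketch, not the actual libreduce implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the all-zero check described above: if a chunk's uncompressed
 * data is entirely zero after a partial unmap, its logical map entry can
 * be dropped. Illustrative only. */
bool chunk_is_all_zeroes(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0) {
			return false;
		}
	}
	return true;
}
```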
### Write Zeroes Operations
Write zeroes operations are handled similarly to unmap operations. If a write zeroes
operation covers an entire chunk, we can remove the chunk's entry in the logical map
completely. Then subsequent reads to that chunk will return all zeroes.
### Restart
An application using libreduce will periodically exit and need to be restarted. When the
application restarts, it will reload compressed volumes so they can be used again from the
same state as when the application exited.
When the compressed volume is reloaded, the free chunk map list and free backing IO unit list
are reconstructed by walking the logical map. The logical map will only point to valid
chunk maps, and the valid chunk maps will only point to valid backing IO units. Any chunk maps
and backing IO units not referenced go into their respective free lists.
This ensures that if a system crashes in the middle of a write operation - i.e. during or
after a chunk map is updated, but before it is written to the logical map - everything
related to that in-progress write will be ignored after the compressed volume is restarted.
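The reconstruction scan can be sketched as a single walk over the logical map. All names and layouts here are illustrative assumptions, not the actual libreduce metadata format: everything left unmarked afterwards belongs on the respective free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the restart scan described above: walk the logical map and mark
 * every referenced chunk map and backing IO unit. Illustrative only. */
#define EMPTY UINT64_MAX

void mark_in_use(const uint64_t *logical_map, uint64_t num_chunks,
		 const uint64_t *chunk_maps, uint64_t units_per_chunk,
		 bool *chunk_map_in_use, bool *io_unit_in_use)
{
	for (uint64_t c = 0; c < num_chunks; c++) {
		uint64_t cm = logical_map[c];

		if (cm == EMPTY) {
			continue;
		}
		chunk_map_in_use[cm] = true;
		for (uint64_t u = 0; u < units_per_chunk; u++) {
			uint64_t unit = chunk_maps[cm * units_per_chunk + u];

			if (unit != EMPTY) {
				io_unit_in_use[unit] = true;
			}
		}
	}
}
```

Running this against the final state of the example above would mark chunk maps 0 and 2 and backing IO units 0, 1, 3 and 4, leaving chunk map 1 and unit 2 on the free lists - exactly the state shown in the last diagram.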
### Overlapping operations on same chunk
Implementations must take care to handle overlapping operations on the same chunk. For example,
operation 1 writes some data to chunk A, and while this is in progress, operation 2 also writes
some data to chunk A. In this case, operation 2 should not start until operation 1 has
completed. Further optimizations are outside the scope of this document.
### Thin provisioned backing storage
Backing storage must be thin provisioned to realize any savings from compression. This algorithm
will always use (and reuse) backing IO units available closest to offset 0 on the backing device.
This ensures that even though the backing storage device may have been sized similarly to the size of
the compressed volume, storage for the backing storage device will not actually be allocated
until the backing IO units are actually needed.
@@ -1,10 +0,0 @@
# Concepts {#concepts}
- @subpage userspace
- @subpage memory
- @subpage concurrency
- @subpage ssd_internals
- @subpage nvme_spec
- @subpage vhost_processing
- @subpage overview
- @subpage porting
@@ -1,140 +1,126 @@
# Message Passing and Concurrency {#concurrency}
## Theory
# Theory
One of the primary aims of SPDK is to scale linearly with the addition of
hardware. This can mean many things in practice. For instance, moving from one
SSD to two should double the number of I/O's per second. Or doubling the number
of CPU cores should double the amount of computation possible. Or even doubling
the number of NICs should double the network throughput. To achieve this, the
software's threads of execution must be independent from one another as much as
possible. In practice, that means avoiding software locks and even atomic
instructions.
hardware. This can mean a number of things in practice. For instance, moving
from one SSD to two should double the number of I/O's per second. Or doubling
the number of CPU cores should double the amount of computation possible. Or
even doubling the number of NICs should double the network throughput. To
achieve this, the software must be designed such that threads of execution are
independent from one another as much as possible. In practice, that means
avoiding software locks and even atomic instructions.
Traditionally, software achieves concurrency by placing some shared data onto
the heap, protecting it with a lock, and then having all threads of execution
acquire the lock only when accessing the data. This model has many great
properties:
acquire the lock only when that shared data needs to be accessed. This model
has a number of great properties:
* It's easy to convert single-threaded programs to multi-threaded programs
because you don't have to change the data model from the single-threaded
version. You add a lock around the data.
* It's relatively easy to convert single-threaded programs to multi-threaded
programs because you don't have to change the data model from the
single-threaded version. You just add a lock around the data.
* You can write your program as a synchronous, imperative list of statements
that you read from top to bottom.
* The scheduler can interrupt threads, allowing for efficient time-sharing
of CPU resources.
* Your threads can be interrupted and put to sleep by the operating system
scheduler behind the scenes, allowing for efficient time-sharing of CPU resources.
Unfortunately, as the number of threads scales up, contention on the lock around
the shared data does too. More granular locking helps, but then also increases
the complexity of the program. Even then, beyond a certain number of contended
locks, threads will spend most of their time attempting to acquire the locks and
the program will not benefit from more CPU cores.
Unfortunately, as the number of threads scales up, contention on the lock
around the shared data does too. More granular locking helps, but then also
greatly increases the complexity of the program. Even then, beyond a certain
number of highly contended locks, threads will spend most of their time
attempting to acquire the locks and the program will not benefit from any
additional CPU cores.
SPDK takes a different approach altogether. Instead of placing shared data in a
global location that all threads access after acquiring a lock, SPDK will often
assign that data to a single thread. When other threads want to access the data,
they pass a message to the owning thread to perform the operation on their
behalf. This strategy, of course, is not at all new. For instance, it is one of
the core design principles of
assign that data to a single thread. When other threads want to access the
data, they pass a message to the owning thread to perform the operation on
their behalf. This strategy, of course, is not at all new. For instance, it is
one of the core design principles of
[Erlang](http://erlang.org/download/armstrong_thesis_2003.pdf) and is the main
concurrency mechanism in [Go](https://tour.golang.org/concurrency/2). A message
in SPDK consists of a function pointer and a pointer to some context. Messages
are passed between threads using a
in SPDK typically consists of a function pointer and a pointer to some context,
and is passed between threads using a
[lockless ring](http://dpdk.org/doc/guides/prog_guide/ring_lib.html). Message
passing is often much faster than most software developers' intuition leads them
to believe due to caching effects. If a single core is accessing the same data
(on behalf of all of the other cores), then that data is far more likely to be
in a cache closer to that core. It's often most efficient to have each core work
on a small set of data sitting in its local cache and then hand off a small
message to the next core when done.
passing is often much faster than most software developers' intuition leads them to
believe, primarily due to caching effects. If a single core is consistently
accessing the same data (on behalf of all of the other cores), then that data
is far more likely to be in a cache closer to that core. It's often most
efficient to have each core work on a relatively small set of data sitting in
its local cache and then hand off a small message to the next core when done.
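As a toy illustration of this message shape - a function pointer plus a context pointer handed to the owning thread - here is a minimal single-producer, single-consumer ring. This is a sketch for exposition only, not SPDK's thread library or DPDK's lockless ring:

```c
#include <assert.h>
#include <stddef.h>

/* A message is just a function pointer plus a context pointer. */
typedef void (*msg_fn)(void *ctx);

struct msg {
	msg_fn fn;
	void *ctx;
};

#define RING_SIZE 8

struct msg_ring {
	struct msg slots[RING_SIZE];
	size_t head;	/* advanced by the sending thread */
	size_t tail;	/* advanced by the owning thread */
};

/* Called by any thread that wants the owner to operate on its data. */
int msg_ring_send(struct msg_ring *r, msg_fn fn, void *ctx)
{
	if (r->head - r->tail == RING_SIZE) {
		return -1;	/* ring full */
	}
	r->slots[r->head % RING_SIZE].fn = fn;
	r->slots[r->head % RING_SIZE].ctx = ctx;
	r->head++;
	return 0;
}

/* The owning thread calls this in its poll loop; every message executes on
 * the owner's thread, so the data it touches needs no lock. */
int msg_ring_poll(struct msg_ring *r)
{
	int processed = 0;

	while (r->tail != r->head) {
		struct msg *m = &r->slots[r->tail % RING_SIZE];

		m->fn(m->ctx);
		r->tail++;
		processed++;
	}
	return processed;
}

/* example message handler: bump a counter owned by the polling thread */
void add_one(void *ctx)
{
	*(int *)ctx += 1;
}
```

A real implementation needs memory barriers (or atomics) on `head` and `tail` for cross-thread safety; SPDK delegates that to DPDK's ring library.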
In more extreme cases where even message passing may be too costly, each thread
may make a local copy of the data. The thread will then only reference its local
copy. To mutate the data, threads will send a message to each other thread
telling them to perform the update on their local copy. This is great when the
data isn't mutated very often, but is read very frequently, and is often
employed in the I/O path. This of course trades memory size for computational
efficiency, so it is used in only the most critical code paths.
In more extreme cases where even message passing may be too costly, a copy of
the data will be made for each thread. The thread will then only reference its
local copy. To mutate the data, threads will send a message to each other
thread telling them to perform the update on their local copy. This is great
when the data isn't mutated very often, but may be read very frequently, and is
often employed in the I/O path. This of course trades memory size for
computational efficiency, so its use is limited to only the most critical code
paths.
## Message Passing Infrastructure
# Message Passing Infrastructure
SPDK provides several layers of message passing infrastructure. The most
fundamental libraries in SPDK, for instance, don't do any message passing on
their own and instead enumerate rules about when functions may be called in
their documentation (e.g. @ref nvme). Most libraries, however, depend on SPDK's
[thread](http://www.spdk.io/doc/thread_8h.html)
abstraction, located in `libspdk_thread.a`. The thread abstraction provides a
basic message passing framework and defines a few key primitives.
[io_channel](http://www.spdk.io/doc/io__channel_8h.html) infrastructure,
located in `libspdk_util.a`. The io_channel infrastructure is an abstraction
around a basic message passing framework and defines a few key abstractions.
First, `spdk_thread` is an abstraction for a lightweight, stackless thread of
execution. A lower level framework can execute an `spdk_thread` for a single
timeslice by calling `spdk_thread_poll()`. A lower level framework is allowed to
move an `spdk_thread` between system threads at any time, as long as there is
only a single system thread executing `spdk_thread_poll()` on that
`spdk_thread` at any given time. New lightweight threads may be created at any
time by calling `spdk_thread_create()` and destroyed by calling
`spdk_thread_destroy()`. The lightweight thread is the foundational abstraction for
threading in SPDK.
First, spdk_thread is an abstraction for a thread of execution and
spdk_poller is an abstraction for a function that should be
periodically called on the given thread. On each system thread that the user
wishes to use with SPDK, they must first call spdk_allocate_thread(). This
function takes three function pointers - one that will be called to pass a
message to this thread, one that will be called to request that a poller be
started on this thread, and finally one to request that a poller be stopped.
*The implementation of these functions is not provided by this library*. Many
applications already have facilities for passing messages, so to ease
integration with existing code bases we've left the implementation up to the
user. However, for users starting from scratch, see the following section on
the event framework for an SPDK-provided implementation.
There are then a few additional abstractions layered on top of the
`spdk_thread`. One is the `spdk_poller`, which is an abstraction for a
function that should be repeatedly called on the given thread. Another is an
`spdk_msg_fn`, which is a function pointer and a context pointer, that can
be sent to a thread for execution via `spdk_thread_send_msg()`.
The library also defines two other abstractions: spdk_io_device and
spdk_io_channel. In the course of implementing SPDK we noticed the
same pattern emerging in a number of different libraries. In order to
implement a message passing strategy, the code would describe some object with
global state and also some per-thread context associated with that object that
was accessed in the I/O path to avoid locking on the global state. The pattern
was clearest in the lowest layers where I/O was being submitted to block
devices. These devices often expose multiple queues that can be assigned to
threads and then accessed without a lock to submit I/O. To abstract that, we
generalized the device to spdk_io_device and the thread-specific queue to
spdk_io_channel. Over time, however, the pattern has appeared in a huge
number of places that don't fit quite so nicely with the names we originally
chose. In today's code spdk_io_device is any pointer, whose uniqueness is
predicated only on its memory address, and spdk_io_channel is the per-thread
context associated with a particular spdk_io_device.
The library also defines two additional abstractions: `spdk_io_device` and
`spdk_io_channel`. In the course of implementing SPDK we noticed the same
pattern emerging in a number of different libraries. In order to implement a
message passing strategy, the code would describe some object with global state
and also some per-thread context associated with that object that was accessed
in the I/O path to avoid locking on the global state. The pattern was clearest
in the lowest layers where I/O was being submitted to block devices. These
devices often expose multiple queues that can be assigned to threads and then
accessed without a lock to submit I/O. To abstract that, we generalized the
device to `spdk_io_device` and the thread-specific queue to `spdk_io_channel`.
Over time, however, the pattern has appeared in a huge number of places that
don't fit quite so nicely with the names we originally chose. In today's code
`spdk_io_device` is any pointer, whose uniqueness is predicated only on its
memory address, and `spdk_io_channel` is the per-thread context associated with
a particular `spdk_io_device`.
The threading abstraction provides functions to send a message to any other
The io_channel infrastructure provides functions to send a message to any other
thread, to send a message to all threads one by one, and to send a message to
all threads for which there is an io_channel for a given io_device.
Most critically, the thread abstraction does not actually spawn any system level
threads of its own. Instead, it relies on the existence of some lower level
framework that spawns system threads and sets up event loops. Inside those event
loops, the threading abstraction simply requires the lower level framework to
repeatedly call `spdk_thread_poll()` on each `spdk_thread` that exists. This
makes SPDK very portable to a wide variety of asynchronous, event-based
frameworks such as [Seastar](https://www.seastar.io) or [libuv](https://libuv.org/).
# The event Framework
## SPDK Spinlocks
As the number of example applications in SPDK grew, it became clear that a
large portion of the code in each was implementing the basic message passing
infrastructure required to call spdk_allocate_thread(). This includes spawning
one thread per core, pinning each thread to a unique core, and allocating
lockless rings between the threads for message passing. Instead of
re-implementing that infrastructure for each example application, SPDK provides
the SPDK [event framework](http://www.spdk.io/doc/event_8h.html). This library
handles setting up all of the message passing infrastructure, installs signal
handlers to enable a clean shutdown, implements periodic pollers, and does basic
command line parsing. When started through spdk_app_start(), the library
automatically spawns all of the threads requested, pins them, and calls
spdk_allocate_thread() with appropriate function pointers for each one. This
makes it much easier to implement a brand new SPDK application and is the
recommended method for those starting out. Only established applications with
sufficient message passing infrastructure should consider directly integrating
the lower level libraries.
There are some cases where locks are used. These should be limited in favor of
the message passing interface described above. When locks are needed,
SPDK spinlocks should be used instead of POSIX locks.
POSIX locks like `pthread_mutex_t` and `pthread_spinlock_t` do not properly
handle locking between SPDK's lightweight threads. SPDK's `spdk_spinlock`
is safe to use in SPDK libraries and applications. This safety comes from
imposing restrictions on when locks can be held. See
[spdk_spinlock](structspdk__spinlock.html) for details.
## The event Framework
The SPDK project didn't want to officially pick an asynchronous, event-based
framework for all of the example applications it shipped with, in the interest
of supporting the widest variety of frameworks possible. But the applications do
of course require something that implements an asynchronous event loop in order
to run, so enter the `event` framework located in `lib/event`. This framework
includes things like polling and scheduling the lightweight threads, installing
signal handlers to shut down cleanly, and basic command line option parsing.
Only established applications should consider directly integrating the lower
level libraries.
## Limitations of the C Language
# Limitations of the C Language
Message passing is efficient, but it results in asynchronous code.
Unfortunately, asynchronous code is a challenge in C. It's often implemented by
@@ -152,7 +138,6 @@ function `foo` performs some asynchronous operation and when that completes
function `bar` is called, then function `bar` performs some operation that
calls function `baz` on completion, a good way to write it is as such:
```c
void baz(void *ctx) {
...
}
@@ -164,19 +149,17 @@ calls function `baz` on completion, a good way to write it is as such:
void foo(void *ctx) {
async_op(bar, ctx);
}
```
Don't split these functions up - keep them as a nice unit that can be read from bottom to top.
For more complex callback chains, especially ones that have logical branches
or loops, it's best to write out a state machine. It turns out that higher
level languages that support futures and promises are just generating state
machines at compile time, so even though we don't have the ability to generate
them in C we can still write them out by hand. As an example, here's a
callback chain that performs `foo` 5 times and then calls `bar` - effectively
an asynchronous for loop.
```c
enum states {
FOO_START = 0,
FOO_END,
@@ -259,7 +242,6 @@ an asynchronous for loop.
run_state_machine(sm);
}
```
This is complex, of course, but the `run_state_machine` function can be read
from top to bottom to get a clear overview of what's happening in the code.
@@ -1,101 +0,0 @@
# SPDK and Containers {#containers}
This is a living document as there are many ways to use containers with
SPDK. As new usages are identified and tested, they will be documented
here.
## In this document {#containers_toc}
* @ref spdk_in_docker
* @ref spdk_docker_suite
* @ref kata_containers_with_spdk_vhost
## Containerizing an SPDK Application for Docker {#spdk_in_docker}
There are no SPDK specific changes needed to run an SPDK based application in
a docker container; however, this quick start guide should help you as you
containerize your SPDK based application.
1. Make sure you have all of your app dependencies identified and included in your Dockerfile
2. Make sure you have compiled your application for the target arch
3. Make sure your host has hugepages enabled
4. Make sure your host has bound your nvme device to your userspace driver
5. Write your Dockerfile. The following is a simple Dockerfile to containerize the nvme `hello_world`
example:
~~~{.sh}
# start with the latest Fedora
FROM fedora
# if you are behind a proxy, set that up now
ADD dnf.conf /etc/dnf/dnf.conf
# these are the min dependencies for the hello_world app
RUN dnf install libaio-devel -y
RUN dnf install numactl-devel -y
# set our working dir
WORKDIR /app
# add the hello_world binary
ADD hello_world hello_world
# run the app
CMD ./hello_world
~~~
6. Create your image
`sudo docker image build -t hello:1.0 .`
7. Your docker command line will need to include at least the following:
- the `--privileged` flag to enable sharing of hugepages
- use of the `-v` switch to map hugepages
`sudo docker run --privileged -v /dev/hugepages:/dev/hugepages hello:1.0`
or depending on the needs of your app you may need one or more of the following parameters:
- If you are using the SPDK app framework: `-v /dev/shm:/dev/shm`
- If you need to use RPCs from outside of the container: `-v /var/tmp:/var/tmp`
- If you need to use the host network (i.e. NVMF target application): `--network host`
Your output should look something like this:
~~~{.sh}
$ sudo docker run --privileged -v /dev/hugepages:/dev/hugepages hello:1.0
Starting SPDK v20.01-pre git sha1 80da95481 / DPDK 19.11.0 initialization...
[ DPDK EAL parameters: hello_world -c 0x1 --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa
--base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk0 --proc-type=auto ]
EAL: No available hugepages reported in hugepages-1048576kB
Initializing NVMe Controllers
Attaching to 0000:06:00.0
Attached to 0000:06:00.0
Using controller INTEL SSDPEDMD400G4 (CVFT7203005M400LGN ) with 1 namespaces.
Namespace ID: 1 size: 400GB
Initialization complete.
INFO: using host memory buffer for IO
Hello world!
~~~
## SPDK Docker suite {#spdk_docker_suite}
When considering how to formally generate SPDK docker container images,
deploy SPDK containers correctly, interact with SPDK container instances,
and orchestrate SPDK container instances, you can draw practice and inspiration from
this SPDK docker-compose example:
[SPDK Docker suite](https://github.com/spdk/spdk/blob/master/docker/README.md).
## Using SPDK vhost target to provide volume service to Kata Containers and Docker {#kata_containers_with_spdk_vhost}
[Kata Containers](https://katacontainers.io) can build a secure container
runtime with lightweight virtual machines that feel and perform like
containers, but provide stronger workload isolation using hardware
virtualization technology as a second layer of defense.
From Kata Containers [1.11.0](https://github.com/kata-containers/runtime/releases/tag/1.11.0),
vhost-user-blk support is enabled in `kata-containers/runtime`. That is to say, the
SPDK vhost target can be used to provide volume service to Kata Containers directly.
In addition, a container manager like Docker can be configured easily to launch
a Kata container with an SPDK vhost-user block device. For operating details, visit the
Kata containers use-case [Setup to run SPDK vhost-user devices with Kata Containers and Docker](https://github.com/kata-containers/documentation/blob/master/use-cases/using-SPDK-vhostuser-and-kata.md#host-setup-for-vhost-user-devices).

doc/directory_structure.md Normal file
# SPDK Directory Structure {#directory_structure}
# Overview {#dir_overview}
SPDK is primarily a collection of C libraries intended to be consumed directly by
applications, but the repository also contains many examples and full-fledged applications.
This document provides a general overview of what is where in the repository.
## Applications {#dir_app}
The `app` top-level directory contains five applications:
- `app/iscsi_tgt`: An iSCSI target
- `app/nvmf_tgt`: An NVMe-oF target
- `app/iscsi_top`: Informational tool (like `top`) that tracks activity in the
iSCSI target.
- `app/trace`: A tool for processing trace points output from the iSCSI and
NVMe-oF targets.
- `app/vhost`: A vhost application that presents virtio controllers to
QEMU-based VMs and processes I/O submitted to those controllers.
The application binaries will be in their respective directories after compiling and all
can be run with no arguments to print out their command line arguments. For the iSCSI
and NVMe-oF targets, they both need a configuration file (-c option). Fully commented
examples of the configuration files live in the `etc/spdk` directory.
## Build Collateral {#dir_build}
The `build` directory contains all of the static libraries constructed during
the build process. The `lib` directory combined with the `include/spdk`
directory are the official outputs of an SPDK release, if it were to be packaged.
## Documentation {#dir_doc}
The `doc` top-level directory contains all of SPDK's documentation. API Documentation
is created using Doxygen directly from the code, but more general articles and longer
explanations reside in this directory, as well as the Doxygen config file.
To build the documentation, just type `make` within the doc directory.
## Examples {#dir_examples}
The `examples` top-level directory contains a set of examples intended to be used
for reference. These are different than the applications, which are doing a "real"
task that could reasonably be deployed. The examples are instead either heavily
contrived to demonstrate some facet of SPDK, or aren't considered complete enough
to warrant tagging them as a full blown SPDK application.
This is a great place to learn about how SPDK works. In particular, check out
`examples/nvme/hello_world`.
## Include {#dir_include}
The `include` directory is where all of the header files are located. The public API
is all placed in the `spdk` subdirectory of `include` and we highly
recommend that applications set their include path to the top level `include`
directory and include the headers by prefixing `spdk/` like this:
~~~{.c}
#include "spdk/nvme.h"
~~~
Most of the headers here correspond with a library in the `lib` directory and will be
covered in that section. There are a few headers that stand alone, however. They are:
- `assert.h`
- `barrier.h`
- `endian.h`
- `fd.h`
- `mmio.h`
- `queue.h` and `queue_extras.h`
- `string.h`
There is also an `spdk_internal` directory that contains header files widely included
by libraries within SPDK, but that are not part of the public API and would not be
installed on a user's system.
## Libraries {#dir_lib}
The `lib` directory contains the real heart of SPDK. Each component is a C library with
its own directory under `lib`.
### Block Device Abstraction Layer {#dir_bdev}
The `bdev` directory contains a block device abstraction layer that is currently used
within the iSCSI and NVMe-oF targets. The public interface is `include/spdk/bdev.h`.
This library lacks clearly defined responsibilities as of this writing and instead does a
number of things:
- Translates from a common `block` protocol to specific protocols like NVMe or to system
calls like libaio. There are currently four block device backend modules that can be
plugged in - libaio, SPDK NVMe, CephRBD, and a RAM-based backend called malloc.
- Provides a mechanism for composing virtual block devices from physical devices (to do
RAID and the like).
- Handles some memory allocation for data buffers.
This layer also could be made to do I/O queueing or splitting in a general way. We're open
to design ideas and discussion here.
### Configuration File Parser {#dir_conf}
The `conf` directory contains a configuration file parser. The public header
is `include/spdk/conf.h`. The configuration file format is similar to INI,
except that the directives are "Name Value" instead of "Name = Value". This is
the configuration format for both the iSCSI and NVMe-oF targets.
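As an illustration of the "Name Value" style, a fragment might look like the following (the section and directive names below are examples for illustration, not an exhaustive reference):

~~~
[Nvme]
  TransportID "trtype:PCIe traddr:0000:00:04.0" Nvme0
  RetryCount 4
~~~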
... Lots more libraries that need to be described ...
## Makefile Fragments {#dir_mk}
The `mk` directory contains a number of shared Makefile fragments used in the build system.
## Scripts {#dir_scripts}
The `scripts` directory contains convenient scripts for a number of operations. The two most
important are `check_format.sh`, which will use astyle and pep8 to check C, C++, and Python
coding style against our defined conventions, and `setup.sh` which binds and unbinds devices
from kernel drivers.
## Tests {#dir_tests}
The `test` directory contains all of the tests for SPDK's components and the subdirectories mirror
the structure of the entire repository. The tests are a mixture of unit tests and functional tests.

# distributions {#distributions}
## In this document {#distros_toc}
* @ref distros_overview
* @ref linux_list
* @ref freebsd_list
## Overview {#distros_overview}
The CI pool uses different flavors of `Linux` and `FreeBSD` distributions which are
used as a base for all the tests run against submitted patches. Below is the
listing which covers all currently supported versions and the related CI
jobs (see [status](https://ci.spdk.io) as a reference).
## Linux distributions {#linux_list}
* Fedora: We try to follow the newest release as per the release cycle whenever possible.
```list
- autobuild-vg-autotest
- clang-vg-autotest
- iscsi*-vg-autotest
- nvme-vg-autotest
- nvmf*-vg-autotest
- scanbuild-vg-autotest
- unittest-vg-autotest
- vhost-initiator-vg-autotest
```
Jobs listed below are run on bare-metal systems where the version of
Fedora may vary. In the future these will be aligned with the
`vg` jobs.
```list
- BlobFS-autotest
- crypto-autotest
- nvme-phy-autotest
- nvmf*-phy-autotest
- vhost-autotest
```
* Ubuntu: Last two LTS releases. Currently `18.04` and `20.04`.
```list
- ubuntu18-vg-autotest
- ubuntu20-vg-autotest
```
* CentOS: Maintained releases. Currently `7.9`. CentOS 8.3 is only used for testing on the 22.01.x branch.
```list
- centos7-vg-autotest
- centos8-vg-autotest
```
* Rocky Linux: Latest release. Currently `8.6`. A CentOS 8 replacement.
```list
- rocky8-vg-autotest
```
## FreeBSD distributions {#freebsd_list}
* FreeBSD: Production release. Currently `12.2`.
```list
- freebsd-vg-autotest
```

# Driver Modules {#driver_modules}
- @subpage nvme
- @subpage ioat
- @subpage idxd
- @subpage virtio
- @subpage vmd

# Event framework {#event}
SPDK provides a framework for writing asynchronous, polled-mode, shared-nothing server applications.
The event framework is intended to be optional; most other SPDK components are designed to be
integrated into an application without specifically depending on the SPDK event library.
The framework defines several concepts - reactors, events, and pollers - that are described
in the following sections.
The event framework spawns one thread per core (reactor) and connects the threads with
lockless queues.
Messages (events) can then be passed between the threads.
On modern CPU architectures, message passing is often much faster than traditional locking.
The event framework public interface is defined in spdk/event.h.
# Event Framework Design Considerations {#event_design}
Simple server applications can be written in a single-threaded fashion. This allows for
straightforward code that can maintain state without any locking or other synchronization.
However, to scale up (for example, to allow more simultaneous connections), the application may
need to use multiple threads.
In the ideal case where each connection is independent from all other connections,
the application can be scaled by creating additional threads and assigning connections to them
without introducing cross-thread synchronization.
Unfortunately, in many real-world cases, the connections are not entirely independent
and cross-thread shared state is necessary.
SPDK provides an event framework to help solve this problem.
# SPDK Event Framework Components {#event_components}
## Events {#event_component_events}
To accomplish cross-thread communication while minimizing synchronization overhead,
the framework provides message passing in the form of events.
The event framework runs one event loop thread per CPU core.
These threads are called reactors, and their main responsibility is to process incoming events
from a queue.
Each event consists of a bundled function pointer and its arguments, destined for
a particular CPU core.
Events are created using spdk_event_allocate() and executed using spdk_event_call().
Unlike a thread-per-connection server design, which achieves concurrency by depending on the
operating system to schedule many threads issuing blocking I/O onto a limited number of cores,
the event-driven model requires use of explicitly asynchronous operations to achieve concurrency.
Asynchronous I/O may be issued with a non-blocking function call, and completion is typically
signaled using a callback function.
## Reactors {#event_component_reactors}
Each reactor has a lock-free queue for incoming events to that core, and threads from any core
may insert events into the queue of any other core.
The reactor loop running on each core checks for incoming events and executes them in
first-in, first-out order as they are received.
Event functions should never block and should preferably execute very quickly,
since they are called directly from the event loop on the destination core.
## Pollers {#event_component_pollers}
The framework also defines another type of function called a poller.
Pollers may be registered with the spdk_poller_register() function.
Pollers, like events, are functions with arguments that can be bundled and sent to a specific
core to be executed.
However, unlike events, pollers are executed repeatedly until unregistered.
The reactor event loop intersperses calls to the pollers with other event processing.
Pollers are intended to poll hardware as a replacement for interrupts.
Normally, pollers are executed on every iteration of the main event loop.
Pollers may also be scheduled to execute periodically on a timer if low latency is not required.
## Application Framework {#event_component_app}
The framework itself is bundled into a higher level abstraction called an "app". Once
spdk_app_start() is called, it will block the current thread until the application
terminates by calling spdk_app_stop() or an error condition occurs during the
initialization code within spdk_app_start(), itself, before invoking the caller's
supplied function.
### Custom shutdown callback {#event_component_shutdown}
When creating an SPDK based application, the user may add a custom shutdown callback which
will be called before the application framework starts the shutdown process.
To do that, set the shutdown_cb function callback in the spdk_app_opts structure passed
to spdk_app_start(). The custom shutdown callback should call spdk_app_stop() before
returning to continue the application shutdown process.

# Flash Translation Layer {#ftl}
The Flash Translation Layer library provides efficient 4K block device access on top of devices
with >4K write unit size (e.g. raid5f bdev) or devices with large indirection units (some
capacity-focused NAND drives), which don't handle 4K writes well. It handles the logical to
physical address mapping and manages the garbage collection process.
## Terminology {#ftl_terminology}
### Logical to physical address map {#ftl_l2p}
- Shorthand: `L2P`
Contains the mapping of the logical addresses (LBA) to their on-disk physical location. The LBAs
are contiguous and in range from 0 to the number of surfaced blocks (the number of spare blocks
are calculated during device formation and are subtracted from the available address space). The
spare blocks account for zones going offline throughout the lifespan of the device as well as
provide necessary buffer for data [garbage collection](#ftl_reloc).
Since the L2P would occupy a significant amount of DRAM (4B/LBA for drives smaller than 16TiB,
8B/LBA for bigger drives), FTL will, by default, store only the 2GiB of most recently used L2P
addresses in memory (the amount is configurable), and page them in and out of the cache device
as necessary.
### Band {#ftl_band}
A band describes a collection of zones, each belonging to a different parallel unit. All writes to
a band follow the same pattern - a batch of logical blocks is written to one zone, another batch
to the next one and so on. This ensures the parallelism of the write operations, as they can be
executed independently on different zones. Each band keeps track of the LBAs it consists of, as
well as their validity, as some of the data will be invalidated by subsequent writes to the same
logical address. The L2P mapping can be restored from the SSD by reading this information in order
from the oldest band to the youngest.
```text
+--------------+ +--------------+ +--------------+
band 1 | zone 1 +--------+ zone 1 +---- --- --- --- --- ---+ zone 1 |
+--------------+ +--------------+ +--------------+
band 2 | zone 2 +--------+ zone 2 +---- --- --- --- --- ---+ zone 2 |
+--------------+ +--------------+ +--------------+
band 3 | zone 3 +--------+ zone 3 +---- --- --- --- --- ---+ zone 3 |
+--------------+ +--------------+ +--------------+
| ... | | ... | | ... |
+--------------+ +--------------+ +--------------+
band m | zone m +--------+ zone m +---- --- --- --- --- ---+ zone m |
+--------------+ +--------------+ +--------------+
| ... | | ... | | ... |
+--------------+ +--------------+ +--------------+
parallel unit 1 pu 2 pu n
```
The address map (`P2L`) is saved as a part of the band's metadata, at the end of each band:
```text
band's data tail metadata
+-------------------+-------------------------------+------------------------+
|zone 1 |...|zone n |...|...|zone 1 |...| | ... |zone m-1 |zone m|
|block 1| |block 1| | |block x| | | |block y |block y|
+-------------------+-------------+-----------------+------------------------+
```
Bands are written sequentially (in a way that was described earlier). Before a band can be written
to, all of its zones need to be erased. During that time, the band is considered to be in a `PREP`
state. Then the band moves to the `OPEN` state and actual user data can be written to the
band. Once the whole available space is filled, tail metadata is written and the band transitions to
`CLOSING` state. When that finishes the band becomes `CLOSED`.
### Non volatile cache {#ftl_nvcache}
- Shorthand: `nvcache`
Nvcache is a bdev that is used for buffering user writes and storing various metadata.
Nvcache data space is divided into chunks. Chunks are written in a sequential manner.
When the number of free chunks drops below an assigned threshold, data from fully written chunks
is moved to the base_bdev. This process is called chunk compaction.
```text
nvcache
+-----------------------------------------+
|chunk 1 |
| +--------------------------------- + |
| |blk 1 + md| blk 2 + md| blk n + md| |
| +----------------------------------| |
+-----------------------------------------+
| ... |
+-----------------------------------------+
+-----------------------------------------+
|chunk N |
| +--------------------------------- + |
| |blk 1 + md| blk 2 + md| blk n + md| |
| +----------------------------------| |
+-----------------------------------------+
```
### Garbage collection and relocation {#ftl_reloc}
- Shorthand: gc, reloc
Since a write to the same LBA invalidates its previous physical location, some of the blocks on a
band might contain old data that basically wastes space. As there is no way to overwrite an already
written block for a ZNS drive, this data will stay there until the whole zone is reset. This might create a
situation in which all of the bands contain some valid data and no band can be erased, so no writes
can be executed anymore. Therefore a mechanism is needed to move valid data and invalidate whole
bands, so that they can be reused.
```text
band band
+-----------------------------------+ +-----------------------------------+
| ** * * *** * *** * * | | |
|** * * * * * * *| +----> | |
|* *** * * * | | |
+-----------------------------------+ +-----------------------------------+
```
Valid blocks are marked with an asterisk '\*'.
The module responsible for data relocation is called `reloc`. When a band is chosen for garbage collection,
the appropriate blocks are marked as required to be moved. The `reloc` module takes a band that has
some such blocks marked, checks their validity and, if they're still valid, copies them.
Choosing a band for garbage collection depends on its validity ratio (proportion of valid blocks to all
user blocks). The lower the ratio, the higher the chance the band will be chosen for gc.
## Metadata {#ftl_metadata}
In addition to the [L2P](#ftl_l2p), FTL will store additional metadata both on the cache, as
well as on the base devices. The following types of metadata are persisted:
- Superblock - stores the global state of FTL; stored on cache, mirrored to the base device
- L2P - see the [L2P](#ftl_l2p) section for details
- Band - stores the state of bands - write pointers, their OPEN/FREE/CLOSE state; stored on cache, mirrored to a different section of the cache device
- Valid map - bitmask of all the valid physical addresses, used for improving [relocation](#ftl_reloc)
- Chunk - stores the state of chunks - write pointers, their OPEN/FREE/CLOSE state; stored on cache, mirrored to a different section of the cache device
- P2L - stores the address mapping (P2L, see [band](#ftl_band)) of currently open bands. This allows for the recovery of open
bands after dirty shutdown without needing VSS DIX metadata on the base device; stored on the cache device
- Trim - stores information about unmapped (trimmed) LBAs; stored on cache, mirrored to a different section of the cache device
## Dirty shutdown recovery {#ftl_dirty_shutdown}
After power failure, FTL needs to rebuild the whole L2P using the address maps (`P2L`) stored within each band/chunk.
This needs to be done because, while individual L2P pages may have been paged out and persisted to the cache device,
there's no way to tell which, if any, pages were dirty before the power failure occurred. The P2L consists of not only
the mapping itself, but also a sequence id (`seq_id`), which describes the relative age of a given logical block
(multiple writes to the same logical block would produce the same amount of P2L entries, only the last one having the current data).
FTL will therefore rebuild the whole L2P by reading the P2L of all closed bands and chunks. For open bands, the P2L is stored on
the cache device, in a separate metadata region (see [the P2L section](#ftl_metadata)). Open chunks can be restored thanks to storing
the mapping in the VSS DIX metadata, which the cache device must be formatted with.
### Shared memory recovery {#ftl_shm_recovery}
In order to shorten the recovery after crash of the target application, FTL also stores its metadata in shared memory (`shm`) - this
allows it to keep track of the dirtiness state of individual pages and shortens the recovery time dramatically, as FTL will only
need to mark any potential L2P pages which were paging out at the time of the crash as dirty and reissue the writes. There's no need
to read the whole P2L in this case.
### Trim {#ftl_trim}
Due to metadata size constraints and the difficulty of maintaining consistent data returned before and after dirty shutdown, FTL
currently only allows for trims (unmaps) aligned to 4MiB (alignment concerns both the offset and length of the trim command).
## Usage {#ftl_usage}
### Prerequisites {#ftl_prereq}
In order to use the FTL module, a cache device formatted with VSS DIX metadata is required.
### FTL bdev creation {#ftl_create}
Similar to other bdevs, the FTL bdevs can be created either based on JSON config files or via RPC.
Both interfaces require the same arguments which are described by the `--help` option of the
`bdev_ftl_create` RPC call, which are:
- bdev's name
- base bdev's name
- cache bdev's name (cache bdev must support VSS DIX mode - could be emulated by providing the SPDK_FTL_VSS_EMU=1 flag to make;
emulating VSS should be done for testing purposes only, as it is not power-fail safe)
- UUID of the FTL device (if the FTL is to be restored from the SSD)
## FTL bdev stack {#ftl_bdev_stack}
In order to create FTL on top of a regular bdev:
1) Create regular bdev e.g. `bdev_nvme`, `bdev_null`, `bdev_malloc`
2) Create second regular bdev for nvcache
3) Create FTL bdev on top of bdev created in step 1 and step 2
Example:
```
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme0 -a 00:05.0 -t pcie
nvme0n1
$ scripts/rpc.py bdev_nvme_attach_controller -b nvme1 -a 00:06.0 -t pcie
nvme1n1
$ scripts/rpc.py bdev_ftl_create -b ftl0 -d nvme0n1 -c nvme1n1
{
"name": "ftl0",
"uuid": "3b469565-1fa5-4bfb-8341-747ec9f3a9b9"
}
```

# GDB Macros User Guide {#gdb_macros}
## Introduction
When debugging an SPDK application using GDB, we may need to view data structures
in lists, e.g. information about bdevs or threads.
If, for example, I have several bdevs and wish to get information on the bdev
named 'test_vols3', I need to manually iterate over the list as follows:
~~~{.sh}
(gdb) p g_bdev_mgr->bdevs->tqh_first->name
$5 = 0x7f7dcc0b21b0 "test_vols1"
(gdb) p g_bdev_mgr->bdevs->tqh_first->internal->link->tqe_next->name
$6 = 0x7f7dcc0b1a70 "test_vols2"
(gdb) p
g_bdev_mgr->bdevs->tqh_first->internal->link->tqe_next->internal->link->tqe_next->name
$7 = 0x7f7dcc215a00 "test_vols3"
(gdb) p
g_bdev_mgr->bdevs->tqh_first->internal->link->tqe_next->internal->link->tqe_next
$8 = (struct spdk_bdev *) 0x7f7dcc2c7c08
~~~
At this stage, we can start looking at the relevant fields of our bdev, which we
now know is at address 0x7f7dcc2c7c08.
This can be somewhat troublesome if there are 100 bdevs, and the one we need is
56th in the list...
Instead, we can use a gdb macro in order to get information about all the
devices.
Examples:
Printing bdevs:
~~~{.sh}
(gdb) spdk_print_bdevs
SPDK object of type struct spdk_bdev at 0x7f7dcc1642a8
((struct spdk_bdev*) 0x7f7dcc1642a8)
name 0x7f7dcc0b21b0 "test_vols1"
---------------
SPDK object of type struct spdk_bdev at 0x7f7dcc216008
((struct spdk_bdev*) 0x7f7dcc216008)
name 0x7f7dcc0b1a70 "test_vols2"
---------------
SPDK object of type struct spdk_bdev at 0x7f7dcc2c7c08
((struct spdk_bdev*) 0x7f7dcc2c7c08)
name 0x7f7dcc215a00 "test_vols3"
---------------
~~~
Finding a bdev by name:
~~~{.sh}
(gdb) spdk_find_bdev test_vols1
test_vols1
SPDK object of type struct spdk_bdev at 0x7f7dcc1642a8
((struct spdk_bdev*) 0x7f7dcc1642a8)
name 0x7f7dcc0b21b0 "test_vols1"
~~~
Printing spdk threads:
~~~{.sh}
(gdb) spdk_print_threads
SPDK object of type struct spdk_thread at 0x7fffd0008b50
((struct spdk_thread*) 0x7fffd0008b50)
name 0x7fffd00008e0 "reactor_1"
IO Channels:
SPDK object of type struct spdk_io_channel at 0x7fffd0052610
((struct spdk_io_channel*) 0x7fffd0052610)
name
ref 1
device 0x7fffd0008c80 (0x7fffd0008ce0 "nvmf_tgt")
---------------
SPDK object of type struct spdk_io_channel at 0x7fffd0056cd0
((struct spdk_io_channel*) 0x7fffd0056cd0)
name
ref 2
device 0x7fffd0056bf0 (0x7fffd0008e70 "test_vol1")
---------------
SPDK object of type struct spdk_io_channel at 0x7fffd00582e0
((struct spdk_io_channel*) 0x7fffd00582e0)
name
ref 1
device 0x7fffd0056c50 (0x7fffd0056cb0 "bdev_test_vol1")
---------------
SPDK object of type struct spdk_io_channel at 0x7fffd00583b0
((struct spdk_io_channel*) 0x7fffd00583b0)
name
ref 1
device 0x7fffd0005630 (0x7fffd0005690 "bdev_mgr")
---------------
~~~
Printing nvmf subsystems:
~~~{.sh}
(gdb) spdk_print_nvmf_subsystems
SPDK object of type struct spdk_nvmf_subsystem at 0x7fffd0008d00
((struct spdk_nvmf_subsystem*) 0x7fffd0008d00)
name "nqn.2014-08.org.nvmexpress.discovery", '\000' <repeats 187 times>
nqn "nqn.2014-08.org.nvmexpress.discovery", '\000' <repeats 187 times>
ID 0
---------------
SPDK object of type struct spdk_nvmf_subsystem at 0x7fffd0055760
((struct spdk_nvmf_subsystem*) 0x7fffd0055760)
name "nqn.2016-06.io.spdk.umgmt:cnode1", '\000' <repeats 191 times>
nqn "nqn.2016-06.io.spdk.umgmt:cnode1", '\000' <repeats 191 times>
ID 1
~~~
Printing SPDK spinlocks:
In this example, the spinlock has been initialized and locked but has never
been unlocked.
After it is unlocked for the first time, the last unlocked stack will be
present and the `Locked by spdk_thread` line will say `not locked`.
~~~{.sh}
Breakpoint 2, spdk_spin_unlock (sspin=0x655110 <g_bdev_mgr+80>) at thread.c:2915
2915 struct spdk_thread *thread = spdk_get_thread();
(gdb) print *sspin
$2 = struct spdk_spinlock:
Locked by spdk_thread: 0x658080
Initialized at:
0x43e677 <spdk_spin_init+213> thread.c:2878
0x404feb <_bdev_init+16> /build/spdk/spdk-review-public/lib/bdev/bdev.c:116
0x44483d <__libc_csu_init+77>
0x7ffff62c9d18 <__libc_start_main+120>
0x40268e <_start+46>
Last locked at:
0x43e936 <spdk_spin_lock+436> thread.c:2909
0x40ca9c <bdev_name_add+129> /build/spdk/spdk-review-public/lib/bdev/bdev.c:3855
0x411a3c <bdev_register+641> /build/spdk/spdk-review-public/lib/bdev/bdev.c:6660
0x412e1e <spdk_bdev_register+24> /build/spdk/spdk-review-public/lib/bdev/bdev.c:7171
0x417895 <num_blocks_test+119> bdev_ut.c:878
0x7ffff7bc38cb <run_single_test.constprop+379>
0x7ffff7bc3b61 <run_single_suite.constprop+433>
0x7ffff7bc3f76 <CU_run_all_tests+118>
0x43351f <main+1439> bdev_ut.c:6295
0x7ffff62c9d85 <__libc_start_main+229>
0x40268e <_start+46>
Last unlocked at:
~~~
Print a single spinlock stack:
~~~{.sh}
(gdb) print sspin->internal.lock_stack
$1 = struct sspin_stack:
0x40c6a1 <spdk_spin_lock+436> /build/spdk/spdk-review-public/lib/thread/thread.c:2909
0x413f48 <spdk_spin+552> thread_ut.c:1831
0x7ffff7bc38cb <run_single_test.constprop+379>
0x7ffff7bc3b61 <run_single_suite.constprop+433>
0x7ffff7bc3f76 <CU_run_all_tests+118>
0x4148fa <main+547> thread_ut.c:1948
0x7ffff62c9d85 <__libc_start_main+229>
0x40248e <_start+46>
~~~
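The bookkeeping behind these spinlock stacks can be mimicked in plain Python to
make the idea concrete: record a stack trace at init, last lock, and last
unlock so a debugger (or a repr) can show where the lock was last taken. This
is only an illustration of the mechanism; the class and method names below are
invented for the example.

```python
import traceback

class DebugLock:
    """Toy analogue of spdk_spinlock's debug bookkeeping: remember the call
    stack at init, last lock, and last unlock (names invented for example)."""
    def __init__(self):
        self.init_stack = traceback.extract_stack()
        self.lock_stack = None
        self.unlock_stack = None
        self.holder = None

    def lock(self, thread_name):
        # Record who holds the lock and where it was last taken.
        self.holder = thread_name
        self.lock_stack = traceback.extract_stack()

    def unlock(self):
        # Clear the holder; the last-unlocked stack is now populated.
        self.holder = None
        self.unlock_stack = traceback.extract_stack()

    def __str__(self):
        owner = self.holder if self.holder else "not locked"
        return "Locked by spdk_thread: %s" % owner

lock = DebugLock()
lock.lock("reactor_1")
print(lock)   # Locked by spdk_thread: reactor_1
lock.unlock()
print(lock)   # Locked by spdk_thread: not locked
```

After the first unlock, `unlock_stack` is populated and the lock reports
`not locked`, mirroring the behavior described above.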
## Loading The gdb Macros
Copy the gdb macros to the host where you are about to debug.
It is best either to copy the file somewhere within the PYTHONPATH or to add
the destination directory to the PYTHONPATH. This is not mandatory and can be
worked around, but it saves a few steps when loading the module into gdb.
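For instance, when the macros live outside the PYTHONPATH, exporting the
variable before launching gdb avoids the manual sys.path step (the /tmp
location below is just an example):

```shell
# Illustrative: put the macros' directory on PYTHONPATH so gdb's embedded
# Python can import gdb_macros directly; then launch gdb as usual.
export PYTHONPATH=/tmp:$PYTHONPATH
```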
From gdb, with the application core open, invoke python and load the modules.
In the example below, I copied the macros to the /tmp directory, which is not
in the PYTHONPATH, so I had to add the directory to the path manually.
~~~{.sh}
(gdb) python
>import sys
>sys.path.append('/tmp')
>import gdb_macros
>end
(gdb) spdk_load_macros
~~~
## Using the gdb Data Directory
On most systems, the data directory is /usr/share/gdb. The python script should
be copied into the python/gdb/function (or python/gdb/command) directory under
the data directory, e.g. /usr/share/gdb/python/gdb/function.
If the python script is there, then the only thing you need to do when
starting gdb is to type `spdk_load_macros`.
## Using .gdbinit To Load The Macros
.gdbinit can also be used to automatically run the manual steps above prior to
starting gdb.
Example .gdbinit:
~~~{.sh}
source /opt/km/install/tools/gdb_macros/gdb_macros.py
~~~
When starting gdb, you still have to call spdk_load_macros.
## Why Do We Need to Explicitly Call spdk_load_macros
The reason is that the macros need to use globals provided by spdk in order to
iterate the spdk lists and build iterable representations of the list objects.
This results in errors if those globals are not available, which is quite
possible if gdb is used for purposes other than debugging spdk core dumps.
In the example below, I attempted to load the macros when the globals were not
available, causing gdb to fail to load them:
~~~{.sh}
(gdb) spdk_load_macros
Traceback (most recent call last):
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 257, in invoke
spdk_print_threads()
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 241, in __init__
threads = SpdkThreads()
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 234, in __init__
super(SpdkThreads, self).__init__('g_threads', SpdkThread)
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 25, in __init__
['tailq'])
File "/opt/km/install/tools/gdb_macros/gdb_macros.py", line 10, in __init__
self.list = gdb.parse_and_eval(self.list_pointer)
RuntimeError: No symbol table is loaded. Use the "file" command.
Error occurred in Python command: No symbol table is loaded. Use the "file"
command.
~~~
## Macros available
- spdk_load_macros: load the macros (use --reload in order to reload them)
- spdk_print_bdevs: information about bdevs
- spdk_find_bdev: find a bdev (substring search)
- spdk_print_io_devices: information about io devices
- spdk_print_nvmf_subsystems: information about nvmf subsystems
- spdk_print_threads: information about threads
## Adding New Macros
The list iteration macros are usually built from three layers:
- SpdkPrintCommand: inherits from gdb.Command and invokes the list iteration
- SpdkTailqList: performs the iteration of a tailq list according to the tailq
  member implementation
- SpdkObject: provides the __str__ function so that the list iteration can print
  the object
Other useful objects:
- SpdkNormalTailqList: represents a list which has 'tailq' as the tailq object
- SpdkArr: iterates over an array (instead of a linked list)
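The way these layers compose can be sketched outside of gdb with simplified
stand-ins. This is only an illustration of the pattern: the real classes in
gdb_macros.py resolve the list head with gdb.parse_and_eval and register
themselves as gdb commands, and the node and field names below (FakeNode,
tailq_next) are invented for the example.

```python
class SpdkObject:
    """Bottom layer: renders one list element via __str__."""
    def __init__(self, node):
        self.node = node

    def __str__(self):
        return "SPDK object %s" % self.node.name


class SpdkTailqList:
    """Middle layer: walks the tailq and wraps each node in an object class."""
    def __init__(self, head, obj_class):
        self.head = head
        self.obj_class = obj_class

    def __iter__(self):
        node = self.head
        while node is not None:  # the real macro follows the 'tailq' member
            yield self.obj_class(node)
            node = node.tailq_next


class SpdkPrintCommand:
    """Top layer: in gdb this inherits gdb.Command; invoke() prints the list."""
    def __init__(self, tailq_list):
        self.tailq_list = tailq_list

    def invoke(self):
        return "\n---------------\n".join(str(o) for o in self.tailq_list)


class FakeNode:
    """Stand-in for a C struct with a name and a TAILQ_ENTRY link."""
    def __init__(self, name, tailq_next=None):
        self.name = name
        self.tailq_next = tailq_next


bdevs = FakeNode("test_vols1", FakeNode("test_vols2"))
print(SpdkPrintCommand(SpdkTailqList(bdevs, SpdkObject)).invoke())
```

Running the sketch prints one separator-delimited entry per node, mirroring
the output format of spdk_print_bdevs shown earlier.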


@@ -1,6 +0,0 @@
# General Information {#general}
- @subpage event
- @subpage scheduler
- @subpage logical_volumes
- @subpage accel_fw


@@ -1,28 +1,23 @@
# Getting Started {#getting_started}
## Getting the Source Code {#getting_started_source}
# Getting the Source Code {#getting_started_source}
~~~{.sh}
git clone https://github.com/spdk/spdk --recursive
git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init
~~~
## Installing Prerequisites {#getting_started_prerequisites}
# Installing Prerequisites {#getting_started_prerequisites}
The `scripts/pkgdep.sh` script will automatically install the bare minimum
dependencies required to build SPDK.
Use `--help` to see information on installing dependencies for optional components.
The `scripts/pkgdep.sh` script will automatically install the full set of
dependencies required to build and develop SPDK.
~~~{.sh}
sudo scripts/pkgdep.sh
~~~
Option --all will install all dependencies needed by SPDK features.
~~~{.sh}
sudo scripts/pkgdep.sh --all
~~~
## Building {#getting_started_building}
# Building {#getting_started_building}
Linux:
@@ -55,20 +50,20 @@ can enable it by doing the following:
make
~~~
## Running the Unit Tests {#getting_started_unittests}
# Running the Unit Tests {#getting_started_unittests}
It's always a good idea to confirm your build worked by running the
unit tests.
~~~{.sh}
./test/unit/unittest.sh
./unittest.sh
~~~
You will see several error messages when running the unit tests, but they are
part of the test suite. The final message at the end of the script indicates
success or failure.
## Running the Example Applications {#getting_started_examples}
# Running the Example Applications {#getting_started_examples}
Before running an SPDK application, some hugepages must be allocated and
any NVMe and I/OAT devices must be unbound from the native kernel drivers.
@@ -108,7 +103,7 @@ with no arguments to see the help output. If your system has its IOMMU
enabled you can run the examples as your regular user. If it doesn't, you'll
need to run as a privileged user (root).
A good example to start with is `build/examples/identify`, which prints
A good example to start with is `examples/nvme/identify`, which prints
out information about all of the NVMe devices on your system.
Larger, more fully functional applications are available in the `app`


@@ -2,6 +2,8 @@
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<!-- For Mobile Devices -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">
<meta name="generator" content="Doxygen $doxygenversion">
@@ -9,7 +11,6 @@
<script type="text/javascript" src="$relpath^jquery.js"></script>
<script type="text/javascript" src="$relpath^dynsections.js"></script>
<script type="text/javascript" src="$relpath^two.min.js"></script>
$treeview
$search


@@ -1,23 +0,0 @@
# IDXD Driver {#idxd}
## Public Interface {#idxd_interface}
- spdk/idxd.h
## Key Functions {#idxd_key_functions}
Function | Description
--------------------------------------- | -----------
spdk_idxd_probe() | @copybrief spdk_idxd_probe()
spdk_idxd_submit_copy() | @copybrief spdk_idxd_submit_copy()
spdk_idxd_submit_compare() | @copybrief spdk_idxd_submit_compare()
spdk_idxd_submit_crc32c() | @copybrief spdk_idxd_submit_crc32c()
spdk_idxd_submit_dualcast()             | @copybrief spdk_idxd_submit_dualcast()
spdk_idxd_submit_fill() | @copybrief spdk_idxd_submit_fill()
## Kernel vs User {#idxd_configs}
The low level library is initialized directly via `spdk_idxd_set_config`.
Passing in a value of `true` indicates that the IDXD kernel driver is loaded
and that SPDK will use the work queue(s) surfaced by the driver. Passing in
`false` means that the SPDK user space driver will be used to initialize the hardware.


@@ -1,827 +0,0 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="174.625mm"
height="82.020836mm"
version="1.1"
viewBox="0 0 174.625 82.020833"
id="svg136"
sodipodi:docname="iscsi.svg"
inkscape:version="0.92.3 (2405546, 2018-03-11)">
<sodipodi:namedview
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1"
objecttolerance="10"
gridtolerance="10"
guidetolerance="10"
inkscape:pageopacity="0"
inkscape:pageshadow="2"
inkscape:window-width="1387"
inkscape:window-height="888"
id="namedview138"
showgrid="true"
inkscape:zoom="0.9096286"
inkscape:cx="242.15534"
inkscape:cy="182.31015"
inkscape:window-x="1974"
inkscape:window-y="112"
inkscape:window-maximized="0"
inkscape:current-layer="svg136"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0">
<inkscape:grid
type="xygrid"
id="grid2224"
originx="38.364584"
originy="-17.197913" />
</sodipodi:namedview>
<title
id="title2">Thin Provisioning Write</title>
<defs
id="defs22">
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker5538"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path5536"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker5348"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mstart"
inkscape:collect="always">
<path
inkscape:connector-curvature="0"
transform="matrix(0.4,0,0,0.4,4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path5346" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker5152"
style="overflow:visible"
inkscape:isstock="true"
inkscape:collect="always">
<path
id="path5150"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker4974"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mstart"
inkscape:collect="always">
<path
inkscape:connector-curvature="0"
transform="matrix(0.4,0,0,0.4,4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path4972" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker4802"
style="overflow:visible"
inkscape:isstock="true"
inkscape:collect="always">
<path
id="path4800"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker4636"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mstart"
inkscape:collect="always">
<path
inkscape:connector-curvature="0"
transform="matrix(0.4,0,0,0.4,4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path4634" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker4476"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mstart">
<path
inkscape:connector-curvature="0"
transform="matrix(0.4,0,0,0.4,4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path4474" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464"
style="overflow:visible"
inkscape:isstock="true"
inkscape:collect="always">
<path
id="path2462"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="Arrow1Mstart"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2198"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="Arrow1Mend"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2201"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-9"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-6" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-2"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-3" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-9-4"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-6-9" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-27"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-4" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-27-9"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-4-4" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2683-6"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2681-3"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2679-9"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2677-8"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
</defs>
<metadata
id="metadata24">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title>Thin Provisioning Write</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<rect
style="fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:0.26458332;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
id="rect7030"
width="174.625"
height="82.020836"
x="0"
y="1.4210855e-014"
ry="0" />
<rect
style="fill:none;fill-opacity:1;stroke:#999999;stroke-width:0.5;stroke-opacity:1"
id="rect132-6"
ry="1.3229001"
height="50.270832"
width="75.406242"
y="-91.281242"
x="2.6458344"
transform="rotate(90)" />
<rect
x="50.270416"
y="19.84375"
width="22.49"
height="6.6146002"
id="rect104"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.26458332;stroke-miterlimit:4;stroke-dasharray:none"
id="rect132"
ry="1.3229001"
height="30.427082"
width="33.072914"
y="-76.729164"
x="11.906253"
transform="rotate(90)" />
<text
x="56.69899"
y="24.392132"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90">LUN0</text>
<rect
style="fill:none;fill-opacity:1;stroke:#999999;stroke-width:0.5;stroke-opacity:1"
id="rect132-6-8"
ry="1.3229001"
height="33.072914"
width="64.822906"
y="-35.718758"
x="10.583331"
transform="rotate(90)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2"
d="m 30.427087,23.812498 19.843748,3e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker4476);marker-end:url(#marker1826-2-4-7-1-7)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7"
d="m 105.83333,33.072917 38.36458,2e-6"
style="fill:#ff0000;stroke:#ff2a2a;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2464);marker-end:url(#marker2468)" />
<rect
x="50.270416"
y="27.781233"
width="22.49"
height="6.6146002"
id="rect104-6"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<rect
x="50.270836"
y="35.718746"
width="22.49"
height="6.6146002"
id="rect104-5"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="49.004951"
y="16.552654"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5">Target1</text>
<text
x="56.810654"
y="32.229481"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59">LUN1</text>
<text
x="56.853249"
y="40.350986"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-0">LUN2</text>
<text
x="43.28257"
y="6.9284844"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5">iSCSI Target server</text>
<rect
x="50.270416"
y="55.562496"
width="22.49"
height="6.6146002"
id="rect104-0"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.26458332;stroke-miterlimit:4;stroke-dasharray:none"
id="rect132-3"
ry="1.3229001"
height="30.427078"
width="25.135414"
y="-76.729164"
x="47.624996"
transform="rotate(90)" />
<text
x="56.69899"
y="60.110878"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-05">LUN0</text>
<rect
x="50.270416"
y="63.499977"
width="22.49"
height="6.6146002"
id="rect104-6-8"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="49.004944"
y="52.2714"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-2">Target2</text>
<text
x="56.810646"
y="67.948235"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-4">LUN1</text>
<rect
x="7.937088"
y="19.84375"
width="22.49"
height="6.6146002"
id="rect104-64"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="14.365662"
y="24.392132"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-56">bdev0</text>
<rect
x="7.937088"
y="27.781233"
width="22.49"
height="6.6146002"
id="rect104-6-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<rect
x="7.9375038"
y="35.718746"
width="22.49"
height="6.6146002"
id="rect104-5-4"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="14.477322"
y="32.229481"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-2">bdev1</text>
<text
x="14.51992"
y="40.350986"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-0-5">bdev2</text>
<rect
x="7.937088"
y="55.562496"
width="22.49"
height="6.6146002"
id="rect104-0-8"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="14.365662"
y="60.110878"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-05-7">bdev3</text>
<rect
x="7.937088"
y="63.499977"
width="22.49"
height="6.6146002"
id="rect104-6-8-2"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="14.477322"
y="67.948235"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-4-0">bdev4</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6"
d="m 30.427087,31.749998 19.843748,3e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker4636);marker-end:url(#marker1826-2-4-7-1-7-5)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-4"
d="m 30.427087,39.687498 19.843748,2e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker4802);marker-end:url(#marker1826-2-4-7-1-7-9)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6-5"
d="m 30.427087,59.531248 19.843748,2e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker4974);marker-end:url(#marker1826-2-4-7-1-7-5-2)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-4-5"
d="m 30.427087,67.468748 19.843748,10e-7"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker5152);marker-end:url(#marker1826-2-4-7-1-7-9-4)" />
<rect
x="83.343323"
y="29.104166"
width="22.49"
height="6.6146002"
id="rect104-63"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="84.467346"
y="33.405464"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1">portal grp 0</text>
<rect
x="83.343323"
y="54.239578"
width="22.49"
height="6.6146002"
id="rect104-63-1"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="84.673019"
y="58.540874"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1-7">portal grp 1</text>
<text
x="4.7052402"
y="14.717848"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-8">SPDK bdevs</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6-4"
d="m 76.729167,33.072917 h 6.614587"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker5348);marker-end:url(#marker1826-2-4-7-1-7-5-27)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6-4-2"
d="m 76.729167,58.208333 h 6.614587"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker5538);marker-end:url(#marker1826-2-4-7-1-7-5-27-9)" />
<rect
x="144.19748"
y="29.104151"
width="22.49"
height="6.6146002"
id="rect104-63-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="147.16313"
y="33.713963"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1-8">initiator 0</text>
<rect
x="144.19748"
y="54.239567"
width="22.49"
height="6.6146002"
id="rect104-63-1-5"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="147.23584"
y="58.922092"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1-7-0">initiator 1</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7-9"
d="m 105.83333,58.208333 38.36458,2e-6"
style="fill:#ff0000;stroke:#ff2a2a;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart);marker-end:url(#Arrow1Mend)" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6-1"
ry="1.3229001"
height="33.072926"
width="38.364586"
y="-171.97916"
x="2.6458333"
transform="rotate(90)" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6-1-3"
ry="1.3229001"
height="33.072914"
width="35.71875"
y="-171.97916"
x="43.65625"
transform="rotate(90)" />
<text
x="141.38495"
y="7.1341634"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-7">iSCSI client 0</text>
<text
x="141.15009"
y="48.275509"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-7-5">iSCSI client 1</text>
<path
style="display:inline;fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 105.83333,87.312502 124.35416,1.3229172"
id="path2638"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
style="display:inline;fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 107.15625,88.635419 125.67708,2.6458333"
id="path2640"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<text
x="105.28584"
y="13.99068"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;display:inline;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-9">TCP Network</text>
<path
style="display:inline;fill:none;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2683-6);marker-end:url(#marker2679-9)"
d="m 107.15625,17.197917 h 18.52083"
id="path2669"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<g
id="g4350-40"
transform="matrix(1,0,0,0.61904764,50.020836,28.004467)">
<ellipse
ry="2.6458333"
rx="6.614583"
cy="-11.045678"
cx="104.76043"
id="path4344-1"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6"
d="m 98.145835,-11.045677 v 6.4110574 c 10e-6,3.968751 13.229165,3.968751 13.229165,0 v -6.4110574 c 0,4.2740384 -13.229155,3.9687504 -13.229165,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="-17.456738"
cx="104.76044"
id="path4344-1-7"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-3"
d="m 98.145841,-17.456734 v 6.411057 c 10e-6,3.968751 13.229159,3.968751 13.229159,0 v -6.411057 c 0,4.274038 -13.229149,3.96875 -13.229159,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="-23.867794"
cx="104.76044"
id="path4344-1-9"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-2"
d="m 98.145841,-23.867792 v 6.411058 c 10e-6,3.968751 13.229159,3.968751 13.229159,0 v -6.411058 c 0,4.274039 -13.229149,3.968751 -13.229159,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="72.298073"
cx="106.08334"
id="path4344-1-5"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="65.887009"
cx="106.08335"
id="path4344-1-7-3"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-3-4"
d="m 99.468754,65.887013 v 6.411057 c 10e-6,3.968751 13.229156,3.968751 13.229156,0 v -6.411057 c 0,4.274038 -13.229146,3.96875 -13.229156,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
<ellipse
ry="2.645833"
rx="6.6145835"
cy="59.475952"
cx="106.08335"
id="path4344-1-9-1"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" />
<path
sodipodi:nodetypes="ccccc"
inkscape:connector-curvature="0"
id="path4346-6-2-9"
d="m 99.468754,59.475955 v 6.411058 c 10e-6,3.968751 13.229156,3.968751 13.229156,0 v -6.411058 c 0,4.274039 -13.229146,3.968751 -13.229156,0 z"
style="fill:#afdde9;fill-opacity:1;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
</g>
</svg>

Width:  |  Height:  |  Size: 33 KiB

@@ -1,540 +0,0 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="169.33331mm"
height="53.006062mm"
version="1.1"
viewBox="0 0 169.33331 53.00606"
id="svg136"
sodipodi:docname="iscsi_example.svg"
inkscape:version="0.92.3 (2405546, 2018-03-11)">
<sodipodi:namedview
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1"
objecttolerance="10"
gridtolerance="10"
guidetolerance="10"
inkscape:pageopacity="0"
inkscape:pageshadow="2"
inkscape:window-width="1742"
inkscape:window-height="910"
id="namedview138"
showgrid="true"
inkscape:zoom="1.2864091"
inkscape:cx="231.4415"
inkscape:cy="205.83148"
inkscape:window-x="1676"
inkscape:window-y="113"
inkscape:window-maximized="0"
inkscape:current-layer="layer1"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0">
<inkscape:grid
type="xygrid"
id="grid2224"
originx="33.072915"
originy="-46.257384" />
</sodipodi:namedview>
<title
id="title2">Thin Provisioning Write</title>
<defs
id="defs22">
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2683-6"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2681-3"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2679-9"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2677-8"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464-2-6-1"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2462-7-8-2"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468-8-9-5"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466-1-3-2"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464-2-0"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2462-7-6"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468-8-8"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466-1-5"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#999999;fill-opacity:1;fill-rule:evenodd;stroke:#999999;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2659-1"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2657-7"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-27-1"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-4-0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2667-4"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2665-0"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-5-9"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-9-9" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2464-3"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2462-5"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="marker2468-5"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2466-4"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#ff0000;fill-opacity:1;fill-rule:evenodd;stroke:#ff2a2a;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mstart"
orient="auto"
refY="0"
refX="0"
id="marker2663-8"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path2661-0"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,4,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:isstock="true"
style="overflow:visible"
id="marker1826-2-4-7-1-7-97"
refX="0"
refY="0"
orient="auto"
inkscape:stockid="Arrow1Mend">
<path
inkscape:connector-curvature="0"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
style="fill:#0000ff;fill-opacity:1;fill-rule:evenodd;stroke:#0000ff;stroke-width:1.00000003pt;stroke-opacity:1"
d="M 0,0 5,-5 -12.5,0 5,5 Z"
id="path1824-9-4-2-5-2-93" />
</marker>
</defs>
<metadata
id="metadata24">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title>Thin Provisioning Write</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:groupmode="layer"
id="layer1"
inkscape:label="Layer 1"
style="display:inline"
transform="translate(-20.09375,9.9883163e-4)">
<rect
style="fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:0.52916664;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
id="rect2890"
width="169.33331"
height="52.916664"
x="20.09375"
y="0.043701001" />
<rect
x="70.364159"
y="19.887449"
width="22.49"
height="6.6146002"
id="rect104"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.26458332;stroke-miterlimit:4;stroke-dasharray:none"
id="rect132"
ry="1.3229001"
height="30.427082"
width="33.072914"
y="-96.822914"
x="11.949952"
transform="rotate(90)" />
<text
x="76.792732"
y="24.435831"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90">LUN0</text>
<rect
x="70.364159"
y="27.824934"
width="22.49"
height="6.6146002"
id="rect104-6"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="69.098686"
y="16.596354"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5">Target: disk1</text>
<text
x="76.904396"
y="32.273182"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59">LUN1</text>
<text
x="63.376305"
y="6.9721842"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5">iSCSI Target server</text>
<rect
x="28.030828"
y="19.887449"
width="22.49"
height="6.6146002"
id="rect104-64"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="33.225346"
y="24.641508"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-56">Malloc0</text>
<rect
x="28.03083"
y="27.824945"
width="22.49"
height="6.6146002"
id="rect104-6-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="33.337006"
y="32.273182"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-2">Malloc1</text>
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6"
ry="1.3229001"
height="50.270836"
width="47.624996"
y="-111.375"
x="2.6895342"
transform="rotate(90)" />
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6-8"
ry="1.3229001"
height="33.072918"
width="27.781242"
y="-55.812492"
x="11.949948"
transform="rotate(90)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6"
d="m 50.520827,31.793698 19.843748,3e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2667-4);marker-end:url(#marker1826-2-4-7-1-7-5-9)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2"
d="m 50.520827,23.856198 19.843748,2e-6"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2663-8);marker-end:url(#marker1826-2-4-7-1-7-97)" />
<rect
x="103.4371"
y="37.085365"
width="18.521248"
height="6.6145835"
id="rect104-63"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="105.57915"
y="41.386662"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1">portal 1</text>
<text
x="25.394737"
y="15.738133"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-8">SPDK bdevs</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path1192-8-7-7-4-2-6-4"
d="M 96.822918,41.054113 H 103.4375"
style="fill:#0000ff;stroke:#0000ff;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2659-1);marker-end:url(#marker1826-2-4-7-1-7-5-27-1)" />
<rect
x="158.99957"
y="37.08535"
width="22.49"
height="6.6146002"
id="rect104-63-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="161.96524"
y="41.69516"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-1-8">initiator 2</text>
<rect
style="fill:none;stroke:#999999;stroke-width:0.5"
id="rect132-6-1"
ry="1.3229001"
height="33.072933"
width="38.364578"
y="-186.78125"
x="11.949951"
transform="rotate(90)" />
<text
x="156.03279"
y="15.81625"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-7">iSCSI client 0</text>
<text
x="101.36903"
y="47.613781"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-7">10.0.0.1:3260</text>
<rect
x="161.64542"
y="19.887432"
width="19.844177"
height="6.6146011"
id="rect104-9"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="168.07399"
y="24.435814"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-7">sdd</text>
<rect
x="161.64542"
y="27.824913"
width="19.844177"
height="6.6146178"
id="rect104-6-8"
style="fill:#fff6d5;fill-opacity:1;stroke:#000000;stroke-width:0.26458001" />
<text
x="168.18565"
y="32.273163"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-59-1">sde</text>
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7-0"
d="m 92.854164,23.8562 68.791666,-1e-6"
style="fill:#999999;fill-opacity:1;stroke:#999999;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:1.06044998, 1.06044998;stroke-dashoffset:0;stroke-opacity:1;marker-start:url(#marker2464-2-0);marker-end:url(#marker2468-8-8)" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7-0-0"
d="m 92.854164,31.7937 68.791666,-2e-6"
style="fill:#999999;fill-opacity:1;stroke:#999999;stroke-width:0.26511249;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:1.06044998, 1.06044998;stroke-dashoffset:0;stroke-opacity:1;marker-start:url(#marker2464-2-6-1);marker-end:url(#marker2468-8-9-5)" />
<text
x="160.41017"
y="47.490952"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-7-2">10.0.0.2/32</text>
<path
style="fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 125.92708,51.63745 144.44792,0.04369787"
id="path2638"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
style="fill:none;stroke:#999999;stroke-width:0.26458332px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="M 127.25,52.960366 145.77084,1.3666139"
id="path2640"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
sodipodi:nodetypes="cc"
inkscape:connector-curvature="0"
id="path11761-9-7"
d="M 121.95833,41.054117 159,41.054115"
style="fill:#ff0000;stroke:#ff2a2a;stroke-width:0.26499999;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2464-3);marker-end:url(#marker2468-5)" />
<text
x="122.73377"
y="8.7427139"
font-size="3.5278px"
style="font-size:3.52780008px;line-height:1.25;font-family:sans-serif;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal;letter-spacing:0px;word-spacing:0px;fill:#000000;stroke-width:0.26458001"
xml:space="preserve"
id="text90-5-5-9">TCP Network</text>
<path
style="fill:none;stroke:#000000;stroke-width:0.52916664;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#marker2683-6);marker-end:url(#marker2679-9)"
d="M 124.60417,11.949951 H 143.125"
id="path2669"
inkscape:connector-curvature="0" />
</g>
</svg>

Width:  |  Height:  |  Size: 21 KiB

File diff suppressed because one or more lines are too long

Width:  |  Height:  |  Size: 12 KiB

File diff suppressed because one or more lines are too long

Width:  |  Height:  |  Size: 13 KiB

@@ -1,673 +0,0 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
width="181.24mm"
height="79.375mm"
version="1.1"
viewBox="0 0 181.24 79.375"
id="svg172"
sodipodi:docname="lvol_esnap_clone.svg"
inkscape:version="1.2.2 (b0a8486541, 2022-12-01)"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<sodipodi:namedview
id="namedview174"
pagecolor="#ffffff"
bordercolor="#000000"
borderopacity="0.25"
inkscape:showpageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:document-units="mm"
showgrid="false"
inkscape:zoom="1.7926966"
inkscape:cx="338.59607"
inkscape:cy="148.93764"
inkscape:window-width="1351"
inkscape:window-height="930"
inkscape:window-x="762"
inkscape:window-y="134"
inkscape:window-maximized="0"
inkscape:current-layer="g170" />
<title
id="title2">Thin Provisioning</title>
<defs
id="defs28">
<marker
id="marker2036"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path4" />
</marker>
<marker
id="marker1960"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path7" />
</marker>
<marker
id="marker1890"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path10" />
</marker>
<marker
id="marker1826"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path13" />
</marker>
<marker
id="marker1816"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path16" />
</marker>
<marker
id="Arrow1Mend"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill-rule="evenodd"
stroke="#000"
stroke-width="1pt"
id="path19" />
</marker>
<marker
id="marker11771-4-9"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill="#f00"
fill-rule="evenodd"
stroke="#ff2a2a"
stroke-width="1pt"
id="path22" />
</marker>
<marker
id="marker1826-2-4-7-1-7"
overflow="visible"
orient="auto">
<path
transform="matrix(-.4 0 0 -.4 -4 0)"
d="m0 0 5-5-17.5 5 17.5 5z"
fill="#00f"
fill-rule="evenodd"
stroke="#00f"
stroke-width="1pt"
id="path25" />
</marker>
</defs>
<metadata
id="metadata30">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title>Thin Provisioning</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
transform="translate(2.6458 2.3956)"
id="g34">
<rect
x="-2.6458"
y="-2.3956"
width="181.24"
height="79.375"
fill="#fffffe"
stroke-width=".26458"
id="rect32" />
</g>
<g
transform="translate(-3.9688 -4.6356)"
id="g170">
<g
stroke="#000"
id="g52">
<g
stroke-width=".26458"
id="g48">
<rect
x="44.979"
y="32.417"
width="22.49"
height="6.6146"
fill="none"
stroke-dasharray="0.52916663, 0.52916663"
id="rect36" />
<rect
x="67.469"
y="32.417"
width="22.49"
height="6.6146"
fill="#d7d7f4"
id="rect38" />
<rect
x="89.958"
y="32.417"
width="22.49"
height="6.6146"
fill="#d7d7f4"
id="rect40" />
<rect
x="112.45"
y="32.417"
width="22.49"
height="6.6146"
fill="none"
stroke-dasharray="0.52916663, 0.52916663"
id="rect42" />
<rect
x="134.94"
y="32.417"
width="22.49"
height="6.6146"
fill="none"
stroke-dasharray="0.52916663, 0.52916663"
id="rect44" />
<rect
x="157.43"
y="32.417"
width="22.49"
height="6.6146"
fill="#d7d7f4"
id="rect46" />
</g>
<rect
x="44.979"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect50" />
</g>
<text
x="56.412949"
y="51.598957"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text56"><tspan
x="56.412949"
y="51.598957"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan54">26f9a7...</tspan></text>
<rect
x="67.469"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect58" />
<text
x="78.902527"
y="51.598961"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text62"><tspan
x="78.902527"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan60">b44ab3...</tspan></text>
<rect
x="89.958"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect64" />
<text
x="101.39211"
y="51.598961"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text68"><tspan
x="101.39211"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan66">ee5593...</tspan></text>
<rect
x="112.45"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect70" />
<text
x="123.88169"
y="51.598961"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text74"><tspan
x="123.88169"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan72">7a3bfe...</tspan></text>
<rect
x="134.94"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect76" />
<text
x="146.37128"
y="51.598957"
fill="#000000"
font-family="sans-serif"
font-size="10.583px"
letter-spacing="0px"
stroke-width="0.26458"
word-spacing="0px"
style="line-height:1.25"
xml:space="preserve"
id="text80"><tspan
x="146.37128"
y="51.598957"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan78">8f4e15...</tspan></text>
<rect
x="157.43"
y="46.969"
width="22.49"
height="6.6146"
fill="#f4d7d7"
stroke="#000"
stroke-dasharray="0.52999997, 0.26499999"
stroke-width=".265"
id="rect82" />
<g
font-family="sans-serif"
letter-spacing="0px"
stroke-width=".26458"
word-spacing="0px"
id="g98">
<text
x="168.86086"
y="51.598961"
font-size="10.583px"
style="line-height:1.25"
xml:space="preserve"
id="text86"><tspan
x="168.86086"
y="51.598961"
font-family="sans-serif"
font-size="3.5278px"
stroke-width="0.26458"
text-align="center"
text-anchor="middle"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan84">40c285...</tspan></text>
<text
x="6.6430736"
y="51.680019"
font-size="3.5278px"
style="line-height:1.25;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
xml:space="preserve"
id="text90"><tspan
x="6.6430736"
y="51.680019"
stroke-width="0.26458"
id="tspan88">read-only bdev</tspan></text>
<text
x="6.6296382"
y="12.539818"
font-size="3.5278px"
style="line-height:1.25;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
xml:space="preserve"
id="text96"><tspan
sodipodi:role="line"
id="tspan436"
x="6.6296382"
y="12.539818">esnap clone</tspan><tspan
sodipodi:role="line"
x="6.6296382"
y="16.949568"
id="tspan440">Volume</tspan><tspan
sodipodi:role="line"
id="tspan438"
x="6.6296382"
y="21.359318" /></text>
</g>
<g
stroke="#000"
id="g118">
<path
d="m6.6146 24.479 173.3 1e-6"
fill="none"
stroke-dasharray="1.59, 1.59"
stroke-width=".265"
id="path100" />
<g
fill="#f4d7d7"
stroke-dasharray="0.52916663, 0.26458332"
stroke-width=".26458"
id="g108">
<rect
x="44.979"
y="9.9271"
width="22.49"
height="6.6146"
id="rect102" />
<rect
x="112.45"
y="9.9271"
width="22.49"
height="6.6146"
id="rect104" />
<rect
x="134.94"
y="9.9271"
width="22.49"
height="6.6146"
id="rect106" />
</g>
<g
fill="#d7d7f4"
stroke-width=".26458"
id="g116">
<rect
x="67.469"
y="9.9271"
width="22.49"
height="6.6146"
id="rect110" />
<rect
x="89.958"
y="9.9271"
width="22.49"
height="6.6146"
id="rect112" />
<rect
x="157.43"
y="9.9271"
width="22.49"
height="6.6146"
id="rect114" />
</g>
</g>
<text
x="6.614583"
y="37.708332"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
stroke-width=".26458"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text122"><tspan
x="6.614583"
y="37.708332"
stroke-width=".26458"
id="tspan120">active clusters</tspan></text>
<rect
x="37.042"
y="7.2812"
width="145.52"
height="11.906"
ry="1.3229"
fill="none"
stroke="#999"
stroke-width=".5"
id="rect124" />
<rect
x="37.042"
y="29.771"
width="145.52"
height="26.458"
ry="1.3229"
fill="none"
stroke="#999"
stroke-width=".5"
id="rect126" />
<g
fill="#00f"
stroke="#00f"
id="g144">
<g
stroke-width=".26458"
id="g140">
<path
d="m78.052 16.542v15.875"
marker-end="url(#marker1960)"
id="path128" />
<path
d="m55.562 16.542v30.427"
marker-end="url(#marker2036)"
id="path130" />
<path
d="m100.54 16.542v15.875"
marker-end="url(#marker1890)"
id="path132" />
<path
d="m169.33 16.542v15.875"
marker-end="url(#Arrow1Mend)"
id="path134" />
<path
d="m124.35 16.542v30.427"
marker-end="url(#marker1826)"
id="path136" />
<path
d="m146.84 16.542v30.427"
marker-end="url(#marker1816)"
id="path138" />
</g>
<path
d="m132.29 61.521 10.583 1e-5"
marker-end="url(#marker1826-2-4-7-1-7)"
stroke-width=".265"
id="path142" />
</g>
<path
d="m132.29 66.813h10.583"
fill="#f00"
marker-end="url(#marker11771-4-9)"
stroke="#ff2a2a"
stroke-width=".265"
id="path146" />
<g
stroke-width=".26458"
id="g162">
<text
x="145.52083"
y="62.843975"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text150"><tspan
x="145.52083"
y="62.843975"
font-family="sans-serif"
font-size="2.8222px"
stroke-width=".26458"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal"
id="tspan148">read</tspan></text>
<text
x="145.52083"
y="68.135651"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text154"><tspan
x="145.52083"
y="68.135651"
font-family="sans-serif"
font-size="2.8222px"
stroke-width=".26458"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal"
id="tspan152">allocate and copy cluster</tspan></text>
<rect
x="132.29"
y="70.781"
width="10.583"
height="2.6458"
fill="none"
stroke="#000"
stroke-dasharray="0.52916664, 0.52916664"
id="rect156" />
<text
x="145.52083"
y="73.427307"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
word-spacing="0px"
style="line-height:1.25;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
xml:space="preserve"
id="text160"><tspan
x="145.52083"
y="73.427307"
font-family="sans-serif"
font-size="2.8222px"
stroke-width="0.26458"
style="font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-feature-settings:normal"
id="tspan158">external snapshot cluster</tspan></text>
</g>
<rect
x="132.29"
y="76.073"
width="10.583"
height="2.6458"
fill="none"
stroke="#000"
stroke-width=".265"
id="rect164" />
<text
x="145.52083"
y="78.718971"
fill="#000000"
font-family="sans-serif"
font-size="3.5278px"
letter-spacing="0px"
stroke-width=".26458"
word-spacing="0px"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25"
xml:space="preserve"
id="text168"><tspan
x="145.52083"
y="78.718971"
font-family="sans-serif"
font-size="2.8222px"
stroke-width=".26458"
style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal"
id="tspan166">allocated cluster</tspan></text>
</g>
</svg>


File diff suppressed because one or more lines are too long


File diff suppressed because one or more lines are too long


File diff suppressed because one or more lines are too long


View File

@@ -1,124 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<svg width="193.94mm" height="139.71mm" version="1.1" viewBox="0 0 193.94 139.71" xmlns="http://www.w3.org/2000/svg" xmlns:cc="http://creativecommons.org/ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<title>NVMe CUSE</title>
<defs>
<marker id="marker9353" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker7156" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker4572" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker4436" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker4324" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker2300" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker2110" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker2028" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker1219" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="Arrow1Lstart" overflow="visible" orient="auto">
<path transform="matrix(.8 0 0 .8 10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="marker1127" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
<marker id="Arrow1Lend" overflow="visible" orient="auto">
<path transform="matrix(-.8 0 0 -.8 -10 0)" d="m0 0 5-5-17.5 5 17.5 5z" fill-rule="evenodd" stroke="#000" stroke-width="1pt"/>
</marker>
</defs>
<metadata>
<rdf:RDF>
<cc:Work rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage"/>
<dc:title>NVMe CUSE</dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g transform="translate(-2.1066 -22.189)">
<rect x="11.906" y="134.85" width="72.004" height="20.6" ry="3.7798" fill="none" stroke="#000" stroke-width=".5"/>
<text x="14.363094" y="149.02231" fill="#000000" font-family="sans-serif" font-size="10.583px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px" style="line-height:1.25" xml:space="preserve"><tspan x="14.363094" y="149.02231" font-family="sans-serif" font-size="3.5278px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">/dev/spdk/nvme0</tspan></text>
<text x="47.625" y="149.02231" fill="#000000" font-family="sans-serif" font-size="10.583px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px" style="line-height:1.25" xml:space="preserve"><tspan x="47.625" y="149.02231" font-family="sans-serif" font-size="3.5278px" stroke-width=".26458" writing-mode="lr" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">/dev/spdk/nvme0n1</tspan></text>
<g stroke="#000">
<rect x="12.095" y="35.818" width="71.249" height="88.446" ry="4.3467" fill="none" stroke-width=".5"/>
<rect x="133.43" y="33.929" width="62.366" height="76.351" ry="4.7247" fill="none" stroke-width=".5"/>
<g fill="#fff" stroke-width=".26458">
<rect x="14.174" y="91.57" width="64.256" height="24.568"/>
<g fill-opacity=".9798">
<rect x="46.302" y="100.64" width="26.62" height="11.061"/>
</g>
</g>
<g transform="translate(-.53932 -.16291)">
<path d="m63.878 111.98v32.884" fill="none" marker-end="url(#marker1127)" marker-start="url(#Arrow1Lstart)" stroke-width=".26458px"/>
<g stroke-width=".265">
<path d="m34.585 115.57v28.726" fill="none" marker-end="url(#Arrow1Lend)" marker-start="url(#marker1219)"/>
<rect x="136.26" y="39.031" width="54.996" height="58.586" fill="#fff"/>
<rect x="153.84" y="52.26" width="34.018" height="11.906" ry="5.8544" fill="none"/>
</g>
<path d="m112.45 24.479v137.58" fill="none" stroke-dasharray="1.5874999, 1.5874999" stroke-width=".26458"/>
</g>
<g fill="#fff" stroke-width=".265">
<rect x="89.58" y="54.339" width="38.365" height="8.8824"/>
</g>
</g>
<g font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px">
<text x="93.54911" y="59.800339" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="93.54911" y="59.800339" stroke-width=".26458">io_msg queue</tspan></text>
<text x="11.906249" y="27.31399" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="11.906249" y="27.31399" stroke-width=".26458">CUSE threads</tspan></text>
<text x="165.36458" y="27.502975" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="165.36458" y="27.502975" stroke-width=".26458">SPDK threads</tspan></text>
</g>
<g stroke="#000">
<rect x="17.009" y="47.914" width="29.482" height="13.04" ry="6.5201" fill="#fff" stroke-width=".265"/>
<rect x="49.921" y="68.161" width="28.915" height="13.04" ry="6.5201" fill="#fff" stroke-width=".265"/>
<g fill="none">
<path d="m32.506 61.143v30.427" marker-start="url(#marker7156)" stroke-width=".26458px"/>
<path d="m63.689 81.176 0.18899 19.277" marker-start="url(#marker4324)" stroke-width=".265"/>
<g stroke-width=".26458px">
<path d="m46.113 54.339h43.467" marker-end="url(#marker2028)"/>
<path d="m64.284 67.972c0.02768-6.3997-1.3229-5.2917 25.135-5.2917" marker-end="url(#marker2110)"/>
<path d="m127.78 56.066h25.135" marker-end="url(#marker2300)"/>
</g>
</g>
</g>
<g stroke-width=".26458">
<g transform="translate(-.25341)" font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" word-spacing="0px">
<text x="138.90625" y="44.889877" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="138.90625" y="44.889877" stroke-width=".26458">NVMe</tspan></text>
<text x="16.063986" y="97.050598" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="16.063986" y="97.050598" stroke-width=".26458">CUSE ctrlr</tspan></text>
<text x="48.380947" y="106.12202" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="48.380947" y="106.12202" stroke-width=".26458">CUSE ns</tspan></text>
<text x="51.420551" y="75.799461" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="51.420551" y="75.799461" stroke-width=".26458">ioctl pthread</tspan></text>
<text x="18.906757" y="55.833015" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="18.906757" y="55.833015" stroke-width=".26458">ioctl pthread</tspan></text>
</g>
<path d="m160.86 85.17c0.38097 13.154-7.1538 11.542-82.052 10.936" fill="none" marker-end="url(#marker4572)" stroke="#000" stroke-dasharray="0.79374995, 0.79374995"/>
<path d="m179.38 85.17c0.37797 22.25-6.5765 20.83-106.08 20.641" fill="none" marker-end="url(#marker4436)" stroke="#000" stroke-dasharray="0.79374995, 0.79374995"/>
</g>
<g font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px">
<text x="13.229166" y="139.7619" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="13.229166" y="139.7619" stroke-width=".26458">Kernel</tspan></text>
<text x="14.552083" y="41.488094" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="14.552083" y="41.488094" stroke-width=".26458">CUSE</tspan></text>
<text x="161.73709" y="59.415913" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="161.73709" y="59.415913" stroke-width=".26458">io poller</tspan></text>
</g>
<g fill="none" stroke="#000">
<path d="m111.91 127.5h-109.8" stroke-dasharray="1.58749992, 1.58749992" stroke-width=".26458"/>
<rect x="153.3" y="71.941" width="34.018" height="13.229" ry="6.6146" stroke-width=".265"/>
<path d="m170.12 64.003v7.9375" marker-end="url(#marker9353)" stroke-width=".265"/>
</g>
<g font-family="sans-serif" font-size="4.2333px" letter-spacing="0px" stroke-width=".26458" word-spacing="0px">
<text x="159.72221" y="79.76664" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="159.72221" y="79.76664" stroke-width=".26458">io execute</tspan></text>
<text x="172.34003" y="68.59539" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="172.34003" y="68.59539" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">fn(arg)</tspan></text>
<text x="53.046707" y="52.192699" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="53.046707" y="52.192699" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">nvme_io_msg send()</tspan></text>
<text x="53.102341" y="60.250244" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="53.102341" y="60.250244" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">nvme_io_msg send()</tspan></text>
<text x="120.79763" y="50.70586" font-size="12px" stroke-width="1" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal;line-height:1.25" xml:space="preserve"><tspan x="120.79763" y="50.70586" font-family="sans-serif" font-size="2.8222px" stroke-width=".26458" style="font-feature-settings:normal;font-variant-caps:normal;font-variant-ligatures:normal;font-variant-numeric:normal">spdk_nvme_io_msg process()</tspan></text>
</g>
</g>
</svg>


File diff suppressed because one or more lines are too long


View File

@@ -1,41 +0,0 @@
<?xml version="1.0"?>
<svg width="680" height="420" xmlns="http://www.w3.org/2000/svg" xmlns:svg="http://www.w3.org/2000/svg">
<!-- Created with SVG-edit - https://github.com/SVG-Edit/svgedit-->
<g class="layer">
<title>Layer 1</title>
<rect fill="#ffffff" height="369" id="svg_1" stroke="#000000" width="635.87" x="22.74" y="26.61"/>
<rect fill="#aaffff" height="0" id="svg_2" stroke="#000000" width="0" x="191.24" y="101.36">Application A</rect>
<rect fill="#aaffff" height="88.96" id="svg_3" stroke="#000000" width="171" x="400.9" y="67.61">ublk Server</rect>
<line fill="none" id="svg_4" stroke="#000000" stroke-dasharray="5,5" stroke-width="2" x1="23.11" x2="660.11" y1="199.03" y2="198.03">ublk Server</line>
<text fill="#000000" font-family="Serif" font-size="21" font-weight="bold" id="svg_5" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="488.28" xml:space="preserve" y="122.24">ublk Server</text>
<rect fill="#aaffff" height="62" id="svg_6" stroke="#000000" transform="matrix(1 0 0 1 0 0)" width="161" x="384.38" y="311.2"/>
<text fill="#000000" font-family="Serif" font-size="21" font-weight="bold" id="svg_7" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="468.93" xml:space="preserve" y="349.7">ublk Driver</text>
<rect fill="#ffff00" height="32" id="svg_8" stroke="#000000" width="98" x="144.36" y="212.94"/>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_9" stroke="#000000" stroke-width="0" text-anchor="middle" x="194.36" xml:space="preserve" y="235.94">/dev/ublkb3</text>
<rect fill="#ffffff" height="0" id="svg_10" stroke="#000000" width="0" x="175.36" y="246.94"/>
<rect fill="#ffff00" height="33" id="svg_11" stroke="#000000" width="97" x="200.03" y="239.6"/>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_12" stroke="#000000" stroke-width="0" text-anchor="middle" x="249.36" xml:space="preserve" y="263.27">/dev/ublkb2</text>
<rect fill="#ffffff" height="0" id="svg_13" stroke="#000000" width="0" x="174.36" y="264.94"/>
<rect fill="#ffff00" height="33" id="svg_14" stroke="#000000" width="97" x="33.99" y="244.06"/>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_15" stroke="#000000" stroke-width="0" text-anchor="middle" x="82.99" xml:space="preserve" y="267.06">/dev/ublkb1</text>
<rect fill="#00ff00" height="32" id="svg_16" stroke="#000000" width="93" x="35.99" y="206.31">le/dev/ublkb1</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_17" stroke="#000000" stroke-width="0" text-anchor="middle" x="80.99" xml:space="preserve" y="226.31">Filesystem</text>
<path d="m383.94,359.38l-298.65,-1.66c0,0 -1.68,-79.96 -1.68,-79.96" fill="none" id="svg_22" stroke="#000000" stroke-linejoin="bevel" stroke-width="4"/>
<path d="m384.83,334.28l-148.14,-0.2c0,0 3.33,-62.12 3.33,-62.12" fill="none" id="svg_26" stroke="#000000" stroke-linejoin="bevel" stroke-width="4" transform="matrix(1 0 0 1 0 0)"/>
<path d="m384.69,347.33l-201.99,-0.22l0,-102.04" fill="none" id="svg_27" stroke="#000000" stroke-linejoin="bevel" stroke-width="4" transform="matrix(1 0 0 1 0 0)"/>
<path d="m454.33,155.75c0,0 0.48,154.94 0.32,154.69c-0.16,-0.25 -0.32,-154.69 -0.32,-154.69z" fill="none" id="svg_28" stroke="#000000" stroke-linejoin="bevel" stroke-width="3"/>
<path d="m468.6,156.42l0.18,155.99l-0.18,-155.99z" fill="none" id="svg_29" stroke="#000000" stroke-linejoin="bevel" stroke-width="3"/>
<path d="m482.69,157.08l-0.32,154.03l0.32,-154.03z" fill="none" id="svg_30" stroke="#000000" stroke-linecap="square" stroke-linejoin="bevel" stroke-width="3">ublk Server</path>
<rect fill="#aaffff" height="35.63" id="svg_40" stroke="#000000" width="109.37" x="65.74" y="91.86">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_41" stroke="#000000" stroke-width="0" style="cursor: move;" text-anchor="middle" x="119.36" xml:space="preserve" y="112.19">Application D</text>
<rect fill="#aaffff" height="30.63" id="svg_42" stroke="#000000" width="109.37" x="89.49" y="115.61">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_43" stroke="#000000" stroke-width="0" text-anchor="middle" x="143.11" xml:space="preserve" y="136.56">Application C</text>
<rect fill="#aaffff" height="31.25" id="svg_44" stroke="#000000" width="109.37" x="114.49" y="139.99">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_45" stroke="#000000" stroke-width="0" style="cursor: move;" text-anchor="middle" x="169.36" xml:space="preserve" y="160.31">Application B</text>
<rect fill="#aaffff" height="30.63" id="svg_46" stroke="#000000" width="109.37" x="145.74" y="164.99">Application A</rect>
<text fill="#000000" font-family="Serif" font-size="18" id="svg_47" stroke="#000000" stroke-width="0" text-anchor="middle" x="201.24" xml:space="preserve" y="186.56">Application A</text>
<text fill="#000000" font-family="Serif" font-size="21" font-weight="bold" id="svg_50" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="161.4" xml:space="preserve" y="82.24">ublk Workload</text>
<text fill="#000000" font-family="Serif" font-size="19" font-style="italic" font-weight="normal" id="svg_51" stroke="#000000" stroke-width="0" text-anchor="middle" x="602.65" xml:space="preserve" y="222.24">Kernel Space</text>
<text fill="#000000" font-family="Serif" font-size="19" font-style="italic" font-weight="normal" id="svg_52" stroke="#000000" stroke-width="0" text-anchor="middle" transform="matrix(1 0 0 1 0 0)" x="602.03" xml:space="preserve" y="188.49">Userspace</text>
</g>
</svg>


View File

@@ -1,41 +1,50 @@
# Storage Performance Development Kit {#mainpage}
# Storage Performance Development Kit {#index}
## Introduction
# Introduction {#intro}
@copydoc intro
- @ref about
- @ref getting_started
- @ref vagrant
- @ref changelog
- [Source Code (GitHub)](https://github.com/spdk/spdk/)
## Concepts
# Concepts {#concepts}
@copydoc concepts
- @ref userspace
- @ref memory
- @ref concurrency
- @ref ssd_internals
- @ref porting
## User Guides
# User Guides {#user_guides}
@copydoc user_guides
- @ref iscsi_getting_started
- @ref nvmf_getting_started
- @ref blobfs_getting_started
- @ref jsonrpc
## Programmer Guides
# Programmer Guides {#general}
@copydoc prog_guides
- @ref directory_structure
- [Public API header files](files.html)
## General Information
# Modules {#modules}
@copydoc general
- @ref event
- @ref nvme
- @ref nvmf
- @ref ioat
- @ref iscsi
- @ref bdev
- @ref blob
- @ref blobfs
- @ref vhost
- @ref virtio
## Miscellaneous
# Tools {#tools}
@copydoc misc
- @ref nvme-cli
## Driver Modules
# Performance Reports {#performancereports}
@copydoc driver_modules
## Tools
@copydoc tools
## CI Tools
@copydoc ci_tools
## Performance Reports
@copydoc performance_reports
- [SPDK 17.07 vhost-scsi Performance Report](https://ci.spdk.io/download/performance-reports/SPDK17_07_vhost_scsi_performance_report.pdf)

View File

@@ -1,8 +0,0 @@
# Introduction {#intro}
- @subpage about
- @subpage getting_started
- @subpage vagrant
- @subpage changelog
- @subpage deprecation
- [Source Code (GitHub)](https://github.com/spdk/spdk)

View File

@@ -1,10 +1,10 @@
# I/OAT Driver {#ioat}
## Public Interface {#ioat_interface}
# Public Interface {#ioat_interface}
- spdk/ioat.h
## Key Functions {#ioat_key_functions}
# Key Functions {#ioat_key_functions}
Function | Description
--------------------------------------- | -----------

View File

@@ -1,6 +1,6 @@
# iSCSI Target {#iscsi}
## iSCSI Target Getting Started Guide {#iscsi_getting_started}
# iSCSI Target Getting Started Guide {#iscsi_getting_started}
The Storage Performance Development Kit iSCSI target application is named `iscsi_tgt`.
The following section describes how to run the iSCSI target from your cloned package.
@@ -10,71 +10,89 @@ This following section describes how to run iscsi from your cloned package.
This guide starts by assuming that you can already build the standard SPDK distribution on your
platform.
Once built, the binary will be in `build/bin`.
Once built, the binary will be in `app/iscsi_tgt`.
If you want to kill the application with a signal, use SIGTERM: the application will then release all
of its shared memory resources before exiting. SIGKILL gives the application no chance to release its
shared memory, so you may need to release those resources manually.
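The shutdown pattern above can be sketched as a small runnable snippet. A background `sleep` stands in for the `iscsi_tgt` process so the example runs anywhere; against a real target you would signal the PID of `iscsi_tgt` (for example via `pidof iscsi_tgt`):

~~~bash
# Stand-in for the iscsi_tgt process (hypothetical; in practice use
# the real PID, e.g. from `pidof iscsi_tgt`).
sleep 60 &
APP_PID=$!

# SIGTERM lets the application clean up -- iscsi_tgt releases its
# shared memory before exiting. SIGKILL would skip that cleanup.
kill -TERM "$APP_PID"
wait "$APP_PID" 2>/dev/null

# The process should now be gone.
kill -0 "$APP_PID" 2>/dev/null || echo "terminated cleanly"
~~~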
## Introduction
## Configuring iSCSI Target {#iscsi_config}
The following diagram shows relations between different parts of iSCSI structure described in this
document.
An `iscsi_tgt`-specific configuration file is used to configure the iSCSI target. A fully documented
example configuration file is located at `etc/spdk/iscsi.conf.in`.
![iSCSI structure](iscsi.svg)
The configuration file is used to configure the SPDK iSCSI target. This file defines the following:
TCP ports to use as iSCSI portals; general iSCSI parameters; initiator names and addresses to allow
access to iSCSI target nodes; number and types of storage backends to export over iSCSI LUNs; iSCSI
target node mappings between portal groups, initiator groups, and LUNs.
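Putting those pieces together, a pared-down configuration might look like the sketch below. The section and key names follow the legacy `iscsi.conf.in` format and should be checked against the shipped example file; the addresses and the `Malloc0` bdev name are illustrative:

~~~
[Global]
  ReactorMask 0xF000000

[PortalGroup1]
  Portal DA1 10.0.0.1:3260

[InitiatorGroup1]
  InitiatorName ANY
  Netmask 10.0.0.2/32

[TargetNode1]
  TargetName disk1
  Mapping PortalGroup1 InitiatorGroup1
  LUN0 Malloc0
~~~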
### Assigning CPU Cores to the iSCSI Target {#iscsi_config_lcore}
You should make a copy of the example configuration file, modify it to suit your environment, and
then run the iscsi_tgt application and pass it the configuration file using the -c option. Right now,
the target requires elevated privileges (root) to run.
~~~
app/iscsi_tgt/iscsi_tgt -c /path/to/iscsi.conf
~~~
## Assigning CPU Cores to the iSCSI Target {#iscsi_config_lcore}
SPDK uses the [DPDK Environment Abstraction Layer](http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html)
to gain access to hardware resources such as huge memory pages and CPU core(s). DPDK EAL provides
functions to assign threads to specific cores.
To ensure the SPDK iSCSI target has the best performance, place the NICs and the NVMe devices on the
same NUMA node and configure the target to run on CPU cores associated with that node. The following
command line option is used to configure the SPDK iSCSI target:
parameters in the configuration file are used to configure the SPDK iSCSI target:
~~~bash
-m 0xF000000
**ReactorMask:** A hexadecimal bit mask of the CPU cores that SPDK is allowed to execute work
items on. The ReactorMask is located in the [Global] section of the configuration file. For example,
to assign lcores 24,25,26 and 27 to iSCSI target work items, set the ReactorMask to:
~~~{.sh}
ReactorMask 0xF000000
~~~
This is a hexadecimal bit mask of the CPU cores where the iSCSI target will start polling threads.
In this example, CPU cores 24, 25, 26 and 27 would be used.
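The mask arithmetic can be checked with a short sketch; the core list here matches the example above:

~~~bash
# Build the hexadecimal CPU core mask for cores 24-27, as used by
# the ReactorMask setting (and the equivalent -m command line option).
MASK=0
for core in 24 25 26 27; do
  MASK=$((MASK | (1 << core)))
done
printf '0x%X\n' "$MASK"   # prints 0xF000000
~~~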
## Configuring a LUN in the iSCSI Target {#iscsi_lun}
Each LUN in an iSCSI target node is associated with an SPDK block device. See @ref bdev_getting_started
for details on configuring SPDK block devices. The block device to LUN mappings are specified in the
configuration file as:
~~~~
[TargetNodeX]
LUN0 Malloc0
LUN1 Nvme0n1
~~~~
This exports a malloc'd LUN: a RAM disk backed by a chunk of memory that the iSCSI target allocates in
user space. If the system has enough DMA channels, the copy work is offloaded to a DMA engine instead
of using memcpy.
## Configuring iSCSI Target via RPC method {#iscsi_rpc}
The iSCSI target is configured via JSON-RPC calls. See @ref jsonrpc for details.
In addition to the configuration file, the iSCSI target may also be configured via JSON-RPC calls. See
@ref jsonrpc for details.
### Portal groups
### Add the portal group
- iscsi_create_portal_group -- Add a portal group.
- iscsi_delete_portal_group -- Delete an existing portal group.
- iscsi_target_node_add_pg_ig_maps -- Add initiator group to portal group mappings to an existing iSCSI target node.
- iscsi_target_node_remove_pg_ig_maps -- Delete initiator group to portal group mappings from an existing iSCSI target node.
- iscsi_get_portal_groups -- Show information about all available portal groups.
~~~bash
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
~~~
python /path/to/spdk/scripts/rpc.py add_portal_group 1 127.0.0.1:3260
~~~
### Initiator groups
### Add the initiator group
- iscsi_create_initiator_group -- Add an initiator group.
- iscsi_delete_initiator_group -- Delete an existing initiator group.
- iscsi_initiator_group_add_initiators -- Add initiators to an existing initiator group.
- iscsi_get_initiator_groups -- Show information about all available initiator groups.
~~~bash
/path/to/spdk/scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.2/32
~~~
python /path/to/spdk/scripts/rpc.py add_initiator_group 2 ANY 127.0.0.1/32
~~~
### Target nodes
### Construct the backend block device
- iscsi_create_target_node -- Add an iSCSI target node.
- iscsi_delete_target_node -- Delete an iSCSI target node.
- iscsi_target_node_add_lun -- Add a LUN to an existing iSCSI target node.
- iscsi_get_target_nodes -- Show information about all available iSCSI target nodes.
~~~bash
/path/to/spdk/scripts/rpc.py iscsi_create_target_node Target3 Target3_alias MyBdev:0 1:2 64 -d
~~~
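Target nodes can likewise be inspected and removed. A sketch, assuming a running target; the full node name below assumes the default SPDK IQN prefix `iqn.2016-06.io.spdk` (the same prefix that appears in the discovery example later in this document):

~~~bash
# Show all target nodes, their LUNs, and their pg/ig mappings
/path/to/spdk/scripts/rpc.py iscsi_get_target_nodes

# Delete a target node by its full IQN; the node created above becomes:
/path/to/spdk/scripts/rpc.py iscsi_delete_target_node iqn.2016-06.io.spdk:Target3
~~~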
## Configuring iSCSI Initiator {#iscsi_initiator}
The Linux initiator is open-iscsi.

Install the open-iscsi package.
Fedora:
~~~bash
yum install -y iscsi-initiator-utils
~~~
Ubuntu:
~~~bash
apt-get install -y open-iscsi
~~~
### Setup
Edit /etc/iscsi/iscsid.conf
~~~bash
node.session.cmds_max = 4096
node.session.queue_depth = 128
~~~
iscsid must be restarted or receive SIGHUP for changes to take effect. To send SIGHUP, run:
~~~bash
killall -HUP iscsid
~~~
Recommended changes to /etc/sysctl.conf
~~~bash
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 300000
~~~
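Settings in /etc/sysctl.conf only take effect at boot; to apply them immediately, reload the file (or set individual values directly). This requires root:

~~~bash
# Apply the settings from /etc/sysctl.conf without rebooting
sysctl -p

# Or set a single value directly
sysctl -w net.ipv4.tcp_timestamps=1
~~~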
### Discovery
Assume target is at 10.0.0.1
~~~bash
iscsiadm -m discovery -t sendtargets -p 10.0.0.1
~~~
### Connect to target
~~~bash
iscsiadm -m node --login
~~~
At this point the iSCSI target should show up as SCSI disks. Check dmesg to see what
they came up as.
### Disconnect from target
~~~bash
iscsiadm -m node --logout
~~~
### Deleting target node cache
~~~bash
iscsiadm -m node -o delete
~~~
This will cause the initiator to forget all previously discovered iSCSI target nodes.
### Finding /dev/sdX nodes for iSCSI LUNs
~~~bash
iscsiadm -m session -P 3 | grep "Attached scsi disk" | awk '{print $4}'
~~~
After the targets are connected, they can be tuned. For example if /dev/sdc is
an iSCSI disk then the following can be done:
Set the noop I/O scheduler:
~~~bash
echo noop > /sys/block/sdc/queue/scheduler
~~~
Disable merging/coalescing (can be useful for precise workload measurements)
~~~bash
echo "2" > /sys/block/sdc/queue/nomerges
~~~
Increase requests for block queue
~~~bash
echo "1024" > /sys/block/sdc/queue/nr_requests
~~~
### Example: Configure a simple iSCSI Target with one portal and two LUNs
Assuming we have one iSCSI target server with a portal at 10.0.0.1:3260, two LUNs (Malloc0 and Malloc1),
and accepting initiators from 10.0.0.2/32, as in the diagram below:
![Sample iSCSI configuration](iscsi_example.svg)
#### Configure iSCSI Target
Start iscsi_tgt application:
```bash
./build/bin/iscsi_tgt
```
Construct two 64MB Malloc block devices, "Malloc0" and "Malloc1", with 512-byte sector size:
```bash
./scripts/rpc.py bdev_malloc_create -b Malloc0 64 512
./scripts/rpc.py bdev_malloc_create -b Malloc1 64 512
```
Create a new portal group with tag 1 and address 10.0.0.1:3260:
```bash
./scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
```
Create one initiator group with id 2 to accept any connection from 10.0.0.2/32:
```bash
./scripts/rpc.py iscsi_create_initiator_group 2 ANY 10.0.0.2/32
```
Finally, construct one target node named "disk1" with alias "Data Disk1", using the previously
created bdevs as LUN0 (Malloc0) and LUN1 (Malloc1), mapped to portal group 1 and initiator group 2.
```bash
./scripts/rpc.py iscsi_create_target_node disk1 "Data Disk1" "Malloc0:0 Malloc1:1" 1:2 64 -d
```
#### Configure initiator
Discover target
~~~bash
$ iscsiadm -m discovery -t sendtargets -p 10.0.0.1
10.0.0.1:3260,1 iqn.2016-06.io.spdk:disk1
~~~
Connect to the target
~~~bash
iscsiadm -m node --login
~~~
At this point the iSCSI target should show up as SCSI disks.
Check dmesg to see what they came up as. In this example it can look like below:
~~~bash
...
[630111.860078] scsi host68: iSCSI Initiator over TCP/IP
[630112.124743] scsi 68:0:0:0: Direct-Access INTEL Malloc disk 0001 PQ: 0 ANSI: 5
[630112.125445] sd 68:0:0:0: [sdd] 131072 512-byte logical blocks: (67.1 MB/64.0 MiB)
[630112.125468] sd 68:0:0:0: Attached scsi generic sg3 type 0
[630112.125926] sd 68:0:0:0: [sdd] Write Protect is off
[630112.125934] sd 68:0:0:0: [sdd] Mode Sense: 83 00 00 08
[630112.126049] sd 68:0:0:0: [sdd] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
[630112.126483] scsi 68:0:0:1: Direct-Access INTEL Malloc disk 0001 PQ: 0 ANSI: 5
[630112.127096] sd 68:0:0:1: Attached scsi generic sg4 type 0
[630112.127143] sd 68:0:0:1: [sde] 131072 512-byte logical blocks: (67.1 MB/64.0 MiB)
[630112.127566] sd 68:0:0:1: [sde] Write Protect is off
[630112.127573] sd 68:0:0:1: [sde] Mode Sense: 83 00 00 08
[630112.127728] sd 68:0:0:1: [sde] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
[630112.128246] sd 68:0:0:0: [sdd] Attached SCSI disk
[630112.129789] sd 68:0:0:1: [sde] Attached SCSI disk
...
~~~
You may also use a simple bash command to find the /dev/sdX nodes for each iSCSI LUN
in all logged-in iSCSI sessions:
~~~bash
$ iscsiadm -m session -P 3 | grep "Attached scsi disk" | awk '{print $4}'
sdd
sde
~~~
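The same session listing can be parsed a bit further to pair each target IQN with its /dev/sdX nodes. In the sketch below, the `iscsiadm -m session -P 3` output is inlined as sample text so the parsing logic can be run and inspected in isolation; with a live session, pipe the real command into the same awk program instead of the heredoc.

~~~bash
# Map each target IQN to its attached /dev/sdX nodes. The heredoc stands
# in for live `iscsiadm -m session -P 3` output.
mapping=$(awk '
    /^Target:/           { target = $2 }
    /Attached scsi disk/ { print target " -> /dev/" $4 }
' <<'EOF'
Target: iqn.2016-06.io.spdk:disk1 (non-flash)
	Current Portal: 10.0.0.1:3260,1
		Attached scsi disk sdd	State: running
		Attached scsi disk sde	State: running
EOF
)
printf '%s\n' "$mapping"
~~~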
## iSCSI Hotplug {#iscsi_hotplug}
At the iSCSI level, we provide the following support for Hotplug:
1. bdev/nvme:
At the bdev/nvme level, we start one hotplug monitor which will call
spdk_nvme_probe() periodically to get the hotplug events. We provide the
private attach_cb and remove_cb for spdk_nvme_probe(). For the attach_cb,
we create the block device; for the remove_cb, we unregister the block device, which also notifies the
upper level stack (for iSCSI target, the upper level stack is scsi/lun) to
handle the hot-remove event.
2. scsi/lun:
When the LUN receives the hot-remove notification from the block device layer,
it is marked as removed, and all I/Os submitted after that point are completed
with a check condition status. The LUN then starts a poller that waits for all
commands already submitted to the block device to return; once they have all
returned, the LUN is deleted.
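The hot-remove path described above can be exercised without physically removing a device by deleting a backing bdev at runtime. A sketch, assuming a running target with a malloc bdev named Malloc0 attached as a LUN; `bdev_malloc_delete` is the current RPC name (older releases called it `delete_malloc_bdev`):

~~~bash
# Deleting the bdev delivers the same hot-remove notification that the
# scsi/lun layer receives on a physical hotplug event
/path/to/spdk/scripts/rpc.py bdev_malloc_delete Malloc0
~~~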
## Known bugs and limitations {#iscsi_hotplug_bugs}
Hotplug during large write commands that trigger R2T (for example, 1MB I/Os)
will crash the iSCSI target.
Hotplug during large read commands (for example, 1MB I/Os) will probably also
crash the iSCSI target.
@sa spdk_nvme_probe
## iSCSI Login Redirection {#iscsi_login_redirection}
The SPDK iSCSI target application supports the iSCSI login redirection feature.
A portal refers to an IP address and TCP port number pair, and a portal group
contains a set of portals. Users of the SPDK iSCSI target application configure
portals through portal groups.
To support the login redirection feature, two types of portal groups are used:
public portal groups and private portal groups.
The SPDK iSCSI target application usually has a discovery portal. The discovery
portal is connected by an initiator to get a list of targets, as well as the list
of portals on which these targets may be accessed, via a discovery session.
Public portal groups have their portals returned by a discovery session. Private
portal groups do not have their portals returned by a discovery session. A public
portal group may optionally have a redirect portal for non-discovery logins for
each associated target. This redirect portal must be from a private portal group.
Initiators configure portals in public portal groups as target portals. When an
initiator logs in to a target through a portal in an associated public portal group,
the target sends a temporary redirection response with a redirect portal. Then the
initiator logs in to the target again through the redirect portal.
Users set a portal group to public or private at creation using the
`iscsi_create_portal_group` RPC, associate portal groups with a target using the
`iscsi_create_target_node` RPC or the `iscsi_target_node_add_pg_ig_maps` RPC,
specify an up-to-date redirect portal in a public portal group for a target using
the `iscsi_target_node_set_redirect` RPC, and terminate the corresponding connections
via an asynchronous logout request using the `iscsi_target_node_request_logout` RPC.
Typically users will use the login redirection feature in a scaled-out iSCSI target
system that runs multiple SPDK iSCSI target applications.
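A hypothetical end-to-end sketch of the RPCs named above. The flag names and argument order shown here are assumptions and vary between SPDK releases, so verify each with `scripts/rpc.py <method> -h` before use:

~~~bash
# Create a public portal group (tag 1, discovery-visible) and a private
# portal group (tag 2, the redirect destination). The -p flag marking a
# group private is an assumption; confirm with -h.
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group 1 10.0.0.1:3260
/path/to/spdk/scripts/rpc.py iscsi_create_portal_group -p 2 10.0.0.2:3260

# Point non-discovery logins for this target at the private portal
/path/to/spdk/scripts/rpc.py iscsi_target_node_set_redirect \
    iqn.2016-06.io.spdk:disk1 1 -a 10.0.0.2 -p 3260

# Ask initiators logged in through portal group 1 to log out and retry,
# so they pick up the redirect
/path/to/spdk/scripts/rpc.py iscsi_target_node_request_logout \
    iqn.2016-06.io.spdk:disk1 -t 1
~~~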