Evgeniy Kochetov ed0b611fc5 nvmf/rdma: Add shared receive queue support
This adds Shared Receive Queue (SRQ) support to the NVMe-oF RDMA
target. Sharing receive resources reduces resource allocation, and
keeping completions and memory local to a core gives the best
performance. An SRQ is created per core (poll group) per device, and
each created QP/CQ is associated with the appropriate SRQ.

Our testing environment has 2 hosts.
Host 1:
  CPU: Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz dual-socket (8 cores total)
  Network: ConnectX-5, ConnectX-5 VPI, 100GbE, single-port QSFP28, PCIe3.0 x16
  Disk: Intel Optane SSD 900P Series
  OS: Fedora 27 x86_64
Host 2:
  CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz dual-socket (24 cores total)
  Network: ConnectX-4 VPI, 100GbE, dual-port QSFP28
  Disk: Intel Optane SSD 900P Series
  OS: CentOS 7.5.1804 x86_64
Hosts are connected via Spectrum switch.
Host 1 is running SPDK NVMeoF target.
Host 2 is used as initiator running fio with SPDK plugin.

Configuration:
- SPDK NVMeoF target: cpu mask 0x0F (4 cores), max queue depth 128,
  max SRQ depth 1024, max QPs per controller 1024
- Single NVMf subsystem with a single namespace backed by a physical SSD
- fio with SPDK plugin: randread pattern, 1-256 jobs, block size 4k,
  IO depth 16, cpu_mask 0xFFF0, IO rate 10k, rate process "poisson"

Here is the full fio command line:
fio --name=Job --stats=1 --group_reporting=1 --idle-prof=percpu \
--loops=1 --numjobs=1 --thread=1 --time_based=1 --runtime=30s \
--ramp_time=5s --bs=4k --size=4G --iodepth=16 --readwrite=randread \
--rwmixread=75 --randrepeat=1 --ioengine=spdk --direct=1 \
--gtod_reduce=0 --cpumask=0xFFF0 --rate_iops=10k \
--rate_process=poisson \
--filename='trtype=RDMA adrfam=IPv4 traddr=1.1.79.1 trsvcid=4420 ns=1'

SPDK allocates the following entities for every work request in a
receive queue (shared or not): reqs (1024 bytes), recvs (96 bytes),
cmds (64 bytes), cpls (16 bytes), in_capsule_buffer. All except the
last one are fixed size and add up to 1200 bytes. The in-capsule data
size is configured to 4096 bytes.
Memory consumption calculation (target):
- Multiple SRQ: core_num * ib_devs_num * SRQ_depth * (1200 +
  in_capsule_data_size)
- Multiple RQ: queue_num * RQ_depth * (1200 + in_capsule_data_size)
We ignore admin queues in calculations for simplicity.

Cases:
1. Multiple SRQ with 1024 entries:
   - Mem = 4 * 1 * 1024 * (1200 + 4096) = 20.7 MiB
     (constant; does not depend on the number of initiators)
2. RQ with 128 entries for 64 initiators:
   - Mem = 64 * 128 * (1200 + 4096) = 41.4 MiB
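
The two formulas above can be checked with a few lines of shell arithmetic. This is a minimal sketch: the function names `srq_mem_bytes`, `rq_mem_bytes`, and `to_mib` are made up for illustration, and the 1200-byte constant is taken to be the sum of the fixed per-entry sizes listed above (1024 + 96 + 64 + 16).

```shell
#!/bin/sh
# Fixed per-work-request overhead in bytes: reqs + recvs + cmds + cpls.
FIXED=$((1024 + 96 + 64 + 16))   # 1200

# srq_mem_bytes CORES IB_DEVS SRQ_DEPTH IN_CAPSULE_SIZE
srq_mem_bytes() {
    echo $(( $1 * $2 * $3 * (FIXED + $4) ))
}

# rq_mem_bytes QUEUES RQ_DEPTH IN_CAPSULE_SIZE
rq_mem_bytes() {
    echo $(( $1 * $2 * (FIXED + $3) ))
}

# to_mib BYTES: print the size in MiB with one decimal place.
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024) }'
}

to_mib "$(srq_mem_bytes 4 1 1024 4096)"   # case 1: 4 cores, 1 device
to_mib "$(rq_mem_bytes 64 128 4096)"      # case 2: 64 initiators
```

The SRQ figure depends only on the core and device counts, so it stays the same as initiators are added, while the RQ figure scales with the number of queues.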

Results:
FIO_JOBS  kIOPS            Bandwidth, MiB/s   AvgLatency, us    MaxResidentSize, kiB
          RQ      SRQ      RQ      SRQ        RQ      SRQ       RQ      SRQ
1         8.623   8.623    33.7    33.7       13.89   14.03     144376  155624
2         17.3    17.3     67.4    67.4       14.03   14.1      145776  155700
4         34.5    34.5     135     135        14.15   14.23     146540  156184
8         69.1    69.1     270     270        14.64   14.49     148116  156960
16        138     138      540     540        14.84   15.38     151216  158668
32        276     276      1079    1079       16.5    16.61     157560  161936
64        513     502      2005    1960       1673    1612      170408  168440
128       535     526      2092    2054       3329    3344      195796  181524
256       571     571      2232    2233       6854    6873      246484  207856

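The SRQ configuration has a higher baseline at low job counts but
grows much more slowly: from 64 jobs on it uses less memory than
per-QP receive queues (207856 kiB vs 246484 kiB at 256 jobs), while
IOPS, bandwidth, and latency stay comparable.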
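
To put a number on the difference, the relative change in maximum resident size can be computed from the RQ and SRQ columns of the results table. This is a small sketch; `rss_change_pct` is a hypothetical helper, not part of SPDK or fio.

```shell
# rss_change_pct RQ_KIB SRQ_KIB
# Print the change in resident size when moving from RQ to SRQ, in percent.
# Negative values mean SRQ uses less memory.
rss_change_pct() {
    awk -v rq="$1" -v srq="$2" 'BEGIN { printf "%.1f\n", (srq - rq) * 100 / rq }'
}

rss_change_pct 144376 155624   # 1 job
rss_change_pct 170408 168440   # 64 jobs
rss_change_pct 246484 207856   # 256 jobs
```

With the table's values this prints 7.8, -1.2, and -15.7: SRQ costs somewhat more memory with a single job and saves memory as the job count grows.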

Change-Id: I40c70f6ccbad7754918bcc6cb397e955b09d1033
Signed-off-by: Evgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/428458
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
2019-03-15 19:19:17 +00:00
.githooks test: use SKIP_DPDK_BUILD in pre-push githook 2018-07-14 02:20:30 +00:00
app configure: add option not to use the internal rte_vhost copy 2019-03-13 14:26:20 +00:00
build/lib build: consolidate library outputs in build/lib 2016-11-17 13:15:09 -07:00
doc bdev: remove delete_bdev RPC 2019-03-01 08:50:07 +00:00
dpdk@67b915b09b dpdk: Update to DPDK 19.02 2019-02-22 18:31:52 +00:00
dpdkbuild bdev/crypto: add /include symlink for ISAL 2019-03-08 18:52:17 +00:00
etc/spdk nvmf/rdma: Add shared receive queue support 2019-03-15 19:19:17 +00:00
examples fio_plugin: remove metadata location limitation when doing PI 2019-03-14 19:30:53 +00:00
go go: empty Go package 2018-06-28 18:15:51 +00:00
include nvmf/rdma: Add shared receive queue support 2019-03-15 19:19:17 +00:00
intel-ipsec-mb@134c90c912 ipsec: update submodule commit 2018-07-26 22:29:25 +00:00
ipsecbuild ipsecbuild: force CC=cc 2019-01-28 02:33:50 +00:00
isa-l@09e787231b spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
isalbuild bdev/crypto: add /include symlink for ISAL 2019-03-08 18:52:17 +00:00
lib nvmf/rdma: Add shared receive queue support 2019-03-15 19:19:17 +00:00
mk build: Don't pass -fuse-ld to compiler if LD_TYPE not set 2019-03-12 21:38:59 +00:00
ocf@e235500472 ocf: switch to dynamic queues 2019-03-12 22:21:58 +00:00
pkg version: 19.04 pre 2019-02-01 09:29:12 +00:00
scripts nvmf/rdma: Add shared receive queue support 2019-03-15 19:19:17 +00:00
shared_lib build: fix duplicated clean target in shared_lib/Makefile 2019-02-14 05:15:40 +00:00
test ocf/test: update integrity test for multicore case 2019-03-15 19:04:10 +00:00
.astylerc astyle: change "add-braces" to "j" for compatibility 2017-12-13 21:23:27 -05:00
.gitignore configure: use mk/config.mk instead of CONFIG.local 2018-10-16 12:40:43 +00:00
.gitmodules ocf: add ocf submodule 2019-02-27 17:26:51 +00:00
.travis.yml .travis.yml: only notify IRC for spdk/spdk repository 2019-02-27 07:17:22 +00:00
autobuild.sh spdk: Add ISA-L support with related crc32 function 2019-01-29 08:31:00 +00:00
autopackage.sh ocf: update autotest to use OCF submodule 2019-02-27 17:26:51 +00:00
autorun_post.py Check file permissions in the check_format script 2018-10-04 23:08:12 +00:00
autorun.sh autorun: passthrough WITH_DPDK_DIR to autotest.sh 2018-10-12 23:46:14 +00:00
autotest.sh test/lvol: Run all test cases. 2019-03-12 21:29:19 +00:00
CHANGELOG.md nvmf/rdma: Add shared receive queue support 2019-03-15 19:19:17 +00:00
CONFIG configure: add option not to use the internal rte_vhost copy 2019-03-13 14:26:20 +00:00
configure configure: Make indentation consistenly use tabs 2019-03-13 23:19:37 +00:00
CONTRIBUTING.md Add CONTRIBUTING.md 2017-09-05 13:25:45 -04:00
ISSUE_TEMPLATE.md github: Add issue tracker template 2018-04-19 13:50:08 -04:00
LICENSE Remove year from copyright headers. 2016-01-28 08:54:18 -07:00
Makefile build: make dpdk build depend on isal 2019-02-27 07:17:22 +00:00
README.md doc: update doc with instructions for building shared lib 2018-10-26 20:41:24 +00:00

Storage Performance Development Kit

The Storage Performance Development Kit (SPDK) provides a set of tools and libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance by moving all of the necessary drivers into userspace and operating in a polled mode instead of relying on interrupts, which avoids kernel context switches and eliminates interrupt handling overhead.

The development kit currently includes:

Documentation

Doxygen API documentation is available, as well as a Porting Guide for porting SPDK to different frameworks and operating systems.

Source Code

git clone https://github.com/spdk/spdk
cd spdk
git submodule update --init

Prerequisites

The dependencies can be installed automatically by scripts/pkgdep.sh.

./scripts/pkgdep.sh

Build

Linux:

./configure
make

FreeBSD: Make sure you have the matching kernel source in /usr/src/. Note that the CONFIG_COVERAGE option is currently not available for FreeBSD builds.

./configure
gmake

Unit Tests

./test/unit/unittest.sh

You will see several error messages when running the unit tests, but they are part of the test suite. The final message at the end of the script indicates success or failure.

Vagrant

A Vagrant setup is also provided to create a Linux VM with a virtual NVMe controller to get up and running quickly. Currently this has only been tested on macOS and Ubuntu 16.04.2 LTS with the VirtualBox provider. The VirtualBox Extension Pack must also be installed in order to get the required NVMe support.

Details on the Vagrant setup can be found in the SPDK Vagrant documentation.

Advanced Build Options

Optional components and other build-time configuration are controlled by settings in the Makefile configuration file in the root of the repository. CONFIG contains the base settings for the configure script, which generates a new file, mk/config.mk, containing the final build settings. For advanced configuration, the configure script accepts a number of additional options, or mk/config.mk can simply be created and edited by hand. A description of all possible options is located in CONFIG.

Boolean (on/off) options are configured with a 'y' (yes) or 'n' (no). For example, this line of CONFIG controls whether the optional RDMA (libibverbs) support is enabled:

CONFIG_RDMA?=n

To enable RDMA, this line may be added to mk/config.mk with a 'y' instead of 'n'. For the majority of options this can be done using the configure script. For example:

./configure --with-rdma

CONFIG options may also be overridden on the make command line:

make CONFIG_RDMA=y

Users may wish to use a version of DPDK different from the submodule included in the SPDK repository. Note, this includes the ability to build not only from DPDK sources, but also just with the includes and libraries installed via the dpdk and dpdk-devel packages. To specify an alternate DPDK installation, run configure with the --with-dpdk option. For example:

Linux:

./configure --with-dpdk=/path/to/dpdk/x86_64-native-linuxapp-gcc
make

FreeBSD:

./configure --with-dpdk=/path/to/dpdk/x86_64-native-bsdapp-clang
gmake

The options specified on the make command line take precedence over the values in mk/config.mk. This can be useful if, for example, you generate mk/config.mk using the configure script and then have one or two options (e.g. debug builds) that you wish to turn on and off frequently.

Shared libraries

By default, the SPDK build yields static libraries against which the SPDK applications and examples are linked. The configure option --with-shared produces SPDK shared libraries in addition to the default static ones. Using this flag also links the SPDK executables against the shared versions of the libraries. By default, SPDK shared libraries are located in ./build/lib. This includes the single SPDK shared lib encompassing all of the SPDK static libs (libspdk.so) as well as individual SPDK shared libs corresponding to each of the SPDK static ones.

To start an SPDK app linked with SPDK shared libraries, make sure to do the following steps:

  • run ldconfig specifying the directory containing SPDK shared libraries
  • provide proper LD_LIBRARY_PATH

Linux:

./configure --with-shared
make
ldconfig -v -n ./build/lib
LD_LIBRARY_PATH=./build/lib/ ./app/spdk_tgt/spdk_tgt

Hugepages and Device Binding

Before running an SPDK application, some hugepages must be allocated and any NVMe and I/OAT devices must be unbound from the native kernel drivers. SPDK includes a script to automate this process on both Linux and FreeBSD. This script should be run as root.

sudo scripts/setup.sh

Users may wish to configure a specific memory size. Below is an example of configuring 8192 MB of memory.

sudo HUGEMEM=8192 scripts/setup.sh
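
As a rough sanity check, the number of hugepages that a given HUGEMEM value corresponds to can be estimated from the hugepage size. The sketch below assumes the request is rounded up to whole pages; `hugepages_needed` is a hypothetical helper, and how scripts/setup.sh performs the conversion internally is not shown here.

```shell
# hugepages_needed HUGEMEM_MB HUGEPAGE_KB
# Number of hugepages required to cover HUGEMEM_MB megabytes, rounded up.
hugepages_needed() {
    mem_kb=$(( $1 * 1024 ))
    echo $(( (mem_kb + $2 - 1) / $2 ))
}

hugepages_needed 8192 2048      # 2 MB hugepages
hugepages_needed 8192 1048576   # 1 GB hugepages
```

On Linux, the hugepage size in kB can be read from the Hugepagesize line of /proc/meminfo.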

Example Code

Example code is located in the examples directory. The examples are compiled automatically as part of the build process. Simply call any of the examples with no arguments to see the help output. You'll likely need to run the examples as a privileged user (root) unless you've done additional configuration to grant your user permission to allocate huge pages and map devices through vfio.

Contributing

For additional details on how to get more involved in the community, including contributing code and participating in discussions and other activities, please refer to spdk.io.