Call recv to trigger busy polling even when no socket is active. when
epoll_wait returns zero, the first socket in poll group is used to
trigger busy polling in kernel stack and potentially reap incoming data
Change-Id: I15f04cb4a2c7b382dd07391eda69678fd7919790
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3180
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Thus, we can make sure that when read data is larger than
the pipe size, it will not read the data into the pipe.
Change-Id: I87f3b03fd9b81eb693e9eae0fea9eef7d1b9eaa8
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2450
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
The pipe buffer gives about 19% randwrite and 56% randread
performance boost on arm64 from my test. We can also enable it on
arm64.
Signed-off-by: Richael Zhuang <richael.zhuang@arm.com>
Change-Id: Iff4c6eaffb4ec5ae8fe02b35f72aed7f6b272bb5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2255
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Purpose: To disabled the zero copy support if IP is configured on a loopback
device.
To address issue: https://github.com/spdk/spdk/issues/1340
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ia337f108851653c60d259328b6fefd4ba1e3d654
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/2372
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Purpose: This is used to make users can specify
some options on the socket, e.g., the different priority for the socket.
While creating sockets, the priority needs to be set before connect()
and listen system calls, so better to add one parameter in spdk_sock_opts
which can contain options (e.g., priority) in spdk_sock_listen_ext and
spdk_sock_connect_ext functions.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: I406238e9da7abd69f937b7072535a19124ed0169
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1874
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Since the related feature is already contained in
spdk_sock_listen and spdk_sock_connect functions,
we no longer need this function.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I1eafff0d139fa266a355fbee2bf0fc3947db69fc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1876
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
When the socket is migrated from one polling group to another
(e.g., iSCSI target example), we should add it into pending_recv_list
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I56d34000344ae452a25b82952c831f09e0266d66
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1448
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The code used to do this but it was removed when the buffering was
shifted down to the posix layer. Add a way for users of sockets
to still properly size the buffers.
This also means that by default, the receive buffering is not enabled
on sockets. That matches the behavior of the previous release.
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I20ce875be2efd841fe3a900047b4655a317d7799
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1560
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
The statement in line 1199 is may not be correct:
Actually, it should be following two cases
to pendinging_recv into false;
1 posck->recv_pipe == NULL
2 psock->recv_pipe != NULL & avalable_bytes == 0;
So this patch can fix this issue.
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ie612f781b0848e1f0ab1b48dc8c45e39862b772a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/1446
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Do large reads from the socket and buffer into a pipe.
Change-Id: I0a1bc5b356cb8c403048ab3b22a159332f085e6b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/447
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
For client side connections, SO_RCVBUF needs to set
before connect call() so this patch moves those calls
accordingly.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Change-Id: Ifa8373145b3699e697e34e93132b5c006e7fbf83
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/757
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ziye Yang <ziye.yang@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
CMSG_FIRSTHDR could theoretically return NULL. Check it for the
peace of mind.
CMSG_FIRSTHDR() returns a pointer to the first cmsghdr in the
ancillary data buffer associated with the passed msghdr.
It returns NULL if there isn't enough space for a cmsghdr
in the buffer.
Change-Id: I6c7e1eb59121b59c568d3ad7f5eda649a49026f4
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/771
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Purpose: The function spdk_sock_request_put may
return an error code, and close the socket, so we should change the
return type of _sock_check_zcopy.
If the return value of _sock_check_zcopy is not zero,
we should not handle the EPOLLIN event.
Fixes#1169
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ie6fbd7ebff54749da8fa48836cc631eea09c4ab8
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/483311
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
If available, automatically use MSG_ZEROCOPY when sending on sockets.
Storage workloads contain sufficient data transfer sizes that this is
always a performance improvement, regardless of workload.
Change-Id: I14429d78c22ad3bc036aec13c9fce6453e899c92
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471752
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Or Gerlitz <gerlitz.or@gmail.com>
Amortize the writev syscall cost by using the writev_async socket API.
This allows the socket layer to batch writes into one system call
and also apply further optimizations such as posix's MSG_ZEROCOPY
when they are available. As part of doing so we remove the error
return in the socket layer writev_async implementation for sockets
that don't have a poll group.
Doing so eliminates the send queue processing.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Change-Id: I5432ae322afaff7b96c22269fc06b75f9ae60b81
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475420
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Initiator drivers (e.g nvme/tcp) don't use poll groups but rather directly
poll the qpair. In this case we want to allow the polling function (e.g
_qpair_process_completions()) to flush async writes pending on the socket.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Change-Id: Ibd8c73691213d58e287b7110d0f5a381a89a64d0
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475419
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Purpose: Prepare for setting priorities for different
kernel based sock implementations.
The g_net_impls list is maintained in decreasing order
according to the priority of each sock implementation.
For examaple, if there are 3 sock implementations, i.e.,
posix (priority = 0), vpp (priority = 1), sock_ut (priority =2),
then the list will be maintained as:
sock_ut -> vpp -> posix.
Then if users use spdk_sock_open/listen with impl_name as NULL,
then the order to try is: sock_ut, vpp, then posix
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I43899de5bac14751ab060a11eb814cd7a0a83cc6
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/479488
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
This is preparation to add flags to the call.
Change-Id: I36109c069b42bd3ec22e023079c62d41a435f44c
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471771
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Add an additional queue for requests that have been sent on the network
but aren't complete yet. As of this patch, the code
is still calling writev with no flags in the POSIX layer, so it completes
synchronously. That means requests pass through this new pending list
only very briefly inside of one function.
Change-Id: Iaab6efc118a6d5fe9589199515eb3a7293db4b8e
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/471768
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Or Gerlitz <gerlitz.or@gmail.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add spdk_sock_writev_async for performing asynchronous writes to
sockets. The user of this call is responsible for allocating their own
spdk_sock_request structures to pass to this call.
spdk_sock_writev_async will not return EAGAIN and will instead leave the
requests queued until they are fully sent or aborted due to socket
error.
Change-Id: Idf3239e65d26a3024e578122c23e4fb8f95e241b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470523
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Community-CI: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Close can't fail. And if it did, we still want to release
the sock memory.
Change-Id: I0e4f4d23d49f32132f4526fef8587823ace0a774
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/475311
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
This is useful for detecting sockets that have been disconnected
by the other end without reading data.
Change-Id: Ieb6529984d282d48373766d9f5555cf11720f19b
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470513
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
For new connections, just set the size to a good value for
storage automatically.
Change-Id: Ida2b21eac512a8a25ef70f486b43d5c47be16b63
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470510
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Instead of having two places in the code that do the
syscalls to set up the recvbuf size, just call the
function we already have.
Change-Id: I7d098fc7a39d0593c58fbe05b1f0b14f8b6a360d
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/470509
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Have each sock implementation free the group_impl itself.
This allows C++ based sock implementations like Seastar
to release the group_impl memory using delete rather
than free.
Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: If40a91e8bc93a531701fc30d847ab28fa11858ab
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/469618
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Everywhere we use this, we increase the size to 2MB.
Just make that the default behavior.
Change-Id: Ie174419a09df1792a0c7311eddd0c2dcefa267b5
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/466991
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Rounding out the module concept of SPDK libraries.
Change-Id: I2b316153809ae9f73361648fe505274a59d0bdb3
Signed-off-by: Seth Howell <seth.howell@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465456
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Broadcom SPDK FC-NVMe CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>