env_dpdk: don't call remove_rte_dev from secondary proc

During secondary process shutdown, detaching an nvme
device would trigger env_dpdk to send an
rte_eal_hotplug_remove event for the corresponding
BDF.  But this isn't valid from a secondary process -
we can only attach/detach from the primary process.

Usually this would just result in a bunch of annoying
print messages as the secondary and primary processes
exchanged rte messages.  But occasionally one of the
response messages from the primary process could arrive
just as the secondary process was going through
rte_eal_cleanup.  The message would get kicked to the
DPDK intr thread, we would do bus_cleanup which frees
all of the PCI state, and then the message would execute
on the intr thread, causing segfaults, use-after-free
or some other violation.

It's possible some kind of cleanup should be
implemented in DPDK, but for now, let's just not
induce incorrect behavior from SPDK: don't send
the hotplug_remove messages from a secondary
process.
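
As a rough illustration of the intended behavior, here is a simplified,
self-contained sketch. It is not the actual env_dpdk code: the
`fake_*` names, the `g_is_primary` flag (standing in for
spdk_process_is_primary()) and the call counter are hypothetical and
exist only for this example. It shows a detach path that requests
device removal only from the primary process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; not the real SPDK struct. */
struct fake_pci_device {
	const char *bdf;
};

/* Stand-in for spdk_process_is_primary(); set by the caller. */
static bool g_is_primary;

/* Counts how many simulated hotplug_remove requests were issued. */
static int g_hotplug_remove_calls;

/* Stand-in for the removal request; it only records the call. */
static void
fake_remove_rte_dev(struct fake_pci_device *dev)
{
	(void)dev;
	g_hotplug_remove_calls++;
}

/* Simplified detach path: a secondary process never requests removal. */
static void
fake_detach(struct fake_pci_device *dev)
{
	assert(dev != NULL);

	if (!g_is_primary) {
		/* Secondary: return without sending a removal request. */
		return;
	}

	fake_remove_rte_dev(dev);
}
```

With this shape, a detach in a secondary process generates no message
traffic to the primary, so no response can arrive during the
secondary's cleanup.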

Fixes issue #2651.

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: Ie431a1f8e74503e1de1be36cbb9589682d6dc94a

Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/16553
Reviewed-by: Michal Berger <michal.berger@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Jim Harris 2023-01-26 16:11:21 -07:00 committed by Tomasz Zawadzki
parent cf64422ad7
commit 6dc9dbe5ca

@@ -131,7 +131,6 @@ detach_rte(struct spdk_pci_device *dev)
 	bool removed;
 	if (!spdk_process_is_primary()) {
-		remove_rte_dev(rte_dev);
 		return;
 	}