Storage Network Through gRPC Proxy
Summary
Currently, Longhorn uses the Kubernetes cluster CNI network and shares it with all other cluster workloads. This makes the network availability for Longhorn traffic impossible to control.

We would like to have a global `Storage Network` setting that allows users to input an existing Multus `NetworkAttachmentDefinition` CR in `<namespace>/<name>` format. Longhorn can then use this storage network for in-cluster data traffic.

The segregation can be achieved by replacing the engine binary calls in the Longhorn manager with gRPC connections to the instance manager. The instance manager will then be responsible for handling the requests between the management network and the storage network.
NOTE: There are other possible approaches we have considered for segregating the networks:
- Add the Longhorn Manager to the storage network. The Manager would need to restart itself to get the secondary storage network IP, and there would be no storage network segregation for the Longhorn data plane (engine & replica).
- Provide the Engine/Replica with dual IPs. The code change around this approach is confusing and likely to increase maintenance complexity.
Related Issues
https://github.com/longhorn/longhorn/issues/2285
https://github.com/longhorn/longhorn/issues/3546
Motivation
Goals
- Have a new `Storage Network` setting.
- Replace Manager engine binary calls with gRPC client to the instance manager.
- Keep using the management network for the communication between Manager and Instance Manager.
- Use the storage network for the data traffic of data plane components to the instance processes. Those are the engines and replicas in Instance Manager pods.
- Support backward compatibility of the communication between the new Manager and the old Instance Manager after the upgrade. Ensure existing engines/replicas work without issues.
Non-goals [optional]
- Setup and configure the Multus `NetworkAttachmentDefinition` CRs.
- Monitor `NetworkAttachmentDefinition` CRs. The user needs to ensure the traffic is reachable between pods and across different nodes. Without monitoring, Longhorn will not get notified of updates to the `NetworkAttachmentDefinition` CRs. Thus the user should create a new `NetworkAttachmentDefinition` CR and update the `storage-network` setting.
- Out-of-cluster data traffic. For example, backing image upload and download.
Proposal
Communication between Manager and Engine/Replica processes via Instance Manager gRPC proxy
- Introduce a new gRPC server in Instance Manager.
- Keep reusable connections between Manager and Instance Managers.
- Allow Manager to fall back to the engine binary call when communicating with an old Instance Manager.
Storage Network
- Add a new `Storage Network` global setting.
- Add the `k8s.v1.cni.cncf.io/networks` annotation to pods that involve data transfer. The annotation will use the value from the storage network setting. Multus will attach a secondary network to pods with this annotation.
  - Engine instance manager pods
  - Replica instance manager pods
  - Backing image data source pods. Data traffic between replicas and the backing image data source.
  - Backing image manager pods. Data traffic in between backing image managers.
- Add a new `storageIP` field to the `Engine`, `Replica`, and `BackingImageManager` CRD status. The storage IP will be used to communicate with the instance processes (a status field sketch follows this list).
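To make the new field concrete, here is a minimal sketch of where `storageIP` would sit in the CR status. The surrounding field names are simplified for illustration and are not the authoritative Longhorn CRD definitions.

```go
// Illustrative sketch only: simplified status type showing the new storageIP
// field alongside the existing management-network IP.
type EngineStatus struct {
    // IP is the instance manager pod IP on the management network.
    IP string `json:"ip"`
    // StorageIP is the secondary-network IP used for data traffic to the
    // engine process. It falls back to the pod IP when no storage network
    // is configured.
    StorageIP string `json:"storageIP"`
    // ReplicaAddressMap maps replica names to their storage-network addresses.
    ReplicaAddressMap map[string]string `json:"replicaAddressMap"`
}
```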
User Stories
Story 1 - set up the storage network
As a Longhorn user / system administrator,
I have set up Multus `NetworkAttachmentDefinition` CRs for additional network management,
and I want to segregate Longhorn in-cluster data traffic with an additional network interface.
Longhorn should provide a setting to input the `NetworkAttachmentDefinition` CR name for the storage network,
so I can guarantee network availability for Longhorn in-cluster data traffic.
Story 2 - upgrade
As a Longhorn user / system administrator,
when I upgrade Longhorn, the changes should support existing attached volumes,
so I can decide when to upgrade the engine image.
User Experience In Detail
Story 1 - set up the storage network
- I have a Kubernetes cluster with Multus installed.
- I created a `NetworkAttachmentDefinition` CR and ensured the configuration is correct.
- I added `<namespace>/<NetworkAttachmentDefinition name>` to the Longhorn `Storage Network` setting.
- I see the setting update fail when volumes are attached.
- I detach all volumes.
- When updating the setting, I see the engine/replica instance manager pods and backing image manager pods are restarted.
- I attach the volumes.
- I describe the Engine, Replica, and BackingImageManager CRs and see that the `storageIP` in the CR status is in the range of the `NetworkAttachmentDefinition` subnet/CIDR. I also see that the `storageIP` is different from the `ip` in the CR status.
- I describe the Engine and see that the `replicaAddressMap` in the CR spec and status uses the storage IP.
- I see pod logs indicating the network directions.
Story 2 - upgrade
- I have a Longhorn v1.2.4 cluster.
- I have healthy volumes attached.
- I upgrade Longhorn.
- I see the volumes still attached and healthy, with an engine image upgrade available.
- I cannot upgrade the volume engine image while the volume is attached.
- After I detach the volume, I can upgrade its engine image.
- I attach the volumes.
- I see the volumes are healthy.
API changes
- The new global setting `Storage Network` will use the existing `/v1/settings` API.
Design
Overview gRPC Proxy Implementation
Instance Manager
- Start the gRPC proxy server on the port next to the process server. The default should be `localhost:8501` (see the wiring sketch after the method list).
- The gRPC proxy service shares the same `imrpc` package name as the process server and serves the following methods:

  ```
  Ping
  ServerVersionGet
  VolumeGet
  VolumeExpand
  VolumeFrontendStart
  VolumeFrontendShutdown
  VolumeSnapshot
  SnapshotList
  SnapshotRevert
  SnapshotPurge
  SnapshotPurgeStatus
  SnapshotClone
  SnapshotCloneStatus
  SnapshotRemove
  SnapshotBackup
  SnapshotBackupStatus
  BackupRestore
  BackupRestoreStatus
  BackupVolumeList
  BackupVolumeGet
  BackupGet
  BackupConfigMetaGet
  BackupRemove
  ReplicaAdd
  ReplicaList
  ReplicaRebuildingStatus
  ReplicaVerifyRebuild
  ReplicaRemove
  ```
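A minimal sketch of how the proxy server could be wired next to the existing process server, assuming the process server listens on port 8500. The port constants and the commented-out service registration are illustrative, not the final instance manager code.

```go
package proxy

import (
    "fmt"
    "net"

    "google.golang.org/grpc"
)

const (
    processServerPort = 8500
    proxyServerPort   = processServerPort + 1 // default localhost:8501
)

// startProxyServer listens on the port next to the process server and serves
// the proxy gRPC service.
func startProxyServer() error {
    listener, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", proxyServerPort))
    if err != nil {
        return err
    }
    server := grpc.NewServer()
    // Hypothetical registration of the generated imrpc proxy service:
    // imrpc.RegisterProxyEngineServiceServer(server, newProxyServer())
    return server.Serve(listener)
}
```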
Manager
- Create a proxyHandler object to map the controller ID to an EngineClient interface. The proxyHandler object is shared between controllers.
- The Instance Manager Controller is responsible for the life cycle of the proxy gRPC client. For every enqueue:
  - Check for an existing gRPC client in the proxyHandler, and check the connection liveness with a `Ping` request.
  - If the proxy gRPC client connection is dead, stop the proxy gRPC client and return an error so the item will be re-queued.
  - If the proxy gRPC client doesn't exist in the proxyHandler, start a new gRPC connection and map it to the current controller ID.
  - Do not create the proxy gRPC connection when the instance manager version is less than the current version. The fallback interface caller provided when getting the client will be used instead.
- The gRPC client will use the EngineClient interface.
  - Provide a fallback interface caller when getting the gRPC client from the proxyHandler (see the client lookup sketch after this section). The fallback callers are:
    - the existing `Engine` client used for the binary call
    - the existing `BackupTargetClient`
  - Use the fallback caller when the instance manager version is less than the current version.
  - Add a new `BackupTargetBinaryClient` interface for the fallback:

    ```go
    type BackupTargetBinaryClient interface {
        BackupGet(destURL string, credential map[string]string) (*Backup, error)
        BackupVolumeGet(destURL string, credential map[string]string) (volume *BackupVolume, err error)
        BackupNameList(destURL, volumeName string, credential map[string]string) (names []string, err error)
        BackupVolumeNameList(destURL string, credential map[string]string) (names []string, err error)
        BackupDelete(destURL string, credential map[string]string) (err error)
        BackupVolumeDelete(destURL, volumeName string, credential map[string]string) (err error)
        BackupConfigMetaGet(destURL string, credential map[string]string) (*ConfigMetadata, error)
    }
    ```

  - Introduce a new `EngineClientProxy` interface for the proxy, which includes proxy-specific methods and implementations of the existing `EngineClient` and `BackupTargetClient` interfaces. This makes the EngineClient interface adaptive for both proxy and non-proxy/fallback operations.

    ```go
    type EngineClientProxy interface {
        EngineClient
        BackupTargetBinaryClient

        IsGRPC() bool
        Start(*longhorn.InstanceManager, logrus.FieldLogger, *datastore.DataStore) error
        Stop(*longhorn.InstanceManager) error
        Ping() error
    }
    ```
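A hypothetical sketch of the proxyHandler client lookup described above: hand out a live proxy client when one exists, and fall back to the engine binary caller when the instance manager predates the proxy API. The type, field, and constant names (`proxyHandler`, `clients`, `CurrentProxyAPIVersion`) are assumptions for illustration, not the final Manager code.

```go
import (
    "fmt"
    "sync"
)

// proxyHandler keeps one shared proxy client per instance manager.
type proxyHandler struct {
    lock    sync.Mutex
    clients map[string]EngineClientProxy // keyed by instance manager name
}

// GetClient returns a live proxy client, or the fallback binary caller when
// the instance manager version is older than the proxy API version.
func (h *proxyHandler) GetClient(im *longhorn.InstanceManager, fallback EngineClientProxy) (EngineClientProxy, error) {
    h.lock.Lock()
    defer h.lock.Unlock()

    // Old instance managers have no proxy gRPC server: use the binary caller.
    if im.Status.APIVersion < CurrentProxyAPIVersion {
        return fallback, nil
    }
    if client, ok := h.clients[im.Name]; ok && client.Ping() == nil {
        return client, nil
    }
    return nil, fmt.Errorf("proxy gRPC client for %v is not ready", im.Name)
}
```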
Overview Storage Network Implementation
Setting
Add a new global setting `Storage Network`.
- The setting is a `string`.
- The default value is `""`.
- The setting should be in the `danger zone` category.
- The setting will be validated by the admission webhook setting validator (see the validation sketch after this list).
  - The setting should be in the form of `<NAMESPACE>/<NETWORK-ATTACHMENT-DEFINITION-NAME>`.
  - The setting cannot be updated when volumes are attached.
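A minimal sketch of the format check the webhook validator could run. The function name is illustrative, and the real validator would also reject the update while volumes are attached.

```go
import (
    "fmt"
    "strings"
)

// validateStorageNetwork checks the <NAMESPACE>/<NETWORK-ATTACHMENT-DEFINITION-NAME>
// format; an empty value disables the storage network.
func validateStorageNetwork(value string) error {
    if value == "" {
        return nil
    }
    parts := strings.Split(value, "/")
    if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
        return fmt.Errorf("storage network %q must be in <NAMESPACE>/<NETWORK-ATTACHMENT-DEFINITION-NAME> format", value)
    }
    return nil
}
```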
CRD
Engine:
- New `storageIP` in status.
- Use the replica `status.storageIP` instead of the replica `status.IP` for the `replicaAddressMap`.

Replica:
- New `storageIP` in status.

BackingImageManager:
- New `storageIP` in status.
Instance Manager Controller
- When creating instance manager pods, add the `k8s.v1.cni.cncf.io/networks` annotation with `lhnet1` as the interface name. Use the `storage-network` setting value for the namespace and name (see the annotation-building sketch after this section), for example:

  ```yaml
  k8s.v1.cni.cncf.io/networks: '[{"namespace": "kube-system", "name": "demo-10-30-0-0", "interface": "lhnet1"}]'
  ```
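A sketch of how the controller could compose the annotation value from the `storage-network` setting. The helper name is an assumption; the JSON shape follows the Multus network selection format shown above.

```go
import (
    "encoding/json"
    "fmt"
    "strings"
)

// storageNetworkAnnotation builds the k8s.v1.cni.cncf.io/networks value from
// a "<namespace>/<name>" setting, always attaching the network as lhnet1.
func storageNetworkAnnotation(setting string) (string, error) {
    parts := strings.Split(setting, "/")
    if len(parts) != 2 {
        return "", fmt.Errorf("invalid storage network %q", setting)
    }
    selection := []map[string]string{{
        "namespace": parts[0],
        "name":      parts[1],
        "interface": "lhnet1",
    }}
    value, err := json.Marshal(selection)
    if err != nil {
        return "", err
    }
    return string(value), nil
}
```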
Instance Handler
- Get the IP from the instance manager pod annotation `k8s.v1.cni.cncf.io/network-status`. Use the IP for the `Engine` and `Replica` storage IP. When the `storage-network` setting is empty, the storage IP will be the pod IP (see the resolution sketch after this section).
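A sketch of the storage IP resolution described above, assuming the Multus `network-status` annotation format with an `interface` and `ips` field per attachment. The struct and function names are illustrative.

```go
import (
    "encoding/json"

    corev1 "k8s.io/api/core/v1"
)

// networkStatus mirrors the fields we need from a Multus network-status entry.
type networkStatus struct {
    Interface string   `json:"interface"`
    IPs       []string `json:"ips"`
}

// resolveStorageIP prefers the lhnet1 IP from the network-status annotation
// and falls back to the pod IP when the storage network is not configured
// or the annotation cannot be parsed.
func resolveStorageIP(pod *corev1.Pod, storageNetworkEnabled bool) string {
    if !storageNetworkEnabled {
        return pod.Status.PodIP
    }
    var statuses []networkStatus
    annotation := pod.Annotations["k8s.v1.cni.cncf.io/network-status"]
    if err := json.Unmarshal([]byte(annotation), &statuses); err != nil {
        return pod.Status.PodIP
    }
    for _, status := range statuses {
        if status.Interface == "lhnet1" && len(status.IPs) > 0 {
            return status.IPs[0]
        }
    }
    return pod.Status.PodIP
}
```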
Backing Image Manager Controller
- When creating backing image manager pods, add the `k8s.v1.cni.cncf.io/networks` annotation with `lhnet1` as the interface name. Use the `storage-network` setting value for the namespace and name, for example:

  ```yaml
  k8s.v1.cni.cncf.io/networks: '[{"namespace": "kube-system", "name": "demo-10-30-0-0", "interface": "lhnet1"}]'
  ```

- Get the IP from the backing image manager pod annotation `k8s.v1.cni.cncf.io/network-status`. Use the IP for the `BackingImageManager` storage IP. When the `storage-network` setting is empty, the storage IP will be the pod IP.
Backing Image Data Source Controller
- When creating backing image data source pods, add the `k8s.v1.cni.cncf.io/networks` annotation with `lhnet1` as the interface name. Use the `storage-network` setting value for the namespace and name, for example:

  ```yaml
  k8s.v1.cni.cncf.io/networks: '[{"namespace": "kube-system", "name": "demo-10-30-0-0", "interface": "lhnet1"}]'
  ```
Backing Image Manager - Export From Volume
- Get the IPv4 of the `lhnet1` interface and use it as the receiver address. Use the pod IP if the interface doesn't exist (see the interface lookup sketch after this section).
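A sketch of the interface lookup inside the backing image manager, using only the standard library. The function name is an assumption.

```go
import "net"

// exportReceiverAddress prefers the lhnet1 IPv4 address and falls back to the
// pod IP when the storage-network interface is not attached.
func exportReceiverAddress(podIP string) string {
    iface, err := net.InterfaceByName("lhnet1")
    if err != nil {
        return podIP // storage network not attached
    }
    addrs, err := iface.Addrs()
    if err != nil {
        return podIP
    }
    for _, addr := range addrs {
        if ipNet, ok := addr.(*net.IPNet); ok {
            if ipv4 := ipNet.IP.To4(); ipv4 != nil {
                return ipv4.String()
            }
        }
    }
    return podIP
}
```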
Setting Controller
- Do not update the `storage-network` setting and return an error when `Volumes` are attached (see the reconcile sketch after this list).
- Delete all backing image manager pods.
- Delete all instance manager pods.
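A sketch of the setting-controller reaction to a `storage-network` change: reject the update while volumes are attached, then recreate the data-plane pods so Multus attaches the secondary network. The datastore helper names (`AreAllVolumesDetached`, `DeleteAllBackingImageManagerPods`, `DeleteAllInstanceManagerPods`) are assumptions used to illustrate the ordering, not existing Longhorn APIs.

```go
// syncStorageNetwork applies the storage-network setting change.
func (sc *SettingController) syncStorageNetwork() error {
    detached, err := sc.ds.AreAllVolumesDetached()
    if err != nil {
        return err
    }
    if !detached {
        return fmt.Errorf("cannot apply the storage-network setting while volumes are attached")
    }
    // Recreate the pods that carry the k8s.v1.cni.cncf.io/networks annotation.
    if err := sc.ds.DeleteAllBackingImageManagerPods(); err != nil {
        return err
    }
    return sc.ds.DeleteAllInstanceManagerPods()
}
```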
Test plan
CI Pipeline
All existing tests should pass when the cluster has the storage network configured. We should consider having a new test pipeline for the storage network.
Infra Prerequisites:
- Secondary network interface added to each cluster instance.
- Multus deployed.
- Network-attachment-definition created.
- Routing is configured in all cluster nodes to ensure the network is accessible between instances.
- For AWS, disable network source/destination checks for each cloud-provider instance.
Test storage-network setting
Scenario: `Engine`, `Replica`, and `BackingImageManager` should use an IP in the `storage-network` `NetworkAttachmentDefinition` subnet/CIDR range after the setting update.
Upgrade strategy
Some old instance manager pods are still running after the upgrade. Old engine instance managers do not have the gRPC proxy server for the Manager to communicate with. Hence, we need to support backward compatibility.

Manager communication:
- Bump the instance manager API version.
- The Manager checks for an incompatible version and falls back to requests through the engine binary.

Volume/Engine live upgrade:
- Keep live upgrade working. This is a soft notice for users: we will not enforce any change in 1.3, but it will happen in 1.4.
Note [optional]
None