feat(lep): consolidate instance managers

Ref: 5208

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
This commit is contained in:
Chin-Ya Huang 2023-03-01 11:15:24 +08:00 committed by David Ko
parent 26a6c23156
commit 5f4111249a

View File

@ -0,0 +1,598 @@
# Consolidate Longhorn Instance Managers
## Summary
Longhorn architecture includes engine and replica instance manager pods on each node. After the upgrade, Longhorn adds an additional engine and replica instance manager pods. When the cluster is set with a default request of 12% guaranteed CPU, all instance manager pods will occupy 12% * 4 CPUs per node. Nevertheless, this caused high base resource requirements and is likely unnecessary.
```
NAME STATE E-CPU(CORES) E-MEM(BYTES) R-CPU(CORES) R-MEM(BYTES) CREATED-WORKLOADS DURATION(MINUTES) AGE
demo-0 (no-IO) Complete 8.88m 24Mi 1.55m 43Mi 5 10 22h
demo-0-bs-512b-5g Complete 109.70m 66Mi 36.46m 54Mi 5 10 16h
demo-0-bs-1m-10g Complete 113.16m 65Mi 36.63m 56Mi 5 10 14h
demo-0-bs-5m-10g Complete 114.17m 64Mi 31.37m 54Mi 5 10 42m
```
Aiming to simplify the architecture and free up some resource requests, this document proposes to consolidate the engine and replica instance managers into a single pod. This consolidation will not affect any data plane operations or volume migration. As the engine process is the primary consumer of CPU resources, merging the instance managers will result in a 50% reduction in CPU requests for instance managers. This is because there will only be one instance manager pod for both process types.
### Related Issues
Phase 1:
- https://github.com/longhorn/longhorn/issues/5208
Phase 2:
- https://github.com/longhorn/longhorn/issues/5842
- https://github.com/longhorn/longhorn/issues/5844
## Motivation
### Goals
- Having single instance manager pods to run replica and engine processes.
- After the Longhorn upgrade, the previous engine instance manager should continue to handle data plane operations for attached volumes until they are detached. And the replica instance managers should continue servicing data plane operations until the volume engine is upgraded or volume is detached.
- Automatically clean up any engine/replica instance managers when all instances (process) get removed.
- Online/offline upgrade volume engine should be functional. The replicas will automatically migrate to use the new `aio` (all-in-one) type instance managers, and the `engine` type instance manager will continue to serve until the first volume detachment.
- The Pod Disruption Budget (PDB) handling for cluster auto-scaler and node drain should work as expected.
### Non-goals [optional]
`None`
## Proposal
To ensure uninterrupted upgrades, this enhancement will be implemented in two phases. The existing `engine`/`replica` instance manager may coexist with the consolidated instance manager during the transition.
Phase 1:
- Introduce a new `aio` instance manager type. The `engine` and `replica` instance manager types will be deprecated and continue to serve for the upgraded volumes until the first volume detachment.
- Introduce new `Guaranteed Instance Manager CPU` setting, `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings will be deprecated and continues to serve for the upgraded volumes until the first volume detachment.
Phase 2:
- Remove all instance manager types.
- Remove the `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings.
### User Stories
- For freshly installed Longhorn, the user will see `aio` type instance managers.
- For upgraded Longhorn with all volume detached, the user will see the `engine`, and `replica` instance managers removed and replaced by `aio` type instance managers.
- For upgraded Longhorn with volume attached, the user will see existing `engine`, and `replica` instance managers still servicing the old attached volumes and the new `aio` type instance manager servicing new volume attachments.
### User Experience In Detail
#### New Installation
1. User creates and attaches a volume.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-0 attached unknown 21474836480 ip-10-0-1-113 12s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 124m
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 124m
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 124m
> kubectl -n longhorn-system get lhim/instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc -o yaml
apiVersion: longhorn.io/v1beta2
kind: InstanceManager
metadata:
creationTimestamp: "2023-03-16T10:48:59Z"
generation: 1
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-8d41c3a4
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
kind: Node
name: ip-10-0-1-113
uid: 00c0734b-f061-4b28-8071-62596274cb18
resourceVersion: "926067"
uid: a869def6-1077-4363-8b64-6863097c1e26
spec:
engineImage: ""
image: c3y1huang/research:175-lh-im
nodeID: ip-10-0-1-113
type: aio
status:
apiMinVersion: 1
apiVersion: 3
currentState: running
instanceEngines:
demo-0-e-06d4c77d:
spec:
name: demo-0-e-06d4c77d
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10015
portStart: 10015
resourceVersion: 0
state: running
type: engine
instanceReplicas:
demo-0-r-ca78cab4:
spec:
name: demo-0-r-ca78cab4
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10014
portStart: 10000
resourceVersion: 0
state: running
type: replica
ip: 10.42.0.238
ownerID: ip-10-0-1-113
proxyApiMinVersion: 1
proxyApiVersion: 4
```
- The engine and replica instances(processes) created in the `aio` type instance manager.
#### Upgrade With Volumes Detached
1. User has a Longhorn v1.4.0 cluster and a volume in the detached state.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-1 detached unknown 21474836480 12s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-r-1278a39fa6e6d8f49eba156b81ac1f59 running replica ip-10-0-1-113 3m44s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 3m44s
instance-manager-e-45ad195db7f55ed0a2dd1ea5f19c5edf running engine ip-10-0-1-105 3m41s
instance-manager-r-45ad195db7f55ed0a2dd1ea5f19c5edf running replica ip-10-0-1-105 3m41s
instance-manager-e-225a2c7411a666c8eab99484ab632359 running engine ip-10-0-1-102 3m42s
instance-manager-r-225a2c7411a666c8eab99484ab632359 running replica ip-10-0-1-102 3m42s
```
1. User upgraded Longhorn to v1.5.0.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 112s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 48s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 47s
```
- Unused `engine` type instance managers removed.
- Unused `replica` type instance managers removed.
- 3 `aio` type instance managers created.
1. User upgraded volume engine.
1. User attaches the volume.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-1 attached healthy 21474836480 ip-10-0-1-113 4m51s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 3m58s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 2m54s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 2m53s
> kubectl -n longhorn-system get lhim/instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc -o yaml
apiVersion: longhorn.io/v1beta2
kind: InstanceManager
metadata:
creationTimestamp: "2023-03-16T13:03:15Z"
generation: 1
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-8d41c3a4
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
kind: Node
name: ip-10-0-1-113
uid: 12eb73cd-e9de-4c45-875d-3eff7cfb1034
resourceVersion: "3762"
uid: c996a89a-f841-4841-b69d-4218ed8d8c6e
spec:
engineImage: ""
image: c3y1huang/research:175-lh-im
nodeID: ip-10-0-1-113
type: aio
status:
apiMinVersion: 1
apiVersion: 3
currentState: running
instanceEngines:
demo-1-e-b7d28fb3:
spec:
name: demo-1-e-b7d28fb3
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10015
portStart: 10015
resourceVersion: 0
state: running
type: engine
instanceReplicas:
demo-1-r-189c1bbb:
spec:
name: demo-1-r-189c1bbb
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10014
portStart: 10000
resourceVersion: 0
state: running
type: replica
ip: 10.42.0.28
ownerID: ip-10-0-1-113
proxyApiMinVersion: 1
proxyApiVersion: 4
```
- The engine and replica instances(processes) created in the `aio` type instance manager.
#### Upgrade With Volumes Attached
1. User has a Longhorn v1.4.0 cluster and a volume in the attached state.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-2 attached healthy 21474836480 ip-10-0-1-113 35s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-r-1278a39fa6e6d8f49eba156b81ac1f59 running replica ip-10-0-1-113 2m41s
instance-manager-r-45ad195db7f55ed0a2dd1ea5f19c5edf running replica ip-10-0-1-105 119s
instance-manager-r-225a2c7411a666c8eab99484ab632359 running replica ip-10-0-1-102 119s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 2m41s
instance-manager-e-225a2c7411a666c8eab99484ab632359 running engine ip-10-0-1-102 119s
instance-manager-e-45ad195db7f55ed0a2dd1ea5f19c5edf running engine ip-10-0-1-105 119s
```
1. User upgraded Longhorn to v1.5.0.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-r-1278a39fa6e6d8f49eba156b81ac1f59 running replica ip-10-0-1-113 5m24s
instance-manager-r-45ad195db7f55ed0a2dd1ea5f19c5edf running replica ip-10-0-1-105 4m42s
instance-manager-r-225a2c7411a666c8eab99484ab632359 running replica ip-10-0-1-102 4m42s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 5m24s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 117s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 33s
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 32s
```
- 2 unused `engine` type instance managers removed.
- 3 `aio` type instance managers created.
1. User upgraded online volume engine.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 6m53s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 8m18s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 6m54s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 11m
```
- All `replica` type instance manager migrated to `aio` type instance managers.
1. User detached the volume.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 8m38s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 10m
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 8m39s
```
- The `engine` type instance managers removed.
1. User attached the volume.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-2 attached healthy 21474836480 ip-10-0-1-113 12m
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 9m40s
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 9m39s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 11m
> kubectl -n longhorn-system get lhim/instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc -o yaml
apiVersion: longhorn.io/v1beta2
kind: InstanceManager
metadata:
creationTimestamp: "2023-03-16T13:12:41Z"
generation: 1
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-8d41c3a4
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
kind: Node
name: ip-10-0-1-113
uid: 6d109c40-abe3-42ed-8e40-f76cfc33e4c2
resourceVersion: "4339"
uid: 01556f2c-fbb4-4a15-a778-c73df518b070
spec:
engineImage: ""
image: c3y1huang/research:175-lh-im
nodeID: ip-10-0-1-113
type: aio
status:
apiMinVersion: 1
apiVersion: 3
currentState: running
instanceEngines:
demo-2-e-65845267:
spec:
name: demo-2-e-65845267
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10015
portStart: 10015
resourceVersion: 0
state: running
type: engine
instanceReplicas:
demo-2-r-a2bd415f:
spec:
name: demo-2-r-a2bd415f
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10014
portStart: 10000
resourceVersion: 0
state: running
type: replica
ip: 10.42.0.31
ownerID: ip-10-0-1-113
proxyApiMinVersion: 1
proxyApiVersion: 4
```
- The engine and replica instances(processes) created in the `aio` type instance manager.
### API changes
- Introduce new `instanceManagerCPURequest` in `Node` resource.
- Introduce new `instanceEngines` in InstanceManager resource.
- Introduce new `instanceReplicas` in InstanceManager resource.
## Design
### Phase 1: All-in-one Instance Manager Implementation Overview
Introducing a new instance manager type to have Longhorn continue to service existing attached volumes for Longhorn v1.5.x.
#### New Instance Manager Type
- Introduce a new `aio` (all-in-one) instance manager type to differentiate the handling of the old `engine`/`replica` instance managers and the new consolidated instance managers.
- When getting InstanceManagers by instance of the attached volume, retrieve the InstanceManager from the instance manager list using the new `aio` type.
#### InstanceManager `instances` Field Replacement For New InstanceManagers
- New InstanceManagers will use the `instanceEngines` and `instanceReplicas` fields, replacing the `instances` field.
- For the existing InstanceManagers for the attached Volumes, the `instances` field will remain in use.
#### Instance Manager Execution
- Rename the `engine-manager` script to `instance-manager`.
- Bump up version to `4`.
#### New Instance Manager Pod
- Replace `engine` and `replica` pod creation with spec to use for `aio` instance manager pod.
```
> kubectl -n longhorn-system get pod/instance-manager-0d96990c6881c828251c534eb31bfa85 -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
longhorn.io/last-applied-tolerations: '[]'
creationTimestamp: "2023-03-01T08:13:03Z"
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-a1873aa3
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-0d96990c6881c828251c534eb31bfa85
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
controller: true
kind: InstanceManager
name: instance-manager-0d96990c6881c828251c534eb31bfa85
uid: 51c13e4f-d0a2-445d-b98b-80cca7080c78
resourceVersion: "12133"
uid: 81397cca-d9e9-48f6-8813-e7f2e2cd4617
spec:
containers:
- args:
- instance-manager
- --debug
- daemon
- --listen
- 0.0.0.0:8500
env:
- name: TLS_DIR
value: /tls-files/
image: c3y1huang/research:174-lh-im
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 3
periodSeconds: 5
successThreshold: 1
tcpSocket:
port: 8500
timeoutSeconds: 4
name: instance-manager
resources:
requests:
cpu: 960m
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host
mountPropagation: HostToContainer
name: host
- mountPath: /engine-binaries/
mountPropagation: HostToContainer
name: engine-binaries
- mountPath: /host/var/lib/longhorn/unix-domain-socket/
name: unix-domain-socket
- mountPath: /tls-files/
name: longhorn-grpc-tls
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-hkbfc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: ip-10-0-1-113
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: longhorn-service-account
serviceAccountName: longhorn-service-account
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- hostPath:
path: /
type: ""
name: host
- hostPath:
path: /var/lib/longhorn/engine-binaries/
type: ""
name: engine-binaries
- hostPath:
path: /var/lib/longhorn/unix-domain-socket/
type: ""
name: unix-domain-socket
- name: longhorn-grpc-tls
secret:
defaultMode: 420
optional: true
secretName: longhorn-grpc-tls
- name: kube-api-access-hkbfc
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:03Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:04Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:04Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:03Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://cb249b97d128e47a7f13326b76496656d407fd16fc44b5f1a37384689d0fa900
image: docker.io/c3y1huang/research:174-lh-im
imageID: docker.io/c3y1huang/research@sha256:1f4e86b92b3f437596f9792cd42a1bb59d1eace4196139dc030b549340af2e68
lastState: {}
name: instance-manager
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-03-01T08:13:03Z"
hostIP: 10.0.1.113
phase: Running
podIP: 10.42.0.27
podIPs:
- ip: 10.42.0.27
qosClass: Burstable
startTime: "2023-03-01T08:13:03Z"
```
#### Controllers Change
- Map the status of the engine/replica process to the corresponding instanceEngines/instanceReplicas fields in the InstanceManager instead of the instances field. To ensure backward compatibility, the instances field will continue to be utilized by the pre-upgrade attached volume.
- Ensure support for the previous version's attached volumes with the old engine/replica instance manager types.
- Replace the old engine/replica InstanceManagers with the aio type instance manager during replenishment.
#### New Setting
- Introduce a new `Guaranteed Instance Manager CPU` setting for the new `aio` instance manager pod.
- The `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` will co-exist with this setting in Longhorn v1.5.x.
### Phase 2 - Deprecations Overview
Based on the assumption when upgrading from v1.5.x to 1.6.x, volumes should have detached at least once and migrated to `aio` type instance managers. Then the cluster should not have volume depending on `engine` and `replica` type instance managers. Therefore in this phase, remove the related types and settings.
#### Old Instance Manager Types
- Remove the `engine`, `replica`, and `aio` instance manager types. There is no need for differentiation.
### Old Settings
- Remove the `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings. The settings have already been replaced by the `Guaranteed Instance Manager CPU` setting in phase 1.
#### Controllers Change
- Remove support for engine/replica InstanceManager types.
### Test plan
Support new `aio` instance manager type and run regression test cases.
### Upgrade strategy
The `instances` field in the instance manager custom resource will still be utilized by old instance managers of the attached volume.
## Note [optional]
`None`