Use PDB to protect Longhorn components from drains
Summary
Some Longhorn components must remain available during a node drain so that Longhorn volumes can be cleaned up and detached correctly. They are: csi-attacher, csi-provisioner, longhorn-admission-webhook, longhorn-conversion-webhook, share-manager, instance-manager, and the daemonset pods in the longhorn-system namespace.
This LEP outlines our existing solutions for protecting these components, the issues with those solutions, and a proposal for improvement.
Related Issues
https://github.com/longhorn/longhorn/issues/3304
Motivation
Goals
- Have better ways to protect Longhorn components (csi-attacher, csi-provisioner, longhorn-admission-webhook, longhorn-conversion-webhook) without requiring users to specify drain flags to skip these pods.
Proposal
- Our existing solutions to protect these components are:
  - For instance-manager: dynamically create/delete the instance manager PDB.
  - For daemonset pods in the longhorn-system namespace: we advise users to specify --ignore-daemonsets in the kubectl drain command to ignore them. This follows the best practice.
  - For csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook: we advise users to specify --pod-selector to ignore these pods.
- Proposal for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook:
  The problem with the existing solution is that users sometimes cannot specify --pod-selector for the kubectl drain command. For example, users of the System Upgrade Controller project have no option to specify --pod-selector. We would also like a more automatic approach that does not rely on the user setting kubectl drain options.
  Therefore, we propose the following design:
  - Longhorn manager automatically creates PDBs for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook with minAvailable set to 1. This ensures that each of these deployments has at least 1 running pod during the draining process.
  - Longhorn manager continuously watches the volumes and removes the PDBs once there is no attached volume.
  This should work for both single-node and multi-node clusters.
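The design above can be illustrated with a small, dependency-free Go sketch. It is not Longhorn's implementation: the pdbSpec type and the desiredPDBs and canEvict functions are hypothetical names, and naming each PDB after its deployment is an assumption made for illustration. The canEvict helper models the usual minAvailable rule, under which an eviction is allowed only if the number of healthy pods stays at or above minAvailable afterwards.

```go
package main

import "fmt"

// protectedDeployments lists the components covered by this proposal.
var protectedDeployments = []string{
	"csi-attacher",
	"csi-provisioner",
	"longhorn-admission-webhook",
	"longhorn-conversion-webhook",
}

// pdbSpec is a hypothetical, simplified stand-in for a PodDisruptionBudget.
type pdbSpec struct {
	Name         string // assumption: the PDB is named after its deployment
	MinAvailable int
}

// desiredPDBs returns the PDBs that should exist for the given number of
// attached volumes: one per protected deployment while at least one volume
// is attached, and none otherwise.
func desiredPDBs(attachedVolumes int) []pdbSpec {
	if attachedVolumes <= 0 {
		return nil
	}
	pdbs := make([]pdbSpec, 0, len(protectedDeployments))
	for _, name := range protectedDeployments {
		pdbs = append(pdbs, pdbSpec{Name: name, MinAvailable: 1})
	}
	return pdbs
}

// canEvict models the minAvailable rule: evicting one pod is allowed only
// if the remaining healthy pods still satisfy minAvailable.
func canEvict(healthyPods, minAvailable int) bool {
	return healthyPods-1 >= minAvailable
}

func main() {
	for _, p := range desiredPDBs(2) {
		fmt.Printf("PDB %s minAvailable=%d\n", p.Name, p.MinAvailable)
	}
	fmt.Println("evict with 1 healthy pod:", canEvict(1, 1))
	fmt.Println("evict with 2 healthy pods:", canEvict(2, 1))
}
```

Under this model, a deployment with a single healthy pod cannot have that pod evicted while its PDB exists, so a drain would be expected to wait until the volumes are detached and the PDBs are removed.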
User Stories
Story 1
Before the enhancement, users need to specify drain options for the drain command to exclude Longhorn pods. This is sometimes not possible when users rely on a third-party solution, such as System Upgrade Controller, to drain and upgrade Kubernetes.
Story 2
User Experience In Detail
After the enhancement, the user doesn't need to specify drain options for the drain command to exclude Longhorn pods.
API changes
None
Design
Implementation Overview
Create a new controller inside Longhorn manager called longhorn-pdb-controller. The controller listens for changes to csi-attacher, csi-provisioner, longhorn-admission-webhook, longhorn-conversion-webhook, and Longhorn volumes, and adjusts the PDBs correspondingly.
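A reconcile step for such a controller might look roughly like the following Go sketch. It is a simplified illustration under stated assumptions, not the actual longhorn-pdb-controller code: the plan function, its map-based inputs, and the rule that a PDB is wanted only when its deployment exists and at least one volume is attached are all hypothetical. A real controller would read this state from the Kubernetes API and issue create/delete calls instead of returning name lists.

```go
package main

import (
	"fmt"
	"sort"
)

// protected lists the deployments whose PDBs the controller manages.
var protected = []string{
	"csi-attacher",
	"csi-provisioner",
	"longhorn-admission-webhook",
	"longhorn-conversion-webhook",
}

// plan compares the observed state with the desired state and returns the
// names of PDBs to create and to delete (hypothetical helper).
//   - deployments: which protected deployments currently exist
//   - existingPDBs: which PDBs currently exist, keyed by deployment name
//   - attachedVolumes: number of Longhorn volumes currently attached
func plan(deployments, existingPDBs map[string]bool, attachedVolumes int) (create, remove []string) {
	for _, name := range protected {
		// Assumption: a PDB is wanted only for an existing deployment
		// while at least one volume is attached.
		want := deployments[name] && attachedVolumes > 0
		have := existingPDBs[name]
		switch {
		case want && !have:
			create = append(create, name)
		case !want && have:
			remove = append(remove, name)
		}
	}
	sort.Strings(create)
	sort.Strings(remove)
	return create, remove
}

func main() {
	deployments := map[string]bool{
		"csi-attacher":                true,
		"csi-provisioner":             true,
		"longhorn-admission-webhook":  true,
		"longhorn-conversion-webhook": true,
	}

	// A volume gets attached and no PDB exists yet.
	create, remove := plan(deployments, map[string]bool{}, 1)
	fmt.Println("create:", create, "delete:", remove)

	// The last volume is detached while all PDBs exist.
	create, remove = plan(deployments, deployments, 0)
	fmt.Println("create:", create, "delete:", remove)
}
```

Keeping the decision a pure function of observed state makes the step idempotent: running it again after the changes are applied yields no further work.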
Test plan
https://github.com/longhorn/longhorn/issues/3304#issuecomment-1467174481
Upgrade strategy
No upgrade is needed.
Note
In the original GitHub ticket, we mentioned that we need to add a PDB to protect the share manager pod from being drained before its workload pods, because if the share manager pod doesn't exist, its volume cannot be unmounted in the CSI flow. However, with the fix in https://github.com/longhorn/longhorn/issues/5296, we can always unmount the volume even if the share manager is not running. Therefore, we don't need to protect the share manager pod.