doc: Restore volume after unexpected detachment

Longhorn #851

Signed-off-by: Shuo Wu <shuo@rancher.com>
This commit is contained in:
Shuo Wu 2019-11-08 17:32:13 -08:00 committed by Sheng Yang
parent 5e2f8cc45e
commit 4dc5d9ea4b
2 changed files with 80 additions and 0 deletions

View File

@ -222,6 +222,7 @@ More examples are available at `./examples/`
### [Use CSI driver on RancherOS/CoreOS + RKE or K3S](./docs/csi-config.md)
### [Restore a backup to an image file](./docs/restore-to-file.md)
### [Disaster Recovery Volume](./docs/dr-volume.md)
### [Restore volume after unexpected detachment](./docs/restore-volume.md)
# Troubleshooting
You can click `Generate Support Bundle` link at the bottom of the UI to download a zip file contains Longhorn related configuration and logs.

79
docs/restore-volume.md Normal file
View File

@ -0,0 +1,79 @@
# Restore volume after unexpected detachment
## Overview
1. Now Longhorn can automatically reattach then remount volumes if unexpected detachment happens. e.g., [Kubernetes upgrade](https://github.com/longhorn/longhorn/issues/703), [Docker reboot](https://github.com/longhorn/longhorn/issues/686).
2. After reattachment and remount complete, users may need to **manually restart the related workload containers** for the volume restoration **if the following recommended setup is not applied**.
## Recommended setup when using Longhorn volumes
In order to restore unexpectedly detached volumes automatically, users can set `restartPolicy` to `Always` then add `livenessProbe` for the workloads using Longhorn volumes.
Then those workloads will be restarted automatically after reattachment and remount.
Here is one example for the setup:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: longhorn-volv-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
name: volume-test
namespace: default
spec:
restartPolicy: Always
containers:
- name: volume-test
image: nginx:stable-alpine
imagePullPolicy: IfNotPresent
livenessProbe:
exec:
command:
- ls
- /data/lost+found
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: volv
mountPath: /data
ports:
- containerPort: 80
volumes:
- name: volv
persistentVolumeClaim:
claimName: longhorn-volv-pvc
```
- The directory used in the `livenessProbe` will be `<volumeMount.mountPath>/lost+found`
- Don't set a short interval for `livenessProbe.periodSeconds`, e.g., 1s. The liveness command is CPU consuming.
## Manually restart workload containers
## This solution is applied only if:
1. The Longhorn volume is reattached automatically.
2. The above setup is not included when the related workload is launched.
### Steps
1. Figure out on which node the related workload's containers are running
```
kubectl -n <namespace of your workload> get pods <workload's pod name> -o wide
```
2. Connect to the node. e.g., `ssh`
3. Figure out the containers belonging to the workload
```
docker ps
```
By checking the columns `COMMAND` and `NAMES` of the output, you can find the corresponding container
4. Restart the container
```
docker restart <the container ID of the workload>
```
### Reason
Typically the volume mount propagation is not `Bidirectional`. It means the Longhorn remount operation won't be propagated to the workload containers if the containers are not restarted.