doc: Restore volume after unexpected detachment

Longhorn #851 Signed-off-by: Shuo Wu <shuo@rancher.com>
2019-11-08 17:32:13 -08:00 · 4dc5d9ea4b
commit 4dc5d9ea4b
parent 5e2f8cc45e
2 changed files with 80 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -222,6 +222,7 @@ More examples are available at `./examples/`
 ### [Use CSI driver on RancherOS/CoreOS + RKE or K3S](./docs/csi-config.md)
 ### [Restore a backup to an image file](./docs/restore-to-file.md)
 ### [Disaster Recovery Volume](./docs/dr-volume.md)
+### [Restore volume after unexpected detachment](./docs/restore-volume.md)

 # Troubleshooting
 You can click `Generate Support Bundle` link at the bottom of the UI to download a zip file contains Longhorn related configuration and logs.
--- a/docs/restore-volume.md
+++ b/docs/restore-volume.md
@@ -0,0 +1,79 @@
+# Restore volume after unexpected detachment
+
+## Overview
+1. Longhorn can now automatically reattach and remount volumes after an unexpected detachment, e.g., during a [Kubernetes upgrade](https://github.com/longhorn/longhorn/issues/703) or a [Docker reboot](https://github.com/longhorn/longhorn/issues/686).
+2. After the reattachment and remount complete, users may need to **manually restart the related workload containers** to restore the volume **if the following recommended setup is not applied**.
+
+## Recommended setup when using Longhorn volumes
+To have unexpectedly detached volumes restored automatically, set `restartPolicy` to `Always` and add a `livenessProbe` to the workloads that use Longhorn volumes.
+Those workloads are then restarted automatically after the reattachment and remount.
+
+Here is an example of this setup:
+```
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: longhorn-volv-pvc
+spec:
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: longhorn
+  resources:
+    requests:
+      storage: 2Gi
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: volume-test
+  namespace: default
+spec:
+  restartPolicy: Always
+  containers:
+  - name: volume-test
+    image: nginx:stable-alpine
+    imagePullPolicy: IfNotPresent
+    livenessProbe:
+      exec:
+        command:
+        - ls
+        - /data/lost+found
+      initialDelaySeconds: 5
+      periodSeconds: 5
+    volumeMounts:
+    - name: volv
+      mountPath: /data
+    ports:
+    - containerPort: 80
+  volumes:
+  - name: volv
+    persistentVolumeClaim:
+      claimName: longhorn-volv-pvc
+```
+- The directory checked by the `livenessProbe` is `<volumeMount.mountPath>/lost+found`.
+- Don't set a short interval for `livenessProbe.periodSeconds` (e.g., 1s), because running the liveness command consumes CPU.
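+
+For illustration only, a less aggressive probe might look like the fragment below. The `periodSeconds: 10` and `failureThreshold: 3` values are assumptions chosen for this example, not Longhorn recommendations, so tune them for your workload.
+```
+livenessProbe:
+  exec:
+    command:
+    - ls
+    - /data/lost+found
+  initialDelaySeconds: 5
+  # Probe every 10 seconds rather than every second to limit CPU overhead
+  periodSeconds: 10
+  # Restart the container after 3 consecutive failed probes (the Kubernetes default)
+  failureThreshold: 3
+```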
+
+## Manually restart workload containers
+### This solution applies only if:
+1. The Longhorn volume has been reattached automatically.
+2. The recommended setup above was not applied when the related workload was launched.
+
+### Steps
+1. Find the node on which the related workload's containers are running:
+```
+kubectl -n <namespace of your workload> get pods <workload's pod name> -o wide
+```
+2. Connect to the node, e.g., via `ssh`.
+3. Find the containers that belong to the workload:
+```
+docker ps
+```
+By checking the `COMMAND` and `NAMES` columns of the output, you can find the corresponding container.
+
+4. Restart the container:
+```
+docker restart <the container ID of the workload>
+``` 
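+
+As a rough sketch, steps 1, 3, and 4 can be shortened as follows. It assumes the node runs Docker under the kubelet, which labels each container with `io.kubernetes.pod.name` and `io.kubernetes.pod.namespace`. The angle-bracket values are placeholders.
+```
+# On a machine with kubectl access: print the node that runs the pod
+kubectl -n <namespace of your workload> get pod <workload's pod name> -o jsonpath='{.spec.nodeName}'
+
+# On that node: list only the containers that belong to the pod
+docker ps --filter "label=io.kubernetes.pod.name=<workload's pod name>" --filter "label=io.kubernetes.pod.namespace=<namespace of your workload>"
+
+# Restart the workload container found above
+docker restart <the container ID of the workload>
+```
+The filtered list may also include the pod's `pause` (sandbox) container. Only the workload container needs a restart.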
+
+### Reason
+Typically, the volume mount propagation is not `Bidirectional`, so the Longhorn remount operation is not propagated to the workload containers until those containers are restarted.