doc: Restore volume after unexpected detachment

Longhorn #851 Signed-off-by: Shuo Wu <shuo@rancher.com>
2019-11-08 17:32:13 -08:00
commit 4dc5d9ea4b
parent 5e2f8cc45e
2 changed files with 80 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -222,6 +222,7 @@ More examples are available at `./examples/`
 ### [Use CSI driver on RancherOS/CoreOS + RKE or K3S](./docs/csi-config.md)
 ### [Restore a backup to an image file](./docs/restore-to-file.md)
 ### [Disaster Recovery Volume](./docs/dr-volume.md)
 ### [Restore volume after unexpected detachment](./docs/restore-volume.md)
 # Troubleshooting
 You can click the `Generate Support Bundle` link at the bottom of the UI to download a zip file that contains Longhorn-related configuration and logs.
--- a/docs/restore-volume.md
+++ b/docs/restore-volume.md
@@ -0,0 +1,79 @@
 # Restore volume after unexpected detachment
 ## Overview
 1. Longhorn can now automatically reattach and then remount volumes after an unexpected detachment, e.g., during a [Kubernetes upgrade](https://github.com/longhorn/longhorn/issues/703) or a [Docker reboot](https://github.com/longhorn/longhorn/issues/686).
 2. After the reattachment and remount complete, users may need to **manually restart the related workload containers** to restore the volume **if the following recommended setup is not applied**.
 ## Recommended setup when using Longhorn volumes
 To restore unexpectedly detached volumes automatically, users can set `restartPolicy` to `Always` and add a `livenessProbe` to the workloads that use Longhorn volumes.
 Those workloads will then be restarted automatically after the reattachment and remount.
 Here is an example of this setup:
 ```
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
  name: longhorn-volv-pvc
 spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
 ---
 apiVersion: v1
 kind: Pod
 metadata:
  name: volume-test
  namespace: default
 spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - ls
        - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
 ```
 - The directory checked by the `livenessProbe` should be `<volumeMount.mountPath>/lost+found`.
 - Avoid a short interval for `livenessProbe.periodSeconds` (e.g., 1s), because the liveness command consumes CPU every time it runs.
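To confirm that the probe target exists before relying on it, the same check can be run by hand. The following is a minimal sketch, not part of Longhorn: the helper names `probe_path` and `check_probe` are hypothetical, and it assumes the volume is formatted with a filesystem that creates `lost+found` (such as ext4).

```shell
# Hypothetical helpers, for illustration only.

# Build the probe directory from a mount path: <mountPath>/lost+found.
# A trailing slash on the mount path is dropped first.
probe_path() {
  printf '%s/lost+found\n' "${1%/}"
}

# Run the same command as the livenessProbe inside the pod.
# Arguments: <namespace> <pod name> <mount path>
# Returns 0 if the directory can be listed, non-zero otherwise.
check_probe() {
  kubectl -n "$1" exec "$2" -- ls "$(probe_path "$3")" >/dev/null 2>&1
}
```

For the example pod above, `check_probe default volume-test /data` should succeed while the volume is mounted correctly.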
 ## Manually restart workload containers
 ### This solution applies only if:
 1. The Longhorn volume is reattached automatically.
 2. The above setup is not included when the related workload is launched.
 ### Steps
 1. Find the node on which the related workload's containers are running:
 ```
 kubectl -n <namespace of your workload> get pods <workload's pod name> -o wide
 ```
 2. Connect to the node, e.g., via `ssh`.
 3. Find the containers that belong to the workload:
 ```
 docker ps
 ```
 Check the `COMMAND` and `NAMES` columns of the output to find the corresponding container.
 4. Restart the container:
 ```
 docker restart <the container ID of the workload>
 ```
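The steps above can be sketched as a few shell functions. This is only an illustration under stated assumptions: the function names are hypothetical, the node uses Docker as the container runtime (as in the steps above), and the kubelet has labeled the containers with `io.kubernetes.pod.namespace`, `io.kubernetes.pod.name`, and `io.kubernetes.container.name`. If those labels are absent on your nodes, fall back to reading the `docker ps` output manually.

```shell
# Hypothetical helpers, for illustration only.

# Step 1: print the node a pod is scheduled on (run where kubectl is configured).
# Arguments: <namespace> <pod name>
pod_node() {
  kubectl -n "$1" get pod "$2" -o jsonpath='{.spec.nodeName}'
}

# Step 3: list the IDs of the pod's workload containers (run on the node).
# The sandbox container, labeled with container name "POD", is skipped.
# Arguments: <namespace> <pod name>
pod_containers() {
  docker ps \
    --filter "label=io.kubernetes.pod.namespace=$1" \
    --filter "label=io.kubernetes.pod.name=$2" \
    --format '{{.ID}} {{.Label "io.kubernetes.container.name"}}' |
    awk '$2 != "POD" { print $1 }'
}

# Step 4: restart every workload container of the pod (run on the node).
# Arguments: <namespace> <pod name>
restart_pod_containers() {
  for id in $(pod_containers "$1" "$2"); do
    docker restart "$id"
  done
}
```

For the example pod, run `pod_node default volume-test` from your workstation, connect to the reported node, and then run `restart_pod_containers default volume-test` there.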
 ### Reason
 Typically, the mount propagation of the workload's volume mount is not `Bidirectional`. As a result, the Longhorn remount operation is not propagated into the workload containers until the containers are restarted.
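To see which propagation mode a running container actually has, the mount table reported by `docker inspect` can be filtered by mount path. This is a sketch under assumptions: the helper name `mount_propagation` is hypothetical, and the node uses Docker as the container runtime. Kubernetes documents `Bidirectional` as equivalent to `rshared`, so a value such as `rprivate` means remounts on the host are not visible inside the container.

```shell
# Hypothetical helper, for illustration only.

# Print the propagation mode of one mount inside a container (run on the node).
# Arguments: <container ID> <mount path inside the container>
# Prints nothing if the container has no mount at that path.
mount_propagation() {
  docker inspect \
    --format '{{range .Mounts}}{{.Destination}} {{.Propagation}}{{println}}{{end}}' \
    "$1" |
    awk -v dst="$2" '$1 == dst { print $2 }'
}
```

For example, `mount_propagation <the container ID of the workload> /data` prints the mode used for the `/data` mount of the example pod.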