Sheng Yang c39b540deb Rename restore-volume.md to recover-volume.md

Signed-off-by: Sheng Yang <sheng.yang@rancher.com>

2019-11-12 23:37:16 -08:00


Recover volume after unexpected detachment

Overview

Longhorn can automatically reattach and then remount a volume after an unexpected detachment, e.g., one caused by a Kubernetes upgrade or a Docker reboot.
After the reattachment and remount complete, users may still need to restart the related workload containers manually to restore access to the volume, unless the recommended setup below has been applied.

Recommended setup when using Longhorn volumes

To recover unexpectedly detached volumes automatically, set restartPolicy to Always and add a livenessProbe to the workloads that use Longhorn volumes. Those workloads will then be restarted automatically after the reattachment and remount.

Here is an example of this setup:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - ls
        - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc

The directory checked by the livenessProbe should be <volumeMount.mountPath>/lost+found, i.e., /data/lost+found in the example above.
Don't set livenessProbe.periodSeconds to a short interval, e.g., 1s, because running the liveness command consumes CPU.
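
To make the probe's behavior concrete, the following shell sketch mimics the exec command outside Kubernetes. The helper name probe_mount and the temporary directory are illustrative assumptions and not part of Longhorn; only the `ls <mountPath>/lost+found` check is taken from the setup above. It assumes the volume's filesystem keeps a lost+found directory at its root, as ext2/3/4 filesystems do.

```shell
#!/bin/sh
# probe_mount is a hypothetical helper that mirrors the exec livenessProbe:
# it succeeds only while <mountPath>/lost+found can be listed.
probe_mount() {
    ls "$1/lost+found" >/dev/null 2>&1
}

# Simulate a healthy mount with a temporary directory.
mount_path=$(mktemp -d)
mkdir "$mount_path/lost+found"
if probe_mount "$mount_path"; then healthy_result=0; else healthy_result=$?; fi

# Simulate a stale mount: lost+found is no longer reachable.
rmdir "$mount_path/lost+found"
if probe_mount "$mount_path"; then stale_result=0; else stale_result=$?; fi

echo "healthy mount: exit $healthy_result"
echo "stale mount: exit $stale_result"
rm -rf "$mount_path"
```

Inside a Pod, a non-zero exit counts as a probe failure. Once the failure threshold is reached (3 consecutive failures by default), the kubelet restarts the container, which picks up the remounted volume.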

Manually restart workload containers

This solution applies only if both of the following are true:

The Longhorn volume has been reattached automatically.
The recommended setup above was not applied when the related workload was launched.

Steps

Find the node on which the related workload's containers are running

kubectl -n <namespace of your workload> get pods <workload's pod name> -o wide

Connect to that node, e.g., via ssh
Find the containers that belong to the workload

docker ps

The COMMAND and NAMES columns of the output identify the corresponding container.

Restart the container

docker restart <the container ID of the workload>
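
The lookup and restart on the node can also be scripted. The sketch below rests on an assumption about the kubelet's Docker integration: that containers are named k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>, which is what the NAMES column usually shows. Check this against your own `docker ps` output before relying on it. The function names are hypothetical.

```shell
#!/bin/sh
# Build the expected container name prefix.
# Arguments: <namespace> <pod name> <container name>
# Assumption: kubelet names Docker containers
# k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>.
name_prefix() {
    printf 'k8s_%s_%s_%s_' "$3" "$2" "$1"
}

# Print the IDs of running containers whose name contains the prefix.
find_containers() {
    docker ps --quiet --filter "name=$(name_prefix "$1" "$2" "$3")"
}

# Restart every matching container (normally exactly one).
restart_workload() {
    for id in $(find_containers "$1" "$2" "$3"); do
        docker restart "$id"
    done
}

# Example, run on the node after sourcing this file:
#   restart_workload default volume-test volume-test
```

Filtering on the container name rather than only the Pod name keeps the Pod's infrastructure (pause) container out of the match, so only the workload container is restarted.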

Reason

Typically, the mount propagation of a workload's volume mount is not Bidirectional. As a result, the remount that Longhorn performs is not propagated into the workload containers until those containers are restarted.
