Remove longhorn.io docs from longhorn/longhorn

Signed-off-by: Catherine Luse <catherine.luse@gmail.com>
Catherine Luse 2020-04-13 20:15:49 -07:00 committed by Sheng Yang
parent eb5301b62a
commit 57785d98b8
24 changed files with 7 additions and 2086 deletions

README.md

@@ -33,255 +33,26 @@ The latest release of Longhorn is **v0.8.0**.
Longhorn is 100% open source software. Project source code is spread across a number of repos:
1. Longhorn engine -- Core controller/replica logic https://github.com/longhorn/longhorn-engine
1. Longhorn manager -- Longhorn orchestration, includes Flexvolume driver for Kubernetes https://github.com/longhorn/longhorn-manager
1. Longhorn manager -- Longhorn orchestration https://github.com/longhorn/longhorn-manager
1. Longhorn UI -- Dashboard https://github.com/longhorn/longhorn-ui
![Longhorn UI](./longhorn-ui.png)
# Requirements
1. Docker v1.13+
2. Kubernetes v1.14+.
3. `open-iscsi` has been installed on all the nodes of the Kubernetes cluster, and the `iscsid` daemon is running on all the nodes. A quick way to verify this on a node is shown after this list.
1. For GKE, Ubuntu is recommended as the guest OS image since it already contains open-iscsi.
2. For Debian/Ubuntu, use `apt-get install open-iscsi` to install.
3. For RHEL/CentOS, use `yum install iscsi-initiator-utils` to install.
4. For EKS with `EKS Kubernetes Worker AMI with AmazonLinux2 image`,
use `yum install iscsi-initiator-utils` to install. You may need to edit the cluster security group to allow SSH access.
4. The host filesystem must support the `file extents` feature on the nodes to store the data. Currently we support:
1. ext4
2. XFS
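A quick, hedged sanity check on a node (assuming the default `/var/lib/longhorn` data path already exists; adjust the path if you store data elsewhere):
```
# Verify the iSCSI daemon is running on this node
systemctl status iscsid
# Verify the filesystem backing the Longhorn data path is ext4 or XFS
df -Th /var/lib/longhorn
```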
For the installation requirements, refer to the [Longhorn documentation](https://longhorn.io/docs/install/requirements).
# Install
## On Kubernetes clusters Managed by Rancher 2.1 or newer
Longhorn can be installed on a Kubernetes cluster in several ways:
The easiest way to install Longhorn is to deploy Longhorn from Rancher Catalog.
1. On the Rancher UI, select the cluster and project you want to install Longhorn into. We recommend creating a new project, e.g. `Storage`, for Longhorn.
2. Navigate to the `Catalog Apps` screen. Select `Launch`, find Longhorn in the list. Select `View Details`, then click `Launch`. Longhorn will be installed in the `longhorn-system` namespace.
After Longhorn has been successfully installed, you can access the Longhorn UI by navigating to the `Catalog Apps` screen.
One benefit of installing Longhorn through the Rancher catalog is that Rancher provides authentication to the Longhorn UI.
If there is a new version of Longhorn available, you will see an `Upgrade Available` sign on the `Catalog Apps` screen. You can click the `Upgrade` button to upgrade Longhorn manager. See more about upgrades [here](#upgrade).
## On any Kubernetes cluster
### Install Longhorn with kubectl
You can install Longhorn on any Kubernetes cluster using the following command:
```
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
```
Google Kubernetes Engine (GKE) requires additional setup in order for Longhorn to function properly. If you are a GKE user, read [this page](docs/gke.md) before proceeding.
### Install Longhorn with Helm
First, you need to initialize Helm locally and [install Tiller into your Kubernetes cluster with RBAC](https://helm.sh/docs/using_helm/#role-based-access-control).
Then download Longhorn repository:
```
git clone https://github.com/longhorn/longhorn.git
```
Now use the following command to install Longhorn:
* Helm2
```
helm install ./longhorn/chart --name longhorn --namespace longhorn-system
```
* Helm3
```
kubectl create namespace longhorn-system
helm install longhorn ./longhorn/chart/ --namespace longhorn-system
```
---
Longhorn will be installed in the namespace `longhorn-system`
One of the two available drivers (CSI and Flexvolume) will be chosen automatically based on the version of Kubernetes you use. See [here](docs/driver.md) for details.
A successful CSI-based deployment looks like this:
```
# kubectl -n longhorn-system get pod
NAME READY STATUS RESTARTS AGE
compatible-csi-attacher-d9fb48bcf-2rzmb 1/1 Running 0 8m58s
csi-attacher-78bf9b9898-grn2c 1/1 Running 0 32s
csi-attacher-78bf9b9898-lfzvq 1/1 Running 0 8m59s
csi-attacher-78bf9b9898-r64sv 1/1 Running 0 33s
csi-provisioner-8599d5bf97-c8r79 1/1 Running 0 33s
csi-provisioner-8599d5bf97-fc5pz 1/1 Running 0 33s
csi-provisioner-8599d5bf97-p9psl 1/1 Running 0 8m59s
csi-resizer-586665f745-b7p6h 1/1 Running 0 8m59s
csi-resizer-586665f745-kgdxs 1/1 Running 0 33s
csi-resizer-586665f745-vsvvq 1/1 Running 0 33s
engine-image-ei-e10d6bf5-pv2s6 1/1 Running 0 9m30s
instance-manager-e-379373af 1/1 Running 0 8m41s
instance-manager-r-101f13ba 1/1 Running 0 8m40s
longhorn-csi-plugin-7v2dc 4/4 Running 0 8m59s
longhorn-driver-deployer-775897bdf6-k4sfd 1/1 Running 0 10m
longhorn-manager-79xgj 1/1 Running 0 9m50s
longhorn-ui-9fbb5445-httqf 0/1 Running 0 33s
```
### Accessing the UI
> For Longhorn v0.8.0+, UI service type has been changed from `LoadBalancer` to `ClusterIP`
You can run `kubectl -n longhorn-system get svc` to get Longhorn UI service:
```
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
longhorn-backend ClusterIP 10.20.248.250 <none> 9500/TCP 58m
longhorn-frontend ClusterIP 10.20.245.110 <none> 80/TCP 58m
```
To access Longhorn UI when installed from YAML manifest, you need to create an ingress controller.
See more about how to create an Nginx ingress controller with basic authentication [here](https://github.com/longhorn/longhorn/blob/master/docs/longhorn-ingress.md)
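If you only need temporary access (for example, before an ingress is set up), a `kubectl` port-forward to the frontend service is a quick, non-production alternative:
```
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# then open http://localhost:8080 in a browser
```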
# Upgrade
[See here](docs/upgrade.md) for details.
## Upgrade Longhorn manager
##### On Kubernetes clusters Managed by Rancher 2.1 or newer
Follow [the same steps for installation](#install) to upgrade Longhorn manager
##### Using kubectl
```
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
```
##### Using Helm
```
helm upgrade longhorn ./longhorn/chart
```
## Upgrade Longhorn engine
After Longhorn Manager has been upgraded, Longhorn Engine also needs to be upgraded using the Longhorn UI. [See here](docs/upgrade.md) for details.
# Create Longhorn Volumes
Before you create Kubernetes volumes, you must first create a storage class. Use the following command to create a StorageClass called `longhorn`.
```
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/storageclass.yaml
```
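For reference, the StorageClass created by that manifest looks roughly like the sketch below; the parameter values here are illustrative and may differ from the file shipped with your Longhorn version.
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880" # in minutes
```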
Now you can create a pod using Longhorn like this:
```
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/pvc.yaml
```
The above yaml file contains two parts:
1. Create a PVC using Longhorn StorageClass.
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```
2. Use it in a Pod as a persistent volume:
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
```
More examples are available at `./examples/`
- [kubectl](https://longhorn.io/docs/install/install-with-kubectl/)
- [Helm](https://longhorn.io/docs/install/install-with-helm/)
- [Rancher catalog app](https://longhorn.io/docs/install/install-with-rancher/)
# Documentation
### [Snapshot and Backup](./docs/snapshot-backup.md)
### [Volume operations](./docs/volume.md)
### [Settings](./docs/settings.md)
### [Multiple disks](./docs/multidisk.md)
### [iSCSI](./docs/iscsi.md)
### [Kubernetes workload in Longhorn UI](./docs/k8s-workload.md)
### [Storage Tags](./docs/storage-tags.md)
### [Customized default setting](./docs/customized-default-setting.md)
### [Taint Toleration](./docs/taint-toleration.md)
### [Volume Expansion](./docs/expansion.md)
### [Restoring Stateful Set volumes](./docs/restore_statefulset.md)
### [Google Kubernetes Engine](./docs/gke.md)
### [Deal with Kubernetes node failure](./docs/node-failure.md)
### [Use CSI driver on RancherOS/CoreOS + RKE or K3S](./docs/csi-config.md)
### [Restore a backup to an image file](./docs/restore-to-file.md)
### [Disaster Recovery Volume](./docs/dr-volume.md)
### [Recover volume after unexpected detachment](./docs/recover-volume.md)
# Troubleshooting
You can click the `Generate Support Bundle` link at the bottom of the UI to download a zip file that contains Longhorn-related configuration and logs.
See [here](./docs/troubleshooting.md) for the troubleshooting guide.
# Uninstall Longhorn
### Using kubectl
1. To prevent damaging the Kubernetes cluster, we recommend deleting all Kubernetes workloads using Longhorn volumes (PersistentVolume, PersistentVolumeClaim, StorageClass, Deployment, StatefulSet, DaemonSet, etc) first.
2. Create the uninstallation job to clean up CRDs from the system and wait for success:
```
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/uninstall/uninstall.yaml
kubectl get job/longhorn-uninstall -w
```
Example output:
```
$ kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/uninstall/uninstall.yaml
serviceaccount/longhorn-uninstall-service-account created
clusterrole.rbac.authorization.k8s.io/longhorn-uninstall-role created
clusterrolebinding.rbac.authorization.k8s.io/longhorn-uninstall-bind created
job.batch/longhorn-uninstall created
$ kubectl get job/longhorn-uninstall -w
NAME COMPLETIONS DURATION AGE
longhorn-uninstall 0/1 3s 3s
longhorn-uninstall 1/1 20s 20s
^C
```
3. Remove remaining components:
```
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/master/uninstall/uninstall.yaml
```
Tip: If you try `kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml` first and get stuck there,
pressing `Ctrl-C` and then running `kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/uninstall/uninstall.yaml` can also help you remove Longhorn. Finally, don't forget to clean up the remaining components.
### Using Helm
```
helm delete longhorn --purge
```
The official Longhorn documentation is [here.](https://longhorn.io/docs)
## Community
Longhorn is open source software, so contributions are greatly welcome. Please read the [Code of Conduct](./CODE_OF_CONDUCT.md) and [Contributing Guideline](./CONTRIBUTING.md) before contributing.


@@ -1,57 +0,0 @@
# Rancher Longhorn Chart
The following document pertains to running Longhorn from the Rancher 2.0 chart.
## Source Code
Longhorn is 100% open source software. Project source code is spread across a number of repos:
1. Longhorn Engine -- Core controller/replica logic https://github.com/rancher/longhorn-engine
2. Longhorn Manager -- Longhorn orchestration, includes Flexvolume driver for Kubernetes https://github.com/rancher/longhorn-manager
3. Longhorn UI -- Dashboard https://github.com/rancher/longhorn-ui
## Prerequisites
1. Rancher v2.1+
2. Docker v1.13+
3. Kubernetes v1.8+ cluster with 1 or more nodes and Mount Propagation feature enabled. If your Kubernetes cluster was provisioned by Rancher v2.0.7+ or later, MountPropagation feature is enabled by default. [Check your Kubernetes environment now](https://github.com/rancher/longhorn#environment-check-script). If MountPropagation is disabled, the Kubernetes Flexvolume driver will be deployed instead of the default CSI driver. Base Image feature will also be disabled if MountPropagation is disabled.
4. Make sure `curl`, `findmnt`, `grep`, `awk` and `blkid` have been installed on all nodes of the Kubernetes cluster.
5. Make sure `open-iscsi` has been installed on all nodes of the Kubernetes cluster. For GKE, Ubuntu is recommended as the guest OS image since it already contains `open-iscsi`.
## Uninstallation
1. To prevent damage to the Kubernetes cluster, we recommend deleting all Kubernetes workloads using Longhorn volumes (PersistentVolume, PersistentVolumeClaim, StorageClass, Deployment, StatefulSet, DaemonSet, etc).
2. From the Rancher UI, navigate to the `Catalog Apps` tab and delete the Longhorn app.
## Troubleshooting
### I deleted the Longhorn App from Rancher UI instead of following the uninstallation procedure
Redeploy the (same version) Longhorn App. Follow the uninstallation procedure above.
### Problems with CRDs
If your CRD instances or the CRDs themselves can't be deleted for whatever reason, run the commands below to clean up. Caution: this will wipe all Longhorn state!
```
# Delete CRD finalizers, instances and definitions
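# NOTE: set NAMESPACE to the namespace Longhorn is installed in before running,
# e.g. NAMESPACE=longhorn-system (assumed default for this chart)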
for crd in $(kubectl get crd -o jsonpath={.items[*].metadata.name} | tr ' ' '\n' | grep longhorn.rancher.io); do
kubectl -n ${NAMESPACE} get $crd -o yaml | sed "s/\- longhorn.rancher.io//g" | kubectl apply -f -
kubectl -n ${NAMESPACE} delete $crd --all
kubectl delete crd/$crd
done
```
### Volume can be attached/detached from UI, but Kubernetes Pod/StatefulSet etc cannot use it
Check whether the volume plugin directory has been set correctly. It is detected automatically unless the user explicitly sets it.
By default, Kubernetes uses `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`, as stated in the [official document](https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#prerequisites).
Some vendors choose to change the directory for various reasons. For example, GKE uses `/home/kubernetes/flexvolume` instead.
Users can find the correct directory by running `ps aux | grep kubelet` on the host and checking the `--volume-plugin-dir` parameter. If it is not set, the default `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/` will be used. A quick check is shown below.
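A quick way to check this on a node (output depends on how the kubelet was started; no output means the flag was not set):
```
ps aux | grep -v grep | grep kubelet | grep -o -- '--volume-plugin-dir[= ][^ ]*'
```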
---
Please see [link](https://github.com/rancher/longhorn) for more information.


@@ -1,53 +0,0 @@
# Longhorn CSI on K3S
## Requirements
1. Kubernetes v1.11 or higher.
2. Longhorn v0.4.1 or higher.
## Instruction
#### K3S:
##### 1. For Longhorn v0.7.0 and above
Longhorn v0.7.0 and above support k3s v0.10.0 and above only by default.
If you want to deploy these new Longhorn versions on versions before k3s v0.10.0, you need to set `--kubelet-root-dir` to `<data-dir>/agent/kubelet` for the Deployment `longhorn-driver-deployer` in `longhorn/deploy/longhorn.yaml`.
`data-dir` is a `k3s` arg and it can be set when you launch a k3s server. By default it is `/var/lib/rancher/k3s`.
##### 2. For Longhorn before v0.7.0
Longhorn versions before v0.7.0 support k3s below v0.10.0 only by default.
If you want to deploy these older Longhorn versions on k3s v0.10.0 and above, you need to set `--kubelet-root-dir` to `/var/lib/kubelet` for the Deployment `longhorn-driver-deployer` in `longhorn/deploy/longhorn.yaml`, as sketched below.
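For illustration only, a hedged sketch of what that override can look like in the `longhorn-driver-deployer` container spec of `longhorn/deploy/longhorn.yaml` (the surrounding flags and exact layout vary by Longhorn version; the default k3s `data-dir` is assumed for case 1):
```
# Excerpt (illustrative); existing flags such as the manager image/URL are omitted.
command:
- longhorn-manager
- -d
- deploy-driver
- --kubelet-root-dir
- /var/lib/rancher/k3s/agent/kubelet   # case 1 above; use /var/lib/kubelet for case 2
```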
## Troubleshooting
### Common issues
#### Failed to get arg root-dir: Cannot get kubelet root dir, no related proc for root-dir detection ...
This error occurs because Longhorn cannot detect where the root dir is set up for the kubelet, so the CSI plugin installation fails.
Users can override the root-dir detection by manually setting the `kubelet-root-dir` argument here:
https://github.com/rancher/longhorn/blob/master/deploy/longhorn.yaml#L329
**For K3S v0.10.0-**
Run `ps aux | grep k3s` on the k3s server node and find the `--data-dir` or `-d` argument,
e.g.:
```
$ ps uax | grep k3s
root 4160 0.0 0.0 51420 3948 pts/0 S+ 00:55 0:00 sudo /usr/local/bin/k3s server --data-dir /opt/test/k3s/data/dir
root 4161 49.0 4.0 259204 164292 pts/0 Sl+ 00:55 0:04 /usr/local/bin/k3s server --data-dir /opt/test/k3s/data/dir
```
You will find `data-dir` in the command line of the `k3s` process. By default it is not set and `/var/lib/rancher/k3s` will be used. Joining `data-dir` with `/agent/kubelet` gives you the `root-dir`. So the default `root-dir` for K3S is `/var/lib/rancher/k3s/agent/kubelet`.
If K3S is using a configuration file, you would need to check the configuration file to locate the `data-dir` parameter.
**For K3S v0.10.0+**
It is always `/var/lib/kubelet`
## Background
#### Longhorn versions before v0.7.0 don't work on K3S v0.10.0 or above
K3S now sets its kubelet directory to `/var/lib/kubelet`. See [the K3S release comment](https://github.com/rancher/k3s/releases/tag/v0.10.0) for details.
## Reference
https://github.com/kubernetes-csi/driver-registrar


@@ -1,91 +0,0 @@
# Customized Default Setting
## Overview
During Longhorn system deployment, users can customize the default settings for Longhorn. e.g. specify `Create Default Disk With Node Labeled` and `Default Data Path` before starting the Longhorn system.
## Usage
### Note:
1. These default settings only apply to a Longhorn system that hasn't been deployed yet. They have no impact on an existing Longhorn system.
2. Users should modify the settings of an existing Longhorn system via the UI.
### Via Rancher UI
[Cluster] -> System -> Apps -> Launch -> longhorn -> LONGHORN DEFAULT SETTINGS
### Via Longhorn deployment yaml file
1. Download the longhorn repo:
```
git clone https://github.com/longhorn/longhorn.git
```
2. Modify the config map named `longhorn-default-setting` in the yaml file `longhorn/deploy/longhorn.yaml`. For example:
```
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-default-setting
  namespace: longhorn-system
data:
  default-setting.yaml: |-
    backup-target: s3://backupbucket@us-east-1/backupstore
    backup-target-credential-secret: minio-secret
    create-default-disk-labeled-nodes: true
    default-data-path: /var/lib/longhorn-example/
    replica-soft-anti-affinity: false
    storage-over-provisioning-percentage: 600
    storage-minimal-available-percentage: 15
    upgrade-checker: false
    default-replica-count: 2
    guaranteed-engine-cpu:
    default-longhorn-static-storage-class: longhorn-static-example
    backupstore-poll-interval: 500
    taint-toleration: key1=value1:NoSchedule; key2:NoExecute
---
```
### Via helm
1. Download the chart in the longhorn repo:
```
git clone https://github.com/longhorn/longhorn.git
```
2.1. Use helm command with `--set` flag to modify the default settings.
For example:
```
helm install ./longhorn/chart --name longhorn --namespace longhorn-system --set defaultSettings.taintToleration="key1=value1:NoSchedule; key2:NoExecute"
```
2.2. Or directly modify the default settings in the yaml file `longhorn/chart/values.yaml`, then use the helm command without `--set` to deploy Longhorn.
For example:
In `longhorn/chart/values.yaml`:
```
defaultSettings:
  backupTarget: s3://backupbucket@us-east-1/backupstore
  backupTargetCredentialSecret: minio-secret
  createDefaultDiskLabeledNodes: true
  defaultDataPath: /var/lib/longhorn-example/
  replicaSoftAntiAffinity: false
  storageOverProvisioningPercentage: 600
  storageMinimalAvailablePercentage: 15
  upgradeChecker: false
  defaultReplicaCount: 2
  guaranteedEngineCPU:
  defaultLonghornStaticStorageClass: longhorn-static-example
  backupstorePollInterval: 500
  taintToleration: key1=value1:NoSchedule; key2:NoExecute
```
Then use helm command:
```
helm install ./longhorn/chart --name longhorn --namespace longhorn-system
```
For more info about using helm, see:
[Install-Longhorn-with-helm](../README.md#install-longhorn-with-helm)
## History
[Original feature request](https://github.com/longhorn/longhorn/issues/623)
Available since v0.6.0


@@ -1,53 +0,0 @@
# Disaster Recovery Volume
## What is Disaster Recovery Volume?
To increase the resiliency of volumes, Longhorn supports disaster recovery volumes.
A disaster recovery volume is designed for a backup cluster, in case the whole main cluster goes down.
A disaster recovery volume is normally in standby mode. Users need to activate it before using it as a normal volume.
A disaster recovery volume can be created from a volume's backup in the backup store. Longhorn will monitor its
original backup volume and incrementally restore from the latest backup. Once the original volume in the main cluster goes
down and users decide to activate the disaster recovery volume in the backup cluster, the disaster recovery volume can be
activated immediately in most conditions, which greatly reduces the time needed to restore the data from the
backup store to the volume in the backup cluster.
## How to create Disaster Recovery Volume?
1. In cluster A, make sure the original volume X has a backup created or recurring backups scheduled.
2. Set the backup target in cluster B to be the same as cluster A's.
3. On the backup page of cluster B, choose the backup volume X, then create disaster recovery volume Y. It's highly recommended
to use the backup volume name as the disaster recovery volume name.
4. Attach the disaster recovery volume Y to any node. Longhorn will then automatically poll for the last backup of
volume X and incrementally restore it to volume Y.
5. If volume X goes down, users can activate volume Y immediately. Once activated, volume Y will become a
normal Longhorn volume.
5.1. Note that deactivating a normal volume is not allowed.
## About Activating Disaster Recovery Volume
1. A disaster recovery volume doesn't support creating/deleting/reverting snapshots, creating backups, or creating
PV/PVCs. Users cannot update `Backup Target` in Settings if any disaster recovery volumes exist.
2. When users try to activate a disaster recovery volume, Longhorn will check the last backup of the original volume. If
it hasn't been restored yet, the restoration will be started, and the activate action will fail. Users need to wait for
the restoration to complete before retrying.
3. For a disaster recovery volume, `Last Backup` indicates the most recent backup of its original backup volume. If the icon
representing the disaster recovery volume is gray, the volume is still restoring the `Last Backup` and users cannot activate the
volume right now; if the icon is blue, the volume has restored the `Last Backup`.
## RPO and RTO
Typically incremental restoration is triggered by the periodic backup store update. Users can set the backup store update
interval in `Setting - General - Backupstore Poll Interval`. Note that this interval can potentially impact the
Recovery Time Objective (RTO). If it is too long, there may be a large amount of data for the disaster recovery volume to
restore, which will take a long time. As for the Recovery Point Objective (RPO), it is determined by the recurring backup
schedule of the backup volume. You can check [here](snapshot-backup.md) to see how to set recurring backups in Longhorn.
e.g.:
If the recurring backup schedule for normal volume A creates a backup every hour, then the RPO is 1 hour.
Assuming the volume creates a backup every hour, and incrementally restoring the data of one backup takes 5 minutes:
If `Backupstore Poll Interval` is 30 minutes, then there will be at most one backup's worth of data since the last restoration.
The time for restoring one backup is 5 minutes, so the RTO is 5 minutes.
If `Backupstore Poll Interval` is 12 hours, then there will be at most 12 backups' worth of data since the last restoration.
The time for restoring the backups is 5 * 12 = 60 minutes, so the RTO is 60 minutes.


@@ -1,115 +0,0 @@
# Kubernetes driver
## Background
Longhorn can be used in Kubernetes to provide persistent storage through either Longhorn Container Storage Interface (CSI) driver or Longhorn FlexVolume driver. Longhorn will automatically deploy one of the drivers, depending on the Kubernetes cluster configuration. User can also specify the driver in the deployment yaml file. CSI is preferred.
Note that a volume created and used through one driver won't be recognized by Kubernetes using the other driver. So please don't switch drivers (e.g. during upgrade) if you have existing volumes created using the old driver. If you really want to switch drivers, see [here](upgrade.md#migrating-between-flexvolume-and-csi-driver) for instructions.
## CSI
### Requirement for the CSI driver
1. Kubernetes v1.10+
1. CSI is in beta release for this version of Kubernetes, and enabled by default.
2. Mount propagation feature gate enabled.
1. It's enabled by default in Kubernetes v1.10. But some early versions of RKE may not enable it.
2. You can check it by using [environment check script](#environment-check-script).
3. If the above conditions cannot be met, Longhorn will fall back to the FlexVolume driver.
### Check if your setup satisfies the CSI requirements
1. Use the following command to check your Kubernetes server version
```
kubectl version
```
Result:
```
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:14:26Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
```
The `Server Version` should be `v1.10` or above.
2. The result of [environment check script](#environment-check-script) should contain `MountPropagation is enabled!`.
### Environment check script
We've written a script to help users gather enough information about these factors.
Before installing, run:
```
curl -sSfL https://raw.githubusercontent.com/rancher/longhorn/master/scripts/environment_check.sh | bash
```
Example result:
```
daemonset.apps/longhorn-environment-check created
waiting for pods to become ready (0/3)
all pods ready (3/3)
MountPropagation is enabled!
cleaning up...
daemonset.apps "longhorn-environment-check" deleted
clean up complete
```
### Successful CSI deployment example
```
$ kubectl -n longhorn-system get pod
NAME READY STATUS RESTARTS AGE
csi-attacher-6fdc77c485-8wlpg 1/1 Running 0 9d
csi-attacher-6fdc77c485-psqlr 1/1 Running 0 9d
csi-attacher-6fdc77c485-wkn69 1/1 Running 0 9d
csi-provisioner-78f7db7d6d-rj9pr 1/1 Running 0 9d
csi-provisioner-78f7db7d6d-sgm6w 1/1 Running 0 9d
csi-provisioner-78f7db7d6d-vnjww 1/1 Running 0 9d
engine-image-ei-6e2b0e32-2p9nk 1/1 Running 0 9d
engine-image-ei-6e2b0e32-s8ggt 1/1 Running 0 9d
engine-image-ei-6e2b0e32-wgkj5 1/1 Running 0 9d
longhorn-csi-plugin-g8r4b 2/2 Running 0 9d
longhorn-csi-plugin-kbxrl 2/2 Running 0 9d
longhorn-csi-plugin-wv6sb 2/2 Running 0 9d
longhorn-driver-deployer-788984b49c-zzk7b 1/1 Running 0 9d
longhorn-manager-nr5rs 1/1 Running 0 9d
longhorn-manager-rd4k5 1/1 Running 0 9d
longhorn-manager-snb9t 1/1 Running 0 9d
longhorn-ui-67b9b6887f-n7x9q 1/1 Running 0 9d
```
For more information on CSI configuration, see [here](csi-config.md).
## Flexvolume
### Requirement for the FlexVolume driver
1. Kubernetes v1.8+
2. Make sure `curl`, `findmnt`, `grep`, `awk` and `blkid` have been installed on every node of the Kubernetes cluster.
### Flexvolume driver directory
Longhorn now has the ability to auto-detect the location of the Flexvolume directory.
If the Flexvolume driver wasn't installed correctly, there can be a few reasons:
1. If `kubelet` is running inside a container rather than running on the host OS, the host bind-mount path for the Flexvolume driver directory (`--volume-plugin-dir`) must be the same as the path used by the kubelet process.
1. For example, if the kubelet is using `/var/lib/kubelet/volumeplugins` as
the Flexvolume driver directory, then the host bind-mount must exist for that
directory, as e.g. `/var/lib/kubelet/volumeplugins:/var/lib/kubelet/volumeplugins` or any identical bind-mount for the parent directory.
    2. This is because Longhorn detects the directory used by the `kubelet` command line to decide where to install the driver on the host.
2. The kubelet setting for the Flexvolume driver directory must be the same across all the nodes.
1. Longhorn doesn't support heterogeneous setup at the moment.
### Successful Flexvolume deployment example
```
# kubectl -n longhorn-system get pod
NAME READY STATUS RESTARTS AGE
engine-image-ei-57b85e25-8v65d 1/1 Running 0 7d
engine-image-ei-57b85e25-gjjs6 1/1 Running 0 7d
engine-image-ei-57b85e25-t2787 1/1 Running 0 7d
longhorn-driver-deployer-5469b87b9c-b9gm7 1/1 Running 0 2h
longhorn-flexvolume-driver-lth5g 1/1 Running 0 2h
longhorn-flexvolume-driver-tpqf7 1/1 Running 0 2h
longhorn-flexvolume-driver-v9mrj 1/1 Running 0 2h
longhorn-manager-7x8x8 1/1 Running 0 9h
longhorn-manager-8kqf4 1/1 Running 0 9h
longhorn-manager-kln4h 1/1 Running 0 9h
longhorn-ui-f849dcd85-cgkgg 1/1 Running 0 5d
```


@@ -1,101 +0,0 @@
# Volume Expansion
## Overview
- Longhorn supports OFFLINE volume expansion only.
- Longhorn will expand frontend (e.g. block device) then expand filesystem.
## Prerequisite:
1. Longhorn version v0.8.0 or higher.
2. The volume to be expanded is in the `detached` state.
## Expand a Longhorn volume
There are two ways to expand a Longhorn volume:
#### Via PVC
- This method is applied only if:
1. Kubernetes version v1.16 or higher.
2. The PVC is dynamically provisioned by Kubernetes with a Longhorn StorageClass.
3. The field `allowVolumeExpansion` should be `true` in the related StorageClass.
- This method is recommended if it's applicable, since the PVC and PV will be updated automatically and everything stays consistent after the expansion.
- Usage: Find the corresponding PVC for the Longhorn volume, then modify the requested `storage` in the PVC spec, e.g.:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"longhorn-simple-pvc","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"longhorn"}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
  creationTimestamp: "2019-12-21T01:36:16Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: longhorn-simple-pvc
  namespace: default
  resourceVersion: "162431"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/longhorn-simple-pvc
  uid: 0467ae73-22a5-4eba-803e-464cc0b9d975
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn
  volumeMode: Filesystem
  volumeName: pvc-0467ae73-22a5-4eba-803e-464cc0b9d975
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound
```
Modify `spec.resources.requests.storage` of this PVC.
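For example, a hedged one-liner to bump the request on the PVC shown above (the namespace and new size are illustrative):
```
kubectl -n default patch pvc longhorn-simple-pvc \
  --type merge -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'
```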
#### Via Longhorn UI
- If your Kubernetes version is v1.14 or v1.15, this method is the only choice for Longhorn volume expansion.
- Notice that the volume size will be updated after the expansion, but the capacity of the corresponding PVC and PV won't change. Users need to take care of them.
- Usage: On the volume page of Longhorn UI, click `Expand` for the volume.
## Frontend expansion
- To prevent the frontend expansion from being interfered with by unexpected data R/W, Longhorn supports OFFLINE expansion only.
The `detached` volume will be automatically attached to a random node in maintenance mode.
- Rebuilding/adding replicas is not allowed during the expansion and vice versa.
## Filesystem expansion
#### Longhorn will try to expand the file system only if:
1. The expanded size is greater than the current size.
2. There is a Linux filesystem in the Longhorn volume.
3. The filesystem used in the Longhorn volume is one of the following:
1. ext4
2. XFS
4. The Longhorn volume is using block device frontend.
#### Handling volume revert:
If users revert a volume to a snapshot with a smaller size, the frontend of the volume still holds the expanded size, but the filesystem size will be the same as that of the reverted snapshot. In this case, users need to handle the filesystem manually:
1. Attach the volume to a random node.
2. Log into the corresponding node and expand the filesystem:
- If the filesystem is `ext4`, the volume might need to be mounted and unmounted once before resizing the filesystem manually. Otherwise, executing `resize2fs` might result in an error:
```
resize2fs: Superblock checksum does not match superblock while trying to open ......
Couldn't find valid filesystem superblock.
```
Follow the steps below to resize the filesystem:
```
mount /dev/longhorn/<volume name> <arbitrary mount directory>
umount /dev/longhorn/<volume name>
mount /dev/longhorn/<volume name> <arbitrary mount directory>
resize2fs /dev/longhorn/<volume name>
umount /dev/longhorn/<volume name>
```
- If the filesystem is `xfs`, users can directly mount then expand the filesystem.
```
mount /dev/longhorn/<volume name> <arbitrary mount directory>
xfs_growfs <the mount directory>
umount /dev/longhorn/<volume name>
```


@@ -1,12 +0,0 @@
# Google Kubernetes Engine
1. GKE clusters must use `Ubuntu` OS instead of `Container-Optimized` OS, in order to satisfy Longhorn `open-iscsi` dependency.
2. GKE requires the user to manually grant themselves the cluster-admin role to enable RBAC. Before installing Longhorn, run the following command:
```
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<name@example.com>
```
where `name@example.com` is the user's account name in GCE, and it's case sensitive. See [this document](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control) for more information.


@@ -1,24 +0,0 @@
# iSCSI support
Longhorn supports iSCSI target frontend mode. The user can connect to it
through any iSCSI client, including open-iscsi, and virtual machine
hypervisors like KVM, as long as it's in the same network as the Longhorn system.
Longhorn Driver (CSI/Flexvolume) doesn't support iSCSI mode.
To start a volume with the iSCSI target frontend mode, select `iSCSI` as the frontend
when creating the volume. After the volume has been attached, the user will see
something like the following in the `endpoint` field:
```
iscsi://10.42.0.21:3260/iqn.2014-09.com.rancher:testvolume/1
```
Here:
1. The IP and port are `10.42.0.21:3260`.
2. The target name is `iqn.2014-09.com.rancher:testvolume`. `testvolume` is the
name of the volume.
3. The LUN number is 1. Longhorn always uses LUN 1.
The user can then use the above information to connect to the iSCSI target provided by
Longhorn using an iSCSI client, for example as shown below.
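A hedged example using `open-iscsi` (`iscsiadm`), assuming the endpoint shown above:
```
# Discover the target exposed by Longhorn
iscsiadm -m discovery -t sendtargets -p 10.42.0.21:3260
# Log in to the target; the block device then shows up on the client node
iscsiadm -m node -T iqn.2014-09.com.rancher:testvolume -p 10.42.0.21:3260 --login
```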


@@ -1,39 +0,0 @@
# Workload identification for volume
Now users can identify current workloads or workload history for existing Longhorn volumes.
```
PV Name: test1-pv
PV Status: Bound
Namespace: default
PVC Name: test1-pvc
Last Pod Name: volume-test-1
Last Pod Status: Running
Last Workload Name: volume-test
Last Workload Type: Statefulset
Last time used by Pod: a few seconds ago
```
## About historical status
There are a few fields that can contain the historical status instead of the current status.
Those fields can be used to help users figure out which workload has used the volume in the past:
1. `Last time bound with PVC`: If this field is set, it indicates that there is currently no bound PVC for this volume.
The related fields will show the most recently bound PVC.
2. `Last time used by Pod`: If these fields are set, they indicate that there is currently no workload using this volume.
The related fields will show the most recent workload using this volume.
# PV/PVC creation for existing Longhorn volume
Now users can create PV/PVC via our Longhorn UI for the existing Longhorn volumes.
Only a detached volume can be used by a newly created pod.
## About special fields of PV/PVC
Since the Longhorn volume already exists when the PV/PVC is created, a StorageClass is not needed for dynamically provisioning
the Longhorn volume. However, the field `storageClassName` will be set in the PVC/PV for PVC binding purposes, and
it's unnecessary for users to create the related StorageClass object.
By default the StorageClass for Longhorn-created PV/PVCs is `longhorn-static`. Users can modify it in
`Setting - General - Default Longhorn Static StorageClass Name` as needed.
Users need to manually delete PVCs and PVs created by Longhorn.


@@ -1,49 +0,0 @@
## Create Nginx Ingress Controller with basic authentication
1. Create a basic auth file `auth`:
> It's important that the generated file is named auth (actually, that the secret has a key `data.auth`), otherwise the ingress-controller returns a 503.
`$ USER=<USERNAME_HERE>; PASSWORD=<PASSWORD_HERE>; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" >> auth`
2. Create a secret
`$ kubectl -n longhorn-system create secret generic basic-auth --from-file=auth`
3. Create an Nginx ingress controller manifest `longhorn-ingress.yml` :
```
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required '
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: longhorn-frontend
          servicePort: 80
```
4. Create the ingress controller:
`$ kubectl -n longhorn-system apply -f longhorn-ingress.yml`
#### For AWS EKS clusters:
Users need to create an ELB to expose the nginx ingress controller to the internet (additional costs may apply).
1. Create pre-requisite resources:
https://github.com/kubernetes/ingress-nginx/blob/master/docs/deploy/index.md#prerequisite-generic-deployment-command
2. Create ELB:
https://github.com/kubernetes/ingress-nginx/blob/master/docs/deploy/index.md#aws


@@ -1,44 +0,0 @@
# Multiple disks support
Longhorn supports using more than one disk on the nodes to store the volume data.
By default, `/var/lib/longhorn` on the host will be used for storing the volume data. You can avoid using the default directory by adding a new disk, then disable scheduling for `/var/lib/longhorn`.
## Add a disk
To add a new disk for a node, head to the `Node` tab, select one of the nodes, and select `Edit Disks` in the drop-down menu.
To add any additional disks, the user needs to:
1. Mount the disk on the host to a certain directory.
2. Add the path of the mounted disk into the disk list of the node.
Longhorn will detect the storage information (e.g. maximum space, available space) about the disk automatically, and start scheduling to it if it's possible to accommodate the volume there. A path mounted by an existing disk won't be allowed.
Users can reserve a certain amount of space on the disk to stop Longhorn from using it. It can be set in the `Space Reserved` field for the disk. This is useful for non-dedicated storage disks on the node.
The kubelet needs to preserve node stability when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space. If such resources are exhausted, nodes become unstable. To avoid kubelet `Disk pressure` issues after scheduling several volumes, Longhorn by default reserves 30% of the root disk space (`/var/lib/longhorn`) to ensure node stability.
### Use an alternative path for disk on the node
If the users don't want to use the original mount path of a disk on the node, they can use `mount --bind` to create an alternative/alias path for the disk and then use it with Longhorn. Note that a soft link created with `ln -s` won't work, since it will not get populated correctly inside the pod.
Longhorn will identify the disk using the path, so the users need to make sure the alternative path is correctly mounted when the node reboots, e.g. by adding it to `fstab`, as in the sketch below.
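A hedged sketch, assuming the disk is already mounted at `/mnt/disk1` and `/var/lib/longhorn-disk1` is the alias path you then add in the Longhorn UI (both paths are illustrative):
```
# Create the alias path and bind-mount the existing disk onto it
mkdir -p /var/lib/longhorn-disk1
mount --bind /mnt/disk1 /var/lib/longhorn-disk1
# Persist the bind mount across reboots
echo '/mnt/disk1 /var/lib/longhorn-disk1 none bind 0 0' >> /etc/fstab
```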
## Remove a disk
Nodes and disks can be excluded from future scheduling. Note that any scheduled storage space won't be released automatically if scheduling was disabled for the node.
In order to remove a disk, two conditions need to be met:
1. The scheduling for the disk must be disabled
2. There is no existing replica using the disk, including replicas in an error state.
Once those two conditions are met, you should be allowed to remove the disk.
## Configuration
There are two global settings that affect the scheduling of the volume.
`StorageOverProvisioningPercentage` defines the upper bound of `ScheduledStorage / (MaximumStorage - ReservedStorage)` . The default value is `500` (%). That means we can schedule a total of 750 GiB Longhorn volumes on a 200 GiB disk with 50G reserved for the root file system. Because normally people won't use that large amount of data in the volume, and we store the volumes as sparse files.
`StorageMinimalAvailablePercentage` defines when a disk cannot be scheduled with more volumes. The default value is `10` (%). The bigger value between `MaximumStorage * StorageMinimalAvailablePercentage / 100` and `MaximumStorage - ReservedStorage` will be used to determine if a disk is running low and cannot be scheduled with more volumes.
Note that currently there is no guarantee that the space volumes use won't exceed the `StorageMinimalAvailablePercentage`, because:
1. A Longhorn volume can be bigger than its specified size, since a snapshot contains the old state of the volume.
2. Longhorn does over-provisioning by default.


@@ -1,30 +0,0 @@
# Node Failure Handling with Longhorn
## What to expect when a Kubernetes Node fails
When a Kubernetes node fails with CSI driver installed (all the following are based on Kubernetes v1.12 with default setup):
1. After **one minute**, `kubectl get nodes` will report `NotReady` for the failed node.
2. After about **five minutes**, the states of all the pods on the `NotReady` node will change to either `Unknown` or `NodeLost`.
3. If you're deploying using a StatefulSet or Deployment, you need to decide whether it's safe to force-delete the pod of the workload
running on the lost node. See [here](https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/).
1. StatefulSet has stable identity, so Kubernetes won't force deleting the Pod for the user.
2. Deployment doesn't have stable identity, but Longhorn is a Read-Write-Once type of storage, which means it can only be attached
to one Pod. So the new Pod created by Kubernetes won't be able to start, because the Longhorn volume is still attached to the old Pod
on the lost Node.
3. In both cases, Kubernetes will automatically evict the pod (set a deletion timestamp for the pod) on the lost node, then try to
**recreate a new one with the old volumes**. Because the evicted pod gets stuck in the `Terminating` state and the attached Longhorn volumes
cannot be released/reused, the new pod will get stuck in the `ContainerCreating` state. That's why users need to decide whether it's safe to force-delete the pod.
4. If you decide to delete the Pod manually (and forcefully), Kubernetes will take about another **six minutes** to delete the VolumeAttachment
object associated with the Pod, thus finally detaching the Longhorn volume from the lost Node and allowing it to be used by the new Pod.
- This additional six minutes is [hardcoded in Kubernetes](https://github.com/kubernetes/kubernetes/blob/5e31799701123c50025567b8534e1a62dbc0e9f6/pkg/controller/volume/attachdetach/attach_detach_controller.go#L95):
if the pod on the lost node is force-deleted, the related volumes won't be unmounted correctly. Kubernetes will then wait for this fixed timeout
to directly clean up the VolumeAttachment object.
## What to expect when recovering a failed Kubernetes Node
1. If the node is **back online within 5 - 6 minutes** of the failure, Kubernetes will restart pods, unmount then re-mount volumes without volume re-attaching and VolumeAttachment cleanup.
Because the volume engines would be down after the node goes down, this direct remount won't work since the device no longer exists on the node.
In this case, Longhorn will detach and re-attach the volumes to recover the volume engines, so that the pods can remount/reuse the volumes safely.
2. If the node is **not back online within 5 - 6 minutes** of the failure, Kubernetes will try to delete all unreachable pods based on the pod eviction mechanism and these pods will enter the `Terminating` state. See [pod eviction timeout](https://kubernetes.io/docs/concepts/architecture/nodes/#condition) for details.
Then if the failed node is recovered later, Kubernetes will restart those terminating pods, detach the volumes, wait for the old VolumeAttachment cleanup, and reuse(re-attach & re-mount) the volumes. Typically these steps may take 1 ~ 7 minutes.
In this case, detaching and re-attaching operations are already included in the Kubernetes recovery procedures. Hence no extra operation is needed and the Longhorn volumes will be available after the above steps.
3. For all the above recovery scenarios, Longhorn will handle those steps automatically, in coordination with Kubernetes. This section is meant to inform users of what happens and what to expect during the recovery.


@@ -1,88 +0,0 @@
# Recover volume after unexpected detachment
## Overview
1. Now Longhorn can automatically reattach then remount volumes if unexpected detachment happens. e.g., [Kubernetes upgrade](https://github.com/longhorn/longhorn/issues/703), [Docker reboot](https://github.com/longhorn/longhorn/issues/686).
2. After **reattachment** and **remount** complete, users may need to **manually restart the related workload containers** for the volume restoration **if the following recommended setup is not applied**.
#### Reattachment
Longhorn will reattach the volume if the volume engine dies unexpectedly.
#### Remount
- Longhorn will detect and remount the filesystem for the volume after the reattachment.
- But **the auto remount does not work for the `xfs` filesystem**.
- Mounting one more layer with an `xfs` filesystem is not allowed and will trigger the error `XFS (sdb): Filesystem has duplicate UUID <filesystem UUID> - can't mount`.
- Users need to manually unmount then mount the `xfs` filesystem on the host. The device path on host for the attached volume is `/dev/longhorn/<volume name>`
## Recommended setup when using Longhorn volumes
In order to recover unexpectedly detached volumes automatically, users can set `restartPolicy` to `Always` then add `livenessProbe` for the workloads using Longhorn volumes.
Then those workloads will be restarted automatically after reattachment and remount.
Here is one example for the setup:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  restartPolicy: Always
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - ls
        - /data/lost+found
      initialDelaySeconds: 5
      periodSeconds: 5
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
```
- The directory used in the `livenessProbe` will be `<volumeMount.mountPath>/lost+found`
- Don't set a short interval for `livenessProbe.periodSeconds`, e.g., 1s. The liveness command is CPU consuming.
## Manually restart workload containers
## This solution applies only if:
1. The Longhorn volume is reattached and remounted automatically.
2. The above setup is not included when the related workload is launched.
### Steps
1. Figure out on which node the related workload's containers are running
```
kubectl -n <namespace of your workload> get pods <workload's pod name> -o wide
```
2. Connect to the node. e.g., `ssh`
3. Figure out the containers belonging to the workload
```
docker ps
```
By checking the columns `COMMAND` and `NAMES` of the output, you can find the corresponding container
4. Restart the container
```
docker restart <the container ID of the workload>
```
### Reason
Typically the volume mount propagation is not `Bidirectional`. It means the Longhorn remount operation won't be propagated to the workload containers if the containers are not restarted.


@@ -1,27 +0,0 @@
# Use command restore-to-file
This command gives users the ability to restore a backup to a `raw` image or a `qcow2` image. If the backup is based on a backing file, users should provide the backing file as a `qcow2` image with the `--backing-file` parameter.
## Instruction
1. Copy the yaml template
1.1 Volume has no base image: Make a copy of `examples/restore_to_file.yaml.template` as e.g. `restore.yaml`.
1.2 Volume has a base image: Make a copy of `examples/restore_to_file_with_base_image.yaml.template` as e.g. `restore.yaml`, and set argument `backing-file` by replacing `<BASE_IMAGE>` with your base image, e.g. `rancher/longhorn-test:baseimage-ext4`.
2. Set the node which the output file should be placed on by replacing `<NODE_NAME>`, e.g. `node1`.
3. Specify the host path of output file by modifying field `hostpath` of volume `disk-directory`. By default the directory is `/tmp/restore/`.
4. Set the first argument (backup url) by replacing `<BACKUP_URL>`, e.g. `s3://backupbucket@us-east-1/backupstore?backup=backup-bd326da2c4414b02&volume=volumeexamplename`. Do not delete `''`.
5. Set argument `output-file` by replacing `<OUTPUT_FILE>`, e.g. `volume.raw` or `volume.qcow2`.
6. Set argument `output-format` by replacing `<OUTPUT_FORMAT>`. Now support `raw` or `qcow2` only.
7. Set S3 Credential Secret by replacing `<S3_SECRET_NAME>`, e.g. `minio-secret`.
8. Execute the yaml using e.g. `kubectl create -f restore.yaml`.
9. Watch the result using `kubectl -n longhorn-system get pod restore-to-file -w`.
After the pod status has changed to `Completed`, you should be able to find `<OUTPUT_FILE>` at e.g. `/tmp/restore` on `<NODE_NAME>`.


@@ -1,221 +0,0 @@
# Restoring Volumes for Kubernetes Stateful Sets
Longhorn supports restoring backups, and one of the use cases for this feature
is to restore data for use in a Kubernetes `Stateful Set`, which requires
restoring a volume for each replica that was backed up.
To restore, follow the below instructions based on which plugin you have
deployed. The example below uses a Stateful Set with one volume attached to
each Pod and two replicas.
- [CSI Instructions](#csi-instructions)
- [Flexvolume Instructions](#flexvolume-instructions)
### CSI Instructions
1. Connect to the `Longhorn UI` page in your web browser. Under the `Backup` tab,
select the name of the Stateful Set volume. Click the dropdown menu of the
volume entry and restore it. Name the volume something that can easily be
referenced later for the `Persistent Volumes`.
- Repeat this step for each volume you need restored.
- For example, if restoring a Stateful Set with two replicas that had
volumes named `pvc-01a` and `pvc-02b`, the restore could look like this:
| Backup Name | Restored Volume |
|-------------|-------------------|
| pvc-01a | statefulset-vol-0 |
| pvc-02b | statefulset-vol-1 |
2. In Kubernetes, create a `Persistent Volume` for each Longhorn volume that was
created. Name the volumes something that can easily be referenced later for the
`Persistent Volume Claims`. `storage` capacity, `numberOfReplicas`,
`storageClassName`, and `volumeHandle` must be replaced below. In the example,
we're referencing `statefulset-vol-0` and `statefulset-vol-1` in Longhorn and
using `longhorn` as our `storageClassName`.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-vol-0
spec:
  capacity:
    storage: <size> # must match size of Longhorn volume
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  csi:
    driver: io.rancher.longhorn # driver must match this
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: <replicas> # must match Longhorn volume value
      staleReplicaTimeout: '30' # in minutes
    volumeHandle: statefulset-vol-0 # must match volume name from Longhorn
  storageClassName: longhorn # must be same name that we will use later
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-vol-1
spec:
  capacity:
    storage: <size> # must match size of Longhorn volume
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  csi:
    driver: io.rancher.longhorn # driver must match this
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: <replicas> # must match Longhorn volume value
      staleReplicaTimeout: '30'
    volumeHandle: statefulset-vol-1 # must match volume name from Longhorn
  storageClassName: longhorn # must be same name that we will use later
```
3. Go to [General Instructions](#general-instructions).
### Flexvolume Instructions
Because of the implementation of `Flexvolume`, creating the Longhorn volumes
from the `Longhorn UI` manually can be skipped. Instead, follow these
instructions:
1. Connect to the `Longhorn UI` page in your web browser. Under the `Backup` tab,
select the name of the `Stateful Set` volume. Click the dropdown menu of the
volume entry and select `Get URL`.
- Repeat this step for each volume you need restored. Save these URLs for the
next step.
- If using NFS backups, the URL will appear similar to:
- `nfs://longhorn-nfs-svc.default:/opt/backupstore?backup=backup-c57844b68923408f&volume=pvc-59b20247-99bf-11e8-8a92-be8835d7412a`.
- If using S3 backups, the URL will appear similar to:
- `s3://backupbucket@us-east-1/backupstore?backup=backup-1713a64cd2774c43&volume=longhorn-testvol-g1n1de`
2. Similar to `Step 2` for CSI, create a `Persistent Volume` for each volume you
want to restore. `storage` capacity, `storageClassName`, and the Flexvolume
`options` must be replaced. This example uses `longhorn` as the
`storageClassName`.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-vol-0
spec:
  capacity:
    storage: <size> # must match "size" parameter below
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn # must be same name that we will use later
  flexVolume:
    driver: "rancher.io/longhorn" # driver must match this
    fsType: "ext4"
    options:
      size: <size> # must match "storage" parameter above
      numberOfReplicas: <replicas>
      staleReplicaTimeout: <timeout>
      fromBackup: <backup URL> # must be set to Longhorn backup URL
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-vol-1
spec:
  capacity:
    storage: <size> # must match "size" parameter below
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn # must be same name that we will use later
  flexVolume:
    driver: "rancher.io/longhorn" # driver must match this
    fsType: "ext4"
    options:
      size: <size> # must match "storage" parameter above
      numberOfReplicas: <replicas>
      staleReplicaTimeout: <timeout>
      fromBackup: <backup URL> # must be set to Longhorn backup URL
```
3. Go to [General Instructions](#general-instructions).
### General Instructions
**Make sure you have followed either the [CSI](#csi-instructions) or
[Flexvolume](#flexvolume-instructions) instructions before following the steps
in this section.**
1. In the `namespace` the `Stateful Set` will be deployed in, create Persistent
Volume Claims **for each** `Persistent Volume`.
- The name of the `Persistent Volume Claim` must follow this naming scheme:
`<name of Volume Claim Template>-<name of Stateful Set>-<index>`. Stateful
Set Pods are zero-indexed. In this example, the name of the `Volume Claim
Template` is `data`, the name of the `Stateful Set` is `webapp`, and there
are two replicas, which are indexes `0` and `1`.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-webapp-0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi # must match size from earlier
  storageClassName: longhorn # must match name from earlier
  volumeName: statefulset-vol-0 # must reference Persistent Volume
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-webapp-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi # must match size from earlier
  storageClassName: longhorn # must match name from earlier
  volumeName: statefulset-vol-1 # must reference Persistent Volume
```
2. Create the `Stateful Set`:
```yaml
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: webapp # match this with the pvc naming scheme
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 2 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data # match this with the pvc naming scheme
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: longhorn # must match name from earlier
      resources:
        requests:
          storage: 2Gi # must match size from earlier
```
The restored data should now be accessible from inside the `Stateful Set`
`Pods`.


@@ -1,79 +0,0 @@
# Settings
## Customized Default Setting
To setup setting before installing Longhorn, see [Customized Default Setting](./customized-default-setting.md) for details.
## General
#### Backup Target
* Example: `s3://backupbucket@us-east-1/backupstore`
* Description: The target used for backup. Supports NFS or S3. See [Snapshot and Backup](./snapshot-backup.md) for details.
#### Backup Target Credential Secret
* Example: `s3-secret`
* Description: The Kubernetes secret associated with the backup target. See [Snapshot and Backup](./snapshot-backup.md) for details.
#### Backupstore Poll Interval
* Example: `300`
* Description: In seconds. The interval to poll the backup store for updating volumes' Last Backup field. Set to 0 to disable the polling. See [Disaster Recovery Volume](./dr-volume.md) for details.
#### Create Default Disk on Labeled Nodes
* Example: `false`
* Description: Create the default Disk automatically only on Nodes with the Kubernetes label `node.longhorn.io/create-default-disk=true` if no other Disks exist. If disabled, the default Disk will be created on all new Nodes when a node is detected for the first time.
* Note: This is useful if the user wants to scale the cluster but doesn't want to use the storage on the new nodes.
#### Default Data Path
* Example: `/var/lib/longhorn`
* Description: Default path to use for storing data on a host
* Note: Can be used with the `Create Default Disk on Labeled Nodes` option to make Longhorn use only the nodes with specific storage mounted at, e.g., the `/opt/longhorn` directory when scaling the cluster.
#### Default Engine Image
* Example: `longhornio/longhorn-engine:v0.6.0`
* Description: The default engine image used by the manager. Can only be changed via the manager's start command line.
* Note: Every Longhorn release ships with a new Longhorn engine image. If the current Longhorn volumes are not using the default engine, a green arrow will show up, indicating that the volume needs to be upgraded to use the default engine.
#### Enable Upgrade Checker
* Example: `true`
* Description: The Upgrade Checker will periodically check for a new Longhorn version. When a new version is available, it will notify the user via the UI.
#### Latest Longhorn Version
* Example: `v0.6.0`
* Description: The latest version of Longhorn available. Updated automatically by the Upgrade Checker.
* Note: Only available if `Upgrade Checker` is enabled.
#### Default Replica Count
* Example: `3`
* Description: The default number of replicas when creating the volume from Longhorn UI. For Kubernetes, update the `numberOfReplicas` in the StorageClass
* Note: The recommended way of choosing the default replica count is: if you have more than three nodes for storage, use 3; otherwise use 2. Using a single replica on a single node cluster is also OK, but the HA functionality wouldn't be available. You can still take snapshots/backups of the volume.
#### Guaranteed Engine CPU
* Example: `0.2`
* Description: (EXPERIMENTAL FEATURE) Allow the Longhorn Engine to have a guaranteed CPU allocation. The value is how many CPUs should be reserved for each Engine/Replica Manager Pod created by Longhorn. For example, 0.1 means one-tenth of a CPU. This will help maintain engine stability during high node workload. It only applies to the Instance Manager Pods created after the setting takes effect. WARNING: Starting the system may fail or get stuck while using this feature due to the resource constraint. Disabled ("0") by default.
* Note: Please set this to **no more than a quarter** of the node's available CPU resources, since the option applies to the two instance managers on the node (engine and replica), and to future upgraded instance managers (another two for engine and replica).
#### Default Longhorn Static StorageClass Name
* Example: `longhorn-static`
* Description: The `storageClassName` used for the PV/PVC when creating a PV/PVC for an existing Longhorn volume. Note that it's unnecessary for users to create the related StorageClass object in Kubernetes, since the StorageClass is only used as a matching label for PVC binding purposes. Defaults to `longhorn-static`.
#### Kubernetes Taint Toleration
* Example: `nodetype=storage:NoSchedule`
* Description: By setting tolerations for Longhorn and then adding taints to the nodes, nodes with large storage can be dedicated to Longhorn only (to store replica data) and reject other general workloads.
Before modifying the toleration setting, all Longhorn volumes should be detached; the Longhorn components will then be restarted to apply the new tolerations. The toleration update takes a while, and users cannot operate the Longhorn system during the update. Hence it's recommended to set tolerations during Longhorn deployment.
Multiple tolerations can be set here, separated by semicolons. For example, `key1=value1:NoSchedule; key2:NoExecute`
* Note: See [Taint Toleration](./taint-toleration.md) for details.
## Scheduling
#### Replica Soft Anti-Affinity
* Example: `true`
* Description: Allow scheduling on nodes with existing healthy replicas of the same volume
* Note: If users want to avoid replica rebuilds caused by temporary node downtime, they can set this option to `false`. The volume may be kept in `Degraded` state until another node that doesn't already have a replica scheduled comes online.
#### Storage Over Provisioning Percentage
* Example: `500`
* Description: The over-provisioning percentage defines how much storage can be allocated relative to the hard drive's capacity.
* Note: Users can set this to a lower value if they don't want to over-provision storage. See [Multiple Disks Support](./multidisk.md#configuration) for details. Also, a replica of a volume may take more space than the volume's size, since snapshots need space to be stored as well. Users can delete snapshots to reclaim space.
#### Storage Minimal Available Percentage
* Example: `10`
* Description: If the ratio of a disk's available capacity to its maximum capacity (in percent) is less than the minimal available percentage, the disk becomes unschedulable until more space is freed up.
* Note: See [Multiple Disks Support](./multidisk.md#configuration) for details.


@ -1,173 +0,0 @@
# Snapshot
A snapshot in Longhorn represents a volume state at a given time, stored in the same location as the volume data on the physical disk of the host. Snapshot creation is instant in Longhorn.
Users can revert to any previously taken snapshot using the UI. Since Longhorn is a distributed block storage system, please make sure the Longhorn volume is unmounted from the host when reverting to any previous snapshot, otherwise it will confuse the node filesystem and cause filesystem corruption.
#### Note about the block level snapshot
Longhorn is a `crash-consistent` block storage solution.
It's normal for the OS to keep content in the cache before writing it into the block layer. However, this also means that if all the replicas are down, Longhorn may not contain the changes made immediately before the shutdown, since the content was still in the OS-level cache and hadn't been transferred to the Longhorn system yet. This is similar to a desktop going down due to a power outage: after the power is restored, you may find some corrupted files on the hard drive.
To force the data to be written to the block layer at any given moment, the user can run the `sync` command on the node manually, or unmount the disk. The OS will flush the content from the cache to the block layer in either case.
# Backup
A backup in Longhorn represents a volume state at a given time, stored in secondary storage (a backupstore in Longhorn terms) which is outside of the Longhorn system. Backup creation involves copying the data over the network, so it will take time.
A corresponding snapshot is needed for creating a backup, and the user can choose to back up any previously created snapshot.
A backupstore is an NFS server or an S3-compatible server.
A backup target represents a backupstore in Longhorn. The backup target can be set at `Settings/General/BackupTarget`
See [here](#set-backuptarget) for details on how to setup backup target.
Longhorn also supports setting up recurring snapshot/backup jobs for volumes, via Longhorn UI or Kubernetes Storage Class. See [here](#setup-recurring-snapshotbackup) for details.
## Set BackupTarget
The user can set up an S3 or NFS type backupstore to store the backups of Longhorn volumes.
If the user doesn't have access to AWS S3 or wants to give it a try first, we've also provided a way to [set up a local S3 testing backupstore](https://github.com/yasker/longhorn/blob/work/docs/backup.md#setup-a-local-testing-backupstore) using [Minio](https://minio.io/).
### Setup AWS S3 backupstore
1. Create a new bucket in AWS S3.
2. Follow the [guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console) to create a new AWS IAM user, with the following permissions set:
```
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GrantLonghornBackupstoreAccess0",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::<your-bucket-name>",
"arn:aws:s3:::<your-bucket-name>/*"
]
}
]
}
```
3. Create a Kubernetes secret with a name such as `aws-secret` in the namespace where Longhorn is installed (`longhorn-system` by default), as shown in the sketch after this list. Put the following keys in the secret:
```
AWS_ACCESS_KEY_ID: <your_aws_access_key_id>
AWS_SECRET_ACCESS_KEY: <your_aws_secret_access_key>
```
4. Go to the Longhorn UI and set `Settings/General/BackupTarget` to
```
s3://<your-bucket-name>@<your-aws-region>/
```
Note that you must have `/` at the end, otherwise you will get an error.
Also please make sure you've set **`<your-aws-region>` in the URL**.
For example, for Google Cloud Storage, you can find the region code here: https://cloud.google.com/storage/docs/locations
5. Set `Settings/General/BackupTargetSecret` to
```
aws-secret
```
This is the secret with the AWS keys that you created in step 3.
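For reference, a minimal `kubectl` sketch of step 3 (the key values are placeholders you must replace with your own credentials):
```
# Create the backup target credential secret in the Longhorn namespace
kubectl create secret generic aws-secret \
  --namespace longhorn-system \
  --from-literal=AWS_ACCESS_KEY_ID=<your_aws_access_key_id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<your_aws_secret_access_key>
```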
### Setup a local testing backupstore
We provide two backupstores for testing purposes, based on an NFS server and a Minio S3 server, in `./deploy/backupstores`.
Use the following command to set up a Minio S3 server for the backupstore after the `longhorn-system` namespace has been created.
```
kubectl create -f https://raw.githubusercontent.com/rancher/longhorn/master/deploy/backupstores/minio-backupstore.yaml
```
Now set `Settings/General/BackupTarget` to
```
s3://backupbucket@us-east-1/
```
And `Settings/General/BackupTargetSecret` to
```
minio-secret
```
Click the `Backup` tab in the UI; it should report an empty list without erroring out.
The `minio-secret` yaml looks like this:
```
apiVersion: v1
kind: Secret
metadata:
name: minio-secret
namespace: longhorn-system
type: Opaque
data:
AWS_ACCESS_KEY_ID: bG9uZ2hvcm4tdGVzdC1hY2Nlc3Mta2V5 # longhorn-test-access-key
AWS_SECRET_ACCESS_KEY: bG9uZ2hvcm4tdGVzdC1zZWNyZXQta2V5 # longhorn-test-secret-key
AWS_ENDPOINTS: aHR0cDovL21pbmlvLXNlcnZpY2UuZGVmYXVsdDo5MDAw # http://minio-service.default:9000
```
Please follow [the Kubernetes document](https://kubernetes.io/docs/concepts/configuration/secret/#creating-a-secret-manually) to create the secret.
* Make sure to use `echo -n` when generating the base64 encoding (see the sketch below), otherwise a newline will be added at the end of the string and cause errors when accessing S3.
Note that the secret must be created in the `longhorn-system` namespace for Longhorn to access it.
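For example, a minimal sketch of generating the base64 values shown above:
```
# Encode the values without a trailing newline
echo -n "longhorn-test-access-key" | base64
echo -n "longhorn-test-secret-key" | base64
echo -n "http://minio-service.default:9000" | base64
```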
### NFS backupstore
To use an NFS server as the backupstore, the NFS server must support NFSv4.
The target URL would look like:
```
nfs://longhorn-test-nfs-svc.default:/opt/backupstore
```
You can find an example NFS backupstore for testing purposes [here](https://github.com/rancher/longhorn/blob/master/deploy/backupstores/nfs-backupstore.yaml).
# Setup recurring snapshot/backup
Longhorn supports recurring snapshots and backups for volumes. Users only need to set when they wish to take a snapshot and/or backup, and how many snapshots/backups to retain; Longhorn will then automatically create snapshots/backups at those times, as long as the volume is attached to a node.
Users can setup recurring snapshot/backup via Longhorn UI, or Kubernetes StorageClass.
## Set up recurring jobs using Longhorn UI
Users can find the settings for recurring snapshots and backups on the `Volume Detail` page.
## Set up recurring jobs using StorageClass
Users can set field `recurringJobs` in StorageClass as parameters. Any future volumes created using this StorageClass will have those recurring jobs automatically set up.
The `recurringJobs` field should follow JSON format, e.g.
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: longhorn
provisioner: rancher.io/longhorn
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "30"
fromBackup: ""
recurringJobs: '[{"name":"snap", "task":"snapshot", "cron":"*/1 * * * *", "retain":1},
{"name":"backup", "task":"backup", "cron":"*/2 * * * *", "retain":1}]'
```
Explanation:
1. `name`: Name of the job. Do not use duplicate names within one `recurringJobs` field, and the length of `name` should be no more than 8 characters.
2. `task`: Type of the job. Only `snapshot` (periodically create a snapshot) or `backup` (periodically create a snapshot, then do a backup) are supported.
3. `cron`: Cron expression. It defines the execution schedule of the job.
4. `retain`: How many snapshots/backups Longhorn will retain for the job. It should be no less than 1.


@ -1,46 +0,0 @@
# Storage Tags
## Overview
The storage tag feature enables the user to use only certain nodes or disks for storing Longhorn volume data. For example, performance-sensitive data can use only the high-performance disks which can be tagged as `fast`, `ssd` or `nvme`, or only the high-performance nodes tagged as `baremetal`.
This feature supports both disks and nodes.
## Setup
The tag setup can be found at Longhorn UI:
1. *Node -> Select one node -> Edit Node and Disks*
2. Click `+New Node Tag` or `+New Disk Tag` to add new tags.
Existing replicas already scheduled on the node or disk won't be affected by the new tags.
## Usage
When multiple tags are specified for a volume, the disk and the node (that the disk belongs to) must have all the specified tags to become usable.
### UI
When creating a volume, specify the disk tag and node tag in the UI.
### Kubernetes
Use Kubernetes StorageClass setting to specify tags.
For example:
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: longhorn-fast
provisioner: rancher.io/longhorn
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "480" # 8 hours in minutes
diskSelector: "ssd"
nodeSelector: "storage,fast"
```
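As a usage sketch (the PVC name `fast-vol` below is just an example), a PVC requesting this StorageClass will only get replicas scheduled on disks tagged `ssd` on nodes tagged both `storage` and `fast`:
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-vol
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-fast
  resources:
    requests:
      storage: 2Gi
EOF
```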
## History
* [Original feature request](https://github.com/longhorn/longhorn/issues/311)
* Available since v0.6.0


@ -1,31 +0,0 @@
# Taint Toleration
## Overview
If users want to create nodes with large storage spaces and/or CPU resources for Longhorn only (to store replica data) and reject other general workloads, they can taint those nodes and add tolerations for Longhorn components. Then Longhorn can be deployed on those nodes.
Note that the taint toleration setting for one workload will not prevent it from being scheduled to nodes that don't have the corresponding taints. See [Kubernetes's taint and toleration](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/) for details.
## Setup
### During installing Longhorn
Follow the instructions to set the initial taint tolerations: [Customize default settings](https://github.com/longhorn/longhorn/wiki/Feature:-Customized-Default-Setting#usage)
### After Longhorn has been installed
The taint toleration setting can be found at Longhorn UI:
Setting -> General -> Kubernetes Taint Toleration
Users can modify the existing tolerations or add more tolerations here, but note that this will result in all the Longhorn system components being recreated.
## Usage
1. Before modifying the toleration setting, users should make sure all Longhorn volumes are `detached`, since all Longhorn components will be restarted and the Longhorn system will be temporarily unavailable. If there are running Longhorn volumes in the system, the Longhorn system cannot restart its components and the request will be rejected.
2. While the Longhorn system is updating the toleration setting and restarting its components, users shouldn't operate the Longhorn system.
3. When users set tolerations, the setting shouldn't contain the substring `kubernetes.io`, which is used as the key of Kubernetes default tolerations.
4. Multiple tolerations can be set here, separated by semicolons. For example: `key1=value1:NoSchedule; key2:NoExecute`.
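For example — a minimal sketch, using a hypothetical node name `storage-node-1` — the taint matching the toleration above can be added with `kubectl`:
```
# Taint a dedicated storage node so that only pods tolerating this taint (e.g. Longhorn) are scheduled on it
kubectl taint nodes storage-node-1 key1=value1:NoSchedule
```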
## History
[Original feature request](https://github.com/longhorn/longhorn/issues/584)
Available since v0.6.0


@ -1,60 +0,0 @@
# Troubleshooting
## Common issues
### Volume can be attached/detached from UI, but Kubernetes Pod/StatefulSet etc cannot use it
#### Using with Flexvolume Plugin
Check if the volume plugin directory has been set correctly. This is automatically detected unless the user explicitly sets it.
By default, Kubernetes uses `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`, as stated in the [official document](https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#prerequisites).
Some vendors choose to change the directory for various reasons. For example, GKE uses `/home/kubernetes/flexvolume` instead.
Users can find the correct directory by running `ps aux | grep kubelet` on the host and checking the `--volume-plugin-dir` parameter. If there is none, the default `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/` will be used.
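For example, a minimal sketch that prints only the flag and its value, if kubelet was started with one:
```
# Show the kubelet --volume-plugin-dir flag, if any
ps aux | grep -v grep | grep kubelet | grep -o -- '--volume-plugin-dir[= ][^ ]*'
```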
## Troubleshooting guide
There are a few components in Longhorn: the Manager, the Engine, the Driver and the UI. By default, all of those components run as pods in the `longhorn-system` namespace inside the Kubernetes cluster.
Most of the logs are included in the Support Bundle. You can click the `Generate Support Bundle` link at the bottom of the UI to download a zip file containing Longhorn-related configuration and logs.
One exception is `dmesg`, which needs to be retrieved by the user on each node.
### UI
Making use of the Longhorn UI is a good starting point for troubleshooting. For example, if Kubernetes cannot mount one volume correctly, after stopping the workload, try to attach and mount that volume manually on one node and access the content to check if the volume is intact.
Also, the event logs in the UI dashboard provide some information about possible issues. Check for event logs at the `Warning` level.
### Manager and engines
You can get the logs from the Longhorn Manager and Engines to help with troubleshooting. The most useful logs are from `longhorn-manager-xxx` and from inside the Longhorn instance managers, e.g. `instance-manager-e-xxxx` and `instance-manager-r-xxxx`.
Since there are normally multiple Longhorn Manager pods running at the same time, we recommend using [kubetail](https://github.com/johanhaleby/kubetail), a great tool for keeping track of the logs of multiple pods. You can use:
```
kubetail longhorn-manager -n longhorn-system
```
to track the manager logs in real time.
### CSI driver
For CSI driver, check the logs for `csi-attacher-0` and `csi-provisioner-0`, as well as containers in `longhorn-csi-plugin-xxx`.
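For example — a minimal sketch, assuming the default component names (replace the generated `xxx` suffix with the real pod name; the plugin container name is an assumption based on the pod name):
```
# Logs from the CSI sidecar pods
kubectl -n longhorn-system logs csi-attacher-0
kubectl -n longhorn-system logs csi-provisioner-0
# Logs from the CSI plugin container on one node
kubectl -n longhorn-system logs longhorn-csi-plugin-xxx -c longhorn-csi-plugin
```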
### Flexvolume driver
For Flexvolume driver, first check where the driver has been installed on the node. Check the log of `longhorn-driver-deployer-xxxx` for that information.
Then check the kubelet logs. The Flexvolume driver itself doesn't run inside a container; it runs along with the kubelet process.
If kubelet is running natively on the node, you can use the following command to get the log:
```
journalctl -u kubelet
```
Or if kubelet is running as a container (e.g. in RKE), use the following command instead:
```
docker logs kubelet
```
For even more detailed logs of the Longhorn Flexvolume driver, run the following command on the node or inside the container (if kubelet is running as a container, e.g. in RKE):
```
touch /var/log/longhorn_driver.log
```


@ -1,129 +0,0 @@
# Upgrade from v0.6.2 to v0.7.0
The users need to follow this guide to upgrade from v0.6.2 to v0.7.0.
## Preparation
1. Make sure Kubernetes version >= v1.14.0
1. Make backups for all the volumes.
1. Stop the workload using the volumes.
1. Live upgrade is not supported from v0.6.2 to v0.7.0
## Upgrade
### Use Rancher App
1. Run the following command to avoid [this 'updates to provisioner are forbidden' error](#error-longhorn-is-invalid-provisioner-forbidden-updates-to-provisioner-are-forbidden):
```
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v0.7.0/examples/storageclass.yaml
```
2. Click the `Upgrade` button in the Rancher UI
3. Wait for the app to complete the upgrade.
### Use YAML file
Use `kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v0.7.0/deploy/longhorn.yaml`
Then wait for all the pods to be running and the Longhorn UI to be working.
```
$ kubectl -n longhorn-system get pod
NAME READY STATUS RESTARTS AGE
compatible-csi-attacher-69857469fd-rj5vm 1/1 Running 4 3d12h
csi-attacher-79b9bfc665-56sdb 1/1 Running 0 3d12h
csi-attacher-79b9bfc665-hdj7t 1/1 Running 0 3d12h
csi-attacher-79b9bfc665-tfggq 1/1 Running 3 3d12h
csi-provisioner-68b7d975bb-5ggp8 1/1 Running 0 3d12h
csi-provisioner-68b7d975bb-frggd 1/1 Running 2 3d12h
csi-provisioner-68b7d975bb-zrr65 1/1 Running 0 3d12h
engine-image-ei-605a0f3e-8gx4s 1/1 Running 0 3d14h
engine-image-ei-605a0f3e-97gxx 1/1 Running 0 3d14h
engine-image-ei-605a0f3e-r6wm4 1/1 Running 0 3d14h
instance-manager-e-a90b0bab 1/1 Running 0 3d14h
instance-manager-e-d1458894 1/1 Running 0 3d14h
instance-manager-e-f2caa5e5 1/1 Running 0 3d14h
instance-manager-r-04417b70 1/1 Running 0 3d14h
instance-manager-r-36d9928a 1/1 Running 0 3d14h
instance-manager-r-f25172b1 1/1 Running 0 3d14h
longhorn-csi-plugin-72bsp 4/4 Running 0 3d12h
longhorn-csi-plugin-hlbg8 4/4 Running 0 3d12h
longhorn-csi-plugin-zrvhl 4/4 Running 0 3d12h
longhorn-driver-deployer-66b6d8b97c-snjrn 1/1 Running 0 3d12h
longhorn-manager-pf5p5 1/1 Running 0 3d14h
longhorn-manager-r5npp 1/1 Running 1 3d14h
longhorn-manager-t59kt 1/1 Running 0 3d14h
longhorn-ui-b466b6d74-w7wzf 1/1 Running 0 50m
```
## TroubleShooting
### Error: `"longhorn" is invalid: provisioner: Forbidden: updates to provisioner are forbidden.`
- This means you need to clean up the old `longhorn` StorageClass for the Longhorn v0.7.0 upgrade, since we've changed the provisioner from `rancher.io/longhorn` to `driver.longhorn.io`.
- Notice that the PVs created by the old StorageClass will still use `rancher.io/longhorn` as the provisioner. Longhorn v0.7.0 supports attaching/detaching/deleting the PVs created by the previous version of Longhorn, but it doesn't support creating new PVs using the old provisioner name. Please use the new StorageClass for new volumes.
#### If you are using YAML file:
1. Clean up the deprecated StorageClass:
```
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v0.7.0/examples/storageclass.yaml
```
2. Run
```
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v0.7.0/deploy/longhorn.yaml
```
#### If you are using Rancher App:
1. Clean up the default StorageClass:
```
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v0.7.0/examples/storageclass.yaml
```
2. Follow [this error instruction](#error-kind-customresourcedefinition-with-the-name-xxx-already-exists-in-the-cluster-and-wasnt-defined-in-the-previous-release)
### Error: `kind CustomResourceDefinition with the name "xxx" already exists in the cluster and wasn't defined in the previous release...`
- This is [a Helm bug](https://github.com/helm/helm/issues/6031).
- Please make sure that you have not deleted the old Longhorn CRDs via the command `curl -s https://raw.githubusercontent.com/longhorn/longhorn-manager/master/hack/cleancrds.sh | bash -s v062` or executed the Longhorn uninstaller before executing the following command. Otherwise you MAY LOSE all the data stored in the Longhorn system.
1. Clean up the leftover:
```
kubectl -n longhorn-system delete ds longhorn-manager
curl -s https://raw.githubusercontent.com/longhorn/longhorn-manager/master/hack/cleancrds.sh | bash -s v070
```
2. Re-click the `Upgrade` button in the Rancher UI.
## Rollback
Since we upgraded the CSI framework from v0.4.2 to v1.1.0 in this release, rolling back from Longhorn v0.7.0 to v0.6.2 or lower means a backward upgrade for the CSI plugin.
But Kubernetes does not support backward upgrades of CSI. **Hence restarting kubelet is unavoidable. Please be careful, check the conditions beforehand and follow the instructions exactly.**
Prerequisite:
* To roll back from a v0.7.0 installation, you must not have executed [the post upgrade steps](#post-upgrade).
Steps to roll back:
1. Clean up the components introduced by Longhorn v0.7.0 upgrade
```
kubectl delete -f https://raw.githubusercontent.com/longhorn/longhorn/v0.7.0/examples/storageclass.yaml
curl -s https://raw.githubusercontent.com/longhorn/longhorn-manager/master/hack/cleancrds.sh | bash -s v070
```
2. Restart the Kubelet container on all nodes or restart all the nodes. This step WILL DISRUPT all the workloads in the system.
Connect to each node, then run
```
docker restart kubelet
```
3. Rollback
Use `kubectl apply` or the Rancher App to roll back Longhorn.
#### Migrate the old PVs to use the new StorageClass
TODO
## Post upgrade
1. Bring back the workload online.
1. Make sure all the volumes are back online.
1. Check that all the existing manager pods are running v0.7.0 and that no v0.6.2 pods are running.
1. Running `kubectl -n longhorn-system get pod -o yaml | grep "longhorn-manager:v0.6.2"` should yield no result.
1. Run the following script to clean up the v0.6.2 CRDs.
1. You must make sure all the v0.6.2 pods HAVE BEEN DELETED, otherwise the data WILL BE LOST!
```
curl -s https://raw.githubusercontent.com/longhorn/longhorn-manager/master/hack/cleancrds.sh | bash -s v062
```


@ -1,284 +0,0 @@
# Upgrade
Here we cover how to upgrade to the latest Longhorn from all previous releases.
There are normally two steps in the upgrade process: first upgrade the Longhorn manager to the latest version, then upgrade the Longhorn engine to the latest version using the latest Longhorn manager.
## Upgrade from v0.6.2 to v0.7.0
See [here](./upgrade-from-v0.6.2-to-v0.7.0.md)
## Upgrade from older versions to v0.6.2
## Upgrade Longhorn manager from v0.3.0 or newer
### From Longhorn App (Rancher Catalog App)
On Rancher UI, navigate to the `Catalog Apps` screen and click the
`Upgrade available` button. *Do not change any of the settings right now.* Click `Upgrade`.
Access the Longhorn UI. Periodically refresh the page until the version in the
bottom left corner of the screen changes. Wait until the websocket indicators in the
bottom right corner of the screen turn solid green. Navigate to
`Setting->Engine Image` and wait until the new Engine Image is `Ready`.
### From Longhorn deployment yaml
If you didn't change any configuration during the Longhorn installation, follow [the official Longhorn Deployment instructions](../README.md#deployment) to upgrade.
Otherwise you will need to download the yaml file from [the official Longhorn Deployment instructions](../README.md#deployment), modify it to your needs, then use `kubectl apply -f` to upgrade.
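For example, a minimal sketch of that download-modify-apply flow (assuming the default deployment manifest URL):
```
# Download the manifest, re-apply your customizations, then apply it
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml -o longhorn.yaml
vi longhorn.yaml
kubectl apply -f longhorn.yaml
```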
## Upgrade Longhorn engine
**ALWAYS MAKE BACKUPS BEFORE UPGRADING THE ENGINE IMAGES.**
### Offline upgrade
If live upgrade is not available (e.g. before v0.3.3, from v0.5.0 to v0.6.0, or from v0.7.0 to v0.8.0), or a volume is stuck in degraded state:
1. Follow [the detach procedure for relevant workloads](upgrade.md#detach-volumes).
2. Select all the volumes using batch selection. Click batch operation button
`Upgrade Engine`, choose the engine image available in the list. It's
the default engine shipped with the manager for this release.
3. Resume all workloads by reversing the [detach volumes procedure](upgrade.md#detach-volumes).
Any volume not part of a Kubernetes workload must be attached from Longhorn UI.
### Live upgrade
Live upgrade is available since v0.3.3, with the exception of upgrades from v0.5.0 to v0.6.0 and from v0.7.0 to v0.8.0.
Live upgrade should only be done with healthy volumes.
1. Select the volume you want to upgrade.
2. Click `Upgrade Engine` in the drop down.
3. Select the engine image you want to upgrade to.
1. Normally it's the only engine image in the list, since the UI excludes the current image from the list.
4. Click OK.
During the live upgrade, the user will temporarily see double the number of replicas. After the upgrade completes, the user should see the same number of replicas as before, and the `Engine Image` field of the volume should be updated.
Note that after the live upgrade, Rancher or Kubernetes will still show the old image version for the engine and the new version for the replicas. This is expected. The upgrade succeeded if you see the new image version listed as the volume image on the Volume Detail page.
### Clean up the old image
After you've upgraded all the images, select `Settings/Engine Image` from the Longhorn UI. Now you should be able to remove the non-default image.
## Migrating Between Flexvolume and CSI Driver
Ensure your Longhorn App is up to date. Follow the relevant upgrade procedure
above before proceeding.
The migration path between drivers requires backing up and restoring each
volume and will incur both API and workload downtime. This can be a tedious
process; consider what benefit switching drivers will bring before proceeding.
Consider deleting unimportant workloads using the old driver to reduce effort.
### Flexvolume to CSI
CSI is the newest out-of-tree Kubernetes storage interface.
1. [Backup existing volumes](upgrade.md#backup-existing-volumes).
2. On Rancher UI, navigate to the `Catalog Apps` screen, locate the `Longhorn`
app and click the `Up to date` button. Under `Kubernetes Driver`, select
`csi`. Click `Upgrade`.
3. Restore each volume by following the [CSI restore procedure](restore_statefulset.md#csi-instructions).
This procedure is tailored to the StatefulSet workload, but the process is
approximately the same for all workloads.
### CSI to Flexvolume
If you would like to migrate from CSI to Flexvolume driver, we are interested
to hear from you. CSI is the newest out-of-tree storage interface and we
expect it to replace Flexvolume's exec-based model.
1. [Backup existing volumes](upgrade.md#backup-existing-volumes).
2. On Rancher UI, navigate to the `Catalog Apps` screen, locate the `Longhorn`
app and click the `Up to date` button. Under `Kubernetes Driver`, select
`flexvolume`. We recommend leaving `Flexvolume Path` empty. Click `Upgrade`.
3. Restore each volume by following the [Flexvolume restore procedure](restore_statefulset.md#flexvolume-instructions).
This procedure is tailored to the StatefulSet workload, but the process is
approximately the same for all workloads.
## Upgrade Longhorn manager from v0.2 and older
The upgrade procedure for Longhorn v0.2 and v0.1 deployments is more involved.
### Backup Existing Volumes
It's recommended to create a recent backup of every volume to the backupstore
before upgrading. If you don't have an on-cluster backupstore already, create one.
We'll use an NFS backupstore for this example.
1. Execute the following command to create the backupstore:
```
kubectl apply -f https://raw.githubusercontent.com/rancher/longhorn/master/deploy/backupstores/nfs-backupstore.yaml
```
2. On Longhorn UI Settings page, set Backup Target to
`nfs://longhorn-test-nfs-svc.default:/opt/backupstore` and click `Save`.
Navigate to each volume detail page and click `Take Snapshot` (it's recommended to run `sync` in the host command line before `Take Snapshot`). Click the new
snapshot and click `Backup`. Wait for the new backup to show up in the volume's backup list before continuing.
### Check For Issues
Make sure no volume is in degraded or faulted state. Wait for degraded
volumes to heal and delete/salvage faulted volumes before proceeding.
### Detach Volumes
Shut down all Kubernetes pods using Longhorn volumes in order to detach the
volumes. The easiest way to achieve this is to delete all workloads and recreate them after the upgrade. If
this is not desirable, some workloads may be suspended. Below we cover how
each workload can be modified to shut down its pods.
#### Deployment
Edit the deployment with `kubectl edit deploy/<name>`.
Set `.spec.replicas` to `0`.
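Equivalently — a minimal sketch, with `<name>` standing in for your Deployment — the replica count can be changed without an interactive edit:
```
# Scale the Deployment down so its Longhorn volumes detach; scale it back up after the upgrade
kubectl scale deploy/<name> --replicas=0
```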
#### StatefulSet
Edit the statefulset with `kubectl edit statefulset/<name>`.
Set `.spec.replicas` to `0`.
#### DaemonSet
There is no way to suspend this workload.
Delete the daemonset with `kubectl delete ds/<name>`.
#### Pod
Delete the pod with `kubectl delete pod/<name>`.
There is no way to suspend a pod not managed by a workload controller.
#### CronJob
Edit the cronjob with `kubectl edit cronjob/<name>`.
Set `.spec.suspend` to `true`.
Wait for any currently executing jobs to complete, or terminate them by
deleting relevant pods.
#### Job
Consider allowing the single-run job to complete.
Otherwise, delete the job with `kubectl delete job/<name>`.
#### ReplicaSet
Edit the replicaset with `kubectl edit replicaset/<name>`.
Set `.spec.replicas` to `0`.
#### ReplicationController
Edit the replicationcontroller with `kubectl edit rc/<name>`.
Set `.spec.replicas` to `0`.
Wait for the volumes used by Kubernetes to finish detaching.
Then detach all remaining volumes from the Longhorn UI. These volumes were most likely
created and attached outside of Kubernetes via the Longhorn UI or REST API.
### Uninstall the Old Version of Longhorn
Make note of `BackupTarget` on the `Setting` page. You will need to manually
set `BackupTarget` after upgrading from either v0.1 or v0.2.
Delete Longhorn components.
For Longhorn `v0.1` (most likely installed using Longhorn App in Rancher 2.0):
```
kubectl delete -f https://raw.githubusercontent.com/llparse/longhorn/v0.1/deploy/uninstall-for-upgrade.yaml
```
For Longhorn `v0.2`:
```
kubectl delete -f https://raw.githubusercontent.com/rancher/longhorn/v0.2/deploy/uninstall-for-upgrade.yaml
```
If both commands returned `Not found` for all components, Longhorn is probably
deployed in a different namespace. Determine which namespace is in use and
adjust `NAMESPACE` here accordingly:
```
NAMESPACE=<some_longhorn_namespace>
curl -sSfL https://raw.githubusercontent.com/rancher/longhorn/v0.1/deploy/uninstall-for-upgrade.yaml|sed "s#^\( *\)namespace: longhorn#\1namespace: ${NAMESPACE}#g" > longhorn.yaml
kubectl delete -f longhorn.yaml
```
### Backup Longhorn System
We're going to back up the Longhorn CRD yamls to a local directory, so we can restore or inspect them later.
#### Upgrade from v0.1
Users must back up the CRDs for v0.1 because the default deployment namespace for Longhorn has changed.
Check your backups to make sure Longhorn was running in the namespace `longhorn`, otherwise change the value of `NAMESPACE` below.
```
NAMESPACE=longhorn
kubectl -n ${NAMESPACE} get volumes.longhorn.rancher.io -o yaml > longhorn-v0.1-backup-volumes.yaml
kubectl -n ${NAMESPACE} get engines.longhorn.rancher.io -o yaml > longhorn-v0.1-backup-engines.yaml
kubectl -n ${NAMESPACE} get replicas.longhorn.rancher.io -o yaml > longhorn-v0.1-backup-replicas.yaml
kubectl -n ${NAMESPACE} get settings.longhorn.rancher.io -o yaml > longhorn-v0.1-backup-settings.yaml
```
After it's done, check those files to make sure they're not empty (unless you have no existing volumes).
#### Upgrade from v0.2
Check your backups to make sure Longhorn was running in namespace
`longhorn-system`, otherwise change the value of `NAMESPACE` below.
```
NAMESPACE=longhorn-system
kubectl -n ${NAMESPACE} get volumes.longhorn.rancher.io -o yaml > longhorn-v0.2-backup-volumes.yaml
kubectl -n ${NAMESPACE} get engines.longhorn.rancher.io -o yaml > longhorn-v0.2-backup-engines.yaml
kubectl -n ${NAMESPACE} get replicas.longhorn.rancher.io -o yaml > longhorn-v0.2-backup-replicas.yaml
kubectl -n ${NAMESPACE} get settings.longhorn.rancher.io -o yaml > longhorn-v0.2-backup-settings.yaml
```
After it's done, check those files to make sure they're not empty (unless you have no existing volumes).
### Delete CRDs in Different Namespace
This is only required for Rancher users running Longhorn App `v0.1`. Delete all
CRDs from your namespace, which is `longhorn` by default.
```
NAMESPACE=longhorn
kubectl -n ${NAMESPACE} get volumes.longhorn.rancher.io -o yaml | sed "s/\- longhorn.rancher.io//g" | kubectl apply -f -
kubectl -n ${NAMESPACE} get engines.longhorn.rancher.io -o yaml | sed "s/\- longhorn.rancher.io//g" | kubectl apply -f -
kubectl -n ${NAMESPACE} get replicas.longhorn.rancher.io -o yaml | sed "s/\- longhorn.rancher.io//g" | kubectl apply -f -
kubectl -n ${NAMESPACE} get settings.longhorn.rancher.io -o yaml | sed "s/\- longhorn.rancher.io//g" | kubectl apply -f -
kubectl -n ${NAMESPACE} delete volumes.longhorn.rancher.io --all
kubectl -n ${NAMESPACE} delete engines.longhorn.rancher.io --all
kubectl -n ${NAMESPACE} delete replicas.longhorn.rancher.io --all
kubectl -n ${NAMESPACE} delete settings.longhorn.rancher.io --all
```
### Install Longhorn
#### Upgrade from v0.1
For Rancher users who are running Longhorn v0.1, **do not click the upgrade button in the Rancher App.**
1. Delete the Longhorn App from `Catalog Apps` screen in Rancher UI.
2. Launch Longhorn App template version `0.3.1`.
3. Restore Longhorn System data. This step is required for Rancher users running Longhorn App `v0.1`.
Don't change the NAMESPACE variable below, since the newly installed Longhorn system will be installed in the `longhorn-system` namespace.
```
NAMESPACE=longhorn-system
sed "s#^\( *\)namespace: .*#\1namespace: ${NAMESPACE}#g" longhorn-v0.1-backup-settings.yaml | kubectl apply -f -
sed "s#^\( *\)namespace: .*#\1namespace: ${NAMESPACE}#g" longhorn-v0.1-backup-replicas.yaml | kubectl apply -f -
sed "s#^\( *\)namespace: .*#\1namespace: ${NAMESPACE}#g" longhorn-v0.1-backup-engines.yaml | kubectl apply -f -
sed "s#^\( *\)namespace: .*#\1namespace: ${NAMESPACE}#g" longhorn-v0.1-backup-volumes.yaml | kubectl apply -f -
```
#### Upgrade from v0.2
For Longhorn v0.2 users who are not using Rancher, follow
[the official Longhorn Deployment instructions](../README.md#deployment).
### Access UI and Set BackupTarget
Wait until the longhorn-ui and longhorn-manager pods are `Running`:
```
kubectl -n longhorn-system get pod -w
```
[Access the UI](../README.md#access-the-ui).
On `Setting > General`, set `Backup Target` to the backup target used in
the previous version. In our example, this is
`nfs://longhorn-test-nfs-svc.default:/opt/backupstore`.
## Note
Upgrade is always tricky. Keeping recent backups for volumes is critical. If
anything goes wrong, you can restore the volume using the backup.
If you have any issues, please report them at
https://github.com/rancher/longhorn/issues and include your backup yaml files
as well as the manager logs.


@ -1,44 +0,0 @@
# Volume operations
### Changing replica count of the volumes
The default replica count can be changed in the setting.
Also, when a volume is attached, the user can change the replica count for the volume in the UI.
Longhorn will always try to maintain at least the given number of healthy replicas for each volume.
1. If the current healthy replica count is less than the specified replica count, Longhorn will start rebuilding new replicas.
2. If the current healthy replica count is more than the specified replica count, Longhorn will do nothing. In this situation, if the user deletes one or more healthy replicas, or some healthy replicas fail, Longhorn won't start rebuilding new replicas as long as the total healthy replica count doesn't dip below the specified replica count.
### Volume size
Longhorn is a thin-provisioned storage system. That means a Longhorn volume will only take the space it needs at the moment. For example, if you allocated a 20GB volume but only used 1GB of it, the actual data size on your disk would be 1GB. You can see the actual data size in the volume details in the UI.
The Longhorn volume itself cannot shrink in size if you've removed content from your volume. For example, if you create a volume of 20GB, use 10GB, then remove 9GB of content, the actual size on the disk would still be 10GB instead of 1GB. This is because Longhorn currently operates on the block level, not the filesystem level, so it doesn't know whether the user has removed the content or not. That information is mostly kept at the filesystem level.
#### Space taken by the snapshots
Some users may find that a Longhorn volume's actual size is bigger than its nominal size. That's because in Longhorn, snapshots store the history data of the volume, which also takes some space, depending on how much data was in each snapshot. The snapshot feature enables users to revert back to a certain point in history and to create a backup to secondary storage. The snapshot feature is also part of Longhorn's rebuilding process: every time Longhorn detects a replica is down, it will automatically take a (system) snapshot and start rebuilding on another node.
To reduce the space taken by snapshots, users can schedule a recurring snapshot or backup with a retain count, which will
automatically create a new snapshot/backup on schedule, then clean up any excess snapshots/backups.
Users can also delete unwanted snapshots manually through the UI. Any system-generated snapshots will be automatically marked for deletion when the deletion of any snapshot is triggered.
#### The latest snapshot
In Longhorn, the latest snapshot cannot be deleted. This is because whenever a snapshot is deleted, Longhorn coalesces its content with the next snapshot, making sure the next and later snapshots still have the correct content. Longhorn cannot do that for the latest snapshot, since there is no next snapshot to it. The next "snapshot" of the latest snapshot is the live volume (`volume-head`), which is being read/written by the user at the moment, so the coalescing process cannot happen. Instead, the latest snapshot will be marked as `removed`, and it will be cleaned up the next time this is possible.
If the users want to clean up the latest snapshot, they can create a new snapshot, then remove the previous "latest" snapshot.
### Maintenance mode
Since v0.6.0, when the user attaches a volume from the Longhorn UI, there is a checkbox for `Maintenance mode`. The option results in attaching the volume without enabling the frontend (block device or iSCSI), to make sure no one can access the volume data while the volume is attached.
It's mainly used to perform `Snapshot Revert`. Since v0.6.0, the snapshot revert operation requires the volume to be in `Maintenance mode`, since we cannot modify the block device's content while the volume is mounted or being used, otherwise it will cause filesystem corruption.
It's also useful for inspecting the volume state without worrying that the data might be accessed by accident.
## Volume parameters
#### Stale Replica Timeout (`staleReplicaTimeout`)
Stale Replica Timeout determines when Longhorn will clean up an errored replica after the replica becomes `ERROR`. The unit is minutes. The default is `2880` (48 hours).
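For Kubernetes-provisioned volumes, the parameter can be set in the StorageClass — a minimal sketch, using the `rancher.io/longhorn` provisioner as in the earlier examples in this document (the StorageClass name is just an example):
```
cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-short-timeout
provisioner: rancher.io/longhorn
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"   # minutes before an errored replica is cleaned up
EOF
```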