# Troubleshooting
## Common issues
### Volume can be attached/detached from UI, but Kubernetes Pod/StatefulSet etc cannot use it
#### Using with Flexvolume Plugin
Check if the volume plugin directory has been set correctly. It is detected automatically unless the user has set it explicitly.
By default, Kubernetes uses `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`, as stated in the [official document](https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md#prerequisites).
Some vendors choose to change the directory for various reasons. For example, GKE uses `/home/kubernetes/flexvolume` instead.
You can find the correct directory by running `ps aux | grep kubelet` on the host and checking the `--volume-plugin-dir` parameter. If it is not set, the default `/usr/libexec/kubernetes/kubelet-plugins/volume/exec/` is used.
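As a sketch, the flag can be pulled out of the kubelet command line with a small pipeline. The sample command line below is illustrative, not taken from a real node:

```shell
# Illustrative kubelet command line (an assumption, not real node output).
kubelet_cmdline='/usr/bin/kubelet --v=2 --volume-plugin-dir=/home/kubernetes/flexvolume'

# Extract the value of --volume-plugin-dir, if present.
echo "$kubelet_cmdline" | grep -o -- '--volume-plugin-dir=[^ ]*' | cut -d= -f2
```

On a real host, feed the output of `ps aux | grep kubelet` through the same `grep`/`cut` pipeline.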
## Troubleshooting guide
Longhorn has a few components: Manager, Engine, Driver, and UI. By default, all of these components run as pods in the `longhorn-system` namespace inside the Kubernetes cluster.
Most of the logs are included in the Support Bundle. You can click the Generate Support Bundle link at the bottom of the UI to download a zip file that contains the Longhorn-related configuration and logs.
One exception is `dmesg`, which needs to be retrieved by the user on each node.
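For example, one way to collect it is to dump `dmesg` into a per-node file. The naming scheme below is just a suggestion, and `dmesg` itself may require root:

```shell
# Build a file name that identifies the node (naming is an assumption).
out="dmesg-$(hostname).log"
echo "$out"

# Then, on each node, as root:
#   dmesg > "$out"
```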
### UI
Making use of the Longhorn UI is a good start for troubleshooting. For example, if Kubernetes cannot mount one volume correctly, after stopping the workload, try to attach and mount that volume manually on one node and access the content to check whether the volume is intact.
Also, the event logs in the UI dashboard provide information about possible issues. Check for event logs at the `Warning` level.
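The same events are visible from the command line; on a live cluster you could run `kubectl get events -n longhorn-system --field-selector type=Warning`. As a self-contained sketch, the filtering step looks like this on a sample events listing (the rows below are fabricated for illustration):

```shell
# Sample `kubectl get events` output (fabricated for illustration).
events='LAST SEEN   TYPE      REASON      OBJECT
2m          Normal    Pulled      pod/longhorn-manager-abcde
1m          Warning   Unhealthy   pod/instance-manager-e-xxxx'

# Keep only rows whose TYPE column is Warning.
echo "$events" | awk '$2 == "Warning"'
```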
### Manager and engines
You can get logs from the Longhorn Manager and Engines to help with troubleshooting. The most useful logs are those of `longhorn-manager-xxx` and the logs inside the Longhorn instance managers, e.g. `instance-manager-e-xxxx` and `instance-manager-r-xxxx`.
Since multiple Longhorn Managers normally run at the same time, we recommend [kubetail](https://github.com/johanhaleby/kubetail), a great tool for keeping track of the logs of multiple pods. You can use:
```
kubetail longhorn-manager -n longhorn-system
```
to track the manager logs in real time.
### CSI driver
For the CSI driver, check the logs of `csi-attacher-0` and `csi-provisioner-0`, as well as the containers in `longhorn-csi-plugin-xxx`.
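The pod-name suffixes above are placeholders. A sketch that prints the log commands to run for each of these pods (assuming `kubectl` and the default `longhorn-system` namespace):

```shell
# Print the log command for each CSI pod (the -xxx suffix is a placeholder;
# substitute the real pod name from `kubectl get pods -n longhorn-system`).
for pod in csi-attacher-0 csi-provisioner-0 longhorn-csi-plugin-xxx; do
  echo "kubectl logs -n longhorn-system $pod --all-containers"
done
```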
### Flexvolume driver
For the Flexvolume driver, first check where the driver has been installed on the node; the log of `longhorn-driver-deployer-xxxx` contains that information.
Then check the kubelet logs. The Flexvolume driver itself doesn't run inside a container; it runs alongside the kubelet process.
If kubelet is running natively on the node, you can use the following command to get the log:
```
journalctl -u kubelet
```
Or if kubelet is running as a container (e.g. in RKE), use the following command instead:
```
docker logs kubelet
```
For even more detailed logs from the Longhorn Flexvolume driver, run the following command on the node, or inside the container if kubelet is running as a container (e.g. in RKE):
```
touch /var/log/longhorn_driver.log
```
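Once the file exists, the driver should write its verbose logs there, and you can follow them with `tail`. The sketch below uses a temporary path so it is runnable anywhere; on a node the path is `/var/log/longhorn_driver.log`:

```shell
# Temp path stands in for /var/log/longhorn_driver.log on a real node.
log=/tmp/longhorn_driver.log
touch "$log"

# Show the most recent entries; on a node, use `tail -f` to stream new ones.
tail -n 20 "$log"
```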