fix some typo on doc

Signed-off-by: tgfree <tgfree7@gmail.com>
tgfree 2022-06-17 16:57:08 +08:00 committed by David Ko
parent 30c7eab049
commit 1e8dd33559
17 changed files with 34 additions and 34 deletions

View File

@ -31,7 +31,7 @@ The latest release of Longhorn is [![Releases](https://img.shields.io/github/rel
## Release Status
| Release | Version | Type |
| --------|---------|----------------|
|---------|---------|--------|
| 1.3 | 1.3.0 | Latest |
| 1.2 | 1.2.4 | Stable |
| 1.1 | 1.1.3 | Stable |

View File

@ -15,7 +15,7 @@ https://github.com/longhorn/longhorn/issues/972
1. Previously, Longhorn used the filesystem ID as the key to the map of disks on the node. But we found there is no guarantee that the filesystem ID won't change after the node reboots for certain filesystems, e.g. XFS.
1. We want to enable the ability to configure the CRD directly, to prepare for CRD-based API access in the future (a rough sketch follows this list)
1. We also need to make sure previously implemented safeguards are not impacted by this change:
1. If a disk was accidentally umounted on the node, we should detect that and stop replica from scheduling into it.
1. If a disk was accidentally unmounted on the node, we should detect that and stop replica from scheduling into it.
1. We shouldn't allow the user to add two disks pointing to the same filesystem
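As a rough sketch of the direction (the key, apiVersion, and field names below are illustrative assumptions, not the final schema), the disks map on the node CR could be keyed by a stable, user-facing name instead of the filesystem ID:
```yaml
apiVersion: longhorn.io/v1beta1      # assumed API group/version
kind: Node
metadata:
  name: worker-1
  namespace: longhorn-system
spec:
  disks:
    disk-ssd-1:                      # stable key chosen by the user, not the filesystem ID
      path: /var/lib/longhorn        # mount point backing this disk
      allowScheduling: true
      storageReserved: 10737418240   # bytes reserved for non-Longhorn usage
```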
### Non-goals

View File

@ -75,4 +75,4 @@ No special upgrade strategy is necessary. Once the user upgrades to the new vers
### Notes
- There is interest in allowing the user to decide on whether or not to retain the `Persistent Volume` (and possibly `Persistent Volume Claim`) for certain use cases such as restoring from a `Backup`. However, this would require changes to the way `go-rancher` generates the `Go` client that we use so that `Delete` requests against resources are able to take inputs.
- In the case that a `Volume` is provisioned from a `Storage Class` (and set to be `Deleted` once the `Persistent Volume Claim` utilizing that `Volume` has been deleted), the `Volume` should still be deleted properly regardless of how the deletion was initiated. If the `Volume` is deleted from the UI, the call that the `Volume Controller` makes to delete the `Persistent Volume` would only trigger one more deletion call from the `CSI` server to delete the `Volume`, which would return successfully and allow the `Persistent Volume` to be deleted and the `Volume` to be deleted as wekk. If the `Volume` is deleted because of the `Persistent Volume Claim`, the `CSI` server would be able to successfully make a `Volume` deletion call before deleting the `Persistent Volume`. The `Volume Controller` would have no additional resources to delete and be able to finish deletion of the `Volume`.
- In the case that a `Volume` is provisioned from a `Storage Class` (and set to be `Deleted` once the `Persistent Volume Claim` utilizing that `Volume` has been deleted), the `Volume` should still be deleted properly regardless of how the deletion was initiated. If the `Volume` is deleted from the UI, the call that the `Volume Controller` makes to delete the `Persistent Volume` would only trigger one more deletion call from the `CSI` server to delete the `Volume`, which would return successfully and allow the `Persistent Volume` to be deleted and the `Volume` to be deleted as well. If the `Volume` is deleted because of the `Persistent Volume Claim`, the `CSI` server would be able to successfully make a `Volume` deletion call before deleting the `Persistent Volume`. The `Volume Controller` would have no additional resources to delete and be able to finish deletion of the `Volume`.

View File

@ -16,7 +16,7 @@ https://github.com/longhorn/longhorn/issues/298
## Proposal
1. Add `Eviction Requested` with `true` and `false` selection buttons for disks and nodes. This is for the user to evict or cancel the eviction of the disks or the nodes.
2. Add new `evictionRequested` field to `Node.Spec`, `Node.Spec.disks` Spec and `Replica.Status`. These will help tracking the request from user and trigger replica controller to update `Replica.Status` and volume controler to do the eviction. And this will reconcile with `scheduledReplica` of selected disks on the nodes.
2. Add new `evictionRequested` field to `Node.Spec`, `Node.Spec.disks` Spec and `Replica.Status`. These will help tracking the request from user and trigger replica controller to update `Replica.Status` and volume controller to do the eviction. And this will reconcile with `scheduledReplica` of selected disks on the nodes.
3. Display a `fail to evict` error message on the `Dashboard` and any other eviction errors in the `Event log` (a sketch of the new fields follows this list).
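A minimal sketch of where the proposed `evictionRequested` field would live, assuming the shape described in item 2 (the exact schema is illustrative):
```yaml
# Node CR fragment: eviction can be requested for the whole node or a single disk.
spec:
  evictionRequested: true          # node-level eviction request
  disks:
    disk-1:
      path: /var/lib/longhorn
      evictionRequested: false     # disk-level eviction request
---
# Replica CR fragment: the replica controller mirrors the request into status.
status:
  evictionRequested: true
```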
### User Stories
@ -47,7 +47,7 @@ From an API perspective, the call to set `Eviction Requested` to `true` or `fals
### Implementation Overview
1. On the `Longhorn UI` `Node` page, for node eviction, add `Eviction Requested` `true` and `false` options in the `Edit Node` sub-selection, next to `Node Scheduling`. For disk eviction, add `Eviction Requested` `true` and `false` options in the `Edit node and disks` sub-selection under the `Operation` column next to each disk's `Scheduling` options. This is for the user to evict or cancel the eviction of the disks or the nodes.
2. Add new `evictionRequested` field to `Node.Spec`, `Node.Spec.disks` Spec and `Replica.Status`. These will help tracking the request from user and trigger replica controller to update `Replica.Status` and volume controler to do the eviction. And this will reconcile with `scheduledReplica` of selected disks on the nodes.
2. Add new `evictionRequested` field to `Node.Spec`, `Node.Spec.disks` Spec and `Replica.Status`. These will help tracking the request from user and trigger replica controller to update `Replica.Status` and volume controller to do the eviction. And this will reconcile with `scheduledReplica` of selected disks on the nodes.
3. Add an informer in the `Replica Controller` to get this information and update the `evictionRequested` field in `Replica.Status`.
4. Once `Eviction Requested` has been set to `true` for disks or nodes, the `evictionRequested` fields for those disks and nodes will be set to `true` (the default is `false`).
5. The `Replica Controller` will update the `evictionRequested` field in `Replica.Status`, and the `Volume Controller` will get this information from its replicas.
@ -61,7 +61,7 @@ From an API perspective, the call to set `Eviction Requested` to `true` or `fals
#### Manual Test Plan For Disks and Nodes Eviction
Positive Case:
For both `Replica Node Level Soft Anti-Affinity` has been enabled and disabled. Also the volume can be 'Attaced' or 'Detached'.
For both `Replica Node Level Soft Anti-Affinity` has been enabled and disabled. Also the volume can be 'Attached' or 'Detached'.
1. User can select one or more disks or nodes for eviction. Set `Eviction Requested` to `true` on the disabled disks or nodes; Longhorn should start rebuilding replicas for the volumes which have replicas on the evicted disks or nodes, and after the rebuild succeeds, the replica number on the evicted disks or nodes should be 0. E.g. when there are 3 nodes in the cluster, and `Replica Node Level Soft Anti-Affinity` is set to `false`, disable one node and create a volume with replica count 2. Then evict one of them; the eviction should get stuck. Then set `Replica Node Level Soft Anti-Affinity` to `true`, and the eviction should go through.
Negative Cases:
@ -73,10 +73,10 @@ For `Replica Node Level Soft Anti-Affinity` is enabled, create 2 replicas on the
For `Replica Node Level Soft Anti-Affinity` is disabled, create 1 replica on a disk, and evict this disk or node; the replica should go to the other disk or node.
For node eviction, Longhorn will process the evition based on the disks for the node, this is like disk eviction. After eviction success, the replica number on the evicted node should be 0.
For node eviction, Longhorn will process the eviction based on the disks for the node, this is like disk eviction. After eviction success, the replica number on the evicted node should be 0.
#### Error Indication
During the eviction, user can click the `Replicas Number` on the `Node` page, and set which replicas are left from eviction, and click the `Replica Name` will redirect user to the `Volume` page to set if there is any error for this volume. If there is any error during the rebuild, Longhorn should display the error message from UI. The error could be `failed to schedule a replica` due to disk space or based on schedule policy, can not find a valid disk to put the relica.
During the eviction, user can click the `Replicas Number` on the `Node` page, and set which replicas are left from eviction, and click the `Replica Name` will redirect user to the `Volume` page to set if there is any error for this volume. If there is any error during the rebuild, Longhorn should display the error message from UI. The error could be `failed to schedule a replica` due to disk space or based on schedule policy, can not find a valid disk to put the replica.
### Upgrade strategy
No special upgrade strategy is necessary. Once the user upgrades to the new version of `Longhorn`, these new capabilities will be accessible from the `longhorn-ui` without any special work.

View File

@ -61,12 +61,12 @@ Same as the Design
### Test plan
1. Set up a cluster of 3 nodes
1. Install Longhorn and set `Default Replica Count = 2` (because we will turn off one node)
1. Create a SetfullSet with 2 pods using the command:
1. Create a StatefulSet with 2 pods using the command:
```
kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/statefulset.yaml
```
1. Create a volume + pv + pvc named `vol1` and create a deployment of a default ubuntu image named `shell` with pvc `vol1` mounted under `/mnt/vol1` (see the sketch after this list)
1. Find the node which contains one pod of the StatefullSet/Deployment. Power off the node
1. Find the node which contains one pod of the StatefulSet/Deployment. Power off the node
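A minimal sketch of the `shell` deployment used in the step above, assuming the PVC `vol1` already exists (image and label names are only illustrative):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shell
spec:
  replicas: 1
  selector:
    matchLabels:
      app: shell
  template:
    metadata:
      labels:
        app: shell
    spec:
      containers:
      - name: shell
        image: ubuntu:20.04
        command: ["sleep", "infinity"]   # keep the pod running for manual checks
        volumeMounts:
        - name: vol1
          mountPath: /mnt/vol1
      volumes:
      - name: vol1
        persistentVolumeClaim:
          claimName: vol1
```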
#### StatefulSet
##### if `NodeDownPodDeletionPolicy ` is set to `do-nothing ` | `delete-deployment-pod`

View File

@ -119,7 +119,7 @@ UI modification:
* On the right volume info panel, add a <div> to display `selectedVolume.dataLocality`
* On the right volume panel, in the Health row, add an icon for data locality status.
Specifically, if `dataLocality=best-effort` but there is no local replica, then display a warning icon.
Similar to the replica node redundancy wanring [here](https://github.com/longhorn/longhorn-ui/blob/0a52c1f0bef172d8ececdf4e1e953bfe78c86f29/src/routes/volume/detail/VolumeInfo.js#L47)
Similar to the replica node redundancy warning [here](https://github.com/longhorn/longhorn-ui/blob/0a52c1f0bef172d8ececdf4e1e953bfe78c86f29/src/routes/volume/detail/VolumeInfo.js#L47)
* In the volume's actions dropdown, add a new action to update `dataLocality`
1. In the Rancher UI, add a parameter `dataLocality` when creating a storage class using the Longhorn provisioner (a minimal sketch follows).
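For reference, a minimal StorageClass sketch carrying the `dataLocality` parameter (the provisioner name and other parameters are assumptions for illustration):
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-best-effort
provisioner: driver.longhorn.io     # assumed Longhorn CSI provisioner name
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"       # ask Longhorn to keep a replica local to the workload node
```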

View File

@ -15,7 +15,7 @@ https://github.com/longhorn/longhorn/issues/508
1. By default 'DisableRevisionCounter' is 'false', but Longhorn provides an option for the user to disable it.
2. Once the user sets 'DisableRevisionCounter' to 'true' globally or individually, this will improve Longhorn data path performance.
3. And with 'DisableRevisionCounter' set to 'true', Longhorn will keep the ability to find the most suitable replica to recover the volume when the engine is faulted (all the replicas are in 'ERR' state).
4. Also during Longhorn Engine starting, with head file information it's unlikly to find out out of synced replicas. So will skip the check.
4. Also during Longhorn Engine starting, with head file information it's unlikely to find out out of synced replicas. So will skip the check.
## Proposal
@ -41,7 +41,7 @@ Or from StorageClass yaml file, user can set 'parameters' 'revisionCounterDisabl
User can also set 'DisableRevisionCounter' for each individual volume created by the Longhorn UI; this individual setting will override the global setting.
Once the volume has 'DisableRevisionCounter' to 'true', there won't be revision counter file. And the 'Automatic salvage' is 'true', when the engine is fauled, the engine will pick the most suitable replica as 'Source of Truth' to recover the volume.
Once the volume has 'DisableRevisionCounter' to 'true', there won't be revision counter file. And the 'Automatic salvage' is 'true', when the engine is faulted, the engine will pick the most suitable replica as 'Source of Truth' to recover the volume.
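A minimal StorageClass sketch with the per-volume override mentioned above (parameter values must be strings; the other fields are illustrative):
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-no-revision-counter
provisioner: driver.longhorn.io        # assumed Longhorn CSI provisioner name
parameters:
  numberOfReplicas: "3"
  revisionCounterDisabled: "true"      # overrides the global 'DisableRevisionCounter' setting
```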
### API changes
@ -63,12 +63,12 @@ And for the API compatibility issues, always check the 'EngineImage.Statue.cliAP
1. Add 'Volume.Spec.RevisionCounterDisabled', 'Replica.Spec.RevisionCounterDisabled' and 'Engine.Spec.RevisionCounterDisabled' to volume, replica and engine objects.
2. Once 'RevisionCounterDisabled' is 'true', the volume controller will set 'Volume.Spec.RevisionCounterDisabled' to true, and 'Replica.Spec.RevisionCounterDisabled' and 'Engine.Spec.RevisionCounterDisabled' will be set to true. During 'ReplicaProcessCreate' and 'EngineProcessCreate', this will be passed to the engine replica process and engine controller process to start a replica and controller without the revision counter (see the sketch after this list).
3. During 'ReplicaProcessCreate' and 'EngineProcessCreate', if 'Replica.Spec.RevisionCounterDisabled' or 'Engine.Spec.RevisionCounterDisabled' is true, it will pass extra parameter to engine replica to start replica without revision counter or to engine controller to start controller without revision counter support, otherwise keep it the same as current and engine replica will use the default value 'false' for this extra paramter. This is the same as the engine controller to set the 'salvageRequested' flag.
3. During 'ReplicaProcessCreate' and 'EngineProcessCreate', if 'Replica.Spec.RevisionCounterDisabled' or 'Engine.Spec.RevisionCounterDisabled' is true, it will pass extra parameter to engine replica to start replica without revision counter or to engine controller to start controller without revision counter support, otherwise keep it the same as current and engine replica will use the default value 'false' for this extra parameter. This is the same as the engine controller to set the 'salvageRequested' flag.
4. Add 'RevisionCounterDisabled' in 'ReplicaInfo'; when the engine controller starts, it will get all replica information.
4. For engine controlloer starting cases:
4. For engine controller starting cases:
- If revision counter is not disabled, stay with the current logic.
- If revision counter is disabled, engine will not check the synchronization of the replicas.
- If unexpected case (engine controller has revision counter diabled but any of the replica doesn't, or engine controller has revision counter enabled, but any of the replica doesn't), engine controller will log this as error and mark unmatched replicas to 'ERR'.
- If unexpected case (engine controller has revision counter disabled but any of the replica doesn't, or engine controller has revision counter enabled, but any of the replica doesn't), engine controller will log this as error and mark unmatched replicas to 'ERR'.
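A minimal sketch of the idea in steps 2-3, assuming a hypothetical helper that builds the replica process arguments (the helper, package name, and the flag name `--disableRevisionCounter` are assumptions, not the actual Longhorn code):
```go
package engineapi

import "strconv"

// buildReplicaProcessArgs appends the extra argument only when the revision
// counter is disabled; otherwise the argument list stays as it is today.
func buildReplicaProcessArgs(dataPath string, size int64, revisionCounterDisabled bool) []string {
	args := []string{"replica", dataPath, "--size", strconv.FormatInt(size, 10)}
	if revisionCounterDisabled {
		// Hypothetical flag: tell the engine replica process to skip the revision counter file.
		args = append(args, "--disableRevisionCounter")
	}
	return args
}
```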
#### Add New Logic for Salvage

View File

@ -47,7 +47,7 @@ No API change is required.
3. replica eviction happens (volume.Status.Robustness is Healthy)
4. there is no potential reusable replica
5. there is a potential reusable replica but the replica replenishment wait interval has passed.
3. Reuse the failed replica by cleaning up `ReplicaSpec.HealthyAt` and `ReplicaSpec.FailedAt`. And `Replica.Spec.RebuildRetryCount` will be increasd by 1.
3. Reuse the failed replica by cleaning up `ReplicaSpec.HealthyAt` and `ReplicaSpec.FailedAt`. And `Replica.Spec.RebuildRetryCount` will be increased by 1.
4. Clean up the related record in `Replica.Spec.RebuildRetryCount` when the rebuilding replica becomes mode `RW`.
5. Guarantee the reused failed replica will be stopped before re-launching it.

View File

@ -72,7 +72,7 @@ For example, there are many times users ask us for supporting and the problems w
If there is a CPU monitoring dashboard for instance managers, those problems can be quickly detected.
#### Story 2
User want to be notified about abnomal event such as disk space limit approaching.
User want to be notified about abnormal event such as disk space limit approaching.
We can expose metrics that provide information about it, and users can scrape the metrics and set up an alert system.
### User Experience In Detail
@ -82,7 +82,7 @@ Users can use Prometheus or other monitoring systems to collect those metrics by
Then, users can display the collected data using tools such as Grafana.
Users can also set up alerts using tools such as Prometheus Alertmanager.
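As an example of how scraped metrics can drive alerting, a hypothetical Prometheus rule using the volume capacity metric listed below (threshold and label usage are assumptions):
```yaml
groups:
- name: longhorn
  rules:
  - alert: LonghornLargeVolumeProvisioned
    expr: longhorn_volume_capacity_bytes > 1e12   # fires for volumes provisioned above ~1 TB
    for: 5m
    labels:
      severity: info
    annotations:
      summary: "Longhorn volume {{ $labels.volume }} is provisioned above 1 TB"
```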
Below are the desciptions of metrics which Longhorn exposes and how users can use them:
Below are the descriptions of metrics which Longhorn exposes and how users can use them:
1. longhorn_volume_capacity_bytes
@ -347,7 +347,7 @@ We add a new end point `/metrics` to exposes all longhorn Prometheus metrics.
### Implementation Overview
We follow the [Prometheus best practice](https://prometheus.io/docs/instrumenting/writing_exporters/#deployment): each Longhorn manager reports information about the components it manages.
Prometheus can use service discovery mechanisim to find all longhorn-manager pods in longhorn-backend service.
Prometheus can use service discovery mechanism to find all longhorn-manager pods in longhorn-backend service.
We create a new collector for each type (volumeCollector, backupCollector, nodeCollector, etc..) and have a common baseCollector.
This structure is similar to the controller package: we have volumeController, nodeController, etc.. which have a common baseController.
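A minimal sketch of that structure using the Prometheus Go client (struct layout and metric details are illustrative assumptions, not the actual longhorn-manager code):
```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// baseCollector holds fields every collector shares, e.g. the node this manager runs on.
type baseCollector struct {
	currentNodeID string
}

// volumeCollector is one per-type collector built on top of baseCollector.
type volumeCollector struct {
	baseCollector
	capacityDesc *prometheus.Desc
}

func newVolumeCollector(nodeID string) *volumeCollector {
	return &volumeCollector{
		baseCollector: baseCollector{currentNodeID: nodeID},
		capacityDesc: prometheus.NewDesc(
			"longhorn_volume_capacity_bytes",
			"Configured size of this volume in bytes",
			[]string{"volume", "node"}, nil,
		),
	}
}

func (vc *volumeCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- vc.capacityDesc
}

func (vc *volumeCollector) Collect(ch chan<- prometheus.Metric) {
	// The real manager would list the volumes owned by this node here;
	// a single hard-coded sample stands in for that lookup.
	ch <- prometheus.MustNewConstMetric(
		vc.capacityDesc, prometheus.GaugeValue, 10*1024*1024*1024, "vol1", vc.currentNodeID,
	)
}
```
The collector would then be registered once per manager, e.g. `prometheus.MustRegister(newVolumeCollector(nodeID))`, and served on the `/metrics` endpoint mentioned above.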

View File

@ -45,7 +45,7 @@ For part 2, we upgrade engine image for a volume when the following conditions a
### User Stories
Before this enhancement, users have to manually upgrade engine images for volumes after upgrading the Longhorn system to a newer version.
If there are thoudsands of volumes in the system, this is a significant manual work.
If there are thousands of volumes in the system, this is a significant manual work.
After this enhancement, users either have to do nothing (in case live upgrade is possible),
or they only have to scale down/up the workload (in case there is a new default IM image).

View File

@ -70,7 +70,7 @@ spec:
url: https://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
```
Afterwards deploy the `cirros-rwx-blk.yaml` to create a live migratabale virtual machine.
Afterwards deploy the `cirros-rwx-blk.yaml` to create a live migratable virtual machine.
```yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine

View File

@ -155,14 +155,14 @@ With an example of cluster set for 2 zones and default of 2 replicas volume:
- The default value is `ignored`.
- In Volume Controller `syncVolume` -> `ReconcileEngineReplicaState` -> `replenishReplicas`, calculate and add the number of replicas to be rebalanced to `replenishCount`.
> The logic ignores all `soft-anti-affinity` settings. This will always try to achieve zone balance then node balance. And creating for replicas will leave for ReplicaScheduler to determine for the canidates.
> The logic ignores all `soft-anti-affinity` settings. This will always try to achieve zone balance then node balance. And creating for replicas will leave for ReplicaScheduler to determine for the candidates.
1. Skip volume replica rebalance when volume spec `replicaAutoBalance` is `disabled`.
2. Skip if volume `Robustness` is not `healthy`.
3. For `least-effort`, try to get the replica rebalance count.
1. For `zone` duplicates, get the replenish number.
1. List all the occupied node zones with volume replicas running.
- The zone is balanced when this is equal to volume spec `NumberOfReplicas`.
2. List all available and schedulabled nodes in non-occupied zones.
2. List all available and schedulable nodes in non-occupied zones.
- The zone is balanced when no available nodes are found.
3. Get the number of replicas off-balanced:
- number of replicas in volume spec - number of occupied node zones.
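As a worked example of the formula above: a volume with `NumberOfReplicas: 3` whose replicas currently run in only 2 zones has 3 - 2 = 1 replica off-balanced, so 1 replica is added to `replenishCount` and the scheduler is expected to place it in a schedulable, non-occupied zone.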

View File

@ -354,7 +354,7 @@ Labels
[labels/2]: [b]
```
- `Name` field should be immutable.
- `Task` field should be imuutable.
- `Task` field should be immutable.
*And* the user edits the fields in the form.

View File

@ -337,7 +337,7 @@ After the enhancement, users can directly specify the BackingImage during volume
- BackingImageDataSource has not been created. Adding a retry would solve this case.
- BackingImageDataSource is gone but BackingImage has not been cleaned up. Longhorn can ignore BackingImageDataSource when BackingImage deletion timestamp is set.
- BackingImage disk cleanup:
- This cannot break the HA besides affacting replicas. The main idea is similar to the cleanup in BackingImage Controller.
- This cannot break the HA besides attaching replicas. The main idea is similar to the cleanup in BackingImage Controller.
9. In CSI:
- Check the backing image during the volume creation.
- The missing BackingImage will be created when both BackingImage name and data source info are provided.
@ -370,7 +370,7 @@ After the enhancement, users can directly specify the BackingImage during volume
- Similar to `Fetch`, the image will try to reuse existing files.
- The manager is responsible for managing all ports. The image will use the functions provided by the manager to get and then release ports.
- API `Send`: Send a backing image file to a receiver. This should be similar to replica rebuilding.
- API `Delete`: Unregister the image then delete the imge work directory. Make sure syncing or pulling will be cancelled if exists.
- API `Delete`: Unregister the image then delete the image work directory. Make sure syncing or pulling will be cancelled if exists.
- API `Get`/`List`: Collect the status of one backing image file/all backing image files.
- API `Watch`: establish a streaming connection to report BackingImage file info.
- As I mentioned above, we will use BackingImage UUID to generate work directories for each BackingImage. The work directory is like:

View File

@ -190,7 +190,7 @@ Using those methods, the Sparse-tools know where is a data/hole interval to tran
### Longhorn CSI plugin
* Advertise that Longhorn CSI driver has ability to clone a volume, `csi.ControllerServiceCapability_RPC_CLONE_VOLUME`
* When receiving a volume creat request, inspect `req.GetVolumeContentSource()` to see if it is from anther volume.
* When receiving a volume creat request, inspect `req.GetVolumeContentSource()` to see if it is from another volume.
If so, create a new Longhorn volume with appropriate `DataSource` set so Longhorn volume controller can start cloning later on.
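A minimal sketch of that inspection in the `CreateVolume` handler (the helper and package name are assumptions; the accessors used are from the CSI spec Go library):
```go
package csiplugin

import "github.com/container-storage-interface/spec/lib/go/csi"

// sourceVolumeID reports whether a CreateVolume request is a clone request
// and, if so, which existing volume it clones from.
func sourceVolumeID(req *csi.CreateVolumeRequest) (string, bool) {
	src := req.GetVolumeContentSource()
	if src == nil {
		return "", false // ordinary empty-volume creation
	}
	if vol := src.GetVolume(); vol != nil {
		// The new Longhorn volume would carry a DataSource pointing at this ID
		// so the volume controller can start the clone later on.
		return vol.GetVolumeId(), true
	}
	return "", false // e.g. a snapshot source, handled elsewhere
}
```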
### Test plan

View File

@ -66,7 +66,7 @@ After the enhancement, Longhorn automatically finds out the orphaned replica dir
- Users can enable the global auto-deletion on the setting page. By default, the auto-deletion is disabled.
- Via `kubectl`
- Users can list the orphaned replica directoris by `kubectl -n longhorn-system get orphans`.
- Users can list the orphaned replica directories by `kubectl -n longhorn-system get orphans`.
- Users can delete the orphaned replica directories by `kubectl -n longhorn-system delete orphan <name>`.
- Users can enable the global auto-deletion by `kubectl -n longhorn-system edit settings orphan-auto-deletion`

View File

@ -29,7 +29,7 @@ What is out of scope for this enhancement? Listing non-goals helps to focus disc
This is where we get down to the nitty gritty of what the proposal actually is.
### User Stories
Detail the things that people will be able to do if this enhancement is implemented. A good practise is including a comparsion of what user cannot do before the enhancement implemented, why user would want an enhancement and what user need to do after, to make it clear why the enhancement beneficial to the user.
Detail the things that people will be able to do if this enhancement is implemented. A good practise is including a comparison of what user cannot do before the enhancement implemented, why user would want an enhancement and what user need to do after, to make it clear why the enhancement beneficial to the user.
The experience details should be in the `User Experience In Detail` section later.