Compare commits

229 Commits

Author SHA1 Message Date
Arthur
097791a380 OKD / OCP 4.14
Signed-off-by: Arthur <arthur@arthurvardevanyan.com>
2023-10-31 21:59:14 +08:00
David Ko
548aa65973
Update README.md 2023-10-26 22:54:29 +08:00
David Ko
c8bf012b13
Update README.md 2023-10-26 22:53:44 +08:00
arlan lloyd
febfa7eef7 add conditional
Signed-off-by: arlan lloyd <arlanlloyd@gmail.com>
2023-10-23 16:15:00 +08:00
Jongwoo Han
15fae1ba47 Replace deprecated command with environment file
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2023-10-20 00:01:51 +08:00
Derek Su
a04760a08b example: add an example for encrypted volume in block mode
Longhorn 4883

Signed-off-by: Derek Su <derek.su@suse.com>
2023-10-19 22:57:55 +08:00
Phan Le
78fee8e05b Fix bug: check script fails to perform all checks
When piping the script to bash (cat ./environment_check.sh | bash), the
part after `kubectl exec -i` is interpreted as the input for the
command inside the kubectl exec command. As a result, the env check script
doesn't perform the steps after that kubectl exec command. Removing the
`-i` flag fixed the issue.

Also, replace `kubectl exec -t` with `kubectl exec` because the input of
the kubectl exec command is not a terminal device.

longhorn-5653

Signed-off-by: Phan Le <phan.le@suse.com>
2023-10-19 21:52:26 +08:00
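
Note on the commit above: the stdin behaviour it describes can be reproduced without a cluster. The following is a minimal sketch, assuming `bash` is installed; `cat` is used here as a hypothetical stand-in for `kubectl exec -i` (any command that reads standard input behaves the same way), and none of this is taken from environment_check.sh itself.

```shell
#!/bin/sh
# Illustration only: `cat` stands in for `kubectl exec -i`, which reads stdin.

# Case 1: the stand-in reads stdin, so it swallows the rest of the piped
# script before bash can read it.
with_stdin=$(printf 'echo before\ncat >/dev/null\necho after\n' | bash)

# Case 2: stdin is redirected away from the stand-in (analogous to dropping
# `-i`), so bash still reads and runs the remaining lines.
without_stdin=$(printf 'echo before\ncat </dev/null >/dev/null\necho after\n' | bash)

# Join the captured lines with spaces for compact display.
echo "stand-in reads stdin:    $(echo $with_stdin)"
echo "stand-in ignores stdin:  $(echo $without_stdin)"
```

In the first case only `before` should appear, because the remaining script text was consumed as input by the stand-in command; in the second case both `before` and `after` should appear.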
Phan Le
d30a970ea8 Add kernel release check to environment_check.sh
longhorn-6854

Signed-off-by: Phan Le <phan.le@suse.com>
2023-10-11 15:23:49 -07:00
James Lu
8c6a3f5142 fix(typo): codespell tested failed
Fixed some typos that codespell found.

Signed-off-by: James Lu <james.lu@suse.com>
2023-10-05 11:28:27 +08:00
Jack Lin
e1914963a6
doc(chart): add table of helm values (#6639)
Co-authored-by: David Ko <dko@suse.com>
2023-10-04 23:47:50 +08:00
James Munson
c0a258afef Add nfsOptions parameter to sample storageclass.yaml
Signed-off-by: James Munson <james.munson@suse.com>
2023-09-23 00:28:47 +08:00
Derek Su
cb61e92a13 Replace spec.engineImage with spec.Image in volume, replica and engine resources
Longhorn 6647

Signed-off-by: Derek Su <derek.su@suse.com>
2023-09-19 18:32:41 +08:00
David Ko
963ccf68eb
add require/backport to new bug issue 2023-09-14 12:03:16 +08:00
David Ko
c98bef59b8
Add require/backport for new improvement ticket 2023-09-14 12:02:42 +08:00
Eric Weber
9948983b15 Add ReplicaDiskSoftAntiAffinity setting to upgrade responder
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-09-06 14:42:45 -07:00
Eric Weber
dd3f5584f6 Add ReplicaDiskSoftAntiAffinity setting
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-09-06 14:42:45 -07:00
David Ko
8615cfc8d9
Update bug.md 2023-09-05 18:15:35 +08:00
David Ko
f6ef492a1d
Update bug.md 2023-09-05 18:14:56 +08:00
Arthur Vardevanyan
67b4b38a12
OCP / OKD Documentation and Helm Support (#5004)
* Add OCP support to helm chart.

Signed-off-by: Bin Guo <bin.guo@casa-systems.com>
Signed-off-by: Arthur <arthur@arthurvardevanyan.com>
Co-authored-by: Bin Guo <bin.guo@casa-systems.com>
Co-authored-by: David Ko <dko@suse.com>
Co-authored-by: binguo-casa <70552879+binguo-casa@users.noreply.github.com>
2023-08-24 19:12:38 +08:00
Chris
39187d64d5 add cifs backupstore manifest
ref: 6530

Signed-off-by: Chris <chris.chien@suse.com>
2023-08-24 18:52:04 +08:00
Eric Weber
ad20475f11 Clarify flag names and integration test behavior
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-08-23 14:08:14 +08:00
Eric Weber
b76d853800 Modify Engine Identity Validation LEP for changes to longhorn-engine PR
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-08-23 14:08:14 +08:00
Chin-Ya Huang
914fb89687 feat(support-bundle): version bump
ref: 6544

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-08-22 14:23:00 +08:00
davidko
b5379ad6b7 chore(ghaction): create backport issues if actor is member
Signed-off-by: davidko <dko@suse.com>
2023-08-22 09:30:26 +08:00
Chin-Ya Huang
3b04fa8c02 feat(lep): engine upgrade enforcement design
ref: 5842

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-08-14 20:25:47 +08:00
Damiano Cipriani
23e2f299b8 fix(lep): update longhorn-spdk-engine assignments
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
2023-08-12 00:31:02 +08:00
Damiano Cipriani
e6d8d83c96 fix(lep): update SPDK Control Plane diagram
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
2023-08-12 00:31:02 +08:00
Damiano Cipriani
e689e0da09 fix(lep): SPDK engine codespell errors
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
2023-08-12 00:31:02 +08:00
Damiano Cipriani
9938548dda fix(lep): update SPDK engine with comments
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
2023-08-12 00:31:02 +08:00
Damiano Cipriani
f0df91d31f feat(lep): reimplement longhorn-engine with SPDK
Signed-off-by: Damiano Cipriani <damiano.cipriani@suse.com>
2023-08-12 00:31:02 +08:00
Shuo Wu
dbee4f9d6e enhancement: Add a new enhancement 'spdk-engine'
Longhorn 5406

Signed-off-by: Shuo Wu <shuo.wu@suse.com>
2023-08-11 23:58:39 +08:00
Derek Su
c760abf0ca feat(lep): add SPDK volume support
Longhorn 5778
Longhorn 5711
Longhorn 5744
Longhorn 5827

Signed-off-by: Derek Su <derek.su@suse.com>
2023-08-11 23:48:16 +08:00
Jack Lin
ac30a7e5ea lep(backingimage): backingimage backup support
ref: longhorn/longhorn 4165

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-08-11 18:25:16 +08:00
Jack Lin
8146f37681 feat(setting): add allow empty disk node selector volume setting
ref: longhorn/longhorn 4826

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-08-08 17:15:20 +08:00
davidko
7e3e61b76b chore(chart): update version to 1.6.0-dev
Signed-off-by: davidko <dko@suse.com>
2023-08-08 16:28:30 +08:00
Eric Weber
eb3e413c6a Add LEP for disk anti-affinity
Longhorn 3823

Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-08-07 22:43:19 +08:00
Yarden Shoham
339e501042 chart: Update settings based on the instance managers consolidation
- Add the setting added in https://github.com/longhorn/longhorn-manager/pull/1731 in the helm chart
- Related to https://github.com/longhorn/longhorn/issues/5208

Signed-off-by: Yarden Shoham <git@yardenshoham.com>
2023-08-07 17:18:03 +08:00
davidko
1e8bd45c63 chore(ghaction): fix invalid member check
Signed-off-by: davidko <dko@suse.com>
2023-08-03 19:24:20 +08:00
David Ko
519d087a88 chore(ghaction): format
Signed-off-by: David Ko <dko@suse.com>
Signed-off-by: davidko <dko@suse.com>
2023-08-03 17:41:01 +08:00
David Ko
d87927ed85 chore(ghaction): make auto-generated issues only from members
Signed-off-by: David Ko <dko@suse.com>
Signed-off-by: davidko <dko@suse.com>
2023-08-03 17:41:01 +08:00
David Ko
852cf2c3f0
Update README.md 2023-08-03 13:24:56 +08:00
Phan Le
8124f74317 Increase the CPU and RAM limit for the upgrade responder server
Monitoring shows that the server is sometimes killed by the OOM killer,
which causes deep dips in the graphs. As the number of Longhorn nodes
keeps growing, we should increase the server CPU and RAM limits.

Signed-off-by: Phan Le <phan.le@suse.com>
2023-08-02 21:36:25 -07:00
Phan Le
f9794f526a [upgrade-responder] release v1.5.1 and v1.4.3 Longhorn versions
Longhorn-6274
Longhorn-6102

Signed-off-by: Phan Le <phan.le@suse.com>
2023-08-02 21:36:25 -07:00
David Ko
e1f1d3de1b
Update test.md 2023-07-29 19:29:55 +08:00
David Ko
f7767ddc57
Update task.md 2023-07-29 19:28:36 +08:00
David Ko
833399b1d0
Update release.md 2023-07-29 19:27:40 +08:00
David Ko
e2759dae6f
Update refactor.md 2023-07-29 19:26:27 +08:00
David Ko
44f2b978ac
Update refactor.md 2023-07-29 19:26:07 +08:00
David Ko
ffa9824cd2
Update question.md 2023-07-29 19:25:03 +08:00
David Ko
944fcb2da7
Update infra.md 2023-07-29 19:23:34 +08:00
David Ko
0e382353c7
Update improvement.md 2023-07-29 19:22:38 +08:00
David Ko
faa8073f56
Update feature.md 2023-07-29 19:21:58 +08:00
David Ko
902cccd218
Update doc.md 2023-07-29 19:21:15 +08:00
David Ko
17f8382daa
Update bug.md 2023-07-29 19:18:19 +08:00
David Ko
6538e4ba72
Update bug.md 2023-07-29 19:16:50 +08:00
James Munson
a9cee48feb
Fix some small errors on StorageClass NodeSelector. (#6393)
Signed-off-by: James Munson <james.munson@suse.com>
2023-07-27 12:29:08 -07:00
James Munson
396a90c03b
Cleanup (#6243)
Remove obsolete and misleading cleanup script.

Longhorn-6316

Signed-off-by: James Munson <james.munson@suse.com>
2023-07-24 14:43:57 -07:00
David Ko
b01d2c2d18
Update and rename bug_report.md to bug.md 2023-07-20 16:18:20 +08:00
David Ko
2a5e32fc9f
Update bug_report.md 2023-07-20 16:16:24 +08:00
David Ko
8e979dce3b
Update and rename test_infra.md to infra.md 2023-07-20 08:48:05 +08:00
David Ko
cde061aa9b
Update bug_report.md 2023-07-20 08:45:54 +08:00
David Ko
929a1f9dee
Update improvement.md 2023-07-20 08:44:57 +08:00
David Ko
a988d3398d
Update feature.md 2023-07-20 08:44:31 +08:00
David Ko
b43ef9a11e
Create CHANGELOG-1.5.1.md 2023-07-19 19:21:23 +08:00
David Ko
2c9296cc4b
Update README.md 2023-07-19 19:20:28 +08:00
Derek Su
f2c474e636 chore(chart): remove webhooks and recovery-backend
Signed-off-by: Derek Su <derek.su@suse.com>
2023-07-17 12:55:00 +08:00
Austin Heyne
fab23a27aa Add reserve storage percentage in helm chart
- Add the StorageReservedPercentageForDefaultDisk configuration to the
helm chart.

Signed-off-by: Austin Heyne <aheyne@ccri.com>
2023-07-15 20:00:30 +08:00
David Ko
f8420c16c8
Create CHANGELOG-1.4.3.md 2023-07-14 22:28:04 +08:00
David Ko
132eb89bc8
Update README.md 2023-07-14 22:26:51 +08:00
Chin-Ya Huang
a43faae14a chore(support-bundle): version bump
ref: 6256

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-07-12 10:25:07 +08:00
Chin-Ya Huang
7ffd3512be fix(chart): update default setting log level
ref: 6257

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-07-11 16:46:34 +08:00
David Ko
39a724e109
Update CHANGELOG-1.5.0.md 2023-07-10 14:29:56 +08:00
David Ko
07de677d04
Update CHANGELOG-1.5.0.md 2023-07-10 14:28:38 +08:00
David Ko
f625f8d5c3
Create release.md 2023-07-10 14:26:01 +08:00
David Ko
392cd6ddbf
Create CHANGELOG-1.5.0.md 2023-07-07 14:33:40 +08:00
David Ko
63561e4d05
Create CHANGELOG-1.4.2.md 2023-07-07 14:33:12 +08:00
David Ko
33f374def5
Update MAINTAINERS 2023-06-30 12:06:30 +08:00
Derek Su
6b56bb2b72 Fix indent
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-28 11:36:39 +08:00
Derek Su
0d94b6e4cf spdk: help install git before configuring spdk environment
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-28 11:23:00 +08:00
Derek Su
46e1bb2cc3 Highlight CPU usage in v2-data-engine setting
Longhorn 6126

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-28 11:23:00 +08:00
Phan Le
a0879b8167 Add volumeattachments resource to Longhorn ClusterRole
Longhorn-6197

Signed-off-by: Phan Le <phan.le@suse.com>
2023-06-27 12:11:46 +08:00
Chin-Ya Huang
15db0882ae feat(upgrade-responder): support requestSchema in setup script
ref: 5235

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-06-26 16:10:25 +08:00
James Lu
c1d6d93374 fix(deploy): remove error line in nfs backupstore
Remove an extra error line in backupstore/nfs-backupstore.yaml.

Signed-off-by: James Lu <james.lu@suse.com>
2023-06-21 11:25:40 +08:00
David Gaster
a601ecc468 ability to specify platform arch for air gap install
Signed-off-by: David Gaster <dngaster@gmail.com>
2023-06-19 15:40:50 +08:00
David Ko
2cd1af070e Update enhancements/20230616-automatic-offline-replica-rebuild.md 2023-06-19 13:24:15 +08:00
Derek Su
d1e712de90 lep: add automatic-offline-replica-rebuild
Longhorn 6071

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-19 13:24:15 +08:00
Derek Su
27f482bd9b Reduce BackupConcurrentLimit and RestoreConcurrentLimit to 2
Longhorn 6135

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-16 17:30:06 +08:00
Derek Su
1bbefa8132 Update examples
Longhorn 6126

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-15 16:21:29 +08:00
Derek Su
cdc6447b88 Rename BackendStoreDrivers
Longhorn 6126

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-15 16:21:29 +08:00
Derek Su
b8069c547b offline rebuilding/chart: add offline-replica-rebuilding setting
Longhorn 6071

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-13 14:44:22 +08:00
Derek Su
2ae85e8dcb offline rebuilding/chart: update crd.yaml
Longhorn 6071

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-13 14:44:22 +08:00
Derek Su
975239ecc9 spdk: nvme-cli should be equal to or greater than 1.12
go-spdk-helper can support nvme-cli v2.0+.

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-13 14:28:02 +08:00
Eric Weber
34c07f3e5c Add iSCSI SELinux workaround for Fedora-like distributions
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-06-08 14:31:34 +08:00
Derek Su
7cbb97100e spdk: nvme-cli should be between 1.12 and 1.16
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-08 12:31:49 +08:00
Derek Su
a5041e1cf3 spdk: update expected-nr-hugepages to 512 in environment_check.sh
Longhorn 5739

Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-06 12:17:34 +08:00
Derek Su
fa04ba6d29 spdk: use 1024 MiB huge pages by default
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-05 20:30:29 +08:00
Tyler Hawkins
e45a9c04f3 fix: (chart) fix nodeDrainPolicy key
Removing a space between the key and colon.

Signed-off-by: Tyler Hawkins <3319104+tyzbit@users.noreply.github.com>
2023-06-03 06:04:08 +08:00
Eric Weber
b515d93963 Remove longhorn-manager affinity support for now
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-06-02 17:40:37 +08:00
Derek Su
c81ddd6e96 Update settings
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-02 16:48:02 +08:00
James Lu
115edc0551 feat(dr-volume): force activate a dr volume LEP
Activate a dr volume as long as there is a ready replica.

Ref: 1512

Signed-off-by: James Lu <james.lu@suse.com>
2023-06-02 14:55:40 +08:00
Derek Su
9befa479b9 Update settings in values.yaml
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-02 13:50:48 +08:00
Phan Le
985634be7f Remove deprecated allow-node-drain-with-last-healthy-replica setting
longhorn-5620

Signed-off-by: Phan Le <phan.le@suse.com>
2023-06-02 07:46:58 +08:00
Derek Su
f6f0db84be spdk/chart: update chart template and deployment manifest
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-01 23:48:39 +08:00
Derek Su
32eaf99217 spdk/example: add example manifests
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-01 23:48:39 +08:00
Derek Su
3a44ec93c9 spdk: reduce hugepage size to 1024MiB and persist the hugepage setting
Signed-off-by: Derek Su <derek.su@suse.com>
2023-06-01 23:23:05 +08:00
Chin-Ya Huang
ffaa3d2113 chore(cleanup): update YAML
ref: 3289

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-31 10:51:37 +08:00
Chin-Ya Huang
779b7551fa chore(cleanup): update chart
Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-31 10:51:37 +08:00
Chin-Ya Huang
036ea2be75 feat(system-backup): update YAML
ref: 5011

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-30 15:27:34 +08:00
Chin-Ya Huang
5fc22ebca9 feat(system-backup): update chart
ref: 5011

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-30 15:27:34 +08:00
James Lu
7b3b230f47 chore: make backupstores deployment
Make backupstores deployments instead of pods.

Signed-off-by: James Lu <james.lu@suse.com>
2023-05-29 19:52:32 +08:00
James Lu
2a811e282b feat(backupstore): add azurite emulator
Add the Azurite manifest to start Azurite server for local testing

Ref: 1309

Signed-off-by: James Lu <james.lu@suse.com>
2023-05-29 19:52:32 +08:00
Chin-Ya Huang
9681de43de feat(lep): volume backup policy for Longhorn system backup design
ref: 5011

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-29 17:52:13 +08:00
Jack Lin
5a8f33df0f fix: change log level to Debug
ref: longhorn/longhorn 5888

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-05-29 15:23:46 +08:00
Eric Weber
b963844fec Change controller-instance-name flag and increment InstanceManagerProxyAPIVersion
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-05-26 23:21:13 +08:00
Eric Weber
a156587c6e Add Engine Identity Validation LEP
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-05-26 23:21:13 +08:00
Chin-Ya Huang
15a73c9e36 chore: update YAML
ref: 2865

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-26 10:54:55 +08:00
Chin-Ya Huang
6e1524ef46 chore: update chart
ref: 2865

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-26 10:54:55 +08:00
Jack Lin
7a0f6d99c6 feat(backup): store storage class name to backup volume
ref: longhorn/longhorn 3506

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-05-25 19:04:44 +08:00
James Lu
a929ab5644 chore: correct pre upgrade pod name
Correct `pre-upgrade job` pod name.

Ref: 5131

Signed-off-by: James Lu <james.lu@suse.com>
2023-05-24 20:14:36 +08:00
Chin-Ya Huang
1ce0fbabc1 chore: update YAML
ref: 5917

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-23 17:24:46 +08:00
Chin-Ya Huang
398d05c997 chore: update chart
ref: 5917

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-23 17:24:46 +08:00
Chin-Ya Huang
7a878def1a feat(lep): update set RecurringJob to PVCs design
ref: 5791

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-23 17:22:40 +08:00
Jack Lin
8364519d61 feat(network policy): add network policy into chart
ref: longhorn/longhorn 5403

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-05-23 13:22:58 +08:00
Chin-Ya Huang
77392d6ad8 feat(lep): set RecurringJob to PVCs design
ref: 5791

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-23 13:19:20 +08:00
Jack Lin
d6b173977b
Merge pull request #5923
* fix(csi): no need to check if volume is attached when creating backin…

* Merge master

* Merge branch 'master' into LH5005_fix_backingimage_management_via_csi…
2023-05-22 23:52:49 +08:00
Eric Weber
68c1dae851 Remove disable-replica-rebuild setting
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-05-22 23:40:49 +08:00
Jack Lin
309e228591 feat(log): add global setting for log level
ref: longhorn/longhorn 5888

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-05-19 16:52:34 +08:00
Derek Su
e580550561 prerequisite: set PCI_ALLOWED to none when setting up spdk environment
Currently, longhorn uses SPDK AIO feature.

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-18 00:55:44 +08:00
David Ko
7781fbef0e
Update README.md 2023-05-16 22:36:50 +08:00
David Ko
02f7e12546
Update create-issue.yml 2023-05-16 16:37:48 +08:00
Phan Le
7680931f88 Update chart for new AD refactoring
longhorn-3715

Signed-off-by: Phan Le <phan.le@suse.com>
2023-05-16 08:13:47 +08:00
Phan Le
17ce9ec445 Release v1.4.2 into Longhorn upgrade responder
longhorn-5864

Signed-off-by: Phan Le <phan.le@suse.com>
2023-05-13 06:34:15 +08:00
David Ko
51d0c51ee2
Update README.md 2023-05-12 16:50:38 +08:00
David Ko
3904838518
Update README.md 2023-05-12 14:48:04 +08:00
Jack Lin
2cd50c6ff8 feat: add softAntiAffinity and zoneSoftAntiAffiny to volume spec
ref: longhorn/longhorn 5358

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-05-12 11:35:00 +08:00
Chin-Ya Huang
13bf7b6af0 feat(upgrade-responder): setup dev environment
ref: 5235

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-11 18:12:34 +08:00
Chris
33c53e101a Update extend-csi-snapshot-to-support-backingimage.md
ref longhorn 5005

Signed-off-by: Chris <chris.chien@suse.com>
2023-05-11 18:03:25 +08:00
Derek Su
cd461a9333 prerequisite: check SSE4.2 on x86-64 platform for SPDK feature
Longhorn 5738

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-10 11:17:32 +08:00
Derek Su
dab06c96e4 prerequisite: check uio_pci_generic instead of uio
Longhorn 5738

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-10 01:27:42 +08:00
Derek Su
9c7cfd7a53 prerequisite: fix old bash complaining about "declare -A"
Longhorn 5738

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-10 01:27:42 +08:00
Derek Su
b15eac47a6 prerequisite: check the nvme-cli version
Old "nvme connect" does not support output format "-o" option.

Longhorn 5738

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-10 01:27:42 +08:00
Phan Le
1b3398e54e Release v1.3.3 in Longhorn upgrade responder
longhorn-5581

Signed-off-by: Phan Le <phan.le@suse.com>
2023-05-09 14:33:13 +08:00
Chin-Ya Huang
433a5fa6c7 feat(lep): upgrade checker info collection
Ref: 5235

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-09 14:30:42 +08:00
Derek Su
9a0883e8f2 prerequisite: add spdk environment setup manifest
Longhorn 5738

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-09 00:36:57 +08:00
Derek Su
73a8bda8bd
Environment check for SPDK support (#5880)
* Environment check for SPDK support

Longhorn 5738
Longhorn 5380

Signed-off-by: Derek Su <derek.su@suse.com>
Co-authored-by: David Ko <dko@suse.com>
2023-05-08 17:21:47 +08:00
James Lu
d1c3f58399 fix: remove privileged from lifecycle jobs
Remove `privileged` requirement from lifecycle jobs in
`uninstall/uninstall.yaml`.

Ref: 5862

Signed-off-by: James Lu <james.lu@suse.com>
2023-05-08 16:21:00 +08:00
James Lu
094b61b66c fix: remove privileged from lifecycle jobs
Remove `privileged` requirement from lifecycle jobs `post-upgrade`
and `uninstall`.

Ref: 5862

Signed-off-by: James Lu <james.lu@suse.com>
2023-05-08 15:47:08 +08:00
James Lu
e38d6aed78 feat: introduce helm hook pre-upgrade
Add helm hook `pre-upgrade` file `preupgrade-job.yaml` to protect
the system from unsupported upgrade path.

Ref: 5131

Signed-off-by: James Lu <james.lu@suse.com>
2023-05-08 12:47:56 +08:00
Derek Su
dec0d4c11d prerequisite: add nvme-cli installation manifest
Longhorn 5738

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-08 10:00:29 +08:00
Derek Su
3f16363ff1 Update crd template and longhorn.yaml
Longhorn 5143

Signed-off-by: Derek Su <derek.su@suse.com>
2023-05-04 17:32:15 +08:00
Eric Weber
e38f29772d Remove deprecated mkfs-ext4-parameters setting
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-05-03 22:53:08 +08:00
Chin-Ya Huang
81adad7ae4 feat(consolidate-im): update YAML
Ref: 5208

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-03 11:02:52 +08:00
Chin-Ya Huang
498fa5afe7 feat(consolidate-im): update chart
Ref: 5208

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-03 11:02:52 +08:00
Chin-Ya Huang
5f4111249a feat(lep): consolidate instance managers
Ref: 5208

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-03 09:57:48 +08:00
Chin-Ya Huang
26a6c23156 feat(recurring-job): update YAML
Ref: 5186

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-03 08:20:46 +08:00
Chin-Ya Huang
ec91e90f08 feat(recurring-job): update chart
Ref: 5186

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-05-03 08:20:46 +08:00
Jack Lin
28ed96a319 feat(lep): extend csi snapshot to support backingimage
ref: longhorn/longhorn 5005

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-04-27 16:18:25 +08:00
Jack Lin
88101a2274 refactor (webhook and recovery service): merge webhook and recovery service into longhorn manager daemonset
Ref: longhorn/longhorn 5590

Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-04-27 16:17:46 +08:00
Shuo Wu
ab67f9c98c example: Update network-policy
Signed-off-by: Shuo Wu <shuo.wu@suse.com>
2023-04-26 20:08:56 +08:00
James Lu
7cc3351ff8 LEP for Azure Blob Storage Backup Store Support
Ref: 1309

Signed-off-by: James Lu <james.lu@suse.com>
2023-04-26 15:31:58 +08:00
Mohit Bisht
e454db847d Update README.md 2023-04-25 16:28:13 +08:00
Chin-Ya Huang
e3e006cbcc chore(support-bundle): version bump
Ref: 5614

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-04-19 13:05:21 +08:00
Phan Le
4af6f26acc Release v1.4.1 in the upgrade responder server
longhorn-5445

Signed-off-by: Phan Le <phan.le@suse.com>
2023-04-15 09:47:39 +08:00
Chin-Ya Huang
e1cc7af587 chore(support-bundle): version bump
Ref: 5614

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-04-14 18:22:25 +08:00
James Lu
58ed0277e3 feat(lep): upgrade path enforcement
Ref: 5131

Signed-off-by: James Lu <james.lu@suse.com>
2023-04-14 16:18:12 +08:00
Eric Weber
ccf9f3a32d Bump CSI sidecars for K8s v1.20+ and Longhorn v1.5
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-04-14 00:56:26 +08:00
Eric Weber
d5f5cec2f9 Bump CSI sidecars for K8s v1.17+
Signed-off-by: Eric Weber <eric.weber@suse.com>
2023-04-14 00:56:26 +08:00
Tarasovych
3f5e636bc3 Update values.yaml 2023-04-07 12:32:14 +08:00
Chin-Ya Huang
54e6163356 chore(support-bundle): version bump
Ref: 5614

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-04-07 11:32:12 +08:00
JenTing Hsiao
6764850dca chore: add CODEOWNERS file
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2023-03-22 23:32:46 +08:00
James Lu
15701bbe26 feat(recurring-job): update chart for new tasks
Ref: 4898

Signed-off-by: James Lu <james.lu@suse.com>
2023-03-20 16:17:26 +08:00
James Lu
6c6cb23be1 feat(recurring-job): update YAML for new tasks
Ref: 4898

Signed-off-by: James Lu <james.lu@suse.com>
2023-03-20 16:17:26 +08:00
Ray Chang
9abb26714b fix(support-bundle): version bump to v0.0.20
Longhorn 5073

- New parameter: `SUPPORT_BUNDLE_COLLECTOR` to execute specified support-bundle-kit collector 

Signed-off-by: Ray Chang <ray.chang@suse.com>
2023-03-20 09:58:53 +08:00
Phan Le
86d06696df Add nodeDrainPolicy setting
longhorn-5549

Signed-off-by: Phan Le <phan.le@suse.com>
2023-03-18 08:25:13 +08:00
Phan Le
702c2e65d3 Modify the LEP: Use PDB to protect Longhorn components from drains
longhorn-3304

Signed-off-by: Phan Le <phan.le@suse.com>
2023-03-16 10:05:04 +08:00
Chin-Ya Huang
f82928c33e feat(lep): recurring filesystem trim design
Ref: 5186

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-03-15 17:34:48 +08:00
David Ko
ec6480dd4c docs: add 1.4.1 to changelog
Signed-off-by: David Ko <dko@suse.com>
2023-03-14 13:24:13 +08:00
David Ko
a22e7cd960 docs: make 1.4.1 stable
Signed-off-by: David Ko <dko@suse.com>
2023-03-13 19:00:54 +08:00
Phan Le
3f1666ec24 Add LEP: Use PDB to protect Longhorn components from drains
longhorn-3304

Signed-off-by: Phan Le <phan.le@suse.com>
2023-03-13 10:43:18 +08:00
ChanYiLin
55babc8300 doc: update prerequisites in chart readme to make it consistent with documentation
Signed-off-by: Jack Lin <jack.lin@suse.com>
2023-03-13 10:30:52 +08:00
Chin-Ya Huang
cb6307b799 docs: fix typo
Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-03-09 08:47:11 +08:00
Viktor Hedefalk
92fd5b54ed Update data_migration.yaml
Fixes #5484
2023-03-08 15:55:58 +08:00
Chin-Ya Huang
5a3f8d714b fix(support-bundle): version bump
- fix support-bundle agent missing registry secret

Ref: 5467

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-03-03 08:37:03 +08:00
Rayan Das
e1ea3d7515 update k8s.gcr.io to registry.k8s.io
Signed-off-by: Rayan Das <rayandas91@gmail.com>
2023-02-22 13:29:23 +08:00
Chin-Ya Huang
2ea5513286 feat(recurring-job): update YAML for new tasks
Ref: 3836

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-02-21 11:59:07 +08:00
Chin-Ya Huang
761abc7611 feat(recurring-job): update chart for new tasks
Ref: 3836

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-02-21 11:59:07 +08:00
Chin-Ya Huang
4b17f8fbcd fix(crd): update YAML
Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-02-17 15:12:24 +08:00
Chin-Ya Huang
8c5dd01964 fix(crd): update chart
Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-02-17 15:12:24 +08:00
Derek Su
bb1bd7d4db Add cifs-utils installation manifest
Longhorn 3599

Signed-off-by: Derek Su <derek.su@suse.com>
2023-02-13 22:05:49 +08:00
Derek Su
b1ed0589b2 Add LEP for SMB/CIFS Backup Store Support
Longhorn 3599

Signed-off-by: Derek Su <derek.su@suse.com>
2023-02-09 13:19:15 +08:00
Phan Le
1deb51287b Update PSP validation
Longhorn-5339

Signed-off-by: Phan Le <phan.le@suse.com>
2023-02-07 15:57:09 +08:00
achims311
94a23e5b05
Fix for bug #5304 (second version including POSIX way to call subroutine) (#5314)
* Fix for bug #5304.

It uses the same technique to get the kernel release as was used
before to get the OS of the node.

Signed-off-by: Achim Schaefer <longhorn@schaefer-home.eu>

* used a lower case variable name as suggested by innobead

Signed-off-by: Achim Schaefer <longhorn@schaefer-home.eu>

---------

Signed-off-by: Achim Schaefer <longhorn@schaefer-home.eu>
Co-authored-by: David Ko <dko@suse.com>
2023-02-07 14:58:24 +08:00
Derek Su
a7119b5bda backup: update crds
Longhorn 5189

Signed-off-by: Derek Su <derek.su@suse.com>
2023-02-06 18:14:30 +08:00
Thomas Fenzl
674cdd0df0 update iscsi installation image to latest alpine. 2023-02-05 23:15:18 +08:00
David Ko
d8a5c4ffd5 fix: wrong indentation of priorityClassName in deployment-webhook.yaml
Signed-off-by: David Ko <dko@suse.com>
2023-02-05 23:02:07 +08:00
Phan Le
3a30bd8fed Remove the share parameter since the accessMode should be determined by
the PVC

Signed-off-by: Phan Le <phan.le@suse.com>
2023-02-02 17:29:02 -08:00
Haribo112
5a071e502c
Made environment_check.sh POSIX compliant (#5310)
Made environment_check.sh POSIX compliant

Signed-off-by: Harold Holsappel <h.holsappel@iwink.nl>
Co-authored-by: Harold Holsappel <h.holsappel@iwink.nl>
2023-01-23 14:40:17 -08:00
Derek Su
4fa27a3ca9 Add LEP for Improve Backup and Restore Efficiency using Multiple Threads and Faster Compression Methods
Longhorn 5189

Signed-off-by: Derek Su <derek.su@suse.com>
2023-01-13 11:01:48 +08:00
Ray Chang
ccf3740b5b fix: update the supportBundleKit image description
Signed-off-by: Ray Chang <ray.chang@suse.com>
2023-01-12 12:24:05 +08:00
Ray Chang
4250b68b0f fix: add Support Bundle Kit image related variables in questions.yaml
Signed-off-by: Ray Chang <ray.chang@suse.com>
2023-01-12 10:58:36 +08:00
Chin-Ya Huang
9c1c474dc2 feat(lep): recurring snapshot delete design
Ref: 3836

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-01-11 18:31:15 +08:00
Phan Le
69dcfa5277 Update uninstallation info to include the 'Deleting Confirmation Flag' in chart
longhorn-5250

Signed-off-by: Phan Le <phan.le@suse.com>
2023-01-11 14:56:45 +08:00
Ray Chang
a7e4b23350 fix: fix the CSI Liveness Probe group in questions.yaml
Signed-off-by: Ray Chang <ray.chang@suse.com>
2023-01-11 11:14:09 +08:00
Phan Le
b8ec64414c Add Longhorn v1.4.0 in upgrade responder server
Signed-off-by: Phan Le <phan.le@suse.com>
2023-01-10 00:31:13 +08:00
David Ko
86a9be5c33
Update README.md 2023-01-08 00:31:14 +08:00
Ray Chang
145b166720 fix: Correct formatting error in question.yaml file
Signed-off-by: Ray Chang <ray.chang@suse.com>
2023-01-05 17:56:19 +08:00
James Lu
b06ce86784 fix: refine the indentation
The indentation of chart/questions.yaml in
`variable: defaultSettings.restoreVolumeRecurringJobs` is not
correct.

ref: 5196

Signed-off-by: James Lu <james.lu@suse.com>
2023-01-05 13:29:37 +08:00
Sjouke de Vries
68d6e221a1 feat(helm): support affinity for longhornManager
Signed-off-by: Sjouke de Vries <info@sdvservices.nl>
2023-01-03 17:59:07 +08:00
David Ko
76eaa3d3c1 release: update 1.4.0 changelog
Signed-off-by: David Ko <dko@suse.com>
2022-12-30 12:57:22 +08:00
David Ko
15f55be936 release: update 1.4.0 changelog
Signed-off-by: David Ko <dko@suse.com>
2022-12-29 17:35:28 +08:00
David Ko
715dd93150 release: rename changelog to uppercase
Signed-off-by: David Ko <dko@suse.com>
2022-12-29 12:34:22 +08:00
David Ko
ab92fece63 release: add 1.4.0 changelog
Signed-off-by: David Ko <dko@suse.com>
2022-12-29 12:25:41 +08:00
Derek Su
62998adab2 environment check: precisely check kernel option
Longhorn 3157

Signed-off-by: Derek Su <derek.su@suse.com>
2022-12-26 20:22:58 +08:00
Derek Su
c83497b685 environment_check.sh: add nfs client kernel support
Longhorn 3157

Signed-off-by: Derek Su <derek.su@suse.com>
2022-12-26 16:09:24 +08:00
Chin-Ya Huang
38aa0d01d5 fix(uninstall): missing resource in ClusterRole
Ref: 5132, 5133

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2022-12-23 13:54:02 +08:00
David Ko
c9488eb1f9 ci: add mergify for auto merge
Signed-off-by: David Ko <dko@suse.com>
2022-12-21 17:06:52 +08:00
Derek Su
3a36dab7ca chart: add replicaFileSyncHttpClientTimeout
Longhorn 5110

Signed-off-by: Derek Su <derek.su@suse.com>
2022-12-21 14:12:30 +08:00
James Lu
c4bf0b3a47 build(image): bump support-bundle-kit
bump support-bundle-kit version to v0.0.17

Ref: 5107

Signed-off-by: James Lu <james.lu@suse.com>
2022-12-20 16:31:08 +08:00
David Ko
6b53539738
Update README.md 2022-12-17 23:06:25 +08:00
David Ko
b08acf2457
Update README.md 2022-12-17 21:25:00 +08:00
David Ko
7ef16f1240
Update README.md 2022-12-17 21:18:52 +08:00
David Ko
5f9bb1aaa6
Update README.md 2022-12-17 21:13:41 +08:00
David Ko
3ca724d928
Update README.md 2022-12-17 21:11:02 +08:00
David Ko
fa0e458b3e
Update README.md 2022-12-17 21:08:35 +08:00
David Ko
ca36721b81
Update README.md 2022-12-17 21:07:25 +08:00
David Ko
604acd1870
Update README.md 2022-12-17 20:58:17 +08:00
Derek Su
867257d59a chart: support customized number of replicas of webhook and recovery-backend
Longhorn 5087

Signed-off-by: Derek Su <derek.su@suse.com>
2022-12-16 20:31:12 +08:00
James Lu
06c4189bf9 chore(ui): modify Affinity of UI for helm chart
Change the number of the replica from 1 to 2 for helm chart

Ref: 4987

Signed-off-by: James Lu <james.lu@suse.com>
2022-12-15 16:18:56 +08:00
James Lu
5fa7579794 chore(ui): modify Affinity of UI in deploy.yaml
Change the number of the replica from 1 to 2.

Ref: 4987

Signed-off-by: James Lu <james.lu@suse.com>
2022-12-15 16:18:56 +08:00
124 changed files with 9257 additions and 1438 deletions

1
.github/CODEOWNERS vendored Normal file

@@ -0,0 +1 @@
* @longhorn/dev

48
.github/ISSUE_TEMPLATE/bug.md vendored Normal file

@@ -0,0 +1,48 @@
---
name: Bug report
about: Create a bug report
title: "[BUG]"
labels: ["kind/bug", "require/qa-review-coverage", "require/backport"]
assignees: ''
---
## Describe the bug (🐛 if you encounter this issue)
<!--A clear and concise description of what the bug is.-->
## To Reproduce
<!--Provide the steps to reproduce the behavior.-->
## Expected behavior
<!--A clear and concise description of what you expected to happen.-->
## Support bundle for troubleshooting
<!--Provide a support bundle when the issue happens. You can generate a support bundle using the link at the footer of the Longhorn UI. Check [here](https://longhorn.io/docs/latest/advanced-resources/support-bundle/).-->
## Environment
<!-- Suggest checking the doc of the best practices of using Longhorn. [here](https://longhorn.io/docs/1.5.1/best-practices)-->
- Longhorn version:
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of management node in the cluster:
- Number of worker node in the cluster:
- Node config
- OS type and version:
- Kernel version:
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe/HDD):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
- Impacted Longhorn resources:
- Volume names:
## Additional context
<!--Add any other context about the problem here.-->


@@ -1,49 +0,0 @@
---
name: Bug report
about: Create a bug report
title: "[BUG]"
labels: kind/bug
assignees: ''
---
## Describe the bug (🐛 if you encounter this issue)
A clear and concise description of what the bug is.
## To Reproduce
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Perform '....'
4. See error
## Expected behavior
A clear and concise description of what you expected to happen.
## Log or Support bundle
If applicable, add the Longhorn managers' log or support bundle when the issue happens.
You can generate a Support Bundle using the link at the footer of the Longhorn UI.
## Environment
- Longhorn version:
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
- Number of management node in the cluster:
- Number of worker node in the cluster:
- Node config
- OS type and version:
- CPU per node:
- Memory per node:
- Disk type(e.g. SSD/NVMe):
- Network bandwidth between the nodes:
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
## Additional context
Add any other context about the problem here.


@@ -7,15 +7,10 @@ assignees: ''
--- ---
## What's the task? Please describe ## What's the document you plan to update? Why? Please describe
A clear and concise description of what the document is. <!--A clear and concise description of what the document is.-->
## Describe the items of the task (DoD, definition of done) you'd like
> Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
- [ ] `item 1`
## Additional context ## Additional context
Add any other context or screenshots about the document request here. <!--Add any other context or screenshots about the document request here.-->


@@ -2,23 +2,23 @@
name: Feature request name: Feature request
about: Suggest an idea/feature about: Suggest an idea/feature
title: "[FEATURE] " title: "[FEATURE] "
labels: kind/enhancement labels: ["kind/enhancement", "require/lep", "require/doc", "require/auto-e2e-test"]
assignees: '' assignees: ''
--- ---
## Is your feature request related to a problem? Please describe (👍 if you like this request) ## Is your feature request related to a problem? Please describe (👍 if you like this request)
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] <!--A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]-->
## Describe the solution you'd like ## Describe the solution you'd like
A clear and concise description of what you want to happen <!--A clear and concise description of what you want to happen-->
## Describe alternatives you've considered ## Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered. <!--A clear and concise description of any alternative solutions or features you've considered.-->
## Additional context ## Additional context
Add any other context or screenshots about the feature request here. <!--Add any other context or screenshots about the feature request here.-->


@@ -2,23 +2,23 @@
name: Improvement request name: Improvement request
about: Suggest an improvement of an existing feature about: Suggest an improvement of an existing feature
title: "[IMPROVEMENT] " title: "[IMPROVEMENT] "
labels: kind/improvement labels: ["kind/improvement", "require/doc", "require/auto-e2e-test", "require/backport"]
assignees: '' assignees: ''
--- ---
## Is your improvement request related to a feature? Please describe (👍 if you like this request) ## Is your improvement request related to a feature? Please describe (👍 if you like this request)
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] <!--A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]-->
## Describe the solution you'd like ## Describe the solution you'd like
A clear and concise description of what you want to happen. <!--A clear and concise description of what you want to happen.-->
## Describe alternatives you've considered ## Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered. <!--A clear and concise description of any alternative solutions or features you've considered.-->
## Additional context ## Additional context
Add any other context or screenshots about the feature request here. <!--Add any other context or screenshots about the feature request here.-->

24
.github/ISSUE_TEMPLATE/infra.md vendored Normal file

@@ -0,0 +1,24 @@
---
name: Infra
about: Create an test/dev infra task
title: "[INFRA] "
labels: kind/infra
assignees: ''
---
## What's the test to develop? Please describe
<!--A clear and concise description of what test/dev infra you want to develop.-->
## Describe the items of the test development (DoD, definition of done) you'd like
<!--
Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
- [ ] `item 1`
-->
## Additional context
<!--Add any other context or screenshots about the test infra request here.-->


@@ -7,7 +7,8 @@ assignees: ''
--- ---
## Question ## Question
> Suggest to use https://github.com/longhorn/longhorn/discussions to ask questions.
<!--Suggest to use https://github.com/longhorn/longhorn/discussions to ask questions.-->
## Environment ## Environment
@@ -15,6 +16,7 @@ assignees: ''
- Kubernetes version: - Kubernetes version:
- Node config - Node config
- OS type and version - OS type and version
- Kernel version
- CPU per node: - CPU per node:
- Memory per node: - Memory per node:
- Disk type - Disk type
@@ -23,4 +25,4 @@ assignees: ''
## Additional context ## Additional context
Add any other context about the problem here. <!--Add any other context about the problem here.-->


@@ -1,6 +1,6 @@
--- ---
name: Refactoring request name: Refactor request
about: Suggest a refactoring request of an existing feature or design about: Suggest a refactoring request for an existing implementation
title: "[REFACTOR] " title: "[REFACTOR] "
labels: kind/refactoring labels: kind/refactoring
assignees: '' assignees: ''
@@ -9,16 +9,16 @@ assignees: ''
## Is your improvement request related to a feature? Please describe ## Is your improvement request related to a feature? Please describe
A clear and concise description of what the problem is. <!--A clear and concise description of what the problem is.-->
## Describe the solution you'd like ## Describe the solution you'd like
A clear and concise description of what you want to happen. <!--A clear and concise description of what you want to happen.-->
## Describe alternatives you've considered ## Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered. <!--A clear and concise description of any alternative solutions or features you've considered.-->
## Additional context ## Additional context
Add any other context or screenshots about the refactoring request here. <!--Add any other context or screenshots about the refactoring request here.-->

35
.github/ISSUE_TEMPLATE/release.md vendored Normal file

@@ -0,0 +1,35 @@
---
name: Release task
about: Create a release task
title: "[RELEASE]"
labels: release/task
assignees: ''
---
**What's the task? Please describe.**
Action items for releasing v<x.y.z>
**Describe the sub-tasks.**
- Pre-Release
- [ ] Regression test plan (manual) - @khushboo-rancher
- [ ] Run e2e regression for pre-GA milestones (`install`, `upgrade`) - @yangchiu
- [ ] Run security testing of container images for pre-GA milestones - @yangchiu
- [ ] Verify longhorn chart PR to ensure all artifacts are ready for GA (`install`, `upgrade`) @chriscchien
- [ ] Run core testing (install, upgrade) for the GA build from the previous patch and the last patch of the previous feature release (1.4.2). - @yangchiu
- Release
- [ ] Release longhorn/chart from the release branch to publish to ArtifactHub
- [ ] Release note
- [ ] Deprecation note
- [ ] Upgrade notes including highlighted notes, deprecation, compatible changes, and others impacting the current users
- Post-Release
- [ ] Create a new release branch of manager/ui/tests/engine/longhorn instance-manager/share-manager/backing-image-manager when creating the RC1
- [ ] Update https://github.com/longhorn/longhorn/blob/master/deploy/upgrade_responder_server/chart-values.yaml @PhanLe1010
- [ ] Add another request for the rancher charts for the next patch release (`1.5.1`) @rebeccazzzz
- Rancher charts: verify the chart is able to install & upgrade - @khushboo-rancher
- [ ] rancher/image-mirrors update @weizhe0422 (@PhanLe1010 )
- https://github.com/rancher/image-mirror/pull/412
- [ ] rancher/charts 2.7 branches for rancher marketplace @weizhe0422 (@PhanLe1010)
- `dev-2.7`: https://github.com/rancher/charts/pull/2766
cc @longhorn/qa @longhorn/dev


@@ -9,13 +9,16 @@ assignees: ''
## What's the task? Please describe ## What's the task? Please describe
A clear and concise description of what the task is. <!--A clear and concise description of what the task is.-->
## Describe the items of the task (DoD, definition of done) you'd like ## Describe the sub-tasks
> Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
<!--
Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
- [ ] `item 1` - [ ] `item 1`
-->
## Additional context ## Additional context
Add any other context or screenshots about the task request here. <!--Add any other context or screenshots about the task request here.-->


@@ -9,13 +9,16 @@ assignees: ''
## What's the test to develop? Please describe ## What's the test to develop? Please describe
A clear and concise description of what the test you want to develop. <!--A clear and concise description of what test you want to develop.-->
## Describe the items of the test development (DoD, definition of done) you'd like ## Describe the tasks for the test
> Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
<!--
Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
- [ ] `item 1` - [ ] `item 1`
-->
## Additional context ## Additional context
Add any other context or screenshots about the test request here. <!--Add any other context or screenshots about the test request here.-->


@@ -1,21 +0,0 @@
---
name: Test infra
about: Create a test-infra task
title: "[TEST-INFRA] "
labels: kind/test
assignees: ''
---
## What's the test to develop? Please describe
A clear and concise description of what the test infra you want to develop.
## Describe the items of the test development (DoD, definition of done) you'd like
> Please use a task list for items on a separate line with a clickable checkbox https://docs.github.com/en/issues/tracking-your-work-with-issues/about-task-lists
- [ ] `item 1`
## Additional context
Add any other context or screenshots about the test infra request here.

34
.github/mergify.yml vendored Normal file

@@ -0,0 +1,34 @@
pull_request_rules:
- name: automatic merge after review
conditions:
- check-success=continuous-integration/drone/pr
- check-success=DCO
- check-success=CodeFactor
- check-success=codespell
- "#approved-reviews-by>=1"
- approved-reviews-by=@longhorn/maintainer
- label=ready-to-merge
actions:
merge:
method: rebase
- name: ask to resolve conflict
conditions:
- conflict
actions:
comment:
message: This pull request is now in conflicts. Could you fix it @{{author}}? 🙏
# Comment on the PR to trigger backport. ex: @Mergifyio copy stable/3.1 stable/4.0
- name: backport patches to stable branch
conditions:
- base=master
actions:
backport:
title: "[BACKPORT][{{ destination_branch }}] {{ title }}"
body: |
This is an automatic backport of pull request #{{number}}.
{{cherry_pick_error}}
assignees:
- "{{ author }}"


@@ -1,32 +1,40 @@
name: Add-To-Projects name: Add-To-Projects
on: on:
issues: issues:
types: [opened, labeled] types: [ opened, labeled ]
jobs: jobs:
community: community:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Is Longhorn Member - name: Is Longhorn Member
uses: tspascoal/get-user-teams-membership@v1.0.4 uses: tspascoal/get-user-teams-membership@v1.0.4
id: is-longhorn-member id: is-longhorn-member
with: with:
username: ${{ github.event.issue.user.login }} username: ${{ github.event.issue.user.login }}
organization: longhorn organization: longhorn
GITHUB_TOKEN: ${{ secrets.CUSTOM_GITHUB_TOKEN }} GITHUB_TOKEN: ${{ secrets.CUSTOM_GITHUB_TOKEN }}
- name: Add To Community Project - name: Add To Community Project
if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] == null if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] == null
uses: actions/add-to-project@v0.3.0 uses: actions/add-to-project@v0.3.0
with: with:
project-url: https://github.com/orgs/longhorn/projects/5 project-url: https://github.com/orgs/longhorn/projects/5
github-token: ${{ secrets.CUSTOM_GITHUB_TOKEN }} github-token: ${{ secrets.CUSTOM_GITHUB_TOKEN }}
qa: qa:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Add To QA & Devops Project - name: Is Longhorn Member
uses: actions/add-to-project@v0.3.0 uses: tspascoal/get-user-teams-membership@v1.0.4
with: id: is-longhorn-member
project-url: https://github.com/orgs/longhorn/projects/4 with:
github-token: ${{ secrets.CUSTOM_GITHUB_TOKEN }} username: ${{ github.event.issue.user.login }}
labeled: kind/test, area/test-infra organization: longhorn
label-operator: OR GITHUB_TOKEN: ${{ secrets.CUSTOM_GITHUB_TOKEN }}
- name: Add To QA & DevOps Project
if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null
uses: actions/add-to-project@v0.3.0
with:
project-url: https://github.com/orgs/longhorn/projects/4
github-token: ${{ secrets.CUSTOM_GITHUB_TOKEN }}
labeled: kind/test, area/infra
label-operator: OR
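
The routing in the updated Add-To-Projects workflow can be sketched in Python. This is only an illustration of the conditions visible in the diff, not code from the repository: the function name `target_projects` is hypothetical, and it assumes the membership step outputs a JSON array of team names (empty for non-members) and that `label-operator: OR` means "any one of the listed labels".

```python
import json

# Project URLs taken from the workflow above.
COMMUNITY_PROJECT = "https://github.com/orgs/longhorn/projects/5"
QA_PROJECT = "https://github.com/orgs/longhorn/projects/4"
# Labels from the `labeled:` input of the QA job (new side of the diff).
QA_LABELS = {"kind/test", "area/infra"}


def target_projects(teams_json: str, issue_labels: list[str]) -> list[str]:
    """Return the project URLs an issue would be added to (hypothetical helper)."""
    teams = json.loads(teams_json)
    # Mirrors `fromJSON(...teams)[0] != null`: the first element exists.
    is_member = len(teams) > 0
    projects = []
    if not is_member:
        # community job: issues opened by non-members
        projects.append(COMMUNITY_PROJECT)
    if is_member and QA_LABELS & set(issue_labels):
        # qa job: members only, and at least one matching label (OR)
        projects.append(QA_PROJECT)
    return projects
```

For example, `target_projects("[]", ["kind/bug"])` selects only the community project, while a member's issue labeled `kind/test` selects only the QA project.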


@@ -1,50 +1,50 @@
name: Close-Issue name: Close-Issue
on: on:
issues: issues:
types: [unlabeled] types: [ unlabeled ]
jobs: jobs:
backport: backport:
runs-on: ubuntu-latest runs-on: ubuntu-latest
if: contains(github.event.label.name, 'backport/') if: contains(github.event.label.name, 'backport/')
steps: steps:
- name: Get Backport Version - name: Get Backport Version
uses: xom9ikk/split@v1 uses: xom9ikk/split@v1
id: split id: split
with: with:
string: ${{ github.event.label.name }} string: ${{ github.event.label.name }}
separator: / separator: /
- name: Check if Backport Issue Exists - name: Check if Backport Issue Exists
uses: actions-cool/issues-helper@v3 uses: actions-cool/issues-helper@v3
id: if-backport-issue-exists id: if-backport-issue-exists
with: with:
actions: 'find-issues' actions: 'find-issues'
token: ${{ github.token }} token: ${{ github.token }}
title-includes: | title-includes: |
[BACKPORT][v${{ steps.split.outputs._1 }}]${{ github.event.issue.title }} [BACKPORT][v${{ steps.split.outputs._1 }}]${{ github.event.issue.title }}
- name: Close Backport Issue - name: Close Backport Issue
if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] != null if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] != null
uses: actions-cool/issues-helper@v3 uses: actions-cool/issues-helper@v3
with: with:
actions: 'close-issue' actions: 'close-issue'
token: ${{ github.token }} token: ${{ github.token }}
issue-number: ${{ fromJSON(steps.if-backport-issue-exists.outputs.issues)[0].number }} issue-number: ${{ fromJSON(steps.if-backport-issue-exists.outputs.issues)[0].number }}
automation: automation:
runs-on: ubuntu-latest runs-on: ubuntu-latest
if: contains(github.event.label.name, 'require/automation-e2e') if: contains(github.event.label.name, 'require/automation-e2e')
steps: steps:
- name: Check if Automation Issue Exists - name: Check if Automation Issue Exists
uses: actions-cool/issues-helper@v3 uses: actions-cool/issues-helper@v3
id: if-automation-issue-exists id: if-automation-issue-exists
with: with:
actions: 'find-issues' actions: 'find-issues'
token: ${{ github.token }} token: ${{ github.token }}
title-includes: | title-includes: |
[TEST]${{ github.event.issue.title }} [TEST]${{ github.event.issue.title }}
- name: Close Automation Test Issue - name: Close Automation Test Issue
if: fromJSON(steps.if-automation-issue-exists.outputs.issues)[0] != null if: fromJSON(steps.if-automation-issue-exists.outputs.issues)[0] != null
uses: actions-cool/issues-helper@v3 uses: actions-cool/issues-helper@v3
with: with:
actions: 'close-issue' actions: 'close-issue'
token: ${{ github.token }} token: ${{ github.token }}
issue-number: ${{ fromJSON(steps.if-automation-issue-exists.outputs.issues)[0].number }} issue-number: ${{ fromJSON(steps.if-automation-issue-exists.outputs.issues)[0].number }}
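
The title matching used by the backport and automation jobs can be sketched as follows. This is a rough Python illustration, assuming the split step's `_1` output is the second `/`-separated segment of the label; the helper names are hypothetical and do not exist in the repository.

```python
def backport_issue_title(label: str, issue_title: str) -> str:
    """Build the title searched for by the backport job (hypothetical helper)."""
    parts = label.split("/")
    if len(parts) < 2 or parts[0] != "backport":
        # The job only runs for labels containing 'backport/'.
        raise ValueError(f"not a backport label: {label!r}")
    version = parts[1]  # corresponds to steps.split.outputs._1
    return f"[BACKPORT][v{version}]{issue_title}"


def automation_issue_title(issue_title: str) -> str:
    """Build the title searched for by the automation job (hypothetical helper)."""
    return f"[TEST]{issue_title}"
```

With a label of `backport/1.4.1` and an issue titled `[BUG] volume stuck`, the workflow would look for `[BACKPORT][v1.4.1][BUG] volume stuck`.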


@@ -1,97 +1,114 @@
name: Create-Issue name: Create-Issue
on: on:
issues: issues:
types: [labeled] types: [ labeled ]
jobs: jobs:
backport: backport:
runs-on: ubuntu-latest runs-on: ubuntu-latest
if: contains(github.event.label.name, 'backport/') if: contains(github.event.label.name, 'backport/')
steps: steps:
- name: Get Backport Version - name: Is Longhorn Member
uses: xom9ikk/split@v1 uses: tspascoal/get-user-teams-membership@v1.0.4
id: split id: is-longhorn-member
with: with:
string: ${{ github.event.label.name }} username: ${{ github.actor }}
separator: / organization: longhorn
- name: Check if Backport Issue Exists GITHUB_TOKEN: ${{ secrets.CUSTOM_GITHUB_TOKEN }}
uses: actions-cool/issues-helper@v3 - name: Get Backport Version
id: if-backport-issue-exists if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null
with: uses: xom9ikk/split@v1
actions: 'find-issues' id: split
token: ${{ github.token }} with:
issue-state: 'all' string: ${{ github.event.label.name }}
title-includes: | separator: /
[BACKPORT][v${{ steps.split.outputs._1 }}]${{ github.event.issue.title }} - name: Check if Backport Issue Exists
- name: Get Milestone Object if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null
if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null uses: actions-cool/issues-helper@v3
uses: longhorn/bot/milestone-action@master id: if-backport-issue-exists
id: milestone with:
with: actions: 'find-issues'
token: ${{ github.token }} token: ${{ github.token }}
repository: ${{ github.repository }} issue-state: 'all'
milestone_name: v${{ steps.split.outputs._1 }} title-includes: |
- name: Get Labels [BACKPORT][v${{ steps.split.outputs._1 }}]${{ github.event.issue.title }}
if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null - name: Get Milestone Object
id: labels if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null && fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null
run: | uses: longhorn/bot/milestone-action@master
RAW_LABELS="${{ join(github.event.issue.labels.*.name, ' ') }}" id: milestone
RAW_LABELS="${RAW_LABELS} kind/backport" with:
echo "RAW LABELS: $RAW_LABELS" token: ${{ github.token }}
LABELS=$(echo "$RAW_LABELS" | sed -r 's/\s*backport\S+//g' | sed -r 's/\s*require\/automation-e2e//g' | xargs | sed 's/ /, /g') repository: ${{ github.repository }}
echo "LABELS: $LABELS" milestone_name: v${{ steps.split.outputs._1 }}
echo "::set-output name=labels::$LABELS" - name: Get Labels
- name: Create Backport Issue if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null && fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null
if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null id: labels
uses: dacbd/create-issue-action@v1 run: |
id: new-issue RAW_LABELS="${{ join(github.event.issue.labels.*.name, ' ') }}"
with: RAW_LABELS="${RAW_LABELS} kind/backport"
token: ${{ github.token }} echo "RAW LABELS: $RAW_LABELS"
title: | LABELS=$(echo "$RAW_LABELS" | sed -r 's/\s*backport\S+//g' | sed -r 's/\s*require\/auto-e2e-test//g' | xargs | sed 's/ /, /g')
[BACKPORT][v${{ steps.split.outputs._1 }}]${{ github.event.issue.title }} echo "LABELS: $LABELS"
body: | echo "labels=$LABELS" >> $GITHUB_OUTPUT
backport ${{ github.event.issue.html_url }} - name: Create Backport Issue
labels: ${{ steps.labels.outputs.labels }} if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null && fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null
milestone: ${{ fromJSON(steps.milestone.outputs.data).number }} uses: dacbd/create-issue-action@v1
assignees: ${{ join(github.event.issue.assignees.*.login, ', ') }} id: new-issue
- name: Get Repo Id with:
if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null token: ${{ github.token }}
uses: octokit/request-action@v2.x title: |
id: repo [BACKPORT][v${{ steps.split.outputs._1 }}]${{ github.event.issue.title }}
with: body: |
route: GET /repos/${{ github.repository }} backport ${{ github.event.issue.html_url }}
env: labels: ${{ steps.labels.outputs.labels }}
GITHUB_TOKEN: ${{ github.token }} milestone: ${{ fromJSON(steps.milestone.outputs.data).number }}
- name: Add Backport Issue To Release assignees: ${{ join(github.event.issue.assignees.*.login, ', ') }}
if: fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null - name: Get Repo Id
uses: longhorn/bot/add-zenhub-release-action@master if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null && fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null
with: uses: octokit/request-action@v2.x
zenhub_token: ${{ secrets.ZENHUB_TOKEN }} id: repo
repo_id: ${{ fromJSON(steps.repo.outputs.data).id }} with:
issue_number: ${{ steps.new-issue.outputs.number }} route: GET /repos/${{ github.repository }}
release_name: ${{ steps.split.outputs._1 }} env:
GITHUB_TOKEN: ${{ github.token }}
- name: Add Backport Issue To Release
if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null && fromJSON(steps.if-backport-issue-exists.outputs.issues)[0] == null
uses: longhorn/bot/add-zenhub-release-action@master
with:
zenhub_token: ${{ secrets.ZENHUB_TOKEN }}
repo_id: ${{ fromJSON(steps.repo.outputs.data).id }}
issue_number: ${{ steps.new-issue.outputs.number }}
release_name: ${{ steps.split.outputs._1 }}
automation: automation:
runs-on: ubuntu-latest runs-on: ubuntu-latest
if: contains(github.event.label.name, 'require/automation-e2e') if: contains(github.event.label.name, 'require/auto-e2e-test')
steps: steps:
- name: Check if Automation Issue Exists - name: Is Longhorn Member
uses: actions-cool/issues-helper@v3 uses: tspascoal/get-user-teams-membership@v1.0.4
id: if-automation-issue-exists id: is-longhorn-member
with: with:
actions: 'find-issues' username: ${{ github.actor }}
token: ${{ github.token }} organization: longhorn
issue-state: 'all' GITHUB_TOKEN: ${{ secrets.CUSTOM_GITHUB_TOKEN }}
title-includes: | - name: Check if Automation Issue Exists
[TEST]${{ github.event.issue.title }} if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null
- name: Create Automation Test Issue uses: actions-cool/issues-helper@v3
if: fromJSON(steps.if-automation-issue-exists.outputs.issues)[0] == null id: if-automation-issue-exists
uses: dacbd/create-issue-action@v1 with:
with: actions: 'find-issues'
token: ${{ github.token }} token: ${{ github.token }}
title: | issue-state: 'all'
[TEST]${{ github.event.issue.title }} title-includes: |
body: | [TEST]${{ github.event.issue.title }}
adding/updating auto e2e test cases for ${{ github.event.issue.html_url }} if they can be automated - name: Create Automation Test Issue
if: fromJSON(steps.is-longhorn-member.outputs.teams)[0] != null && fromJSON(steps.if-automation-issue-exists.outputs.issues)[0] == null
uses: dacbd/create-issue-action@v1
with:
token: ${{ github.token }}
title: |
[TEST]${{ github.event.issue.title }}
body: |
adding/updating auto e2e test cases for ${{ github.event.issue.html_url }} if they can be automated
cc @longhorn/qa cc @longhorn/qa
labels: kind/test labels: kind/test
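
The `sed`/`xargs` pipeline in the "Get Labels" step (new side of the diff) can be approximated in Python to make the label transformation easier to follow. This is a sketch under assumptions: the function name `backport_labels` is hypothetical, and label names are assumed to contain no spaces or shell quoting characters, which the original pipeline would treat differently.

```python
import re


def backport_labels(issue_labels: list[str]) -> str:
    """Approximate the label string produced for the backport issue."""
    # RAW_LABELS: the issue's labels joined by spaces, plus kind/backport.
    raw = " ".join(issue_labels) + " kind/backport"
    # sed -r 's/\s*backport\S+//g': drop tokens such as backport/1.4.1.
    # kind/backport survives because nothing follows "backport" in it.
    raw = re.sub(r"\s*backport\S+", "", raw)
    # sed -r 's/\s*require\/auto-e2e-test//g'
    raw = re.sub(r"\s*require/auto-e2e-test", "", raw)
    # xargs | sed 's/ /, /g': trim, collapse whitespace, comma-separate.
    return ", ".join(raw.split())
```

The result is what the step writes as `labels=...` to `$GITHUB_OUTPUT`, the environment-file mechanism that replaces the deprecated `::set-output` command on the old side of the diff.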


@@ -4,25 +4,25 @@ on:
workflow_call: workflow_call:
workflow_dispatch: workflow_dispatch:
schedule: schedule:
- cron: '30 1 * * *' - cron: '30 1 * * *'
jobs: jobs:
stale: stale:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/stale@v4 - uses: actions/stale@v4
with: with:
stale-issue-message: 'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.' stale-issue-message: 'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
stale-pr-message: 'This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.' stale-pr-message: 'This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
close-issue-message: 'This issue was closed because it has been stalled for 5 days with no activity.' close-issue-message: 'This issue was closed because it has been stalled for 5 days with no activity.'
close-pr-message: 'This PR was closed because it has been stalled for 10 days with no activity.' close-pr-message: 'This PR was closed because it has been stalled for 10 days with no activity.'
days-before-stale: 30 days-before-stale: 30
days-before-pr-stale: 45 days-before-pr-stale: 45
days-before-close: 5 days-before-close: 5
days-before-pr-close: 10 days-before-pr-close: 10
stale-issue-label: 'stale' stale-issue-label: 'stale'
stale-pr-label: 'stale' stale-pr-label: 'stale'
exempt-all-assignees: true exempt-all-assignees: true
exempt-issue-labels: 'kind/bug,kind/doc,kind/enhancement,kind/poc,kind/refactoring,kind/test,kind/task,kind/backport,kind/regression,kind/evaluation' exempt-issue-labels: 'kind/bug,kind/doc,kind/enhancement,kind/poc,kind/refactoring,kind/test,kind/task,kind/backport,kind/regression,kind/evaluation'
exempt-draft-pr: true exempt-draft-pr: true
exempt-all-milestones: true exempt-all-milestones: true
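
The timing configured above can be illustrated with a small Python sketch. It only does the date arithmetic implied by the `days-before-*` values and ignores the exemptions (assignees, milestones, labels, draft PRs) and the fact that the job runs once a day on the cron schedule, so real dates may be later. The function name `stale_schedule` is hypothetical.

```python
from datetime import date, timedelta


def stale_schedule(last_activity: date, is_pr: bool = False) -> tuple[date, date]:
    """Return (earliest stale date, earliest close date) for an issue or PR."""
    # days-before-stale / days-before-pr-stale
    days_stale = 45 if is_pr else 30
    # days-before-close / days-before-pr-close
    days_close = 10 if is_pr else 5
    stale_on = last_activity + timedelta(days=days_stale)
    # Assumes no further activity after the stale label is applied.
    close_on = stale_on + timedelta(days=days_close)
    return stale_on, close_on
```

An issue last touched on 2023-01-01 would become stale no earlier than 2023-01-31 and be closed no earlier than 2023-02-05, assuming none of the exemptions apply.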


@@ -0,0 +1,283 @@
## Release Note
**v1.4.0 released!** 🎆
This release introduces many enhancements, improvements, and bug fixes, described below, covering stability, performance, data integrity, troubleshooting, and more. Please try it and share your feedback. Thanks for all the contributions!
- [Kubernetes 1.25 Support](https://github.com/longhorn/longhorn/issues/4003) [[doc]](https://longhorn.io/docs/1.4.0/deploy/important-notes/#pod-security-policies-disabled--pod-security-admission-introduction)
In previous versions, Longhorn relied on Pod Security Policy (PSP) to authorize Longhorn components for privileged operations. As of Kubernetes 1.25, PSP has been removed and replaced with Pod Security Admission (PSA). Longhorn v1.4.0 supports opt-in PSP enablement, so it works on Kubernetes versions with or without PSP.
- [ARM64 GA](https://github.com/longhorn/longhorn/issues/4206)
ARM64 support has been experimental since Longhorn v1.1.0. After more user feedback and increased test coverage, the ARM64 distribution has stabilized and passes our regular regression testing, so it now qualifies for general availability.
- [RWX GA](https://github.com/longhorn/longhorn/issues/2293) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20220727-dedicated-recovery-backend-for-rwx-volume-nfs-server.md)[[doc]](https://longhorn.io/docs/1.4.0/advanced-resources/rwx-workloads/)
RWX has been experimental since Longhorn v1.1.0, but it lacked availability support when the underlying Longhorn Share Manager component became unavailable. Longhorn v1.4.0 adds an NFS recovery backend based on a Kubernetes built-in resource, ConfigMap, for recovering NFS client connections during the failover period. In addition, the introduction of NFS client hard mode further guards against the potential data loss seen previously. For details, please check the issue and the enhancement proposal.
- [Volume Snapshot Checksum](https://github.com/longhorn/longhorn/issues/4210) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20220922-snapshot-checksum-and-bit-rot-detection.md)[[doc]](https://longhorn.io/docs/1.4.0/references/settings/#snapshot-data-integrity)
Data integrity is a continuous effort for Longhorn. This version introduces Snapshot Checksum, with settings that let users enable or disable checksum calculation in different modes.
- [Volume Bit-rot Protection](https://github.com/longhorn/longhorn/issues/3198) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20220922-snapshot-checksum-and-bit-rot-detection.md)[[doc]](https://longhorn.io/docs/1.4.0/references/settings/#snapshot-data-integrity)
When the Volume Snapshot Checksum feature is enabled, Longhorn periodically calculates and checks the checksums of volume snapshots, finds corrupted snapshots, and then fixes them.
- [Volume Replica Rebuilding Speedup](https://github.com/longhorn/longhorn/issues/4783)
When the Volume Snapshot Checksum feature is enabled, Longhorn uses the calculated snapshot checksums to avoid needless snapshot replication between nodes, which speeds up replica rebuilding and reduces resource consumption.
- [Volume Trim](https://github.com/longhorn/longhorn/issues/836) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20221103-filesystem-trim.md)[[doc]](https://longhorn.io/docs/1.4.0/volumes-and-nodes/trim-filesystem/#trim-the-filesystem-in-a-longhorn-volume)
The Longhorn engine supports the SCSI UNMAP command to reclaim space from the block volume.
- [Online Volume Expansion](https://github.com/longhorn/longhorn/issues/1674) [[doc]](https://longhorn.io/docs/1.4.0/volumes-and-nodes/expansion)
The Longhorn engine supports optional parameters that pass size expansion requests when updating the volume frontend, which enables online volume expansion and filesystem resizing via the CSI node driver.
- [Local Volume via Data Locality Strict Mode](https://github.com/longhorn/longhorn/issues/3957) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20200819-keep-a-local-replica-to-engine.md)[[doc]](https://longhorn.io/docs/1.4.0/references/settings/#default-data-locality)
Local volume is based on a new Data Locality setting, Strict Local. It lets users create a single-replica volume that stays in a consistent location. Data transfer between the volume frontend and the engine goes through a local socket instead of the TCP stack, which improves performance and reduces resource consumption.
- [Volume Recurring Job Backup Restore](https://github.com/longhorn/longhorn/issues/2227) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20201002-allow-recurring-backup-detached-volumes.md)[[doc]](https://longhorn.io/docs/1.4.0/snapshots-and-backups/backup-and-restore/restore-recurring-jobs-from-a-backup/)
Recurring jobs bound to a volume can be backed up to the remote backup target together with the volume backup metadata. They can also be restored, for a better operational experience.
- [Volume IO Metrics](https://github.com/longhorn/longhorn/issues/2406) [[doc]](https://longhorn.io/docs/1.4.0/monitoring/metrics/#volume)
Longhorn enriches volume metrics with real-time IO stats, including IOPS, latency, and throughput of read/write IO. Users can set up a monitoring solution such as Prometheus to monitor volume performance.
- [Longhorn System Backup & Restore](https://github.com/longhorn/longhorn/issues/1455) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20220913-longhorn-system-backup-restore.md)[[doc]](https://longhorn.io/docs/1.4.0/advanced-resources/system-backup-restore/)
Users can back up the Longhorn system to the remote backup target. Afterward, the backup can be restored in place to an existing cluster or to a new cluster for specific operational purposes.
- [Support Bundle Enhancement](https://github.com/longhorn/longhorn/issues/2759) [[lep]](https://github.com/longhorn/longhorn/blob/master/enhancements/20221109-support-bundle-enhancement.md)
Longhorn introduces a new support bundle integration based on a general [support bundle kit](https://github.com/rancher/support-bundle-kit) solution. This can help us collect more complete troubleshooting info and simulate the cluster environment.
- [Tunable Timeout between Engine and Replica](https://github.com/longhorn/longhorn/issues/4491) [[doc]](https://longhorn.io/docs/1.4.0/references/settings/#engine-to-replica-timeout)
Previously, the timeout between the Longhorn engine and replica was fixed, with no user-facing setting. This could be a challenge for users with a low-spec infrastructure environment. Making the setting configurable lets users tune it to improve the stability of volume operations.
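
The Snapshot Checksum, Bit-rot Protection, and Replica Rebuilding Speedup items above all rest on comparing per-snapshot checksums across replicas. The following is a minimal conceptual sketch of that idea, not Longhorn's implementation: the helper names, the replica names, the majority-vote rule, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def snapshot_checksum(data: bytes) -> str:
    """Checksum of one replica's copy of a snapshot (SHA-256 is a stand-in)."""
    return hashlib.sha256(data).hexdigest()


def find_corrupted_replicas(copies: Dict[str, bytes]) -> List[str]:
    """Return the replicas whose checksum differs from the majority checksum.

    Hypothetical helper: it assumes a strict majority of copies is healthy.
    """
    if not copies:
        return []
    checksums = {name: snapshot_checksum(data) for name, data in copies.items()}
    majority, votes = Counter(checksums.values()).most_common(1)[0]
    if votes * 2 <= len(checksums):
        raise ValueError("no majority checksum, cannot tell which copy is good")
    return sorted(name for name, digest in checksums.items() if digest != majority)


def needs_transfer(source_checksum: str, target_checksum: str) -> bool:
    """During a rebuild, copy a snapshot only when the checksums differ."""
    return source_checksum != target_checksum


# Hypothetical replica contents; replica-c simulates a flipped byte.
copies = {
    "replica-a": b"snapshot-data",
    "replica-b": b"snapshot-data",
    "replica-c": b"snapshot-dat\x00",
}
print(find_corrupted_replicas(copies))
```

In this sketch a corrupted copy is detected because its digest disagrees with the other replicas, and `needs_transfer` shows why matching checksums let a rebuild skip copying an unchanged snapshot.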
## Installation
> **Please ensure your Kubernetes cluster is at least v1.21 before installing Longhorn v1.4.0.**
Longhorn supports three installation methods: Rancher App Marketplace, Kubectl, and Helm. Follow the installation instructions [here](https://longhorn.io/docs/1.4.0/deploy/install/).
## Upgrade
> **Please ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v1.4.0 from v1.3.x. Upgrading is supported only from v1.3.x.**
Follow the upgrade instructions [here](https://longhorn.io/docs/1.4.0/deploy/upgrade/).
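
The two preconditions stated above (Kubernetes at least v1.21, and upgrading only from v1.3.x) can be checked mechanically before starting. The sketch below is illustrative only: the function names are hypothetical, and the version strings are assumed to be supplied by the caller, for example copied from the cluster's reported server version and the installed Longhorn version.

```python
import re
from typing import Tuple

MIN_KUBERNETES = (1, 21)         # minimum Kubernetes version from the note above
SUPPORTED_UPGRADE_FROM = (1, 3)  # only v1.3.x can upgrade to v1.4.0


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse strings such as 'v1.21.3' or '1.25.4+k3s1' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def can_upgrade_to_1_4_0(kubernetes_version: str, longhorn_version: str) -> bool:
    """True when both upgrade preconditions from the release note hold."""
    k8s = parse_version(kubernetes_version)
    current = parse_version(longhorn_version)
    return k8s[:2] >= MIN_KUBERNETES and current[:2] == SUPPORTED_UPGRADE_FROM


print(can_upgrade_to_1_4_0("v1.24.8", "v1.3.2"))
print(can_upgrade_to_1_4_0("v1.20.15", "v1.3.2"))
```

For a fresh installation only the Kubernetes check applies, so comparing `parse_version(...)[:2]` against `MIN_KUBERNETES` is enough.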
## Deprecation & Incompatibilities
- Pod Security Policy is an opt-in setting. To install Longhorn with PSP support, enable it first.
- The built-in CSI Snapshotter sidecar is upgraded to v5.0.1. The v1beta1 version of the Volume Snapshot custom resource is deprecated but still supported. It will be removed when CSI Snapshotter is upgraded to 6.1 or later in a future release, so please switch to the v1 version before the deprecated version is removed.
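
As a rough illustration of moving off the deprecated v1beta1 version, the sketch below rewrites the `apiVersion` of a snapshot manifest held as a Python dictionary. The function name and the example manifest (object, class, and PVC names) are hypothetical. It assumes the manifest's fields are otherwise valid under v1, which should be verified against the CSI snapshotter documentation before relying on it.

```python
import copy

OLD_API = "snapshot.storage.k8s.io/v1beta1"
NEW_API = "snapshot.storage.k8s.io/v1"
SNAPSHOT_KINDS = {"VolumeSnapshot", "VolumeSnapshotClass", "VolumeSnapshotContent"}


def migrate_snapshot_manifest(manifest: dict) -> dict:
    """Return a copy of the manifest with apiVersion moved from v1beta1 to v1.

    Manifests of other kinds or API versions are returned unchanged.
    """
    updated = copy.deepcopy(manifest)
    if updated.get("kind") in SNAPSHOT_KINDS and updated.get("apiVersion") == OLD_API:
        updated["apiVersion"] = NEW_API
    return updated


# Hypothetical manifest; all names below are placeholders.
old_manifest = {
    "apiVersion": "snapshot.storage.k8s.io/v1beta1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "example-snapshot"},
    "spec": {
        "volumeSnapshotClassName": "example-snapshot-class",
        "source": {"persistentVolumeClaimName": "example-pvc"},
    },
}
print(migrate_snapshot_manifest(old_manifest)["apiVersion"])
```

The function copies the input rather than mutating it, so the original manifest stays available for comparison.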
## Known Issues after Release
See [here](https://github.com/longhorn/longhorn/wiki/Outstanding-Known-Issues-of-Releases) for any outstanding issues found after this release.
## Highlights
- [FEATURE] Reclaim/Shrink space of volume ([836](https://github.com/longhorn/longhorn/issues/836)) - @yangchiu @derekbit @smallteeths @shuo-wu
- [FEATURE] Backup/Restore Longhorn System ([1455](https://github.com/longhorn/longhorn/issues/1455)) - @c3y1huang @khushboo-rancher
- [FEATURE] Online volume expansion ([1674](https://github.com/longhorn/longhorn/issues/1674)) - @shuo-wu @chriscchien
- [FEATURE] Record recurring schedule in the backups and allow user choose to use it for the restored volume ([2227](https://github.com/longhorn/longhorn/issues/2227)) - @yangchiu @mantissahz
- [FEATURE] NFS support (RWX) GA ([2293](https://github.com/longhorn/longhorn/issues/2293)) - @derekbit @chriscchien
- [FEATURE] Support metrics for Volume IOPS, throughput and latency real time ([2406](https://github.com/longhorn/longhorn/issues/2406)) - @derekbit @roger-ryao
- [FEATURE] Support bundle enhancement ([2759](https://github.com/longhorn/longhorn/issues/2759)) - @c3y1huang @chriscchien
- [FEATURE] Automatic identifying of corrupted replica (bit rot detection) ([3198](https://github.com/longhorn/longhorn/issues/3198)) - @yangchiu @derekbit
- [FEATURE] Local volume for distributed data workloads ([3957](https://github.com/longhorn/longhorn/issues/3957)) - @derekbit @chriscchien
- [IMPROVEMENT] Support K8s 1.25 by updating removed deprecated resource versions like PodSecurityPolicy ([4003](https://github.com/longhorn/longhorn/issues/4003)) - @PhanLe1010 @chriscchien
- [IMPROVEMENT] Faster resync time for fresh replica rebuilding ([4092](https://github.com/longhorn/longhorn/issues/4092)) - @yangchiu @derekbit
- [FEATURE] Introduce checksum for snapshots ([4210](https://github.com/longhorn/longhorn/issues/4210)) - @derekbit @roger-ryao
- [FEATURE] Update K8s version support and component/pkg/build dependencies ([4239](https://github.com/longhorn/longhorn/issues/4239)) - @yangchiu @PhanLe1010
- [BUG] data corruption due to COW and block size not being aligned during rebuilding replicas ([4354](https://github.com/longhorn/longhorn/issues/4354)) - @PhanLe1010 @chriscchien
- [IMPROVEMENT] Adjust the iSCSI timeout and the engine-to-replica timeout settings ([4491](https://github.com/longhorn/longhorn/issues/4491)) - @yangchiu @derekbit
- [IMPROVEMENT] Using specific block size in Longhorn volume's filesystem ([4594](https://github.com/longhorn/longhorn/issues/4594)) - @derekbit @roger-ryao
- [IMPROVEMENT] Speed up replica rebuilding by the metadata such as ctime of snapshot disk files ([4783](https://github.com/longhorn/longhorn/issues/4783)) - @yangchiu @derekbit
## Enhancements
- [FEATURE] Configure successfulJobsHistoryLimit of CronJobs ([1711](https://github.com/longhorn/longhorn/issues/1711)) - @weizhe0422 @chriscchien
- [FEATURE] Allow customization of the cipher used by cryptsetup in volume encryption ([3353](https://github.com/longhorn/longhorn/issues/3353)) - @mantissahz @chriscchien
- [FEATURE] New setting to limit the concurrent volume restoring from backup ([4558](https://github.com/longhorn/longhorn/issues/4558)) - @c3y1huang @chriscchien
- [FEATURE] Make FS format options configurable in storage class ([4642](https://github.com/longhorn/longhorn/issues/4642)) - @weizhe0422 @chriscchien
## Improvement
- [IMPROVEMENT] Change the script into a docker run command mentioned in 'recovery from longhorn backup without system installed' doc ([1521](https://github.com/longhorn/longhorn/issues/1521)) - @weizhe0422 @chriscchien
- [IMPROVEMENT] Improve 'recovery from longhorn backup without system installed' doc. ([1522](https://github.com/longhorn/longhorn/issues/1522)) - @weizhe0422 @roger-ryao
- [IMPROVEMENT] Dump NFS ganesha logs to pod stdout ([2380](https://github.com/longhorn/longhorn/issues/2380)) - @weizhe0422 @roger-ryao
- [IMPROVEMENT] Support failed/obsolete orphaned backup cleanup ([3898](https://github.com/longhorn/longhorn/issues/3898)) - @mantissahz @chriscchien
- [IMPROVEMENT] liveness and readiness probes with longhorn csi plugin daemonset ([3907](https://github.com/longhorn/longhorn/issues/3907)) - @c3y1huang @roger-ryao
- [IMPROVEMENT] Longhorn doesn't reuse failed replica on a disk with full allocated space ([3921](https://github.com/longhorn/longhorn/issues/3921)) - @PhanLe1010 @chriscchien
- [IMPROVEMENT] Reduce syscalls while reading and writing requests in longhorn-engine (engine <-> replica) ([4122](https://github.com/longhorn/longhorn/issues/4122)) - @yangchiu @derekbit
- [IMPROVEMENT] Reduce read and write calls in liblonghorn (tgt <-> engine) ([4133](https://github.com/longhorn/longhorn/issues/4133)) - @derekbit
- [IMPROVEMENT] Replace the GCC allocator in liblonghorn with a more efficient memory allocator ([4136](https://github.com/longhorn/longhorn/issues/4136)) - @yangchiu @derekbit
- [DOC] Update Helm readme and document ([4175](https://github.com/longhorn/longhorn/issues/4175)) - @derekbit
- [IMPROVEMENT] Purging a volume before rebuilding starts ([4183](https://github.com/longhorn/longhorn/issues/4183)) - @yangchiu @shuo-wu
- [IMPROVEMENT] Schedule volumes based on available disk space ([4185](https://github.com/longhorn/longhorn/issues/4185)) - @yangchiu @c3y1huang
- [IMPROVEMENT] Recognize default toleration and node selector to allow Longhorn run on the RKE mixed cluster ([4246](https://github.com/longhorn/longhorn/issues/4246)) - @c3y1huang @chriscchien
- [IMPROVEMENT] Support bundle doesn't collect the snapshot yamls ([4285](https://github.com/longhorn/longhorn/issues/4285)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] Avoid accidentally deleting engine images that are still in use ([4332](https://github.com/longhorn/longhorn/issues/4332)) - @derekbit @chriscchien
- [IMPROVEMENT] Show non-JSON error from backup store ([4336](https://github.com/longhorn/longhorn/issues/4336)) - @c3y1huang
- [IMPROVEMENT] Update nfs-ganesha to v4.0 ([4351](https://github.com/longhorn/longhorn/issues/4351)) - @derekbit
- [IMPROVEMENT] show error when failed to init frontend ([4362](https://github.com/longhorn/longhorn/issues/4362)) - @c3y1huang
- [IMPROVEMENT] Too many debug-level log messages in engine instance-manager ([4427](https://github.com/longhorn/longhorn/issues/4427)) - @derekbit @chriscchien
- [IMPROVEMENT] Add prep work for fixing the corrupted filesystem using fsck in KB ([4440](https://github.com/longhorn/longhorn/issues/4440)) - @derekbit
- [IMPROVEMENT] Prevent users from accidentally uninstalling Longhorn ([4509](https://github.com/longhorn/longhorn/issues/4509)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] add possibility to use nodeSelector on the storageClass ([4574](https://github.com/longhorn/longhorn/issues/4574)) - @weizhe0422 @roger-ryao
- [IMPROVEMENT] Check if node schedulable condition is set before trying to read it ([4581](https://github.com/longhorn/longhorn/issues/4581)) - @weizhe0422 @roger-ryao
- [IMPROVEMENT] Review/consolidate the sectorSize in replica server, replica volume, and engine ([4599](https://github.com/longhorn/longhorn/issues/4599)) - @yangchiu @derekbit
- [IMPROVEMENT] Reorganize longhorn-manager/k8s/patches and auto-generate preserveUnknownFields field ([4600](https://github.com/longhorn/longhorn/issues/4600)) - @yangchiu @derekbit
- [IMPROVEMENT] share-manager pod bypasses the kubernetes scheduler ([4789](https://github.com/longhorn/longhorn/issues/4789)) - @joshimoo @chriscchien
- [IMPROVEMENT] Unify the format of returned error messages in longhorn-engine ([4828](https://github.com/longhorn/longhorn/issues/4828)) - @derekbit
- [IMPROVEMENT] Longhorn system backup/restore UI ([4855](https://github.com/longhorn/longhorn/issues/4855)) - @smallteeths
- [IMPROVEMENT] Replace the modTime (mtime) with ctime in snapshot hash ([4934](https://github.com/longhorn/longhorn/issues/4934)) - @derekbit @chriscchien
- [BUG] volume is stuck in attaching/detaching loop with error `Failed to init frontend: device...` ([4959](https://github.com/longhorn/longhorn/issues/4959)) - @derekbit @PhanLe1010 @chriscchien
- [IMPROVEMENT] Affinity in the longhorn-ui deployment within the helm chart ([4987](https://github.com/longhorn/longhorn/issues/4987)) - @mantissahz @chriscchien
- [IMPROVEMENT] Allow users to change volume.spec.snapshotDataIntegrity on UI ([4994](https://github.com/longhorn/longhorn/issues/4994)) - @yangchiu @smallteeths
- [IMPROVEMENT] Backup and restore recurring jobs on UI ([5009](https://github.com/longhorn/longhorn/issues/5009)) - @smallteeths @chriscchien
- [IMPROVEMENT] Disable `Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly` for RWX volumes ([5017](https://github.com/longhorn/longhorn/issues/5017)) - @derekbit @chriscchien
- [IMPROVEMENT] Enable fast replica rebuilding by default ([5023](https://github.com/longhorn/longhorn/issues/5023)) - @derekbit @roger-ryao
- [IMPROVEMENT] Upgrade tcmalloc in longhorn-engine ([5050](https://github.com/longhorn/longhorn/issues/5050)) - @derekbit
- [IMPROVEMENT] UI show error when backup target is empty for system backup ([5056](https://github.com/longhorn/longhorn/issues/5056)) - @smallteeths @khushboo-rancher
- [IMPROVEMENT] System restore job name should be Longhorn prefixed ([5057](https://github.com/longhorn/longhorn/issues/5057)) - @c3y1huang @khushboo-rancher
- [BUG] Error in logs while restoring the system backup ([5061](https://github.com/longhorn/longhorn/issues/5061)) - @c3y1huang @chriscchien
- [IMPROVEMENT] Add warning message to when deleting the restoring backups ([5065](https://github.com/longhorn/longhorn/issues/5065)) - @smallteeths @khushboo-rancher @roger-ryao
- [IMPROVEMENT] Inconsistent name convention across volume backup restore and system backup restore ([5066](https://github.com/longhorn/longhorn/issues/5066)) - @smallteeths @roger-ryao
- [IMPROVEMENT] System restore should proceed to restore other volumes if restoring one volume keeps failing for a certain time. ([5086](https://github.com/longhorn/longhorn/issues/5086)) - @c3y1huang @khushboo-rancher @roger-ryao
- [IMPROVEMENT] Support customized number of replicas of webhook and recovery-backend ([5087](https://github.com/longhorn/longhorn/issues/5087)) - @derekbit @chriscchien
- [IMPROVEMENT] Simplify the page by placing some configuration items in the advanced configuration when creating the volume ([5090](https://github.com/longhorn/longhorn/issues/5090)) - @yangchiu @smallteeths
- [IMPROVEMENT] Support replica sync client timeout setting to stabilize replica rebuilding ([5110](https://github.com/longhorn/longhorn/issues/5110)) - @derekbit @chriscchien
- [IMPROVEMENT] Set a newly created volume's data integrity from UI to `ignored` rather than `Fast-Check`. ([5126](https://github.com/longhorn/longhorn/issues/5126)) - @yangchiu @smallteeths
## Performance
- [BUG] Turn a node down and up, workload takes longer time to come back online in Longhorn v1.2.0 ([2947](https://github.com/longhorn/longhorn/issues/2947)) - @yangchiu @PhanLe1010
- [TASK] RWX volume performance measurement and investigation ([3665](https://github.com/longhorn/longhorn/issues/3665)) - @derekbit
- [TASK] Verify spinning disk/HDD via the current e2e regression ([4182](https://github.com/longhorn/longhorn/issues/4182)) - @yangchiu
- [BUG] test_csi_snapshot_snap_create_volume_from_snapshot failed when using HDD as Longhorn disks ([4227](https://github.com/longhorn/longhorn/issues/4227)) - @yangchiu @PhanLe1010
- [TASK] Disable tcmalloc in data path because newer tcmalloc version leads to performance drop ([5096](https://github.com/longhorn/longhorn/issues/5096)) - @derekbit @chriscchien
## Stability
- [BUG] Longhorn won't fail all replicas if there is no valid backend during the engine starting stage ([1330](https://github.com/longhorn/longhorn/issues/1330)) - @derekbit @roger-ryao
- [BUG] Every other backup fails and crashes the volume (Segmentation Fault) ([1768](https://github.com/longhorn/longhorn/issues/1768)) - @olljanat @mantissahz
- [BUG] Backend sizes do not match 5368709120 != 10737418240 in the engine initiation phase ([3601](https://github.com/longhorn/longhorn/issues/3601)) - @derekbit @chriscchien
- [BUG] Somehow the Rebuilding field inside volume.meta is set to true causing the volume to stuck in attaching/detaching loop ([4212](https://github.com/longhorn/longhorn/issues/4212)) - @yangchiu @derekbit
- [BUG] Engine binary cannot be recovered after being removed accidentally ([4380](https://github.com/longhorn/longhorn/issues/4380)) - @yangchiu @c3y1huang
- [TASK] Disable tcmalloc in longhorn-engine and longhorn-instance-manager ([5068](https://github.com/longhorn/longhorn/issues/5068)) - @derekbit
## Bugs
- [BUG] Removing old instance records after the new IM pod is launched will take 1 minute ([1363](https://github.com/longhorn/longhorn/issues/1363)) - @mantissahz
- [BUG] Restoring volume stuck forever if the backup is already deleted. ([1867](https://github.com/longhorn/longhorn/issues/1867)) - @mantissahz @chriscchien
- [BUG] Duplicated default instance manager leads to engine/replica cannot be started ([3000](https://github.com/longhorn/longhorn/issues/3000)) - @PhanLe1010 @roger-ryao
- [BUG] Restore from backup sometimes failed if having high frequent recurring backup job w/ retention ([3055](https://github.com/longhorn/longhorn/issues/3055)) - @mantissahz @roger-ryao
- [BUG] Newly created backup stays in `InProgress` when the volume deleted before backup finished ([3122](https://github.com/longhorn/longhorn/issues/3122)) - @mantissahz @chriscchien
- [Bug] Degraded volume generate failed replica make volume unschedulable ([3220](https://github.com/longhorn/longhorn/issues/3220)) - @derekbit @chriscchien
- [BUG] The default access mode of a restored RWX volume is RWO ([3444](https://github.com/longhorn/longhorn/issues/3444)) - @weizhe0422 @roger-ryao
- [BUG] Replica rebuilding failure with error "Replica must be closed, Can not add in state: open" ([3828](https://github.com/longhorn/longhorn/issues/3828)) - @mantissahz @roger-ryao
- [BUG] Max length of volume name not consist between frontend and backend ([3917](https://github.com/longhorn/longhorn/issues/3917)) - @weizhe0422 @roger-ryao
- [BUG] Can't delete volumesnapshot if backup removed first ([4107](https://github.com/longhorn/longhorn/issues/4107)) - @weizhe0422 @chriscchien
- [BUG] A IM-proxy connection not closed in full regression 1.3 ([4113](https://github.com/longhorn/longhorn/issues/4113)) - @c3y1huang @chriscchien
- [BUG] Scale replica warning ([4120](https://github.com/longhorn/longhorn/issues/4120)) - @c3y1huang @chriscchien
- [BUG] Wrong nodeOrDiskEvicted collected in node monitor ([4143](https://github.com/longhorn/longhorn/issues/4143)) - @yangchiu @derekbit
- [BUG] Misleading log "BUG: replica is running but storage IP is empty" ([4153](https://github.com/longhorn/longhorn/issues/4153)) - @shuo-wu @chriscchien
- [BUG] longhorn-manager cannot start while upgrading if the configmap contains volume sensitive settings ([4160](https://github.com/longhorn/longhorn/issues/4160)) - @derekbit @chriscchien
- [BUG] Replica stuck in buggy state with status.currentState is error and the spec.desireState is running ([4197](https://github.com/longhorn/longhorn/issues/4197)) - @yangchiu @PhanLe1010
- [BUG] After updating longhorn to version 1.3.0, only 1 node had problems and I can't even delete it ([4213](https://github.com/longhorn/longhorn/issues/4213)) - @derekbit @c3y1huang @chriscchien
- [BUG] Unable to use a TTY error when running environment_check.sh ([4216](https://github.com/longhorn/longhorn/issues/4216)) - @flkdnt @chriscchien
- [BUG] The last healthy replica may be evicted or removed ([4238](https://github.com/longhorn/longhorn/issues/4238)) - @yangchiu @shuo-wu
- [BUG] Volume detaching and attaching repeatedly while creating multiple snapshots with a same id ([4250](https://github.com/longhorn/longhorn/issues/4250)) - @yangchiu @derekbit
- [BUG] Backing image is not deleted and recreated correctly ([4256](https://github.com/longhorn/longhorn/issues/4256)) - @shuo-wu @chriscchien
- [BUG] longhorn-ui fails to start on RKE2 with cis-1.6 profile for Longhorn v1.3.0 with helm install ([4266](https://github.com/longhorn/longhorn/issues/4266)) - @yangchiu @mantissahz
- [BUG] Longhorn volume stuck in deleting state ([4278](https://github.com/longhorn/longhorn/issues/4278)) - @yangchiu @PhanLe1010
- [BUG] the IP address is duplicate when using storage network and the second network is contronllerd by ovs-cni. ([4281](https://github.com/longhorn/longhorn/issues/4281)) - @mantissahz
- [BUG] build longhorn-ui image error ([4283](https://github.com/longhorn/longhorn/issues/4283)) - @smallteeths
- [BUG] Wrong conditions in the Chart default-setting manifest for Rancher deployed Windows Cluster feature ([4289](https://github.com/longhorn/longhorn/issues/4289)) - @derekbit @chriscchien
- [BUG] Volume operations/rebuilding error during eviction ([4294](https://github.com/longhorn/longhorn/issues/4294)) - @yangchiu @shuo-wu
- [BUG] longhorn-manager deletes same pod multi times when rebooting ([4302](https://github.com/longhorn/longhorn/issues/4302)) - @mantissahz @w13915984028
- [BUG] test_setting_backing_image_auto_cleanup failed because the backing image file isn't deleted on the corresponding node as expected ([4308](https://github.com/longhorn/longhorn/issues/4308)) - @shuo-wu @chriscchien
- [BUG] After automatically force delete terminating pods of deployment on down node, data lost and I/O error ([4384](https://github.com/longhorn/longhorn/issues/4384)) - @yangchiu @derekbit @PhanLe1010
- [BUG] Volume can not attach to node when engine image DaemonSet pods are not fully deployed ([4386](https://github.com/longhorn/longhorn/issues/4386)) - @PhanLe1010 @chriscchien
- [BUG] Error/warning during uninstallation of Longhorn v1.3.1 via manifest ([4405](https://github.com/longhorn/longhorn/issues/4405)) - @PhanLe1010 @roger-ryao
- [BUG] can't upgrade engine if a volume was created in Longhorn v1.0 and the volume.spec.dataLocality is `""` ([4412](https://github.com/longhorn/longhorn/issues/4412)) - @derekbit @chriscchien
- [BUG] Confusing description the label for replica delition ([4430](https://github.com/longhorn/longhorn/issues/4430)) - @yangchiu @smallteeths
- [BUG] Update the Longhorn document in Using the Environment Check Script ([4450](https://github.com/longhorn/longhorn/issues/4450)) - @weizhe0422 @roger-ryao
- [BUG] Unable to search 1.3.1 doc by algolia ([4457](https://github.com/longhorn/longhorn/issues/4457)) - @mantissahz @roger-ryao
- [BUG] Misleading message "The volume is in expansion progress from size 20Gi to 10Gi" if the expansion is invalid ([4475](https://github.com/longhorn/longhorn/issues/4475)) - @yangchiu @smallteeths
- [BUG] Flaky case test_autosalvage_with_data_locality_enabled ([4489](https://github.com/longhorn/longhorn/issues/4489)) - @weizhe0422
- [BUG] Continuously rebuild when auto-balance==least-effort and existing node becomes unschedulable ([4502](https://github.com/longhorn/longhorn/issues/4502)) - @yangchiu @c3y1huang
- [BUG] Inconsistent system snapshots between replicas after rebuilding ([4513](https://github.com/longhorn/longhorn/issues/4513)) - @derekbit
- [BUG] Prometheus metric for backup state (longhorn_backup_state) returns wrong values ([4521](https://github.com/longhorn/longhorn/issues/4521)) - @mantissahz @roger-ryao
- [BUG] Longhorn accidentally schedule all replicas onto a worker node even though the setting Replica Node Level Soft Anti-Affinity is currently disabled ([4546](https://github.com/longhorn/longhorn/issues/4546)) - @yangchiu @mantissahz
- [BUG] LH continuously reports `invalid customized default setting taint-toleration` ([4554](https://github.com/longhorn/longhorn/issues/4554)) - @weizhe0422 @roger-ryao
- [BUG] the values.yaml in the longhorn helm chart contains values not used. ([4601](https://github.com/longhorn/longhorn/issues/4601)) - @weizhe0422 @roger-ryao
- [BUG] longhorn-engine integration test test_restore_to_file_with_backing_file failed after upgrade to sles 15.4 ([4632](https://github.com/longhorn/longhorn/issues/4632)) - @mantissahz
- [BUG] Can not pull a backup created by another Longhorn system from the remote backup target ([4637](https://github.com/longhorn/longhorn/issues/4637)) - @yangchiu @mantissahz @roger-ryao
- [BUG] Fix the share-manager deletion failure if the confimap is not existing ([4648](https://github.com/longhorn/longhorn/issues/4648)) - @derekbit @roger-ryao
- [BUG] Updating volume-scheduling-error failure for RWX volumes and expanding volumes ([4654](https://github.com/longhorn/longhorn/issues/4654)) - @derekbit @chriscchien
- [BUG] charts/longhorn/questions.yaml include oudated csi-image tags ([4669](https://github.com/longhorn/longhorn/issues/4669)) - @PhanLe1010 @roger-ryao
- [BUG] rebuilding the replica failed after upgrading from 1.2.4 to 1.3.2-rc2 ([4705](https://github.com/longhorn/longhorn/issues/4705)) - @derekbit @chriscchien
- [BUG] Cannot re-run helm uninstallation if the first one failed and cannot fetch logs of failed uninstallation pod ([4711](https://github.com/longhorn/longhorn/issues/4711)) - @yangchiu @PhanLe1010 @roger-ryao
- [BUG] The old instance-manager-r Pods are not deleted after upgrade ([4726](https://github.com/longhorn/longhorn/issues/4726)) - @mantissahz @chriscchien
- [BUG] Replica Auto Balance repeatedly delete the local replica and trigger rebuilding ([4761](https://github.com/longhorn/longhorn/issues/4761)) - @c3y1huang @roger-ryao
- [BUG] Volume metafile getting deleted or empty results in a detach-attach loop ([4846](https://github.com/longhorn/longhorn/issues/4846)) - @mantissahz @chriscchien
- [BUG] Backing image is stuck at `in-progress` status if the provided checksum is incorrect ([4852](https://github.com/longhorn/longhorn/issues/4852)) - @FrankYang0529 @chriscchien
- [BUG] Duplicate channel close error in the backing image manage related components ([4865](https://github.com/longhorn/longhorn/issues/4865)) - @weizhe0422 @roger-ryao
- [BUG] The node ID of backing image data source somehow get changed then lead to file handling failed ([4887](https://github.com/longhorn/longhorn/issues/4887)) - @shuo-wu @chriscchien
- [BUG] Cannot upload a backing image larger than 10G ([4902](https://github.com/longhorn/longhorn/issues/4902)) - @smallteeths @shuo-wu @chriscchien
- [BUG] Failed to build longhorn-instance-manager master branch ([4946](https://github.com/longhorn/longhorn/issues/4946)) - @derekbit
- [BUG] PVC only works with plural annotation `volumes.kubernetes.io/storage-provisioner: driver.longhorn.io` ([4951](https://github.com/longhorn/longhorn/issues/4951)) - @weizhe0422
- [BUG] Failed to create a replenished replica process because of the newly adding option ([4962](https://github.com/longhorn/longhorn/issues/4962)) - @yangchiu @derekbit
- [BUG] Incorrect log messages in longhorn-engine processRemoveSnapshot() ([4980](https://github.com/longhorn/longhorn/issues/4980)) - @derekbit
- [BUG] System backup showing wrong age ([5047](https://github.com/longhorn/longhorn/issues/5047)) - @smallteeths @khushboo-rancher
- [BUG] System backup should validate empty backup target ([5055](https://github.com/longhorn/longhorn/issues/5055)) - @c3y1huang @khushboo-rancher
- [BUG] missing the `restoreVolumeRecurringJob` parameter in the VolumeGet API ([5062](https://github.com/longhorn/longhorn/issues/5062)) - @mantissahz @roger-ryao
- [BUG] System restore stuck in restoring if pvc exists with identical name ([5064](https://github.com/longhorn/longhorn/issues/5064)) - @c3y1huang @roger-ryao
- [BUG] No error shown on UI if system backup conf not available ([5072](https://github.com/longhorn/longhorn/issues/5072)) - @c3y1huang @khushboo-rancher
- [BUG] System restore missing services ([5074](https://github.com/longhorn/longhorn/issues/5074)) - @yangchiu @c3y1huang
- [BUG] In a system restore, PV & PVC are not restored if PVC was created with 'longhorn-static' (created via Longhorn GUI) ([5091](https://github.com/longhorn/longhorn/issues/5091)) - @c3y1huang @khushboo-rancher
- [BUG][v1.4.0-rc1] image security scan CRITICAL issues ([5107](https://github.com/longhorn/longhorn/issues/5107)) - @yangchiu @mantissahz
- [BUG] Snapshot trim wrong label in the volume detail page. ([5127](https://github.com/longhorn/longhorn/issues/5127)) - @smallteeths @chriscchien
- [BUG] Filesystem on the volume with a backing image is corrupted after applying trim operation ([5129](https://github.com/longhorn/longhorn/issues/5129)) - @derekbit @chriscchien
- [BUG] Error in uninstall job ([5132](https://github.com/longhorn/longhorn/issues/5132)) - @c3y1huang @chriscchien
- [BUG] Uninstall job unable to delete the systembackup and systemrestore cr. ([5133](https://github.com/longhorn/longhorn/issues/5133)) - @c3y1huang @chriscchien
- [BUG] Nil pointer dereference error on restoring the system backup ([5134](https://github.com/longhorn/longhorn/issues/5134)) - @yangchiu @c3y1huang
- [BUG] UI option Update Replicas Auto Balance should use capital letter like others ([5154](https://github.com/longhorn/longhorn/issues/5154)) - @smallteeths @chriscchien
- [BUG] System restore cannot roll out when volume name is different to the PV ([5157](https://github.com/longhorn/longhorn/issues/5157)) - @yangchiu @c3y1huang
- [BUG] Online expansion doesn't succeed after a failed expansion ([5169](https://github.com/longhorn/longhorn/issues/5169)) - @derekbit @shuo-wu @khushboo-rancher
## Misc
- [DOC] RWX support for NVIDIA JETSON Ubuntu 18.4LTS kernel requires enabling NFSV4.1 ([3157](https://github.com/longhorn/longhorn/issues/3157)) - @yangchiu @derekbit
- [DOC] Add information about encryption algorithm to documentation ([3285](https://github.com/longhorn/longhorn/issues/3285)) - @mantissahz
- [DOC] Update the doc of volume size after introducing snapshot prune ([4158](https://github.com/longhorn/longhorn/issues/4158)) - @shuo-wu
- [Doc] Update the outdated "Customizing Default Settings" document ([4174](https://github.com/longhorn/longhorn/issues/4174)) - @derekbit
- [TASK] Refresh distro version support for 1.4 ([4401](https://github.com/longhorn/longhorn/issues/4401)) - @weizhe0422
- [TASK] Update official document Longhorn Networking ([4478](https://github.com/longhorn/longhorn/issues/4478)) - @derekbit
- [TASK] Update preserveUnknownFields fields in longhorn-manager CRD manifest ([4505](https://github.com/longhorn/longhorn/issues/4505)) - @derekbit @roger-ryao
- [TASK] Disable doc search for archived versions < 1.1 ([4524](https://github.com/longhorn/longhorn/issues/4524)) - @mantissahz
- [TASK] Update longhorn components with the latest backupstore ([4552](https://github.com/longhorn/longhorn/issues/4552)) - @derekbit
- [TASK] Update base image of all components from BCI 15.3 to 15.4 ([4617](https://github.com/longhorn/longhorn/issues/4617)) - @yangchiu
- [DOC] Update the Longhorn document in Install with Helm ([4745](https://github.com/longhorn/longhorn/issues/4745)) - @roger-ryao
- [TASK] Create longhornio support-bundle-kit image ([4911](https://github.com/longhorn/longhorn/issues/4911)) - @yangchiu
- [DOC] Add Recurring * Jobs History Limit to setting reference ([4912](https://github.com/longhorn/longhorn/issues/4912)) - @weizhe0422 @roger-ryao
- [DOC] Add Failed Backup TTL to setting reference ([4913](https://github.com/longhorn/longhorn/issues/4913)) - @mantissahz
- [TASK] Create longhornio liveness probe image ([4945](https://github.com/longhorn/longhorn/issues/4945)) - @yangchiu
- [TASK] Make system managed components branch-based build ([5024](https://github.com/longhorn/longhorn/issues/5024)) - @yangchiu
- [TASK] Remove unstable s390x from PR check for all repos ([5040](https://github.com/longhorn/longhorn/issues/5040)) -
- [TASK] Update longhorn-share-manager's nfs-ganesha to V4.2.1 ([5083](https://github.com/longhorn/longhorn/issues/5083)) - @derekbit @mantissahz
- [DOC] Update the Longhorn document in Setting up Prometheus and Grafana ([5158](https://github.com/longhorn/longhorn/issues/5158)) - @roger-ryao
## Contributors
- @FrankYang0529
- @PhanLe1010
- @c3y1huang
- @chriscchien
- @derekbit
- @flkdnt
- @innobead
- @joshimoo
- @khushboo-rancher
- @mantissahz
- @olljanat
- @roger-ryao
- @shuo-wu
- @smallteeths
- @w13915984028
- @weizhe0422
- @yangchiu

@ -0,0 +1,88 @@
## Release Note
**v1.4.1 released!** 🎆
This release introduces improvements and bug fixes, described below, in the areas of stability, performance, space efficiency, resilience, and so on. Please try it out and provide feedback. Thanks for all the contributions!
## Installation
> **Please ensure your Kubernetes cluster is at least v1.21 before installing Longhorn v1.4.1.**
Longhorn supports three installation methods: Rancher App Marketplace, Kubectl, and Helm. Follow the installation instructions [here](https://longhorn.io/docs/1.4.1/deploy/install/).
## Upgrade
> **Please ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v1.4.1 from v1.3.x/v1.4.0, which are the only supported source versions.**
Follow the upgrade instructions [here](https://longhorn.io/docs/1.4.1/deploy/upgrade/).
## Deprecation & Incompatibilities
N/A
## Known Issues after Release
Please follow up [here](https://github.com/longhorn/longhorn/wiki/Outstanding-Known-Issues-of-Releases) on any outstanding issues found after this release.
## Highlights
- [IMPROVEMENT] Periodically clean up volume snapshots ([3836](https://github.com/longhorn/longhorn/issues/3836)) - @c3y1huang @chriscchien
## Improvement
- [IMPROVEMENT] Do not count the failure replica reuse failure caused by the disconnection ([1923](https://github.com/longhorn/longhorn/issues/1923)) - @yangchiu @mantissahz
- [IMPROVEMENT] Update uninstallation info to include the 'Deleting Confirmation Flag' in chart ([5250](https://github.com/longhorn/longhorn/issues/5250)) - @PhanLe1010 @roger-ryao
- [IMPROVEMENT] Fix Guaranteed Engine Manager CPU recommendation formula in UI ([5338](https://github.com/longhorn/longhorn/issues/5338)) - @c3y1huang @smallteeths @roger-ryao
- [IMPROVEMENT] Update PSP validation in the Longhorn upstream chart ([5339](https://github.com/longhorn/longhorn/issues/5339)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] Update ganesha nfs to 4.2.3 ([5356](https://github.com/longhorn/longhorn/issues/5356)) - @derekbit @roger-ryao
- [IMPROVEMENT] Set write-cache of longhorn block device to off explicitly ([5382](https://github.com/longhorn/longhorn/issues/5382)) - @derekbit @chriscchien
## Stability
- [BUG] Memory leak in CSI plugin caused by stuck umount processes if the RWX volume is already gone ([5296](https://github.com/longhorn/longhorn/issues/5296)) - @derekbit @roger-ryao
- [BUG] share-manager pod failed to restart after kubelet restart ([5507](https://github.com/longhorn/longhorn/issues/5507)) - @yangchiu @derekbit
## Bugs
- [BUG] Longhorn 1.3.2 fails to backup & restore volumes behind Internet proxy ([5054](https://github.com/longhorn/longhorn/issues/5054)) - @mantissahz @chriscchien
- [BUG] RWX doesn't work with release 1.4.0 due to end grace update error from recovery backend ([5183](https://github.com/longhorn/longhorn/issues/5183)) - @derekbit @chriscchien
- [BUG] Incorrect indentation of charts/questions.yaml ([5196](https://github.com/longhorn/longhorn/issues/5196)) - @mantissahz @roger-ryao
- [BUG] Updating option "Allow snapshots removal during trim" for old volumes failed ([5218](https://github.com/longhorn/longhorn/issues/5218)) - @shuo-wu @roger-ryao
- [BUG] Incorrect router retry mechanism ([5259](https://github.com/longhorn/longhorn/issues/5259)) - @mantissahz @chriscchien
- [BUG] System Backup is stuck at Uploading if there are PVs not provisioned by CSI driver ([5286](https://github.com/longhorn/longhorn/issues/5286)) - @c3y1huang @chriscchien
- [BUG] Sync up with backup target during DR volume activation ([5292](https://github.com/longhorn/longhorn/issues/5292)) - @yangchiu @weizhe0422
- [BUG] environment_check.sh does not handle different kernel versions in cluster correctly ([5304](https://github.com/longhorn/longhorn/issues/5304)) - @achims311 @roger-ryao
- [BUG] instance-manager-r high memory consumption ([5312](https://github.com/longhorn/longhorn/issues/5312)) - @derekbit @roger-ryao
- [BUG] Replica rebuilding caused by rke2/kubelet restart ([5340](https://github.com/longhorn/longhorn/issues/5340)) - @derekbit @chriscchien
- [BUG] Error message not consistent between create/update recurring job when retain number greater than 50 ([5434](https://github.com/longhorn/longhorn/issues/5434)) - @c3y1huang @chriscchien
- [BUG] Do not copy Host header to API requests forwarded to Longhorn Manager ([5438](https://github.com/longhorn/longhorn/issues/5438)) - @yangchiu @smallteeths
- [BUG] RWX Volume attachment is getting Failed ([5456](https://github.com/longhorn/longhorn/issues/5456)) - @derekbit
- [BUG] test case test_backup_lock_deletion_during_restoration failed ([5458](https://github.com/longhorn/longhorn/issues/5458)) - @yangchiu @derekbit
- [BUG] [master] [v1.4.1-rc1] Volume restoration will never complete if attached node is down ([5464](https://github.com/longhorn/longhorn/issues/5464)) - @derekbit @weizhe0422 @chriscchien
- [BUG] Unable to create support bundle agent pod in air-gap environment ([5467](https://github.com/longhorn/longhorn/issues/5467)) - @yangchiu @c3y1huang
- [BUG] Node disconnection test failed ([5476](https://github.com/longhorn/longhorn/issues/5476)) - @yangchiu @derekbit
- [BUG] Physical node down test failed ([5477](https://github.com/longhorn/longhorn/issues/5477)) - @derekbit @chriscchien
- [BUG] Backing image with sync failure ([5481](https://github.com/longhorn/longhorn/issues/5481)) - @ChanYiLin @roger-ryao
- [BUG] Example of data migration doesn't work for hidden/./dot-files) ([5484](https://github.com/longhorn/longhorn/issues/5484)) - @hedefalk @shuo-wu @chriscchien
- [BUG] test case test_dr_volume_with_backup_block_deletion failed ([5489](https://github.com/longhorn/longhorn/issues/5489)) - @yangchiu @derekbit
## Misc
- [TASK][UI] add new recurring job tasks ([5272](https://github.com/longhorn/longhorn/issues/5272)) - @smallteeths @chriscchien
## Contributors
- @ChanYiLin
- @PhanLe1010
- @achims311
- @c3y1huang
- @chriscchien
- @derekbit
- @hedefalk
- @innobead
- @mantissahz
- @roger-ryao
- @shuo-wu
- @smallteeths
- @weizhe0422
- @yangchiu

@ -0,0 +1,92 @@
## Release Note
### **v1.4.2 released!** 🎆
Longhorn v1.4.2 is the latest stable version of Longhorn 1.4.
It introduces improvements and bug fixes in the areas of stability, performance, space efficiency, resilience, and so on. Please try it out and provide feedback. Thanks for all the contributions!
> For the definition of stable or latest release, please check [here](https://github.com/longhorn/longhorn#releases).
## Installation
> **Please ensure your Kubernetes cluster is at least v1.21 before installing v1.4.2.**
Longhorn supports three installation methods: Rancher App Marketplace, Kubectl, and Helm. Follow the installation instructions [here](https://longhorn.io/docs/1.4.2/deploy/install/).
## Upgrade
> **Please read the [important notes](https://longhorn.io/docs/1.4.2/deploy/important-notes/) first and ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v1.4.2 from v1.3.x/v1.4.x, which are the only supported source versions.**
Follow the upgrade instructions [here](https://longhorn.io/docs/1.4.2/deploy/upgrade/).
## Deprecation & Incompatibilities
N/A
## Known Issues after Release
Please follow up [here](https://github.com/longhorn/longhorn/wiki/Outstanding-Known-Issues-of-Releases) on any outstanding issues found after this release.
## Highlights
- [IMPROVEMENT] Use PDB to protect Longhorn components from unexpected drains ([3304](https://github.com/longhorn/longhorn/issues/3304)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] Introduce timeout mechanism for the sparse file syncing service ([4305](https://github.com/longhorn/longhorn/issues/4305)) - @yangchiu @ChanYiLin
- [IMPROVEMENT] Recurring jobs create new snapshots while being not able to clean up old ones ([4898](https://github.com/longhorn/longhorn/issues/4898)) - @mantissahz @chriscchien
## Improvement
- [IMPROVEMENT] Support bundle collects dmesg, syslog and related information of longhorn nodes ([5073](https://github.com/longhorn/longhorn/issues/5073)) - @weizhe0422 @roger-ryao
- [IMPROVEMENT] Fix BackingImage uploading/downloading flow to prevent client timeout ([5443](https://github.com/longhorn/longhorn/issues/5443)) - @ChanYiLin @chriscchien
- [IMPROVEMENT] Create a new setting so that Longhorn removes PDB for instance-manager-r that doesn't have any running instance inside it ([5549](https://github.com/longhorn/longhorn/issues/5549)) - @PhanLe1010 @khushboo-rancher
- [IMPROVEMENT] Deprecate the setting `allow-node-drain-with-last-healthy-replica` and replace it by `node-drain-policy` setting ([5585](https://github.com/longhorn/longhorn/issues/5585)) - @yangchiu @PhanLe1010
- [IMPROVEMENT][UI] Recurring jobs create new snapshots while being not able to clean up old one ([5610](https://github.com/longhorn/longhorn/issues/5610)) - @mantissahz @smallteeths @roger-ryao
- [IMPROVEMENT] Only activate replica if it doesn't have deletion timestamp during volume engine upgrade ([5632](https://github.com/longhorn/longhorn/issues/5632)) - @PhanLe1010 @roger-ryao
- [IMPROVEMENT] Clean up backup target if the backup target setting is unset ([5655](https://github.com/longhorn/longhorn/issues/5655)) - @yangchiu @ChanYiLin
## Resilience
- [BUG] Directly mark replica as failed if the node is deleted ([5542](https://github.com/longhorn/longhorn/issues/5542)) - @weizhe0422 @roger-ryao
- [BUG] RWX volume is stuck at detaching when the attached node is down ([5558](https://github.com/longhorn/longhorn/issues/5558)) - @derekbit @roger-ryao
- [BUG] Backup monitor gets stuck in an infinite loop if backup isn't found ([5662](https://github.com/longhorn/longhorn/issues/5662)) - @derekbit @chriscchien
- [BUG] Resources such as replicas are somehow not mutated when network is unstable ([5762](https://github.com/longhorn/longhorn/issues/5762)) - @derekbit @roger-ryao
- [BUG] Instance manager may not update instance status for a minute after starting ([5809](https://github.com/longhorn/longhorn/issues/5809)) - @ejweber @chriscchien
## Bugs
- [BUG] Delete a uploading backing image, the corresponding LH temp file is not deleted ([3682](https://github.com/longhorn/longhorn/issues/3682)) - @ChanYiLin @chriscchien
- [BUG] Can not create backup in engine image not fully deployed cluster ([5248](https://github.com/longhorn/longhorn/issues/5248)) - @ChanYiLin @roger-ryao
- [BUG] Upgrade engine --> spec.restoreVolumeRecurringJob and spec.snapshotDataIntegrity Unsupported value ([5485](https://github.com/longhorn/longhorn/issues/5485)) - @yangchiu @derekbit
- [BUG] Bulk backup deletion cause restoring volume to finish with attached state. ([5506](https://github.com/longhorn/longhorn/issues/5506)) - @ChanYiLin @roger-ryao
- [BUG] volume expansion starts for no reason, gets stuck on current size > expected size ([5513](https://github.com/longhorn/longhorn/issues/5513)) - @mantissahz @roger-ryao
- [BUG] RWX volume attachment failed if tried more enough times ([5537](https://github.com/longhorn/longhorn/issues/5537)) - @yangchiu @derekbit
- [BUG] instance-manager-e emits `Wait for process pvc-xxxx to shutdown` constantly ([5575](https://github.com/longhorn/longhorn/issues/5575)) - @derekbit @roger-ryao
- [BUG] Support bundle kit should respect node selector & taint toleration ([5614](https://github.com/longhorn/longhorn/issues/5614)) - @yangchiu @c3y1huang
- [BUG] Value overlapped in page Instance Manager Image ([5622](https://github.com/longhorn/longhorn/issues/5622)) - @smallteeths @chriscchien
- [BUG] Instance manager PDB created with wrong selector thus blocking the draining of the wrongly selected node forever ([5680](https://github.com/longhorn/longhorn/issues/5680)) - @PhanLe1010 @chriscchien
- [BUG] During volume live engine upgrade, if the replica pod is killed, the volume is stuck in upgrading forever ([5684](https://github.com/longhorn/longhorn/issues/5684)) - @yangchiu @PhanLe1010
- [BUG] Instance manager PDBs cannot be removed if the longhorn-manager pod on its spec node is not available ([5688](https://github.com/longhorn/longhorn/issues/5688)) - @PhanLe1010 @roger-ryao
- [BUG] Rebuild rebuilding is possibly issued to a wrong replica ([5709](https://github.com/longhorn/longhorn/issues/5709)) - @ejweber @roger-ryao
- [BUG] longhorn upgrade is not upgrading engineimage ([5740](https://github.com/longhorn/longhorn/issues/5740)) - @shuo-wu @chriscchien
- [BUG] `test_replica_auto_balance_when_replica_on_unschedulable_node` Error in creating volume with nodeSelector and dataLocality parameters ([5745](https://github.com/longhorn/longhorn/issues/5745)) - @c3y1huang @roger-ryao
- [BUG] Unable to backup volume after NFS server IP change ([5856](https://github.com/longhorn/longhorn/issues/5856)) - @derekbit @roger-ryao
## Misc
- [TASK] Check and update the networking doc & example YAMLs ([5651](https://github.com/longhorn/longhorn/issues/5651)) - @yangchiu @shuo-wu
## Contributors
- @ChanYiLin
- @PhanLe1010
- @c3y1huang
- @chriscchien
- @derekbit
- @ejweber
- @innobead
- @khushboo-rancher
- @mantissahz
- @roger-ryao
- @shuo-wu
- @smallteeths
- @weizhe0422
- @yangchiu

@ -0,0 +1,74 @@
## Release Note
### **v1.4.3 released!** 🎆
Longhorn v1.4.3 is the latest stable version of Longhorn 1.4.
It introduces improvements and bug fixes in the areas of stability, resilience, and so on. Please try it out and provide feedback. Thanks for all the contributions!
> For the definition of stable or latest release, please check [here](https://github.com/longhorn/longhorn#releases).
## Installation
> **Please ensure your Kubernetes cluster is at least v1.21 before installing v1.4.3.**
Longhorn supports three installation methods: Rancher App Marketplace, Kubectl, and Helm. Follow the installation instructions [here](https://longhorn.io/docs/1.4.3/deploy/install/).
## Upgrade
> **Please read the [important notes](https://longhorn.io/docs/1.4.3/deploy/important-notes/) first and ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v1.4.3 from v1.3.x/v1.4.x, which are the only supported source versions.**
Follow the upgrade instructions [here](https://longhorn.io/docs/1.4.3/deploy/upgrade/).
## Deprecation & Incompatibilities
N/A
## Known Issues after Release
Please follow up [here](https://github.com/longhorn/longhorn/wiki/Outstanding-Known-Issues-of-Releases) on any outstanding issues found after this release.
## Improvement
- [IMPROVEMENT] Assign the pods to the same node where the strict-local volume is present ([5448](https://github.com/longhorn/longhorn/issues/5448)) - @c3y1huang @chriscchien
## Resilience
- [BUG] filesystem corrupted after delete instance-manager-r for a locality best-effort volume ([5801](https://github.com/longhorn/longhorn/issues/5801)) - @yangchiu @ChanYiLin @mantissahz
## Bugs
- [BUG] 'Upgrade Engine' still shows up in a specific situation when engine already upgraded ([3063](https://github.com/longhorn/longhorn/issues/3063)) - @weizhe0422 @PhanLe1010 @smallteeths
- [BUG] DR volume even after activation remains in standby mode if there are one or more failed replicas. ([3069](https://github.com/longhorn/longhorn/issues/3069)) - @yangchiu @mantissahz
- [BUG] Prevent Longhorn uninstallation from getting stuck due to backups in error ([5868](https://github.com/longhorn/longhorn/issues/5868)) - @ChanYiLin @mantissahz
- [BUG] Unable to create support bundle if the previous one stayed in ReadyForDownload phase ([5882](https://github.com/longhorn/longhorn/issues/5882)) - @c3y1huang @roger-ryao
- [BUG] share-manager for a given pvc keep restarting (other pvc are working fine) ([5954](https://github.com/longhorn/longhorn/issues/5954)) - @yangchiu @derekbit
- [BUG] Replica auto-rebalance doesn't respect node selector ([5971](https://github.com/longhorn/longhorn/issues/5971)) - @c3y1huang @roger-ryao
- [BUG] Extra snapshot generated when clone from a detached volume ([5986](https://github.com/longhorn/longhorn/issues/5986)) - @weizhe0422 @ejweber
- [BUG] User created snapshot deleted after node drain and uncordon ([5992](https://github.com/longhorn/longhorn/issues/5992)) - @yangchiu @mantissahz
- [BUG] In some specific situation, system backup auto deleted when creating another one ([6045](https://github.com/longhorn/longhorn/issues/6045)) - @c3y1huang @chriscchien
- [BUG] Backing Image deletion stuck if it's deleted during uploading process and bids is ready-for-transfer state ([6086](https://github.com/longhorn/longhorn/issues/6086)) - @WebberHuang1118 @chriscchien
- [BUG] Backing image manager fails when SELinux is enabled ([6108](https://github.com/longhorn/longhorn/issues/6108)) - @ejweber @chriscchien
- [BUG] test_dr_volume_with_restore_command_error failed ([6130](https://github.com/longhorn/longhorn/issues/6130)) - @mantissahz @roger-ryao
- [BUG] Longhorn doesn't remove the system backups crd on uninstallation ([6185](https://github.com/longhorn/longhorn/issues/6185)) - @c3y1huang @khushboo-rancher
- [BUG] Test case test_ha_backup_deletion_recovery failed in rhel or rockylinux arm64 environment ([6213](https://github.com/longhorn/longhorn/issues/6213)) - @yangchiu @ChanYiLin @mantissahz
- [BUG] Engine continues to attempt to rebuild replica while detaching ([6217](https://github.com/longhorn/longhorn/issues/6217)) - @yangchiu @ejweber
- [BUG] Unable to receive support bundle from UI when it's large (400MB+) ([6256](https://github.com/longhorn/longhorn/issues/6256)) - @c3y1huang @chriscchien
- [BUG] Migration test case failed: unable to detach volume migration is not ready yet ([6238](https://github.com/longhorn/longhorn/issues/6238)) - @yangchiu @PhanLe1010 @khushboo-rancher
- [BUG] Restored Volumes stuck in attaching state ([6239](https://github.com/longhorn/longhorn/issues/6239)) - @derekbit @roger-ryao
## Contributors
- @ChanYiLin
- @PhanLe1010
- @WebberHuang1118
- @c3y1huang
- @chriscchien
- @derekbit
- @ejweber
- @innobead
- @khushboo-rancher
- @mantissahz
- @roger-ryao
- @smallteeths
- @weizhe0422
- @yangchiu

@ -0,0 +1,301 @@
## Release Note
### **v1.5.0 released!** 🎆
Longhorn v1.5.0 is the latest version of Longhorn 1.5.
It introduces many enhancements, improvements, and bug fixes, described below, in areas including performance, stability, maintenance, and resilience. Please try it out and provide feedback. Thanks for all the contributions!
> For the definition of stable or latest release, please check [here](https://github.com/longhorn/longhorn#releases).
- [v2 Data Engine based on SPDK - Preview](https://github.com/longhorn/longhorn/issues/5751)
> **Please note that this is a preview feature and should not be used in any production environment. Preview features are disabled by default and may change in subsequent versions until they reach general availability.**
In addition to the existing iSCSI stack (v1) data engine, we are introducing the v2 data engine based on SPDK (Storage Performance Development Kit). This release includes the introduction of volume lifecycle management, degraded volume handling, offline replica rebuilding, block device management, and orphaned replica management. For the performance benchmark and comparison with v1, check the report [here](https://longhorn.io/docs/1.5.0/spdk/performance-benchmark/).
- [Longhorn Volume Attachment](https://github.com/longhorn/longhorn/issues/3715)
Introducing the new Longhorn VolumeAttachment CR, which ensures exclusive attachment and supports automatic volume attachment and detachment for various headless operations such as volume cloning, backing image export, and recurring jobs.
- [Cluster Autoscaler - GA](https://github.com/longhorn/longhorn/issues/5238)
Cluster Autoscaler was initially introduced as an experimental feature in v1.3. After undergoing automatic validation on different public cloud Kubernetes distributions and receiving user feedback, it has now reached general availability.
- [Instance Manager Engine & Replica Consolidation](https://github.com/longhorn/longhorn/issues/5208)
Previously, two separate instance manager pods handled volume engine and replica process management. This setup consumed significant resources, especially during live upgrades. In this release, we have merged these pods into a single instance manager, reducing the initial resource requirements.
- [Volume Backup Compression Methods](https://github.com/longhorn/longhorn/issues/5189)
Longhorn now supports several compression methods for volume backups: lz4, gzip, or no compression. This allows users to choose the most suitable method based on their data type and usage requirements.
- [Automatic Volume Trim Recurring Job](https://github.com/longhorn/longhorn/issues/5186)
While volume filesystem trim was introduced in v1.4, users had to perform the operation manually. From this release, users can create a recurring job that automatically runs the trim process, improving space efficiency without requiring human intervention.
- [RWX Volume Trim](https://github.com/longhorn/longhorn/issues/5143)
Longhorn now supports filesystem trim for RWX (ReadWriteMany) volumes, extending the trim functionality beyond RWO (ReadWriteOnce) volumes.
- [Upgrade Path Enforcement & Downgrade Prevention](https://github.com/longhorn/longhorn/issues/5131)
To ensure compatibility after an upgrade, we have implemented upgrade path enforcement. This prevents unintended downgrades and ensures the system and data remain intact.
- [Backing Image Management via CSI VolumeSnapshot](https://github.com/longhorn/longhorn/issues/5005)
Users can now utilize the unified CSI VolumeSnapshot interface to manage Backing Images similar to volume snapshots and backups.
- [Snapshot Cleanup & Delete Recurring Job](https://github.com/longhorn/longhorn/issues/3836)
Introducing two new recurring job types specifically designed for snapshot cleanup and deletion. These jobs allow users to remove unnecessary snapshots for better space efficiency.
- [CIFS Backup Store](https://github.com/longhorn/longhorn/issues/3599) & [Azure Backup Store](https://github.com/longhorn/longhorn/issues/1309)
To enhance users' backup strategies and align with data governance policies, Longhorn now supports additional backup storage protocols, including CIFS and Azure.
- [Kubernetes Upgrade Node Drain Policy](https://github.com/longhorn/longhorn/issues/3304)
The new Node Drain Policy provides flexible strategies to protect volume data during Kubernetes upgrades or node maintenance operations. This ensures the integrity and availability of your volumes.
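
As a rough illustration of the trade-off behind the backup compression methods listed above, the following Python sketch compares gzip against no compression on sample data. It is not Longhorn code: the `backup_payload_size` helper and the sample data are hypothetical, and lz4 is left out only because it is not part of the Python standard library.

```python
import gzip


def backup_payload_size(data: bytes, method: str) -> int:
    """Return the number of bytes that would be stored for `data`.

    `method` is either "none" (store as-is) or "gzip". This helper is
    hypothetical and exists only to illustrate the size trade-off.
    """
    if method == "none":
        return len(data)
    if method == "gzip":
        return len(gzip.compress(data))
    raise ValueError(f"unsupported method in this sketch: {method!r}")


# Highly repetitive sample data, which compresses well.
sample = b"longhorn volume block " * 4096

if __name__ == "__main__":
    for method in ("none", "gzip"):
        print(method, backup_payload_size(sample, method))
```

Data that is already compressed or encrypted usually gains little from gzip, which is one reason a "no compression" option can be the better choice for some workloads.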
## Installation
> **Please ensure your Kubernetes cluster is at least v1.21 before installing Longhorn v1.5.0.**
Longhorn supports three installation methods: Rancher App Marketplace, Kubectl, and Helm. Follow the installation instructions [here](https://longhorn.io/docs/1.5.0/deploy/install/).
## Upgrade
> **Please ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v1.5.0. Upgrading is supported only from v1.4.x.**
Follow the upgrade instructions [here](https://longhorn.io/docs/1.5.0/deploy/upgrade/).
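
The upgrade constraint above (no downgrades, and only from the previous minor version) can be sketched as a small pre-flight check. This is a minimal illustration under assumptions, not Longhorn's actual enforcement logic: the function names are hypothetical, and the "at most one minor version step" rule is inferred from the supported source versions stated in these notes.

```python
import re
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v1.5.0' into (major, minor, patch)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def upgrade_allowed(current: str, target: str) -> bool:
    """Hypothetical check: reject downgrades and skipped minor versions."""
    cur = parse_version(current)
    tgt = parse_version(target)
    if tgt < cur:
        return False  # downgrade
    if tgt[0] != cur[0]:
        return False  # major version change is out of scope for this sketch
    return tgt[1] - cur[1] <= 1  # same minor, or exactly one minor step up


if __name__ == "__main__":
    for current, target in [
        ("v1.4.2", "v1.5.0"),
        ("v1.3.3", "v1.5.0"),
        ("v1.5.0", "v1.4.3"),
    ]:
        print(current, "->", target, upgrade_allowed(current, target))
```

A check like this could run before applying new manifests; the upgrade instructions linked above remain the authoritative source for supported paths.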
## Deprecation & Incompatibilities
Please check the [important notes](https://longhorn.io/docs/1.5.0/deploy/important-notes/) to learn about deprecated, removed, or incompatible features and other important changes. If you upgrade indirectly from an older version such as v1.3.x, please also check the important notes for each version along the upgrade path.
## Known Issues after Release
Please follow up [here](https://github.com/longhorn/longhorn/wiki/Outstanding-Known-Issues-of-Releases) on any outstanding issues found after this release.
## Highlights
- [DOC] Provide the user guide for Kubernetes upgrade ([494](https://github.com/longhorn/longhorn/issues/494)) - @PhanLe1010
- [FEATURE] Backups to Azure Blob Storage ([1309](https://github.com/longhorn/longhorn/issues/1309)) - @mantissahz @chriscchien
- [IMPROVEMENT] Use PDB to protect Longhorn components from unexpected drains ([3304](https://github.com/longhorn/longhorn/issues/3304)) - @yangchiu @PhanLe1010
- [FEATURE] CIFS Backup Store Support ([3599](https://github.com/longhorn/longhorn/issues/3599)) - @derekbit @chriscchien
- [IMPROVEMENT] Consolidate volume attach/detach implementation ([3715](https://github.com/longhorn/longhorn/issues/3715)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] Periodically clean up volume snapshots ([3836](https://github.com/longhorn/longhorn/issues/3836)) - @c3y1huang @chriscchien
- [IMPROVEMENT] Introduce timeout mechanism for the sparse file syncing service ([4305](https://github.com/longhorn/longhorn/issues/4305)) - @yangchiu @ChanYiLin
- [IMPROVEMENT] Recurring jobs create new snapshots while being not able to clean up old ones ([4898](https://github.com/longhorn/longhorn/issues/4898)) - @mantissahz @chriscchien
- [FEATURE] BackingImage Management via VolumeSnapshot ([5005](https://github.com/longhorn/longhorn/issues/5005)) - @ChanYiLin @chriscchien
- [FEATURE] Upgrade path enforcement & downgrade prevention ([5131](https://github.com/longhorn/longhorn/issues/5131)) - @yangchiu @mantissahz
- [FEATURE] Support RWX volume trim ([5143](https://github.com/longhorn/longhorn/issues/5143)) - @derekbit @chriscchien
- [FEATURE] Auto Trim via recurring job ([5186](https://github.com/longhorn/longhorn/issues/5186)) - @c3y1huang @chriscchien
- [FEATURE] Introduce faster compression and multiple threads for volume backup & restore ([5189](https://github.com/longhorn/longhorn/issues/5189)) - @derekbit @roger-ryao
- [FEATURE] Consolidate Instance Manager Engine & Replica for resource consumption reduction ([5208](https://github.com/longhorn/longhorn/issues/5208)) - @yangchiu @c3y1huang
- [FEATURE] Cluster Autoscaler Support GA ([5238](https://github.com/longhorn/longhorn/issues/5238)) - @yangchiu @c3y1huang
- [FEATURE] Update K8s version support and component/pkg/build dependencies for Longhorn 1.5 ([5595](https://github.com/longhorn/longhorn/issues/5595)) - @yangchiu @ejweber
- [FEATURE] Support SPDK Data Engine - Preview ([5751](https://github.com/longhorn/longhorn/issues/5751)) - @derekbit @shuo-wu @DamiaSan
## Enhancements
- [FEATURE] Allow users to directly activate a restoring/DR volume as long as there is one ready replica. ([1512](https://github.com/longhorn/longhorn/issues/1512)) - @mantissahz @weizhe0422
- [REFACTOR] volume controller refactoring/split up, to simplify the control flow ([2527](https://github.com/longhorn/longhorn/issues/2527)) - @PhanLe1010 @chriscchien
- [FEATURE] Import and export SPDK longhorn volumes to longhorn sparse file directory ([4100](https://github.com/longhorn/longhorn/issues/4100)) - @DamiaSan
- [FEATURE] Add a global `storage reserved` setting for newly created longhorn nodes' disks ([4773](https://github.com/longhorn/longhorn/issues/4773)) - @mantissahz @chriscchien
- [FEATURE] Support backup volumes during system backup ([5011](https://github.com/longhorn/longhorn/issues/5011)) - @c3y1huang @chriscchien
- [FEATURE] Support SPDK lvol shallow copy for newly replica creation ([5217](https://github.com/longhorn/longhorn/issues/5217)) - @DamiaSan
- [FEATURE] Introduce longhorn-spdk-engine for SPDK volume management ([5282](https://github.com/longhorn/longhorn/issues/5282)) - @shuo-wu
- [FEATURE] Support replica-zone-soft-anti-affinity setting per volume ([5358](https://github.com/longhorn/longhorn/issues/5358)) - @ChanYiLin @smallteeths @chriscchien
- [FEATURE] Install Opt-In NetworkPolicies ([5403](https://github.com/longhorn/longhorn/issues/5403)) - @yangchiu @ChanYiLin
- [FEATURE] Create Longhorn SPDK Engine component with basic fundamental functions ([5406](https://github.com/longhorn/longhorn/issues/5406)) - @shuo-wu
- [FEATURE] Add status APIs for shallow copy and IO pause/resume ([5647](https://github.com/longhorn/longhorn/issues/5647)) - @DamiaSan
- [FEATURE] Introduce a new disk type, disk management and replica scheduler for SPDK volumes ([5683](https://github.com/longhorn/longhorn/issues/5683)) - @derekbit @roger-ryao
- [FEATURE] Support replica scheduling for SPDK volume ([5711](https://github.com/longhorn/longhorn/issues/5711)) - @derekbit
- [FEATURE] Create SPDK gRPC service for instance manager ([5712](https://github.com/longhorn/longhorn/issues/5712)) - @shuo-wu
- [FEATURE] Environment check script for Longhorn with SPDK ([5738](https://github.com/longhorn/longhorn/issues/5738)) - @derekbit @chriscchien
- [FEATURE] Deployment manifests for helping install SPDK dependencies, utilities and libraries ([5739](https://github.com/longhorn/longhorn/issues/5739)) - @yangchiu @derekbit
- [FEATURE] Implement Disk gRPC Service in Instance Manager for collecting SPDK disk statistics from SPDK gRPC service ([5744](https://github.com/longhorn/longhorn/issues/5744)) - @derekbit @chriscchien
- [FEATURE] Support for SPDK RAID1 by setting the minimum number of base_bdevs to 1 ([5758](https://github.com/longhorn/longhorn/issues/5758)) - @yangchiu @DamiaSan
- [FEATURE] Add a global setting for enabling and disabling SPDK feature ([5778](https://github.com/longhorn/longhorn/issues/5778)) - @yangchiu @derekbit
- [FEATURE] Identify and manage orphaned lvols and raid bdevs if the associated `Volume` resources are not existing ([5827](https://github.com/longhorn/longhorn/issues/5827)) - @yangchiu @derekbit
- [FEATURE] Longhorn UI for SPDK feature ([5846](https://github.com/longhorn/longhorn/issues/5846)) - @smallteeths @chriscchien
- [FEATURE] UI modification to work with new AD mechanism (Longhorn UI -> Longhorn API) ([6004](https://github.com/longhorn/longhorn/issues/6004)) - @yangchiu @smallteeths
- [FEATURE] Replica offline rebuild over SPDK - data engine ([6067](https://github.com/longhorn/longhorn/issues/6067)) - @shuo-wu
- [FEATURE] Support automatic offline replica rebuilding of volumes using SPDK data engine ([6071](https://github.com/longhorn/longhorn/issues/6071)) - @yangchiu @derekbit
## Improvement
- [IMPROVEMENT] Do not count the failure replica reuse failure caused by the disconnection ([1923](https://github.com/longhorn/longhorn/issues/1923)) - @yangchiu @mantissahz
- [IMPROVEMENT] Consider changing the over provisioning default/recommendation to 100% percentage (no over provisioning) ([2694](https://github.com/longhorn/longhorn/issues/2694)) - @c3y1huang @chriscchien
- [BUG] StorageClass of pv and pvc of a recovered pv should not always be default. ([3506](https://github.com/longhorn/longhorn/issues/3506)) - @ChanYiLin @smallteeths @roger-ryao
- [IMPROVEMENT] Auto-attach volume for K8s CSI snapshot ([3726](https://github.com/longhorn/longhorn/issues/3726)) - @weizhe0422 @PhanLe1010
- [IMPROVEMENT] Change Longhorn API to create/delete snapshot CRs instead of calling engine CLI ([3995](https://github.com/longhorn/longhorn/issues/3995)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] Add support for crypto parameters for RWX volumes ([4829](https://github.com/longhorn/longhorn/issues/4829)) - @mantissahz @roger-ryao
- [IMPROVEMENT] Remove the global setting `mkfs-ext4-parameters` ([4914](https://github.com/longhorn/longhorn/issues/4914)) - @ejweber @roger-ryao
- [IMPROVEMENT] Move all snapshot related settings to one place. ([4930](https://github.com/longhorn/longhorn/issues/4930)) - @smallteeths @roger-ryao
- [IMPROVEMENT] Remove system managed component image settings ([5028](https://github.com/longhorn/longhorn/issues/5028)) - @mantissahz @chriscchien
- [IMPROVEMENT] Set default `engine-replica-timeout` value for engine controller start command ([5031](https://github.com/longhorn/longhorn/issues/5031)) - @derekbit @chriscchien
- [IMPROVEMENT] Support bundle collects dmesg, syslog and related information of longhorn nodes ([5073](https://github.com/longhorn/longhorn/issues/5073)) - @weizhe0422 @roger-ryao
- [IMPROVEMENT] Collect volume, system, feature info for metrics for better usage awareness ([5235](https://github.com/longhorn/longhorn/issues/5235)) - @c3y1huang @chriscchien @roger-ryao
- [IMPROVEMENT] Update uninstallation info to include the 'Deleting Confirmation Flag' in chart ([5250](https://github.com/longhorn/longhorn/issues/5250)) - @PhanLe1010 @roger-ryao
- [IMPROVEMENT] Disable Revision Counter for Strict-Local dataLocality ([5257](https://github.com/longhorn/longhorn/issues/5257)) - @derekbit @roger-ryao
- [IMPROVEMENT] Fix Guaranteed Engine Manager CPU recommendation formula in UI ([5338](https://github.com/longhorn/longhorn/issues/5338)) - @c3y1huang @smallteeths @roger-ryao
- [IMPROVEMENT] Update PSP validation in the Longhorn upstream chart ([5339](https://github.com/longhorn/longhorn/issues/5339)) - @yangchiu @PhanLe1010
- [IMPROVEMENT] Update ganesha nfs to 4.2.3 ([5356](https://github.com/longhorn/longhorn/issues/5356)) - @derekbit @roger-ryao
- [IMPROVEMENT] Set write-cache of longhorn block device to off explicitly ([5382](https://github.com/longhorn/longhorn/issues/5382)) - @derekbit @chriscchien
- [IMPROVEMENT] Clean up unused backupstore mountpoint ([5391](https://github.com/longhorn/longhorn/issues/5391)) - @derekbit @chriscchien
- [DOC] Update Kubernetes version info to have consistent description from the longhorn documentation in chart ([5399](https://github.com/longhorn/longhorn/issues/5399)) - @ChanYiLin @roger-ryao
- [IMPROVEMENT] Fix BackingImage uploading/downloading flow to prevent client timeout ([5443](https://github.com/longhorn/longhorn/issues/5443)) - @ChanYiLin @chriscchien
- [IMPROVEMENT] Assign the pods to the same node where the strict-local volume is present ([5448](https://github.com/longhorn/longhorn/issues/5448)) - @c3y1huang @chriscchien
- [IMPROVEMENT] Show an explicit message when trying to attach a volume whose engine and replica were on a deleted node ([5545](https://github.com/longhorn/longhorn/issues/5545)) - @ChanYiLin @chriscchien
- [IMPROVEMENT] Create a new setting so that Longhorn removes PDB for instance-manager-r that doesn't have any running instance inside it ([5549](https://github.com/longhorn/longhorn/issues/5549)) - @PhanLe1010 @roger-ryao
- [IMPROVEMENT] Merge conversion/admission webhook and recovery backend services into longhorn-manager ([5590](https://github.com/longhorn/longhorn/issues/5590)) - @ChanYiLin @chriscchien
- [IMPROVEMENT][UI] Recurring jobs create new snapshots while being not able to clean up old one ([5610](https://github.com/longhorn/longhorn/issues/5610)) - @mantissahz @smallteeths @roger-ryao
- [IMPROVEMENT] Only activate replica if it doesn't have deletion timestamp during volume engine upgrade ([5632](https://github.com/longhorn/longhorn/issues/5632)) - @PhanLe1010 @roger-ryao
- [IMPROVEMENT] Clean up backup target if the backup target setting is unset ([5655](https://github.com/longhorn/longhorn/issues/5655)) - @yangchiu @ChanYiLin
- [IMPROVEMENT] Bump CSI sidecar components' version ([5672](https://github.com/longhorn/longhorn/issues/5672)) - @yangchiu @ejweber
- [IMPROVEMENT] Configure log level of Longhorn components ([5888](https://github.com/longhorn/longhorn/issues/5888)) - @ChanYiLin @weizhe0422
- [IMPROVEMENT] Remove development toolchain from Longhorn images ([6022](https://github.com/longhorn/longhorn/issues/6022)) - @ChanYiLin @derekbit
- [IMPROVEMENT] Reduce replica process's number of allocated ports ([6079](https://github.com/longhorn/longhorn/issues/6079)) - @ChanYiLin @derekbit
- [IMPROVEMENT] UI supports automatic replica rebuilding for SPDK volumes ([6107](https://github.com/longhorn/longhorn/issues/6107)) - @smallteeths @roger-ryao
- [IMPROVEMENT] Minor UX changes for Longhorn SPDK ([6126](https://github.com/longhorn/longhorn/issues/6126)) - @derekbit @roger-ryao
- [IMPROVEMENT] Instance manager spdk_tgt resilience due to spdk_tgt crash ([6155](https://github.com/longhorn/longhorn/issues/6155)) - @yangchiu @derekbit
- [IMPROVEMENT] Determine number of replica/engine port count in longhorn-manager (control plane) instead ([6163](https://github.com/longhorn/longhorn/issues/6163)) - @derekbit @chriscchien
- [IMPROVEMENT] SPDK client should function after encountering decoding error ([6191](https://github.com/longhorn/longhorn/issues/6191)) - @yangchiu @shuo-wu
## Performance
- [REFACTORING] Evaluate the impact of removing the client side compression for backup blocks ([1409](https://github.com/longhorn/longhorn/issues/1409)) - @derekbit
## Resilience
- [BUG] If backing image downloading fails on one node, it doesn't try on other nodes. ([3746](https://github.com/longhorn/longhorn/issues/3746)) - @ChanYiLin
- [BUG] Replica rebuilding caused by rke2/kubelet restart ([5340](https://github.com/longhorn/longhorn/issues/5340)) - @derekbit @chriscchien
- [BUG] Volume restoration will never complete if attached node is down ([5464](https://github.com/longhorn/longhorn/issues/5464)) - @derekbit @weizhe0422 @chriscchien
- [BUG] Node disconnection test failed ([5476](https://github.com/longhorn/longhorn/issues/5476)) - @yangchiu @derekbit
- [BUG] Physical node down test failed ([5477](https://github.com/longhorn/longhorn/issues/5477)) - @derekbit @chriscchien
- [BUG] Backing image with sync failure ([5481](https://github.com/longhorn/longhorn/issues/5481)) - @ChanYiLin @roger-ryao
- [BUG] share-manager pod failed to restart after kubelet restart ([5507](https://github.com/longhorn/longhorn/issues/5507)) - @yangchiu @derekbit
- [BUG] Directly mark replica as failed if the node is deleted ([5542](https://github.com/longhorn/longhorn/issues/5542)) - @weizhe0422 @roger-ryao
- [BUG] RWX volume is stuck at detaching when the attached node is down ([5558](https://github.com/longhorn/longhorn/issues/5558)) - @derekbit @roger-ryao
- [BUG] Unable to export RAID1 bdev in degraded state ([5650](https://github.com/longhorn/longhorn/issues/5650)) - @chriscchien @DamiaSan
- [BUG] Backup monitor gets stuck in an infinite loop if backup isn't found ([5662](https://github.com/longhorn/longhorn/issues/5662)) - @derekbit @chriscchien
- [BUG] Resources such as replicas are somehow not mutated when network is unstable ([5762](https://github.com/longhorn/longhorn/issues/5762)) - @derekbit @roger-ryao
- [BUG] filesystem corrupted after delete instance-manager-r for a locality best-effort volume ([5801](https://github.com/longhorn/longhorn/issues/5801)) - @yangchiu @ChanYiLin @mantissahz
## Stability
- [BUG] nfs backup broken - NFS server: mkdir - file exists ([4626](https://github.com/longhorn/longhorn/issues/4626)) - @yangchiu @derekbit
- [BUG] Memory leak in CSI plugin caused by stuck umount processes if the RWX volume is already gone ([5296](https://github.com/longhorn/longhorn/issues/5296)) - @derekbit @roger-ryao
## Bugs
- [BUG] 'Upgrade Engine' still shows up in a specific situation when engine already upgraded ([3063](https://github.com/longhorn/longhorn/issues/3063)) - @weizhe0422 @PhanLe1010 @smallteeths
- [BUG] DR volume even after activation remains in standby mode if there are one or more failed replicas. ([3069](https://github.com/longhorn/longhorn/issues/3069)) - @yangchiu @mantissahz
- [BUG] volume not able to attach with raw type backing image ([3437](https://github.com/longhorn/longhorn/issues/3437)) - @yangchiu @ChanYiLin
- [BUG] Delete a uploading backing image, the corresponding LH temp file is not deleted ([3682](https://github.com/longhorn/longhorn/issues/3682)) - @ChanYiLin @chriscchien
- [BUG] Cloned PVC from detached volume gets stuck at not ready for workload ([3692](https://github.com/longhorn/longhorn/issues/3692)) - @PhanLe1010 @chriscchien
- [BUG] Block device volume failed to unmount when it is detached unexpectedly ([3778](https://github.com/longhorn/longhorn/issues/3778)) - @PhanLe1010 @chriscchien
- [BUG] After migration of Longhorn from Rancher old UI to dashboard, the csi-plugin doesn't update ([4519](https://github.com/longhorn/longhorn/issues/4519)) - @mantissahz @roger-ryao
- [BUG] Volumes Stuck in Attach/Detach Loop when running on OpenShift/OKD ([4988](https://github.com/longhorn/longhorn/issues/4988)) - @ChanYiLin
- [BUG] Longhorn 1.3.2 fails to backup & restore volumes behind Internet proxy ([5054](https://github.com/longhorn/longhorn/issues/5054)) - @mantissahz @chriscchien
- [BUG] Instance manager pod does not respect node taint? ([5161](https://github.com/longhorn/longhorn/issues/5161)) - @ejweber
- [BUG] RWX doesn't work with release 1.4.0 due to end grace update error from recovery backend ([5183](https://github.com/longhorn/longhorn/issues/5183)) - @derekbit @chriscchien
- [BUG] Incorrect indentation of charts/questions.yaml ([5196](https://github.com/longhorn/longhorn/issues/5196)) - @mantissahz @roger-ryao
- [BUG] Updating option "Allow snapshots removal during trim" for old volumes failed ([5218](https://github.com/longhorn/longhorn/issues/5218)) - @shuo-wu @roger-ryao
- [BUG] Since 1.4.0 RWX volume failing regularly ([5224](https://github.com/longhorn/longhorn/issues/5224)) - @derekbit
- [BUG] Can not create backup in engine image not fully deployed cluster ([5248](https://github.com/longhorn/longhorn/issues/5248)) - @ChanYiLin @roger-ryao
- [BUG] Incorrect router retry mechanism ([5259](https://github.com/longhorn/longhorn/issues/5259)) - @mantissahz @chriscchien
- [BUG] System Backup is stuck at Uploading if there are PVs not provisioned by CSI driver ([5286](https://github.com/longhorn/longhorn/issues/5286)) - @c3y1huang @chriscchien
- [BUG] Sync up with backup target during DR volume activation ([5292](https://github.com/longhorn/longhorn/issues/5292)) - @yangchiu @weizhe0422
- [BUG] environment_check.sh does not handle different kernel versions in cluster correctly ([5304](https://github.com/longhorn/longhorn/issues/5304)) - @achims311 @roger-ryao
- [BUG] instance-manager-r high memory consumption ([5312](https://github.com/longhorn/longhorn/issues/5312)) - @derekbit @roger-ryao
- [BUG] Unable to upgrade longhorn from v1.3.2 to master-head ([5368](https://github.com/longhorn/longhorn/issues/5368)) - @yangchiu @derekbit
- [BUG] Modify engineManagerCPURequest and replicaManagerCPURequest won't raise resource request in instance-manager-e pod ([5419](https://github.com/longhorn/longhorn/issues/5419)) - @c3y1huang
- [BUG] Error message not consistent between create/update recurring job when retain number greater than 50 ([5434](https://github.com/longhorn/longhorn/issues/5434)) - @c3y1huang @chriscchien
- [BUG] Do not copy Host header to API requests forwarded to Longhorn Manager ([5438](https://github.com/longhorn/longhorn/issues/5438)) - @yangchiu @smallteeths
- [BUG] RWX Volume attachment is getting Failed ([5456](https://github.com/longhorn/longhorn/issues/5456)) - @derekbit
- [BUG] test case test_backup_lock_deletion_during_restoration failed ([5458](https://github.com/longhorn/longhorn/issues/5458)) - @yangchiu @derekbit
- [BUG] Unable to create support bundle agent pod in air-gap environment ([5467](https://github.com/longhorn/longhorn/issues/5467)) - @yangchiu @c3y1huang
- [BUG] Example of data migration doesn't work for hidden/./dot-files) ([5484](https://github.com/longhorn/longhorn/issues/5484)) - @hedefalk @shuo-wu @chriscchien
- [BUG] Upgrade engine --> spec.restoreVolumeRecurringJob and spec.snapshotDataIntegrity Unsupported value ([5485](https://github.com/longhorn/longhorn/issues/5485)) - @yangchiu @derekbit
- [BUG] test case test_dr_volume_with_backup_block_deletion failed ([5489](https://github.com/longhorn/longhorn/issues/5489)) - @yangchiu @derekbit
- [BUG] Bulk backup deletion cause restoring volume to finish with attached state. ([5506](https://github.com/longhorn/longhorn/issues/5506)) - @ChanYiLin @roger-ryao
- [BUG] volume expansion starts for no reason, gets stuck on current size > expected size ([5513](https://github.com/longhorn/longhorn/issues/5513)) - @mantissahz @roger-ryao
- [BUG] RWX volume attachment failed if tried more enough times ([5537](https://github.com/longhorn/longhorn/issues/5537)) - @yangchiu @derekbit
- [BUG] instance-manager-e emits `Wait for process pvc-xxxx to shutdown` constantly ([5575](https://github.com/longhorn/longhorn/issues/5575)) - @derekbit @roger-ryao
- [BUG] Support bundle kit should respect node selector & taint toleration ([5614](https://github.com/longhorn/longhorn/issues/5614)) - @yangchiu @c3y1huang
- [BUG] Value overlapped in page Instance Manager Image ([5622](https://github.com/longhorn/longhorn/issues/5622)) - @smallteeths @chriscchien
- [BUG] Updated Rocky 9 (and others) can't attach due to SELinux ([5627](https://github.com/longhorn/longhorn/issues/5627)) - @yangchiu @ejweber
- [BUG] Fix misleading error messages when creating a mount point for a backup store ([5630](https://github.com/longhorn/longhorn/issues/5630)) - @derekbit
- [BUG] Instance manager PDB created with wrong selector thus blocking the draining of the wrongly selected node forever ([5680](https://github.com/longhorn/longhorn/issues/5680)) - @PhanLe1010 @chriscchien
- [BUG] During volume live engine upgrade, if the replica pod is killed, the volume is stuck in upgrading forever ([5684](https://github.com/longhorn/longhorn/issues/5684)) - @yangchiu @PhanLe1010
- [BUG] Instance manager PDBs cannot be removed if the longhorn-manager pod on its spec node is not available ([5688](https://github.com/longhorn/longhorn/issues/5688)) - @PhanLe1010 @roger-ryao
- [BUG] Replica rebuilding is possibly issued to a wrong replica ([5709](https://github.com/longhorn/longhorn/issues/5709)) - @ejweber @roger-ryao
- [BUG] Observing replica on new IM-r before upgrading of volume ([5729](https://github.com/longhorn/longhorn/issues/5729)) - @c3y1huang
- [BUG] longhorn upgrade is not upgrading engineimage ([5740](https://github.com/longhorn/longhorn/issues/5740)) - @shuo-wu @chriscchien
- [BUG] `test_replica_auto_balance_when_replica_on_unschedulable_node` Error in creating volume with nodeSelector and dataLocality parameters ([5745](https://github.com/longhorn/longhorn/issues/5745)) - @c3y1huang @roger-ryao
- [BUG] Unable to backup volume after NFS server IP change ([5856](https://github.com/longhorn/longhorn/issues/5856)) - @derekbit @roger-ryao
- [BUG] Prevent Longhorn uninstallation from getting stuck due to backups in error ([5868](https://github.com/longhorn/longhorn/issues/5868)) - @ChanYiLin @mantissahz
- [BUG] Unable to create support bundle if the previous one stayed in ReadyForDownload phase ([5882](https://github.com/longhorn/longhorn/issues/5882)) - @c3y1huang @roger-ryao
- [BUG] share-manager for a given pvc keep restarting (other pvc are working fine) ([5954](https://github.com/longhorn/longhorn/issues/5954)) - @yangchiu @derekbit
- [BUG] Replica auto-rebalance doesn't respect node selector ([5971](https://github.com/longhorn/longhorn/issues/5971)) - @c3y1huang @roger-ryao
- [BUG] Volume detached automatically after upgrade Longhorn ([5983](https://github.com/longhorn/longhorn/issues/5983)) - @yangchiu @PhanLe1010
- [BUG] Extra snapshot generated when clone from a detached volume ([5986](https://github.com/longhorn/longhorn/issues/5986)) - @weizhe0422 @ejweber
- [BUG] User created snapshot deleted after node drain and uncordon ([5992](https://github.com/longhorn/longhorn/issues/5992)) - @yangchiu @mantissahz
- [BUG] Webhook PDBs are not removed after upgrading to master-head ([6026](https://github.com/longhorn/longhorn/issues/6026)) - @weizhe0422 @PhanLe1010
- [BUG] In some specific situation, system backup auto deleted when creating another one ([6045](https://github.com/longhorn/longhorn/issues/6045)) - @c3y1huang @chriscchien
- [BUG] Backing Image deletion stuck if it's deleted during uploading process and bids is ready-for-transfer state ([6086](https://github.com/longhorn/longhorn/issues/6086)) - @WebberHuang1118 @chriscchien
- [BUG] A backup target backed by a Samba server is not recognized ([6100](https://github.com/longhorn/longhorn/issues/6100)) - @derekbit @weizhe0422
- [BUG] Backing image manager fails when SELinux is enabled ([6108](https://github.com/longhorn/longhorn/issues/6108)) - @ejweber @chriscchien
- [BUG] Force-deleting a volume makes the SPDK disk unschedulable ([6110](https://github.com/longhorn/longhorn/issues/6110)) - @derekbit
- [BUG] share-manager terminated during Longhorn upgrading causes rwx volume not working ([6120](https://github.com/longhorn/longhorn/issues/6120)) - @yangchiu @derekbit
- [BUG] SPDK Volume snapshotList API Error ([6123](https://github.com/longhorn/longhorn/issues/6123)) - @derekbit @chriscchien
- [BUG] test_recurring_jobs_allow_detached_volume failed ([6124](https://github.com/longhorn/longhorn/issues/6124)) - @ChanYiLin @roger-ryao
- [BUG] Cron job triggered replica rebuilding keeps repeating itself after corrupting snapshot data ([6129](https://github.com/longhorn/longhorn/issues/6129)) - @yangchiu @mantissahz
- [BUG] test_dr_volume_with_restore_command_error failed ([6130](https://github.com/longhorn/longhorn/issues/6130)) - @mantissahz @roger-ryao
- [BUG] RWX volume remains attached after workload deleted if it's upgraded from v1.4.2 ([6139](https://github.com/longhorn/longhorn/issues/6139)) - @PhanLe1010 @chriscchien
- [BUG] timestamp or checksum not matched in test_snapshot_hash_detect_corruption test case ([6145](https://github.com/longhorn/longhorn/issues/6145)) - @yangchiu @derekbit
- [BUG] When a v2 volume is attached in maintenance mode, removing a replica will lead to volume stuck in attaching-detaching loop ([6166](https://github.com/longhorn/longhorn/issues/6166)) - @derekbit @chriscchien
- [BUG] Misleading offline rebuilding hint if offline rebuilding is not enabled ([6169](https://github.com/longhorn/longhorn/issues/6169)) - @smallteeths @roger-ryao
- [BUG] Longhorn doesn't remove the system backups crd on uninstallation ([6185](https://github.com/longhorn/longhorn/issues/6185)) - @c3y1huang @khushboo-rancher
- [BUG] Volume attachment related error logs in uninstaller pod ([6197](https://github.com/longhorn/longhorn/issues/6197)) - @yangchiu @PhanLe1010
- [BUG] Test case test_ha_backup_deletion_recovery failed in rhel or rockylinux arm64 environment ([6213](https://github.com/longhorn/longhorn/issues/6213)) - @yangchiu @ChanYiLin @mantissahz
- [BUG] migration test cases could fail due to unexpected volume controllers and replicas status ([6215](https://github.com/longhorn/longhorn/issues/6215)) - @yangchiu @PhanLe1010
- [BUG] Engine continues to attempt to rebuild replica while detaching ([6217](https://github.com/longhorn/longhorn/issues/6217)) - @yangchiu @ejweber
## Misc
- [TASK] Remove deprecated volume spec recurringJobs and storageClass recurringJobs field ([2865](https://github.com/longhorn/longhorn/issues/2865)) - @c3y1huang @chriscchien
- [TASK] Remove deprecated fields after CRD API version bump ([3289](https://github.com/longhorn/longhorn/issues/3289)) - @c3y1huang @roger-ryao
- [TASK] Replace jobq lib with an alternative way for listing remote backup volumes and info ([4176](https://github.com/longhorn/longhorn/issues/4176)) - @ChanYiLin @chriscchien
- [DOC] Update the Longhorn document in Uninstalling Longhorn using kubectl ([4841](https://github.com/longhorn/longhorn/issues/4841)) - @roger-ryao
- [TASK] Remove a deprecated feature `disable-replica-rebuild` from longhorn-manager ([4997](https://github.com/longhorn/longhorn/issues/4997)) - @ejweber @chriscchien
- [TASK] Update the distro matrix supports on Longhorn docs for 1.5 ([5177](https://github.com/longhorn/longhorn/issues/5177)) - @yangchiu
- [TASK] Clarify if any upcoming K8s API deprecation/removal will impact Longhorn 1.4 ([5180](https://github.com/longhorn/longhorn/issues/5180)) - @PhanLe1010
- [TASK] Revert affinity for Longhorn user deployed components ([5191](https://github.com/longhorn/longhorn/issues/5191)) - @weizhe0422 @ejweber
- [TASK] Add GitHub action for CI to lib repos for supporting dependency bot ([5239](https://github.com/longhorn/longhorn/issues/5239)) -
- [DOC] Update the readme of longhorn-spdk-engine about using new Longhorn (RAID1) bdev ([5256](https://github.com/longhorn/longhorn/issues/5256)) - @DamiaSan
- [TASK][UI] add new recurring job tasks ([5272](https://github.com/longhorn/longhorn/issues/5272)) - @smallteeths @chriscchien
- [DOC] Update the node maintenance doc to cover upgrade prerequisites for Rancher ([5278](https://github.com/longhorn/longhorn/issues/5278)) - @PhanLe1010
- [TASK] Run build-engine-test-images automatically when having incompatible engine on master ([5400](https://github.com/longhorn/longhorn/issues/5400)) - @yangchiu
- [TASK] Update k8s.gcr.io to registry.k8s.io in repos ([5432](https://github.com/longhorn/longhorn/issues/5432)) - @yangchiu
- [TASK][UI] add new recurring job task - filesystem trim ([5529](https://github.com/longhorn/longhorn/issues/5529)) - @smallteeths @chriscchien
- doc: update prerequisites in chart readme to make it consistent with documentation v1.3.x ([5531](https://github.com/longhorn/longhorn/pull/5531)) - @ChanYiLin
- [FEATURE] Remove deprecated `allow-node-drain-with-last-healthy-replica` ([5620](https://github.com/longhorn/longhorn/issues/5620)) - @weizhe0422 @PhanLe1010
- [FEATURE] Set recurring jobs to PVCs ([5791](https://github.com/longhorn/longhorn/issues/5791)) - @yangchiu @c3y1huang
- [TASK] Automatically update crds.yaml in longhorn repo from longhorn-manager repo ([5854](https://github.com/longhorn/longhorn/issues/5854)) - @yangchiu
- [IMPROVEMENT] Remove privilege requirement from lifecycle jobs ([5862](https://github.com/longhorn/longhorn/issues/5862)) - @mantissahz @chriscchien
- [TASK][UI] support new aio typed instance managers ([5876](https://github.com/longhorn/longhorn/issues/5876)) - @smallteeths @chriscchien
- [TASK] Remove `Guaranteed Engine Manager CPU`, `Guaranteed Replica Manager CPU`, and `Guaranteed Engine CPU` settings. ([5917](https://github.com/longhorn/longhorn/issues/5917)) - @c3y1huang @roger-ryao
- [TASK][UI] Support volume backup policy ([6028](https://github.com/longhorn/longhorn/issues/6028)) - @smallteeths @chriscchien
- [TASK] Reduce BackupConcurrentLimit and RestoreConcurrentLimit default values ([6135](https://github.com/longhorn/longhorn/issues/6135)) - @derekbit @chriscchien
## Contributors
- @ChanYiLin
- @DamiaSan
- @PhanLe1010
- @WebberHuang1118
- @achims311
- @c3y1huang
- @chriscchien
- @derekbit
- @ejweber
- @hedefalk
- @innobead
- @khushboo-rancher
- @mantissahz
- @roger-ryao
- @shuo-wu
- @smallteeths
- @weizhe0422
- @yangchiu

@ -0,0 +1,65 @@
## Release Note
### **v1.5.1 released!** 🎆
Longhorn v1.5.1 is the latest version of Longhorn 1.5.
This release includes the bug fixes listed below, covering 1.5.0 upgrade issues, stability, and troubleshooting. Please try it out and share your feedback. Thanks for all the contributions!
> For the definition of stable or latest release, please check [here](https://github.com/longhorn/longhorn#releases).
## Installation
> **Please ensure your Kubernetes cluster is at least v1.21 before installing v1.5.1.**
Longhorn supports three installation methods: Rancher App Marketplace, kubectl, and Helm. Follow the installation instructions [here](https://longhorn.io/docs/1.5.1/deploy/install/).
## Upgrade
> **Please read the [important notes](https://longhorn.io/docs/1.5.1/deploy/important-notes/) first and ensure your Kubernetes cluster is at least v1.21 before upgrading to Longhorn v1.5.1 from v1.4.x or v1.5.0, which are the only supported source versions.**
Follow the upgrade instructions [here](https://longhorn.io/docs/1.5.1/deploy/upgrade/).
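The two prerequisites stated above (Kubernetes at least v1.21, and an upgrade source of v1.4.x or v1.5.0) can be expressed as a small pre-flight check. The following is only an illustrative sketch and not part of Longhorn or its tooling; the function names `parse_version`, `k8s_ok`, and `upgrade_source_ok` are hypothetical, and the version strings are assumed to look like `v1.21.3` or `1.5.0`, optionally followed by a distribution suffix.

```python
import re

# Minimum Kubernetes (major, minor) stated in the release note.
MIN_K8S = (1, 21)


def parse_version(text):
    """Return (major, minor, patch) from strings such as 'v1.21.3' or '1.5.0'.

    Any suffix after the numeric part (for example '+k3s1') is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def k8s_ok(k8s_version):
    """True if the cluster version meets the v1.21 minimum."""
    return parse_version(k8s_version)[:2] >= MIN_K8S


def upgrade_source_ok(longhorn_version):
    """True if the installed Longhorn version is v1.4.x or exactly v1.5.0."""
    major, minor, patch = parse_version(longhorn_version)
    if (major, minor) == (1, 4):
        return True
    return (major, minor, patch) == (1, 5, 0)
```

A check like this would only gate on version numbers; the important notes linked above still need to be read before upgrading.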
## Deprecation & Incompatibilities
N/A
## Known Issues after Release
Please check [here](https://github.com/longhorn/longhorn/wiki/Outstanding-Known-Issues-of-Releases) for any outstanding issues found after this release.
## Improvement
- [IMPROVEMENT] Implement/fix the unit tests of Volume Attachment and volume controller ([6005](https://github.com/longhorn/longhorn/issues/6005)) - @PhanLe1010
- [QUESTION] Repetitive warnings and errors in a new longhorn setup ([6257](https://github.com/longhorn/longhorn/issues/6257)) - @derekbit @c3y1huang @roger-ryao
## Resilience
- [BUG] 1.5.0 Upgrade: Longhorn conversion webhook server fails ([6259](https://github.com/longhorn/longhorn/issues/6259)) - @derekbit @roger-ryao
- [BUG] Race leaves snapshot CRs that cannot be deleted ([6298](https://github.com/longhorn/longhorn/issues/6298)) - @yangchiu @PhanLe1010 @ejweber
## Bugs
- [BUG] Engine continues to attempt to rebuild replica while detaching ([6217](https://github.com/longhorn/longhorn/issues/6217)) - @yangchiu @ejweber
- [BUG] Upgrade to 1.5.0 failed: validator.longhorn.io denied the request if having orphan resources ([6246](https://github.com/longhorn/longhorn/issues/6246)) - @derekbit @roger-ryao
- [BUG] Unable to receive support bundle from UI when it's large (400MB+) ([6256](https://github.com/longhorn/longhorn/issues/6256)) - @c3y1huang @chriscchien
- [BUG] Longhorn Manager Pods CrashLoop after upgrade from 1.4.0 to 1.5.0 while backing up volumes ([6264](https://github.com/longhorn/longhorn/issues/6264)) - @ChanYiLin @roger-ryao
- [BUG] Can not delete type=`bi` VolumeSnapshot if related backing image not exist ([6266](https://github.com/longhorn/longhorn/issues/6266)) - @ChanYiLin @chriscchien
- [BUG] 1.5.0: AttachVolume.Attach failed for volume, the volume is currently attached to a different node ([6287](https://github.com/longhorn/longhorn/issues/6287)) - @yangchiu @derekbit
- [BUG] test case test_setting_priority_class failed in master and v1.5.x ([6319](https://github.com/longhorn/longhorn/issues/6319)) - @derekbit @chriscchien
- [BUG] Unused webhook and recovery backend deployment left in helm chart ([6252](https://github.com/longhorn/longhorn/issues/6252)) - @ChanYiLin @chriscchien
## Misc
- [DOC] v1.5.0 additional outgoing firewall ports need to be opened 9501 9502 9503 ([6317](https://github.com/longhorn/longhorn/issues/6317)) - @ChanYiLin @chriscchien
## Contributors
- @ChanYiLin
- @PhanLe1010
- @c3y1huang
- @chriscchien
- @derekbit
- @ejweber
- @innobead
- @roger-ryao
- @yangchiu

@ -3,5 +3,6 @@ The list of current Longhorn maintainers:
 Name, <Email>, @GitHubHandle
 Sheng Yang, <sheng@yasker.org>, @yasker
 Shuo Wu, <shuo.wu@suse.com>, @shuo-wu
-Joshua Moody, <joshua.moody@suse.com>, @joshimoo
 David Ko, <dko@suse.com>, @innobead
+Derek Su, <derek.su@suse.com>, @derekbit
+Phan Le, <phan.le@suse.com>, @PhanLe1010

README.md

@ -1,8 +1,20 @@
-# Longhorn
+<h1 align="center" style="border-bottom: none">
+<a href="https://longhorn.io/" target="_blank"><img alt="Longhorn" width="120px" src="https://github.com/longhorn/website/blob/master/static/img/icon-longhorn.svg"></a><br>Longhorn
+</h1>
-Longhorn is a distributed block storage system for Kubernetes. Longhorn is cloud native storage because it is built using Kubernetes and container primitives.
+<p align="center">A CNCF Incubating Project. Visit <a href="https://longhorn.io/" target="_blank">longhorn.io</a> for the full documentation.</p>
-Longhorn is lightweight, reliable, and powerful. You can install Longhorn on an existing Kubernetes cluster with one `kubectl apply` command or using Helm charts. Once Longhorn is installed, it adds persistent volume support to the Kubernetes cluster.
+<div align="center">
+[![Releases](https://img.shields.io/github/release/longhorn/longhorn/all.svg)](https://github.com/longhorn/longhorn/releases)
+[![GitHub](https://img.shields.io/github/license/longhorn/longhorn)](https://github.com/longhorn/longhorn/blob/master/LICENSE)
+[![Docs](https://img.shields.io/badge/docs-latest-green.svg)](https://longhorn.io/docs/latest/)
+</div>
+Longhorn is a distributed block storage system for Kubernetes. Longhorn is cloud-native storage built using Kubernetes and container primitives.
+Longhorn is lightweight, reliable, and powerful. You can install Longhorn on an existing Kubernetes cluster with one `kubectl apply` command or by using Helm charts. Once Longhorn is installed, it adds persistent volume support to the Kubernetes cluster.
 Longhorn implements distributed block storage using containers and microservices. Longhorn creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. The storage controller and replicas are themselves orchestrated using Kubernetes. Here are some notable features of Longhorn:
@ -15,40 +27,37 @@ Longhorn implements distributed block storage using containers and microservices
 You can read more technical details of Longhorn [here](https://longhorn.io/).
-## Current Status
+# Releases
-The latest release of Longhorn is [![Releases](https://img.shields.io/github/release/longhorn/longhorn/all.svg)](https://github.com/longhorn/longhorn/releases)
+> **NOTE**:
> - __\<version\>*__ means the release branch is under active support and will have periodic follow-up patch releases.
> - __Latest__ release means the version is the latest release of the newest release branch.
> - __Stable__ release means the version is stable and has been widely adopted by users.
https://github.com/longhorn/longhorn/releases
| Release | Version | Type | Release Note (Changelog) | Important Note |
|-----------|---------|----------------|----------------------------------------------------------------|-------------------------------------------------------------|
| **1.5*** | 1.5.1 | Latest | [🔗](https://github.com/longhorn/longhorn/releases/tag/v1.5.1) | [🔗](https://longhorn.io/docs/1.5.1/deploy/important-notes) |
| **1.4*** | 1.4.4 | Stable | [🔗](https://github.com/longhorn/longhorn/releases/tag/v1.4.4) | [🔗](https://longhorn.io/docs/1.4.4/deploy/important-notes) |
| 1.3 | 1.3.3 | Stable | [🔗](https://github.com/longhorn/longhorn/releases/tag/v1.3.3) | [🔗](https://longhorn.io/docs/1.3.3/deploy/important-notes) |
| 1.2 | 1.2.6 | Stable | [🔗](https://github.com/longhorn/longhorn/releases/tag/v1.2.6) | [🔗](https://longhorn.io/docs/1.2.6/deploy/important-notes) |
| 1.1 | 1.1.3 | Stable | [🔗](https://github.com/longhorn/longhorn/releases/tag/v1.1.3) | |
# Roadmap
https://github.com/longhorn/longhorn/wiki/Roadmap
# Components
Longhorn is 100% open source software. Project source code is spread across a number of repos:
## Build Status
* Engine: [![Build Status](https://drone-publish.longhorn.io/api/badges/longhorn/longhorn-engine/status.svg)](https://drone-publish.longhorn.io/longhorn/longhorn-engine)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-engine)](https://goreportcard.com/report/github.com/longhorn/longhorn-engine)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-engine.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-engine?ref=badge_shield) * Engine: [![Build Status](https://drone-publish.longhorn.io/api/badges/longhorn/longhorn-engine/status.svg)](https://drone-publish.longhorn.io/longhorn/longhorn-engine)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-engine)](https://goreportcard.com/report/github.com/longhorn/longhorn-engine)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-engine.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-engine?ref=badge_shield)
* Manager: [![Build Status](https://drone-publish.longhorn.io/api/badges/longhorn/longhorn-manager/status.svg)](https://drone-publish.longhorn.io/longhorn/longhorn-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-manager)](https://goreportcard.com/report/github.com/longhorn/longhorn-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-manager?ref=badge_shield) * Manager: [![Build Status](https://drone-publish.longhorn.io/api/badges/longhorn/longhorn-manager/status.svg)](https://drone-publish.longhorn.io/longhorn/longhorn-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-manager)](https://goreportcard.com/report/github.com/longhorn/longhorn-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-manager?ref=badge_shield)
* Instance Manager: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/longhorn-instance-manager/status.svg)](http://drone-publish.longhorn.io/longhorn/longhorn-instance-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-instance-manager)](https://goreportcard.com/report/github.com/longhorn/longhorn-instance-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-instance-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-instance-manager?ref=badge_shield) * Instance Manager: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/longhorn-instance-manager/status.svg)](http://drone-publish.longhorn.io/longhorn/longhorn-instance-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-instance-manager)](https://goreportcard.com/report/github.com/longhorn/longhorn-instance-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-instance-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-instance-manager?ref=badge_shield)
* Share Manager: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/longhorn-share-manager/status.svg)](http://drone-publish.longhorn.io/longhorn/longhorn-share-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-share-manager)](https://goreportcard.com/report/github.com/longhorn/longhorn-share-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-share-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-share-manager?ref=badge_shield) * Share Manager: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/longhorn-share-manager/status.svg)](http://drone-publish.longhorn.io/longhorn/longhorn-share-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/longhorn-share-manager)](https://goreportcard.com/report/github.com/longhorn/longhorn-share-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-share-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-share-manager?ref=badge_shield)
* Backing Image Manager: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/backing-image-manager/status.svg)](http://drone-publish.longhorn.io/longhorn/backing-image-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/backing-image-manager)](https://goreportcard.com/report/github.com/longhorn/backing-image-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Fbacking-image-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Fbacking-image-manager?ref=badge_shield) * Backing Image Manager: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/backing-image-manager/status.svg)](http://drone-publish.longhorn.io/longhorn/backing-image-manager)[![Go Report Card](https://goreportcard.com/badge/github.com/longhorn/backing-image-manager)](https://goreportcard.com/report/github.com/longhorn/backing-image-manager)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Fbacking-image-manager.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Fbacking-image-manager?ref=badge_shield)
* UI: [![Build Status](https://drone-publish.longhorn.io/api/badges/longhorn/longhorn-ui/status.svg)](https://drone-publish.longhorn.io/longhorn/longhorn-ui)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-ui.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-ui?ref=badge_shield) * UI: [![Build Status](https://drone-publish.longhorn.io/api/badges/longhorn/longhorn-ui/status.svg)](https://drone-publish.longhorn.io/longhorn/longhorn-ui)[![FOSSA Status](https://app.fossa.com/api/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-ui.svg?type=shield)](https://app.fossa.com/projects/custom%2B25850%2Fgithub.com%2Flonghorn%2Flonghorn-ui?ref=badge_shield)
* Test: [![Build Status](http://drone-publish.longhorn.io/api/badges/longhorn/longhorn-tests/status.svg)](http://drone-publish.longhorn.io/longhorn/longhorn-tests)
## Release Status
| Release | Version | Type |
|---------|---------|--------|
| 1.3 | 1.3.2 | Stable |
| 1.2 | 1.2.6 | Stable |
| 1.1 | 1.1.3 | Stable |
## Get Involved
### Community Meeting and Office Hours
Hosted by the core maintainers of Longhorn: 4th Friday of every month at 09:00 (CET) or 16:00 (CST) at https://community.cncf.io/longhorn-community/.
### Longhorn Mailing List
Stay up to date on the latest news and events: https://lists.cncf.io/g/cncf-longhorn
You can read more about the community and its events here: https://github.com/longhorn/community
## Source code
Longhorn is 100% open source software. Project source code is spread across a number of repos:
| Component | What it does | GitHub repo | | Component | What it does | GitHub repo |
| :----------------------------- | :--------------------------------------------------------------------- | :------------------------------------------------------------------------------------------ | | :----------------------------- | :--------------------------------------------------------------------- | :------------------------------------------------------------------------------------------ |
@ -61,18 +70,21 @@ Longhorn is 100% open source software. Project source code is spread across a nu
![Longhorn UI](./longhorn-ui.png) ![Longhorn UI](./longhorn-ui.png)
# Get Started
## Requirements ## Requirements
For the installation requirements, refer to the [Longhorn documentation.](https://longhorn.io/docs/latest/deploy/install/#installation-requirements) For the installation requirements, refer to the [Longhorn documentation.](https://longhorn.io/docs/latest/deploy/install/#installation-requirements)
## Installation ## Installation
> **NOTE**: Please note that the master branch is for the upcoming feature release development. > **NOTE**:
> Please note that the master branch is for the upcoming feature release development.
> For an official release installation or upgrade, please use one of the methods below.
Longhorn can be installed on a Kubernetes cluster in several ways: Longhorn can be installed on a Kubernetes cluster in several ways:
- [Rancher catalog app](https://longhorn.io/docs/latest/deploy/install/install-with-rancher/) - [Rancher App Marketplace](https://longhorn.io/docs/latest/deploy/install/install-with-rancher/)
- [kubectl](https://longhorn.io/docs/latest/deploy/install/install-with-kubectl/) - [kubectl](https://longhorn.io/docs/latest/deploy/install/install-with-kubectl/)
- [Helm](https://longhorn.io/docs/latest/deploy/install/install-with-helm/) - [Helm](https://longhorn.io/docs/latest/deploy/install/install-with-helm/)
@ -80,6 +92,24 @@ Longhorn can be installed on a Kubernetes cluster in several ways:
The official Longhorn documentation is [here.](https://longhorn.io/docs) The official Longhorn documentation is [here.](https://longhorn.io/docs)
# Get Involved
## Discussion, Feedback
If you have any discussion topics or feedback, feel free to [start a discussion](https://github.com/longhorn/longhorn/discussions).
## Features Request, Bug Reporting
If you run into any issues, feel free to [file an issue](https://github.com/longhorn/longhorn/issues/new/choose).
We have a weekly community issue review meeting to review all reported issues or enhancement requests.
When filing a bug report, please attach the support bundle to the issue or send it to
[longhorn-support-bundle](mailto:longhorn-support-bundle@suse.com).
## Report Vulnerabilities
If you find any vulnerabilities, please report them to [longhorn-security](mailto:longhorn-security@suse.com).
# Community # Community
Longhorn is open source software, so contributions are greatly welcome. Longhorn is open source software, so contributions are greatly welcome.
@ -91,25 +121,17 @@ If you have any feedbacks, feel free to [file an issue](https://github.com/longh
For discussions, feedback, requests, issues, or security reports, please use the channels below. For discussions, feedback, requests, issues, or security reports, please use the channels below.
We also have a [CNCF Slack channel: longhorn](https://cloud-native.slack.com/messages/longhorn) for discussion. We also have a [CNCF Slack channel: longhorn](https://cloud-native.slack.com/messages/longhorn) for discussion.
## Discussions or Feedbacks ## Community Meeting and Office Hours
Hosted by the core maintainers of Longhorn: 4th Friday of every month at 09:00 (CET) or 16:00 (CST) at https://community.cncf.io/longhorn-community/.
If you have any discussion topics or feedback, feel free to [start a discussion](https://github.com/longhorn/longhorn/discussions). ## Longhorn Mailing List
Stay up to date on the latest news and events: https://lists.cncf.io/g/cncf-longhorn
## Requests or Issues You can read more about the community and its events here: https://github.com/longhorn/community
If you run into any issues, feel free to [file an issue](https://github.com/longhorn/longhorn/issues/new/choose).
We have a weekly community issue review meeting to review all reported issues or enhancement requests.
When filing a bug report, please attach the support bundle to the issue or send it to
[longhorn-support-bundle](mailto:longhorn-support-bundle@suse.com).
## Report Vulnerabilities
If you find any vulnerabilities, please report them to [longhorn-security](mailto:longhorn-security@suse.com).
# License # License
Copyright (c) 2014-2021 The Longhorn Authors Copyright (c) 2014-2022 The Longhorn Authors
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at


@ -1,7 +1,7 @@
apiVersion: v1 apiVersion: v1
name: longhorn name: longhorn
version: 1.4.0-dev version: 1.6.0-dev
appVersion: v1.4.0-dev appVersion: v1.6.0-dev
kubeVersion: ">=1.21.0-0" kubeVersion: ">=1.21.0-0"
description: Longhorn is a distributed block storage system for Kubernetes. description: Longhorn is a distributed block storage system for Kubernetes.
keywords: keywords:


@ -18,10 +18,24 @@ Longhorn is 100% open source software. Project source code is spread across a nu
## Prerequisites ## Prerequisites
1. A container runtime compatible with Kubernetes (Docker v1.13+, containerd v1.3.7+, etc.) 1. A container runtime compatible with Kubernetes (Docker v1.13+, containerd v1.3.7+, etc.)
2. Kubernetes v1.18+ 2. Kubernetes >= v1.21
3. Make sure `bash`, `curl`, `findmnt`, `grep`, `awk` and `blkid` are installed on all nodes of the Kubernetes cluster. 3. Make sure `bash`, `curl`, `findmnt`, `grep`, `awk` and `blkid` are installed on all nodes of the Kubernetes cluster.
4. Make sure `open-iscsi` is installed and the `iscsid` daemon is running on all nodes of the Kubernetes cluster. For GKE, Ubuntu is recommended as the guest OS image since it already contains `open-iscsi`. 4. Make sure `open-iscsi` is installed and the `iscsid` daemon is running on all nodes of the Kubernetes cluster. For GKE, Ubuntu is recommended as the guest OS image since it already contains `open-iscsi`.
## Upgrading to Kubernetes v1.25+
Starting in Kubernetes v1.25, [Pod Security Policies](https://kubernetes.io/docs/concepts/security/pod-security-policy/) have been removed from the Kubernetes API.
As a result, **before upgrading to Kubernetes v1.25** (or on a fresh install in a Kubernetes v1.25+ cluster), users are expected to perform an in-place upgrade of this chart with `enablePSP` set to `false` if it has been previously set to `true`.
> **Note:**
> If you upgrade your cluster to Kubernetes v1.25+ before removing PSPs via a `helm upgrade` (even if you manually clean up resources), **it will leave the Helm release in a broken state within the cluster such that further Helm operations will not work (`helm uninstall`, `helm upgrade`, etc.).**
>
> If your charts get stuck in this state, you may have to clean up your Helm release secrets.
Upon setting `enablePSP` to false, the chart will remove any PSP resources deployed on its behalf from the cluster. This is the default setting for this chart.
As a replacement for PSPs, [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) should be used. Please consult the Longhorn docs for more details on how to configure your chart release namespaces to work with the new Pod Security Admission and apply Pod Security Standards.
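As a minimal sketch of the PSP removal step described above, the override can be kept in a values file and applied with `helm upgrade longhorn longhorn/longhorn --namespace longhorn-system -f values-override.yaml` while the cluster is still below Kubernetes v1.25. The file name `values-override.yaml` is an assumption for illustration; the release name, chart name, and namespace follow the install command in this README.

```yaml
# values-override.yaml (hypothetical file name)
# Remove the PodSecurityPolicy resources shipped by the chart
# before upgrading the cluster to Kubernetes v1.25+.
enablePSP: false
```

Since `false` is the chart default, this only matters for releases that were previously installed with `enablePSP` set to `true`.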
## Installation ## Installation
1. Add Longhorn chart repository. 1. Add Longhorn chart repository.
``` ```
@ -49,14 +63,264 @@ helm install longhorn longhorn/longhorn --namespace longhorn-system
To uninstall Longhorn with Helm 2: To uninstall Longhorn with Helm 2:
``` ```
kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag
helm delete longhorn --purge helm delete longhorn --purge
``` ```
To uninstall Longhorn with Helm 3: To uninstall Longhorn with Helm 3:
``` ```
kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag
helm uninstall longhorn -n longhorn-system helm uninstall longhorn -n longhorn-system
kubectl delete namespace longhorn-system kubectl delete namespace longhorn-system
``` ```
## Values
The `values.yaml` contains items used to tweak a deployment of this chart.
### Cattle Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| global.cattle.systemDefaultRegistry | string | `""` | System default registry |
| global.cattle.windowsCluster.defaultSetting.systemManagedComponentsNodeSelector | string | `"kubernetes.io/os:linux"` | Node selector for Longhorn system managed components |
| global.cattle.windowsCluster.defaultSetting.taintToleration | string | `"cattle.io/os=linux:NoSchedule"` | Toleration for Longhorn system managed components |
| global.cattle.windowsCluster.enabled | bool | `false` | Enable this to allow Longhorn to run on the Rancher deployed Windows cluster |
| global.cattle.windowsCluster.nodeSelector | object | `{"kubernetes.io/os":"linux"}` | Select Linux nodes to run Longhorn user deployed components |
| global.cattle.windowsCluster.tolerations | list | `[{"effect":"NoSchedule","key":"cattle.io/os","operator":"Equal","value":"linux"}]` | Tolerate Linux nodes to run Longhorn user deployed components |
### Network Policies
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| networkPolicies.enabled | bool | `false` | Enable NetworkPolicies to limit access to the Longhorn pods |
| networkPolicies.type | string | `"k3s"` | Create the policy based on your distribution to allow access for the ingress. Options: `k3s`, `rke2`, `rke1` |
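The two keys above might be combined as follows for a cluster running RKE2. This is only a sketch; choose the `type` that matches your distribution.

```yaml
# Example values override: enable NetworkPolicies for an RKE2 cluster.
networkPolicies:
  enabled: true
  type: "rke2"   # one of: k3s, rke2, rke1
```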
### Image Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| image.csi.attacher.repository | string | `"longhornio/csi-attacher"` | Specify CSI attacher image repository. Leave blank to autodetect |
| image.csi.attacher.tag | string | `"v4.2.0"` | Specify CSI attacher image tag. Leave blank to autodetect |
| image.csi.livenessProbe.repository | string | `"longhornio/livenessprobe"` | Specify CSI liveness probe image repository. Leave blank to autodetect |
| image.csi.livenessProbe.tag | string | `"v2.9.0"` | Specify CSI liveness probe image tag. Leave blank to autodetect |
| image.csi.nodeDriverRegistrar.repository | string | `"longhornio/csi-node-driver-registrar"` | Specify CSI node driver registrar image repository. Leave blank to autodetect |
| image.csi.nodeDriverRegistrar.tag | string | `"v2.7.0"` | Specify CSI node driver registrar image tag. Leave blank to autodetect |
| image.csi.provisioner.repository | string | `"longhornio/csi-provisioner"` | Specify CSI provisioner image repository. Leave blank to autodetect |
| image.csi.provisioner.tag | string | `"v3.4.1"` | Specify CSI provisioner image tag. Leave blank to autodetect |
| image.csi.resizer.repository | string | `"longhornio/csi-resizer"` | Specify CSI driver resizer image repository. Leave blank to autodetect |
| image.csi.resizer.tag | string | `"v1.7.0"` | Specify CSI driver resizer image tag. Leave blank to autodetect |
| image.csi.snapshotter.repository | string | `"longhornio/csi-snapshotter"` | Specify CSI driver snapshotter image repository. Leave blank to autodetect |
| image.csi.snapshotter.tag | string | `"v6.2.1"` | Specify CSI driver snapshotter image tag. Leave blank to autodetect. |
| image.longhorn.backingImageManager.repository | string | `"longhornio/backing-image-manager"` | Specify Longhorn backing image manager image repository |
| image.longhorn.backingImageManager.tag | string | `"master-head"` | Specify Longhorn backing image manager image tag |
| image.longhorn.engine.repository | string | `"longhornio/longhorn-engine"` | Specify Longhorn engine image repository |
| image.longhorn.engine.tag | string | `"master-head"` | Specify Longhorn engine image tag |
| image.longhorn.instanceManager.repository | string | `"longhornio/longhorn-instance-manager"` | Specify Longhorn instance manager image repository |
| image.longhorn.instanceManager.tag | string | `"master-head"` | Specify Longhorn instance manager image tag |
| image.longhorn.manager.repository | string | `"longhornio/longhorn-manager"` | Specify Longhorn manager image repository |
| image.longhorn.manager.tag | string | `"master-head"` | Specify Longhorn manager image tag |
| image.longhorn.shareManager.repository | string | `"longhornio/longhorn-share-manager"` | Specify Longhorn share manager image repository |
| image.longhorn.shareManager.tag | string | `"master-head"` | Specify Longhorn share manager image tag |
| image.longhorn.supportBundleKit.repository | string | `"longhornio/support-bundle-kit"` | Specify Longhorn support bundle manager image repository |
| image.longhorn.supportBundleKit.tag | string | `"v0.0.27"` | Specify Longhorn support bundle manager image tag |
| image.longhorn.ui.repository | string | `"longhornio/longhorn-ui"` | Specify Longhorn ui image repository |
| image.longhorn.ui.tag | string | `"master-head"` | Specify Longhorn ui image tag |
| image.openshift.oauthProxy.repository | string | `"quay.io/openshift/origin-oauth-proxy"` | For openshift user. Specify oauth proxy image repository |
| image.openshift.oauthProxy.tag | float | `4.13` | For openshift user. Specify oauth proxy image tag. Note: Use your OCP/OKD 4.X Version, Current Stable is 4.13 |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy which applies to all user deployed Longhorn Components. e.g, Longhorn manager, Longhorn driver, Longhorn UI |
### Service Settings
| Key | Description |
|-----|-------------|
| service.manager.nodePort | NodePort port number (to set explicitly, choose port between 30000-32767) |
| service.manager.type | Define Longhorn manager service type. |
| service.ui.nodePort | NodePort port number (to set explicitly, choose port between 30000-32767) |
| service.ui.type | Define Longhorn UI service type. Options: `ClusterIP`, `NodePort`, `LoadBalancer`, `Rancher-Proxy` |
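For example, exposing the UI through a fixed NodePort could look like the sketch below. The port number `30080` is an arbitrary illustrative value inside the 30000-32767 range mentioned above.

```yaml
# Example values override: expose the Longhorn UI on a fixed NodePort.
service:
  ui:
    type: NodePort
    nodePort: 30080   # illustrative value; must be within 30000-32767
```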
### StorageClass Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| persistence.backingImage.dataSourceParameters | string | `nil` | Specify the data source parameters for the backing image used in Longhorn StorageClass. This option accepts a json string of a map. e.g., `'{\"url\":\"https://backing-image-example.s3-region.amazonaws.com/test-backing-image\"}'`. |
| persistence.backingImage.dataSourceType | string | `nil` | Specify the data source type for the backing image used in Longhorn StorageClass. If the backing image does not exist, Longhorn will use this field to create a backing image. Otherwise, Longhorn will use it to verify the selected backing image. |
| persistence.backingImage.enable | bool | `false` | Set backing image for Longhorn StorageClass |
| persistence.backingImage.expectedChecksum | string | `nil` | Specify the expected SHA512 checksum of the selected backing image in Longhorn StorageClass |
| persistence.backingImage.name | string | `nil` | Specify a backing image that will be used by Longhorn volumes in Longhorn StorageClass. If it does not exist, the backing image data source type and backing image data source parameters should be specified so that Longhorn will create the backing image before using it |
| persistence.defaultClass | bool | `true` | Set Longhorn StorageClass as default |
| persistence.defaultClassReplicaCount | int | `3` | Set replica count for Longhorn StorageClass |
| persistence.defaultDataLocality | string | `"disabled"` | Set data locality for Longhorn StorageClass. Options: `disabled`, `best-effort` |
| persistence.defaultFsType | string | `"ext4"` | Set filesystem type for Longhorn StorageClass |
| persistence.defaultMkfsParams | string | `""` | Set mkfs options for Longhorn StorageClass |
| persistence.defaultNodeSelector.enable | bool | `false` | Enable Node selector for Longhorn StorageClass |
| persistence.defaultNodeSelector.selector | string | `""` | This selector enables only certain nodes having these tags to be used for the volume. e.g. `"storage,fast"` |
| persistence.migratable | bool | `false` | Set volume migratable for Longhorn StorageClass |
| persistence.reclaimPolicy | string | `"Delete"` | Define reclaim policy. Options: `Retain`, `Delete` |
| persistence.recurringJobSelector.enable | bool | `false` | Enable recurring job selector for Longhorn StorageClass |
| persistence.recurringJobSelector.jobList | list | `[]` | Recurring job selector list for Longhorn StorageClass. Please be careful of quotes of input. e.g., `[{"name":"backup", "isGroup":true}]` |
| persistence.removeSnapshotsDuringFilesystemTrim | string | `"ignored"` | Allow automatically removing snapshots during filesystem trim for Longhorn StorageClass. Options: `ignored`, `enabled`, `disabled` |
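A hedged example of overriding a few of the StorageClass defaults listed above is shown here. The chosen values are illustrative only, and every key comes from the table.

```yaml
# Example values override for the Longhorn StorageClass.
persistence:
  defaultClass: true                 # keep Longhorn as the default StorageClass
  defaultClassReplicaCount: 2        # chart default is 3
  defaultDataLocality: "best-effort" # chart default is "disabled"
  defaultFsType: "ext4"
  reclaimPolicy: "Retain"            # chart default is "Delete"
```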
### CSI Settings
| Key | Description |
|-----|-------------|
| csi.attacherReplicaCount | Specify replica count of CSI Attacher. Leave blank to use default count: 3 |
| csi.kubeletRootDir | Specify kubelet root-dir. Leave blank to autodetect |
| csi.provisionerReplicaCount | Specify replica count of CSI Provisioner. Leave blank to use default count: 3 |
| csi.resizerReplicaCount | Specify replica count of CSI Resizer. Leave blank to use default count: 3 |
| csi.snapshotterReplicaCount | Specify replica count of CSI Snapshotter. Leave blank to use default count: 3 |
### Longhorn Manager Settings
Longhorn system contains user deployed components (e.g, Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g, instance manager, engine image, CSI driver, etc.).
These settings only apply to Longhorn manager component.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| longhornManager.log.format | string | `"plain"` | Options: `plain`, `json` |
| longhornManager.nodeSelector | object | `{}` | Select nodes to run Longhorn manager |
| longhornManager.priorityClass | string | `nil` | Priority class for longhorn manager |
| longhornManager.serviceAnnotations | object | `{}` | Annotation used in Longhorn manager service |
| longhornManager.tolerations | list | `[]` | Tolerate nodes to run Longhorn manager |
### Longhorn Driver Settings
Longhorn system contains user deployed components (e.g, Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g, instance manager, engine image, CSI driver, etc.).
These settings only apply to Longhorn driver component.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| longhornDriver.nodeSelector | object | `{}` | Select nodes to run Longhorn driver |
| longhornDriver.priorityClass | string | `nil` | Priority class for longhorn driver |
| longhornDriver.tolerations | list | `[]` | Tolerate nodes to run Longhorn driver |
### Longhorn UI Settings
Longhorn system contains user deployed components (e.g, Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g, instance manager, engine image, CSI driver, etc.).
These settings only apply to Longhorn UI component.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| longhornUI.nodeSelector | object | `{}` | Select nodes to run Longhorn UI |
| longhornUI.priorityClass | string | `nil` | Priority class for Longhorn UI |
| longhornUI.replicas | int | `2` | Replica count for longhorn ui |
| longhornUI.tolerations | list | `[]` | Tolerate nodes to run Longhorn UI |
### Ingress Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| ingress.annotations | string | `nil` | Ingress annotations done as key:value pairs |
| ingress.enabled | bool | `false` | Set to true to enable ingress record generation |
| ingress.host | string | `"sslip.io"` | Layer 7 Load Balancer hostname |
| ingress.ingressClassName | string | `nil` | Add ingressClassName to the Ingress. Can replace the kubernetes.io/ingress.class annotation on v1.18+ |
| ingress.path | string | `"/"` | If ingress is enabled, you can set the default ingress path. The UI is then accessible at the full path {{host}}+{{path}} |
| ingress.secrets | string | `nil` | If you're providing your own certificates, please use this to add the certificates as secrets |
| ingress.secureBackends | bool | `false` | Enable this so that the backend service is connected on port 443 |
| ingress.tls | bool | `false` | Set this to true in order to enable TLS on the ingress record |
| ingress.tlsSecret | string | `"longhorn.local-tls"` | If TLS is set to true, you must declare what secret will store the key/certificate for TLS |
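As a sketch, a TLS-enabled ingress for the UI could be configured as follows. The host name and ingress class are placeholders (assumptions), and the secret named by `tlsSecret` is assumed to exist already or to be supplied through `ingress.secrets`.

```yaml
# Example values override: expose the Longhorn UI through an Ingress with TLS.
ingress:
  enabled: true
  ingressClassName: nginx        # hypothetical ingress class
  host: longhorn.example.com     # placeholder host name
  path: /
  tls: true
  tlsSecret: longhorn.local-tls  # chart default secret name
```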
### Private Registry Settings
Longhorn can be installed in an air gapped environment with private registry settings. Please refer to **Air Gap Installation** in our official site [link](https://longhorn.io/docs)
| Key | Description |
|-----|-------------|
| privateRegistry.createSecret | Set `true` to create a new private registry secret |
| privateRegistry.registryPasswd | Password used to authenticate to private registry |
| privateRegistry.registrySecret | If `privateRegistry.createSecret` is `true`, create a Kubernetes secret with this name; otherwise use the existing secret of this name. Use it to pull images from your private registry |
| privateRegistry.registryUrl | URL of private registry. Leave blank to apply system default registry |
| privateRegistry.registryUser | User used to authenticate to private registry |
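The private registry keys above might be combined as in the following sketch. The registry URL, user name, password, and secret name are all placeholders, not values taken from the Longhorn documentation.

```yaml
# Example values override: pull images from a private registry.
privateRegistry:
  createSecret: true
  registryUrl: registry.example.com          # placeholder
  registryUser: longhorn-puller              # placeholder
  registryPasswd: change-me                  # placeholder; avoid committing real credentials
  registrySecret: longhorn-registry-secret   # name of the secret to create
```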
### OS/Kubernetes Distro Settings
#### OpenShift Settings
Please also refer to this document [ocp-readme](https://github.com/longhorn/longhorn/blob/master/chart/ocp-readme.md) for more details
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| openshift.enabled | bool | `false` | Enable when using OpenShift |
| openshift.ui.port | int | `443` | UI port in OpenShift environment |
| openshift.ui.proxy | int | `8443` | UI proxy in OpenShift environment |
| openshift.ui.route | string | `"longhorn-ui"` | UI route in OpenShift environment |
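A possible override for an OpenShift cluster is sketched below, using only keys from the tables in this README. Quoting the OAuth proxy tag is a suggestion rather than something the chart documents: an unquoted YAML value such as `4.10` is parsed as the float `4.1`, so a quoted string is the safer form, assuming the chart templates accept a string here.

```yaml
# Example values override for OpenShift (OCP/OKD).
openshift:
  enabled: true
  ui:
    route: "longhorn-ui"
    port: 443
    proxy: 8443
image:
  openshift:
    oauthProxy:
      repository: quay.io/openshift/origin-oauth-proxy
      tag: "4.13"   # quoted so versions like 4.10 are not read as the float 4.1
```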
### Other Settings
| Key | Default | Description |
|-----|---------|-------------|
| annotations | `{}` | Annotations to add to the Longhorn Manager DaemonSet Pods. Optional. |
| enablePSP | `false` | For Kubernetes < v1.25, if your cluster enables Pod Security Policy admission controller, set this to `true` to ship longhorn-psp which allow privileged Longhorn pods to start |
### System Default Settings
For system default settings, you can first leave blank to use default values which will be applied when installing Longhorn.
You can then change them through UI after installation.
For more details like types or options, you can refer to **Settings Reference** in our official site [link](https://longhorn.io/docs)
| Key | Description |
|-----|-------------|
| defaultSettings.allowEmptyDiskSelectorVolume | Allow Scheduling Empty Disk Selector Volumes To Any Disk |
| defaultSettings.allowEmptyNodeSelectorVolume | Allow Scheduling Empty Node Selector Volumes To Any Node |
| defaultSettings.allowRecurringJobWhileVolumeDetached | If this setting is enabled, Longhorn automatically attaches the volume and takes a snapshot/backup when a recurring snapshot/backup is due. |
| defaultSettings.allowVolumeCreationWithDegradedAvailability | This setting allows users to create and attach a volume that doesn't have all the replicas scheduled at the time of creation. |
| defaultSettings.autoCleanupSystemGeneratedSnapshot | This setting enables Longhorn to automatically clean up the system generated snapshot after replica rebuild is done. |
| defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly | If enabled, Longhorn will automatically delete the workload pod that is managed by a controller (e.g. deployment, statefulset, daemonset, etc...) when Longhorn volume is detached unexpectedly (e.g. during Kubernetes upgrade, Docker reboot, or network disconnect). By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount. |
| defaultSettings.autoSalvage | If enabled, volumes will be automatically salvaged when all the replicas become faulty e.g. due to network disconnection. Longhorn will try to figure out which replica(s) are usable, then use them for the volume. By default true. |
| defaultSettings.backingImageCleanupWaitInterval | This interval in minutes determines how long Longhorn will wait before cleaning up the backing image file when there is no replica in the disk using it. |
| defaultSettings.backingImageRecoveryWaitInterval | This interval in seconds determines how long Longhorn will wait before re-downloading the backing image file when all disk files of this backing image become failed or unknown. |
| defaultSettings.backupCompressionMethod | This setting allows users to specify backup compression method. |
| defaultSettings.backupConcurrentLimit | This setting controls how many worker threads run concurrently per backup. |
| defaultSettings.backupTarget | The endpoint used to access the backupstore. Available: NFS, CIFS, AWS, GCP, AZURE. |
| defaultSettings.backupTargetCredentialSecret | The name of the Kubernetes secret associated with the backup target. |
| defaultSettings.backupstorePollInterval | In seconds. The backupstore poll interval determines how often Longhorn checks the backupstore for new backups. Set to 0 to disable the polling. By default 300. |
| defaultSettings.concurrentAutomaticEngineUpgradePerNodeLimit | This setting controls how Longhorn automatically upgrades volumes' engines to the new default engine image after upgrading Longhorn manager. The value of this setting specifies the maximum number of engines per node that are allowed to upgrade to the default engine image at the same time. If the value is 0, Longhorn will not automatically upgrade volumes' engines to default version. |
| defaultSettings.concurrentReplicaRebuildPerNodeLimit | This setting controls how many replicas on a node can be rebuilt simultaneously. |
| defaultSettings.concurrentVolumeBackupRestorePerNodeLimit | This setting controls how many volumes on a node can restore the backup concurrently. Set the value to **0** to disable backup restore. |
| defaultSettings.createDefaultDiskLabeledNodes | Create default Disk automatically only on Nodes with the label "node.longhorn.io/create-default-disk=true" if no other disks exist. If disabled, the default disk will be created on all new nodes when each node is first added. |
| defaultSettings.defaultDataLocality | Longhorn volume has data locality if there is a local replica of the volume on the same node as the pod which is using the volume. |
| defaultSettings.defaultDataPath | Default path to use for storing data on a host. By default "/var/lib/longhorn/" |
| defaultSettings.defaultLonghornStaticStorageClass | The 'storageClassName' is given to PVs and PVCs that are created for an existing Longhorn volume. The StorageClass name can also be used as a label, so it is possible to use a Longhorn StorageClass to bind a workload to an existing PV without creating a Kubernetes StorageClass object. By default 'longhorn-static'. |
| defaultSettings.defaultReplicaCount | The default number of replicas when a volume is created from the Longhorn UI. For Kubernetes configuration, update the `numberOfReplicas` in the StorageClass. By default 3. |
| defaultSettings.deletingConfirmationFlag | This flag is designed to prevent Longhorn from being accidentally uninstalled, which would lead to data loss. |
| defaultSettings.disableRevisionCounter | This setting is only for volumes created by the UI. By default, this is false, meaning there will be a revision counter file to track every write to the volume. During salvage recovery, Longhorn will pick the replica with the largest revision counter as the candidate to recover the whole volume. If the revision counter is disabled, Longhorn will not track every write to the volume. During salvage recovery, Longhorn will use the last modification time and file size of the 'volume-head-xxx.img' file to pick the replica candidate to recover the whole volume. |
| defaultSettings.disableSchedulingOnCordonedNode | Disable Longhorn manager to schedule replica on Kubernetes cordoned node. By default true. |
| defaultSettings.engineReplicaTimeout | In seconds. The setting specifies the timeout between the engine and replica(s), and the value should be between 8 to 30 seconds. The default value is 8 seconds. |
| defaultSettings.failedBackupTTL | In minutes. This setting determines how long Longhorn will keep the backup resource that was failed. Set to 0 to disable the auto-deletion. |
| defaultSettings.fastReplicaRebuildEnabled | This feature supports the fast replica rebuilding. It relies on the checksum of snapshot disk files, so setting the snapshot-data-integrity to **enable** or **fast-check** is a prerequisite. |
| defaultSettings.guaranteedInstanceManagerCPU | This integer value indicates what percentage of the total allocatable CPU on each node will be reserved for each instance manager Pod. You can leave it at the default value, which is 12%. |
| defaultSettings.kubernetesClusterAutoscalerEnabled | Enabling this setting will notify Longhorn that the cluster is using Kubernetes Cluster Autoscaler. |
| defaultSettings.logLevel | The log level Panic, Fatal, Error, Warn, Info, Debug, Trace used in longhorn manager. Default to Info. |
| defaultSettings.nodeDownPodDeletionPolicy | Defines the Longhorn action when a Volume is stuck with a StatefulSet/Deployment Pod on a node that is down. |
| defaultSettings.nodeDrainPolicy | Define the policy to use when a node with the last healthy replica of a volume is drained. |
| defaultSettings.offlineReplicaRebuilding | This setting allows users to enable the offline replica rebuilding for volumes using v2 data engine. |
| defaultSettings.orphanAutoDeletion | This setting allows Longhorn to delete the orphan resource and its corresponding orphaned data automatically like stale replicas. Orphan resources on down or unknown nodes will not be cleaned up automatically. |
| defaultSettings.priorityClass | priorityClass for longhorn system components |
| defaultSettings.recurringFailedJobsHistoryLimit | This setting specifies how many failed backup or snapshot job histories should be retained. History will not be retained if the value is 0. |
| defaultSettings.recurringSuccessfulJobsHistoryLimit | This setting specifies how many successful backup or snapshot job histories should be retained. History will not be retained if the value is 0. |
| defaultSettings.removeSnapshotsDuringFilesystemTrim | This setting allows Longhorn filesystem trim feature to automatically mark the latest snapshot and its ancestors as removed and stops at the snapshot containing multiple children. |
| defaultSettings.replicaAutoBalance | Enabling this setting automatically rebalances replicas when an available node is discovered. |
| defaultSettings.replicaDiskSoftAntiAffinity | Allow scheduling on disks with existing healthy replicas of the same volume. By default true. |
| defaultSettings.replicaFileSyncHttpClientTimeout | In seconds. The setting specifies the HTTP client timeout to the file sync server. |
| defaultSettings.replicaReplenishmentWaitInterval | In seconds. The interval determines how long Longhorn will wait at least in order to reuse the existing data on a failed replica rather than directly creating a new replica for a degraded volume. |
| defaultSettings.replicaSoftAntiAffinity | Allow scheduling on nodes with existing healthy replicas of the same volume. By default false. |
| defaultSettings.replicaZoneSoftAntiAffinity | Allow scheduling new Replicas of Volume to the Nodes in the same Zone as existing healthy Replicas. Nodes that don't belong to any Zone will be treated as being in the same Zone. Notice that Longhorn relies on label `topology.kubernetes.io/zone=<Zone name of the node>` in the Kubernetes node object to identify the zone. By default true. |
| defaultSettings.restoreConcurrentLimit | This setting controls how many worker threads run concurrently per restore. |
| defaultSettings.restoreVolumeRecurringJobs | Restore recurring jobs from the backup volume on the backup target and create recurring jobs if not exist during a backup restoration. |
| defaultSettings.snapshotDataIntegrity | This setting allows users to enable or disable snapshot hashing and data integrity checking. |
| defaultSettings.snapshotDataIntegrityCronjob | Unix-cron string format. The setting specifies when Longhorn checks the data integrity of snapshot disk files. |
| defaultSettings.snapshotDataIntegrityImmediateCheckAfterSnapshotCreation | Hashing snapshot disk files impacts the performance of the system. The immediate snapshot hashing and checking can be disabled to minimize the impact after creating a snapshot. |
| defaultSettings.storageMinimalAvailablePercentage | If the minimum available disk capacity exceeds the actual percentage of available disk capacity, the disk becomes unschedulable until more space is freed up. By default 25. |
| defaultSettings.storageNetwork | Longhorn uses the storage network for in-cluster data traffic. Leave this blank to use the Kubernetes cluster network. |
| defaultSettings.storageOverProvisioningPercentage | The over-provisioning percentage defines how much storage can be allocated relative to the hard drive's capacity. By default 200. |
| defaultSettings.storageReservedPercentageForDefaultDisk | The reserved percentage specifies the percentage of disk space that will not be allocated to the default disk on each new Longhorn node. |
| defaultSettings.supportBundleFailedHistoryLimit | This setting specifies how many failed support bundles can exist in the cluster. Set this value to **0** to have Longhorn automatically purge all failed support bundles. |
| defaultSettings.systemManagedComponentsNodeSelector | nodeSelector for longhorn system components |
| defaultSettings.systemManagedPodsImagePullPolicy | This setting defines the Image Pull Policy of Longhorn system managed pod. e.g. instance manager, engine image, CSI driver, etc. The new Image Pull Policy will only apply after the system managed pods restart. |
| defaultSettings.taintToleration | taintToleration for longhorn system components |
| defaultSettings.upgradeChecker | Upgrade Checker will check for new Longhorn version periodically. When there is a new version available, a notification will appear in the UI. By default true. |
| defaultSettings.v2DataEngine | This allows users to activate v2 data engine based on SPDK. Currently, it is in the preview phase and should not be utilized in a production environment. |
---
Please see [link](https://github.com/longhorn/longhorn) for more information.

253
chart/README.md.gotmpl Normal file

@ -0,0 +1,253 @@
# Longhorn Chart
> **Important**: Please install the Longhorn chart in the `longhorn-system` namespace only.
> **Warning**: Longhorn doesn't support downgrading from a higher version to a lower version.
## Source Code
Longhorn is 100% open source software. Project source code is spread across a number of repos:
1. Longhorn Engine -- Core controller/replica logic https://github.com/longhorn/longhorn-engine
2. Longhorn Instance Manager -- Controller/replica instance lifecycle management https://github.com/longhorn/longhorn-instance-manager
3. Longhorn Share Manager -- NFS provisioner that exposes Longhorn volumes as ReadWriteMany volumes. https://github.com/longhorn/longhorn-share-manager
4. Backing Image Manager -- Backing image file lifecycle management. https://github.com/longhorn/backing-image-manager
5. Longhorn Manager -- Longhorn orchestration, includes CSI driver for Kubernetes https://github.com/longhorn/longhorn-manager
6. Longhorn UI -- Dashboard https://github.com/longhorn/longhorn-ui
## Prerequisites
1. A container runtime compatible with Kubernetes (Docker v1.13+, containerd v1.3.7+, etc.)
2. Kubernetes >= v1.21
3. Make sure `bash`, `curl`, `findmnt`, `grep`, `awk` and `blkid` have been installed on all nodes of the Kubernetes cluster.
4. Make sure `open-iscsi` has been installed, and the `iscsid` daemon is running on all nodes of the Kubernetes cluster. For GKE, Ubuntu is recommended as the guest OS image since it already contains `open-iscsi`.
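The tool check in item 3 can be scripted. The following is a minimal sketch, assuming it is run directly on each node; `missing_tools` is a hypothetical helper name and not part of Longhorn. It only checks that the commands are on `PATH` and does not verify `open-iscsi` or the `iscsid` daemon.

```bash
# Print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools named in the prerequisites above.
missing="$(missing_tools bash curl findmnt grep awk blkid)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools are installed."
fi
```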
## Upgrading to Kubernetes v1.25+
Starting in Kubernetes v1.25, [Pod Security Policies](https://kubernetes.io/docs/concepts/security/pod-security-policy/) have been removed from the Kubernetes API.
As a result, **before upgrading to Kubernetes v1.25** (or on a fresh install in a Kubernetes v1.25+ cluster), users are expected to perform an in-place upgrade of this chart with `enablePSP` set to `false` if it has been previously set to `true`.
> **Note:**
> If you upgrade your cluster to Kubernetes v1.25+ before removing PSPs via a `helm upgrade` (even if you manually clean up resources), **it will leave the Helm release in a broken state within the cluster such that further Helm operations will not work (`helm uninstall`, `helm upgrade`, etc.).**
>
> If your charts get stuck in this state, you may have to clean up your Helm release secrets.
Upon setting `enablePSP` to false, the chart will remove any PSP resources deployed on its behalf from the cluster. This is the default setting for this chart.
As a replacement for PSPs, [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) should be used. Please consult the Longhorn docs for more details on how to configure your chart release namespaces to work with the new Pod Security Admission and apply Pod Security Standards.
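The ordering described above (disable PSPs in the chart first, upgrade the cluster second) can be sketched as a small pre-upgrade check. This is an illustration only: `psp_removed` and `TARGET_VERSION` are hypothetical names, and the script prints the `helm upgrade` command instead of running it. The release name `longhorn` and chart `longhorn/longhorn` are taken from the Installation section; if you installed with a custom values file, pass it to the upgrade as well.

```bash
# Succeed (exit 0) when the given Kubernetes version is v1.25 or newer,
# i.e. a version where the PodSecurityPolicy API no longer exists.
psp_removed() {
  version="${1#v}"               # "v1.25.3" -> "1.25.3"
  major="${version%%.*}"         # "1"
  rest="${version#*.}"           # "25.3"
  minor="${rest%%.*}"            # "25"
  minor="${minor%%[!0-9]*}"      # tolerate suffixes such as "25+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 25 ]; }
}

TARGET_VERSION="v1.25.0"   # assumption: the version you plan to upgrade to

if psp_removed "$TARGET_VERSION"; then
  echo "Run this before upgrading the cluster:"
  echo "  helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --set enablePSP=false"
fi
```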
## Installation
1. Add Longhorn chart repository.
```
helm repo add longhorn https://charts.longhorn.io
```
2. Update local Longhorn chart information from chart repository.
```
helm repo update
```
3. Install Longhorn chart.
- With Helm 2, the following command will create the `longhorn-system` namespace and install the Longhorn chart together.
```
helm install longhorn/longhorn --name longhorn --namespace longhorn-system
```
- With Helm 3, the following commands will create the `longhorn-system` namespace first, then install the Longhorn chart.
```
kubectl create namespace longhorn-system
helm install longhorn longhorn/longhorn --namespace longhorn-system
```
## Uninstallation
To uninstall Longhorn with Helm 2:
```
kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag
helm delete longhorn --purge
```
To uninstall Longhorn with Helm 3:
```
kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag
helm uninstall longhorn -n longhorn-system
kubectl delete namespace longhorn-system
```
## Values
The `values.yaml` contains items used to tweak a deployment of this chart.
### Cattle Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "global" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Network Policies
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "networkPolicies" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Image Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "image" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Service Settings
| Key | Description |
|-----|-------------|
{{- range .Values }}
{{- if (and (hasPrefix "service" .Key) (not (contains "Account" .Key))) }}
| {{ .Key }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### StorageClass Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "persistence" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### CSI Settings
| Key | Description |
|-----|-------------|
{{- range .Values }}
{{- if hasPrefix "csi" .Key }}
| {{ .Key }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Longhorn Manager Settings
Longhorn system contains user deployed components (e.g, Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g, instance manager, engine image, CSI driver, etc.).
These settings only apply to Longhorn manager component.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "longhornManager" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Longhorn Driver Settings
Longhorn system contains user deployed components (e.g, Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g, instance manager, engine image, CSI driver, etc.).
These settings only apply to Longhorn driver component.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "longhornDriver" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Longhorn UI Settings
Longhorn system contains user deployed components (e.g, Longhorn manager, Longhorn driver, Longhorn UI) and system managed components (e.g, instance manager, engine image, CSI driver, etc.).
These settings only apply to Longhorn UI component.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "longhornUI" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Ingress Settings
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "ingress" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Private Registry Settings
Longhorn can be installed in an air-gapped environment with private registry settings. Please refer to **Air Gap Installation** on our official site: [link](https://longhorn.io/docs)
| Key | Description |
|-----|-------------|
{{- range .Values }}
{{- if hasPrefix "privateRegistry" .Key }}
| {{ .Key }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### OS/Kubernetes Distro Settings
#### OpenShift Settings
Please also refer to this document for more details: [ocp-readme](https://github.com/longhorn/longhorn/blob/master/chart/ocp-readme.md)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
{{- range .Values }}
{{- if hasPrefix "openshift" .Key }}
| {{ .Key }} | {{ .Type }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### Other Settings
| Key | Default | Description |
|-----|---------|-------------|
{{- range .Values }}
{{- if not (or (hasPrefix "defaultSettings" .Key)
(hasPrefix "networkPolicies" .Key)
(hasPrefix "image" .Key)
(hasPrefix "service" .Key)
(hasPrefix "persistence" .Key)
(hasPrefix "csi" .Key)
(hasPrefix "longhornManager" .Key)
(hasPrefix "longhornDriver" .Key)
(hasPrefix "longhornUI" .Key)
(hasPrefix "privateRegistry" .Key)
(hasPrefix "ingress" .Key)
(hasPrefix "openshift" .Key)
(hasPrefix "global" .Key)) }}
| {{ .Key }} | {{ if .Default }}{{ .Default }}{{ else }}{{ .AutoDefault }}{{ end }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
### System Default Settings
For system default settings, you can leave the values blank at first; the defaults are applied when Longhorn is installed.
You can then change them through the UI after installation.
For details such as types and options, refer to the **Settings Reference** on our official site: [link](https://longhorn.io/docs)
| Key | Description |
|-----|-------------|
{{- range .Values }}
{{- if hasPrefix "defaultSettings" .Key }}
| {{ .Key }} | {{ if .Description }}{{ .Description }}{{ else }}{{ .AutoDescription }}{{ end }} |
{{- end }}
{{- end }}
---
Please see [link](https://github.com/longhorn/longhorn) for more information.

177
chart/ocp-readme.md Normal file

@ -0,0 +1,177 @@
# OpenShift / OKD Extra Configuration Steps
- [OpenShift / OKD Extra Configuration Steps](#openshift--okd-extra-configuration-steps)
  - [Notes](#notes)
  - [Known Issues](#known-issues)
  - [Preparing Nodes (Optional)](#preparing-nodes-optional)
    - [Default /var/lib/longhorn setup](#default-varliblonghorn-setup)
    - [Separate /var/mnt/longhorn setup](#separate-varmntlonghorn-setup)
      - [Create Filesystem](#create-filesystem)
      - [Mounting Disk On Boot](#mounting-disk-on-boot)
      - [Label and Annotate Nodes](#label-and-annotate-nodes)
  - [Example values.yaml](#example-valuesyaml)
  - [Installation](#installation)
  - [Refs](#refs)
## Notes
Main changes and tasks for OCP are:
- On OCP / OKD, the Operating System is Managed by the Cluster
- OCP Imposes [Security Context Constraints](https://docs.openshift.com/container-platform/4.11/authentication/managing-security-context-constraints.html)
  - This requires everything to run with the least privilege possible. For the moment, every component has been given access to run with higher privileges.
  - Something to circle back on is network policies and which components can have their privileges reduced without impacting functionality.
    - The UI, for example, probably can.
- openshift/oauth-proxy for authentication to the Longhorn UI
  - **⚠️** Currently Scoped to Authenticated Users that can delete a longhorn settings object.
    - **⚠️** Since the UI itself is not protected, network policies will need to be created to prevent namespace <--> namespace communication against the pod or service object directly.
  - Anyone with access to the UI Deployment can remove the route restriction. (Namespace Scoped Admin)
- Option to use separate disk in /var/mnt/longhorn & MachineConfig file to mount /var/mnt/longhorn
- Adding finalizers for mount propagation
## Known Issues
- General Feature/Issue Thread
  - [[FEATURE] Deploying Longhorn on OKD/Openshift](https://github.com/longhorn/longhorn/issues/1831)
- 4.10 / 1.23:
  - 4.10.0-0.okd-2022-03-07-131213 to 4.10.0-0.okd-2022-07-09-073606
    - Tested, No Known Issues
- 4.11 / 1.24:
  - 4.11.0-0.okd-2022-07-27-052000 to 4.11.0-0.okd-2022-11-19-050030
    - Tested, No Known Issues
  - 4.11.0-0.okd-2022-12-02-145640, 4.11.0-0.okd-2023-01-14-152430:
    - Workaround: [[BUG] Volumes Stuck in Attach/Detach Loop](https://github.com/longhorn/longhorn/issues/4988)
      - [MachineConfig Patch](https://github.com/longhorn/longhorn/issues/4988#issuecomment-1345676772)
- 4.12 / 1.25:
  - 4.12.0-0.okd-2022-12-05-210624 to 4.12.0-0.okd-2023-01-20-101927
    - Tested, No Known Issues
  - 4.12.0-0.okd-2023-01-21-055900 to 4.12.0-0.okd-2023-02-18-033438:
    - Workaround: [[BUG] Volumes Stuck in Attach/Detach Loop](https://github.com/longhorn/longhorn/issues/4988)
      - [MachineConfig Patch](https://github.com/longhorn/longhorn/issues/4988#issuecomment-1345676772)
  - 4.12.0-0.okd-2023-03-05-022504 to 4.12.0-0.okd-2023-04-16-041331:
    - Tested, No Known Issues
- 4.13 / 1.26:
  - 4.13.0-0.okd-2023-05-03-001308 to 4.13.0-0.okd-2023-08-18-135805:
    - Tested, No Known Issues
- 4.14 / 1.27:
  - 4.14.0-0.okd-2023-08-12-022330 to 4.14.0-0.okd-2023-10-28-073550:
    - Tested, No Known Issues
## Preparing Nodes (Optional)
Only needed if you require additional customizations, such as storage-less nodes or secondary disks.
### Default /var/lib/longhorn setup
Label each node for storage with:
```bash
oc get nodes --no-headers | awk '{print $1}'
export NODE="worker-0"
oc label node "${NODE}" node.longhorn.io/create-default-disk=true
```
### Separate /var/mnt/longhorn setup
#### Create Filesystem
On the storage nodes create a filesystem with the label longhorn:
```bash
oc get nodes --no-headers | awk '{print $1}'
export NODE="worker-0"
oc debug node/${NODE} -t -- chroot /host bash
# Validate Target Drive is Present
lsblk
export DRIVE="sdb" #vdb
sudo mkfs.ext4 -L longhorn /dev/${DRIVE}
```
> ⚠️ Note: If you add new nodes after the MachineConfig below is applied, you will also need to reboot those nodes.
#### Mounting Disk On Boot
The secondary drive needs to be mounted on every boot. Save the contents below to a file and apply the MachineConfig with `oc apply -f`:
> ⚠️ This will trigger a machine config profile update and reboot all worker nodes on the cluster
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 71-mount-storage-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: var-mnt-longhorn.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            Where=/var/mnt/longhorn
            What=/dev/disk/by-label/longhorn
            Options=rw,relatime,discard
            [Install]
            WantedBy=local-fs.target
```
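systemd requires a mount unit's name to correspond to its `Where=` path, which is why the unit for `/var/mnt/longhorn` is named `var-mnt-longhorn.mount`. If you mount the disk somewhere else, the unit name has to change with it. The sketch below derives the name for simple paths only; `mount_unit_name` is a hypothetical helper, and paths containing dashes or other special characters need proper escaping (on a systemd host, `systemd-escape -p --suffix=mount <path>` handles those).

```bash
# Derive the systemd mount unit name for a simple absolute path.
# Assumption: path components contain no dashes or special characters.
mount_unit_name() {
  path="${1#/}"     # drop the leading slash
  path="${path%/}"  # drop a trailing slash, if any
  printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

mount_unit_name /var/mnt/longhorn   # prints var-mnt-longhorn.mount
```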
#### Label and Annotate Nodes
Label and annotate storage nodes like this:
```bash
oc get nodes --no-headers | awk '{print $1}'
export NODE="worker-0"
oc annotate node ${NODE} --overwrite node.longhorn.io/default-disks-config='[{"path":"/var/mnt/longhorn","allowScheduling":true}]'
oc label node ${NODE} node.longhorn.io/create-default-disk=config
```
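With several storage nodes, the same two commands have to be repeated for each one. A minimal sketch that prints them for review instead of running them; `print_node_commands` and the node name `worker-1` are hypothetical, and `DISK_PATH` is assumed to match the `Where=` path of the mount unit.

```bash
# Path of the mounted Longhorn disk on each storage node.
DISK_PATH="/var/mnt/longhorn"

# Print the annotate and label commands for each node name given.
print_node_commands() {
  config="[{\"path\":\"${DISK_PATH}\",\"allowScheduling\":true}]"
  for node in "$@"; do
    echo "oc annotate node ${node} --overwrite node.longhorn.io/default-disks-config='${config}'"
    echo "oc label node ${node} node.longhorn.io/create-default-disk=config"
  done
}

print_node_commands worker-0 worker-1
```

Once the printed commands look right, they can be run by piping the output to `sh`.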
## Example values.yaml
Minimum Adjustments Required
```yaml
image:
  openshift:
    oauthProxy:
      repository: quay.io/openshift/origin-oauth-proxy
      tag: 4.14 # Use Your OCP/OKD 4.X Version, Current Stable is 4.14
# defaultSettings: # Preparing nodes (Optional)
#   createDefaultDiskLabeledNodes: true
openshift:
  enabled: true
  ui:
    route: "longhorn-ui"
    port: 443
    proxy: 8443
```
## Installation
```bash
# helm template ./chart/ --namespace longhorn-system --values ./chart/values.yaml --no-hooks > longhorn.yaml # Local Testing
helm template longhorn --namespace longhorn-system --values values.yaml --no-hooks > longhorn.yaml
oc create namespace longhorn-system -o yaml --dry-run=client | oc apply -f -
oc apply -f longhorn.yaml -n longhorn-system
```
## Refs
- <https://docs.openshift.com/container-platform/4.11/storage/persistent_storage/persistent-storage-iscsi.html>
- <https://docs.okd.io/4.11/storage/persistent_storage/persistent-storage-iscsi.html>
- okd 4.5: <https://github.com/longhorn/longhorn/issues/1831#issuecomment-702690613>
- okd 4.6: <https://github.com/longhorn/longhorn/issues/1831#issuecomment-765884631>
- oauth-proxy: <https://github.com/openshift/oauth-proxy/blob/master/contrib/sidecar.yaml>
- <https://github.com/longhorn/longhorn/issues/1831>


@ -82,6 +82,18 @@ questions:
type: string type: string
label: Longhorn Backing Image Manager Image Tag label: Longhorn Backing Image Manager Image Tag
group: "Longhorn Images Settings" group: "Longhorn Images Settings"
- variable: image.longhorn.supportBundleKit.repository
default: longhornio/support-bundle-kit
description: "Specify Longhorn Support Bundle Manager Image Repository"
type: string
label: Longhorn Support Bundle Kit Image Repository
group: "Longhorn Images Settings"
- variable: image.longhorn.supportBundleKit.tag
default: v0.0.27
description: "Specify Longhorn Support Bundle Manager Image Tag"
type: string
label: Longhorn Support Bundle Kit Image Tag
group: "Longhorn Images Settings"
- variable: image.csi.attacher.repository - variable: image.csi.attacher.repository
default: longhornio/csi-attacher default: longhornio/csi-attacher
description: "Specify CSI attacher image repository. Leave blank to autodetect." description: "Specify CSI attacher image repository. Leave blank to autodetect."
@ -89,7 +101,7 @@ questions:
label: Longhorn CSI Attacher Image Repository label: Longhorn CSI Attacher Image Repository
group: "Longhorn CSI Driver Images" group: "Longhorn CSI Driver Images"
- variable: image.csi.attacher.tag - variable: image.csi.attacher.tag
default: v3.4.0 default: v4.2.0
description: "Specify CSI attacher image tag. Leave blank to autodetect." description: "Specify CSI attacher image tag. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Attacher Image Tag label: Longhorn CSI Attacher Image Tag
@ -101,7 +113,7 @@ questions:
label: Longhorn CSI Provisioner Image Repository label: Longhorn CSI Provisioner Image Repository
group: "Longhorn CSI Driver Images" group: "Longhorn CSI Driver Images"
- variable: image.csi.provisioner.tag - variable: image.csi.provisioner.tag
default: v2.1.2 default: v3.4.1
description: "Specify CSI provisioner image tag. Leave blank to autodetect." description: "Specify CSI provisioner image tag. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Provisioner Image Tag label: Longhorn CSI Provisioner Image Tag
@ -113,7 +125,7 @@ questions:
label: Longhorn CSI Node Driver Registrar Image Repository label: Longhorn CSI Node Driver Registrar Image Repository
group: "Longhorn CSI Driver Images" group: "Longhorn CSI Driver Images"
- variable: image.csi.nodeDriverRegistrar.tag - variable: image.csi.nodeDriverRegistrar.tag
default: v2.5.0 default: v2.7.0
description: "Specify CSI Node Driver Registrar image tag. Leave blank to autodetect." description: "Specify CSI Node Driver Registrar image tag. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Node Driver Registrar Image Tag label: Longhorn CSI Node Driver Registrar Image Tag
@ -125,7 +137,7 @@ questions:
label: Longhorn CSI Driver Resizer Image Repository label: Longhorn CSI Driver Resizer Image Repository
group: "Longhorn CSI Driver Images" group: "Longhorn CSI Driver Images"
- variable: image.csi.resizer.tag - variable: image.csi.resizer.tag
default: v1.3.0 default: v1.7.0
description: "Specify CSI Driver Resizer image tag. Leave blank to autodetect." description: "Specify CSI Driver Resizer image tag. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Driver Resizer Image Tag label: Longhorn CSI Driver Resizer Image Tag
@ -137,7 +149,7 @@ questions:
label: Longhorn CSI Driver Snapshotter Image Repository label: Longhorn CSI Driver Snapshotter Image Repository
group: "Longhorn CSI Driver Images" group: "Longhorn CSI Driver Images"
- variable: image.csi.snapshotter.tag - variable: image.csi.snapshotter.tag
default: v5.0.1 default: v6.2.1
description: "Specify CSI Driver Snapshotter image tag. Leave blank to autodetect." description: "Specify CSI Driver Snapshotter image tag. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Driver Snapshotter Image Tag label: Longhorn CSI Driver Snapshotter Image Tag
@ -147,9 +159,9 @@ questions:
description: "Specify CSI liveness probe image repository. Leave blank to autodetect." description: "Specify CSI liveness probe image repository. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Liveness Probe Image Repository label: Longhorn CSI Liveness Probe Image Repository
group: "Longhorn CSI Liveness Probe Images" group: "Longhorn CSI Driver Images"
- variable: image.csi.livenessProbe.tag - variable: image.csi.livenessProbe.tag
default: v2.8.0 default: v2.9.0
description: "Specify CSI liveness probe image tag. Leave blank to autodetect." description: "Specify CSI liveness probe image tag. Leave blank to autodetect."
type: string type: string
label: Longhorn CSI Liveness Probe Image Tag label: Longhorn CSI Liveness Probe Image Tag
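For readers who override these defaults outside the Rancher UI, the new-side tags in the hunks above could be pinned in a Helm values file. This is a minimal sketch, assuming each `variable:` path in `questions.yaml` maps one-to-one onto a key in the chart's `values.yaml`; the file name `values-override.yaml` is hypothetical.

```yaml
# values-override.yaml (hypothetical file name)
# Pins the CSI sidecar image tags to the new-side defaults shown in this diff.
image:
  csi:
    nodeDriverRegistrar:
      tag: v2.7.0
    resizer:
      tag: v1.7.0
    snapshotter:
      tag: v6.2.1
    livenessProbe:
      tag: v2.9.0
```

Per the descriptions above, leaving a tag blank lets it be autodetected instead.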
@@ -232,7 +244,7 @@ questions:
group: "Longhorn CSI Driver Settings" group: "Longhorn CSI Driver Settings"
- variable: defaultSettings.backupTarget - variable: defaultSettings.backupTarget
label: Backup Target label: Backup Target
description: "The endpoint used to access the backupstore. NFS and S3 are supported." description: "The endpoint used to access the backupstore. Available: NFS, CIFS, AWS, GCP, AZURE"
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: string type: string
default: default:
@@ -244,8 +256,7 @@ questions:
default: default:
- variable: defaultSettings.allowRecurringJobWhileVolumeDetached - variable: defaultSettings.allowRecurringJobWhileVolumeDetached
label: Allow Recurring Job While Volume Is Detached label: Allow Recurring Job While Volume Is Detached
description: 'If this setting is enabled, Longhorn will automatically attaches the volume and takes snapshot/backup when it is the time to do recurring snapshot/backup. description: 'If this setting is enabled, Longhorn will automatically attaches the volume and takes snapshot/backup when it is the time to do recurring snapshot/backup.'
Note that the volume is not ready for workload during the period when the volume was automatically attached. Workload will have to wait until the recurring job finishes.'
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: "false" default: "false"
@@ -263,11 +274,7 @@ Note that the volume is not ready for workload during the period when the volume
default: "/var/lib/longhorn/" default: "/var/lib/longhorn/"
- variable: defaultSettings.defaultDataLocality - variable: defaultSettings.defaultDataLocality
label: Default Data Locality label: Default Data Locality
description: 'We say a Longhorn volume has data locality if there is a local replica of the volume on the same node as the pod which is using the volume. description: 'Longhorn volume has data locality if there is a local replica of the volume on the same node as the pod which is using the volume.'
This setting specifies the default data locality when a volume is created from the Longhorn UI. For Kubernetes configuration, update the `dataLocality` in the StorageClass
The available modes are:
- **disabled**. This is the default option. There may or may not be a replica on the same node as the attached volume (workload)
- **best-effort**. This option instructs Longhorn to try to keep a replica on the same node as the attached volume (workload). Longhorn will not stop the volume, even if it cannot keep a replica local to the attached volume (workload) due to environment limitation, e.g. not enough disk space, incompatible disk tags, etc.'
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: enum type: enum
options: options:
@@ -282,17 +289,7 @@ The available modes are:
default: "false" default: "false"
- variable: defaultSettings.replicaAutoBalance - variable: defaultSettings.replicaAutoBalance
label: Replica Auto Balance label: Replica Auto Balance
description: 'Enable this setting automatically rebalances replicas when discovered an available node. description: 'Enable this setting automatically rebalances replicas when discovered an available node.'
The available global options are:
- **disabled**. This is the default option. No replica auto-balance will be done.
- **least-effort**. This option instructs Longhorn to balance replicas for minimal redundancy.
- **best-effort**. This option instructs Longhorn to balance replicas for even redundancy.
Longhorn also support individual volume setting. The setting can be specified in volume.spec.replicaAutoBalance, this overrules the global setting.
The available volume spec options are:
- **ignored**. This is the default option that instructs Longhorn to inherit from the global setting.
- **disabled**. This option instructs Longhorn no replica auto-balance should be done.
- **least-effort**. This option instructs Longhorn to balance replicas for minimal redundancy.
- **best-effort**. This option instructs Longhorn to balance replicas for even redundancy.'
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: enum type: enum
options: options:
@@ -315,6 +312,14 @@ The available volume spec options are:
min: 0 min: 0
max: 100 max: 100
default: 25 default: 25
- variable: defaultSettings.storageReservedPercentageForDefaultDisk
label: Storage Reserved Percentage For Default Disk
description: "The reserved percentage specifies the percentage of disk space that will not be allocated to the default disk on each new Longhorn node."
group: "Longhorn Default Settings"
type: int
min: 0
max: 100
default: 30
- variable: defaultSettings.upgradeChecker - variable: defaultSettings.upgradeChecker
label: Enable Upgrade Checker label: Enable Upgrade Checker
description: 'Upgrade Checker will check for new Longhorn version periodically. When there is a new version available, a notification will appear in the UI. By default true.' description: 'Upgrade Checker will check for new Longhorn version periodically. When there is a new version available, a notification will appear in the UI. By default true.'
@@ -344,28 +349,20 @@ The available volume spec options are:
default: 300 default: 300
- variable: defaultSettings.failedBackupTTL - variable: defaultSettings.failedBackupTTL
label: Failed Backup Time to Live label: Failed Backup Time to Live
description: "In minutes. This setting determines how long Longhorn will keep the backup resource that was failed. Set to 0 to disable the auto-deletion. description: "In minutes. This setting determines how long Longhorn will keep the backup resource that was failed. Set to 0 to disable the auto-deletion."
Failed backups will be checked and cleaned up during backupstore polling which is controlled by **Backupstore Poll Interval** setting.
Hence this value determines the minimal wait interval of the cleanup. And the actual cleanup interval is multiple of **Backupstore Poll Interval**.
Disabling **Backupstore Poll Interval** also means to disable failed backup auto-deletion."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
default: 1440 default: 1440
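The old-side description trimmed in this hunk explained that failed backups are cleaned up during backupstore polling, so the real cleanup delay is a multiple of the poll interval. The sketch below works through the defaults. It assumes the `default: 300` just above this block belongs to a setting named `defaultSettings.backupstorePollInterval`, measured in seconds; that variable name lies outside this hunk and is an assumption.

```yaml
defaultSettings:
  # Assumed name and unit: poll the backupstore every 300 seconds (5 minutes).
  backupstorePollInterval: 300
  # In minutes. 1440 min = 86400 s = 288 poll intervals, so a failed backup
  # is removed on the first poll at or after the 24-hour mark.
  # Setting this to 0 disables auto-deletion.
  failedBackupTTL: 1440
```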
- variable: defaultSettings.restoreVolumeRecurringJobs - variable: defaultSettings.restoreVolumeRecurringJobs
label: Restore Volume Recurring Jobs label: Restore Volume Recurring Jobs
description: "Restore recurring jobs from the backup volume on the backup target and create recurring jobs if not exist during a backup restoration. description: "Restore recurring jobs from the backup volume on the backup target and create recurring jobs if not exist during a backup restoration."
Longhorn also supports individual volume setting. The setting can be specified on Backup page when making a backup restoration, this overrules the global setting. group: "Longhorn Default Settings"
The available volume setting options are: type: boolean
- **ignored**. This is the default option that instructs Longhorn to inherit from the global setting. default: "false"
- **enabled**. This option instructs Longhorn to restore recurring jobs/groups from the backup target forcibly.
- **disabled**. This option instructs Longhorn no restoring recurring jobs/groups should be done."
group: "Longhorn Default Settings"
type: boolean
default: "false"
- variable: defaultSettings.recurringSuccessfulJobsHistoryLimit - variable: defaultSettings.recurringSuccessfulJobsHistoryLimit
label: Cronjob Successful Jobs History Limit label: Cronjob Successful Jobs History Limit
description: "This setting specifies how many successful backup or snapshot job histories should be retained. History will not be retained if the value is 0.", description: "This setting specifies how many successful backup or snapshot job histories should be retained. History will not be retained if the value is 0."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
@@ -379,9 +376,7 @@ The available volume setting options are:
default: 1 default: 1
- variable: defaultSettings.supportBundleFailedHistoryLimit - variable: defaultSettings.supportBundleFailedHistoryLimit
label: SupportBundle Failed History Limit label: SupportBundle Failed History Limit
description: This setting specifies how many failed support bundles can exist in the cluster. description: "This setting specifies how many failed support bundles can exist in the cluster. Set this value to **0** to have Longhorn automatically purge all failed support bundles."
The retained failed support bundle is for analysis purposes and needs to clean up manually.
Set this value to **0** to have Longhorn automatically purge all failed support bundles.
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
@@ -394,9 +389,7 @@ Set this value to **0** to have Longhorn automatically purge all failed support
default: "true" default: "true"
- variable: defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly - variable: defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly
label: Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly label: Automatically Delete Workload Pod when The Volume Is Detached Unexpectedly
description: 'If enabled, Longhorn will automatically delete the workload pod that is managed by a controller (e.g. deployment, statefulset, daemonset, etc...) when Longhorn volume is detached unexpectedly (e.g. during Kubernetes upgrade, Docker reboot, or network disconnect). By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount. description: 'If enabled, Longhorn will automatically delete the workload pod that is managed by a controller (e.g. deployment, statefulset, daemonset, etc...) when Longhorn volume is detached unexpectedly (e.g. during Kubernetes upgrade, Docker reboot, or network disconnect). By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount.'
If disabled, Longhorn will not delete the workload pod that is managed by a controller. You will have to manually restart the pod to reattach and remount the volume.
**Note:** This setting does not apply to the workload pods that do not have a controller. Longhorn never deletes them.'
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: "true" default: "true"
@@ -412,13 +405,27 @@ If disabled, Longhorn will not delete the workload pod that is managed by a cont
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: "true" default: "true"
- variable: defaultSettings.replicaDiskSoftAntiAffinity
label: Replica Disk Level Soft Anti-Affinity
description: 'Allow scheduling on disks with existing healthy replicas of the same volume. By default true.'
group: "Longhorn Default Settings"
type: boolean
default: "true"
- variable: defaultSettings.allowEmptyNodeSelectorVolume
label: Allow Empty Node Selector Volume
description: "Allow Scheduling Empty Node Selector Volumes To Any Node"
group: "Longhorn Default Settings"
type: boolean
default: "true"
- variable: defaultSettings.allowEmptyDiskSelectorVolume
label: Allow Empty Disk Selector Volume
description: "Allow Scheduling Empty Disk Selector Volumes To Any Disk"
group: "Longhorn Default Settings"
type: boolean
default: "true"
- variable: defaultSettings.nodeDownPodDeletionPolicy - variable: defaultSettings.nodeDownPodDeletionPolicy
label: Pod Deletion Policy When Node is Down label: Pod Deletion Policy When Node is Down
description: "Defines the Longhorn action when a Volume is stuck with a StatefulSet/Deployment Pod on a node that is down. description: "Defines the Longhorn action when a Volume is stuck with a StatefulSet/Deployment Pod on a node that is down."
- **do-nothing** is the default Kubernetes behavior of never force deleting StatefulSet/Deployment terminating pods. Since the pod on the node that is down isn't removed, Longhorn volumes are stuck on nodes that are down.
- **delete-statefulset-pod** Longhorn will force delete StatefulSet terminating pods on nodes that are down to release Longhorn volumes so that Kubernetes can spin up replacement pods.
- **delete-deployment-pod** Longhorn will force delete Deployment terminating pods on nodes that are down to release Longhorn volumes so that Kubernetes can spin up replacement pods.
- **delete-both-statefulset-and-deployment-pod** Longhorn will force delete StatefulSet/Deployment terminating pods on nodes that are down to release Longhorn volumes so that Kubernetes can spin up replacement pods."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: enum type: enum
options: options:
@@ -427,49 +434,33 @@ If disabled, Longhorn will not delete the workload pod that is managed by a cont
- "delete-deployment-pod" - "delete-deployment-pod"
- "delete-both-statefulset-and-deployment-pod" - "delete-both-statefulset-and-deployment-pod"
default: "do-nothing" default: "do-nothing"
- variable: defaultSettings.allowNodeDrainWithLastHealthyReplica - variable: defaultSettings.nodeDrainPolicy
label: Allow Node Drain with the Last Healthy Replica label: Node Drain Policy
description: "By default, Longhorn will block `kubectl drain` action on a node if the node contains the last healthy replica of a volume. description: "Define the policy to use when a node with the last healthy replica of a volume is drained."
If this setting is enabled, Longhorn will **not** block `kubectl drain` action on a node even if the node contains the last healthy replica of a volume."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: enum
default: "false" options:
- variable: defaultSettings.mkfsExt4Parameters - "block-if-contains-last-replica"
label: Custom mkfs.ext4 parameters - "allow-if-replica-is-stopped"
description: "Allows setting additional filesystem creation parameters for ext4. For older host kernels it might be necessary to disable the optional ext4 metadata_csum feature by specifying `-O ^64bit,^metadata_csum`." - "always-allow"
group: "Longhorn Default Settings" default: "block-if-contains-last-replica"
type: string
- variable: defaultSettings.disableReplicaRebuild
label: Disable Replica Rebuild
description: "This setting disable replica rebuild cross the whole cluster, eviction and data locality feature won't work if this setting is true. But doesn't have any impact to any current replica rebuild and restore disaster recovery volume."
group: "Longhorn Default Settings"
type: boolean
default: "false"
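This hunk replaces the boolean `allowNodeDrainWithLastHealthyReplica` with the enum `nodeDrainPolicy`. A values fragment for the new key might look like the sketch below. The mapping from the old boolean values given in the comments is an assumption inferred from the option names and the removed description; the diff itself does not state it.

```yaml
defaultSettings:
  # Old: allowNodeDrainWithLastHealthyReplica: "false" (drain is blocked).
  # Assumed new equivalent: the default, which keeps blocking the drain.
  nodeDrainPolicy: "block-if-contains-last-replica"
  # Old "true" (never block) presumably corresponds to "always-allow".
  # The third option, "allow-if-replica-is-stopped", has no boolean counterpart.
```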
- variable: defaultSettings.replicaReplenishmentWaitInterval - variable: defaultSettings.replicaReplenishmentWaitInterval
label: Replica Replenishment Wait Interval label: Replica Replenishment Wait Interval
description: "In seconds. The interval determines how long Longhorn will wait at least in order to reuse the existing data on a failed replica rather than directly creating a new replica for a degraded volume. description: "In seconds. The interval determines how long Longhorn will wait at least in order to reuse the existing data on a failed replica rather than directly creating a new replica for a degraded volume."
Warning: This option works only when there is a failed replica in the volume. And this option may block the rebuilding for a while in the case."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
default: 600 default: 600
- variable: defaultSettings.concurrentReplicaRebuildPerNodeLimit - variable: defaultSettings.concurrentReplicaRebuildPerNodeLimit
label: Concurrent Replica Rebuild Per Node Limit label: Concurrent Replica Rebuild Per Node Limit
description: "This setting controls how many replicas on a node can be rebuilt simultaneously. description: "This setting controls how many replicas on a node can be rebuilt simultaneously."
Typically, Longhorn can block the replica starting once the current rebuilding count on a node exceeds the limit. But when the value is 0, it means disabling the replica rebuilding.
WARNING:
- The old setting \"Disable Replica Rebuild\" is replaced by this setting.
- Different from relying on replica starting delay to limit the concurrent rebuilding, if the rebuilding is disabled, replica object replenishment will be directly skipped.
- When the value is 0, the eviction and data locality feature won't work. But this shouldn't have any impact to any current replica rebuild and backup restore."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
default: 5 default: 5
- variable: defaultSettings.concurrentVolumeBackupRestorePerNodeLimit - variable: defaultSettings.concurrentVolumeBackupRestorePerNodeLimit
label: Concurrent Volume Backup Restore Per Node Limit label: Concurrent Volume Backup Restore Per Node Limit
description: "This setting controls how many volumes on a node can restore the backup concurrently. description: "This setting controls how many volumes on a node can restore the backup concurrently. Set the value to **0** to disable backup restore."
Longhorn blocks the backup restore once the restoring volume count exceeds the limit.
Set the value to **0** to disable backup restore."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
@@ -518,58 +509,28 @@ Set the value to **0** to disable backup restore."
default: 60 default: 60
- variable: defaultSettings.backingImageRecoveryWaitInterval - variable: defaultSettings.backingImageRecoveryWaitInterval
label: Backing Image Recovery Wait Interval label: Backing Image Recovery Wait Interval
description: "This interval in seconds determines how long Longhorn will wait before re-downloading the backing image file when all disk files of this backing image become failed or unknown. description: "This interval in seconds determines how long Longhorn will wait before re-downloading the backing image file when all disk files of this backing image become failed or unknown."
WARNING:
- This recovery only works for the backing image of which the creation type is \"download\".
- File state \"unknown\" means the related manager pods on the pod is not running or the node itself is down/disconnected."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
default: 300 default: 300
- variable: defaultSettings.guaranteedEngineManagerCPU - variable: defaultSettings.guaranteedInstanceManagerCPU
label: Guaranteed Engine Manager CPU label: Guaranteed Instance Manager CPU
description: "This integer value indicates how many percentage of the total allocatable CPU on each node will be reserved for each engine manager Pod. For example, 10 means 10% of the total CPU on a node will be allocated to each engine manager pod on this node. This will help maintain engine stability during high node workload. description: "This integer value indicates how many percentage of the total allocatable CPU on each node will be reserved for each instance manager Pod. You can leave it with the default value, which is 12%."
In order to prevent unexpected volume engine crash as well as guarantee a relative acceptable IO performance, you can use the following formula to calculate a value for this setting:
Guaranteed Engine Manager CPU = The estimated max Longhorn volume engine count on a node * 0.1 / The total allocatable CPUs on the node * 100.
The result of above calculation doesn't mean that's the maximum CPU resources the Longhorn workloads require. To fully exploit the Longhorn volume I/O performance, you can allocate/guarantee more CPU resources via this setting.
If it's hard to estimate the usage now, you can leave it with the default value, which is 12%. Then you can tune it when there is no running workload using Longhorn volumes.
WARNING:
- Value 0 means unsetting CPU requests for engine manager pods.
- Considering the possible new instance manager pods in the further system upgrade, this integer value is range from 0 to 40. And the sum with setting 'Guaranteed Engine Manager CPU' should not be greater than 40.
- One more set of instance manager pods may need to be deployed when the Longhorn system is upgraded. If current available CPUs of the nodes are not enough for the new instance manager pods, you need to detach the volumes using the oldest instance manager pods so that Longhorn can clean up the old pods automatically and release the CPU resources. And the new pods with the latest instance manager image will be launched then.
- This global setting will be ignored for a node if the field \"EngineManagerCPURequest\" on the node is set.
- After this setting is changed, all engine manager pods using this global setting on all the nodes will be automatically restarted. In other words, DO NOT CHANGE THIS SETTING WITH ATTACHED VOLUMES."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: int
min: 0 min: 0
max: 40 max: 40
default: 12 default: 12
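The sizing formula removed from the old-side description can still serve as a rough guide for the merged setting. The sketch below assumes the formula carries over with engines and replicas counted together, which the new description does not confirm, and the node figures are hypothetical.

```yaml
defaultSettings:
  # Removed old-side formula (stated there for engine manager pods):
  #   estimated max engine count on a node * 0.1 / allocatable CPUs * 100
  # Hypothetical node: 8 engines and replicas in total, 4 allocatable CPUs
  #   8 * 0.1 / 4 * 100 = 20   (inside the allowed range of 0 to 40)
  guaranteedInstanceManagerCPU: 20
```

The old-side text also warned that changing the value restarts the manager pods that use it, so it advised against changing this setting while volumes are attached.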
- variable: defaultSettings.guaranteedReplicaManagerCPU - variable: defaultSettings.logLevel
label: Guaranteed Replica Manager CPU label: Log Level
description: "This integer value indicates how many percentage of the total allocatable CPU on each node will be reserved for each replica manager Pod. 10 means 10% of the total CPU on a node will be allocated to each replica manager pod on this node. This will help maintain replica stability during high node workload. description: "The log level Panic, Fatal, Error, Warn, Info, Debug, Trace used in longhorn manager. Default to Info."
In order to prevent unexpected volume replica crash as well as guarantee a relative acceptable IO performance, you can use the following formula to calculate a value for this setting:
Guaranteed Replica Manager CPU = The estimated max Longhorn volume replica count on a node * 0.1 / The total allocatable CPUs on the node * 100.
The result of above calculation doesn't mean that's the maximum CPU resources the Longhorn workloads require. To fully exploit the Longhorn volume I/O performance, you can allocate/guarantee more CPU resources via this setting.
If it's hard to estimate the usage now, you can leave it with the default value, which is 12%. Then you can tune it when there is no running workload using Longhorn volumes.
WARNING:
- Value 0 means unsetting CPU requests for replica manager pods.
- Considering the possible new instance manager pods in the further system upgrade, this integer value is range from 0 to 40. And the sum with setting 'Guaranteed Replica Manager CPU' should not be greater than 40.
- One more set of instance manager pods may need to be deployed when the Longhorn system is upgraded. If current available CPUs of the nodes are not enough for the new instance manager pods, you need to detach the volumes using the oldest instance manager pods so that Longhorn can clean up the old pods automatically and release the CPU resources. And the new pods with the latest instance manager image will be launched then.
- This global setting will be ignored for a node if the field \"ReplicaManagerCPURequest\" on the node is set.
- After this setting is changed, all replica manager pods using this global setting on all the nodes will be automatically restarted. In other words, DO NOT CHANGE THIS SETTING WITH ATTACHED VOLUMES."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: int type: string
min: 0 default: "Info"
max: 40
default: 12
- variable: defaultSettings.kubernetesClusterAutoscalerEnabled - variable: defaultSettings.kubernetesClusterAutoscalerEnabled
label: Kubernetes Cluster Autoscaler Enabled (Experimental) label: Kubernetes Cluster Autoscaler Enabled (Experimental)
description: "Enabling this setting will notify Longhorn that the cluster is using Kubernetes Cluster Autoscaler. description: "Enabling this setting will notify Longhorn that the cluster is using Kubernetes Cluster Autoscaler."
Longhorn prevents data loss by only allowing the Cluster Autoscaler to scale down a node that met all conditions:
- No volume attached to the node.
- Is not the last node containing the replica of any volume.
- Is not running backing image components pod.
- Is not running share manager components pod."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: false default: false
@@ -581,20 +542,13 @@ Set the value to **0** to disable backup restore."
default: false default: false
- variable: defaultSettings.storageNetwork - variable: defaultSettings.storageNetwork
label: Storage Network label: Storage Network
description: "Longhorn uses the storage network for in-cluster data traffic. Leave this blank to use the Kubernetes cluster network. description: "Longhorn uses the storage network for in-cluster data traffic. Leave this blank to use the Kubernetes cluster network."
To segregate the storage network, input the pre-existing NetworkAttachmentDefinition in \"<namespace>/<name>\" format.
WARNING:
- The cluster must have pre-existing Multus installed, and NetworkAttachmentDefinition IPs are reachable between nodes.
- DO NOT CHANGE THIS SETTING WITH ATTACHED VOLUMES. Longhorn will try to block this setting update when there are attached volumes.
- When applying the setting, Longhorn will restart all manager, instance-manager, and backing-image-manager pods."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: string type: string
default: default:
- variable: defaultSettings.deletingConfirmationFlag - variable: defaultSettings.deletingConfirmationFlag
label: Deleting Confirmation Flag label: Deleting Confirmation Flag
description: "This flag is designed to prevent Longhorn from being accidentally uninstalled which will lead to data lost. description: "This flag is designed to prevent Longhorn from being accidentally uninstalled which will lead to data lost."
Set this flag to **true** to allow Longhorn uninstallation.
If this flag **false**, Longhorn uninstallation job will fail. "
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: "false" default: "false"
@@ -606,11 +560,7 @@ Set the value to **0** to disable backup restore."
default: "8" default: "8"
- variable: defaultSettings.snapshotDataIntegrity - variable: defaultSettings.snapshotDataIntegrity
label: Snapshot Data Integrity label: Snapshot Data Integrity
description: "This setting allows users to enable or disable snapshot hashing and data integrity checking. description: "This setting allows users to enable or disable snapshot hashing and data integrity checking."
Available options are
- **disabled**: Disable snapshot disk file hashing and data integrity checking.
- **enabled**: Enables periodic snapshot disk file hashing and data integrity checking. To detect the filesystem-unaware corruption caused by bit rot or other issues in snapshot disk files, Longhorn system periodically hashes files and finds corrupted ones. Hence, the system performance will be impacted during the periodical checking.
- **fast-check**: Enable snapshot disk file hashing and fast data integrity checking. Longhorn system only hashes snapshot disk files if their are not hashed or the modification time are changed. In this mode, filesystem-unaware corruption cannot be detected, but the impact on system performance can be minimized."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: string type: string
default: "disabled" default: "disabled"
@@ -622,17 +572,13 @@ Set the value to **0** to disable backup restore."
default: "false" default: "false"
- variable: defaultSettings.snapshotDataIntegrityCronjob - variable: defaultSettings.snapshotDataIntegrityCronjob
label: Snapshot Data Integrity Check CronJob label: Snapshot Data Integrity Check CronJob
description: "Unix-cron string format. The setting specifies when Longhorn checks the data integrity of snapshot disk files. description: "Unix-cron string format. The setting specifies when Longhorn checks the data integrity of snapshot disk files."
Warning: Hashing snapshot disk files impacts the performance of the system. It is recommended to run data integrity checks during off-peak times and to reduce the frequency of checks."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: string type: string
default: "0 0 */7 * *" default: "0 0 */7 * *"
- variable: defaultSettings.removeSnapshotsDuringFilesystemTrim - variable: defaultSettings.removeSnapshotsDuringFilesystemTrim
label: Remove Snapshots During Filesystem Trim label: Remove Snapshots During Filesystem Trim
description: "This setting allows Longhorn filesystem trim feature to automatically mark the latest snapshot and its ancestors as removed and stops at the snapshot containing multiple children.\n\n description: "This setting allows Longhorn filesystem trim feature to automatically mark the latest snapshot and its ancestors as removed and stops at the snapshot containing multiple children."
Since Longhorn filesystem trim feature can be applied to the volume head and the followed continuous removed or system snapshots only.\n\n
Notice that trying to trim a removed files from a valid snapshot will do nothing but the filesystem will discard this kind of in-memory trimmable file info.\n\n
Later on if you mark the snapshot as removed and want to retry the trim, you may need to unmount and remount the filesystem so that the filesystem can recollect the trimmable file info."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: "false" default: "false"
@@ -642,6 +588,48 @@ Set the value to **0** to disable backup restore."
group: "Longhorn Default Settings" group: "Longhorn Default Settings"
type: boolean type: boolean
default: false default: false
- variable: defaultSettings.replicaFileSyncHttpClientTimeout
label: Timeout of HTTP Client to Replica File Sync Server
description: "In seconds. The setting specifies the HTTP client timeout to the file sync server."
group: "Longhorn Default Settings"
type: int
default: "30"
- variable: defaultSettings.backupCompressionMethod
label: Backup Compression Method
description: "This setting allows users to specify backup compression method."
group: "Longhorn Default Settings"
type: string
default: "lz4"
- variable: defaultSettings.backupConcurrentLimit
label: Backup Concurrent Limit Per Backup
description: "This setting controls how many worker threads per backup concurrently."
group: "Longhorn Default Settings"
type: int
min: 1
default: 2
- variable: defaultSettings.restoreConcurrentLimit
label: Restore Concurrent Limit Per Backup
description: "This setting controls how many worker threads per restore concurrently."
group: "Longhorn Default Settings"
type: int
min: 1
default: 2
- variable: defaultSettings.v2DataEngine
label: V2 Data Engine
description: "This allows users to activate v2 data engine based on SPDK. Currently, it is in the preview phase and should not be utilized in a production environment."
group: "Longhorn V2 Data Engine (Preview Feature) Settings"
type: boolean
default: false
- variable: defaultSettings.offlineReplicaRebuilding
label: Offline Replica Rebuilding
description: "This setting allows users to enable the offline replica rebuilding for volumes using v2 data engine."
group: "Longhorn V2 Data Engine (Preview Feature) Settings"
required: true
type: enum
options:
- "enabled"
- "disabled"
default: "enabled"
- variable: persistence.defaultClass - variable: persistence.defaultClass
default: "true" default: "true"
description: "Set as default StorageClass for Longhorn" description: "Set as default StorageClass for Longhorn"
@ -651,7 +639,7 @@ Set the value to **0** to disable backup restore."
type: boolean type: boolean
- variable: persistence.reclaimPolicy - variable: persistence.reclaimPolicy
label: Storage Class Retain Policy label: Storage Class Retain Policy
description: "Define reclaim policy (Retain or Delete)" description: "Define reclaim policy. Options: `Retain`, `Delete`"
group: "Longhorn Storage Class Settings" group: "Longhorn Storage Class Settings"
required: true required: true
type: enum type: enum
@ -668,7 +656,7 @@ Set the value to **0** to disable backup restore."
max: 10 max: 10
default: 3 default: 3
- variable: persistence.defaultDataLocality - variable: persistence.defaultDataLocality
description: "Set data locality for Longhorn StorageClass" description: "Set data locality for Longhorn StorageClass. Options: `disabled`, `best-effort`"
label: Default Storage Class Data Locality label: Default Storage Class Data Locality
group: "Longhorn Storage Class Settings" group: "Longhorn Storage Class Settings"
type: enum type: enum
@ -690,18 +678,18 @@ Set the value to **0** to disable backup restore."
group: "Longhorn Storage Class Settings" group: "Longhorn Storage Class Settings"
type: string type: string
default: default:
- variable: defaultSettings.defaultNodeSelector.enable - variable: persistence.defaultNodeSelector.enable
description: "Enable recurring Node selector for Longhorn StorageClass" description: "Enable Node selector for Longhorn StorageClass"
group: "Longhorn Storage Class Settings" group: "Longhorn Storage Class Settings"
label: Enable Storage Class Node Selector label: Enable Storage Class Node Selector
type: boolean type: boolean
default: false default: false
show_subquestion_if: true show_subquestion_if: true
subquestions: subquestions:
- variable: defaultSettings.defaultNodeSelector.selector - variable: persistence.defaultNodeSelector.selector
label: Storage Class Node Selector label: Storage Class Node Selector
description: 'We use NodeSelector when we want to bind PVC via StorageClass into desired mountpoint on the nodes tagged whith its value' description: 'This selector enables only certain nodes having these tags to be used for the volume. e.g. `"storage,fast"`'
group: "Longhorn Default Settings" group: "Longhorn Storage Class Settings"
type: string type: string
default: default:
- variable: persistence.backingImage.enable - variable: persistence.backingImage.enable
@ -754,7 +742,7 @@ Set the value to **0** to disable backup restore."
type: string type: string
default: default:
- variable: persistence.removeSnapshotsDuringFilesystemTrim - variable: persistence.removeSnapshotsDuringFilesystemTrim
description: "Allow automatically removing snapshots during filesystem trim for Longhorn StorageClass" description: "Allow automatically removing snapshots during filesystem trim for Longhorn StorageClass. Options: `ignored`, `enabled`, `disabled`"
label: Default Storage Class Remove Snapshots During Filesystem Trim label: Default Storage Class Remove Snapshots During Filesystem Trim
group: "Longhorn Storage Class Settings" group: "Longhorn Storage Class Settings"
type: enum type: enum
@ -785,7 +773,7 @@ Set the value to **0** to disable backup restore."
label: Ingress Path label: Ingress Path
- variable: service.ui.type - variable: service.ui.type
default: "Rancher-Proxy" default: "Rancher-Proxy"
description: "Define Longhorn UI service type" description: "Define Longhorn UI service type. Options: `ClusterIP`, `NodePort`, `LoadBalancer`, `Rancher-Proxy`"
type: enum type: enum
options: options:
- "ClusterIP" - "ClusterIP"
@ -806,7 +794,7 @@ Set the value to **0** to disable backup restore."
show_if: "service.ui.type=NodePort||service.ui.type=LoadBalancer" show_if: "service.ui.type=NodePort||service.ui.type=LoadBalancer"
label: UI Service NodePort number label: UI Service NodePort number
- variable: enablePSP - variable: enablePSP
default: "true" default: "false"
description: "Setup a pod security policy for Longhorn workloads." description: "Setup a pod security policy for Longhorn workloads."
label: Pod Security Policy label: Pod Security Policy
type: boolean type: boolean
@ -817,3 +805,21 @@ Set the value to **0** to disable backup restore."
label: Rancher Windows Cluster label: Rancher Windows Cluster
type: boolean type: boolean
group: "Other Settings" group: "Other Settings"
- variable: networkPolicies.enabled
description: "Enable NetworkPolicies to limit access to the longhorn pods.
Warning: The Rancher Proxy will not work if this feature is enabled and a custom NetworkPolicy must be added."
group: "Other Settings"
label: Network Policies
default: "false"
type: boolean
subquestions:
- variable: networkPolicies.type
label: Network Policies for Ingress
description: "Create the policy based on your distribution to allow access for the ingress. Options: `k3s`, `rke2`, `rke1`"
show_if: "networkPolicies.enabled=true&&ingress.enabled=true"
type: enum
default: "rke2"
options:
- "rke1"
- "rke2"
- "k3s"
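
The questions added above correspond to Helm values with the same dotted names. A minimal override sketch, assuming the chart accepts these keys exactly as written in the `variable:` fields. The file name is hypothetical. Values repeat the defaults shown above where the excerpt gives one; the rest (`reclaimPolicy`, `defaultDataLocality`, the node selector) are illustrative choices, not chart defaults.

```yaml
# values-override.yaml (hypothetical file name)
defaultSettings:
  replicaFileSyncHttpClientTimeout: 30   # seconds
  backupCompressionMethod: lz4
  backupConcurrentLimit: 2               # min: 1
  restoreConcurrentLimit: 2              # min: 1
  v2DataEngine: false                    # preview feature, not for production use
  offlineReplicaRebuilding: enabled      # only relevant to v2 data engine volumes
persistence:
  reclaimPolicy: Delete                  # illustrative; options: Retain, Delete
  defaultDataLocality: disabled          # illustrative; options: disabled, best-effort
  defaultNodeSelector:
    enable: false
    selector: ""                         # e.g. "storage,fast" when enable is true
networkPolicies:
  enabled: false                         # Rancher Proxy stops working if enabled
  type: rke2                             # options: k3s, rke2, rke1
enablePSP: false
```

Note that the node selector keys now live under `persistence`, not `defaultSettings`, so an override written against the old names would probably need to be moved.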


@ -37,11 +37,15 @@ rules:
- apiGroups: ["longhorn.io"] - apiGroups: ["longhorn.io"]
resources: ["volumes", "volumes/status", "engines", "engines/status", "replicas", "replicas/status", "settings", resources: ["volumes", "volumes/status", "engines", "engines/status", "replicas", "replicas/status", "settings",
"engineimages", "engineimages/status", "nodes", "nodes/status", "instancemanagers", "instancemanagers/status", "engineimages", "engineimages/status", "nodes", "nodes/status", "instancemanagers", "instancemanagers/status",
{{- if .Values.openshift.enabled }}
"engineimages/finalizers", "nodes/finalizers", "instancemanagers/finalizers",
{{- end }}
"sharemanagers", "sharemanagers/status", "backingimages", "backingimages/status", "sharemanagers", "sharemanagers/status", "backingimages", "backingimages/status",
"backingimagemanagers", "backingimagemanagers/status", "backingimagedatasources", "backingimagedatasources/status", "backingimagemanagers", "backingimagemanagers/status", "backingimagedatasources", "backingimagedatasources/status",
"backuptargets", "backuptargets/status", "backupvolumes", "backupvolumes/status", "backups", "backups/status", "backuptargets", "backuptargets/status", "backupvolumes", "backupvolumes/status", "backups", "backups/status",
"recurringjobs", "recurringjobs/status", "orphans", "orphans/status", "snapshots", "snapshots/status", "recurringjobs", "recurringjobs/status", "orphans", "orphans/status", "snapshots", "snapshots/status",
"supportbundles", "supportbundles/status", "systembackups", "systembackups/status", "systemrestores", "systemrestores/status"] "supportbundles", "supportbundles/status", "systembackups", "systembackups/status", "systemrestores", "systemrestores/status",
"volumeattachments", "volumeattachments/status"]
verbs: ["*"] verbs: ["*"]
- apiGroups: ["coordination.k8s.io"] - apiGroups: ["coordination.k8s.io"]
resources: ["leases"] resources: ["leases"]
@ -58,3 +62,16 @@ rules:
- apiGroups: ["rbac.authorization.k8s.io"] - apiGroups: ["rbac.authorization.k8s.io"]
resources: ["roles", "rolebindings", "clusterrolebindings", "clusterroles"] resources: ["roles", "rolebindings", "clusterrolebindings", "clusterroles"]
verbs: ["*"] verbs: ["*"]
{{- if .Values.openshift.enabled }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: longhorn-ocp-privileged-role
labels: {{- include "longhorn.labels" . | nindent 4 }}
rules:
- apiGroups: ["security.openshift.io"]
resources: ["securitycontextconstraints"]
resourceNames: ["anyuid", "privileged"]
verbs: ["use"]
{{- end }}


@ -25,3 +25,25 @@ subjects:
- kind: ServiceAccount - kind: ServiceAccount
name: longhorn-support-bundle name: longhorn-support-bundle
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}
{{- if .Values.openshift.enabled }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: longhorn-ocp-privileged-bind
labels: {{- include "longhorn.labels" . | nindent 4 }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: longhorn-ocp-privileged-role
subjects:
- kind: ServiceAccount
name: longhorn-service-account
namespace: {{ include "release_namespace" . }}
- kind: ServiceAccount
name: longhorn-ui-service-account
namespace: {{ include "release_namespace" . }}
- kind: ServiceAccount
name: default # supportbundle-agent-support-bundle uses default sa
namespace: {{ include "release_namespace" . }}
{{- end }}
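
Both the `longhorn-ocp-privileged-role` ClusterRole and the `longhorn-ocp-privileged-bind` ClusterRoleBinding are rendered only when `.Values.openshift.enabled` is true. A minimal values sketch that would turn them on, assuming no other `openshift.*` keys are required (this excerpt does not show the rest of that values block):

```yaml
# Renders the OpenShift-only RBAC objects guarded by
# {{- if .Values.openshift.enabled }} in the templates above.
openshift:
  enabled: true
```

With this set, the three service accounts listed in the binding (`longhorn-service-account`, `longhorn-ui-service-account`, and `default`) are granted `use` on the `anyuid` and `privileged` SecurityContextConstraints.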


@ -296,12 +296,6 @@ spec:
properties: properties:
currentChecksum: currentChecksum:
type: string type: string
directory:
description: 'Deprecated: This field is useless.'
type: string
downloadProgress:
description: 'Deprecated: This field is renamed to `Progress`.'
type: integer
message: message:
type: string type: string
name: name:
@ -317,9 +311,6 @@ spec:
type: integer type: integer
state: state:
type: string type: string
url:
description: 'Deprecated: This field is useless now. The manager of backing image files doesn''t care if a file is downloaded and how.'
type: string
uuid: uuid:
type: string type: string
type: object type: object
@ -364,7 +355,7 @@ spec:
name: longhorn-conversion-webhook name: longhorn-conversion-webhook
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}
path: /v1/webhook/conversion path: /v1/webhook/conversion
port: 9443 port: 9501
conversionReviewVersions: conversionReviewVersions:
- v1beta2 - v1beta2
- v1beta1 - v1beta1
@ -446,9 +437,6 @@ spec:
additionalProperties: additionalProperties:
type: string type: string
type: object type: object
imageURL:
description: 'Deprecated: This kind of info will be included in the related BackingImageDataSource.'
type: string
sourceParameters: sourceParameters:
additionalProperties: additionalProperties:
type: string type: string
@ -465,19 +453,6 @@ spec:
properties: properties:
checksum: checksum:
type: string type: string
diskDownloadProgressMap:
additionalProperties:
type: integer
description: 'Deprecated: Replaced by field `Progress` in `DiskFileStatusMap`.'
nullable: true
type: object
diskDownloadStateMap:
additionalProperties:
description: BackingImageDownloadState is replaced by BackingImageState.
type: string
description: 'Deprecated: Replaced by field `State` in `DiskFileStatusMap`.'
nullable: true
type: object
diskFileStatusMap: diskFileStatusMap:
additionalProperties: additionalProperties:
properties: properties:
@ -637,6 +612,9 @@ spec:
backupCreatedAt: backupCreatedAt:
description: The snapshot backup upload finished time. description: The snapshot backup upload finished time.
type: string type: string
compressionMethod:
description: Compression method
type: string
error: error:
description: The error message when taking the snapshot backup. description: The error message when taking the snapshot backup.
type: string type: string
@ -724,7 +702,7 @@ spec:
name: longhorn-conversion-webhook name: longhorn-conversion-webhook
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}
path: /v1/webhook/conversion path: /v1/webhook/conversion
port: 9443 port: 9501
conversionReviewVersions: conversionReviewVersions:
- v1beta2 - v1beta2
- v1beta1 - v1beta1
@ -1032,6 +1010,9 @@ spec:
size: size:
description: The backup volume size. description: The backup volume size.
type: string type: string
storageClassName:
description: the storage class name of pv/pvc binding with the volume.
type: string
type: object type: object
type: object type: object
served: true served: true
@ -1064,7 +1045,7 @@ spec:
name: longhorn-conversion-webhook name: longhorn-conversion-webhook
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}
path: /v1/webhook/conversion path: /v1/webhook/conversion
port: 9443 port: 9501
conversionReviewVersions: conversionReviewVersions:
- v1beta2 - v1beta2
- v1beta1 - v1beta1
@ -1333,6 +1314,11 @@ spec:
properties: properties:
active: active:
type: boolean type: boolean
backendStoreDriver:
enum:
- v1
- v2
type: string
backupVolume: backupVolume:
type: string type: string
desireState: desireState:
@ -1340,13 +1326,17 @@ spec:
disableFrontend: disableFrontend:
type: boolean type: boolean
engineImage: engineImage:
description: 'Deprecated: Replaced by field `image`.'
type: string type: string
frontend: frontend:
enum: enum:
- blockdev - blockdev
- iscsi - iscsi
- nvmf
- "" - ""
type: string type: string
image:
type: string
logRequested: logRequested:
type: boolean type: boolean
nodeID: nodeID:
@ -1668,15 +1658,13 @@ spec:
spec: spec:
description: InstanceManagerSpec defines the desired state of the Longhorn instancer manager description: InstanceManagerSpec defines the desired state of the Longhorn instancer manager
properties: properties:
engineImage:
description: 'TODO: deprecate this field'
type: string
image: image:
type: string type: string
nodeID: nodeID:
type: string type: string
type: type:
enum: enum:
- aio
- engine - engine
- replica - replica
type: string type: string
@ -1694,11 +1682,13 @@ spec:
type: integer type: integer
currentState: currentState:
type: string type: string
instances: instanceEngines:
additionalProperties: additionalProperties:
properties: properties:
spec: spec:
properties: properties:
backendStoreDriver:
type: string
name: name:
type: string type: string
type: object type: object
@ -1727,6 +1717,77 @@ spec:
type: object type: object
nullable: true nullable: true
type: object type: object
instanceReplicas:
additionalProperties:
properties:
spec:
properties:
backendStoreDriver:
type: string
name:
type: string
type: object
status:
properties:
endpoint:
type: string
errorMsg:
type: string
listen:
type: string
portEnd:
format: int32
type: integer
portStart:
format: int32
type: integer
resourceVersion:
format: int64
type: integer
state:
type: string
type:
type: string
type: object
type: object
nullable: true
type: object
instances:
additionalProperties:
properties:
spec:
properties:
backendStoreDriver:
type: string
name:
type: string
type: object
status:
properties:
endpoint:
type: string
errorMsg:
type: string
listen:
type: string
portEnd:
format: int32
type: integer
portStart:
format: int32
type: integer
resourceVersion:
format: int64
type: integer
state:
type: string
type:
type: string
type: object
type: object
nullable: true
description: 'Deprecated: Replaced by InstanceEngines and InstanceReplicas'
type: object
ip: ip:
type: string type: string
ownerID: ownerID:
@ -1763,7 +1824,7 @@ spec:
name: longhorn-conversion-webhook name: longhorn-conversion-webhook
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}
path: /v1/webhook/conversion path: /v1/webhook/conversion
port: 9443 port: 9501
conversionReviewVersions: conversionReviewVersions:
- v1beta2 - v1beta2
- v1beta1 - v1beta1
@ -1865,16 +1926,19 @@ spec:
items: items:
type: string type: string
type: array type: array
diskType:
enum:
- filesystem
- block
type: string
type: object type: object
type: object type: object
engineManagerCPURequest:
type: integer
evictionRequested: evictionRequested:
type: boolean type: boolean
instanceManagerCPURequest:
type: integer
name: name:
type: string type: string
replicaManagerCPURequest:
type: integer
tags: tags:
items: items:
type: string type: string
@ -1934,6 +1998,8 @@ spec:
type: object type: object
nullable: true nullable: true
type: array type: array
diskType:
type: string
diskUUID: diskUUID:
type: string type: string
scheduledReplica: scheduledReplica:
@ -2153,7 +2219,7 @@ spec:
jsonPath: .spec.groups jsonPath: .spec.groups
name: Groups name: Groups
type: string type: string
- description: Should be one of "backup" or "snapshot" - description: Should be one of "snapshot", "snapshot-force-create", "snapshot-cleanup", "snapshot-delete", "backup", "backup-force-create" or "filesystem-trim"
jsonPath: .spec.task jsonPath: .spec.task
name: Task name: Task
type: string type: string
@ -2215,10 +2281,15 @@ spec:
description: The retain count of the snapshot/backup. description: The retain count of the snapshot/backup.
type: integer type: integer
task: task:
description: The recurring job type. Can be "snapshot" or "backup". description: The recurring job task. Can be "snapshot", "snapshot-force-create", "snapshot-cleanup", "snapshot-delete", "backup", "backup-force-create" or "filesystem-trim"
enum: enum:
- snapshot - snapshot
- snapshot-force-create
- snapshot-cleanup
- snapshot-delete
- backup - backup
- backup-force-create
- filesystem-trim
type: string type: string
type: object type: object
status: status:
@ -2348,16 +2419,15 @@ spec:
properties: properties:
active: active:
type: boolean type: boolean
backendStoreDriver:
enum:
- v1
- v2
type: string
backingImage: backingImage:
type: string type: string
baseImage:
description: Deprecated. Rename to BackingImage
type: string
dataDirectoryName: dataDirectoryName:
type: string type: string
dataPath:
description: Deprecated
type: string
desireState: desireState:
type: string type: string
diskID: diskID:
@ -2365,6 +2435,7 @@ spec:
diskPath: diskPath:
type: string type: string
engineImage: engineImage:
description: 'Deprecated: Replaced by field `image`.'
type: string type: string
engineName: engineName:
type: string type: string
@ -2374,6 +2445,8 @@ spec:
type: string type: string
healthyAt: healthyAt:
type: string type: string
image:
type: string
logRequested: logRequested:
type: boolean type: boolean
nodeID: nodeID:
@ -2624,16 +2697,20 @@ spec:
description: ShareManagerSpec defines the desired state of the Longhorn share manager description: ShareManagerSpec defines the desired state of the Longhorn share manager
properties: properties:
image: image:
description: Share manager image used for creating a share manager pod
type: string type: string
type: object type: object
status: status:
description: ShareManagerStatus defines the observed state of the Longhorn share manager description: ShareManagerStatus defines the observed state of the Longhorn share manager
properties: properties:
endpoint: endpoint:
description: NFS endpoint that can access the mounted filesystem of the volume
type: string type: string
ownerID: ownerID:
description: The node ID on which the controller is responsible to reconcile this share manager resource
type: string type: string
state: state:
description: The state of the share manager resource
type: string type: string
type: object type: object
type: object type: object
@ -2945,6 +3022,11 @@ spec:
type: object type: object
spec: spec:
description: SystemBackupSpec defines the desired state of the Longhorn SystemBackup description: SystemBackupSpec defines the desired state of the Longhorn SystemBackup
properties:
volumeBackupPolicy:
description: The policy for creating volume backups. Can be "if-not-present", "always" or "disabled"
nullable: true
type: string
type: object type: object
status: status:
description: SystemBackupStatus defines the observed state of the Longhorn SystemBackup description: SystemBackupStatus defines the observed state of the Longhorn SystemBackup
@ -3129,7 +3211,7 @@ spec:
name: longhorn-conversion-webhook name: longhorn-conversion-webhook
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}
path: /v1/webhook/conversion path: /v1/webhook/conversion
port: 9443 port: 9501
conversionReviewVersions: conversionReviewVersions:
- v1beta2 - v1beta2
- v1beta1 - v1beta1
@ -3236,10 +3318,18 @@ spec:
- rwo - rwo
- rwx - rwx
type: string type: string
backendStoreDriver:
enum:
- v1
- v2
type: string
backingImage: backingImage:
type: string type: string
baseImage: backupCompressionMethod:
description: Deprecated. Rename to BackingImage enum:
- none
- lz4
- gzip
type: string type: string
dataLocality: dataLocality:
enum: enum:
@ -3258,21 +3348,19 @@ spec:
encrypted: encrypted:
type: boolean type: boolean
engineImage: engineImage:
description: 'Deprecated: Replaced by field `image`.'
type: string type: string
fromBackup: fromBackup:
type: string type: string
restoreVolumeRecurringJob:
enum:
- ignored
- enabled
- disabled
type: string
frontend: frontend:
enum: enum:
- blockdev - blockdev
- iscsi - iscsi
- nvmf
- "" - ""
type: string type: string
image:
type: string
lastAttachedBy: lastAttachedBy:
type: string type: string
migratable: migratable:
@ -3287,34 +3375,13 @@ spec:
type: array type: array
numberOfReplicas: numberOfReplicas:
type: integer type: integer
recurringJobs: offlineReplicaRebuilding:
description: Deprecated. Replaced by a separate resource named "RecurringJob" description: OfflineReplicaRebuilding is used to determine if the offline replica rebuilding feature is enabled or not
items: enum:
description: 'VolumeRecurringJobSpec is a deprecated struct. TODO: Should be removed when recurringJobs gets removed from the volume spec.' - ignored
properties: - disabled
concurrency: - enabled
type: integer type: string
cron:
type: string
groups:
items:
type: string
type: array
labels:
additionalProperties:
type: string
type: object
name:
type: string
retain:
type: integer
task:
enum:
- snapshot
- backup
type: string
type: object
type: array
replicaAutoBalance: replicaAutoBalance:
enum: enum:
- ignored - ignored
@ -3322,6 +3389,33 @@ spec:
- least-effort - least-effort
- best-effort - best-effort
type: string type: string
replicaDiskSoftAntiAffinity:
description: Replica disk soft anti affinity of the volume. Set enabled to allow replicas to be scheduled in the same disk.
enum:
- ignored
- enabled
- disabled
type: string
replicaSoftAntiAffinity:
description: Replica soft anti affinity of the volume. Set enabled to allow replicas to be scheduled on the same node.
enum:
- ignored
- enabled
- disabled
type: string
replicaZoneSoftAntiAffinity:
description: Replica zone soft anti affinity of the volume. Set enabled to allow replicas to be scheduled in the same zone.
enum:
- ignored
- enabled
- disabled
type: string
restoreVolumeRecurringJob:
enum:
- ignored
- enabled
- disabled
type: string
revisionCounterDisabled: revisionCounterDisabled:
type: boolean type: boolean
size: size:
@ -3384,6 +3478,9 @@ spec:
type: array type: array
currentImage: currentImage:
type: string type: string
currentMigrationNodeID:
description: the node that this volume is currently migrating to
type: string
currentNodeID: currentNodeID:
type: string type: string
expansionRequired: expansionRequired:
@ -3429,9 +3526,12 @@ spec:
type: string type: string
lastDegradedAt: lastDegradedAt:
type: string type: string
offlineReplicaRebuildingRequired:
type: boolean
ownerID: ownerID:
type: string type: string
pendingNodeID: pendingNodeID:
description: Deprecated.
type: string type: string
remountRequestedAt: remountRequestedAt:
type: string type: string
@ -3459,3 +3559,130 @@ status:
plural: "" plural: ""
conditions: [] conditions: []
storedVersions: [] storedVersions: []
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.7.0
creationTimestamp: null
labels: {{- include "longhorn.labels" . | nindent 4 }}
longhorn-manager: ""
name: volumeattachments.longhorn.io
spec:
group: longhorn.io
names:
kind: VolumeAttachment
listKind: VolumeAttachmentList
plural: volumeattachments
shortNames:
- lhva
singular: volumeattachment
scope: Namespaced
versions:
- additionalPrinterColumns:
- jsonPath: .metadata.creationTimestamp
name: Age
type: date
name: v1beta2
schema:
openAPIV3Schema:
description: VolumeAttachment stores attachment information of a Longhorn volume
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: VolumeAttachmentSpec defines the desired state of Longhorn VolumeAttachment
properties:
attachmentTickets:
additionalProperties:
properties:
generation:
description: A sequence number representing a specific generation of the desired state. Populated by the system. Read-only.
format: int64
type: integer
id:
description: The unique ID of this attachment. Used to differentiate different attachments of the same volume.
type: string
nodeID:
description: The node that this attachment is requesting
type: string
parameters:
additionalProperties:
type: string
description: Optional additional parameter for this attachment
type: object
type:
type: string
type: object
type: object
volume:
description: The name of Longhorn volume of this VolumeAttachment
type: string
required:
- volume
type: object
status:
description: VolumeAttachmentStatus defines the observed state of Longhorn VolumeAttachment
properties:
attachmentTicketStatuses:
additionalProperties:
properties:
conditions:
description: Record any error when trying to fulfill this attachment
items:
properties:
lastProbeTime:
description: Last time we probed the condition.
type: string
lastTransitionTime:
description: Last time the condition transitioned from one status to another.
type: string
message:
description: Human-readable message indicating details about last transition.
type: string
reason:
description: Unique, one-word, CamelCase reason for the condition's last transition.
type: string
status:
description: Status is the status of the condition. Can be True, False, Unknown.
type: string
type:
description: Type is the type of the condition.
type: string
type: object
nullable: true
type: array
generation:
description: A sequence number representing a specific generation of the desired state. Populated by the system. Read-only.
format: int64
type: integer
id:
description: The unique ID of this attachment. Used to differentiate different attachments of the same volume.
type: string
satisfied:
description: Indicate whether this attachment ticket has been satisfied
type: boolean
required:
- conditions
- satisfied
type: object
type: object
type: object
type: object
served: true
storage: true
subresources:
status: {}
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
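
The RecurringJob CRD change earlier in this file widens `spec.task` from `snapshot`/`backup` to seven values, including `filesystem-trim`. A sketch of a job using the new task follows. The object name, namespace, schedule, and group are hypothetical. The `cron` and `concurrency` fields are assumed to exist on the RecurringJob spec (this excerpt shows them only on the removed per-volume `recurringJobs` struct), and whether `retain` is honored for a trim task is not stated here.

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: weekly-trim            # hypothetical name
  namespace: longhorn-system   # assumed install namespace
spec:
  name: weekly-trim
  task: filesystem-trim        # one of the enum values added in this diff
  cron: "0 3 * * 0"            # assumed field; Sundays at 03:00
  groups:
    - default                  # hypothetical group
  retain: 0                    # retain count; likely unused for a trim task
  concurrency: 1               # assumed field
```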


@ -18,10 +18,6 @@ spec:
{{- toYaml . | nindent 8 }} {{- toYaml . | nindent 8 }}
{{- end }} {{- end }}
spec: spec:
initContainers:
- name: wait-longhorn-admission-webhook
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
command: ['sh', '-c', 'while [ $(curl -m 1 -s -o /dev/null -w "%{http_code}" -k https://longhorn-admission-webhook:9443/v1/healthz) != "200" ]; do echo waiting; sleep 2; done']
containers: containers:
- name: longhorn-manager - name: longhorn-manager
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }} image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
@ -52,9 +48,17 @@ spec:
ports: ports:
- containerPort: 9500 - containerPort: 9500
name: manager name: manager
- containerPort: 9501
name: conversion-wh
- containerPort: 9502
name: admission-wh
- containerPort: 9503
name: recov-backend
readinessProbe: readinessProbe:
tcpSocket: httpGet:
port: 9500 path: /v1/healthz
port: 9501
scheme: HTTPS
volumeMounts: volumeMounts:
- name: dev - name: dev
mountPath: /host/dev/ mountPath: /host/dev/


@ -15,6 +15,7 @@ data:
{{ if not (kindIs "invalid" .Values.defaultSettings.replicaAutoBalance) }}replica-auto-balance: {{ .Values.defaultSettings.replicaAutoBalance }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.replicaAutoBalance) }}replica-auto-balance: {{ .Values.defaultSettings.replicaAutoBalance }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.storageOverProvisioningPercentage) }}storage-over-provisioning-percentage: {{ .Values.defaultSettings.storageOverProvisioningPercentage }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.storageOverProvisioningPercentage) }}storage-over-provisioning-percentage: {{ .Values.defaultSettings.storageOverProvisioningPercentage }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.storageMinimalAvailablePercentage) }}storage-minimal-available-percentage: {{ .Values.defaultSettings.storageMinimalAvailablePercentage }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.storageMinimalAvailablePercentage) }}storage-minimal-available-percentage: {{ .Values.defaultSettings.storageMinimalAvailablePercentage }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.storageReservedPercentageForDefaultDisk) }}storage-reserved-percentage-for-default-disk: {{ .Values.defaultSettings.storageReservedPercentageForDefaultDisk }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.upgradeChecker) }}upgrade-checker: {{ .Values.defaultSettings.upgradeChecker }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.upgradeChecker) }}upgrade-checker: {{ .Values.defaultSettings.upgradeChecker }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.defaultReplicaCount) }}default-replica-count: {{ .Values.defaultSettings.defaultReplicaCount }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.defaultReplicaCount) }}default-replica-count: {{ .Values.defaultSettings.defaultReplicaCount }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.defaultDataLocality) }}default-data-locality: {{ .Values.defaultSettings.defaultDataLocality }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.defaultDataLocality) }}default-data-locality: {{ .Values.defaultSettings.defaultDataLocality }}{{ end }}
@ -50,10 +51,9 @@ data:
{{ if not (kindIs "invalid" .Values.defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly) }}auto-delete-pod-when-volume-detached-unexpectedly: {{ .Values.defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly) }}auto-delete-pod-when-volume-detached-unexpectedly: {{ .Values.defaultSettings.autoDeletePodWhenVolumeDetachedUnexpectedly }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.disableSchedulingOnCordonedNode) }}disable-scheduling-on-cordoned-node: {{ .Values.defaultSettings.disableSchedulingOnCordonedNode }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.disableSchedulingOnCordonedNode) }}disable-scheduling-on-cordoned-node: {{ .Values.defaultSettings.disableSchedulingOnCordonedNode }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.replicaZoneSoftAntiAffinity) }}replica-zone-soft-anti-affinity: {{ .Values.defaultSettings.replicaZoneSoftAntiAffinity }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.replicaZoneSoftAntiAffinity) }}replica-zone-soft-anti-affinity: {{ .Values.defaultSettings.replicaZoneSoftAntiAffinity }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.replicaDiskSoftAntiAffinity) }}replica-disk-soft-anti-affinity: {{ .Values.defaultSettings.replicaDiskSoftAntiAffinity }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.nodeDownPodDeletionPolicy) }}node-down-pod-deletion-policy: {{ .Values.defaultSettings.nodeDownPodDeletionPolicy }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.nodeDownPodDeletionPolicy) }}node-down-pod-deletion-policy: {{ .Values.defaultSettings.nodeDownPodDeletionPolicy }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.allowNodeDrainWithLastHealthyReplica) }}allow-node-drain-with-last-healthy-replica: {{ .Values.defaultSettings.allowNodeDrainWithLastHealthyReplica }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.nodeDrainPolicy) }}node-drain-policy: {{ .Values.defaultSettings.nodeDrainPolicy }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.mkfsExt4Parameters) }}mkfs-ext4-parameters: {{ .Values.defaultSettings.mkfsExt4Parameters }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.disableReplicaRebuild) }}disable-replica-rebuild: {{ .Values.defaultSettings.disableReplicaRebuild }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.replicaReplenishmentWaitInterval) }}replica-replenishment-wait-interval: {{ .Values.defaultSettings.replicaReplenishmentWaitInterval }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.replicaReplenishmentWaitInterval) }}replica-replenishment-wait-interval: {{ .Values.defaultSettings.replicaReplenishmentWaitInterval }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.concurrentReplicaRebuildPerNodeLimit) }}concurrent-replica-rebuild-per-node-limit: {{ .Values.defaultSettings.concurrentReplicaRebuildPerNodeLimit }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.concurrentReplicaRebuildPerNodeLimit) }}concurrent-replica-rebuild-per-node-limit: {{ .Values.defaultSettings.concurrentReplicaRebuildPerNodeLimit }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.concurrentVolumeBackupRestorePerNodeLimit) }}concurrent-volume-backup-restore-per-node-limit: {{ .Values.defaultSettings.concurrentVolumeBackupRestorePerNodeLimit }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.concurrentVolumeBackupRestorePerNodeLimit) }}concurrent-volume-backup-restore-per-node-limit: {{ .Values.defaultSettings.concurrentVolumeBackupRestorePerNodeLimit }}{{ end }}
@ -64,8 +64,7 @@ data:
{{ if not (kindIs "invalid" .Values.defaultSettings.concurrentAutomaticEngineUpgradePerNodeLimit) }}concurrent-automatic-engine-upgrade-per-node-limit: {{ .Values.defaultSettings.concurrentAutomaticEngineUpgradePerNodeLimit }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.concurrentAutomaticEngineUpgradePerNodeLimit) }}concurrent-automatic-engine-upgrade-per-node-limit: {{ .Values.defaultSettings.concurrentAutomaticEngineUpgradePerNodeLimit }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.backingImageCleanupWaitInterval) }}backing-image-cleanup-wait-interval: {{ .Values.defaultSettings.backingImageCleanupWaitInterval }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.backingImageCleanupWaitInterval) }}backing-image-cleanup-wait-interval: {{ .Values.defaultSettings.backingImageCleanupWaitInterval }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.backingImageRecoveryWaitInterval) }}backing-image-recovery-wait-interval: {{ .Values.defaultSettings.backingImageRecoveryWaitInterval }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.backingImageRecoveryWaitInterval) }}backing-image-recovery-wait-interval: {{ .Values.defaultSettings.backingImageRecoveryWaitInterval }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.guaranteedEngineManagerCPU) }}guaranteed-engine-manager-cpu: {{ .Values.defaultSettings.guaranteedEngineManagerCPU }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.guaranteedInstanceManagerCPU) }}guaranteed-instance-manager-cpu: {{ .Values.defaultSettings.guaranteedInstanceManagerCPU }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.guaranteedReplicaManagerCPU) }}guaranteed-replica-manager-cpu: {{ .Values.defaultSettings.guaranteedReplicaManagerCPU }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.kubernetesClusterAutoscalerEnabled) }}kubernetes-cluster-autoscaler-enabled: {{ .Values.defaultSettings.kubernetesClusterAutoscalerEnabled }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.kubernetesClusterAutoscalerEnabled) }}kubernetes-cluster-autoscaler-enabled: {{ .Values.defaultSettings.kubernetesClusterAutoscalerEnabled }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.orphanAutoDeletion) }}orphan-auto-deletion: {{ .Values.defaultSettings.orphanAutoDeletion }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.orphanAutoDeletion) }}orphan-auto-deletion: {{ .Values.defaultSettings.orphanAutoDeletion }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.storageNetwork) }}storage-network: {{ .Values.defaultSettings.storageNetwork }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.storageNetwork) }}storage-network: {{ .Values.defaultSettings.storageNetwork }}{{ end }}
@ -76,3 +75,12 @@ data:
{{ if not (kindIs "invalid" .Values.defaultSettings.snapshotDataIntegrityCronjob) }}snapshot-data-integrity-cronjob: {{ .Values.defaultSettings.snapshotDataIntegrityCronjob }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.snapshotDataIntegrityCronjob) }}snapshot-data-integrity-cronjob: {{ .Values.defaultSettings.snapshotDataIntegrityCronjob }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.removeSnapshotsDuringFilesystemTrim) }}remove-snapshots-during-filesystem-trim: {{ .Values.defaultSettings.removeSnapshotsDuringFilesystemTrim }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.removeSnapshotsDuringFilesystemTrim) }}remove-snapshots-during-filesystem-trim: {{ .Values.defaultSettings.removeSnapshotsDuringFilesystemTrim }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.fastReplicaRebuildEnabled) }}fast-replica-rebuild-enabled: {{ .Values.defaultSettings.fastReplicaRebuildEnabled }}{{ end }} {{ if not (kindIs "invalid" .Values.defaultSettings.fastReplicaRebuildEnabled) }}fast-replica-rebuild-enabled: {{ .Values.defaultSettings.fastReplicaRebuildEnabled }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.replicaFileSyncHttpClientTimeout) }}replica-file-sync-http-client-timeout: {{ .Values.defaultSettings.replicaFileSyncHttpClientTimeout }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.logLevel) }}log-level: {{ .Values.defaultSettings.logLevel }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.backupCompressionMethod) }}backup-compression-method: {{ .Values.defaultSettings.backupCompressionMethod }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.backupConcurrentLimit) }}backup-concurrent-limit: {{ .Values.defaultSettings.backupConcurrentLimit }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.restoreConcurrentLimit) }}restore-concurrent-limit: {{ .Values.defaultSettings.restoreConcurrentLimit }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.v2DataEngine) }}v2-data-engine: {{ .Values.defaultSettings.v2DataEngine }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.offlineReplicaRebuilding) }}offline-replica-rebuilding: {{ .Values.defaultSettings.offlineReplicaRebuilding }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.allowEmptyNodeSelectorVolume) }}allow-empty-node-selector-volume: {{ .Values.defaultSettings.allowEmptyNodeSelectorVolume }}{{ end }}
{{ if not (kindIs "invalid" .Values.defaultSettings.allowEmptyDiskSelectorVolume) }}allow-empty-disk-selector-volume: {{ .Values.defaultSettings.allowEmptyDiskSelectorVolume }}{{ end }}

@ -1,73 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels: {{- include "longhorn.labels" . | nindent 4 }}
app: longhorn-recovery-backend
name: longhorn-recovery-backend
namespace: {{ include "release_namespace" . }}
spec:
replicas: 2
selector:
matchLabels:
app: longhorn-recovery-backend
template:
metadata:
labels: {{- include "longhorn.labels" . | nindent 8 }}
app: longhorn-recovery-backend
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- longhorn-recovery-backend
topologyKey: kubernetes.io/hostname
containers:
- name: longhorn-recovery-backend
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
runAsUser: 2000
command:
- longhorn-manager
- recovery-backend
- --service-account
- longhorn-service-account
ports:
- containerPort: 9600
name: recov-backend
readinessProbe:
tcpSocket:
port: 9600
initialDelaySeconds: 3
periodSeconds: 5
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- if .Values.privateRegistry.registrySecret }}
imagePullSecrets:
- name: {{ .Values.privateRegistry.registrySecret }}
{{- end }}
{{- if .Values.longhornDriver.priorityClass }}
priorityClassName: {{ .Values.longhornDriver.priorityClass | quote}}
{{- end }}
{{- if .Values.longhornDriver.tolerations }}
tolerations:
{{ toYaml .Values.longhornDriver.tolerations | indent 6 }}
{{- end }}
{{- if .Values.longhornDriver.nodeSelector }}
nodeSelector:
{{ toYaml .Values.longhornDriver.nodeSelector | indent 8 }}
{{- end }}
serviceAccountName: longhorn-service-account

@ -1,3 +1,41 @@
{{- if .Values.openshift.enabled }}
{{- if .Values.openshift.ui.route }}
# https://github.com/openshift/oauth-proxy/blob/master/contrib/sidecar.yaml
# Create a proxy service account and ensure it will use the route "proxy"
# Create a secure connection to the proxy via a route
apiVersion: route.openshift.io/v1
kind: Route
metadata:
labels: {{- include "longhorn.labels" . | nindent 4 }}
app: longhorn-ui
name: {{ .Values.openshift.ui.route }}
namespace: {{ include "release_namespace" . }}
spec:
to:
kind: Service
name: longhorn-ui
tls:
termination: reencrypt
---
apiVersion: v1
kind: Service
metadata:
labels: {{- include "longhorn.labels" . | nindent 4 }}
app: longhorn-ui
name: longhorn-ui
namespace: {{ include "release_namespace" . }}
annotations:
service.alpha.openshift.io/serving-cert-secret-name: longhorn-ui-tls
spec:
ports:
- name: longhorn-ui
port: {{ .Values.openshift.ui.port | default 443 }}
targetPort: {{ .Values.openshift.ui.proxy | default 8443 }}
selector:
app: longhorn-ui
---
{{- end }}
{{- end }}
apiVersion: apps/v1 apiVersion: apps/v1
kind: Deployment kind: Deployment
metadata: metadata:
@ -15,7 +53,42 @@ spec:
labels: {{- include "longhorn.labels" . | nindent 8 }} labels: {{- include "longhorn.labels" . | nindent 8 }}
app: longhorn-ui app: longhorn-ui
spec: spec:
serviceAccountName: longhorn-ui-service-account
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- longhorn-ui
topologyKey: kubernetes.io/hostname
containers: containers:
{{- if .Values.openshift.enabled }}
{{- if .Values.openshift.ui.route }}
- name: oauth-proxy
image: {{ template "registry_url" . }}{{ .Values.image.openshift.oauthProxy.repository }}:{{ .Values.image.openshift.oauthProxy.tag }}
imagePullPolicy: IfNotPresent
ports:
- containerPort: {{ .Values.openshift.ui.proxy | default 8443 }}
name: public
args:
- --https-address=:{{ .Values.openshift.ui.proxy | default 8443 }}
- --provider=openshift
- --openshift-service-account=longhorn-ui-service-account
- --upstream=http://localhost:8000
- --tls-cert=/etc/tls/private/tls.crt
- --tls-key=/etc/tls/private/tls.key
- --cookie-secret=SECRET
- --openshift-sar={"namespace":"{{ include "release_namespace" . }}","group":"longhorn.io","resource":"setting","verb":"delete"}
volumeMounts:
- mountPath: /etc/tls/private
name: longhorn-ui-tls
{{- end }}
{{- end }}
- name: longhorn-ui - name: longhorn-ui
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.ui.repository }}:{{ .Values.image.longhorn.ui.tag }} image: {{ template "registry_url" . }}{{ .Values.image.longhorn.ui.repository }}:{{ .Values.image.longhorn.ui.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }} imagePullPolicy: {{ .Values.image.pullPolicy }}
@ -35,6 +108,13 @@ spec:
- name: LONGHORN_UI_PORT - name: LONGHORN_UI_PORT
value: "8000" value: "8000"
volumes: volumes:
{{- if .Values.openshift.enabled }}
{{- if .Values.openshift.ui.route }}
- name: longhorn-ui-tls
secret:
secretName: longhorn-ui-tls
{{- end }}
{{- end }}
- emptyDir: {} - emptyDir: {}
name: nginx-cache name: nginx-cache
- emptyDir: {} - emptyDir: {}

@ -1,166 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels: {{- include "longhorn.labels" . | nindent 4 }}
app: longhorn-conversion-webhook
name: longhorn-conversion-webhook
namespace: {{ include "release_namespace" . }}
spec:
replicas: 2
selector:
matchLabels:
app: longhorn-conversion-webhook
template:
metadata:
labels: {{- include "longhorn.labels" . | nindent 8 }}
app: longhorn-conversion-webhook
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- longhorn-conversion-webhook
topologyKey: kubernetes.io/hostname
containers:
- name: longhorn-conversion-webhook
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
runAsUser: 2000
command:
- longhorn-manager
- conversion-webhook
- --service-account
- longhorn-service-account
ports:
- containerPort: 9443
name: conversion-wh
readinessProbe:
tcpSocket:
port: 9443
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
{{- if .Values.privateRegistry.registrySecret }}
imagePullSecrets:
- name: {{ .Values.privateRegistry.registrySecret }}
{{- end }}
{{- if .Values.longhornDriver.priorityClass }}
priorityClassName: {{ .Values.longhornDriver.priorityClass | quote }}
{{- end }}
{{- if or .Values.longhornDriver.tolerations .Values.global.cattle.windowsCluster.enabled }}
tolerations:
{{- if and .Values.global.cattle.windowsCluster.enabled .Values.global.cattle.windowsCluster.tolerations }}
{{ toYaml .Values.global.cattle.windowsCluster.tolerations | indent 6 }}
{{- end }}
{{- if .Values.longhornDriver.tolerations }}
{{ toYaml .Values.longhornDriver.tolerations | indent 6 }}
{{- end }}
{{- end }}
{{- if or .Values.longhornDriver.nodeSelector .Values.global.cattle.windowsCluster.enabled }}
nodeSelector:
{{- if and .Values.global.cattle.windowsCluster.enabled .Values.global.cattle.windowsCluster.nodeSelector }}
{{ toYaml .Values.global.cattle.windowsCluster.nodeSelector | indent 8 }}
{{- end }}
{{- if .Values.longhornDriver.nodeSelector }}
{{ toYaml .Values.longhornDriver.nodeSelector | indent 8 }}
{{- end }}
{{- end }}
serviceAccountName: longhorn-service-account
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels: {{- include "longhorn.labels" . | nindent 4 }}
app: longhorn-admission-webhook
name: longhorn-admission-webhook
namespace: {{ include "release_namespace" . }}
spec:
replicas: 2
selector:
matchLabels:
app: longhorn-admission-webhook
template:
metadata:
labels: {{- include "longhorn.labels" . | nindent 8 }}
app: longhorn-admission-webhook
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- longhorn-admission-webhook
topologyKey: kubernetes.io/hostname
initContainers:
- name: wait-longhorn-conversion-webhook
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
command: ['sh', '-c', 'while [ $(curl -m 1 -s -o /dev/null -w "%{http_code}" -k https://longhorn-conversion-webhook:9443/v1/healthz) != "200" ]; do echo waiting; sleep 2; done']
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
runAsUser: 2000
containers:
- name: longhorn-admission-webhook
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
runAsUser: 2000
command:
- longhorn-manager
- admission-webhook
- --service-account
- longhorn-service-account
ports:
- containerPort: 9443
name: admission-wh
readinessProbe:
tcpSocket:
port: 9443
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- if .Values.privateRegistry.registrySecret }}
imagePullSecrets:
- name: {{ .Values.privateRegistry.registrySecret }}
{{- end }}
{{- if .Values.longhornDriver.priorityClass }}
priorityClassName: {{ .Values.longhornDriver.priorityClass | quote }}
{{- end }}
{{- if or .Values.longhornDriver.tolerations .Values.global.cattle.windowsCluster.enabled }}
tolerations:
{{- if and .Values.global.cattle.windowsCluster.enabled .Values.global.cattle.windowsCluster.tolerations }}
{{ toYaml .Values.global.cattle.windowsCluster.tolerations | indent 6 }}
{{- end }}
{{- if .Values.longhornDriver.tolerations }}
{{ toYaml .Values.longhornDriver.tolerations | indent 6 }}
{{- end }}
{{- end }}
{{- if or .Values.longhornDriver.nodeSelector .Values.global.cattle.windowsCluster.enabled }}
nodeSelector:
{{- if and .Values.global.cattle.windowsCluster.enabled .Values.global.cattle.windowsCluster.nodeSelector }}
{{ toYaml .Values.global.cattle.windowsCluster.nodeSelector | indent 8 }}
{{- end }}
{{- if or .Values.longhornDriver.nodeSelector }}
{{ toYaml .Values.longhornDriver.nodeSelector | indent 8 }}
{{- end }}
{{- end }}
serviceAccountName: longhorn-service-account

@ -0,0 +1,27 @@
{{- if .Values.networkPolicies.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backing-image-data-source
namespace: longhorn-system
spec:
podSelector:
matchLabels:
longhorn.io/component: backing-image-data-source
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: longhorn-manager
- podSelector:
matchLabels:
longhorn.io/component: instance-manager
- podSelector:
matchLabels:
longhorn.io/component: backing-image-manager
- podSelector:
matchLabels:
longhorn.io/component: backing-image-data-source
{{- end }}

@ -0,0 +1,27 @@
{{- if .Values.networkPolicies.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backing-image-manager
namespace: longhorn-system
spec:
podSelector:
matchLabels:
longhorn.io/component: backing-image-manager
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: longhorn-manager
- podSelector:
matchLabels:
longhorn.io/component: instance-manager
- podSelector:
matchLabels:
longhorn.io/component: backing-image-manager
- podSelector:
matchLabels:
longhorn.io/component: backing-image-data-source
{{- end }}

@ -0,0 +1,27 @@
{{- if .Values.networkPolicies.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: instance-manager
namespace: longhorn-system
spec:
podSelector:
matchLabels:
longhorn.io/component: instance-manager
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: longhorn-manager
- podSelector:
matchLabels:
longhorn.io/component: instance-manager
- podSelector:
matchLabels:
longhorn.io/component: backing-image-manager
- podSelector:
matchLabels:
longhorn.io/component: backing-image-data-source
{{- end }}

@ -0,0 +1,35 @@
{{- if .Values.networkPolicies.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: longhorn-manager
namespace: longhorn-system
spec:
podSelector:
matchLabels:
app: longhorn-manager
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: longhorn-manager
- podSelector:
matchLabels:
app: longhorn-ui
- podSelector:
matchLabels:
app: longhorn-csi-plugin
- podSelector:
matchLabels:
longhorn.io/managed-by: longhorn-manager
matchExpressions:
- { key: recurring-job.longhorn.io, operator: Exists }
- podSelector:
matchExpressions:
- { key: longhorn.io/job-task, operator: Exists }
- podSelector:
matchLabels:
app: longhorn-driver-deployer
{{- end }}

@ -0,0 +1,17 @@
{{- if .Values.networkPolicies.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: longhorn-recovery-backend
namespace: longhorn-system
spec:
podSelector:
matchLabels:
app: longhorn-manager
policyTypes:
- Ingress
ingress:
- ports:
- protocol: TCP
port: 9503
{{- end }}

@ -0,0 +1,46 @@
{{- if and .Values.networkPolicies.enabled .Values.ingress.enabled (not (eq .Values.networkPolicies.type "")) }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: longhorn-ui-frontend
namespace: longhorn-system
spec:
podSelector:
matchLabels:
app: longhorn-ui
policyTypes:
- Ingress
ingress:
- from:
{{- if eq .Values.networkPolicies.type "rke1" }}
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: ingress-nginx
podSelector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
{{- else if eq .Values.networkPolicies.type "rke2" }}
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: rke2-ingress-nginx
app.kubernetes.io/name: rke2-ingress-nginx
{{- else if eq .Values.networkPolicies.type "k3s" }}
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kube-system
podSelector:
matchLabels:
app.kubernetes.io/name: traefik
ports:
- port: 8000
protocol: TCP
- port: 80
protocol: TCP
{{- end }}
{{- end }}
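# Illustrative values sketch (an assumption, not part of this change): per the
# condition at the top of this template, the UI frontend policy is rendered only
# when networkPolicies.enabled and ingress.enabled are both true and
# networkPolicies.type is non-empty. The template adds an ingress source for
# the types rke1, rke2, and k3s.
#
# networkPolicies:
#   enabled: true
#   type: "rke2"
# ingress:
#   enabled: true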

@ -0,0 +1,33 @@
{{- if .Values.networkPolicies.enabled }}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: longhorn-conversion-webhook
namespace: longhorn-system
spec:
podSelector:
matchLabels:
app: longhorn-manager
policyTypes:
- Ingress
ingress:
- ports:
- protocol: TCP
port: 9501
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: longhorn-admission-webhook
namespace: longhorn-system
spec:
podSelector:
matchLabels:
app: longhorn-manager
policyTypes:
- Ingress
ingress:
- ports:
- protocol: TCP
port: 9502
{{- end }}

@ -19,8 +19,6 @@ spec:
- name: longhorn-post-upgrade - name: longhorn-post-upgrade
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }} image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }} imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
privileged: true
command: command:
- longhorn-manager - longhorn-manager
- post-upgrade - post-upgrade

@ -0,0 +1,58 @@
{{- if .Values.helmPreUpgradeCheckerJob.enabled }}
apiVersion: batch/v1
kind: Job
metadata:
annotations:
"helm.sh/hook": pre-upgrade
"helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation,hook-failed
name: longhorn-pre-upgrade
namespace: {{ include "release_namespace" . }}
labels: {{- include "longhorn.labels" . | nindent 4 }}
spec:
activeDeadlineSeconds: 900
backoffLimit: 1
template:
metadata:
name: longhorn-pre-upgrade
labels: {{- include "longhorn.labels" . | nindent 8 }}
spec:
containers:
- name: longhorn-pre-upgrade
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
command:
- longhorn-manager
- pre-upgrade
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
restartPolicy: OnFailure
{{- if .Values.privateRegistry.registrySecret }}
imagePullSecrets:
- name: {{ .Values.privateRegistry.registrySecret }}
{{- end }}
{{- if .Values.longhornManager.priorityClass }}
priorityClassName: {{ .Values.longhornManager.priorityClass | quote }}
{{- end }}
serviceAccountName: longhorn-service-account
{{- if or .Values.longhornManager.tolerations .Values.global.cattle.windowsCluster.enabled }}
tolerations:
{{- if and .Values.global.cattle.windowsCluster.enabled .Values.global.cattle.windowsCluster.tolerations }}
{{ toYaml .Values.global.cattle.windowsCluster.tolerations | indent 6 }}
{{- end }}
{{- if .Values.longhornManager.tolerations }}
{{ toYaml .Values.longhornManager.tolerations | indent 6 }}
{{- end }}
{{- end }}
{{- if or .Values.longhornManager.nodeSelector .Values.global.cattle.windowsCluster.enabled }}
nodeSelector:
{{- if and .Values.global.cattle.windowsCluster.enabled .Values.global.cattle.windowsCluster.nodeSelector }}
{{ toYaml .Values.global.cattle.windowsCluster.nodeSelector | indent 8 }}
{{- end }}
{{- if .Values.longhornManager.nodeSelector }}
{{ toYaml .Values.longhornManager.nodeSelector | indent 8 }}
{{- end }}
{{- end }}
{{- end }}
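# Illustrative values sketch (an assumption, not part of this change): the Job
# above is gated on helmPreUpgradeCheckerJob.enabled, so a values override
# along these lines would render the pre-upgrade hook.
#
# helmPreUpgradeCheckerJob:
#   enabled: true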

@ -11,6 +11,25 @@ metadata:
--- ---
apiVersion: v1 apiVersion: v1
kind: ServiceAccount kind: ServiceAccount
metadata:
name: longhorn-ui-service-account
namespace: {{ include "release_namespace" . }}
labels: {{- include "longhorn.labels" . | nindent 4 }}
{{- with .Values.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- if .Values.openshift.enabled }}
{{- if .Values.openshift.ui.route }}
{{- if not .Values.serviceAccount.annotations }}
annotations:
{{- end }}
serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"longhorn-ui"}}'
{{- end }}
{{- end }}
---
apiVersion: v1
kind: ServiceAccount
metadata: metadata:
name: longhorn-support-bundle name: longhorn-support-bundle
namespace: {{ include "release_namespace" . }} namespace: {{ include "release_namespace" . }}

@ -9,10 +9,10 @@ spec:
type: ClusterIP type: ClusterIP
sessionAffinity: ClientIP sessionAffinity: ClientIP
selector: selector:
app: longhorn-conversion-webhook app: longhorn-manager
ports: ports:
- name: conversion-webhook - name: conversion-webhook
port: 9443 port: 9501
targetPort: conversion-wh targetPort: conversion-wh
--- ---
apiVersion: v1 apiVersion: v1
@ -26,10 +26,10 @@ spec:
type: ClusterIP type: ClusterIP
sessionAffinity: ClientIP sessionAffinity: ClientIP
selector: selector:
app: longhorn-admission-webhook app: longhorn-manager
ports: ports:
- name: admission-webhook - name: admission-webhook
port: 9443 port: 9502
targetPort: admission-wh targetPort: admission-wh
--- ---
apiVersion: v1 apiVersion: v1
@ -43,10 +43,10 @@ spec:
type: ClusterIP type: ClusterIP
sessionAffinity: ClientIP sessionAffinity: ClientIP
selector: selector:
app: longhorn-recovery-backend app: longhorn-manager
ports: ports:
- name: recovery-backend - name: recovery-backend
port: 9600 port: 9503
targetPort: recov-backend targetPort: recov-backend
--- ---
apiVersion: v1 apiVersion: v1

@ -19,8 +19,6 @@ spec:
- name: longhorn-uninstall - name: longhorn-uninstall
image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }} image: {{ template "registry_url" . }}{{ .Values.image.longhorn.manager.repository }}:{{ .Values.image.longhorn.manager.tag }}
imagePullPolicy: {{ .Values.image.pullPolicy }} imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
privileged: true
command: command:
- longhorn-manager - longhorn-manager
- uninstall - uninstall

@ -0,0 +1,7 @@
#{{- if gt (len (lookup "rbac.authorization.k8s.io/v1" "ClusterRole" "" "")) 0 -}}
#{{- if .Values.enablePSP }}
#{{- if not (.Capabilities.APIVersions.Has "policy/v1beta1/PodSecurityPolicy") }}
#{{- fail "The target cluster does not have the PodSecurityPolicy API resource. Please disable PSPs in this chart before proceeding." -}}
#{{- end }}
#{{- end }}
#{{- end }}

@ -3,172 +3,350 @@
# Declare variables to be passed into your templates. # Declare variables to be passed into your templates.
global: global:
cattle: cattle:
# -- System default registry
systemDefaultRegistry: "" systemDefaultRegistry: ""
windowsCluster: windowsCluster:
# Enable this to allow Longhorn to run on the Rancher deployed Windows cluster # -- Enable this to allow Longhorn to run on the Rancher deployed Windows cluster
enabled: false enabled: false
# Tolerate Linux node taint # -- Tolerate Linux nodes to run Longhorn user deployed components
tolerations: tolerations:
- key: "cattle.io/os" - key: "cattle.io/os"
value: "linux" value: "linux"
effect: "NoSchedule" effect: "NoSchedule"
operator: "Equal" operator: "Equal"
# Select Linux nodes # -- Select Linux nodes to run Longhorn user deployed components
nodeSelector: nodeSelector:
kubernetes.io/os: "linux" kubernetes.io/os: "linux"
# Recognize toleration and node selector for Longhorn run-time created components
defaultSetting: defaultSetting:
# -- Toleration for Longhorn system managed components
taintToleration: cattle.io/os=linux:NoSchedule taintToleration: cattle.io/os=linux:NoSchedule
# -- Node selector for Longhorn system managed components
systemManagedComponentsNodeSelector: kubernetes.io/os:linux systemManagedComponentsNodeSelector: kubernetes.io/os:linux
networkPolicies:
# -- Enable NetworkPolicies to limit access to the Longhorn pods
enabled: false
# -- Create the policy based on your distribution to allow access for the ingress. Options: `k3s`, `rke2`, `rke1`
type: "k3s"
image: image:
longhorn: longhorn:
engine: engine:
# -- Specify Longhorn engine image repository
repository: longhornio/longhorn-engine repository: longhornio/longhorn-engine
# -- Specify Longhorn engine image tag
tag: master-head tag: master-head
manager: manager:
# -- Specify Longhorn manager image repository
repository: longhornio/longhorn-manager repository: longhornio/longhorn-manager
# -- Specify Longhorn manager image tag
tag: master-head tag: master-head
ui: ui:
# -- Specify Longhorn ui image repository
repository: longhornio/longhorn-ui repository: longhornio/longhorn-ui
# -- Specify Longhorn ui image tag
tag: master-head tag: master-head
instanceManager: instanceManager:
# -- Specify Longhorn instance manager image repository
repository: longhornio/longhorn-instance-manager repository: longhornio/longhorn-instance-manager
# -- Specify Longhorn instance manager image tag
tag: master-head tag: master-head
shareManager: shareManager:
# -- Specify Longhorn share manager image repository
repository: longhornio/longhorn-share-manager repository: longhornio/longhorn-share-manager
# -- Specify Longhorn share manager image tag
tag: master-head tag: master-head
backingImageManager: backingImageManager:
# -- Specify Longhorn backing image manager image repository
repository: longhornio/backing-image-manager repository: longhornio/backing-image-manager
# -- Specify Longhorn backing image manager image tag
tag: master-head tag: master-head
supportBundleKit:
+ # -- Specify Longhorn support bundle manager image repository
repository: longhornio/support-bundle-kit
- tag: v0.0.16
+ # -- Specify Longhorn support bundle manager image tag
+ tag: v0.0.27
csi:
attacher:
+ # -- Specify CSI attacher image repository. Leave blank to autodetect
repository: longhornio/csi-attacher
- tag: v3.4.0
+ # -- Specify CSI attacher image tag. Leave blank to autodetect
+ tag: v4.2.0
provisioner:
+ # -- Specify CSI provisioner image repository. Leave blank to autodetect
repository: longhornio/csi-provisioner
- tag: v2.1.2
+ # -- Specify CSI provisioner image tag. Leave blank to autodetect
+ tag: v3.4.1
nodeDriverRegistrar:
+ # -- Specify CSI node driver registrar image repository. Leave blank to autodetect
repository: longhornio/csi-node-driver-registrar
- tag: v2.5.0
+ # -- Specify CSI node driver registrar image tag. Leave blank to autodetect
+ tag: v2.7.0
resizer:
+ # -- Specify CSI driver resizer image repository. Leave blank to autodetect
repository: longhornio/csi-resizer
- tag: v1.3.0
+ # -- Specify CSI driver resizer image tag. Leave blank to autodetect
+ tag: v1.7.0
snapshotter:
+ # -- Specify CSI driver snapshotter image repository. Leave blank to autodetect
repository: longhornio/csi-snapshotter
- tag: v5.0.1
+ # -- Specify CSI driver snapshotter image tag. Leave blank to autodetect.
+ tag: v6.2.1
livenessProbe:
+ # -- Specify CSI liveness probe image repository. Leave blank to autodetect
repository: longhornio/livenessprobe
- tag: v2.8.0
+ # -- Specify CSI liveness probe image tag. Leave blank to autodetect
+ tag: v2.9.0
openshift:
oauthProxy:
# -- For OpenShift users. Specify oauth proxy image repository
repository: quay.io/openshift/origin-oauth-proxy
# -- For OpenShift users. Specify oauth proxy image tag. Note: Use your OCP/OKD 4.X version; the current stable is 4.14
tag: 4.14
# -- Image pull policy which applies to all user deployed Longhorn Components. e.g, Longhorn manager, Longhorn driver, Longhorn UI
pullPolicy: IfNotPresent pullPolicy: IfNotPresent
service: service:
ui: ui:
# -- Define Longhorn UI service type. Options: `ClusterIP`, `NodePort`, `LoadBalancer`, `Rancher-Proxy`
type: ClusterIP type: ClusterIP
# -- NodePort port number (to set explicitly, choose port between 30000-32767)
nodePort: null nodePort: null
manager: manager:
# -- Define Longhorn manager service type.
type: ClusterIP type: ClusterIP
# -- NodePort port number (to set explicitly, choose port between 30000-32767)
nodePort: "" nodePort: ""
loadBalancerIP: ""
loadBalancerSourceRanges: ""
persistence: persistence:
# -- Set Longhorn StorageClass as default
defaultClass: true defaultClass: true
# -- Set filesystem type for Longhorn StorageClass
defaultFsType: ext4 defaultFsType: ext4
# -- Set mkfs options for Longhorn StorageClass
defaultMkfsParams: "" defaultMkfsParams: ""
# -- Set replica count for Longhorn StorageClass
defaultClassReplicaCount: 3 defaultClassReplicaCount: 3
- defaultDataLocality: disabled # best-effort otherwise
+ # -- Set data locality for Longhorn StorageClass. Options: `disabled`, `best-effort`
+ defaultDataLocality: disabled
# -- Define reclaim policy. Options: `Retain`, `Delete`
reclaimPolicy: Delete reclaimPolicy: Delete
# -- Set volume migratable for Longhorn StorageClass
migratable: false migratable: false
recurringJobSelector: recurringJobSelector:
# -- Enable recurring job selector for Longhorn StorageClass
enable: false enable: false
# -- Recurring job selector list for Longhorn StorageClass. Please be careful of quotes of input. e.g., `[{"name":"backup", "isGroup":true}]`
jobList: [] jobList: []
backingImage: backingImage:
# -- Set backing image for Longhorn StorageClass
enable: false enable: false
# -- Specify a backing image that will be used by Longhorn volumes in Longhorn StorageClass. If it does not exist, the backing image data source type and backing image data source parameters should be specified so that Longhorn will create the backing image before using it
name: ~ name: ~
# -- Specify the data source type for the backing image used in Longhorn StorageClass.
# If the backing image does not exist, Longhorn will use this field to create a backing image. Otherwise, Longhorn will use it to verify the selected backing image.
dataSourceType: ~ dataSourceType: ~
# -- Specify the data source parameters for the backing image used in Longhorn StorageClass. This option accepts a json string of a map. e.g., `'{\"url\":\"https://backing-image-example.s3-region.amazonaws.com/test-backing-image\"}'`.
dataSourceParameters: ~ dataSourceParameters: ~
# -- Specify the expected SHA512 checksum of the selected backing image in Longhorn StorageClass
expectedChecksum: ~ expectedChecksum: ~
defaultNodeSelector: defaultNodeSelector:
- enable: false # disable by default
- selector: []
- removeSnapshotsDuringFilesystemTrim: ignored # "enabled" or "disabled" otherwise
+ # -- Enable Node selector for Longhorn StorageClass
+ enable: false
+ # -- This selector enables only certain nodes having these tags to be used for the volume. e.g. `"storage,fast"`
+ selector: ""
+ # -- Allow automatically removing snapshots during filesystem trim for Longhorn StorageClass. Options: `ignored`, `enabled`, `disabled`
+ removeSnapshotsDuringFilesystemTrim: ignored
helmPreUpgradeCheckerJob:
enabled: true
csi: csi:
# -- Specify kubelet root-dir. Leave blank to autodetect
kubeletRootDir: ~ kubeletRootDir: ~
# -- Specify replica count of CSI Attacher. Leave blank to use default count: 3
attacherReplicaCount: ~ attacherReplicaCount: ~
# -- Specify replica count of CSI Provisioner. Leave blank to use default count: 3
provisionerReplicaCount: ~ provisionerReplicaCount: ~
# -- Specify replica count of CSI Resizer. Leave blank to use default count: 3
resizerReplicaCount: ~ resizerReplicaCount: ~
# -- Specify replica count of CSI Snapshotter. Leave blank to use default count: 3
snapshotterReplicaCount: ~ snapshotterReplicaCount: ~
defaultSettings: defaultSettings:
# -- The endpoint used to access the backupstore. Available: NFS, CIFS, AWS, GCP, AZURE.
backupTarget: ~ backupTarget: ~
# -- The name of the Kubernetes secret associated with the backup target.
backupTargetCredentialSecret: ~ backupTargetCredentialSecret: ~
# -- If this setting is enabled, Longhorn will automatically attach the volume and take a snapshot/backup
# when it is time to do a recurring snapshot/backup.
allowRecurringJobWhileVolumeDetached: ~ allowRecurringJobWhileVolumeDetached: ~
# -- Create default Disk automatically only on Nodes with the label "node.longhorn.io/create-default-disk=true" if no other disks exist.
# If disabled, the default disk will be created on all new nodes when each node is first added.
createDefaultDiskLabeledNodes: ~ createDefaultDiskLabeledNodes: ~
# -- Default path to use for storing data on a host. By default "/var/lib/longhorn/"
defaultDataPath: ~ defaultDataPath: ~
# -- Longhorn volume has data locality if there is a local replica of the volume on the same node as the pod which is using the volume.
defaultDataLocality: ~ defaultDataLocality: ~
# -- Allow scheduling on nodes with existing healthy replicas of the same volume. By default false.
replicaSoftAntiAffinity: ~ replicaSoftAntiAffinity: ~
# -- Enabling this setting automatically rebalances replicas when an available node is discovered.
replicaAutoBalance: ~ replicaAutoBalance: ~
# -- The over-provisioning percentage defines how much storage can be allocated relative to the hard drive's capacity. By default 200.
storageOverProvisioningPercentage: ~ storageOverProvisioningPercentage: ~
# -- If the minimum available disk capacity exceeds the actual percentage of available disk capacity,
# the disk becomes unschedulable until more space is freed up. By default 25.
storageMinimalAvailablePercentage: ~ storageMinimalAvailablePercentage: ~
# -- The reserved percentage specifies the percentage of disk space that will not be allocated to the default disk on each new Longhorn node.
storageReservedPercentageForDefaultDisk: ~
# -- Upgrade Checker will check for new Longhorn version periodically.
# When there is a new version available, a notification will appear in the UI. By default true.
upgradeChecker: ~ upgradeChecker: ~
# -- The default number of replicas when a volume is created from the Longhorn UI.
# For Kubernetes configuration, update the `numberOfReplicas` in the StorageClass. By default 3.
defaultReplicaCount: ~ defaultReplicaCount: ~
# -- The 'storageClassName' is given to PVs and PVCs that are created for an existing Longhorn volume. The StorageClass name can also be used as a label,
# so it is possible to use a Longhorn StorageClass to bind a workload to an existing PV without creating a Kubernetes StorageClass object.
# By default 'longhorn-static'.
defaultLonghornStaticStorageClass: ~ defaultLonghornStaticStorageClass: ~
# -- In seconds. The backupstore poll interval determines how often Longhorn checks the backupstore for new backups.
# Set to 0 to disable the polling. By default 300.
backupstorePollInterval: ~ backupstorePollInterval: ~
# -- In minutes. This setting determines how long Longhorn will keep a failed backup resource. Set to 0 to disable the auto-deletion.
failedBackupTTL: ~ failedBackupTTL: ~
# -- Restore recurring jobs from the backup volume on the backup target, and create the recurring jobs if they do not exist, during a backup restoration.
restoreVolumeRecurringJobs: ~ restoreVolumeRecurringJobs: ~
# -- This setting specifies how many successful backup or snapshot job histories should be retained. History will not be retained if the value is 0.
recurringSuccessfulJobsHistoryLimit: ~ recurringSuccessfulJobsHistoryLimit: ~
# -- This setting specifies how many failed backup or snapshot job histories should be retained. History will not be retained if the value is 0.
recurringFailedJobsHistoryLimit: ~ recurringFailedJobsHistoryLimit: ~
# -- This setting specifies how many failed support bundles can exist in the cluster.
# Set this value to **0** to have Longhorn automatically purge all failed support bundles.
supportBundleFailedHistoryLimit: ~ supportBundleFailedHistoryLimit: ~
# -- taintToleration for longhorn system components
taintToleration: ~ taintToleration: ~
# -- nodeSelector for longhorn system components
systemManagedComponentsNodeSelector: ~ systemManagedComponentsNodeSelector: ~
# -- priorityClass for longhorn system components
priorityClass: ~ priorityClass: ~
# -- If enabled, volumes will be automatically salvaged when all the replicas become faulty e.g. due to network disconnection.
# Longhorn will try to figure out which replica(s) are usable, then use them for the volume. By default true.
autoSalvage: ~ autoSalvage: ~
# -- If enabled, Longhorn will automatically delete the workload pod that is managed by a controller (e.g. deployment, statefulset, daemonset, etc...)
# when Longhorn volume is detached unexpectedly (e.g. during Kubernetes upgrade, Docker reboot, or network disconnect).
# By deleting the pod, its controller restarts the pod and Kubernetes handles volume reattachment and remount.
autoDeletePodWhenVolumeDetachedUnexpectedly: ~ autoDeletePodWhenVolumeDetachedUnexpectedly: ~
# -- Prevent Longhorn manager from scheduling replicas on a cordoned Kubernetes node. By default true.
disableSchedulingOnCordonedNode: ~ disableSchedulingOnCordonedNode: ~
# -- Allow scheduling new Replicas of Volume to the Nodes in the same Zone as existing healthy Replicas.
# Nodes that don't belong to any Zone will be treated as being in the same Zone.
# Notice that Longhorn relies on label `topology.kubernetes.io/zone=<Zone name of the node>` in the Kubernetes node object to identify the zone.
# By default true.
replicaZoneSoftAntiAffinity: ~ replicaZoneSoftAntiAffinity: ~
# -- Allow scheduling on disks with existing healthy replicas of the same volume. By default true.
replicaDiskSoftAntiAffinity: ~
# -- Defines the Longhorn action when a Volume is stuck with a StatefulSet/Deployment Pod on a node that is down.
nodeDownPodDeletionPolicy: ~ nodeDownPodDeletionPolicy: ~
- allowNodeDrainWithLastHealthyReplica: ~
- mkfsExt4Parameters: ~
- disableReplicaRebuild: ~
+ # -- Define the policy to use when a node with the last healthy replica of a volume is drained.
+ nodeDrainPolicy: ~
+ # -- In seconds. The interval determines the minimum time Longhorn will wait in order to reuse the existing data on a failed replica
+ # rather than directly creating a new replica for a degraded volume.
replicaReplenishmentWaitInterval: ~ replicaReplenishmentWaitInterval: ~
# -- This setting controls how many replicas on a node can be rebuilt simultaneously.
concurrentReplicaRebuildPerNodeLimit: ~ concurrentReplicaRebuildPerNodeLimit: ~
# -- This setting controls how many volumes on a node can restore the backup concurrently. Set the value to **0** to disable backup restore.
concurrentVolumeBackupRestorePerNodeLimit: ~ concurrentVolumeBackupRestorePerNodeLimit: ~
# -- This setting is only for volumes created by UI.
# By default, this is false, meaning there will be a revision counter file to track every write to the volume.
# During salvage recovery, Longhorn will pick the replica with the largest revision counter as the candidate to recover the whole volume.
# If revision counter is disabled, Longhorn will not track every write to the volume.
# During salvage recovery, Longhorn will use the 'volume-head-xxx.img' file's last modification time and
# file size to pick the replica candidate to recover the whole volume.
disableRevisionCounter: ~ disableRevisionCounter: ~
# -- This setting defines the Image Pull Policy of Longhorn system managed pod.
# e.g. instance manager, engine image, CSI driver, etc.
# The new Image Pull Policy will only apply after the system managed pods restart.
systemManagedPodsImagePullPolicy: ~ systemManagedPodsImagePullPolicy: ~
# -- This setting allows user to create and attach a volume that doesn't have all the replicas scheduled at the time of creation.
allowVolumeCreationWithDegradedAvailability: ~ allowVolumeCreationWithDegradedAvailability: ~
# -- This setting enables Longhorn to automatically cleanup the system generated snapshot after replica rebuild is done.
autoCleanupSystemGeneratedSnapshot: ~ autoCleanupSystemGeneratedSnapshot: ~
# -- This setting controls how Longhorn automatically upgrades volumes' engines to the new default engine image after upgrading Longhorn manager.
# The value of this setting specifies the maximum number of engines per node that are allowed to upgrade to the default engine image at the same time.
# If the value is 0, Longhorn will not automatically upgrade volumes' engines to default version.
concurrentAutomaticEngineUpgradePerNodeLimit: ~ concurrentAutomaticEngineUpgradePerNodeLimit: ~
# -- This interval in minutes determines how long Longhorn will wait before cleaning up the backing image file when there is no replica in the disk using it.
backingImageCleanupWaitInterval: ~ backingImageCleanupWaitInterval: ~
# -- This interval in seconds determines how long Longhorn will wait before re-downloading the backing image file
# when all disk files of this backing image become failed or unknown.
backingImageRecoveryWaitInterval: ~ backingImageRecoveryWaitInterval: ~
- guaranteedEngineManagerCPU: ~
- guaranteedReplicaManagerCPU: ~
+ # -- This integer value indicates what percentage of the total allocatable CPU on each node will be reserved for each instance manager Pod.
+ # You can leave it with the default value, which is 12%.
+ guaranteedInstanceManagerCPU: ~
# -- Enabling this setting will notify Longhorn that the cluster is using Kubernetes Cluster Autoscaler.
kubernetesClusterAutoscalerEnabled: ~ kubernetesClusterAutoscalerEnabled: ~
# -- This setting allows Longhorn to delete the orphan resource and its corresponding orphaned data automatically like stale replicas.
# Orphan resources on down or unknown nodes will not be cleaned up automatically.
orphanAutoDeletion: ~ orphanAutoDeletion: ~
# -- Longhorn uses the storage network for in-cluster data traffic. Leave this blank to use the Kubernetes cluster network.
storageNetwork: ~ storageNetwork: ~
# -- This flag is designed to prevent Longhorn from being accidentally uninstalled, which would lead to data loss.
deletingConfirmationFlag: ~ deletingConfirmationFlag: ~
# -- In seconds. The setting specifies the timeout between the engine and replica(s), and the value should be between 8 and 30 seconds.
# The default value is 8 seconds.
engineReplicaTimeout: ~ engineReplicaTimeout: ~
# -- This setting allows users to enable or disable snapshot hashing and data integrity checking.
snapshotDataIntegrity: ~ snapshotDataIntegrity: ~
# -- Hashing snapshot disk files impacts the performance of the system.
# The immediate snapshot hashing and checking can be disabled to minimize the impact after creating a snapshot.
snapshotDataIntegrityImmediateCheckAfterSnapshotCreation: ~ snapshotDataIntegrityImmediateCheckAfterSnapshotCreation: ~
# -- Unix-cron string format. The setting specifies when Longhorn checks the data integrity of snapshot disk files.
snapshotDataIntegrityCronjob: ~ snapshotDataIntegrityCronjob: ~
# -- This setting allows Longhorn filesystem trim feature to automatically mark the latest snapshot and
# its ancestors as removed and stops at the snapshot containing multiple children.
removeSnapshotsDuringFilesystemTrim: ~ removeSnapshotsDuringFilesystemTrim: ~
# -- This feature supports the fast replica rebuilding.
# It relies on the checksum of snapshot disk files, so setting the snapshot-data-integrity to **enabled** or **fast-check** is a prerequisite.
fastReplicaRebuildEnabled: ~ fastReplicaRebuildEnabled: ~
# -- In seconds. The setting specifies the HTTP client timeout to the file sync server.
replicaFileSyncHttpClientTimeout: ~
# -- The log level used in longhorn manager. Options: Panic, Fatal, Error, Warn, Info, Debug, Trace. Defaults to Info.
logLevel: ~
# -- This setting allows users to specify backup compression method.
backupCompressionMethod: ~
# -- This setting controls how many worker threads run concurrently per backup.
backupConcurrentLimit: ~
# -- This setting controls how many worker threads run concurrently per restore.
restoreConcurrentLimit: ~
# -- This allows users to activate v2 data engine based on SPDK.
# Currently, it is in the preview phase and should not be utilized in a production environment.
v2DataEngine: ~
# -- This setting allows users to enable the offline replica rebuilding for volumes using v2 data engine.
offlineReplicaRebuilding: ~
# -- Allow Scheduling Empty Node Selector Volumes To Any Node
allowEmptyNodeSelectorVolume: ~
# -- Allow Scheduling Empty Disk Selector Volumes To Any Disk
allowEmptyDiskSelectorVolume: ~
privateRegistry: privateRegistry:
# -- Set `true` to create a new private registry secret
createSecret: ~ createSecret: ~
# -- URL of private registry. Leave blank to apply system default registry
registryUrl: ~ registryUrl: ~
# -- User used to authenticate to private registry
registryUser: ~ registryUser: ~
# -- Password used to authenticate to private registry
registryPasswd: ~ registryPasswd: ~
# -- If `createSecret` is true, create a Kubernetes secret with this name; otherwise use the existing secret of this name. Use it to pull images from your private registry
registrySecret: ~ registrySecret: ~
longhornManager: longhornManager:
log: log:
- ## Allowed values are `plain` or `json`.
+ # -- Options: `plain`, `json`
format: plain format: plain
# -- Priority class for longhorn manager
priorityClass: ~ priorityClass: ~
# -- Tolerate nodes to run Longhorn manager
tolerations: [] tolerations: []
## If you want to set tolerations for Longhorn Manager DaemonSet, delete the `[]` in the line above ## If you want to set tolerations for Longhorn Manager DaemonSet, delete the `[]` in the line above
## and uncomment this example block ## and uncomment this example block
@@ -176,11 +354,13 @@ longhornManager:
# operator: "Equal" # operator: "Equal"
# value: "value" # value: "value"
# effect: "NoSchedule" # effect: "NoSchedule"
# -- Select nodes to run Longhorn manager
nodeSelector: {} nodeSelector: {}
## If you want to set node selector for Longhorn Manager DaemonSet, delete the `{}` in the line above ## If you want to set node selector for Longhorn Manager DaemonSet, delete the `{}` in the line above
## and uncomment this example block ## and uncomment this example block
# label-key1: "label-value1" # label-key1: "label-value1"
# label-key2: "label-value2" # label-key2: "label-value2"
# -- Annotation used in Longhorn manager service
serviceAnnotations: {} serviceAnnotations: {}
## If you want to set annotations for the Longhorn Manager service, delete the `{}` in the line above ## If you want to set annotations for the Longhorn Manager service, delete the `{}` in the line above
## and uncomment this example block ## and uncomment this example block
@@ -188,7 +368,9 @@ longhornManager:
# annotation-key2: "annotation-value2" # annotation-key2: "annotation-value2"
longhornDriver: longhornDriver:
# -- Priority class for longhorn driver
priorityClass: ~ priorityClass: ~
# -- Tolerate nodes to run Longhorn driver
tolerations: [] tolerations: []
## If you want to set tolerations for Longhorn Driver Deployer Deployment, delete the `[]` in the line above ## If you want to set tolerations for Longhorn Driver Deployer Deployment, delete the `[]` in the line above
## and uncomment this example block ## and uncomment this example block
@@ -196,6 +378,7 @@ longhornDriver:
# operator: "Equal" # operator: "Equal"
# value: "value" # value: "value"
# effect: "NoSchedule" # effect: "NoSchedule"
# -- Select nodes to run Longhorn driver
nodeSelector: {} nodeSelector: {}
## If you want to set node selector for Longhorn Driver Deployer Deployment, delete the `{}` in the line above ## If you want to set node selector for Longhorn Driver Deployer Deployment, delete the `{}` in the line above
## and uncomment this example block ## and uncomment this example block
@@ -203,8 +386,11 @@ longhornDriver:
# label-key2: "label-value2" # label-key2: "label-value2"
longhornUI: longhornUI:
- replicas: 1
+ # -- Replica count for longhorn ui
+ replicas: 2
+ # -- Priority class for longhorn ui
priorityClass: ~ priorityClass: ~
# -- Tolerate nodes to run Longhorn UI
tolerations: [] tolerations: []
## If you want to set tolerations for Longhorn UI Deployment, delete the `[]` in the line above ## If you want to set tolerations for Longhorn UI Deployment, delete the `[]` in the line above
## and uncomment this example block ## and uncomment this example block
@@ -212,6 +398,7 @@ longhornUI:
# operator: "Equal" # operator: "Equal"
# value: "value" # value: "value"
# effect: "NoSchedule" # effect: "NoSchedule"
# -- Select nodes to run Longhorn UI
nodeSelector: {} nodeSelector: {}
## If you want to set node selector for Longhorn UI Deployment, delete the `{}` in the line above ## If you want to set node selector for Longhorn UI Deployment, delete the `{}` in the line above
## and uncomment this example block ## and uncomment this example block
@@ -219,29 +406,29 @@ longhornUI:
# label-key2: "label-value2" # label-key2: "label-value2"
ingress: ingress:
- ## Set to true to enable ingress record generation
+ # -- Set to true to enable ingress record generation
enabled: false
- ## Add ingressClassName to the Ingress
- ## Can replace the kubernetes.io/ingress.class annotation on v1.18+
+ # -- Add ingressClassName to the Ingress
+ # Can replace the kubernetes.io/ingress.class annotation on v1.18+
ingressClassName: ~
+ # -- Layer 7 Load Balancer hostname
host: sslip.io
- ## Set this to true in order to enable TLS on the ingress record
+ # -- Set this to true in order to enable TLS on the ingress record
tls: false
- ## Enable this in order to enable that the backend service will be connected at port 443
+ # -- Enable this in order to enable that the backend service will be connected at port 443
secureBackends: false
- ## If TLS is set to true, you must declare what secret will store the key/certificate for TLS
+ # -- If TLS is set to true, you must declare what secret will store the key/certificate for TLS
tlsSecret: longhorn.local-tls
- ## If ingress is enabled you can set the default ingress path
- ## then you can access the UI by using the following full path {{host}}+{{path}}
+ # -- If ingress is enabled you can set the default ingress path
+ # then you can access the UI by using the following full path {{host}}+{{path}}
path: /
- ## Ingress annotations done as key:value pairs
## If you're using kube-lego, you will want to add: ## If you're using kube-lego, you will want to add:
## kubernetes.io/tls-acme: true ## kubernetes.io/tls-acme: true
## ##
@@ -249,10 +436,12 @@ ingress:
## ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/annotations.md ## ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/annotations.md
## ##
## If tls is set to true, annotation ingress.kubernetes.io/secure-backends: "true" will automatically be set ## If tls is set to true, annotation ingress.kubernetes.io/secure-backends: "true" will automatically be set
# -- Ingress annotations done as key:value pairs
annotations: annotations:
# kubernetes.io/ingress.class: nginx # kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: true # kubernetes.io/tls-acme: true
# -- If you're providing your own certificates, please use this to add the certificates as secrets
secrets: secrets:
## If you're providing your own certificates, please use this to add the certificates as secrets ## If you're providing your own certificates, please use this to add the certificates as secrets
## key and certificate should start with -----BEGIN CERTIFICATE----- or ## key and certificate should start with -----BEGIN CERTIFICATE----- or
@@ -267,17 +456,25 @@ ingress:
# key: # key:
# certificate: # certificate:
- # For Kubernetes < v1.25, if your cluster enables Pod Security Policy admission controller,
+ # -- For Kubernetes < v1.25, if your cluster enables Pod Security Policy admission controller,
# set this to `true` to ship longhorn-psp which allow privileged Longhorn pods to start
enablePSP: false
- ## Specify override namespace, specifically this is useful for using longhorn as sub-chart
- ## and its release namespace is not the `longhorn-system`
- namespaceOverride: ""
- # Annotations to add to the Longhorn Manager DaemonSet Pods. Optional.
+ # -- Annotations to add to the Longhorn Manager DaemonSet Pods. Optional.
annotations: {}
serviceAccount:
- # Annotations to add to the service account
+ # -- Annotations to add to the service account
annotations: {}
## openshift settings
openshift:
# -- Enable when using openshift
enabled: false
ui:
# -- UI route in openshift environment
route: "longhorn-ui"
# -- UI port in openshift environment
port: 443
# -- UI proxy in openshift environment
proxy: 8443
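
The defaults documented above can be overridden without editing the chart. As a minimal sketch, assuming a hypothetical override file named `my-values.yaml` (the file name and the specific values are illustrative and not part of this change set), the keys shown in the diff could be set like this:

```yaml
# my-values.yaml: hypothetical override file. Every value below is an
# illustrative choice; only the key names come from the chart's values.yaml.
persistence:
  defaultClass: true
  defaultFsType: ext4
  defaultClassReplicaCount: 2
  defaultDataLocality: best-effort   # options listed in the chart: disabled, best-effort
  reclaimPolicy: Retain              # options listed in the chart: Retain, Delete
defaultSettings:
  defaultReplicaCount: 2
  storageOverProvisioningPercentage: 150
longhornUI:
  replicas: 1
openshift:
  enabled: false
```

A file like this would typically be passed to Helm with `-f my-values.yaml`; keys that are left out keep the chart defaults.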

@@ -0,0 +1,48 @@
# same secret for longhorn-system namespace
apiVersion: v1
kind: Secret
metadata:
name: azblob-secret
namespace: longhorn-system
type: Opaque
data:
AZBLOB_ACCOUNT_NAME: ZGV2c3RvcmVhY2NvdW50MQ==
AZBLOB_ACCOUNT_KEY: RWJ5OHZkTTAyeE5PY3FGbHFVd0pQTGxtRXRsQ0RYSjFPVXpGVDUwdVNSWjZJRnN1RnEyVVZFckN6NEk2dHEvSzFTWkZQVE90ci9LQkhCZWtzb0dNR3c9PQ==
AZBLOB_ENDPOINT: aHR0cDovL2F6YmxvYi1zZXJ2aWNlLmRlZmF1bHQ6MTAwMDAv
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: longhorn-test-azblob
namespace: default
labels:
app: longhorn-test-azblob
spec:
replicas: 1
selector:
matchLabels:
app: longhorn-test-azblob
template:
metadata:
labels:
app: longhorn-test-azblob
spec:
containers:
- name: azurite
image: mcr.microsoft.com/azure-storage/azurite:3.23.0
ports:
- containerPort: 10000
---
apiVersion: v1
kind: Service
metadata:
name: azblob-service
namespace: default
spec:
selector:
app: longhorn-test-azblob
ports:
- port: 10000
targetPort: 10000
protocol: TCP
sessionAffinity: ClientIP
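
For readability, the same test credentials could also be written with `stringData`, which lets Kubernetes perform the base64 encoding. This is only a sketch of an equivalent Secret, not part of the commit: the plain-text values are the decoded forms of the corresponding `data` entries above, and the account key is kept base64-encoded under `data`, unchanged.

```yaml
# Sketch of an equivalent Secret; not part of the commit.
apiVersion: v1
kind: Secret
metadata:
  name: azblob-secret
  namespace: longhorn-system
type: Opaque
stringData:
  # Decoded from the base64 values in the manifest above
  AZBLOB_ACCOUNT_NAME: devstoreaccount1
  AZBLOB_ENDPOINT: http://azblob-service.default:10000/
data:
  # Copied verbatim from the manifest above (still base64-encoded)
  AZBLOB_ACCOUNT_KEY: RWJ5OHZkTTAyeE5PY3FGbHFVd0pQTGxtRXRsQ0RYSjFPVXpGVDUwdVNSWjZJRnN1RnEyVVZFckN6NEk2dHEvSzFTWkZQVE90ci9LQkhCZWtzb0dNR3c9PQ==
```

The decoded endpoint points at the `azblob-service` Service in the `default` namespace on port 10000, which matches the Service defined in the same file.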

@@ -0,0 +1,87 @@
apiVersion: v1
kind: Secret
metadata:
name: cifs-secret
namespace: longhorn-system
type: Opaque
data:
CIFS_USERNAME: bG9uZ2hvcm4tY2lmcy11c2VybmFtZQ== # longhorn-cifs-username
CIFS_PASSWORD: bG9uZ2hvcm4tY2lmcy1wYXNzd29yZA== # longhorn-cifs-password
---
apiVersion: v1
kind: Secret
metadata:
name: cifs-secret
namespace: default
type: Opaque
data:
CIFS_USERNAME: bG9uZ2hvcm4tY2lmcy11c2VybmFtZQ== # longhorn-cifs-username
CIFS_PASSWORD: bG9uZ2hvcm4tY2lmcy1wYXNzd29yZA== # longhorn-cifs-password
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: longhorn-test-cifs
namespace: default
labels:
app: longhorn-test-cifs
spec:
replicas: 1
selector:
matchLabels:
app: longhorn-test-cifs
template:
metadata:
labels:
app: longhorn-test-cifs
spec:
volumes:
- name: cifs-volume
emptyDir: {}
containers:
- name: longhorn-test-cifs-container
image: derekbit/samba:latest
ports:
- containerPort: 139
- containerPort: 445
imagePullPolicy: Always
env:
- name: EXPORT_PATH
value: /opt/backupstore
- name: CIFS_DISK_IMAGE_SIZE_MB
value: "4096"
- name: CIFS_USERNAME
valueFrom:
secretKeyRef:
name: cifs-secret
key: CIFS_USERNAME
- name: CIFS_PASSWORD
valueFrom:
secretKeyRef:
name: cifs-secret
key: CIFS_PASSWORD
securityContext:
privileged: true
capabilities:
add: ["SYS_ADMIN", "DAC_READ_SEARCH"]
volumeMounts:
- name: cifs-volume
mountPath: "/opt/backupstore"
args: ["-u", "$(CIFS_USERNAME);$(CIFS_PASSWORD)", "-s", "backupstore;$(EXPORT_PATH);yes;no;no;all;none"]
---
kind: Service
apiVersion: v1
metadata:
name: longhorn-test-cifs-svc
namespace: default
spec:
selector:
app: longhorn-test-cifs
clusterIP: None
ports:
- name: netbios-port
port: 139
targetPort: 139
- name: microsoft-port
port: 445
targetPort: 445
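
To point Longhorn at this test CIFS server through the chart, the `defaultSettings` keys from values.yaml could be used roughly as follows. This is a sketch under assumptions: the `cifs://` URL form is assumed to follow Longhorn's backup target convention, and the host and share are inferred from the Service name `longhorn-test-cifs-svc`, the `default` namespace, and the `backupstore` share defined in the container args above.

```yaml
defaultSettings:
  # Assumed URL form: cifs://<service>.<namespace>/<share>
  backupTarget: cifs://longhorn-test-cifs-svc.default/backupstore
  # Name of the Secret created above in the longhorn-system namespace
  backupTargetCredentialSecret: cifs-secret
```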

@@ -24,49 +24,57 @@ data:
AWS_ENDPOINTS: aHR0cHM6Ly9taW5pby1zZXJ2aWNlLmRlZmF1bHQ6OTAwMA== # https://minio-service.default:9000 AWS_ENDPOINTS: aHR0cHM6Ly9taW5pby1zZXJ2aWNlLmRlZmF1bHQ6OTAwMA== # https://minio-service.default:9000
AWS_CERT: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURMRENDQWhTZ0F3SUJBZ0lSQU1kbzQycGhUZXlrMTcvYkxyWjVZRHN3RFFZSktvWklodmNOQVFFTEJRQXcKR2pFWU1CWUdBMVVFQ2hNUFRHOXVaMmh2Y200Z0xTQlVaWE4wTUNBWERUSXdNRFF5TnpJek1EQXhNVm9ZRHpJeApNakF3TkRBek1qTXdNREV4V2pBYU1SZ3dGZ1lEVlFRS0V3OU1iMjVuYUc5eWJpQXRJRlJsYzNRd2dnRWlNQTBHCkNTcUdTSWIzRFFFQkFRVUFBNElCRHdBd2dnRUtBb0lCQVFEWHpVdXJnUFpEZ3pUM0RZdWFlYmdld3Fvd2RlQUQKODRWWWF6ZlN1USs3K21Oa2lpUVBvelVVMmZvUWFGL1BxekJiUW1lZ29hT3l5NVhqM1VFeG1GcmV0eDBaRjVOVgpKTi85ZWFJNWRXRk9teHhpMElPUGI2T0RpbE1qcXVEbUVPSXljdjRTaCsvSWo5Zk1nS0tXUDdJZGxDNUJPeThkCncwOVdkckxxaE9WY3BKamNxYjN6K3hISHd5Q05YeGhoRm9tb2xQVnpJbnlUUEJTZkRuSDBuS0lHUXl2bGhCMGsKVHBHSzYxc2prZnFTK3hpNTlJeHVrbHZIRXNQcjFXblRzYU9oaVh6N3lQSlorcTNBMWZoVzBVa1JaRFlnWnNFbQovZ05KM3JwOFhZdURna2kzZ0UrOElXQWRBWHExeWhqRDdSSkI4VFNJYTV0SGpKUUtqZ0NlSG5HekFnTUJBQUdqCmF6QnBNQTRHQTFVZER3RUIvd1FFQXdJQ3BEQVRCZ05WSFNVRUREQUtCZ2dyQmdFRkJRY0RBVEFQQmdOVkhSTUIKQWY4RUJUQURBUUgvTURFR0ExVWRFUVFxTUNpQ0NXeHZZMkZzYUc5emRJSVZiV2x1YVc4dGMyVnlkbWxqWlM1awpaV1poZFd4MGh3Ui9BQUFCTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFDbUZMMzlNSHVZMzFhMTFEajRwMjVjCnFQRUM0RHZJUWozTk9kU0dWMmQrZjZzZ3pGejFXTDhWcnF2QjFCMVM2cjRKYjJQRXVJQkQ4NFlwVXJIT1JNU2MKd3ViTEppSEtEa0Jmb2U5QWI1cC9VakpyS0tuajM0RGx2c1cvR3AwWTZYc1BWaVdpVWorb1JLbUdWSTI0Q0JIdgpnK0JtVzNDeU5RR1RLajk0eE02czNBV2xHRW95YXFXUGU1eHllVWUzZjFBWkY5N3RDaklKUmVWbENtaENGK0JtCmFUY1RSUWN3cVdvQ3AwYmJZcHlERFlwUmxxOEdQbElFOW8yWjZBc05mTHJVcGFtZ3FYMmtYa2gxa3lzSlEralAKelFadHJSMG1tdHVyM0RuRW0yYmk0TktIQVFIcFc5TXUxNkdRakUxTmJYcVF0VEI4OGpLNzZjdEg5MzRDYWw2VgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t AWS_CERT: 
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURMRENDQWhTZ0F3SUJBZ0lSQU1kbzQycGhUZXlrMTcvYkxyWjVZRHN3RFFZSktvWklodmNOQVFFTEJRQXcKR2pFWU1CWUdBMVVFQ2hNUFRHOXVaMmh2Y200Z0xTQlVaWE4wTUNBWERUSXdNRFF5TnpJek1EQXhNVm9ZRHpJeApNakF3TkRBek1qTXdNREV4V2pBYU1SZ3dGZ1lEVlFRS0V3OU1iMjVuYUc5eWJpQXRJRlJsYzNRd2dnRWlNQTBHCkNTcUdTSWIzRFFFQkFRVUFBNElCRHdBd2dnRUtBb0lCQVFEWHpVdXJnUFpEZ3pUM0RZdWFlYmdld3Fvd2RlQUQKODRWWWF6ZlN1USs3K21Oa2lpUVBvelVVMmZvUWFGL1BxekJiUW1lZ29hT3l5NVhqM1VFeG1GcmV0eDBaRjVOVgpKTi85ZWFJNWRXRk9teHhpMElPUGI2T0RpbE1qcXVEbUVPSXljdjRTaCsvSWo5Zk1nS0tXUDdJZGxDNUJPeThkCncwOVdkckxxaE9WY3BKamNxYjN6K3hISHd5Q05YeGhoRm9tb2xQVnpJbnlUUEJTZkRuSDBuS0lHUXl2bGhCMGsKVHBHSzYxc2prZnFTK3hpNTlJeHVrbHZIRXNQcjFXblRzYU9oaVh6N3lQSlorcTNBMWZoVzBVa1JaRFlnWnNFbQovZ05KM3JwOFhZdURna2kzZ0UrOElXQWRBWHExeWhqRDdSSkI4VFNJYTV0SGpKUUtqZ0NlSG5HekFnTUJBQUdqCmF6QnBNQTRHQTFVZER3RUIvd1FFQXdJQ3BEQVRCZ05WSFNVRUREQUtCZ2dyQmdFRkJRY0RBVEFQQmdOVkhSTUIKQWY4RUJUQURBUUgvTURFR0ExVWRFUVFxTUNpQ0NXeHZZMkZzYUc5emRJSVZiV2x1YVc4dGMyVnlkbWxqWlM1awpaV1poZFd4MGh3Ui9BQUFCTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFDbUZMMzlNSHVZMzFhMTFEajRwMjVjCnFQRUM0RHZJUWozTk9kU0dWMmQrZjZzZ3pGejFXTDhWcnF2QjFCMVM2cjRKYjJQRXVJQkQ4NFlwVXJIT1JNU2MKd3ViTEppSEtEa0Jmb2U5QWI1cC9VakpyS0tuajM0RGx2c1cvR3AwWTZYc1BWaVdpVWorb1JLbUdWSTI0Q0JIdgpnK0JtVzNDeU5RR1RLajk0eE02czNBV2xHRW95YXFXUGU1eHllVWUzZjFBWkY5N3RDaklKUmVWbENtaENGK0JtCmFUY1RSUWN3cVdvQ3AwYmJZcHlERFlwUmxxOEdQbElFOW8yWjZBc05mTHJVcGFtZ3FYMmtYa2gxa3lzSlEralAKelFadHJSMG1tdHVyM0RuRW0yYmk0TktIQVFIcFc5TXUxNkdRakUxTmJYcVF0VEI4OGpLNzZjdEg5MzRDYWw2VgotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t
--- ---
apiVersion: v1 apiVersion: apps/v1
kind: Pod kind: Deployment
metadata: metadata:
name: longhorn-test-minio name: longhorn-test-minio
namespace: default namespace: default
labels: labels:
app: longhorn-test-minio app: longhorn-test-minio
spec: spec:
volumes: replicas: 1
- name: minio-volume selector:
emptyDir: {} matchLabels:
- name: minio-certificates app: longhorn-test-minio
secret: template:
secretName: minio-secret metadata:
items: labels:
- key: AWS_CERT app: longhorn-test-minio
path: public.crt spec:
- key: AWS_CERT_KEY volumes:
path: private.key - name: minio-volume
emptyDir: {}
containers: - name: minio-certificates
- name: minio secret:
image: minio/minio:RELEASE.2022-02-01T18-00-14Z secretName: minio-secret
command: ["sh", "-c", "mkdir -p /storage/backupbucket && mkdir -p /root/.minio/certs && ln -s /root/certs/private.key /root/.minio/certs/private.key && ln -s /root/certs/public.crt /root/.minio/certs/public.crt && exec minio server /storage"] items:
env: - key: AWS_CERT
- name: MINIO_ROOT_USER path: public.crt
valueFrom: - key: AWS_CERT_KEY
secretKeyRef: path: private.key
name: minio-secret containers:
key: AWS_ACCESS_KEY_ID - name: minio
- name: MINIO_ROOT_PASSWORD image: minio/minio:RELEASE.2022-02-01T18-00-14Z
valueFrom: command: ["sh", "-c", "mkdir -p /storage/backupbucket && mkdir -p /root/.minio/certs && ln -s /root/certs/private.key /root/.minio/certs/private.key && ln -s /root/certs/public.crt /root/.minio/certs/public.crt && exec minio server /storage"]
secretKeyRef: env:
name: minio-secret - name: MINIO_ROOT_USER
key: AWS_SECRET_ACCESS_KEY valueFrom:
ports: secretKeyRef:
- containerPort: 9000 name: minio-secret
volumeMounts: key: AWS_ACCESS_KEY_ID
- name: minio-volume - name: MINIO_ROOT_PASSWORD
mountPath: "/storage" valueFrom:
- name: minio-certificates secretKeyRef:
mountPath: "/root/certs" name: minio-secret
readOnly: true key: AWS_SECRET_ACCESS_KEY
ports:
- containerPort: 9000
volumeMounts:
- name: minio-volume
mountPath: "/storage"
- name: minio-certificates
mountPath: "/root/certs"
readOnly: true
--- ---
apiVersion: v1 apiVersion: v1
kind: Service kind: Service


@ -1,41 +1,49 @@
apiVersion: v1 apiVersion: apps/v1
kind: Pod kind: Deployment
metadata: metadata:
name: longhorn-test-nfs name: longhorn-test-nfs
namespace: default namespace: default
labels: labels:
app: longhorn-test-nfs app: longhorn-test-nfs
spec: spec:
volumes: selector:
- name: nfs-volume matchLabels:
emptyDir: {} app: longhorn-test-nfs
containers: template:
- name: longhorn-test-nfs-container metadata:
image: longhornio/nfs-ganesha:latest labels:
imagePullPolicy: Always app: longhorn-test-nfs
env: spec:
- name: EXPORT_ID volumes:
value: "14" - name: nfs-volume
- name: EXPORT_PATH emptyDir: {}
value: /opt/backupstore containers:
- name: PSEUDO_PATH - name: longhorn-test-nfs-container
value: /opt/backupstore image: longhornio/nfs-ganesha:latest
- name: NFS_DISK_IMAGE_SIZE_MB imagePullPolicy: Always
value: "4096" env:
command: ["bash", "-c", "chmod 700 /opt/backupstore && /opt/start_nfs.sh | tee /var/log/ganesha.log"] - name: EXPORT_ID
securityContext: value: "14"
privileged: true - name: EXPORT_PATH
capabilities: value: /opt/backupstore
add: ["SYS_ADMIN", "DAC_READ_SEARCH"] - name: PSEUDO_PATH
volumeMounts: value: /opt/backupstore
- name: nfs-volume - name: NFS_DISK_IMAGE_SIZE_MB
mountPath: "/opt/backupstore" value: "4096"
livenessProbe: command: ["bash", "-c", "chmod 700 /opt/backupstore && /opt/start_nfs.sh | tee /var/log/ganesha.log"]
exec: securityContext:
command: ["bash", "-c", "grep \"No export entries found\" /var/log/ganesha.log > /dev/null 2>&1 ; [ $? -ne 0 ]"] privileged: true
initialDelaySeconds: 5 capabilities:
periodSeconds: 5 add: ["SYS_ADMIN", "DAC_READ_SEARCH"]
timeoutSeconds: 4 volumeMounts:
- name: nfs-volume
mountPath: "/opt/backupstore"
livenessProbe:
exec:
command: ["bash", "-c", "grep \"No export entries found\" /var/log/ganesha.log > /dev/null 2>&1 ; [ $? -ne 0 ]"]
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 4
--- ---
kind: Service kind: Service
apiVersion: v1 apiVersion: v1


@ -1,13 +1,13 @@
longhornio/csi-attacher:v3.4.0 longhornio/csi-attacher:v4.2.0
longhornio/csi-provisioner:v2.1.2 longhornio/csi-provisioner:v3.4.1
longhornio/csi-resizer:v1.3.0 longhornio/csi-resizer:v1.7.0
longhornio/csi-snapshotter:v5.0.1 longhornio/csi-snapshotter:v6.2.1
longhornio/csi-node-driver-registrar:v2.5.0 longhornio/csi-node-driver-registrar:v2.7.0
longhornio/livenessprobe:v2.8.0 longhornio/livenessprobe:v2.9.0
longhornio/backing-image-manager:master-head longhornio/backing-image-manager:master-head
longhornio/longhorn-engine:master-head longhornio/longhorn-engine:master-head
longhornio/longhorn-instance-manager:master-head longhornio/longhorn-instance-manager:master-head
longhornio/longhorn-manager:master-head longhornio/longhorn-manager:master-head
longhornio/longhorn-share-manager:master-head longhornio/longhorn-share-manager:master-head
longhornio/longhorn-ui:master-head longhornio/longhorn-ui:master-head
longhornio/support-bundle-kit:v0.0.16 longhornio/support-bundle-kit:v0.0.27

File diff suppressed because it is too large.


@ -0,0 +1,36 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: longhorn-cifs-installation
labels:
app: longhorn-cifs-installation
annotations:
command: &cmd OS=$(grep -E "^ID_LIKE=" /etc/os-release | cut -d '=' -f 2); if [[ -z "${OS}" ]]; then OS=$(grep -E "^ID=" /etc/os-release | cut -d '=' -f 2); fi; if [[ "${OS}" == *"debian"* ]]; then sudo apt-get update -q -y && sudo apt-get install -q -y cifs-utils; elif [[ "${OS}" == *"suse"* ]]; then sudo zypper --gpg-auto-import-keys -q refresh && sudo zypper --gpg-auto-import-keys -q install -y cifs-utils; else sudo yum makecache -q -y && sudo yum --setopt=tsflags=noscripts install -q -y cifs-utils; fi; rc=$?; if [ ${rc} -eq 0 ]; then echo "cifs-utils installed successfully"; else echo "cifs-utils install failed with error code ${rc}"; fi
spec:
selector:
matchLabels:
app: longhorn-cifs-installation
template:
metadata:
labels:
app: longhorn-cifs-installation
spec:
hostNetwork: true
hostPID: true
initContainers:
- name: cifs-installation
command:
- nsenter
- --mount=/proc/1/ns/mnt
- --
- bash
- -c
- *cmd
image: alpine:3.12
securityContext:
privileged: true
containers:
- name: sleep
image: registry.k8s.io/pause:3.1
updateStrategy:
type: RollingUpdate


@ -26,11 +26,11 @@ spec:
- bash - bash
- -c - -c
- *cmd - *cmd
image: alpine:3.12 image: alpine:3.17
securityContext: securityContext:
privileged: true privileged: true
containers: containers:
- name: sleep - name: sleep
image: k8s.gcr.io/pause:3.1 image: registry.k8s.io/pause:3.1
updateStrategy: updateStrategy:
type: RollingUpdate type: RollingUpdate


@ -0,0 +1,35 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: longhorn-iscsi-selinux-workaround
labels:
app: longhorn-iscsi-selinux-workaround
annotations:
command: &cmd if ! rpm -q policycoreutils > /dev/null 2>&1; then echo "failed to apply workaround; only applicable in Fedora based distros with SELinux enabled"; exit; elif cd /tmp && echo '(allow iscsid_t self (capability (dac_override)))' > local_longhorn.cil && semodule -vi local_longhorn.cil && rm -f local_longhorn.cil; then echo "applied workaround successfully"; else echo "failed to apply workaround; error code $?"; fi
spec:
selector:
matchLabels:
app: longhorn-iscsi-selinux-workaround
template:
metadata:
labels:
app: longhorn-iscsi-selinux-workaround
spec:
hostPID: true
initContainers:
- name: iscsi-selinux-workaround
command:
- nsenter
- --mount=/proc/1/ns/mnt
- --
- bash
- -c
- *cmd
image: alpine:3.17
securityContext:
privileged: true
containers:
- name: sleep
image: registry.k8s.io/pause:3.1
updateStrategy:
type: RollingUpdate


@ -31,6 +31,6 @@ spec:
privileged: true privileged: true
containers: containers:
- name: sleep - name: sleep
image: k8s.gcr.io/pause:3.1 image: registry.k8s.io/pause:3.1
updateStrategy: updateStrategy:
type: RollingUpdate type: RollingUpdate


@ -0,0 +1,36 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: longhorn-nvme-cli-installation
labels:
app: longhorn-nvme-cli-installation
annotations:
command: &cmd OS=$(grep -E "^ID_LIKE=" /etc/os-release | cut -d '=' -f 2); if [[ -z "${OS}" ]]; then OS=$(grep -E "^ID=" /etc/os-release | cut -d '=' -f 2); fi; if [[ "${OS}" == *"debian"* ]]; then sudo apt-get update -q -y && sudo apt-get install -q -y nvme-cli && sudo modprobe nvme-tcp; elif [[ "${OS}" == *"suse"* ]]; then sudo zypper --gpg-auto-import-keys -q refresh && sudo zypper --gpg-auto-import-keys -q install -y nvme-cli && sudo modprobe nvme-tcp; else sudo yum makecache -q -y && sudo yum --setopt=tsflags=noscripts install -q -y nvme-cli && sudo modprobe nvme-tcp; fi; rc=$?; if [ ${rc} -eq 0 ]; then echo "nvme-cli installed successfully"; else echo "nvme-cli install failed with error code ${rc}"; fi
spec:
selector:
matchLabels:
app: longhorn-nvme-cli-installation
template:
metadata:
labels:
app: longhorn-nvme-cli-installation
spec:
hostNetwork: true
hostPID: true
initContainers:
- name: nvme-cli-installation
command:
- nsenter
- --mount=/proc/1/ns/mnt
- --
- bash
- -c
- *cmd
image: alpine:3.12
securityContext:
privileged: true
containers:
- name: sleep
image: registry.k8s.io/pause:3.1
updateStrategy:
type: RollingUpdate


@ -0,0 +1,47 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: longhorn-spdk-setup
labels:
app: longhorn-spdk-setup
annotations:
command: &cmd OS=$(grep -E "^ID_LIKE=" /etc/os-release | cut -d '=' -f 2); if [[ -z "${OS}" ]]; then OS=$(grep -E "^ID=" /etc/os-release | cut -d '=' -f 2); fi; if [[ "${OS}" == *"debian"* ]]; then sudo apt-get update -q -y && sudo apt-get install -q -y git; elif [[ "${OS}" == *"suse"* ]]; then sudo zypper --gpg-auto-import-keys -q refresh && sudo zypper --gpg-auto-import-keys -q install -y git; else sudo yum makecache -q -y && sudo yum --setopt=tsflags=noscripts install -q -y git; fi; rc=$?; if [ ${rc} -eq 0 ]; then echo "git installed successfully"; else echo "git install failed with error code ${rc}"; fi; rm -rf ${SPDK_DIR}; git clone -b longhorn https://github.com/longhorn/spdk.git ${SPDK_DIR} && bash ${SPDK_DIR}/scripts/setup.sh ${SPDK_OPTION}; rc=$?; if [ ${rc} -eq 0 ]; then echo "vm.nr_hugepages=$((HUGEMEM/2))" >> /etc/sysctl.conf; echo "SPDK environment is configured successfully"; else echo "Failed to configure SPDK environment with error code ${rc}"; fi; rm -rf ${SPDK_DIR}
spec:
selector:
matchLabels:
app: longhorn-spdk-setup
template:
metadata:
labels:
app: longhorn-spdk-setup
spec:
hostNetwork: true
hostPID: true
initContainers:
- name: longhorn-spdk-setup
command:
- nsenter
- --mount=/proc/1/ns/mnt
- --
- bash
- -c
- *cmd
image: alpine:3.12
env:
- name: SPDK_DIR
value: "/tmp/spdk"
- name: SPDK_OPTION
value: ""
- name: HUGEMEM
value: "1024"
- name: PCI_ALLOWED
value: "none"
- name: DRIVER_OVERRIDE
value: "uio_pci_generic"
securityContext:
privileged: true
containers:
- name: sleep
image: registry.k8s.io/pause:3.1
updateStrategy:
type: RollingUpdate


@ -1,5 +1,7 @@
# Upgrade Responder Helm Chart
This directory contains the helm values for the Longhorn upgrade responder server. This directory contains the helm values for the Longhorn upgrade responder server.
The values are in the file `./chart-values.yaml`. The values are in the file `./chart-values.yaml`.
When you update the content of `./chart-values.yaml`, automation pipeline will update the Longhorn upgrade responder. When you update the content of `./chart-values.yaml`, automation pipeline will update the Longhorn upgrade responder.
Information about the source chart is in `chart.yaml`.
The chart source chart is in `chart.yaml` See [dev/upgrade-responder](../../dev/upgrade-responder/README.md) for manual deployment steps.


@ -14,34 +14,359 @@ secret:
# Set this to false if you don't want to manage these secrets with helm # Set this to false if you don't want to manage these secrets with helm
managed: false managed: false
resources:
limits:
cpu: 400m
memory: 512Mi
requests:
cpu: 200m
memory: 256Mi
# This configmap contains information about the latest release # This configmap contains information about the latest release
# of the application that is using this Upgrade Responder # of the application that is using this Upgrade Responder
configMap: configMap:
responseConfig: |- responseConfig: |-
{ {
"Versions": [ "versions": [
{ {
"Name": "v1.1.3", "name": "v1.3.3",
"ReleaseDate": "2021-12-17T00:00:00Z", "releaseDate": "2023-04-19T00:00:00Z",
"Tags": [ "tags": [
"stable" "stable"
] ]
}, },
{ {
"Name": "v1.2.6", "name": "v1.4.3",
"ReleaseDate": "2022-11-04T00:00:00Z", "releaseDate": "2023-07-14T00:00:00Z",
"Tags": [ "tags": [
"stable"
]
},
{
"Name": "v1.3.2",
"ReleaseDate": "2022-10-03T00:00:00Z",
"Tags": [
"latest", "latest",
"stable" "stable"
] ]
},
{
"name": "v1.5.1",
"releaseDate": "2023-07-19T00:00:00Z",
"tags": [
"latest"
]
} }
] ]
} }
requestSchema: |-
{
"appVersionSchema": {
"dataType": "string",
"maxLen": 200
},
"extraTagInfoSchema": {
"hostKernelRelease": {
"dataType": "string",
"maxLen": 200
},
"hostOsDistro": {
"dataType": "string",
"maxLen": 200
},
"kubernetesNodeProvider": {
"dataType": "string",
"maxLen": 200
},
"kubernetesVersion": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAllowRecurringJobWhileVolumeDetached": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAllowVolumeCreationWithDegradedAvailability": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAutoCleanupSystemGeneratedSnapshot": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAutoDeletePodWhenVolumeDetachedUnexpectedly": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAutoSalvage": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingBackupCompressionMethod": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingBackupTarget": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingCrdApiVersion": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingCreateDefaultDiskLabeledNodes": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingDefaultDataLocality": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingDisableRevisionCounter": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingDisableSchedulingOnCordonedNode": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingFastReplicaRebuildEnabled": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingKubernetesClusterAutoscalerEnabled": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingNodeDownPodDeletionPolicy": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingNodeDrainPolicy": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingOfflineReplicaRebuilding": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingOrphanAutoDeletion": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingPriorityClass": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingRegistrySecret": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingRemoveSnapshotsDuringFilesystemTrim": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaAutoBalance": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaSoftAntiAffinity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaZoneSoftAntiAffinity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaDiskSoftAntiAffinity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingRestoreVolumeRecurringJobs": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSnapshotDataIntegrity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSnapshotDataIntegrityCronjob": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSnapshotDataIntegrityImmediateCheckAfterSnapshotCreation": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingStorageNetwork": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSystemManagedComponentsNodeSelector": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSystemManagedPodsImagePullPolicy": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingTaintToleration": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingV2DataEngine": {
"dataType": "string",
"maxLen": 200
}
},
"extraFieldInfoSchema": {
"longhornInstanceManagerAverageCpuUsageMilliCores": {
"dataType": "float"
},
"longhornInstanceManagerAverageMemoryUsageBytes": {
"dataType": "float"
},
"longhornManagerAverageCpuUsageMilliCores": {
"dataType": "float"
},
"longhornManagerAverageMemoryUsageBytes": {
"dataType": "float"
},
"longhornNamespaceUid": {
"dataType": "string",
"maxLen": 200
},
"longhornNodeCount": {
"dataType": "float"
},
"longhornNodeDiskHDDCount": {
"dataType": "float"
},
"longhornNodeDiskNVMeCount": {
"dataType": "float"
},
"longhornNodeDiskSSDCount": {
"dataType": "float"
},
"longhornSettingBackingImageCleanupWaitInterval": {
"dataType": "float"
},
"longhornSettingBackingImageRecoveryWaitInterval": {
"dataType": "float"
},
"longhornSettingBackupConcurrentLimit": {
"dataType": "float"
},
"longhornSettingBackupstorePollInterval": {
"dataType": "float"
},
"longhornSettingConcurrentAutomaticEngineUpgradePerNodeLimit": {
"dataType": "float"
},
"longhornSettingConcurrentReplicaRebuildPerNodeLimit": {
"dataType": "float"
},
"longhornSettingConcurrentVolumeBackupRestorePerNodeLimit": {
"dataType": "float"
},
"longhornSettingDefaultReplicaCount": {
"dataType": "float"
},
"longhornSettingEngineReplicaTimeout": {
"dataType": "float"
},
"longhornSettingFailedBackupTtl": {
"dataType": "float"
},
"longhornSettingGuaranteedInstanceManagerCpu": {
"dataType": "float"
},
"longhornSettingRecurringFailedJobsHistoryLimit": {
"dataType": "float"
},
"longhornSettingRecurringSuccessfulJobsHistoryLimit": {
"dataType": "float"
},
"longhornSettingReplicaFileSyncHttpClientTimeout": {
"dataType": "float"
},
"longhornSettingReplicaReplenishmentWaitInterval": {
"dataType": "float"
},
"longhornSettingRestoreConcurrentLimit": {
"dataType": "float"
},
"longhornSettingStorageMinimalAvailablePercentage": {
"dataType": "float"
},
"longhornSettingStorageOverProvisioningPercentage": {
"dataType": "float"
},
"longhornSettingStorageReservedPercentageForDefaultDisk": {
"dataType": "float"
},
"longhornSettingSupportBundleFailedHistoryLimit": {
"dataType": "float"
},
"longhornVolumeAccessModeRwoCount": {
"dataType": "float"
},
"longhornVolumeAccessModeRwxCount": {
"dataType": "float"
},
"longhornVolumeAccessModeUnknownCount": {
"dataType": "float"
},
"longhornVolumeAverageActualSizeBytes": {
"dataType": "float"
},
"longhornVolumeAverageNumberOfReplicas": {
"dataType": "float"
},
"longhornVolumeAverageSizeBytes": {
"dataType": "float"
},
"longhornVolumeAverageSnapshotCount": {
"dataType": "float"
},
"longhornVolumeDataLocalityBestEffortCount": {
"dataType": "float"
},
"longhornVolumeDataLocalityDisabledCount": {
"dataType": "float"
},
"longhornVolumeDataLocalityStrictLocalCount": {
"dataType": "float"
},
"longhornVolumeFrontendBlockdevCount": {
"dataType": "float"
},
"longhornVolumeFrontendIscsiCount": {
"dataType": "float"
},
"longhornVolumeOfflineReplicaRebuildingDisabledCount": {
"dataType": "float"
},
"longhornVolumeOfflineReplicaRebuildingEnabledCount": {
"dataType": "float"
},
"longhornVolumeReplicaAutoBalanceDisabledCount": {
"dataType": "float"
},
"longhornVolumeReplicaSoftAntiAffinityFalseCount": {
"dataType": "float"
},
"longhornVolumeReplicaZoneSoftAntiAffinityTrueCount": {
"dataType": "float"
},
"longhornVolumeReplicaDiskSoftAntiAffinityTrueCount": {
"dataType": "float"
},
"longhornVolumeRestoreVolumeRecurringJobFalseCount": {
"dataType": "float"
},
"longhornVolumeSnapshotDataIntegrityDisabledCount": {
"dataType": "float"
},
"longhornVolumeSnapshotDataIntegrityFastCheckCount": {
"dataType": "float"
},
"longhornVolumeUnmapMarkSnapChainRemovedFalseCount": {
"dataType": "float"
}
}
}


@ -1,5 +1,5 @@
url: https://github.com/longhorn/upgrade-responder.git url: https://github.com/longhorn/upgrade-responder.git
commit: 3c78890f5415744af1923eac01f98636ac52a113 commit: 116f807836c29185038cfb005708f0a8d41f4d35
releaseName: longhorn-upgrade-responder releaseName: longhorn-upgrade-responder
namespace: longhorn-upgrade-responder namespace: longhorn-upgrade-responder


@ -0,0 +1,55 @@
## Overview
### Install
1. Install Longhorn.
1. Install the Longhorn [upgrade-responder](https://github.com/longhorn/upgrade-responder) stack.
```bash
./install.sh
```
Sample output:
```shell
secret/influxdb-creds created
persistentvolumeclaim/influxdb created
deployment.apps/influxdb created
service/influxdb created
Deployment influxdb is running.
Cloning into 'upgrade-responder'...
remote: Enumerating objects: 1077, done.
remote: Counting objects: 100% (1076/1076), done.
remote: Compressing objects: 100% (454/454), done.
remote: Total 1077 (delta 573), reused 1049 (delta 565), pack-reused 1
Receiving objects: 100% (1077/1077), 55.01 MiB | 18.10 MiB/s, done.
Resolving deltas: 100% (573/573), done.
Release "longhorn-upgrade-responder" does not exist. Installing it now.
NAME: longhorn-upgrade-responder
LAST DEPLOYED: Thu May 11 00:42:44 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the Upgrade Responder server URL by running these commands:
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=upgrade-responder,app.kubernetes.io/instance=longhorn-upgrade-responder" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 8080:8314 --namespace default
echo "Upgrade Responder server URL is http://127.0.0.1:8080"
Deployment longhorn-upgrade-responder is running.
persistentvolumeclaim/grafana-pvc created
deployment.apps/grafana created
service/grafana created
Deployment grafana is running.
[Upgrade Checker]
URL : http://longhorn-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade
[InfluxDB]
URL : http://influxdb.default.svc.cluster.local:8086
Database : longhorn_upgrade_responder
Username : root
Password : root
[Grafana]
Dashboard : http://1.2.3.4:30864
Username : admin
Password : admin
```
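
The Grafana dashboard port in the sample output (`30864`) is the node port that Kubernetes assigned to the `grafana` service. As a minimal sketch, assuming the `PORT(S)` column is the fifth field of `kubectl get svc --no-headers` output and has the form `port:nodePort/protocol`, the port could be extracted as follows. The function name `node_port_from_svc_line` and the sample service line are illustrative only and are not part of the install script.

```shell
#!/bin/bash
# Extract the node port from one line of `kubectl get svc --no-headers` output.
# Assumption: the fifth whitespace-separated field looks like "3000:30864/TCP".
node_port_from_svc_line() {
  local svc_line="$1"
  echo "${svc_line}" | awk '{print $5}' | cut -d':' -f2 | cut -d'/' -f1
}

# Hypothetical service line, used here instead of a live cluster query.
sample_line="grafana   LoadBalancer   10.43.0.10   1.2.3.4   3000:30864/TCP   5m"
node_port_from_svc_line "${sample_line}"
```

Against a live cluster, the same function could be fed the output of `kubectl get svc/grafana --no-headers` instead of the sample line.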

dev/upgrade-responder/install.sh Executable file

@ -0,0 +1,424 @@
#!/bin/bash
UPGRADE_RESPONDER_REPO="https://github.com/longhorn/upgrade-responder.git"
UPGRADE_RESPONDER_REPO_BRANCH="master"
UPGRADE_RESPONDER_VALUE_YAML="upgrade-responder-value.yaml"
UPGRADE_RESPONDER_IMAGE_REPO="longhornio/upgrade-responder"
UPGRADE_RESPONDER_IMAGE_TAG="master-head"
INFLUXDB_URL="http://influxdb.default.svc.cluster.local:8086"
APP_NAME="longhorn"
DEPLOYMENT_TIMEOUT_SEC=300
DEPLOYMENT_WAIT_INTERVAL_SEC=5
temp_dir=$(mktemp -d)
trap 'rm -rf "${temp_dir}"' EXIT # -f because packed Git files (.pack, .idx) are write protected.
cp -a ./* "${temp_dir}"
cd "${temp_dir}" || exit 1
wait_for_deployment() {
local deployment_name="$1"
local start_time=$(date +%s)
while true; do
status=$(kubectl rollout status deployment/${deployment_name})
if [[ ${status} == *"successfully rolled out"* ]]; then
echo "Deployment ${deployment_name} is running."
break
fi
elapsed_secs=$(($(date +%s) - ${start_time}))
if [[ ${elapsed_secs} -ge ${DEPLOYMENT_TIMEOUT_SEC} ]]; then
echo "Timed out waiting for deployment ${deployment_name} to be running."
exit 1
fi
echo "Deployment ${deployment_name} is not running yet. Waiting..."
sleep ${DEPLOYMENT_WAIT_INTERVAL_SEC}
done
}
install_influxdb() {
kubectl apply -f ./manifests/influxdb.yaml
wait_for_deployment "influxdb"
}
install_grafana() {
kubectl apply -f ./manifests/grafana.yaml
wait_for_deployment "grafana"
}
install_upgrade_responder() {
cat << EOF > ${UPGRADE_RESPONDER_VALUE_YAML}
applicationName: ${APP_NAME}
secret:
name: upgrade-responder-secrets
managed: true
influxDBUrl: "${INFLUXDB_URL}"
influxDBUser: "root"
influxDBPassword: "root"
configMap:
responseConfig: |-
{
"versions": [{
"name": "v1.0.0",
"releaseDate": "2020-05-18T12:30:00Z",
"tags": ["latest"]
}]
}
requestSchema: |-
{
"appVersionSchema": {
"dataType": "string",
"maxLen": 200
},
"extraTagInfoSchema": {
"hostKernelRelease": {
"dataType": "string",
"maxLen": 200
},
"hostOsDistro": {
"dataType": "string",
"maxLen": 200
},
"kubernetesNodeProvider": {
"dataType": "string",
"maxLen": 200
},
"kubernetesVersion": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAllowRecurringJobWhileVolumeDetached": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAllowVolumeCreationWithDegradedAvailability": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAutoCleanupSystemGeneratedSnapshot": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAutoDeletePodWhenVolumeDetachedUnexpectedly": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingAutoSalvage": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingBackupCompressionMethod": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingBackupTarget": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingCrdApiVersion": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingCreateDefaultDiskLabeledNodes": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingDefaultDataLocality": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingDisableRevisionCounter": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingDisableSchedulingOnCordonedNode": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingFastReplicaRebuildEnabled": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingKubernetesClusterAutoscalerEnabled": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingNodeDownPodDeletionPolicy": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingNodeDrainPolicy": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingOfflineReplicaRebuilding": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingOrphanAutoDeletion": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingPriorityClass": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingRegistrySecret": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingRemoveSnapshotsDuringFilesystemTrim": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaAutoBalance": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaSoftAntiAffinity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaZoneSoftAntiAffinity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingReplicaDiskSoftAntiAffinity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingRestoreVolumeRecurringJobs": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSnapshotDataIntegrity": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSnapshotDataIntegrityCronjob": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSnapshotDataIntegrityImmediateCheckAfterSnapshotCreation": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingStorageNetwork": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSystemManagedComponentsNodeSelector": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingSystemManagedPodsImagePullPolicy": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingTaintToleration": {
"dataType": "string",
"maxLen": 200
},
"longhornSettingV2DataEngine": {
"dataType": "string",
"maxLen": 200
}
},
"extraFieldInfoSchema": {
"longhornInstanceManagerAverageCpuUsageMilliCores": {
"dataType": "float"
},
"longhornInstanceManagerAverageMemoryUsageBytes": {
"dataType": "float"
},
"longhornManagerAverageCpuUsageMilliCores": {
"dataType": "float"
},
"longhornManagerAverageMemoryUsageBytes": {
"dataType": "float"
},
"longhornNamespaceUid": {
"dataType": "string",
"maxLen": 200
},
"longhornNodeCount": {
"dataType": "float"
},
"longhornNodeDiskHDDCount": {
"dataType": "float"
},
"longhornNodeDiskNVMeCount": {
"dataType": "float"
},
"longhornNodeDiskSSDCount": {
"dataType": "float"
},
"longhornSettingBackingImageCleanupWaitInterval": {
"dataType": "float"
},
"longhornSettingBackingImageRecoveryWaitInterval": {
"dataType": "float"
},
"longhornSettingBackupConcurrentLimit": {
"dataType": "float"
},
"longhornSettingBackupstorePollInterval": {
"dataType": "float"
},
"longhornSettingConcurrentAutomaticEngineUpgradePerNodeLimit": {
"dataType": "float"
},
"longhornSettingConcurrentReplicaRebuildPerNodeLimit": {
"dataType": "float"
},
"longhornSettingConcurrentVolumeBackupRestorePerNodeLimit": {
"dataType": "float"
},
"longhornSettingDefaultReplicaCount": {
"dataType": "float"
},
"longhornSettingEngineReplicaTimeout": {
"dataType": "float"
},
"longhornSettingFailedBackupTtl": {
"dataType": "float"
},
"longhornSettingGuaranteedInstanceManagerCpu": {
"dataType": "float"
},
"longhornSettingRecurringFailedJobsHistoryLimit": {
"dataType": "float"
},
"longhornSettingRecurringSuccessfulJobsHistoryLimit": {
"dataType": "float"
},
"longhornSettingReplicaFileSyncHttpClientTimeout": {
"dataType": "float"
},
"longhornSettingReplicaReplenishmentWaitInterval": {
"dataType": "float"
},
"longhornSettingRestoreConcurrentLimit": {
"dataType": "float"
},
"longhornSettingStorageMinimalAvailablePercentage": {
"dataType": "float"
},
"longhornSettingStorageOverProvisioningPercentage": {
"dataType": "float"
},
"longhornSettingStorageReservedPercentageForDefaultDisk": {
"dataType": "float"
},
"longhornSettingSupportBundleFailedHistoryLimit": {
"dataType": "float"
},
"longhornVolumeAccessModeRwoCount": {
"dataType": "float"
},
"longhornVolumeAccessModeRwxCount": {
"dataType": "float"
},
"longhornVolumeAccessModeUnknownCount": {
"dataType": "float"
},
"longhornVolumeAverageActualSizeBytes": {
"dataType": "float"
},
"longhornVolumeAverageNumberOfReplicas": {
"dataType": "float"
},
"longhornVolumeAverageSizeBytes": {
"dataType": "float"
},
"longhornVolumeAverageSnapshotCount": {
"dataType": "float"
},
"longhornVolumeDataLocalityBestEffortCount": {
"dataType": "float"
},
"longhornVolumeDataLocalityDisabledCount": {
"dataType": "float"
},
"longhornVolumeDataLocalityStrictLocalCount": {
"dataType": "float"
},
"longhornVolumeFrontendBlockdevCount": {
"dataType": "float"
},
"longhornVolumeFrontendIscsiCount": {
"dataType": "float"
},
"longhornVolumeOfflineReplicaRebuildingDisabledCount": {
"dataType": "float"
},
"longhornVolumeOfflineReplicaRebuildingEnabledCount": {
"dataType": "float"
},
"longhornVolumeReplicaAutoBalanceDisabledCount": {
"dataType": "float"
},
"longhornVolumeReplicaSoftAntiAffinityFalseCount": {
"dataType": "float"
},
"longhornVolumeReplicaZoneSoftAntiAffinityTrueCount": {
"dataType": "float"
},
"longhornVolumeReplicaDiskSoftAntiAffinityTrueCount": {
"dataType": "float"
},
"longhornVolumeRestoreVolumeRecurringJobFalseCount": {
"dataType": "float"
},
"longhornVolumeSnapshotDataIntegrityDisabledCount": {
"dataType": "float"
},
"longhornVolumeSnapshotDataIntegrityFastCheckCount": {
"dataType": "float"
},
"longhornVolumeUnmapMarkSnapChainRemovedFalseCount": {
"dataType": "float"
}
}
}
image:
repository: ${UPGRADE_RESPONDER_IMAGE_REPO}
tag: ${UPGRADE_RESPONDER_IMAGE_TAG}
EOF
git clone -b ${UPGRADE_RESPONDER_REPO_BRANCH} ${UPGRADE_RESPONDER_REPO}
helm upgrade --install ${APP_NAME}-upgrade-responder upgrade-responder/chart -f ${UPGRADE_RESPONDER_VALUE_YAML}
wait_for_deployment "${APP_NAME}-upgrade-responder"
}
output() {
local upgrade_responder_service_info=$(kubectl get svc/${APP_NAME}-upgrade-responder --no-headers)
local upgrade_responder_service_port=$(echo "${upgrade_responder_service_info}" | awk '{print $5}' | cut -d'/' -f1)
echo # a blank line to separate the installation outputs for better readability.
printf "[Upgrade Checker]\n"
printf "%-10s: http://${APP_NAME}-upgrade-responder.default.svc.cluster.local:${upgrade_responder_service_port}/v1/checkupgrade\n\n" "URL"
printf "[InfluxDB]\n"
printf "%-10s: ${INFLUXDB_URL}\n" "URL"
printf "%-10s: ${APP_NAME}_upgrade_responder\n" "Database"
printf "%-10s: root\n" "Username"
printf "%-10s: root\n\n" "Password"
local public_ip=$(curl -s https://ifconfig.me/ip)
local grafana_service_info=$(kubectl get svc/grafana --no-headers)
local grafana_service_port=$(echo "${grafana_service_info}" | awk '{print $5}' | cut -d':' -f2 | cut -d'/' -f1)
printf "[Grafana]\n"
printf "%-10s: http://${public_ip}:${grafana_service_port}\n" "Dashboard"
printf "%-10s: admin\n" "Username"
printf "%-10s: admin\n" "Password"
}
install_influxdb
install_upgrade_responder
install_grafana
output

View File

@ -0,0 +1,86 @@
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
        supplementalGroups:
          - 0
      containers:
        - name: grafana
          image: grafana/grafana:7.1.0
          imagePullPolicy: IfNotPresent
          env:
            - name: GF_INSTALL_PLUGINS
              value: "grafana-worldmap-panel"
          ports:
            - containerPort: 3000
              name: http-grafana
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /robots.txt
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 2
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 3000
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 250m
              memory: 750Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv
      volumes:
        - name: grafana-pv
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
    - port: 3000
      protocol: TCP
      targetPort: http-grafana
  selector:
    app: grafana
  sessionAffinity: None
  type: LoadBalancer

View File

@ -0,0 +1,90 @@
apiVersion: v1
kind: Secret
metadata:
  name: influxdb-creds
  namespace: default
type: Opaque
data:
  INFLUXDB_HOST: aW5mbHV4ZGI= # influxdb
  INFLUXDB_PASSWORD: cm9vdA== # root
  INFLUXDB_USERNAME: cm9vdA== # root
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb
  namespace: default
  labels:
    app: influxdb
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: influxdb
  name: influxdb
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: influxdb
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: influxdb
    spec:
      containers:
        - image: docker.io/influxdb:1.8.10
          imagePullPolicy: IfNotPresent
          name: influxdb
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          envFrom:
            - secretRef:
                name: influxdb-creds
          volumeMounts:
            - mountPath: /var/lib/influxdb
              name: var-lib-influxdb
      volumes:
        - name: var-lib-influxdb
          persistentVolumeClaim:
            claimName: influxdb
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: influxdb
  name: influxdb
  namespace: default
spec:
  ports:
    - port: 8086
      protocol: TCP
      targetPort: 8086
  selector:
    app: influxdb
  sessionAffinity: None
  type: ClusterIP

View File

@ -106,7 +106,7 @@ The life cycle of a snapshot CR is as below:
 1. **Create**
    1. When a snapshot CR is created, Longhorn mutation webhook will:
-      1. Add a volume label `longhornvolume: <VOLUME-NAME>` to the snapshot CR. This allow us to efficiently find snapshots corresponding to a volume without having listing potientially thoundsands of snapshots.
+      1. Add a volume label `longhornvolume: <VOLUME-NAME>` to the snapshot CR. This allow us to efficiently find snapshots corresponding to a volume without having listing potentially thoundsands of snapshots.
       1. Add `longhornFinalizerKey` to snapshot CR to prevent it from being removed before Longhorn has change to clean up the corresponding snapshot
       1. Populate the value for `snapshot.OwnerReferences` to uniquely identify the volume of this snapshot. This field contains the volume UID to uniquely identify the volume in case the old volume was deleted and a new volume was created with the same name.
    2. For user created snapshot CR, the field `Spec.CreateSnapshot` should be set to `true` indicating that Longhorn should provision a new snapshot for this CR.

View File

@ -51,7 +51,7 @@ https://github.com/longhorn/longhorn/issues/3546
 - Introduce a new gRPC server in Instance Manager.
-- Keep re-usable connections between Manager and Instance Managers.
+- Keep reusable connections between Manager and Instance Managers.
 - Allow Manager to fall back to engine binary call when communicating with old Instance Manager.

View File

@ -68,7 +68,7 @@ While the node where the share-manager pod is running is down, the share-manager
 │ │
 HTTP API ┌─────────────┴──────────────┐
 │ │ │
-│ │ endpint 1 │ endpoint N
+│ │ endpoint 1 │ endpoint N
 ┌──────────────────────┐ │ ┌─────────▼────────┐ ┌────────▼─────────┐
 │ share-manager pod │ │ │ recovery-backend │ │ recovery-backend │
 │ │ │ │ pod │ │ pod │

View File

@ -0,0 +1,181 @@
# Reimplement Longhorn Engine with SPDK
## Summary
The Storage Performance Development Kit [SPDK](https://spdk.io) provides a set of tools and C libraries for writing high performance, scalable, user-mode storage applications. It achieves high performance through the use of a number of key techniques:
* Moving all of the necessary drivers into userspace, which avoids syscalls and enables zero-copy access from the application.
* Polling hardware for completions instead of relying on interrupts, which lowers both total latency and latency variance.
* Avoiding all locks in the I/O path, instead relying on message passing.
SPDK has several features that allow it to perform tasks similar to what the `longhorn-engine` currently needs:
* [Block Device](https://spdk.io/doc/bdev.html) layer, often simply called bdev, intends to be equivalent to the operating system block storage layer that often sits immediately above the device drivers in a traditional kernel storage stack. SPDK also provides virtual bdev modules that create block devices on top of existing bdevs, for example Logical Volumes or RAID1.
* [Logical volumes](https://spdk.io/doc/logical_volumes.html) library is a flexible storage space management system. It allows creating and managing virtual block devices with variable size on top of other bdevs. The SPDK Logical Volume library is built on top of [Blobstore](https://spdk.io/doc/blob.html), which is a persistent, power-fail safe block allocator designed to be used as the local storage system backing a higher level storage service, typically in lieu of a traditional filesystem. Logical volumes offer features such as thin provisioning and snapshots, similar to what the current `longhorn-engine` provides.
* [NVMe over Fabrics](https://spdk.io/doc/nvmf.html) is a feature that presents block devices over a fabric such as Ethernet, supporting RDMA and TCP transports. The standard Linux kernel initiators for NVMe-oF interoperate with these SPDK NVMe-oF targets, so with this feature we can serve bdevs over the network or to other processes.
## Motivation
These are the reasons that have driven us:
* Use SPDK to improve performance of Longhorn
* Use SPDK functionality to improve reliability and robustness
* Use SPDK to take advantage of the new features that are continuously added to the framework
### Goals
* Implement all current `longhorn-engine` functionalities
* Continue to support multiple `longhorn-engine` versions concurrently
* Maintain, as much as possible, the same user experience between Longhorn with and without SPDK
* Lay the groundwork for extending Longhorn to sharding and aggregation of storage devices
## Proposal
SPDK implements a JSON-RPC 2.0 server to allow external management tools to dynamically configure SPDK components ([documentation](https://spdk.io/doc/jsonrpc.html)).
Our aim is to create an external orchestrator that, through JSON-RPC calls to multiple instances of the `spdk_tgt` app running on different machines, manages the durability and reliability of data. Not all of the functionality needed for this is currently available in SPDK, so some new JSON-RPC commands will be developed on top of SPDK. This orchestrator is implemented in the longhorn manager pods and will use a new process, called `longhorn-spdk-engine` in continuity with the current `longhorn-engine`, to talk to `spdk_tgt`.
* The main purpose of `longhorn-spdk-engine` is to create logical volumes on multiple replica nodes (one of them likely local) and export them via NVMe-oF, attach to these volumes on a controller node, use the resulting bdevs to create a RAID1 bdev, and export it via NVMe-oF locally. At this point the NVMe kernel module can be used to connect to this NVMe-oF subsystem and so create a block device `/dev/nvmeXnY` to be used by the Longhorn CSI driver. In this way, data written to this block device is stored on multiple replicas.
* The diagram below shows the control plane of the proposal: ![SPDK New Architecture](./image/spdk-control-plane.png)
* In release 23.01, support for ublk will be added to SPDK: with this functionality we can directly create a block device without using the NVMe layer on Linux kernel versions >6.0. This will be a significant improvement over using NVMe-oF locally.
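As a rough illustration of the flow described above, the following Python sketch only builds the JSON-RPC requests that such an engine might send; nothing is transmitted. The method names come from SPDK's JSON-RPC documentation, but exact parameter names differ between SPDK releases (for example the size argument of `bdev_lvol_create`), so they should be treated as assumptions. The NQN scheme, the lvstore name and the controller naming are hypothetical, and transport setup (`nvmf_create_transport`) and the final local export of the RAID1 bdev are omitted.

```python
from itertools import count

_ids = count(1)


def rpc(method, **params):
    """Build one JSON-RPC 2.0 request object (nothing is sent)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def nqn_for(volume, replica):
    # Hypothetical naming scheme, not something Longhorn or SPDK prescribes.
    return f"nqn.2023-01.io.longhorn.example:{volume}-r{replica}"


def replica_plan(volume, replica, lvstore, size_mib, addr, port="4420"):
    """Requests for one replica node: create the lvol and export it over NVMe-oF/TCP."""
    nqn = nqn_for(volume, replica)
    listen = {"trtype": "TCP", "adrfam": "IPv4", "traddr": addr, "trsvcid": port}
    return [
        # Parameter names below are assumptions; check them against your SPDK release.
        rpc("bdev_lvol_create", lvs_name=lvstore, lvol_name=volume,
            size_in_mib=size_mib, thin_provision=True),
        rpc("nvmf_create_subsystem", nqn=nqn, allow_any_host=True),
        rpc("nvmf_subsystem_add_ns", nqn=nqn,
            namespace={"bdev_name": f"{lvstore}/{volume}"}),
        rpc("nvmf_subsystem_add_listener", nqn=nqn, listen_address=listen),
    ]


def controller_plan(volume, replica_addrs, port="4420"):
    """Requests for the controller node: attach every replica, then mirror them."""
    plan, base_bdevs = [], []
    for i, addr in enumerate(replica_addrs):
        name = f"{volume}-r{i}"
        plan.append(rpc("bdev_nvme_attach_controller", name=name, trtype="TCP",
                        adrfam="IPv4", traddr=addr, trsvcid=port,
                        subnqn=nqn_for(volume, i)))
        base_bdevs.append(f"{name}n1")  # bdev of the controller's first namespace
    plan.append(rpc("bdev_raid_create", name=f"{volume}-raid1",
                    raid_level="raid1", base_bdevs=base_bdevs))
    return plan
```

Keeping the plan as plain data makes it easy to log, review, or replay the requests against each `spdk_tgt` instance in order.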
The `longhorn-spdk-engine` will be responsible for all other control operations, for example creating snapshots across all replicas of the same volume. Other functionality orchestrated through the engine includes remote rebuild (a complete rebuild of the entire snapshot stack of a volume, needed to add or repair a replica) and backup and restore (export/import of SPDK logical volumes to/from sparse files stored on an external storage system via S3).
The `longhorn-spdk-engine` will be developed in Go, so we may be able to reuse some code from `longhorn-engine`, for example the gRPC handling used to receive control commands and the error handling during snapshot/backup/restore operations.
As for the data plane, below is a comparison between the current architecture and the new design:
* longhorn-engine ![](./image/engine-data-plane.png)
* spdk_tgt
![](./image/spdk-data-plane.png)
## Design
### Implementation Overview
Currently there is one `longhorn-engine` controller and several `longhorn-engine` replicas for every volume to manage. All these instances are started and controlled by the `instance-manager`, so on every node belonging to the cluster we have one instance of `instance-manager` and multiple instances of `longhorn-engine`. Every volume is stored in a sequence of sparse files representing the live data and the snapshots. With SPDK the situation is different, because `spdk_tgt` can take control of an entire disk, so on every node we will have a single instance of SPDK that handles all the volumes created by Longhorn.
To orchestrate SPDK instances running on different nodes so that they make up a set of replicas, we will introduce, as discussed before, the `longhorn-spdk-engine`; to make volume management lighter we will have one instance of the engine per volume. `longhorn-spdk-engine` will implement the current gRPC interface used by `longhorn-engine` to talk with `instance-manager`, so the latter will become the portal through which `longhorn-manager` communicates with the different data planes.
`spdk_tgt` by default starts with a single thread, but it can be configured to use multiple threads: we can have one thread per available CPU core. This increases performance but comes at the cost of high CPU utilization. Because SPDK works in polling mode rather than interrupt mode, a polling thread always keeps its CPU core at 100% utilization, even with no workload to handle. This could be a problem, so we can configure `spdk_tgt` with the dynamic scheduler: in this way, if no workload is present, only one core will be used and only one thread will continue polling. The other threads will be put in an idle state and will become active again only when needed. Moreover, the dynamic scheduler has a way to reduce the CPU frequency. (See the Future Work section.)
### Snapshots
When `longhorn-spdk-engine` receives a snapshot request from `instance-manager`, all I/O operations on the volume's block device `/dev/nvmeXnY` must be stopped before proceeding, to ensure that the snapshots on all replicas contain the same data.
Currently there is no way to suspend the I/O operations on a block device, so we will have to implement this feature in SPDK. However, the RAID bdev already has some private functions to suspend I/O (they will be used, for example, when removing a base bdev), which we may be able to use and improve. These functions currently enqueue all the I/O operations received during the suspend time.
Once it has received a snapshot request, `longhorn-spdk-engine` will call the JSON-RPC to take a snapshot of the local replica of the volume involved. The snapshot RPC command will freeze all I/O on the logical volume to be snapshotted, so all pending I/O will be executed before the snapshot.
SPDK logical volumes have a couple of features that we will use:
* clone, used to create new logical volume based on a snapshot. It can be used to revert a volume to a snapshot too, cloning a new volume, deleting the old one and then renaming the new one as the old one
* decouple, feature that can be used to delete a snapshot, first decoupling the child volume from this snapshot and then deleting the snapshot.
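The two sequences above can be sketched as ordered lists of JSON-RPC calls. This is only an illustration: the method names (`bdev_lvol_clone`, `bdev_lvol_delete`, `bdev_lvol_rename`, `bdev_lvol_decouple_parent`) are taken from SPDK's JSON-RPC documentation, while the temporary clone name and the `lvstore/lvol` naming are assumptions, and error handling and I/O suspension are left out.

```python
def revert_plan(lvstore, volume, snapshot):
    """Revert `volume` to `snapshot`: clone the snapshot, drop the old volume, rename."""
    tmp = f"{volume}-revert"  # hypothetical temporary clone name
    return [
        ("bdev_lvol_clone", {"snapshot_name": f"{lvstore}/{snapshot}", "clone_name": tmp}),
        ("bdev_lvol_delete", {"name": f"{lvstore}/{volume}"}),
        ("bdev_lvol_rename", {"old_name": f"{lvstore}/{tmp}", "new_name": volume}),
    ]


def delete_snapshot_plan(lvstore, child, snapshot):
    """Delete `snapshot`: decouple its child from it first, then delete it."""
    return [
        ("bdev_lvol_decouple_parent", {"name": f"{lvstore}/{child}"}),
        ("bdev_lvol_delete", {"name": f"{lvstore}/{snapshot}"}),
    ]
```

The order matters in both cases: the clone must exist before the old volume is deleted, and the child must be decoupled before its parent snapshot can be removed.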
### Replica rebuild
RAID replica rebuild is currently under development, so we don't know exactly how it will be implemented, but we can suppose that we will not use it because presumably it will work only at the bdev layer.
When a new replica has to be added or a replica has to be rebuilt, we have to recreate the entire snapshot stack of each volume hosted on that node. Currently SPDK has nothing to do that, but after discussing with the core maintainers we arranged a procedure. Let's walk through an example.
Suppose we have to rebuild a volume with two layers of snapshots, where snapshotA is the older and snapshotB the newer. Basically we have to (in _italic_ what is missing):
* create a new volume on the node to be rebuilt
* export this volume via NVMe-oF
* attach to this volume in the node where we have the source data
* _copy snapshotA over the attached volume_
* perform a snapshot over the exported volume
* repeat the copy and snapshot operations for snapshotB
What we have to implement is a JSON-RPC to copy a logical volume over an arbitrary bdev (that in our case will represent a remote volume exported via NVMe-oF and locally attached) _while the top layer is also being modified_ (see next section).
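The ordering of the snapshot stack rebuild can be sketched as a small plan generator. The step names here are descriptive labels invented for illustration, not SPDK RPC methods; in particular `copy_snapshot` stands for the copy operation that the text identifies as missing from SPDK.

```python
def rebuild_stack_plan(volume, snapshots_oldest_first):
    """Return the ordered steps to rebuild a snapshot stack on a new replica."""
    steps = [
        ("create_volume", volume),   # on the node to be rebuilt
        ("export_nvmf", volume),     # expose the new volume via NVMe-oF
        ("attach_on_source", volume),  # attach it on the node holding the source data
    ]
    for snap in snapshots_oldest_first:
        # Hypothetical step: copy one snapshot layer over the attached remote bdev.
        steps.append(("copy_snapshot", snap))
        # Freeze the copied layer on the remote volume under the same name.
        steps.append(("snapshot_remote", snap))
    return steps
```

Processing the snapshots oldest first means each remote snapshot is taken on top of the layers already copied, so the rebuilt stack has the same parent/child order as the source.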
So, in this way we can rebuild the snapshot stack of a volume. But what about the live data? The current `longhorn-engine` performs the replica rebuild in a "hot" way, i.e., during the rebuilding phase it keeps writing to the live data of the new volume. So, how can we reproduce this with SPDK? First of all we have to wait for the conclusion of the RAID1 bdev's review process to see what kind of replica rebuild will be implemented. But, supposing that the rebuild feature will not be useful for us, we will need to create a couple of additional JSON-RPCs on top of SPDK to implement the following procedure (in _italic_ what is missing):
* create a new volume over the node to be rebuilt
* export this volume via NVMe-oF
* attach to this volume in the node where we have the RAID1
* _add the bdev of the attached volume to the RAID1 bdev excluded from the read balancing_
* wait for the snapshot stack rebuilding to finish
* _change the upper volume of the snapshot stack from the current to this one with the live data_
* _enable the bdev of the attached volume for RAID1 read balancing_
What we have at the end of the rebuilding phase is a snapshot stack with an empty volume at the top, while in the RAID1 we have a volume with the live data but without any snapshot. So we have to join these two stacks by exchanging the upper volume, and to do that we need a new JSON-RPC. We will also need to implement the JSON-RPC to enable/disable a bdev in the RAID1 read balancing.
### Backup and Restore
Backup will be implemented by exporting a volume to a sparse file and then saving this file to external storage via S3. SPDK already has a `spdk_dd` application that can copy a bdev to a file, and this app has an option to preserve bdev sparseness. But using `spdk_dd` has some problems: currently the sparse option works only with a bdev that represents a local logical volume, not one exported via NVMe-oF. So to back up a volume we cannot export it to a remote node and work there; we need to work on the node where the source data resides. But in this way, to perform a backup, we would need to stop the `spdk_tgt` app, run `spdk_dd` and then restart `spdk_tgt`. This is needed because it may not be safe to run multiple SPDK applications over the same disk (even if `spdk_dd` only reads from a read-only volume), and moreover `spdk_dd` may not see the volume to export if it was created after the last restart of the `spdk_tgt` app. This is because blobstore metadata, and therefore newly created logical volumes, are saved to disk only on application exit.
Stopping `spdk_tgt` is not acceptable because it would suspend operations on all other volumes hosted on this node, so to solve these problems we have two possible solutions:
* create a JSON-RPC command to export logical volume to a sparse file, so that we can make the operation directly over the `spdk_tgt` app
* create a custom NVMe-oF command to implement the seek_data and seek_hole functionalities of bdev used by `spdk_dd` to skip holes
With the second solution we could export the volume via NVMe-oF to a dedicated node on which to perform the backup with the `spdk_dd` application.
The restore operation can be done in a couple of ways:
* read the backup sparse file and write its content into the longhorn block device. In this way data will be fully replicated
* clone from backup over each replica, importing the backup sparse file into a new thinly provisioned logical volume. We can perform this operation on the local node, owner of the new volume, if for the backup process we choose to develop a JSON-RPC to export/import logical volumes to/from sparse files. Otherwise we can do it on a dedicated node with the `spdk_dd` application, which handles sparse files with the SEEK_HOLE and SEEK_DATA functionalities of `lseek`.
If we leverage the same backup & restore mechanism as `longhorn-engine`, we can restore a backup made by the current engine to a SPDK volume.
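To make the `SEEK_DATA`/`SEEK_HOLE` idea concrete, here is a minimal sketch of how a tool can enumerate only the data regions of a sparse file, which is the behavior an import/export of sparse backups relies on. It is not taken from `spdk_dd`; it assumes a platform that exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (for example Linux), and on filesystems without hole support the whole file is simply reported as one data extent.

```python
import errno
import os


def data_extents(path):
    """Yield (offset, length) for each data region of `path`, skipping holes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            # There is always an implicit hole at end of file, so this succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            yield start, end - start
            offset = end
    finally:
        os.close(fd)
```

A restore into a thinly provisioned volume would then copy only these extents and leave the remaining ranges unallocated.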
### Remote Control
The JSON-RPC API by default is only available over the `/var/tmp/spdk.sock` Unix domain socket, but SPDK offers the sample Python script [rpc_http_proxy](https://spdk.io/doc/jsonrpc_proxy.html), which provides an HTTP server that listens for JSON objects from users. Otherwise we could use the `socat` application to forward requests received from an IP socket towards a Unix socket. Both `socat` and `rpc_http_proxy` can perform user authentication with password.
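For reference, talking to the default Unix domain socket directly needs very little code. The sketch below is an illustrative client, not SPDK's own `rpc.py`: the function name `spdk_rpc` is made up, it sends a single request per connection, and it reads until one complete JSON document has arrived.

```python
import json
import socket


def spdk_rpc(method, params=None, sock_path="/var/tmp/spdk.sock", timeout=5.0):
    """Send one JSON-RPC 2.0 request over a Unix socket and return its result."""
    request = {"jsonrpc": "2.0", "id": 1, "method": method}
    if params:
        request["params"] = params
    decoder = json.JSONDecoder()
    buf = b""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(json.dumps(request).encode())
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("socket closed before a full response arrived")
            buf += chunk
            try:
                response, _ = decoder.raw_decode(buf.decode())
                break
            except ValueError:
                continue  # response is still incomplete, keep reading
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["result"]
```

For example, `spdk_rpc("bdev_get_bdevs")` would list the block devices known to a running `spdk_tgt`; the same request body could be posted through `rpc_http_proxy` or a `socat` forwarder when remote access is needed.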
### Upgrade Strategy
What kind of upgrade/migration will we support?
For out-of-cluster migration we can use the Restore procedure to create SPDK logical volumes starting from existing Longhorn files. For in-cluster migration, instead, we can retain read support for the old format while writing new data over SPDK.
As for `spdk_tgt` updates, we can use a rolling update strategy, updating nodes one by one. Stopping `spdk_tgt` on a node will cause:
* a stop of all the volumes controlled on the node. To avoid service interruption the node must be evacuated before the update. One trick is to delay the update until the node has to be rebooted for a kernel update.
* a stop of all the replicas hosted on the node. This is not a problem because during the update the I/O will be redirected towards the other replicas of the volume. To make a clean update of a node, before stopping `spdk_tgt` we have to notify all the nodes that have a bdev imported via NVMe-oF from this node to detach the controllers involved.
Moreover this is a good time to introduce backup versioning, which allows us to change/improve the backup format [REF: GH3175](https://github.com/longhorn/longhorn/issues/3175)
### Future Work
* For Edge use cases, energy efficiency is important. We may need further enhancements and an interrupt-driven mode during low load periods for the scheduler. [Here](https://www.snia.org/educational-library/spdk-schedulers-%E2%80%93-saving-cpu-cores-polled-mode-storage-application-2021) is an introduction to SPDK Schedulers that briefly describes the interrupt mode.
### Roadmap
For Longhorn 1.5, we need to have the below capabilities:
* replica (RAID1)
* snapshot (create, delete/purge, revert)
* replica rebuilding
* volume clone
For 1.6, we need the rest of the feature parity functions:
* volume backup & restore
* DR volume restore (incremental restore from another volume backup)
* volume encryption
* create volume from the backing image
* create backing image from volume  
* volume expansion
* volume trim
* volume metrics (bandwidth, latency, IOPS)
* volume data integrity (snapshot checksum)
SPDK uses a quarterly release cycle; the next release will be 23.01 (January 2023). Assuming the current RAID1 implementation will be available in the 23.01 release, the JSON-RPCs we currently need to implement on top of SPDK are:
* suspend I/O operation
* copy a snapshot over an arbitrary bdev
* add bdev to raid1 in read balancing disabled mode
* enable/disable bdev in raid1 read balancing
* export/import file to/from bdev or implement seek_data/hole in NVMe-oF
The first development is necessary for snapshot, the last one for backup/restore and the other three developments are necessary for replica rebuilding.
The snapshot copy has already been discussed with SPDK core maintainers, so an upstream development can be made.
### Limitations
The current RAID1 implementation is not yet complete, so for now we have some limitations:
* read balancing has been developed but is still under review, so it is available only in SPDK Gerrit
* replica rebuild is still under development, so it isn't available. As a consequence, RAID1 currently lacks the functionality to add a new base bdev to an existing RAID1 bdev.

View File

@ -0,0 +1,126 @@
# Recurring Snapshot Cleanup
## Summary
Currently, Longhorn's recurring job automatically cleans up older snapshots of volumes to retain no more than the defined snapshot number. However, this is limited to the snapshot created by the recurring job. For the non-recurring volume snapshots or snapshots created by backups, the user needs to clean them manually.
Having periodic snapshot cleanup could help to delete/purge those extra snapshots regardless of the creation method.
### Related Issues
https://github.com/longhorn/longhorn/issues/3836
## Motivation
### Goals
Introduce new recurring job types:
- `snapshot-delete`: periodically remove and purge all kinds of snapshots that exceed the retention count.
- `snapshot-cleanup`: periodically purge removable or system snapshots.
### Non-goals [optional]
`None`
## Proposal
- Introduce two new `RecurringJobType`:
  - snapshot-delete
  - snapshot-cleanup
- Recurring job periodically deletes and purges the snapshots for RecurringJob using the `snapshot-delete` task type. Longhorn will retain snapshots based on the given retain number.
- Recurring job periodically purges the snapshots for RecurringJob using the `snapshot-cleanup` task type.
### User Stories
- The user can create a RecurringJob with `spec.task=snapshot-delete` to instruct Longhorn to periodically delete and purge snapshots.
- The user can create a RecurringJob with `spec.task=snapshot-cleanup` to instruct Longhorn to periodically purge removable or system snapshots.
### User Experience In Detail
#### Recurring Snapshot Deletion
1. Have some volume backups and snapshots.
1. Create RecurringJob with the `snapshot-delete` task type.
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: recurring-snap-delete-per-min
  namespace: longhorn-system
spec:
  concurrency: 1
  cron: '* * * * *'
  groups: []
  labels: {}
  name: recurring-snap-delete-per-min
  retain: 2
  task: snapshot-delete
```
1. Assign the RecurringJob to volume.
1. Longhorn deletes all expired snapshots. As a result of the above example, the user will see two snapshots after the job completes.
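The retention rule in this example (keep the newest `retain` snapshots, delete the rest) can be sketched as follows. This is only an illustration of the rule as described here; the `Snapshot` type and `expired_snapshots` helper are hypothetical and the real `listSnapshotNamesForCleanup` logic in longhorn-manager may treat some snapshots differently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Snapshot:
    name: str
    created: datetime


def expired_snapshots(snapshots, retain):
    """Return the names of the oldest snapshots that exceed the retain count."""
    ordered = sorted(snapshots, key=lambda s: s.created)  # oldest first
    excess = max(len(ordered) - retain, 0)
    return [s.name for s in ordered[:excess]]
```

With `retain: 2` and four snapshots on the volume, the two oldest are selected for deletion and two remain after the job completes.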
#### Recurring Snapshot Cleanup
1. Have some system snapshots.
1. Create RecurringJob with the `snapshot-cleanup` task type.
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: recurring-snap-cleanup-per-min
  namespace: longhorn-system
spec:
  concurrency: 1
  cron: '* * * * *'
  groups: []
  labels: {}
  name: recurring-snap-cleanup-per-min
  task: snapshot-cleanup
```
1. Assign the RecurringJob to volume.
1. Longhorn deletes all expired system snapshots. As a result of the above example, the user will see 0 system snapshots after the job completes.
### API changes
`None`
## Design
### Implementation Overview
#### The RecurringJob `snapshot-delete` Task Type
1. List all expired snapshots (similar to the current `listSnapshotNamesForCleanup` implementation), and use as the [cleanupSnapshotNames](https://github.com/longhorn/longhorn-manager/blob/d20e1ca6e04b229b9823c1a941d865929007874c/app/recurring_job.go#L418) in `doSnapshotCleanup`.
1. Continue with the current implementation to purge snapshots.
#### The RecurringJob `snapshot-cleanup` Task Type
1. Do snapshot purge only in `doSnapshotCleanup`.
### RecurringJob Mutate
1. Mutate the `RecurringJob.Spec.Retain` to 0 when the task type is `snapshot-cleanup`, since the retain value has no effect on the purge.
### Test plan
#### Test Recurring Snapshot Delete
1. Create volume.
1. Create 2 volume backups.
1. Create 2 volume snapshots.
1. Create a snapshot RecurringJob with the `snapshot-delete` task type.
1. Assign the RecurringJob to volume.
1. Wait until the recurring job is completed.
1. Should see the number of snapshots matching the Recurring job `spec.retain`.
#### Test Recurring Snapshot Cleanup
1. Create volume.
1. Create 2 volume system snapshots, e.g., by deleting a replica or performing an online expansion.
1. Create a snapshot RecurringJob with the `snapshot-cleanup` task type.
1. Assign the RecurringJob to volume.
1. Wait until the recurring job is completed.
1. Should see the volume has 0 system snapshots.
### Upgrade strategy
`None`
## Note [optional]
`None`

View File

@ -0,0 +1,167 @@
# Improve Backup and Restore Efficiency using Multiple Threads and Faster Compression Methods
## Summary
Longhorn is capable of backing up or restoring volume in multiple threads and using more efficient compression methods for improving Recovery Time Objective (RTO).
### Related Issues
[https://github.com/longhorn/longhorn/issues/5189](https://github.com/longhorn/longhorn/issues/5189)
## Motivation
### Goals
- Support multi-threaded volume backup and restore.
- Support efficient compression algorithm (`lz4`) and disable compression.
- Support backward compatibility of existing backups compressed by `gzip`.
### Non-goals
- Larger backup block size helps improve the backup efficiency more and decrease the block lookup operations. In the enhancement, the adaptive large backup block size is not supported and will be handled in https://github.com/longhorn/longhorn/issues/5215.
## Proposal
1. Introduce multi-threaded volume backup and restore. The number of backup and restore threads is configurable by users.
2. Introduce efficient compression methods. By default, the compression method is `lz4`, and user can globally change it to `none` or `gzip`. Additionally, the per-volume compression method can be customized.
3. Existing backups compressed by `gzip` will not be impacted.
## User Stories
Longhorn supports the backup and restore of volumes. Although the underlying computing and storage are powerful, the single-threaded implementation and the low-efficiency `gzip` compression method lead to slower backup and restore times and poor RTO. The enhancement aims to increase backup and restore efficiency through the use of multiple threads and efficient compression methods. The new parameters can be configured to accommodate a variety of applications and platforms, such as limiting the number of threads in an edge device or disabling compression for multimedia data.
### User Experience In Details
- For existing volumes that already have backups, the compression method remains `gzip` for backward compatibility. Multi-threaded backups and restores are supported for subsequent backups.
- By default, the global backup compression method is set to `lz4`. By editing the global setting `backup-compression-method`, users can configure the compression method to `none` or `gzip`. The backup compression method can be customized per volume by editing `volume.spec.backupCompressionMethod` for different data format in the volume.
- Number of backup threads per backup is configurable by the global setting `backup-concurrent-limit`.
- Number of restore threads per backup is configurable by the global setting `restore-concurrent-limit`.
- Changing the compression method of a volume having backups is not supported.
### CLI Changes
- Add `compression-method` to longhorn-engine binary `backup create` command.
- Add `concurrent-limit` to longhorn-engine binary `backup create` command.
- Add `concurrent-limit` to longhorn-engine binary `backup restore` command.
### API Changes
- engine-proxy
  - Add `compressionMethod` and `concurrentLimit` to `EngineSnapshotBackup` method.
  - Add `concurrentLimit` to `EngineBackupRestore` method.
- syncagent
  - Add `compressionMethod` and `concurrentLimit` to syncagent `BackupCreate` method.
  - Add `concurrentLimit` to syncagent `BackupRestore` method.
## Design
### Implementation Overview
#### Global Settings
- backup-compression-method
  - This setting allows users to specify the backup compression method.
  - Options:
    - `none`: Disable compression. Suitable for multimedia data such as encoded images and videos.
    - `lz4`: Suitable for text files.
    - `gzip`: Slightly higher compression ratio but relatively slow. Not recommended.
  - Default: lz4
- backup-concurrent-limit
  - This setting controls how many worker threads run concurrently per backup job.
  - Default: 5
- restore-concurrent-limit
  - This setting controls how many worker threads run concurrently per restore job.
  - Default: 5
#### CRDs
1. Introduce `volume.spec.backupCompressionMethod`
2. Introduce `backup.status.compressionMethod`
#### Backup
A producer-consumer pattern is used to achieve multi-threaded backups. In this implementation, there are one producer and multiple consumers which is controlled by the global setting `backup-concurrent-limit`.
- Producer
- Open the disk file to be backed up and create a `Block` channel.
- Iterate the blocks in the disk file.
- Skip sparse blocks.
- Send the data blocks information including offset and size to `Block` channel.
- Close the `Block` channel after finishing the iteration.
- Consumers
- Block handling goroutines (consumers) are created and consume blocks from the `Block` channel.
- Processing blocks
- Calculate the checksum of the incoming block.
- Check the in-memory `processingBlocks` map to determine whether the block is being processed.
- If YES, append the block to `Blocks`, which records the blocks processed in the backup.
- If NO, check the remote backupstore to determine whether the block exists.
- If YES, append the block to `Blocks`.
- If NO, compress the block, upload it, and then append it to `Blocks`.
- After the blocks have been consumed and the `Block` channel has been closed, the goroutines are terminated.
Then, update the volume and backup metadata files in remote backupstore.
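The backup flow above can be sketched as follows. This is a minimal illustration in Python, with threads and a `queue.Queue` standing in for goroutines and the `Block` channel. The names `backup`, `store`, and `BLOCK_SIZE` are hypothetical, SHA-256 is an assumed checksum, sparse blocks are approximated as all-zero blocks, and the remote backupstore is modelled as a plain dictionary keyed by checksum.

```python
import hashlib
import queue
import threading

BLOCK_SIZE = 4  # tiny block size for the demo only; real blocks are far larger


def backup(data: bytes, store: dict, workers: int = 5) -> list:
    """Back up `data` into `store` (checksum -> block) and return the block list."""
    block_q = queue.Queue()   # stands in for the `Block` channel
    done = object()           # sentinel that plays the role of closing the channel
    lock = threading.Lock()
    processing = set()        # checksums currently being uploaded
    blocks = []               # (offset, checksum) records for this backup

    def producer():
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            if not any(chunk):            # skip sparse (all-zero) blocks
                continue
            block_q.put((offset, len(chunk)))
        for _ in range(workers):          # one sentinel per consumer
            block_q.put(done)

    def consumer():
        while True:
            item = block_q.get()
            if item is done:
                return
            offset, size = item
            chunk = data[offset:offset + size]
            checksum = hashlib.sha256(chunk).hexdigest()
            with lock:
                # Upload only if nobody is uploading it and the store lacks it.
                upload = checksum not in processing and checksum not in store
                if upload:
                    processing.add(checksum)
            if upload:
                store[checksum] = chunk   # compression and upload would happen here
                with lock:
                    processing.discard(checksum)
            with lock:
                blocks.append((offset, checksum))

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    threads.append(threading.Thread(target=producer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(blocks)
```

Calling `backup(b"AAAA" + b"\x00" * 4 + b"BBBB" + b"AAAA", {}, workers=3)` records three blocks but stores only two, because the repeated `AAAA` block is deduplicated by checksum.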
#### Restore
A producer-consumer pattern is used to achieve multi-threaded restores. In this implementation, there is one producer and multiple consumers; the number of consumers is controlled by the global setting `restore-concurrent-limit`.
- Producer
- Create a `Block` channel.
- Open the backup metadata file and get the offset, size, and checksum of each block.
- Iterate the blocks and send the block information to the `Block` channel.
- Close the `Block` channel after finishing the iteration.
- Consumers
- Block handling goroutines (consumers) are created and consume blocks from the `Block` channel.
- It is necessary for each consumer to open the disk file in order to avoid race conditions between the seek and write operations.
- Read the block data from the backupstore, verify the data integrity and write to the disk file.
- After the blocks have been consumed and the `Block` channel has been closed, the goroutines are terminated.
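The restore flow above can be sketched in the same style. Each consumer opens its own handle on the disk file, so the seek and write of one consumer cannot interleave with those of another. The function name `restore`, the dictionary-based `store`, and the SHA-256 checksum are assumptions made for illustration only.

```python
import hashlib
import queue
import threading


def restore(block_list, store, path, size, workers=5):
    """Write the blocks in `block_list` [(offset, checksum)] from `store` to `path`."""
    with open(path, "wb") as f:
        f.truncate(size)                  # pre-size the file; unwritten ranges stay zero

    block_q = queue.Queue()               # stands in for the `Block` channel
    done = object()                       # sentinel that plays the role of closing the channel
    errors = []

    def consumer():
        # One unbuffered handle per consumer, so each has its own file position.
        with open(path, "r+b", buffering=0) as f:
            while True:
                item = block_q.get()
                if item is done:
                    return
                offset, checksum = item
                chunk = store[checksum]   # read the block from the backupstore
                if hashlib.sha256(chunk).hexdigest() != checksum:
                    errors.append(offset) # integrity check failed
                    continue
                f.seek(offset)
                f.write(chunk)

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in block_list:               # the producer role is played by the caller's thread
        block_q.put(item)
    for _ in range(workers):
        block_q.put(done)
    for t in threads:
        t.join()
    if errors:
        raise ValueError(f"checksum mismatch at offsets {sorted(errors)}")
```

Sharing one file handle between consumers would require a lock around every seek-and-write pair; separate handles avoid that contention.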
### Performance Benchmark
In summary, backup throughput increases by 15X with `lz4` and `10` concurrent threads compared with the backup in Longhorn v1.4.0. Restore throughput (to a volume with 3 replicas) increases by 140%, and it is limited by the I/O bound of the backupstore server.
#### Setup
| | |
|---|---|
| Platform | Equinix |
| Host | Japan-Tokyo/m3.small.x86 |
| CPU | Intel(R) Xeon(R) E-2378G CPU @ 2.80GHz |
| RAM | 64 GiB |
| Disk | Micron_5300_MTFD |
| OS | Ubuntu 22.04.1 LTS(kernel 5.15.0-53-generic) |
| Kubernetes | v1.23.6+rke2r2 |
| Longhorn | master-branch + backup improvement |
| Nodes | 3 nodes |
| Backupstore target | external MinIO S3 (m3.small.x86) |
| Volume | 50 GiB containing 1GB filesystem metadata and 10 GiB random data (3 replicas) |
#### Results
- Single-Threaded Backup and Restore by Different Compression Methods
![Single-Threaded Backup and Restore by Different Compression Methods](image/backup_perf/compression-methods.png)
- Multi-Threaded Backup
![Multi-Threaded Backup](image/backup_perf/multi-threaded-backup.png)
- Multi-Threaded Restore to One Volume with 3 Replicas
![Multi-Threaded Restore to One Volume with 3 Replicas](image/backup_perf/multi-threaded-restore-to-volume-3-replicas.png)
Restore hits the I/O bound of the backupstore server: the throughput is already saturated at 5 worker threads.
- Multi-Threaded Restore to One Volume with 1 Replica
![Multi-Threaded Restore to One Volume with 1 Replica](image/backup_perf/multi-threaded-restore-to-volume-1-replica.png)
## Test Plan
### Integration Tests
1. Create volumes and then create backups using each compression method (`none`, `lz4`, or `gzip`) and different numbers of backup threads. The backups should succeed.
2. Restore the backups created in step 1 using different numbers of restore threads. Verify the data integrity of the disk files.


@ -0,0 +1,70 @@
# SMB/CIFS Backup Store Support
## Summary
Longhorn supports SMB/CIFS share as a backup storage.
### Related Issues
https://github.com/longhorn/longhorn/issues/3599
## Motivation
### Goals
- Support SMB/CIFS share as a backup storage.
## Proposal
- Introduce SMB/CIFS client for supporting SMB/CIFS as a backup storage.
## User Stories
Longhorn already supports NFSv4 and S3 servers as backup storage. However, some users may encounter compatibility issues with their backup servers, particularly those running on Windows, because the NFSv4 and S3 protocols are not always supported. To address this, the enhancement extends the backup storage options with the commonly used SMB/CIFS protocol, which is compatible with both Linux and Windows-based servers.
### User Experience In Details
- Check each Longhorn node's kernel supports the CIFS filesystem by
```
cat /boot/config-`uname -r` | grep CONFIG_CIFS
```
- Install the CIFS filesystem user-space tools `cifs-utils` on each Longhorn node
- Users can configure a SMB/CIFS share as a backup storage
- Set **Backup Target**. The path to a SMB/CIFS share is like
```bash
cifs://${IP address}/${share name}
```
- Set **Backup Target Credential Secret**
- Create a secret and deploy it
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cifs-secret
  namespace: longhorn-system
type: Opaque
data:
  CIFS_USERNAME: ${CIFS_USERNAME}
  CIFS_PASSWORD: ${CIFS_PASSWORD}
```
- Set the setting **Backup Target Credential Secret** to `cifs-secret`
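Values under the `data` field of a Kubernetes Secret must be base64-encoded, so the `${CIFS_USERNAME}` and `${CIFS_PASSWORD}` placeholders stand for encoded strings. A minimal sketch of producing them is shown below; `encode_secret_value` and the sample credentials are hypothetical.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Return the base64 form expected under a Secret's `data` field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical credentials, for illustration only.
print("CIFS_USERNAME:", encode_secret_value("backup-user"))
print("CIFS_PASSWORD:", encode_secret_value("s3cret-pass"))
```

Alternatively, the `stringData` field of a Secret accepts plain-text values and lets Kubernetes do the encoding.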
## Design
### Implementation Overview
- longhorn-manager
- Introduce the fields `CIFS_USERNAME` and `CIFS_PASSWORD` in credentials. The two fields are passed to engine and replica processes for volume backup and restore operations.
- backupstore
- Implement SMB/CIFS register/unregister and mount/unmount functions
## Test Plan
### Integration Tests
1. Set a SMB/CIFS share as backup storage.
2. Back up volumes to the backup storage and the operation should succeed.
3. Restore backups and the operation should succeed.


@ -0,0 +1,598 @@
# Consolidate Longhorn Instance Managers
## Summary
The Longhorn architecture includes engine and replica instance manager pods on each node. After an upgrade, Longhorn adds an additional pair of engine and replica instance manager pods. With the default request of 12% guaranteed CPU, the instance manager pods together request 12% * 4 of the CPUs on each node. This results in high base resource requirements and is likely unnecessary.
```
NAME STATE E-CPU(CORES) E-MEM(BYTES) R-CPU(CORES) R-MEM(BYTES) CREATED-WORKLOADS DURATION(MINUTES) AGE
demo-0 (no-IO) Complete 8.88m 24Mi 1.55m 43Mi 5 10 22h
demo-0-bs-512b-5g Complete 109.70m 66Mi 36.46m 54Mi 5 10 16h
demo-0-bs-1m-10g Complete 113.16m 65Mi 36.63m 56Mi 5 10 14h
demo-0-bs-5m-10g Complete 114.17m 64Mi 31.37m 54Mi 5 10 42m
```
Aiming to simplify the architecture and free up some resource requests, this document proposes to consolidate the engine and replica instance managers into a single pod. This consolidation will not affect any data plane operations or volume migration. As the engine process is the primary consumer of CPU resources, merging the instance managers will result in a 50% reduction in CPU requests for instance managers. This is because there will only be one instance manager pod for both process types.
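The CPU request figures can be worked through with a short calculation. The sketch assumes a node with 8 cores, chosen because 12% of 8 cores is 960m, which matches the `cpu: 960m` request in the pod spec shown later in this document. The helper name is hypothetical.

```python
def cpu_request_millicores(node_cores: int, percent: int, pods: int) -> int:
    """Total CPU request, in millicores, of `pods` instance manager pods on one node."""
    per_pod = node_cores * 1000 * percent // 100
    return per_pod * pods


NODE_CORES = 8  # assumption for illustration

separate = cpu_request_millicores(NODE_CORES, 12, pods=2)        # engine + replica
during_upgrade = cpu_request_millicores(NODE_CORES, 12, pods=4)  # old and new pairs
consolidated = cpu_request_millicores(NODE_CORES, 12, pods=1)    # single pod

reduction = 1 - consolidated / separate
print(separate, during_upgrade, consolidated, f"{reduction:.0%}")
```

Going from two pods to one halves the request, which is the 50% reduction stated above.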
### Related Issues
Phase 1:
- https://github.com/longhorn/longhorn/issues/5208
Phase 2:
- https://github.com/longhorn/longhorn/issues/5842
- https://github.com/longhorn/longhorn/issues/5844
## Motivation
### Goals
- Have a single instance manager pod run both replica and engine processes.
- After the Longhorn upgrade, the previous engine instance manager should continue to handle data plane operations for attached volumes until they are detached, and the replica instance managers should continue servicing data plane operations until the volume engine is upgraded or the volume is detached.
- Automatically clean up any engine/replica instance managers when all of their instances (processes) are removed.
- Online and offline volume engine upgrades should be functional. The replicas will automatically migrate to use the new `aio` (all-in-one) type instance managers, and the `engine` type instance manager will continue to serve until the first volume detachment.
- The Pod Disruption Budget (PDB) handling for cluster auto-scaler and node drain should work as expected.
### Non-goals [optional]
`None`
## Proposal
To ensure uninterrupted upgrades, this enhancement will be implemented in two phases. The existing `engine`/`replica` instance manager may coexist with the consolidated instance manager during the transition.
Phase 1:
- Introduce a new `aio` instance manager type. The `engine` and `replica` instance manager types will be deprecated and will continue to serve the upgraded volumes until the first volume detachment.
- Introduce a new `Guaranteed Instance Manager CPU` setting. The `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings will be deprecated and will continue to serve the upgraded volumes until the first volume detachment.
Phase 2:
- Remove all instance manager types.
- Remove the `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings.
### User Stories
- For freshly installed Longhorn, the user will see `aio` type instance managers.
- For upgraded Longhorn with all volumes detached, the user will see the `engine` and `replica` instance managers removed and replaced by `aio` type instance managers.
- For upgraded Longhorn with volumes attached, the user will see the existing `engine` and `replica` instance managers still servicing the old attached volumes and the new `aio` type instance managers servicing new volume attachments.
### User Experience In Detail
#### New Installation
1. User creates and attaches a volume.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-0 attached unknown 21474836480 ip-10-0-1-113 12s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 124m
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 124m
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 124m
> kubectl -n longhorn-system get lhim/instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc -o yaml
apiVersion: longhorn.io/v1beta2
kind: InstanceManager
metadata:
creationTimestamp: "2023-03-16T10:48:59Z"
generation: 1
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-8d41c3a4
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
kind: Node
name: ip-10-0-1-113
uid: 00c0734b-f061-4b28-8071-62596274cb18
resourceVersion: "926067"
uid: a869def6-1077-4363-8b64-6863097c1e26
spec:
engineImage: ""
image: c3y1huang/research:175-lh-im
nodeID: ip-10-0-1-113
type: aio
status:
apiMinVersion: 1
apiVersion: 3
currentState: running
instanceEngines:
demo-0-e-06d4c77d:
spec:
name: demo-0-e-06d4c77d
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10015
portStart: 10015
resourceVersion: 0
state: running
type: engine
instanceReplicas:
demo-0-r-ca78cab4:
spec:
name: demo-0-r-ca78cab4
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10014
portStart: 10000
resourceVersion: 0
state: running
type: replica
ip: 10.42.0.238
ownerID: ip-10-0-1-113
proxyApiMinVersion: 1
proxyApiVersion: 4
```
- The engine and replica instances (processes) are created in the `aio` type instance manager.
#### Upgrade With Volumes Detached
1. User has a Longhorn v1.4.0 cluster and a volume in the detached state.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-1 detached unknown 21474836480 12s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-r-1278a39fa6e6d8f49eba156b81ac1f59 running replica ip-10-0-1-113 3m44s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 3m44s
instance-manager-e-45ad195db7f55ed0a2dd1ea5f19c5edf running engine ip-10-0-1-105 3m41s
instance-manager-r-45ad195db7f55ed0a2dd1ea5f19c5edf running replica ip-10-0-1-105 3m41s
instance-manager-e-225a2c7411a666c8eab99484ab632359 running engine ip-10-0-1-102 3m42s
instance-manager-r-225a2c7411a666c8eab99484ab632359 running replica ip-10-0-1-102 3m42s
```
1. User upgraded Longhorn to v1.5.0.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 112s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 48s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 47s
```
- Unused `engine` type instance managers removed.
- Unused `replica` type instance managers removed.
- 3 `aio` type instance managers created.
1. User upgraded volume engine.
1. User attaches the volume.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-1 attached healthy 21474836480 ip-10-0-1-113 4m51s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 3m58s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 2m54s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 2m53s
> kubectl -n longhorn-system get lhim/instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc -o yaml
apiVersion: longhorn.io/v1beta2
kind: InstanceManager
metadata:
creationTimestamp: "2023-03-16T13:03:15Z"
generation: 1
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-8d41c3a4
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
kind: Node
name: ip-10-0-1-113
uid: 12eb73cd-e9de-4c45-875d-3eff7cfb1034
resourceVersion: "3762"
uid: c996a89a-f841-4841-b69d-4218ed8d8c6e
spec:
engineImage: ""
image: c3y1huang/research:175-lh-im
nodeID: ip-10-0-1-113
type: aio
status:
apiMinVersion: 1
apiVersion: 3
currentState: running
instanceEngines:
demo-1-e-b7d28fb3:
spec:
name: demo-1-e-b7d28fb3
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10015
portStart: 10015
resourceVersion: 0
state: running
type: engine
instanceReplicas:
demo-1-r-189c1bbb:
spec:
name: demo-1-r-189c1bbb
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10014
portStart: 10000
resourceVersion: 0
state: running
type: replica
ip: 10.42.0.28
ownerID: ip-10-0-1-113
proxyApiMinVersion: 1
proxyApiVersion: 4
```
- The engine and replica instances (processes) are created in the `aio` type instance manager.
#### Upgrade With Volumes Attached
1. User has a Longhorn v1.4.0 cluster and a volume in the attached state.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-2 attached healthy 21474836480 ip-10-0-1-113 35s
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-r-1278a39fa6e6d8f49eba156b81ac1f59 running replica ip-10-0-1-113 2m41s
instance-manager-r-45ad195db7f55ed0a2dd1ea5f19c5edf running replica ip-10-0-1-105 119s
instance-manager-r-225a2c7411a666c8eab99484ab632359 running replica ip-10-0-1-102 119s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 2m41s
instance-manager-e-225a2c7411a666c8eab99484ab632359 running engine ip-10-0-1-102 119s
instance-manager-e-45ad195db7f55ed0a2dd1ea5f19c5edf running engine ip-10-0-1-105 119s
```
1. User upgraded Longhorn to v1.5.0.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-r-1278a39fa6e6d8f49eba156b81ac1f59 running replica ip-10-0-1-113 5m24s
instance-manager-r-45ad195db7f55ed0a2dd1ea5f19c5edf running replica ip-10-0-1-105 4m42s
instance-manager-r-225a2c7411a666c8eab99484ab632359 running replica ip-10-0-1-102 4m42s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 5m24s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 117s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 33s
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 32s
```
- 2 unused `engine` type instance managers removed.
- 3 `aio` type instance managers created.
1. User upgraded online volume engine.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 6m53s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 8m18s
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 6m54s
instance-manager-e-1278a39fa6e6d8f49eba156b81ac1f59 running engine ip-10-0-1-113 11m
```
- All `replica` type instance managers migrated to `aio` type instance managers.
1. User detached the volume.
```
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 8m38s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 10m
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 8m39s
```
- The `engine` type instance managers removed.
1. User attached the volume.
```
> kubectl -n longhorn-system get volume
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
demo-2 attached healthy 21474836480 ip-10-0-1-113 12m
> kubectl -n longhorn-system get lhim
NAME STATE TYPE NODE AGE
instance-manager-7e59c9f2ef7649630344050a8d5be68e running aio ip-10-0-1-102 9m40s
instance-manager-8f81ca7c3bf95bbbf656be6ac2d1b7c4 running aio ip-10-0-1-105 9m39s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc running aio ip-10-0-1-113 11m
> kubectl -n longhorn-system get lhim/instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc -o yaml
apiVersion: longhorn.io/v1beta2
kind: InstanceManager
metadata:
creationTimestamp: "2023-03-16T13:12:41Z"
generation: 1
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-8d41c3a4
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
kind: Node
name: ip-10-0-1-113
uid: 6d109c40-abe3-42ed-8e40-f76cfc33e4c2
resourceVersion: "4339"
uid: 01556f2c-fbb4-4a15-a778-c73df518b070
spec:
engineImage: ""
image: c3y1huang/research:175-lh-im
nodeID: ip-10-0-1-113
type: aio
status:
apiMinVersion: 1
apiVersion: 3
currentState: running
instanceEngines:
demo-2-e-65845267:
spec:
name: demo-2-e-65845267
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10015
portStart: 10015
resourceVersion: 0
state: running
type: engine
instanceReplicas:
demo-2-r-a2bd415f:
spec:
name: demo-2-r-a2bd415f
status:
endpoint: ""
errorMsg: ""
listen: ""
portEnd: 10014
portStart: 10000
resourceVersion: 0
state: running
type: replica
ip: 10.42.0.31
ownerID: ip-10-0-1-113
proxyApiMinVersion: 1
proxyApiVersion: 4
```
- The engine and replica instances (processes) are created in the `aio` type instance manager.
### API changes
- Introduce new `instanceManagerCPURequest` in `Node` resource.
- Introduce new `instanceEngines` in InstanceManager resource.
- Introduce new `instanceReplicas` in InstanceManager resource.
## Design
### Phase 1: All-in-one Instance Manager Implementation Overview
Introducing a new instance manager type to have Longhorn continue to service existing attached volumes for Longhorn v1.5.x.
#### New Instance Manager Type
- Introduce a new `aio` (all-in-one) instance manager type to differentiate the handling of the old `engine`/`replica` instance managers and the new consolidated instance managers.
- When getting InstanceManagers by instance of the attached volume, retrieve the InstanceManager from the instance manager list using the new `aio` type.
#### InstanceManager `instances` Field Replacement For New InstanceManagers
- New InstanceManagers will use the `instanceEngines` and `instanceReplicas` fields, replacing the `instances` field.
- For the existing InstanceManagers for the attached Volumes, the `instances` field will remain in use.
#### Instance Manager Execution
- Rename the `engine-manager` script to `instance-manager`.
- Bump up version to `4`.
#### New Instance Manager Pod
- Replace the `engine` and `replica` pod creation with a single pod spec for the `aio` instance manager pod.
```
> kubectl -n longhorn-system get pod/instance-manager-0d96990c6881c828251c534eb31bfa85 -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
longhorn.io/last-applied-tolerations: '[]'
creationTimestamp: "2023-03-01T08:13:03Z"
labels:
longhorn.io/component: instance-manager
longhorn.io/instance-manager-image: imi-a1873aa3
longhorn.io/instance-manager-type: aio
longhorn.io/managed-by: longhorn-manager
longhorn.io/node: ip-10-0-1-113
name: instance-manager-0d96990c6881c828251c534eb31bfa85
namespace: longhorn-system
ownerReferences:
- apiVersion: longhorn.io/v1beta2
blockOwnerDeletion: true
controller: true
kind: InstanceManager
name: instance-manager-0d96990c6881c828251c534eb31bfa85
uid: 51c13e4f-d0a2-445d-b98b-80cca7080c78
resourceVersion: "12133"
uid: 81397cca-d9e9-48f6-8813-e7f2e2cd4617
spec:
containers:
- args:
- instance-manager
- --debug
- daemon
- --listen
- 0.0.0.0:8500
env:
- name: TLS_DIR
value: /tls-files/
image: c3y1huang/research:174-lh-im
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 3
periodSeconds: 5
successThreshold: 1
tcpSocket:
port: 8500
timeoutSeconds: 4
name: instance-manager
resources:
requests:
cpu: 960m
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /host
mountPropagation: HostToContainer
name: host
- mountPath: /engine-binaries/
mountPropagation: HostToContainer
name: engine-binaries
- mountPath: /host/var/lib/longhorn/unix-domain-socket/
name: unix-domain-socket
- mountPath: /tls-files/
name: longhorn-grpc-tls
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-hkbfc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: ip-10-0-1-113
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: longhorn-service-account
serviceAccountName: longhorn-service-account
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- hostPath:
path: /
type: ""
name: host
- hostPath:
path: /var/lib/longhorn/engine-binaries/
type: ""
name: engine-binaries
- hostPath:
path: /var/lib/longhorn/unix-domain-socket/
type: ""
name: unix-domain-socket
- name: longhorn-grpc-tls
secret:
defaultMode: 420
optional: true
secretName: longhorn-grpc-tls
- name: kube-api-access-hkbfc
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:03Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:04Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:04Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-03-01T08:13:03Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://cb249b97d128e47a7f13326b76496656d407fd16fc44b5f1a37384689d0fa900
image: docker.io/c3y1huang/research:174-lh-im
imageID: docker.io/c3y1huang/research@sha256:1f4e86b92b3f437596f9792cd42a1bb59d1eace4196139dc030b549340af2e68
lastState: {}
name: instance-manager
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-03-01T08:13:03Z"
hostIP: 10.0.1.113
phase: Running
podIP: 10.42.0.27
podIPs:
- ip: 10.42.0.27
qosClass: Burstable
startTime: "2023-03-01T08:13:03Z"
```
#### Controllers Change
- Map the status of the engine/replica process to the corresponding instanceEngines/instanceReplicas fields in the InstanceManager instead of the instances field. To ensure backward compatibility, the instances field will continue to be utilized by the pre-upgrade attached volume.
- Ensure support for the previous version's attached volumes with the old engine/replica instance manager types.
- Replace the old engine/replica InstanceManagers with the aio type instance manager during replenishment.
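The backward-compatible status mapping can be sketched as a lookup that prefers the new fields and falls back to the legacy one. The function below works on a plain dictionary shaped like the InstanceManager `status` shown earlier; `get_instance_status` is a hypothetical name, not a longhorn-manager function.

```python
def get_instance_status(instance_manager: dict, name: str):
    """Return the status entry of a process, or None if the manager does not run it."""
    status = instance_manager.get("status", {})
    # New `aio` managers use instanceEngines/instanceReplicas;
    # pre-upgrade engine/replica managers still use the legacy `instances` field.
    for field in ("instanceEngines", "instanceReplicas", "instances"):
        entry = (status.get(field) or {}).get(name)
        if entry is not None:
            return entry
    return None


aio_im = {
    "spec": {"type": "aio"},
    "status": {
        "instanceEngines": {"demo-0-e-06d4c77d": {"status": {"state": "running"}}},
        "instanceReplicas": {"demo-0-r-ca78cab4": {"status": {"state": "running"}}},
    },
}
legacy_im = {
    "spec": {"type": "engine"},
    "status": {"instances": {"demo-2-e-old": {"status": {"state": "running"}}}},
}

print(get_instance_status(aio_im, "demo-0-r-ca78cab4"))
print(get_instance_status(legacy_im, "demo-2-e-old"))
```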
#### New Setting
- Introduce a new `Guaranteed Instance Manager CPU` setting for the new `aio` instance manager pod.
- The `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` will co-exist with this setting in Longhorn v1.5.x.
### Phase 2 - Deprecations Overview
This phase assumes that, by the time of the upgrade from v1.5.x to v1.6.x, volumes have been detached at least once and have migrated to `aio` type instance managers, so the cluster should no longer have volumes depending on `engine` and `replica` type instance managers. Therefore, this phase removes the related types and settings.
#### Old Instance Manager Types
- Remove the `engine`, `replica`, and `aio` instance manager types. There is no need for differentiation.
#### Old Settings
- Remove the `Guaranteed Engine Manager CPU` and `Guaranteed Replica Manager CPU` settings. The settings have already been replaced by the `Guaranteed Instance Manager CPU` setting in phase 1.
#### Controllers Change
- Remove support for engine/replica InstanceManager types.
### Test plan
Support new `aio` instance manager type and run regression test cases.
### Upgrade strategy
The `instances` field in the instance manager custom resource will still be utilized by old instance managers of the attached volume.
## Note [optional]
`None`


@ -0,0 +1,77 @@
# Use PDB to protect Longhorn components from drains
## Summary
Some Longhorn components should be available during the draining process so that Longhorn volumes can be cleaned up and detached correctly.
They are: `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, `longhorn-conversion-webhook`, `share-manager`, `instance-manager`, and daemonset pods in `longhorn-system` namespace.
This LEP outlines our existing solutions to protect these components, the issues of these solutions, and the proposal for improvement.
### Related Issues
https://github.com/longhorn/longhorn/issues/3304
## Motivation
### Goals
1. Have better ways to protect Longhorn components (`csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, `longhorn-conversion-webhook`) without requiring users to specify drain flags to skip these pods.
## Proposal
1. Our existing solutions to protect these components are:
* For `instance-manager`: dynamically create/delete instance manager PDB
* For Daemonset pods in `longhorn-system` namespace: we advise the users to specify `--ignore-daemonsets` to ignore them in the `kubectl drain` command. This actually follows the [best practice](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/#:~:text=If%20there%20are%20pods%20managed%20by%20a%20DaemonSet%2C%20you%20will%20need%20to%20specify%20%2D%2Dignore%2Ddaemonsets%20with%20kubectl%20to%20successfully%20drain%20the%20node)
* For `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`: we advise the user to specify `--pod-selector` to ignore these pods
1. Proposal for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook`: <br>
The problem with the existing solution is that users sometimes cannot specify `--pod-selector` for the `kubectl drain` command.
For example, users of the [System Upgrade Controller](https://github.com/rancher/system-upgrade-controller) project don't have the option to specify `--pod-selector`.
Also, we would like a more automatic way instead of relying on the user to set `kubectl drain` options.
Therefore, we propose the following design:
* Longhorn manager automatically creates PDBs for `csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, and `longhorn-conversion-webhook` with `minAvailable` set to 1.
This makes sure that each of these deployments has at least 1 running pod during the draining process.
* Longhorn manager continuously watches the volumes and removes the PDBs once there is no attached volume.
This should work for both single-node and multi-node clusters.
### User Stories
#### Story 1
Before the enhancement, users need to specify drain options for the drain command to exclude Longhorn pods.
Sometimes this is not possible, for example when users rely on a third-party solution such as System Upgrade Controller to drain nodes and upgrade Kubernetes.
#### Story 2
### User Experience In Detail
After the enhancement, the user doesn't need to specify the drain options for the drain command to exclude Longhorn pods.
### API changes
None
## Design
### Implementation Overview
Create a new controller inside Longhorn manager called `longhorn-pdb-controller`. The controller listens for changes to
`csi-attacher`, `csi-provisioner`, `longhorn-admission-webhook`, `longhorn-conversion-webhook`, and Longhorn volumes, and adjusts the PDBs correspondingly.
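The reconcile decision described in this proposal can be sketched as a pure function: PDBs for the four deployments should exist while at least one volume is attached, and should be removed otherwise. The function name, the `"attached"` state string, and the list-based inputs are assumptions for illustration; the real controller works against the Kubernetes API.

```python
PROTECTED_DEPLOYMENTS = (
    "csi-attacher",
    "csi-provisioner",
    "longhorn-admission-webhook",
    "longhorn-conversion-webhook",
)


def reconcile_pdbs(volume_states, existing_pdbs):
    """Return (to_create, to_delete) given volume states and current PDB names."""
    any_attached = any(state == "attached" for state in volume_states)
    wanted = set(PROTECTED_DEPLOYMENTS) if any_attached else set()
    # Only consider PDBs that belong to the protected deployments.
    current = set(existing_pdbs) & set(PROTECTED_DEPLOYMENTS)
    return sorted(wanted - current), sorted(current - wanted)


# One attached volume and no PDBs yet: all four PDBs should be created.
print(reconcile_pdbs(["attached", "detached"], []))
# No attached volumes: the remaining PDB should be deleted.
print(reconcile_pdbs(["detached"], ["csi-attacher"]))
```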
### Test plan
https://github.com/longhorn/longhorn/issues/3304#issuecomment-1467174481
### Upgrade strategy
No Upgrade is needed
## Note
In the original GitHub ticket, we mentioned that we need to add a PDB to protect the share manager pod from being drained before its workload pods, because if the share manager pod doesn't exist, its volume cannot be unmounted in the CSI flow.
However, with the fix https://github.com/longhorn/longhorn/issues/5296, we can always unmount the volume even if the share manager is not running.
Therefore, we don't need to protect the share manager pod.


@ -0,0 +1,86 @@
# Recurring Filesystem Trim
## Summary
Longhorn currently supports the [filesystem trim](./20221103-filesystem-trim.md) feature, which allows users to reclaim volume disk spaces of deleted files. However, this is a manual process, which can be time-consuming and inconvenient.
To improve user experience, Longhorn could automate the process by implementing a new RecurringJob `filesystem-trim` type. This enhancement enables regularly freeing up unused volume spaces and reducing the need for manual interventions.
### Related Issues
https://github.com/longhorn/longhorn/issues/5186
## Motivation
### Goals
Introduce a new recurring job type called `filesystem-trim` to periodically trim the volume filesystem to reclaim disk spaces.
### Non-goals [optional]
`None`
## Proposal
Extend the RecurringJob custom resource definition by adding a new `RecurringJobType: filesystem-trim`.
### User Stories
To schedule regular volume filesystem trims, the user can create a RecurringJob with `spec.task=filesystem-trim` and associate it with volumes.
### User Experience In Detail
#### Recurring Filesystem Trim
1. The user sees workload volume size has increased over time.
1. Create RecurringJob with the `filesystem-trim` task type and assign it to the volume.
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: recurring-fs-trim-per-min
  namespace: longhorn-system
spec:
  concurrency: 1
  cron: '* * * * *'
  groups: []
  labels: {}
  name: recurring-fs-trim-per-min
  retain: 0
  task: filesystem-trim
```
1. The RecurringJob runs and reclaims some volume space.
### API changes
`None`
## Design
### Implementation Overview
#### The RecurringJob `filesystem-trim` Task Type
1. Call Volume API `ActionTrimFilesystem` when the RecurringJob type is `filesystem-trim`.
#### RecurringJob Mutate
1. Mutate `RecurringJob.Spec.Retain` to 0 when the task type is `filesystem-trim`, as the retain count has no effect for this type of task.
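The mutation can be sketched as a small function over the RecurringJob spec. It operates on a plain dictionary rather than the real admission webhook types, and `mutate_recurring_job` is a hypothetical name.

```python
def mutate_recurring_job(spec: dict) -> dict:
    """Return a copy of the spec with `retain` forced to 0 for filesystem-trim jobs."""
    mutated = dict(spec)  # leave the caller's spec untouched
    if mutated.get("task") == "filesystem-trim":
        mutated["retain"] = 0
    return mutated


trim_job = {"task": "filesystem-trim", "cron": "* * * * *", "retain": 5}
print(mutate_recurring_job(trim_job))
```

Specs with any other task type pass through with their `retain` value unchanged.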
### Test plan
#### Test Recurring Filesystem Trim
1. Create workload.
1. Create a file with some data in the workload.
1. Volume actual size should increase.
1. Delete the file.
1. Volume actual size should not decrease.
1. Create RecurringJob with type `filesystem-trim` and assign to the workload volume.
1. Wait for RecurringJob to complete.
1. Volume actual size should decrease.
### Upgrade strategy
`None`
## Note [optional]
`None`


@ -0,0 +1,269 @@
# Upgrade Path Enforcement
## Summary
Currently, Longhorn does not enforce the upgrade path, even though we claim Longhorn only supports upgrading from the previous stable release. For example, upgrading to 1.5.x is only supported from 1.4.x or 1.5.0.
Without upgrade enforcement, users can upgrade from any previous version, which requires extra testing effort to cover all upgrade paths. This enhancement also aims to support rollback after an upgrade failure and to prevent downgrades.
### Related Issues
https://github.com/longhorn/longhorn/issues/5131
## Motivation
### Goals
- Enforce an upgrade path to prevent users from upgrading from any unsupported version. After rejecting the user's upgrade attempt, the user's Longhorn setup should remain intact without any impacts.
- Upgrade Longhorn from the authorized versions to a major release version.
- Support rolling back a failed upgrade to the previous version.
- Prevent unexpected downgrade.
### Non-goals
- Automatic rollback if the upgrade failed.
## Proposal
- When upgrading with `kubectl`, Longhorn checks the upgrade path at the entry point of the pods for `longhorn-manager`, `longhorn-admission-webhook`, `longhorn-conversion-webhook` and `longhorn-recovery-backend`.
- When upgrading with `Helm` or as a `Rancher App Marketplace` app, Longhorn checks the upgrade path in a `pre-upgrade` job run as a `Helm hook`.
### User Stories
- As the admin, I want to upgrade Longhorn from x.y.* or x.(y+1).0 to x.(y+1).* by `kubectl`, `Helm` or `Rancher App Marketplace`, so that the upgrade should succeed.
- As the admin, I want to upgrade Longhorn from the previous authorized versions to a new major/minor version by `kubectl`, `Helm`, or `Rancher App Marketplace`, so that the upgrade should succeed.
- As the admin, I want to upgrade Longhorn from x.(y-1).* to x.(y+1).* by `kubectl`, `Helm` or `Rancher App Marketplace`, so that the upgrade should be prevented and the system with the current version continues running w/o any interruptions.
- As the admin, I want to roll back Longhorn from the failed upgrade to the previous install by `kubectl`, `Helm`, or `Rancher App Marketplace`, so that the rollback should succeed.
- As the admin, I want to downgrade Longhorn to any lower version by `kubectl`, `Helm`, or `Rancher App Marketplace`, so that the downgrade should be prevented and the system with the current version continues running w/o any interruptions.
### User Experience In Detail
#### Upgrade Longhorn From x.y.* or x.(y+1).0 To x.(y+1).*
##### Upgrade With `kubectl`
1. Install Longhorn on any Kubernetes cluster by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.y.*/deploy/longhorn.yaml
```
or
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.(y+1).0/deploy/longhorn.yaml
```
1. After Longhorn works normally, upgrade Longhorn by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.(y+1).*/deploy/longhorn.yaml
```
1. It will be allowed and Longhorn will be upgraded successfully.
##### Upgrade With `Helm` Or `Rancher App Marketplace`
1. Install Longhorn x.y.* or x.(y+1).0 with Helm as [Longhorn Install with Helm document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-helm/) or install Longhorn x.y.* or x.(y+1).0 with a Rancher Apps as [Longhorn Install as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-rancher/)
1. Upgrade to Longhorn x.(y+1).* with Helm as [Longhorn Upgrade with Helm document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-with-helm) or upgrade to Longhorn x.(y+1).* with a Rancher Catalog App as [Longhorn Upgrade as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-as-a-rancher-catalog-app)
1. It will be allowed and Longhorn will be upgraded successfully.
#### Upgrade Longhorn From The Authorized Versions To A Major Release Version
##### Upgrade With `kubectl`
1. Install Longhorn on any Kubernetes cluster by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.y.*/deploy/longhorn.yaml
```
1. After Longhorn works normally, upgrade Longhorn by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v(x+1).0.*/deploy/longhorn.yaml
```
1. It will be allowed and Longhorn will be upgraded successfully.
##### Upgrade With `Helm` Or `Rancher App Marketplace`
1. Install Longhorn x.y.* with Helm such as [Longhorn Install with Helm document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-helm/) or install Longhorn x.y.* with a Rancher Apps as [Longhorn Install as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-rancher/)
1. Upgrade to Longhorn (x+1).0.* with Helm as [Longhorn Upgrade with Helm document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-with-helm) or upgrade to Longhorn (x+1).0.* with a Rancher Catalog App as [Longhorn Upgrade as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-as-a-rancher-catalog-app)
1. It will be allowed and Longhorn will be upgraded successfully.
#### Upgrade Longhorn From x.(y-1).* To x.(y+1).*
##### Upgrade With `kubectl`
1. Install Longhorn on any Kubernetes cluster by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.(y-1).*/deploy/longhorn.yaml
```
1. After Longhorn works normally, upgrade Longhorn by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.(y+1).*/deploy/longhorn.yaml
```
1. It will not be allowed and Longhorn will block the upgrade for `longhorn-manager`, `longhorn-admission-webhook`, `longhorn-conversion-webhook` and `longhorn-recovery-backend`.
1. Users need to roll back Longhorn manually to restart `longhorn-manager` pods.
##### Upgrade With `Helm` Or `Rancher App Marketplace`
1. Install Longhorn x.(y-1).* with Helm as [Longhorn Install with Helm document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-helm/) or install Longhorn x.(y-1).* with a Rancher Apps as [Longhorn Install as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-rancher/)
1. Upgrade to Longhorn x.(y+1).* with Helm as [Longhorn Upgrade with Helm document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-with-helm) or upgrade to Longhorn x.(y+1).* with a Rancher Catalog App as [Longhorn Upgrade as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-as-a-rancher-catalog-app)
1. It will not be allowed: the `pre-upgrade` job of the `Helm hook` fails, which makes the whole Helm upgrade process fail.
1. Longhorn is intact and continues serving.
#### Roll Back Longhorn From The Failed Upgrade To The Previous Install
##### Roll Back With `kubectl`
1. Users need to recover Longhorn by using this command again:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/[previous installed version]/deploy/longhorn.yaml
```
1. Longhorn will be rolled back successfully.
1. Users might also need to manually delete new components introduced by the new Longhorn version.
##### Roll Back With `Helm` Or `Rancher App Marketplace`
1. Users need to recover Longhorn with `Helm` by using commands:
```shell
helm history longhorn # to get previous installed Longhorn REVISION
helm rollback longhorn [REVISION]
```
or
```shell
helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --version [previous installed version]
```
1. Users need to recover Longhorn with `Rancher Catalog Apps` by upgrading to the previously installed Longhorn version in `Rancher App Marketplace` again.
1. Longhorn will be rolled back successfully.
##### Manually Cleanup Example
When users try to upgrade Longhorn from v1.3.x to v1.5.x, a new deployment `longhorn-recovery-backend` will be introduced and the upgrade will fail.
Users need to delete the deployment `longhorn-recovery-backend` manually after rolling back Longhorn.
#### Downgrade Longhorn To Any Lower Version
##### Downgrade With `kubectl`
1. Install Longhorn on any Kubernetes cluster by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.y.*/deploy/longhorn.yaml
```
1. After Longhorn works normally, downgrade Longhorn by using this command:
```shell
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/vx.(y-z).*/deploy/longhorn.yaml
```
1. It will not be allowed and Longhorn will block the downgrade for `longhorn-manager` (and for `longhorn-admission-webhook`, `longhorn-conversion-webhook` and `longhorn-recovery-backend` if the version being downgraded to has these components).
1. Users need to roll back Longhorn manually to restart `longhorn-manager` pods.
##### Downgrade With `Helm` Or `Rancher App Marketplace`
1. Install Longhorn x.y.* with Helm as [Longhorn Install with Helm document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-helm/) or install Longhorn x.y.* with a Rancher Apps as [Longhorn Install as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/install/install-with-rancher/)
1. Downgrade to Longhorn (x-z).y.* or x.(y-z).* with Helm as [Longhorn Upgrade with Helm document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-with-helm) or downgrade to Longhorn (x-z).y.* or x.(y-z).* with a Rancher Catalog App as [Longhorn Upgrade as a Rancher Apps & Marketplace document](https://longhorn.io/docs/1.4.1/deploy/upgrade/longhorn-manager/#upgrade-as-a-rancher-catalog-app)
1. It will not be allowed: the `pre-upgrade` job of the `Helm hook` fails, which makes the whole Helm downgrade process fail.
1. Longhorn is intact and continues serving.
### API changes
`None`
## Design
### Implementation Overview
#### Blocking Upgrade With `kubectl`
Check whether the upgrade path is supported at the entry point of the `longhorn-manager`, `longhorn-admission-webhook`, `longhorn-conversion-webhook` and `longhorn-recovery-backend`:
1. Get the current Longhorn version `currentVersion` from the function `GetCurrentLonghornVersion`.
1. Get the target Longhorn version `upgradeVersion` from `meta.Version`.
1. Compare `currentVersion` and `upgradeVersion`, and allow only authorized upgrades (e.g., 1.3.x to 1.5.x is not allowed), as shown in the following table.
| currentVersion | upgradeVersion | Allow |
| :-: | :-: | :-: |
| x.y.* | x.(y+1).* | ✓ |
| x.y.0 | x.y.* | ✓ |
| x.y.* | (x+1).y.* | ✓ |
| x.(y-1).* | x.(y+1).* | X |
| x.(y-2).* | x.(y+1).* | X |
| x.y.* | x.(y-1).* | X |
| x.y.* | x.y.(*-1) | X |
1. Downgrade is not allowed.
2. When the upgrade path is not supported, newly created pods of the `longhorn-manager`, `longhorn-admission-webhook`, `longhorn-conversion-webhook` and `longhorn-recovery-backend` log the failure, broadcast events stating that the upgrade path is not supported, and return errors.
3. The previously installed Longhorn continues to work normally.
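
The comparison in the table above can be sketched as a small function. This is a minimal Python illustration, not the longhorn-manager implementation: the helper names `parse_version` and `is_upgrade_allowed` are hypothetical, re-applying the same version is assumed to be allowed, and a major upgrade is assumed to be allowed only to the next major version (the table does not constrain the minor version in that case, so neither does this sketch).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v1.5.0' or '1.5.0' into (major, minor, patch)."""
    major, minor, patch = version.lstrip("v").split(".")[:3]
    # Ignore any pre-release suffix such as '-rc1' on the patch component.
    return int(major), int(minor), int(patch.split("-")[0])


def is_upgrade_allowed(current: str, upgrade: str) -> bool:
    """Apply the upgrade-path rules from the table."""
    cur = parse_version(current)
    new = parse_version(upgrade)
    if new < cur:
        # Any downgrade (major, minor, or patch) is rejected.
        return False
    if new[0] == cur[0]:
        # Same major: stay on the same minor or move to the next minor only.
        return new[1] - cur[1] <= 1
    # Different major: only the next major version is accepted (assumption).
    return new[0] - cur[0] == 1
```

For example, `is_upgrade_allowed("v1.4.2", "v1.5.0")` is accepted, while `is_upgrade_allowed("v1.3.3", "v1.5.0")` is rejected because it skips a minor version.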
#### Blocking Upgrade With `Helm` Or `Rancher App Marketplace`
1. Add a new job for the `pre-upgrade` hook of `Helm`, modeled on the [`post-upgrade` job](https://github.com/longhorn/longhorn/blob/master/chart/templates/postupgrade-job.yaml).
```txt
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation,hook-failed
  name: longhorn-pre-upgrade
  ...
spec:
  ...
  template:
    metadata:
      name: longhorn-pre-upgrade
      ...
    spec:
      containers:
      - name: longhorn-pre-upgrade
        ...
        command:
        - longhorn-manager
        - pre-upgrade
        env:
        ...
```
1. When the upgrade starts, the `pre-upgrade` job runs first. If the upgrade path is not supported, the job fails and the `Helm` upgrade process fails with it.
### Test plan
#### Test Supported Upgrade Path
1. Install Longhorn v1.4.x.
1. Wait for all pods ready.
1. Create a Volume and write some data.
1. Upgrade to Longhorn v1.5.0.
1. Wait for all pods upgraded successfully.
1. Check if data is not corrupted.
#### Test Unsupported Upgrade Path
1. Install Longhorn v1.3.x.
1. Wait for all pods ready.
1. Create a Volume and write some data.
1. Upgrade to Longhorn v1.5.0.
1. The upgrade process will be stuck or will fail.
1. Verify that the data is not corrupted.
1. Roll back to Longhorn v1.3.x with the same settings.
1. Longhorn v1.3.x will work normally.
### Upgrade strategy
`None`
## Note
`None`


@ -0,0 +1,532 @@
# Extend CSI snapshot to support Longhorn BackingImage
## Summary
In Longhorn, we have BackingImage for VM usage. We would like to extend the CSI Snapshotter to support BackingImage management.
### Related Issues
[BackingImage Management via VolumeSnapshot #5005](https://github.com/longhorn/longhorn/issues/5005)
## Motivation
### Goals
Extend the CSI snapshotter to support:
- Create Longhorn BackingImage
- Delete Longhorn BackingImage
- Create a new PVC from a CSI snapshot that is associated with a Longhorn BackingImage
### Non-goals [optional]
- Support COW over each relative base image for delta data transfer and better space efficiency. (Planned for a later improvement.)
- Let users back up a BackingImage-based volume and restore it in another cluster without manually preparing the BackingImage in the new cluster.
## Proposal
### User Story
With this improvement, users can use standard CSI VolumeSnapshot as the unified interface for BackingImage creation, deletion and restoration of a Volume.
### User Experience In Detail
To use this feature, users need to deploy the CSI snapshot CRDs and the related controller:
1. Follow the instructions in the Longhorn documentation: https://longhorn.io/docs/1.4.1/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
2. Create a VolumeSnapshotClass with type `bi`, which refers to BackingImage:
```yaml
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bi
  export-type: qcow2 # default to raw if it is not provided
```
#### BackingImage creation via VolumeSnapshot resource
Users can create a BackingImage of a Volume by creating a VolumeSnapshot. The example below is for a Volume named `test-vol`:
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: test-vol
```
Longhorn will create a BackingImage **exported** from this Volume.
#### Restoration via VolumeSnapshot resource
Users can create a volume based on a previously created VolumeSnapshot. The example below is for a Volume named `test-vol-restore`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-vol-restore
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
Longhorn will create a Volume based on the BackingImage associated with the VolumeSnapshot.
#### Restoration of an existing BackingImage (pre-provisioned)
Users can request the creation of a Volume based on a prior BackingImage which was not created via the CSI VolumeSnapshot.
With the BackingImage already existing, users need to create the VolumeSnapshotContent with an associated VolumeSnapshot. The `snapshotHandle` of the VolumeSnapshotContent needs to point to an existing BackingImage. Example below for a Volume named `test-restore-existing-backing` and an existing BackingImage `test-bi`
- For pre-provisioning, users need to provide the following query parameters:
  - `backingImageDataSourceType`: the `sourceType` of the existing BackingImage, e.g. `export-from-volume` or `download`
  - `backingImage`: the name of the BackingImage
  - The `sourceParameters` of the existing BackingImage, which are used for validation:
    - `export-from-volume`: provide
      - `volume-name`
      - `export-type`
    - `download`: provide
      - `url`
      - `checksum`: optional
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-existing-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    # NOTE: change this to point to an existing BackingImage in Longhorn
    snapshotHandle: bi://backing?backingImageDataSourceType=export-from-volume&backingImage=test-bi&volume-name=vol-export-src&export-type=qcow2
  volumeSnapshotRef:
    name: test-snapshot-existing-backing
    namespace: default
```
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-existing-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    volumeSnapshotContentName: test-existing-backing
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-existing-backing
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-existing-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
Longhorn will create a Volume based on the BackingImage associated with the VolumeSnapshot and the VolumeSnapshotContent.
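
The validation of a pre-provisioned `snapshotHandle` against the existing BackingImage could look roughly like the sketch below. It is an illustration only: the function name `handle_matches_backing_image`, the `EXPECTED_KEYS` table, and the choice to compare only the parameters listed above are assumptions, not the actual Longhorn CSI code.

```python
from urllib.parse import parse_qs, urlsplit

# Source parameters expected per source type, taken from the list above.
EXPECTED_KEYS = {
    "export-from-volume": ("volume-name", "export-type"),
    "download": ("url",),
}


def handle_matches_backing_image(
    snapshot_handle: str, source_type: str, source_parameters: dict
) -> bool:
    """Check that the handle describes the same source as the existing BackingImage."""
    if source_type not in EXPECTED_KEYS:
        return False
    # parse_qs returns a list per key; keep the first value of each.
    query = {
        key: values[0]
        for key, values in parse_qs(urlsplit(snapshot_handle).query).items()
    }
    if query.get("backingImageDataSourceType") != source_type:
        return False
    return all(
        key in query and query[key] == source_parameters.get(key)
        for key in EXPECTED_KEYS[source_type]
    )
```

A handle whose `export-type` differs from the one recorded on the existing BackingImage would be rejected by this check.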
#### Restoration of a non-existing BackingImage (on-demand provision)
Users can request the creation of a Volume based on a BackingImage that has not been created yet, using one of the following 2 kinds of data sources.
1. `download`: Download a file from a URL as a BackingImage.
2. `export-from-volume`: Export an existing in-cluster volume as a backing image.
Users need to create the VolumeSnapshotContent with an associated VolumeSnapshot. The `snapshotHandle` of the VolumeSnapshotContent needs to provide the parameters for the data source. The example below is for a volume named `test-on-demand-backing` and a non-existing BackingImage `test-bi` with two different data sources.
1. `download`: Users need to provide the following parameters
   - `backingImageDataSourceType`: `download` for on-demand download.
   - `backingImage`: Name of the BackingImage
   - `url`: The URL of the file to download as a BackingImage.
   - `backingImageChecksum`: Optional. Used for checking the checksum of the file.
   - example yaml:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    # NOTE: change this to provide the correct parameters
    snapshotHandle: bi://backing?backingImageDataSourceType=download&backingImage=test-bi&url=https%3A%2F%2Flonghorn-backing-image.s3-us-west-1.amazonaws.com%2Fparrot.qcow2&backingImageChecksum=bd79ab9e6d45abf4f3f0adf552a868074dd235c4698ce7258d521160e0ad79ffe555b94e7d4007add6e1a25f4526885eb25c53ce38f7d344dd4925b9f2cb5d3b
  volumeSnapshotRef:
    name: test-snapshot-on-demand-backing
    namespace: default
```
2. `export-from-volume`: Users need to provide the following parameters
   - `backingImageDataSourceType`: `export-from-volume` for on-demand export.
   - `backingImage`: Name of the BackingImage
   - `volume-name`: Volume to be exported for the BackingImage
   - `export-type`: Currently Longhorn supports `raw` or `qcow2`
   - example yaml:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    # NOTE: change this to provide the correct parameters
    snapshotHandle: bi://backing?backingImageDataSourceType=export-from-volume&backingImage=test-bi&volume-name=vol-export-src&export-type=qcow2
  volumeSnapshotRef:
    name: test-snapshot-on-demand-backing
    namespace: default
```
Users can then create the corresponding VolumeSnapshot and PVC:
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    # NOTE: change this to point to the prior VolumeSnapshotContent
    volumeSnapshotContentName: test-on-demand-backing
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-on-demand-backing
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-on-demand-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
### API changes
No changes necessary
## Design
### Implementation Overview
We add a new type `bi` to the parameter `type` in the VolumeSnapshotClass. It means that the CSI VolumeSnapshot created with this VolumeSnapshotClass is associated with a Longhorn BackingImage.
#### CreateSnapshot function
When users create a VolumeSnapshot and the VolumeSnapshotClass `type` is `bi`:
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: test-vol
```
We do:
- Get the name of the Volume
- The name of the BackingImage will be the same as the VolumeSnapshot, `test-snapshot-backing`.
- If a BackingImage with the same name as the requested VolumeSnapshot already exists, return success without creating a new BackingImage.
- Create a BackingImage.
  - Get `export-type` from the VolumeSnapshotClass parameter `export-type`, defaulting to `raw`.
  - Encode the `snapshotId` as `bi://backing?backingImageDataSourceType=export-from-volume&backingImage=test-snapshot-backing&volume-name=${VolumeName}&export-type=raw`
  - This `snapshotId` will be used in the later CSI CreateVolume and DeleteSnapshot calls.
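
As an illustration of the encoding step, the `snapshotId` format shown above can be produced with a standard query-string encoder. This is a Python sketch, not the Go code in longhorn-manager; the function name `encode_snapshot_id` is hypothetical.

```python
from urllib.parse import urlencode


def encode_snapshot_id(backing_image: str, volume_name: str, export_type: str = "raw") -> str:
    """Build a `bi://backing?...` snapshotId for an export-from-volume BackingImage."""
    params = {
        "backingImageDataSourceType": "export-from-volume",
        "backingImage": backing_image,
        "volume-name": volume_name,
        "export-type": export_type,
    }
    # urlencode keeps the insertion order of the dict and percent-encodes values.
    return "bi://backing?" + urlencode(params)
```

Calling `encode_snapshot_id("test-snapshot-backing", "test-vol")` yields a handle in the same shape as the one above, with `${VolumeName}` replaced by `test-vol`.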
#### CreateVolume function
- If VolumeSource type is `VolumeContentSource_Snapshot`, decode the `snapshotId` to get the parameters.
  - `bi://backing?backingImageDataSourceType=${TYPE}&backingImage=${BACKINGIMAGE_NAME}&backingImageChecksum=${backingImageChecksum}&${OTHER_PARAMETERS}`
- If a BackingImage with the given name already exists, create the volume.
- If a BackingImage with the given name does not exist, we prepare it first. There are 2 types: `export-from-volume` and `download`.
  - For `download`, it means we have to prepare the BackingImage before creating the Volume. We first decode other parameters from `snapshotId` and create the BackingImage.
  - For `export-from-volume`, it means we have to prepare the BackingImage before creating the Volume. We first decode other parameters from `snapshotId` and create the BackingImage.
NOTE: we already have related code for preparing the BackingImage with type `download` or `export-from-volume` before creating a Volume, [here](https://github.com/longhorn/longhorn-manager/blob/master/csi/controller_server.go#L195)
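
Decoding the `snapshotId` is the inverse of the encoding used in CreateSnapshot. The sketch below shows one way to do it in Python; it is not the longhorn-manager implementation, and the function name `decode_snapshot_id` and the error handling are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlsplit


def decode_snapshot_id(snapshot_id: str) -> dict:
    """Split a `bi://backing?...` snapshotId into its query parameters."""
    parts = urlsplit(snapshot_id)
    if parts.scheme != "bi" or parts.netloc != "backing":
        raise ValueError(f"not a BackingImage snapshotId: {snapshot_id!r}")
    # parse_qs percent-decodes values and returns a list per key; keep the first.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    for required in ("backingImageDataSourceType", "backingImage"):
        if required not in params:
            raise ValueError(f"missing parameter {required!r}")
    return params
```

The returned dictionary carries the source type, the BackingImage name, and any type-specific parameters such as `url` or `volume-name`.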
#### DeleteSnapshot function
- Decode the `snapshotId` to get the name of the BackingImage. Then we delete the BackingImage directly.
### Test plan
Integration test plan.
#### Prerequisite
1. Deploy the CSI snapshot CRDs and controller as instructed at
https://longhorn.io/docs/1.4.1/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/
2. Create a VolumeSnapshotClass with type `bi`
```yaml
# Use v1 as an example
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bi
```
#### Scenario 1: Create VolumeSnapshot from a Volume
- Success
1. Create a Volume `test-vol` of 5GB. Create PV/PVC for the Volume.
2. Create a workload using the Volume. Write some data to the Volume.
3. Create a VolumeSnapshot with following yaml:
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: test-vol
```
4. Verify that the BackingImage is created.
   - Verify the properties of the BackingImage
     - `sourceType` is `export-from-volume`
     - `volume-name` is `test-vol`
     - `export-type` is `raw`
5. Delete the VolumeSnapshot `test-snapshot-backing`
6. Verify the BackingImage is deleted
#### Scenario 2: Create new Volume from CSI snapshot
1. Create a Volume `test-vol` of 5GB. Create PV/PVC for the Volume.
2. Create a workload using the Volume. Write some data to the Volume.
3. Create a VolumeSnapshot with following yaml:
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: test-vol
```
4. Verify that the BackingImage is created.
5. Create a new PVC with following yaml:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-pvc
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
6. Attach the PVC `test-restore-pvc` to a workload and verify the data
7. Delete the PVC
#### Scenario 3: Restore pre-provisioned BackingImage
1. Create a BackingImage `test-bi` using longhorn test raw image `https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2`
2. Create a VolumeSnapshotContent with `snapshotHandle` pointing to BackingImage `test-bi` and provide the parameters.
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-existing-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    snapshotHandle: bi://backing?backingImageDataSourceType=download&backingImage=test-bi&url=https%3A%2F%2Flonghorn-backing-image.s3-us-west-1.amazonaws.com%2Fparrot.qcow2&backingImageChecksum=bd79ab9e6d45abf4f3f0adf552a868074dd235c4698ce7258d521160e0ad79ffe555b94e7d4007add6e1a25f4526885eb25c53ce38f7d344dd4925b9f2cb5d3b
  volumeSnapshotRef:
    name: test-snapshot-existing-backing
    namespace: default
```
3. Create a VolumeSnapshot associated with the VolumeSnapshotContent
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-existing-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    volumeSnapshotContentName: test-existing-backing
```
4. Create a PVC with the following yaml
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-existing-backing
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-existing-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
5. Attach the PVC `test-restore-existing-backing` to a workload and verify the data
#### Scenario 4: Restore on-demand provisioning BackingImage
- Type `download`
1. Create a VolumeSnapshotContent with `snapshotHandle` providing the required parameters and BackingImage name `test-bi`
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    snapshotHandle: bi://backing?backingImageDataSourceType=download&backingImage=test-bi&url=https%3A%2F%2Flonghorn-backing-image.s3-us-west-1.amazonaws.com%2Fparrot.qcow2&backingImageChecksum=bd79ab9e6d45abf4f3f0adf552a868074dd235c4698ce7258d521160e0ad79ffe555b94e7d4007add6e1a25f4526885eb25c53ce38f7d344dd4925b9f2cb5d3b
  volumeSnapshotRef:
    name: test-snapshot-on-demand-backing
    namespace: default
```
2. Create a VolumeSnapshot associated with the VolumeSnapshotContent
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    volumeSnapshotContentName: test-on-demand-backing
```
3. Create a PVC with the following yaml
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-on-demand-backing
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-on-demand-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
4. Verify BackingImage `test-bi` is created
5. Attach the PVC `test-restore-on-demand-backing` to a workload and verify the data
- Type `export-from-volume`
- Success
1. Create a Volume `test-vol` and write some data to it.
2. Create a VolumeSnapshotContent with `snapshotHandle` providing the required parameters and BackingImage name `test-bi`
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: test-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  driver: driver.longhorn.io
  deletionPolicy: Delete
  source:
    snapshotHandle: bi://backing?backingImageDataSourceType=export-from-volume&backingImage=test-bi&volume-name=test-vol&export-type=qcow2
  volumeSnapshotRef:
    name: test-snapshot-on-demand-backing
    namespace: default
```
2. Create a VolumeSnapshot associated with the VolumeSnapshotContent
```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: test-snapshot-on-demand-backing
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    volumeSnapshotContentName: test-on-demand-backing
```
3. Create a PVC with the following yaml
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-on-demand-backing
spec:
  storageClassName: longhorn
  dataSource:
    name: test-snapshot-on-demand-backing
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```
4. Verify BackingImage `test-bi` is created
5. Attach the PVC `test-restore-on-demand-backing` to a workload and verify the data
### Upgrade strategy
No upgrade strategy needed
## Note [optional]
We need to update the docs and examples to reflect the new type of parameter `type` in the VolumeSnapshotClass.


@ -0,0 +1,68 @@
# Azure Blob Storage Backup Store Support
## Summary
Longhorn supports Azure Blob Storage as a backup storage.
### Related Issues
https://github.com/longhorn/longhorn/issues/1309
## Motivation
### Goals
- Support Azure Blob Storage as a backup storage.
## Proposal
- Introduce Azure Blob Storage client for supporting Azure Blob Storage as a backup storage.
## User Stories
Longhorn already supports NFSv4, CIFS and S3 servers as backup storage. However, certain users may still want to be able to utilize Azure blob storage to push/pull backups to/from.
### User Experience In Detail
- Users can configure an Azure Blob Storage as a backup storage
- Set **Backup Target**. The path to an Azure Blob Storage is like
```bash
azblob://${container}@blob.core.windows.net/${path name}
```
- Set **Backup Target Credential Secret**
- Create a secret and deploy it
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: azblob-secret
  namespace: longhorn-system
type: Opaque
data:
  AZBLOB_ACCOUNT_NAME: ${AZBLOB_ACCOUNT_NAME}
  AZBLOB_ACCOUNT_KEY: ${AZBLOB_ACCOUNT_KEY}
```
- Set the setting **Backup Target Credential Secret** to `azblob-secret`
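
Two details of the configuration above are easy to get wrong, so here is a small Python sketch. Values under `data` in a Kubernetes Secret must be base64-encoded, and the backup target carries the container name in front of the `@`. The function names `parse_azblob_target` and `secret_data` are hypothetical helpers written for this example; they are not part of Longhorn.

```python
import base64
from urllib.parse import urlsplit


def parse_azblob_target(target: str) -> tuple:
    """Split azblob://<container>@<endpoint>/<path> into (container, endpoint, path)."""
    parts = urlsplit(target)
    if parts.scheme != "azblob" or not parts.username:
        raise ValueError(f"not an azblob backup target: {target!r}")
    # urlsplit exposes the text before '@' as the username component.
    return parts.username, parts.hostname, parts.path.lstrip("/")


def secret_data(account_name: str, account_key: str) -> dict:
    """Return the base64-encoded values for the Secret's `data` section."""

    def encode(value: str) -> str:
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    return {
        "AZBLOB_ACCOUNT_NAME": encode(account_name),
        "AZBLOB_ACCOUNT_KEY": encode(account_key),
    }
```

The encoded values from `secret_data` are what would replace the `${AZBLOB_ACCOUNT_NAME}` and `${AZBLOB_ACCOUNT_KEY}` placeholders in the Secret manifest.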
## Design
### Implementation Overview
- longhorn-manager
- Introduce the fields `AZBLOB_ACCOUNT_NAME` and `AZBLOB_ACCOUNT_KEY` in credentials. The two fields are passed to engine and replica processes for volume backup and restore operations.
- backupstore
- Implement Azure Blob Storage register/unregister and basic CRUD functions.
## Test Plan
### Integration Tests
1. Set an Azure Blob Storage as backup storage.
2. Create volumes and write some data.
3. Back up volumes to the backup storage and the operation should succeed.
4. Restore backups and operations should succeed.
5. Verify that no data is corrupted.


@ -0,0 +1,430 @@
# Engine Identity Validation
## Summary
Longhorn-manager communicates with longhorn-engine's gRPC ControllerService, ReplicaService, and SyncAgentService by
sending requests to TCP/IP addresses kept up-to-date by its various controllers. Additionally, the longhorn-engine
controller server sends requests to the longhorn-engine replica server's ReplicaService and SyncAgentService using
TCP/IP addresses it keeps in memory. These addresses are relatively stable in normal operation. However, during periods
of high process turnover (e.g. a node reboot or network event), it is possible for one longhorn-engine component to stop
and another longhorn-engine component to start in its place using the same ports. If this happens quickly enough, other
components with stale address lists attempting to execute requests against the old component may errantly execute
requests against the new component. One harmful effect of this behavior that has been observed is the [expansion of an
unintended longhorn-engine replica](https://github.com/longhorn/longhorn/issues/5709).
This proposal intends to ensure all gRPC requests to longhorn-engine components are actually served by the intended
component.
### Related Issues
https://github.com/longhorn/longhorn/issues/5709
## Motivation
### Goals
- Eliminate the potential for negative effects caused by a Longhorn component communicating with an incorrect
longhorn-engine component.
- Provide effective logging when incorrect communication occurs to aid in fixing TCP/IP address related race
conditions.
### Non-goals
- Fix race conditions within the Longhorn control plane that lead to attempts to communicate with an incorrect
longhorn-engine component.
- Refactor the in-memory data structures the longhorn-engine controller server uses to keep track of and initiate
communication with replicas.
## Proposal
Today, longhorn-manager knows the volume name and instance name of the process it is trying to communicate with, but it
only uses the TCP/IP information of each process to initiate communication. Additionally, longhorn-engine components are
mostly unaware of the volume name (in the case of longhorn-engine's replica server) and instance name (for both
longhorn-engine controller and replica servers) they are associated with. If we provide this information to
longhorn-engine processes when we start them and then have longhorn-manager provide it on every communication attempt,
we can ensure no accidental communication occurs.
1. Add additional flags to the longhorn-engine CLI that inform controller and replica servers of their associated volume
and/or instance name.
1. Use [gRPC client interceptors](https://github.com/grpc/grpc-go/blob/master/examples/features/interceptor/README.md)
to automatically inject [gRPC metadata](https://github.com/grpc/grpc-go/blob/master/Documentation/grpc-metadata.md)
(i.e. headers) containing volume and/or instance name information every time a gRPC request is made by a
longhorn-engine client to a longhorn-engine server.
1. Use [gRPC server interceptors](https://github.com/grpc/grpc-go/blob/master/examples/features/interceptor/README.md)
to automatically validate the volume and/or instance name information in [gRPC
metadata](https://github.com/grpc/grpc-go/blob/master/Documentation/grpc-metadata.md) (i.e. headers) every time a
gRPC request made by a longhorn-engine client is received by a longhorn-engine server.
1. Reject any request (with an appropriate error code) if the provided information does not match the information a
controller or replica server was launched with.
1. Log the rejection at the client and the server, making it easy to identify situations in which incorrect
communication occurs.
1. Modify instance-manager's `ProxyEngineService` (both server and client) so that longhorn-manager can provide the
necessary information for gRPC metadata injection.
1. Modify longhorn-manager so that it makes proper use of the new `ProxyEngineService` client and launches
longhorn-engine controller and replica servers with additional flags.
### User Stories
#### Story 1
Before this proposal:
As an administrator, after an intentional or unintentional node reboot, I notice one or more of my volumes is degraded
and new or existing replicas aren't coming online. In some situations, the UI reports confusing information or one or
more of my volumes might be unable to attach at all. Digging through logs, I see errors related to mismatched sizes, and
at least one replica does appear to have a larger size reported in `volume.meta` than others. I don't know how to
proceed.
After this proposal:
As an administrator, after an intentional or unintentional node reboot, my volumes work as expected. If I choose to dig
through logs, I may see some messages about refused requests to incorrect components, but this doesn't seem to
negatively affect anything.
#### Story 2
Before this proposal:
As a developer, I am aware that it is possible for one Longhorn component to communicate with another, incorrect
component, and that this communication can lead to unexpected replica expansion. I want to work to fix this behavior.
However, when I look at a support bundle, it is very hard to catch this communication occurring. I have to trace TCP/IP
addresses through logs, and if no negative effects are caused, I may never notice it.
After this proposal:
Any time one Longhorn component attempts to communicate with another, incorrect component, it is clearly represented in
the logs.
### User Experience In Detail
See the user stories above. This enhancement is intended to be largely transparent to the user. It should eliminate rare
failures so that users can't run into them.
### API Changes
#### Longhorn-Engine
Increment the longhorn-engine CLIAPIVersion by one. Do not increment the longhorn-engine CLIAPIMinVersion. The changes
in this LEP are backwards compatible. All gRPC metadata validation is by demand of the client. If a less sophisticated
(not upgraded) client does not inject any metadata, the server performs no validation. If a less sophisticated (not
upgraded) client only injects some metadata (e.g. `volume-name` but not `instance-name`), the server only validates the
metadata provided.
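The backwards-compatibility rule above can be sketched as a pure function, independent of gRPC. This is only an illustration: `validateIdentity` and `errIdentityMismatch` are hypothetical names, and the `map[string][]string` argument merely mimics the shape of gRPC metadata.

```go
package main

import (
	"errors"
	"fmt"
)

// errIdentityMismatch stands in for the gRPC status error a real server
// would return. (Hypothetical name.)
var errIdentityMismatch = errors.New("identity mismatch")

// validateIdentity applies the best-effort rule: a key is checked only when
// the server knows its own value AND the client sent one.
func validateIdentity(serverVolume, serverInstance string, md map[string][]string) error {
	checks := []struct{ key, want string }{
		{"volume-name", serverVolume},
		{"instance-name", serverInstance},
	}
	for _, c := range checks {
		got, ok := md[c.key]
		if !ok || len(got) == 0 || c.want == "" {
			continue // Nothing to compare, so skip validation for this key.
		}
		if got[0] != c.want {
			return fmt.Errorf("%w: %s is %q, expected %q", errIdentityMismatch, c.key, got[0], c.want)
		}
	}
	return nil
}

func main() {
	// Not-upgraded client: no metadata, so no validation is performed.
	fmt.Println(validateIdentity("vol-1", "vol-1-e-0", nil))
	// Partially upgraded client: only volume-name is validated.
	fmt.Println(validateIdentity("vol-1", "vol-1-e-0", map[string][]string{"volume-name": {"vol-1"}}))
	// Stale address: the request reaches a server that belongs to another volume.
	fmt.Println(validateIdentity("vol-2", "vol-2-e-0", map[string][]string{"volume-name": {"vol-1"}}))
}
```

Only the third call returns an error, because it is the only one where both sides supply a value and the values differ.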
Add a global `volume-name` flag and a global `engine-instance-name` flag to the engine CLI (e.g. `longhorn -volume-name
<volume-name> -engine-instance-name <engine-instance-name> <command> <args>`). Virtually all CLI commands create a
controller client and these flags allow appropriate gRPC metadata to be injected into every client request. Requests
that reach the wrong longhorn-engine controller server are rejected.
Use the global `engine-instance-name` flag and the pre-existing `volume-name` positional argument to allow the
longhorn-engine controller server to remember its volume and instance name (e.g. `longhorn -engine-instance-name
<instance-name> controller <volume-name>`). Ignore the global `volume-name` flag, as it is redundant.
Use the global `volume-name` flag or the pre-existing local `volume-name` flag and a new `replica-instance-name` flag to
allow the longhorn-engine replica server to remember its volume and instance name (e.g. `longhorn -volume-name
<volume-name> replica <directory> -replica-instance-name <replica-instance-name>`).
Use the global `volume-name` flag and a new `replica-instance-name` flag to allow the longhorn-engine sync-agent server
to remember its volume and instance name (e.g. `longhorn -volume-name <volume-name> sync-agent -replica-instance-name
<replica-instance-name>`).
Add an additional `replica-instance-name` flag to CLI commands that launch asynchronous tasks that communicate directly
with the longhorn-engine replica server (e.g. `longhorn -volume-name <volume-name> add-replica <address> -size <size>
-current-size <current-size> -replica-instance-name <replica-instance-name>`). All such commands create a replica
client and these flags allow appropriate gRPC metadata to be injected into every client request. Requests that reach the
wrong longhorn-engine replica server are rejected.
Return 9 FAILED_PRECONDITION with an appropriate message when metadata validation fails. This code is chosen in
accordance with the [RPC API](https://grpc.github.io/grpc/core/md_doc_statuscodes.html), which instructs developers to
use FAILED_PRECONDITION if the client should not retry until the system state has been explicitly fixed.
#### Longhorn-Instance-Manager
Increment the longhorn-instance-manager InstanceManagerProxyAPIVersion by one. Do not increment the
longhorn-instance-manager InstanceManagerProxyAPIMinVersion. The changes in this LEP are backwards compatible. No added
fields are required and their omission is ignored. If a less sophisticated (not upgraded) client does not include them,
no metadata is injected into engine or replica requests and no validation occurs (the behavior is the same as before the
implementation of this LEP).
Add `volume_name` and `instance_name` fields to the `ProxyEngineRequest` protocol buffer message. This message, which
currently only contains an `address` field, is included in all `ProxyEngineService` RPCs. Updated clients can pass
information about the engine process they expect to be communicating with in these fields. When instance-manager creates
an asynchronous task to carry out the requested operation, the resulting controller client includes the gRPC interceptor
described above.
Add `replica_instance_name` fields to any `ProxyEngineService` RPC associated with an asynchronous task that
communicates directly with a longhorn-engine replica server. When instance-manager creates the task, the resulting
replica client includes the gRPC interceptor described above.
Return 5 NOT_FOUND with an appropriate message when metadata validation fails at a lower layer. (The particular return
code is definitely open to discussion.)
## Design
### Implementation Overview
#### Interceptors (longhorn-engine)
Add a gRPC server interceptor to all `grpc.NewServer` calls.
```golang
server := grpc.NewServer(withIdentityValidationInterceptor(volumeName, instanceName))
```
Implement the interceptor so that it validates metadata with best effort.
```golang
func withIdentityValidationInterceptor(volumeName, instanceName string) grpc.ServerOption {
	return grpc.UnaryInterceptor(identityValidationInterceptor(volumeName, instanceName))
}

func identityValidationInterceptor(volumeName, instanceName string) grpc.UnaryServerInterceptor {
	// Use a closure to remember the correct volumeName and/or instanceName.
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		md, ok := metadata.FromIncomingContext(ctx)
		if ok {
			incomingVolumeName, ok := md["volume-name"]
			// Only refuse to serve if both client and server provide validation information.
			if ok && len(incomingVolumeName) > 0 && volumeName != "" && incomingVolumeName[0] != volumeName {
				// FAILED_PRECONDITION matches the return code chosen in the API Changes section.
				return nil, status.Errorf(codes.FailedPrecondition, "Incorrect volume name; check controller address")
			}
		}
		if ok {
			incomingInstanceName, ok := md["instance-name"]
			// Only refuse to serve if both client and server provide validation information.
			if ok && len(incomingInstanceName) > 0 && instanceName != "" && incomingInstanceName[0] != instanceName {
				return nil, status.Errorf(codes.FailedPrecondition, "Incorrect instance name; check controller address")
			}
		}
		// Call the RPC's actual handler.
		return handler(ctx, req)
	}
}
```
Add a gRPC client interceptor to all `grpc.Dial` calls.
```golang
connection, err := grpc.Dial(serviceUrl, grpc.WithInsecure(), withIdentityValidationInterceptor(volumeName, instanceName))
```
Implement the interceptor so that it injects metadata with best effort.
```golang
func withIdentityValidationInterceptor(volumeName, instanceName string) grpc.DialOption {
	return grpc.WithUnaryInterceptor(identityValidationInterceptor(volumeName, instanceName))
}

func identityValidationInterceptor(volumeName, instanceName string) grpc.UnaryClientInterceptor {
	// Use a closure to remember the correct volumeName and/or instanceName.
	return func(ctx context.Context, method string, req any, reply any, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		if volumeName != "" {
			ctx = metadata.AppendToOutgoingContext(ctx, "volume-name", volumeName)
		}
		if instanceName != "" {
			ctx = metadata.AppendToOutgoingContext(ctx, "instance-name", instanceName)
		}
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}
```
Modify all client constructors to include this additional information. Wherever these client packages are consumed (e.g.
the replica client is consumed by the controller, both the replica and the controller clients are consumed by
longhorn-manager), callers can inject this additional information into the constructor and get validation for free.
```golang
func NewControllerClient(address, volumeName, instanceName string) (*ControllerClient, error) {
	// Implementation.
}
```
#### CLI Commands (longhorn-engine)
Add additional flags to all longhorn-engine CLI commands depending on their function.
E.g. command that launches a server:
```golang
func ReplicaCmd() cli.Command {
	return cli.Command{
		Name:      "replica",
		UsageText: "longhorn replica DIRECTORY SIZE",
		Flags: []cli.Flag{
			// Other flags.
			cli.StringFlag{
				Name:  "volume-name",
				Value: "",
				Usage: "Name of the volume (for validation purposes)",
			},
			cli.StringFlag{
				Name:  "replica-instance-name",
				Value: "",
				Usage: "Name of the instance (for validation purposes)",
			},
		},
		// Rest of implementation.
	}
}
```
E.g. command that directly communicates with both a controller and replica server.
```golang
func AddReplicaCmd() cli.Command {
	return cli.Command{
		Name:      "add-replica",
		ShortName: "add",
		Flags: []cli.Flag{
			// Other flags.
			cli.StringFlag{
				Name:     "volume-name",
				Required: false,
				Usage:    "Name of the volume (for validation purposes)",
			},
			cli.StringFlag{
				Name:     "engine-instance-name",
				Required: false,
				Usage:    "Name of the controller instance (for validation purposes)",
			},
			cli.StringFlag{
				Name:     "replica-instance-name",
				Required: false,
				Usage:    "Name of the replica instance (for validation purposes)",
			},
		},
		// Rest of implementation.
	}
}
```
#### Instance-Manager Integration
Modify the ProxyEngineService server functions so that they can make correct use of the changes in longhorn-engine.
Funnel information from the additional fields in the ProxyEngineRequest message and in appropriate ProxyEngineService
RPCs into the longhorn-engine task and controller client constructors so it can be used for validation.
```protobuf
message ProxyEngineRequest {
  string address = 1;
  string volume_name = 2;
  string instance_name = 3;
}
```
Modify the ProxyEngineService client functions so that consumers can provide the information required to enable
validation.
#### Longhorn-Manager Integration
Ensure the engine and replica controllers launch engine and replica processes with `-volume-name` and
`-engine-instance-name` or `-replica-instance-name` flags so that these processes can validate identifying gRPC metadata
coming from requests.
Ensure the engine controller supplies correct information to the ProxyEngineService client functions so that identity
validation can occur in the lower layers.
#### Example Validation Flow
This issue/LEP was inspired by [longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709). In the
situation described in this issue:
1. An engine controller with out-of-date information (including a replica address the associated volume does not own)
[issues a ReplicaAdd
command](https://github.com/longhorn/longhorn-manager/blob/a7dd20cdbdb1a3cea4eb7490f14d94d2b0ef273a/controller/engine_controller.go#L1819)
to instance-manager's ProxyEngineService.
2. Instance-manager creates a longhorn-engine task and [calls its AddReplica
method](https://github.com/longhorn/longhorn-instance-manager/blob/0e0ec6dcff9c0a56a67d51e5691a1d4a4f397f4b/pkg/proxy/replica.go#L35).
3. The task makes appropriate calls to a longhorn-engine controller and replica. The ReplicaService's [ExpandReplica
command](https://github.com/longhorn/longhorn-engine/blob/1f57dd9a235c6022d82c5631782020e84da22643/pkg/sync/sync.go#L509)
is used to expand the replica before a followup failure to actually add the replica to the controller's backend.
After this improvement, the above scenario will be impossible:
1. Both the engine and replica controllers will launch engine and replica processes with the `-volume-name` and
`-engine-instance-name` or `-replica-instance-name` flags.
2. When the engine controller issues a ReplicaAdd command, it will do so using the expanded embedded
`ProxyEngineRequest` message (with `volume_name` and `instance_name` fields) and an additional
`replica_instance_name` field.
3. Instance-manager will create a longhorn-engine task that automatically injects `volume-name` and `instance-name` gRPC
metadata into each controller request.
4. When the task issues an ExpandReplica command, it will do so using a client that automatically injects `volume-name`
and `instance-name` gRPC metadata into it.
5. If either the controller or the replica does not agree with the information provided, gRPC requests will fail
immediately and there will be no change in any longhorn-engine component.
### Test plan
#### TODO: Integration Test Plan
In my test environment, I have experimented with:
- Running new versions of all components, making gRPC calls to the longhorn-engine controller and replica processes with
wrong gRPC metadata, and verifying that these calls fail.
- Running new versions of all components, making gRPC calls to instance-manager with an incorrect volume name or
instance name, and verifying that these calls fail.
- Running new versions of all components, adding additional logging to longhorn-engine and verifying that metadata
validation is occurring during the normal volume lifecycle.
This is really a better fit for a negative testing scenario (do something that would otherwise result in improper
communication, then verify that communication fails), but we have already eliminated the only known way to reproduce
[longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709).
#### Engine Integration Test Plan
Rework test fixtures so that:
- All controller and replica processes are created with the information needed for identity validation.
- It is convenient to create controller and replica clients with the information needed for identity validation.
- gRPC metadata is automatically injected into controller and replica client requests when clients have the necessary
information.
Do not modify the behavior of existing tests. Since these tests use clients without identity validation information,
no identity validation is performed.
- Modify functions/fixtures that create engine/replica processes to allow the new flags to be passed, but do not pass
them by default.
- Modify engine/replica clients used by tests to allow for metadata injection, but do not enable it by default.
Create new tests that:
- Ensure validation fails when a directly created client attempts to communicate with a controller or replica server
using the wrong identity validation information.
- Ensure validation fails when an indirectly created client (by the engine) tries to communicate with a replica server
using the wrong identity validation information.
- Ensure validation fails when an indirectly created client (by a CLI command) tries to communicate with a controller or
replica server using the wrong identity validation information.
### Upgrade strategy
The user will benefit from this behavior automatically, but only after they have upgraded all associated components
to a supporting version (longhorn-manager, longhorn-engine, and CRITICALLY instance-manager).
We will only provide volume name and instance name information to longhorn-engine controller and replica processes on a
supported version (as governed by the `CLIAPIVersion`). Even if other components are upgraded, when they send gRPC
metadata to non-upgraded processes, it will be ignored.
We will only populate extra ProxyEngineService fields when longhorn-manager is running with an updated ProxyEngineService
client.
- RPCs from an old client to a new ProxyEngineService server will succeed, but without the extra fields,
instance-manager will have no useful gRPC metadata to inject into its longhorn-engine requests.
- RPCs from a new client to an old ProxyEngineService will succeed, but instance-manager will ignore the new fields and
not inject useful gRPC metadata into its longhorn-engine request.
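The version gating described above can be sketched as follows. This is a hedged illustration only: `minIdentityCLIAPIVersion` is a hypothetical threshold (the real `CLIAPIVersion` value is not stated here), and `replicaArgs` is an invented helper that only mirrors the flag placement shown in the API Changes section.

```go
package main

import "fmt"

// minIdentityCLIAPIVersion is a HYPOTHETICAL threshold: the first engine
// CLIAPIVersion assumed to understand the identity validation flags.
const minIdentityCLIAPIVersion = 9

// replicaArgs builds the argument list for launching a replica process.
// Identity flags are added only when the engine's CLI API version supports
// them, so processes on older versions are launched exactly as before.
func replicaArgs(cliAPIVersion int, volumeName, instanceName, dir string) []string {
	supported := cliAPIVersion >= minIdentityCLIAPIVersion
	args := []string{}
	if supported && volumeName != "" {
		args = append(args, "-volume-name", volumeName) // global flag
	}
	args = append(args, "replica", dir)
	if supported && instanceName != "" {
		args = append(args, "-replica-instance-name", instanceName) // command flag
	}
	return args
}

func main() {
	// An old engine receives no identity flags.
	fmt.Println(replicaArgs(8, "vol-1", "vol-1-r-0", "/data"))
	// An upgraded engine receives both flags.
	fmt.Println(replicaArgs(9, "vol-1", "vol-1-r-0", "/data"))
}
```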
## Note
### Why gRPC metadata?
We initially looked at adding volume name and/or instance name fields to all longhorn-engine ReplicaService and
ControllerService calls. However, this would be awkward with some of the existing RPCs. In addition, it doesn't make
much intuitive sense. Why should we provide the name of an entity we are communicating with to that entity as part of
its API? It makes more sense to think of this identity validation in terms of sessions or authorization/authentication.
In HTTP, information of this nature is handled through the use of headers, and metadata is the gRPC equivalent.
### Why gRPC interceptors?
We want to ensure the same behavior in every longhorn-engine ControllerService and ReplicaService call so that it is not
up to an individual developer writing a new RPC to remember to validate gRPC metadata (and to relearn how it should be
done). Interceptors work mostly transparently to ensure identity validation always occurs.


@ -0,0 +1,149 @@
# Upgrade Checker Info Collection
The website https://metrics.longhorn.io/ offers valuable insights into how Longhorn is being utilized, which can be accessed by the public. This information serves as a useful reference for users who are new to Longhorn, as well as those considering upgrading Longhorn or the underlying Kubernetes version. Additionally, it is useful for the Longhorn team to understand how it is being used in the real world.
To gain a deeper understanding of usage patterns, it would be beneficial to gather additional information on volumes, host systems, and features. This data would not only offer insights into how to further improve Longhorn but also provide valuable ideas on how to steer Longhorn development in the right direction.
## Summary
This proposal aims to enhance Longhorn's upgrade checker `extraInfo` by collecting additional information, including node and cluster information and some Longhorn settings.
This proposal introduces a new setting, `Allow Collecting Longhorn Usage Metrics`, to allow users to enable or disable the collection.
### Related Issues
https://github.com/longhorn/longhorn/issues/5235
## Motivation
### Goals
1. Extend the collection of user cluster info during the upgrade check.
1. Add a new setting that gives users the option to enable or disable the collection.
### Non-goals [optional]
`None`
## Proposal
1. Collect the following information and send it through the upgrade responder request.
- Node info:
- Kernel release
- OS distro
- Disk types (HDD, SSD, NVMe)
- Node provider
- Cluster info:
- Longhorn namespace UID for adoption rate
- Number of nodes
- Longhorn components CPU and memory usage
- Volume info, such as access mode, frontend, and average number of snapshots per volume.
- Some Longhorn settings
1. Introduce new `Allow Collecting Longhorn Usage Metrics` setting.
### User Stories
Users can view how Longhorn is being utilized on https://metrics.longhorn.io/.
Additionally, users have the ability to disable the collection by Longhorn.
### User Experience In Detail
Users can find a list of items that Longhorn collects as extra information in the Longhorn documentation.
Users can enable or disable the collection through the `Allow Collecting Longhorn Usage Metrics` setting. This setting can be configured using the UI or through kubectl, similar to other settings.
### API changes
`None`
## Design
### Implementation Overview
#### `Allow Collecting Longhorn Usage Metrics` Setting
- If this value is set to false, extra information will not be collected.
- Setting definition:
```
DisplayName: "Allow Collecting Longhorn Usage Metrics"
Description: "Enabling this setting will allow Longhorn to provide additional usage metrics to https://metrics.longhorn.io. This information will help us better understand how Longhorn is being used, which will ultimately contribute to future improvements."
Category: SettingCategoryGeneral
Type: SettingTypeBool
Required: true
ReadOnly: false
Default: "true"
```
#### Extra Info Collection
##### Node Info
The following information is sent from each cluster node:
- Number of disks of each device type (`longhorn_node_disk_<hdd/ssd/nvme/unknown>_count`).
> Note: this value may not be accurate if the cluster node is a virtual machine.
- Host kernel release (`host_kernel_release`)
- Host OS distro (`host_os_distro`)
- Kubernetes node provider (`kubernetes_node_provider`)
##### Cluster Info
The following information is sent from one of the cluster nodes:
- Longhorn namespace UID (`longhorn_namespace_uid`).
- Number of nodes (`longhorn_node_count`).
- Number of volumes per access mode (`longhorn_volume_access_mode_<rwo/rwx/unknown>_count`).
- Number of volumes per data locality (`longhorn_volume_data_locality_<disabled/best_effort/strict_local/unknown>_count`).
- Number of volumes per frontend (`longhorn_volume_frontend_<blockdev/iscsi>_count`).
- Average volume size (`longhorn_volume_average_size`).
- Average volume actual size (`longhorn_volume_average_actual_size`).
- Average number of snapshots per volume (`longhorn_volume_average_snapshot_count`).
- Average number of replicas per volume (`longhorn_volume_average_number_of_replicas`).
- Average Longhorn component CPU usage (`longhorn_<engine_image/instance_manager/manager/ui>_average_cpu_usage_core`)
- Average Longhorn component memory usage (`longhorn_<engine_image/instance_manager/manager/ui>_average_memory_usage_mib`)
- Settings (`longhorn_setting_<name>`):
- Settings to exclude:
- SettingNameBackupTargetCredentialSecret
- SettingNameDefaultEngineImage
- SettingNameDefaultInstanceManagerImage
- SettingNameDefaultShareManagerImage
- SettingNameDefaultBackingImageManagerImage
- SettingNameSupportBundleManagerImage
- SettingNameCurrentLonghornVersion
- SettingNameLatestLonghornVersion
- SettingNameStableLonghornVersions
- SettingNameDefaultLonghornStaticStorageClass
- SettingNameDeletingConfirmationFlag
- SettingNameDefaultDataPath
- SettingNameUpgradeChecker
- SettingNameAllowCollectingLonghornUsage
- SettingNameDisableReplicaRebuild (deprecated)
- SettingNameGuaranteedEngineCPU (deprecated)
- Settings that require processing to identify their general purpose:
- SettingNameBackupTarget (the backup target type/protocol, ex: cifs, nfs, s3)
- Settings that should be collected as boolean (true if configured, false if not):
- SettingNameTaintToleration
- SettingNameSystemManagedComponentsNodeSelector
- SettingNameRegistrySecret
- SettingNamePriorityClass
- SettingNameStorageNetwork
- Other settings that should be collected as is.
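The four collection rules above (exclude, reduce to a type, reduce to a boolean, collect as is) can be sketched as a single classification function. Everything named here is an assumption for illustration: the kebab-case setting names, the shortened `excluded` and `asBoolean` tables, the `collectSetting` and `backupTargetType` helpers, and the choice to report `none` for an empty backup target and `unknown` for a target without a scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// Abbreviated, hypothetical tables; the full lists are given above.
var excluded = map[string]bool{
	"backup-target-credential-secret": true,
	"default-data-path":               true,
}

var asBoolean = map[string]bool{
	"taint-toleration": true,
	"registry-secret":  true,
	"priority-class":   true,
	"storage-network":  true,
}

// backupTargetType keeps only the protocol of a backup target URL so that
// bucket names, hosts, and paths are never sent.
func backupTargetType(target string) string {
	if target == "" {
		return "none"
	}
	scheme, _, found := strings.Cut(target, "://")
	if !found || scheme == "" {
		return "unknown"
	}
	return strings.ToLower(scheme)
}

// collectSetting returns the metric key and the value to report, or
// ok == false when the setting must not be collected at all.
func collectSetting(name, value string) (key, out string, ok bool) {
	if excluded[name] {
		return "", "", false
	}
	key = "longhorn_setting_" + strings.ReplaceAll(name, "-", "_")
	switch {
	case name == "backup-target":
		out = backupTargetType(value)
	case asBoolean[name]:
		out = fmt.Sprint(value != "") // "true" if configured, "false" if not
	default:
		out = value
	}
	return key, out, true
}

func main() {
	fmt.Println(collectSetting("backup-target", "s3://bucket@us-east-1/"))
	fmt.Println(collectSetting("registry-secret", "my-secret"))
	fmt.Println(collectSetting("default-replica-count", "3"))
	fmt.Println(collectSetting("default-data-path", "/var/lib/longhorn"))
}
```

Reducing sensitive settings to a type or a boolean keeps identifying values such as secret names and bucket paths out of the collected data while still showing whether a feature is in use.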
Example:
```
name: upgrade_request
time app_version host_kernel_release host_os_distro kubernetes_node_provider kubernetes_version longhorn_engine_image_average_cpu_usage_core longhorn_engine_image_average_memory_usage_mib longhorn_instance_manager_average_cpu_usage_core longhorn_instance_manager_average_memory_usage_mib longhorn_manager_average_cpu_usage_core longhorn_manager_average_memory_usage_mib longhorn_namespace_uid longhorn_node_count longhorn_node_disk_nvme_count longhorn_setting_allow_node_drain_with_last_healthy_replica longhorn_setting_allow_recurring_job_while_volume_detached longhorn_setting_allow_volume_creation_with_degraded_availability longhorn_setting_auto_cleanup_system_generated_snapshot longhorn_setting_auto_delete_pod_when_volume_detached_unexpectedly longhorn_setting_auto_salvage longhorn_setting_backing_image_cleanup_wait_interval longhorn_setting_backing_image_recovery_wait_interval longhorn_setting_backup_compression_method longhorn_setting_backup_concurrent_limit longhorn_setting_backup_target longhorn_setting_backupstore_poll_interval longhorn_setting_concurrent_automatic_engine_upgrade_per_node_limit longhorn_setting_concurrent_replica_rebuild_per_node_limit longhorn_setting_concurrent_volume_backup_restore_per_node_limit longhorn_setting_crd_api_version longhorn_setting_create_default_disk_labeled_nodes longhorn_setting_default_data_locality longhorn_setting_default_replica_count longhorn_setting_disable_revision_counter longhorn_setting_disable_scheduling_on_cordoned_node longhorn_setting_engine_replica_timeout longhorn_setting_failed_backup_ttl longhorn_setting_fast_replica_rebuild_enabled longhorn_setting_guaranteed_engine_manager_cpu longhorn_setting_guaranteed_instance_manager_cpu longhorn_setting_guaranteed_replica_manager_cpu longhorn_setting_kubernetes_cluster_autoscaler_enabled longhorn_setting_node_down_pod_deletion_policy longhorn_setting_node_drain_policy longhorn_setting_orphan_auto_deletion longhorn_setting_priority_class 
longhorn_setting_recurring_failed_jobs_history_limit longhorn_setting_recurring_successful_jobs_history_limit longhorn_setting_registry_secret longhorn_setting_remove_snapshots_during_filesystem_trim longhorn_setting_replica_auto_balance longhorn_setting_replica_file_sync_http_client_timeout longhorn_setting_replica_replenishment_wait_interval longhorn_setting_replica_soft_anti_affinity longhorn_setting_replica_zone_soft_anti_affinity longhorn_setting_restore_concurrent_limit longhorn_setting_restore_volume_recurring_jobs longhorn_setting_snapshot_data_integrity longhorn_setting_snapshot_data_integrity_cronjob longhorn_setting_snapshot_data_integrity_immediate_check_after_snapshot_creation longhorn_setting_storage_minimal_available_percentage longhorn_setting_storage_network longhorn_setting_storage_over_provisioning_percentage longhorn_setting_storage_reserved_percentage_for_default_disk longhorn_setting_support_bundle_failed_history_limit longhorn_setting_system_managed_components_node_selector longhorn_setting_system_managed_pods_image_pull_policy longhorn_setting_taint_toleration longhorn_ui_average_cpu_usage_core longhorn_ui_average_memory_usage_mib longhorn_volume_access_mode_rwo_count longhorn_volume_average_actual_size longhorn_volume_average_number_of_replicas longhorn_volume_average_size longhorn_volume_average_snapshot_count longhorn_volume_data_locality_disabled_count longhorn_volume_frontend_blockdev_count value
---- ----------- ------------------- -------------- ------------------------ ------------------ -------------------------------------------- ---------------------------------------------- ------------------------------------------------ -------------------------------------------------- --------------------------------------- ----------------------------------------- ---------------------- ------------------- ----------------------------- ----------------------------------------------------------- ---------------------------------------------------------- ----------------------------------------------------------------- ------------------------------------------------------- ------------------------------------------------------------------ ----------------------------- ---------------------------------------------------- ----------------------------------------------------- ------------------------------------------ ---------------------------------------- ------------------------------ ------------------------------------------ ------------------------------------------------------------------- ---------------------------------------------------------- ---------------------------------------------------------------- -------------------------------- -------------------------------------------------- -------------------------------------- -------------------------------------- ----------------------------------------- ---------------------------------------------------- --------------------------------------- ---------------------------------- --------------------------------------------- ---------------------------------------------- ------------------------------------------------ ----------------------------------------------- ------------------------------------------------------ ---------------------------------------------- ---------------------------------- ------------------------------------- ------------------------------- 
---------------------------------------------------- -------------------------------------------------------- -------------------------------- -------------------------------------------------------- ------------------------------------- ------------------------------------------------------ ---------------------------------------------------- ------------------------------------------- ------------------------------------------------ ----------------------------------------- ---------------------------------------------- ---------------------------------------- ------------------------------------------------ -------------------------------------------------------------------------------- ----------------------------------------------------- -------------------------------- ----------------------------------------------------- ------------------------------------------------------------- ---------------------------------------------------- -------------------------------------------------------- ------------------------------------------------------ --------------------------------- ---------------------------------- ------------------------------------ ------------------------------------- ----------------------------------- ------------------------------------------ ---------------------------- -------------------------------------- -------------------------------------------- --------------------------------------- -----
1683598256887331729 v1.5.0-dev 5.3.18-59.37-default "sles" k3s v1.23.15+k3s1 5m 11 4m 83 22m 85 1b96b299-b785-468b-ab80-b5b5b12fbe00 3 1 false false true true true true 60 300 lz4 5 none 300 0 5 5 longhorn.io/v1beta2 false disabled 3 false true 8 1440 true 12 12 12 false do-nothing block-if-contains-last-replica false false 1 1 false false disabled 30 600 false true 5 false fast-check 0 0 */7 * * false 25 false 200 30 1 false if-not-present false 0 4 3 79816021 2 8589934592 0 3 3 1
1683598257082240493 v1.5.0-dev 5.3.18-59.37-default "sles" k3s v1.23.15+k3s1 1 1
1683598257825718008 v1.5.0-dev 5.3.18-59.37-default "sles" k3s v1.23.15+k3s1 1 1
```
### Test plan
1. Set up the upgrade responder server.
1. Verify the database when the `Allow Collecting Longhorn Usage Metrics` setting is enabled or disabled.
### Upgrade strategy
`None`
## Note [optional]
`None`

View File

@ -0,0 +1,68 @@
# Set RecurringJob to PersistentVolumeClaims (PVCs)
Managing recurring jobs for Longhorn Volumes is challenging for users utilizing GitOps, primarily because GitOps operates at the Kubernetes resource level while recurring job labeling is specific to individual Longhorn Volumes.
## Summary
This document proposes the implementation of a solution that allows configuring recurring jobs directly on PVCs.
By adopting this approach, users will have the capability to manage Volume recurring jobs through the PVCs.
### Related Issues
https://github.com/longhorn/longhorn/issues/5791
## Motivation
### Goals
1. Introduce support for enabling/disabling PVCs as a recurring job label source for the corresponding Volume.
1. The recurring job labels on PVCs are reflected on the associated Volume when the PVC is set as the recurring job label source.
### Non-goals [optional]
Sync Volume recurring job labels to PVC.
## Proposal
1. The existing behavior of recurring jobs will remain unchanged, with the Volume's recurring job labeling as the source of truth.
2. When the PVC is enabled as the recurring job label source, its recurring job labels will override all recurring job labels of the associated Volume.
### User Stories
As a user, I want to be able to set the RecurringJob label on the PVC. I expect that any updates made to the RecurringJob labels on the PVC will automatically reflect on the associated Volume.
### User Experience In Detail
To enable or disable the PVC as the recurring job label source, users can manage it by adding or removing the `recurring-job.longhorn.io/source: enable` label to the PVC.
Once the PVC is set as the recurring job label source, any recurring job labels added or removed from the PVC will be automatically synchronized by Longhorn to the associated Volume.
### API changes
`None`
## Design
### Implementation Overview
#### Sync PVC recurring job labels to Volume ####
If the PVC is labeled with `recurring-job.longhorn.io/source: enable`, the volume controller checks and updates the Volume to ensure the recurring job labels stay synchronized with the PVC by detecting recurring job label differences.
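The override behavior described above can be illustrated with a minimal sketch. It is not the volume controller's actual code: the function names are hypothetical, and the two label prefixes are assumptions about how recurring job and recurring job group labels are named. Only the `recurring-job.longhorn.io/source: enable` label comes from this proposal.

```python
# Source marker label as written in this proposal.
SOURCE_LABEL = "recurring-job.longhorn.io/source"
# Assumed prefixes of recurring job and recurring job group labels.
JOB_PREFIXES = ("recurring-job.longhorn.io/", "recurring-job-group.longhorn.io/")


def is_recurring_job_label(key: str) -> bool:
    """Return True for recurring job labels, excluding the source marker."""
    return key != SOURCE_LABEL and key.startswith(JOB_PREFIXES)


def sync_volume_labels(pvc_labels: dict, volume_labels: dict) -> dict:
    """Return the labels the Volume should carry after one sync pass."""
    if pvc_labels.get(SOURCE_LABEL) != "enable":
        # The PVC is not the label source: the Volume stays the source of truth.
        return dict(volume_labels)
    # Keep every Volume label that is unrelated to recurring jobs ...
    synced = {k: v for k, v in volume_labels.items() if not is_recurring_job_label(k)}
    # ... and take all recurring job labels from the PVC (override semantics).
    synced.update({k: v for k, v in pvc_labels.items() if is_recurring_job_label(k)})
    return synced
```

Comparing the returned dictionary with the Volume's current labels tells the caller whether an update is needed, which mirrors the "detecting recurring job label differences" step.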
#### Remove PVC recurring job labels of a deleted RecurringJob ####
Longhorn already removes the Volume recurring job label associated with a RecurringJob that is being deleted. The same removal now also applies to the PVC.
### Test plan
1. Updating a PVC recurring job label should be reflected on the Volume.
1. Deleting a RecurringJob custom resource should delete the recurring job labels on both the Volume and the PVC.
### Upgrade strategy
`None`
## Note [optional]
`None`

View File

@ -0,0 +1,146 @@
# Support Volumes using V2 Data Engine
## Summary
Longhorn's storage stack, based on iSCSI and a customized protocol, has limitations such as increased I/O latencies and reduced IOPS due to the longer data path. This makes it less suitable for latency-critical applications. To overcome these challenges, Longhorn introduces the Storage Performance Development Kit (SPDK) to enhance overall performance. With SPDK integration, Longhorn optimizes system efficiency, addresses latency concerns, and provides a high-performance storage solution capable of meeting diverse workload demands.
### Related Issues
- [[FEATURE] Add a global setting for enabling and disabling SPDK feature](https://github.com/longhorn/longhorn/issues/5778)
- [[FEATURE] Support replica scheduling for SPDK volume](https://github.com/longhorn/longhorn/issues/5711)
- [[FEATURE] Implement Disk gRPC Service in Instance Manager for collecting SPDK disk statistics from SPDK gRPC service](https://github.com/longhorn/longhorn/issues/5744)
- [[FEATURE] Identify and manage orphaned lvols and raid bdevs if the associated Volume resources are not existing](https://github.com/longhorn/longhorn/issues/5827)
## Motivation
### Goals
- Introduce backend store drivers
  - `v1`: legacy data path
  - `v2`: a newly introduced data path based on SPDK
- Introduce disk types and management
- Support volume creation, attachment, detachment and deletion
- Support orphaned replica collection
### Non-goals [optional]
- Support runtime replica rebuilding
- Support changing number of replicas of a volume
- Support volume expansion
- Support volume backup
## Proposal
### User Stories
Longhorn's storage stack is built upon iSCSI and a customized protocol. However, the longer data path associated with this architecture introduces certain limitations, resulting in increased I/O latencies and reduced IOPS. Consequently, Longhorn may not be the ideal choice for latency-critical applications, as the performance constraints could impede their deployment on the platform.
By incorporating SPDK, Longhorn leverages its capabilities to significantly improve performance levels. The integration of SPDK enables Longhorn to optimize system efficiency, mitigate latency concerns, and deliver a high-performance storage solution that can better meet the demands of diverse workloads.
### User Experience In Detail
- Environment Setup
- Configure Kernel Modules (uio and uio_pci_generic) and Huge Pages for SPDK
```bash
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/prerequisite/longhorn-spdk-setup.yaml
```
- Install NVMe Userspace Tool and Load `nvme-tcp` Kernel Module
Install nvme-cli on each node and make sure that its version is `1.12` or later.
```bash
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/prerequisite/longhorn-nvme-cli-installation.yaml
```
- Restart `kubelet`
Modifying the Huge Page configuration of a node requires either a restart of kubelet or a complete reboot of the node. This step is crucial to ensure that the changes take effect and are properly applied.
- Install Longhorn system
- Enable SPDK Support
Enable the SPDK feature by changing the `v2-data-engine` setting to `true` after installation. The instance-manager pods are then restarted automatically.
- Add Disks for volumes using v2 data engine
- Legacy disks are classified as `filesystem`-type disks
- Add one or multiple `block`-type disks into `node.Spec.Disks`
```bash
block-disk-example1:
  allowScheduling: true
  evictionRequested: false
  path: /path/to/block/device
  storageReserved: 0
  tags: []
  diskType: block
```
- Create a storage class utilizing the enhanced performance capabilities offered by SPDK
```bash
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-v2-data-engine
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  backendStoreDriver: "v2"
```
- Create workloads that use Longhorn volumes provisioned from the storage class.
### API changes
## Design
### Implementation Overview
- Global settings
- `v2-data-engine`: This setting allows users to enable v2 data engine support. Default: false.
- `v2-data-engine-hugepage-limit`: This setting allows users to specify the total size of the 2 MiB hugepages for v2 data engine. Default: 2048.
- CRD
- Introduce `diskType` in `node.Spec.Disks`
- `filesystem`: disks for legacy volumes. These disks, which are actually directories, store and organize data in a hierarchical manner.
- `block`: block disks for volumes using v2 data engine
The replica scheduler assigns replicas of legacy volumes to `filesystem`-type disks while replicas of volumes using v2 data engine are scheduled to `block`-type disks.
- Introduce `backendStoreDriver` in `volume.Spec`, `engine.Spec` and `replica.Spec`.
- `backendStoreDriver` is utilized to differentiate between volume types and their associated data paths.
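The pairing between `backendStoreDriver` and `diskType` described above can be sketched as a simple filter. This is an illustration only, not the replica scheduler's code: the `Disk` class and the `candidate_disks` function are hypothetical names, and the real scheduler also considers capacity, tags and anti-affinity.

```python
from dataclasses import dataclass

# Mapping taken from the text: replicas of legacy (v1) volumes go to
# filesystem-type disks, replicas of v2 volumes go to block-type disks.
DISK_TYPE_FOR_DRIVER = {"v1": "filesystem", "v2": "block"}


@dataclass
class Disk:
    name: str
    disk_type: str  # "filesystem" or "block"
    allow_scheduling: bool = True


def candidate_disks(backend_store_driver: str, disks: list) -> list:
    """Return the names of disks a replica of the given driver may use."""
    wanted = DISK_TYPE_FOR_DRIVER[backend_store_driver]
    return [d.name for d in disks if d.allow_scheduling and d.disk_type == wanted]
```

An unknown driver value raises `KeyError`, which keeps a typo in `backendStoreDriver` from silently matching no disks.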
- Introduce `Instance`, `Disk` and `SPDK` gRPC services
![gRPC Services](image/spdk_services.png)
- `Instance` gRPC service: It is tasked with managing various operations related to instance management, including creation, deletion, retrieval, listing, and watching. An instance, either an engine or a replica of a legacy volume, represents a process. On the other hand, for replicas of volumes using v2 data engine, an instance represents a logical volume. In the case of an engine for a volume using v2 data engine, an instance is associated with a raid bdev, a frontend NVMe target/initiator pair and a bind mount device.
- `Disk` gRPC service: It is responsible for managing various disk operations, including creation, deletion, and retrieval. Additionally, it provides functionalities to list or delete replica instances associated with the disks. In the case of a legacy volume, a replica instance is represented as a replica directory on the disk. On the other hand, for a volume using v2 data engine, a replica instance is a replica chained by logical volumes.
- `SPDK` gRPC service: It manages replicas chained by logical volumes and engines constructed using SPDK raid1 bdevs. In addition, the service is responsible for the communication with `spdk_tgt`.
- Proxy gRPC service APIs
- Update gRPC service APIs to support the different disk types (filesystem and block) and data engines (v1 and v2).
- Disk orchestration
Within the Longhorn system, an aio bdev and an lvstore are created on top of a block-type disk. Replicas in terms of logical volumes (lvols) are then created on the lvstore.
![Disks for Volumes Using V2 Data Engine](image/spdk_disks.png)
- Orphaned replicas collection
The feature has been integrated into the existing framework for collecting and cleaning up orphaned replicas.
## Test Plan
## Note [optional]

View File

@ -0,0 +1,92 @@
# Volume Backup Policy for Longhorn System Backup
The current implementation of the Longhorn system backup lacks integration with the volume backup feature. As a result, users are required to manually ensure that all volume backups are up-to-date before initiating the Longhorn system backup.
## Summary
This document proposes including the volume backup feature in the Longhorn system backup by introducing volume backup policies.
By implementing the volume backup policies, users will gain the ability to define how volume data should be backed up during the Longhorn system backup.
### Related Issues
https://github.com/longhorn/longhorn/issues/5011
## Motivation
### Goals
1. **Customization:** By offering different volume backup policy options, users can choose the one that best fits their requirements.
1. **Reduce Manual Efforts:** By integrating volume backup into the Longhorn system backup, users no longer have to ensure that all volume backups are up-to-date before initiating the system backup.
1. **Enhanced Data Integrity:** By aligning the system backup with up-to-date volume backups, the restored volume data will be more accurate.
Overall, the proposed volume backup policies aim to improve the Longhorn system backup functionality and provide a more robust and customizable system backup solution.
### Non-goals [optional]
`None`
## Proposal
1. When volume backup policy is specified:
- `if-not-present`: Longhorn will create a backup for volumes that do not have an existing backup.
- `always`: Longhorn will create a backup for all volumes, regardless of their existing backups.
- `disabled`: Longhorn will not create any backups for volumes.
1. If a volume backup policy is not specified, the policy will be automatically set to `if-not-present`. This ensures that volumes without any existing backups will be backed up during the Longhorn system backup.
### User Stories
As a user, I want the ability to specify the volume backup policy when creating the Longhorn system backup. This will allow me to define how volumes should be backed up according to my scenario.
- **Scenario 1: if-not-present Policy:** When I set the volume backup policy to `if-not-present`, I expect Longhorn to create a backup for volumes that do not already have a backup.
- **Scenario 2: always Policy:** When I set the volume backup policy to `always`, I expect Longhorn to create backups for all volumes, regardless of whether they already have a backup.
- **Scenario 3: disabled Policy:** When I set the volume backup policy to `disabled`, I expect Longhorn to not create any backups for the volumes.
In cases where I don't explicitly specify the volume backup policy during the system backup configuration, I expect Longhorn to automatically apply the `if-not-present` policy as the default.
### User Experience In Detail
Users can set the volume backup policy when creating the system backup through the UI. Alternatively, users can specify it in the manifest when creating the SystemBackup custom resource using the kubectl command.
In scenarios where no specific volume backup policy is provided, Longhorn will automatically set the policy as `if-not-present`.
### API changes
Add a new `volumeBackupPolicy` field to the HTTP request and response payload.
## Design
### Implementation Overview
#### SystemBackup Custom Resource
- Introduce a new `volumeBackupPolicy` field. This field allows users to specify the volume backup policy.
- Add a new state (phase) called `CreatingVolumeBackups` to track the progress of volume backup creation during the Longhorn system backup.
#### CreatingVolumeBackups phase
- Iterate through each Longhorn volume.
- If the policy is `if-not-present`, create a volume snapshot and backup only for volumes that do not already have a backup (lastBackup is empty).
- If the policy is `always`, create a volume snapshot and backup for all volumes, regardless of their existing backups.
- If the policy is `disabled`, skip the volume backup creation step for all volumes and proceed to the next phase.
- Wait for all volume backups created by the SystemBackup to finish (completed or error state) before proceeding to the next phase (Generating or Error). Volume backups have a timeout limit of 24 hours. The failure of any backup leads the SystemBackup to an Error state.
#### Mutate empty volume backup policy
When the volume backup policy is not provided in the SystemBackup custom resource, automatically set the policy to `if-not-present`.
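The policy handling above can be sketched in a few lines. This is an illustrative model rather than the SystemBackup controller's code: the function names are hypothetical, and each volume is reduced to a name mapped to its `lastBackup` value, where an empty value means no backup exists.

```python
DEFAULT_POLICY = "if-not-present"
VALID_POLICIES = {"if-not-present", "always", "disabled"}


def resolve_policy(policy):
    """Mutate an empty policy to the default and reject unknown values."""
    policy = policy or DEFAULT_POLICY
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown volume backup policy: {policy!r}")
    return policy


def volumes_to_back_up(policy, last_backup_by_volume: dict) -> list:
    """Return the sorted names of volumes that need a snapshot and backup."""
    policy = resolve_policy(policy)
    if policy == "disabled":
        return []  # skip backup creation and go to the next phase
    if policy == "always":
        return sorted(last_backup_by_volume)
    # if-not-present: only volumes whose lastBackup is empty
    return sorted(name for name, last in last_backup_by_volume.items() if not last)
```

For example, with `{"vol-a": "", "vol-b": "backup-123"}` and no policy given, only `vol-a` is selected.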
### Test plan
1. When the volume backup policy is `if-not-present`, the system backup should create a volume backup only for volumes that have no existing backup.
1. When the volume backup policy is `always`, the system backup should create a volume backup regardless of any existing backup.
1. When the volume backup policy is `disabled`, the system backup should not create any volume backup.
### Upgrade strategy
`None`
## Note [optional]
`None`

View File

@ -0,0 +1,77 @@
# Forcibly Activate A Restoring/DR Volume
## Summary
When users try to activate a restoring/DR volume while some of its replicas have failed, the volume gets stuck in the attaching state and users cannot do anything except delete the volume. However, the volume can still be rebuilt back to a healthy state as long as there is a healthy replica.
To improve user experience, Longhorn should forcibly activate a restoring/DR volume if there is a healthy replica and users allow the creation of a degraded volume.
### Related Issues
https://github.com/longhorn/longhorn/issues/1512
## Motivation
### Goals
Allow users to activate a restoring/DR volume as long as there is a healthy replica and the volume works well.
### Non-goals [optional]
`None`
## Proposal
Forcibly activate a restoring/DR volume if there is a healthy replica and users enable the global setting `allow-volume-creation-with-degraded-availability`.
### User Stories
Users can activate a restoring/DR volume through `kubectl` or the Longhorn UI, and the volume works well afterwards.
### User Experience In Detail
#### Prerequisites
Set up two Kubernetes clusters. These will be called cluster A and cluster B. Install Longhorn on both clusters, and set the same backup target on both clusters.
1. In cluster A, make sure the original volume X has a backup created or has recurring backups scheduled.
2. On the backup page of cluster B, choose the backup volume X, then create disaster recovery volume Y.
#### Kubectl
Users set `volume.spec.Standby` to `false`, either by editing the volume CR or in the manifest that creates the volume, to activate the volume.
#### Longhorn UI
Users click the `Activate Disaster Recovery Volume` button on the `Volume` or `Volume Details` page to activate the volume.
### API changes
`None`
## Design
### Implementation Overview
1. Check if `volume.Spec.Standby` is set to `false`
2. Get the global setting `allow-volume-creation-with-degraded-availability`
3. Activate the DR volume if `allow-volume-creation-with-degraded-availability` is set to `true` and there are one or more ready replicas.
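The three checks above reduce to a small predicate. The sketch below uses a hypothetical function name and plain arguments instead of Longhorn's volume and setting objects, and it only models the forced activation path described in this proposal.

```python
def should_force_activate(standby: bool, allow_degraded: bool, ready_replicas: int) -> bool:
    """Decide whether a restoring/DR volume may be forcibly activated.

    standby:        value of volume.Spec.Standby (False means activation was requested)
    allow_degraded: value of allow-volume-creation-with-degraded-availability
    ready_replicas: number of replicas that are currently ready
    """
    if standby:
        return False  # step 1: the user has not asked for activation
    # steps 2 and 3: the setting must be true and one ready replica must exist
    return allow_degraded and ready_replicas >= 1
```

With `standby=False`, `allow_degraded=True` and a single ready replica the predicate is true. Setting `allow_degraded=False` makes it false regardless of the replica count.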
### Test plan
#### Test Forcibly Activating A Restoring/DR Volume
1. Create a DR volume
2. Set the global setting `concurrent-replica-rebuild-per-node-limit` to 0
3. Fail some replicas
4. Check if there is at least one healthy replica
5. Call the API `activate`
6. Verify that the volume is activated
7. Attach the volume to a node and check if data is correct
### Upgrade strategy
`None`
## Note [optional]
`None`

View File

@ -0,0 +1,86 @@
# Automatic Offline Replica Rebuilding
## Summary
Currently, Longhorn does not have the capability to support online replica rebuilding for volumes utilizing the V2 Data Engine. However, an automatic offline replica rebuilding mechanism has been implemented as a solution to address this limitation.
### Related Issues
https://github.com/longhorn/longhorn/issues/6071
## Motivation
### Goals
1. Support volumes using v2 data engine
### Non-goals
1. Support volumes using v1 data engine
## Proposal
### User Stories
In the event of abnormal power outages or network partitions, replicas of a volume may be lost, resulting in volume degradation. Unfortunately, volumes utilizing the v2 data engine do not currently have the capability for online replica rebuilding. As a solution to address this limitation, Longhorn has implemented an automatic offline replica rebuilding mechanism.
When a degraded volume is detached, this mechanism places the volume in maintenance mode and initiates the rebuilding process. After the rebuilding is successfully completed, the volume is detached according to the user's specified expectations.
### User Experience In Detail
- If a volume using the v2 data engine is degraded, the online replica rebuilding process is currently unsupported.
- The offline replica rebuilding feature is enabled when one of the following conditions is met:
  - Global setting `offline-replica-rebuilding` is `enabled` and `Volume.Spec.OfflineReplicaRebuilding` is `ignored`
  - `Volume.Spec.OfflineReplicaRebuilding` is `enabled`
  When the feature is enabled, the volume's `Status.OfflineReplicaRebuildingRequired` is set to `true` if the volume is degraded.
- When a degraded volume is detached, this mechanism places the volume in maintenance mode and initiates the rebuilding process. After the rebuilding is successfully completed, the volume is detached according to the user's specified expectations.
- If a user attaches the volume without enabling maintenance mode while the replica rebuilding process is in progress, the ongoing replica rebuilding operation will be terminated.
## Design
### Implementation Overview
**Settings**
- Add global setting `offline-replica-rebuilding`. Default value is `enabled`. The available options are:
- `enabled`
- `disabled`
**CRD**
- Add `Volume.Spec.OfflineReplicaRebuilding`. The available options are:
- `ignored`: The volume's offline replica rebuilding behavior follows the global setting `offline-replica-rebuilding`.
- `enabled`: Offline replica rebuilding of the volume is always enabled.
- `disabled`: Offline replica rebuilding of the volume is always disabled.
- Add `Volume.Status.OfflineReplicaRebuildingRequired`
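How the global setting and the per-volume option combine can be written as a small resolver. This is a sketch with a hypothetical function name, operating on the plain string values listed above rather than on Longhorn's setting and Volume objects.

```python
def offline_rebuilding_enabled(global_setting: str, volume_setting: str) -> bool:
    """Resolve the effective offline replica rebuilding state of one volume."""
    if volume_setting == "enabled":
        return True  # the volume-level value always wins
    if volume_setting == "disabled":
        return False
    if volume_setting == "ignored":
        # defer to the global offline-replica-rebuilding setting
        return global_setting == "enabled"
    raise ValueError(f"unexpected volume setting: {volume_setting!r}")
```

Only the `ignored` value consults the global setting, so a cluster-wide change never overrides a volume that was explicitly set to `enabled` or `disabled`.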
**Controller**
- Add `volume-rebuilding-controller` for creating and deleting the `volume-rebuilding-controller` attachment ticket.
**Logic**
1. The volume-controller sets `Volume.Status.OfflineReplicaRebuildingRequired` to `true` when it detects that a volume using the v2 data engine is degraded.
2. If a volume's `Volume.Status.OfflineReplicaRebuildingRequired` is `true`, volume-rebuilding-controller creates a `volume-rebuilding-controller` attachment ticket with frontend disabled and lower priority than tickets with workloads.
3. When the volume is detached, volume-attachment-controller attaches the volume with a `volume-rebuilding-controller` attachment ticket in maintenance mode.
4. volume-controller triggers replica rebuilding.
5. After finishing the replica rebuilding, the volume-controller sets `Volume.Status.OfflineReplicaRebuildingRequired` to `false` if the number of healthy replicas matches the expected number.
6. volume-rebuilding-controller deletes the `volume-rebuilding-controller` attachment ticket.
7. volume-attachment-controller is aware of the deletion of the `volume-rebuilding-controller` attachment ticket, which causes volume detachment.
### Test Plan
### Integration Tests
1. Degraded Volume lifecycle (creation, attachment, detachment and deletion) and automatic replica rebuilding

View File

@ -0,0 +1,100 @@
# SPDK Engine
## Summary
Longhorn will take advantage of SPDK to launch the second version engine with higher performance.
### Related Issues
https://github.com/longhorn/longhorn/issues/5406
https://github.com/longhorn/longhorn/issues/5282
https://github.com/longhorn/longhorn/issues/5751
## Motivation
### Goals
1. Have a set of APIs that talks with spdk_tgt to operate SPDK components.
2. Launch a control plane that manages and operates SPDK engines and replicas.
## Proposal
1. The SPDK engine architecture is different from the legacy engine:
   1. Unlike the legacy engine, the data flow will be taken over by SPDK. The new engine or replica won't directly touch the data handling. The new engine or replica is actually one or a set of SPDK components handled by spdk_tgt.
   2. Since the main task is to manage SPDK components and abstract them as Longhorn engines or replicas, we can use a single service rather than separate processes to launch and manage engines or replicas.
   3. As SPDK handles the disks by itself, the disk management logic should be moved to the SPDK engine service as well.
2. The abstraction of SPDK engine and replica:
   1. A data disk will be abstracted as an aio bdev + an lvstore.
   2. Each snapshot or volume head file is a logical volume (lvol) inside an lvstore.
   3. A remote replica is exposed as an NVMe-oF subsystem backed by the corresponding SPDK lvol, while a local replica is just an lvol.
   4. An engine backend is actually an SPDK RAID1 bdev, which may consist of multiple attached replica NVMe-oF subsystems and a local lvol.
   5. An engine frontend is typically an NVMe-oF initiator plus an NVMe-oF subsystem of the RAID bdev.
3. Do spdk_tgt initializations during instance manager startup.
### User Stories
#### Launch SPDK volumes
Before the enhancement, users need to manually launch a RAID1 bdev and then expose it as an NVMe-oF initiator to act as the Longhorn SPDK engine by following [the doc](https://github.com/longhorn/longhorn-spdk-engine/wiki/How-to-setup-a-RAID1-block-device-with-SPDK). Besides, rebuilding replicas would be pretty complicated.
After the enhancement, users can directly launch and control the Longhorn SPDK engine via the gRPC SPDK engine service, and rebuilding can be triggered and handled automatically.
### API Changes
- The new gRPC SPDK engine service:
- Replica:
| API | Caller | Input | Output | Comments |
| --- | --- | --- | --- | --- |
| Create | Instance manager proxy | name, lvsName, lvsUUID string, specSize uint64, exposeRequired bool | err error | Create a new replica or start an existing one |
| Delete | Instance manager proxy | name string, cleanupRequired bool | err error | Remove or stop an existing replica |
| List | Instance manager proxy | | replicas map\[string\]Replica, err error | Get all abstracted replica info from the cache of the SPDK engine service |
| Get | Instance manager proxy | | replica Replica, err error | Get the abstracted replica info from the cache of the SPDK engine service |
| Watch | Instance manager proxy | | ReplicaStream, err error | Establish a streaming for the replica update notification |
| SnapshotCreate | Instance manager proxy | name, snapshotName string | err error | |
| SnapshotDelete | Instance manager proxy | name, snapshotName string | err error | |
| Rebuilding APIs | The engine inside one gRPC SPDK engine service | | | This set of APIs is responsible for starting and finishing the rebuilding for the source replica or destination replica. It also helps start data transmission from src to dst |
- Engine:
| API | Caller | Input | Output | Comments |
| --- | --- | --- | --- | --- |
| Create | Instance manager proxy | name, lvsName, lvsUUID string, specSize uint64, exposeRequired bool | err error | Start a new engine and connect it with corresponding replicas |
| Delete | Instance manager proxy | name string, cleanupRequired bool | err error | Stop an existing engine |
| List | Instance manager proxy | | engines map\[string\]Engine, err error | Get the abstracted engine info from the cache of the SPDK engine service |
| Get | Instance manager proxy | | engine Engine, err error | Get the abstracted engine info from the cache of the SPDK engine service |
| Watch | Instance manager proxy | | EngineStream, err error | Establish a streaming for the engine update notification |
| SnapshotCreate | Instance manager proxy | name, snapshotName string | err error | |
| SnapshotDelete | Instance manager proxy | name, snapshotName string | err error | |
| ReplicaAdd | Instance manager proxy | engineName, replicaName, replicaAddress string | err error | Find a healthy RW replica as source replica then rebuild the destination replica. To rebuild a replica, the engine will call rebuilding start and finish APIs for both replicas and launch data transmission |
| ReplicaDelete | Instance manager proxy | engineName, replicaName, replicaAddress string | err error | Remove a replica from the engine |
- Disk:
| API | Caller | Input | Output | Comments |
| --- | --- | --- | --- | --- |
| Create | Instance manager proxy | diskName, diskUUID, diskPath string, blockSize int64 | disk Disk, err error | Use the specified block device as blob store |
| Delete | Instance manager proxy | diskName, diskUUID string | err error | Remove a store from spdk_tgt |
| Get | Instance manager proxy | diskName string | disk Disk, err error | Detect the store status and get the abstracted disk info from spdk_tgt |
## Design
### Implementation Overview
#### [Go SPDK Helper](https://github.com/longhorn/go-spdk-helper):
- The SPDK Target is exposed as a [JSON-RPC service](https://spdk.io/doc/jsonrpc.html).
- Instead of using the existing sample python script [rpc_http_proxy](https://spdk.io/doc/jsonrpc_proxy.html), we will have a helper repo similar to [longhorn/go-iscsi-helper](https://github.com/longhorn/go-iscsi-helper) to talk with spdk_tgt over the Unix domain socket `/var/tmp/spdk.sock`.
- The SPDK target configuration and launching, followed by live upgrade and shutdown if necessary/possible.
- The JSON RPC client that directly talks with spdk_tgt.
- The exposed Golang SPDK component operating APIs. e.g., lvstore, lvol, RAID creation, deletion, and list.
- The NVMe initiator handling Golang APIs (for the engine frontend).
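To make the "JSON RPC client that directly talks with spdk_tgt" concrete, here is a minimal sketch of such a client. It is written in Python purely for illustration (the helper itself is a Go project) and the function names `build_request` and `call` are hypothetical. It assumes spdk_tgt listens on the default socket path mentioned above and replies with a single JSON object per request.

```python
import json
import socket


def build_request(method: str, params=None, request_id: int = 1) -> bytes:
    """Encode one JSON-RPC 2.0 request for spdk_tgt."""
    request = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params:
        request["params"] = params
    return json.dumps(request).encode("utf-8")


def call(method: str, params=None, sock_path: str = "/var/tmp/spdk.sock", timeout: float = 5.0):
    """Send one request over the Unix domain socket and return its result."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(build_request(method, params))
        buffer = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("spdk_tgt closed the connection")
            buffer += chunk
            try:
                response = json.loads(buffer)
            except ValueError:
                continue  # the response is not complete yet, keep reading
            if "error" in response:
                raise RuntimeError(response["error"])
            return response["result"]
```

With a running spdk_tgt, `call("bdev_get_bdevs")` would return the list of block devices. The response carries no length prefix here, so the sketch keeps reading until the buffer parses as JSON.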
#### [SPDK Engine](https://github.com/longhorn/longhorn-spdk-engine):
- Launch a gRPC server as the control plane.
- Have a goroutine that periodically checks and updates engine/replica caches.
- Implement the engine/replica/disk APIs listed above.
- Notify upper layers about the engine/replica update via streaming.
#### Instance Manager:
- Start spdk_tgt on demand.
- Update the proxy service so that it forwards SPDK engine/replica requests to the gRPC service.
### Test Plan
#### Integration tests
1. Starting and stopping related tests: Whether Longhorn can start or stop one engine + multiple replicas correctly.
2. Basic IO tests: Whether data can be read and written correctly, and whether data still exists after restart.
3. Basic snapshot tests: Whether snapshots can be created and stay identical among all replicas, whether a snapshot can be deleted from all replicas, and whether snapshot revert works.
#### Manual tests
1. SPDK volume creation/deletion/attachment/detachment tests.
2. Basic IO tests: Whether data can be read and written correctly when the volume is degraded or healthy, and whether data still exists after restart.
3. Basic offline rebuilding tests.
### Upgrade strategy
This is an experimental engine. We do not need to consider the upgrade or compatibility issues now.

View File

@ -0,0 +1,155 @@
# Disk Anti-Affinity
## Summary
Longhorn supports multiple disks per node, but there is currently no way to ensure that two replicas for the same
volume that schedule to the same node end up on different disks. In fact, the replica scheduler currently doesn't make
any attempt to achieve this goal, even when it is possible to do so.
With the addition of a Disk Anti-Affinity feature, the Longhorn replica scheduler will attempt to schedule two replicas
for the same volume to different disks when possible. Optionally, the scheduler will refuse to schedule a replica to a
disk that has another replica for the same volume.
Although the comparison is not perfect, this enhancement can be thought of as enabling RAID 1 for Longhorn (mirroring
across multiple disks on the same node).
See the [Motivation section](#motivation) for potential benefits.
### Related Issues
- https://github.com/longhorn/longhorn/issues/3823
- https://github.com/longhorn/longhorn/issues/5149
### Existing Related Features
#### Replica Node Level Soft Anti-Affinity
Disabled by default. When disabled, prevents the scheduling of a replica to a node with an existing healthy replica of
the same volume.
Can also be set at the volume level to override the global default.
#### Replica Zone Level Soft Anti-Affinity
Enabled by default. When disabled, prevents the scheduling of a replica to a zone with an existing healthy replica of
the same volume.
Can also be set at the volume level to override the global default.
## Motivation
- Large, multi-node clusters will likely not benefit from this enhancement.
- Single-node clusters and small, multi-node clusters (on which the number of replicas per volume exceeds the number
of available nodes) will experience:
- Increased data durability. If a single disk fails, a healthy replica will still exist on a disk that
has not failed.
- Increased data availability. If a single disk on a node becomes unavailable, but the node itself remains
healthy, at least one replica remains healthy. On a single-node cluster, this can directly prevent a volume crash.
On a small, multi-node cluster, this can prevent a future volume crash due to the loss of a different node.
### Goals
- In all situations, the Longhorn replica scheduler will make a best effort to ensure two replicas for the same volume
do not schedule to the same disk.
- Optionally, the scheduler will refuse to schedule a replica to a disk that has another replica of the same volume.
## Proposal
### User Stories
#### Story 1
My cluster consists of a single node with multiple attached SSDs. When I create any new volume, I want replicas to
distribute across these disks so that I can recover from n - 1 disk failures. If there are not as many available disks
as desired replicas, I want Longhorn to do the best it can.
#### Story 2
My cluster consists of a single node with multiple attached SSDs. When I create any new volume, I want replicas to
distribute across these disks so that I can recover from n - 1 disk failures. If there are not as many available disks
as desired replicas, I want scheduling to fail obviously. It is important that I know my volumes aren't being protected
so I can take action.
#### Story 3
My cluster consists of a single node with multiple attached SSDs. When I create a specific, high-priority volume, I want
replicas to distribute across these disks so that I can recover from n - 1 disk failures. If there are not as many
available disks as desired replicas, I want scheduling to fail obviously. It is important that I know my high-priority
volume isn't being protected so I can take action.
### User Experience In Detail
### API changes
Introduce a new Replica Disk Level Soft Anti-Affinity setting with the following definition. By default, set it to
`true`. While it is generally desirable to schedule replicas to different disks, it would break with existing behavior
to refuse to schedule replicas when different disks are not available.
```golang
SettingDefinitionReplicaDiskSoftAntiAffinity = SettingDefinition{
	DisplayName: "Replica Disk Level Soft Anti-Affinity",
	Description: "Allow scheduling on disks with existing healthy replicas of the same volume",
	Category:    SettingCategoryScheduling,
	Type:        SettingTypeBool,
	Required:    true,
	ReadOnly:    false,
	Default:     "true",
}
```
Introduce a new `spec.replicaDiskSoftAntiAffinity` volume field. By default, set it to `ignored`. Similar to the
existing `spec.replicaSoftAntiAffinity` and `spec.replicaZoneSoftAntiAffinity` fields, override the global setting if
this field is set to `enabled` or `disabled`.
```yaml
replicaDiskSoftAntiAffinity:
  description: Replica disk soft anti affinity of the volume. Set enabled
    to allow replicas to be scheduled in the same disk.
  enum:
  - ignored
  - enabled
  - disabled
  type: string
```
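The override rule described above can be sketched as follows. This is a minimal illustration, not the Longhorn implementation: the type, constant, and function names are hypothetical, and only the three enum values and the "volume field overrides the global setting" behavior come from this proposal.

```go
package main

import "fmt"

// ReplicaDiskSoftAntiAffinity mirrors the three enum values of the proposed volume field.
// The Go names here are hypothetical.
type ReplicaDiskSoftAntiAffinity string

const (
	DiskSoftAntiAffinityIgnored  ReplicaDiskSoftAntiAffinity = "ignored"
	DiskSoftAntiAffinityEnabled  ReplicaDiskSoftAntiAffinity = "enabled"
	DiskSoftAntiAffinityDisabled ReplicaDiskSoftAntiAffinity = "disabled"
)

// effectiveDiskSoftAntiAffinity reports whether replicas of a volume may share a disk.
// The volume field wins when it is "enabled" or "disabled"; otherwise the global
// setting applies.
func effectiveDiskSoftAntiAffinity(globalSetting bool, volumeField ReplicaDiskSoftAntiAffinity) bool {
	switch volumeField {
	case DiskSoftAntiAffinityEnabled:
		return true
	case DiskSoftAntiAffinityDisabled:
		return false
	default: // "ignored" or unset: fall back to the global setting
		return globalSetting
	}
}

func main() {
	fmt.Println(effectiveDiskSoftAntiAffinity(true, DiskSoftAntiAffinityIgnored))  // true
	fmt.Println(effectiveDiskSoftAntiAffinity(true, DiskSoftAntiAffinityDisabled)) // false
	fmt.Println(effectiveDiskSoftAntiAffinity(false, DiskSoftAntiAffinityEnabled)) // true
}
```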
## Design
### Implementation Overview
The current replica scheduler does the following:
1. Determines which nodes a replica can be scheduled to based on node condition and the `ReplicaSoftAntiAffinity` and
`ReplicaZoneSoftAntiAffinity` settings.
1. Creates a list of all schedulable disks on these nodes.
1. Chooses the disk with the most available space for scheduling.
Add a step so that the replica scheduler:
1. Determines which nodes a replica can be scheduled to based on node condition and the `ReplicaSoftAntiAffinity` and
`ReplicaZoneSoftAntiAffinity` settings.
1. Creates a list of all schedulable disks on these nodes.
1. Filters the list to include only disks with the least number of existing matching replicas and optionally only disks
with no existing matching replicas.
1. Chooses the disk from the filtered list with the most available space for scheduling.
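The added filtering step can be sketched roughly as below. The `Disk` type and `chooseDisk` function are hypothetical simplifications for illustration; the real scheduler works on Longhorn node and disk objects and performs additional checks not modeled here.

```go
package main

import "fmt"

// Disk is a hypothetical, simplified view of a schedulable disk.
type Disk struct {
	UUID             string
	AvailableBytes   int64
	MatchingReplicas int // existing replicas of the volume being scheduled
}

// chooseDisk keeps only the disks with the fewest matching replicas. When
// softAntiAffinity is false, disks that already hold a matching replica are
// rejected outright. From the remaining disks it picks the one with the most
// available space. The boolean result is false when no disk qualifies.
func chooseDisk(disks []Disk, softAntiAffinity bool) (Disk, bool) {
	if len(disks) == 0 {
		return Disk{}, false
	}
	minReplicas := disks[0].MatchingReplicas
	for _, d := range disks[1:] {
		if d.MatchingReplicas < minReplicas {
			minReplicas = d.MatchingReplicas
		}
	}
	if !softAntiAffinity && minReplicas > 0 {
		return Disk{}, false // every disk already has a replica of this volume
	}
	var best Disk
	found := false
	for _, d := range disks {
		if d.MatchingReplicas != minReplicas {
			continue
		}
		if !found || d.AvailableBytes > best.AvailableBytes {
			best, found = d, true
		}
	}
	return best, found
}

func main() {
	disks := []Disk{
		{UUID: "disk-a", AvailableBytes: 500, MatchingReplicas: 1},
		{UUID: "disk-b", AvailableBytes: 100, MatchingReplicas: 0},
		{UUID: "disk-c", AvailableBytes: 200, MatchingReplicas: 0},
	}
	if d, ok := chooseDisk(disks, true); ok {
		fmt.Println("scheduled to", d.UUID)
	}
}
```

Note that `disk-a` has the most free space, so the old behavior would have chosen it even though it already holds a replica of the volume.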
### Test plan
Minimally implement two new test cases:
1. In a cluster that includes nodes with multiple available disks, create a volume with
`spec.replicaSoftAntiAffinity = enabled`, `spec.replicaDiskSoftAntiAffinity = enabled`, and `numberOfReplicas` equal to the
total number of disks in the cluster. Confirm that each replica schedules to a different disk. It may be necessary
to tweak additional factors. For example, ensure that one disk has enough free space that the old scheduling
behavior would assign two replicas to it instead of distributing the replicas evenly among the disks.
1. In a cluster that includes nodes with multiple available disks, create a volume with
`spec.replicaSoftAntiAffinity = enabled`, `spec.replicaDiskSoftAntiAffinity = disabled`, and `numberOfReplicas` equal to
one more than the total number of disks in the cluster. Confirm that a replica fails to schedule. Previously,
multiple replicas would have scheduled to the same disk and no error would have occurred.
### Upgrade strategy
The Replica Disk Level Soft Anti-Affinity setting defaults to `true` to maintain backwards compatibility. If it is set
to `false`, new replicas that require scheduling will follow the new behavior.
The `spec.replicaDiskSoftAntiAffinity` volume field defaults to `ignored` to maintain backwards compatibility. If it is
set to `enabled` on a volume, new replicas for that volume that require scheduling will follow the new behavior.

View File

@ -0,0 +1,267 @@
# BackingImage Backup Support
## Summary
This feature enables Longhorn to back up a BackingImage to the backup store and restore it.
### Related Issues
- [FEATURE] Restore BackingImage for BackupVolume in a new cluster [#4165](https://github.com/longhorn/longhorn/issues/4165)
## Motivation
### Goals
- When a Volume with a BackingImage is backed up, the BackingImage will also be backed up.
- Users can manually back up the BackingImage.
- When restoring a Volume with a BackingImage, the BackingImage will also be restored.
- Users can manually restore the BackingImage.
- All BackingImages are backed up in blocks.
- If a block contains the same data as one already in the backup store, BackingImages will reuse that block instead of uploading another identical one.
## Proposal
### User Stories
With this feature, users no longer need to manually handle BackingImages across clusters when backing up and restoring Volumes with BackingImages.
### User Experience In Detail
Before this feature:
The BackingImage is not backed up automatically when backing up a Volume with the BackingImage, so the user needs to prepare the BackingImage again in another cluster before restoring the Volume.
After this feature:
A BackingImage will be backed up automatically when a Volume with the BackingImage is being backed up. Users can also manually back up a BackingImage independently.
Then, when the Volume with the BackingImage is being restored from the backup store, Longhorn will automatically restore the BackingImage at the same time. Users can also manually restore the BackingImage independently.
This improves the user experience and reduces operational overhead.
## Design
### Implementation Overview
#### Backup BackingImage - BackupStore
- Backing up a `BackingImage` differs from backing up a `Volume`, which consists of a series of `Snapshots`. A `BackingImage` already contains all the blocks that need to be backed up, so there is no need to compute a delta between two `BackingImages` the way we do for `Snapshots`, where the delta may be spread across the `Snapshots` between the current `Snapshot` and the last backed-up `Snapshot`.
- All `BackingImages` share the same block pool in the backup store, so blocks can be reused to speed up backups and save space. This can happen when a user creates a v1 `BackingImage`, uses the image to add more data, and then exports a v2 `BackingImage`.
- For restoration, we still do a full restore onto one of the ready disks.
- Unlike a `Volume` backup, a `BackingImage` has no size constraint. It can be smaller than 2MB or not a multiple of 2MB, so the last block might not be 2MB.
- When backing up a `BackingImage`:
1. `preload()`: preload the `BackingImage` to find all the sectors that contain data.
2. `createBackupBackingMapping()`: get all the blocks that need to be backed up.
- Block: offset + size (2MB for each block; the last block might be less than 2MB)
3. `backupMappings()`: write the block to the backup store
- if the block is already in the backup store, skip it.
4. `saveBackupBacking()`: save the metadata of the `BackupBackingImage` including the block mapping to the backup store. Mapping needs to include block size.
- When restoring `BackingImage`
- `loadBackupBacking()`: load the metadata of the `BackupBackingImage` from the backup store
- `populateBlocksForFullRestore() + restoreBlocks()`: based on the mapping, write the block data to the correct offset.
- Blocks are backed up asynchronously to increase the backup speed.
- For qcow2 `BackingImage`, the format is not the same as raw file, we can't detect the hole and the data sector. So we back up all the blocks.
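The block mapping and block reuse described above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the backup store is modeled as a map keyed by checksum, SHA-256 stands in for whatever checksum the backup store actually uses, and `BlockMapping` and `backupBlocks` are hypothetical names. Hole detection from `preload()` is not modeled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const blockSize = 2 * 1024 * 1024 // 2MB per block, as described above

// BlockMapping is a hypothetical record of one backed-up block.
type BlockMapping struct {
	Offset   int64
	Size     int64 // may be smaller than blockSize for the last block
	Checksum string
}

// backupBlocks splits data into 2MB blocks, uploads only the blocks whose
// checksum is not yet in the store, and returns the offset/size/checksum
// mapping together with the number of blocks actually uploaded.
func backupBlocks(data []byte, store map[string][]byte) ([]BlockMapping, int) {
	var mappings []BlockMapping
	uploaded := 0
	for offset := 0; offset < len(data); offset += blockSize {
		end := offset + blockSize
		if end > len(data) {
			end = len(data)
		}
		block := data[offset:end]
		sum := sha256.Sum256(block)
		key := hex.EncodeToString(sum[:])
		if _, exists := store[key]; !exists {
			store[key] = append([]byte(nil), block...) // copy into the store
			uploaded++
		}
		mappings = append(mappings, BlockMapping{
			Offset:   int64(offset),
			Size:     int64(len(block)),
			Checksum: key,
		})
	}
	return mappings, uploaded
}

func main() {
	store := map[string][]byte{}

	// v1: a 5MB image with three distinct blocks (2MB, 2MB, 1MB).
	v1 := make([]byte, 5*1024*1024)
	v1[0], v1[blockSize], v1[2*blockSize] = 1, 2, 3
	m1, up1 := backupBlocks(v1, store)
	fmt.Println(len(m1), "blocks,", up1, "uploaded, last block size", m1[2].Size)

	// v2: same image with only the second block changed.
	v2 := append([]byte(nil), v1...)
	v2[blockSize] = 9
	_, up2 := backupBlocks(v2, store)
	fmt.Println(up2, "uploaded for v2")
}
```

In this sketch, backing up v2 uploads only the one changed block; the other two are found in the shared pool and skipped.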
#### Backup BackingImage - Controller
1. Add a new CRD `backupbackingimage.longhorn.io`
```go
type BackupBackingImageSpec struct {
	Labels           map[string]string `json:"labels"`
	BackingImageName string            `json:"backingImageName"`
	SyncRequestedAt  metav1.Time       `json:"syncRequestedAt"`
}

type BackupBackingImageStatus struct {
	OwnerID           string                  `json:"ownerID"`
	Checksum          string                  `json:"checksum"`
	URL               string                  `json:"url"`
	Size              string                  `json:"size"`
	Labels            map[string]string       `json:"labels"`
	State             BackupBackingImageState `json:"state"`
	Progress          int                     `json:"progress"`
	Error             string                  `json:"error,omitempty"`
	Messages          map[string]string       `json:"messages"`
	ManagerAddress    string                  `json:"managerAddress"`
	BackupCreatedAt   string                  `json:"backupCreatedAt"`
	LastSyncedAt      metav1.Time             `json:"lastSyncedAt"`
	CompressionMethod BackupCompressionMethod `json:"compressionMethod"`
}
```
```go
type BackupBackingImageState string

const (
	BackupBackingImageStateNew        = BackupBackingImageState("")
	BackupBackingImageStatePending    = BackupBackingImageState("Pending")
	BackupBackingImageStateInProgress = BackupBackingImageState("InProgress")
	BackupBackingImageStateCompleted  = BackupBackingImageState("Completed")
	BackupBackingImageStateError      = BackupBackingImageState("Error")
	BackupBackingImageStateUnknown    = BackupBackingImageState("Unknown")
)
```
- Field `Status.ManagerAddress` indicates the address of the backing-image-manager running the BackingImage backup.
- Field `Status.Checksum` records the checksum of the BackingImage. Users may create a new BackingImage with the same name but different content after deleting an old one, or there may be another BackingImage with the same name in another cluster. To avoid conflicts, we use the checksum to check whether they are the same.
- If the cluster already has a `BackingImage` with the same name as one in the backup store, we still create the `BackupBackingImage` CR. Users can use the checksum to check whether they are the same. Therefore, we don't use the `UUID` across clusters, since a user might already have prepared a BackingImage with the same name and content in another cluster.
2. Add a new controller `BackupBackingImageController`.
- Workflow
- Check and update the ownership.
- Do cleanup if the deletion timestamp is set.
- Cleanup the backup `BackingImage` on backup store
- Stop the monitoring
- If `Status.LastSyncedAt.IsZero() && Spec.BackingImageName != ""`, **the CR was created by the User/API layer** and we need to do the backup
- Start the monitor
- Pick one `BackingImageManager`
- Request `BackingImageManager` to backup the `BackingImage` by calling `CreateBackup()` grpc
- Else it means the `BackupBackingImage` CR is created by `BackupTargetController` and the backup `BackingImage` already exists in the remote backup target before the CR creation.
- Use `backupTargetClient` to get the info of the backup `BackingImage`
- Sync the status
3. In `BackingImageManager - manager(backing_image.go)`
- Implement `CreateBackup()` grpc
- Backup `BackingImage` to backup store in blocks
4. In controller `BackupTargetController`
- Workflow
- Implement `syncBackupBackingImage()` function
- Create the `BackupBackingImage` CRs whose names are in the backup store but not in the cluster
- Delete the `BackupBackingImage` CRs whose names are in the cluster but not in the backup store
- Request `BackupBackingImageController` to reconcile those `BackupBackingImage` CRs
5. Add a backup API for `BackingImage`
- Add new action `backup` to `BackingImage` (`"/v1/backingimages/{name}"`)
- create `BackupBackingImage` CR to init the backup process
- if `BackupBackingImage` already exists, it means there is already a `BackupBackingImage` in backup store, user can check the checksum to verify if they are the same.
- API Watch: establish a streaming connection to report BackupBackingImage info.
6. Trigger
- Back up through `BackingImage` operation manually
- Back up `BackingImage` when the user backs up the volume
- in `SnapshotBackup()` API
- we get the `BackingImage` of the `Volume`
- back up `BackingImage` if the `BackupBackingImage` does not exist
#### Restoring BackingImage - Controller
2. Add new data source type `restore` for `BackingImageDataSource`
```go
type BackingImageDataSourceType string

const (
	BackingImageDataSourceTypeDownload         = BackingImageDataSourceType("download")
	BackingImageDataSourceTypeUpload           = BackingImageDataSourceType("upload")
	BackingImageDataSourceTypeExportFromVolume = BackingImageDataSourceType("export-from-volume")
	BackingImageDataSourceTypeRestore          = BackingImageDataSourceType("restore")

	DataSourceTypeRestoreParameterBackupURL = "backup-url"
)

// BackingImageDataSourceSpec defines the desired state of the Longhorn backing image data source
type BackingImageDataSourceSpec struct {
	NodeID          string                     `json:"nodeID"`
	UUID            string                     `json:"uuid"`
	DiskUUID        string                     `json:"diskUUID"`
	DiskPath        string                     `json:"diskPath"`
	Checksum        string                     `json:"checksum"`
	SourceType      BackingImageDataSourceType `json:"sourceType"`
	Parameters      map[string]string          `json:"parameters"`
	FileTransferred bool                       `json:"fileTransferred"`
}
```
3. Create BackingImage APIs
- No need to change
- Create the BackingImage CR with `type=restore` and `backup-url=${URL}`
- If the BackingImage already exists in the cluster, the user can use the checksum to verify whether they are the same.
4. In `BackingImageController`
- No need to change, it will create the `BackingImageDataSource` CR
5. In `BackingImageDataSourceController`
- No need to change, it will create the `BackingImageDataSourcePod` to do the restore.
6. In `BackingImageManager - data_source`
- When initializing the service, if the type is `restore`, restore from `backup-url` by sending a request to the sync service in the same pod.
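```go
requestURL := fmt.Sprintf("http://%s/v1/files", client.Remote)
req, err := http.NewRequest("POST", requestURL, nil)
if err != nil {
	return err
}
// Pass the restore parameters to the sync service as query parameters.
q := req.URL.Query()
q.Add("action", "restoreFromBackupURL")
q.Add("url", backupURL)
q.Add("file-path", filePath)
q.Add("uuid", uuid)
q.Add("disk-uuid", diskUUID)
q.Add("expected-checksum", expectedChecksum)
// Write the encoded parameters back onto the request URL.
req.URL.RawQuery = q.Encode()
```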
- In `sync/service` implement `restoreFromBackupURL()` to restore the `BackingImage` from backup store to the local disk.
7. In `BackingImageDataSourceController`
- No need to change, it will take over control when `BackingImageDataSource` status is `ReadyForTransfer`.
- If restoring the `BackingImage` fails, the `BackingImage` status becomes failed, the `BackingImageDataSourcePod` is cleaned up, and the restore is retried with a backoff limit, as with `type=download`. The process is the same as for the other `BackingImage` creation paths.
8. Trigger
- Restore through `BackingImage` operation manually
- Restore when the user restores the `Volume` with `BackingImage`
- Restoring a Volume is actually requesting `Create` a Volume with `fromBackup` in the spec
- In `Create()` API we check if the `Volume` has `fromBackup` parameters and has `BackingImage`
- Check if `BackingImage` exists
- Check and restore `BackupBackingImage` if `BackingImage` does not exist
- Restore `BackupBackingImage` by creating `BackingImage` with type `restore` and `backupURL`
- Then create the `Volume` CR so the admission webhook won't fail because of a missing `BackingImage` ([ref](https://github.com/longhorn/longhorn-manager/blob/master/webhook/resources/volume/validator.go#L86))
- Restore when the user creates a `Volume` through `CSI`
- In `CreateVolume()` we check if the `Volume` has `fromBackup` parameters and has `BackingImage`
- In `checkAndPrepareBackingImage()`, we restore `BackupBackingImage` by creating `BackingImage` with type `restore` and `backupURL`
#### API and UI changes In Summary
1. `longhorn-ui`:
- Add a new page of `BackupBackingImage` like `Backup`
- The columns on `BackupBackingImage` list page should be: `Name`, `Size`, `State`, `Created At`, `Operation`.
- `Name` can be clicked and will show `Checksum` of the `BackupBackingImage`
- `State`: `BackupBackingImageState` of the `BackupBackingImage` CR
- `Operation` includes
- `restore`
- `delete`
- Add a new operation `backup` for every `BackingImage` in the `BackingImage` page
2. `API`:
- Add new action `backup` to `BackingImage` (`"/v1/backingimages/{name}"`)
- create `BackupBackingImage` CR to init the backup process
- `BackupBackingImage`
- `GET "/v1/backupbackingimages"`: get all `BackupBackingImage`
- API Watch: establish a streaming connection to report `BackupBackingImage` info change.
### Test plan
Integration tests
1. `BackupBackingImage` Basic Operation
- Setup
- Create a `BackingImage`
- Setup the backup target
- Back up `BackingImage`
- `BackupBackingImage` CR should be complete
- Delete the `BackingImage` in the cluster
- Restore the `BackupBackingImage`
- Checksum should be the same
2. Back up `BackingImage` when backing up and restoring Volume
- Setup
- Create a `BackingImage`
- Setup the backup target
- Create a Volume with the `BackingImage`
- Backup the `Volume`
- `BackupBackingImage` CR should be created and complete
- Delete the `BackingImage`
- Restore the Volume with same `BackingImage`
- `BackingImage` should be restored and the `Volume` should also be restored successfully
- `Volume` checksum is the same
Manual tests
1. `BackupBackingImage` reuse blocks
- Setup
- Create a `BackingImage` A
- Setup the backup target
- Create a `Volume` with `BackingImage` A, write some data and export to another `BackingImage` B
- Back up `BackingImage` A
- Back up `BackingImage` B
- Check it reuses the blocks when backing up `BackingImage` B (by trace log)

View File

@ -0,0 +1,73 @@
# Engine Upgrade Enforcement
## Summary
The current Longhorn upgrade process lacks enforcement of the engine version, potentially leading to compatibility issues. To address this concern, we propose the implementation of an Engine Upgrade Enforcement feature.
### Related Issues
https://github.com/longhorn/longhorn/issues/5842
## Motivation
Longhorn needs to be able to safely deprecate and remove certain fields, such as [[TASK] Remove deprecated instances field and instance type from instance manager CR #5844](https://github.com/longhorn/longhorn/issues/5844)
### Goals
The primary goal of this proposal is to enhance Longhorn's upgrade mechanism by introducing logic that prevents upgrading to a new Longhorn version while incompatible engine images are in use.
### Non-goals [optional]
`None`
## Proposal
This proposal focuses on preventing users from upgrading to unsupported or incompatible engine versions. This enhancement will build upon the existing pre-upgrade checks to include validation of engine version compatibility.
### User Stories
#### Story 1: Preventing Incompatible Upgrades
Previously, users had the freedom to continue using an older engine version after a Longhorn upgrade. With the proposed enhancement, the Longhorn upgrade process will be blocked if it includes an incompatible engine version. This requires users to manually upgrade the engine to a compatible version before proceeding with the Longhorn upgrade.
### User Experience In Detail
Users perform the upgrade as usual. Longhorn will examine the compatibility of the current engine version. If the current engine version is incompatible with the target engine version for the upgrade, Longhorn will halt the upgrade process and prompt the user to address the engine version mismatch before proceeding.
### API changes
`None`
## Design
### Implementation Overview
The implementation approach for this feature will be similar to the [Upgrade Path Enforcement feature](https://github.com/longhorn/longhorn/blob/master/enhancements/20230315-upgrade-path-enforcement.md).
Key implementation steps include:
1. Enhance the function [CheckUpgradePathSupported(...)](https://github.com/longhorn/longhorn-manager/blob/v1.5.1/upgrade/util/util.go#L168) to include the new checks.
```go
func CheckUpgradePathSupported(namespace string, lhClient lhclientset.Interface) error {
	if err := CheckLHUpgradePathSupported(namespace, lhClient); err != nil {
		return err
	}
	return CheckEngineUpgradePathSupported(namespace, lhClient, emeta.GetVersion())
}
```
1. Retrieve the current engine images being used and record the versions.
1. Prevent upgrades if the target engine version is detected to be a downgrade.
1. Prevent upgrades if the engine image version is lower than [the minimum required version for the new engine image controller API](https://github.com/longhorn/longhorn-engine/blob/v1.5.1/pkg/meta/version.go#L10).
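The two engine checks listed above can be sketched as follows. This is an illustration under stated assumptions, not the logic of `CheckEngineUpgradePathSupported`: the `EngineImage` type, the helper names, and the representation of the minimum requirement as an integer controller API version are all hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// EngineImage is a hypothetical, simplified view of an engine image in use.
type EngineImage struct {
	Version              string // e.g. "v1.5.1"
	ControllerAPIVersion int
}

// parseVersion parses "vMAJOR.MINOR.PATCH" into three integers. Suffixes such
// as "-rc1" or "+build" are dropped.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	s := strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a is lower than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// checkEngineUpgrade rejects the upgrade if the target engine version is lower
// than any engine version in use (a downgrade), or if any engine in use has a
// controller API version below the minimum the new version supports.
func checkEngineUpgrade(inUse []EngineImage, targetVersion string, minAPIVersion int) error {
	target, err := parseVersion(targetVersion)
	if err != nil {
		return err
	}
	for _, img := range inUse {
		current, err := parseVersion(img.Version)
		if err != nil {
			return err
		}
		if less(target, current) {
			return fmt.Errorf("engine downgrade from %s to %s is not supported", img.Version, targetVersion)
		}
		if img.ControllerAPIVersion < minAPIVersion {
			return fmt.Errorf("engine %s has controller API version %d, below the required minimum %d",
				img.Version, img.ControllerAPIVersion, minAPIVersion)
		}
	}
	return nil
}

func main() {
	inUse := []EngineImage{{Version: "v1.5.1", ControllerAPIVersion: 4}}
	fmt.Println(checkEngineUpgrade(inUse, "v1.6.0", 3)) // allowed: <nil>
	fmt.Println(checkEngineUpgrade(inUse, "v1.4.2", 3)) // blocked: downgrade
}
```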
### Test plan
- Create unit tests for the new logic.
- Run manual test to verify the handling of incompatible engine image versions (e.g., Longhorn v1.4.x -> v1.5.x -> v1.6.x.)
### Upgrade strategy
`None`
## Note [optional]
`None`
