From 52fe75f693dbeeb38cfe5fc133979d41658c2a3e Mon Sep 17 00:00:00 2001 From: Chin-Ya Huang Date: Fri, 25 Jun 2021 20:20:26 +0800 Subject: [PATCH] Add LEP - label-driven recurring job Longhorn issue 467 Signed-off-by: Chin-Ya Huang --- .../20210624-label-driven-recurring-job.md | 1043 +++++++++++++++++ 1 file changed, 1043 insertions(+) create mode 100644 enhancements/20210624-label-driven-recurring-job.md diff --git a/enhancements/20210624-label-driven-recurring-job.md b/enhancements/20210624-label-driven-recurring-job.md new file mode 100644 index 0000000..6121b4e --- /dev/null +++ b/enhancements/20210624-label-driven-recurring-job.md @@ -0,0 +1,1043 @@ +# Label-driven Recurring Job + +## Summary + +Replace volume spec `recurringJobs` with the label-driven model. Abstract volume recurring jobs to a new CRD named "RecurringJob". The names or groups of recurring jobs can be referenced in volume labels. + +Users can set a recurring job to the `Default` group and Longhorn will automatically apply when the volume has no job labels. + +Only one cron job will be created per recurring job. Users can also update the recurring job, and Longhorn reflects the changes to the associated cron job. + +When instruct Longhorn to delete a recurring job, this will also remove the associated cron job. + +StorageClass now should use `recurringJobSelector` instead to refer to the recurring job names. + +During the version upgrade, existing volume spec `recurringJobs` and storageClass `recurringJobs` will automatically translate to volume labels, and recurring job CRs will get created. + +### Related Issues + +https://github.com/longhorn/longhorn/issues/467 + +## Motivation + +### Goals + +Phase 1: +- Each recurring job can be in single or multiple groups. +- Each group can have single or multiple recurring jobs. +- The jobs and groups can be reference with the volume label. +- The recurring job in `default` group will automatically apply to a volume that has no job labels. +- Can create multiple recurring jobs with UI or YAML. +- The recurring job should include settings; `name`, `groups`, `task`, `cron`, `retain`, `concurrency`, `labels`. + +Phase2: +The StorageClass and upgrade migration are still dependent on volume spec, thus complete removal of volume spec should be done in phase 2. + +### Non-goals [optional] + +1. Does the snapshot/backup operation one by one. The operation order can be defined as sequential, consistent (for volume group snapshot), or throttled (with a concurrent number as a parameter) in the future. + +> https://github.com/longhorn/longhorn/pull/2737#issuecomment-887985811 + +## Proposal + +#### Story 1 - set recurring jobs and groups by volume labels + +As a Longhorn user / System administrator. + +I want to directly update recurring jobs referenced in multiple volumes. + +So I do not need to update each volume with cron job definition. + +#### Story 2 - automatically applies for the default recurring jobs. + +As a Longhorn user / System administrator. + +I want the ability to set one or multiple `backup` and `snapshot` recurring jobs as default. All volumes without any recurring job label should automatically apply with the default recurring jobs. + +So I can be assured that all volumes without any recurring job label will automatically apply with default. + +#### Story 3 - automatically upgrade migration + +As a Longhorn user / System administrator + +I want Longhorn to automatically convert existing volume spec `recurringJobs` to volume labels, and create associate recurring job CRs. + +So I don't have to manually create recurring job CRs and patch the labels. + +### User Experience In Detail + +#### Story 1 - set recurring job in volume + +1. Create a recurring job on UI `Recurring Job` page or via `kubectl`. +2. In UI, Navigate to Volume, `Recurring Jobs Schedule`. +3. User can choose from `Job` or job `Group` from the tab. + - On `Job` tab, + 1. User sees existing recurring jobs that volume had labeled. + 1. User able to select `Backup` or `Snapshot` for the `Type` from the drop-down list. + 2. User able to edit the `Schedule`, `Retain`, `Concurrency` and `Labels`. + - On the job `Group` tab. + 1. User sees all existing recurring job groups from the `Name` drop-down list. + 2. User selects the job from the drop-down list. + 3. User sees all recurring `jobs` under the `group`. +4. Click `Save` updates to the volume label. +5. Update the recurring job CRs also reflect on the cron job and UI `Recurring Jobs Schedule`. + +**Before enhancement** + +Recurring jobs can only be added and updated per volume spec. + +Create cron job definitions for each volume causing duplicated setup effort. + +The recurring job can only be updated per volume. + +**After enhancement** + +Recurring jobs can be added and update as the volume label. + +Can select a recurring job from the UI drop-down menu and will automatically show the information from the recurring job CRs. + +Update the recurring job definition will automatically apply to all volumes with the job label. + +#### Story 2 - automatically apply to default recurring jobs + +1. Add `default` to one or multiple recurring jobs `Groups` in UI or `kubectl`. +2. Longhorn automatically applies the `default` group recurring jobs to all volumes without job labels. + +**Before enhancement** + +Default recurring jobs are set via StorageClass at PVC creation only. No default recurring job can be set up for UI-created volumes. + +Updating StorageClass does not reflect on the existing volumes. + +**After enhancement** + +Have the option to set default recurring jobs via `StorageClass` or `RecurringJob`. + +Longhorn recurring job controller automatically applies default recurring jobs to all volumes without the job labels. + +Longhorn adds the default recurring jobs when all job labels are removed from the volume. + +When the `RecurringJobSelector` is set in the `StorageClass`, it will be used as default instead. + +#### Story 3 - automatically upgrade migration + +1. Perform upgrade. +2. StorageClass `recurringJobs` will get convert to `recurringJobSelector`. +3. Recurring job CRs will get created from `recurringJobs`. +4. Volume will be labeled with `recurring-job.longhorn.io/---: enabled` from volume spec `recurringJobs`. +5. Recurring job CRs will get created from volume spec `recurringJobs`. When the config is identical among multiple volumes, only one will get created and volumes will share this recurring job CR. +6. Volume spec `recurringJobs` will get removed. + +### API Changes + +- Add new HTTP endpoints: + - GET `/v1/recurringjobs` to list of recurring jobs. + - GET `/v1/recurringjobs/{name}` to get specific recurring job. + - DELETE `/v1/recurringjobs/{name}` to delete specific recurring job. + - POST `/v1/recurringjobs` to create a recurring job. + - PUT `/v1/recurringjobs/{name}` to update specific recurring job. + - `/v1/ws/recurringjobs` and `/v1/ws/{period}/recurringjobs` for websocket stream. +- Add new RESTful APIs for the new `RecurringJob` CRD: + - `Create` + - `Update` + - `List` + - `Get` + - `Delete` +- Add new APIs for users to update recurring jobs for individual volume. The : + - `/v1/volumes/?action=recurringJobAdd`, expect request's body in form {name:, isGroup:}. + - `/v1/volumes/?action=recurringJobList`. + - `/v1/volumes/?action=recurringJobDelete`, expect request's body in form {name:, isGroup:}. + +## Design + +### Implementation Overview + +#### Add Recurring Job CRD. + - Update the ClusterRole to include `recurringjob`. + - Printer column should include `Name`, `Groups`, `Task`, `Cron`, `Retain`, `Concurrency`, `Age`, `Labels`. + ``` + NAME GROUPS TASK CRON RETAIN CONCURRENCY AGE LABELS + snapshot1 ["default","group1"] snapshot * * * * * 1 2 14m {"label/1":"a","label/2":"b"} + + ``` + - The `Name`: String used to reference the recurring job in volume with the label `recurring-job.longhorn.io/: enabled`. + - The `Groups`: Array of strings that set groupings to the recurring job. This is used to reference the recurring job group in volume with the label `recurring-job-group.longhorn.io/: enabled`. When including `default`, the recurring job will be added to the volume label if no other job exists in the volume label. + - The `Task`: String of either one of `backup` or `snapshot`. + Also, add validation in the CRD YAML with pattern regex match. + - The `Cron`: String in cron expression represents recurring job scheduling. + - The `Retain`: Integer of the number of snapshots/backups to keep for the volume. + - The `Concurrency`: Integer of the concurrent job to run by each cron job. + - The `Age`: Date of the CR creation timestamp. + - The `Labels`: Dictionary of the labels. + +#### Add Command `recurring-job` To `longhorn-manager` Binary + + 1. Add new command `recurring-job --longhorn-manager ` and remove old command `snapshot`. + > Get the `recurringJob.Spec` on execution using Kubernetes API. + 2. Get volumes by label selector `recurring-job.longhorn.io/: enabled` to filter out volumes. + 3. Get volumes by label selector `recurring-job-group.longhorn.io/: enabled` to filter out volumes if the job is set with a group. + 4. Filter and create a list of the volumes in the state `attached` or setting `allow-recurring-job-while-volume-detached`. + 5. Use the concurrent number parameter to throttle goroutine with channel. Each goroutine creates `NewJob()` and `job.run()` for the volumes. + The job `snapshotName` format will be `-c-`. + +#### Changes In The Volume Controller + + 1. The `updateRecurringJobs` method is responsible to add the default label if not other labels exist. + + > Since the storage class and upgrade migration contains recurringJobs spec. So we will keep the `VolumeSpec.RecurringJobs` in code to create the recurring jobs for volumes from the `storageClass`. + + > In case names are duplicated between different `storageClasses`, only one recurring job CR will be created. + +#### Changes In The VolumeManager `CreateVolume` + + - Add new method input `recurringJobSelector`: + 1. Convert `Volume.Spec.RecurringJobs` to `recurringJobSelector`. + 2. Add recurring job label if `recurringJobSelector` method input is not empty. + +#### Changes In The Datastore + + - For `CreateVolume` and `UpdateVolume` add a function similar to `fixupMetadata` that handles recurring jobs: + 1. Add recurring job labels if `Volume.Spec.RecurringJobs` is not empty. Then unset `Volume.Spec.RecurringJobs`. + 2. Label with `default` job-group if no other recurring job label exists. + +#### Introduce `recurringJobSelector` As Part Of StorageClass Parameters. + - The CSI controller can use `recurringJobSelector` for volume creation. + +#### Changes In CSI Controller Server + + 1. Put `recurringJobSelector` to `vol.RecurringJobSelector` at HTTP API layer to use for adding volume recurring job label in `VolumeManager.CreateVolume`. The `CreateVolume` method will have a new input `recurringJobSelector`. + 2. Get `recurringJobs` from parameters, validate and create recurring job CRs via API if not already exist. + +#### Add Recurring Job Controller + + - The code structure will be the same as other controllers. + - Add the finalizer to the recurring job CRs if not exist. + - The controller will be informed by `recurringJobInformer` and `enqueueRecurringJob`. + - Create and update CronJob per recurring job. + 1. Generate a new cron job object. + - Include labels `recurring-job.longhorn.io`. + ``` + recurring-job.longhorn.io: + ``` + - Compose command, + ``` + longhorn-manager -d\ + recurring-job \ + --manager-url + ``` + 2. Create new cron job with annotation `last-applied-cronjob-spec` or update cron job if the new cron job spec is different from the `last-applied-cronjob-spec`. + - Use defer to clean up CronJob. + 1. When a recurring job gets deleted. + 1. Delete the cron job with selected labels: `recurring-job.longhorn.io/`. + 2. Remove the finalizer. + + +#### UI + +##### Add New Page `Recurring Job` In UI + +A new page for `Recurring Job` to create/update/delete recurring jobs. + +``` +Recurring Job [Custom Column] +==================================================================================================================== +[Create] [Delete] [Search Box v ][__________][Go] + | Name + | Group + | Type + | Schedule + | Labels + | Retain + | Concurrency +=================================================================================================================== +[] | Name | Group | Type | Schedule | Labels | Retain | Concurrency | Operation | +---+-------+--------+--------+-----------------+--------------+--------+-------------+-------------+--------------| +[] | dummy | aa, bb | backup | 00:00 every day | k1:v1, k2:v2 | 20 | 10 | [Icon] v | + | Update + | Delete +=================================================================================================================== + [<] [1] [>] +``` + +**Scenario: Add Recurring Job** + +*Given* user sees `Create` on top left of the page. + +*When* user click `Create`. +*Then* user sees a pop-up form. + +``` +* Name +[ ] + +Groups + + +* Task +[Backup] + +* Schedule +[00:00 every day] + +* Retain +[20] + +* Concurrency +[10] + +* Labels + +``` + +- Field with `*` is mendatory +- User can click on `+` next to `Group` to add more groups. +- User can click on the `Schedule` field and a window will pop-up for `Cron` and `Generate Cron`. +- `Retain` cannot be `0`. +- `Concurrency` cannot be `0`. +- User can click on `+` next to `Labels` to add more labels. + +*When* user click `OK`. +*Then* frontend **POST** `/v1/recurringjobs` to create a recurring job. +``` +❯ curl -X POST -H "Content-Type: application/json" \ +-d '{"name": "sample", "groups": ["group-1", "group-2"], "task": "snapshot", "cron": "* * * * *", "retain": 2, "concurrency": 1, "labels": {"label/1": "a"}}' \ +http://54.251.150.85:30944/v1/recurringjobs | jq +{ + "actions": {}, + "concurrency": 1, + "cron": "* * * * *", + "groups": [ + "group-1", + "group-2" + ], + "id": "sample", + "labels": { + "label/1": "a" + }, + "links": { + "self": "http://54.251.150.85:30944/v1/recurringjobs/sample" + }, + "name": "sample", + "retain": 2, + "task": "snapshot", + "type": "recurringJob" +} +``` + +**Scenario: Update Recurring Job** + +*Given* an `Operation` drop-down list next to the recurring job. + +*When* user click `Edit`. +*Then* user sees a pop-up form. +``` +Name +[sample] + +Groups +[group-1] +[group-2] + +Task +[Backup] + +Schedule +[00:00 every day] + +Retain +[20] + +Concurrency +[10] + +Labels +[labels/1]: [a] +[labels/2]: [b] +``` +- `Name` field should be immutable. +- `Task` field should be imuutable. + +*And* user edit the fields in the form. + +*When* user click `Save`. +*Then* frontend **PUT** `/v1/recurringjobs/{name}` to update specific recurring job. +``` +❯ curl -X PUT -H "Content-Type: application/json" \ +-d '{"name": "sample", "groups": ["group-1", "group-2"], "task": "snapshot", "cron": "* * * * *", "retain": 2, "concurrency": 1, "labels": {"label/1": "a", "label/2": "b"}}' \ +http://54.251.150.85:30944/v1/recurringjobs/sample | jq +{ + "actions": {}, + "concurrency": 1, + "cron": "* * * * *", + "groups": [ + "group-1", + "group-2" + ], + "id": "sample", + "labels": { + "label/1": "a", + "label/2": "b" + }, + "links": { + "self": "http://54.251.150.85:30944/v1/recurringjobs/sample" + }, + "name": "sample", + "retain": 2, + "task": "snapshot", + "type": "recurringJob" +} +``` + +**Scenario: Delete Recurring Job** + +*Given* an `Operation` drop-down list next to the recurring job. + +*When* user click `Delete`. +*Then* user should see a pop-up window for confirmation. + +*When* user click `OK`. +*Then* frontend **DELETE** `/v1/recurringjobs/{name}` to delete specific recurring job. +``` +❯ curl -X DELETE http://54.251.150.85:30944/v1/recurringjobs/sample | jq +``` + +> Also need a button for batch deletion on top left of the table. + +##### Updates `Volume` Page On UI + +**Scenario: Select From Recurring Job or Job Group** + +*When* user should be able to choose if want to add recurring job as `Job` or `Group` from the tab. + +**Scenario: Add Recurring Job Group On Volume Page** + +*Given* user go to job `Group` tab. +*When* user click `+ New`. +*And* Frontend can **GET** `/v1/recurringjobs` to list of recurring jobs. +*And* Frontend need to gather all `groups` from data. +``` +❯ curl -X GET http://54.251.150.85:30783/v1/recurringjobs | jq +{ + "data": [ + { + "actions": {}, + "concurrency": 2, + "cron": "* * * * *", + "groups": [ + "group2", + "group3" + ], + "id": "backup1", + "labels": null, + "links": { + "self": "http://54.251.150.85:30783/v1/recurringjobs/backup1" + }, + "name": "backup1", + "retain": 1, + "task": "backup", + "type": "recurringJob" + }, + { + "actions": {}, + "concurrency": 2, + "cron": "* * * * *", + "groups": [ + "default", + "group1" + ], + "id": "snapshot1", + "labels": { + "label/1": "a", + "label/2": "b" + }, + "links": { + "self": "http://54.251.150.85:30783/v1/recurringjobs/snapshot1" + }, + "name": "snapshot1", + "retain": 1, + "task": "snapshot", + "type": "recurringJob" + } + ], + "links": { + "self": "http://54.251.150.85:30783/v1/recurringjobs" + }, + "resourceType": "recurringJob", + "type": "collection" +} +``` +*Then* the user selects the group from the drop-down list. + +*When* user click on `Save`. +*Then* frontend **POST** `/v1/volumes/?action=recurringJobAdd` with request body `{name: , isGroup: true}`. +``` +❯ curl -X POST -H "Content-Type: application/json" \ +-d '{"name": "test3", "isGroup": true}' \ +http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobAdd | jq +{ + "data": [ + { + "actions": {}, + "id": "default", + "isGroup": true, + "links": { + "self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default" + }, + "name": "default", + "type": "volumeRecurringJob" + }, + { + "actions": {}, + "id": "test3", + "isGroup": true, + "links": { + "self": "http://54.251.150.85:30783/v1/volumerecurringjobs/test3" + }, + "name": "test3", + "type": "volumeRecurringJob" + } + ], + "links": { + "self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63" + }, + "resourceType": "volumeRecurringJob", + "type": "collection" +} +``` +*And* user sees all `jobs` with the `group`. + +**Scenario: Remove Recurring Job Group On Volume Page** + +*Given* user go to job `Group` tab. +*When* user click the `bin` icon of the recurring job group. +*Then* frontend `/v1/volumes/?action=recurringJobDelete` with request body `{name: , isGroup: true}`. +``` +❯ curl -X POST -H "Content-Type: application/json" \ +-d '{"name": "test3", "isGroup": true}' \ +http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobDelete | jq +{ + "data": [ + { + "actions": {}, + "id": "default", + "isGroup": true, + "links": { + "self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default" + }, + "name": "default", + "type": "volumeRecurringJob" + } + ], + "links": { + "self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63" + }, + "resourceType": "volumeRecurringJob", + "type": "collection" +} +``` + +**Scenario: Add Recurring Job On Volume Page** + +*Given* user go to `Job` tab. +*When* user click `+ New`. +*And* user sees the name is auto-generated. +*And* user can select `Backup` or `Snapshot` from the drop-down list. +*And* user can edit `Schedule`, `Labels`, `Retain` and `Concurrency`. + +*When* user click on `Save`. +*Then* frontend **POST** /v1/recurringjobs to create a recurring job. +``` +❯ curl -X POST -H "Content-Type: application/json" \ +-d '{"name": "backup1", "groups": [], "task": "backup", "cron": "* * * * *", "retain": 2, "concurrency": 1, "labels": {"label/1": "a"}}' \ +http://54.251.150.85:30944/v1/recurringjobs | jq +{ + "actions": {}, + "concurrency": 1, + "cron": "* * * * *", + "groups": [], + "id": "backup1", + "labels": { + "label/1": "a" + }, + "links": { + "self": "http://54.251.150.85:30783/v1/recurringjobs/backup1" + }, + "name": "backup1", + "retain": 2, + "task": "backup", + "type": "recurringJob" +} +``` +*And* frontend **POST** `/v1/volumes/?action=recurringJobAdd` with request body `{name: , isGroup: false}`. +``` +❯ curl -X POST -H "Content-Type: application/json" \ +-d '{"name": "backup1", "isGroup": false}' \ +http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobAdd | jq +{ + "data": [ + { + "actions": {}, + "id": "default", + "isGroup": true, + "links": { + "self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default" + }, + "name": "default", + "type": "volumeRecurringJob" + }, + { + "actions": {}, + "id": "backup1", + "isGroup": false, + "links": { + "self": "http://54.251.150.85:30783/v1/volumerecurringjobs/backup1" + }, + "name": "backup1", + "type": "volumeRecurringJob" + } + ], + "links": { + "self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63" + }, + "resourceType": "volumeRecurringJob", + "type": "collection" +} +``` + +**Scenario: Delete Recurring Job On Volume Page** + +Same as **Scenario: Remove Recurring Job Group in Volume Page** with request body `{name: , isGroup: false}`. +``` +❯ curl -X POST -H "Content-Type: application/json" \ +-d '{"name": "backup1", "isGroup": false}' \ +http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobDelete | jq +{ + "data": [ + { + "actions": {}, + "id": "default", + "isGroup": true, + "links": { + "self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default" + }, + "name": "default", + "type": "volumeRecurringJob" + } + ], + "links": { + "self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63" + }, + "resourceType": "volumeRecurringJob", + "type": "collection" +} +``` + +**Scenario: Keep Recurring Job Details Updated On Volume Page** + +- Frontend can monitor new websocket `/v1/ws/recurringjobs` and `/v1/ws/{period}/recurringjobs`. +- When a volume is labeled with a none-existing recurring job or job-group. UI should show warning icon. + +### Test plan + +> The existing recurring job test cases need to be fixed or replaced. + +#### Integration test - test_recurring_job_group + +Scenario: test recurring job groups (S3/NFS) + + Given create `snapshot1` recurring job with `group-1, group-2` in groups. + set cron job to run every 2 minutes. + set retain to 1. + create `backup1` recurring job with `group-1` in groups. + set cron job to run every 3 minutes. + set retain to 1 + And volume `test-job-1` created, attached, and healthy. + volume `test-job-2` created, attached, and healthy. + + When set `group1` recurring job in volume `test-job-1` label. + set `group2` recurring job in volume `test-job-2` label. + And write some data to volume `test-job-1`. + write some data to volume `test-job-2`. + And wait for 2 minutes. + And write some data to volume `test-job-1`. + write some data to volume `test-job-2`. + And wait for 1 minute. + + Then volume `test-job-1` should have 3 snapshots after scheduled time. + volume `test-job-2` should have 2 snapshots after scheduled time. + And volume `test-job-1` should have 1 backup after scheduled time. + volume `test-job-2` should have 0 backup after scheduled time. + +#### Integration test - test_recurring_job_default + +Scenario: test recurring job set with default in groups + + Given 1 volume created, attached, and healthy. + + When create `snapshot1` recurring job with `default, group-1` in groups. + create `snapshot2` recurring job with `default` in groups.. + create `snapshot3` recurring job with `` in groups. + create `backup1` recurring job with `default, group-1` in groups. + create `backup2` recurring job with `default` in groups. + create `backup3` recurring job with `` in groups. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should exist. + `snapshot3` cron job should not exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + `backup3` cron job should not exist. + + # Setting recurring job in volume label should not remove the defaults. + When set `snapshot3` recurring job in volume label. + Then should contain `default` job-group in volume labels. + should contain `snapshot3` job in volume labels. + And default `snapshot1` cron job should exist. + default `snapshot2` cron job should exist. + `snapshot3` cron job should exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + `backup3` cron job should not exist. + + # Should be able to remove the default. + When delete recurring job-group `default` in volume label. + And default `snapshot1` cron job should not exist. + default `snapshot2` cron job should not exist. + `snapshot3` cron job should exist. + default `backup1` cron job should not exist. + default `backup2` cron job should not exist. + `backup3` cron job should not exist. + + # Remove all volume recurring job labels should bring in default + When delete all recurring jobs in volume label. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should exist. + `snapshot3` cron job should not exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + `backup3` cron job should not exist. + + # Add `default` to snapshot3 and backup3 recurring job `Group`. + # should also reflect on the cron jobs + When add `snapshot3` recurring job with `default` in groups. + add `backup3` recurring job with `default` in groups. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should exist. + default `snapshot3` cron job should exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + default `backup3` cron job should exist. + + # Remove `default` in recurring job `Group` should also + # reflect on the cron jobs + When remove `default` from `snapshot3` recurring job groups. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should exist. + `snapshot3` cron job should not exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + default `backup3` cron job should exist. + + # Remove `default` in all recurring job `Group` should also + # reflect on the cron jobs + When remove `default` from all recurring jobs groups. + Then `snapshot1` cron job should not exist. + `snapshot2` cron job should not exist. + `snapshot3` cron job should not exist. + `backup1` cron job should not exist. + `backup2` cron job should not exist. + `backup3` cron job should not exist. + + +#### Integration test - test_recurring_job_delete + +Scenario: test delete recurring job + + Given 1 volume created, attached, and healthy. + + When create `snapshot1` recurring job with `default, group-1` in groups. + create `snapshot2` recurring job with `default` in groups.. + create `snapshot3` recurring job with `` in groups. + create `backup1` recurring job with `default, group-1` in groups. + create `backup2` recurring job with `default` in groups. + create `backup3` recurring job with `` in groups. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should exist. + `snapshot3` cron job should not exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + `backup3` cron job should not exist. + + # Delete `snapshot2` recurring job should delete the cron job + When delete `snapshot-2` recurring job. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should not exist. + `snapshot3` cron job should not exist. + default `backup1` cron job should exist. + default `backup2` cron job should exist. + `backup3` cron job should not exist. + + # Delete multiple recurring jobs should reflect on the cron jobs. + When delete `backup-1` recurring job. + delete `backup-2` recurring job. + delete `backup-3` recurring job. + Then default `snapshot1` cron job should exist. + default `snapshot2` cron job should not exist. + `snapshot3` cron job should not exist. + default `backup1` cron job should not exist. + default `backup2` cron job should not exist. + `backup3` cron job should not exist. + + # Should be able to delete recurring job while existing in volume label + When add `snapshot1` recurring job to volume label. + add `snapshot3` recurring job to volume label. + And default `snapshot1` cron job should exist. + default `snapshot2` cron job should not exist. + `snapshot3` cron job should exist. + And delete `snapshot1` recurring job. + delete `snapshot3` recurring job. + Then default `snapshot1` cron job should not exist. + default `snapshot2` cron job should not exist. + `snapshot3` cron job should not exist. + + +#### Integration test - test_recurring_job_volume_labeled_none_existing_recurring_job + +Scenario: test volume with a none-existing recurring job label + and later on added back. + + Given create `snapshot1` recurring job. + create `backup1` recurring job. + And 1 volume created, attached, and healthy. + add `snapshot1` recurring job to volume label. + add `backup1` recurring job to volume label. + And `snapshot1` cron job exist. + `backup1` cron job exist. + + When delete `snapshot1` recurring job. + delete `backup1` recurring job. + Then `snapshot1` cron job should not exist. + `backup1` cron job should not exist. + And `snapshot1` recurring job should exist in volume label. + `backup1` recurring job should exist in volume label. + + # Add back the recurring jobs. + When create `snapshot1` recurring job. + create `backup1` recurring job. + Then `snapshot1` cron job should exist. + `backup1` cron job should exist. + +#### Integration test - test_recurring_job_with_multiple_volumes + +Scenario: test recurring job with multiple volumes + + Given volume `test-job-1` created, attached and healthy. + And create `snapshot1` recurring job with `default` in groups. + create `snapshot2` recurring job with `` in groups. + create `backup1` recurring job with `default` in groups. + create `backup2` recurring job with `` in groups. + And volume `test-job-1` should have recurring job-group `default` label. + And default `snapshot1` cron job exist. + default `backup1` cron job exist. + + When create and attach volume `test-job-2`. + wait for volume `test-job-2` to be healthy. + Then volume `test-job-2` should have recurring job-group `default` label. + + When add `snapshot2` in `test-job-2` volume label. + add `backup2` in `test-job-2` volume label. + Then default `snapshot1` cron job should exist. + `snapshot2` cron job should exist. + default `backup1` cron job should exist. + `backup2` cron job should exist. + And volume `test-job-1` should have recurring job-group `default` label. + volume `test-job-2` should have recurring job `snapshot2` label. + volume `test-job-2` should have recurring job `backup2` label. + +#### Integration test - test_recurring_job_snapshot + +Scenario: test recurring job snapshot + + Given volume `test-job-1` created, attached, and healthy. + volume `test-job-2` created, attached, and healthy. + + When create `snapshot1` recurring job with `default` in groups. + Then should have 1 cron job. + And volume `test-job-1` should have volume-head 1 snapshot. + volume `test-job-2` should have volume-head 1 snapshot. + + When write some data to volume `test-job-1`. + write some data to volume `test-job-2`. + Then volume `test-job-1` should have 2 snapshots after scheduled time. + volume `test-job-2` should have 2 snapshots after scheduled time. + + When write some data to volume `test-job-1`. + write some data to volume `test-job-2`. + And wait for `snapshot1` cron job scheduled time. + Then volume `test-job-1` should have 3 snapshots after scheduled time. + volume `test-job-2` should have 3 snapshots after scheduled time. + +#### Integration test - test_recurring_job_backup + +Scenario: test recurring job backup (S3/NFS) + + Given volume `test-job-1` created, attached, and healthy. + volume `test-job-2` created, attached, and healthy. + + When create `backup1` recurring job with `default` in groups. + Then should have 1 cron job. + And volume `test-job-1` should have 0 backup. + volume `test-job-2` should have 0 backup. + + When write some data to volume `test-job-1`. + write some data to volume `test-job-2`. + And wait for `backup1` cron job scheduled time. + Then volume `test-job-1` should have 1 backups. + volume `test-job-2` should have 1 backups. + + When write some data to volume `test-job-1`. + write some data to volume `test-job-2`. + And wait for `backup1` cron job scheduled time. + Then volume `test-job-1` should have 2 backups. + volume `test-job-2` should have 2 backups. + +#### Integration test - test_recurring_job_while_volume_detached + +Scenario: test recurring job while volume is detached + + Given volume `test-job-1` created, and detached. + volume `test-job-2` created, and detached. + And attach volume `test-job-1` and write some data. + attach volume `test-job-2` and write some data. + And detach volume `test-job-1`. + detach volume `test-job-2`. + + When create `snapshot1` recurring job running at 1 minute interval, + and with `default` in groups, + and with `retain` set to `2`. + And 1 cron job should be created. + And wait for 2 minutes. + Then attach volume `test-job-1` and wait until healthy. + And volume `test-job-1` should have only 1 snapshot. + + When wait for 1 minute. + Then volume `test-job-1` should have only 2 snapshots. + + When set setting `allow-recurring-job-while-volume-detached` to `true`. + And wait for 2 minutes. + Then attach volume `test-job-2` and wait until healthy. + And volume `test-job-2` should have only 2 snapshots. + +#### Manual test - recurring job skip to create job while volume is detached + +Scenario: test recurring job while volume is detached + + Given volume `test-job-1` created, and detached. + volume `test-job-2` created, and detached. + + When create `snapshot1` recurring job running at 1 minute interval, + And wait until job pod created and complete + + Then monitor the job pod logs. + And should see `Cannot create job for test-job-1 volume in state detached`. + should see `Cannot create job for test-job-2 volume in state detached`. + +#### Manual test - recurring job upgrade migration + +Scenario: test recurring job upgrade migration + + Given cluster with Longhorn version prior to v1.2.0. + And storageclass with recurring job `snapshot1`. + And volume `test-job-1` created, and attached. + + When upgrade Longhorn to v1.2.0. + + Then should have recurring job CR created with format `---`. + And volume should be labeled with `recurring-job.longhorn.io/---: enabled`. + And recurringJob should be removed in volume spec. + And storageClass in `longhorn-storageclass` configMap should not have `recurringJobs`. + storageClass in `longhorn-storageclass` configMap should have `recurringJobSelector`. + ``` + recurringJobSelector: '[{"name":"snapshot-1-97893a05-77074ba4","isGroup":false},{"name":"backup-1-954b3c8c-59467025","isGroup":false}]' + ``` + + When create new PVC. + And volume should be labeled with items in `recurringJobSelector`. + And recurringJob should not exist in volume spec. + +#### Manual test - snapshot concurrency + +Scenario: test recurring job concurrency + + Given create `snapshot1` recurring job with `concurrency` set to `2`. + include `snapshot1` recurring job `default` in groups. + + When create volume `test-job-1`. + create volume `test-job-2`. + create volume `test-job-3`. + create volume `test-job-4`. + create volume `test-job-5`. + + Then moniter the cron job pod log. + And should see 2 jobs created concurrently. + + When update `snapshot1` recurring job with `concurrency` set to `3`. + Then moniter the cron job pod log. + And should see 3 jobs created concurrently. + +### Upgrade strategy + +#### Automated Migration + +1. Create `v110to120/upgrade.go` +2. Translate `storageClass` `recurringJobs` to `recurringJobSelector`. + 1. Convert the `recurringJobs` to `recurringJobSelector` object. + ``` + { + Name: --- + IsGroup: false, + } + ``` + 2. Add `recurringJobSelector` to `longhorn-storageclass` configMap. + 3. Remove `recurringJobs` in configMap. + 4. Update configMap. + ``` + parameters: + fromBackup: "" + numberOfReplicas: "3" + recurringJobSelector: '[{"name":"snapshot-1-97893a05-77074ba4","isGroup":false},{"name":"backup-1-954b3c8c-59467025","isGroup":false}]' + staleReplicaTimeout: "2880" + provisioner: driver.longhorn.io + ``` +3. Translate volume spec `recurringJobs` to volume labels. + 1. List all volumes and its spec `recurringJobs` and create labels in format `recurring-job.longhorn.io/---: enabled`. + 2. Update volume labels and remove volume spec `recurringJobs`. + ``` + labels: + longhornvolume: pvc-d37caaed-5cda-43b1-ae49-9d0490ffb3db + recurring-job.longhorn.io/backup-1-954b3c8c-59467025: enabled + recurring-job.longhorn.io/snapshot-1-97893a05-77074ba4: enabled + ``` + +3. translate volume spec `recurringJobs` to recurringJob CRs. + 1. Gather the recurring jobs from `recurringJobSelector` and volume labels. + 2. Create recurringJob CRs. + ``` + NAME GROUPS TASK CRON RETAIN CONCURRENCY AGE LABELS + snapshot-1-97893a05-77074ba4 snapshot */1 * * * * 1 10 13m + backup-1-954b3c8c-59467025 backup */2 * * * * 1 10 13m {"interval":"2m"} + ``` + +4. Cleanup applied volume cron jobs. + 1. Get all applied cron jobs for volumes. + 2. Delete cron jobs. + +> The migration translates existing volume recurring job with format `recurring-job.longhorn.io/---: enabled`. The name maps to the recurring job CR `---`. + +> The migration translates existing volume recurring job with format `recurring-job.longhorn.io/---: enabled`. The numbers could look random and also differs from the recurring job name of the CR name created by the StorageClass - `recurring-job.longhorn.io/: enabled`. This is because there is no info to determine if the volume spec `recurringJob` is coming from a `storageClass` or which `storageClass`. Should note this behavior in the document to lessen the confusion unless there is a better solution. + +#### Manual + +After the migration, the `-` in volume label and recurring job name could look random and confusing. Users might want to rename it to something more meaningful. Currently, the only way is to create a new recurring job CR and replace the volume label. + +## Note [optional] + +`None`