# Label-driven Recurring Job
## Summary
Replace volume spec `recurringJobs` with the label-driven model. Abstract volume recurring jobs to a new CRD named "RecurringJob". The names or groups of recurring jobs can be referenced in volume labels.
Users can add a recurring job to the `default` group, and Longhorn will automatically apply it to any volume that has no job labels.
Only one cron job will be created per recurring job. Users can also update the recurring job, and Longhorn reflects the changes to the associated cron job.
When a user deletes a recurring job, Longhorn also removes the associated cron job.
StorageClass should now use `recurringJobSelector` instead of `recurringJobs` to refer to recurring job names.
During the version upgrade, existing volume spec `recurringJobs` and StorageClass `recurringJobs` will automatically be translated to volume labels, and the corresponding recurring job CRs will be created.
### Related Issues
https://github.com/longhorn/longhorn/issues/467
## Motivation
### Goals
Phase 1:
- Each recurring job can be in single or multiple groups.
- Each group can have single or multiple recurring jobs.
- The jobs and groups can be referenced with the volume label.
- The recurring job in `default` group will automatically apply to a volume that has no job labels.
- Can create multiple recurring jobs with UI or YAML.
- The recurring job should include the settings `name`, `groups`, `task`, `cron`, `retain`, `concurrency`, and `labels`.
Phase 2:
The StorageClass and upgrade migration still depend on the volume spec, so complete removal of the volume spec `recurringJobs` should be done in phase 2.
### Non-goals [optional]
1. Running the snapshot/backup operations one by one. The operation order can be defined as sequential, consistent (for volume group snapshot), or throttled (with a concurrent number as a parameter) in the future.
> https://github.com/longhorn/longhorn/pull/2737#issuecomment-887985811
## Proposal
#### Story 1 - set recurring jobs and groups by volume labels
As a Longhorn user / System administrator.
I want to directly update recurring jobs referenced in multiple volumes.
So I do not need to update each volume with cron job definition.
#### Story 2 - automatically apply the default recurring jobs
As a Longhorn user / System administrator.
I want the ability to set one or multiple `backup` and `snapshot` recurring jobs as default. All volumes without any recurring job label should automatically have the default recurring jobs applied.
So I can be assured that every volume without a recurring job label is covered by the default recurring jobs.
#### Story 3 - automatic upgrade migration
As a Longhorn user / System administrator.
I want Longhorn to automatically convert existing volume spec `recurringJobs` to volume labels, and create the associated recurring job CRs.
So I don't have to manually create recurring job CRs and patch the labels.
### User Experience In Detail
#### Story 1 - set recurring job in volume
1. Create a recurring job on the UI `Recurring Job` page or via `kubectl`.
2. In the UI, navigate to Volume, `Recurring Jobs Schedule`.
3. User can choose `Job` or job `Group` from the tab.
   - On the `Job` tab:
     1. User sees the existing recurring jobs that the volume is labeled with.
     2. User is able to select `Backup` or `Snapshot` for the `Type` from the drop-down list.
     3. User is able to edit the `Schedule`, `Retain`, `Concurrency` and `Labels`.
   - On the job `Group` tab:
     1. User sees all existing recurring job groups in the `Name` drop-down list.
     2. User selects the group from the drop-down list.
     3. User sees all recurring `jobs` under the `group`.
4. Clicking `Save` updates the volume labels.
5. Updates to the recurring job CRs are also reflected in the cron job and the UI `Recurring Jobs Schedule`.
**Before enhancement**
Recurring jobs can only be added and updated per volume spec.
Cron job definitions must be created for each volume, causing duplicated setup effort.
The recurring job can only be updated per volume.
**After enhancement**
Recurring jobs can be added and updated via volume labels.
Users can select a recurring job from the UI drop-down menu, and the UI automatically shows the information from the recurring job CRs.
Updating the recurring job definition automatically applies to all volumes with the job label.
#### Story 2 - automatically apply default recurring jobs
1. Add `default` to the `Groups` of one or multiple recurring jobs in the UI or via `kubectl`.
2. Longhorn automatically applies the `default` group recurring jobs to all volumes without job labels.
**Before enhancement**
Default recurring jobs are set via StorageClass at PVC creation only. No default recurring job can be set up for UI-created volumes.
Updating StorageClass does not reflect on the existing volumes.
**After enhancement**
Users have the option to set default recurring jobs via `StorageClass` or `RecurringJob`.
The Longhorn recurring job controller automatically applies default recurring jobs to all volumes without job labels.
Longhorn adds the default recurring jobs when all job labels are removed from the volume.
When the `RecurringJobSelector` is set in the `StorageClass`, it will be used as default instead.
#### Story 3 - automatic upgrade migration
1. Perform upgrade.
2. StorageClass `recurringJobs` will be converted to `recurringJobSelector`.
3. Recurring job CRs will get created from `recurringJobs`.
4. Volume will be labeled with `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled` from volume spec `recurringJobs`.
5. Recurring job CRs will get created from volume spec `recurringJobs`. When the config is identical among multiple volumes, only one will get created and volumes will share this recurring job CR.
6. Volume spec `recurringJobs` will get removed.
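The migration naming and de-duplication described in steps 4 and 5 can be sketched as follows. This is only an illustration: the source does not specify the hash function, so `short_hash` (first 8 hex characters of SHA-256) is an assumption, and `migrated_job_name` and `migrate` are hypothetical helper names.

```python
import hashlib
import json

JOB_PREFIX = "recurring-job.longhorn.io/"


def short_hash(text: str, length: int = 8) -> str:
    """Hypothetical hash helper (assumption): first `length` hex chars of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def migrated_job_name(task: str, retain: int, cron: str, labels: dict) -> str:
    """Build <jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>."""
    label_json = json.dumps(labels or {}, sort_keys=True)
    return f"{task}-{retain}-{short_hash(cron)}-{short_hash(label_json)}"


def migrate(volumes: dict) -> tuple:
    """Turn {volume name: [legacy job dicts]} into (job CRs, volume labels)."""
    job_crs, volume_labels = {}, {}
    for volume_name, legacy_jobs in volumes.items():
        labels = volume_labels.setdefault(volume_name, {})
        for job in legacy_jobs:
            name = migrated_job_name(
                job["task"], job["retain"], job["cron"], job.get("labels")
            )
            # Identical configurations map to the same name, so one CR is shared.
            job_crs.setdefault(name, job)
            labels[JOB_PREFIX + name] = "enabled"
    return job_crs, volume_labels
```

Because the name is derived only from the job configuration, two volumes with the same legacy `recurringJobs` entry end up with the same label and share one recurring job CR.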
### API Changes
- Add new HTTP endpoints:
  - GET `/v1/recurringjobs` to list recurring jobs.
  - GET `/v1/recurringjobs/{name}` to get a specific recurring job.
  - DELETE `/v1/recurringjobs/{name}` to delete a specific recurring job.
  - POST `/v1/recurringjobs` to create a recurring job.
  - PUT `/v1/recurringjobs/{name}` to update a specific recurring job.
  - `/v1/ws/recurringjobs` and `/v1/ws/{period}/recurringjobs` for the websocket stream.
- Add new RESTful APIs for the new `RecurringJob` CRD:
  - `Create`
  - `Update`
  - `List`
  - `Get`
  - `Delete`
- Add new APIs for users to update recurring jobs for an individual volume:
  - `/v1/volumes/<VOLUME_NAME>?action=recurringJobAdd`, which expects a request body of the form `{name: <name>, isGroup: <bool>}`.
  - `/v1/volumes/<VOLUME_NAME>?action=recurringJobList`.
  - `/v1/volumes/<VOLUME_NAME>?action=recurringJobDelete`, which expects a request body of the form `{name: <name>, isGroup: <bool>}`.
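As a rough illustration of the volume actions above, the following sketch builds (but does not send) the HTTP request for `recurringJobAdd` and `recurringJobDelete` using only the Python standard library. The helper name `volume_recurring_job_request` and any base URL passed to it are hypothetical; the path, the `POST` method, and the body fields follow the examples in this document.

```python
import json
from urllib.parse import quote
from urllib.request import Request

ALLOWED_ACTIONS = {"recurringJobAdd", "recurringJobDelete"}


def volume_recurring_job_request(base_url, volume_name, action, name, is_group):
    """Build a POST request that adds or removes a job or group label on a volume."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url}/v1/volumes/{quote(volume_name)}?action={action}"
    body = json.dumps({"name": name, "isGroup": is_group}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

The returned object can be passed to `urllib.request.urlopen` against a reachable Longhorn manager endpoint.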
## Design
### Implementation Overview
#### Add Recurring Job CRD.
- Update the ClusterRole to include `recurringjob`.
- Printer column should include `Name`, `Groups`, `Task`, `Cron`, `Retain`, `Concurrency`, `Age`, `Labels`.
```
NAME        GROUPS                 TASK       CRON        RETAIN   CONCURRENCY   AGE   LABELS
snapshot1   ["default","group1"]   snapshot   * * * * *   1        2             14m   {"label/1":"a","label/2":"b"}
```
- The `Name`: String used to reference the recurring job in a volume with the label `recurring-job.longhorn.io/<Name>: enabled`.
- The `Groups`: Array of strings that assigns the recurring job to groups. A group is referenced in a volume with the label `recurring-job-group.longhorn.io/<Name>: enabled`. When the groups include `default`, the recurring job will be added to the volume label if no other job exists in the volume label.
- The `Task`: String, either `backup` or `snapshot`.
Validation is also added in the CRD YAML with a regex pattern match.
- The `Cron`: String holding the cron expression that defines the recurring job schedule.
- The `Retain`: Integer number of snapshots/backups to keep for the volume.
- The `Concurrency`: Integer number of jobs that each cron job runs concurrently.
- The `Age`: Date of the CR creation timestamp.
- The `Labels`: Dictionary of the labels.
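The field rules above can be sketched as a small validator. This is a hedged illustration rather than the CRD validation itself: `validate_recurring_job` is a hypothetical helper, the five-field cron check is a simplifying assumption, and the non-zero rule for `retain` and `concurrency` is taken from the UI constraints later in this document.

```python
import re

TASK_PATTERN = re.compile(r"^(snapshot|backup)$")


def validate_recurring_job(spec: dict) -> list:
    """Return a list of validation errors for a recurring job spec (empty if valid)."""
    errors = []
    if not spec.get("name"):
        errors.append("name is required")
    if not TASK_PATTERN.match(spec.get("task", "")):
        errors.append("task must be snapshot or backup")
    # Assumption: a schedule is a standard five-field cron expression.
    if len(spec.get("cron", "").split()) != 5:
        errors.append("cron must have five fields")
    for field in ("retain", "concurrency"):
        value = spec.get(field)
        if not isinstance(value, int) or value < 1:
            errors.append(f"{field} must be a positive integer")
    return errors
```

For example, a spec with `task` set to `restore` or `retain` set to `0` yields one error message per violated rule.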
#### Add Command `recurring-job` To `longhorn-manager` Binary
1. Add new command `recurring-job <job.name> --manager-url <URL>` and remove old command `snapshot`.
> Get the `recurringJob.Spec` on execution using Kubernetes API.
2. Get volumes by label selector `recurring-job.longhorn.io/<job.name>: enabled` to filter out volumes.
3. Get volumes by label selector `recurring-job-group.longhorn.io/<job.group>: enabled` to filter out volumes if the job is set with a group.
4. Filter the volumes down to those in the `attached` state, or keep detached volumes as well when the setting `allow-recurring-job-while-volume-detached` is enabled.
5. Use the concurrency parameter to throttle goroutines with a channel. Each goroutine calls `NewJob()` and `job.run()` for a volume.
The job `snapshotName` format will be `<job.name>-c-<RandomID>`.
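The selection and throttling steps above can be sketched in Python. Note that this swaps in a thread pool for the goroutine-and-channel approach the design describes, and that `select_volumes`, `run_jobs`, and the plain-dict volume shape are hypothetical stand-ins for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

JOB_PREFIX = "recurring-job.longhorn.io/"
GROUP_PREFIX = "recurring-job-group.longhorn.io/"


def select_volumes(volumes, job_name, job_groups, allow_detached=False):
    """Pick volumes labeled with the job or one of its groups that may run now."""
    wanted = {JOB_PREFIX + job_name}
    wanted |= {GROUP_PREFIX + group for group in job_groups}
    selected = []
    for volume in volumes:
        labeled = any(volume["labels"].get(key) == "enabled" for key in wanted)
        runnable = volume["state"] == "attached" or allow_detached
        if labeled and runnable:
            selected.append(volume["name"])
    return selected


def run_jobs(volume_names, concurrency, run_one):
    """Call run_one(volume_name) for each volume, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run_one, volume_names))
```

Here `allow_detached` plays the role of the `allow-recurring-job-while-volume-detached` setting, and `concurrency` bounds how many volumes are processed at once.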
#### Changes In The Volume Controller
1. The `updateRecurringJobs` method is responsible for adding the default label if no other labels exist.
> Since the StorageClass and upgrade migration still contain the `recurringJobs` spec, we will keep `VolumeSpec.RecurringJobs` in code to create the recurring jobs for volumes from the `storageClass`.
> In case names are duplicated between different `storageClasses`, only one recurring job CR will be created.
#### Changes In The VolumeManager `CreateVolume`
- Add new method input `recurringJobSelector`:
1. Convert `Volume.Spec.RecurringJobs` to `recurringJobSelector`.
2. Add recurring job label if `recurringJobSelector` method input is not empty.
#### Changes In The Datastore
- For `CreateVolume` and `UpdateVolume` add a function similar to `fixupMetadata` that handles recurring jobs:
1. Add recurring job labels if `Volume.Spec.RecurringJobs` is not empty. Then unset `Volume.Spec.RecurringJobs`.
2. Label with `default` job-group if no other recurring job label exists.
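The two fix-up steps above can be sketched on a plain dictionary. This is a minimal illustration under stated assumptions: `fixup_recurring_job_labels` is a hypothetical name, and the volume is modeled as a dict with `labels` and `recurringJobs` keys rather than the real `Volume` object.

```python
JOB_PREFIX = "recurring-job.longhorn.io/"
GROUP_PREFIX = "recurring-job-group.longhorn.io/"


def fixup_recurring_job_labels(volume: dict) -> dict:
    """Move spec recurringJobs into labels, then fall back to the default group."""
    labels = dict(volume.get("labels", {}))
    # Step 1: every job in the spec becomes a job label; the spec is then unset.
    for job in volume.get("recurringJobs", []):
        labels[JOB_PREFIX + job["name"]] = "enabled"
    # Step 2: add the default job-group only when no job or group label exists.
    has_job_label = any(
        key.startswith((JOB_PREFIX, GROUP_PREFIX)) for key in labels
    )
    if not has_job_label:
        labels[GROUP_PREFIX + "default"] = "enabled"
    return {**volume, "labels": labels, "recurringJobs": []}
```

Labels that are unrelated to recurring jobs are left untouched, and the input dictionary is not modified.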
#### Introduce `recurringJobSelector` As Part Of StorageClass Parameters.
- The CSI controller can use `recurringJobSelector` for volume creation.
#### Changes In CSI Controller Server
1. Put `recurringJobSelector` to `vol.RecurringJobSelector` at HTTP API layer to use for adding volume recurring job label in `VolumeManager.CreateVolume`. The `CreateVolume` method will have a new input `recurringJobSelector`.
2. Get `recurringJobs` from parameters, validate them, and create recurring job CRs via API if they do not already exist.
#### Add Recurring Job Controller
- The code structure will be the same as other controllers.
- Add the finalizer to the recurring job CR if it does not exist.
- The controller will be informed by `recurringJobInformer` and `enqueueRecurringJob`.
- Create and update CronJob per recurring job.
  1. Generate a new cron job object.
     - Include the label `recurring-job.longhorn.io`.
       ```
       recurring-job.longhorn.io: <Name>
       ```
     - Compose the command:
       ```
       longhorn-manager -d \
         recurring-job <job.name> \
         --manager-url <url>
       ```
  2. Create a new cron job with the annotation `last-applied-cronjob-spec`, or update the cron job if the new cron job spec is different from the `last-applied-cronjob-spec`.
- Use defer to clean up CronJob.
  1. When a recurring job gets deleted:
     1. Delete the cron job with the selected label: `recurring-job.longhorn.io/<Name>`.
     2. Remove the finalizer.
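The create-or-update decision based on `last-applied-cronjob-spec` can be sketched as follows. This is an illustration only: cron jobs are modeled as plain dictionaries instead of Kubernetes objects, and `build_command`, `desired_cron_job`, and `reconcile` are hypothetical names.

```python
import json

LAST_APPLIED = "last-applied-cronjob-spec"


def build_command(job_name: str, manager_url: str) -> list:
    """Compose the cron job container command described above."""
    return ["longhorn-manager", "-d", "recurring-job", job_name,
            "--manager-url", manager_url]


def desired_cron_job(job: dict, manager_url: str) -> dict:
    """Build a simplified cron job (a plain dict, not a real Kubernetes object)."""
    spec = {"schedule": job["cron"],
            "command": build_command(job["name"], manager_url)}
    return {
        "labels": {"recurring-job.longhorn.io": job["name"]},
        "annotations": {LAST_APPLIED: json.dumps(spec, sort_keys=True)},
        "spec": spec,
    }


def reconcile(existing, job: dict, manager_url: str):
    """Return (action, cron_job); action is "create", "update" or "noop"."""
    desired = desired_cron_job(job, manager_url)
    if existing is None:
        return "create", desired
    applied = existing["annotations"].get(LAST_APPLIED)
    if applied != desired["annotations"][LAST_APPLIED]:
        return "update", desired
    return "noop", existing
```

Storing the serialized spec in the annotation lets the controller skip the update when nothing in the generated cron job has changed.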
#### UI
##### Add New Page `Recurring Job` In UI
A new page for `Recurring Job` to create/update/delete recurring jobs.
```
Recurring Job [Custom Column]
====================================================================================================================
[Create] [Delete] [Search Box v ][__________][Go]
| Name
| Group
| Type
| Schedule
| Labels
| Retain
| Concurrency
===================================================================================================================
[] | Name | Group | Type | Schedule | Labels | Retain | Concurrency | Operation |
---+-------+--------+--------+-----------------+--------------+--------+-------------+-------------+--------------|
[] | dummy | aa, bb | backup | 00:00 every day | k1:v1, k2:v2 | 20 | 10 | [Icon] v |
| Update
| Delete
===================================================================================================================
[<] [1] [>]
```
**Scenario: Add Recurring Job**
*Given* user sees `Create` on the top left of the page.
*When* user clicks `Create`.
*Then* user sees a pop-up form.
```
* Name
[ ]
Groups +
* Task
[Backup]
* Schedule
[00:00 every day]
* Retain
[20]
* Concurrency
[10]
* Labels +
```
- Fields marked with `*` are mandatory.
- User can click on `+` next to `Groups` to add more groups.
- User can click on the `Schedule` field and a window will pop up for `Cron` and `Generate Cron`.
- `Retain` cannot be `0`.
- `Concurrency` cannot be `0`.
- User can click on `+` next to `Labels` to add more labels.
*When* user clicks `OK`.
*Then* frontend **POST** `/v1/recurringjobs` to create a recurring job.
```
curl -X POST -H "Content-Type: application/json" \
-d '{"name": "sample", "groups": ["group-1", "group-2"], "task": "snapshot", "cron": "* * * * *", "retain": 2, "concurrency": 1, "labels": {"label/1": "a"}}' \
http://54.251.150.85:30944/v1/recurringjobs | jq
{
"actions": {},
"concurrency": 1,
"cron": "* * * * *",
"groups": [
"group-1",
"group-2"
],
"id": "sample",
"labels": {
"label/1": "a"
},
"links": {
"self": "http://54.251.150.85:30944/v1/recurringjobs/sample"
},
"name": "sample",
"retain": 2,
"task": "snapshot",
"type": "recurringJob"
}
```
**Scenario: Update Recurring Job**
*Given* an `Operation` drop-down list next to the recurring job.
*When* user clicks `Edit`.
*Then* user sees a pop-up form.
```
Name
[sample]
Groups
[group-1]
[group-2]
Task
[Backup]
Schedule
[00:00 every day]
Retain
[20]
Concurrency
[10]
Labels
[labels/1]: [a]
[labels/2]: [b]
```
- `Name` field should be immutable.
- `Task` field should be immutable.
*And* user edits the fields in the form.
*When* user clicks `Save`.
*Then* frontend **PUT** `/v1/recurringjobs/{name}` to update the specific recurring job.
```
curl -X PUT -H "Content-Type: application/json" \
-d '{"name": "sample", "groups": ["group-1", "group-2"], "task": "snapshot", "cron": "* * * * *", "retain": 2, "concurrency": 1, "labels": {"label/1": "a", "label/2": "b"}}' \
http://54.251.150.85:30944/v1/recurringjobs/sample | jq
{
"actions": {},
"concurrency": 1,
"cron": "* * * * *",
"groups": [
"group-1",
"group-2"
],
"id": "sample",
"labels": {
"label/1": "a",
"label/2": "b"
},
"links": {
"self": "http://54.251.150.85:30944/v1/recurringjobs/sample"
},
"name": "sample",
"retain": 2,
"task": "snapshot",
"type": "recurringJob"
}
```
**Scenario: Delete Recurring Job**
*Given* an `Operation` drop-down list next to the recurring job.
*When* user clicks `Delete`.
*Then* user should see a pop-up window for confirmation.
*When* user clicks `OK`.
*Then* frontend **DELETE** `/v1/recurringjobs/{name}` to delete the specific recurring job.
```
curl -X DELETE http://54.251.150.85:30944/v1/recurringjobs/sample | jq
```
> Also need a button for batch deletion on top left of the table.
##### Updates `Volume` Page On UI
**Scenario: Select From Recurring Job or Job Group**
*When* adding a recurring job, user should be able to choose `Job` or `Group` from the tab.
**Scenario: Add Recurring Job Group On Volume Page**
*Given* user goes to the job `Group` tab.
*When* user clicks `+ New`.
*And* frontend can **GET** `/v1/recurringjobs` to list recurring jobs.
*And* frontend needs to gather all `groups` from the data.
```
curl -X GET http://54.251.150.85:30783/v1/recurringjobs | jq
{
"data": [
{
"actions": {},
"concurrency": 2,
"cron": "* * * * *",
"groups": [
"group2",
"group3"
],
"id": "backup1",
"labels": null,
"links": {
"self": "http://54.251.150.85:30783/v1/recurringjobs/backup1"
},
"name": "backup1",
"retain": 1,
"task": "backup",
"type": "recurringJob"
},
{
"actions": {},
"concurrency": 2,
"cron": "* * * * *",
"groups": [
"default",
"group1"
],
"id": "snapshot1",
"labels": {
"label/1": "a",
"label/2": "b"
},
"links": {
"self": "http://54.251.150.85:30783/v1/recurringjobs/snapshot1"
},
"name": "snapshot1",
"retain": 1,
"task": "snapshot",
"type": "recurringJob"
}
],
"links": {
"self": "http://54.251.150.85:30783/v1/recurringjobs"
},
"resourceType": "recurringJob",
"type": "collection"
}
```
*Then* the user selects the group from the drop-down list.
*When* user clicks on `Save`.
*Then* frontend **POST** `/v1/volumes/<VOLUME_NAME>?action=recurringJobAdd` with request body `{name: <group-name>, isGroup: true}`.
```
curl -X POST -H "Content-Type: application/json" \
-d '{"name": "test3", "isGroup": true}' \
http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobAdd | jq
{
"data": [
{
"actions": {},
"id": "default",
"isGroup": true,
"links": {
"self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default"
},
"name": "default",
"type": "volumeRecurringJob"
},
{
"actions": {},
"id": "test3",
"isGroup": true,
"links": {
"self": "http://54.251.150.85:30783/v1/volumerecurringjobs/test3"
},
"name": "test3",
"type": "volumeRecurringJob"
}
],
"links": {
"self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63"
},
"resourceType": "volumeRecurringJob",
"type": "collection"
}
```
*And* user sees all `jobs` in the `group`.
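The group-gathering step in this scenario can be sketched as a small helper that walks the `GET /v1/recurringjobs` response. The function name `collect_groups` is hypothetical; the response shape (`data` entries with a `groups` list that may be empty or `null`) follows the sample output above.

```python
def collect_groups(response: dict) -> list:
    """Return the sorted, de-duplicated group names found in a job list response."""
    groups = set()
    for job in response.get("data", []):
        # `groups` may be missing, null, or an empty list.
        groups.update(job.get("groups") or [])
    return sorted(groups)
```

Applied to the sample response in this scenario, the helper yields `default`, `group1`, `group2`, and `group3` as the drop-down entries.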
**Scenario: Remove Recurring Job Group On Volume Page**
*Given* user goes to the job `Group` tab.
*When* user clicks the `bin` icon of the recurring job group.
*Then* frontend **POST** `/v1/volumes/<VOLUME_NAME>?action=recurringJobDelete` with request body `{name: <group-name>, isGroup: true}`.
```
curl -X POST -H "Content-Type: application/json" \
-d '{"name": "test3", "isGroup": true}' \
http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobDelete | jq
{
"data": [
{
"actions": {},
"id": "default",
"isGroup": true,
"links": {
"self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default"
},
"name": "default",
"type": "volumeRecurringJob"
}
],
"links": {
"self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63"
},
"resourceType": "volumeRecurringJob",
"type": "collection"
}
```
**Scenario: Add Recurring Job On Volume Page**
*Given* user goes to the `Job` tab.
*When* user clicks `+ New`.
*And* user sees the name is auto-generated.
*And* user can select `Backup` or `Snapshot` from the drop-down list.
*And* user can edit `Schedule`, `Labels`, `Retain` and `Concurrency`.
*When* user clicks on `Save`.
*Then* frontend **POST** `/v1/recurringjobs` to create a recurring job.
```
curl -X POST -H "Content-Type: application/json" \
-d '{"name": "backup1", "groups": [], "task": "backup", "cron": "* * * * *", "retain": 2, "concurrency": 1, "labels": {"label/1": "a"}}' \
http://54.251.150.85:30944/v1/recurringjobs | jq
{
"actions": {},
"concurrency": 1,
"cron": "* * * * *",
"groups": [],
"id": "backup1",
"labels": {
"label/1": "a"
},
"links": {
"self": "http://54.251.150.85:30783/v1/recurringjobs/backup1"
},
"name": "backup1",
"retain": 2,
"task": "backup",
"type": "recurringJob"
}
```
*And* frontend **POST** `/v1/volumes/<VOLUME_NAME>?action=recurringJobAdd` with request body `{name: <job-name>, isGroup: false}`.
```
curl -X POST -H "Content-Type: application/json" \
-d '{"name": "backup1", "isGroup": false}' \
http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobAdd | jq
{
"data": [
{
"actions": {},
"id": "default",
"isGroup": true,
"links": {
"self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default"
},
"name": "default",
"type": "volumeRecurringJob"
},
{
"actions": {},
"id": "backup1",
"isGroup": false,
"links": {
"self": "http://54.251.150.85:30783/v1/volumerecurringjobs/backup1"
},
"name": "backup1",
"type": "volumeRecurringJob"
}
],
"links": {
"self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63"
},
"resourceType": "volumeRecurringJob",
"type": "collection"
}
```
**Scenario: Delete Recurring Job On Volume Page**
Same as **Scenario: Remove Recurring Job Group On Volume Page** with request body `{name: <job-name>, isGroup: false}`.
```
curl -X POST -H "Content-Type: application/json" \
-d '{"name": "backup1", "isGroup": false}' \
http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63\?action\=recurringJobDelete | jq
{
"data": [
{
"actions": {},
"id": "default",
"isGroup": true,
"links": {
"self": "http://54.251.150.85:30783/v1/volumerecurringjobs/default"
},
"name": "default",
"type": "volumeRecurringJob"
}
],
"links": {
"self": "http://54.251.150.85:30783/v1/volumes/pvc-4011f9a6-bae3-43e3-a2a1-893997d0aa63"
},
"resourceType": "volumeRecurringJob",
"type": "collection"
}
```
**Scenario: Keep Recurring Job Details Updated On Volume Page**
- Frontend can monitor the new websockets `/v1/ws/recurringjobs` and `/v1/ws/{period}/recurringjobs`.
- When a volume is labeled with a non-existent recurring job or job-group, the UI should show a warning icon.
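One way to decide when the warning icon applies is to compare the volume labels against the known recurring jobs. The sketch below is a hypothetical helper (`missing_references`), not the frontend implementation; whether the `default` group should be exempt from the warning is left open here.

```python
JOB_PREFIX = "recurring-job.longhorn.io/"
GROUP_PREFIX = "recurring-job-group.longhorn.io/"


def missing_references(volume_labels: dict, recurring_jobs: list) -> list:
    """Return job or group names used in volume labels that no recurring job provides."""
    job_names = {job["name"] for job in recurring_jobs}
    group_names = {
        group for job in recurring_jobs for group in job.get("groups") or []
    }
    missing = []
    for key in sorted(volume_labels):
        if key.startswith(JOB_PREFIX):
            name = key[len(JOB_PREFIX):]
            if name not in job_names:
                missing.append(name)
        elif key.startswith(GROUP_PREFIX):
            name = key[len(GROUP_PREFIX):]
            if name not in group_names:
                missing.append(name)
    return missing
```

A non-empty result means at least one label points at a job or group that does not currently exist.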
### Test plan
> The existing recurring job test cases need to be fixed or replaced.
#### Integration test - test_recurring_job_group
Scenario: test recurring job groups (S3/NFS)
Given create `snapshot1` recurring job with `group-1, group-2` in groups.
set cron job to run every 2 minutes.
set retain to 1.
create `backup1` recurring job with `group-1` in groups.
set cron job to run every 3 minutes.
set retain to 1
And volume `test-job-1` created, attached, and healthy.
volume `test-job-2` created, attached, and healthy.
When set `group-1` recurring job in volume `test-job-1` label.
set `group-2` recurring job in volume `test-job-2` label.
And write some data to volume `test-job-1`.
write some data to volume `test-job-2`.
And wait for 2 minutes.
And write some data to volume `test-job-1`.
write some data to volume `test-job-2`.
And wait for 1 minute.
Then volume `test-job-1` should have 3 snapshots after scheduled time.
volume `test-job-2` should have 2 snapshots after scheduled time.
And volume `test-job-1` should have 1 backup after scheduled time.
volume `test-job-2` should have 0 backup after scheduled time.
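The expectation in this scenario (both jobs run for `test-job-1`, only the snapshot job for `test-job-2`) follows from how group labels select jobs. A minimal sketch of that resolution, with the hypothetical helper `jobs_for_volume` and the group names from the Given step:

```python
JOB_PREFIX = "recurring-job.longhorn.io/"
GROUP_PREFIX = "recurring-job-group.longhorn.io/"


def jobs_for_volume(volume_labels: dict, recurring_jobs: list) -> list:
    """Return the sorted names of recurring jobs selected by a volume's labels."""
    enabled = {key for key, value in volume_labels.items() if value == "enabled"}
    selected = set()
    for job in recurring_jobs:
        by_name = JOB_PREFIX + job["name"] in enabled
        by_group = any(GROUP_PREFIX + group in enabled for group in job["groups"])
        if by_name or by_group:
            selected.add(job["name"])
    return sorted(selected)


jobs = [
    {"name": "snapshot1", "groups": ["group-1", "group-2"]},
    {"name": "backup1", "groups": ["group-1"]},
]
test_job_1 = jobs_for_volume({GROUP_PREFIX + "group-1": "enabled"}, jobs)
test_job_2 = jobs_for_volume({GROUP_PREFIX + "group-2": "enabled"}, jobs)
```

With these inputs, `test_job_1` contains `backup1` and `snapshot1`, while `test_job_2` contains only `snapshot1`, which is why the second volume is expected to have no backup.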
#### Integration test - test_recurring_job_default
Scenario: test recurring job set with default in groups
Given 1 volume created, attached, and healthy.
When create `snapshot1` recurring job with `default, group-1` in groups.
create `snapshot2` recurring job with `default` in groups.
create `snapshot3` recurring job with `` in groups.
create `backup1` recurring job with `default, group-1` in groups.
create `backup2` recurring job with `default` in groups.
create `backup3` recurring job with `` in groups.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should exist.
`snapshot3` cron job should not exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
`backup3` cron job should not exist.
# Setting recurring job in volume label should not remove the defaults.
When set `snapshot3` recurring job in volume label.
Then should contain `default` job-group in volume labels.
should contain `snapshot3` job in volume labels.
And default `snapshot1` cron job should exist.
default `snapshot2` cron job should exist.
`snapshot3` cron job should exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
`backup3` cron job should not exist.
# Should be able to remove the default.
When delete recurring job-group `default` in volume label.
And default `snapshot1` cron job should not exist.
default `snapshot2` cron job should not exist.
`snapshot3` cron job should exist.
default `backup1` cron job should not exist.
default `backup2` cron job should not exist.
`backup3` cron job should not exist.
# Remove all volume recurring job labels should bring in default
When delete all recurring jobs in volume label.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should exist.
`snapshot3` cron job should not exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
`backup3` cron job should not exist.
# Add `default` to snapshot3 and backup3 recurring job `Group`.
# should also reflect on the cron jobs
When add `snapshot3` recurring job with `default` in groups.
add `backup3` recurring job with `default` in groups.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should exist.
default `snapshot3` cron job should exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
default `backup3` cron job should exist.
# Remove `default` in recurring job `Group` should also
# reflect on the cron jobs
When remove `default` from `snapshot3` recurring job groups.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should exist.
`snapshot3` cron job should not exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
default `backup3` cron job should exist.
# Remove `default` in all recurring job `Group` should also
# reflect on the cron jobs
When remove `default` from all recurring jobs groups.
Then `snapshot1` cron job should not exist.
`snapshot2` cron job should not exist.
`snapshot3` cron job should not exist.
`backup1` cron job should not exist.
`backup2` cron job should not exist.
`backup3` cron job should not exist.
#### Integration test - test_recurring_job_delete
Scenario: test delete recurring job
Given 1 volume created, attached, and healthy.
When create `snapshot1` recurring job with `default, group-1` in groups.
create `snapshot2` recurring job with `default` in groups.
create `snapshot3` recurring job with `` in groups.
create `backup1` recurring job with `default, group-1` in groups.
create `backup2` recurring job with `default` in groups.
create `backup3` recurring job with `` in groups.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should exist.
`snapshot3` cron job should not exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
`backup3` cron job should not exist.
# Delete `snapshot2` recurring job should delete the cron job
When delete `snapshot2` recurring job.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should not exist.
`snapshot3` cron job should not exist.
default `backup1` cron job should exist.
default `backup2` cron job should exist.
`backup3` cron job should not exist.
# Delete multiple recurring jobs should reflect on the cron jobs.
When delete `backup1` recurring job.
delete `backup2` recurring job.
delete `backup3` recurring job.
Then default `snapshot1` cron job should exist.
default `snapshot2` cron job should not exist.
`snapshot3` cron job should not exist.
default `backup1` cron job should not exist.
default `backup2` cron job should not exist.
`backup3` cron job should not exist.
# Should be able to delete recurring job while existing in volume label
When add `snapshot1` recurring job to volume label.
add `snapshot3` recurring job to volume label.
And default `snapshot1` cron job should exist.
default `snapshot2` cron job should not exist.
`snapshot3` cron job should exist.
And delete `snapshot1` recurring job.
delete `snapshot3` recurring job.
Then default `snapshot1` cron job should not exist.
default `snapshot2` cron job should not exist.
`snapshot3` cron job should not exist.
#### Integration test - test_recurring_job_volume_labeled_none_existing_recurring_job
Scenario: test volume with a non-existent recurring job label
and later on added back.
Given create `snapshot1` recurring job.
create `backup1` recurring job.
And 1 volume created, attached, and healthy.
add `snapshot1` recurring job to volume label.
add `backup1` recurring job to volume label.
And `snapshot1` cron job exists.
`backup1` cron job exists.
When delete `snapshot1` recurring job.
delete `backup1` recurring job.
Then `snapshot1` cron job should not exist.
`backup1` cron job should not exist.
And `snapshot1` recurring job should exist in volume label.
`backup1` recurring job should exist in volume label.
# Add back the recurring jobs.
When create `snapshot1` recurring job.
create `backup1` recurring job.
Then `snapshot1` cron job should exist.
`backup1` cron job should exist.
#### Integration test - test_recurring_job_with_multiple_volumes
Scenario: test recurring job with multiple volumes
Given volume `test-job-1` created, attached and healthy.
And create `snapshot1` recurring job with `default` in groups.
create `snapshot2` recurring job with `` in groups.
create `backup1` recurring job with `default` in groups.
create `backup2` recurring job with `` in groups.
And volume `test-job-1` should have recurring job-group `default` label.
And default `snapshot1` cron job exists.
default `backup1` cron job exists.
When create and attach volume `test-job-2`.
wait for volume `test-job-2` to be healthy.
Then volume `test-job-2` should have recurring job-group `default` label.
When add `snapshot2` in `test-job-2` volume label.
add `backup2` in `test-job-2` volume label.
Then default `snapshot1` cron job should exist.
`snapshot2` cron job should exist.
default `backup1` cron job should exist.
`backup2` cron job should exist.
And volume `test-job-1` should have recurring job-group `default` label.
volume `test-job-2` should have recurring job `snapshot2` label.
volume `test-job-2` should have recurring job `backup2` label.
#### Integration test - test_recurring_job_snapshot
Scenario: test recurring job snapshot
Given volume `test-job-1` created, attached, and healthy.
volume `test-job-2` created, attached, and healthy.
When create `snapshot1` recurring job with `default` in groups.
Then should have 1 cron job.
And volume `test-job-1` should have 1 snapshot, the volume-head.
volume `test-job-2` should have 1 snapshot, the volume-head.
When write some data to volume `test-job-1`.
write some data to volume `test-job-2`.
Then volume `test-job-1` should have 2 snapshots after scheduled time.
volume `test-job-2` should have 2 snapshots after scheduled time.
When write some data to volume `test-job-1`.
write some data to volume `test-job-2`.
And wait for `snapshot1` cron job scheduled time.
Then volume `test-job-1` should have 3 snapshots after scheduled time.
volume `test-job-2` should have 3 snapshots after scheduled time.
#### Integration test - test_recurring_job_backup
Scenario: test recurring job backup (S3/NFS)
Given volume `test-job-1` created, attached, and healthy.
volume `test-job-2` created, attached, and healthy.
When create `backup1` recurring job with `default` in groups.
Then should have 1 cron job.
And volume `test-job-1` should have 0 backup.
volume `test-job-2` should have 0 backup.
When write some data to volume `test-job-1`.
write some data to volume `test-job-2`.
And wait for `backup1` cron job scheduled time.
Then volume `test-job-1` should have 1 backup.
volume `test-job-2` should have 1 backup.
When write some data to volume `test-job-1`.
write some data to volume `test-job-2`.
And wait for `backup1` cron job scheduled time.
Then volume `test-job-1` should have 2 backups.
volume `test-job-2` should have 2 backups.
#### Integration test - test_recurring_job_while_volume_detached
Scenario: test recurring job while volume is detached
Given volume `test-job-1` created, and detached.
volume `test-job-2` created, and detached.
And attach volume `test-job-1` and write some data.
attach volume `test-job-2` and write some data.
And detach volume `test-job-1`.
detach volume `test-job-2`.
When create `snapshot1` recurring job running at 1 minute interval,
and with `default` in groups,
and with `retain` set to `2`.
And 1 cron job should be created.
And wait for 2 minutes.
Then attach volume `test-job-1` and wait until healthy.
And volume `test-job-1` should have only 1 snapshot.
When wait for 1 minute.
Then volume `test-job-1` should have only 2 snapshots.
When set setting `allow-recurring-job-while-volume-detached` to `true`.
And wait for 2 minutes.
Then attach volume `test-job-2` and wait until healthy.
And volume `test-job-2` should have only 2 snapshots.
#### Manual test - recurring job skip to create job while volume is detached
Scenario: test recurring job while volume is detached
Given volume `test-job-1` created, and detached.
volume `test-job-2` created, and detached.
When create `snapshot1` recurring job running at 1 minute interval.
And wait until job pod is created and completes.
Then monitor the job pod logs.
And should see `Cannot create job for test-job-1 volume in state detached`.
should see `Cannot create job for test-job-2 volume in state detached`.
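
The skip decision behind these log lines can be sketched as follows. `skipReason` is a hypothetical function, and the state string `detached` and the boolean flag standing in for the `allow-recurring-job-while-volume-detached` setting are assumptions made for illustration. Only the message text is taken from the scenario above.

```go
package main

import "fmt"

// skipReason is a hypothetical helper: it returns the log message for a
// volume whose job must be skipped, or an empty string when the job may run.
// Assumption: allowDetached mirrors the
// `allow-recurring-job-while-volume-detached` setting.
func skipReason(volumeName, state string, allowDetached bool) string {
	if state == "detached" && !allowDetached {
		return fmt.Sprintf("Cannot create job for %s volume in state %s", volumeName, state)
	}
	return ""
}

func main() {
	for _, name := range []string{"test-job-1", "test-job-2"} {
		if msg := skipReason(name, "detached", false); msg != "" {
			fmt.Println(msg)
		}
	}
}
```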
#### Manual test - recurring job upgrade migration
Scenario: test recurring job upgrade migration
Given cluster with Longhorn version prior to v1.2.0.
And storageclass with recurring job `snapshot1`.
And volume `test-job-1` created, and attached.
When upgrade Longhorn to v1.2.0.
Then should have recurring job CR created with format `<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>`.
And volume should be labeled with `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled`.
And `recurringJobs` should be removed from volume spec.
And storageClass in `longhorn-storageclass` configMap should not have `recurringJobs`.
storageClass in `longhorn-storageclass` configMap should have `recurringJobSelector`.
```
recurringJobSelector: '[{"name":"snapshot-1-97893a05-77074ba4","isGroup":false},{"name":"backup-1-954b3c8c-59467025","isGroup":false}]'
```
When create new PVC.
Then volume should be labeled with items in `recurringJobSelector`.
And `recurringJobs` should not exist in volume spec.
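
As a rough illustration of the last check, the sketch below parses a `recurringJobSelector` string and derives the volume labels it implies. The type and function names are hypothetical. The job label prefix comes from this document; the group label prefix `recurring-job-group.longhorn.io/` is an assumption here, since this section only shows job labels.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selector mirrors one entry of the `recurringJobSelector` JSON array.
type selector struct {
	Name    string `json:"name"`
	IsGroup bool   `json:"isGroup"`
}

const (
	jobLabelPrefix = "recurring-job.longhorn.io/"
	// Assumption: group selectors map to this prefix; not shown in this section.
	groupLabelPrefix = "recurring-job-group.longhorn.io/"
)

// labelsFromSelector is a hypothetical helper that converts the selector
// string into the labels expected on a newly provisioned volume.
func labelsFromSelector(raw string) (map[string]string, error) {
	var selectors []selector
	if err := json.Unmarshal([]byte(raw), &selectors); err != nil {
		return nil, fmt.Errorf("parse recurringJobSelector: %w", err)
	}
	labels := make(map[string]string, len(selectors))
	for _, s := range selectors {
		prefix := jobLabelPrefix
		if s.IsGroup {
			prefix = groupLabelPrefix
		}
		labels[prefix+s.Name] = "enabled"
	}
	return labels, nil
}

func main() {
	raw := `[{"name":"snapshot-1-97893a05-77074ba4","isGroup":false},{"name":"backup-1-954b3c8c-59467025","isGroup":false}]`
	labels, err := labelsFromSelector(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for key, value := range labels {
		fmt.Printf("%s: %s\n", key, value)
	}
}
```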
#### Manual test - snapshot concurrency
Scenario: test recurring job concurrency
Given create `snapshot1` recurring job with `concurrency` set to `2`.
include `default` in `snapshot1` recurring job groups.
When create volume `test-job-1`.
create volume `test-job-2`.
create volume `test-job-3`.
create volume `test-job-4`.
create volume `test-job-5`.
Then monitor the cron job pod log.
And should see 2 jobs created concurrently.
When update `snapshot1` recurring job with `concurrency` set to `3`.
Then monitor the cron job pod log.
And should see 3 jobs created concurrently.
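
The behaviour this scenario looks for, at most `concurrency` volume jobs in flight at once, can be sketched with a buffered channel used as a semaphore. This is an illustrative pattern under assumed names (`runWithConcurrency`, the placeholder job body), not the code the cron job pod runs.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runWithConcurrency is a hypothetical helper: it runs `job` once per volume
// with at most `concurrency` jobs in flight, and returns the peak number of
// jobs observed running at the same time.
func runWithConcurrency(volumes []string, concurrency int, job func(string)) int {
	sem := make(chan struct{}, concurrency)
	var (
		wg            sync.WaitGroup
		mu            sync.Mutex
		running, peak int
	)
	for _, v := range volumes {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` jobs are already running
		go func(volume string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot after the job finishes
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			job(volume)

			mu.Lock()
			running--
			mu.Unlock()
		}(v)
	}
	wg.Wait()
	return peak
}

func main() {
	volumes := []string{"test-job-1", "test-job-2", "test-job-3", "test-job-4", "test-job-5"}
	peak := runWithConcurrency(volumes, 2, func(volume string) {
		time.Sleep(10 * time.Millisecond) // placeholder for the snapshot work
	})
	fmt.Println("peak concurrent jobs:", peak)
}
```

Because a slot is acquired before each goroutine starts and released only after its counter is decremented, the peak can never exceed `concurrency`.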
### Upgrade strategy
#### Automated Migration
1. Create `v110to120/upgrade.go`.
2. Translate `storageClass` `recurringJobs` to `recurringJobSelector`.
   1. Convert the `recurringJobs` to `recurringJobSelector` object.
      ```
      {
          Name: <jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>,
          IsGroup: false,
      }
      ```
   2. Add `recurringJobSelector` to `longhorn-storageclass` configMap.
   3. Remove `recurringJobs` in configMap.
   4. Update configMap.
      ```
      parameters:
        fromBackup: ""
        numberOfReplicas: "3"
        recurringJobSelector: '[{"name":"snapshot-1-97893a05-77074ba4","isGroup":false},{"name":"backup-1-954b3c8c-59467025","isGroup":false}]'
        staleReplicaTimeout: "2880"
      provisioner: driver.longhorn.io
      ```
3. Translate volume spec `recurringJobs` to volume labels.
   1. List all volumes and their spec `recurringJobs`, and create labels in format `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled`.
   2. Update volume labels and remove volume spec `recurringJobs`.
      ```
      labels:
        longhornvolume: pvc-d37caaed-5cda-43b1-ae49-9d0490ffb3db
        recurring-job.longhorn.io/backup-1-954b3c8c-59467025: enabled
        recurring-job.longhorn.io/snapshot-1-97893a05-77074ba4: enabled
      ```
4. Translate volume spec `recurringJobs` to recurringJob CRs.
   1. Gather the recurring jobs from `recurringJobSelector` and volume labels.
   2. Create recurringJob CRs.
      ```
      NAME                           GROUPS   TASK       CRON          RETAIN   CONCURRENCY   AGE   LABELS
      snapshot-1-97893a05-77074ba4            snapshot   */1 * * * *   1        10            13m
      backup-1-954b3c8c-59467025              backup     */2 * * * *   1        10            13m   {"interval":"2m"}
      ```
5. Clean up applied volume cron jobs.
   1. Get all applied cron jobs for volumes.
   2. Delete cron jobs.
> The migration translates each existing volume recurring job to a label with format `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled`. The name maps to the recurring job CR `<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>`.
> The hash segments can look random, and the resulting name also differs from the recurring job CR name created by the StorageClass - `recurring-job.longhorn.io/<name>: enabled`. This is because there is no information to determine whether the volume spec `recurringJob` comes from a `storageClass`, or from which `storageClass`. This behavior should be noted in the documentation to lessen the confusion unless there is a better solution.
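
The translation steps above can be sketched roughly as below. All type and function names are hypothetical, and the hash function is an assumption: this document does not say which hash the migration uses, so 32-bit FNV-1a rendered as 8 hex characters stands in here and will not necessarily reproduce the sample names shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// legacyJob is a hypothetical stand-in for one volume spec `recurringJobs` entry.
type legacyJob struct {
	Task   string
	Cron   string
	Retain int
	Labels map[string]string
}

// selector mirrors one entry of the `recurringJobSelector` JSON array.
type selector struct {
	Name    string `json:"name"`
	IsGroup bool   `json:"isGroup"`
}

// shortHash is an assumed stand-in (32-bit FNV-1a) for the unspecified hash.
func shortHash(s string) string {
	h := fnv.New32a()
	h.Write([]byte(s)) // hash.Hash writes never return an error
	return fmt.Sprintf("%08x", h.Sum32())
}

// migratedName builds <jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>.
func migratedName(j legacyJob) (string, error) {
	labelJSON, err := json.Marshal(j.Labels)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%d-%s-%s", j.Task, j.Retain,
		shortHash(j.Cron), shortHash(string(labelJSON))), nil
}

// migrate returns the selector string for the StorageClass and the labels
// for the volume, both derived from the same generated names.
func migrate(jobs []legacyJob) (string, map[string]string, error) {
	selectors := make([]selector, 0, len(jobs))
	labels := make(map[string]string, len(jobs))
	for _, j := range jobs {
		name, err := migratedName(j)
		if err != nil {
			return "", nil, err
		}
		selectors = append(selectors, selector{Name: name, IsGroup: false})
		labels["recurring-job.longhorn.io/"+name] = "enabled"
	}
	raw, err := json.Marshal(selectors)
	if err != nil {
		return "", nil, err
	}
	return string(raw), labels, nil
}

func main() {
	jobs := []legacyJob{
		{Task: "snapshot", Cron: "*/1 * * * *", Retain: 1},
		{Task: "backup", Cron: "*/2 * * * *", Retain: 1, Labels: map[string]string{"interval": "2m"}},
	}
	selectorJSON, labels, err := migrate(jobs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("recurringJobSelector:", selectorJSON)
	for key, value := range labels {
		fmt.Printf("%s: %s\n", key, value)
	}
}
```

Deriving the CR name, the selector entry, and the volume label from one function keeps the three in sync, which is what lets the label resolve to the CR after the upgrade.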
#### Manual
After the migration, the `<hash(jobCron)>-<hash(jobLabelJSON)>` part of the volume label and recurring job name could look random and confusing. Users might want to rename the job to something more meaningful. Currently, the only way to do so is to create a new recurring job CR and replace the volume label.
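
The label replacement half of that manual step can be sketched as follows. `relabel` and the new name `snapshot-every-minute` are hypothetical, and the sketch assumes a recurring job CR with the new name and the same settings has already been created.

```go
package main

import "fmt"

const jobLabelPrefix = "recurring-job.longhorn.io/"

// relabel is a hypothetical helper: it returns a copy of the volume labels
// in which the job label for oldName is replaced by one for newName.
// Labels are left unchanged when the old job label is not present.
func relabel(labels map[string]string, oldName, newName string) map[string]string {
	out := make(map[string]string, len(labels))
	for key, value := range labels {
		out[key] = value
	}
	if value, ok := out[jobLabelPrefix+oldName]; ok {
		delete(out, jobLabelPrefix+oldName)
		out[jobLabelPrefix+newName] = value
	}
	return out
}

func main() {
	labels := map[string]string{
		"longhornvolume": "pvc-d37caaed-5cda-43b1-ae49-9d0490ffb3db",
		"recurring-job.longhorn.io/snapshot-1-97893a05-77074ba4": "enabled",
	}
	// "snapshot-every-minute" is an illustrative name, not one from this document.
	updated := relabel(labels, "snapshot-1-97893a05-77074ba4", "snapshot-every-minute")
	for key, value := range updated {
		fmt.Printf("%s: %s\n", key, value)
	}
}
```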
## Note [optional]
`None`