Replace volume spec `recurringJobs` with the label-driven model. Abstract volume recurring jobs to a new CRD named "RecurringJob". The names or groups of recurring jobs can be referenced in volume labels.
Users can add a recurring job to the `default` group, and Longhorn will automatically apply it to any volume that has no job labels.
Only one cron job will be created per recurring job. Users can also update the recurring job, and Longhorn reflects the changes to the associated cron job.
When users instruct Longhorn to delete a recurring job, the associated cron job is also removed.
The StorageClass should now use `recurringJobSelector` to refer to recurring job names.
During the version upgrade, existing volume spec `recurringJobs` and StorageClass `recurringJobs` are automatically translated to volume labels, and the corresponding recurring job CRs are created.
### Related Issues
https://github.com/longhorn/longhorn/issues/467
## Motivation
### Goals
Phase 1:
- Each recurring job can be in one or more groups.
- Each group can have one or more recurring jobs.
- The jobs and groups can be referenced with the volume label.
- The recurring job in the `default` group will automatically apply to a volume that has no job labels.
- Users can create multiple recurring jobs with the UI or YAML.
- The recurring job should include the settings `name`, `groups`, `task`, `cron`, `retain`, `concurrency`, and `labels`.
Phase 2:
The StorageClass and upgrade migration still depend on the volume spec, so complete removal of the volume spec `recurringJobs` is deferred to phase 2.
### Non-goals
1. Running the snapshot/backup operations one by one. In the future, the operation order could be defined as sequential, consistent (for volume group snapshot), or throttled (with a concurrency number as a parameter).
#### Story 1 - set recurring jobs and groups by volume labels
As a Longhorn user / System administrator.
I want to directly update recurring jobs referenced in multiple volumes.
So I do not need to update each volume with the cron job definition.
#### Story 2 - automatically apply the default recurring jobs
As a Longhorn user / System administrator.
I want the ability to set one or multiple `backup` and `snapshot` recurring jobs as default. The default recurring jobs should automatically apply to all volumes without any recurring job label.
So I can be assured that every volume without a recurring job label is covered by the default recurring jobs.
#### Story 3 - automatic upgrade migration
As a Longhorn user / System administrator.
I want Longhorn to automatically convert existing volume spec `recurringJobs` to volume labels, and create the associated recurring job CRs.
So I don't have to manually create recurring job CRs and patch the labels.
### User Experience In Detail
#### Story 1 - set recurring job in volume
1. Create a recurring job on UI `Recurring Job` page or via `kubectl`.
2. In the UI, navigate to Volume, `Recurring Jobs Schedule`.
3. The user can choose between the `Job` and job `Group` tabs.
   - On the `Job` tab:
     1. The user sees the existing recurring jobs that the volume is labeled with.
     2. The user can select `Backup` or `Snapshot` for the `Type` from the drop-down list.
     3. The user can edit the `Schedule`, `Retain`, `Concurrency` and `Labels`.
   - On the job `Group` tab:
     1. The user sees all existing recurring job groups in the `Name` drop-down list.
     2. The user selects a group from the drop-down list.
     3. The user sees all recurring `jobs` under the `group`.
4. Clicking `Save` updates the volume label.
5. Updates to the recurring job CRs are also reflected in the cron job and the UI `Recurring Jobs Schedule`.
**Before enhancement**
Recurring jobs can only be added and updated per volume spec.
Cron job definitions must be created for each volume, causing duplicated setup effort.
The recurring job can only be updated per volume.
**After enhancement**
Recurring jobs can be added and updated through volume labels.
Users can select a recurring job from the UI drop-down menu, which automatically shows the information from the recurring job CRs.
Updating the recurring job definition automatically applies to all volumes with the job label.
#### Story 2 - automatically apply the default recurring jobs
1. Add `default` to one or multiple recurring jobs `Groups` in UI or `kubectl`.
2. Longhorn automatically applies the `default` group recurring jobs to all volumes without job labels.
**Before enhancement**
Default recurring jobs are set via StorageClass at PVC creation only. No default recurring job can be set up for UI-created volumes.
Updating the StorageClass is not reflected on existing volumes.
**After enhancement**
Users have the option to set default recurring jobs via `StorageClass` or `RecurringJob`.
The Longhorn recurring job controller automatically applies default recurring jobs to all volumes without job labels.
Longhorn adds the default recurring jobs when all job labels are removed from the volume.
When the `RecurringJobSelector` is set in the `StorageClass`, it will be used as default instead.
#### Story 3 - automatic upgrade migration
1. Perform upgrade.
2. StorageClass `recurringJobs` will be converted to `recurringJobSelector`.
3. Recurring job CRs will get created from `recurringJobs`.
4. Volume will be labeled with `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled` from volume spec `recurringJobs`.
5. Recurring job CRs will get created from volume spec `recurringJobs`. When the config is identical among multiple volumes, only one will get created and volumes will share this recurring job CR.
6. Volume spec `recurringJobs` will get removed.
### API Changes
- Add new HTTP endpoints:
  - GET `/v1/recurringjobs` to list recurring jobs.
  - GET `/v1/recurringjobs/{name}` to get a specific recurring job.
  - DELETE `/v1/recurringjobs/{name}` to delete a specific recurring job.
  - POST `/v1/recurringjobs` to create a recurring job.
  - PUT `/v1/recurringjobs/{name}` to update a specific recurring job.
  - `/v1/ws/recurringjobs` and `/v1/ws/{period}/recurringjobs` for websocket stream.
- Add new RESTful APIs for the new `RecurringJob` CRD:
  - `Create`
  - `Update`
  - `List`
  - `Get`
  - `Delete`
- Add new APIs for users to update recurring jobs for an individual volume:
  - `/v1/volumes/<VOLUME_NAME>?action=recurringJobAdd`, which expects a request body of the form `{name:<name>, isGroup:<bool>}`.
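As a rough illustration of the per-volume action, the following Python sketch builds the URL and JSON body for `recurringJobAdd`. Python is used only for brevity; the helper name `recurring_job_add_request`, the base URL, and the use of `POST` are assumptions, since the proposal specifies only the path and the body fields.

```python
import json
from urllib.parse import quote


def recurring_job_add_request(base_url, volume_name, name, is_group):
    """Build (method, url, body) for the recurringJobAdd volume action.

    The POST method is an assumption; the proposal only gives the path
    and the {name, isGroup} body shape.
    """
    url = "{}/v1/volumes/{}?action=recurringJobAdd".format(
        base_url.rstrip("/"), quote(volume_name, safe="")
    )
    body = json.dumps({"name": name, "isGroup": is_group})
    return "POST", url, body
```

A caller would pass the returned values to whatever HTTP client it already uses; setting `is_group` to `True` references a job group instead of a single job.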
- The `Name`: String used to reference the recurring job in a volume with the label `recurring-job.longhorn.io/<Name>: enabled`.
- The `Groups`: Array of strings that assign the recurring job to groups. Each group is referenced in a volume with the label `recurring-job-group.longhorn.io/<Name>: enabled`. When the groups include `default`, the recurring job is added to the volume label if no other job exists in the volume label.
- The `Task`: String, either `backup` or `snapshot`. The CRD YAML also validates this value with a regex pattern match.
- The `Cron`: String holding the cron expression that defines the recurring job schedule.
- The `Retain`: Integer number of snapshots/backups to keep for the volume.
- The `Concurrency`: Integer number of jobs to run concurrently by each cron job.
- The `Age`: Date of the CR creation timestamp.
- The `Labels`: Dictionary of the labels.
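The fields above can be pictured as a small data structure. The sketch below is a Python model for illustration only, not the actual CRD type: the class name `RecurringJobSpec`, the exact regex, and the bounds on `retain` and `concurrency` are assumptions, and `Age` is omitted because it is derived from the CR creation timestamp.

```python
import re
from dataclasses import dataclass, field

JOB_LABEL_PREFIX = "recurring-job.longhorn.io/"
GROUP_LABEL_PREFIX = "recurring-job-group.longhorn.io/"
# Assumed pattern; the proposal only says the task is validated by regex.
TASK_PATTERN = re.compile(r"backup|snapshot")


@dataclass
class RecurringJobSpec:
    name: str
    task: str
    cron: str
    retain: int
    concurrency: int = 1
    groups: list = field(default_factory=list)
    labels: dict = field(default_factory=dict)

    def validate(self):
        """Raise ValueError if a field is outside the assumed limits."""
        if not TASK_PATTERN.fullmatch(self.task):
            raise ValueError("task must be backup or snapshot")
        if self.retain < 0:
            raise ValueError("retain must not be negative")
        if self.concurrency < 1:
            raise ValueError("concurrency must be at least 1")

    def job_label(self):
        """Volume label that references this job by name."""
        return {JOB_LABEL_PREFIX + self.name: "enabled"}

    def group_labels(self):
        """Volume labels that reference each group of this job."""
        return {GROUP_LABEL_PREFIX + g: "enabled" for g in self.groups}
```

For example, a job named `nightly` in the `default` group maps to the volume labels `recurring-job.longhorn.io/nightly: enabled` and `recurring-job-group.longhorn.io/default: enabled`.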
#### Add Command `recurring-job` To `longhorn-manager` Binary
1. Add new command `recurring-job <job.name> --manager-url <URL>` and remove old command `snapshot`.
> Get the `recurringJob.Spec` on execution using Kubernetes API.
2. Get volumes by label selector `recurring-job.longhorn.io/<job.name>: enabled` to select the volumes that reference the job.
3. Get volumes by label selector `recurring-job-group.longhorn.io/<job.group>: enabled` to select the volumes that reference the group, if the job is set with a group.
4. Filter and create a list of the volumes that are in the `attached` state, or of all selected volumes when the setting `allow-recurring-job-while-volume-detached` is enabled.
5. Use the concurrency parameter to throttle goroutines with a channel. Each goroutine creates `NewJob()` and `job.run()` for the volumes.
The job `snapshotName` format will be `<job.name>-c-<RandomID>`.
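The flow of the command can be sketched as follows. This is an illustrative Python model rather than the Go implementation: a thread pool stands in for goroutines throttled by a channel, volumes are plain dictionaries, the `action` callback stands in for `NewJob()` and `job.run()`, and the 8-character hex `RandomID` is an assumption.

```python
import secrets
from concurrent.futures import ThreadPoolExecutor

JOB_LABEL = "recurring-job.longhorn.io/{}"
GROUP_LABEL = "recurring-job-group.longhorn.io/{}"


def select_volumes(volumes, job_name, job_groups, allow_detached=False):
    """Pick volumes labeled with the job or one of its groups."""
    keys = {JOB_LABEL.format(job_name)}
    keys.update(GROUP_LABEL.format(g) for g in job_groups)
    selected = []
    for vol in volumes:
        labels = vol.get("labels", {})
        if not any(labels.get(k) == "enabled" for k in keys):
            continue
        # Detached volumes run only when the setting allows it.
        if vol.get("state") == "attached" or allow_detached:
            selected.append(vol)
    return selected


def snapshot_name(job_name):
    """Follow the <job.name>-c-<RandomID> format; ID format is assumed."""
    return "{}-c-{}".format(job_name, secrets.token_hex(4))


def run_job(volumes, job_name, concurrency, action):
    """Run action(volume_name, snapshot_name) with bounded parallelism."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(action, vol["name"], snapshot_name(job_name))
            for vol in volumes
        ]
        return [f.result() for f in futures]
```

Bounding the worker count keeps at most `concurrency` snapshot or backup operations in flight for one cron job run, which is the effect the channel-based throttle is meant to achieve.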
#### Changes In The Volume Controller
1. The `updateRecurringJobs` method is responsible for adding the default label if no other labels exist.
> Since the storage class and upgrade migration contain the `recurringJobs` spec, we will keep `VolumeSpec.RecurringJobs` in the code to create the recurring jobs for volumes from the `storageClass`.
> In case names are duplicated between different `storageClasses`, only one recurring job CR will be created.
#### Changes In The VolumeManager `CreateVolume`
- Add new method input `recurringJobSelector`:
1. Convert `Volume.Spec.RecurringJobs` to `recurringJobSelector`.
2. Add recurring job label if `recurringJobSelector` method input is not empty.
#### Changes In The Datastore
- For `CreateVolume` and `UpdateVolume` add a function similar to `fixupMetadata` that handles recurring jobs:
1. Add recurring job labels if `Volume.Spec.RecurringJobs` is not empty. Then unset `Volume.Spec.RecurringJobs`.
2. Label with `default` job-group if no other recurring job label exists.
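A minimal sketch of such a fix-up step is shown below in Python, for illustration only. The function name `fixup_recurring_job_labels` and the dictionary shapes are hypothetical; the two rules it applies are the ones listed above.

```python
JOB_PREFIX = "recurring-job.longhorn.io/"
GROUP_PREFIX = "recurring-job-group.longhorn.io/"
DEFAULT_GROUP_LABEL = GROUP_PREFIX + "default"


def fixup_recurring_job_labels(labels, spec_jobs):
    """Return (new_labels, new_spec_jobs) for a volume.

    Jobs still listed in the volume spec become job labels and the spec
    list is emptied; a volume left without any job or group label gets
    the default group label.
    """
    result = dict(labels)
    for job in spec_jobs:
        result[JOB_PREFIX + job["name"]] = "enabled"
    has_job_label = any(
        key.startswith((JOB_PREFIX, GROUP_PREFIX)) for key in result
    )
    if not has_job_label:
        result[DEFAULT_GROUP_LABEL] = "enabled"
    return result, []
```

Unrelated labels on the volume are left untouched, so a volume carrying only application labels still receives the `default` group label.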
#### Introduce `recurringJobSelector` As Part Of StorageClass Parameters
- The CSI controller can use `recurringJobSelector` for volume creation.
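To illustrate how a selector could turn into volume labels, the Python sketch below parses a `recurringJobSelector` parameter. It assumes the parameter is a JSON array of `{name, isGroup}` entries, mirroring the body of the `recurringJobAdd` action; the proposal does not spell out the parameter encoding, and the function name is hypothetical.

```python
import json

JOB_PREFIX = "recurring-job.longhorn.io/"
GROUP_PREFIX = "recurring-job-group.longhorn.io/"


def labels_from_selector_parameter(parameters):
    """Map a StorageClass recurringJobSelector to volume labels.

    Assumes a JSON array such as
    [{"name": "nightly", "isGroup": false}].
    """
    raw = parameters.get("recurringJobSelector", "")
    if not raw:
        return {}
    labels = {}
    for entry in json.loads(raw):
        prefix = GROUP_PREFIX if entry.get("isGroup", False) else JOB_PREFIX
        labels[prefix + entry["name"]] = "enabled"
    return labels
```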
#### Changes In CSI Controller Server
1. Put `recurringJobSelector` to `vol.RecurringJobSelector` at HTTP API layer to use for adding volume recurring job label in `VolumeManager.CreateVolume`. The `CreateVolume` method will have a new input `recurringJobSelector`.
2. Get `recurringJobs` from parameters, validate them, and create recurring job CRs via the API if they do not already exist.
#### Add Recurring Job Controller
- The code structure will be the same as other controllers.
- Add the finalizer to the recurring job CRs if it does not exist.
- The controller will be informed by `recurringJobInformer` and `enqueueRecurringJob`.
- Create and update CronJob per recurring job.
  1. Generate a new cron job object.
     - Include labels `recurring-job.longhorn.io`.
       ```
       recurring-job.longhorn.io: <Name>
       ```
     - Compose command,
       ```
       longhorn-manager -d \
         recurring-job <job.name> \
         --manager-url <url>
       ```
  2. Create a new cron job with the annotation `last-applied-cronjob-spec`, or update the cron job if the new cron job spec is different from the `last-applied-cronjob-spec`.
- Use defer to clean up CronJob.
  1. When a recurring job gets deleted.
     1. Delete the cron job with selected labels: `recurring-job.longhorn.io/<Name>`.
     2. Remove the finalizer.
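The create-or-update decision can be sketched as a comparison against the `last-applied-cronjob-spec` annotation. The Python below is an illustration under simplifying assumptions: the cron job is a plain dictionary rather than a Kubernetes `CronJob` object, the spec is serialized as sorted JSON, and the function names are hypothetical.

```python
import json

LAST_APPLIED = "last-applied-cronjob-spec"


def build_cron_job(job_name, cron, manager_url):
    """Build a simplified cron job description for one recurring job."""
    return {
        "metadata": {
            "name": job_name,
            "labels": {"recurring-job.longhorn.io": job_name},
        },
        "spec": {
            "schedule": cron,
            "command": [
                "longhorn-manager", "-d",
                "recurring-job", job_name,
                "--manager-url", manager_url,
            ],
        },
    }


def reconcile(existing, desired):
    """Return (action, cron_job) where action is create, update or noop."""
    desired_spec = json.dumps(desired["spec"], sort_keys=True)
    if existing is None:
        action = "create"
    else:
        annotations = existing["metadata"].get("annotations", {})
        if annotations.get(LAST_APPLIED) == desired_spec:
            return "noop", existing
        action = "update"
    metadata = dict(desired["metadata"])
    # Record what was applied so the next pass can detect changes.
    metadata["annotations"] = {LAST_APPLIED: desired_spec}
    return action, {"metadata": metadata, "spec": desired["spec"]}
```

Comparing against the recorded annotation, rather than the live object, avoids treating fields defaulted by the API server as a change on every reconcile.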
#### UI
##### Add New Page `Recurring Job` In UI
A new page for `Recurring Job` to create/update/delete recurring jobs.
3. Translate volume spec `recurringJobs` to volume labels.
1. List all volumes and their spec `recurringJobs`, and create labels in the format `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled`.
2. Update volume labels and remove volume spec `recurringJobs`.
> The migration translates each existing volume recurring job into a label of the form `recurring-job.longhorn.io/<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>: enabled`. The name maps to the recurring job CR `<jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>`.
> The hash values could look random, and the name also differs from the recurring job CR name created by the StorageClass, `recurring-job.longhorn.io/<name>: enabled`. This is because there is no information to determine whether a volume spec `recurringJob` came from a `storageClass`, or from which one. This behavior should be noted in the documentation to lessen confusion, unless there is a better solution.
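The naming scheme and the sharing of one CR among identical configurations can be sketched as below. This Python is illustrative only: the hash function (first 8 hex characters of SHA-256) and the dictionary shapes are assumptions, since the proposal does not name the hash it uses.

```python
import hashlib
import json

LABEL_PREFIX = "recurring-job.longhorn.io/"


def short_hash(text):
    # Assumed hash; the proposal does not specify the function or length.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def migrated_job_name(task, retain, cron, labels):
    """Compose <jobTask>-<jobRetain>-<hash(jobCron)>-<hash(jobLabelJSON)>."""
    label_json = json.dumps(labels, sort_keys=True)
    return "{}-{}-{}-{}".format(
        task, retain, short_hash(cron), short_hash(label_json)
    )


def migrate(volumes):
    """Translate volume spec recurringJobs into labels and shared CRs."""
    job_crs = {}
    migrated = []
    for vol in volumes:
        labels = dict(vol.get("labels", {}))
        for job in vol.get("recurringJobs", []):
            name = migrated_job_name(
                job["task"], job["retain"], job["cron"], job.get("labels", {})
            )
            labels[LABEL_PREFIX + name] = "enabled"
            # Identical configurations hash to the same name, so only
            # the first occurrence creates a CR.
            job_crs.setdefault(name, dict(job, name=name))
        # The migrated volume no longer carries spec recurringJobs.
        migrated.append({"name": vol["name"], "labels": labels})
    return migrated, job_crs
```

Because the name is derived purely from the job configuration, two volumes with the same task, retain count, cron expression, and labels end up referencing the same recurring job CR.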
#### Manual
After the migration, the `<hash(jobCron)>-<hash(jobLabelJSON)>` in volume label and recurring job name could look random and confusing. Users might want to rename it to something more meaningful. Currently, the only way is to create a new recurring job CR and replace the volume label.