longhorn/enhancements/20230103-recurring-snapshot-cleanup.md
Chin-Ya Huang 9c1c474dc2 feat(lep): recurring snapshot delete design
Ref: 3836

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
2023-01-11 18:31:15 +08:00

Recurring Snapshot Cleanup

Summary

Currently, Longhorn's recurring job automatically cleans up older snapshots of a volume so that no more than the configured number of snapshots is retained. However, this cleanup only covers snapshots created by the recurring job itself. Snapshots created manually or by backups must be cleaned up by the user.

A periodic snapshot cleanup would delete and purge those extra snapshots regardless of how they were created.

https://github.com/longhorn/longhorn/issues/3836

Motivation

Goals

Introduce new recurring job types:

  • snapshot-delete: periodically remove and purge all kinds of snapshots that exceed the retention count.
  • snapshot-cleanup: periodically purge removable or system snapshots.

Non-goals [optional]

None

Proposal

  • Introduce two new RecurringJobType values:
    • snapshot-delete
    • snapshot-cleanup
  • A RecurringJob with the snapshot-delete task type periodically deletes and purges snapshots. Longhorn retains snapshots according to the given retain number.
  • A RecurringJob with the snapshot-cleanup task type periodically purges snapshots.

User Stories

  • The user can create a RecurringJob with spec.task=snapshot-delete to instruct Longhorn to periodically delete and purge snapshots.
  • The user can create a RecurringJob with spec.task=snapshot-cleanup to instruct Longhorn to periodically purge removable or system snapshots.

User Experience In Detail

Recurring Snapshot Deletion

  1. Have some volume backups and snapshots.
  2. Create a RecurringJob with the snapshot-delete task type.
    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: recurring-snap-delete-per-min
      namespace: longhorn-system
    spec:
      concurrency: 1
      cron: '* * * * *'
      groups: []
      labels: {}
      name: recurring-snap-delete-per-min
      retain: 2
      task: snapshot-delete
    
  3. Assign the RecurringJob to the volume.
  4. Longhorn deletes all expired snapshots. With retain: 2 in the example above, the user sees two snapshots after the job completes.

Recurring Snapshot Cleanup

  1. Have some system snapshots.
  2. Create a RecurringJob with the snapshot-cleanup task type.
    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: recurring-snap-cleanup-per-min
      namespace: longhorn-system
    spec:
      concurrency: 1
      cron: '* * * * *'
      groups: []
      labels: {}
      name: recurring-snap-cleanup-per-min
      task: snapshot-cleanup
    
  3. Assign the RecurringJob to the volume.
  4. Longhorn purges the removable and system snapshots. With the example above, the user sees 0 system snapshots after the job completes.

API changes

None

Design

Implementation Overview

The RecurringJob snapshot-delete Task Type

  1. List all expired snapshots (similar to the current listSnapshotNamesForCleanup implementation) and use them as the cleanupSnapshotNames in doSnapshotCleanup.
  2. Continue with the current implementation to purge snapshots.
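
The selection in step 1 can be sketched as follows. This is a minimal illustration, not the longhorn-manager code: the Snapshot type, the expiredSnapshotNames function, and the exampleSnapshots helper are hypothetical names, and the sketch assumes that "expired" means the oldest snapshots beyond the retain count, regardless of how they were created.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Snapshot is a hypothetical, simplified stand-in for a volume snapshot.
type Snapshot struct {
	Name    string
	Created time.Time
}

// expiredSnapshotNames returns the names of the oldest snapshots that exceed
// the retain count, oldest first. It returns nil when nothing is expired.
func expiredSnapshotNames(snapshots []Snapshot, retain int) []string {
	if retain < 0 {
		retain = 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]Snapshot, len(snapshots))
	copy(sorted, snapshots)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Created.Before(sorted[j].Created)
	})

	excess := len(sorted) - retain
	if excess <= 0 {
		return nil
	}
	names := make([]string, 0, excess)
	for _, s := range sorted[:excess] {
		names = append(names, s.Name)
	}
	return names
}

// exampleSnapshots builds four snapshots from different (assumed) sources.
func exampleSnapshots() []Snapshot {
	base := time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC)
	return []Snapshot{
		{Name: "recurring-1", Created: base.Add(2 * time.Minute)},
		{Name: "manual-1", Created: base},
		{Name: "recurring-2", Created: base.Add(3 * time.Minute)},
		{Name: "backup-1", Created: base.Add(1 * time.Minute)},
	}
}

func main() {
	// With retain 2, the two oldest snapshots are selected for deletion.
	fmt.Println(expiredSnapshotNames(exampleSnapshots(), 2)) // prints [manual-1 backup-1]
}
```

In this sketch the returned names play the role of cleanupSnapshotNames; the existing purge step then runs unchanged.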

The RecurringJob snapshot-cleanup Task Type

  1. Perform only the snapshot purge in doSnapshotCleanup.
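
To illustrate which snapshots a purge-only run is expected to target, here is a hedged sketch. The actual purge is carried out by the existing implementation; the SnapshotInfo type, its UserCreated and Removed fields, and the purgeCandidates function are assumptions made for this example, based on the stated goal of purging removable or system snapshots.

```go
package main

import "fmt"

// SnapshotInfo is a hypothetical, simplified view of a snapshot.
type SnapshotInfo struct {
	Name        string
	UserCreated bool // assumed false for system snapshots
	Removed     bool // assumed true once the snapshot is marked as removed
}

// purgeCandidates returns the names of snapshots that are either marked as
// removed or were generated by the system, in their original order.
func purgeCandidates(snapshots []SnapshotInfo) []string {
	var names []string
	for _, s := range snapshots {
		if s.Removed || !s.UserCreated {
			names = append(names, s.Name)
		}
	}
	return names
}

// examplePurgeInput mixes user, removed, and system snapshots.
func examplePurgeInput() []SnapshotInfo {
	return []SnapshotInfo{
		{Name: "user-1", UserCreated: true},
		{Name: "user-removed", UserCreated: true, Removed: true},
		{Name: "system-1", UserCreated: false},
	}
}

func main() {
	fmt.Println(purgeCandidates(examplePurgeInput())) // prints [user-removed system-1]
}
```

Note that the retain value does not appear anywhere in this selection, which is why it is irrelevant for the snapshot-cleanup task type.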

RecurringJob Mutate

  1. Mutate RecurringJob.Spec.Retain to 0 when the task type is snapshot-cleanup, since the retain value has no effect on the purge.
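
A minimal sketch of this mutation, assuming a simplified spec type. RecurringJobSpec here is a hypothetical subset of the real spec, and mutateRetain is an illustrative name, not the actual webhook function.

```go
package main

import "fmt"

// taskSnapshotCleanup is the task type value introduced by this proposal.
const taskSnapshotCleanup = "snapshot-cleanup"

// RecurringJobSpec is a hypothetical subset of the RecurringJob spec.
type RecurringJobSpec struct {
	Name   string
	Task   string
	Retain int
}

// mutateRetain returns a copy of spec with Retain forced to 0 when the task
// type is snapshot-cleanup. Other task types are returned unchanged.
func mutateRetain(spec RecurringJobSpec) RecurringJobSpec {
	if spec.Task == taskSnapshotCleanup {
		spec.Retain = 0
	}
	return spec
}

func main() {
	cleanup := mutateRetain(RecurringJobSpec{
		Name: "recurring-snap-cleanup-per-min", Task: "snapshot-cleanup", Retain: 5,
	})
	del := mutateRetain(RecurringJobSpec{
		Name: "recurring-snap-delete-per-min", Task: "snapshot-delete", Retain: 2,
	})
	fmt.Println(cleanup.Retain, del.Retain) // prints 0 2
}
```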

Test plan

Test Recurring Snapshot Delete

  1. Create a volume.
  2. Create 2 volume backups.
  3. Create 2 volume snapshots.
  4. Create a RecurringJob with the snapshot-delete task type.
  5. Assign the RecurringJob to the volume.
  6. Wait until the recurring job is completed.
  7. The number of snapshots should match the RecurringJob spec.retain.

Test Recurring Snapshot Cleanup

  1. Create a volume.
  2. Create 2 system snapshots of the volume, e.g. by deleting a replica or by online expansion.
  3. Create a RecurringJob with the snapshot-cleanup task type.
  4. Assign the RecurringJob to the volume.
  5. Wait until the recurring job is completed.
  6. The volume should have 0 system snapshots.

Upgrade strategy

None

Note [optional]

None