Debugging a never-ending K10 backup job with the Longhorn CSI driver
Description:
A K10 backup job that uses the Longhorn CSI driver never finishes, even after the CSI snapshot components (the snapshot CRDs and the snapshot controller) have been installed correctly.
Error:
Usually we don't see any errors for this issue. Instead, the job keeps waiting for the VolumeSnapshot object in Kubernetes to become readyToUse.
If we query the VolumeSnapshot, it shows "readyToUse: false" (the READYTOUSE column below):
$ kubectl get volumesnapshot k10-csi-snap-h5vglbt6b7fr6dnd -n postgresql
NAME                            READYTOUSE   SOURCEPVC                    SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
k10-csi-snap-h5vglbt6b7fr6dnd   false        data-postgres-postgresql-0                                         longhorn        snapcontent-51725945-bed7-42cb-8c4c-fde992f5553c                  53m
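To spot stuck snapshots across the cluster, or to wait for one with an upper bound instead of indefinitely, the helpers below may be useful. This is a minimal sketch: the function names are hypothetical, and it assumes kubectl access plus the snapshot.storage.k8s.io CRDs. It only reads the standard status.readyToUse field of each VolumeSnapshot.

```shell
# Sketch: helpers for spotting VolumeSnapshots that never become ready.
# Hypothetical function names; requires kubectl access and the
# snapshot.storage.k8s.io CRDs. Works in bash or another POSIX shell.

# filter_unready reads "NAMESPACE NAME READY" lines on stdin and prints
# namespace/name for every snapshot whose READY field is not "true".
# kubectl prints <none> when status.readyToUse has not been set yet.
filter_unready() {
  awk 'NF >= 3 && $3 != "true" { print $1 "/" $2 }'
}

# list_unready_snapshots lists unready VolumeSnapshots in all namespaces.
list_unready_snapshots() {
  kubectl get volumesnapshot --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,READY:.status.readyToUse' \
    | filter_unready
}

# snapshot_ready NAMESPACE NAME succeeds only if status.readyToUse is true.
snapshot_ready() {
  [ "$(kubectl get volumesnapshot "$2" -n "$1" \
        -o jsonpath='{.status.readyToUse}')" = "true" ]
}

# poll_until TIMEOUT_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds; returns 0 as soon as CMD succeeds,
# or 1 once TIMEOUT seconds have passed without success.
poll_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + $timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, `poll_until 600 10 snapshot_ready postgresql k10-csi-snap-h5vglbt6b7fr6dnd || echo "still not readyToUse after 10 minutes"` gives up after ten minutes rather than hanging like the backup job does.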
Resolution:
Longhorn supports both snapshots stored locally in the volume and backups to an external backup target.
However, when a snapshot is invoked through the CSI driver, Longhorn creates both a local snapshot and a backup to the backup target.
This behavior is not explicitly mentioned in the Longhorn documentation, but it is described in this GitHub issue comment.
Because the VolumeSnapshot only becomes readyToUse once that backup completes, a backup target must be configured in Longhorn before running K10 policies. Without one, the backup never finishes and the K10 job waits indefinitely.
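A quick way to check whether a backup target is configured is to read it from the cluster. This sketch assumes a Longhorn release that stores the target URL in the `backup-target` Setting (`settings.longhorn.io`) in the `longhorn-system` namespace. Newer releases may store it elsewhere, so confirm against the Longhorn documentation for your version. The function names are hypothetical.

```shell
# Sketch: check whether Longhorn has a backup target configured.
# Assumes the "backup-target" Setting in longhorn-system (older Longhorn
# releases); verify the resource name for your Longhorn version.

# classify_backup_target URL prints s3, nfs, unset, or unknown
# ("unknown" only means it is not one of the two schemes checked here).
classify_backup_target() {
  case "$1" in
    "")      echo unset ;;
    s3://*)  echo s3 ;;
    nfs://*) echo nfs ;;
    *)       echo unknown ;;
  esac
}

# current_backup_target prints the configured target URL (empty if unset).
current_backup_target() {
  kubectl get settings.longhorn.io backup-target -n longhorn-system \
    -o jsonpath='{.value}'
}
```

Running `classify_backup_target "$(current_backup_target)"` and getting `unset` indicates that the backups triggered by K10's CSI snapshots have nowhere to go.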
The following can be configured as a backup target:
- S3 object store
- S3-compatible object store, such as MinIO
- NFS (the server must support NFSv4)
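The target is usually set from the Longhorn UI or Helm values. The sketch below shows a kubectl-based alternative and should be treated as an assumption-laden example, not Longhorn's official procedure. It assumes the `backup-target` and `backup-target-credential-secret` Settings used by older Longhorn releases, the URL formats `s3://BUCKET@REGION/PATH` and `nfs://SERVER:EXPORT`, and an S3 credential secret with `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and, for S3-compatible stores such as MinIO, `AWS_ENDPOINTS`. All function names are hypothetical.

```shell
# Sketch: build and apply a Longhorn backup-target URL with kubectl.
# Assumes the "backup-target" and "backup-target-credential-secret"
# Settings in longhorn-system (older Longhorn releases).

# s3_target BUCKET REGION [PATH] -> s3://BUCKET@REGION/PATH
s3_target() {
  printf 's3://%s@%s/%s\n' "$1" "$2" "${3:-}"
}

# nfs_target SERVER EXPORT -> nfs://SERVER:EXPORT (server must support NFSv4)
nfs_target() {
  printf 'nfs://%s:%s\n' "$1" "$2"
}

# set_backup_target URL updates the setting with a JSON merge patch.
# The URL is inserted into JSON as-is, so it must not contain quotes.
set_backup_target() {
  kubectl patch settings.longhorn.io backup-target -n longhorn-system \
    --type merge -p "{\"value\":\"$1\"}"
}

# create_s3_credentials SECRET KEY_ID SECRET_KEY [ENDPOINT]
# ENDPOINT is only needed for S3-compatible stores such as MinIO.
create_s3_credentials() {
  if [ -n "${4:-}" ]; then
    kubectl create secret generic "$1" -n longhorn-system \
      --from-literal=AWS_ACCESS_KEY_ID="$2" \
      --from-literal=AWS_SECRET_ACCESS_KEY="$3" \
      --from-literal=AWS_ENDPOINTS="$4"
  else
    kubectl create secret generic "$1" -n longhorn-system \
      --from-literal=AWS_ACCESS_KEY_ID="$2" \
      --from-literal=AWS_SECRET_ACCESS_KEY="$3"
  fi
  # Point Longhorn at the secret that holds the S3 credentials.
  kubectl patch settings.longhorn.io backup-target-credential-secret \
    -n longhorn-system --type merge -p "{\"value\":\"$1\"}"
}
```

For an NFS target, for example, `set_backup_target "$(nfs_target nfs.example.com /exports/longhorn)"` would be enough, since no credentials secret is involved.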