This article explains how to troubleshoot catalog pod unavailability during a K10 disaster recovery (DR) restore.
Issue Occurrence:
This issue occurs when a DR restore is initiated but fails. During a DR restore, the catalog-svc deployment is scaled down so that the catalog can be recovered. If the restore fails and is then re-triggered, the new attempt tries to reach the catalog, which is still unavailable because it remains scaled down. The following error message is seen:
Internal error occurred: {"message":"Could not retrieve artifacts for prefix search","function":"kasten.io/k10/kio/rest/clients.FetchArtifactsForSearchPrefix","linenumber":52,"fields":[{"name":"key","value":"api-meta-label_restoreactions.actions.kio.kasten.io_k10.kasten.io/policyName"},
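To confirm that the catalog is still scaled down from the failed attempt, the desired replica count of the catalog-svc deployment can be queried directly. The following is a minimal sketch, assuming kubectl is configured for the affected cluster and K10 runs in the kasten-io namespace; the function names catalog_replicas and catalog_is_scaled_down are hypothetical helpers, not part of K10.

```shell
#!/usr/bin/env bash
# Sketch: detect whether the K10 catalog is still scaled down after a failed DR restore.
# Assumption: kubectl targets the affected cluster and K10 runs in kasten-io
# (override with NAMESPACE=... if your install differs).

NAMESPACE="${NAMESPACE:-kasten-io}"

# Print the desired replica count of the catalog-svc deployment
# (prints nothing if the deployment cannot be read).
catalog_replicas() {
  kubectl get deploy catalog-svc -n "$NAMESPACE" -o jsonpath='{.spec.replicas}' 2>/dev/null
}

# Exit status 0 when catalog-svc is scaled to zero replicas, non-zero otherwise.
catalog_is_scaled_down() {
  [ "$(catalog_replicas)" = "0" ]
}
```

For example, after sourcing the file, running catalog_is_scaled_down && echo scaled-down prints scaled-down only while the deployment is still at 0 replicas.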
Resolution:
The following steps resolve this issue:
- Check the status of the restore pod in the kasten-io namespace. If the DR restore failed, the restore pod is shown in a failed state.
kubectl get po -n kasten-io
- Check the status of the catalog pod and make sure it is running. If it is not running, scale up the catalog deployment:
kubectl scale deploy/catalog-svc --replicas=1 -n kasten-io
Note: During a DR restore, the catalog-svc deployment is scaled down so that the catalog can be recovered. If the DR restore fails, the catalog must be scaled back up to 1 replica before the restore is re-initiated.
- Uninstall the k10-restore Helm release left over from the previously failed restore:
helm uninstall k10-restore -n kasten-io
- Re-initiate the K10 DR restore process.
- Check the status of the DR restore Helm release, the restore pod, and the catalog pod:
helm list -n kasten-io
kubectl get po -n kasten-io
If the restore succeeds, the catalog pod is shown as running and the restore pod terminates once the DR restore completes.
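The scale-up, uninstall, and verification steps above can be wrapped in helper functions. This is a minimal sketch, not an official Kasten tool: it assumes kubectl and helm point at the affected cluster, K10 runs in kasten-io, and the failed restore was installed as the Helm release k10-restore. The function names, the NAMESPACE and RELEASE variables, and the 300-second rollout timeout are illustrative assumptions. Re-initiating the DR restore itself is deliberately left out, since it depends on how the original restore was installed.

```shell
#!/usr/bin/env bash
# Sketch of the resolution steps for a failed K10 DR restore.
# Assumptions: kubectl/helm target the affected cluster, K10 runs in kasten-io,
# and the failed restore was installed as the Helm release "k10-restore".

NAMESPACE="${NAMESPACE:-kasten-io}"
RELEASE="${RELEASE:-k10-restore}"

# Scale catalog-svc back to one replica, then wait (up to an assumed 300s)
# for the rollout to finish so the catalog is reachable again.
restore_catalog() {
  kubectl scale deploy/catalog-svc --replicas=1 -n "$NAMESPACE" &&
    kubectl rollout status deploy/catalog-svc -n "$NAMESPACE" --timeout=300s
}

# Uninstall the leftover restore release, but only if Helm still knows about it,
# so the function is safe to run more than once.
remove_failed_restore() {
  if helm status "$RELEASE" -n "$NAMESPACE" >/dev/null 2>&1; then
    helm uninstall "$RELEASE" -n "$NAMESPACE"
  else
    echo "release $RELEASE not found in $NAMESPACE; nothing to uninstall"
  fi
}

# Run the preparation steps in order; stop if the catalog cannot be scaled up.
prepare_dr_retry() {
  restore_catalog || return 1
  remove_failed_restore
}

# After re-initiating the DR restore, show the release and pod status.
show_dr_status() {
  helm list -n "$NAMESPACE"
  kubectl get po -n "$NAMESPACE"
}
```

Usage: source the file, run prepare_dr_retry, re-initiate the DR restore, then run show_dr_status to watch the restore and catalog pods. Checking helm status before uninstalling keeps the retry idempotent if the release was already removed.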