When a namespace is stuck in the “Terminating” state, it is often because one or more APIServices in the cluster are not responding.
Description
K10 is not always the cause, but it does register 3 APIService endpoints that can end up in this state when the aggregatedapis-svc service in the kasten-io namespace is not reachable from the Kubernetes API server.
For example, this can happen if:
- K10 services were deleted manually but the APIService entries were not
- Networking issues make the pods behind the services in the kasten-io namespace unreachable
Observation
Running $ kubectl get namespaces lists the namespaces that are stuck in the Terminating state.
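On a cluster with many namespaces, the listing can be narrowed to the stuck ones. The following is a minimal sketch rather than part of the original procedure: the function name terminating_namespaces is a hypothetical helper, and it assumes the default kubectl table layout in which STATUS is the second column.

```shell
# Hypothetical helper: reads `kubectl get namespaces` output on stdin and
# keeps the header plus every row whose STATUS column is "Terminating".
# Assumes the default table layout (NAME STATUS AGE).
terminating_namespaces() {
  awk 'NR == 1 || $2 == "Terminating"'
}

# Usage against a live cluster:
#   kubectl get namespaces | terminating_namespaces
```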
Debugging
- List the API services and their status
In the example below, the metrics API service (v1beta1.metrics.k8s.io) is not available
$ kubectl get apiservices
NAME                                   SERVICE                        AVAILABLE                  AGE
v1.                                    Local                          True                       133d
v1.admissionregistration.k8s.io        Local                          True                       133d
v1.apiextensions.k8s.io                Local                          True                       133d
v1.apps                                Local                          True                       133d
v1.authentication.k8s.io               Local                          True                       133d
v1.authorization.k8s.io                Local                          True                       133d
v1.autoscaling                         Local                          True                       133d
v1.batch                               Local                          True                       133d
v1.coordination.k8s.io                 Local                          True                       133d
v1.crd.projectcalico.org               Local                          True                       11d
v1.networking.k8s.io                   Local                          True                       133d
v1.rbac.authorization.k8s.io           Local                          True                       133d
v1.scheduling.k8s.io                   Local                          True                       133d
v1.storage.k8s.io                      Local                          True                       133d
v1alpha1.actions.kio.kasten.io         kasten-io/aggregatedapis-svc   True                       30h
v1alpha1.apps.kio.kasten.io            kasten-io/aggregatedapis-svc   True                       30h
v1alpha1.config.kio.kasten.io          Local                          True                       11d
v1alpha1.cr.kanister.io                Local                          True                       6d1h
v1alpha1.dynatrace.com                 Local                          True                       6d1h
v1alpha1.kube.cloud.ovh.com            Local                          True                       6d1h
v1alpha1.snapshot.storage.k8s.io       Local                          True                       6d1h
v1alpha1.vault.kio.kasten.io           kasten-io/aggregatedapis-svc   True                       30h
v1alpha2.acme.cert-manager.io          Local                          True                       6d1h
v1alpha2.cert-manager.io               Local                          True                       20d
v1beta1.admissionregistration.k8s.io   Local                          True                       133d
v1beta1.apiextensions.k8s.io           Local                          True                       133d
v1beta1.authentication.k8s.io          Local                          True                       133d
v1beta1.authorization.k8s.io           Local                          True                       133d
v1beta1.batch                          Local                          True                       133d
v1beta1.certificates.k8s.io            Local                          True                       133d
v1beta1.coordination.k8s.io            Local                          True                       133d
v1beta1.events.k8s.io                  Local                          True                       133d
v1beta1.extensions                     Local                          True                       133d
v1beta1.metrics.k8s.io                 kube-system/metrics-server     False (MissingEndpoints)   133d
v1beta1.networking.k8s.io              Local                          True                       133d
v1beta1.node.k8s.io                    Local                          True                       133d
v1beta1.policy                         Local                          True                       133d
v1beta1.rbac.authorization.k8s.io      Local                          True                       133d
v1beta1.scheduling.k8s.io              Local                          True                       133d
v1beta1.storage.k8s.io                 Local                          True                       133d
v2beta1.autoscaling                    Local                          True                       133d
v2beta2.autoscaling                    Local                          True                       133d
- For a namespace stuck in this state, display the namespace object as YAML; its status conditions indicate which API is blocking namespace termination
$ kubectl get ns kasten-test -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2020-04-14T14:04:52Z"
  deletionTimestamp: "2020-04-14T15:14:26Z"
  name: kasten-test
  resourceVersion: "9402941896"
  selfLink: /api/v1/namespaces/kasten-test
  uid: 7547bcf4-0db0-41d2-93f7-d894df4fa256
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-04-14T15:15:30Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-04-14T15:15:45Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-04-14T15:15:48Z"
    message: All content successfully deleted
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  phase: Terminating
The output above indicates that the metrics API (metrics.k8s.io/v1beta1) is not responding, which is what blocks deletion of the namespace.
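Both debugging steps above produce long output in which only one or two entries matter. The sketch below narrows them down; it is an illustration, not part of the original procedure. The function names unavailable_apiservices and blocking_conditions are hypothetical helpers. The first assumes the default kubectl table layout (AVAILABLE as the third column), and the second assumes that a condition with status "True" is one reporting a deletion problem, as in the example output.

```shell
# Hypothetical helper: reads `kubectl get apiservices` output on stdin and
# keeps the header plus every row whose AVAILABLE column is not "True".
# Assumes the default table layout (NAME SERVICE AVAILABLE AGE).
unavailable_apiservices() {
  awk 'NR == 1 || $3 != "True"'
}

# Hypothetical helper: prints "type: message" for each condition of the
# given namespace whose status is "True" (assumed to mark a problem).
blocking_conditions() {
  kubectl get namespace "$1" \
    -o jsonpath='{range .status.conditions[?(@.status=="True")]}{.type}{": "}{.message}{"\n"}{end}'
}

# Usage against a live cluster:
#   kubectl get apiservices | unavailable_apiservices
#   blocking_conditions kasten-test
```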
Solution
Usually, this requires working out which API service is not responding and debugging that issue. A brute-force fix is often to delete the offending API service. For the example above:
$ kubectl delete apiservice v1beta1.metrics.k8s.io
or, for the K10 API services:
$ kubectl delete apiservice v1alpha1.actions.kio.kasten.io v1alpha1.apps.kio.kasten.io v1alpha1.vault.kio.kasten.io
This will unblock namespace deletion, but it will also render the APIs in that group (for example, the K10 APIs) unusable, so it should only be used as a temporary solution when unblocking the customer is the priority.
Note: In the $ kubectl delete apiservice commands above, substitute the name(s) of the API service(s) that are unavailable in your cluster.
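Before deleting an API service, it can help to confirm that the Service behind it really has no endpoints, since that is the issue to debug. The sketch below is one possible way to check and is not taken from the original procedure. The function names apiservice_backend and check_apiservice_endpoints are hypothetical helpers, and it assumes the APIService is backed by a Service referenced in spec.service.

```shell
# Hypothetical helper: prints "<namespace> <service>" for the Service that
# backs an aggregated APIService (both fields are empty for "Local" ones).
apiservice_backend() {
  kubectl get apiservice "$1" \
    -o jsonpath='{.spec.service.namespace} {.spec.service.name}'
}

# Hypothetical helper: lists the endpoints behind an APIService so that a
# missing backend (for example "MissingEndpoints") can be confirmed.
check_apiservice_endpoints() {
  # Split "<namespace> <service>" into positional parameters $1 and $2.
  set -- $(apiservice_backend "$1")
  if [ "$#" -ne 2 ]; then
    echo "no backing Service found (Local or missing APIService)" >&2
    return 1
  fi
  kubectl -n "$1" get endpoints "$2"
}

# Usage against a live cluster:
#   check_apiservice_endpoints v1beta1.metrics.k8s.io
```

An empty ENDPOINTS column in the result would suggest that the backing pods are not running or not ready, which is worth fixing before falling back to deletion.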