Debug namespaces that are stuck in Terminating state

When a namespace is stuck in the “Terminating” state, it is often because one or more APIServices in the cluster are not responding.

Description

K10 is not always the cause, but it registers three APIService endpoints that can become unavailable when the aggregatedapis-svc service in the kasten-io namespace is not reachable from the Kubernetes API server.

This can happen, for example, if:

  • K10 services were deleted manually but the APIService entries were not

  • A networking issue makes the pods in the kasten-io namespace unreachable

Observation

$ kubectl get namespaces shows one or more namespaces that are stuck in the Terminating state.
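On a cluster with many namespaces, it can help to filter the listing down to the affected ones. The following is a minimal sketch, assuming the default kubectl get namespaces column layout (NAME, STATUS, AGE); the helper name terminating_namespaces is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: reads `kubectl get namespaces` output on stdin and
# prints the names of namespaces whose STATUS column is "Terminating".
# Assumes the default column order: NAME, STATUS, AGE.
terminating_namespaces() {
  awk 'NR > 1 && $2 == "Terminating" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get namespaces | terminating_namespaces
```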

Debugging

  • List api services and their status

In the example below, the metrics API service is not available:

$ kubectl get apiservices
NAME                                   SERVICE                        AVAILABLE                  AGE
v1.                                    Local                          True                       133d
v1.admissionregistration.k8s.io        Local                          True                       133d
v1.apiextensions.k8s.io                Local                          True                       133d
v1.apps                                Local                          True                       133d
v1.authentication.k8s.io               Local                          True                       133d
v1.authorization.k8s.io                Local                          True                       133d
v1.autoscaling                         Local                          True                       133d
v1.batch                               Local                          True                       133d
v1.coordination.k8s.io                 Local                          True                       133d
v1.crd.projectcalico.org               Local                          True                       11d
v1.networking.k8s.io                   Local                          True                       133d
v1.rbac.authorization.k8s.io           Local                          True                       133d
v1.scheduling.k8s.io                   Local                          True                       133d
v1.storage.k8s.io                      Local                          True                       133d
v1alpha1.actions.kio.kasten.io         kasten-io/aggregatedapis-svc   True                       30h
v1alpha1.apps.kio.kasten.io            kasten-io/aggregatedapis-svc   True                       30h
v1alpha1.config.kio.kasten.io          Local                          True                       11d
v1alpha1.cr.kanister.io                Local                          True                       6d1h
v1alpha1.dynatrace.com                 Local                          True                       6d1h
v1alpha1.kube.cloud.ovh.com            Local                          True                       6d1h
v1alpha1.snapshot.storage.k8s.io       Local                          True                       6d1h
v1alpha1.vault.kio.kasten.io           kasten-io/aggregatedapis-svc   True                       30h
v1alpha2.acme.cert-manager.io          Local                          True                       6d1h
v1alpha2.cert-manager.io               Local                          True                       20d
v1beta1.admissionregistration.k8s.io   Local                          True                       133d
v1beta1.apiextensions.k8s.io           Local                          True                       133d
v1beta1.authentication.k8s.io          Local                          True                       133d
v1beta1.authorization.k8s.io           Local                          True                       133d
v1beta1.batch                          Local                          True                       133d
v1beta1.certificates.k8s.io            Local                          True                       133d
v1beta1.coordination.k8s.io            Local                          True                       133d
v1beta1.events.k8s.io                  Local                          True                       133d
v1beta1.extensions                     Local                          True                       133d
v1beta1.metrics.k8s.io                 kube-system/metrics-server     False (MissingEndpoints)   133d
v1beta1.networking.k8s.io              Local                          True                       133d
v1beta1.node.k8s.io                    Local                          True                       133d
v1beta1.policy                         Local                          True                       133d
v1beta1.rbac.authorization.k8s.io      Local                          True                       133d
v1beta1.scheduling.k8s.io              Local                          True                       133d
v1beta1.storage.k8s.io                 Local                          True                       133d
v2beta1.autoscaling                    Local                          True                       133d
v2beta2.autoscaling                    Local                          True                       133d
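
With a long list like the one above, the unavailable entry is easy to miss. The sketch below filters the listing down to rows whose AVAILABLE column is not True. It assumes the column order shown above (NAME, SERVICE, AVAILABLE, AGE); the helper name unavailable_apiservices is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get apiservices` output on stdin and
# prints NAME and SERVICE for every row whose AVAILABLE column is not "True".
# Assumes the column order: NAME, SERVICE, AVAILABLE, AGE.
unavailable_apiservices() {
  awk 'NR > 1 && $3 != "True" { print $1, $2 }'
}

# Usage against a live cluster:
#   kubectl get apiservices | unavailable_apiservices
```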

  • For a namespace stuck in this state, print the namespace object as YAML; its status conditions indicate which API is blocking namespace termination

$ kubectl get ns kasten-test -o yaml
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2020-04-14T14:04:52Z"
  deletionTimestamp: "2020-04-14T15:14:26Z"
  name: kasten-test
  resourceVersion: "9402941896"
  selfLink: /api/v1/namespaces/kasten-test
  uid: 7547bcf4-0db0-41d2-93f7-d894df4fa256
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2020-04-14T15:15:30Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently
      unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2020-04-14T15:15:45Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2020-04-14T15:15:48Z"
    message: All content successfully deleted
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  phase: Terminating

The output above shows that the metrics API (metrics.k8s.io/v1beta1) is not responding, which is what blocks deletion of the namespace.
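The discovery-failure message names the failing API as group/version (metrics.k8s.io/v1beta1), while the corresponding APIService object is named version.group (v1beta1.metrics.k8s.io). As a rough sketch, the conversion can be scripted; the helper name failing_apiservices is hypothetical, and the pattern assumes the message format shown in the output above.

```shell
# Hypothetical helper: reads a discovery-failure message on stdin, extracts
# each failing "group/version" token, and prints the matching APIService
# name, which has the form "<version>.<group>".
# Assumes the message format shown in the namespace status above.
failing_apiservices() {
  grep -oE '[A-Za-z0-9.-]+/v[0-9][A-Za-z0-9]*:' |
    sed 's/:$//' |
    awk -F/ '{ print $2 "." $1 }' |
    sort -u
}

# Usage against a live cluster (namespace name taken from the example):
#   kubectl get ns kasten-test \
#     -o jsonpath='{.status.conditions[?(@.type=="NamespaceDeletionDiscoveryFailure")].message}' \
#     | failing_apiservices
```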

Solution

Usually, the proper fix is to identify which API service is not responding and debug that issue. A brute-force fix is often to delete the offending API service. For the example above:

$ kubectl delete apiservice v1beta1.metrics.k8s.io

or, for the K10 API services:

$ kubectl delete apiservice v1alpha1.actions.kio.kasten.io v1alpha1.apps.kio.kasten.io v1alpha1.vault.kio.kasten.io

Deleting the API service unblocks namespace deletion, but it also renders the APIs in that group (for example, the K10 APIs) unusable, so it should only be used as a temporary solution when unblocking the customer is a priority.

Note: In the $ kubectl delete commands above, replace the API service name(s) with the unavailable API service(s) reported in your own cluster.
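
Because deleting an API service is meant to be temporary, one option is to save its definition first so that it can be re-created once the backing service is healthy again. The sketch below only prints the commands for review and does not run them; the helper name plan_apiservice_removal and the backup file naming are assumptions, not part of K10 or kubectl.

```shell
# Hypothetical helper: for each APIService name given as an argument, print
# (without executing) a command that saves its definition to a YAML file,
# followed by the command that deletes it.
plan_apiservice_removal() {
  for name in "$@"; do
    printf 'kubectl get apiservice %s -o yaml > %s.backup.yaml\n' "$name" "$name"
    printf 'kubectl delete apiservice %s\n' "$name"
  done
}

# Usage: review the printed commands, then run them by hand.
#   plan_apiservice_removal v1beta1.metrics.k8s.io
```

A saved definition may need server-populated fields such as resourceVersion and uid removed before it can be re-applied.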