Issue with OpenShift OAuth apiserver after K10 installation with SCCs

The openshift-oauth-apiserver and openshift-apiserver pods fail to start during a restart or an upgrade of an OCP cluster that has K10 installed. This causes OCP upgrades to fail.

Background and cause of the issue

Generally, the openshift-oauth-apiserver and openshift-apiserver pods require permission to run privileged containers and to run as root. runAsUser is not specified explicitly in their manifests, and OCP runs them with a service account that has cluster-admin access, so the pods effectively have access to all the SCCs in the cluster.
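
One way to confirm that runAsUser is not explicitly set is to print the container securityContext of the running pods; the JSONPath query below is illustrative:

# print each pod name and its container securityContext; an empty result confirms runAsUser is not set
oc get pods -n openshift-oauth-apiserver \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext}{"\n"}{end}'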

By default, an OCP cluster ships with 8 SCCs. Of these, only two, node-exporter and privileged, allow privileged containers to run.
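
Which SCCs allow privileged containers can be verified with a custom-columns query (shown here as an illustration):

# list every SCC together with its allowPrivilegedContainer setting
oc get scc -o custom-columns=NAME:.metadata.name,ALLOWPRIVILEGED:.allowPrivilegedContainer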

Below is a listing of the above pods with the SCCs they use.

~/op/install-cluster-jai-2# oc get pod -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-apiserver
NAMESPACE             NAME                         APPLIED SCC
openshift-apiserver   apiserver-5987f49db5-75r6b   node-exporter
openshift-apiserver   apiserver-5987f49db5-p4zkn   node-exporter
openshift-apiserver   apiserver-5987f49db5-x4ktp   node-exporter

~/op/install-cluster-jai-2# oc get pod -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-oauth-apiserver
NAMESPACE                   NAME                         APPLIED SCC
openshift-oauth-apiserver   apiserver-6f5fd9574c-72xgm   node-exporter
openshift-oauth-apiserver   apiserver-6f5fd9574c-vxm49   node-exporter
openshift-oauth-apiserver   apiserver-6f5fd9574c-wkmpb   node-exporter

SCCs deployed along with K10

When K10 is installed in an OCP cluster with the --set scc.create=true Helm flag, it creates two SCCs, k10-k10 and k10-prometheus-server, each with a priority of 0 or null. The k10-k10 SCC also allows privileged containers and privilege escalation, but it disallows running as root.
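
The settings of the two K10 SCCs can be inspected directly; the fields used below are standard SCC fields:

# show the privileged-container, privilege-escalation, run-as-user and priority settings of the K10 SCCs
oc get scc k10-k10 k10-prometheus-server -o custom-columns=NAME:.metadata.name,ALLOWPRIVILEGED:.allowPrivilegedContainer,ALLOWPRIVESC:.allowPrivilegeEscalation,RUNASUSER:.runAsUser.type,PRIORITY:.priority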

SCC prioritization and ordering

SCCs are ordered based on the priority set in their definitions, and the SCC with the highest priority takes precedence. If two or more SCCs share the same priority, the most restrictive one is preferred over the others. The ordering rules, followed by an example command for inspecting the relevant fields, are shown below.

  • Highest priority first, nil is considered a 0 priority
  • If priorities are equal, the SCCs will be sorted from most restrictive to least restrictive
  • If both priorities and restrictions are equal the SCCs will be sorted by name
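
For example, the fields that drive this ordering can be compared across the SCCs that allow privileged containers (a nil priority is shown as <none>):

# compare priority and run-as-user strategy for the SCCs that allow privileged containers
oc get scc k10-k10 node-exporter privileged -o custom-columns=NAME:.metadata.name,PRIORITY:.priority,RUNASUSER:.runAsUser.type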

Ordering and SCC selection when the pods are restarted or OCP is upgraded

Once K10 is installed, these pods have three SCCs to select from that allow privileged containers (k10-k10, node-exporter, and privileged), and all three have a priority of 0 or null.

Based on the prioritization and ordering criteria above, the most restrictive SCC is applied; in this case that is the k10-k10 SCC, which is more restrictive than node-exporter and privileged because it disallows running as root.

Since these pods do not have runAsUser explicitly set in their SecurityContexts, and k10-k10 does not allow containers to run as root, the pods get stuck in the CreateContainerConfigError state.
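
The reason for the CreateContainerConfigError is visible in the pod's events; describing one of the failing pods (the pod name below is a placeholder) typically shows an error indicating that the container is not permitted to run as root:

# the Events section explains why container creation was rejected
oc describe pod <failing-apiserver-pod> -n openshift-oauth-apiserver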

~/op/install-cluster-jai-2# oc get pod apiserver-6f5fd9574c-rq4s5 -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-oauth-apiserver
NAMESPACE                   NAME                         APPLIED SCC
openshift-oauth-apiserver   apiserver-6f5fd9574c-rq4s5   k10-k10

~/op/install-cluster-jai-2# oc get pods -n openshift-oauth-apiserver apiserver-6f5fd9574c-rq4s5
NAME                         READY   STATUS                            RESTARTS   AGE
apiserver-6f5fd9574c-rq4s5   0/1     Init:CreateContainerConfigError   0          110s

Proposed workaround

A workaround for this issue is to create a custom SCC with appropriate security context settings and a priority of 1 or above. Because of its higher priority, this SCC is selected for the system pods whenever they are restarted.

Since no other service accounts are explicitly allowed to use this custom SCC, it is accessible only to service accounts bound to the cluster-admin cluster role and not to other pods. This ensures the workaround does not introduce a security concern.

Below is the manifest of the custom SCC that can be created

apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
kind: SecurityContextConstraints
metadata:
  name: custom-scc
priority: 1
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- '*'
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: null
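
Assuming the manifest above is saved as custom-scc.yaml (the filename is illustrative), the SCC can be created and verified with:

# create the custom SCC and confirm its priority and run-as-user strategy
oc apply -f custom-scc.yaml
oc get scc custom-scc -o custom-columns=NAME:.metadata.name,PRIORITY:.priority,RUNASUSER:.runAsUser.type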

The custom SCC will be applied to the pods whenever they are restarted or during upgrades.
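
A restart can also be triggered manually by deleting one of the pods so that its deployment recreates it; the pod name below is a placeholder:

# the replacement pod goes through SCC admission again and picks up the highest-priority SCC
oc delete pod <apiserver-pod-name> -n openshift-oauth-apiserver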

~/op/install-cluster-jai-2# oc get scc | grep true
custom-scc      true   <no value>   RunAsAny   RunAsAny           RunAsAny   RunAsAny   1            false   ["*"]
k10-k10         true   []           RunAsAny   MustRunAsNonRoot   RunAsAny   RunAsAny   0            false   ["*"]
node-exporter   true   <no value>   RunAsAny   RunAsAny           RunAsAny   RunAsAny   <no value>   false   ["*"]
privileged      true   ["*"]        RunAsAny   RunAsAny           RunAsAny   RunAsAny   <no value>   false   ["*"]

# restart any one of the pods to see if the custom-scc gets applied
~/op/install-cluster-jai-2# oc get pods -n openshift-oauth-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-6f5fd9574c-72xgm   1/1     Running   0          77m
apiserver-6f5fd9574c-t2pmm   1/1     Running   0          23s
apiserver-6f5fd9574c-wkmpb   1/1     Running   0          4d17h

~/op/install-cluster-jai-2# oc get pod -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-oauth-apiserver
NAMESPACE                   NAME                         APPLIED SCC
openshift-oauth-apiserver   apiserver-6f5fd9574c-72xgm   node-exporter
openshift-oauth-apiserver   apiserver-6f5fd9574c-t2pmm   custom-scc
openshift-oauth-apiserver   apiserver-6f5fd9574c-wkmpb   node-exporter

Additional information regarding the k10-k10 SCC

It was observed that in an OCP cluster with K10 installed, even though K10-specific SCCs were created for the K10 pods, they were not used; the anyuid SCC was applied instead.

This is because the k10-k10 service account has cluster-admin access, and anyuid is the SCC with the highest priority, with a value of 10.
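
The SCC applied to the K10 pods can be checked with the same query used above; the example assumes K10 is installed in the default kasten-io namespace:

# show which SCC was applied to each K10 pod
oc get pod -n kasten-io -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc'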

Red Hat bugs related to this issue