Issue with OpenShift OAuth apiserver after K10 installation with SCCs

The openshift-oauth-apiserver and openshift-apiserver pods fail to start during a restart or an upgrade of an OCP cluster that has K10 installed. This causes OCP upgrades to fail.

Background and cause of the issue

Generally, the openshift-oauth-apiserver and openshift-apiserver pods require permission to run privileged containers and to run as root. runAsUser is not specified explicitly in their manifests, and OCP runs them with a service account that has cluster-admin access, so the pods effectively have access to all the SCCs in the cluster.
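
One way to confirm that runAsUser is not explicitly set is to print the container securityContext of the running pods; the JSONPath query below is illustrative:

# print each pod name and its container securityContext; an empty result confirms runAsUser is not set
oc get pods -n openshift-oauth-apiserver \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext}{"\n"}{end}'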

By default, an OCP cluster ships with 8 SCCs. Of these, only two, node-exporter and privileged, allow privileged containers to run.
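
Which SCCs allow privileged containers can be verified with a custom-columns query (shown here as an illustration):

# list every SCC together with its allowPrivilegedContainer setting
oc get scc -o custom-columns=NAME:.metadata.name,ALLOWPRIVILEGED:.allowPrivilegedContainer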

Below is a listing of the above pods with the SCCs they use.

~/op/install-cluster-jai-2# oc get pod -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-apiserver
NAMESPACE             NAME                         APPLIED SCC
openshift-apiserver   apiserver-5987f49db5-75r6b   node-exporter
openshift-apiserver   apiserver-5987f49db5-p4zkn   node-exporter
openshift-apiserver   apiserver-5987f49db5-x4ktp   node-exporter

~/op/install-cluster-jai-2# oc get pod -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-oauth-apiserver
NAMESPACE                   NAME                         APPLIED SCC
openshift-oauth-apiserver   apiserver-6f5fd9574c-72xgm   node-exporter
openshift-oauth-apiserver   apiserver-6f5fd9574c-vxm49   node-exporter
openshift-oauth-apiserver   apiserver-6f5fd9574c-wkmpb   node-exporter

SCCs deployed along with K10

When K10 is installed in an OCP cluster with the --set scc.create=true Helm flag, it creates two SCCs, k10-k10 and k10-prometheus-server, each with a priority of 0 or null. The k10-k10 SCC also allows privileged containers and privilege escalation, but it disallows running as root.
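
The settings of the two K10 SCCs can be inspected directly; the fields used below are standard SCC fields:

# show the privileged-container, privilege-escalation, run-as-user and priority settings of the K10 SCCs
oc get scc k10-k10 k10-prometheus-server -o custom-columns=NAME:.metadata.name,ALLOWPRIVILEGED:.allowPrivilegedContainer,ALLOWPRIVESC:.allowPrivilegeEscalation,RUNASUSER:.runAsUser.type,PRIORITY:.priority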

SCC prioritization and ordering

SCCs are ordered based on the priority set in their definitions, and the SCC with the highest priority takes precedence. If two or more SCCs share the same priority, the most restrictive one is preferred over the others. The ordering rules, followed by an example command for inspecting the relevant fields, are shown below.

  • Highest priority first, nil is considered a 0 priority
  • If priorities are equal, the SCCs will be sorted from most restrictive to least restrictive
  • If both priorities and restrictions are equal the SCCs will be sorted by name
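
For example, the fields that drive this ordering can be compared across the SCCs that allow privileged containers (a nil priority is shown as <none>):

# compare priority and run-as-user strategy for the SCCs that allow privileged containers
oc get scc k10-k10 node-exporter privileged -o custom-columns=NAME:.metadata.name,PRIORITY:.priority,RUNASUSER:.runAsUser.type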

Ordering and SCC selection when the pods are restarted or OCP is upgraded

Once K10 is installed, these pods have three SCCs to select from that allow privileged containers (k10-k10, node-exporter, and privileged), and all three have a priority of 0 or null.

Based on the prioritization and ordering criteria above, the most restrictive SCC is applied; in this case that is the k10-k10 SCC, which is more restrictive than node-exporter and privileged because it disallows running as root.

Since these pods do not have runAsUser explicitly set in their SecurityContexts, and k10-k10 does not allow containers to run as root, the pods get stuck in the CreateContainerConfigError state.
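
The reason for the CreateContainerConfigError is visible in the pod's events; describing one of the failing pods (the pod name below is a placeholder) typically shows an error indicating that the container is not permitted to run as root:

# the Events section explains why container creation was rejected
oc describe pod <failing-apiserver-pod> -n openshift-oauth-apiserver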

~/op/install-cluster-jai-2# oc get pod apiserver-6f5fd9574c-rq4s5 -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-oauth-apiserver
NAMESPACE                   NAME                         APPLIED SCC
openshift-oauth-apiserver   apiserver-6f5fd9574c-rq4s5   k10-k10

~/op/install-cluster-jai-2# oc get pods -n openshift-oauth-apiserver apiserver-6f5fd9574c-rq4s5
NAME                         READY   STATUS                            RESTARTS   AGE
apiserver-6f5fd9574c-rq4s5   0/1     Init:CreateContainerConfigError   0          110s

Proposed workaround

A workaround for this issue is to create a custom SCC with appropriate security context settings and a priority of 1 or above. Because of its higher priority, this SCC is selected for the system pods whenever they are restarted.

Since no other service accounts are explicitly allowed to use this custom SCC, it is accessible only to service accounts bound to the cluster-admin cluster role and not to other pods. This ensures the workaround does not introduce a security concern.

Below is the manifest of the custom SCC that can be created

apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
kind: SecurityContextConstraints
metadata:
  name: custom-scc
priority: 1
readOnlyRootFilesystem: false
requiredDropCapabilities: null
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users: []
volumes:
- '*'
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: true
allowHostPID: true
allowHostPorts: true
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: null
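
Assuming the manifest above is saved as custom-scc.yaml (the filename is illustrative), the SCC can be created and verified with:

# create the custom SCC and confirm its priority and run-as-user strategy
oc apply -f custom-scc.yaml
oc get scc custom-scc -o custom-columns=NAME:.metadata.name,PRIORITY:.priority,RUNASUSER:.runAsUser.type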

The custom SCC will be applied to the pods whenever they are restarted or during upgrades.
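
A restart can also be triggered manually by deleting one of the pods so that its deployment recreates it; the pod name below is a placeholder:

# the replacement pod goes through SCC admission again and picks up the highest-priority SCC
oc delete pod <apiserver-pod-name> -n openshift-oauth-apiserver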

~/op/install-cluster-jai-2# oc get scc | grep true
custom-scc      true   <no value>   RunAsAny   RunAsAny           RunAsAny   RunAsAny   1            false   ["*"]
k10-k10         true   []           RunAsAny   MustRunAsNonRoot   RunAsAny   RunAsAny   0            false   ["*"]
node-exporter   true   <no value>   RunAsAny   RunAsAny           RunAsAny   RunAsAny   <no value>   false   ["*"]
privileged      true   ["*"]        RunAsAny   RunAsAny           RunAsAny   RunAsAny   <no value>   false   ["*"]

# restart any one of the pods to see if the custom-scc gets applied
~/op/install-cluster-jai-2# oc get pods -n openshift-oauth-apiserver
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-6f5fd9574c-72xgm   1/1     Running   0          77m
apiserver-6f5fd9574c-t2pmm   1/1     Running   0          23s
apiserver-6f5fd9574c-wkmpb   1/1     Running   0          4d17h

~/op/install-cluster-jai-2# oc get pod -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc' -n openshift-oauth-apiserver
NAMESPACE                   NAME                         APPLIED SCC
openshift-oauth-apiserver   apiserver-6f5fd9574c-72xgm   node-exporter
openshift-oauth-apiserver   apiserver-6f5fd9574c-t2pmm   custom-scc
openshift-oauth-apiserver   apiserver-6f5fd9574c-wkmpb   node-exporter

Additional information regarding the k10-k10 SCC

It was observed that in an OCP cluster with K10 installed, even though K10-specific SCCs were created for the K10 pods, they were not used; the anyuid SCC was applied instead.

This is because the k10-k10 service account has cluster-admin access, and anyuid is the SCC with the highest priority, with a value of 10.
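
The SCC applied to the K10 pods can be checked with the same query used above; the example assumes K10 is installed in the default kasten-io namespace:

# show which SCC was applied to each K10 pod
oc get pod -n kasten-io -o 'custom-columns=NAMESPACE:metadata.namespace,NAME:metadata.name,APPLIED SCC:metadata.annotations.openshift\.io/scc'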

Red Hat bugs related to this issue