Kubernetes: ClusterRoles created in the cluster are not visible during RBAC checks

4/2/2020

I have a problem in my Kubernetes cluster that suddenly appeared two weeks ago. The ClusterRoles I create are not visible when the RBAC permissions for a given ServiceAccount are resolved. Here is a minimal setup to reproduce the problem.

Create a ServiceAccount in the default namespace, plus a ClusterRole and a ClusterRoleBinding granting this SA the rights to read Endpoints.

# test.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-cr
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: test-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-cr
subjects:
- kind: ServiceAccount
  name: test-sa
  namespace: default

$ kubectl apply -f test.yaml
serviceaccount/test-sa created
clusterrole.rbac.authorization.k8s.io/test-cr created
clusterrolebinding.rbac.authorization.k8s.io/test-crb created

All objects, in particular the ClusterRole, are visible if requested directly.

$ kubectl get serviceaccount test-sa
NAME      SECRETS   AGE
test-sa   1         57s

$ kubectl get clusterrolebinding test-crb
NAME       AGE
test-crb   115s

$ kubectl get clusterrole test-cr
NAME      AGE
test-cr   2m19s
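
The stored rules can be inspected as well, for instance with:

$ kubectl get clusterrole test-cr -o yaml
$ kubectl describe clusterrole test-cr

Both return the rules as applied, which suggests the object itself is stored fine and the problem sits in the permission resolution.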

However, when I try to resolve the effective rights for this ServiceAccount, here is the error I get back:

$ kubectl auth can-i get endpoints --as=system:serviceaccount:default:test-sa
no - RBAC: clusterrole.rbac.authorization.k8s.io "test-cr" not found
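
To check whether the failure is specific to this one verb/resource pair, everything the apiserver resolves for the SA can be listed as well; kubectl auth can-i --list drives the SelfSubjectRulesReview API under the hood:

$ kubectl auth can-i --list --as=system:serviceaccount:default:test-sa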

The RBAC rules created before the breakage keep working properly. For instance, here is the same check for the ServiceAccount of the etcd-operator I deployed with Helm several months ago:

$ kubectl auth can-i get endpoints --as=system:serviceaccount:etcd:etcd-etcd-operator-etcd-operator
yes
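
The same resolution can also be queried through the raw authorization API, which is what kubectl auth can-i uses underneath; the decision and the denial reason come back in the status of the created review object. A minimal sketch:

$ kubectl create -o yaml -f - <<EOF
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:default:test-sa
  resourceAttributes:
    namespace: default
    verb: get
    resource: endpoints
EOF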

The Kubernetes version in this cluster is 1.17.0-0.

In case it helps: lately I am also seeing very slow deployments of new Pods, which can take up to 5 minutes to start after being created by a StatefulSet or a Deployment.
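
For anyone wanting to correlate the two symptoms, the cluster-wide events timeline may show where that delay sits (scheduling, image pull, volume attach, ...), for instance:

$ kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp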

Do you have any insight into what is going on, or what I could do about it? Please note that my Kubernetes cluster is managed, so I do not have any control over the underlying system; I just have cluster-admin privileges as a customer. But it would still help greatly if I could point the administrators in the right direction.

Thanks in advance!

-- Adrien Ferrand
kubectl
kubernetes
rbac

1 Answer

4/22/2020

Thanks a lot for your answers!

It turned out that we will probably never have the final word about what happened. The cluster provider simply restarted the kube-apiserver, and this fixed the issue.
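
Re-running the check from the question after the restart confirms it:

$ kubectl auth can-i get endpoints --as=system:serviceaccount:default:test-sa
yes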

I suppose that something transient went wrong, such as a stale cache in the kube-apiserver's RBAC authorizer, that cannot be pinned down as a reproducible error.

To give a little more data for future readers: the error occurred on a Kubernetes cluster managed by OVH, whose particularity is to run the control plane itself as Pods deployed in a master Kubernetes cluster on their side.

-- Adrien Ferrand
Source: StackOverflow