I have a problem in my Kubernetes cluster that suddenly appeared two weeks ago. The ClusterRoles I create are not visible when the RBAC permissions of a given ServiceAccount are resolved. Here is a minimal set of objects that reproduces the problem.
First, create the relevant ClusterRole, ClusterRoleBinding and ServiceAccount in the default namespace, so that this SA has the right to see Endpoints.
# test.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-cr
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: test-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-cr
subjects:
- kind: ServiceAccount
  name: test-sa
  namespace: default
$ kubectl apply -f test.yaml
serviceaccount/test-sa created
clusterrole.rbac.authorization.k8s.io/test-cr created
clusterrolebinding.rbac.authorization.k8s.io/test-crb created
All objects, in particular the ClusterRole, are visible if requested directly.
$ kubectl get serviceaccount test-sa
NAME      SECRETS   AGE
test-sa   1         57s
$ kubectl get clusterrolebinding test-crb
NAME       AGE
test-crb   115s
$ kubectl get clusterrole test-cr
NAME      AGE
test-cr   2m19s
However, when I try to resolve the effective rights of this ServiceAccount, here is the error I get back:
$ kubectl auth can-i get endpoints --as=system:serviceaccount:default:test-sa
no - RBAC: clusterrole.rbac.authorization.k8s.io "test-cr" not found
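To give the administrators a check that does not go through kubectl's impersonation (`--as`), the same question can be sent to the authorizer directly as a SubjectAccessReview. This is a minimal sketch, assuming it is saved in a file such as `sar.yaml` (a hypothetical name) and submitted by a cluster-admin with `kubectl create -f sar.yaml -o yaml`; the decision then shows up in the `status` section of the returned object (`allowed`, `reason`, `evaluationError`).

```yaml
# sar.yaml (hypothetical file name): asks the authorizer whether the
# ServiceAccount from test.yaml may "get" Endpoints in the default namespace.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # A ServiceAccount authenticates as system:serviceaccount:<namespace>:<name>
  user: system:serviceaccount:default:test-sa
  resourceAttributes:
    namespace: default
    verb: get
    resource: endpoints   # core API group, so no "group" field is needed
```

If `status.allowed` is still false with the same "not found" message while `kubectl get clusterrole test-cr` succeeds, that points at the authorizer itself rather than at the manifests.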
RBAC rules created before the problem appeared still work properly. For instance, for the ServiceAccount of my etcd-operator, which I deployed with Helm several months ago:
$ kubectl auth can-i get endpoints --as=system:serviceaccount:etcd:etcd-etcd-operator-etcd-operator
yes
The Kubernetes version of this cluster is 1.17.0-0.
In case it helps, I am also seeing very slow deployments of new Pods lately: after a StatefulSet or a Deployment creates them, they can take up to 5 minutes before they start being deployed.
Do you have any insight into what is going on, or into what I could do about it? Please note that my Kubernetes cluster is managed, so I have no control over the underlying system; as a customer, I only have cluster-admin privileges. But it would still greatly help if I could point the administrators in the right direction.
Thanks in advance!
Update: thanks a lot for your answers!
It turned out that we will probably never get the final word on what happened. The cluster provider simply restarted the kube-apiserver, and this fixed the issue.
I suppose that something went wrong, such as a caching problem or another transient failure, which cannot be pinned down as a reproducible error.
To give a little more data to future readers: the error occurred on a Kubernetes cluster managed by OVH, whose particularity is to run the control plane itself as Pods deployed in a master Kubernetes cluster on their side.