I have a Google Kubernetes Engine cluster which until recently was happily pulling private container images from a Google Container Registry bucket. I haven't changed anything, but now when I update my Kubernetes Deployments, it's unable to launch new pods, and I get the following events:
Normal Pulling 14s kubelet, <node-id> pulling image "gcr.io/cloudsql-docker/gce-proxy:latest"
Normal Pulling 14s kubelet, <node-id> pulling image "gcr.io/<project-id>/backend:62d634e"
Warning Failed 14s kubelet, <node-id> Failed to pull image "gcr.io/<project-id>/backend:62d634e": rpc error: code = Unknown desc = unauthorized: authentication required
Warning Failed 14s kubelet, <node-id> Error: ErrImagePull
Normal Pulled 13s kubelet, <node-id> Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:latest"
Normal Created 13s kubelet, <node-id> Created container
Normal Started 13s kubelet, <node-id> Started container
Normal BackOff 11s (x2 over 12s) kubelet, <node-id> Back-off pulling image "gcr.io/<project-id>/backend:62d634e"
Warning Failed 11s (x2 over 12s) kubelet, <node-id> Error: ImagePullBackOff
I've checked that the node pool's OAuth scopes include storage-ro and that the service account has the storage permission it needs, and both seem to be as they should. I've also tried disabling and re-enabling the container.googleapis.com and containerregistry.googleapis.com services, but that doesn't help.
The Google documentation for the Container Registry states:
Kubernetes Engine clusters are automatically configured with access to pull private images from the Container Registry in the same project. You do not need to follow additional steps to configure authentication if the registry and the cluster are in the same Cloud project.
But this doesn't seem to be the case.
Can anyone shed additional light on what might be going on? Or additional steps to try?
In my case, the issue turned out to be that the node pools generated by a minimal spec file are missing the oauth2 scopes that give access to the registry. Adding
nodePools:
- config:
    oauthScopes:
    - https://www.googleapis.com/auth/devstorage.read_only
    - https://www.googleapis.com/auth/servicecontrol
    - https://www.googleapis.com/auth/service.management.readonly
    - https://www.googleapis.com/auth/trace.append
to my spec fixed things. I think it's the devstorage scope that's the important one, but I'm not sure, since I just copy-pasted the whole list of scopes from the spec the web console generates.
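If you manage the node pool with Terraform rather than a spec file, the equivalent fix is to set the same scopes in the node pool's node_config. This is only a rough sketch, and the resource, pool, and cluster names are assumptions:

resource "google_container_node_pool" "default" {
  name       = "default-pool"   # placeholder
  cluster    = "my-cluster"     # placeholder
  node_count = 1

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/servicecontrol",
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/trace.append",
    ]
  }
}

Of these, devstorage.read_only is indeed the one that covers reading the Cloud Storage bucket behind the registry.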
Ok, this turned out to be tricky, but the cause was this:
I used Terraform to set the service account for the nodes in the GKE cluster, but instead of using the email output of the google_service_account resource to specify the service account, I used the unique_id output instead. This was accepted fine by both Terraform and the Google Cloud API.
When Kubernetes (and other things) tried to access the internal metadata API on each node to get a token it could use, it received a 403 status with the message Service account is invalid/disabled.
Recreating the node pool with the correctly specified service account fixed the problem.
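For concreteness, a minimal sketch of the corrected reference (resource, pool, and cluster names here are assumptions):

resource "google_service_account" "gke_nodes" {
  account_id   = "gke-nodes"
  display_name = "GKE node service account"
}

resource "google_container_node_pool" "primary" {
  name       = "primary-pool"   # placeholder
  cluster    = "my-cluster"     # placeholder
  node_count = 1

  node_config {
    # Reference the email output, which resolves to something like
    # gke-nodes@<project-id>.iam.gserviceaccount.com.
    service_account = "${google_service_account.gke_nodes.email}"

    # Referencing unique_id instead (a bare numeric ID) is accepted by both
    # Terraform and the API, but the node's metadata server will then refuse
    # to issue tokens, as described above.
    # service_account = "${google_service_account.gke_nodes.unique_id}"
  }
}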
I got the same issue when I created a cluster with Terraform. At first I only specified service_account in node_config, so the node pool was created with too narrow a set of OAuth scopes. With both service_account and oauth_scopes set explicitly, as below, the nodes are able to pull images from private GCR repositories.
resource "google_container_node_pool" "primary_preemptible_nodes" {
node_config {
service_account = "${google_service_account.gke_nodes.email}"
oauth_scopes = [
"storage-ro",
"logging-write",
"monitoring"
]
}
}
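One related caveat: the oauth_scopes above (storage-ro expands to devstorage.read_only) only limit what the node's credentials may do; the service account itself still needs IAM read access to the Cloud Storage bucket backing the registry. For gcr.io/<project-id> images that bucket is normally artifacts.<project-id>.appspot.com. A sketch of that binding, with the bucket name as an assumption:

resource "google_storage_bucket_iam_member" "gke_nodes_gcr_read" {
  # Bucket backing gcr.io/<project-id>; regional hosts such as eu.gcr.io use
  # <region>.artifacts.<project-id>.appspot.com instead.
  bucket = "artifacts.my-project-id.appspot.com"
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.gke_nodes.email}"
}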