Kubernetes Engine unable to pull image from non-private / GCR repository

4/27/2018

I had been happily deploying to Kubernetes Engine for a while, but while working on an integrated Cloud Container Builder pipeline I started running into trouble.

I don't know what changed. I can no longer deploy to Kubernetes, even in ways that worked before without Cloud Builder.

The pod rollout gives an error indicating that it is unable to pull from the registry, which seems odd because the images exist (I can pull them from the CLI) and I have granted every possibly related permission to my user and the Cloud Builder service account.
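
For reference, pulling the image from my workstation looks roughly like this (the tag below is the one from the error; running gcloud auth configure-docker and then a plain docker pull also works):

gcloud docker -- pull gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988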

I get the error ImagePullBackOff and see this in the pod events:

Failed to pull image "gcr.io/my-project/backend:f4711979-eaab-4de1-afd8-d2e37eaeb988": rpc error: code = Unknown desc = unauthorized: authentication required
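
For context, that message comes from the pod's event list, which I read with something like the following (the pod name is a placeholder):

kubectl get pods

kubectl describe pod <backend-pod-name>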

What's going on? Who needs authorization, and for what?

-- Thijs Koerselman
google-cloud-platform
google-kubernetes-engine

2 Answers

9/5/2018

In my case, my cluster didn't have the Storage read permission, which is necessary for GKE to pull an image from GCR.

My cluster didn't have the proper permissions because I had created it through Terraform and didn't include the node_config.oauth_scopes block. When you create a cluster through the Cloud Console, the Storage read permission is added by default.
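
For reference, you can check which scopes an existing cluster's nodes have, or pass the storage read scope explicitly when creating a cluster, with something like this (cluster name and zone are placeholders):

gcloud container clusters describe my-cluster --zone us-central1-a --format="value(nodeConfig.oauthScopes)"

gcloud container clusters create my-cluster --zone us-central1-a --scopes=https://www.googleapis.com/auth/devstorage.read_only

In Terraform, the fix amounts to adding that same https://www.googleapis.com/auth/devstorage.read_only scope to the node_config.oauth_scopes list.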

-- Scott Jungwirth
Source: StackOverflow

5/4/2018

The credentials in my project somehow got messed up. I solved the problem by re-initializing a few APIs including Kubernetes Engine, Deployment Manager and Container Builder.

The first time I tried this I didn't succeed, because to disable an API you first have to disable all of the APIs that depend on it. If you do this via the Google Cloud web UI, you'll likely find that not all of those dependent services can be disabled from the UI.

I learned that with the gcloud CLI you can list all of your project's APIs and disable everything properly.
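
For example, listing all of the project's enabled APIs looks something like this:

gcloud services list --enabled --project=$PROJECT_ID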

Things worked after that.

The reason I knew things were messed up is that I had a copy of the same setup running as a production environment, and there these problems did not exist. The development environment had gone through a lot of iterations and fiddling with credentials, so somewhere along the way things got corrupted.

These are some examples of useful commands:

gcloud projects get-iam-policy $PROJECT_ID

gcloud services disable container.googleapis.com --verbosity=debug

gcloud services enable container.googleapis.com

More info here, including how to restore service account credentials.

-- Thijs Koerselman
Source: StackOverflow