GKE Cluster can't pull (ErrImagePull) from GCR Registry in same project (GitLab Kubernetes Integration): Why?

1/4/2019

So after googling a little bit (results that are polluted by people having trouble with Pull Secrets) I am posting this here, and to GCP Support (will update as I hear back).

I created a Cluster from GitLab Kubernetes integration (docs: https://about.gitlab.com/solutions/kubernetes) within the same project as my GCR registry / images.

When I add a new service / deployment to this Cluster using kubectl (one that relies on a private image within the GCR Registry in this project), the pods in the GitLab-created cluster fail to pull from GCR with: ErrImagePull.
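
For illustration, reproducing the failure looks roughly like this (the image path and pod name are placeholders):

kubectl run my-app --image=gcr.io/[PROJECT_ID]/[IMAGE]:[TAG]
kubectl get pods
# STATUS shows ErrImagePull, then ImagePullBackOff
kubectl describe pod [POD_NAME]
# Events show the image pull failing against gcr.io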

To be clear — I am NOT pulling from a GitLab private registry, I am attempting to pull from a GCR Registry within the same project as the GKE cluster created from GitLab (which should not require a Pull Secret).

Other Clusters (created from GCP console) within this project can properly access the same image so my thinking is that there is some difference between Clusters created via an API (in this case from GitLab) vs Clusters created from the GCP console.

I am hoping someone has run into this in the past — or can explain the differences in the Service Accounts etc that could be causing the problem.

I am going to attempt to create a service account and manually grant it Project Viewer role to see if that solves the problem.
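
For reference, that grant looks roughly like this (project ID and service account email are placeholders):

gcloud projects add-iam-policy-binding [PROJECT_ID] \
    --member serviceAccount:[SERVICE_ACCOUNT_EMAIL] \
    --role roles/viewer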

Update: a manually configured Service Account did not solve the issue.

Note: I am trying to pull an image into the Cluster, NOT into a GitLab Runner that is running on the Cluster. I.e. I want a separate Service / Deployment running alongside my GitLab infrastructure.

-- Necevil
gitlab
google-cloud-platform
google-kubernetes-engine
kubernetes

2 Answers

1/5/2019

TL;DR: Clusters created by the GitLab CI Kubernetes integration will not be able to pull an image from a GCR Registry in the same project as the cluster without modifying the Node permissions (scopes).

While you CAN manually modify the permissions on an individual Node machine to grant the Application Default Credentials (see: https://developers.google.com/identity/protocols/application-default-credentials) the proper scopes in real time, doing it this way means that if your Node is re-created at some point in the future it WOULD NOT have your modified scopes and things would break.
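
For the curious, the per-node approach described in the articles linked below looks roughly like this (a sketch; instance name, zone, service account, and scopes are placeholders, and the instance must be stopped first):

gcloud compute instances stop [NODE_INSTANCE_NAME] --zone [ZONE]
gcloud compute instances set-service-account [NODE_INSTANCE_NAME] --zone [ZONE] \
    --service-account [EXISTING_NODE_SA_EMAIL] \
    --scopes [YOUR_NEW_SCOPES]
gcloud compute instances start [NODE_INSTANCE_NAME] --zone [ZONE]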

Instead of modifying the permissions manually — create a new Node pool that has the proper Scope(s) to access your required GCP services.

Here are some resources I used for reference:

  1. https://medium.com/google-cloud/updating-google-container-engine-vm-scopes-with-zero-downtime-50bff87e5f80
  2. https://adilsoncarvalho.com/changing-a-running-kubernetes-cluster-permissions-a-k-a-scopes-3e90a3b95636

Creating a properly scoped Node Pool generally looks like this:

gcloud container node-pools create [new pool name] \
 --cluster [cluster name] \
 --machine-type [your desired machine type] \
 --num-nodes [same-number-nodes] \
 --scopes [your new set of scopes]

If you aren't sure what the names of your required Scopes are, you can see a full list of Scopes AND Scope Aliases over here: https://cloud.google.com/sdk/gcloud/reference/container/node-pools/create

For me I went with gke-default (same as my other cluster) and sql-admin. The reason being that I need to access an SQL Database in Cloud SQL during part of my build, and I don't want to have to connect over a public IP to do that.
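
Concretely, that works out to something like this (a sketch; the pool name, machine type, and node count are placeholders):

gcloud container node-pools create [NEW_POOL_NAME] \
    --cluster [CLUSTER_NAME] \
    --machine-type [MACHINE_TYPE] \
    --num-nodes [NUM_NODES] \
    --scopes gke-default,sql-admin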

gke-default Scopes (for reference)

  1. https://www.googleapis.com/auth/devstorage.read_only (allows you to pull)
  2. https://www.googleapis.com/auth/logging.write
  3. https://www.googleapis.com/auth/monitoring
  4. https://www.googleapis.com/auth/service.management.readonly
  5. https://www.googleapis.com/auth/servicecontrol
  6. https://www.googleapis.com/auth/trace.append

Contrast the above with the more locked-down permissions of a GitLab CI created cluster, which has ONLY these two: https://www.googleapis.com/auth/logging.write and https://www.googleapis.com/auth/monitoring.

Obviously configuring your cluster with ONLY the minimum permissions needed is for sure the way to go here. Once you figure out what that is and create your new properly scoped Node Pool...

List your nodes with:

kubectl get nodes

The one you just created (most recent) has the new settings, while the older one is the default pool created by GitLab that can't pull from GCR.
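
If you'd rather not go by age, GKE nodes carry a label naming their pool, so you can filter directly (pool name is a placeholder):

kubectl get nodes -l cloud.google.com/gke-nodepool=[NEW_POOL_NAME]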

Then:

kubectl cordon [your-node-name-here]

After that you want to drain:

kubectl drain [your-node-name-here] --force

I ran into issues where, because I had a GitLab Runner installed, I couldn't drain the pods normally due to the local data / DaemonSet that was used to control it.
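
If you hit the same thing, kubectl drain has flags for exactly this situation that may get you past it (check your kubectl version; note that emptyDir data is deleted and DaemonSet pods are left in place):

kubectl drain [your-node-name-here] --force --ignore-daemonsets --delete-local-data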

For that reason, once I cordoned my Node I just deleted the node via kubectl (not sure if this will cause problems, but it was fine for me). Once your node is deleted you need to delete the 'default-pool' node pool created by GitLab.
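
That deletion is just (node name is a placeholder):

kubectl delete node [your-node-name-here]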

List your node-pools:

gcloud container node-pools list --cluster [CLUSTER_NAME]

See the old scopes on the pool created by GitLab:

gcloud container node-pools describe default-pool \
    --cluster [CLUSTER_NAME]

Check to see if you have the correct new scopes (that you just added):

gcloud container node-pools describe [NEW_POOL_NAME] \
    --cluster [CLUSTER_NAME]

If your new Node Pool has the right scopes, you can now delete the default pool with:

gcloud container node-pools delete default-pool \
    --cluster [CLUSTER_NAME] --zone [ZONE]

In my personal case I am still trying to figure out how to allow access to the private network (i.e. get to Cloud SQL via private IP) but I can pull my images now, so I am halfway there.

I think that's it — hope it saved you a few minutes!

-- Necevil
Source: StackOverflow

1/4/2019

TL;DR: Clusters created by the GitLab CI Kubernetes integration will not be able to pull an image from a GCR Registry in the same project as the cluster without modifying the Node permissions (scopes).

By default, the Nodes of a cluster created through GitLab CI's Kubernetes integration are given minimal permissions (scopes) to Google Cloud services.

You can see this visually from the GCP console dashboard for your cluster: scroll down to the Permissions section and look for "Storage":

(screenshot: the cluster's Permissions section in the GCP console)
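
You can check the same thing from the command line (a sketch; cluster name and zone are placeholders):

gcloud container clusters describe [CLUSTER_NAME] --zone [ZONE] \
    --format="value(nodeConfig.oauthScopes)"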

This essentially means that the Node(s) running within your GitLab CI Kubernetes integration cluster WILL NOT have the default GCR Registry (read-only) permissions necessary to pull an image from a GCR Registry.

It also means (as far as I can tell) that even if you grant a Service Account proper access to the GCR Registry it still will not work. I'm not totally sure I set my Service Account up properly, but I believe I did.

Great.

How to fix Permissions

Basically you have two options. The first is to create a Cluster outside of the GitLab Kubernetes integration (i.e. from the GCP console or CLI) and then connect your GitLab project to THAT Cluster by following the manual 'connect to an existing Cluster' directions found here: https://gitlab.com/help/user/project/clusters/index#adding-an-existing-kubernetes-cluster
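
Gathering what that form asks for is mostly kubectl one-liners (a sketch; the secret name is a placeholder, and the exact fields GitLab wants may vary by version):

kubectl cluster-info
# first line shows the API URL
kubectl get secret [SECRET_NAME] -o jsonpath="{['data']['ca\.crt']}" | base64 --decode
# the cluster CA certificate
kubectl get secret [SECRET_NAME] -o jsonpath="{.data.token}" | base64 --decode
# the service account token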

The second option is to modify your permissions (scopes), which is more complicated; my other answer above walks through that process.

-- Necevil
Source: StackOverflow