When I create a GKE Cluster with a route through a NAT I am unable to pull a docker image because of permission issues

11/13/2018

I can create a regular GKE cluster, pull the Docker image I need, and get it running. When I create the GKE cluster with a routing rule through a NAT, my user no longer has permission to pull the Docker image.

I start the cluster with these settings:

resources:
######## Network ############
- name: gke-nat-network
  type: compute.v1.network
  properties:
    autoCreateSubnetworks: false
######### Subnets ##########
######### For Cluster #########
- name: gke-cluster-subnet
  type: compute.v1.subnetwork
  properties:
    network: $(ref.gke-nat-network.selfLink)
    ipCidrRange: 172.16.0.0/12
    region: us-east1
########## NAT Subnet ##########
- name: nat-subnet
  type: compute.v1.subnetwork
  properties:
    network: $(ref.gke-nat-network.selfLink)
    ipCidrRange: 10.1.1.0/24
    region: us-east1
########## NAT VM ##########
- name: nat-vm
  type: compute.v1.instance
  properties:
    zone: us-east1-b
    canIpForward: true
    tags:
      items:
      - nat-to-internet
    machineType: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/zones/us-east1-b/machineTypes/f1-micro
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-7-wheezy-v20150423
    networkInterfaces:
    - network: projects/{{ env["project"] }}/global/networks/gke-nat-network
      subnetwork: $(ref.nat-subnet.selfLink)
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
    metadata:
      items:
      - key: startup-script
        value: |
          #!/bin/sh
          # ---------------------------
          # Install tcpdump
          # Start NAT; start dump
          # ---------------------------
          apt-get update
          apt-get install -y tcpdump
          apt-get install -y tcpick
          iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
          nohup tcpdump -e -l -i eth0 -w /tmp/nat.pcap &
          nohup tcpdump -e -l -i eth0 > /tmp/nat.txt &
          echo 1 | tee /proc/sys/net/ipv4/ip_forward
########## FIREWALL RULES FOR NAT VM ##########
- name: nat-vm-firewall
  type: compute.v1.firewall
  properties:
    allowed:
    - IPProtocol: tcp
      ports: []
    sourceTags:
    - route-through-nat
    network: $(ref.gke-nat-network.selfLink)
- name: nat-vm-ssh
  type: compute.v1.firewall
  properties:
    allowed:
    - IPProtocol: tcp
      ports: [22]
    sourceRanges:
    - 0.0.0.0/0
    network: $(ref.gke-nat-network.selfLink)
########## GKE CLUSTER CREATION ##########
- name: nat-gke-cluster
  type: container.v1.cluster
  metadata:
    dependsOn:
    - gke-nat-network
    - gke-cluster-subnet
  properties:
    cluster:
      name: nat-gke-cluster
      initialNodeCount: 1
      network: gke-nat-network
      subnetwork: gke-cluster-subnet
      nodeConfig:
        machineType: n1-standard-4
        tags:
        - route-through-nat
    zone: us-east1-b
########## GKE MASTER ROUTE ##########
- name: master-route
  type: compute.v1.route
  properties:
    destRange: $(ref.nat-gke-cluster.endpoint)
    network: $(ref.gke-nat-network.selfLink)
    nextHopGateway: projects/{{ env["project"] }}/global/gateways/default-internet-gateway
    priority: 100
    tags:
    - route-through-nat
########## NAT ROUTE ##########
- name: gke-cluster-route-through-nat
  metadata:
    dependsOn:
    - nat-gke-cluster
    - gke-nat-network
  type: compute.v1.route
  properties:
    network: $(ref.gke-nat-network.selfLink)
    destRange: 0.0.0.0/0
    description: "route all other traffic through nat"
    nextHopInstance: $(ref.nat-vm.selfLink)
    tags:
    - route-through-nat
    priority: 800

When I try to pull and start a Docker image, the pod fails with an ImagePullBackOff error.

When I do kubectl describe pod I get:

Failed to pull image : rpc error: code = Unknown desc = unauthorized: authentication required

Edit:

I have found out that the default access scopes for new clusters have changed since v1.10: https://cloud.google.com/kubernetes-engine/docs/how-to/access-scopes

Basically, certain scopes are no longer granted by default to these clusters, including the read-only access to Google Cloud Storage that is needed to pull an image from the container registry.

I am still having trouble figuring out how to assign these scopes while using

gcloud deployment-manager deployments create gke-with-nat --config gke-with-nat-route.yml

-- Apothan
docker
google-cloud-platform
google-kubernetes-engine
kubernetes

1 Answer

11/20/2018

So the reason the container images were not pulling is that GKE has changed how it handles permissions for new clusters. It used to grant the 'storage-ro' scope to new clusters, allowing them to pull container images from the container registry. See https://cloud.google.com/kubernetes-engine/docs/how-to/access-scopes.

I had to add scopes to the YAML cluster deployment, since I create my deployment using

gcloud deployment-manager deployments create gke-with-nat --config gke-with-nat-route.yml

The new YAML includes these settings under the cluster's nodeConfig:

nodeConfig:
  serviceAccount: thisuser@project-id.iam.gserviceaccount.com
  oauthScopes:
  - https://www.googleapis.com/auth/devstorage.read_only
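For reference, here is roughly where those settings slot into the cluster resource from the question's Deployment Manager config. This is a sketch, not a verified deployment; the service account address is a placeholder:

```yaml
- name: nat-gke-cluster
  type: container.v1.cluster
  properties:
    zone: us-east1-b
    cluster:
      name: nat-gke-cluster
      initialNodeCount: 1
      network: gke-nat-network
      subnetwork: gke-cluster-subnet
      nodeConfig:
        machineType: n1-standard-4
        tags:
        - route-through-nat
        # Read-only access to Google Cloud Storage, which backs the
        # container registry, so the nodes can pull images again.
        oauthScopes:
        - https://www.googleapis.com/auth/devstorage.read_only
```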

If you are using gcloud container clusters create directly, I think you can pass the scopes on the command line:

gcloud container clusters create example-cluster --scopes scope1,scope2

If you are using the web UI, I think you can opt back into the legacy access scopes with a checkbox. I am not sure how long this will be supported.
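One way to sanity-check the result is to look at nodeConfig.oauthScopes in the output of gcloud container clusters describe nat-gke-cluster --format json. A small sketch of that check against an illustrative excerpt of the describe output (the JSON below is made up for the example, not real output):

```python
import json

# Illustrative excerpt of `gcloud container clusters describe --format json`
# output; only the fields relevant to the scope check are shown.
sample = """
{
  "name": "nat-gke-cluster",
  "nodeConfig": {
    "machineType": "n1-standard-4",
    "oauthScopes": [
      "https://www.googleapis.com/auth/devstorage.read_only"
    ]
  }
}
"""

cluster = json.loads(sample)
scopes = cluster["nodeConfig"].get("oauthScopes", [])

# The read-only storage scope must be present for nodes to pull from
# the container registry.
has_storage_ro = "https://www.googleapis.com/auth/devstorage.read_only" in scopes
print(has_storage_ro)
```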

-- Apothan
Source: StackOverflow