Kubernetes cluster cannot attach and mount automatically created Google Cloud Platform disks to worker nodes

5/29/2019

Basically, I'm trying to deploy a cluster on GCE via kubeadm with StorageClass support (without using Google Kubernetes Engine).

Say I deployed a cluster with a master node in Tokyo and three worker nodes in Hong Kong, Taiwan, and Oregon.

NAME              STATUS   ROLES    AGE     VERSION
k8s-node-hk       Ready    <none>   3h35m   v1.14.2
k8s-node-master   Ready    master   3h49m   v1.14.2
k8s-node-oregon   Ready    <none>   3h33m   v1.14.2
k8s-node-tw       Ready    <none>   3h34m   v1.14.2

Both kube-controller-manager and kubelet are started with --cloud-provider=gce. I can now apply a StorageClass and a PersistentVolumeClaim, a disk gets created automatically on GCP (say, a disk in Taiwan), and the PV and PVC become bound.
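
For reference, the flags were passed in through kubeadm; the sketch below shows roughly how, with the file paths and the exact ClusterConfiguration layout being illustrative assumptions rather than a verbatim copy of my setup:

# kubeadm ClusterConfiguration (v1beta1, as used by v1.14) -- sketch only
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    cloud-provider: gce
    # optional: only needed if a cloud config file is used
    cloud-config: /etc/kubernetes/gce.conf
  extraVolumes:
  - name: cloud-config
    hostPath: /etc/kubernetes/gce.conf
    mountPath: /etc/kubernetes/gce.conf
    readOnly: true

On each node, the kubelet gets the same flag via KUBELET_EXTRA_ARGS (e.g. in /etc/default/kubelet on Debian-based systems) before restarting the service.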

kubectl get pvc:

NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
eth-pvc   Bound    pvc-bf35e3c9-81e2-11e9-8926-42010a920002   10Gi       RWO            slow           137m

kubectl get pv:

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
pvc-bf35e3c9-81e2-11e9-8926-42010a920002   10Gi       RWO            Delete           Bound    default/eth-pvc   slow                    137m

However, kube-controller-manager cannot find the disk (which lives in Taiwan) and attach it to a node in the same zone. It logged the following (note that the zone asia-northeast1-a it searched is not correct):

I0529 07:25:46.366500       1 reconciler.go:288] attacherDetacher.AttachVolume started for volume "pvc-bf35e3c9-81e2-11e9-8926-42010a920002" (UniqueName: "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-bf35e3c9-81e2-11e9-8926-42010a920002") from node "k8s-node-tokyo" 
E0529 07:25:47.296824       1 attacher.go:102] Error attaching PD "kubernetes-dynamic-pvc-bf35e3c9-81e2-11e9-8926-42010a920002" to node "k8s-node-tokyo": GCE persistent disk not found: diskName="kubernetes-dynamic-pvc-bf35e3c9-81e2-11e9-8926-42010a920002" zone="asia-northeast1-a"
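
The disk itself does exist, just in the zone pinned by the StorageClass (asia-east1-a) rather than in the zone being searched. This can be checked with gcloud, for example:

gcloud compute disks list --filter="name~kubernetes-dynamic-pvc"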

The kubelet on each node was started with --cloud-provider=gce, but I couldn't find out how to configure the zone. When I checked the kubelet's log, I found that this flag is already deprecated in Kubernetes v1.14.2 (the latest release as of May 2019):

May 29 04:36:03 k8s-node-tw kubelet[29971]: I0529 04:36:03.623704   29971 server.go:417] Version: v1.14.2
May 29 04:36:03 k8s-node-tw kubelet[29971]: W0529 04:36:03.624039   29971 plugins.go:118] WARNING: gce built-in cloud provider is now deprecated. The GCE provider is deprecated and will be removed in a future release

However, kubelet did label the k8s-node-tw node with the correct zone and region:

May 29 04:36:05 k8s-node-tw kubelet[29971]: I0529 04:36:05.157665   29971 kubelet_node_status.go:331] Adding node label from cloud provider: beta.kubernetes.io/instance-type=n1-standard-1
May 29 04:36:05 k8s-node-tw kubelet[29971]: I0529 04:36:05.158093   29971 kubelet_node_status.go:342] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=asia-east1-a
May 29 04:36:05 k8s-node-tw kubelet[29971]: I0529 04:36:05.158201   29971 kubelet_node_status.go:346] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=asia-east1
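
These zone and region labels can be confirmed directly on the node objects, for example:

kubectl get nodes -L failure-domain.beta.kubernetes.io/zone -L failure-domain.beta.kubernetes.io/region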

Thanks for reading this far. My question is:

If it's possible at all, how can I configure kubelet or kube-controller-manager correctly so that GCP StorageClasses are supported and the automatically created disks get attached and mounted successfully?

================== K8s config files ======================

Deployment (related part):

  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: eth-pvc
  - name: config
    configMap:
      name: {{ template "ethereum.fullname" . }}-geth-config
  - name: account
    secret:
      secretName: {{ template "ethereum.fullname" . }}-geth-account

PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: eth-pvc
spec:
  storageClassName: slow
  resources:
    requests:
      storage: 10Gi
  accessModes:
    - ReadWriteOnce

SC:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: none
  zone: asia-east1-a
-- kigawas
google-cloud-platform
google-compute-engine
kubeadm
kubelet
kubernetes

1 Answer

5/30/2019

After several days' research, I found that the reason is:

  1. Master node is in asia-northeast1-a (Tokyo)
  2. Worker nodes are in asia-east1-a (Taiwan) and other zones
  3. cloud-provider-gcp only searches for disks in the zones of one region (normally the master node's region; the zone can be specified by setting local-zone in the cloud config file), which means it can only support one zone, or multiple zones within one region, by default (see the config sketch after this list)
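
For illustration, the cloud config file mentioned in point 3 is an INI-style file passed to kube-controller-manager via --cloud-config; a minimal sketch with placeholder values, as far as I understand the legacy GCE provider's [global] section:

[global]
# zone whose region the provider treats as "local"
local-zone = asia-northeast1-a
# when true, search all zones of that one region (still only one region)
multizone = true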

Conclusion:

To support multiple zones across multiple regions, we need to modify the GCE provider code or its configuration, e.g. add another field to configure which zones should be searched.

========================== UPDATE =========================

I modified the k8s code to add an extra-zones config field (like this diff on GitHub) to make it work for my use case.
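
With that patch in place, the idea is that the additional zones to search can be listed in the cloud config file, roughly like this (hypothetical values; the exact field name and syntax depend on the patch):

[global]
local-zone = asia-northeast1-a
# hypothetical field added by the patch: additional zones to search for disks
extra-zones = asia-east1-a,us-west1-a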

-- kigawas
Source: StackOverflow