I recently upgraded my GKE cluster from 1.10.x to 1.11.x. Since then, my calico-node
pods fail to connect to the etcd cluster and end up in CrashLoopBackOff
because the livenessProbe fails.
I noticed that the calico-etcd DaemonSet has a desired count of 0 and was wondering about that. Its nodeSelector is node-role.kubernetes.io/master=.
From the logs of one of the calico-node pods:
2018-12-19 19:18:28.989 [INFO][7] etcd.go 373: Unhandled error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
2018-12-19 19:18:28.989 [INFO][7] startup.go 254: Unable to query node configuration Name="gke-brokerme-ubuntu-pool-852d0318-j5ft" error=client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
State of the DaemonSets:
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
calico-etcd   0         0         0       0            0           node-role.kubernetes.io/master=   3d
calico-node   2         2         0       2            0           <none>                            3d
k get nodes --show-labels:
NAME STATUS ROLES AGE VERSION LABELS
gke-brokerme-ubuntu-pool-852d0318-7v4m Ready <none> 4d v1.11.5-gke.5 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-nodepool=ubuntu-pool,cloud.google.com/gke-os-distribution=ubuntu,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=gke-brokerme-ubuntu-pool-852d0318-7v4m,os=ubuntu
gke-brokerme-ubuntu-pool-852d0318-j5ft Ready <none> 1h v1.11.5-gke.5 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/fluentd-ds-ready=true,beta.kubernetes.io/instance-type=n1-standard-2,beta.kubernetes.io/os=linux,cloud.google.com/gke-nodepool=ubuntu-pool,cloud.google.com/gke-os-distribution=ubuntu,failure-domain.beta.kubernetes.io/region=europe-west1,failure-domain.beta.kubernetes.io/zone=europe-west1-b,kubernetes.io/hostname=gke-brokerme-ubuntu-pool-852d0318-j5ft,os=ubuntu
I did not modify any Calico manifests; they should be exactly as provisioned by GKE.
I would expect the calico-node pods to connect either to the etcd of my Kubernetes cluster or to a calico-etcd provisioned by the DaemonSet. As there is no master node that I can control in GKE, I kind of get why calico-etcd has a desired count of 0, but then which etcd are the calico-node pods supposed to connect to? What's wrong with my small and basic setup?
We are aware of the issue of Calico crash-looping in GKE 1.11.x. You can fix it by upgrading to a newer version. I would recommend upgrading to version '1.11.4-gke.12' or '1.11.3-gke.23', neither of which has this issue.
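For reference, the upgrade could be sketched roughly as follows. This is only an illustration under assumptions: `my-cluster` is a hypothetical placeholder for the cluster name, the zone and node pool are read off the node labels in the question, and the `run` helper with its `DRY_RUN` switch is made up for this sketch so that the commands are printed rather than executed by default. Check which versions are actually offered in your zone before picking a target.

```shell
#!/usr/bin/env bash
# Sketch only: upgrade the master, then the node pool, to a version named in the answer.
set -euo pipefail

CLUSTER="${CLUSTER:-my-cluster}"     # hypothetical placeholder, replace with your cluster name
ZONE="${ZONE:-europe-west1-b}"       # taken from the zone label on the nodes in the question
POOL="${POOL:-ubuntu-pool}"          # taken from the gke-nodepool label in the question
TARGET="${TARGET:-1.11.4-gke.12}"    # one of the two versions recommended above
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands, 0 = execute them

# Helper for this sketch: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_cluster() {
  # List the master and node versions currently available in the zone.
  run gcloud container get-server-config --zone "$ZONE"
  # Upgrade the master first, then the node pool.
  run gcloud container clusters upgrade "$CLUSTER" --zone "$ZONE" --master --cluster-version "$TARGET"
  run gcloud container clusters upgrade "$CLUSTER" --zone "$ZONE" --node-pool "$POOL" --cluster-version "$TARGET"
  # Afterwards, check whether the calico-node DaemonSet reports ready pods.
  run kubectl get daemonsets --all-namespaces
}

upgrade_cluster
```

Running the script as is prints the four commands; setting `DRY_RUN=0` executes them, and gcloud will still ask for confirmation before each upgrade.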