Kubernetes intermittent network failures with anonymous-auth=false

5/15/2018

We have a Kubernetes cluster (v1.10.2) that behaves very strangely when we set the "anonymous-auth" flag to false. We use the flannel network plugin, and the logs from the flannel container related to this issue are the following:

E0515 09:59:13.396856       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=5, ErrCode=NO_ERROR, debug=""
E0515 09:59:13.397503       1 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:295: Failed to watch *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=167760&timeoutSeconds=469&watch=true: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0515 09:59:14.398383       1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:295: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0515 09:59:15.419773       1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:295: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0515 09:59:16.420411       1 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:295: Failed to list *v1.Node: Get https://10.96.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused

Also, the events (kubectl get events) related to this issue seem to be:

kube-system ........ kube-apiserver-{hash} Pod spec.containers{kube-apiserver} Warning Unhealthy kubelet, {...} Liveness probe failed: HTTP probe failed with statuscode: 401

This problem seems to happen randomly, every few minutes. While it lasts, the cluster is unresponsive ("The connection to the server xx.xx.xx.xx was refused - did you specify the right host or port?") for a few minutes (again of random length) and then recovers by itself.
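
For reference, we disable anonymous authentication via the kube-apiserver command-line flag in the static pod manifest. The excerpt below is only a rough illustration of how the flag appears in a kubeadm-style manifest, not our full configuration:

# /etc/kubernetes/manifests/kube-apiserver.yaml (illustrative excerpt)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --anonymous-auth=false
    # ... other flags omitted ...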

-- Adi Fatol
flannel
kubernetes
networking

1 Answer

5/16/2018

Are you running the apiserver as a static pod? Does it have a liveness check defined that calls the /healthz endpoint? If so, that probe is likely made as an anonymous user and fails when anonymous requests are disabled, causing the kubelet to restart the apiserver pod.
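
In a kubeadm-provisioned cluster, the apiserver static pod manifest (typically /etc/kubernetes/manifests/kube-apiserver.yaml) usually carries a probe roughly like the following; the exact host, port, and timings can differ in your setup:

livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 6443
    scheme: HTTPS
  failureThreshold: 8
  initialDelaySeconds: 15
  timeoutSeconds: 15

The kubelet's httpGet probe sends no credentials, so with --anonymous-auth=false the request is rejected with 401 (matching the "Liveness probe failed: HTTP probe failed with statuscode: 401" event above). After enough failures the kubelet restarts the apiserver container, which would explain the intermittent "connection refused" errors. You can reproduce what the probe sees from the master node:

curl -k https://127.0.0.1:6443/healthz
# returns 401 Unauthorized when anonymous requests are disabled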

-- Jordan Liggitt
Source: StackOverflow