Kubelet fails to update node status

7/4/2016

On the latest RHEL Atomic Host (Kubernetes 1.2), we regularly see the following entry in the kubelet logs:

kubelet.go:2761] Error updating node status, will retry: nodes "x.y.z" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

This causes the node to temporarily go NotReady. During these NotReady periods the pods on the node still show as Ready, but Kubernetes appears to stop routing traffic to them, which is a problem for us.

In the Go sources I can see that during its heartbeat the kubelet does a GET to fetch the latest node object, overwrites the status with its own, and sends a PUT back to the apiserver.

This is what we see in the logs:

Jul 15 12:42:45 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:42:45.086322 3736 round_trippers.go:264] GET https://lxa160g.srv.pl.ing.net:6443/api/v1/nodes/lxa160j.srv.pl.ing.net
Jul 15 12:42:45 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:42:45.091579 3736 round_trippers.go:289] Response Status: 200 OK in 5 milliseconds
Jul 15 12:42:45 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:42:45.373091 3736 round_trippers.go:264] PUT https://lxa160g.srv.pl.ing.net:6443/api/v1/nodes/lxa160j.srv.pl.ing.net/status
Jul 15 12:42:45 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:42:45.409752 3736 round_trippers.go:289] Response Status: 200 OK in 36 milliseconds
Jul 15 12:42:55 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:42:55.411267 3736 round_trippers.go:264] GET https://lxa160g.srv.pl.ing.net:6443/api/v1/nodes/lxa160j.srv.pl.ing.net
Jul 15 12:42:55 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:42:55.431056 3736 round_trippers.go:289] Response Status: 200 OK in 19 milliseconds
Jul 15 12:43:38 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:43:38.020203 3736 round_trippers.go:264] PUT https://lxa160g.srv.pl.ing.net:6443/api/v1/nodes/lxa160j.srv.pl.ing.net/status
Jul 15 12:43:38 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:43:38.029575 3736 round_trippers.go:289] Response Status: 409 Conflict in 9 milliseconds
Jul 15 12:43:38 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:43:38.029772 3736 round_trippers.go:264] GET https://lxa160g.srv.pl.ing.net:6443/api/v1/nodes/lxa160j.srv.pl.ing.net
Jul 15 12:43:38 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:43:38.034980 3736 round_trippers.go:289] Response Status: 200 OK in 5 milliseconds
Jul 15 12:43:38 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:43:38.298752 3736 round_trippers.go:264] PUT https://lxa160g.srv.pl.ing.net:6443/api/v1/nodes/lxa160j.srv.pl.ing.net/status
Jul 15 12:43:38 lxa160j.srv.pl.ing.net kubelet[3736]: I0715 12:43:38.320192 3736 round_trippers.go:289] Response Status: 200 OK in 21 milliseconds

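So in the normal cycle the PUT follows the GET response within about 300 ms, but in the failing cycle the GET completes at 12:42:55 and the PUT is not sent until 12:43:38, roughly 43 seconds later, and is rejected with 409 Conflict. Why does it take so long to fire the PUT after a successful GET?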

Thanks

-- Andrej
kubernetes
