Restarting pods quickly

12/18/2015

I have been experimenting with Kubernetes recently, and I have been trying to test failover in pods by running a replication controller whose containers crash as soon as they are used (thus causing a restart).

I have adapted the bashttpd project for this: https://github.com/Chronojam/bashttpd

(I have set it up so that it serves the hostname of the container, then exits.)
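
For reference, the replication controller is roughly the following; the image name is a placeholder for my build rather than the real manifest:

apiVersion: v1
kind: ReplicationController
metadata:
  name: chronojam-serve-once
spec:
  replicas: 6
  selector:
    app: serve-once
  template:
    metadata:
      labels:
        app: serve-once
    spec:
      containers:
      - name: serve-once
        # Placeholder image: serves the pod hostname once, then exits,
        # so the container restarts after every request it answers.
        image: chronojam/bashttpd:latest
        ports:
        - containerPort: 80
      # The default restartPolicy of Always is what triggers the
      # kubelet restart (and its backoff) after each exit.
      restartPolicy: Always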

This works great, except the restart is far too slow for what I am trying to do: it works for the first couple of requests, then stops for a while, then starts working again once the pods have restarted. (Ideally I'd like to see no interruption at all when accessing the service.)

I think (but am not sure) that the backoff delay mentioned here is to blame: https://github.com/kubernetes/kubernetes/blob/master/docs/user-guide/pod-states.md#restartpolicy

some output:

#] kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
chronojam-blog-a23ak         1/1       Running   0          6h
chronojam-blog-abhh7         1/1       Running   0          6h
chronojam-serve-once-1cwmb   1/1       Running   7          4h
chronojam-serve-once-46jck   1/1       Running   7          4h
chronojam-serve-once-j8uyc   1/1       Running   3          4h
chronojam-serve-once-r8pi4   1/1       Running   7          4h
chronojam-serve-once-xhbkd   1/1       Running   4          4h
chronojam-serve-once-yb9hc   1/1       Running   7          4h
chronojam-tactics-is1go      1/1       Running   0          5h
chronojam-tactics-tqm8c      1/1       Running   0          5h
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-j8uyc </h3>
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-r8pi4 </h3>
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-yb9hc </h3>
#] curl http://serve-once.chronojam.co.uk
<h3> chronojam-serve-once-46jck </h3>
#] curl http://serve-once.chronojam.co.uk
#] curl http://serve-once.chronojam.co.uk

You'll also note that even though there should be two still-healthy pods there, it stops returning anything after the fourth request.

So my question is twofold:

1) Can I tweak the backoff delay?

2) Why does my service not send my requests to the healthy containers?

Observations:

I think it might be the webserver itself not being able to start serving requests that quickly, so Kubernetes is recognizing those pods as healthy and sending requests there (but they come back with nothing because the process hasn't started yet?).
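
If that is the cause, then I would guess a readinessProbe on the container is the fix, so that the service only routes to a pod once the webserver actually answers. A minimal sketch, assuming my bashttpd container listens on port 80:

readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 1
  timeoutSeconds: 1

(One caveat for my particular setup: since the server exits after a single request, an HTTP probe would itself consume that request, so an exec- or file-based check may fit better; the shape of the probe is the point here.)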

-- Chronojam
kubernetes

1 Answer

2/2/2016

I filed an issue to document the recommended practice. I put a sketch of the approach in the issue:

https://github.com/kubernetes/kubernetes/issues/20473

  • ensure the pods have a non-zero terminationGracePeriodSeconds set
  • configure a readinessProbe on the main serving container of the pods
  • handle SIGTERM in the application: fail the readinessProbe but continue to handle normal requests and do not exit
  • set maxUnavailable and/or maxSurge large enough to ensure enough serving instances in the Deployment API spec (available in 1.2); a combined sketch follows below
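
Putting those pieces together, here is a rough sketch of what the Deployment could look like. The names, image, and numbers are illustrative assumptions, not something from the question:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: serve-once
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # keep most replicas serving during an update
      maxSurge: 2         # let new pods come up before old ones are taken down
  template:
    metadata:
      labels:
        app: serve-once
    spec:
      # Give the app time to drain in-flight requests after SIGTERM.
      terminationGracePeriodSeconds: 30
      containers:
      - name: serve-once
        image: chronojam/bashttpd:latest   # hypothetical image name
        ports:
        - containerPort: 80
        # The service only routes to this pod while the probe passes;
        # on SIGTERM the app should start failing it while continuing
        # to serve requests until the grace period ends.
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 1
          timeoutSeconds: 1

The readinessProbe here does double duty: it keeps traffic away from containers that have not started serving yet, and, together with the SIGTERM handling above, away from containers that are about to go down.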

Container restarts, especially when they pull images, are fairly expensive for the system. The Kubelet backs off restarts of crashing containers in order to degrade gracefully rather than DoSing Docker, the registry, the apiserver, etc.

-- briangrant
Source: StackOverflow