Configure Kubernetes StatefulSet to start all pods first, then restart failed containers?

10/23/2017

Basic info

Hi, I'm encountering a problem with Kubernetes StatefulSets. I'm trying to spin up a StatefulSet with 3 replicas. Each replica/pod has a container that pings a container in the other pods based on their network ID. The container requires a response from all of the pods; if it does not get a response, the container fails. In my situation I need all 3 pods/replicas up for my setup to work.

Problem description

What happens is the following. Kubernetes starts 2 pods fairly quickly. However, since I need 3 pods for a fully functional cluster, the first 2 pods keep crashing because the 3rd is not up yet. For some reason Kubernetes opts to keep restarting both pods instead of adding the 3rd pod so my cluster can function.

I've seen my setup run properly after about 15 minutes because Kubernetes added the 3rd pod by then.

Question

So, my question.

Does anyone know a way to delay restarting failed containers until the desired number of pods/replicas has been booted?

-- Byebye
containers
kubernetes
restart
statefulset

2 Answers

10/25/2017

I think a better way to deal with your problem is to leverage a liveness probe, as described in the documentation, rather than delaying the restart time (which is not configurable in the YAML).

Have your pods respond to the liveness probe as soon as they are started, to let Kubernetes know they are alive; this prevents them from being restarted. Meanwhile, the pods keep pinging the others until all of them are up. Only once all the pods have started do they serve external requests. This is similar to creating a ZooKeeper ensemble.
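A minimal sketch of what that probe could look like, assuming your application exposes a hypothetical /healthz endpoint on port 8080 that reports "process is alive" regardless of whether the other replicas are reachable yet (names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                 # hypothetical name
spec:
  serviceName: my-app
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest   # hypothetical image
        livenessProbe:
          httpGet:
            path: /healthz     # answers "alive", not "all peers reachable"
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
```

The key point is that the probe only checks that the process itself is up, so Kubernetes keeps the first pods running while they wait for their peers instead of killing them.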

-- Jimmy Lu
Source: StackOverflow

1/21/2018

I've since found out the cause of this. StatefulSets launch pods in a specific order, and if one of the pods fails to launch, the next one is not launched.

You can add podManagementPolicy: "Parallel" to the StatefulSet spec to launch the pods without waiting for previous pods to be Running. See the StatefulSet documentation, and the sketch below.
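A minimal sketch showing where the field goes; it sits at the top level of the StatefulSet spec, and the names and image here are placeholders:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: my-app
  replicas: 3
  podManagementPolicy: Parallel   # launch/terminate all pods at once instead of in order
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest      # placeholder
```

With Parallel, the controller creates all 3 pods at once instead of waiting for each ordinal to be Running and Ready before starting the next, so the peers can find each other on first boot.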

-- Byebye
Source: StackOverflow