I'm trying to figure out a way to restart a container on failure and NOT remove it and create a new container to take its place. It would be a plus to be able to try restarting it, say, 3 times and then stop the pod.
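For context, the only restart-related knob I know of is the pod-level restartPolicy; this is just a sketch of where that field sits, not part of my manifest, and as far as I understand StatefulSets only accept Always there anyway:

spec:
  template:
    spec:
      # Valid values are Always, OnFailure, Never; StatefulSet pod templates only allow Always
      restartPolicy: Always
      containers:
        - name: cassandra
          # ...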
I have a StatefulSet that looks like this (I removed some irrelevant parts):
apiVersion: "apps/v1beta1"
kind: StatefulSet
metadata:
name: cassandra-stateful
spec:
serviceName: cassandra
replicas: 1
template:
metadata:
labels:
app: cassandra-stateful
spec:
# Only one Cassandra node should exist for one Kubernetes node.
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- cassandra
topologyKey: "kubernetes.io/hostname"
containers:
- name: cassandra
image: localrepo/cassandra-kube
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
lifecycle:
preStop:
exec:
command: ["pkill java && while ps -p 1 > /dev/null; do sleep 1; done"]
The reason I know it's recreating the containers is that I'm purposefully killing my process with:
pkill java && while ps -p 1 > /dev/null; do sleep 1; done
If I describe the pod, I can see that it creates a new container instead of restarting the existing one:
$ kubectl describe po cassandra-stateful-0
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
11m 11m 1 default-scheduler Normal Scheduled Successfully assigned cassandra-stateful-0 to node-136-225-226-236
11m 11m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Created Created container with id cf5bbdc2989e231cdad4bb16dd26ad55b9a016200842cc3b2a3915f3d618737f
11m 11m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Started Started container with id cf5bbdc2989e231cdad4bb16dd26ad55b9a016200842cc3b2a3915f3d618737f
4m 4m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Created Created container with id fb4869eb91313512dc56608a6ef3d24590c88234a0ef453cd7c16dcf625e1f37
4m 4m 1 kubelet, node-136-225-226-236 spec.containers{cassandra} Normal Started Started container with id fb4869eb91313512dc56608a6ef3d24590c88234a0ef453cd7c16dcf625e1f37
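For completeness, this is how I check the restart counter (hypothetical invocation, watching the RESTARTS column that kubectl get pods prints):

$ kubectl get pod cassandra-stateful-0
# the RESTARTS column counts how many times the kubelet has started a container for this pod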
Is there any rule that makes this possible?