I'm running a Kubernetes cluster on Google GKE with StatefulSets for Redis and Elasticsearch. Every now and then the pods end up in a Completed state, so they are no longer running and the services that depend on them fail. These pods also never restart by themselves. A simple kubectl delete pod x resolves the problem, but I want my pods to heal by themselves. I'm running the latest available version, 1.6.4, and I have no clue why they aren't picked up and restarted like any other regular pod. Maybe I'm missing something obvious.
Edit: I've also noticed that the pod gets a termination signal and shuts down properly, so I'm wondering where that signal is coming from. I'm not shutting it down manually, and I see the same behavior with Elasticsearch.
This is my statefulset resource declaration:
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        ports:
        - name: redis-server
          containerPort: 6379
        volumeMounts:
        - name: redis-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
I am using the same configuration as you, but with the annotation in the volumeClaimTemplates removed, since I am trying this on minikube:
$ cat sc.yaml
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        ports:
        - name: redis-server
          containerPort: 6379
        volumeMounts:
        - name: redis-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
Now, to simulate the case where redis fails, I exec into the pod and kill the redis server process:
$ k exec -it redis-0 sh
/data # kill 1
/data # $
Immediately after the process dies, the STATUS changes to Completed:
$ k get pods
NAME      READY     STATUS      RESTARTS   AGE
redis-0   0/1       Completed   1          38s
It took some time for redis to get back up and running:
$ k get pods
NAME      READY     STATUS    RESTARTS   AGE
redis-0   1/1       Running   2          52s
So in my case the pod was restarted shortly afterwards. Can you see the events that were triggered when this happened in your cluster? For example, was there a problem re-attaching the volume to the pod?
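One way to gather those events is sketched below. The function name inspect_pod is a hypothetical helper introduced only for illustration, and the sketch assumes kubectl is already pointed at the affected cluster and namespace and that the pod has a single container.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): collect restart-related
# details for one pod so they can be posted alongside the question.
inspect_pod() {
  pod="$1"
  # The "Events" section at the end lists scheduling, volume attach/mount
  # and container kill events recorded for this pod.
  kubectl describe pod "$pod"
  # Recent events in the current namespace, oldest first.
  kubectl get events --sort-by=.metadata.creationTimestamp
  # Exit code of the previous instance of the first container, if any.
  kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
  # Logs from the previous container instance (before the last restart).
  kubectl logs "$pod" --previous
}
```

Calling inspect_pod redis-0 while the pod is stuck in Completed should show whether a volume mount failure or an external kill was recorded. Events are only retained for a limited time, so it is best to run this soon after the pod stops.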
Check the version of Docker you run, and whether the Docker daemon was restarted during that time.
If the Docker daemon was restarted, all containers would be terminated (unless you use the "live restore" feature introduced in 1.12). In some Docker versions, Docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
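The checks above could be run on the affected node roughly as follows. This is a sketch under assumptions: check_docker_daemon is a hypothetical helper name, you need shell access to the node, and the node is assumed to run Docker as a systemd unit named docker, which may not hold for every GKE node image.

```shell
#!/bin/sh
# Hypothetical helper: run on the node that hosted the pod.
check_docker_daemon() {
  # Version of the Docker daemon (server side).
  docker version --format '{{.Server.Version}}'
  # Prints "true" if live restore is enabled for this daemon.
  docker info --format '{{.LiveRestoreEnabled}}'
  # When the docker unit last became active; a timestamp close to the
  # time the pod stopped suggests a daemon restart (assumes systemd).
  systemctl show docker --property=ActiveEnterTimestamp
  # Recent daemon log entries, to look for shutdown/startup messages.
  journalctl -u docker --no-pager --since "1 hour ago"
}
```

If the daemon's start time lines up with the moment the pods went to Completed, the restart explanation becomes more likely.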