Kubernetes StatefulSet ends up in Completed state

6/15/2017

I'm running a k8s cluster on Google GKE where I have StatefulSets running Redis and Elasticsearch. Every now and then a pod ends up in the Completed state, so it isn't running anymore and the services depending on it fail. These pods also never restart by themselves; a simple kubectl delete pod x resolves the problem, but I want my pods to heal by themselves. I'm running the latest available version, 1.6.4, and I have no clue why they aren't picked up and restarted like any other regular pod. Maybe I'm missing something obvious.

Edit: I've also noticed the pod receives a termination signal and shuts down properly, so I'm wondering where that is coming from. I'm not shutting it down manually, and I experience the same with Elasticsearch.

This is my StatefulSet resource declaration:

---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        ports:
          - name: redis-server
            containerPort: 6379
        volumeMounts:
        - name: redis-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
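
In case it helps, this is roughly how the terminated pod can be inspected (pod name redis-0 follows from the StatefulSet above):

$ kubectl describe pod redis-0       # Last State, exit code, restart count and recent events
$ kubectl logs redis-0 --previous    # logs of the container instance that terminated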
-- Niels
kubernetes
statefulset

2 Answers

6/16/2017

I am using the same configuration as you, but removing the annotation in the volumeClaimTemplates since I am trying this on minikube:

$ cat sc.yaml 
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        ports:
          - name: redis-server
            containerPort: 6379
        volumeMounts:
        - name: redis-storage
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
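
For completeness, creating it and watching the pod come up is just (assuming the sc.yaml above; k is my alias for kubectl):

$ k create -f sc.yaml
$ k get pods -w       # wait until redis-0 shows Running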

Now I try to simulate the case where Redis fails by exec'ing into the pod and killing the Redis server process:

$ k exec -it redis-0 sh
/data # kill 1
/data # $

Immediately after the process dies, I can see that the STATUS has changed to Completed:

$ k get pods                                                                                                                  
NAME      READY     STATUS      RESTARTS   AGE
redis-0   0/1       Completed   1          38s

It took some time for Redis to come back up and running:

$ k get pods
NAME      READY     STATUS    RESTARTS   AGE
redis-0   1/1       Running   2          52s

But soon after that I could see the pod starting again. Can you see the events that were triggered when this happened? For example, was there a problem re-attaching the volume to the pod?
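
Something along these lines should show that (the PVC name redis-storage-redis-0 is assumed here, following the <claim template>-<pod> naming convention):

$ kubectl get events --sort-by=.metadata.creationTimestamp   # cluster events around the restart
$ kubectl describe pvc redis-storage-redis-0                 # whether the claim/volume had attach problems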

-- surajd
Source: StackOverflow

6/16/2017

Check the version of Docker you run, and whether the Docker daemon was restarted during that time.

If the Docker daemon was restarted, all the containers would be terminated (unless you use the new "live restore" feature in 1.12). In some Docker versions, Docker may incorrectly report "exit code 0" for all containers terminated in this situation. See https://github.com/docker/docker/issues/31262 for more details.
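
A rough way to check this on the node is something like the following (assuming SSH access to the node and a systemd-managed Docker; details may differ on GKE):

$ docker version --format '{{.Server.Version}}'   # Docker version actually running on the node
$ systemctl status docker                         # "active (running) since ..." shows the last daemon (re)start

Where you control the daemon configuration, live restore is enabled in /etc/docker/daemon.json (Docker 1.12+):

{
  "live-restore": true
}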

Source: https://stackoverflow.com/a/43051371/5331893

-- janetkuo
Source: StackOverflow