Kubernetes Keeps Restarting Pods of StatefulSet in Minikube With "Need to kill pod"

1/16/2018

Minikube version v0.24.1

kubernetes version 1.8.0

The problem I'm facing is that I have several StatefulSets created in Minikube, each with a single pod.

Sometimes when I start Minikube, my pods will start up initially but then keep being restarted by Kubernetes. They cycle from ContainerCreating, to Running, to Terminating, over and over.

Now, I've seen Kubernetes kill and restart pods before when it detects disk pressure, memory pressure, or some other node condition like that, but that's not the case here: none of those conditions are raised, and the only message in the pod's event log is "Need to kill pod".
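
For reference, I'm checking those node conditions by describing the Minikube node and looking at the Conditions section (OutOfDisk, MemoryPressure, DiskPressure), and none of them are True:

$ kubectl describe node minikube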

What's most confusing is that this issue doesn't happen all the time, and I haven't figured out how to trigger it. My Minikube setup will work for a week or more without this happening, then one day I'll start Minikube and the pods for my StatefulSets just keep restarting. So far the only workaround I've found is to delete my Minikube instance and set it up again from scratch, but obviously that's not ideal.

Below is a sample of one of the StatefulSets whose pod keeps getting restarted. As the events show, Kubernetes deletes the pod and creates it again, repeatedly. I'm unable to figure out why it keeps doing that, or why it only gets into this state sometimes.

$ kubectl describe statefulsets mongo --namespace=storage
Name:               mongo
Namespace:          storage
CreationTimestamp:  Mon, 08 Jan 2018 16:11:39 -0600
Selector:           environment=test,role=mongo
Labels:             name=mongo
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1beta1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"name":"mongo"},"name":"mongo","namespace":"storage"},"...
Replicas:           1 desired | 1 total
Pods Status:        1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  environment=test
           role=mongo
  Containers:
   mongo:
    Image:  mongo:3.4.10-jessie
    Port:   27017/TCP
    Command:
      mongod
      --replSet
      rs0
      --smallfiles
      --noprealloc
    Environment:  <none>
    Mounts:
      /data/db from mongo-persistent-storage (rw)
   mongo-sidecar:
    Image:  cvallance/mongo-k8s-sidecar
    Port:   <none>
    Environment:
      MONGO_SIDECAR_POD_LABELS:       role=mongo,environment=test
      KUBERNETES_MONGO_SERVICE_NAME:  mongo
    Mounts:                           <none>
  Volumes:                            <none>
Volume Claims:
  Name:          mongo-persistent-storage
  StorageClass:  
  Labels:        <none>
  Annotations:   volume.alpha.kubernetes.io/storage-class=default
  Capacity:      5Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age                From         Message
  ----    ------            ----               ----         -------
  Normal  SuccessfulDelete  23m (x46 over 1h)  statefulset  delete Pod mongo-0 in StatefulSet mongo successful
  Normal  SuccessfulCreate  3m (x62 over 1h)   statefulset  create Pod mongo-0 in StatefulSet mongo successful
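
The "Need to kill pod" message is the only thing that shows up under Events when I describe the pod itself:

$ kubectl describe pod mongo-0 --namespace=storage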
-- Jordan
kubernetes
minikube

1 Answer

1/16/2018

After some more digging, there seems to have been a bug that can affect StatefulSets by creating multiple ControllerRevisions for the same StatefulSet:

https://github.com/kubernetes/kubernetes/issues/56355

This issue seems to have been fixed, and the fix appears to have been backported to Kubernetes 1.8 and included in 1.9, but Minikube doesn't yet ship a version with the fix. A workaround, if your cluster gets into this state, is to list the controller revisions like so:

$ kubectl get controllerrevisions --namespace=storage
NAME                  CONTROLLER              REVISION   AGE
mongo-68bd5cbcc6      StatefulSet/mongo       1          19h
mongo-68bd5cbcc7      StatefulSet/mongo       1          7d

and delete the duplicate ControllerRevisions for each StatefulSet:

$ kubectl delete controllerrevisions mongo-68bd5cbcc6 --namespace=storage
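
Before deleting anything, it can help to confirm which revision the StatefulSet is actually tracking; its status records the current and update revision names, so something along these lines shows which ControllerRevision to keep:

$ kubectl get statefulset mongo --namespace=storage -o jsonpath='{.status.currentRevision} {.status.updateRevision}'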

Alternatively, simply upgrade to Kubernetes 1.9 or above, which includes this bug fix.
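
To check which Kubernetes version your cluster is running, and to ask Minikube for a newer release when starting (the flag below is the one Minikube accepts as far as I know; check minikube start --help on your version):

$ kubectl version --short
$ minikube start --kubernetes-version v1.9.0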

-- Jordan
Source: StackOverflow