Debugging collisionCount on Kubernetes

9/4/2019

I have a StatefulSet in OpenShift that keeps restarting, but only on a single node. I don't see anything in the pod's logs. In /var/log/messages I see only messages saying that the container is restarting, the volume is unmounted, etc., plus some more cryptic ones: 'error: Container is already stopped' and 'cleanup: failed to unmount secrets: invalid argument'.

However, when I look at the YAML for the StatefulSet, I see the following:

status:
  collisionCount: 1
  currentReplicas: 1

I suppose this is the real cause.
But how can I find out what caused that collision?

-- 9ilsdx 9rvj 0lo
kubernetes
kubernetes-statefulset
openshift

2 Answers

9/4/2019

Did you try kubectl describe pod to look up the events?
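
A minimal sketch of that check, assuming the pod name and namespace are known. The function name pod_events and the example names are hypothetical; the kubectl subcommands and flags are standard ones.

```shell
# pod_events NAMESPACE POD
# Print the pod description (its Events section is at the end), then the
# namespace events that reference the pod, sorted by creation time.
pod_events() {
  ns="$1"
  pod="$2"
  kubectl describe pod "$pod" -n "$ns"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp
}

# Example (hypothetical names):
# pod_events my-project my-statefulset-0
```

Events are only retained for a limited time, so this is most useful shortly after a restart.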

-- Naresh Nagarajan
Source: StackOverflow

9/4/2019

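StatefulSets internally take a snapshot of the data via ControllerRevisions and generate a hash for each version.

What the collisionCount indicates is that the ControllerRevision hash collided, likely due to an implementation issue. The StatefulSet controller uses this counter as a collision-avoidance mechanism when it creates the name for the newest ControllerRevision.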

You can try to rule this out by getting the controller revisions:

$ kubectl get controllerrevisions
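
A slightly more targeted sketch of the same inspection, assuming the StatefulSet name and namespace are known. The function name sts_revisions is hypothetical; it only combines standard kubectl calls.

```shell
# sts_revisions NAMESPACE STATEFULSET
sts_revisions() {
  ns="$1"
  sts="$2"
  # Current collision counter from the StatefulSet status (empty if unset).
  kubectl get statefulset "$sts" -n "$ns" \
    -o jsonpath='{.status.collisionCount}'
  echo
  # All ControllerRevisions in the namespace, ordered by revision number.
  # Those created for the StatefulSet are named "<statefulset>-<hash>".
  kubectl get controllerrevisions -n "$ns" --sort-by=.revision
  # Revision history as recorded for this StatefulSet.
  kubectl rollout history "statefulset/$sts" -n "$ns"
}

# Example (hypothetical names):
# sts_revisions my-project my-statefulset
```

To see what a particular revision captured, kubectl get controllerrevision <name> -n <namespace> -o yaml prints its data field.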

Since this is an internal mechanism in the object, there is little to do other than recreate the object to generate new hashes that don't collide. There is a merged PR that suggests that newer versions shouldn't face this issue. However, it might be the case that you're running a version without this patch.
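
The recreation step could be sketched as follows, assuming you still have the original manifest the StatefulSet was created from. The function name recreate_statefulset and the example names are hypothetical. Deleting the StatefulSet stops its pods, so treat this as disruptive; PersistentVolumeClaims created from volumeClaimTemplates are not deleted along with it.

```shell
# recreate_statefulset NAMESPACE STATEFULSET MANIFEST
recreate_statefulset() {
  ns="$1"
  sts="$2"
  manifest="$3"
  # Refuse to delete anything if the manifest to re-apply is missing.
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  # Remove the existing object, then create it again from the manifest.
  kubectl delete statefulset "$sts" -n "$ns" || return 1
  kubectl apply -f "$manifest" -n "$ns"
}

# Example (hypothetical names):
# recreate_statefulset my-project my-statefulset ./my-statefulset.yaml
```

Afterwards, status.collisionCount on the new object can be checked again to see whether the collision reappears.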

-- yyyyahir
Source: StackOverflow