I have a StatefulSet in OpenShift that keeps restarting, but only on a single node. I don't see anything in the pods' logs. In /var/log/messages I only see messages saying that the container is restarting, the volume is unmounted, and so on, plus some more cryptic ones: 'error: Container is already stopped' and 'cleanup: failed to unmount secrets: invalid argument'.
However, when I look at the YAML of the StatefulSet I see the following:
status:
collisionCount: 1
currentReplicas: 1
I suppose this is the real cause.
But how can I find out what has generated that collision?
Did you try kubectl describe pod to look up the events?
StatefulSets internally take a snapshot of their state via ControllerRevisions and generate a hash for each version. What the collisionCount indicates is that the ControllerRevision hash collided, likely due to an implementation issue.
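As a rough illustration of the mechanism (a sketch, not the actual Kubernetes source, and the function names below are hypothetical): the controller hashes the pod template with FNV-1a, and when a name collision is detected it bumps collisionCount and mixes it into the hash input, so the retry produces a different revision name.

```python
def fnv32a(data: bytes) -> int:
    """32-bit FNV-1a hash (the hash family Kubernetes uses for template hashes)."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def revision_hash(template: str, collision_count=None) -> str:
    """Sketch of deriving a revision-name suffix from a pod template.

    If a collision counter is set, it is appended to the hashed data, so
    each retry after a detected collision yields a different suffix.
    """
    data = template.encode()
    if collision_count is not None:
        # Mixing in the counter perturbs the hash on each retry.
        data += int(collision_count).to_bytes(4, "little")
    return format(fnv32a(data), "x")


# Same template, different collisionCount -> different revision suffix.
print(revision_hash("pod-template-v1"))
print(revision_hash("pod-template-v1", 1))
```

A collisionCount of 1 in the status therefore means the controller already hit one such name clash and had to retry with a perturbed hash.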
You can try to rule this out by getting the controller revisions:
$ kubectl get controllerrevisions
Since this is an internal mechanism of the object, there is little to do other than recreate it so that new, non-colliding hashes are generated. There is a merged PR suggesting that newer versions shouldn't face this issue; however, it might be the case that you're running a version without that patch.