We had a major outage when both our container registry and the entire K8S cluster lost power. The cluster recovered faster than the container registry, and my pod (part of a StatefulSet) got stuck in Error: ImagePullBackOff.
Is there a config setting to retry downloading the image from the CR periodically or recover without manual intervention?
I looked at imagePullPolicy, but that does not apply when the CR is unavailable.
The BackOff part of the ImagePullBackOff status means that Kubernetes keeps trying to pull the image from the registry, with an exponential back-off delay (10s, 20s, 40s, …). The delay between attempts increases until it reaches a compiled-in limit of 300 seconds (5 minutes). See the Kubernetes docs for more details.
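To make the retry schedule concrete, here is a minimal sketch of the delays described above. It is not the kubelet's actual code: the function name `backoff_delays` and its parameters are hypothetical, and it assumes the delay simply doubles from 10 seconds until it hits the 300-second cap.

```python
def backoff_delays(attempts, base=10, cap=300):
    """Return the delay (in seconds) before each retry.

    Assumption: the delay starts at `base`, doubles after every
    failed attempt, and never exceeds `cap`.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(delay)
        # Double the delay for the next attempt, but clamp it at the cap.
        delay = min(delay * 2, cap)
    return delays


if __name__ == "__main__":
    print(backoff_delays(8))  # [10, 20, 40, 80, 160, 300, 300, 300]
```

Under these assumptions the delay stops growing after the sixth attempt, so once the registry is back the next pull attempt should happen within roughly five minutes, without manual intervention.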
The backOffPeriod parameter for image pulls is a hard-coded constant in Kubernetes and unfortunately is not tunable at present, since changing it can affect node performance. The only way to adjust it is to change the value in the source code and build a custom kubelet binary.
There is still an open issue about making it adjustable.