I'm wondering whether using imagePullPolicy: IfNotPresent would provide any resiliency against the temporary loss of our private registry.
I have a multi-master, multi-worker bare-metal cluster, and all my pods are using images pulled from a local private registry, running outside the cluster on a single node. The cluster is therefore a lot more fault-tolerant than the registry itself.
What would happen if I set my workloads to imagePullPolicy: Always, and the registry failed? I wouldn't be able to (for example) scale up/down my pods, since I wouldn't be able to pull the image from the registry.
If I used imagePullPolicy: IfNotPresent, then provided the image existed on the nodes already, I could happily scale up/down, even in the absence of the registry.
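For reference, this is the sort of spec I mean — a minimal sketch, where the app name and registry address (registry.local:5000) are placeholders for our actual setup:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        # registry.local:5000 stands in for our private registry,
        # which runs outside the cluster on a single node
        image: registry.local:5000/my-app:1.4
        # Only pull if the image isn't already cached on the node,
        # so scaling keeps working while the registry is down
        imagePullPolicy: IfNotPresent
```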
The question is: if a pod couldn't start because the image couldn't be pulled, would Kubernetes ever try to reschedule that pod onto a different node (which might have the image cached), or, once it's scheduled (and failing) on one node, will it remain there until deleted/drained?
Cheers! D
AFAIK the Kube scheduler doesn't handle the scenario where the image is present on one node but not on another.
You can write your own scheduler to handle this scenario. Refer: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
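Once a custom scheduler is deployed, you point a pod at it via schedulerName — a sketch, assuming your scheduler registers itself as my-image-aware-scheduler (a hypothetical name):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  # Hand this pod to the custom scheduler instead of default-scheduler.
  # "my-image-aware-scheduler" is a hypothetical name; it must match
  # whatever name your scheduler registers with.
  schedulerName: my-image-aware-scheduler
  containers:
  - name: my-app
    image: registry.local:5000/my-app:1.4
    imagePullPolicy: IfNotPresent
```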
You can also use a tool like https://github.com/uber/kraken, which is a P2P Docker registry, i.e. if an image is present on one of the nodes it can be pulled by the other nodes too. This would also make your registry fault-tolerant.
That is correct! If the registry is down and imagePullPolicy: Always, any new pod would fail to be created, eventually timing out on the image pull.
And no, it wouldn't reschedule the pod. It would stay there in the ImagePullBackOff state until the registry comes back up.
You could trick it into rescheduling the pod (though not based on registry availability), but most probably the scheduler will decide to schedule it on the same node, unless something has changed resource-wise.
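For example, if you know (or label) which nodes already have the image cached, you could steer a recreated pod with node affinity — a sketch, where has-image=my-app is a hypothetical label you'd have to apply and maintain yourself:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  affinity:
    nodeAffinity:
      # Prefer (not require) nodes labelled as having the image cached.
      # "has-image" is a hypothetical label you'd apply manually, e.g.
      #   kubectl label node worker-2 has-image=my-app
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: has-image
            operator: In
            values: ["my-app"]
  containers:
  - name: my-app
    image: registry.local:5000/my-app:1.4
    imagePullPolicy: IfNotPresent
```

Note this only influences placement of newly created pods; it doesn't make the scheduler react to a pod stuck in ImagePullBackOff.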