I followed this tutorial (microsoft_website) to pull images from an Azure container registry. My YAML
successfully creates a pod job that can pull the image, but only when it runs on the agentpool
node in my cluster.
For example, adding nodeName: aks-agentpool-33515997-vmss000000 to the YAML works fine, but when I specify a different node name, e.g. nodeName: aks-cpu1-33515997-vmss000000, the pod fails. The error message I get from kubectl describe pods is "Failed to pull image", followed by kubelet "Error: ErrImagePull".
What am I missing?
Create secret:
kubectl create secret docker-registry <secret-name> \
--docker-server=<container-registry-name>.azurecr.io \
--docker-username=<service-principal-ID> \
--docker-password=<service-principal-password>
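For reference, a pod that uses this secret and is pinned to a specific node could look roughly like the sketch below. This is not the manifest from the question: the pod name, container name, and image path are placeholders chosen for illustration, and only the node name and the secret placeholder come from the text above.

```yaml
# Hypothetical pod spec; names and image path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: acr-pull-test                      # placeholder name
spec:
  nodeName: aks-cpu1-33515997-vmss000000   # node on which the pull fails
  restartPolicy: Never
  containers:
    - name: app                            # placeholder name
      image: <container-registry-name>.azurecr.io/<image-name>:<tag>
  imagePullSecrets:
    - name: <secret-name>                  # secret created with the command above
```

Note that the secret has to live in the same namespace as the pod that references it.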
Four things to check:
Edit
New-AzAksNodePool has a -DefaultProfile parameter (aliases: AzContext, AzureRmContext, AzureCredential).
It sets the credentials, account, tenant, and subscription used to communicate with Azure.
If this differs between your node pools, it would explain the error.
As @user1571823 said, the solution is to delete the old image from the Azure Container Registry (ACR) and then build and push a new one.
The problem was caused by some sort of corruption in the image stored in the ACR. The reason one agent pool appeared able to pull the image was that the image already existed on its VM, so nothing was actually pulled from the registry.
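The caching behaviour described above matches the default pull policy in Kubernetes: for an image with an explicit tag other than latest, imagePullPolicy defaults to IfNotPresent, so a node that already holds the image does not contact the registry. As a minimal sketch (the container name and image path are placeholders, not taken from the original manifest), setting the policy to Always makes every node go to the registry, so a broken image should fail consistently instead of only on some nodes:

```yaml
# Fragment of a pod spec; container name and image path are placeholders.
spec:
  containers:
    - name: app                            # placeholder name
      image: <container-registry-name>.azurecr.io/<image-name>:<tag>
      imagePullPolicy: Always              # do not rely on a copy cached on the node
```

This is a diagnostic aid rather than a fix; the image in the registry still has to be rebuilt and pushed again.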
Also, as @andov said, it is a good idea to open a support case with Azure support for AKS from the subscription where AKS is deployed. The support team has full access to the AKS service backend and can tell you exactly what caused the problem.