Azure Container Registry image pulls are very slow for images of about 150 MB

2/12/2020

When deploying an image to an AKS instance, the image pull from the ACR (Premium SKU) is very slow, even for "small" images of about 150 MB.

Both the AKS resource and the ACR resource are in the Canada East region.

Here is an example:

root@076fff2831b2:/tmp# kubectl describe pod application-service-59bcf96874-pvrmb
Name:           application-service-59bcf96874-pvrmb
Namespace:      default
Priority:       0
Node:           aks-41067869-1/10.255.13.163
Start Time:     Tue, 11 Feb 2020 18:15:53 -0500
Labels:         app.kubernetes.io/instance=application-service
                app.kubernetes.io/name=application-service
                pod-template-hash=59bcf96874
Annotations:    <none>
Status:         Running
IP:             10.255.13.175
IPs:            <none>
Controlled By:  ReplicaSet/application-service-59bcf96874
Containers:
  application-service:
    Container ID:   docker://0e86526a293d9055d482a09f043f0be68c594244fe4216f8fb190bc2caf6b65b
    Image:          myacr01.azurecr.io/microservices/application-service:0.0.6
    Image ID:       docker-pullable://myacr01.azurecr.io/microservices/application-service@sha256:cfbb3ffa7adc52da9cc0b8d7f78376076ea712025b59df8e406c559d369f4085
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 11 Feb 2020 18:35:00 -0500
      Finished:     Tue, 11 Feb 2020 18:35:00 -0500
    Ready:          False
    Restart Count:  5
    Liveness:       http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      PORT:                        3000
      undefined:                   undefined
    Mounts:
      /kvmnt from application-service-kv-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from application-service-token-9jk8j (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  application-service-kv-volume:
    Type:       FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
    Driver:     azure/kv
    FSType:
    SecretRef:  &LocalObjectReference{Name:kvcreds,}
    ReadOnly:   false
    Options:    map[keyvaultname:testIt2 keyvaultobjectnames:APPLICATION-SVC-SQLDB-CS;INGESTION-CONSUMER-EHB-CS;INGESTION-PRODUCER-EHB-CS keyvaultobjecttypes:secret;secret;secret tenantid:REMOVED usepodidentity:false usevmmanagedidentity:false]
  application-service-token-9jk8j:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  application-service-token-9jk8j
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                     Message
  ----     ------     ----                   ----                     -------
  Normal   Scheduled  20m                    default-scheduler        Successfully assigned default/application-service-59bcf96874-pvrmb to aks-41067869-1
  Normal   Pulling    20m                    kubelet, aks-41067869-1  Pulling image "myacr01.azurecr.io/microservices/application-service:0.0.6"
  Normal   Pulled     4m39s                  kubelet, aks-41067869-1  Successfully pulled image "myacr01.azurecr.io/microservices/application-service:0.0.6"
  Normal   Started    3m36s (x4 over 4m33s)  kubelet, aks-41067869-1  Started container application-service
  Warning  BackOff    3m4s (x11 over 4m30s)  kubelet, aks-41067869-1  Back-off restarting failed container
  Normal   Pulled     2m52s (x4 over 4m32s)  kubelet, aks-41067869-1  Container image "myacr01.azurecr.io/microservices/application-service:0.0.6" already present on machine
  Normal   Created    2m51s (x5 over 4m33s)  kubelet, aks-41067869-1  Created container application-service

Some details were modified/removed for privacy reasons.

However, the thing to note is the roughly 15 minutes it took to go from "Pulling" to "Pulled" for an image in the ACR.
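
To make that figure concrete, here is a rough sketch in Python of the arithmetic behind it, using the ages from the Events table above ("Pulling" at 20m, "Pulled" at 4m39s). The helper name age_to_seconds and the 150 MB size are illustrative assumptions (the size is only approximate, and kubectl rounds the 20m age), so treat the result as an estimate rather than a measurement.

```python
import re

def age_to_seconds(age: str) -> int:
    """Convert a kubectl-style age such as '4m39s' or '20m' to seconds.

    Hypothetical helper for illustration; it only handles d/h/m/s units.
    """
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", age)
    if not parts:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(number) * units[unit] for number, unit in parts)

# Ages copied from the Events table; both are relative to "now",
# so the older event has the larger age.
pulling_age = "20m"     # "Pulling image ..." event
pulled_age = "4m39s"    # "Successfully pulled image ..." event

# Assumption: the image is about 150 MB, as stated in the question.
image_size_mb = 150

pull_seconds = age_to_seconds(pulling_age) - age_to_seconds(pulled_age)
throughput_mb_per_s = image_size_mb / pull_seconds

print(f"pull took about {pull_seconds // 60}m{pull_seconds % 60}s")
print(f"effective throughput: about {throughput_mb_per_s:.2f} MB/s")
```

With these inputs the pull works out to a little over 15 minutes, which corresponds to well under 1 MB/s for a registry and cluster in the same region.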

This issue is occurring daily. The Azure Insights blade of the AKS instance shows a maximum of 26% node CPU and 14.32% node memory utilization over the last 7 days.

How can we troubleshoot this further to determine the possible causes of the delay?

Any help is greatly appreciated.

Thanks!

-- HXK
azure
azure-container-registry
azure-kubernetes
