Pod creation time increases as the number of pods on a node increases

1/21/2022

I have an 80-core, 600 GB node. As the pod count rises above 80, pod startup time grows to more than 10 minutes. I'd like to understand how to reduce pod startup time when other pods are present, or at least how to keep diagnosing the issue.

During pod startup there are many "Unable to attach or mount volumes" errors for all PVCs, which eventually resolve on their own. Looking at the kubelet log, there appears to be a gap of about 6 minutes between when the pod is created and when the first mount is actually attempted. If nobody happens to know why the startup time is increasing, I'd love to know what next steps I can take to move towards a solution.
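To put a number on that gap for any pod, something like the following could be run over the kubelet log. This is only a minimal sketch: it assumes log lines in the format shown in this post, treats "SyncLoop ADD" as the creation time and the first "operationExecutor.VerifyControllerAttachedVolume started" line as the first volume operation, and assumes the year is 2022 because the klog header does not include one. The function names are made up for illustration.

```python
import re
from datetime import datetime

# klog header inside each kubelet line, e.g. "I0120 18:35:51.322846"
KLOG_TS = re.compile(r"\b[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def klog_time(line, year=2022):
    """Parse the klog timestamp from one line; the year is an assumption."""
    m = KLOG_TS.search(line)
    if m is None:
        return None
    return datetime.strptime(f"{year}{m.group(1)} {m.group(2)}",
                             "%Y%m%d %H:%M:%S.%f")


def startup_gap(lines, pod_name, year=2022):
    """Time from "SyncLoop ADD" to the first volume verification for a pod.

    Returns a timedelta, or None if either event is missing from the lines.
    """
    added = first_volume_op = None
    for line in lines:
        if pod_name not in line:
            continue
        ts = klog_time(line, year)
        if ts is None:
            continue
        if '"SyncLoop ADD"' in line and added is None:
            added = ts
        elif ("operationExecutor.VerifyControllerAttachedVolume started" in line
              and first_volume_op is None):
            first_volume_op = ts
    if added is None or first_volume_op is None:
        return None
    return first_volume_op - added


if __name__ == "__main__":
    # Two lines copied from the log excerpt in this post.
    sample = [
        'Jan 20 18:35:51 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:35:51.322846   18395 kubelet.go:1932] "SyncLoop ADD" source="api" pods=[default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s]',
        r'Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237083   18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "',
    ]
    pod = "node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
    print(startup_gap(sample, pod))  # prints 0:06:18.914237
```

Running it for several pods at different pod counts would show whether the delay before the first volume operation grows with the number of pods on the node.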

I've verified that CPU and memory are not the issue.

I have a sneaking suspicion it's related to the number of mounts: mount | wc -l reports over 2700 when the issue is at its worst. That said, I'm a Kubernetes expert beginner, so I could be way off base.
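To see what makes up that total, the mount table could be broken down by filesystem type. The sketch below reads /proc/mounts on the node and also counts mounts under /var/lib/kubelet/pods, which is an assumption (it is the default kubelet root directory and may differ on a given node). The function name is hypothetical. It only describes the mount table; it does not show whether the mount count is what slows pod startup.

```python
from collections import Counter
from pathlib import Path

# Assumption: kubelet uses its default root directory on this node.
KUBELET_PODS_DIR = "/var/lib/kubelet/pods"


def summarize_mounts(mounts_text):
    """Count mounts per filesystem type and mounts that belong to pods.

    Expects text in /proc/mounts format:
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    """
    by_type = Counter()
    pod_mounts = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        by_type[fs_type] += 1
        if mount_point.startswith(KUBELET_PODS_DIR + "/"):
            pod_mounts += 1
    return by_type, pod_mounts


if __name__ == "__main__":
    proc_mounts = Path("/proc/mounts")
    if proc_mounts.exists():
        by_type, pod_mounts = summarize_mounts(proc_mounts.read_text())
        print(f"total mounts: {sum(by_type.values())}")
        print(f"mounts under {KUBELET_PODS_DIR}: {pod_mounts}")
        for fs_type, count in by_type.most_common():
            print(f"{count:6d}  {fs_type}")
```

Comparing the output at a low pod count and at 80+ pods would show which kind of mount accounts for the growth.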

Environment: AKS 1.21.2

Kubelet logs below:

Jan 20 18:35:51 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:35:51.322846   18395 kubelet.go:1932] "SyncLoop ADD" source="api" pods=[default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s]
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379648   18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379699   18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313649   18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313694   18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237083   18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237204   18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-root\" (UniqueName: \"kubernetes.io/host-path/557358f5-19f4-4f7d-94fe-eb7bc6155040-host-root\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
-- Mike Barry
azure-aks
kubernetes

0 Answers