I have an 80-core, 600 GB node. As the pod count rises to 80 or more, pod startup time grows to more than 10 minutes. I'd like to understand how to reduce pod startup time in the presence of other pods, or at least how to continue diagnosing the issue.
During pod startup there are many "Unable to attach or mount volumes" errors for all PVCs, which eventually resolve on their own. Looking at the kubelet log, there appears to be a 6-minute gap between when the pod is created and when the first mount is actually attempted. If no one knows offhand why the startup time is increasing, I'd love to know what next steps I can take to move towards a solution.
I've verified that CPU and memory are not the issue.
I have a sneaking suspicion it's related to the number of mounts: mount | wc -l reports over 2700 when the issue is at its worst. That said, I'm a Kubernetes expert beginner, so I could be way off base.
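To see what makes up that mount count, here is a minimal sketch that groups mounts by filesystem type. It assumes the node exposes /proc/mounts in the standard Linux format (device, mount point, filesystem type, options, ...). The function name count_mounts_by_type is my own, not part of any Kubernetes tooling.

```python
from collections import Counter


def count_mounts_by_type(mounts_text: str) -> Counter:
    """Count mount entries per filesystem type.

    Assumes the /proc/mounts layout: the third whitespace-separated
    field on each line is the filesystem type.
    """
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # skip blank or malformed lines
            counts[fields[2]] += 1
    return counts
```

On the node, count_mounts_by_type(open("/proc/mounts").read()).most_common() would list the filesystem types from most to least frequent, which may show whether tmpfs, overlay, or something else dominates the 2700 entries.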
Environment: AKS 1.21.2
Kubelet logs below:
Jan 20 18:35:51 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:35:51.322846 18395 kubelet.go:1932] "SyncLoop ADD" source="api" pods=[default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s]
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379648 18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:37:54 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:37:54.379699 18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313649 18395 kubelet.go:1701] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s"
Jan 20 18:40:08 aks-e80ids-62052009-vmss00000F kubelet[18395]: E0120 18:40:08.313694 18395 pod_workers.go:190] "Error syncing pod, skipping" err="unmounted volumes=[host-root kube-api-access-pnkpf], unattached volumes=[host-root kube-api-access-pnkpf]: timed out waiting for the condition" pod="default/node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s" podUID=557358f5-19f4-4f7d-94fe-eb7bc6155040
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237083 18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-pnkpf\" (UniqueName: \"kubernetes.io/projected/557358f5-19f4-4f7d-94fe-eb7bc6155040-kube-api-access-pnkpf\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
Jan 20 18:42:10 aks-e80ids-62052009-vmss00000F kubelet[18395]: I0120 18:42:10.237204 18395 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"host-root\" (UniqueName: \"kubernetes.io/host-path/557358f5-19f4-4f7d-94fe-eb7bc6155040-host-root\") pod \"node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s\" (UID: \"557358f5-19f4-4f7d-94fe-eb7bc6155040\") "
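To put a number on the gap between pod creation and the first mount attempt, here is a rough sketch that scans kubelet log lines like the ones above. It assumes the klog timestamp format seen in these logs (for example I0120 18:35:51.322846) and treats the first "VerifyControllerAttachedVolume started" line as the first mount attempt, which is my reading of the logs rather than a documented rule. The names klog_time, mount_gap, and ASSUMED_YEAR are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# klog header: severity letter, MMDD, then HH:MM:SS.microseconds
KLOG_TS = re.compile(r"\b[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\b")

# Assumption: klog lines carry no year, so a fixed one is used for parsing.
# Gaps that cross a year boundary would be computed incorrectly.
ASSUMED_YEAR = "2000"


def klog_time(line: str) -> Optional[datetime]:
    """Extract the klog timestamp from a log line, or None if absent."""
    m = KLOG_TS.search(line)
    if m is None:
        return None
    return datetime.strptime(
        f"{ASSUMED_YEAR}{m.group(1)} {m.group(2)}", "%Y%m%d %H:%M:%S.%f"
    )


def mount_gap(lines: Iterable[str], pod_name: str) -> Optional[timedelta]:
    """Time from "SyncLoop ADD" to the first volume verification for a pod."""
    added = None
    for line in lines:
        if pod_name not in line:
            continue
        ts = klog_time(line)
        if ts is None:
            continue
        if added is None and '"SyncLoop ADD"' in line:
            added = ts
        elif added is not None and "VerifyControllerAttachedVolume started" in line:
            return ts - added
    return None  # one of the two events was not found
```

Fed the log lines above with the pod name node-debugger-aks-e80ids-62052009-vmss00000f-cbd5s, this should report a gap of roughly 6 minutes 19 seconds. Running it for several pods at different pod counts could show whether the delay scales with the number of pods or mounts on the node.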