When a pod gets stuck in a Waiting state, what can I do to find out why it's Waiting?
For instance, I have a deployment to AKS which uses ACI.
When I deploy the YAML file, a number of the pods get stuck in a Waiting state. Running
kubectl describe pod selenium121157nodechrome-7bf598579f-kqfqs
returns:
    State:          Waiting
      Reason:       Waiting
    Ready:          False
    Restart Count:  0
kubectl logs selenium121157nodechrome-7bf598579f-kqfqs
returns nothing.
How can I find out what the pod is waiting for?
Here's the YAML for the deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aci-helloworld2
spec:
  replicas: 20
  selector:
    matchLabels:
      app: aci-helloworld2
  template:
    metadata:
      labels:
        app: aci-helloworld2
    spec:
      containers:
      - name: aci-helloworld
        image: microsoft/aci-helloworld
        ports:
        - containerPort: 80
      nodeSelector:
        kubernetes.io/role: agent
        beta.kubernetes.io/os: linux
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule
Here's the output of kubectl describe pod for a pod that has been Waiting for 5 minutes:
matt@Azure:~/2020$ kubectl describe pod aci-helloworld2-86b8d7866d-b9hgc
Name:           aci-helloworld2-86b8d7866d-b9hgc
Namespace:      default
Priority:       0
Node:           virtual-node-aci-linux/
Labels:         app=aci-helloworld2
                pod-template-hash=86b8d7866d
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/aci-helloworld2-86b8d7866d
Containers:
  aci-helloworld:
    Container ID:   aci://95919def19c28c2a51a806928030d84df4bc6b60656d026d19d0fd5e26e3cd86
    Image:          microsoft/aci-helloworld
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       Waiting
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hqrj8 (ro)
Volumes:
  default-token-hqrj8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hqrj8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
                 kubernetes.io/role=agent
                 type=virtual-kubelet
Tolerations:     azure.com/aci:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 virtual-kubelet.io/provider
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/aci-helloworld2-86b8d7866d-b9hgc to virtual-node-aci-linux
According to the official documentation, a pod in the Waiting state has been scheduled to a node but cannot run on that machine, and the most common cause is a problem with the image. To rule that out, try pulling and running the image manually with docker pull and docker run.
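The manual image check could be sketched as follows, assuming Docker is installed on your workstation and the image is publicly pullable. The function name check_image is only illustrative. A successful local pull rules out a wrong image name or tag, but not registry or network problems that are specific to the node.

```shell
# Hypothetical helper: pull an image and start it once to see whether it is usable.
check_image() {
  image="$1"
  # Fails if the name or tag is wrong, or if the registry needs credentials.
  if ! docker pull "$image" >/dev/null; then
    echo "pull failed: $image"
    return 1
  fi
  # Start a detached container and capture its ID.
  if ! cid=$(docker run -d "$image"); then
    echo "run failed: $image"
    return 1
  fi
  # Clean up the test container again.
  docker rm -f "$cid" >/dev/null
  echo "image ok: $image"
}
```

For the deployment in the question, you would call it as check_image microsoft/aci-helloworld.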
The output of kubectl describe pod <pod-name> should tell you more, especially the Events section at the bottom. Here's an example of what the events can look like:
Events:
  Type    Reason     Age                  From               Message
  ----    ------     ----                 ----               -------
  Normal  Scheduled  <unknown>            default-scheduler  Successfully assigned default/testpod to cafe
  Normal  BackOff    50s (x6 over 2m16s)  kubelet, cafe      Back-off pulling image "busybox"
  Normal  Pulling    37s (x4 over 2m17s)  kubelet, cafe      Pulling image "busybox"
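If the describe output is hard to scan, the same information can be queried directly. This is a rough sketch, assuming the pod lives in your current namespace; the helper names waiting_reason and pod_events are made up for illustration, while the kubectl flags used (-o jsonpath, --field-selector, --sort-by) are standard.

```shell
# Hypothetical helper: print the waiting reason recorded for each container of a pod.
waiting_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.waiting.reason}{"\n"}{end}'
}

# Hypothetical helper: list only the events that refer to the given pod, oldest first.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, pod_events aci-helloworld2-86b8d7866d-b9hgc would show whether anything was recorded after the Scheduled event.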
It could also be an issue with your nodeSelector and tolerations, but again, that would show up in the events when you describe the pod, typically as a FailedScheduling event.
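To compare the nodeSelector and tolerations against what the node actually advertises, something like the following could help. It is only a sketch: the helper names node_labels and node_taints are invented here, and the node name you pass in is whatever kubectl get nodes reports for your cluster.

```shell
# Hypothetical helper: show a node's labels, to compare with the pod's nodeSelector.
node_labels() {
  kubectl get node "$1" --show-labels
}

# Hypothetical helper: show a node's taints, to compare with the pod's tolerations.
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}{"="}{.value}{":"}{.effect}{"\n"}{end}'
}
```

With the node from the question, that would be node_labels virtual-node-aci-linux and node_taints virtual-node-aci-linux.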
Let me know if this helps, and what output you get from kubectl describe pod.