Kubernetes pod stuck in Waiting

3/21/2020

When a pod gets stuck in a Waiting state, what can I do to find out why it's Waiting?

For instance, I have a deployment to AKS which uses ACI.

When I deploy the YAML file, a number of the pods get stuck in a Waiting state. Running kubectl describe pod selenium121157nodechrome-7bf598579f-kqfqs returns:

State:          Waiting
  Reason:       Waiting
Ready:          False
Restart Count:  0

kubectl logs selenium121157nodechrome-7bf598579f-kqfqs returns nothing.

How can I find out what the pod is waiting for?

Here's the YAML deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aci-helloworld2
spec:
  replicas: 20
  selector:
    matchLabels:
      app: aci-helloworld2
  template:
    metadata:
      labels:
        app: aci-helloworld2
    spec:
      containers:
      - name: aci-helloworld
        image: microsoft/aci-helloworld
        ports:
        - containerPort: 80
      nodeSelector:
        kubernetes.io/role: agent
        beta.kubernetes.io/os: linux
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule

Here's the output of describe pod for a pod that has been Waiting for 5 minutes:

matt@Azure:~/2020$ kubectl describe pod aci-helloworld2-86b8d7866d-b9hgc
Name:           aci-helloworld2-86b8d7866d-b9hgc
Namespace:      default
Priority:       0
Node:           virtual-node-aci-linux/
Labels:         app=aci-helloworld2
                pod-template-hash=86b8d7866d
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/aci-helloworld2-86b8d7866d
Containers:
  aci-helloworld:
    Container ID:   aci://95919def19c28c2a51a806928030d84df4bc6b60656d026d19d0fd5e26e3cd86
    Image:          microsoft/aci-helloworld
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       Waiting
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hqrj8 (ro)
Volumes:
  default-token-hqrj8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hqrj8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=linux
                 kubernetes.io/role=agent
                 type=virtual-kubelet
Tolerations:     azure.com/aci:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 virtual-kubelet.io/provider
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned default/aci-helloworld2-86b8d7866d-b9hgc to virtual-node-aci-linux
-- Matt
azure-aks
azure-container-instances
kubernetes

1 Answer

3/23/2020

According to the official documentation, a pod in the Waiting state has been scheduled to a node but can't run on that machine. The most common cause is a problem with the image. You can try running the image manually with docker pull and docker run to rule out image issues.
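As a rough sketch of that image check, assuming Docker is installed locally, something like the following could be tried. The function name check_image and the host port 8080 are hypothetical choices for illustration; container port 80 is taken from the deployment in the question.

```shell
# Hypothetical helper: pull an image and start it once to confirm it is usable.
# Assumes a local Docker daemon; 8080 is an arbitrary host port.
check_image() {
    image="$1"
    # Pull first so that registry, name, or tag problems surface on their own.
    if ! docker pull "$image"; then
        echo "pull failed: $image"
        return 1
    fi
    # Start the container detached, mapping container port 80 to host port 8080.
    if ! docker run --rm -d -p 8080:80 "$image"; then
        echo "run failed: $image"
        return 1
    fi
    echo "image ok: $image"
}
```

For the deployment in the question this would be called as check_image microsoft/aci-helloworld. The container keeps running in the background afterwards, so stop it with docker stop once you have confirmed it starts.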

The output of kubectl describe <pod-name> should give you some information, especially the Events section at the bottom. Here's an example of what the events can look like:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  <unknown>            default-scheduler  Successfully assigned default/testpod to cafe
  Normal   BackOff    50s (x6 over 2m16s)  kubelet, cafe      Back-off pulling image "busybox"
  Normal   Pulling    37s (x4 over 2m17s)  kubelet, cafe      Pulling image "busybox"
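If the Events section of describe is sparse, the same information can also be queried directly. This is a minimal sketch, assuming kubectl is already pointed at the right cluster and namespace; the function names waiting_reason and pod_events are hypothetical.

```shell
# Hypothetical helpers; both take a pod name as the first argument.

# Print the name, waiting reason, and waiting message of each container.
# Fields that are not set come back empty.
waiting_reason() {
    kubectl get pod "$1" \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.state.waiting.message}{"\n"}{end}'
}

# List the events recorded for the pod, sorted by their last timestamp.
pod_events() {
    kubectl get events \
        --field-selector "involvedObject.name=$1" \
        --sort-by=.lastTimestamp
}
```

For example, pod_events aci-helloworld2-86b8d7866d-b9hgc would list the events for the pod from the question. Events are only retained for a limited time, so an empty list for an older pod does not necessarily mean nothing happened.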

It could also be an issue with your nodeSelector and tolerations, but again, that would show up in the events when you describe the pod.

Let me know if this helps, and what output you get from describe pod.
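To compare the pod's nodeSelector and tolerations against what the node actually advertises, the node's labels and taints can be printed. A sketch, assuming kubectl access to the cluster; the function name node_labels_and_taints is hypothetical.

```shell
# Hypothetical helper: show the labels and taints of one node so they can be
# compared with the nodeSelector and tolerations in the pod spec.
node_labels_and_taints() {
    node="$1"
    echo "Labels:"
    kubectl get node "$node" --show-labels
    echo "Taints:"
    kubectl get node "$node" -o jsonpath='{.spec.taints}'
    # jsonpath output has no trailing newline, so add one.
    echo
}
```

With the node name from the describe output in the question, this would be run as node_labels_and_taints virtual-node-aci-linux.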

-- acid_fuji
Source: StackOverflow