Debugging a NoDiskConflict in Kubernetes

6/3/2016

We had a pod that worked great for about a month, and now it suddenly cannot be scheduled anymore. Describing the pod seems to indicate that a disk is full or otherwise unavailable, but the message is not very specific (see the full output of describing the pod below).

I have confirmed that the disk on this node has plenty of space (95G) and the GCEPersistentDisk it references also has plenty of space (450G). What else can I look for to get this working again?

So far I have tried restarting the node and even deleting the node to start from scratch. This is a one-node cluster on GKE.

Thanks for any tips!

> kubectl --namespace=bakery-production describe pods bakery-deployment-3841321805-l84nc
Name:       bakery-deployment-3841321805-l84nc
Namespace:  bakery-production
Node:       /
Labels:     pod-template-hash=3841321805,service=bakery
Status:     Pending
IP:     
Controllers:    ReplicaSet/bakery-deployment-3841321805
Containers:
  bakery:
    Image:  gcr.io/pear-deck-production/bakery:38fda09f727493e4e88def14d49fe36883414e08
    Port:   80/TCP
    QoS Tier:
      cpu:  BestEffort
      memory:   BestEffort
    Environment Variables:
      PEARDECK_CONTAINER_REGISTRY:  gcr.io/pear-deck-production
Volumes:
  docker-images:
    Type:   GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName: bakery-docker-images
    FSType: ext4
    Partition:  0
    ReadOnly:   false
  bakery-secret-volume:
    Type:   Secret (a volume populated by a Secret)
    SecretName: bakery-secret
  default-token-z3ew1:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-z3ew1
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  20s       13s     4   {default-scheduler }            Warning     FailedScheduling    pod (bakery-deployment-3841321805-l84nc) failed to fit in any node
fit failure on node (gke-peardeck-infrastructure-0f42f748-node-qa5a): NoDiskConflict
-- Riley Lark
kubernetes

1 Answer

6/3/2016

The scheduler returns NoDiskConflict when you try to schedule a pod that references a volume already referenced by another (already scheduled) pod, and the volume does not support multiple mounts. A GCE PD allows multiple mounts only if they are all read-only.
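To make the rule concrete, here is a minimal Python sketch of the GCE PD part of that check. It is a simplified model, not the scheduler's actual Go code; the names GcePdVolume, volumes_conflict and no_disk_conflict are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GcePdVolume:
    """Hypothetical stand-in for a pod's GCEPersistentDisk volume."""
    pd_name: str
    read_only: bool = False


def volumes_conflict(new, existing):
    """Two mounts of the same disk conflict unless both are read-only."""
    same_disk = new.pd_name == existing.pd_name
    return same_disk and not (new.read_only and existing.read_only)


def no_disk_conflict(new_pod_volumes, volumes_on_node):
    """Return True if the new pod fits, False if the check would fail."""
    return not any(
        volumes_conflict(new, existing)
        for new in new_pod_volumes
        for existing in volumes_on_node
    )


# The disk from the question, mounted read-write and read-only.
rw = GcePdVolume("bakery-docker-images", read_only=False)
ro = GcePdVolume("bakery-docker-images", read_only=True)

print(no_disk_conflict([rw], [rw]))  # False: two read-write mounts
print(no_disk_conflict([ro], [rw]))  # False: one side is still read-write
print(no_disk_conflict([ro], [ro]))  # True: all mounts are read-only
print(no_disk_conflict([rw], []))    # True: nothing else uses the disk
```

Note that the check is made against pods already placed on the node, so a pod that still holds the disk read-write is enough to block a new pod that references the same disk.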

So make sure that a GCE PD mounted in read-write mode is referenced by only one pod at a time.
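One way to look for such a pod is to scan the pod list for disks that are referenced more than once with at least one read-write mount. The sketch below works on the parsed JSON of a pod list (the shape produced by kubectl get pods --all-namespaces -o json). The function name find_pd_conflicts and the sample data are assumptions for illustration; in particular the second pod name is invented.

```python
from collections import defaultdict


def find_pd_conflicts(pod_list):
    """Map each GCE PD name to its referencing pods when they would conflict.

    pod_list is the parsed JSON of a Kubernetes pod list. A disk is reported
    when more than one pod references it and not every reference is read-only.
    """
    users = defaultdict(list)
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        pod_id = "{}/{}".format(meta.get("namespace", "default"), meta.get("name", "?"))
        for volume in pod.get("spec", {}).get("volumes", []):
            pd = volume.get("gcePersistentDisk")
            if pd:
                users[pd["pdName"]].append((pod_id, bool(pd.get("readOnly", False))))
    return {
        disk: refs
        for disk, refs in users.items()
        if len(refs) > 1 and not all(read_only for _, read_only in refs)
    }


def make_pod(name, pd_name, read_only=False):
    """Build a minimal pod dict for the example (hypothetical helper)."""
    return {
        "metadata": {"namespace": "bakery-production", "name": name},
        "spec": {"volumes": [
            {"name": "docker-images",
             "gcePersistentDisk": {"pdName": pd_name, "readOnly": read_only}},
        ]},
    }


# Illustrative data: an invented older pod still holds the disk read-write.
sample = {"items": [
    make_pod("bakery-deployment-old-pod", "bakery-docker-images"),
    make_pod("bakery-deployment-3841321805-l84nc", "bakery-docker-images"),
]}

conflicts = find_pd_conflicts(sample)
for disk, refs in conflicts.items():
    for pod_id, read_only in refs:
        print(disk, pod_id, "read-only" if read_only else "read-write")
```

To run it against a real cluster, the sample dict could be replaced with json.load(sys.stdin) and the kubectl output piped in. The scan covers the whole pod list rather than a single node, which amounts to the same thing on a one-node cluster like the one in the question.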

See https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithm/predicates/predicates.go#L105

-- Saad Ali
Source: StackOverflow