How to respawn a pod with a persistent volume stuck on a failed node in Kubernetes

4/20/2018

I have a simple k8s installation with a few nodes and Ceph (kubernetes.io/rbd) as the storage class. I have a deployment with a single pod that uses a persistent volume, via a persistent volume claim (ReadWriteOnce), from this storage class.
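For reference, the claim looks roughly like this (a trimmed sketch; the name, storage class name, and size are placeholders):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-app-data            # placeholder name
    spec:
      accessModes:
        - ReadWriteOnce            # single-node attachment
      storageClassName: ceph-rbd   # my RBD-backed storage class (placeholder name)
      resources:
        requests:
          storage: 10Gi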

The node running this pod has failed (it has been NotReady in the get nodes output for a long time, and it is physically dead).

K8s cannot create a new pod for my deployment because of 'Multi-Attach error for volume "pvc-..." Volume is already exclusively attached to one node and can't be attached to another'.

I can see that the PV is still bound to the failed node: "Status: Bound".

How can I force Kubernetes to forget about the old pod so that a new pod can bind to the persistent volume?

-- George Shuklin
ceph
kubernetes

1 Answer

4/20/2018

It is a complex problem.

The kubelet daemon, which manages volume mounts, has to report the new status of the volume so that the scheduler can spawn the Pod on another node.

But the node is in the NotReady state, which means Kubernetes cannot communicate with the kubelet to check the current status of its volumes. As far as the cluster is concerned, the status of the volume is the last one that was reported: "Bound". It is not possible to reset that status without changing the state of the node.
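One way to inspect the attachment record the API server still holds is the node object's status.volumesAttached field, for example (the node name is a placeholder):

    kubectl get node <failed-node-name> -o jsonpath='{.status.volumesAttached}'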

I see only 2 workarounds here:

  1. Use the PVC in ReadWriteMany mode instead of ReadWriteOnce. CephFS can work in that mode, but RBD can't. That mode allows Kubernetes to attach the same volume on several nodes at the same time (see the sketch after this list).
  2. Delete the failed node from the cluster. This will remove all objects linked to the node, and the scheduler will be able to attach your volume again (see the command after this list).
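For the first workaround, a new claim would look roughly like this (a sketch; "cephfs" is a placeholder for whatever CephFS-backed storage class exists in your cluster). Note that access modes are immutable on an existing PVC, so this means creating a new claim and moving the data:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: my-app-data          # placeholder name
    spec:
      accessModes:
        - ReadWriteMany          # allows attaching the volume on several nodes at once
      storageClassName: cephfs   # must be a CephFS-backed class, not RBD (placeholder name)
      resources:
        requests:
          storage: 10Gi

For the second workaround, removing the node is a single command (the node name is a placeholder):

    kubectl delete node <failed-node-name>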
-- Anton Kostenko
Source: StackOverflow