Containerized kubelet and local disk volume lifecycle

2/16/2020

Platform: OEL 7.7 + kube 1.15.5 + docker 19.03.1

We're building an erasure-coded object store on k8s using a containerized kubelet approach, and we're having a tough time coming up with a viable disk lifecycle approach. As it stands, we must pass an "extra_binds" argument to the kubelet that specifies the base mount point where our block devices are mounted (80 SSDs per node, each formatted as ext4).
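
For context, the bind is declared roughly like this (sketched as an RKE-style cluster.yml stanza; the host path is illustrative):

  services:
    kubelet:
      extra_binds:
        # base mount point under which the 80 per-node SSDs are mounted
        - "/data/disks:/data/disks"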

That all works fine: creating PVs and deploying apps works fine. Our problem comes when a PVC is deleted and we want to scrub the disk(s) that were used and make them available again.
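
The lifecycle we're after is roughly the following (device and mount point names are illustrative):

$ umount /data/disks/ssd17             # release the mount the old PV used
$ wipefs -a /dev/sdq                   # scrub the old filesystem signatures
$ mkfs.ext4 /dev/sdq                   # reformat
$ mount /dev/sdq /data/disks/ssd17     # remount under the kubelet bind path

and then recreate the PV object so the disk is schedulable again.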

So far the only thing that works is to cordon the node, remove the extra binds from the kubelet config, bounce the node, reconfigure the block device, and re-add the kubelet binds. Obviously this is too clunky for production; for starters, bouncing the kubelet is not an option.

Once a PV gets used, something locks the block device: even though lsof on the bare-metal system shows no open handles, I can't unmount the device or create a new filesystem on it. Merely bouncing the kubelet doesn't free up the "lock".
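
One thing we suspect but haven't confirmed is that the ext4 mount is still pinned inside another mount namespace (the kubelet container's, or a pod's), which would explain why lsof on the host shows nothing. A rough way to check from the bare-metal side (device and mount point are illustrative):

$ grep -l /dev/sdq /proc/[0-9]*/mountinfo                    # PIDs whose mount namespace still references the device
$ nsenter --target <PID> --mount umount /data/disks/ssd17    # if one turns up, unmount inside that namespace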

Is anyone using a containerized Kubernetes control plane with an app that uses local disks in a similar fashion? Has anyone found a viable way to work around this issue?

Our long-term plan is to write an operator that manages the disks, but even with an operator I don't see how we could mitigate this problem.

Thanks for any help,

-- Crashk1d
kubernetes
persistent-volumes

1 Answer

2/17/2020

First, look at your finalizers:

$ kubectl describe pvc <PVC_NAME> | grep Finalizers
$ kubectl describe pv <PV_NAME> | grep Finalizers

If they are set to Finalizers: [kubernetes.io/pvc-protection] (explained in the Storage Object in Use Protection docs), the objects are protected from deletion and you need to remove the finalizer, for example with:

$ kubectl patch pvc <PVC_NAME> -p '{"metadata":{"finalizers":null}}'

As for forcefully removing PersistentVolumes, you can try:

$ kubectl delete pv <PV_NAME> --force --grace-period=0

Also, please check whether any VolumeAttachment objects still exist ($ kubectl get volumeattachment), as they might be blocking the cleanup.
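
If a stale VolumeAttachment is left over, a rough way to clear it (the object name is a placeholder, and removing finalizers skips the normal detach cleanup, so use it carefully):

$ kubectl get volumeattachment
$ kubectl patch volumeattachment <VA_NAME> -p '{"metadata":{"finalizers":null}}'
$ kubectl delete volumeattachment <VA_NAME>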

I also remember there was an issue on Stack Overflow, "Kubernetes PV refuses to bind after delete/re-create", stating that the PV holds the UID of the PVC that claimed it. You can check that by displaying the whole YAML of the PV:

$ kubectl get pv <PV_NAME> -o yaml

and looking for:

  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: packages-pvc
    namespace: default
    resourceVersion: "10218121"
    uid: 1aede3e6-eaa1-11e9-a594-42010a9c0005
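
If the PV is stuck in Released because of that stale claimRef, one option is to clear the claimRef so the volume becomes Available again, for example:

$ kubectl patch pv <PV_NAME> -p '{"spec":{"claimRef":null}}'

After that, the PV can be bound by a new PVC without being recreated.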

You would need to provide more information regarding your k8s cluster and your PV/PVC configuration so I could dig deeper into it or even test it.

-- Crou
Source: StackOverflow