We're running kubelet 1.18.3 with Docker 19.03.12 on a bare-metal Linux cluster. Today we noticed a lot of pod evictions, which we traced to disk space pressure on one node: that node had passed the 80% usage threshold on the file system holding the Docker data and the kubelet nodefs.
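For context on where that 80% figure comes from: disk-pressure evictions are driven by the kubelet's eviction thresholds, and an "80% used" trip point on the nodefs is the same as a hard threshold of 20% available. A sketch of how that might look as a kubelet flag (the values here are illustrative assumptions, not necessarily what our cluster actually sets):

# Illustrative kubelet hard-eviction thresholds (assumed values, not our
# actual config): evict pods once less than 20% of the nodefs, or less
# than 15% of the imagefs, remains free.
--eviction-hard='nodefs.available<20%,imagefs.available<15%'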
But the culprit was /var/lib/kubelet/pods/{{pod-uid}}/volumes/kubernetes.io~nfs/{{pv}}. On inspection, that directory appeared to hold a copy (a cache?) of an NFS persistent volume mounted ReadOnlyMany by the pod through a PersistentVolumeClaim. The pod is ultimately based on Debian Stretch and OpenJDK. If we exec into the pod, we see the NFS mount point for the PV as we would expect:
server:/export-path on /local-volume type nfs4 (ro,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=nn.nn.nn.nn,local_lock=none,addr=nn.nn.nn.nn)
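(That listing is just the output of mount inside the container; something along these lines reproduces it, where <namespace> and <pod-name> are placeholders for the real objects:)

# <namespace> and <pod-name> are placeholders; list only NFSv4 mounts in the pod.
kubectl exec -n <namespace> <pod-name> -- mount -t nfs4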
However, docker inspect -f '{{.Mounts}}' container-id also shows the mount point /local-volume as being bound to /var/lib/kubelet/pods/{{pod-uid}}/volumes/kubernetes.io~nfs/{{pv}}.
So, two questions:
1. AFAIK, the NFSv4 client doesn't cache files on the local disk, so where are these copies coming from?
2. Is there a way to manage this local cache? The external PV is pretty large, and we don't need a full local copy of it on the node.
I should have looked more closely at the host's mtab. Kubelet mounts the NFS export backing the pod's PersistentVolumeClaim at /var/lib/kubelet/pods/{{pod-uid}}/volumes/kubernetes.io~nfs/{{pv}}, so the container's persistent volume shows up in the host's file system tree. Those files aren't local copies at all; du was simply descending into the NFS mount and counting remote data as if it lived on the node's disk.
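A quick way to confirm this on the node (the pod UID and PV name below are placeholders) is to check whether the path is a mount point rather than ordinary local data:

# Is the kubelet volume path a mount point, and of what type?
findmnt --target /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/<pv>

# List every NFS mount kubelet has set up for pods on this node.
grep ' nfs' /proc/mounts | grep /var/lib/kubelet/pods

# df reports the NFS server's usage for this path, not the node's local disk.
df -hT /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/<pv>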
Lesson learned: use du -x -h -s * | sort -h when hunting down big disk space offenders, so you don't get distracted by (e.g., NFS) mount points; the -x flag keeps du from crossing into other file systems.
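As a sketch of that workflow on a node (the starting directory is just where the problem showed up for us):

cd /var/lib/kubelet/pods
# -x stops du at file system boundaries, so NFS-backed volume directories
# no longer dominate the listing; only data on the node's own disk is counted.
du -x -h -s * | sort -h

# Cross-check against what the local file system actually reports.
df -h /var/lib/kubelet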