My Kubernetes K3s cluster gives this error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 17m default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
Warning FailedScheduling 17m default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
In order to list the taints in the cluster I executed:
kubectl get nodes -o json | jq '.items[].spec'
which outputs:
{
  "podCIDR": "10.42.0.0/24",
  "podCIDRs": [
    "10.42.0.0/24"
  ],
  "providerID": "k3s://antonis-dell",
  "taints": [
    {
      "effect": "NoSchedule",
      "key": "node.kubernetes.io/disk-pressure",
      "timeAdded": "2021-12-17T10:54:31Z"
    }
  ]
}
{
  "podCIDR": "10.42.1.0/24",
  "podCIDRs": [
    "10.42.1.0/24"
  ],
  "providerID": "k3s://knodea"
}
When I run kubectl describe node antonis-dell I get:
Name:               antonis-dell
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=antonis-dell
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        csi.volume.kubernetes.io/nodeid: {"ch.ctrox.csi.s3-driver":"antonis-dell"}
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f2:d5:6c:6a:85:0a"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.XX
                    k3s.io/hostname: antonis-dell
                    k3s.io/internal-ip: 192.168.1.XX
                    k3s.io/node-args: ["server"]
                    k3s.io/node-config-hash: YANNMDBIL7QEFSZANHGVW3PXY743NWWRVFKBKZ4FXLV5DM4C74WQ====
                    k3s.io/node-env:
                      {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c","K3S_KUBECONFIG_MODE":"644"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 17 Dec 2021 12:11:39 +0200
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
where it seems that the node has a disk-pressure taint.
This command doesn't work: kubectl taint node antonis-dell node.kubernetes.io/disk-pressure:NoSchedule-
and it seems to me that even if it worked, it would not be a good solution, because the taint is assigned by the control plane.
Furthermore, at the end of the output of kubectl describe node antonis-dell
I observed this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 57m kubelet failed to garbage collect required amount of images. Wanted to free 32967806976 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 52m kubelet failed to garbage collect required amount of images. Wanted to free 32500092928 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 47m kubelet failed to garbage collect required amount of images. Wanted to free 32190205952 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 42m kubelet failed to garbage collect required amount of images. Wanted to free 32196628480 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 37m kubelet failed to garbage collect required amount of images. Wanted to free 32190926848 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 2m21s (x7 over 32m) kubelet (combined from similar events): failed to garbage collect required amount of images. Wanted to free 30909374464 bytes, but freed 0 bytes
Maybe the disk pressure is related to this? How can I delete the unwanted images?
Posting the answer as a community wiki; feel free to edit and expand it.
The node.kubernetes.io/disk-pressure:NoSchedule taint indicates that, as the name suggests, the node is under disk pressure.
The kubelet detects disk pressure based on imagefs.available, imagefs.inodesFree, nodefs.available and nodefs.inodesFree (Linux only) observed on a Node. The observed values are then compared to the corresponding thresholds that can be set on the kubelet to determine whether the Node condition and taint should be added or removed (see the example K3s configuration after the list below).
More details on disk pressure are available in the Efficient Node Out-of-Resource Management in Kubernetes article, under the "How Does Kubelet Decide that Resources Are Low?" section:
memory.available — A signal that describes the state of cluster memory. The default eviction threshold for the memory is 100 Mi. In other words, the kubelet starts evicting Pods when the memory goes down to 100 Mi.
nodefs.available — The nodefs is a filesystem used by the kubelet for volumes, daemon logs, etc. By default, the kubelet starts reclaiming node resources if the nodefs.available < 10%.
nodefs.inodesFree — A signal that describes the state of the nodefs inode memory. By default, the kubelet starts evicting workloads if the nodefs.inodesFree < 5%.
imagefs.available — The imagefs filesystem is an optional filesystem used by a container runtime to store container images and container-writable layers. By default, the kubelet starts evicting workloads if the imagefs.available < 15%.
imagefs.inodesFree — The state of the imagefs inode memory. It has no default eviction threshold.
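As an illustration of how those thresholds are set on K3s, the kubelet flags are passed through the kubelet-arg option, either on the k3s command line or in its config file. This is a minimal sketch, assuming the default config file location /etc/rancher/k3s/config.yaml; the values are placeholders to show the syntax, not recommendations:

# /etc/rancher/k3s/config.yaml on the affected node
kubelet-arg:
  - "eviction-hard=imagefs.available<15%,nodefs.available<10%,nodefs.inodesFree<5%"
  - "eviction-minimum-reclaim=imagefs.available=2Gi,nodefs.available=1Gi"

After editing the file, restart the K3s service (for example sudo systemctl restart k3s on the server node) so the kubelet picks up the new flags.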
What to check
There are different things that can help, such as:
Prune unused objects such as images (with the Docker CRI): see prune images and the example commands below.
The docker image prune command allows you to clean up unused images. By default, docker image prune only cleans up dangling images. A dangling image is one that is not tagged and is not referenced by any container.
Check whether files or logs on the node take up a lot of space: see the disk-usage commands below.
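With Docker as the container runtime, the pruning mentioned above is a couple of commands; a minimal sketch, where -a removes every image not referenced by at least one container, not only dangling ones:

docker image prune        # remove dangling (untagged, unreferenced) images
docker image prune -a     # remove all images not used by at least one container

A stock K3s installation, however, uses containerd rather than Docker, so there the bundled crictl is the tool to use (the --prune flag assumes a reasonably recent cri-tools build; if it is not available, list images with crictl images and remove them one by one with crictl rmi):

sudo k3s crictl images          # list images known to containerd
sudo k3s crictl rmi --prune     # remove every image not referenced by a container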
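To see what is actually filling the disk before deleting anything, a few standard commands help; the paths below are the K3s and kubelet defaults and will differ if you changed --data-dir:

df -h                                                # overall filesystem usage on the node
sudo du -sh /var/lib/rancher/k3s /var/lib/kubelet    # size of the K3s data dir and kubelet dir
sudo journalctl --vacuum-size=200M                   # trim systemd journal logs if they have grown large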