My Pods are getting killed and recreated with the status OutOfephemeral-storage.
kubectl describe pod shows the following message:
Message: Pod Node didn't have enough resource: ephemeral-storage, requested: 53687091200, used: 0, capacity: 0
Node Capacity
Capacity:
cpu: 80
ephemeral-storage: 1845262880Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 790964944Ki
nvidia.com/gpu: 8
pods: 110
Allocatable:
cpu: 79900m
ephemeral-storage: 1700594267393
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 790612544Ki
nvidia.com/gpu: 8
pods: 110
Node disk usage:
]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 1.7T 25G 1.7T 2% /
devtmpfs 378G 0 378G 0% /dev
tmpfs 378G 16K 378G 1% /dev/shm
tmpfs 378G 3.8M 378G 1% /run
tmpfs 378G 0 378G 0% /sys/fs/cgroup
Still, the pod is getting rescheduled after some time. Any thoughts on why?
In most cases this happens because an excess of log messages consumes the node's storage. The solution is to configure the Docker logging driver to limit the amount of saved logs:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  }
}
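On most Linux nodes this configuration lives in /etc/docker/daemon.json, and Docker must be restarted to apply it; note that only containers created after the restart pick up the new log options. A minimal sketch, assuming a standard systemd-based node:

# Edit (or create) the Docker daemon config, then restart Docker.
# Only containers created after the restart use the new log options.
$ sudo vi /etc/docker/daemon.json
$ sudo systemctl restart docker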
It is also worth mentioning that Docker takes a conservative approach to cleaning up unused objects (often referred to as "garbage collection"), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so, which can cause Docker to use extra disk space.
It helped me to use the docker prune commands, which remove unused objects of a given type. If you wish to clean up several object types at once, you can use docker system prune. See the Docker documentation on pruning for more details.
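For reference, the per-type and system-wide prune commands look like this when run on the affected node:

$ docker image prune        # remove dangling images
$ docker container prune    # remove all stopped containers
$ docker volume prune       # remove unused local volumes
$ docker network prune      # remove unused networks

$ docker system prune       # stopped containers, unused networks, dangling
                            # images, and build cache; add --volumes to also
                            # prune unused volumes, -a for all unused images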
The next possible scenario is that there are pods that use emptyDir volumes without storage quotas, which will eventually fill up the node's storage. The solution is to set requests and limits on ephemeral storage:
resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "1Gi"
Without this set, any container can write an unlimited amount of data to its node's file system.
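For context, here is a minimal sketch of a Pod manifest showing where these fields belong (the pod name, image, and mount path are illustrative); note that emptyDir also supports its own sizeLimit field:

apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo              # illustrative name
spec:
  containers:
  - name: app
    image: busybox                # illustrative image
    command: ["sleep", "3600"]
    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "1Gi"  # pod is evicted if it writes past this
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: "1Gi"            # caps the emptyDir volume itself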
For more details on how ephemeral storage works, please see Ephemeral Storage Consumption.
The issue was with the filesystem; it was solved with the help of the following steps:
]# systemctl stop kubelet       # stop the services using the filesystem
]# systemctl stop docker
]# umount -l /<MountFolder>     # lazily unmount the affected filesystem
]# fsck -y /dev/sdb1            # repair it, auto-answering yes to prompts
]# init 6                       # reboot the node
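After the node comes back up, it is worth verifying that the kubelet again reports a non-zero ephemeral-storage capacity (rather than the capacity: 0 from the original error). A quick check, assuming kubectl access and substituting your node name:

$ kubectl describe node <node-name> | grep -A 7 Capacity: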