Pods are getting killed and recreated with reason OutOfephemeral-storage?

4/12/2021

My Pods are getting killed and recreated with the reason OutOfephemeral-storage.

Pod describe shows the message below:

Message: Pod Node didn't have enough resource: ephemeral-storage, requested: 53687091200, used: 0, capacity: 0

Node Capacity

Capacity:
 cpu:                80
 ephemeral-storage:  1845262880Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             790964944Ki
 nvidia.com/gpu:     8
 pods:               110
Allocatable:
 cpu:                79900m
 ephemeral-storage:  1700594267393
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             790612544Ki
 nvidia.com/gpu:     8
 pods:               110

Node disk usage

]$ df -h 
Filesystem                                                      Size  Used Avail Use% Mounted on
/dev/sda1                                                       1.7T   25G  1.7T   2% /
devtmpfs                                                        378G     0  378G   0% /dev
tmpfs                                                           378G   16K  378G   1% /dev/shm
tmpfs                                                           378G  3.8M  378G   1% /run
tmpfs                                                           378G     0  378G   0% /sys/fs/cgroup

Still, the pod is getting rescheduled after some time. Any thoughts on why?

-- Savio Mathew
kubernetes

2 Answers

4/21/2021

In most cases, this happens because an excess of log messages is consuming the storage. The solution for that is to configure the Docker logging driver to limit the amount of saved logs:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  }
}
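
This goes into the Docker daemon configuration file. Assuming a systemd-managed node, applying it would look roughly like this (/etc/docker/daemon.json is the default path, but verify it for your setup):

]# vi /etc/docker/daemon.json      # add or merge the log-driver / log-opts settings above
]# systemctl restart docker        # restart Docker so the new log options take effect

Note that the log options only apply to containers created after the restart; existing containers keep their previous logging configuration.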

It is also worth mentioning that Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so, which can cause Docker to use extra disk space. It helped me to use Docker's prune commands, which clean the system of unused objects. If you wish to clean up multiple object types at once, you can use docker system prune. See the Docker documentation on pruning for more details.
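
As a rough sketch, a cleanup run on the affected node could look like this (the exact flags depend on how aggressive you want the cleanup to be):

]# docker system prune               # removes stopped containers, dangling images, unused networks and build cache
]# docker system prune -a --volumes  # additionally removes all unused images and volumes not in use by any container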

Another possible scenario is that there are pods that use emptyDir volumes without storage quotas, which will fill up the storage. The solution for this is to set requests and limits on ephemeral storage:

    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "1Gi"

Without this set, any container can write any amount of data to its node's file system.
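
To make it concrete, here is a minimal, illustrative pod spec (the names are made up for the example) showing where the resources block goes, plus an emptyDir sizeLimit that caps the volume itself:

apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo              # hypothetical name, for illustration only
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "1Gi"  # the kubelet evicts the pod if it writes more than this
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi              # caps the emptyDir volume as well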

For more details on how ephemeral storage works, please see Ephemeral Storage Consumption.

-- acid_fuji
Source: StackOverflow

6/9/2021

The issue was with the filesystem; it was solved with the help of the following steps:

]# systemctl stop kubelet     # stop kubelet so it does not restart workloads during the repair
]# systemctl stop docker      # stop Docker so the filesystem is no longer in use
]# umount -l /<MountFolder>   # lazily unmount the affected filesystem
]# fsck -y /dev/sdb1          # check and repair the filesystem, answering yes to all fixes
]# init 6                     # reboot the node
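
After the node comes back up, it is worth confirming that the filesystem mounted cleanly and that the node is advertising its ephemeral storage again, for example:

]$ df -h
]$ kubectl describe node <node-name>   # check Capacity/Allocatable for ephemeral-storage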
-- Savio Mathew
Source: StackOverflow