Kubernetes jobs failing with ephemeral-storage issue

4/6/2020

Until recently our on-prem Kubernetes cluster was working fine. Lately, jobs have been failing with the error below. I checked, and there are no space issues on the Kube master or on the worker nodes. There is plenty of space available under "/var/lib" as well as in the persistent volume claims.

Version:

Client Version: v1.17.2
Server Version: v1.17.2

Host OS:
CentOS 7.7

CNI:
Weave

Error:

The node was low on resource: ephemeral-storage. Container main was using 5056Ki, which exceeds its request of 0. Container wait was using 12Ki, which exceeds its request of 0.

Any pointers would be helpful.

Thanks, CS

-- user1739504
kubectl
kubernetes

1 Answer

4/6/2020

The most likely cause is that pod logs or emptyDir usage are filling up the node's ephemeral storage.

Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so. This can cause Docker to use extra disk space.

You can use Docker's prune commands to clean unused objects off the system. To clean up several object types at once, use docker system prune. See the Docker documentation on pruning for details.

Docker also provides a garbage collector for the Docker registry, which removes unused, abandoned, or orphaned blobs. See the Docker registry documentation on garbage collection for details.

In the context of the Docker registry, garbage collection is the process of removing blobs from the filesystem when they are no longer referenced by a manifest. Blobs can include both layers and manifests.

If this doesn't help, you can configure the logging driver and limit the size of its log files (for Docker on Linux, typically in /etc/docker/daemon.json):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "labels": "production_status",
    "env": "os,customer"
  }
}
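
As a rough sanity check on the settings above: with the json-file driver, max-size caps each log file and max-file caps how many rotated files are kept, so the worst-case log footprint per container is roughly their product. Below is a minimal Python sketch of that arithmetic. The helper names are hypothetical, and the parser assumes the k/m/g suffixes are binary multiples.

```python
import re

# Multipliers for the k/m/g suffixes; treated as binary units (an assumption).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_docker_size(value: str) -> int:
    """Convert a size such as "10m" into bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]


def max_log_bytes(log_opts: dict) -> int:
    """Worst-case bytes of json-file logs kept for one container."""
    per_file = parse_docker_size(log_opts["max-size"])
    file_count = int(log_opts.get("max-file", "1"))
    return per_file * file_count


if __name__ == "__main__":
    opts = {"max-size": "10m", "max-file": "3"}
    print(max_log_bytes(opts) / 1024 ** 2, "MiB per container")
```

With the values shown above, this comes to about 30 MiB of logs per container, which you can multiply by the number of containers on a node to estimate how much of the node's disk the logs may claim.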

Another option applies if emptyDir is in use. By default, an emptyDir volume lets a container write an unbounded amount of data to its node's filesystem. You can set requests and limits for local ephemeral storage on each container:

spec:
  containers:
  - name: test
    image: test-image
    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "1Gi"
  - name: test2
    image: test-image2
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "2Gi"
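
To see why the error in the question says "exceeds its request of 0", it can help to compare the reported usage against the requests. The following is only an illustrative Python sketch, not the kubelet's actual eviction logic: the function names are hypothetical, and the parser handles just a subset of the Kubernetes quantity format.

```python
from decimal import Decimal

# Binary and decimal suffixes used by Kubernetes resource quantities (subset).
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4,
    "k": 1000, "M": 1000 ** 2, "G": 1000 ** 3, "T": 1000 ** 4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as "1Gi" or "5056Ki" to bytes."""
    text = quantity.strip()
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return int(Decimal(text[: -len(suffix)]) * factor)
    return int(Decimal(text))  # plain number of bytes, e.g. "0"


def containers_over_request(usage: dict, requests: dict) -> list:
    """Names of containers whose usage exceeds their ephemeral-storage request."""
    over = []
    for name, used in usage.items():
        # A container with no request is treated as requesting 0.
        requested = quantity_to_bytes(requests.get(name, "0"))
        if quantity_to_bytes(used) > requested:
            over.append(name)
    return over


if __name__ == "__main__":
    usage = {"main": "5056Ki", "wait": "12Ki"}  # values from the error message
    print(containers_over_request(usage, {}))
    print(containers_over_request(usage, {"main": "1Gi", "wait": "1Gi"}))
```

With no requests set, both containers count as over their request even at a few kilobytes of usage, which matches the error message. With a 1Gi request on each, neither does.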

You can also list the running containers with docker ps, then inspect a container yourself and locate its log file on the node's filesystem.

It should be found at this location:

/var/lib/docker/containers/<container-id>/<container-id>-json.log
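
If there are many containers on a node, a short script can point you at the biggest log files first. This is a minimal Python sketch that assumes the default Docker data root shown above and the json-file log driver. The function name is hypothetical, and it usually needs to run as root to read that directory.

```python
from pathlib import Path


def largest_container_logs(root: str = "/var/lib/docker/containers", top: int = 5):
    """Return (size_in_bytes, path) for the biggest *-json.log files under root."""
    logs = []
    for path in Path(root).glob("*/*-json.log"):
        try:
            logs.append((path.stat().st_size, str(path)))
        except OSError:
            # File was rotated or removed while scanning; skip it.
            continue
    # Largest files first, limited to the requested count.
    return sorted(logs, reverse=True)[:top]


if __name__ == "__main__":
    for size, path in largest_container_logs():
        print(f"{size / 1024 ** 2:8.1f} MiB  {path}")
```

The directory name in each reported path is the container ID, which you can match against the output of docker ps.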

Let me know if that helps.

-- acid_fuji
Source: StackOverflow