I am currently working on a Kubernetes cluster that uses Docker.
This cluster allows me to launch jobs. For each job, I specify a memory request and a memory limit.
Kubernetes uses the memory limit to fill the --memory
option of the docker run
command when creating the container. If the container exceeds this limit, it is killed with an OOM (out-of-memory) error.
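As a side note, here is a minimal sketch of how one could confirm on the node that the limit actually reaches the container (Python calling docker inspect; the container ID is a placeholder):

```python
import json
import subprocess

# Placeholder: replace with the actual ID of the job's container.
container_id = "<container-id>"

# `docker inspect` returns a JSON array; HostConfig.Memory is the limit
# passed via `docker run --memory`, expressed in bytes (0 means unlimited).
raw = subprocess.check_output(["docker", "inspect", container_id])
memory_limit = json.loads(raw)[0]["HostConfig"]["Memory"]
print("container memory limit (bytes):", memory_limit)
```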
Now, if I go inside a container, I am a little surprised to see that the available system memory is not the value from the --memory
option, but the total memory of the Docker host (the Kubernetes node).
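For example, a small sketch run inside the container shows the mismatch (assuming cgroup v1, where the limit is exposed at /sys/fs/cgroup/memory/memory.limit_in_bytes, as in the usual Docker setup):

```python
# Run inside the container: compare what /proc/meminfo reports with the
# actual cgroup memory limit.

def meminfo_total_mib():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) // 1024  # MemTotal is in kiB

def cgroup_limit_mib():
    with open("/sys/fs/cgroup/memory/memory.limit_in_bytes") as f:
        return int(f.read().strip()) // 2**20

print("MemTotal seen by the system:", meminfo_total_mib(), "MiB")  # node memory
print("cgroup memory limit:        ", cgroup_limit_mib(), "MiB")   # --memory value
```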
This surprises me because a system with incorrect information about its available resources will not behave correctly.
Take, for example, the page cache used by IO operations. When you write to disk, pages are cached in RAM before being flushed. To decide how many pages it can keep dirty, the kernel uses the sysctl vm.dirty_ratio
(20 % by default) together with the system's memory size. But how can this work correctly if the memory size the container sees is wrong?
I verified it:
I ran a program performing a lot of IO operations (os.write, decompression, ...) in a container limited to 10Gi of RAM, on a 180Gi node. The container gets killed because it reaches the 10Gi memory limit. The OOM is caused by the kernel evaluating dirty_ratio
* the wrong system memory size.
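Here is the rough arithmetic, as a small sketch (illustrative numbers from my setup; the kernel's actual dirty-page accounting is a bit more subtle):

```python
# Dirty-page threshold the kernel derives from vm.dirty_ratio and the
# memory size it *believes* it has, versus the real container limit.
dirty_ratio = 0.20            # vm.dirty_ratio default (20 %)
node_memory_gib = 180         # what the container believes it has
container_limit_gib = 10      # the real cgroup / --memory limit

dirty_threshold_gib = dirty_ratio * node_memory_gib
print(f"dirty page threshold: {dirty_threshold_gib:.0f} GiB")                 # 36 GiB
print("exceeds container limit:", dirty_threshold_gib > container_limit_gib)  # True
```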
This is terrible.
So, my question is the following:
Is there a way to make the system resources visible inside a Docker container reflect the limits set on that container?