Prevent OOM inside container

7/21/2021

I'm running conda install -c anaconda pycodestyle in a container with this spec:

apiVersion: v1
kind: Pod
metadata:
  name: conda-${PYTHON_VERSION}
spec:
  securityContext:
    runAsUser: 0
    runAsGroup: 0
  containers:
  - name: python
    image: continuumio/conda-ci-linux-64-python${PYTHON_VERSION}
    command:
    - /bin/bash
    args:
    - "-c"
    - "sleep 99d"
    workingDir: /home/jenkins/agent
    resources:
      requests:
        memory: "256Mi"
        cpu: "1"
      limits:
        memory: "256Mi"
        cpu: "1"

My understanding was that if I set limits and requests to the same value, the OOM killer wouldn't be invoked... it seems I was wrong.

To make myself clear: I don't want overprovisioning to happen at all, and I don't want the kernel to panic because it allocated memory it cannot back with real memory.

Ideally, I want to understand how to prevent these errors in Kubernetes in general, not specifically for conda; but if there's any way to limit conda itself to a particular amount of memory, that would help too.
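To illustrate what I mean by limiting conda itself, something along these lines is the kind of thing I have in mind (just a sketch; I haven't checked how conda behaves under it, and the 204800 KiB cap is an arbitrary number):

ulimit -v 204800   # cap the shell's virtual address space at ~200 MiB (value is in KiB)
conda install -c anaconda pycodestyle

With a cap like that, an oversized allocation should fail inside the process itself rather than waking up the cgroup OOM killer.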

The machine running these containers has 16GB of memory, and it might, at most, try to run three of those at a time.

The OOM message looks like this:

[Wed Jul 21 14:07:03 2021] Task in /kubepods/burstable/poda6df66b5-bfc5-43be-b02d-66f09e7ecf0f/2203670eb25d83d72428831a35773b90445f19ee37c117f196d6774442022db8 killed as a result of limit of /kubepods/burstable/poda6df66b5-bfc5-43be-b02d-66f09e7ecf0f/2203670eb25d83d72428831a35773b90445f19ee37c117f196d6774442022db8
[Wed Jul 21 14:07:03 2021] memory: usage 262144kB, limit 262144kB, failcnt 17168
[Wed Jul 21 14:07:03 2021] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Wed Jul 21 14:07:03 2021] kmem: usage 11128kB, limit 9007199254740988kB, failcnt 0
[Wed Jul 21 14:07:03 2021] Memory cgroup stats for /kubepods/burstable/poda6df66b5-bfc5-43be-b02d-66f09e7ecf0f/2203670eb25d83d72428831a35773b90445f19ee37c117f196d6774442022db8: cache:104KB rss:250524KB rss_huge:0KB shmem:0KB mapped_file:660KB dirty:0KB writeback:0KB inactive_anon:125500KB active_anon:125496KB inactive_file:8KB active_file:12KB unevictable:0KB
[Wed Jul 21 14:07:03 2021] Tasks state (memory values in pages):
[Wed Jul 21 14:07:03 2021] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Wed Jul 21 14:07:03 2021] [3659435]     0 3659435     1012      169    40960       22           984 sleep
[Wed Jul 21 14:07:03 2021] [3660809]     0 3660809      597      155    45056       26           984 sh
[Wed Jul 21 14:07:03 2021] [3660823]     0 3660823      597      170    40960       18           984 sh
[Wed Jul 21 14:07:03 2021] [3660824]     0 3660824      597       14    40960        9           984 sh
[Wed Jul 21 14:07:03 2021] [3660825]     0 3660825      597      170    45056       23           984 sh
[Wed Jul 21 14:07:03 2021] [3660827]     0 3660827   162644    44560   753664    38159           984 conda
[Wed Jul 21 14:07:03 2021] [3660890]     0 3660890     1012      169    49152       22           984 sleep
[Wed Jul 21 14:07:03 2021] Memory cgroup out of memory: Kill process 3660827 (conda) score 1123 or sacrifice child
[Wed Jul 21 14:07:03 2021] Killed process 3660827 (conda) total-vm:650576kB, anon-rss:165968kB, file-rss:12272kB, shmem-rss:0kB
[Wed Jul 21 14:07:03 2021] oom_reaper: reaped process 3660827 (conda), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

I also don't like the word "burstable" here... I thought this pod was supposed to be "Guaranteed".
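For reference, this is how I check which QoS class Kubernetes actually assigned to the pod (assuming it ends up named conda-3.9 after substitution):

kubectl get pod conda-3.9 -o jsonpath='{.status.qosClass}'

I would expect this to print Guaranteed, since requests and limits are equal for both resources of the only container, yet the cgroup path in the log says burstable.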

-- wvxvw
conda
kubernetes
memory-management

0 Answers