container restart reason OOMKilled with exit code 1

2/21/2022

Recently, the same container in several pods of a deployment restarted with an OOMKilled event. Here is the description of one of the containers:

State:          Running
      Started:      Tue, 15 Feb 2022 23:33:06 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    1
      Started:      Fri, 11 Feb 2022 17:48:21 +0000
      Finished:     Tue, 15 Feb 2022 23:33:05 +0000
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     1
      memory:  512Mi
    Requests:
      cpu:      1
      memory:   512Mi

If the container had exceeded its memory limit, it would have exited with code 137, and I guess the container did not reach the limit. So my question is: what could cause the exit code to be 1 while the Reason is OOMKilled?

Update: The process is actually a Python app with threads; this is the relevant code:

import logging
import subprocess

# cmd is the command string being executed; args is its argv-list form
ret = subprocess.run(args, stderr=subprocess.PIPE, universal_newlines=True, check=False)
if ret.returncode != 0:
    logging.warning("Executing cmd failed: %s, code: %d, stderr: %s", cmd, ret.returncode, ret.stderr)
    raise Exception("Failed")

and the relevant logs; when called, the subprocess returned with -9:

2022-02-15T23:33:30.510Z WARNING "MainThread - Executing cmd failed: iptables-restore -n -w 3 restore-filter, code: -9, stderr: "
raise Exception("Failed")
Exception: Failed

From the documentation of subprocess.run(): "A negative value -N indicates that the child was terminated by signal N (POSIX only)."
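A quick local check (not tied to the OOM killer, just a sketch) confirms that a child killed by SIGKILL shows up as -9:

import subprocess

# The child shell kills itself with SIGKILL (signal 9), the same signal the OOM killer sends.
ret = subprocess.run(["sh", "-c", "kill -9 $$"], stderr=subprocess.PIPE,
                     universal_newlines=True, check=False)
print(ret.returncode)  # -9 on POSIX systems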

So because the exception is raised, the Python code exited with 1? Probably.
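An uncaught exception does make the interpreter exit with status 1, which is what the kubelet then records as the container's exit code; a minimal check of that assumption:

import subprocess
import sys

# Run a child Python process that raises an uncaught exception and inspect its exit status.
ret = subprocess.run([sys.executable, "-c", "raise Exception('Failed')"])
print(ret.returncode)  # 1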

-- laplasz
kubernetes
python

1 Answer

2/22/2022

Two possible reasons:

Reason #1

The subprocess was killed by the OOM killer (it received SIGKILL (9) from the OOM killer), causing the application to crash with exit code 1 and the container's termination reason to be reported as OOMKilled.
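If that is the case, the application can make it more obvious in its own logs. A minimal sketch, reusing the command from the question's log and assuming the same subprocess wrapper, that names the signal instead of only printing the raw -9:

import logging
import signal
import subprocess

# Command taken from the question's log output.
args = ["iptables-restore", "-n", "-w", "3", "restore-filter"]

ret = subprocess.run(args, stderr=subprocess.PIPE, universal_newlines=True, check=False)
if ret.returncode < 0:
    # A negative returncode means the child was terminated by a signal;
    # -9 is SIGKILL, which is what the OOM killer sends.
    logging.warning("Child terminated by signal %s (possible OOM kill)",
                    signal.Signals(-ret.returncode).name)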

Reason #2

If you have initContainers specified, an init container could have been killed by the OOM killer, resulting in the OOMKilled reason, with the application crashing with exit code 1 due to the bad initialization.


OOM kills are not very well documented in the Kubernetes docs. For example:

Containers are marked as OOM killed only when the init pid gets killed by the kernel OOM killer. There are apps that can tolerate OOM kills of non init processes and so we chose to not track non-init process OOM kills. [source]

I could not find any mention of it anywhere other than this GitHub issue.


The first reason is more probable, in my opinion.
A possible solution is to increase the memory limits (if you have any set).

-- p10l
Source: StackOverflow