I am launching Jobs and I'm trying to use lifecycle hooks to run one script when the container starts and another when it shuts down.
I am also specifying resource limits, and they look like this:
resources:
  requests:
    memory: 1Gi
    cpu: 1
  limits:
    memory: 1Gi
    cpu: 1
My cluster currently has 4 nodes with 1 CPU and 4 GB of RAM each, and is running on EC2 machines.
The postStart script is at the moment very simple, and looks like this:
export SOME_VAR=some_value
node someScript.js
The only thing the Node script does is update a value on a database, so it's not an especially intensive task.
After launching the job, the following events happen:
As you can see, the postStart hook fails with error 137 and gives no error message.
Any help in solving this issue is highly welcome and appreciated.
Since the first answer has pointed out that the command executed for the hook might not be correctly built, I think it's important to say that I create the Jobs using the API Kubernetes exposes through kubectl proxy.
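Concretely, the call looks roughly like this (the manifest file name and the namespace here are just placeholders):
kubectl proxy --port=8001 &
curl -X POST -H "Content-Type: application/json" \
     --data @job.json \
     http://localhost:8001/apis/batch/v1/namespaces/default/jobs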
This is how I specify the lifecycle instructions:
"lifecycle": {
"postStart": {
"exec": {
"command": [
"/bin/sh",
"postStart.sh"
]
}
},
"preStop": {
"exec": {
"command": [
"/bin/sh",
"preStop.sh"
]
}
}
}
I think this translates to YAML the way it's supposed to; please correct me if I am wrong on this.
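For reference, this is the YAML I would expect that JSON to translate to:
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - postStart.sh
  preStop:
    exec:
      command:
        - /bin/sh
        - preStop.sh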
You have 2 problems, so you get 2 answers :-)
Your pod specifies a request of cpu: 1, which means one full CPU core. Your nodes have only 1 CPU core in total but are already running some pods, like kube-proxy, so none of them has a full core available for your application and scheduling fails.
The error message No nodes are available that match all of the predicates: Insufficient cpu (4), PodToleratesNodeTaints (1) means that 4 nodes were rejected because they do not have enough free CPU, and 1 node (most likely the master) has a taint that your pod does not tolerate.
Run kubectl describe node nameofyournode and look at the Allocatable: and Allocated resources: sections of the output. Under Non-terminated Pods: you will see what is already taking up some of your CPU, most likely a kube-proxy pod. The solution is to lower the request for your pod (500m means 500 millicores, or 0.5 cores):
resources:
  requests:
    memory: 1Gi
    cpu: 500m
  limits:
    memory: 1Gi
    cpu: 500m
... or resize your machines so they have 2 cores instead of 1.
Now what is most curious is that in the end the pod somehow did get scheduled, but was then killed. Code 126 means Command invoked cannot execute, so the postStart: command is probably invalid. You did not post the full YAML file, but from the error message it looks like you have specified something like:
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh postStart.sh"]
Please check whether that is the case. If so, it is incorrect: you need to separate each parameter into its own element of the command array, like so:
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "postStart.sh"]
Alternatively, make sure that postStart.sh is marked executable in the container image and has a shell shebang in its first line (#!/bin/bash). If you do that, you can define the postStart hook like this:
lifecycle:
  postStart:
    exec:
      command: ["/path/to/postStart.sh"]