I have 2 pods running from one deployment on a GKE Kubernetes cluster. I scaled this stateless deployment to 2 replicas.
Both replicas started at almost the same time, and both keep restarting with error code 137. To change the restart timing, I deleted one pod manually so that the ReplicaSet would create a new one.
Now both pods are again restarting at the same time. Is there any connection between them? They should work independently.
I have not set a resource limit. The cluster has up to 3 GB of free memory, and the deployment does not take much memory, yet the pods still get 137 and restart.
Why are both pods restarting at the same time? That's the issue. All the other 15 microservices are running perfectly.
This is a common mistake when pods are defined. If you do not set CPU and memory limits, there is no upper bound, and a pod might take all the resources, crash, and restart. These limits are discussed in [2] and [3]. You will also see that user “ciokan” [1] fixed his issue by setting the limits.
[1] https://github.com/kubernetes/kubernetes/issues/19825
[2] Memory: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
[3] CPU: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/
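As a rough sketch of where the limits go (the deployment name, image, and values below are placeholders, not taken from the question):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                  # placeholder name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: my-app:latest    # placeholder image
            resources:
              requests:
                memory: "256Mi"
                cpu: "250m"
              limits:
                memory: "512Mi"     # the container is OOM-killed if it exceeds this
                cpu: "500m"

With an explicit memory limit, a misbehaving pod is killed at a known threshold instead of competing with the other workloads on the node.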
Try to get more details by describing the pod:
kubectl describe pod <pod-name>
Usually the 137 exit code means out of memory.
Did you allocate memory properly for your pods? See https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
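A few commands that usually help narrow down a 137 (the pod name is a placeholder):

    # Termination reason of the last restart (often "OOMKilled")
    kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

    # Events and last state in human-readable form
    kubectl describe pod <pod-name>

    # Logs of the previous (crashed) container instance
    kubectl logs <pod-name> --previous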
Error code 137 is the result of a kill -9 (137 = 128 + 9). There can be several reasons:
As others also pointed out in their answers, this can happen because of an out-of-memory condition. Notice that it could be the application or a process that runs out of memory even though there is no resources.limits.memory set. For example, the JVM of a Java application runs out of heap memory.
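For the JVM case, one possible sketch (the flag-based approach and all values here are illustrative assumptions, not from the question; -XX:MaxRAMPercentage needs a reasonably recent JDK) is to bound the heap relative to the container memory limit:

    # Fragment of a container spec - name, image and values are placeholders
    containers:
    - name: java-app
      image: java-app:latest
      env:
      - name: JAVA_TOOL_OPTIONS            # read automatically by the JVM at startup
        value: "-XX:MaxRAMPercentage=75.0" # keep the heap within ~75% of the limit
      resources:
        limits:
          memory: "1Gi"

That way the JVM tends to fail with an OutOfMemoryError inside a known budget rather than the whole container being killed from the outside.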
Another reason could be that the application/process didn't handle SIGTERM (kill -15), which was then followed by SIGKILL (kill -9) to guarantee the shutdown.
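A common cause of missed SIGTERMs is a shell entrypoint running as PID 1 in front of the application. A minimal sketch of an entrypoint that forwards the signal (the binary path /app/server is a placeholder):

    #!/bin/sh
    # Forward SIGTERM to the real application so it can shut down cleanly
    # before Kubernetes escalates to SIGKILL (kill -9).
    term_handler() {
      kill -TERM "$child" 2>/dev/null
      wait "$child"
      exit 0
    }
    trap term_handler TERM

    /app/server &       # placeholder for the actual application binary
    child=$!
    wait "$child"

Alternatively, exec the application from the entrypoint so it becomes PID 1 and receives SIGTERM directly.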
It is very likely that both pods get restarted at almost the same time because an error condition is met almost at the same time. For example:
- both pods are started at the same time, get about the same traffic and/or do about the same amount and kind of work, and thus run out of memory at almost the same time;
- both pods fail the liveness probe at the same time, as the probe's settings in the deployment are the same for both pods.
Check out the events (e.g. kubectl get events --sort-by=.metadata.creationTimestamp) - they could show something to help determine the reason for terminating the container(s)/pods.