2 pods of the same deployment restarting at the same time

11/13/2019

I have 2 pods running from one deployment on a Kubernetes GKE cluster. I have scaled this stateless deployment to 2 replicas.

Both replicas started at almost the same time, and both are restarting with error code 137. To change the restart timing, I deleted one pod manually so that the ReplicaSet (RS) would create a new one.

Now both pods are again restarting at the same time. Is there any connection between them? They are supposed to work independently.

I have not set a resource limit. The cluster has up to 3 GB of free memory and the deployment is not using much memory, yet the pods still get exit code 137 and restart.

Why are both pods restarting at the same time? That is the issue. All 15 other microservices are running perfectly.

-- Harsh Manvar
docker
google-kubernetes-engine
kubernetes
nginx-ingress

3 Answers

11/13/2019

This is a common mistake when pods are defined. If you do not set CPU and memory limits, there is no upper bound and the pod might take all resources, crash, and restart. These settings are discussed in [2] and [3]. You will also see that user “ciokan” [1] fixed his issue by setting the limits.
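
For illustration, a minimal sketch of the resources stanza you would add under the container in the deployment spec; the container name, image, and values are assumptions to be tuned for your workload:

    # under spec.template.spec.containers in the deployment
    - name: my-app               # hypothetical container name
      image: my-app:latest       # placeholder image
      resources:
        requests:
          memory: "256Mi"        # amount the scheduler reserves for the pod
          cpu: "250m"
        limits:
          memory: "512Mi"        # exceeding this gets the container OOM-killed (exit code 137)
          cpu: "500m"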

[1] https://github.com/kubernetes/kubernetes/issues/19825
[2] Memory: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
[3] CPU: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/

-- Ali Reza Izadi
Source: StackOverflow

11/13/2019

Try to get more information by describing the pod:

kubectl describe pod <pod-name>

Usually, exit code 137 means the container was killed because it ran out of memory.

Did you allocate memory properly for your pods? https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
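
To confirm an OOM kill, look at the last terminated state of the container; the pod name below is a placeholder:

    # look for "Last State: Terminated", "Reason: OOMKilled", "Exit Code: 137"
    kubectl describe pod <pod-name>

    # or query the last termination reason directly
    kubectl get pod <pod-name> \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'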

-- iliefa
Source: StackOverflow

11/14/2019

Error code 137 is the result of a kill -9 (137 = 128 + 9). There can be several reasons:

  • As others also pointed out in their answers, this can happen because of an out-of-memory condition. Note that it can be the application or a process that runs out of memory even though no resources.limits.memory is set. For example, the JVM of a Java application runs out of heap memory.

  • Another reason could be that the application/process didn't handle SIGTERM (kill -15), which was then followed by SIGKILL (kill -9) to guarantee the shutdown (see the shutdown sketch after this list).
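
As a sketch of the second case, you can give the process more time and a chance to shut down cleanly in the pod spec; the grace period, hook, and names below are assumptions, and the preStop hook requires a sleep binary in the image:

    # under spec.template.spec in the deployment
    terminationGracePeriodSeconds: 60   # time between SIGTERM and SIGKILL (default is 30s)
    containers:
    - name: my-app                      # hypothetical container name
      image: my-app:latest              # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]    # drain in-flight work before SIGTERM is sent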

It is very likely that both pods get restarted at almost the same time because an error condition is met almost at the same time. For example:

  • both pods are started at the same time and get about the same traffic and/or do about the same amount and kind of work and, thus, they run out of memory at almost the same time.

  • both pods fail the liveness probe at the same time, as the probe's settings in the deployment are the same for both pods (see the probe sketch below).
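
As an illustration of the second point, a liveness probe like the sketch below is applied identically to every replica, so a shared failure condition restarts both pods together; the path, port, and timings are assumptions:

    # under the container in spec.template.spec.containers
    livenessProbe:
      httpGet:
        path: /healthz            # hypothetical health endpoint
        port: 8080                # hypothetical container port
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3         # 3 consecutive failures -> kubelet restarts the container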

Check out the events (e.g. kubectl get events --sort-by=.metadata.creationTimestamp) - they could show something to help determine the reason for terminating the container(s)/pods.

-- apisim
Source: StackOverflow