Why does a pod go into CrashLoopBackOff after restarting successfully a few times due to a failed liveness probe?

4/4/2021

I have a simple Spring Boot application with the following liveness probe:

        livenessProbe:
          httpGet:
            path: /health
            port: 56017
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 1
          failureThreshold: 3
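As a rough sanity check on what this configuration implies, here is a minimal sketch of how long a container should survive if every probe fails. It assumes the first probe fires after initialDelaySeconds and each later one follows periodSeconds after the previous one. It ignores kubelet jitter, application startup time, and shutdown time. The function name is made up for illustration.

```python
def seconds_until_liveness_restart(initial_delay, period, failure_threshold):
    """Approximate seconds from container start until the liveness probe has
    failed `failure_threshold` times in a row (assumes every probe fails)."""
    # The first probe fires after initial_delay; each further probe is one period later.
    return initial_delay + (failure_threshold - 1) * period


# Values taken from the probe configuration above.
print(seconds_until_liveness_restart(30, 10, 3))  # 50
```

So a restart roughly once a minute is about what this probe alone would produce, before any extra delay is added between restarts.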

In the health endpoint, I simply throw an exception, causing it to return 500. Here are the statistics of a fresh pod after watching many restarts:

PS C:\Users\xxx\yyy\Desktop> k get pods -n xyz  -w

NAME                                         READY   STATUS             RESTARTS   AGE
springapi-577c6f94b9-9r4lm                   1/1     Running            0          15s
springapi-577c6f94b9-9r4lm                   1/1     Running            1          69s
springapi-577c6f94b9-9r4lm                   1/1     Running            2          2m10s
springapi-577c6f94b9-9r4lm                   1/1     Running            3          3m10s
springapi-577c6f94b9-9r4lm                   1/1     Running            4          4m10s
springapi-577c6f94b9-9r4lm                   1/1     Running            5          5m10s
springapi-577c6f94b9-9r4lm                   0/1     CrashLoopBackOff   5          6m8s
springapi-577c6f94b9-9r4lm                   1/1     Running            6          7m33s
springapi-577c6f94b9-9r4lm                   0/1     CrashLoopBackOff   6          8m28s
springapi-577c6f94b9-9r4lm                   1/1     Running            7          11m
springapi-577c6f94b9-9r4lm                   0/1     CrashLoopBackOff   7          12m
springapi-577c6f94b9-9r4lm                   1/1     Running            8          17m
springapi-577c6f94b9-9r4lm                   1/1     Running            9          18m
springapi-577c6f94b9-9r4lm                   0/1     CrashLoopBackOff   9          19m
springapi-577c6f94b9-9r4lm                   1/1     Running            10         24m
springapi-577c6f94b9-9r4lm                   1/1     Running            11         25m
springapi-577c6f94b9-9r4lm                   0/1     CrashLoopBackOff   11         26m

I notice that the first couple of restarts are quick, as expected. Then restarts #3, #4, and #5 are one minute apart. So far that kind of makes sense. After that I start seeing CrashLoopBackOff, and the time between two restarts grows to as much as 5 minutes. Why CrashLoopBackOff, and why are the restarts so far apart after the first few?

I looked at the pod's logs and found nothing unusual. The output looks something like this (these logs are from after many restarts):

2021-04-04 00:46:49.172 DEBUG 1 --- Spring boot startup stuff ...
...
2021-04-04 00:47:23.121  INFO 1 --- Spring boot startup stuff ...
2021-04-04 00:47:23.178 ERROR 1 --- exception stack trace
2021-04-04 00:47:33.010 ERROR 1 --- exception stack trace
2021-04-04 00:47:43.005 ERROR 1 --- exception stack trace
2021-04-04 00:47:43.092  INFO 1 --- [extShutdownHook] o.s.s.concurrent.ThreadPoolTaskExecutor  : Shutting down ExecutorService 'applicationTaskExecutor'

When I run the describe command, I see messages like:

Container springapi failed liveness probe, will be restarted
Liveness probe failed: HTTP probe failed with statuscode: 500
Back-off restarting failed container

By the way, during these 5 minutes the pod remains in the CrashLoopBackOff state. I've restarted the pod many times, and I see the same behavior every time.

-- Atiq
crashloopbackoff
java
kubernetes
livenessprobe
spring-boot

1 Answer

4/4/2021

I found this explanation in an article:

Failed containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and is reset after ten minutes of successful execution. 

Looks like it's expected behavior.
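To make the quoted schedule concrete, here is a minimal sketch that lists the delay before each successive restart. It assumes the delay starts at 10 seconds, doubles each time, and is capped at five minutes, as the quote says. It only illustrates the arithmetic, not the kubelet's actual bookkeeping, and the function name and parameters are made up for illustration.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Return the back-off delay in seconds before each of the first
    `restarts` restarts: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]


print(backoff_delays(8))  # [10, 20, 40, 80, 160, 300, 300, 300]
```

From the sixth restart onward the delay sits at the 300-second cap, which lines up with the roughly 5-minute gaps in the output from the question. Since the container never runs successfully for ten minutes, the back-off is never reset.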

-- Atiq
Source: StackOverflow