I have a simple Spring Boot application with the following liveness probe:
livenessProbe:
  httpGet:
    path: /health
    port: 56017
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 1
  failureThreshold: 3
In the health endpoint, I simply throw an exception, which causes it to return 500. Here are the statistics for a fresh pod, watched over many restarts:
PS C:\Users\xxx\yyy\Desktop> k get pods -n xyz -w
NAME                         READY   STATUS             RESTARTS   AGE
springapi-577c6f94b9-9r4lm   1/1     Running            0          15s
springapi-577c6f94b9-9r4lm   1/1     Running            1          69s
springapi-577c6f94b9-9r4lm   1/1     Running            2          2m10s
springapi-577c6f94b9-9r4lm   1/1     Running            3          3m10s
springapi-577c6f94b9-9r4lm   1/1     Running            4          4m10s
springapi-577c6f94b9-9r4lm   1/1     Running            5          5m10s
springapi-577c6f94b9-9r4lm   0/1     CrashLoopBackOff   5          6m8s
springapi-577c6f94b9-9r4lm   1/1     Running            6          7m33s
springapi-577c6f94b9-9r4lm   0/1     CrashLoopBackOff   6          8m28s
springapi-577c6f94b9-9r4lm   1/1     Running            7          11m
springapi-577c6f94b9-9r4lm   0/1     CrashLoopBackOff   7          12m
springapi-577c6f94b9-9r4lm   1/1     Running            8          17m
springapi-577c6f94b9-9r4lm   1/1     Running            9          18m
springapi-577c6f94b9-9r4lm   0/1     CrashLoopBackOff   9          19m
springapi-577c6f94b9-9r4lm   1/1     Running            10         24m
springapi-577c6f94b9-9r4lm   1/1     Running            11         25m
springapi-577c6f94b9-9r4lm   0/1     CrashLoopBackOff   11         26m
I notice that the first couple of restarts happen quickly, as expected. Restarts #3, #4, and #5 are then 1 minute apart, which still makes sense. After that I start seeing CrashLoopBackOff, and the time between two restarts grows to as much as 5 minutes. Why CrashLoopBackOff, and why are the restarts so far apart after the first few?
I checked the pod's logs and found nothing unusual. The output looks something like this (these logs are from after many restarts):
2021-04-04 00:46:49.172 DEBUG 1 --- Spring boot startup stuff ...
...
2021-04-04 00:47:23.121 INFO 1 --- Spring boot startup stuff ...
2021-04-04 00:47:23.178 ERROR 1 --- exception stack trace
2021-04-04 00:47:33.010 ERROR 1 --- exception stack trace
2021-04-04 00:47:43.005 ERROR 1 --- exception stack trace
2021-04-04 00:47:43.092 INFO 1 --- [extShutdownHook] o.s.s.concurrent.ThreadPoolTaskExecutor : Shutting down ExecutorService 'applicationTaskExecutor'
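As a sanity check on the roughly one-minute run time, here is a minimal sketch of how long a container should survive when every liveness probe fails. It assumes the first probe fires about initialDelaySeconds after the container starts and each further probe one periodSeconds later; the function name is hypothetical, and real kubelet timing will differ somewhat.

```python
# Rough estimate only: real kubelet scheduling and probe latency shift these numbers.
def seconds_until_liveness_restart(initial_delay, period, failure_threshold):
    """Approximate seconds from container start to the probe failure
    that triggers a restart, assuming every probe fails."""
    # First failure at ~initial_delay, each later failure one period after the previous.
    return initial_delay + (failure_threshold - 1) * period


# Values taken from the livenessProbe settings in this question.
estimate = seconds_until_liveness_restart(initial_delay=30, period=10, failure_threshold=3)
print(estimate)  # 50
```

That is in the same range as the logs above, where the three errors arrive 10 seconds apart and the shutdown hook runs about 54 seconds after the first startup line.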
When I run the describe command, I see messages like:
Container springapi failed liveness probe, will be restarted
Liveness probe failed: HTTP probe failed with statuscode: 500
Back-off restarting failed container
By the way, during these 5 minutes the pod remains in the CrashLoopBackOff state. I've restarted the pod many times, and I see the same behavior every time.
I found this explanation in an article:
Failed containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and is reset after ten minutes of successful execution.
So it looks like this is expected behavior.
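To make the quoted rule concrete, here is a minimal sketch of the delay sequence it describes: start at 10 seconds, double after each restart, and cap at five minutes. The function name and parameters are my own, and this only illustrates the arithmetic in the quote, not how the kubelet actually tracks back-off internally.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Delay in seconds before each restart: doubles from `base`, capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]


print(backoff_delays(8))  # [10, 20, 40, 80, 160, 300, 300, 300]
```

Under this rule the delay reaches the five-minute cap by the sixth restart. Since each run here is killed by the liveness probe after about a minute, the ten-minute reset presumably never happens, which would explain why the gaps stay long.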