I have a k8s cluster, in our cluster we do not want the pods to get evicted, because pod eviction causes lot of side effects to the applications running on it.
To prevent pod eviction from happening, we have configured all the pods as Guaranteed QoS. I know even with this the pod eviction can happen if there are any resource starvation in the system. We have monitors to alert us when there are resource starvation within the pod and node. So we get to know way before a pod gets evicted. This helps us in taking measures before pod gets evicted.
The other reasons for pod eviction to happen is if the node is in not-ready state, then kube-controller-manager will check the pod-eviction-timeout and it will evict the pods after this timeout. We have monitor to alert us when the node goes to not-ready state. now after this alert we wanted to take some measures to clean-up from application side, so the application will end gracefully. To do this clean-up we need more than few hours, but pod-eviction-timeout is by default 5 minutes.
Is it fine to increase the pod eviction timeout to 300m? what are the impacts of increasing this timeout to such a limit?
P.S: I know during this wait time, if the pod utilises more resources, then kubelet can itself evict this pod. I wanted to know what other impact of waiting for such a long time?
As @coderanger said, your limits are incorrect and this should be fixed instead of lowering self-healing capabilities of Kubernetes.
If your pod
dies no matter what was the issue with it, by default it will be rescheduled based on your configuration. If you are having a problem with this then I would recommend redoing your architecture and rewriting the app to use Kubernetes how it's supposed to be used.
pod
still being send requests when it's unresponsive, you should implement a LB in front or queue the requests,pod
restarts, this should be fixed by using DNS and service
instead of connecting directly to a pod,if your pod
is being evicted check why, make the limits and requests,
As for the node, there is a really nice blog post about Improving Kubernetes reliability: quicker detection of a Node down, it's opposite of what you are thinking of doing but it also mentions why 340s is too much
Once the node is marked as unhealthy, the kube controller manager will remove its pods based on –pod-eviction-timeout=5m0s
This is a very important timeout, by default it’s 5m which in my opinion is too high, because although the node is already marked as unhealthy the kube controller manager won’t remove the pods so they will be accessible through their service and requests will fail.
If you still want to change default values to higher you can look into changing these:
to higher ones.
If you provide more details I'll try to help more.