we have a small problem with the kubernetes cluster.
Because one of our applications is so demanding that sometimes consume all of our resources and finally some of pods are killed. The real problem starts when system pods like flannel or cache became removed.
Is there a recommended way to control what is being removed? How "save" system pods? Maybe someone has experience in this topic?
One of the ideas is to change QoS for all pods/apps from the kube-system to "Guaranteed". But I'm afraid that this will not work well if we limit resources, even with a large margin.
Btw. where can I read about what (default) requirements system services have? How set it on cluster creation phase?
The second idea is setting the Eviction Policy and/or Taints and Tolerations, but there is a anxiety that our key application will be (re)moved as one of the first. Unfortunately it currently works only in one copy and the initialization can take up to several minutes, so switching between nodes is currently unacceptable and impossible.
The final idea is to use Priority and Preemption, but from what I see in the 1.8.1 documentation is still in the "alpha" phase, and I have serious concerns about the stability of this solution.
Maybe there is something else I did not think about? I will be happy to listen other proposals.