I'm running several experiments on GCE with a Kubernetes cluster built with kops. I can start my experiments and verify that they're running, but close to the end of the run the node responsible for generating the load for my cluster reports the status "Unknown" for the "MemoryPressure", "DiskPressure" and "Ready" condition types.
Coincidentally, the pods that run on that node also require the most resources towards the end of the run.
So my question is: is it possible that the node is unable to respond to requests from the kube-controller or api-server because of the load it is generating?
If so, how do I resolve this? My experiments can render the node unresponsive for about half an hour or more.
Thanks for any responses in advance.
If the load is growing because the number of Pods is growing, you can try Node autoscaling. Here you can find instructions for it.
If only a few Pods consume all of the Node's resources, then the only option is to use Nodes with more CPU and memory.
Turns out one of my pods was consuming all the CPU on the node, causing the kubelet to become unresponsive. I've set a limit on the pod's CPU time and that fixed the issue. I also added a kube-reserved setting to ensure the kubelet gets the CPU time it needs.
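For anyone hitting the same problem, here is a rough sketch of what those two settings can look like. The pod name, image, and all CPU/memory values are placeholders chosen for illustration, not the ones from my actual setup, so adjust them to your node size. The first fragment caps the container's CPU through `resources.limits`; the second is a fragment of the kops cluster spec (edited with `kops edit cluster`) and assumes the `kubeReserved` field under `kubelet`, which reserves resources for Kubernetes system daemons such as the kubelet.

```yaml
# Pod manifest: cap the CPU the load generator may use.
apiVersion: v1
kind: Pod
metadata:
  name: load-generator                     # hypothetical name
spec:
  containers:
  - name: load-generator                   # hypothetical name
    image: example/load-generator:latest   # placeholder image
    resources:
      requests:
        cpu: "1"          # what the scheduler sets aside for the container
      limits:
        cpu: "1500m"      # the container is throttled above 1.5 cores
```

```yaml
# Fragment of the kops cluster spec, not a complete manifest.
spec:
  kubelet:
    kubeReserved:
      cpu: "200m"         # illustrative value
      memory: "256Mi"     # illustrative value
```

With the limit in place the container is throttled instead of starving the rest of the node, and the reserved amount is subtracted from what the node offers to pods, so the kubelet can keep reporting node status.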