I have an on-premises K8s cluster with five nodes running Kubernetes 1.19.8. For some time now, one of those nodes has been getting a lot more pressure than the other four.
That node has double the number of CPU cores available, but the same amount of memory as the others.
When I describe it (kubectl describe node), I see the same amount of memory and the same pod limit of 110 as on the other nodes.
Last night I had a look at it and it was running more than 130 pods! Sometimes its memory usage is close to 98 percent while the other nodes sit at 60 to 70 percent with far fewer pods assigned, yet it still gets new pods scheduled onto it.
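For reference, this is roughly how I'm checking the node's allocatable capacity and the number of pods running on it (the node name below is a placeholder):

```sh
# Placeholder for the affected node's name
NODE=worker-node-1

# Allocatable resources as the scheduler sees them (cpu, memory, pods)
kubectl describe node "$NODE" | grep -A 6 "Allocatable:"

# Count the pods currently scheduled onto that node
kubectl get pods --all-namespaces --field-selector spec.nodeName="$NODE" --no-headers | wc -l
```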
At some point it hits a SystemOOM and the OS starts killing processes...
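In case it helps, this is how I'm spotting the OOM kills, via the cluster events and the kernel log on the node itself:

```sh
# Kubernetes events with reason SystemOOM (emitted by the kubelet)
kubectl get events --all-namespaces --field-selector reason=SystemOOM

# Kernel OOM-killer messages, run directly on the affected node
dmesg -T | grep -i "out of memory"
```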
Does anyone have an idea what could be going wrong here, why this is happening, or where I should start looking?
Thanks in advance!