I have a cluster (deployed by Rancher RKE) with 3 masters (HA) and 8 workers. A sample node looks like this:

worker7 Ready worker 199d v1.15.5 10.116.18.42 <none> Red Hat Enterprise Linux Server 7.5 (Maipo) 3.10.0-1062.el7.x86_64 docker://19.3.4

It uses ingress-nginx (image tag 0.25) as the ingress controller and Canal as the network plugin. The cluster works well; see the node metrics below:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master1 219m 5% 4497Mi 78%
master2 299m 7% 4053Mi 71%
master3 266m 6% 4255Mi 72%
worker1 778m 4% 27079Mi 42%
worker2 691m 4% 43636Mi 67%
worker3 528m 3% 48660Mi 75%
worker4 677m 4% 37532Mi 58%
worker5 895m 5% 51634Mi 80%
worker6 838m 5% 47337Mi 73%
worker7 2388m 14% 47065Mi 73%
worker8 1805m 11% 40601Mi 63%

The pods on worker1 are listed below:
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
cattle-prometheus exporter-node-cluster-monitoring-jqqkv 100m (0%) 200m (1%) 30Mi (0%) 200Mi (0%) 197d
cattle-prometheus prometheus-cluster-monitoring-1 1350m (8%) 1800m (11%) 5200Mi (8%) 5350Mi (8%) 4d23h
cattle-system cattle-node-agent-ml7fl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 173d
ingress-nginx nginx-ingress-controller-hdbjp 0 (0%) 0 (0%) 0 (0%) 0 (0%) 92d
kube-system canal-bpqjl 250m (1%) 0 (0%) 0 (0%) 0 (0%) 165d
sigma-demo apollo-configservice-dev-64f54f4b58-8tdm8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d23h
sigma-demo ibor-8d9c9d54d-8bmh9 700m (4%) 1 (6%) 1Gi (1%) 4Gi (6%) 2d16h
sigma-sit ibor-admin-7f886488cb-k4t5p 100m (0%) 1500m (9%) 1Gi (1%) 4Gi (6%) 2d19h
sigma-sit ibor-collect-5698947546-69zz9 200m (1%) 1 (6%) 1Gi (1%) 2Gi (3%) 2d16h
utils filebeat-filebeat-59hx7 100m (0%) 1 (6%) 100Mi (0%) 200Mi (0%) 6d13h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2800m (17%) 6500m (40%)
memory 8402Mi (13%) 15990Mi (24%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>

As you can see, there aren't many pods with high resource requests (ibor is a Java program that loads data, so it needs a lot of CPU and memory and still needs optimizing; apollo is a configuration center).
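For reference, the figures above come from the standard kubectl commands (the node name is from my cluster):

kubectl top nodes
kubectl describe node worker1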
But when I log into the worker1 node and run htop, the report shows a high system load that has already filled all the CPU cores.

I can't work out which process is driving the load so high. It keeps climbing to about 30~40 and eventually breaks the node. Nothing else stands out in the vmstat output below except the high cs (context switches per second) and us (user CPU) columns:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 0 15227600 3176 40686872 0 0 0 26 1 2 4 1 95 0 0
0 0 0 15227772 3176 40686952 0 0 0 34 16913 14861 2 2 96 0 0
1 0 0 15226836 3176 40686976 0 0 0 33 18861 13368 2 2 96 0 0
0 0 0 15226736 3176 40686984 0 0 0 630 15778 14887 2 1 97 0 0
0 0 0 15226716 3176 40687196 0 0 0 31 17228 14023 4 2 95 0 0
0 0 0 15225188 3176 40687224 0 0 0 0 20546 17126 3 2 95 0 0
0 0 0 15224868 3176 40687240 0 0 0 32 16025 14326 2 1 97 0 0
2 0 0 15224128 3176 40687544 0 0 0 34 20494 16183 3 2 95 0 0
0 0 0 15224324 3176 40687548 0 0 0 33 15158 12917 3 1 95 0 0
0 0 0 15225152 3176 40687572 0 0 0 0 19292 15307 2 2 96 0 0
2 0 0 15224764 3176 40687576 0 0 0 33 15634 13430 3 1 95 0 0
1 0 0 15220824 3176 40687768 0 0 0 0 21238 15215 11 2 86 0 0
2 0 0 15221352 3176 40687776 0 0 0 33 14481 12017 3 1 95 0 0
2 0 0 15220140 3176 40687796 0 0 0 33 20263 16450 4 3 93 0 0
1 0 0 15220200 3176 40688108 0 0 0 0 16103 12503 2 1 97 0 0
1 0 0 15220692 3176 40688116 0 0 0 64 20478 15081 2 2 95 0 0

So I'd like to ask for help: which process is causing this situation, and how can I check?
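In case it helps frame an answer, this is the kind of per-process breakdown I've been trying to get (pidstat comes from the sysstat package; the flags are standard, the sampling intervals are just what I picked):

# per-process context switches (voluntary/involuntary), sampled every second
pidstat -w 1 5

# per-process CPU split into %usr/%system, to see which process burns the time
pidstat -u 1 5

# threads in running (R) or uninterruptible (D) state, which count toward the load average
ps -eLo state,pid,comm | grep -E '^[RD]'

# sample what the CPUs are actually executing (kernel + user)
perf top

If there's a better way to attribute the context switches to a particular pod or container, that's exactly what I'm after.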