I have a Kubernetes cluster with the Weave CNI plugin consisting of 3 nodes.
The trouble is that top shows kubelet at 60-100% CPU usage on the first worker. In journalctl -u kubelet
I see a lot of messages (hundreds every minute):
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075243 3843 docker_sandbox.go:205] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640": Error response from daemon: {"message":"No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075360 3843 remote_runtime.go:109] StopPodSandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075380 3843 kuberuntime_gc.go:138] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076549 3843 docker_sandbox.go:205] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf": Error response from daemon: {"message":"No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076654 3843 remote_runtime.go:109] StopPodSandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076676 3843 kuberuntime_gc.go:138] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079585 3843 docker_sandbox.go:205] Failed to stop sandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772": Error response from daemon: {"message":"No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079805 3843 remote_runtime.go:109] StopPodSandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-r30cw_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772
This happened after broken Cronetes tasks that failed during creation. I removed all of the pods with --force,
but kubelet still tries to remove them. I also restarted kubelet on that worker with no result. How can I tell kubelet to forget them?
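For reference, the forced removal mentioned above was done with kubectl, roughly like this (pod name and namespace are taken from the log messages; the exact flags may have varied):

kubectl delete pod cron-task-2533948c46c1-p6kwb -n namespace --grace-period=0 --force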
Version info
Kubernetes v1.6.1
Docker version 1.12.0, build 8eab29e
Linux kube-worker1 4.4.0-72-generic #93-Ubuntu SMP
Container manifest (without metadata)
job:
  apiVersion: batch/v1
  kind: Job
  spec:
    template:
      spec:
        containers:
        - name: cron-task
          image: docker.company.ru/image:v2.3.2
          command: ["rake", "db:refresh_views"]
          env:
          - name: RAILS_ENV
            value: namespace
          - name: CONFIG_PATH
            value: /config
          volumeMounts:
          - name: config
            mountPath: /config
        volumes:
        - name: config
          configMap:
            name: task-conf
        restartPolicy: Never
Also, I didn't find any mention of this part of the pod's name (2533948c46c1) in the cluster's etcd.
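A sketch of how such a check can be done (assuming etcd v3 and that etcdctl is available on a master node; endpoint and TLS flags are omitted and depend on the setup):

ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only | grep 2533948c46c1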
This seems to be related to the "Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI" issue in Kubernetes 1.6.x. Those messages are not critical in themselves, but they are of course annoying when you are trying to find actual issues. Try upgrading to the most recent version of Kubernetes to mitigate this.
I ran into the same problem as you, did some Go profiling on kubelet, and found that the cause is kubelet's PLEG (Pod Lifecycle Event Generator) mechanism; removing '/var/lib/dockershim/sandbox' did the magic.
Finally I found the solution.
Kubelet stores information about all pods running on it in
/var/lib/dockershim/sandbox
So when I ran ls
in that folder, I found files for all of the missing pods. I deleted those files, the log messages disappeared, and CPU usage returned to normal (even without a kubelet restart).
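A minimal sketch of that cleanup, assuming the stale sandbox IDs are the ones appearing in the "No such container" errors above:

# list the sandbox files kubelet still keeps
ls /var/lib/dockershim/sandbox/
# remove the entries whose containers no longer exist in Docker
# (the ID below is one of those from the kubelet log; repeat for each stale sandbox)
rm /var/lib/dockershim/sandbox/011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640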