Kubelet process has high CPU usage over a long time

5/19/2017

I have a Kubernetes cluster with the Weave CNI plugin, consisting of 3 nodes:

  • 1 master node (virtual machine)
  • 2 bare-metal worker nodes (4-core Xeon with hyperthreading, i.e. 8 logical cores)

The trouble is that top shows kubelet at 60-100% CPU usage on the first worker. In journalctl -u kubelet I see a lot of messages (hundreds every minute):

May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075243    3843 docker_sandbox.go:205] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640": Error response from daemon: {"message":"No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075360    3843 remote_runtime.go:109] StopPodSandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075380    3843 kuberuntime_gc.go:138] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076549    3843 docker_sandbox.go:205] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf": Error response from daemon: {"message":"No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076654    3843 remote_runtime.go:109] StopPodSandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076676    3843 kuberuntime_gc.go:138] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079585    3843 docker_sandbox.go:205] Failed to stop sandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772": Error response from daemon: {"message":"No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079805    3843 remote_runtime.go:109] StopPodSandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-r30cw_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772

This started happening after some broken Cronetes tasks failed during creation. I removed all the pods with --force, but kubelet still tries to remove them. I also restarted kubelet on that worker, with no result. How can I tell kubelet to forget these pods?
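For reference, a force removal of this kind looks roughly as follows (a sketch only; the job name, label selector, and namespace are illustrative, reconstructed from the log messages above):

# Delete the broken Job, then force-delete its leftover pods;
# --grace-period=0 --force removes the pod objects from the API
# server immediately instead of waiting for graceful termination:
kubectl delete job cron-task-2533948c46c1 -n namespace
kubectl delete pods -l job-name=cron-task-2533948c46c1 -n namespace --grace-period=0 --force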

Version info

Kubernetes v1.6.1
Docker version 1.12.0, build 8eab29e
Linux kube-worker1 4.4.0-72-generic #93-Ubuntu SMP

Container manifest (without metadata)

  job:
    apiVersion: batch/v1
    kind: Job
    spec:
      template:
        spec:
          containers:
          - name: cron-task
            image: docker.company.ru/image:v2.3.2
            command: ["rake", "db:refresh_views"]
            env:
            - name: RAILS_ENV
              value: namespace
            - name: CONFIG_PATH
              value: /config
            volumeMounts:
            - name: config
              mountPath: /config
          volumes:
          - name: config
            configMap:
              name: task-conf
          restartPolicy: Never

Also, I didn't find any mention of this part of the pod's name (2533948c46c1) anywhere in the cluster's etcd.
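A sketch of that kind of etcd check, assuming etcd v3 and etcdctl reachable from the master (Kubernetes keeps its keys under /registry; auth and endpoint flags vary by setup):

# List all Kubernetes keys and grep for the pod-name fragment:
ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only | grep 2533948c46c1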

-- user1802474
kubelet
kubernetes

3 Answers

5/19/2017

This seems to be related to the "Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI" issue in Kubernetes 1.6.x. The messages themselves are not critical, but of course they are annoying when you are trying to find actual issues. Try using the most recent version of Kubernetes to mitigate the problem.

-- pagid
Source: StackOverflow

6/17/2017

I ran into the same problem as you, did some Go profiling on it, and found that the cause is kubelet's PLEG (Pod Lifecycle Event Generator) mechanism; removing the stale files under '/var/lib/dockershim/sandbox' did the magic.
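For reference, a minimal sketch of that kind of profiling, assuming kubelet's debugging handlers are enabled and the apiserver's node proxy is reachable (the exact proxy path varies by Kubernetes version):

# Proxy the API server locally, then capture a 30s CPU profile from
# kubelet's Go pprof endpoint on the affected node:
kubectl proxy --port=8001 &
go tool pprof http://127.0.0.1:8001/api/v1/proxy/nodes/kube-worker1/debug/pprof/profile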

-- WizardCXY
Source: StackOverflow

5/25/2017

Finally I found the solution.
Kubelet stores checkpoint information about every pod sandbox it has created in

/var/lib/dockershim/sandbox

So when I ran ls in that folder, I found files for all of the missing pods. I deleted those files, the log messages disappeared, and CPU usage returned to its normal value (even without a kubelet restart).
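For anyone hitting the same thing, a minimal sketch of the cleanup, assuming the same dockershim checkpoint layout; verify an ID is really gone from Docker before removing its file (the ID below is taken from the logs above, for illustration):

# On the affected worker: each file under /var/lib/dockershim/sandbox
# is named after a sandbox container ID that kubelet keeps retrying
# to tear down.
ls /var/lib/dockershim/sandbox

# Remove a checkpoint only if Docker no longer knows the container
# (a failing "docker inspect" confirms it is stale):
ID=011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
docker inspect "$ID" >/dev/null 2>&1 || rm "/var/lib/dockershim/sandbox/$ID"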

-- user1802474
Source: StackOverflow