Why does dockerd on a node go bad?

11/14/2018

After a few days of running dockerd on a Kubernetes host where pods are scheduled by the kubelet, dockerd goes bad, consuming a lot of resources (about 50% of memory, roughly 4 GB).

Once it reaches this state, it is unable to act on commands for containers that appear to be running in $ docker ps. Also, checking ps -ef on the host, these containers don't map to any underlying host processes.
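For reference, that mismatch can be checked roughly like this (an illustrative sketch, comparing the PID Docker reports for each "running" container against the host's process table):

for id in $(docker ps -q); do
  pid=$(docker inspect --format '{{.State.Pid}}' "$id")   # PID dockerd believes the container has
  ps -p "$pid" > /dev/null 2>&1 || echo "container $id reports PID $pid, but no such host process"
done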

Running $ docker exec yields errors such as:

level=error msg="Error running exec in container: rpc error: code = 2 desc = containerd: container not found"

Cannot kill container 6a8d4....8: rpc error: code = 14 desc = grpc: the connection is unavailable"

level=fatal msg="open /var/run/docker/libcontainerd/containerd/7657...4/65...6/process.json: no such file or directory"

Looking through the process tree on the host, there seem to be a lot of defunct processes whose parent PID points to dockerd. Any pointers on what the issue might be or where to look further?
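For reference, the defunct processes and their parents can be listed roughly like this (assuming a procps-style ps; <ppid> is a placeholder):

ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'   # zombie (defunct) processes with their parent PIDs
ps -o comm= -p <ppid>                         # confirm whether a given parent PID is dockerd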

I have enabled debug on dockerd to see if the issue recurs; for now, a dockerd restart fixes it.
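For anyone hitting the same thing: the "debug" key in /etc/docker/daemon.json is a documented daemon option and can be picked up without a full restart by sending dockerd a SIGHUP (reload behaviour may vary by Docker version):

# /etc/docker/daemon.json
{
  "debug": true
}
# Reload the configuration without restarting the daemon:
sudo kill -SIGHUP $(pidof dockerd)
# Follow the now-verbose daemon logs (assuming a systemd host):
journalctl -u docker -f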

-- user2062360
docker
kubernetes

1 Answer

11/14/2018

It sounds like you have a misbehaving container that Docker is unable to reap. I would take a look at what has been scheduled on the nodes where you see the problem. The errors you are seeing suggest the Docker daemon is not responding to API requests issued by the Docker CLI. Some pointers (a sketch of the corresponding commands follows the list):

  • Has the container exited successfully or with an error?
  • Did the containers get killed for some reason?
  • Check the kubelet logs
  • Check the kube-scheduler logs
  • Follow the logs of the containers on your node: docker logs -f <containerid>
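A rough sketch of those checks (assuming a systemd-managed kubelet and a kube-scheduler static pod; names vary per cluster):

journalctl -u kubelet --since "1 hour ago"                # kubelet logs on the affected node
kubectl -n kube-system logs kube-scheduler-<node-name>    # scheduler logs; pod name differs per cluster
docker ps -a --filter status=exited --format '{{.ID}} {{.Status}}'   # did containers exit cleanly or with an error?
docker logs -f <containerid>                              # follow a suspect container's logs on the node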
-- Rico
Source: StackOverflow