Correctly keeping docker VSTS / Azure Devops build agent clean yet cached

2/13/2019

We have added a dockerised build agent to our development Kubernetes cluster, which we use to build our applications as part of our Azure DevOps pipelines. We created our own image based on the deprecated Microsoft/vsts-agent-docker on GitHub.

The build agent uses Docker outside of Docker (DooD) to create images on our development cluster.

This agent was working well for a few days but then an error would occasionally occur on the docker commands in our build pipeline:

Error response from daemon: No such image: fooproject:ci-3284.2
/usr/local/bin/docker failed with return code: 1

We realised that the build agent was creating large numbers of images that were never removed. These were clogging up the build agent, and some images were missing, which would explain the "No such image" error message.

By adding a step to our build pipelines with the following command we were able to get our build agent working again:

docker system prune -f -a

But of course this then removes all our images, and they must be built from scratch every time, which causes our builds to take an unnecessarily long time.

I'm sure this must be a solved problem but I haven't been able to locate any documentation on the normal strategy for dealing with a dockerised build agent becoming clogged over time. Being new to docker and kubernetes I may simply not know what I am looking for. What is the best practice for creating a dockerised build agent that stays clean and functional, while maintaining a cache?

EDIT: Some ideas:

  • Create a build step that cleans up all but the latest image for the given pipeline (this might still clog the build server though).
  • Have a cron job run that removes all the images every x days (this would result in slow builds the first time after the job is run, and could still clog the build server if it sees heavy usage).
  • Clear all images nightly and run all builds outside of work hours. This way builds would run quickly during the day. However heavy usage could still clog the build server.

EDIT 2:

I found someone with a docker issue on Github that seems to be trying to do exactly the same thing as me. He came up with a solution which he described as follows:

I was exactly trying to figure out how to remove "old" images out of my automated build environment without removing my build dependencies. This means I can't just remove by age, because the nodejs image might not change for weeks, while my app builds can be worthless in literally minutes.

docker image rm $(docker image ls --filter reference=docker --quiet)

That little gem is exactly what I needed. I dropped my repository name in the reference variable (not the most self-explanatory.) Since I tag both the build number and latest the docker image rm command fails on the images I want to keep. I really don't like using daemon errors as a protection mechanism, but its effective.

Trying to follow these directions, I have applied the latest tag to everything that is built during the process, and then run

docker image ls --filter reference=fooproject

If I try to remove these I get the following error:

Error response from daemon: conflict: unable to delete b870ec9c12cc (must be forced) - image is referenced in multiple repositories

This prevents the latest image from being removed, but relying on a daemon error is not exactly a clean way of doing it. Is there a better way?

-- Ivan
azure-devops
azure-pipelines
build
docker
kubernetes

1 Answer

8/26/2019

Probably you've already found a solution, but it might be useful for the rest of the community to have an answer here.

The docker prune commands have a limited purpose: they were created to clean up local Docker images in bulk. (As mentioned by thaJeztah here)
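
For comparison, a prune can at least be limited by age. This is a minimal sketch, assuming a Docker version where docker system prune accepts --filter "until=...". The function name prune_older_than and the DOCKER override variable are invented for illustration. Note that until compares against creation time, not last use, so a rarely rebuilt base image would still be removed if no container references it.

```shell
#!/bin/sh
# Hypothetical helper: prune unused Docker objects created more than $1 ago.
# DOCKER is an illustrative override so the command can be dry-run (DOCKER=echo).
DOCKER="${DOCKER:-docker}"

prune_older_than() {
    # --all also removes unused tagged images, not only dangling ones
    $DOCKER system prune --all --force --filter "until=$1"
}

# Usage in a pipeline step (commented out so sourcing this file has no side effects):
# prune_older_than 72h
```

Running it with DOCKER=echo prints the command instead of executing it, which is a cheap way to check a pipeline step before letting it delete anything.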

To remove images more precisely, it's better to divide the task into two parts:

1. select/filter the images to delete
2. delete the list of selected images

E.g:

docker image rm $(docker image ls --filter reference=docker --quiet)
docker image rm $(sudo docker image ls | grep 1.14 | awk '{print $3}')
docker image ls --filter reference=docker --quiet | xargs docker image rm
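
Applied to the question's fooproject repository, the select-then-delete split might look like the sketch below. It picks tags explicitly instead of relying on a daemon error to protect latest. The function names and the DOCKER override variable are hypothetical; only docker image ls with --filter/--format and docker image rm are real commands.

```shell
#!/bin/sh
# DOCKER is an illustrative override (e.g. DOCKER=echo for a dry run).
DOCKER="${DOCKER:-docker}"

# Step 1 (select): read "repository:tag" lines on stdin and print every
# reference whose tag is neither $1 (the tag to keep) nor "<none>".
refs_except_tag() {
    awk -v keep="$1" '{
        n = split($0, parts, ":")   # the tag is whatever follows the last colon
        tag = parts[n]
        if (tag != keep && tag != "<none>") print
    }'
}

# Step 2 (delete): remove every tag of repository $1 except $2 (default: latest).
remove_old_tags() {
    repo="$1"
    keep="${2:-latest}"
    $DOCKER image ls --filter "reference=$repo" --format '{{.Repository}}:{{.Tag}}' |
        refs_except_tag "$keep" |
        while IFS= read -r ref; do
            $DOCKER image rm "$ref"
        done
}

# Usage (commented out so sourcing this file has no side effects):
# remove_old_tags fooproject latest
```

Removing a tag from an image that also carries latest only untags it, so the newest build stays cached, while images whose only tag is removed are deleted.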

It is possible to combine filter clauses to get exactly what you want:
(I'm using Kubernetes master node as an example environment)

$ docker images

REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                v1.14.2             5c24210246bb        3 months ago        82.1MB
k8s.gcr.io/kube-apiserver            v1.14.2             5eeff402b659        3 months ago        210MB
k8s.gcr.io/kube-controller-manager   v1.14.2             8be94bdae139        3 months ago        158MB
k8s.gcr.io/kube-scheduler            v1.14.2             ee18f350636d        3 months ago        81.6MB  # before
quay.io/coreos/flannel               v0.11.0-amd64       ff281650a721        6 months ago        52.6MB
k8s.gcr.io/coredns                   1.3.1               eb516548c180        7 months ago        40.3MB  # since
k8s.gcr.io/etcd                      3.3.10              2c4adeb21b4f        8 months ago        258MB
k8s.gcr.io/pause                     3.1                 da86e6ba6ca1        20 months ago       742kB

$ docker images --filter "since=eb516548c180" --filter "before=ee18f350636d" 

REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
quay.io/coreos/flannel   v0.11.0-amd64       ff281650a721        6 months ago        52.6MB

$ docker images --filter "since=eb516548c180" --filter "reference=quay.io/coreos/flannel" 
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
quay.io/coreos/flannel   v0.11.0-amd64       ff281650a721        6 months ago        52.6MB

$ docker images --filter "since=eb516548c180" --filter "reference=quay*/*/*" 
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
quay.io/coreos/flannel   v0.11.0-amd64       ff281650a721        6 months ago        52.6MB

$ docker images --filter "since=eb516548c180" --filter "reference=*/*/flan*" 
REPOSITORY               TAG                 IMAGE ID            CREATED             SIZE
quay.io/coreos/flannel   v0.11.0-amd64       ff281650a721        6 months ago        52.6MB

As mentioned in the documentation, the filters for images / image ls are much richer than those for docker image prune, which supports only the until and label clauses. For images / image ls:

The currently supported filters are:
• dangling (boolean - true or false)  
• label (label=<key> or label=<key>=<value>)  
• before (<image-name>[:<tag>], <image id> or <image@digest>) - filter images created before given id or references
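• since (<image-name>[:<tag>], <image id> or <image@digest>) - filter images created since given id or references
• reference (pattern of an image reference) - filter images whose reference matches the specified pattern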

If you need more than one filter, then pass multiple flags (e.g., --filter "foo=bar" --filter "bif=baz")

You can also use other Linux CLI commands to filter the docker images output:

grep "something"      # to include only specified images
grep -v "something"   # to exclude images you want to save
sort [-k colN] [-r] [-g] | head/tail -nX  # to select X oldest or newest images
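
As a rough illustration of the sort/tail combination, the sketch below selects everything except the N newest images from lines of the form "CreatedAt;repository:tag". The function name all_but_newest is made up, and the sketch assumes that all CreatedAt values use the same time zone, so that a plain text sort orders them chronologically.

```shell
#!/bin/sh
# Read "CreatedAt;repository:tag" lines on stdin and print the references
# of all images except the $1 newest ones.
all_but_newest() {
    keep="$1"
    # sort -r puts the newest timestamp first; tail skips the first $keep lines;
    # cut drops the timestamp and leaves the reference
    sort -r | tail -n "+$((keep + 1))" | cut -d';' -f2
}

# Usage (commented out; xargs -r is a GNU extension that skips empty input):
# docker image ls --filter reference=fooproject \
#     --format '{{.CreatedAt}};{{.Repository}}:{{.Tag}}' |
#     all_but_newest 2 | xargs -r docker image rm
```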

Combining them and putting the result into a CI/CD pipeline lets you keep only the required images in the local cache without accumulating garbage on your build server.

I've copied here a good example of using that approach provided by strajansebastian in the comment:

# Example of deleting all builds except the last 2 for each kind of image
# (the image kind is based on the Repository value).
#
# To preserve just the last build, change tail -n+3 to tail -n+2.

# delete dead containers
docker container prune -f

# keep last 2 builds for each image from the repository
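# docker images lists the newest image first, so tail -n+3 yields everything after the 2 newest
for diru in $(docker images --format "{{.Repository}}" | sort -u); do
    # sed replaces whitespace (inside CreatedAt) with ~ so the for loop sees one word per image
    for dimr in $(docker images --format "{{.ID}};{{.Repository}}:{{.Tag}};'{{.CreatedAt}}'" --filter reference="$diru" | sed -r "s/\s+/~/g" | tail -n+3); do
        img_tag=$(echo "$dimr" | cut -d";" -f2)
        docker rmi "$img_tag"
    done
done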

# clean dangling images if any
docker image prune -f
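
The same "keep the last N per repository" selection can also be sketched as a single pass over one docker images listing, rather than one docker call per repository. This is an illustrative variant, not strajansebastian's script: the function name beyond_newest_per_repo is hypothetical, and it relies on docker images printing the newest images first. An image carrying two tags (for example a build number and latest) appears as two lines and counts twice.

```shell
#!/bin/sh
# Read "repository:tag" lines on stdin, newest first, and print every
# reference beyond the first $1 seen for its repository.
beyond_newest_per_repo() {
    awk -v keep="$1" '{
        ref = $0
        repo = ref
        sub(/:[^:]*$/, "", repo)                 # strip the trailing ":tag"
        if (repo == "<none>" || ref ~ /:<none>$/) next   # leave dangling images to prune
        if (++seen[repo] > keep) print ref
    }'
}

# Usage (commented out; xargs -r is a GNU extension that skips empty input):
# docker images --format '{{.Repository}}:{{.Tag}}' |
#     beyond_newest_per_repo 2 | xargs -r docker rmi
```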
-- VAS
Source: StackOverflow