I am using Jenkins X for a relatively large project consisting of approximately 30 modules, 15 of which are services (and therefore each contain a Dockerfile and a corresponding Helm chart for deployment).
During some of these relatively large builds, I intermittently (roughly every other build) see a build pod become evicted. Investigating with kubectl describe pod <podname>, I've noticed that the pod is evicted for the following reason:

The node was low on resource: imagefs.
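For reference, the command (with the pod name and namespace taken from the output below) is simply:

    kubectl describe pod maven-96wmn -n jx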
Full data:
Name:           maven-96wmn
Namespace:      jx
Node:           ip-192-168-66-176.eu-west-1.compute.internal/
Start Time:     Tue, 06 Nov 2018 10:22:54 +0000
Labels:         jenkins=slave
                jenkins/jenkins-maven=true
Annotations:    <none>
Status:         Failed
Reason:         Evicted
Message:        The node was low on resource: imagefs.
IP:
Containers:
  maven:
    Image:      jenkinsxio/builder-maven:0.0.516
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
    Args:
      cat
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     400m
      memory:  512Mi
    Environment:
      JENKINS_SECRET:       131c407141521c0842f62a69004df926be6cb531f9318edf0885aeb96b0662b4
      JENKINS_TUNNEL:       jenkins-agent:50000
      DOCKER_CONFIG:        /home/jenkins/.docker/
      GIT_AUTHOR_EMAIL:     jenkins-x@googlegroups.com
      GIT_COMMITTER_EMAIL:  jenkins-x@googlegroups.com
      GIT_COMMITTER_NAME:   jenkins-x-bot
      _JAVA_OPTIONS:        -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xms10m -Xmx192m
      GIT_AUTHOR_NAME:      jenkins-x-bot
      JENKINS_NAME:         maven-96wmn
      XDG_CONFIG_HOME:      /home/jenkins
      JENKINS_URL:          http://jenkins:8080
      HOME:                 /home/jenkins
    Mounts:
      /home/jenkins from workspace-volume (rw)
      /home/jenkins/.docker from volume-2 (rw)
      /home/jenkins/.gnupg from volume-3 (rw)
      /root/.m2 from volume-1 (rw)
      /var/run/docker.sock from volume-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from jenkins-token-smvvp (ro)
  jnlp:
    Image:      jenkinsci/jnlp-slave:3.14-1
    Port:       <none>
    Host Port:  <none>
    Args:
      131c407141521c0842f62a69004df926be6cb531f9318edf0885aeb96b0662b4
      maven-96wmn
    Requests:
      cpu:     100m
      memory:  128Mi
    Environment:
      JENKINS_SECRET:       131c407141521c0842f62a69004df926be6cb531f9318edf0885aeb96b0662b4
      JENKINS_TUNNEL:       jenkins-agent:50000
      DOCKER_CONFIG:        /home/jenkins/.docker/
      GIT_AUTHOR_EMAIL:     jenkins-x@googlegroups.com
      GIT_COMMITTER_EMAIL:  jenkins-x@googlegroups.com
      GIT_COMMITTER_NAME:   jenkins-x-bot
      _JAVA_OPTIONS:        -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dsun.zip.disableMemoryMapping=true -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -Xms10m -Xmx192m
      GIT_AUTHOR_NAME:      jenkins-x-bot
      JENKINS_NAME:         maven-96wmn
      XDG_CONFIG_HOME:      /home/jenkins
      JENKINS_URL:          http://jenkins:8080
      HOME:                 /home/jenkins
    Mounts:
      /home/jenkins from workspace-volume (rw)
      /home/jenkins/.docker from volume-2 (rw)
      /home/jenkins/.gnupg from volume-3 (rw)
      /root/.m2 from volume-1 (rw)
      /var/run/docker.sock from volume-0 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from jenkins-token-smvvp (ro)
Volumes:
  volume-0:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
  volume-2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-docker-cfg
    Optional:    false
  volume-1:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-maven-settings
    Optional:    false
  workspace-volume:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  volume-3:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-release-gpg
    Optional:    false
  jenkins-token-smvvp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-token-smvvp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age  From                                                    Message
  ----     ------                 ---  ----                                                    -------
  Normal   Created                7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Created container
  Normal   SuccessfulMountVolume  7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "workspace-volume"
  Normal   SuccessfulMountVolume  7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "volume-0"
  Normal   SuccessfulMountVolume  7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "volume-1"
  Normal   SuccessfulMountVolume  7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "volume-2"
  Normal   SuccessfulMountVolume  7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "volume-3"
  Normal   SuccessfulMountVolume  7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  MountVolume.SetUp succeeded for volume "jenkins-token-smvvp"
  Normal   Pulled                 7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Container image "jenkinsxio/builder-maven:0.0.516" already present on machine
  Normal   Scheduled              7m   default-scheduler                                      Successfully assigned maven-96wmn to ip-192-168-66-176.eu-west-1.compute.internal
  Normal   Started                7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Started container
  Normal   Pulled                 7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Container image "jenkinsci/jnlp-slave:3.14-1" already present on machine
  Normal   Created                7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Created container
  Normal   Started                7m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Started container
  Warning  Evicted                5m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  The node was low on resource: imagefs.
  Normal   Killing                5m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Killing container with id docker://jnlp:Need to kill Pod
  Normal   Killing                5m   kubelet, ip-192-168-66-176.eu-west-1.compute.internal  Killing container with id docker://maven:Need to kill Pod
How can I remedy this issue? I don't fully understand what imagefs is, how to configure or increase it, or how to avoid saturating it.
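From what I've read so far, imagefs is the filesystem the container runtime (Docker here) uses to store images and writable container layers, and the kubelet evicts pods once its eviction threshold for that filesystem is crossed (the documented default hard threshold is imagefs.available<15%). This is roughly what I have been using to poke at it; the node name comes from the dump above, and I am assuming the default Docker data root of /var/lib/docker, which I have not overridden:

    # Is the node reporting DiskPressure? (shows up under Conditions)
    kubectl describe node ip-192-168-66-176.eu-west-1.compute.internal

    # Recent evictions across the cluster
    kubectl get events --all-namespaces --field-selector reason=Evicted

    # On the node itself: imagefs is the filesystem backing the Docker
    # data root (default /var/lib/docker), so this shows how full it is
    df -h /var/lib/docker

    # Reclaim space from unused images/containers
    # (destructive: it throws away cached layers)
    docker system prune -af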
P.S. Apologies for how wordy this post is; I had to pad out the prose so that SO would let me post more than just a code snippet.
Update: resolved. The underlying storage on the nodes was only 20 GB; I increased the EBS volumes to 50 GB and rebooted the nodes (which also increased nodefs), and the problem went away, as imagefs was no longer saturated.
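In case it helps anyone else, the resize itself was roughly the following; the volume ID is a placeholder, and the growpart/resize2fs step depends on the AMI (some images grow the root filesystem automatically on boot) and on the device and partition names of your instances:

    # Grow the worker node's root EBS volume from 20 GB to 50 GB
    # (vol-0123456789abcdef0 is a placeholder)
    aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 50

    # On the node after the reboot, if the filesystem did not grow
    # automatically (assuming an ext4 root on /dev/xvda1):
    sudo growpart /dev/xvda 1
    sudo resize2fs /dev/xvda1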