ResourceQuota and multiple memory-limited jobs: restarting pending jobs takes forever

4/27/2017

I'm testing Kubernetes with the goal of running batch jobs in a queue. I've created a ResourceQuota with

$ kubectl create quota memoryquota --hard=memory=450Mi

which limits the total memory requested by all containers in the namespace to 450Mi. I also have a script, run-memhog.sh, that creates a memhog job with a memory limit of X MiB (the first argument) which allocates Y megabytes of memory (the second argument):

#!/bin/bash
# Usage: ./run-memhog.sh <memory limit in Mi> <memory to allocate in MB>
kubectl run memhog-$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 8 | head -n 1) \
  --replicas=1 --restart=OnFailure --limits=memory=$1Mi,cpu=100m --record \
  --image=derekwaynecarr/memhog --command -- memhog -r100 $2m

Running

$ for i in {1..4}; do ./run-memhog.sh 200 100; done

correctly creates four jobs. Two of them complete in around 20 seconds, and the other two, as expected, get a FailedCreate warning with the message

Error creating: pods "memhog-plgxke9m-" is forbidden: exceeded quota: memoryquota, requested: memory=200Mi, used: memory=400Mi, limited: memory=450Mi
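The rejection follows from simple arithmetic: the two admitted pods already account for 400Mi of the 450Mi quota, so a third 200Mi request would bring the total to 600Mi. Below is a minimal bash sketch of that check. It assumes a simplified model where every quantity is a whole number of MiB. The function name admits is hypothetical and is not part of Kubernetes.

```shell
#!/bin/bash
# Simplified model of the ResourceQuota admission check for memory:
# a new pod is admitted only if used + requested <= hard.
# Assumption of this sketch: all values are plain integers in MiB.
admits() {
  local used=$1 requested=$2 hard=$3
  (( used + requested <= hard ))
}

hard=450      # --hard=memory=450Mi
request=200   # each pod requests 200Mi (see "requested: memory=200Mi" above)
used=0

# Simulate the four jobs from the question being created back to back.
for job in 1 2 3 4; do
  if admits "$used" "$request" "$hard"; then
    used=$(( used + request ))
    echo "job $job: admitted, used=${used}Mi"
  else
    echo "job $job: forbidden, requested=${request}Mi, used=${used}Mi, limited=${hard}Mi"
  fi
done
```

Jobs 3 and 4 report the same requested, used and limited figures that appear in the FailedCreate message.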

Running

$ kubectl get jobs

shows the expected outcome:

NAME              DESIRED   SUCCESSFUL   AGE
memhog-2covdiww   1         0            35s
memhog-6bg0b6g6   1         1            35s
memhog-plgxke9m   1         0            35s
memhog-w2ujbg1b   1         1            35s

Everything's OK so far. I expect the two uncompleted jobs to start running as soon as resources become available, that is, once the previous pods and containers are cleared. However, the jobs stay pending for a very long time: after two hours they still hadn't started, so I left the server running overnight, and the jobs completed at some point during the night.

My question is: what is causing the jobs to stay pending for so long? Is there any way I can poll for resource availability more frequently? I searched both the kubectl reference and the Kubernetes docs but didn't find any mention of a fix or setting for this.
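To watch quota headroom from outside the cluster while waiting, a rough bash sketch could read the quota's status.hard and status.used fields with kubectl's jsonpath output. QUOTA, NAMESPACE, to_mi and quota_free_mi are illustrative names, not anything Kubernetes provides, and the namespace default of "default" is an assumption. The unit parser only understands plain byte counts and the Ki/Mi/Gi suffixes. This sketch only reports free memory. It does not by itself make the pending jobs retry any sooner.

```shell
#!/bin/bash
# Hypothetical placeholders; override them via the environment.
QUOTA=${QUOTA:-memoryquota}
NAMESPACE=${NAMESPACE:-default}

# Convert a memory quantity to whole MiB (integer division).
# Handles plain byte counts and the Ki/Mi/Gi suffixes only;
# anything else (e.g. "500M" or "1.5Gi") is rejected with status 1.
to_mi() {
  local q=$1 n suffix
  case "$q" in
    *Ki|*Mi|*Gi) n=${q%??}; suffix=${q: -2} ;;
    *) n=$q; suffix="" ;;
  esac
  if [[ ! $n =~ ^[0-9]+$ ]]; then
    echo "unsupported quantity: $q" >&2
    return 1
  fi
  case "$suffix" in
    Ki) echo $(( n / 1024 )) ;;
    Mi) echo "$n" ;;
    Gi) echo $(( n * 1024 )) ;;
    *)  echo $(( n / 1024 / 1024 )) ;;
  esac
}

# Print the memory (MiB) still free under the quota, based on its status.
quota_free_mi() {
  local hard used hard_mi used_mi
  hard=$(kubectl get resourcequota "$QUOTA" -n "$NAMESPACE" -o jsonpath='{.status.hard.memory}') || return 1
  used=$(kubectl get resourcequota "$QUOTA" -n "$NAMESPACE" -o jsonpath='{.status.used.memory}') || return 1
  hard_mi=$(to_mi "$hard") || return 1
  used_mi=$(to_mi "${used:-0}") || return 1
  echo $(( hard_mi - used_mi ))
}
```

Calling quota_free_mi every few seconds in a while loop would show whether, and when, the used figure drops after the first two jobs complete.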

-- Fissio
jobs
kubernetes
