Increase the container load timeout in Kubernetes

4/10/2018

When starting a Pod whose container image has many or large layers, loading the container can take more than 2 minutes on my cluster's machines (slow single-threaded performance coupled with 7200 rpm spinning rust means slow untar/ungzip speeds).

When that happens, Kubernetes gives up on the container with "context deadline exceeded" and then retries. Left running overnight (by accident), the node runs out of disk space as the failed attempts pile up.

Example pod:

apiVersion: v1
kind: Pod
metadata:
  name: test-large-container-1
spec:
  containers:
  - name: X
    image: X:latest
    stdin: true
    tty: true
    command: ["bash"]

Is there a field in the PodSpec I missed or a configuration for kubelet itself?

Events seen:

2018-04-10 13:01:22 -0700 PDT   2018-04-10 13:01:22 -0700 PDT   1         test-large-container-1.15242b927c24ec40          Pod                                  Normal    Scheduled                 default-scheduler   Successfully assigned test-large-container-1 to node1
2018-04-10 13:01:29 -0700 PDT   2018-04-10 13:01:29 -0700 PDT   1         test-large-container-1.15242b942c41e77f          Pod       spec.initContainers{map}   Normal    Pulling                   kubelet, node1      pulling image "X:latest"
2018-04-10 13:01:30 -0700 PDT   2018-04-10 13:01:30 -0700 PDT   1         test-large-container-1.15242b948764b21a          Pod       spec.initContainers{map}   Normal    Pulled                    kubelet, node1      Successfully pulled image "X:latest"
2018-04-10 13:03:30 -0700 PDT   2018-04-10 13:03:30 -0700 PDT   1         test-large-container-1.15242bb0780e06ee          Pod       spec.initContainers{map}   Warning   Failed    kubelet, node1   Error: context deadline exceeded
-- fahhem
kubernetes

2 Answers

4/11/2018

I think initContainers run before the images for the primary containers are even pulled, so it may be worth trying an initContainer that uses docker:latest, mounts the host's /var/run/docker.sock as a volume, and pulls the image itself.
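
A rough sketch of what this suggestion might look like, applied to the Pod from the question. It assumes the node runs Docker with its socket at /var/run/docker.sock. The initContainer name "prepull" and the volume name "docker-sock" are made up for illustration. This has not been tested against the cluster in question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-large-container-1
spec:
  initContainers:
  # Hypothetical helper that asks the host's Docker daemon to pull the image
  # before the main container is started.
  - name: prepull
    image: docker:latest
    command: ["docker", "pull", "X:latest"]
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock
  containers:
  - name: X
    image: X:latest
    stdin: true
    tty: true
    command: ["bash"]
  volumes:
  # Exposes the host's Docker socket to the initContainer (assumes Docker is the runtime).
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock
```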

-- mdaniel
Source: StackOverflow

4/17/2018

Thanks to bits! It was the kubelet's --runtime-request-timeout flag that I needed to change. Once I raised it far enough for the container to finish loading, it started working!
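
For reference, the flag's documented default is 2m0s, which lines up with the two-minute gap between the Pulled and Failed events in the question. A minimal sketch of the same setting in a kubelet configuration file, assuming the kubelet is started with a config file; the value 10m is only an illustrative guess, not the value actually used:

```yaml
# Config-file equivalent of passing --runtime-request-timeout=10m to the kubelet.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Timeout for runtime requests other than long-running ones (pull, logs, exec, attach).
runtimeRequestTimeout: "10m"
```

The kubelet has to be restarted on each affected node for the change to take effect.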

-- fahhem
Source: StackOverflow