I've got a K8s cluster (ACS) v1.8.1 deployed on Azure using their v1.7.7 orchestrator (NOT the acs-engine CLI). Our VM's default disk (standard disk, 30GiB) is bottlenecking our pods, so I attached a premium SSD disk (300GiB) to our VM per these instructions.
What's the proper procedure for pointing the kubelet (v1.8.1) to this new disk?
I thought I could just edit /etc/systemd/system/kubelet.service and point it to the new disk, but I get all kinds of errors when doing that, and I think I've bricked the kubelet on this instance because reverting the edits doesn't get me back to a working state.
Update 2:
I created a new cluster (ACS) with a single agent and updated /etc/systemd/system/docker.service.d/exec_start.conf to point Docker to the newly attached disk; no other changes were made to the machine.
The pods attempt to start but I get "Error syncing pod" and "Pod sandbox changed, it will be killed and re-created." errors for every single pod on the agent.
docker ps shows the hyperkube-amd64 image running. docker logs <hyperkube container> shows a bunch of errors regarding resolv.conf.
Hyperkube container log:
E1126 07:50:23.693679 1897 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = rewrite resolv.conf failed for pod "kubernetes-dashboard-86cf46546d-vjzqd": ResolvConfPath "/poddisk/docker/containers/aaa27116bb39092f27ec6723f70be35d9bcb48d66e49811566c19915ff804516/resolv.conf" does not exist
E1126 07:50:23.693744 1897 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kubernetes-dashboard-86cf46546d-vjzqd_kube-system(1d90eb2e-ee18-11e8-a6f7-000d3a727bf3)" failed: rpc error: code = Unknown desc = rewrite resolv.conf failed for pod "kubernetes-dashboard-86cf46546d-vjzqd": ResolvConfPath "/poddisk/docker/containers/aaa27116bb39092f27ec6723f70be35d9bcb48d66e49811566c19915ff804516/resolv.conf" does not exist
E1126 07:50:23.693781 1897 kuberuntime_manager.go:632] createPodSandbox for pod "kubernetes-dashboard-86cf46546d-vjzqd_kube-system(1d90eb2e-ee18-11e8-a6f7-000d3a727bf3)" failed: rpc error: code = Unknown desc = rewrite resolv.conf failed for pod "kubernetes-dashboard-86cf46546d-vjzqd": ResolvConfPath "/poddisk/docker/containers/aaa27116bb39092f27ec6723f70be35d9bcb48d66e49811566c19915ff804516/resolv.conf" does not exist
E1126 07:50:23.693868 1897 pod_workers.go:182] Error syncing pod 1d90eb2e-ee18-11e8-a6f7-000d3a727bf3 ("kubernetes-dashboard-86cf46546d-vjzqd_kube-system(1d90eb2e-ee18-11e8-a6f7-000d3a727bf3)"), skipping: failed to "CreatePodSandbox" for "kubernetes-dashboard-86cf46546d-vjzqd_kube-system(1d90eb2e-ee18-11e8-a6f7-000d3a727bf3)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kubernetes-dashboard-86cf46546d-vjzqd_kube-system(1d90eb2e-ee18-11e8-a6f7-000d3a727bf3)\" failed: rpc error: code = Unknown desc = rewrite resolv.conf failed for pod \"kubernetes-dashboard-86cf46546d-vjzqd\": ResolvConfPath \"/poddisk/docker/containers/aaa27116bb39092f27ec6723f70be35d9bcb48d66e49811566c19915ff804516/resolv.conf\" does not exist"
I1126 07:50:23.746435 1897 kubelet.go:1871] SyncLoop (PLEG): "kubernetes-dashboard-924040265-sr9v7_kube-system(8af36209-ec52-11e8-b632-000d3a727bf3)", event: &pleg.PodLifecycleEvent{ID:"8af36209-ec52-11e8-b632-000d3a727bf3", Type:"ContainerDied", Data:"410897d41aebe92b0d10a47572405c326228cc845bf8875d4bec27be8dccbf6f"}
W1126 07:50:23.746674 1897 pod_container_deletor.go:77] Container "410897d41aebe92b0d10a47572405c326228cc845bf8875d4bec27be8dccbf6f" not found in pod's containers
I1126 07:50:23.746700 1897 kubelet.go:1871] SyncLoop (PLEG): "kubernetes-dashboard-924040265-sr9v7_kube-system(8af36209-ec52-11e8-b632-000d3a727bf3)", event: &pleg.PodLifecycleEvent{ID:"8af36209-ec52-11e8-b632-000d3a727bf3", Type:"ContainerStarted", Data:"5c32fa4c57009725adfef3df7034fe1dd6166f6e0b56b60be1434f41f33a2f7d"}
I1126 07:50:23.835783 1897 kubelet.go:1871] SyncLoop (PLEG): "kube-dns-v20-765f4cf698-zlms6_kube-system(1d10753c-ee18-11e8-a6f7-000d3a727bf3)", event: &pleg.PodLifecycleEvent{ID:"1d10753c-ee18-11e8-a6f7-000d3a727bf3", Type:"ContainerDied", Data:"192e8df5e196e86235b7d79ecfb14d7ed458ec7709a09115ed8b995fbc90371f"}
W1126 07:50:23.835972 1897 pod_container_deletor.go:77] Container "192e8df5e196e86235b7d79ecfb14d7ed458ec7709a09115ed8b995fbc90371f" not found in pod's containers
I1126 07:50:23.939929 1897 kubelet.go:1871] SyncLoop (PLEG): "kube-dns-v20-3003781527-lw6p9_kube-system(1cf6caec-ee18-11e8-a6f7-000d3a727bf3)", event: &pleg.PodLifecycleEvent{ID:"1cf6caec-ee18-11e8-a6f7-000d3a727bf3", Type:"ContainerDied", Data:"605635bedd890c597a9675c23030d128eadd344e3c73fd5efaba544ce09dfa76"}
W1126 07:50:23.940026 1897 pod_container_deletor.go:77] Container "605635bedd890c597a9675c23030d128eadd344e3c73fd5efaba544ce09dfa76" not found in pod's containers
I1126 07:50:23.951129 1897 kuberuntime_manager.go:401] Sandbox for pod "heapster-342135353-x07fk_kube-system(a9810826-ec52-11e8-b632-000d3a727bf3)" has no IP address. Need to start a new one
I1126 07:50:24.047879 1897 kuberuntime_manager.go:401] Sandbox for pod "kubernetes-dashboard-924040265-sr9v7_kube-system(8af36209-ec52-11e8-b632-000d3a727bf3)" has no IP address. Need to start a new one
I1126 07:50:24.137353 1897 kuberuntime_manager.go:401] Sandbox for pod "kube-dns-v20-765f4cf698-zlms6_kube-system(1d10753c-ee18-11e8-a6f7-000d3a727bf3)" has no IP address. Need to start a new one
I1126 07:50:24.241774 1897 kuberuntime_manager.go:401] Sandbox for pod "kube-dns-v20-3003781527-lw6p9_kube-system(1cf6caec-ee18-11e8-a6f7-000d3a727bf3)" has no IP address. Need to start a new one
W1126 07:50:24.343902 1897 docker_service.go:333] Failed to retrieve checkpoint for sandbox "ce0304d171adf24619dac12e47914f2e3670d29d26b5bc3bec3358b631ebaf06": checkpoint is not found.
service kubelet status output:
● kubelet.service - Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2018-11-22 06:03:42 UTC; 4 days ago
Process: 1822 ExecStartPre=/sbin/iptables -t nat --list (code=exited, status=0/SUCCESS)
Process: 1815 ExecStartPre=/sbin/ebtables -t nat --list (code=exited, status=0/SUCCESS)
Process: 1810 ExecStartPre=/sbin/sysctl -w net.ipv4.tcp_retries2=8 (code=exited, status=0/SUCCESS)
Process: 1806 ExecStartPre=/bin/mount --make-shared /var/lib/kubelet (code=exited, status=0/SUCCESS)
Process: 1797 ExecStartPre=/bin/bash -c if [ $(mount | grep "/var/lib/kubelet" | wc -l) -le 0 ] ; then /bin/mount --bind /var/lib/kubelet /var/lib/kubelet ; fi (code=exited, status=0/SUCCESS)
Process: 1794 ExecStartPre=/bin/mkdir -p /var/lib/kubelet (code=exited, status=0/SUCCESS)
Process: 1791 ExecStartPre=/bin/bash /opt/azure/containers/kubelet.sh (code=exited, status=0/SUCCESS)
Main PID: 1828 (docker)
Tasks: 9
Memory: 4.2M
CPU: 5min 9.259s
CGroup: /system.slice/kubelet.service
└─1828 /usr/bin/docker run --net=host --pid=host --privileged --rm --volume=/dev:/dev --volume=/sys:/sys:ro --volume=/var/run:/var/run:rw --volume=/var/lib/docker/:/var/lib/docker:rw --volume=/var/lib/kubelet/:/var/lib/kubelet:shared --volume=/var/log:/var/log:rw --volume=/etc/kubernetes/:/etc/kubernetes:ro --volume=/srv/kubernetes/:/srv/kubernetes:ro --volume=/var/lib/waagent/ManagedIdentity-Settings:/var/lib/waagent/ManagedIdentity-Settings:ro gcrio.azureedge.net/google_containers/hyperkube-amd64:v1.8.1 /hyperkube kubelet --kubeconfig=/var/lib/kubelet/kubeconfig --require-kubeconfig --pod-infra-container-image=gcrio.azureedge.net/google_containers/pause-amd64:3.0 --address=0.0.0.0 --allow-privileged=true --enable-server --pod-manifest-path=/etc/kubernetes/manifests --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --node-labels=kubernetes.io/role=agent,agentpool=agent --cloud-provider=azure --cloud-config=/etc/kubernetes/azure.json --azure-container-registry-config=/etc/kubernetes/azure.json --network-plugin=kubenet --max-pods=110 --node-status-update-frequency=10s --image-gc-high-threshold=85 --image-gc-low-threshold=80 --v=2 --feature-gates=Accelerators=true
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: W1126 16:52:43.631813 1897 pod_container_deletor.go:77] Container "9e535d1d87c7c52bb154156bed1fbf40e3509ed72c76f0506ad8b6ed20b6c82d" not found in pod's containers
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: I1126 16:52:43.673560 1897 kuberuntime_manager.go:401] Sandbox for pod "kube-dns-v20-3003781527-rn3fz_kube-system(89854e4f-ec52-11e8-b632-000d3a727bf3)" has no IP address. Need to start a new one
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: I1126 16:52:43.711945 1897 kuberuntime_manager.go:401] Sandbox for pod "kubernetes-dashboard-86cf46546d-vjzqd_kube-system(1d90eb2e-ee18-11e8-a6f7-000d3a727bf3)" has no IP address. Need to start a new one
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: I1126 16:52:43.783121 1897 kubelet.go:1871] SyncLoop (PLEG): "kube-dns-v20-3003781527-lw6p9_kube-system(1cf6caec-ee18-11e8-a6f7-000d3a727bf3)", event: &pleg.PodLifecycleEvent{ID:"1cf6caec-ee18-11e8-a6f7-000d3a727bf3", Type:"ContainerDied", Data:"26e592abff8bf63a8d9b7a57778dd6768240112a6edafb6de55e7217258d764f"}
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: W1126 16:52:43.783563 1897 pod_container_deletor.go:77] Container "26e592abff8bf63a8d9b7a57778dd6768240112a6edafb6de55e7217258d764f" not found in pod's containers
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: I1126 16:52:43.783591 1897 kubelet.go:1871] SyncLoop (PLEG): "kube-dns-v20-3003781527-lw6p9_kube-system(1cf6caec-ee18-11e8-a6f7-000d3a727bf3)", event: &pleg.PodLifecycleEvent{ID:"1cf6caec-ee18-11e8-a6f7-000d3a727bf3", Type:"ContainerStarted", Data:"2ac43b964e4c7b4172fb6ab0caa0c673c5526fd4947a9760ee390a9d7f78ee14"}
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: W1126 16:52:43.863377 1897 docker_service.go:333] Failed to retrieve checkpoint for sandbox "cfec0b7704e0487ee3122488284adc0520ba1b57a5a8675df8c40deaeee1cf2e": checkpoint is not found.
Nov 26 16:52:43 k8s-agent-CA50C8FA-0 docker[1828]: I1126 16:52:43.935784 1897 kuberuntime_manager.go:401] Sandbox for pod "heapster-342135353-x07fk_kube-system(a9810826-ec52-11e8-b632-000d3a727bf3)" has no IP address. Need to start a new one
Nov 26 16:52:44 k8s-agent-CA50C8FA-0 docker[1828]: I1126 16:52:44.085925 1897 kuberuntime_manager.go:401] Sandbox for pod "kube-dns-v20-3003781527-lw6p9_kube-system(1cf6caec-ee18-11e8-a6f7-000d3a727bf3)" has no IP address. Need to start a new one
Nov 26 16:52:44 k8s-agent-CA50C8FA-0 docker[1828]: E1126 16:52:44.394661 1897 summary.go:92] Failed to get system container stats for "/docker/a29aa11ff8933b350e339bb96c02932a78aba63917114e505abd47b89460d453": failed to get cgroup stats for "/docker/a29aa11ff8933b350e339bb96c02932a78aba63917114e505abd47b89460d453": failed to get container info for "/docker/a29aa11ff8933b350e339bb96c02932a78aba63917114e505abd47b89460d453": unknown container "/docker/a29aa11ff8933b350e339bb96c02932a78aba63917114e505abd47b89460d453"
Update 1:
Per @Rico's answer I attempted updating /etc/default/docker, but it had no effect. I then located and updated /etc/systemd/system/docker.service.d/exec_start.conf, which caused Docker to re-create all of its files at the /kubeletdrive/docker location. exec_start.conf now looks like this:
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// -g /kubeletdrive/docker --storage-driver=overlay2 --bip=172.17.0.1/16
Running service docker status shows the output below, and none of the containers are actually being created now. I see that something is adding the option --state-dir /var/run/docker/libcontainerd/containerd to the mix, but I have yet to find the file it is coming from. Would updating it to the same location fix this?
docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─clear_mount_propagation_flags.conf, exec_start.conf
Active: active (running) since Thu 2018-11-15 06:09:40 UTC; 4 days ago
Docs: https://docs.docker.com
Main PID: 1175 (dockerd)
Tasks: 259
Memory: 3.4G
CPU: 1d 15h 9min 15.414s
CGroup: /system.slice/docker.service
├─ 1175 dockerd -H fd:// -g /kubeletdrive/docker --storage-driver=overlay2 --bip=172.17.0.1/16
├─ 1305 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
├─ 1716 docker-containerd-shim 32387d1bf7a0fc58e26f0146b9a2cb21c7f0d673a730a71007d13cff3505cb5a /var/run/docker/libcontainerd/32387d1bf7a0fc58e26f0146b9a2cb21c7f0d673a730a71007d13cff3505cb5a docker-runc
├─ 1837 docker-containerd-shim 73fd1a01f7cb7a2c44f5d40ad0a6136398469e290dc6c554ff66a9971ba3fb1f /var/run/docker/libcontainerd/73fd1a01f7cb7a2c44f5d40ad0a6136398469e290dc6c554ff66a9971ba3fb1f docker-runc
├─ 1901 docker-containerd-shim ce674925877ba963b8ba8c85598cdc40137a81a70f996636be4e15d880580b05 /var/run/docker/libcontainerd/ce674925877ba963b8ba8c85598cdc40137a81a70f996636be4e15d880580b05 docker-runc
├─20715 docker-containerd-shim 83db148b4e55a8726d40567e7f66d84a9f25262c685a6e0c2dc2f4218b534378 /var/run/docker/libcontainerd/83db148b4e55a8726d40567e7f66d84a9f25262c685a6e0c2dc2f4218b534378 docker-runc
├─21047 docker-containerd-shim 4b1acbfcbec5ae3989607b16d59c5552e4bf9a4d88ba5f6d89bb7ef1d612cd63 /var/run/docker/libcontainerd/4b1acbfcbec5ae3989607b16d59c5552e4bf9a4d88ba5f6d89bb7ef1d612cd63 docker-runc
├─23319 docker-containerd-shim c24c646afc59573e54be3645bdfb691ce31a108d76052423618fcb767eaa8775 /var/run/docker/libcontainerd/c24c646afc59573e54be3645bdfb691ce31a108d76052423618fcb767eaa8775 docker-runc
├─23516 docker-containerd-shim 59e0414a170be175eb66221458e5811e3d8b15a5ed07b146a7be265dcc85e234 /var/run/docker/libcontainerd/59e0414a170be175eb66221458e5811e3d8b15a5ed07b146a7be265dcc85e234 docker-runc
├─23954 docker-containerd-shim da267fd3b43a3601b2d4938575bc3529cf174bd1291f2f6696cdc4981293b64f /var/run/docker/libcontainerd/da267fd3b43a3601b2d4938575bc3529cf174bd1291f2f6696cdc4981293b64f docker-runc
├─24396 docker-containerd-shim a8f843981f6f24144d52b77b659eb71f4b2bf30df9c6c74154f960e208af4950 /var/run/docker/libcontainerd/a8f843981f6f24144d52b77b659eb71f4b2bf30df9c6c74154f960e208af4950 docker-runc
├─26078 docker-containerd-shim 1345ae86c3fc7242bb156785230ebf7bdaa125ba48b849243388aa3d9506bf7e /var/run/docker/libcontainerd/1345ae86c3fc7242bb156785230ebf7bdaa125ba48b849243388aa3d9506bf7e docker-runc
├─27100 docker-containerd-shim 0749c242003cfa542ef9868f001335761be53eb3c52df00dcb4fa73f9e94a57b /var/run/docker/libcontainerd/0749c242003cfa542ef9868f001335761be53eb3c52df00dcb4fa73f9e94a57b docker-runc
├─28254 docker-containerd-shim 7934ba2701673f7e3c6567e4e35517625d14b97fe9b7846e716c0559a2442241 /var/run/docker/libcontainerd/7934ba2701673f7e3c6567e4e35517625d14b97fe9b7846e716c0559a2442241 docker-runc
└─29917 docker-containerd-shim 26f8f5963396a478e37aebdacdc0943af188d32dbe5bbe28f3ccc6edef003546 /var/run/docker/libcontainerd/26f8f5963396a478e37aebdacdc0943af188d32dbe5bbe28f3ccc6edef003546 docker-runc
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.722015704Z" level=error msg="Handler for POST /v1.24/containers/7409f3546ffa9e42da8b6cc694ba37571e908df43bfa2001e449a4cca3c50801/stop returned error: Container 7409f3546ffa9e42da8b6cc694ba37571e908df43bfa2001e449a4cca3c50801 is already stopped"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.759151399Z" level=error msg="Handler for POST /v1.24/containers/ebbeed144e3768758c62749763977b65e4e2b118452bcf342b3f9d79ff0a5362/stop returned error: Container ebbeed144e3768758c62749763977b65e4e2b118452bcf342b3f9d79ff0a5362 is already stopped"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.792131939Z" level=error msg="Handler for POST /v1.24/containers/85ff0f9d9feb893eb87062b00dc0f034ee47e639289a401e1c9f4e2ca7a5a202/stop returned error: Container 85ff0f9d9feb893eb87062b00dc0f034ee47e639289a401e1c9f4e2ca7a5a202 is already stopped"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.830289673Z" level=error msg="Handler for POST /v1.24/containers/dc8f0fbaacfaba68453895996976706581aab817790bb1694f1a15de6cd2861f/stop returned error: Container dc8f0fbaacfaba68453895996976706581aab817790bb1694f1a15de6cd2861f is already stopped"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.830618185Z" level=error msg="Handler for GET /v1.24/containers/702807afde28063ae46e321e86d18861440b691f254df52591c87ff732383467/json returned error: No such container: 702807afde28063ae46e321e86d18861440b691f254df52591c87ff732383467"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.864109644Z" level=error msg="Handler for POST /v1.24/containers/702807afde28063ae46e321e86d18861440b691f254df52591c87ff732383467/stop returned error: No such container: 702807afde28063ae46e321e86d18861440b691f254df52591c87ff732383467"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.874873849Z" level=error msg="Handler for GET /v1.24/containers/6f1f542a4f6bb30f21d0a747d915b458a34a5c6cedc66b301faa32a12a502d0f/json returned error: No such container: 6f1f542a4f6bb30f21d0a747d915b458a34a5c6cedc66b301faa32a12a502d0f"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.898141823Z" level=error msg="Handler for POST /v1.24/containers/816c48d0dd01fb66769cc6275e799e68a1301fc0f6623a0a99558350a414ee7c/stop returned error: Container 816c48d0dd01fb66769cc6275e799e68a1301fc0f6623a0a99558350a414ee7c is already stopped"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.928695972Z" level=error msg="Handler for POST /v1.24/containers/f35a7847f9d2563101337389968f2265e5d0cd8bc78e0a2790a52be7dd3a0f3a/stop returned error: Container f35a7847f9d2563101337389968f2265e5d0cd8bc78e0a2790a52be7dd3a0f3a is already stopped"
Nov 19 16:38:42 k8s-agent-D24C3A06-0 docker[1175]: time="2018-11-19T16:38:42.998395591Z" level=error msg="Handler for POST /v1.24/containers/a9d0acecee28b17992cbd99e8c782513157f0bca57acaa22d5688c07062d3346/stop returned error: Container a9d0acecee28b17992cbd99e8c782513157f0bca57acaa22d5688c07062d3346 is already stopped"
If the pod storage space is what you want to change, and assuming you are using Docker (which most people are), you have to change Docker's graph directory. You can do that with the -g option on your Docker daemon:
-g /mount/to/your-new-disk
Depending on your setup you might be able to change it in the /etc/default/docker file:
DOCKER_OPTS="-g /mount/to/your-new-disk"
Or directly in your systemd service unit:
# /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
...
[Service]
Type=notify
...
ExecStart=/usr/bin/dockerd -H fd:// -g /mount/to/your-new-disk
...
[Install]
WantedBy=multi-user.target
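Whichever file you change, the new graph directory only takes effect after reloading systemd and restarting the daemon:
sudo systemctl daemon-reload
sudo systemctl restart docker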
Another option is to set it in the daemon configuration file, /etc/docker/daemon.json (the daemon config, not the client-side config.json).
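A minimal sketch of /etc/docker/daemon.json, assuming a daemon new enough to support data-root (older releases from the -g era use the graph key instead):
{
  "data-root": "/mount/to/your-new-disk"
}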
If you are using containerd instead of Docker, you can change the value of root in the /etc/containerd/config.toml file.
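A minimal sketch, assuming the top-level root key that containerd's default config uses:
# /etc/containerd/config.toml
root = "/mount/to/your-new-disk"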
If you are using CRI-O, you can likewise set the root option in the crio.conf file.
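A minimal sketch, assuming the standard [crio] table:
# /etc/crio/crio.conf
[crio]
root = "/mount/to/your-new-disk"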