Occasionally a deployment takes much longer than usual to roll out (this one normally finishes in a minute or two) and eventually fails with the error below. How do people normally deal with this? Is it best to remove the offending node? What's the right way to debug it?
```
error: deployment "hillcity-twitter-staging-deployment" exceeded its progress deadline
Waiting for rollout to complete (been 500s)...
```

```
NAME                                                   READY   STATUS              RESTARTS   AGE   IP             NODE
hillcity-twitter-staging-deployment-5bf6b48779-5jvgv   2/2     Running             0          8m    10.168.41.12   gke-charles-test-cluster-default-pool-be943055-mq4j
hillcity-twitter-staging-deployment-5bf6b48779-knzkw   2/2     Running             0          8m    10.168.34.34   gke-charles-test-cluster-default-pool-be943055-czqr
hillcity-twitter-staging-deployment-5bf6b48779-qxmg8   0/2     ContainerCreating   0          8m    <none>         gke-charles-test-cluster-default-pool-be943055-rzg2
```
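To pick the stuck replica and its node out of a pod listing like this one, here is a minimal Python sketch. It assumes the whitespace-separated column layout shown (NAME READY STATUS RESTARTS AGE IP NODE, which looks like `kubectl get pods -o wide` output); the function name `find_unready_pods` and the `SAMPLE` string are illustrative, not part of kubectl.

```python
from typing import List, Tuple


def find_unready_pods(listing: str) -> List[Tuple[str, str, str]]:
    """Return (pod, status, node) for each pod whose READY count x/y has x < y."""
    unready = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # ignore malformed or truncated rows
        name, ready, status, node = fields[0], fields[1], fields[2], fields[6]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total:
            unready.append((name, status, node))
    return unready


# Rows copied from the listing above (column spacing does not matter to split()).
SAMPLE = """\
NAME READY STATUS RESTARTS AGE IP NODE
hillcity-twitter-staging-deployment-5bf6b48779-5jvgv 2/2 Running 0 8m 10.168.41.12 gke-charles-test-cluster-default-pool-be943055-mq4j
hillcity-twitter-staging-deployment-5bf6b48779-knzkw 2/2 Running 0 8m 10.168.34.34 gke-charles-test-cluster-default-pool-be943055-czqr
hillcity-twitter-staging-deployment-5bf6b48779-qxmg8 0/2 ContainerCreating 0 8m <none> gke-charles-test-cluster-default-pool-be943055-rzg2
"""

if __name__ == "__main__":
    for pod, status, node in find_unready_pods(SAMPLE):
        print(f"{pod}: {status} on {node}")
```

On this sample it reports only the `qxmg8` pod, `ContainerCreating` on the `rzg2` node. For anything beyond a quick check, `kubectl get pods -o json` gives structured output that is safer to parse than column text.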
I SSHed into the `rzg2` node but didn't see anything obviously wrong with it. Here's the cluster-level view:
```
$ kubectl top nodes
NAME                                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-charles-test-cluster-default-pool-be943055-2q9f   385m         40%    2288Mi          86%
gke-charles-test-cluster-default-pool-be943055-35fl   214m         22%    2030Mi          76%
gke-charles-test-cluster-default-pool-be943055-3p95   328m         34%    2108Mi          79%
gke-charles-test-cluster-default-pool-be943055-67h0   204m         21%    1783Mi          67%
gke-charles-test-cluster-default-pool-be943055-czqr   342m         36%    2397Mi          90%
gke-charles-test-cluster-default-pool-be943055-jz8v   149m         15%    2299Mi          86%
gke-charles-test-cluster-default-pool-be943055-kl9r   246m         26%    1796Mi          67%
gke-charles-test-cluster-default-pool-be943055-mq4j   123m         13%    1523Mi          57%
gke-charles-test-cluster-default-pool-be943055-mx18   276m         29%    1755Mi          66%
gke-charles-test-cluster-default-pool-be943055-pb48   200m         21%    1667Mi          63%
gke-charles-test-cluster-default-pool-be943055-rzg2   392m         41%    2270Mi          85%
gke-charles-test-cluster-default-pool-be943055-wkxk   274m         29%    1954Mi          73%
```
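A minimal Python sketch can rank nodes by the MEMORY% column of `kubectl top nodes` output like the table above. The 85% cut-off is a hypothetical threshold chosen for illustration, not a Kubernetes default; `high_memory_nodes` is an illustrative name, and `SAMPLE` is an excerpt of the table.

```python
from typing import List, Tuple


def high_memory_nodes(top_output: str, threshold_pct: int = 85) -> List[Tuple[str, int]]:
    """Return (node, MEMORY%) pairs at or above threshold_pct, highest first."""
    rows = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue  # expect NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
        name, mem_pct = fields[0], int(fields[4].rstrip("%"))
        if mem_pct >= threshold_pct:
            rows.append((name, mem_pct))
    # sorted() is stable, so nodes with equal MEMORY% keep their input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Excerpt of the `kubectl top nodes` table above.
SAMPLE = """\
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-charles-test-cluster-default-pool-be943055-2q9f 385m 40% 2288Mi 86%
gke-charles-test-cluster-default-pool-be943055-czqr 342m 36% 2397Mi 90%
gke-charles-test-cluster-default-pool-be943055-jz8v 149m 15% 2299Mi 86%
gke-charles-test-cluster-default-pool-be943055-mq4j 123m 13% 1523Mi 57%
gke-charles-test-cluster-default-pool-be943055-rzg2 392m 41% 2270Mi 85%
"""

if __name__ == "__main__":
    for node, pct in high_memory_nodes(SAMPLE):
        print(f"{node}: {pct}%")
```

On this excerpt it lists `czqr` (90%) first, then `2q9f` and `jz8v` (86%), then `rzg2` (85%). Because `czqr` runs a healthy replica at a higher memory percentage, memory use alone may not explain the stuck pod on `rzg2`.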
Update: here is some of the output of `sudo journalctl -u kubelet` on that node:
```
Sep 04 22:14:11 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:14:11.882166 1442 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay/83ed56fdfae736d5b1bd3afc3649555916a2ef24a287415256a408c463186107 with output stdout: , stderr: - signal: killed, rootInodeErr: <nil>, extraDiskErr: <nil>
[... repeated many times ...]
Sep 04 22:25:19 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:25:19.917177 1442 kube_docker_client.go:324] Cancel pulling image "gcr.io/able-store-864/hillcity-worker:0.0.1" because of no progress for 1m0s, latest progress: "43f9fd4bd389: Extracting [=====> ] 32.77 kB/295.9 kB"
```
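To summarise a long kubelet journal like this one, the minimal Python sketch below counts the `du` failures per overlay path and collects the cancelled image pulls. The regular expressions are assumptions written against the exact message text quoted here (other kubelet versions may word these messages differently), and `summarise_kubelet_log` and `KubeletSummary` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# Patterns assume the exact kubelet message wording quoted above.
DU_FAILED = re.compile(r"du command failed on (\S+)")
PULL_CANCELLED = re.compile(
    r'Cancel pulling image "([^"]+)" because of no progress for ([^,]+), '
    r'latest progress: "(.*)"'
)


@dataclass
class KubeletSummary:
    du_failures: Counter = field(default_factory=Counter)  # path -> count
    cancelled_pulls: List[Dict[str, str]] = field(default_factory=list)


def summarise_kubelet_log(log_text: str) -> KubeletSummary:
    """Count du failures per path and collect cancelled image pulls."""
    summary = KubeletSummary()
    for line in log_text.splitlines():
        du = DU_FAILED.search(line)
        if du:
            summary.du_failures[du.group(1)] += 1
            continue
        pull = PULL_CANCELLED.search(line)
        if pull:
            image, stalled_for, progress = pull.groups()
            summary.cancelled_pulls.append(
                {"image": image, "stalled_for": stalled_for, "progress": progress}
            )
    return summary


# The two distinct lines from the journal above.
SAMPLE_LOG = """\
Sep 04 22:14:11 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:14:11.882166 1442 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay/83ed56fdfae736d5b1bd3afc3649555916a2ef24a287415256a408c463186107 with output stdout: , stderr: - signal: killed, rootInodeErr: <nil>, extraDiskErr: <nil>
Sep 04 22:25:19 gke-charles-test-cluster-default-pool-be943055-rzg2 kubelet[1442]: E0904 22:25:19.917177 1442 kube_docker_client.go:324] Cancel pulling image "gcr.io/able-store-864/hillcity-worker:0.0.1" because of no progress for 1m0s, latest progress: "43f9fd4bd389: Extracting [=====> ] 32.77 kB/295.9 kB"
"""

if __name__ == "__main__":
    result = summarise_kubelet_log(SAMPLE_LOG)
    print("du failures:", sum(result.du_failures.values()))
    for pull in result.cancelled_pulls:
        print(f'{pull["image"]}: stalled {pull["stalled_for"]} at {pull["progress"]}')
```

Both messages involve disk work on the node (`du` walking `/var/lib/docker/overlay`, and extracting an image layer), so disk I/O on `rzg2` may be worth checking first.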