GKE : node restarted / pods missing

4/12/2016

Good day,

Running a single-node cluster. I noticed that overnight all my pods had gone missing.

kubectl get events

got me nothing.
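Events are namespaced and also expire after a fairly short TTL, so a broader query might show more (standard kubectl flags):

kubectl get events --all-namespaces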

Checking the node:

# kubectl get no
NAME                            STATUS    AGE
gke-sg-etl-4ff0f964-node-jny8   Ready     20d

Checking the containers on the node, I noticed that some system containers have been up for only 21 hours, while others have been up for 2 weeks:

CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS               NAMES
b451ef51174d        gcr.io/google_containers/glbc:0.6.0                                    "/glbc --default-back"   21 hours ago        Up 21 hours                             k8s_l7-lb-controller.96ad8505_l7-lb-controller-v0.6.0-keutx_kube-system_5ecb8fe7-0054-11e6-a3f3-42010af0003e_9af0cd81
86483feba88c        gcr.io/google_containers/defaultbackend:1.0                            "/server"                21 hours ago        Up 21 hours                             k8s_default-http-backend.33869026_l7-lb-controller-v0.6.0-keutx_kube-system_5ecb8fe7-0054-11e6-a3f3-42010af0003e_1fffdcaf
73bc0bbd18a1        gcr.io/google_containers/pause:2.0                                     "/pause"                 21 hours ago        Up 21 hours                             k8s_POD.364e00d5_l7-lb-controller-v0.6.0-keutx_kube-system_5ecb8fe7-0054-11e6-a3f3-42010af0003e_6b0f3678
95f0bdb6b87c        gcr.io/google_containers/exechealthz:1.0                               "/exechealthz '-cmd=n"   21 hours ago        Up 21 hours                             k8s_healthz.2bec1471_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_4e729ced
efde9c110e3c        gcr.io/google_containers/skydns:2015-10-13-8c72f8c                     "/skydns -machines=ht"   21 hours ago        Up 21 hours                             k8s_skydns.66853ac4_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_10c173ea
0db98a8b6b83        gcr.io/google_containers/kube2sky:1.14                                 "/kube2sky --domain=c"   21 hours ago        Up 21 hours                             k8s_kube2sky.4e15015f_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_23182cb4
c103d90e1bd9        gcr.io/google_containers/etcd-amd64:2.2.1                              "/usr/local/bin/etcd "   21 hours ago        Up 21 hours                             k8s_etcd.6d563523_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_987562c7
3b21c42444de        gcr.io/google_containers/pause:2.0                                     "/pause"                 21 hours ago        Up 21 hours                             k8s_POD.e2764897_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_08f0734a
7e642f5a1fe0        gcr.io/google_containers/kubernetes-dashboard-amd64:v1.0.0             "/dashboard --port=90"   21 hours ago        Up 21 hours                             k8s_kubernetes-dashboard.deca92bd_kubernetes-dashboard-v1.0.0-whzec_kube-system_19ab34c7-0056-11e6-a3f3-42010af0003e_433eec1f
2d0f5f11ad65        gcr.io/google_containers/pause:2.0                                     "/pause"                 21 hours ago        Up 21 hours                             k8s_POD.3a1c00d7_kubernetes-dashboard-v1.0.0-whzec_kube-system_19ab34c7-0056-11e6-a3f3-42010af0003e_0dfc9856
c210ea10b8ea        gcr.io/google_containers/heapster:v1.0.0                               "/heapster --source=k"   21 hours ago        Up 21 hours                             k8s_heapster.ce50f137_heapster-v1.0.0-el2r7_kube-system_1994710e-0056-11e6-a3f3-42010af0003e_b63303ac
a449b69dd498        gcr.io/google_containers/pause:2.0                                     "/pause"                 21 hours ago        Up 21 hours                             k8s_POD.6059dfa2_heapster-v1.0.0-el2r7_kube-system_1994710e-0056-11e6-a3f3-42010af0003e_3a238507
b9eaaa1cae94        gcr.io/google_containers/fluentd-gcp:1.18                              "/bin/sh -c '/usr/sbi"   2 weeks ago         Up 2 weeks                              k8s_fluentd-cloud-logging.fe59dd68_fluentd-cloud-logging-gke-sg-etl-4ff0f964-node-jny8_kube-system_da7e41ef0372c29c65a24b417b5dd69f_dd3f0627
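A quick way to compare container start times at a glance, assuming the Docker version on the node supports these --format placeholders:

docker ps --format 'table {{.Names}}\t{{.CreatedAt}}\t{{.Status}}'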

I understand that the node might have been restarted.

Where do I look to understand why that happened? My interpretation is that it is normal, but I would still like a bit of insight (pods are "pets", not "cattle").

-- Evgeny Minkevich
google-kubernetes-engine

1 Answer

4/21/2016

I doubt that it's due to a machine reboot because then I'd expect fluentd-gcp to have restarted as well.

There are a few avenues you can explore in this situation, though none of them is guaranteed to explain what happened on its own. Here are some things you might try (see the command sketch after this list):

  • Run kubectl get pods -a, which will return all pods including ones that are no longer running. If pods had to be recreated by their controllers, you should be able to see the termination status of the ones that aren't running anymore.
  • SSH to the node and run last | grep boot to see when it was last booted.
  • SSH to the node and run docker ps -a to see all containers, including those that have stopped running. If there are some that stopped running, investigate them using docker logs or docker inspect.
  • SSH to the node and investigate the /var/log/kubelet.log file to see if it has any hints about why pods were restarted.
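Put together, a minimal diagnostic pass might look like this (<node-name> and <container-id> are placeholders; the kubelet log path is the GKE default):

kubectl get pods -a --all-namespaces     # include pods that are no longer running
gcloud compute ssh <node-name>           # then, on the node:
last | grep boot                         # last boot time
docker ps -a                             # all containers, including exited ones
docker logs <container-id>               # output of a stopped container
docker inspect <container-id>            # exit code, OOM-killed flag, restart count
tail -n 200 /var/log/kubelet.log         # kubelet hints about why pods restarted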
-- Alex Robinson
Source: StackOverflow