Good day,
I'm running a single-node cluster. I noticed that overnight all my pods went missing.
kubectl get events got me nothing.
Checking the node:
# kubectl get no
NAME STATUS AGE
gke-sg-etl-4ff0f964-node-jny8 Ready 20d
Checking the containers on the node, I noticed that some system containers have been up for only 21 hours, while others have been up for 2 weeks:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b451ef51174d gcr.io/google_containers/glbc:0.6.0 "/glbc --default-back" 21 hours ago Up 21 hours k8s_l7-lb-controller.96ad8505_l7-lb-controller-v0.6.0-keutx_kube-system_5ecb8fe7-0054-11e6-a3f3-42010af0003e_9af0cd81
86483feba88c gcr.io/google_containers/defaultbackend:1.0 "/server" 21 hours ago Up 21 hours k8s_default-http-backend.33869026_l7-lb-controller-v0.6.0-keutx_kube-system_5ecb8fe7-0054-11e6-a3f3-42010af0003e_1fffdcaf
73bc0bbd18a1 gcr.io/google_containers/pause:2.0 "/pause" 21 hours ago Up 21 hours k8s_POD.364e00d5_l7-lb-controller-v0.6.0-keutx_kube-system_5ecb8fe7-0054-11e6-a3f3-42010af0003e_6b0f3678
95f0bdb6b87c gcr.io/google_containers/exechealthz:1.0 "/exechealthz '-cmd=n" 21 hours ago Up 21 hours k8s_healthz.2bec1471_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_4e729ced
efde9c110e3c gcr.io/google_containers/skydns:2015-10-13-8c72f8c "/skydns -machines=ht" 21 hours ago Up 21 hours k8s_skydns.66853ac4_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_10c173ea
0db98a8b6b83 gcr.io/google_containers/kube2sky:1.14 "/kube2sky --domain=c" 21 hours ago Up 21 hours k8s_kube2sky.4e15015f_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_23182cb4
c103d90e1bd9 gcr.io/google_containers/etcd-amd64:2.2.1 "/usr/local/bin/etcd " 21 hours ago Up 21 hours k8s_etcd.6d563523_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_987562c7
3b21c42444de gcr.io/google_containers/pause:2.0 "/pause" 21 hours ago Up 21 hours k8s_POD.e2764897_kube-dns-v11-ce2vc_kube-system_1c42c00f-0056-11e6-a3f3-42010af0003e_08f0734a
7e642f5a1fe0 gcr.io/google_containers/kubernetes-dashboard-amd64:v1.0.0 "/dashboard --port=90" 21 hours ago Up 21 hours k8s_kubernetes-dashboard.deca92bd_kubernetes-dashboard-v1.0.0-whzec_kube-system_19ab34c7-0056-11e6-a3f3-42010af0003e_433eec1f
2d0f5f11ad65 gcr.io/google_containers/pause:2.0 "/pause" 21 hours ago Up 21 hours k8s_POD.3a1c00d7_kubernetes-dashboard-v1.0.0-whzec_kube-system_19ab34c7-0056-11e6-a3f3-42010af0003e_0dfc9856
c210ea10b8ea gcr.io/google_containers/heapster:v1.0.0 "/heapster --source=k" 21 hours ago Up 21 hours k8s_heapster.ce50f137_heapster-v1.0.0-el2r7_kube-system_1994710e-0056-11e6-a3f3-42010af0003e_b63303ac
a449b69dd498 gcr.io/google_containers/pause:2.0 "/pause" 21 hours ago Up 21 hours k8s_POD.6059dfa2_heapster-v1.0.0-el2r7_kube-system_1994710e-0056-11e6-a3f3-42010af0003e_3a238507
b9eaaa1cae94 gcr.io/google_containers/fluentd-gcp:1.18 "/bin/sh -c '/usr/sbi" 2 weeks ago Up 2 weeks k8s_fluentd-cloud-logging.fe59dd68_fluentd-cloud-logging-gke-sg-etl-4ff0f964-node-jny8_kube-system_da7e41ef0372c29c65a24b417b5dd69f_dd3f0627
I understand that the node might have been restarted.
Where do I look to understand why that happened? My interpretation is that it is normal, but I would still like a bit of insight (these pods are "pets", not "cattle").
I doubt that it's due to a machine reboot because then I'd expect fluentd-gcp to have restarted as well.
There are a few avenues you can explore in this situation, but there isn't a single place that will always tell you why. Here are some things you might try (a combined sketch of these commands follows the list):
- Run kubectl get pods -a, which will return all pods, including ones that are no longer running. If pods had to be recreated by their controllers, you should be able to see the termination status of the ones that aren't running anymore.
- Run last | grep boot on the node to see when it was last booted.
- Run docker ps -a on the node to see all containers, including those that have stopped running. If some have stopped, investigate them using docker logs or docker inspect.
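Pulled together, the checks above might look something like this when run on the node. This is only a sketch: the --filter and --format flags assume a Docker version recent enough to support them, and newer kubectl releases may reject -a/--show-all, since terminated pods are listed there by default.

# all pods across namespaces, including ones that are no longer running
kubectl get pods -a --all-namespaces

# did the machine itself reboot?
last | grep boot

# most recently exited container, if any, and why it stopped
CID=$(docker ps -a --filter "status=exited" --format "{{.ID}}" | head -n 1)
[ -n "$CID" ] && docker logs "$CID"
[ -n "$CID" ] && docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}} {{.State.FinishedAt}}' "$CID"

The inspect line is often the quickest tell: a non-zero exit code or OOMKilled=true points at the container itself, while every container sharing the same start time (as in your listing) points at something node-level, like a kubelet or docker daemon restart.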