Kubernetes pod/containers running but not listed with 'kubectl get pods'?

2/28/2018

I have an issue that, at face value, appears to indicate that I have two instances of my deployment's pod running in parallel within my kube cluster, but 'kubectl get pods' only shows one of them.

My deployment is composed of a pod with two containers. One container runs a golang application that exposes an HTTP API endpoint, and the other runs Telegraf to read metrics from that endpoint and push them to InfluxDB. When writing the data to Influx, I tag each point with the pod name as the source host. I use Grafana to plot the metrics, and I can clearly see streaming data coming in from two hosts (e.g. I can set a "WHERE host=" query clause to either "application-pod-name-231620957-7n32f" or "application-pod-name-1931165991-x154c").
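For context, the query behind that Grafana panel is roughly the following (the database and measurement names here are placeholders rather than my exact schema):

    influx -database 'telegraf' -execute \
      'SELECT mean("value") FROM "http_api_metrics" WHERE time > now() - 1h GROUP BY "host"'
    # Returns one series per "host" tag value -- which is where both pod names show up.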

Based on the above, I'm fairly certain that two instances of the pod are running, each with the two containers (one serving application metrics and the other running Telegraf to send them to InfluxDB).

However, kube seems to think that one of them doesn't exist. As mentioned, "kubectl get pods" doesn't display the 2nd pod name in any way, shape, or form; it only lists one of them.

Has anyone seen this? Any ideas on further troubleshooting? I've attempted to use the pod name (the one I see in Telegraf) to query more information with kubectl, but I always get a response that the pod doesn't exist... yet it must exist, because it's sending live data!
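For what it's worth, these are the sorts of lookups I've been trying (the pod name comes straight from the Telegraf host tag):

    kubectl get pods --all-namespaces -o wide | grep application-pod-name
    kubectl describe pod application-pod-name-1931165991-x154c
    # Both come back empty / "pods ... not found" for the second pod name.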

-- theoneandonly2
kubectl
kubernetes
telegraf

1 Answer

3/1/2018

We had been experiencing issues with a node in the cluster. Specifically, the node was hitting GC failures, and communication from that node into the cluster was broken. Because of these failures, someone on our team ran 'kubectl delete' against the node from within the cluster. Deleting the Node object removed it from Kubernetes, but the machine itself kept running, and since the kubelet on it remained in a broken state, the node couldn't automatically re-register itself with the cluster.

That node happened to be running the 2nd pod, and the pods on it continued running without issue, which is why they kept reporting metrics while being invisible to kubectl. In our case the node was an AWS instance, and the way to get out of (or avoid) this situation is to reboot the node, either from the AWS console or via the AWS API.
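To sketch the sequence of events (the node name and instance id below are placeholders, not our actual values):

    # The broken node was removed from the API instead of being rebooted:
    kubectl get nodes                     # the affected node was in a broken state (GC failures)
    kubectl delete node <node-name>       # removes the Node object from the cluster

    # The containers on that machine kept running and pushing metrics, but
    # 'kubectl get pods' could no longer list them. Rebooting the instance,
    # e.g. via the AWS CLI, lets the kubelet come back up and re-register the node:
    aws ec2 reboot-instances --instance-ids <instance-id>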

-- theoneandonly2
Source: StackOverflow