Logging solutions for Kubernetes (GKE)

7/27/2020

I'm looking to capture logs from a pod in Kubernetes for two use cases: 1) Realtime, for which I'm currently using kubectl logs; 2) Not realtime, for which I'm using Stackdriver to pipe the logs to BigQuery.

Both use cases work. However, when a container exits early due to an error, I lose the logs (i.e. Stackdriver doesn't pick them up fast enough).

Is this latency documented somewhere? And assuming Stackdriver isn't fast enough, is there another logging solution that would prove more effective? I'm considering a sidecar container that captures logs, but I'm not sure whether this is the best approach.

-- Jay K.
google-kubernetes-engine
kubernetes
logging
monitoring
sidecar

1 Answer

7/27/2020

The logging stack on GKE uses fluentd to pick up the logs from the stdout and stderr streams that the container runtime writes to files on the nodes, as shown in the node logging agent approach.

This isn't much different from what you do when you use kubectl logs:

When you run kubectl logs as in the basic logging example, the kubelet on the node handles the request and reads directly from the log file, returning the contents in the response.

Your issue doesn't sound like Stackdriver being too slow. Rather, it sounds like your container runtime is, for some reason, not writing logs to the aforementioned log file, which is where fluentd picks them up before exporting them.

Before changing the logging architecture, you might want to determine the reasons for pod failure and even customize the termination message path in order to later retrieve it with a custom fluentd log collector.
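As a rough illustration of the termination message idea, here is a minimal Python sketch of an entrypoint wrapper that records the failure reason before the container exits. It assumes the container keeps the default terminationMessagePath of /dev/termination-log; the helper names `write_termination_message` and `run` are hypothetical, not part of any Kubernetes client library.

```python
import sys
import traceback

# Default terminationMessagePath for a container (assumed unchanged here).
TERMINATION_LOG = "/dev/termination-log"


def write_termination_message(exc, path=TERMINATION_LOG):
    """Write a one-line summary of the failure to the termination log."""
    message = "".join(traceback.format_exception_only(type(exc), exc)).strip()
    try:
        with open(path, "w") as f:
            # The kubelet truncates long messages, so keep this brief.
            f.write(message[:4096])
    except OSError:
        # Outside a cluster the path may not be writable; do not mask
        # the original error because of that.
        pass
    return message


def run(main, path=TERMINATION_LOG):
    """Run main(); on failure record the error and return exit code 1."""
    try:
        main()
        return 0
    except Exception as exc:
        write_termination_message(exc, path)
        # Also emit the full traceback on stderr for the node log file.
        traceback.print_exc(file=sys.stderr)
        sys.stderr.flush()
        return 1
```

A program would typically end with `sys.exit(run(main))`. The recorded message then shows up in the terminated container's status (the `terminated.message` field), where it stays available after the process is gone. Setting terminationMessagePolicy to FallbackToLogsOnError in the container spec is an alternative that needs no code change.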

If this doesn't suit your needs, you can try Elasticsearch instead.

As for the sidecar approach, while it's completely feasible, the official documentation warns about some drawbacks of this approach:

Using a logging agent in a sidecar container can lead to significant resource consumption. Moreover, you won't be able to access those logs using kubectl logs command, because they are not controlled by the kubelet.

Finally, you should also consider that all of the above assumes that the container actually gets created and is able to write to the log file. If your containers are having "early exits", meaning that they aren't even created, then the logs might not be there to begin with, and Stackdriver will never pick them up.

Edit:

You should also consider that a failing container needs to actually write its output to stdout and stderr. If it's failing "silently", the failure won't be reflected in Stackdriver either.
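For a Python workload, one way to make sure failures are not silent is to route all messages through a logger attached to stderr. The sketch below assumes a Python application; the logger name "app" and the helper names are placeholders. Python's logging.StreamHandler flushes after every record, so each line reaches the container runtime as soon as it is emitted rather than sitting in a buffer when the process dies.

```python
import logging
import sys


def configure_logging(stream=None):
    """Attach a handler that writes to stderr (the default when stream is None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("app")  # placeholder logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def install_excepthook(logger):
    """Report uncaught exceptions through the same logger and format."""
    def hook(exc_type, exc, tb):
        logger.critical("uncaught exception", exc_info=(exc_type, exc, tb))
        for h in logger.handlers:
            h.flush()
    sys.excepthook = hook
    return hook
```

Calling `install_excepthook(configure_logging())` at startup is enough for this sketch. If the application uses plain print statements instead, passing `flush=True` or setting the PYTHONUNBUFFERED environment variable has a similar effect.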

-- yyyyahir
Source: StackOverflow