Currently, I have the following architecture in Kubernetes:
logger
) is running./var/log/containers/*_logger-*.log
, which is a link to another file (the latest file since the last rotation; no link points to the older files).
This configuration works as expected for thousands of messages.
However, here is the problem:
I noticed that fluentd sometimes forwards only 1 or 2 of the 3 log messages that belong together, although all 3 messages are written by the service and the sidecar container.
For the explanation, assume message 1 is forwarded and messages 2 and 3 are not. After some research, I found that in such cases message 1 is the last message before the log rotates, and messages 2 and 3 are in another file (the one the symbolic link points to since the rotation, and which should therefore be read).
So it looks like fluentd skips some lines before it continues reading from the new file after the Kubernetes log rotation.
I am using the docker-image fluent/fluentd-kubernetes-daemonset:v0.12.33-elasticsearch
If more information is required, please let me know.
TLDR:
In theory this should work with the latest version of fluentd-kubernetes-daemonset. If it does not, the default value of rotate_wait will probably need to be overridden for the in_tail_container_logs configuration because of timing issues.
To do so, you'll need to create a custom Docker image that overwrites the kubernetes.conf file, or use a ConfigMap with your custom config, mount it in the container, and set FLUENT_CONF to the main config file in the mounted directory.
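As a rough sketch of what the overridden source in a custom kubernetes.conf might look like: the only line that matters here is rotate_wait, and the value 30 is an arbitrary assumption, not a recommendation. The remaining parameters are assumptions modeled on a typical v0.12-style tail source; keep whatever your image's kubernetes.conf already contains and only add the rotate_wait line.

```
<source>
  @type tail
  @id in_tail_container_logs
  # Assumed values below; copy the ones from your image's kubernetes.conf.
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  format json
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  # Keep reading the rotated file for 30 seconds (default is 5)
  # before switching fully to the new file. 30 is an assumed value.
  rotate_wait 30
</source>
```

If lines are still missing after a rotation, it may be worth trying a larger value, since the right setting depends on how quickly the logs rotate on your nodes.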
Explanation:
The Docker process reads from both stdout and stderr of a container. While flushing the streams to the log file, it also keeps track of the configured limits. When a limit is reached, it starts the log rotation.
At the same time, fluentd is watching the symlink. When the symlink changes, fluentd's file watcher is triggered to update its internal pointer to the actual log file and to reset the position in the pos file, because the newly created log file is empty.
Using the config parameter rotate_wait, we tell fluentd to wait for the set number of seconds (defaults to 5) so that the last log lines that have been flushed to the file (or soon will be) can be picked up before it continues with the newly created log file. This also makes sure that the log lines are processed in the correct order.