I'm using Humio (https://www.humio.com) to aggregate logs sent by Kubernetes pods. On some pods I annotated the logs with humio-parser=json-for-action or humio-parser=json. The pod logs are valid JSON objects like:
{"@timestamp":"2021-11-16T08:46:32.557Z","@version":"1","message":"HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@47ce61b9 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.","logger_name":"com.zaxxer.hikari.pool.PoolBase","thread_name":"http-nio-8080-exec-3","level":"WARN","level_value":30000}
The problem is that in the Humio console I can see the pod logs, but every line has a timestamp followed by stdout F before the start of the JSON, which causes a parser error, as seen in the figure below:
The Humio Kubernetes integration is installed with the official Helm chart (https://github.com/humio/humio-helm-charts), which in turn uses Fluent Bit for log discovery and parsing.
I suspect that I need to tweak the Fluent Bit configuration, but how do I do it?
I found an answer in https://github.com/microsoft/fluentbit-containerd-cri-o-json-log: the problem is that my container runtime is containerd, which writes logs in the CRI format and therefore requires a different parser than the default docker parser.
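To illustrate why the default parser fails, here is a minimal Python sketch (not Fluent Bit code). Both sample lines are shortened, made-up examples: the Docker json-file format wraps each log line in a JSON object, while containerd writes "<time> <stream> <tag> <message>", so a plain JSON parser rejects the whole line until the three prefix fields are removed.

```python
import json

# Hypothetical sample lines, shortened for illustration only.
# Docker json-file format: the whole line is one JSON object.
docker_line = '{"log":"{\\"level\\":\\"WARN\\"}\\n","stream":"stdout","time":"2021-11-16T08:46:32.557Z"}'
# CRI format (containerd): timestamp, stream, tag, then the raw message.
cri_line = '2021-11-16T08:46:32.557000000Z stdout F {"level":"WARN"}'


def parses_as_json(line: str) -> bool:
    """Return True if the complete line is a valid JSON document."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


docker_ok = parses_as_json(docker_line)  # the JSON parser accepts this line
cri_ok = parses_as_json(cri_line)        # the prefix makes the line invalid JSON

# Splitting off the three CRI prefix fields leaves the application's JSON.
time_field, stream, logtag, payload = cri_line.split(" ", 3)
payload_ok = parses_as_json(payload)

print(docker_ok, cri_ok, payload_ok)
```

The "F" tag marks a full line; as far as I understand the CRI log format, "P" marks a partial line that continues in the next entry.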
To fix the issue in the Humio Helm chart, we need the following values:
humio-fluentbit:
  parserConfig: |-
    [PARSER]
        Name apache
        Format regex
        Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name apache2
        Format regex
        Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name apache_error
        Format regex
        Regex ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
    [PARSER]
        Name nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name docker
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep On
    [PARSER]
        Name syslog
        Format regex
        Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key time
        Time_Format %b %d %H:%M:%S
    [PARSER]
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
  inputConfig: |-
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        Parser cri
        Tag kube.*
        Refresh_Interval 5
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
This adds the cri parser and overrides the parser used in the input config, so the timestamp, stream, and tag prefix is stripped before the JSON reaches Humio.
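As a quick sanity check of the cri regex, here is a rough Python translation. This is only a sketch: Fluent Bit uses Ruby-style named groups (?<name>...), which Python spells (?P<name>...), and the sample line is a shortened, hypothetical version of the log from the question.

```python
import json
import re

# Python translation of the cri parser regex from the config above.
# Only the named-group syntax differs: (?<name>...) becomes (?P<name>...).
CRI_PATTERN = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$"
)

# Hypothetical containerd log line, shortened from the example in the question.
sample = (
    "2021-11-16T08:46:32.557123456Z stdout F "
    '{"level":"WARN","message":"HikariPool-1 - Failed to validate connection"}'
)

match = CRI_PATTERN.match(sample)
if match is None:
    raise ValueError("line is not in CRI log format")

fields = match.groupdict()
# The "log" group now holds only the application's JSON payload.
record = json.loads(fields["log"])

print(fields["time"], fields["stream"], fields["logtag"])
print(record["level"], record["message"])
```

If the regex matches and the log group parses as JSON here, the same line should come out of the cri parser with a clean log field for the Humio JSON parser to handle.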