Humio: parsing Kubernetes pod logs as JSON

11/16/2021

I'm using Humio (https://www.humio.com) to aggregate logs sent by Kubernetes pods. I annotated some pods with humio-parser=json-for-action or humio-parser=json. The pod logs are valid JSON objects, like:

{"@timestamp":"2021-11-16T08:46:32.557Z","@version":"1","message":"HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@47ce61b9 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.","logger_name":"com.zaxxer.hikari.pool.PoolBase","thread_name":"http-nio-8080-exec-3","level":"WARN","level_value":30000}

The problem is that in the Humio console I can see the pod logs, but every line has a datetime followed by stdout F before the start of the JSON, which causes a parser error, as shown in the figure below:

[Figure: Humio problem]

The Humio Kubernetes deployment uses the official Helm chart (https://github.com/humio/humio-helm-charts), which in turn uses Fluent Bit for log discovery and parsing.

I suspect that I need to tweak the Fluent Bit configuration, but how do I do that?

-- Giovanni Silva
fluent-bit
humio
kubernetes

1 Answer

11/16/2021

I found an answer in https://github.com/microsoft/fluentbit-containerd-cri-o-json-log. The problem is that my container runtime is containerd, which requires a different parser than the default docker parser.
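
To illustrate why a JSON-based parser fails here, the following Python sketch compares the two on-disk line formats. Both sample lines are made up for illustration (they are not taken from the logs above): Docker's json-file driver writes each line as a JSON object, while containerd writes the CRI format, which puts a timestamp, stream, and tag in front of the raw payload.

```python
import json

# Illustrative lines only; the timestamps and payloads are assumptions.
# Docker json-file driver: the whole line is a JSON object.
docker_line = r'{"log":"{\"level\":\"WARN\"}\n","stream":"stdout","time":"2021-11-16T08:46:32.557123456Z"}'
# containerd (CRI format): "<time> <stream> <tag> <payload>".
cri_line = '2021-11-16T08:46:32.557123456Z stdout F {"level":"WARN"}'


def is_json(line: str) -> bool:
    """Return True if the whole line parses as a single JSON document."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


print(is_json(docker_line))  # True
print(is_json(cri_line))     # False
```

The second line is rejected because of the prefix, which matches the parser error seen in the Humio console.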

To fix the issue with the Humio Helm chart, we need the following values:

humio-fluentbit:
 parserConfig: |-
   [PARSER]
       Name apache
       Format regex
       Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name apache2
       Format regex
       Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name apache_error
       Format regex
       Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
   [PARSER]
       Name nginx
       Format regex
       Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name json
       Format json
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name docker
       Format json
       Time_Key time
       Time_Format %Y-%m-%dT%H:%M:%S.%L
       Time_Keep   On
   [PARSER]
       Name syslog
       Format regex
       Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
       Time_Key time
       Time_Format %b %d %H:%M:%S
   [PARSER]
       Name cri
       Format regex
       Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
       Time_Key time
       Time_Format %Y-%m-%dT%H:%M:%S.%L%z
 inputConfig: |-
   [INPUT]
     Name             tail
     Path             /var/log/containers/*.log
     Parser           cri
     Tag              kube.*
     Refresh_Interval 5
     Mem_Buf_Limit    5MB
     Skip_Long_Lines  On

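This adds the cri parser and overrides the parser in the input config, so that the leading timestamp, stream (stdout or stderr), and tag (such as F) are split off into their own fields and the JSON payload ends up in the log field.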
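
As a rough check of what the cri regex captures, here is a minimal Python sketch. It is not how Fluent Bit runs the parser: the regex is re-spelled with Python's (?P<name>...) group syntax, the helper name parse_cri_line is hypothetical, and the sample line is invented for illustration.

```python
import json
import re

# Python spelling of the cri parser regex: Fluent Bit writes named groups
# as (?<name>...), while Python's re module needs (?P<name>...).
CRI_LINE = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$"
)


def parse_cri_line(line: str) -> dict:
    """Split a CRI-formatted log line into time, stream, logtag and log."""
    match = CRI_LINE.match(line)
    if match is None:
        raise ValueError("not a CRI-formatted log line")
    return match.groupdict()


# Illustrative line; the prefix timestamp is an assumption.
sample = (
    "2021-11-16T08:46:32.557123456Z stdout F "
    '{"level":"WARN","logger_name":"com.zaxxer.hikari.pool.PoolBase"}'
)

fields = parse_cri_line(sample)
payload = json.loads(fields["log"])  # the remainder is plain JSON again
print(fields["stream"], fields["logtag"], payload["level"])  # stdout F WARN
```

Once the prefix is captured separately, the log field holds only the JSON object that the application wrote.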

-- Giovanni Silva
Source: StackOverflow