We are trying Fluent Bit because we could not manage to remove color codes from multiline text with Fluentd.
On Kubernetes, I have an application that writes Java error traces to STDOUT/STDERR, and Kubernetes stores those logs under /var/log/containers/* in JSON format.
I managed to build the multiline filter successfully, but I cannot remove the color codes from the output.
I followed the instructions in https://github.com/fluent/fluent-bit/issues/1278#issuecomment-499583503 with no success, even though they are for version 1.2 and we are using 1.5, where it is recommended to remove the Format encoding.
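For context, this is roughly what the stock docker parser that ships with Fluent Bit looks like (reproduced from memory, so it may differ slightly from the actual file); as far as I understand it, the Decode_Field_As lines on the log field are the part that linked comment revolves around:

[PARSER]
    # JSON parser for Docker/Kubernetes container logs (sketch from memory)
    Name        docker
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # decode the escaped "log" field, then try to parse it as JSON
    Decode_Field_As escaped_utf8 log do_next
    Decode_Field_As json         log

Please find the example log entry, the resulting field in Kibana, and my Fluent Bit config below.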
Log entry:
{"log":"\u001b[2m2021-06-03 15:34:27.056\u001b[0;39m \u001b[31mERROR [account,9decd63637c3167b,9decd63637c3167b,true]\u001b[0;39m \u001b[35m1\u001b[0;39m \u001b[2m---\u001b[0;39m \u001b[2m[nio-8080-exec-4]\u001b[0;39m \u001b[36md.q.a.c.c.GenericControllerAdvice \u001b[0;39m \u001b[2m:\u001b[0;39m An exception occurred: GenericResponse(status=400, code=1.2, message=Invalid object, path=/)\r\n","stream":"stdout","time":"2021-06-03T15:34:27.056731062Z"}
Log field output in Kibana:
�[0;39m �[35m1�[0;39m �[2m---�[0;39m �[2m[nio-8080-exec-4]�[0;39m �[36md.q.a.c.c.GenericControllerAdvice �[0;39m �[2m:�[0;39m An exception occurred: GenericResponse(status=400, code=1.2, message=Invalid object, path=/)
As you can see, color codes such as "[0;39m", "[35m", and "[2m" are saved in ES with or without the Format field set to "encode", "encode_utf8", or "json" in the applied parser.
I also tried creating a filter with the regex "([\s\S]*?)(\[[0-9]m|\[[0-9][0-9]m|\[0;[0-9]{2}m|\[[0-9]{2}m|)" (the "java" parser shown below), which matches as expected on regex testers, but unfortunately the logs arrive in Kibana empty when this filter is used.
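One direction we have been considering, but have not tested yet, is stripping the escape sequences with a lua filter instead of trying to match them away in a regex parser. A minimal, untested sketch follows (the script name strip_ansi.lua and the function name strip_colors are placeholders, not something we already have in place):

[FILTER]
    Name    lua
    Match   kube.*
    script  strip_ansi.lua
    call    strip_colors

strip_ansi.lua:

-- Remove ANSI color sequences (ESC [ ... m) from the "log" field.
function strip_colors(tag, timestamp, record)
    if record["log"] ~= nil then
        -- \27 is the ESC byte, so this matches fragments like "\27[0;39m"
        record["log"] = string.gsub(record["log"], "\27%[[0-9;]*m", "")
    end
    -- return 1 = record modified; keep the original timestamp
    return 1, timestamp, record
end

That said, we would prefer to get the regex parser approach working if possible.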
Here is the config we are currently working with:
fluent-bit.conf: |
[SERVICE]
Flush 1
Log_Level debug
Daemon off
Parsers_File custom-parsers.conf
[INPUT]
Name tail
Tag kube.*
Path /var/log/containers/*service*.log
Multiline On
#Multiline_Flush 5
Parser_Firstline firstline
Parser_1 line_0
[OUTPUT]
Name es
Match kube.*
Host ${FLUENT_ELASTICSEARCH_HOST}
Port ${FLUENT_ELASTICSEARCH_PORT}
Logstash_DateFormat ${FLUENT_ELASTICSEARCH_INDEX_DATE_FORMAT}
Logstash_Format On
Retry_Limit False
Logstash_Prefix ${FLUENT_ELASTICSEARCH_INDEX}
Time_Key time
Generate_ID On
[FILTER]
Name parser
Match kube.*
Key_Name log
Parser java
custom-parsers.conf: |
[PARSER]
Name firstline
Format regex
Regex ^(\{"log":")?(\\[a-z]\d{3}[a-z]\[\dm)?(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})(\\[a-z]\d{3}[a-z]\[[\w; ]+)? (\\[\w]+\[[\w]+)? ?([\w]\\\[+)?(?<level>WARN|INFO|ERROR) \[(?<service>[a-z]+),([\w]+)?,([\w]+)?,(true|false)?\](?<log>([\s\S]*?)(\[[0-9]m|\[[0-9][0-9]m|\[0;[0-9]{2}m|\[[0-9]{2}m|))","stream":.*$
Time_Key time
Time_Format %Y-%m-%d %H:%M:%S.%L
#Decode_Field_As json log do_next
Decode_Field_As escaped_utf8 log
[PARSER]
Name line_0
Format regex
Regex ^(\{"log":")?(\\[a-z]\d{4})?(?<log>([\s\S]*?)(\[[0-9]m|\[[0-9][0-9]m|\[0;[0-9]{2}m|\[[0-9]{2}m|))\\r\\n","stream":.*$
#Decode_Field_As json log do_next
Decode_Field_As escaped_utf8 log
[PARSER]
Name java
Format regex
Regex ([\s\S]*?)(\[[0-9]m|\[[0-9][0-9]m|\[0;[0-9]{2}m|\[[0-9]{2}m|)