fluentbit get container_name from log file name to use as a custom field

2/15/2022

I need to extract the part of the log file name that identifies the container and use it as a field in the Fluent Bit output.

For example, given a log file name:

kube.default.var.log.containers.xml-builder-66587b7696-ns9bq_default_xml-builder-ded2966c8929ad811b9468916a071b6fbb445034ac014e28af23654c1ba4ca4a.log

I would like to extract and use this part:

xml-builder-66587b7696

In the documentation I saw that Tag and Tag_Regex can be used, but it is not clear how to extract fields from them.

Below is a part of my fluentbit configuration:

data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-kafka.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.default.*
        Path              /var/log/containers/*default*.log
        Parser            docker
        Tag              kube.<namespace_name>.<pod_name>.<container_name>
        Tag_Regex        (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

My question: how can I use the fields captured by Tag and Tag_Regex in a filter that modifies the record?

[FILTER]
    Name record_modifier
    Match *
    Record custom_field <what to add here?>

The configuration below does not seem to work:

[FILTER]
    Name record_modifier
    Match *
    Record custom_field kube.<namespace_name>.<pod_name>.<container_name>

Is there a way to extract container_name from the log file name, or should I use another approach (for example, the kubernetes filter, followed by customizing the output by adding to or modifying the metadata it returns)?
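
To make the goal concrete, here is a minimal Python sketch (outside Fluent Bit) of what a Tag_Regex-style pattern captures from the sample name above. It assumes the file name has the shape pod_namespace_container-id.log with a 64-character container id, and it uses Python's (?P<name>...) spelling where the Fluent Bit config writes (?<name>...). The function name parse_container_log_name and the pod_prefix field are hypothetical, introduced only for this illustration.

```python
import re

# Base name of the sample log file from the question, i.e. the part after
# the "kube.default.var.log.containers." tag prefix.
FILE_NAME = (
    "xml-builder-66587b7696-ns9bq_default_"
    "xml-builder-ded2966c8929ad811b9468916a071b6fbb445034ac014e28af23654c1ba4ca4a.log"
)

# Assumed naming scheme: <pod_name>_<namespace_name>_<container_name>-<container_id>.log
# Python named groups are written (?P<name>...); the Fluent Bit config uses (?<name>...).
NAME_RE = re.compile(
    r"^(?P<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?)"
    r"_(?P<namespace_name>[^_]+)"
    r"_(?P<container_name>.+)-(?P<container_id>[a-z0-9]{64})\.log$"
)


def parse_container_log_name(file_name):
    """Return the captured fields as a dict, or None if the name does not match."""
    match = NAME_RE.match(file_name)
    if match is None:
        return None
    keys = ("pod_name", "namespace_name", "container_name", "container_id")
    fields = {key: match.group(key) for key in keys}
    # Hypothetical extra field: the pod name without its last dash-separated
    # suffix, which is the value the question asks for.
    fields["pod_prefix"] = fields["pod_name"].rsplit("-", 1)[0]
    return fields


if __name__ == "__main__":
    for key, value in parse_container_log_name(FILE_NAME).items():
        print(f"{key}: {value}")
```

Under these assumptions, the requested value xml-builder-66587b7696 is the pod name minus its final suffix, while container_name on its own is only xml-builder.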

Thank you.

-- Alex Konkin
fluent-bit
kubernetes
logging

1 Answer

2/15/2022

I am not sure whether the approach from my question (extracting container_name from the log file name) is doable. However, the same information is also available as metadata inside the Kubernetes cluster.

I found the hint for this solution in the following question:

https://stackoverflow.com/questions/65382688/fluentbit-kubernetes-how-to-add-kubernetes-metadata-in-application-logs-which

Additional information about the lift operation of the Fluent Bit nest filter can be found here:

https://docs.fluentbit.io/manual/pipeline/filters/nest

The logic of the solution:

-use the kubernetes filter, which adds container_name to the log record

-use the lift operation of the nest filter to move the keys stored under the kubernetes key of the log record one level up

-use container_name as needed
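
The effect of these steps on a single record can be sketched in Python. This is only a simulation of the idea, not Fluent Bit's implementation: the helper names lift and copy_key are hypothetical, and the sample record is made up for illustration (it is not captured output).

```python
def lift(record, nested_under):
    """Sketch of a lift: move the keys of record[nested_under] to the top level."""
    nested = record.get(nested_under)
    if not isinstance(nested, dict):
        # Nothing to lift; return an unchanged copy.
        return dict(record)
    lifted = {key: value for key, value in record.items() if key != nested_under}
    lifted.update(nested)
    return lifted


def copy_key(record, src, dst):
    """Sketch of a copy: duplicate src into dst if src exists and dst does not."""
    result = dict(record)
    if src in result and dst not in result:
        result[dst] = result[src]
    return result


# Illustrative record, shaped like the output of a filter that nests its
# metadata under a "kubernetes" key.
sample = {
    "log": "example log line",
    "kubernetes": {
        "pod_name": "xml-builder-66587b7696-ns9bq",
        "namespace_name": "default",
        "container_name": "xml-builder",
    },
}

if __name__ == "__main__":
    flat = lift(sample, "kubernetes")
    print(copy_key(flat, "container_name", "container_identity"))
```

After the lift, container_name sits at the top level of the record, so a later step can refer to it by its plain key name.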

Here is the resulting Fluent Bit ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-stdout.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*default*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

    [FILTER]
        Name nest
        Match *
        Operation lift
        Nested_under kubernetes

    [FILTER]
        Name modify
        Match *
        Copy container_name container_identity

  output-stdout.conf: |
    [OUTPUT]
        Name stdout
        Match *

  parsers.conf: |
    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
-- Alex Konkin
Source: StackOverflow