Fluentd grep + output logs

11/20/2018

I have a service deployed into a Kubernetes cluster, with Fluentd running as a DaemonSet, and I need to split the logs it receives so they end up in different S3 buckets. One bucket would hold all logs generated by Kubernetes and our debug/error-handling code, and another would hold the subset of logs generated by the service itself, parsed by a structured logger and identified by a specific field in the JSON. Think of it as one bucket for machine state and errors, and another for "user_id created resource image_id at ts" descriptions of user actions.

The service itself knows nothing about Fluentd, so I cannot manually set the tag on the logs based on which S3 bucket I want them to end up in. Currently, the fluentd.conf I use configures the S3 output like this:

<match **>
  # docs: https://docs.fluentd.org/v0.12/articles/out_s3
  # note: this configuration relies on the nodes having an IAM instance profile with access to your S3 bucket
  type copy
  <store>
    type s3
    log_level info
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    s3_region "#{ENV['S3_BUCKET_REGION']}"
    aws_key_id "#{ENV['AWS_ACCESS_KEY_ID']}"
    aws_sec_key "#{ENV['AWS_SECRET_ACCESS_KEY']}"
    s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
    format json
    time_slice_format %Y/%m/%d
    time_slice_wait 1m
    flush_interval 10m
    utc
    include_time_key true
    include_tag_key true
    buffer_chunk_limit 128m
    buffer_path /var/log/fluentd-buffers/s3.buffer
  </store>
  <store>
  ...
  </store>
</match>

So, what I would like to do is have something like a grep plugin:

<store>
  type grep
  <regexp>
    key type
    pattern client-action
  </regexp>
</store>

This would send the matching logs to a separate S3 bucket from the one defined for all logs.
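Conceptually, I think what I want is to copy the stream and grep one copy before it goes to a second bucket. A rough sketch of what I'm imagining (untested, written in Fluentd v1-style syntax; the @USER_ACTIONS label and the second bucket variable are placeholders I made up):

<match **>
  @type copy
  <store>
    @type relabel
    @label @ALL            # full stream: machine state, errors, everything
  </store>
  <store>
    @type relabel
    @label @USER_ACTIONS   # copy that gets filtered down to user actions
  </store>
</match>

<label @ALL>
  <match **>
    @type s3
    s3_bucket "#{ENV['S3_BUCKET_NAME']}"
    # ... same options as the existing store ...
  </match>
</label>

<label @USER_ACTIONS>
  <filter **>
    @type grep
    <regexp>
      key type
      pattern /client-action/
    </regexp>
  </filter>
  <match **>
    @type s3
    s3_bucket "#{ENV['S3_USER_ACTION_BUCKET']}"   # hypothetical second bucket
    # ...
  </match>
</label>

But I don't know whether this is the right approach for a DaemonSet setup.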

-- dgmt
amazon-s3
devops
fluent
kubernetes

1 Answer

11/20/2018

I am assuming that the user-action logs are generated by your service and that the system logs include the Docker, Kubernetes, and systemd logs from the nodes. I found your example YAML file in the official fluent GitHub repo. If you check out the folder in that link, you'll see two more files called kubernetes.conf and systemd.conf. These files contain source sections where they tag their data.

The match section in fluent.conf matches **, i.e. all logs, and sends them to S3. This is where you want to split your log types. Your container logs are tagged kubernetes.* in kubernetes.conf on this line.
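For reference, that tagging happens in a tail source roughly like the following (paraphrased from memory of the fluentd-kubernetes-daemonset config; exact paths and options may differ in your version):

<source>
  @type tail
  path /var/log/containers/*.log          # container stdout/stderr collected on the node
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*                        # the log file path gets appended to this tag
  format json
  read_from_head true
</source>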

So your config above turns into:

<match kubernetes.**>
  @type s3
  # user log s3 bucket
  ...
</match>

And for the system logs, match every other tag except the kubernetes ones.
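Since Fluentd evaluates match sections in order, a sketch of the split could look like this (bucket variable names are placeholders; note the double asterisk, because the tail source appends the log file path to the tag, so kubernetes.** is needed to match all of its parts):

# placed first: container / user-action logs go to their own bucket
<match kubernetes.**>
  @type s3
  s3_bucket "#{ENV['S3_USER_LOG_BUCKET']}"   # hypothetical second bucket
  # ... same format/buffer options as before ...
</match>

# everything not matched above (docker, systemd, kubelet, ...) falls through here
<match **>
  @type s3
  s3_bucket "#{ENV['S3_BUCKET_NAME']}"
  # ...
</match>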

-- Siddhesh Rane
Source: StackOverflow