Fluentd in Kubernetes DaemonSet selectively parsing different logs

9/19/2018

So the basic architecture is a Fluentd DaemonSet scraping Docker logs from pods, set up by following this blog post, which ultimately makes use of these resources.

I have multiple pods and services running in the cluster and I can't control their log format. However, I do have control over the services I'm working on, and I would like them to use a JSON output format that is compatible with Logstash (key-value).

Now I don't know if there's a way to selectively parse certain logs through this extra step, because some of the services won't have this log format. Of course, I want this data to be accessible to Elasticsearch/Kibana for data visualization.

Example JSON-formatted log (Rails app deployed): (screenshot)

Example non-JSON-formatted log (CI/CD service): (screenshot)

The Fluentd-specific configs can be found in this file.

Any help/suggestions are greatly appreciated.

EDIT 1: More details provided

So the problem here is best shown by walking through how a JSON log is extracted:

# JSON Log Example
{"log":"2014/09/25 21:15:03 Got request with path wombat\n", "stream":"stderr", "time":"2014-09-25T21:15:03.499185026Z"}

The logs are read from the Docker logs on each node using the tail input type:

<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag raw.kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%N%:z
    </pattern>
  </parse>
</source>

The Kubernetes metadata plugin then adds extra data:

{
  "log":"2014/09/25 21:15:03 Got request with path wombat\n",
  "stream":"stderr",
  "time":"2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "synthetic-logger-0.25lps-pod",
    "container_name": "synth-lgr"
  },
  "docker": {
    "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
  }
}
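For reference, the metadata enrichment shown above comes from a filter along these lines (a sketch based on the standard kubernetes_metadata filter plugin; the exact tag prefix depends on the DaemonSet configs linked above):

<filter kubernetes.**>
  # Enriches each record with pod/namespace/container metadata
  # looked up from the Kubernetes API
  @type kubernetes_metadata
</filter>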

However, I have a log that looks like the following:

{"method":"GET","path":"/","format":"html","controller":"HomeController","action":"index","status":200,"duration":1922.45,"view":1912.5,"db":883.41,"request_ip":"127.0.0.1","@timestamp":"2018-09-20T16:43:17.322Z","@version":"1","message":"[200] GET / (HomeController#index)"}

The problem with this is that the entire log ends up "jammed" inside the "log" field, as you can see in the first screenshot posted above. I would guess that with the Kubernetes metadata it would look something like this:

{
  "log":"{\"method\":\"GET\",\"path\":\"/\",\"format\":\"html\",\"controller\":\"HomeController\",\"action\":\"index\",\"status\":200,\"duration\":1922.45,\"view\":1912.5,\"db\":883.41,\"request_ip\":\"127.0.0.1\",\"@timestamp\":\"2018-09-20T16:43:17.322Z\",\"@version\":\"1\",\"message\":\"[200] GET / (HomeController#index)\"}",
  "stream":"stdout",
  "time":"2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "rails-pod",
    "container_name": "rails-app"
  },
  "docker": {
    "container_id": "801599971ee6366d4a5921f25b79286ad45ff37a74494f260c3bc98d909d0a7b"
  }
}

I'm looking for a way to filter/parse/extract that data into something similar to this:

{
  "rails-log": {
    "method": "GET",
    "path": "/",
    "format": "html",
    "controller": "HomeController",
    "action": "index",
    ...
    "message": "[200] GET / (HomeController#index)"
  },
  "stream":"stdout",
  "time":"2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "rails-pod",
    "container_name": "rails-app"
  },
  "docker": {
    "container_id": "801599971ee6366d4a5921f25b79286ad45ff37a74494f260c3bc98d909d0a7b"
  }
}

I have a feeling this can be achieved by parsing/filtering; I'm just unsure how to express that in the Fluentd configs (complete current configs found here).
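For what it's worth, here is a sketch of the kind of filter I imagine could do this, using Fluentd's built-in parser filter (the field name rails-log is my own placeholder; as I understand it, reserve_data plus emit_invalid_record_to_error false should let non-JSON logs pass through untouched, though I haven't verified this against the full configs):

<filter kubernetes.**>
  @type parser
  # Try to parse the "log" field as JSON
  key_name log
  # Keep the other fields (stream, time, kubernetes, docker) on the record
  reserve_data true
  # On success, drop the raw "log" string
  remove_key_name_field true
  # On failure (non-JSON logs), keep the original record instead of erroring
  emit_invalid_record_to_error false
  # Nest the parsed keys under "rails-log"
  hash_value_field rails-log
  <parse>
    @type json
  </parse>
</filter>

With this, records whose "log" field parses as JSON would get the parsed keys nested under "rails-log", while everything else would flow through unchanged.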

After the Kubernetes metadata plugin runs (adding Kubernetes-specific metadata to the payload), I can select the logs from my specific service in Kibana with the filter kubernetes.labels.app is '<service_name>'. This leads me to believe there's a way to do a conditional extraction/transformation of the "log" field into something I can work with.
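If the extraction needs to be restricted to that same service, one idea would be to retag its records first using fluent-plugin-rewrite-tag-filter (a sketch; <service_name> is the same placeholder as above, and the $.kubernetes.labels.app record-accessor syntax needs a reasonably recent plugin version):

<match kubernetes.**>
  @type rewrite_tag_filter
  <rule>
    # Records from my service get a rails. tag prefix
    key $.kubernetes.labels.app
    pattern /^<service_name>$/
    tag rails.${tag}
  </rule>
  <rule>
    # Catch-all so other records aren't dropped
    key $.kubernetes.labels.app
    pattern /.*/
    tag other.${tag}
  </rule>
</match>

Then any JSON-parsing filter could be scoped to rails.** tags only.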

-- Fdo
elasticsearch
fluentd
kibana
kubernetes
logging

0 Answers