So the basic architecture is a Fluentd DaemonSet scraping Docker logs from pods, set up by following this blog post, which in the end makes use of these resources.
I have multiple pods and services running in the cluster, and I can't control their log format. However, I do have control over the services I'm working on, and I would like them to use a JSON output format that is compatible with Logstash (key-value pairs).
Now, I don't know if there's a way to selectively run certain logs through this extra parsing step, because some of the services won't use this log format. Of course, I want this data to be accessible to Elasticsearch/Kibana for visualization.
Example JSON-formatted log (deployed Rails app):
Example non-JSON-formatted log (CI/CD service):
The Fluentd-specific configs can be found in this file.
Any help/suggestions are greatly appreciated.
So the problem here is that a JSON log is extracted through the following steps:
# JSON Log Example
{"log":"2014/09/25 21:15:03 Got request with path wombat\n", "stream":"stderr", "time":"2014-09-25T21:15:03.499185026Z"}
The logs are read from the Docker logs on each node using the tail input type:
<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag raw.kubernetes.*
  read_from_head true
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </pattern>
    <pattern>
      format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%N%:z
    </pattern>
  </parse>
</source>
The Kubernetes metadata plugin then adds extra data to the record:
{
  "log": "2014/09/25 21:15:03 Got request with path wombat\n",
  "stream": "stderr",
  "time": "2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "synthetic-logger-0.25lps-pod",
    "container_name": "synth-lgr"
  },
  "docker": {
    "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
  }
}
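(For reference, the "kubernetes" and "docker" fields above come from the fluent-plugin-kubernetes_metadata_filter; in the sample configs it is enabled with a filter block roughly like this:)

<filter kubernetes.**>
  @type kubernetes_metadata
</filter>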
However, I have a log that looks like the following:
{"method":"GET","path":"/","format":"html","controller":"HomeController","action":"index","status":200,"duration":1922.45,"view":1912.5,"db":883.41,"request_ip":"127.0.0.1","@timestamp":"2018-09-20T16:43:17.322Z","@version":"1","message":"[200] GET / (HomeController#index)"}
The problem with this is that the entire log gets "jammed" inside the "log" field, as you can see in the first screenshot posted above. I would guess that with the Kubernetes metadata it would look something like this:
{
  "log": "{\"method\":\"GET\",\"path\":\"/\",\"format\":\"html\",\"controller\":\"HomeController\",\"action\":\"index\",\"status\":200,\"duration\":1922.45,\"view\":1912.5,\"db\":883.41,\"request_ip\":\"127.0.0.1\",\"@timestamp\":\"2018-09-20T16:43:17.322Z\",\"@version\":\"1\",\"message\":\"[200] GET / (HomeController#index)\"}",
  "stream": "stdout",
  "time": "2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "rails-pod",
    "container_name": "rails-app"
  },
  "docker": {
    "container_id": "801599971ee6366d4a5921f25b79286ad45ff37a74494f260c3bc98d909d0a7b"
  }
}
I'm looking for a way to filter/parse/extract that data into something similar to this:
{
  "rails-log": {
    "method": "GET",
    "path": "/",
    "format": "html",
    "controller": "HomeController",
    "action": "index",
    ...
    "message": "[200] GET / (HomeController#index)"
  },
  "stream": "stdout",
  "time": "2014-09-25T21:15:03.499185026Z",
  "kubernetes": {
    "namespace": "default",
    "pod_name": "rails-pod",
    "container_name": "rails-app"
  },
  "docker": {
    "container_id": "801599971ee6366d4a5921f25b79286ad45ff37a74494f260c3bc98d909d0a7b"
  }
}
I have a feeling this can be achieved with a parse/filter step; I'm just unsure how to go about putting it inside the Fluentd configs (complete current configs found here).
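Reading the Fluentd docs, the built-in parser filter looks like it could do this, since it re-parses a single field of each record. Here is a minimal sketch of what I have in mind (the @id and the "rails-log" target field name are my own guesses, not something from the current configs):

<filter kubernetes.**>
  @id filter_rails_json
  @type parser
  key_name log                         # re-parse the "log" field
  reserve_data true                    # keep time/stream/kubernetes/docker fields
  remove_key_name_field true           # drop the raw "log" string once parsing succeeds
  hash_value_field rails-log           # nest the parsed keys under "rails-log"
  emit_invalid_record_to_error false   # let non-JSON lines pass through untouched
  <parse>
    @type json
  </parse>
</filter>

If I understand the docs correctly, emit_invalid_record_to_error false combined with reserve_data true should mean that non-JSON logs simply pass through unchanged, which would cover the services whose format I can't control.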
After the Kubernetes plugin executes (adding Kubernetes-specific metadata to the "payload"), I can select (in Kibana) the logs from my specific service with a filter query: kubernetes.labels.app is '<service_name>'. This leads me to believe there's a way to make a "conditional extraction/transformation" of the "log" field into something I can work with.
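And since the tag generated by the tail source embeds the pod name (kubernetes.var.log.containers.<pod_name>_<namespace>_<container_name>-<docker_id>.log, assuming the raw. prefix gets rewritten to kubernetes. the way the sample configs do), I'm guessing the filter could be made conditional by narrowing its match pattern. A sketch, assuming my Rails pods are named with a rails- prefix:

<filter kubernetes.var.log.containers.rails-**>
  @type parser
  key_name log
  reserve_data true
  hash_value_field rails-log
  <parse>
    @type json
  </parse>
</filter>

That way the logs from every other service would skip the extra parsing step entirely.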