how to use fluentd to parse multiline log output of a kubernetes pod

11/1/2021

I tried to implement an EFK stack in our current environment with Fluentd.

The configuration I have is

    <source>
      ...
      path /var/log/containers/*.log
      ...
    </source>

which is supposed to grab all the standard output of all pods on the worker node. But when I SSH into that node and inspect the format of the output, I find that standard output spanning multiple lines is broken into separate log entries, for example:

{"log":"Error [ERR_HTTP_HEADERS_SENT]: Cannot set headers after they are sent to the client\n","stream":"stderr","time":"2021-10-29T18:26:26.011079366Z"}
{"log":"    at ServerResponse.setHeader (_http_outgoing.js:530:11)\n","stream":"stderr","time":"2021-10-29T18:26:26.011130167Z"}
{"log":"    at sendEtagResponse (/app/node_modules/next/dist/next-server/server/send-payload.js:6:12)\n","stream":"stderr","time":"2021-10-29T18:26:26.011145267Z"}
{"log":"    at sendData (/app/node_modules/next/dist/next-server/server/api-utils.js:32:479)\n","stream":"stderr","time":"2021-10-29T18:26:26.011229869Z"}
{"log":"    at ServerResponse.apiRes.send (/app/node_modules/next/dist/next-server/server/api-utils.js:6:250)\n","stream":"stderr","time":"2021-10-29T18:26:26.011242369Z"}
{"log":"    at exports.modules.3626.__webpack_exports__.default (/app/.next/server/pages/api/users/[id]/organizations.js:350:34)\n","stream":"stderr","time":"2021-10-29T18:26:26.011252769Z"}
{"log":"    at runMicrotasks (\u003canonymous\u003e)\n","stream":"stderr","time":"2021-10-29T18:26:26.011264269Z"}
{"log":"    at processTicksAndRejections (internal/process/task_queues.js:97:5)\n","stream":"stderr","time":"2021-10-29T18:26:26.011275069Z"}
{"log":"    at async apiResolver (/app/node_modules/next/dist/next-server/server/api-utils.js:8:1)\n","stream":"stderr","time":"2021-10-29T18:26:26.011284869Z"}
{"log":"    at async Server.handleApiRequest (/app/node_modules/next/dist/next-server/server/next-server.js:66:462)\n","stream":"stderr","time":"2021-10-29T18:26:26.01129647Z"}
{"log":"    at async Object.fn (/app/node_modules/next/dist/next-server/server/next-server.js:58:580) {\n","stream":"stderr","time":"2021-10-29T18:26:26.01130717Z"}
{"log":"  code: 'ERR_HTTP_HEADERS_SENT'\n","stream":"stderr","time":"2021-10-29T18:26:26.01131707Z"}
{"log":"}\n","stream":"stderr","time":"2021-10-29T18:26:26.01132747Z"}

All of those lines are broken into separate log pieces and shipped to Elasticsearch individually. Is there a way to combine those multiple lines into one single entry?

Any kind of help is appreciated.

-- pyy
elastic-stack
fluentd
kubernetes
logging

1 Answer

11/1/2021

You can use the multiline plugin to achieve that.

It provides the format_firstline parameter, which takes a regex matching the first line of each log entry.

You didn't share much of your regular log output, so here is an example for a timestamp with the format YYYY-MM-dd HH:mm:ss,zzz:

firstline: /\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}/

You could also try to match on the beginning of the line, e.g. /^(Info|Error)/.

That way, Fluentd will recognize the multiple lines as one log entry.
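Putting it together, a sketch of a tail source using the multiline parser might look like this. The pos_file path, tag, and the format1 regex are assumptions, since the actual log format wasn't shared:

    <source>
      @type tail
      path /var/log/containers/*.log
      # position file so fluentd remembers where it left off (assumed path)
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      <parse>
        @type multiline
        # a new entry starts with a timestamp like 2021-10-29 18:26:26,011
        format_firstline /\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2},\d{3}/
        # capture the whole entry, including continuation lines (assumed format)
        format1 /^(?<message>.*)/
      </parse>
    </source>

Lines that don't match format_firstline are appended to the previous entry until the next match, so the whole stack trace ends up in one record.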

Check the docs for more info about configuring the plugin.

-- Chris
Source: StackOverflow