We are running Java-based microservices and have the following scenario.
The question is how to solve this problem when we have a lot of logs to centralise and analyse. We are running 20 instances of this application and have 150 GB of logs in flat files. The following are the problems:
We are trying to evaluate the following:
Any suggestions are welcome.
We ended up using a custom pipeline on GCP in which the applications push logs to Pub/Sub and Dataflow is responsible for aggregating and transforming the information.
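For illustration, here is a minimal sketch of the publishing step using the official google-cloud-pubsub Java client. The project ID, topic name, attribute, and sample log line are placeholders; in a real service you would wire this into the logging framework (e.g. a Logback appender) rather than publish lines by hand:

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

import java.util.concurrent.TimeUnit;

public class LogPublisher {
    public static void main(String[] args) throws Exception {
        // Placeholder project/topic; Dataflow subscribes to this topic downstream.
        TopicName topic = TopicName.of("my-project", "app-logs");
        Publisher publisher = Publisher.newBuilder(topic).build();
        try {
            PubsubMessage message = PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("2024-01-01T00:00:00Z INFO example log line"))
                    .putAttributes("service", "orders") // example attribute for routing in Dataflow
                    .build();
            publisher.publish(message).get(); // block until the message is acknowledged
        } finally {
            publisher.shutdown();
            publisher.awaitTermination(1, TimeUnit.MINUTES);
        }
    }
}
```

The client batches messages internally, so the per-message overhead stays low even at high log volume.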
You can use a single sidecar that runs something like Fluentd or Logstash. Both are log-ingestion tools that can be customised with a wide range of plugins, which lets you route logs to all of your destinations at once. In the case of Logstash you might even want to use Filebeat to ship the files to it.
Also, Fluentd seems to have an official plugin from Google (fluent-plugin-google-cloud) that does most of what you want.
This is the procedure described in this Kubernetes blog post about cluster-level logging and in this post on the Fluentd blog.
The idea is to run a DaemonSet (a set of pods that runs on every node in the cluster) that mounts the host path where container log files are located.
However, this will only collect the logs that your application writes to stdout. To collect the others, you can use the technique described here: run an extremely lightweight sidecar container that just tails the log files.
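In the Kubernetes examples that sidecar is usually just a lightweight container running tail -f on the shared volume, but to make the idea concrete here is a rough Java sketch of the same behaviour: a second container that shares the log volume with the application, reads the file, and re-emits new lines on its own stdout so the node-level agent picks them up. The path /var/log/app/app.log is only an assumed location, and the sketch ignores log rotation, which a real tailer has to handle:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TailSidecar {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed path of the log file on the emptyDir volume shared with the app container.
        String path = args.length > 0 ? args[0] : "/var/log/app/app.log";
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    Thread.sleep(500); // no new data yet; poll again shortly
                } else {
                    System.out.println(line); // stdout is captured by the container runtime
                }
            }
        }
    }
}
```

In practice you would rarely write this yourself; tail in a busybox image or a small agent such as fluent-bit does the same job and also copes with rotated files.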