I am trying to ship my K8s pod logs to Elasticsearch using Filebeat.
I am following the guide online here: https://www.elastic.co/guide/en/beats/filebeat/6.0/running-on-kubernetes.html
Everything works as expected; however, I want to filter out events from system pods. My updated config looks like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-prospectors
  namespace: kube-system
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
data:
  kubernetes.yml: |-
    - type: log
      paths:
        - /var/lib/docker/containers/*/*.log
      multiline.pattern: '^\s'
      multiline.match: after
      json.message_key: log
      json.keys_under_root: true
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            namespace: ${POD_NAMESPACE}
        - drop_event.when.regexp:
            or:
              kubernetes.pod.name: "weave-net.*"
              kubernetes.pod.name: "external-dns.*"
              kubernetes.pod.name: "nginx-ingress-controller.*"
              kubernetes.pod.name: "filebeat.*"
I am trying to ignore weave-net, external-dns, ingress-controller and filebeat events via:
- drop_event.when.regexp:
    or:
      kubernetes.pod.name: "weave-net.*"
      kubernetes.pod.name: "external-dns.*"
      kubernetes.pod.name: "nginx-ingress-controller.*"
      kubernetes.pod.name: "filebeat.*"
However they continue to arrive in Elasticsearch.
This worked for me in Filebeat 6.1.3:
- drop_event.when:
    or:
      - equals:
          kubernetes.container.name: "filebeat"
      - equals:
          kubernetes.container.name: "prometheus-kube-state-metrics"
      - equals:
          kubernetes.container.name: "weave-npc"
      - equals:
          kubernetes.container.name: "nginx-ingress-controller"
      - equals:
          kubernetes.container.name: "weave"
The conditions need to be a list:
- drop_event.when.regexp:
    or:
      - kubernetes.pod.name: "weave-net.*"
      - kubernetes.pod.name: "external-dns.*"
      - kubernetes.pod.name: "nginx-ingress-controller.*"
      - kubernetes.pod.name: "filebeat.*"
I'm not sure if your order of parameters works. One of my working examples looks like this:
- drop_event:
    when:
      or:
        # Exclude traces from Zipkin
        - contains.path: "/api/v"
        # Exclude Jolokia calls
        - contains.path: "/jolokia/?"
        # Exclude pinging metrics
        - equals.path: "/metrics"
        # Exclude pinging health
        - equals.path: "/health"
I am using a different approach, one that is less efficient in terms of the number of logs that transit the logging pipeline.
Similarly to what you did, I deployed one Filebeat instance on each node using a DaemonSet. Nothing special here; this is the configuration I am using:
apiVersion: v1
data:
  filebeat.yml: |-
    filebeat.config:
      prospectors:
        # Mounted `filebeat-prospectors` configmap:
        path: ${path.config}/prospectors.d/*.yml
        # Reload prospectors configs as they change:
        reload.enabled: false
      modules:
        path: ${path.config}/modules.d/*.yml
        # Reload module configs as they change:
        reload.enabled: false
    processors:
      - add_cloud_metadata:
    output.logstash:
      hosts: ['logstash.elk.svc.cluster.local:5044']
kind: ConfigMap
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat-config
And this one for the prospectors:
apiVersion: v1
data:
  kubernetes.yml: |-
    - type: log
      paths:
        - /var/lib/docker/containers/*/*.log
      json.message_key: log
      json.keys_under_root: true
      processors:
        - add_kubernetes_metadata:
            in_cluster: true
            namespace: ${POD_NAMESPACE}
kind: ConfigMap
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat-prospectors
The DaemonSet spec:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: filebeat
    kubernetes.io/cluster-service: "true"
  name: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
  template:
    metadata:
      labels:
        k8s-app: filebeat
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
        - args:
            - -c
            - /etc/filebeat.yml
            - -e
          command:
            - /usr/share/filebeat/filebeat
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
          image: docker.elastic.co/beats/filebeat:6.0.1
          imagePullPolicy: IfNotPresent
          name: filebeat
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          securityContext:
            runAsUser: 0
          volumeMounts:
            - mountPath: /etc/filebeat.yml
              name: config
              readOnly: true
              subPath: filebeat.yml
            - mountPath: /usr/share/filebeat/prospectors.d
              name: prospectors
              readOnly: true
            - mountPath: /usr/share/filebeat/data
              name: data
            - mountPath: /var/lib/docker/containers
              name: varlibdockercontainers
              readOnly: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            name: filebeat-config
          name: config
        - hostPath:
            path: /var/lib/docker/containers
            type: ""
          name: varlibdockercontainers
        - configMap:
            defaultMode: 384
            name: filebeat-prospectors
          name: prospectors
        - emptyDir: {}
          name: data
Basically, all data from all logs from all containers gets forwarded to Logstash, reachable at the service endpoint logstash.elk.svc.cluster.local:5044 (a Service called "logstash" in the "elk" namespace).
For brevity, I'm going to give you only the Logstash configuration (if you need more specific help with Kubernetes, please ask in the comments):
The logstash.yml file is very basic:
http.host: "0.0.0.0"
path.config: /usr/share/logstash/pipeline
It just indicates the mount point of the directory where I mounted the pipeline config files, which are the following:
10-beats.conf: declares an input for Filebeat (port 5044 has to be exposed by a Service called "logstash"):
input {
  beats {
    port => 5044
    ssl => false
  }
}
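In case it helps, here is a minimal sketch of such a Service. The app: logstash selector is an assumption; match it to whatever labels your Logstash pods actually carry:

apiVersion: v1
kind: Service
metadata:
  name: logstash
  namespace: elk
spec:
  selector:
    app: logstash    # assumption: adjust to your Logstash pod labels
  ports:
    - name: beats
      port: 5044
      targetPort: 5044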
49-filter-logs.conf: this filter drops logs coming from pods that don't have the "elk" label. For pods that do have it, it keeps only the logs from the containers named in the pod's "elk" label. For instance, if a pod has two containers called "nginx" and "python", setting the label "elk" to the value "nginx" keeps the logs coming from the nginx container and drops the python ones. The type of the log is set to the namespace the pod is running in. This might not be a good fit for everybody (you end up with a single Elasticsearch index for all logs belonging to a namespace), but it works for me because my logs are homogeneous.
filter {
  if ![kubernetes][labels][elk] {
    drop {}
  }
  if [kubernetes][labels][elk] {
    # Check if kubernetes.labels.elk contains this container name
    mutate {
      split => { "[kubernetes][labels][elk]" => "." }
    }
    if [kubernetes][container][name] not in [kubernetes][labels][elk] {
      drop {}
    }
    mutate {
      replace => { "[@metadata][type]" => "%{[kubernetes][namespace]}" }
      remove_field => [ "beat", "host", "[kubernetes][labels][elk]", "[kubernetes][labels][pod-template-hash]", "[kubernetes][namespace]", "[kubernetes][pod][name]", "offset", "[prospector][type]", "source", "stream", "time" ]
      rename => {
        "[kubernetes][container][name]" => "container"
        "[kubernetes][labels][app]" => "app"
      }
    }
  }
}
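For illustration, this is how a pod opts in (a hypothetical manifest; the pod name and images are made up). With the label below, only the nginx container's logs are kept and the python container's logs are dropped; to keep both, you would set elk: "nginx.python", since the label value is split on dots:

apiVersion: v1
kind: Pod
metadata:
  name: web                # hypothetical pod, for illustration only
  labels:
    elk: "nginx"           # keep only logs from the container named "nginx"
spec:
  containers:
    - name: nginx
      image: nginx:1.13
    - name: python
      image: python:3.6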
The rest of the configuration is about log parsing and is not relevant in this context. The only other important part is the output:
99-output.conf: sends data to Elasticsearch:
output {
  elasticsearch {
    hosts => ["http://elasticsearch.elk.svc.cluster.local:9200"]
    manage_template => false
    index => "%{[@metadata][type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
Hope you got the point here.
PROs of this approach: which logs get kept is controlled per pod through the "elk" label, so you never have to touch the Filebeat configuration when workloads change, and all the filtering logic lives in one place (Logstash).
CONs of this approach: every log line from every container transits the pipeline to Logstash before it can be dropped, which costs network bandwidth and processing.
I am sure there are better approaches to this problem, but I think this solution is quite handy, at least for my use case.