What is the best way to configure long-term retention of logs in S3 for a Kubernetes cluster with Elasticsearch, Fluentd, and Kibana installed?
If you haven't already installed the EFK stack, you can do so like this:
helm repo add cryptexlabs https://helm.cryptexlabs.com
helm install my-efk-stack cryptexlabs/efk
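Once the release is deployed, it's worth checking that the Elasticsearch, Fluentd, and Kibana pods come up before wiring in S3:

kubectl get pods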
Or add it to your Chart.yaml:
dependencies:
  - name: efk
    version: 7.8.0
    repository: https://helm.cryptexlabs.com
    condition: efk.enabled
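If you go the Chart.yaml route, pull the dependency down before installing (standard Helm workflow):

helm dependency update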
Next, create a ConfigMap, which will also contain your AWS credentials:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-extra-config
data:
  s3.conf: |-
    <match **>
      @type copy
      copy_mode deep
      <store>
        @type s3
        aws_key_id xxx
        aws_sec_key xxx
        s3_bucket "#{ENV['AWS_S3_BUCKET']}"
        s3_region "#{ENV['AWS_REGION']}"
        path "#{ENV['S3_LOGS_BUCKET_PREFIX']}"
        buffer_path /var/log/fluent/s3
        s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
        time_slice_format %Y%m%d-%H
        time_slice_wait 10m
        flush_interval 60s
        buffer_chunk_limit 256m
      </store>
    </match>
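Apply the ConfigMap as usual (the filename here is just an example):

kubectl apply -f fluentd-extra-config.yaml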
Optionally, create a Secret with your AWS access key ID and secret access key (see below for more info). Don't forget that values under data in an Opaque Secret must be base64-encoded:
apiVersion: v1
kind: Secret
metadata:
  name: s3-log-archive-secret
type: Opaque
data:
  AWS_ACCESS_KEY_ID: xxx
  AWS_SECRET_ACCESS_KEY: xxx
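If you'd rather not base64-encode the values by hand, kubectl can create the same Secret and do the encoding for you (the key values are placeholders):

kubectl create secret generic s3-log-archive-secret \
  --from-literal=AWS_ACCESS_KEY_ID=xxx \
  --from-literal=AWS_SECRET_ACCESS_KEY=xxx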
If you're wondering why I didn't use environment variables for the AWS access key and secret key, it's because it doesn't work: https://github.com/fluent/fluent-plugin-s3/issues/340. If you're using kube2iam or kiam then this wouldn't matter. See the documentation for the fluentd s3 plugin to configure it to assume a role instead of using credentials.
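If you do go the role-assumption route, the store section would look roughly like this (a sketch based on the plugin's assume_role_credentials support; the role ARN and session name are placeholders):

<store>
  @type s3
  # replaces aws_key_id/aws_sec_key; the role ARN below is a placeholder
  <assume_role_credentials>
    role_arn arn:aws:iam::123456789012:role/fluentd-log-archiver
    role_session_name fluentd-s3-archive
  </assume_role_credentials>
  # remaining s3/buffer parameters stay the same as in s3.conf above
</store>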
These values will allow you to run the s3 plugin with the ConfigMap. The important thing to note is the envFrom block, which imports the Secret as environment variables. Add the following to your values.yaml:
efk:
  enabled: true
  elasticsearch:
    antiAffinity: "soft"
  fluentd:
    env:
      - name: FLUENT_ELASTICSEARCH_HOST
        value: "elasticsearch-master"
      - name: FLUENT_ELASTICSEARCH_PORT
        value: "9200"
      - name: AWS_REGION
        value: us-east-1
      - name: AWS_S3_BUCKET
        value: your_bucket_name_goes_here
      - name: S3_LOGS_BUCKET_PREFIX
        value: ""
    envFrom:
      - secretRef:
          name: s3-log-archive-secret
    extraVolumeMounts:
      - name: extra-config
        mountPath: /fluentd/etc/conf.d
    extraVolumes:
      - name: extra-config
        configMap:
          name: fluentd-extra-config
          items:
            - key: s3.conf
              path: s3.conf
    image:
      repository: docker.io/cryptexlabs/fluentd
      tag: k8s-daemonset-elasticsearch-s3
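With those values in place, install or upgrade the release so they take effect:

helm upgrade --install my-efk-stack cryptexlabs/efk -f values.yaml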
If you want to build your own Docker image, you can do so like this:
FROM fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
RUN fluent-gem install \
    fluent-plugin-s3
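Build and push it to a registry you control, then point fluentd.image in the values above at it (the repository name is a placeholder):

docker build -t your-registry/fluentd:k8s-daemonset-elasticsearch-s3 .
docker push your-registry/fluentd:k8s-daemonset-elasticsearch-s3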
The next thing is that you probably want to set a retention period for the S3 data: either delete it after a certain period of time or move it to Glacier, depending on your requirements.
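A bucket lifecycle rule can do both. As a minimal sketch, something like this lifecycle.json (the 90-day Glacier transition and 365-day expiry are arbitrary example windows):

{
  "Rules": [
    {
      "ID": "archive-then-expire-cluster-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [{ "Days": 90, "StorageClass": "GLACIER" }],
      "Expiration": { "Days": 365 }
    }
  ]
}

Apply it with the AWS CLI, using the bucket from the values above:

aws s3api put-bucket-lifecycle-configuration \
  --bucket your_bucket_name_goes_here \
  --lifecycle-configuration file://lifecycle.json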
Finally, since we have longer-term retention of our logs in S3, we can safely set a shorter retention period, such as 30 days, for the data that is sent to Elasticsearch, using Elasticsearch Curator.
You can install Curator like so:
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install curator stable/elasticsearch-curator
Or add to your Chart.yaml:
dependencies:
  - name: elasticsearch-curator
    version: 2.1.5
    repository: https://kubernetes-charts.storage.googleapis.com
values.yaml:
elasticsearch-curator:
  configMaps:
    action_file_yml: |-
      actions:
        1: &delete
          action: delete_indices
          description: "Delete selected indices"
          options:
            ignore_empty_list: True
            continue_if_exception: True
            timeout_override: 300
          filters:
            - filtertype: pattern
              kind: prefix
              value: 'logstash-'
            - filtertype: age
              source: name
              direction: older
              timestring: '%Y-%m-%d'
              unit: days
              unit_count: 30
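The Curator chart runs as a Kubernetes CronJob, so after installing you can confirm the schedule is in place (the exact job name depends on your release name):

kubectl get cronjobs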