How to configure long-term retention of logs for an EFK stack using S3?

7/9/2020

What is the best way to configure long-term retention of logs in S3 for a Kubernetes cluster with Elasticsearch, Fluentd, and Kibana installed?

-- Josh Woodcock
amazon-s3
efk
elasticsearch
fluent
kubernetes

1 Answer

7/9/2020

If you haven't already installed the EFK stack, you can do so like this:

helm repo add cryptexlabs https://helm.cryptexlabs.com
helm install my-efk-stack cryptexlabs/efk

Or add it to your Chart.yaml dependencies:

  - name: efk
    version: 7.8.0
    repository: https://helm.cryptexlabs.com
    condition: efk.enabled

Next, create a ConfigMap, which will also contain your AWS credentials:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-extra-config
data:
  s3.conf: |-
    <match **>
      @type copy
      copy_mode deep
      <store>
        @type s3
        aws_key_id xxx
        aws_sec_key xxx
        s3_bucket "#{ENV['AWS_S3_BUCKET']}"
        s3_region "#{ENV['AWS_REGION']}"
        path "#{ENV['S3_LOGS_BUCKET_PREFIX']}"
        buffer_path /var/log/fluent/s3
        s3_object_key_format %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}
        time_slice_format %Y%m%d-%H
        time_slice_wait 10m
        flush_interval 60s
        buffer_chunk_limit 256m
      </store>
    </match>
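To make the s3_object_key_format above a bit more concrete, here is a small Python sketch of how the pieces combine into an object key. It is only an approximation of what fluent-plugin-s3 does, assuming %{path} is the S3_LOGS_BUCKET_PREFIX value inserted verbatim, %{time_slice} is the chunk time formatted with time_slice_format, and the extension is "gz" (the plugin's default gzip storage). The function name s3_object_key is hypothetical.

```python
from datetime import datetime


def s3_object_key(prefix: str, when: datetime, index: int = 0,
                  file_extension: str = "gz",
                  time_slice_format: str = "%Y%m%d-%H") -> str:
    """Approximate %{path}%{time_slice}/cluster-log-%{index}.%{file_extension}."""
    # %{path}: the prefix is concatenated as-is, so add your own trailing "/"
    # %{time_slice}: the chunk's time rendered with time_slice_format
    time_slice = when.strftime(time_slice_format)
    return f"{prefix}{time_slice}/cluster-log-{index}.{file_extension}"


if __name__ == "__main__":
    t = datetime(2020, 7, 9, 14, 30)
    print(s3_object_key("", t))          # 20200709-14/cluster-log-0.gz
    print(s3_object_key("staging/", t))  # staging/20200709-14/cluster-log-0.gz
```

If you share one bucket across environments, the second call shows why the prefix should end with a slash: otherwise the environment name runs straight into the hourly time slice.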

Optionally, create a Secret with your AWS access key ID and secret access key (see below for more info). Don't forget that values in the data field of an Opaque Secret must be base64-encoded.

apiVersion: v1
kind: Secret
metadata:
  name: s3-log-archive-secret
type: Opaque
data:
  AWS_ACCESS_KEY_ID: xxx
  AWS_SECRET_ACCESS_KEY: xxx
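If you'd rather not base64-encode the values by hand, a minimal Python sketch like the one below can render the Secret for you. The function name secret_manifest and the placeholder credential strings are hypothetical; substitute your real values and keep the output out of version control.

```python
import base64
from typing import Dict


def secret_manifest(name: str, values: Dict[str, str]) -> str:
    """Render an Opaque Secret whose data values are base64-encoded."""
    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        f"  name: {name}",
        "type: Opaque",
        "data:",
    ]
    for key, value in values.items():
        # Secret data must be base64 of the raw bytes, not the plain string
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        lines.append(f"  {key}: {encoded}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(secret_manifest("s3-log-archive-secret", {
        "AWS_ACCESS_KEY_ID": "my-access-key-id",          # placeholder
        "AWS_SECRET_ACCESS_KEY": "my-secret-access-key",  # placeholder
    }))
```

Alternatively, kubectl create secret generic s3-log-archive-secret --from-literal=AWS_ACCESS_KEY_ID=... --from-literal=AWS_SECRET_ACCESS_KEY=... does the encoding for you.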

If you're wondering why I didn't use environment variables for the AWS access key ID and secret key in the ConfigMap, it's because that doesn't work: https://github.com/fluent/fluent-plugin-s3/issues/340. If you're using kube2iam or kiam, this doesn't matter; see the documentation for the fluentd S3 plugin to configure it to assume a role instead of using credentials.

The following values allow you to run the S3 plugin with the ConfigMap. Some important things to note:

  • I use an antiAffinity of "soft" because I run a single-node bare-metal cluster.
  • S3_LOGS_BUCKET_PREFIX is empty because I use a separate bucket for each environment, but you could share one bucket across environments and set the prefix to the environment name.
  • You need a Docker image that extends the fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch image and has the S3 plugin installed.
  • If you skipped creating the Secret for the access key ID and secret key, you can remove the envFrom block that imports it as environment variables.

efk:
  enabled: true
  elasticsearch:
    antiAffinity: "soft"
  fluentd:
    env:
      - name: FLUENT_ELASTICSEARCH_HOST
        value: "elasticsearch-master"
      - name: FLUENT_ELASTICSEARCH_PORT
        value: "9200"
      - name: AWS_REGION
        value: us-east-1
      - name: AWS_S3_BUCKET
        value: your_bucket_name_goes_here
      - name: S3_LOGS_BUCKET_PREFIX
        value: ""
    envFrom:
      - secretRef:
          name: s3-log-archive-secret
    extraVolumeMounts:
      - name: extra-config
        mountPath: /fluentd/etc/conf.d
    extraVolumes:
      - name: extra-config
        configMap:
          name: fluentd-extra-config
          items:
            - key: s3.conf
              path: s3.conf
    image:
      repository: docker.io/cryptexlabs/fluentd
      tag: k8s-daemonset-elasticsearch-s3

If you want to build your own Docker image, you can do it like this:

FROM fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch

RUN fluent-gem install \
 fluent-plugin-s3

Next, you probably want to set a retention period for the S3 data: either delete it after a certain period of time or move it to Glacier, depending on your requirements.
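One way to do this is with an S3 lifecycle rule. The Python sketch below builds a lifecycle configuration in the shape boto3's put_bucket_lifecycle_configuration expects, moving objects to Glacier and later deleting them. The 30 and 365 day values, the rule ID, and the helper names are placeholders for illustration, not values from my setup; boto3 is only needed if you actually apply the rule.

```python
def lifecycle_configuration(prefix: str = "", glacier_after_days: int = 30,
                            expire_after_days: int = 365) -> dict:
    """Build a lifecycle rule: transition to GLACIER, then expire objects."""
    # Sanity check: deleting before the transition would make Glacier pointless
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "archive-cluster-logs",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # "" matches the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict, region: str = "us-east-1") -> None:
    """Push the rule to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the config needs no boto3

    s3 = boto3.client("s3", region_name=region)
    # Note: this replaces any lifecycle configuration already on the bucket
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

For example, apply_lifecycle("your_bucket_name_goes_here", lifecycle_configuration()). If you share a bucket across environments, pass the environment prefix so each environment can have its own retention.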

Finally, since the logs are retained long-term in S3, we can safely set a much shorter retention period, such as 30 days, for the data sent to Elasticsearch by using Elasticsearch Curator.

You can install Curator like this:

helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm install curator stable/elasticsearch-curator

Or add it to your Chart.yaml dependencies:

  - name: elasticsearch-curator
    version: 2.1.5
    repository: https://kubernetes-charts.storage.googleapis.com

values.yaml:

elasticsearch-curator:
  configMaps:
    action_file_yml: |-
      actions:
        1: &delete
          action: delete_indices
          description: "Delete selected indices"
          options:
            ignore_empty_list: True
            continue_if_exception: True
            timeout_override: 300
          filters:
          - filtertype: pattern
            kind: prefix
            value: 'logstash-'
          - filtertype: age
            source: name
            direction: older
            # must match the date in the index names (fluentd's default is %Y.%m.%d)
            timestring: '%Y.%m.%d'
            unit: days
            unit_count: 30
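To see what the pattern and age filters select, here is a simplified Python model of them. It is not Curator's code: it assumes index names are the prefix followed directly by the date (e.g. logstash-2020.07.09, matching fluentd's default logstash date format of %Y.%m.%d), and it compares the parsed date against now minus 30 days. The function name indices_to_delete is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def indices_to_delete(names: Iterable[str], now: datetime,
                      prefix: str = "logstash-",
                      timestring: str = "%Y.%m.%d",
                      unit_count_days: int = 30) -> List[str]:
    """Simplified model of Curator's pattern (prefix) + age (source: name) filters."""
    cutoff = now - timedelta(days=unit_count_days)
    selected = []
    for name in names:
        # pattern filter, kind: prefix
        if not name.startswith(prefix):
            continue
        try:
            # age filter, source: name -> parse the date part of the index name
            created = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # name doesn't match timestring, so it isn't selected
        created = created.replace(tzinfo=timezone.utc)
        # direction: older
        if created < cutoff:
            selected.append(name)
    return selected


if __name__ == "__main__":
    now = datetime(2020, 7, 9, tzinfo=timezone.utc)
    names = ["logstash-2020.06.01", "logstash-2020.07.08", ".kibana"]
    print(indices_to_delete(names, now))
```

The last point is the one to watch: if the timestring doesn't match how your indices are actually named, the age filter silently selects nothing and no indices get deleted.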
-- Josh Woodcock
Source: StackOverflow