How to make nginx's host field overwrite Logstash's host information when sending it via Filebeat?

1/18/2021

I am using Filebeat and Logstash to ship nginx's JSON access log on Kubernetes.

The nginx configuration looks like this:

nginx.conf

http {
    log_format bucket escape=json
    '{'
        '"request_id": "$request_id",'
        '"method": "$request_method",'
        '"status": "$status",'
        '"forwarded_for": "$http_x_forwarded_for",'
        '"host": "$host",'
        '"url": "$request_uri",'
        '"referer": "$http_referer",'
        '"remote_ip": "$remote_addr",'
        '"server_ip": "$server_addr",'
        '"user_agent": "$http_user_agent",'
    '}';
}

server {
    access_log  /var/log/nginx/access.json  bucket;
}

Filebeat's configuration:

filebeat.yml

filebeat.shutdown_timeout: 5s

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.json*
    exclude_files: ['\.gz$']
    tags: ["access"]
    processors:
      - decode_json_fields:
          fields: ["message"]
          process_array: true
          max_depth: 1
          target: ""
          overwrite_keys: true
          add_error_key: false

output.logstash:
  hosts: ["logstash.default.svc.cluster.local:5044"]
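
Since decode_json_fields runs inside Filebeat, one way to check whether the decoding and key overwriting behave as expected is to temporarily swap the Logstash output for Filebeat's console output. This is only a debugging sketch for a local test run, not part of the real deployment:

# Debugging sketch: print each event after the processors have run, so the
# decoded nginx fields (and whatever "host" ends up as) are visible directly.
# This replaces output.logstash for the test; only one output may be enabled.
output.console:
  pretty: true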

Here overwrite_keys is true, so it should overwrite the existing metadata keys, right?
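
My understanding of the intent is roughly the following (an illustrative sketch with simplified, made-up values, not captured output):

Event as Filebeat reads it, before the processor runs:
{"message": "{\"host\": \"api.mysite.com\", \"status\": \"204\", ...}", "host": {"name": "filebeat-adio3"}, "tags": ["access"]}

Expected event after decode_json_fields with target: "" and overwrite_keys: true:
{"host": "api.mysite.com", "status": "204", "tags": ["access"], ...}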

Logstash's configuration:

logstash.conf

input {
  beats {
    port => 5044
  }
}

filter {
  if "access" in [tags] {
    mutate {
      add_field => { "[@metadata][tags]" => "%{tags}" }
      remove_field => [
        "agent",
        "event",
        "service",
        "log",
        "input",
        "fileset",
        "ecs",
        "container",
        "kubernetes",
        "@timestamp",
        "@version",
        "message",
        "tags"
      ]
    }
  }
}

output {
  if "access" in [@metadata][tags] {
    google_cloud_storage {
      bucket => "nginx_logs"
      json_key_file => "/secrets/service_account/credentials.json"
      temp_directory => "/tmp/nginx_logs"
      log_file_prefix => "logstash_nginx_logs"
      max_file_size_kbytes => 1024
      output_format => "json"
      date_pattern => "%Y-%m-%dT%H:00"
      flush_interval_secs => 2
      gzip => false
      gzip_content_encoding => false
      uploader_interval_secs => 60
      include_uuid => true
      include_hostname => true
    }
  }
}
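
While debugging, a temporary stdout output (just a sketch; it mirrors events to the Logstash log) can show exactly which fields survive the mutate filter before they reach the bucket:

output {
  # Debugging sketch: dump each event, including [@metadata], to stdout so the
  # fields remaining after remove_field can be inspected alongside the GCS output.
  stdout { codec => rubydebug { metadata => true } }
}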

It worked well at the beginning. The log data was written to the JSON files like this:

{"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 14_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 [FBAN/FBIOS;FBDV/iPhone13,1;FBMD/iPhone;FBSN/iOS;FBSV/14.2.1;FBSS/3;FBID/phone;FBLC/ja_JP;FBOP/5]","forwarded_for":"1.2.3.4","host":"api.mysite.com","method":"OPTIONS","request_id":"0127054b954fe4973852e1886130a6ca","referer":"https://www.world.com/","remote_ip":"2.3.4.5","server_ip":"3.4.5.6","status":"204","url":"/api/v1/post"}

But recently, entries like this started to appear:

{"user_agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 14_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 [FBAN/FBIOS;FBDV/iPhone13,1;FBMD/iPhone;FBSN/iOS;FBSV/14.2.1;FBSS/3;FBID/phone;FBLC/ja_JP;FBOP/5]","forwarded_for":"1.2.3.4","host":"api.mysite.com","method":"OPTIONS","request_id":"0127054b954fe4973852e1886130a6ca","referer":"https://www.world.com/","remote_ip":"2.3.4.5","server_ip":"3.4.5.6","status":"204","url":"/api/v1/post"}
{"host":{"name":"filebeat-adio3"}}
{"host":{"name":"filebeat-adio3"}}
{"host":{"name":"filebeat-adio3"}}

This is not regular data. It looks like the Filebeat host's own metadata has been sent instead of the nginx fields. But why? Is this Filebeat's fault or Logstash's? Is there another good way to filter this host data so it can be sent without conflicting with Filebeat/Logstash metadata?
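
One possible guard (only a sketch, not a verified fix, and it does not explain the root cause) would be to drop events that carry the access tag but are missing the decoded nginx fields, so that metadata-only events never reach the bucket:

filter {
  # Sketch only: placed before the existing mutate. If decode_json_fields did not
  # populate the nginx fields, discard the event instead of letting a bare
  # host object reach the bucket.
  if "access" in [tags] and ![request_id] {
    drop { }
  }
}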

-- iooi
filebeat
host
kubernetes
logstash
nginx

0 Answers