GCP - Logging k8s: Error while sending request to Stackdriver googleapi: Error 400: One or more TimeSeries could not be written

7/30/2020

I recently enabled logging for my GKE clusters on GCP. Since then, the following error has been occurring roughly three times per second, generating a massive volume of error entries. Because of this flood, important errors get drowned out in the logs. The following JSON is one of these entries:

{
  "insertId": "42",
  "jsonPayload": {
    "pid": "1",
    "source": "stackdriver.go:60",
    "message": "Error while sending request to Stackdriver googleapi: Error 400: One or more TimeSeries could not be written: Unknown metric:
kubernetes.io/internal/addons/workload_identity/go_gc_duration_seconds_count: timeSeries[31]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_gc_duration_seconds_sum: timeSeries[4]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_goroutines: timeSeries[0]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_info: timeSeries[47]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_alloc_bytes: timeSeries[55]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_alloc_bytes_total: timeSeries[40]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_buck_hash_sys_bytes: timeSeries[13]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_frees_total: timeSeries[2]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_gc_cpu_fraction: timeSeries[56]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_gc_sys_bytes: timeSeries[19]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_heap_alloc_bytes: timeSeries[46]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_heap_idle_bytes: timeSeries[32]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_heap_inuse_bytes: timeSeries[42]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_heap_objects: timeSeries[1]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_heap_released_bytes: timeSeries[8]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_heap_sys_bytes: timeSeries[43]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_last_gc_time_seconds: timeSeries[33]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_lookups_total: timeSeries[34]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_mallocs_total: timeSeries[3]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_mcache_inuse_bytes: timeSeries[18]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_mcache_sys_bytes: timeSeries[11]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_mspan_inuse_bytes: timeSeries[38]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_mspan_sys_bytes: timeSeries[23]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_next_gc_bytes: timeSeries[10]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_other_sys_bytes: timeSeries[16]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_stack_inuse_bytes: timeSeries[17]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_stack_sys_bytes: timeSeries[12]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_memstats_sys_bytes: timeSeries[21]; Unknown metric: kubernetes.io/internal/addons/workload_identity/go_threads: timeSeries[41]; Unknown metric: kubernetes.io/internal/addons/workload_identity/process_cpu_seconds_total: timeSeries[20]; Unknown metric: kubernetes.io/internal/addons/workload_identity/process_max_fds: timeSeries[22]; Unknown metric: kubernetes.io/internal/addons/workload_identity/process_open_fds: timeSeries[9]; Unknown metric: kubernetes.io/internal/addons/workload_identity/process_resident_memory_bytes: timeSeries[39]; Unknown metric: 
kubernetes.io/internal/addons/workload_identity/process_start_time_seconds: timeSeries[45]; Unknown metric: kubernetes.io/internal/addons/workload_identity/process_virtual_memory_bytes: timeSeries[30]; Unknown metric: kubernetes.io/internal/addons/workload_identity/process_virtual_memory_max_bytes: timeSeries[44]; Unknown metric: kubernetes.io/internal/addons/workload_identity/promhttp_metric_handler_requests_in_flight: timeSeries[7]; Unknown metric: kubernetes.io/internal/addons/workload_identity/promhttp_metric_handler_requests_total: timeSeries[35-37]; Value type for metric kubernetes.io/internal/addons/workload_identity/metadata_server_build_info must be DOUBLE, but is INT64.: timeSeries[48], badRequest"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "cluster-a",
      "location": "europe-west3",
      "pod_name": "prometheus-to-sd-jcmwn",
      "project_id": "my-nice-project-id",
      "container_name": "prometheus-to-sd-new-model",
      "namespace_name": "kube-system"
    }
  },
  "timestamp": "2020-07-30T06:26:01.784963Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/pod-template-generation": "1",
    "k8s-pod/controller-revision-hash": "7984bf4f95",
    "k8s-pod/k8s-app": "prometheus-to-sd"
  },
  "logName": "projects/my-nice-project-id/logs/stderr",
  "sourceLocation": {
    "file": "stackdriver.go",
    "line": "60"
  },
  "receiveTimestamp": "2020-07-30T06:26:03.411798926Z"
}

What is causing this behaviour and how can I fix it?
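For reference, the noisy entries can be matched with a filter built from the resource labels in the payload above (container and namespace names are taken directly from the log entry); something along these lines should list them, and the same filter expression could be reused as a sink exclusion to keep them from drowning out other errors:

# List a few of the noisy prometheus-to-sd errors (project id and labels assumed from the log entry above).
gcloud logging read \
  'resource.type="k8s_container"
   resource.labels.namespace_name="kube-system"
   resource.labels.container_name="prometheus-to-sd-new-model"
   severity=ERROR
   jsonPayload.message:"One or more TimeSeries could not be written"' \
  --project my-nice-project-id --limit 5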

-- theexiile1305
google-cloud-platform
google-kubernetes-engine
kubernetes

1 Answer

8/1/2020

It looks like a bug in GKE clusters with the Workload Identity feature enabled.
I was able to reproduce it on 1.14.10-gke.42 with Workload Identity enabled, while a cluster deployed with version 1.15.12-gke.2 works as expected.
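You can check whether your cluster is on an affected version and has Workload Identity enabled with something like the following (cluster name and region are taken from the question's resource labels; the exact workloadIdentityConfig field name may vary between gcloud versions):

# Current control-plane and node versions.
gcloud container clusters describe cluster-a --region europe-west3 \
  --format="value(currentMasterVersion,currentNodeVersion)"

# Non-empty output here indicates Workload Identity is enabled.
gcloud container clusters describe cluster-a --region europe-west3 \
  --format="value(workloadIdentityConfig.workloadPool)"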

There is an open issue on GitHub. If you can't upgrade your cluster version, I suggest contacting Google Cloud support and asking for their recommended mitigation (although they will probably instruct you to upgrade your cluster version as well).
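If upgrading is an option, a rough sketch of the gcloud steps (cluster name, region, and target version are assumptions based on the details above; pick whatever get-server-config reports as valid for your location):

# List the versions currently available in the cluster's location.
gcloud container get-server-config --region europe-west3

# Upgrade the control plane first...
gcloud container clusters upgrade cluster-a --region europe-west3 \
  --master --cluster-version 1.15.12-gke.2

# ...then the nodes (repeat with --node-pool NAME for each additional pool).
gcloud container clusters upgrade cluster-a --region europe-west3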

-- hilsenrat
Source: StackOverflow