I have a GKE cluster running 1.14.10-gke.27 in the europe-west1-d zone.
Over the last couple of days there have been a lot of restarts of the stackdriver-metadata-agent-cluster-level pod in the kube-system namespace, with errors like these:
I0402 16:39:12.688053 1 main.go:142] All resources are being watched, agent has started successfully
I0402 16:39:12.688108 1 main.go:145] No statusz port provided; not starting a server
I0402 16:39:29.383562 1 retry.go:80] call failed with err=rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp [2a00:1450:400c:c09::5f]:443: i/o timeout", retrying.
I0402 16:39:29.383667 1 retry.go:80] call failed with err=rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp [2a00:1450:400c:c09::5f]:443: i/o timeout", retrying.
I0402 16:39:30.483072 1 retry.go:80] call failed with err=rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp [2a00:1450:400c:c09::5f]:443: i/o timeout", retrying.
I0402 16:39:30.783091 1 retry.go:80] call failed with err=rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp [2a00:1450:400c:c09::5f]:443: i/o timeout", retrying.
I0402 16:40:09.186357 1 binarylog.go:265] rpc: flushed binary log to ""
I0402 16:41:29.383025 1 binarylog.go:265] rpc: flushed binary log to ""
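To check whether every retry fails against the same endpoint with the same error, the pasted log lines can be summarised with a short script. This is a minimal sketch, assuming the lines are in the klog/glog format shown above and are available as plain text; `summarize_dial_failures` and `DIAL_RE` are hypothetical names, not part of the agent.

```python
import re
from collections import Counter

# Captures the dial target and the error text from klog/glog lines such as:
#   ... Error while dialing dial tcp [2a00:1450:400c:c09::5f]:443: i/o timeout", retrying.
# Both bracketed IPv6 targets ("[addr]:port") and plain "host:port" targets match.
DIAL_RE = re.compile(r'dial tcp (\[[0-9a-fA-F:]+\]:\d+|[\w.\-]+:\d+): ([^"]+)')


def summarize_dial_failures(log_text: str) -> Counter:
    """Count failed dials per (target, error) pair in pasted agent logs."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        # Only the retry.go lines report dial errors; skip the rest.
        if "retry.go" not in line:
            continue
        match = DIAL_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    sample = (
        'I0402 16:39:29.383562 1 retry.go:80] call failed with err=rpc error: '
        'code = Unavailable desc = all SubConns are in TransientFailure, latest '
        'connection error: connection error: desc = "transport: Error while '
        'dialing dial tcp [2a00:1450:400c:c09::5f]:443: i/o timeout", retrying.\n'
        'I0402 16:40:09.186357 1 binarylog.go:265] rpc: flushed binary log to ""'
    )
    for (target, error), n in summarize_dial_failures(sample).items():
        print(f"{target} -> {error}: {n}")
    # prints: [2a00:1450:400c:c09::5f]:443 -> i/o timeout: 1
```

Run over the full excerpt above, every retry line should map to the single IPv6 target on port 443 with an `i/o timeout`, which suggests that the failures come from outbound connections to that one address rather than from the agent itself.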
Is this a Google network issue?
I'm posting this as an answer because it contains quite a lot of code, which would be unreadable in a comment. Once we figure out the solution, I will edit it.
Could you run the following Stackdriver Logging queries and post the output in your question as a code sample (select the text and press Ctrl+K)?
Query 1 (entries mentioning "oom"):
resource.type="k8s_container"
resource.labels.project_id="<project-id>"
resource.labels.location="<location e.g. us-central1-c>"
resource.labels.cluster_name="<cluster-name>"
resource.labels.namespace_name="kube-system"
labels.k8s-pod/app="stackdriver-metadata-agent"
labels.k8s-pod/cluster-level="true"
"oom"
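If it is more convenient to run the query from a script than from the Logs Viewer, something like the following may help. It is a sketch assuming the `google-cloud-logging` Python client library is installed and Application Default Credentials are configured; `build_filter` and `print_matching_entries` are hypothetical helpers, not official tooling.

```python
from typing import Iterable, List


def build_filter(project_id: str, location: str, cluster_name: str,
                 extra_clauses: Iterable[str] = ()) -> str:
    """Build a Logging query for the cluster-level metadata agent pods.

    Clauses are joined with newlines; the Logging query language treats
    adjacent expressions as an implicit AND.
    """
    clauses: List[str] = [
        'resource.type="k8s_container"',
        f'resource.labels.project_id="{project_id}"',
        f'resource.labels.location="{location}"',
        f'resource.labels.cluster_name="{cluster_name}"',
        'resource.labels.namespace_name="kube-system"',
        'labels.k8s-pod/app="stackdriver-metadata-agent"',
        'labels.k8s-pod/cluster-level="true"',
    ]
    clauses.extend(extra_clauses)
    return "\n".join(clauses)


def print_matching_entries(filter_str: str, limit: int = 50) -> None:
    """Print up to `limit` matching entries using the google-cloud-logging client."""
    # Imported lazily so build_filter() works without the library installed.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()  # uses Application Default Credentials
    for i, entry in enumerate(client.list_entries(filter_=filter_str)):
        if i >= limit:
            break
        print(entry.timestamp, entry.severity, entry.payload)
```

For example, `print_matching_entries(build_filter("<project-id>", "europe-west1-d", "<cluster-name>", ['"oom"']))` would run the query above for your zone; passing `["severity>=WARNING", 'sourceLocation.file!="reflector.go"']` as the extra clauses reproduces the next query instead.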
Query 2 (WARNING and above, excluding reflector.go):
resource.type="k8s_container"
resource.labels.project_id="<project-id>"
resource.labels.location="<location e.g. us-central1-c>"
resource.labels.cluster_name="<cluster-name>"
resource.labels.namespace_name="kube-system"
labels.k8s-pod/app="stackdriver-metadata-agent"
labels.k8s-pod/cluster-level="true"
severity>=WARNING
sourceLocation.file!="reflector.go"
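The second query keeps entries at WARNING severity or above and drops those whose `sourceLocation.file` is `reflector.go`. If you have already exported a broader set of entries as JSON (for example with `gcloud logging read` and `--format=json`), the same two conditions can be applied locally. This sketch assumes each entry is a dict carrying the LogEntry fields `severity` and `sourceLocation`; `filter_entries` and `load_entries` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List

# Numeric LogSeverity values from the Cloud Logging API.
SEVERITY_RANK = {
    "DEFAULT": 0, "DEBUG": 100, "INFO": 200, "NOTICE": 300,
    "WARNING": 400, "ERROR": 500, "CRITICAL": 600, "ALERT": 700,
    "EMERGENCY": 800,
}


def load_entries(path: str) -> List[Dict[str, Any]]:
    """Load entries saved as a JSON array, e.g. from gcloud with --format=json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def filter_entries(entries: List[Dict[str, Any]],
                   min_severity: str = "WARNING",
                   excluded_file: str = "reflector.go") -> List[Dict[str, Any]]:
    """Keep entries with severity >= min_severity and a source file != excluded_file."""
    threshold = SEVERITY_RANK[min_severity]
    kept = []
    for entry in entries:
        # A missing or unknown severity is treated as DEFAULT (0).
        rank = SEVERITY_RANK.get(entry.get("severity", "DEFAULT"), 0)
        if rank < threshold:
            continue
        # Assumption: entries without a sourceLocation are kept.
        source_file = (entry.get("sourceLocation") or {}).get("file")
        if source_file == excluded_file:
            continue
        kept.append(entry)
    return kept
```

For example, `filter_entries(load_entries("entries.json"))` returns only the entries the second query would match, in their original order.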
Please don't post the output as a screenshot; a screenshot can't be searched or copied, which makes it much less useful.