GKE DNS resolution errors

7/25/2018

We use Kubernetes CronJobs on GKE (version 1.9) to run several periodic tasks. From the pods, we need to make several calls to external APIs outside our network. Often (but not always), these calls fail because of DNS resolution timeouts.

My current hypothesis is that the upstream DNS server for the service we are trying to contact is rate limiting us. We make many repeated DNS requests, either because the TTL on those records is too low or because the entries are evicted from the dnsmasq cache due to its small size.
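One way to check this hypothesis from inside a pod is to resolve the same hostname repeatedly and count failures and latencies. The following is only a minimal sketch: `probe_dns` and its parameters are hypothetical names introduced for illustration, and it assumes that resolution timeouts surface as `socket.gaierror`, which may not hold for every resolver setup.

```python
import socket
import time


def probe_dns(hostname, attempts=20, pause=0.5, resolver=socket.getaddrinfo):
    """Resolve hostname repeatedly.

    Returns (failures, latencies), where latencies holds the duration in
    seconds of each successful lookup. The resolver argument is injectable
    so the function can be exercised without network access.
    """
    failures = 0
    latencies = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Port 443 is an assumption (an HTTPS API); any port works here.
            resolver(hostname, 443)
        except socket.gaierror:
            # Assumption: DNS failures and timeouts are reported this way.
            failures += 1
        else:
            latencies.append(time.monotonic() - start)
        time.sleep(pause)
    return failures, latencies
```

Calling `probe_dns("api.example.com")` (a placeholder hostname) from a pod and from a machine outside the cluster, then comparing the failure counts, would give a rough indication of whether the problem is specific to the cluster's DNS path.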

I tried editing the kube-dns deployment to change the cache size and TTL arguments passed to the dnsmasq container, but the changes get reverted because the deployment is managed by GKE. Is there a way to persist these changes so that GKE does not overwrite them? Any other ideas for dealing with DNS issues on GKE, or on Kubernetes in general?

-- Ashu Pachauri
dns
google-kubernetes-engine
kubernetes

2 Answers

10/5/2018

I suggest using ExternalDNS pods. Like KubeDNS, ExternalDNS retrieves a list of resources (Services, Ingresses, etc.) from the Kubernetes API to determine a desired list of DNS records.

-- Alioua
Source: StackOverflow

8/21/2018

Not sure if all knobs are covered, but if you update the ConfigMap used by the deployment, you should be able to reconfigure KubeDNS on GKE. KubeDNS will use the ConfigMap when deploying new instances, so nuke the existing pods to redeploy them with the new config.
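As a rough sketch of what such a ConfigMap could look like, the snippet below builds a manifest for the `kube-dns` ConfigMap in the `kube-system` namespace. It only sets `upstreamNameservers` and `stubDomains`, which are keys kube-dns documents. Whether the dnsmasq cache size or TTL can be set this way is not confirmed here. The helper name `kube_dns_configmap` and the nameserver addresses are illustrative assumptions.

```python
import json


def kube_dns_configmap(upstream_nameservers, stub_domains=None):
    """Build a kube-dns ConfigMap manifest as a plain dict.

    kube-dns expects each data value to be a JSON-encoded string.
    """
    data = {"upstreamNameservers": json.dumps(upstream_nameservers)}
    if stub_domains:
        data["stubDomains"] = json.dumps(stub_domains)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "kube-dns", "namespace": "kube-system"},
        "data": data,
    }


# Example nameservers are placeholders; substitute the resolvers you want.
manifest = kube_dns_configmap(["8.8.8.8", "8.8.4.4"])
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Deleting the existing pods, for example with `kubectl delete pod -n kube-system -l k8s-app=kube-dns`, should then let the deployment recreate them with the new configuration.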

-- KarlKFI
Source: StackOverflow