We have a Kubernetes deployment on Google Cloud Platform. Recently we hit one of the well-known kube-dns issues that shows up under a high volume of requests: https://github.com/kubernetes/kubernetes/issues/56903 (it's more related to SNAT/DNAT and conntrack, but the end result is that kube-dns stops serving).
After a few days of digging into the topic, we found that k8s already has a solution, which is currently in alpha (https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/).
The solution is to run a caching CoreDNS as a DaemonSet on each k8s node. So far, so good.
The problem is that after you create the DaemonSet, you have to tell kubelet to use it via the --cluster-dns option, and we can't find any way to do that in the GKE environment. Google bootstraps the cluster with the "configure-sh" script in the instance metadata. There is an option to edit the instance template and hardcode the required values, but that doesn't work if you upgrade the cluster or use horizontal autoscaling: all of the modified values will be lost. The last idea was to use a custom startup script that pulls the configuration and updates the metadata server, but that is too complicated.
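For context on what --cluster-dns controls: kubelet writes its value(s) as `nameserver` lines into /etc/resolv.conf of pods that use the default ClusterFirst DNS policy, so reading that file from a pod shows which DNS server the node actually handed out. A minimal Python sketch of that check follows; the IP in the sample is a placeholder, not a real cluster value.

```python
"""Read the DNS servers a pod actually received.

kubelet writes the value(s) of --cluster-dns as `nameserver` lines in the
/etc/resolv.conf of pods that use the default ClusterFirst DNS policy.
"""


def nameservers(resolv_conf: str) -> list[str]:
    """Return the nameserver IPs listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf.splitlines():
        # Drop comments (resolv.conf uses '#' or ';') and surrounding whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    # Example resolv.conf of a ClusterFirst pod; the IP is a placeholder.
    sample = """\
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
"""
    print(nameservers(sample))  # ['10.0.0.10']
```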
You can spin up another kube-dns deployment, e.g. in a different node pool, and thus have two nameserver entries in the pod's resolv.conf.
This would mitigate evictions and other failures, and generally lets you fully control your kube-dns service across the whole cluster.
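The answer doesn't say how the second server gets into resolv.conf. One way, assuming you control the pod spec, is `dnsPolicy: "None"` with an explicit `dnsConfig` listing both Service IPs. The sketch below builds such a Pod manifest in Python; both IPs, the pod name and the image are hypothetical, and the search path mimics what ClusterFirst would generate for the given namespace and cluster domain.

```python
import json

# Hypothetical ClusterIPs: the stock kube-dns Service and a second kube-dns
# Service you deployed yourself (e.g. backed by a separate node pool).
PRIMARY_DNS = "10.0.0.10"
SECONDARY_DNS = "10.0.0.53"


def pod_with_two_dns(name: str, image: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> dict:
    """Build a Pod manifest whose resolv.conf lists both kube-dns Services."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Keep the container running so you can `kubectl exec` into it.
            "containers": [{"name": name, "image": image,
                            "command": ["sleep", "3600"]}],
            # "None" means kubelet ignores --cluster-dns and uses dnsConfig only.
            "dnsPolicy": "None",
            "dnsConfig": {
                # Kubernetes accepts at most 3 nameservers here.
                "nameservers": [PRIMARY_DNS, SECONDARY_DNS],
                # Recreate the search path ClusterFirst would normally set up.
                "searches": [
                    f"{namespace}.svc.{cluster_domain}",
                    f"svc.{cluster_domain}",
                    cluster_domain,
                ],
                "options": [{"name": "ndots", "value": "5"}],
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(pod_with_two_dns("dns-test", "busybox:1.36"), indent=2))
```

Keep in mind that with two nameserver lines, glibc's resolver queries them in order and only falls back to the second after the first times out (unless `options rotate` is set), so this adds redundancy rather than spreading load.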
In addition to what was mentioned in this answer: with beta support on GKE, the node-local caches now listen on the kube-dns service IP, so there is no need to change the kubelet flag.
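Since the node-local cache takes over the kube-dns Service IP, pods should keep exactly the nameserver they had before. A quick sanity check, sketched in Python under the assumption that kubectl is configured for the cluster (`my-pod` is a hypothetical pod name): compare the pod's resolv.conf against the ClusterIP of the kube-dns Service. This only confirms the kubelet setting is unchanged; it doesn't prove that queries are answered by the local cache.

```python
import subprocess


def kube_dns_cluster_ip() -> str:
    """Return the ClusterIP of the kube-dns Service (needs kubectl access)."""
    return subprocess.run(
        ["kubectl", "get", "svc", "kube-dns", "-n", "kube-system",
         "-o", "jsonpath={.spec.clusterIP}"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def uses_service_ip(resolv_conf: str, service_ip: str) -> bool:
    """True if service_ip appears as a nameserver in the given resolv.conf text."""
    servers = [
        fields[1]
        for fields in (line.split() for line in resolv_conf.splitlines())
        if len(fields) >= 2 and fields[0] == "nameserver"
    ]
    return service_ip in servers


if __name__ == "__main__":
    # Hypothetical pod name; any running ClusterFirst pod with `cat` will do.
    conf = subprocess.run(
        ["kubectl", "exec", "my-pod", "--", "cat", "/etc/resolv.conf"],
        check=True, capture_output=True, text=True,
    ).stdout
    print(uses_service_ip(conf, kube_dns_cluster_ip()))
```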
As of 2019-12-10, GKE supports this through the gcloud CLI in beta:
Kubernetes Engine
- Promoted NodeLocalDNS Addon to beta. Use --addons=NodeLocalDNS with gcloud beta container clusters create. This addon can be enabled or disabled on existing clusters using --update-addons=NodeLocalDNS=ENABLED or --update-addons=NodeLocalDNS=DISABLED with gcloud container clusters update.
See https://cloud.google.com/sdk/docs/release-notes#27300_2019-12-10
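To script the commands from the release note, a small Python helper can assemble the argument lists. The flags are exactly the ones quoted above; the cluster name and the optional --zone value are placeholders.

```python
import shlex
from typing import Optional


def create_cmd(cluster: str, zone: Optional[str] = None) -> list[str]:
    """argv for creating a cluster with NodeLocalDNS (beta command track)."""
    cmd = ["gcloud", "beta", "container", "clusters", "create", cluster,
           "--addons=NodeLocalDNS"]
    if zone:
        cmd.append(f"--zone={zone}")
    return cmd


def toggle_cmd(cluster: str, enabled: bool,
               zone: Optional[str] = None) -> list[str]:
    """argv for enabling or disabling NodeLocalDNS on an existing cluster."""
    state = "ENABLED" if enabled else "DISABLED"
    cmd = ["gcloud", "container", "clusters", "update", cluster,
           f"--update-addons=NodeLocalDNS={state}"]
    if zone:
        cmd.append(f"--zone={zone}")
    return cmd


if __name__ == "__main__":
    # Placeholder cluster name and zone.
    print(shlex.join(toggle_cmd("my-cluster", True, zone="us-central1-a")))
    # gcloud container clusters update my-cluster --update-addons=NodeLocalDNS=ENABLED --zone=us-central1-a
```

Pass the list to `subprocess.run(cmd, check=True)` to execute it; building argv lists rather than shell strings avoids quoting problems.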