New to GCloud and GKE and having a frustrating time with DNS.
We have a VPN between our office and a Shared VPC in GCloud. The existing firewall rules seem to work fine: we can ping both ways and SSH into the Google side successfully.
So now from within GKE, we need to be able to resolve hostnames across the VPN using DNS. Should be simple.
I edited the kube-dns ConfigMap and added our internal domain using stubDomains pointing at our two DNS servers. After the kube-dns pods were redeployed, I verified in their logs that they pick up the new stubDomains section. However, I still can't resolve any hosts, even from the kube-dns containers themselves.
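For reference, the ConfigMap now looks roughly like this (trimmed to the relevant data key; the stubDomains value is exactly what shows up in the dnsmasq container below):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"internal.domain.com": ["10.85.128.5", "10.85.128.6"]}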
While logged into the dnsmasq container:
/etc/k8s/dns/dnsmasq-nanny # cat stubDomains
{"internal.domain.com": ["10.85.128.5", "10.85.128.6"]}
/ # nslookup google.com
nslookup: can't resolve '(null)': Name does not resolve
Name: google.com
Address 1: 108.177.9.138 ox-in-f138.1e100.net
Address 2: 108.177.9.101 ox-in-f101.1e100.net
Address 3: 108.177.9.139 ox-in-f139.1e100.net
Address 4: 108.177.9.100 ox-in-f100.1e100.net
Address 5: 108.177.9.102 ox-in-f102.1e100.net
Address 6: 108.177.9.113 ox-in-f113.1e100.net
Address 7: 2607:f8b0:4003:c13::71 ox-in-x71.1e100.net
/etc/k8s/dns/dnsmasq-nanny # cd /
/ # nslookup rancher.internal.domain.com
nslookup: can't resolve '(null)': Name does not resolve
nslookup: can't resolve 'rancher.internal.domain.com': Name does not resolve
nslookup: can't resolve 'rancher.internal.domain.com': Name does not resolve
/ # nslookup rancher.internal.domain.com 10.85.128.5
Server: 10.85.128.5
Address 1: 10.85.128.5
nslookup: can't resolve 'rancher.internal.domain.com': Name does not resolve
Now, as far as I can tell, egress from GCP is allowed by default to any destination.
But just in case, I added an egress rule to allow TCP/UDP 53 to the DNS servers. No luck either.
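For completeness, the egress rule I added was along these lines (the rule and network names here are placeholders, not our real ones):

gcloud compute firewall-rules create allow-dns-egress \
    --network=shared-vpc-network \
    --direction=EGRESS \
    --action=ALLOW \
    --rules=tcp:53,udp:53 \
    --destination-ranges=10.85.128.5/32,10.85.128.6/32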
Any thoughts?
To recap our discussion in the comments for the greater public:
You can use the kube-dns ConfigMap to add stubDomains, which your pods will then use for name resolution. Once the ConfigMap is changed, the kube-dns pods need to be recreated for the change to take effect. Any pod using the default DNS settings (dnsPolicy: ClusterFirst is the default) will resolve through kube-dns.
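On GKE that amounts to something like the following (assuming the standard kube-dns resources in kube-system):

kubectl -n kube-system edit configmap kube-dns
# force the kube-dns pods to be recreated so they pick up the new stubDomains
kubectl -n kube-system delete pods -l k8s-app=kube-dns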
Pods using dnsPolicy: Default (which resolves against the node's resolv.conf) will ignore the stubDomains configured in the ConfigMap. For those, we need to update the nodes' resolv.conf file instead.
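To illustrate the difference, here is a minimal pod spec showing the two policies (the pod name and image are just examples):

apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  dnsPolicy: ClusterFirst    # resolves via kube-dns, so stubDomains apply
  # dnsPolicy: Default       # resolves via the node's resolv.conf; stubDomains are ignored
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]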
There are two things to note here: 1) the resolv.conf file on every GCE VM (GKE nodes included) is overwritten by the metadata server whenever the DHCP lease is renewed, and 2) there is no way to programmatically append DNS entries during cluster creation.
To address this, use a DaemonSet that runs a startup script to append the additional nameservers to each node's resolv.conf file and then makes the file immutable, so the metadata server cannot revert it.
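A rough sketch of such a DaemonSet, using the nameserver IPs from the question; the image, resource names, and use of chattr for immutability are assumptions, not a tested manifest:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: add-dns-resolvers
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: add-dns-resolvers
  template:
    metadata:
      labels:
        app: add-dns-resolvers
    spec:
      containers:
      - name: fixer
        image: alpine:3.9
        securityContext:
          privileged: true   # needed to modify and chattr the host file
        command:
        - /bin/sh
        - -c
        - |
          # append the extra nameservers once, then lock the file so the
          # metadata server cannot overwrite it on DHCP lease renewal
          grep -q 'nameserver 10.85.128.5' /host/etc/resolv.conf || \
            printf 'nameserver 10.85.128.5\nnameserver 10.85.128.6\n' >> /host/etc/resolv.conf
          chattr +i /host/etc/resolv.conf || true
          while true; do sleep 3600; done
        volumeMounts:
        - name: etc
          mountPath: /host/etc
      volumes:
      - name: etc
        hostPath:
          path: /etc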
This is a guess, since we don't have your GKE cluster configuration, but I've run into something similar before and I suspect you didn't configure IP aliasing: https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips
A little explanation: VPC peering is not transitive, so you can't reach one peered network through another. That means that if you are in a VPC, you can't reach a managed service over a shared connection to another project or to your office (through a VPN IPsec tunnel, I assume). Since GKE is a managed service, by default it lives inside a Google-managed network and opens a peering to your project, so a lot of things won't work (Prometheus for monitoring, or DNS resolution, because the cluster won't know how to reach your other network).
IP aliasing addresses this by creating the cluster inside your project's own network (VPC-native), so the cluster is reachable in the same IP ranges as the rest of your project and VPC peering works as expected.
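For example, a VPC-native cluster can be created with something like this (the cluster, network, and secondary range names are placeholders for your own):

gcloud container clusters create my-cluster \
    --enable-ip-alias \
    --network=shared-vpc-network \
    --subnetwork=gke-subnet \
    --cluster-secondary-range-name=pods \
    --services-secondary-range-name=services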
Hope it solves your problem.