The first thing I will say is that I have this same install working in AWS with the same configuration files for the apiserver, controller-manager, scheduler, kubelet, and kube-proxy.
/usr/bin/kubelet \
--require-kubeconfig \
--allow-privileged=true \
--cluster-dns=10.32.0.10 \
--container-runtime=docker \
--docker=unix:///var/run/docker.sock \
--network-plugin=kubenet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--serialize-image-pulls=true \
--cgroup-root=/ \
--system-container=/system \
--node-status-update-frequency=4s \
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
--v=2
/usr/bin/kube-proxy \
--master=https://10.240.0.6:6443 \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--proxy-mode=iptables \
--v=2
Log in to any of the pods on any node and run:
nslookup kubernetes 10.32.0.10
Server: 10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes': Try again
nslookup kubernetes.default.svc.cluster.local. 10.32.0.10
Server: 10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default.svc.cluster.local.
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local
So I figured out that on Azure, the resolv.conf looked like this:
; generated by /usr/sbin/dhclient-script
search ssnci0siiuyebf1tqq5j1a1cyd.bx.internal.cloudapp.net
nameserver 10.32.0.10
options ndots:5
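The failure above follows from how the resolver expands short names. A simplified sketch of glibc-style search-list expansion (assumption: this models only the search/ndots ordering, not timeouts or server rotation) shows why the bare name `kubernetes` never becomes the right FQDN with that resolv.conf:

```python
# Simplified model of glibc resolver search-list expansion.
# A name with fewer dots than "ndots" is tried with each search
# domain appended before being tried literally.

def candidate_names(name, search_domains, ndots=5):
    """Return the FQDNs tried, in order, for a given query name."""
    if name.endswith("."):          # absolute name: tried as-is only
        return [name]
    candidates = []
    if name.count(".") >= ndots:    # "enough" dots: literal name first
        candidates.append(name + ".")
    candidates += [f"{name}.{d}." for d in search_domains]
    if name.count(".") < ndots:     # few dots: literal name tried last
        candidates.append(name + ".")
    return candidates

# Azure resolv.conf from the question: only the cloudapp.net search domain.
azure_search = ["ssnci0siiuyebf1tqq5j1a1cyd.bx.internal.cloudapp.net"]
print(candidate_names("kubernetes", azure_search))
# No candidate is kubernetes.default.svc.cluster.local., so the lookup fails.

# With the cluster search domains present, the first candidate is correct.
fixed_search = ["default.svc.cluster.local", "svc.cluster.local",
                "cluster.local"] + azure_search
print(candidate_names("kubernetes", fixed_search)[0])
```

With `ndots:5`, the zero-dot name `kubernetes` only gets the cloudapp.net suffix appended, which is why the fully qualified `kubernetes.default.svc.cluster.local.` works while the short name does not.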
If I added the search domains default.svc.cluster.local, svc.cluster.local, and cluster.local, everything started working, and I understand why.
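For reference, the manually patched resolv.conf described above would look something like this (a sketch: the search domains are the ones just listed, and the cloudapp.net domain is the Azure-generated one from the question):

```
; generated by /usr/sbin/dhclient-script
search default.svc.cluster.local svc.cluster.local cluster.local ssnci0siiuyebf1tqq5j1a1cyd.bx.internal.cloudapp.net
nameserver 10.32.0.10
options ndots:5
```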
However, this is problematic, because for every namespace I create I would need to manage resolv.conf by hand.
This does not happen when I deploy in Amazon, so I am stumped as to why it is happening in Azure.
The kubelet has a command-line flag, --cluster-domain, which it looks like you're missing. See the docs.
Add --cluster-domain=cluster.local to your kubelet startup command, and it should start working as expected.
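Applying that to the kubelet invocation from the question, the command line would become (a sketch: every flag except the added --cluster-domain is copied unchanged from the question):

```
/usr/bin/kubelet \
--require-kubeconfig \
--allow-privileged=true \
--cluster-dns=10.32.0.10 \
--cluster-domain=cluster.local \
--container-runtime=docker \
--docker=unix:///var/run/docker.sock \
--network-plugin=kubenet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--serialize-image-pulls=true \
--cgroup-root=/ \
--system-container=/system \
--node-status-update-frequency=4s \
--tls-cert-file=/var/lib/kubernetes/kubernetes.pem \
--tls-private-key-file=/var/lib/kubernetes/kubernetes-key.pem \
--v=2
```

With --cluster-domain set, the kubelet writes the cluster search domains (default.svc.cluster.local, svc.cluster.local, cluster.local) into each pod's resolv.conf itself, so there is no per-namespace resolv.conf to manage.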