After deploying a Kubernetes cluster using Kargo, I found that the kubedns pod is not working properly:
$ kcsys get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
dnsmasq-alv8k 1/1 Running 2 1d 10.233.86.2 kubemaster
dnsmasq-c9y52 1/1 Running 2 1d 10.233.82.2 kubeminion1
dnsmasq-sjouh 1/1 Running 2 1d 10.233.76.6 kubeminion2
kubedns-hxaj7 2/3 CrashLoopBackOff 339 22h 10.233.76.3 kubeminion2
PS: kcsys is an alias for kubectl --namespace=kube-system.
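For anyone copying the commands, that is just a plain shell alias, along these lines:

alias kcsys='kubectl --namespace=kube-system'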
The logs for each container (kubedns, dnsmasq) seem OK, except for the healthz container, which shows the following:
2017/03/01 07:24:32 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local' error exit status 1
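The same lookups the probe runs can be reproduced by hand inside the pod (all containers in the pod share one network namespace, and the healthz image ships nslookup since that is what the probe calls), to see whether dnsmasq on port 53 or kubedns on port 10053 is the one failing:

kcsys exec kubedns-hxaj7 -c healthz -- nslookup kubernetes.default.svc.cluster.local 127.0.0.1
kcsys exec kubedns-hxaj7 -c healthz -- nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053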
Update
kubedns RC description:
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2017-02-28T08:31:57Z
  generation: 1
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    version: v19
  name: kubedns
  namespace: kube-system
  resourceVersion: "130982"
  selfLink: /api/v1/namespaces/kube-system/replicationcontrollers/kubedns
  uid: 5dc9f9f2-fd90-11e6-850d-005056a020b4
spec:
  replicas: 1
  selector:
    k8s-app: kubedns
    version: v19
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        version: v19
    spec:
      containers:
      - args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --v=2
        image: gcr.io/google_containers/kubedns-amd64:1.9
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kubedns
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - --log-facility=-
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
        imagePullPolicy: IfNotPresent
        name: dnsmasq
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - -port=8080
        - -quiet
        image: gcr.io/google_containers/exechealthz-amd64:1.1
        imagePullPolicy: IfNotPresent
        name: healthz
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
      dnsPolicy: Default
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1
kubedns svc description:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-02-28T08:31:58Z
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: kubedns
  name: kubedns
  namespace: kube-system
  resourceVersion: "10736"
  selfLink: /api/v1/namespaces/kube-system/services/kubedns
  uid: 5ed4dd78-fd90-11e6-850d-005056a020b4
spec:
  clusterIP: 10.233.0.3
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kubedns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
I caught some errors in the kubedns container:
1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.233.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
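These timeouts mean the pod cannot open a TCP connection to the API server through the cluster Service IP. A quick sanity check (a sketch only; it assumes curl is installed on the node and wget with TLS support is available in the kubedns image) is to try the same address from the node hosting the pod and from inside the pod:

# on kubeminion2, the node running the pod
curl -k https://10.233.0.1:443/healthz
# inside the pod's network namespace
kcsys exec kubedns-hxaj7 -c kubedns -- wget -qO- --no-check-certificate https://10.233.0.1/

Even an HTTP error response here proves connectivity; a hang reproduces the i/o timeout.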
Can you take a look at the output of ps auxf | grep dockerd?
Kargo is adding the setting iptables=false to the docker daemon. As far as I can see, this causes issues with container-to-host networking, because connecting to 10.233.0.1:443 has to follow the iptables rules that forward the request to one of the master nodes' API servers.
The other kubernetes services have their networking bound to the host, so they do not experience the issue.
I'm not sure if this is the root issue, however removing iptables=false from the docker daemon settings fixed all the issues we were experiencing. Docker's iptables integration is not disabled by default, and it does not need to be disabled to use network overlays like flannel.
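To see whether those forwarding rules are actually in place on a node, you can inspect the NAT table that kube-proxy maintains (this assumes kube-proxy runs in the default iptables mode):

iptables -t nat -L KUBE-SERVICES -n | grep 10.233.0.1

You should see a rule matching the cluster IP 10.233.0.1:443 that jumps to a KUBE-SVC-... chain pointing at the real API server address(es).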
Removing the iptables option for the docker daemon can be done in /etc/systemd/system/docker.service.d/docker-options.conf, which should look something like this:
[root@k8s-joy-g2eqd2 ~]# cat /etc/systemd/system/docker.service.d/docker-options.conf
[Service]
Environment="DOCKER_OPTS=--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --iptables=false"
Once this is updated, you can run systemctl daemon-reload to register the change and then systemctl restart docker.
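In practice that means, on every node (a sketch; the sed assumes --iptables=false appears exactly as in the file above):

sed -i 's/ --iptables=false//' /etc/systemd/system/docker.service.d/docker-options.conf
systemctl daemon-reload
systemctl restart docker
ps auxf | grep dockerd    # confirm --iptables=false is gone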
This will allow you to test whether it fixes your issue. Once you can confirm this is the fix, you can override the docker_options variable in the kargo deployment to exclude that flag:
docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"
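Where exactly you set this depends on how you run kargo; a common place is the inventory's group_vars file (the path below is only an example, adjust it to your checkout):

# e.g. inventory/group_vars/all.yml
docker_options: "--insecure-registry=10.233.0.0/18 --graph=/var/lib/docker"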
According to the error you posted, kubedns cannot communicate with the API server:
dial tcp 10.233.0.1:443: i/o timeout
This can mean three things:

1. Your network fabric for containers is not configured properly.

2. You have a problem with your kube-proxy, and the network traffic is not forwarded to the API server when using the kubernetes internal Service (10.233.0.1). Check the kube-proxy logs on your nodes (kubeminion{1,2}) and update your question with any error you may find.

3. If you are also seeing authentication errors: kube-controller-manager does not produce valid Service Account tokens. Check that the --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set to a valid key/cert and restart the service. Then delete the default-token-xxxx secret in the kube-system namespace and recreate the kube-dns Deployment (a ReplicationController in your case).

One way to run these checks is sketched below.
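A rough sketch of these checks; the exact names (the token suffix, the pod names) have to come from your own cluster, and whether you use journalctl or kubectl logs depends on whether the components run under systemd or as pods:

# 2. kube-proxy logs on kubeminion1 / kubeminion2
journalctl -u kube-proxy --no-pager | tail -n 50   # if kube-proxy is a systemd service
kcsys get pods | grep kube-proxy                    # if it runs as a pod instead, then:
kcsys logs <kube-proxy-pod-name>

# 3. verify the controller-manager flags on the master
ps aux | grep [k]ube-controller-manager   # look for --service-account-private-key-file and --root-ca-file

# recreate the token and the kube-dns pod (the ReplicationController recreates the pod)
kcsys get secrets                          # find the actual default-token-xxxx name
kcsys delete secret default-token-xxxx
kcsys delete pod kubedns-hxaj7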