Why does DNS name resolution fail for elasticsearch-master-headless on Kubernetes 1.16?

3/15/2020

I'm trying to get Elasticsearch running on Kubernetes 1.16 with Helm 3 on GKE. I'm aware that neither Kubernetes 1.16 nor Helm 3 is supported yet; I want to prepare a PR to make the chart compatible. I'm using the Helm charts from https://github.com/elastic/helm-charts.

If I use the original chart 7.6.1, pod creation fails with create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master failed error: pods "elasticsearch-master-0" is forbidden: unable to validate against any pod security policy: [spec.volumes[1]: Invalid value: "projected": projected volumes are not allowed to be used]. Therefore I created the following patch:

diff --git a/elasticsearch/values.yaml b/elasticsearch/values.yaml
index 053c020..fd9c37b 100755
--- a/elasticsearch/values.yaml
+++ b/elasticsearch/values.yaml
@@ -107,6 +107,7 @@ podSecurityPolicy:
       - secret
       - configMap
       - persistentVolumeClaim
+      - projected

 persistence:
   enabled: true

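For completeness, here's a rough way to check that the patched PodSecurityPolicy is really the one applied to the pods and that it now allows projected volumes (the PSP name elasticsearch-master is an assumption based on the chart's release name and may differ):

# assumed PSP name from the release name; show allowed volume types and the PSP recorded on the pod
kubectl get psp elasticsearch-master -o jsonpath='{.spec.volumes}'
kubectl get pod elasticsearch-master-0 -o jsonpath='{.metadata.annotations.kubernetes\.io/psp}'
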
With this patch applied on master/d9ccb5a and on tag 7.6.1 (I tried both), the pods quickly become unhealthy due to failed to resolve host [elasticsearch-master-headless], caused by a java.net.UnknownHostException: elasticsearch-master-headless.

I don't understand why the name resolution doesn't work, as there's no change introduced in 1.16 which affects resolution of Kubernetes service names, as far as I know. If I try to ping elasticsearch-master-headless from a shell in the pod (opened with kubectl exec), I can't reach it either.
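
For reference, the lookup failure can also be reproduced without Elasticsearch by querying the name from inside the pod. This is a sketch that assumes getent is available in the image (nslookup/dig are not necessarily installed) and that the release runs in the default namespace, as the resolv.conf below suggests:

# resolve the headless service by its short name and by its fully qualified name
kubectl exec -it elasticsearch-master-0 -- getent hosts elasticsearch-master-headless
kubectl exec -it elasticsearch-master-0 -- getent hosts elasticsearch-master-headless.default.svc.cluster.local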

I tried to contact the nameserver from /etc/resolv.conf with telnet because it allows specifying a port:

[elasticsearch@elasticsearch-master-1 ~]$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local us-central1-a.c.myproject.internal c.myproject.internal google.internal
nameserver 10.23.240.10
options ndots:5
[elasticsearch@elasticsearch-master-1 ~]$ telnet 10.23.240.10
Trying 10.23.240.10...
^C
[elasticsearch@elasticsearch-master-1 ~]$ telnet 10.23.240.10 53
Trying 10.23.240.10...
telnet: connect to address 10.23.240.10: Connection refused

I obfuscated the project ID with myproject.
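
The connection refused on port 53 suggests that nothing is answering behind the cluster DNS service IP. As a sketch (assuming GKE's default kube-dns service in kube-system), one can check whether any endpoints actually back that IP:

# 10.23.240.10 should be the ClusterIP of this service; the endpoints list should not be empty
kubectl -n kube-system get service kube-dns
kubectl -n kube-system get endpoints kube-dns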

The patch is already proposed to be merged upstream together with other changes at https://github.com/elastic/helm-charts/pull/496.

-- Karl Richter
elasticsearch
google-kubernetes-engine
kubernetes
kubernetes-1.16
kubernetes-helm

2 Answers

3/15/2020

Chances are that there is a firewall (firewalld) blocking 53/udp and 53/tcp, or that there is an issue with the CoreDNS pod in the cluster where you are performing the test.
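
A couple of commands that may help narrow this down (a sketch; the label k8s-app=kube-dns matches both CoreDNS and GKE's kube-dns deployments, and the gcloud call assumes GKE):

# check the DNS pods' status and recent logs, then review the VPC firewall rules
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --all-containers --tail=50
gcloud compute firewall-rules list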

-- Arghya Sadhu
Source: StackOverflow

3/15/2020

This is caused by the kube-dns pod crashing due to

F0315 20:01:02.464839 1 server.go:61] Failed to create a kubernetes client: open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied

Since Kubernetes 1.16 is only available in GKE's rapid channel and kube-dns is a system pod, I consider this a bug.
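
For anyone hitting the same thing, the crash and the log line above can be surfaced with something like this (a sketch assuming GKE's kube-dns deployment; the container name kubedns and the pod name placeholder may differ):

# find the crash-looping DNS pod, then read the previous (crashed) container's logs
kubectl -n kube-system describe pod -l k8s-app=kube-dns
kubectl -n kube-system logs <kube-dns-pod-name> -c kubedns --previous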

I'll update this answer if I find the energy to file a bug.

-- Karl Richter
Source: StackOverflow