Kubernetes pods don't see each other

6/4/2019

I deployed an Elasticsearch cluster on a Kubernetes cluster using the Helm chart from here, with the following Helm command:

helm install stable/elasticsearch --name crv-elasticsearch  --set data.persistence.storageClass=nfs-client,data.storage=10Gi --set master.persistence.storageClass=nfs-client --set cluster.name=k8s-elk

Three pods were created, but when I look at their logs I find errors:

[o.e.d.z.ZenDiscovery     ] [crv-elasticsearch-master-0] not enough master nodes discovered during pinging (found [[Candidate{node={crv-elasticsearch-master-0}{4pQmoRkoTK28uWahaOo6Xw}{Bl_5yXubSQCld9eQ0zykgw}{10.233.67.55}{10.233.67.55:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
[2019-06-04T16:12:16,206][WARN ][o.e.d.z.UnicastZenPing   ] [crv-elasticsearch-master-0] failed to resolve host [crv-elasticsearch-discovery]
java.net.UnknownHostException: crv-elasticsearch-discovery

It seems that the Elasticsearch pods don't see each other.

My K8s cluster is deployed on top of VMware vSphere.

-- Dina Bogdan
kubernetes
kubernetes-helm
vmware

1 Answer

6/5/2019

OK, I've found the answer to my problem. It's not related to Elasticsearch or Helm, but to Kubernetes and Flannel.

I deployed a Kubernetes cluster composed of 6 VMs: 3 masters and 3 nodes. The VMs were created with VMware. The Kubernetes cluster was then provisioned with Kubespray, with Flannel as the Kubernetes network implementation.

Flannel uses UDP port 8472 by default for its VXLAN traffic; in the Kubespray Ansible playbook this is the flannel_backend_port property. VMware also uses port 8472 for VXLAN, so the two conflict. You must change flannel_backend_port from 8472 to another explicit port and then re-run the Kubespray Ansible playbook, or apply the change directly with kubectl apply -f.
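A minimal sketch of the port change, assuming the variable sits on its own line in a YAML vars file. The function name set_flannel_port, the temporary file standing in for the real Kubespray vars file, and the port 8475 are all made up for illustration; only flannel_backend_port comes from the setup described above.

```shell
#!/bin/sh
# set_flannel_port FILE PORT: rewrite the flannel_backend_port line in FILE.
# (Hypothetical helper, not part of Kubespray.)
set_flannel_port() {
  # -i.bak edits the file in place and keeps a backup copy as FILE.bak
  sed -i.bak "s/^flannel_backend_port:.*/flannel_backend_port: $2/" "$1"
}

# Demo on a throwaway file standing in for the real Kubespray vars file.
vars_file="$(mktemp)"
printf 'flannel_backend_port: 8472\n' > "$vars_file"

set_flannel_port "$vars_file" 8475   # 8475 is an arbitrary example port
cat "$vars_file"

# Afterwards, re-run the Kubespray playbook against your own inventory, e.g.:
#   ansible-playbook -i <your inventory> cluster.yml
```

Whatever port you pick, it has to be open for UDP between all nodes, otherwise the overlay network will still not pass traffic.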

This was the real problem in my case. Keep in mind that in my context the cause was VMware, so if you are not using VMware you will quite possibly not face this problem.
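To see which port Flannel's VXLAN interface actually uses on a node, you can read the dstport field from the interface details. This is only a sketch: the helper name extract_vxlan_dstport is made up, the sample line is an illustrative stand-in in the shape of iproute2 output rather than output from my cluster, and the interface name flannel.1 is an assumption about a default Flannel VXLAN setup.

```shell
#!/bin/sh
# extract_vxlan_dstport: read `ip -d link show` output on stdin and print
# the number that follows "dstport". (Hypothetical helper name.)
extract_vxlan_dstport() {
  sed -n 's/.*dstport \([0-9][0-9]*\).*/\1/p'
}

# Illustrative sample line, not real output from any cluster.
sample='    vxlan id 1 local 10.0.0.1 dev eth0 srcport 0 0 dstport 8472 nolearning'

port="$(printf '%s\n' "$sample" | extract_vxlan_dstport)"
if [ "$port" = "8472" ]; then
  echo "VXLAN interface still uses the default port 8472"
fi

# On a real node (assuming the interface is named flannel.1):
#   ip -d link show flannel.1 | extract_vxlan_dstport
```

If the node reports 8472 and it runs on VMware, the conflict described above is a likely suspect.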

-- Dina Bogdan
Source: StackOverflow