Deploy Elasticsearch for Apache Spark on Kubernetes

10/27/2016

I'm wondering if anyone has experience configuring the Elasticsearch for Hadoop library (elasticsearch-hadoop) on a Kubernetes cluster. I'm running into node discovery timeouts when trying to write from Spark to Elasticsearch. Elasticsearch itself is up and running thanks to the elasticsearch-cloud-kubernetes plugin for ES, which handles discovery, but I'm not sure how best to configure elasticsearch-hadoop so that it is aware of the nodes (pods) within the Kubernetes cluster. I've tried setting spark.es.nodes to an es-client service, but that doesn't seem to work. I'm also aware that I could enable es.nodes.wan.only, but as noted in the documentation, this would severely impact performance, which defeats the purpose of running both on the same cluster. Any help would be appreciated.

-- Aaron Duke
apache-spark
elasticsearch
elasticsearch-hadoop
hadoop
kubernetes

1 Answer

12/1/2016

I'm not that familiar with elasticsearch-hadoop, but have you tried pointing it at your Elasticsearch service instead of at specific nodes? Your master nodes will normally take care of everything in your ES cluster.
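
To make that suggestion concrete, here is a minimal sketch of the Spark properties involved. It only builds the key/value pairs and does not need Spark installed. The Service name "es-client" comes from the question; the namespace "default", the "svc.cluster.local" DNS suffix, port 9200, and the helper name es_spark_conf are all assumptions for illustration, so adjust them to your cluster. It is not guaranteed to fix the discovery timeout.

```python
# Sketch only: builds Spark properties that point elasticsearch-hadoop
# at a Kubernetes Service instead of individual Elasticsearch pods.
# ASSUMPTIONS (not confirmed by the question): namespace "default",
# the default "svc.cluster.local" DNS suffix, and HTTP port 9200.

def es_spark_conf(service="es-client", namespace="default", port=9200, wan_only=False):
    """Return elasticsearch-hadoop settings using the 'spark.' prefix."""
    # In-cluster DNS name of the Service (hypothetical values by default).
    host = f"{service}.{namespace}.svc.cluster.local"
    return {
        "spark.es.nodes": host,
        "spark.es.port": str(port),
        # Left off by default, since the question wants to avoid WAN-only mode.
        "spark.es.nodes.wan.only": str(wan_only).lower(),
    }

conf = es_spark_conf()

# Print the settings in the form accepted by spark-submit.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

The printed lines can be appended to a spark-submit command, or the dictionary can be passed to a SparkConf before the context is created. Whether the Spark executors can then reach the addresses Elasticsearch publishes still depends on the pod network, so treat this as a starting point rather than a verified fix.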

-- jonas kint
Source: StackOverflow