I'm currently using Solr Cloud 6.1, the following behavior can also be observed until 7.0.
I'm trying to create a Solr collection with 5 shards and a replication factor of 2. I have 5 physical servers. Normally, this would distribute all 10 replicas evenly among the available servers.
But, when starting Solr Cloud with a -h
(hostname) param to give every Solr instance an individual, but constant hostname, this doesn't work any more. The distribution then looks like this:
solr-0:
wikipedia_shard1_replica1 wikipedia_shard2_replica1 wikipedia_shard3_replica2 wikipedia_shard4_replica1 wikipedia_shard4_replica2
solr-1:
solr-2:
wikipedia_shard3_replica1 wikipedia_shard5_replica1 wikipedia_shard5_replica2
solr-3:
wikipedia_shard1_replica2
solr-4:
wikipedia_shard2_replica2
I tried using Rule-based Replica Placement, but the rules seem to be ignored.
I need to use hostnames, because Solr runs in a Kubernetes cluster, where IP adresses change frequently and Solr won't find it's cores after a container restart. I first suspected a newer Solr version to be the cause of this, but I narrowed it down to the hostname problem.
Is there any solution for this?
The solution was actually quite simple (but not really documented):
When creating a Service
in OpenShift/Kubernetes, all matching Pods get backed by a load balancer. When all Solr instances get assigned an unique hostname, this hostnames would all resolve to one single IP address (that of the load balancer).
Solr somehow can't deal with that and fails to distribute its shards evenly.
The solution is to use headless services from Kubernetes. Headless services aren't backed by a load balancer and therefore every hostname resolves to an unique IP address.