Hazelcast split-brain

9/18/2018

I'm using Hazelcast (3.7.4) on OpenShift. Each application starts its own HazelcastInstance.

The network discovery is done via hazelcast-kubernetes (1.1.0).

Sometimes when I deploy the whole application, the cluster gets stuck in a split-brain state forever. It never recovers and never reconnects into a single cluster.

I have to restart pods to get a single cluster to form again.

Can someone help me prevent the split-brain, or at least make the cluster recover from it afterwards?

Thanks

-- JohnD
hazelcast
kubernetes
openshift

1 Answer

9/19/2018

Use a StatefulSet instead of a Deployment (or ReplicationController). The pods then start one by one, which prevents the split-brain issue. You can have a look at the official OpenShift Code Sample for Hazelcast, or specifically at the OpenShift template for Hazelcast.

What's more, try the latest Hazelcast version. I think it should re-form the cluster even if you use a Deployment and the cluster starts in a split-brain state.
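
If the cluster is expected to re-form by itself, how soon it tries is controlled by two Hazelcast properties, `hazelcast.merge.first.run.delay.seconds` and `hazelcast.merge.next.run.delay.seconds`. The sketch below only builds those settings with the Java standard library; the class and method names are hypothetical, and whether the merge actually succeeds with hazelcast-kubernetes 1.1.0 on 3.7.4 is an assumption that would need to be verified.

```java
import java.util.Properties;

/**
 * Hypothetical helper (not part of Hazelcast): collects the properties
 * that control how often members look for other clusters to merge with.
 */
final class SplitBrainMergeSettings {
    // Property names as documented for Hazelcast 3.x.
    static final String FIRST_RUN_DELAY = "hazelcast.merge.first.run.delay.seconds";
    static final String NEXT_RUN_DELAY = "hazelcast.merge.next.run.delay.seconds";

    private SplitBrainMergeSettings() {
    }

    /**
     * Builds a Properties object holding the two merge delays.
     * Both values are in seconds and must be positive.
     */
    static Properties mergeProperties(int firstRunDelaySeconds, int nextRunDelaySeconds) {
        if (firstRunDelaySeconds <= 0 || nextRunDelaySeconds <= 0) {
            throw new IllegalArgumentException("merge delays must be positive");
        }
        Properties props = new Properties();
        props.setProperty(FIRST_RUN_DELAY, Integer.toString(firstRunDelaySeconds));
        props.setProperty(NEXT_RUN_DELAY, Integer.toString(nextRunDelaySeconds));
        return props;
    }
}
```

With Hazelcast on the classpath, the result could be applied with `new Config().setProperties(props)` before calling `Hazelcast.newHazelcastInstance(config)`, or the same keys could be passed as `-D` system properties. The values you choose (for example 30 and 15) are illustrative, not recommendations.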

-- Rafał Leszko
Source: StackOverflow