Split-brain discovery in a Hazelcast cluster on Kubernetes

11/20/2018

I have the following setup.

My Vert.x verticles are clustered with Hazelcast and deployed on a Kubernetes cluster with the following network info:

------------------------------------------------
           TCP/IP NETWORK INFORMATION
------------------------------------------------
IP Entered = ..................: 10.60.0.0
CIDR = ........................: /14
Netmask = .....................: 255.252.0.0
Netmask (hex) = ...............: 0xfffc0000
Wildcard Bits = ...............: 0.3.255.255
------------------------------------------------
Network Address = .............: 10.60.0.0
Broadcast Address = ...........: 10.63.255.255
Usable IP Addresses = .........: 262,142
First Usable IP Address = .....: 10.60.0.1
Last Usable IP Address = ......: 10.63.255.254

Hazelcast's cluster.xml has the following section:

<join>
  <multicast enabled="true">
    <multicast-group>224.2.2.3</multicast-group>
    <multicast-port>54327</multicast-port>
  </multicast>
</join>

Everything seems fine. When I start the verticles in pods, I get the following output (abbreviated):

>kubectl get pods --namespace develop -o wide

READY   STATUS    RESTARTS   AGE   IP        
1/1     Running   0          52m   10.60.4.18
1/1     Running   0          4m    10.60.6.19
1/1     Running   0          4m    10.60.1.16
1/1     Running   0          4m    10.60.1.18
1/1     Running   0          4m    10.60.6.18  
1/1     Running   0          4m    10.60.1.17
1/1     Running   0          4m    10.60.4.23
1/1     Running   0          8m    10.60.6.17
1/1     Running   0          4m    10.60.4.22
1/1     Running   0          4m    10.60.4.21
1/1     Running   0          4m    10.60.6.20
1/1     Running   0          5d    10.60.4.9 

The problem is that the clusters are grouped not by the specified group name, but by the third octet of the IP address. So I get a cluster of:

                      masterAddress=[10.60.1.17]:5701
                      Members[
                              [10.60.1.17]:5701
                              [10.60.1.16]:5701
                              [10.60.1.18]:5701]]

then 5 members for "cluster" 10.60.4.*, 4 members for 10.60.6.*, and so on, and these clusters do not merge.

What am I missing?

TIA

-- injecteer
hazelcast
kubernetes
multicast
vert.x

1 Answer

11/23/2018

Hazelcast has a dedicated plugin for discovery in Kubernetes; please check hazelcast-kubernetes.

Multicast may or may not work, since it depends on whether the underlying network forwards multicast traffic. In my experience on GKE, sometimes it works and sometimes it doesn't. That is why multicast-based discovery is never recommended for Kubernetes.
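The grouping by the third octet is consistent with this: on GKE each node typically gets its own /24 pod range (here apparently 10.60.1.0/24, 10.60.4.0/24 and 10.60.6.0/24), so if multicast packets only reach pods on the same node, each node forms its own cluster. Below is a minimal cluster.xml sketch for switching to the plugin, assuming Hazelcast 3.x with the hazelcast-kubernetes jar on the classpath. MY-SERVICE-NAME is a hypothetical Kubernetes Service selecting the verticle pods, and the namespace is taken from the kubectl command in the question. The plugin reads endpoints through the Kubernetes API, so the pods' service account likely needs RBAC permission for that; check the plugin README for the exact, version-specific setup.

```xml
<!-- Fragment of cluster.xml (assumes Hazelcast 3.x); other sections stay as they are -->
<properties>
  <!-- Enables the Discovery SPI that the Kubernetes plugin plugs into -->
  <property name="hazelcast.discovery.enabled">true</property>
</properties>

<network>
  <join>
    <!-- Replaces the multicast join from the question -->
    <multicast enabled="false"/>
    <tcp-ip enabled="false"/>
    <discovery-strategies>
      <discovery-strategy enabled="true"
          class="com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy">
        <properties>
          <!-- Hypothetical Service name; members are looked up via its endpoints -->
          <property name="service-name">MY-SERVICE-NAME</property>
          <!-- Namespace used in the kubectl command above -->
          <property name="namespace">develop</property>
        </properties>
      </discovery-strategy>
    </discovery-strategies>
  </join>
</network>
```

Because members are found through the Kubernetes API rather than multicast packets, pods on different nodes should join the same cluster. The plugin also offers a DNS lookup mode based on a headless Service (the service-dns property), which avoids the API permissions.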


-- Rafał Leszko
Source: StackOverflow