Infinispan cluster nodes only see themselves and not the other instances running on Kubernetes nodes

2/11/2019

I am trying to set up an Infinispan cache in my application, which runs on several nodes on Google Cloud Platform with Kubernetes and Docker.

Each of these caches should share its data with the caches on the other nodes so that they all have the same data available.

My problem is that the JGroups configuration doesn't seem to work the way I want, and the nodes don't see any of their siblings.

I have tried several configurations, but the nodes always see only themselves and do not form a cluster with the others.

I've tried some configurations from GitHub examples such as https://github.com/jgroups-extras/jgroups-kubernetes and https://github.com/infinispan/infinispan-simple-tutorials.

Here is my jgroups.xml:

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">

    <TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"
         bind_port="${jgroups.tcp.port:7800}"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         send_buf_size="640k"
         sock_conn_timeout="300"
         bundler_type="no-bundler"
         logical_addr_cache_expiration="360000"

         thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"
         thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
         thread_pool.keep_alive_time="60000"
    />
    <org.jgroups.protocols.kubernetes.KUBE_PING
        port_range="1"
        namespace="${KUBERNETES_NAMESPACE:myGoogleCloudPlatformNamespace}"
    />
    <MERGE3 min_interval="10000"
            max_interval="30000"
    />
    <FD_SOCK />
    <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
    <FD_ALL timeout="10000"
            interval="2000"
            timeout_check_interval="1000"
    />
    <VERIFY_SUSPECT timeout="1000"/>
    <pbcast.NAKACK2 xmit_interval="100"
                    xmit_table_num_rows="50"
                    xmit_table_msgs_per_row="1024"
                    xmit_table_max_compaction_time="30000"
                    resend_last_seqno="true"
    />
    <UNICAST3 xmit_interval="100"
              xmit_table_num_rows="50"
              xmit_table_msgs_per_row="1024"
              xmit_table_max_compaction_time="30000"
    />
    <pbcast.STABLE stability_delay="500"
                   desired_avg_gossip="5000"
                   max_bytes="1M"
    />
    <pbcast.GMS print_local_addr="false"
                join_timeout="${jgroups.join_timeout:5000}"
    />
    <MFC max_credits="2m"
         min_threshold="0.40"
    />
    <FRAG3 frag_size="8000"/>
</config>

And here is how I initialize the Infinispan cache (Kotlin):

import org.infinispan.configuration.cache.CacheMode
import org.infinispan.configuration.cache.ConfigurationBuilder
import org.infinispan.configuration.global.GlobalConfigurationBuilder
import org.infinispan.manager.DefaultCacheManager
import java.util.concurrent.TimeUnit

class MyCache<V : Any>(private val cacheName: String) {

    companion object {
        private var cacheManager = DefaultCacheManager(
            GlobalConfigurationBuilder()
                .transport().defaultTransport()
                .addProperty("configurationFile", "jgroups.xml")
                .build()
        )
    }

    private val backingCache = buildCache()

    // CacheKey and log are defined elsewhere in the application
    private fun buildCache(): org.infinispan.Cache<CacheKey, V> {
        val cacheConfiguration = ConfigurationBuilder()
            .expiration().lifespan(8, TimeUnit.HOURS)
            .clustering().cacheMode(CacheMode.REPL_ASYNC)
            .build()
        cacheManager.defineConfiguration(this.cacheName, cacheConfiguration)
        log.info("Started cache with name $cacheName. Found cluster members are ${cacheManager.clusterMembers}")
        return cacheManager.getCache(this.cacheName)
    }
}

Here is what the log says:

INFO  o.i.r.t.jgroups.JGroupsTransport - ISPN000078: Starting JGroups channel ISPN
INFO  o.j.protocols.kubernetes.KUBE_PING - namespace myNamespace set; clustering enabled
INFO  org.infinispan.CLUSTER - ISPN000094: Received new cluster view for channel ISPN: [myNamespace-7d878d4c7b-cks6n-57621|0] (1) [myNamespace-7d878d4c7b-cks6n-57621]
INFO  o.i.r.t.jgroups.JGroupsTransport - ISPN000079: Channel ISPN local address is myNamespace-7d878d4c7b-cks6n-57621, physical addresses are [127.0.0.1:7800]

I expect that on startup a new node finds the already existing ones and gets the data from them.

Currently, on startup every node sees only itself and nothing is shared.

-- HuMa
google-cloud-platform
infinispan
jgroups
kubernetes

2 Answers

2/12/2019

Usually the first thing to do when you need help with JGroups/Infinispan is to enable trace-level logging.
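For example, with Log4j 2 as the logging backend (an assumption; adapt this to whatever backend you actually use), a fragment like this in log4j2.xml turns it on:

<Loggers>
    <!-- TRACE shows discovery requests/responses and membership changes -->
    <Logger name="org.jgroups" level="TRACE"/>
    <Logger name="org.infinispan" level="TRACE"/>
    <Root level="INFO">
        <AppenderRef ref="Console"/> <!-- assumes a Console appender is defined -->
    </Root>
</Loggers>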

The problem with KUBE_PING might be that the pod does not run under a proper service account, and therefore does not have the authorization token to access the Kubernetes master API. That's why the currently preferred way is to use DNS_PING and register a headless service. See this example.
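A minimal sketch of that approach, assuming JGroups 4.x; the service name myapp-ping, the pod label app: myapp, and the namespace mynamespace are placeholders. First the headless service:

apiVersion: v1
kind: Service
metadata:
  name: myapp-ping
spec:
  clusterIP: None                 # headless: DNS resolves straight to the pod IPs
  publishNotReadyAddresses: true  # lets members find each other before they are ready
  selector:
    app: myapp                    # must match the labels on your pods
  ports:
    - name: ping
      port: 7800
      protocol: TCP

Then, in jgroups.xml, replace KUBE_PING with DNS_PING pointed at that service:

<!-- instead of org.jgroups.protocols.kubernetes.KUBE_PING -->
<dns.DNS_PING dns_query="myapp-ping.mynamespace.svc.cluster.local"/>

Because DNS_PING discovers members through plain DNS lookups, it needs no service account token and no access to the Kubernetes API.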

-- Radim Vansa
Source: StackOverflow

2/13/2019

Also, bind_addr is set to 127.0.0.1. This means members on different hosts won't be able to find each other. I suggest setting bind_addr, e.g. <TCP bind_addr="site_local" .../>.
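Applied to the configuration above, only the default value has to change; keeping the system-property override is an assumption. site_local tells JGroups to pick the first site-local (10.x.x.x / 172.16.x.x / 192.168.x.x) interface, which is typically where the pod IP lives:

<!-- remaining TCP attributes stay unchanged -->
<TCP bind_addr="${jgroups.tcp.address:site_local}"
     bind_port="${jgroups.tcp.port:7800}"
     ... />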

See [1] for details.

[1] http://www.jgroups.org/manual4/index.html#Transport

-- Bela Ban
Source: StackOverflow