ActiveMQ Artemis cluster does not redistribute messages after one instance crash

5/21/2021

I have an Artemis cluster in Kubernetes with 3 master/slave groups:

activemq-artemis-master-0                               1/1     Running
activemq-artemis-master-1                               1/1     Running
activemq-artemis-master-2                               1/1     Running
activemq-artemis-slave-0                                0/1     Running
activemq-artemis-slave-1                                0/1     Running
activemq-artemis-slave-2                                0/1     Running

I am using a Spring Boot JmsListener to consume messages sent to a wildcard queue, as follows:

    import java.util.concurrent.TimeUnit;

    import lombok.extern.log4j.Log4j2;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.jms.annotation.JmsListener;
    import org.springframework.messaging.handler.annotation.Header;
    import org.springframework.stereotype.Component;

    @Component
    @Log4j2
    public class QueueListener {
      @Autowired
      private ListenerControl listenerControl;

      @JmsListener(id = "queueListener0", destination = "QUEUE.service2.*.*.*.notification")
      public void add(String message, @Header("sentBy") String sentBy,
                      @Header("sentFrom") String sentFrom, @Header("sentAt") Long sentAt)
          throws InterruptedException {
        log.info("---QUEUE[notification]:  message={}, sentBy={}, sentFrom={}, sentAt={}",
                 message, sentBy, sentFrom, sentAt);
        // Hold each message for a configurable time to simulate slow processing.
        TimeUnit.MILLISECONDS.sleep(listenerControl.getDuration());
      }
    }

There were 20 messages sent to the queue, and master-1 was the delivering node. After 5 messages had been consumed, I killed the master-1 node to simulate a crash. I saw slave-1 start running and then yield back to master-1 after Kubernetes respawned it. The listener threw a JMSException indicating the connection was lost, and it tried to reconnect. Then I saw it successfully connect to master-0 (the queue was created and the consumer count was > 0). However, the queue on master-0 was empty, while the same queue on master-1 still had 15 messages and no consumer attached to it. I waited for a while, but the 15 messages were never delivered. I am not sure why redistribution did not kick in.

The attributes of the wildcard queue on master-1 looked like this when it came back online after the crash (I manually replaced the value of the accessToken field since it contains sensitive info):

Attribute	Value
Acknowledge attempts	0
Address	QUEUE.service2.*.*.*.notification
Configuration managed	false
Consumer count	0
Consumers before dispatch	0
Dead letter address	DLQ
Delay before dispatch	-1
Delivering count	0
Delivering size	0
Durable	true
Durable delivering count	0
Durable delivering size	0
Durable message count	15
Durable persistent size	47705
Durable scheduled count	0
Durable scheduled size	0
Enabled	true
Exclusive	false
Expiry address	ExpiryQueue
Filter	
First message age	523996
First message as json	[{"JMSType":"service2","address":"QUEUE.service2.tech-drive2.188100000059.thai.notification","messageID":68026,"sentAt":1621957145988,"accessToken":"REMOVED","type":3,"priority":4,"userID":"ID:56c7b509-bd6f-11eb-a348-de0dacf99072","_AMQ_GROUP_ID":"tech-drive2-188100000059-thai","sentBy":"user@email.com","durable":true,"JMSReplyTo":"queue://QUEUE.service2.tech-drive2.188100000059.thai.notification","__AMQ_CID":"e4469ea3-bd62-11eb-a348-de0dacf99072","sentFrom":"service2","originalDestination":"QUEUE.service2.tech-drive2.188100000059.thai.notification","_AMQ_ROUTING_TYPE":1,"JMSCorrelationID":"c329c733-1170-440a-9080-992a009d87a9","expiration":0,"timestamp":1621957145988}]
First message timestamp	1621957145988
Group buckets	-1
Group count	0
Group first key	
Group rebalance	false
Group rebalance pause dispatch	false
Id	119
Last value	false
Last value key	
Max consumers	-1
Message count	15
Messages acknowledged	0
Messages added	15
Messages expired	0
Messages killed	0
Name	QUEUE.service2.*.*.*.notification
Object Name	org.apache.activemq.artemis:broker="activemq-artemis-master-1",component=addresses,address="QUEUE.service2.\*.\*.\*.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.\*.\*.\*.notification"
Paused	false
Persistent size	47705
Prepared transaction message count	0
Purge on no consumers	false
Retroactive resource	false
Ring size	-1
Routing type	ANYCAST
Scheduled count	0
Scheduled size	0
Temporary	false
User	f7bcdaed-8c0c-4bb5-ad03-ec06382cb557

The attributes of the wildcard queue on master-0 looked like this:

Attribute	Value
Acknowledge attempts	0
Address	QUEUE.service2.*.*.*.notification
Configuration managed	false
Consumer count	3
Consumers before dispatch	0
Dead letter address	DLQ
Delay before dispatch	-1
Delivering count	0
Delivering size	0
Durable	true
Durable delivering count	0
Durable delivering size	0
Durable message count	0
Durable persistent size	0
Durable scheduled count	0
Durable scheduled size	0
Enabled	true
Exclusive	false
Expiry address	ExpiryQueue
Filter	
First message age	
First message as json	[{}]
First message timestamp	
Group buckets	-1
Group count	0
Group first key	
Group rebalance	false
Group rebalance pause dispatch	false
Id	119
Last value	false
Last value key	
Max consumers	-1
Message count	0
Messages acknowledged	0
Messages added	0
Messages expired	0
Messages killed	0
Name	QUEUE.service2.*.*.*.notification
Object Name	org.apache.activemq.artemis:broker="activemq-artemis-master-0",component=addresses,address="QUEUE.service2.\*.\*.\*.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.\*.\*.\*.notification"
Paused	false
Persistent size	0
Prepared transaction message count	0
Purge on no consumers	false
Retroactive resource	false
Ring size	-1
Routing type	ANYCAST
Scheduled count	0
Scheduled size	0
Temporary	false
User	f7bcdaed-8c0c-4bb5-ad03-ec06382cb557

The Artemis version in use is 2.17.0. Here is my cluster config from the master-0 broker.xml. The configs are the same for the other brokers except that the connector-ref is changed to match each broker:

<?xml version="1.0"?>
<configuration xmlns="urn:activemq" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
  <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
    <name>activemq-artemis-master-0</name>
    <persistence-enabled>true</persistence-enabled>
    <journal-type>ASYNCIO</journal-type>
    <paging-directory>data/paging</paging-directory>
    <bindings-directory>data/bindings</bindings-directory>
    <journal-directory>data/journal</journal-directory>
    <large-messages-directory>data/large-messages</large-messages-directory>
    <journal-datasync>true</journal-datasync>
    <journal-min-files>2</journal-min-files>
    <journal-pool-files>10</journal-pool-files>
    <journal-device-block-size>4096</journal-device-block-size>
    <journal-file-size>10M</journal-file-size>
    <journal-buffer-timeout>100000</journal-buffer-timeout>
    <journal-max-io>4096</journal-max-io>
    <disk-scan-period>5000</disk-scan-period>
    <max-disk-usage>90</max-disk-usage>
    <critical-analyzer>true</critical-analyzer>
    <critical-analyzer-timeout>120000</critical-analyzer-timeout>
    <critical-analyzer-check-period>60000</critical-analyzer-check-period>
    <critical-analyzer-policy>HALT</critical-analyzer-policy>
    <page-sync-timeout>2244000</page-sync-timeout>
    <acceptors>
      <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
      <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
      <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
      <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
      <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
    </acceptors>
    <security-settings>
      <security-setting match="#">
        <permission type="createNonDurableQueue" roles="amq"/>
        <permission type="deleteNonDurableQueue" roles="amq"/>
        <permission type="createDurableQueue" roles="amq"/>
        <permission type="deleteDurableQueue" roles="amq"/>
        <permission type="createAddress" roles="amq"/>
        <permission type="deleteAddress" roles="amq"/>
        <permission type="consume" roles="amq"/>
        <permission type="browse" roles="amq"/>
        <permission type="send" roles="amq"/>
        <permission type="manage" roles="amq"/>
      </security-setting>
    </security-settings>
    <address-settings>
      <address-setting match="activemq.management#">
        <dead-letter-address>DLQ</dead-letter-address>
        <expiry-address>ExpiryQueue</expiry-address>
        <redelivery-delay>0</redelivery-delay>
        <!-- with -1 only the global-max-size is in use for limiting -->
        <max-size-bytes>-1</max-size-bytes>
        <message-counter-history-day-limit>10</message-counter-history-day-limit>
        <address-full-policy>PAGE</address-full-policy>
        <auto-create-queues>true</auto-create-queues>
        <auto-create-addresses>true</auto-create-addresses>
        <auto-create-jms-queues>true</auto-create-jms-queues>
        <auto-create-jms-topics>true</auto-create-jms-topics>
      </address-setting>
      <address-setting match="#">
        <dead-letter-address>DLQ</dead-letter-address>
        <expiry-address>ExpiryQueue</expiry-address>
        <redistribution-delay>60000</redistribution-delay>
        <redelivery-delay>0</redelivery-delay>
        <max-size-bytes>-1</max-size-bytes>
        <message-counter-history-day-limit>10</message-counter-history-day-limit>
        <address-full-policy>PAGE</address-full-policy>
        <auto-create-queues>true</auto-create-queues>
        <auto-create-addresses>true</auto-create-addresses>
        <auto-create-jms-queues>true</auto-create-jms-queues>
        <auto-create-jms-topics>true</auto-create-jms-topics>
      </address-setting>
    </address-settings>
    <addresses>
      <address name="DLQ">
        <anycast>
          <queue name="DLQ"/>
        </anycast>
      </address>
      <address name="ExpiryQueue">
        <anycast>
          <queue name="ExpiryQueue"/>
        </anycast>
      </address>
    </addresses>
    <cluster-user>clusterUser</cluster-user>
    <cluster-password>aShortclusterPassword</cluster-password>
    <connectors>
      <connector name="activemq-artemis-master-0">tcp://activemq-artemis-master-0.activemq-artemis-master.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-slave-0">tcp://activemq-artemis-slave-0.activemq-artemis-slave.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-master-1">tcp://activemq-artemis-master-1.activemq-artemis-master.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-slave-1">tcp://activemq-artemis-slave-1.activemq-artemis-slave.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-master-2">tcp://activemq-artemis-master-2.activemq-artemis-master.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-slave-2">tcp://activemq-artemis-slave-2.activemq-artemis-slave.svc.cluster.local:61616</connector>
    </connectors>
    <cluster-connections>
      <cluster-connection name="activemq-artemis">
        <connector-ref>activemq-artemis-master-0</connector-ref>
        <retry-interval>500</retry-interval>
        <retry-interval-multiplier>1.1</retry-interval-multiplier>
        <max-retry-interval>5000</max-retry-interval>
        <initial-connect-attempts>-1</initial-connect-attempts>
        <reconnect-attempts>-1</reconnect-attempts>
        <message-load-balancing>ON_DEMAND</message-load-balancing>
        <max-hops>1</max-hops>
        <!-- scale-down>true</scale-down -->
        <static-connectors>
          <connector-ref>activemq-artemis-master-0</connector-ref>
          <connector-ref>activemq-artemis-slave-0</connector-ref>
          <connector-ref>activemq-artemis-master-1</connector-ref>
          <connector-ref>activemq-artemis-slave-1</connector-ref>
          <connector-ref>activemq-artemis-master-2</connector-ref>
          <connector-ref>activemq-artemis-slave-2</connector-ref>
        </static-connectors>
      </cluster-connection>
    </cluster-connections>
    <ha-policy>
      <replication>
        <master>
          <group-name>activemq-artemis-0</group-name>
          <quorum-vote-wait>12</quorum-vote-wait>
          <vote-on-replication-failure>true</vote-on-replication-failure>
          <!--we need this for auto failback-->
          <check-for-live-server>true</check-for-live-server>
        </master>
      </replication>
    </ha-policy>
  </core>
  <core xmlns="urn:activemq:core">
    <jmx-management-enabled>true</jmx-management-enabled>
  </core>
</configuration>

From another Stack Overflow answer, I understand that my high-availability topology is redundant, and I am planning to remove the slaves. However, I don't think the slaves are the reason message redistribution is not working. Is there a config I am missing to handle an Artemis node crash?

Update 1: As Justin suggested, I tried a cluster of 2 Artemis nodes without HA.

activemq-artemis-master-0                              1/1     Running            0          27m
activemq-artemis-master-1                              1/1     Running            0          74s

The following is the broker.xml of the 2 Artemis nodes. The only differences between them are the node name and the journal-buffer-timeout:

<?xml version="1.0"?>

<configuration xmlns="urn:activemq" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
  <core xmlns="urn:activemq:core" xsi:schemaLocation="urn:activemq:core ">
    <name>activemq-artemis-master-0</name>
    <persistence-enabled>true</persistence-enabled>
    <journal-type>ASYNCIO</journal-type>
    <paging-directory>data/paging</paging-directory>
    <bindings-directory>data/bindings</bindings-directory>
    <journal-directory>data/journal</journal-directory>
    <large-messages-directory>data/large-messages</large-messages-directory>
    <journal-datasync>true</journal-datasync>
    <journal-min-files>2</journal-min-files>
    <journal-pool-files>10</journal-pool-files>
    <journal-device-block-size>4096</journal-device-block-size>
    <journal-file-size>10M</journal-file-size>
    <journal-buffer-timeout>100000</journal-buffer-timeout>
    <journal-max-io>4096</journal-max-io>
    <disk-scan-period>5000</disk-scan-period>
    <max-disk-usage>90</max-disk-usage>
    <critical-analyzer>true</critical-analyzer>
    <critical-analyzer-timeout>120000</critical-analyzer-timeout>
    <critical-analyzer-check-period>60000</critical-analyzer-check-period>
    <critical-analyzer-policy>HALT</critical-analyzer-policy>
    <page-sync-timeout>2244000</page-sync-timeout>
    <acceptors>
      <acceptor name="artemis">tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true</acceptor>
      <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
      <acceptor name="stomp">tcp://0.0.0.0:61613?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
      <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
      <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
    </acceptors>
    <security-settings>
      <security-setting match="#">
        <permission type="createNonDurableQueue" roles="amq"/>
        <permission type="deleteNonDurableQueue" roles="amq"/>
        <permission type="createDurableQueue" roles="amq"/>
        <permission type="deleteDurableQueue" roles="amq"/>
        <permission type="createAddress" roles="amq"/>
        <permission type="deleteAddress" roles="amq"/>
        <permission type="consume" roles="amq"/>
        <permission type="browse" roles="amq"/>
        <permission type="send" roles="amq"/>
        <permission type="manage" roles="amq"/>
      </security-setting>
    </security-settings>
    <cluster-user>ClusterUser</cluster-user>
    <cluster-password>longClusterPassword</cluster-password>
    <connectors>
      <connector name="activemq-artemis-master-0">tcp://activemq-artemis-master-0.activemq-artemis-master.ncp-stack-testing.svc.cluster.local:61616</connector>
      <connector name="activemq-artemis-master-1">tcp://activemq-artemis-master-1.activemq-artemis-master.ncp-stack-testing.svc.cluster.local:61616</connector>
    </connectors>
    <cluster-connections>
      <cluster-connection name="activemq-artemis">
        <connector-ref>activemq-artemis-master-0</connector-ref>
        <retry-interval>500</retry-interval>
        <retry-interval-multiplier>1.1</retry-interval-multiplier>
        <max-retry-interval>5000</max-retry-interval>
        <initial-connect-attempts>-1</initial-connect-attempts>
        <reconnect-attempts>-1</reconnect-attempts>
        <use-duplicate-detection>true</use-duplicate-detection>
        <message-load-balancing>ON_DEMAND</message-load-balancing>
        <max-hops>1</max-hops>
        <static-connectors>
          <connector-ref>activemq-artemis-master-0</connector-ref>
          <connector-ref>activemq-artemis-master-1</connector-ref>
        </static-connectors>
      </cluster-connection>
    </cluster-connections>
    <address-settings>
      <address-setting match="activemq.management#">
        <dead-letter-address>DLQ</dead-letter-address>
        <expiry-address>ExpiryQueue</expiry-address>
        <redelivery-delay>0</redelivery-delay>
        <max-size-bytes>-1</max-size-bytes>
        <message-counter-history-day-limit>10</message-counter-history-day-limit>
        <address-full-policy>PAGE</address-full-policy>
        <auto-create-queues>true</auto-create-queues>
        <auto-create-addresses>true</auto-create-addresses>
        <auto-create-jms-queues>true</auto-create-jms-queues>
        <auto-create-jms-topics>true</auto-create-jms-topics>
      </address-setting>
      <address-setting match="#">
        <dead-letter-address>DLQ</dead-letter-address>
        <expiry-address>ExpiryQueue</expiry-address>
        <redistribution-delay>60000</redistribution-delay>
        <redelivery-delay>0</redelivery-delay>
        <max-size-bytes>-1</max-size-bytes>
        <message-counter-history-day-limit>10</message-counter-history-day-limit>
        <address-full-policy>PAGE</address-full-policy>
        <auto-create-queues>true</auto-create-queues>
        <auto-create-addresses>true</auto-create-addresses>
        <auto-create-jms-queues>true</auto-create-jms-queues>
        <auto-create-jms-topics>true</auto-create-jms-topics>
      </address-setting>
    </address-settings>
    <addresses>
      <address name="DLQ">
        <anycast>
          <queue name="DLQ"/>
        </anycast>
      </address>
      <address name="ExpiryQueue">
        <anycast>
          <queue name="ExpiryQueue"/>
        </anycast>
      </address>
    </addresses>
  </core>
  <core xmlns="urn:activemq:core">
    <jmx-management-enabled>true</jmx-management-enabled>
  </core>
</configuration>

With this setup I still got the same result: after the Artemis node crashed and came back, the leftover messages were not moved to the other node.

Update 2: I tried a non-wildcard queue as Justin suggested but still got the same behavior. One difference I noticed is that with the non-wildcard queue the consumer count is only 1, compared to 3 in the wildcard case. Here are the attributes of the old queue after the crash:

Attribute	Value
Acknowledge attempts	0
Address	QUEUE.service2.tech-drive2.188100000059.thai.notification
Configuration managed	false
Consumer count	0
Consumers before dispatch	0
Dead letter address	DLQ
Delay before dispatch	-1
Delivering count	0
Delivering size	0
Durable	true
Durable delivering count	0
Durable delivering size	0
Durable message count	15
Durable persistent size	102245
Durable scheduled count	0
Durable scheduled size	0
Enabled	true
Exclusive	false
Expiry address	ExpiryQueue
Filter	
First message age	840031
First message as json	[{"JMSType":"service2","address":"QUEUE.service2.tech-drive2.188100000059.thai.notification","messageID":8739,"sentAt":1621969900922,"accessToken":"DONOTDISPLAY","type":3,"priority":4,"userID":"ID:09502dc0-bd8d-11eb-b75c-c6609f1332c9","_AMQ_GROUP_ID":"tech-drive2-188100000059-thai","sentBy":"user@email.com","durable":true,"JMSReplyTo":"queue://QUEUE.service2.tech-drive2.188100000059.thai.notification","__AMQ_CID":"c292b418-bd8b-11eb-b75c-c6609f1332c9","sentFrom":"service2","originalDestination":"QUEUE.service2.tech-drive2.188100000059.thai.notification","_AMQ_ROUTING_TYPE":1,"JMSCorrelationID":"90b783d0-d9cc-4188-9c9e-3453786b2105","expiration":0,"timestamp":1621969900922}]
First message timestamp	1621969900922
Group buckets	-1
Group count	0
Group first key	
Group rebalance	false
Group rebalance pause dispatch	false
Id	606
Last value	false
Last value key	
Max consumers	-1
Message count	15
Messages acknowledged	0
Messages added	15
Messages expired	0
Messages killed	0
Name	QUEUE.service2.tech-drive2.188100000059.thai.notification
Object Name	org.apache.activemq.artemis:broker="activemq-artemis-master-0",component=addresses,address="QUEUE.service2.tech-drive2.188100000059.thai.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.tech-drive2.188100000059.thai.notification"
Paused	false
Persistent size	102245
Prepared transaction message count	0
Purge on no consumers	false
Retroactive resource	false
Ring size	-1
Routing type	ANYCAST
Scheduled count	0
Scheduled size	0
Temporary	false
User	6e25e08b-9587-40a3-b7e9-146360539258

and here are the attributes of the new queue:

Attribute	Value
Acknowledge attempts	0
Address	QUEUE.service2.tech-drive2.188100000059.thai.notification
Configuration managed	false
Consumer count	1
Consumers before dispatch	0
Dead letter address	DLQ
Delay before dispatch	-1
Delivering count	0
Delivering size	0
Durable	true
Durable delivering count	0
Durable delivering size	0
Durable message count	0
Durable persistent size	0
Durable scheduled count	0
Durable scheduled size	0
Enabled	true
Exclusive	false
Expiry address	ExpiryQueue
Filter	
First message age	
First message as json	[{}]
First message timestamp	
Group buckets	-1
Group count	0
Group first key	
Group rebalance	false
Group rebalance pause dispatch	false
Id	866
Last value	false
Last value key	
Max consumers	-1
Message count	0
Messages acknowledged	0
Messages added	0
Messages expired	0
Messages killed	0
Name	QUEUE.service2.tech-drive2.188100000059.thai.notification
Object Name	org.apache.activemq.artemis:broker="activemq-artemis-master-1",component=addresses,address="QUEUE.service2.tech-drive2.188100000059.thai.notification",subcomponent=queues,routing-type="anycast",queue="QUEUE.service2.tech-drive2.188100000059.thai.notification"
Paused	false
Persistent size	0
Prepared transaction message count	0
Purge on no consumers	false
Retroactive resource	false
Ring size	-1
Routing type	ANYCAST
Scheduled count	0
Scheduled size	0
Temporary	false
User	6e25e08b-9587-40a3-b7e9-146360539258
-- user3304122
activemq-artemis
artemiscloud
jms-queue
kubernetes

1 Answer

5/27/2021

I've taken your simplified configuration with just 2 nodes using a non-wildcard queue with a redistribution-delay of 0, and I reproduced the behavior you're seeing on my local machine (i.e. without Kubernetes). I believe I see why this happens, but to understand the current behavior you first must understand how redistribution works in the first place.

In a cluster, every time a consumer is created, the node on which the consumer is created notifies every other node in the cluster about it. If other nodes have messages in their corresponding queue but no consumers of their own, then those nodes redistribute their messages to the node with the consumer (assuming message-load-balancing is ON_DEMAND and redistribution-delay is >= 0).
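For reference, these are the two settings that gate redistribution, as they would appear in broker.xml (the redistribution-delay of 0 below is what I used in my reproduction; any value >= 0 enables redistribution):

<address-setting match="#">
  <redistribution-delay>0</redistribution-delay>
</address-setting>

<cluster-connection name="activemq-artemis">
  <message-load-balancing>ON_DEMAND</message-load-balancing>
  ...
</cluster-connection>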

In your case, however, the node holding the messages is actually down when the consumer is created on the other node, so it never receives the notification about the consumer. Therefore, once that node restarts, it doesn't know about the consumer and does not redistribute its messages.

I see you've opened ARTEMIS-3321 to enhance the broker to deal with this situation. However, that will take time to develop and release (assuming the change is approved). My recommendation in the meantime would be to configure client reconnection, which is discussed in the documentation, e.g.:

tcp://127.0.0.1:61616?reconnectAttempts=30

Given the default retryInterval of 2000 milliseconds, that gives the broker to which the client was originally connected 1 minute to come back up before the client gives up trying to reconnect and throws an exception, at which point the application can completely re-initialize its connection, as it does now.
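Spelling out both parameters (the values below simply restate the suggestion above and the default retryInterval), the URL would look like:

tcp://127.0.0.1:61616?reconnectAttempts=30&retryInterval=2000

i.e. 30 attempts at 2000 milliseconds each, or roughly 1 minute of reconnection before the exception is thrown.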

Since you're using Spring Boot, be sure to use version 2.5.0, as it contains this change, which allows you to specify the full broker URL rather than just a host and port.
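For example, assuming the default auto-configured connection factory, the full URL (including the reconnection parameters) can then be set in application.properties via the spring.artemis.broker-url property that Spring Boot 2.5.0 introduced:

spring.artemis.broker-url=tcp://127.0.0.1:61616?reconnectAttempts=30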

Lastly, keep in mind that shutting the node down gracefully will short-circuit the client's reconnect and trigger your application to re-initialize the connection, which is not what we want here. Be sure to kill the node ungracefully (e.g. using kill -9 <pid>).

-- Justin Bertram
Source: StackOverflow