I had the same problem as @Amir Soleimani, but the error output was a bit different. I tried all the solutions in that post, but none of them worked. I'm using Azure Kubernetes Service (AKS), and after upgrading from 1.13.xx to 1.18.xx I can't start RabbitMQ anymore.
UPDATED - Solution that worked for me (please consider this approach carefully, as it may affect your existing queues):
Remove the current rabbitmq StatefulSet, including its persistent disks.
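As a rough sketch of that cleanup, the script below only prints the kubectl commands so they can be reviewed before anything is deleted. It assumes the manifest shown in this post (StatefulSet `rabbitmq`, claim template `volume`, 3 replicas) and relies on the Kubernetes convention that claims created from volumeClaimTemplates are named `<template>-<statefulset>-<ordinal>`. The function names `pvc_names` and `print_cleanup_commands` are made up for illustration. Whether deleting a claim also deletes the underlying Azure disk depends on the storage class reclaim policy, so check that first.

```shell
#!/bin/sh
# Claims created from volumeClaimTemplates are named
# <template>-<statefulset>-<ordinal>, e.g. volume-rabbitmq-0.
pvc_names() {
  template=$1
  sts=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s-%s-%s\n' "$template" "$sts" "$i"
    i=$((i + 1))
  done
}

# Print the commands instead of running them (hypothetical helper):
# deleting the claims discards all queue data stored on them.
print_cleanup_commands() {
  echo "kubectl delete statefulset rabbitmq"
  for pvc in $(pvc_names volume rabbitmq 3); do
    echo "kubectl delete pvc $pvc"
  done
}

print_cleanup_commands
```

Once the printed commands look right, they can be run one by one, and the StatefulSet can then be re-applied from the manifest.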
========
Here is my StatefulSet file:
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-management
  labels:
    app: rabbitmq
spec:
  ports:
  - port: 80
    targetPort: 15672
    name: http
  selector:
    app: rabbitmq
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  ports:
  - port: 5672
    name: amqp
  - port: 4369
    name: epmd
  - port: 25672
    name: rabbitmq-dist
  clusterIP: None
  selector:
    app: rabbitmq
---
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-config
  namespace: default
type: Opaque
data:
  erlang.cookie: samplecookie==
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  serviceName: rabbitmq
  selector:
    matchLabels:
      app: rabbitmq
  replicas: 3
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      containers:
      - name: rabbitmq
        image: 'rabbitmq:3.6.6-management-alpine'
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - >
                if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
                  sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
                  cat /etc/resolv.conf.new > /etc/resolv.conf;
                  rm /etc/resolv.conf.new;
                fi;
                until rabbitmqctl node_health_check; do sleep 1; done;
                if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
                  rabbitmqctl stop_app;
                  rabbitmqctl join_cluster rabbit@rabbitmq-0;
                  rabbitmqctl start_app;
                fi;
                rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
        env:
        - name: RABBITMQ_ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              name: rabbitmq-config
              key: erlang.cookie
        - name: RABBITMQ_DEFAULT_USER
          value: username
        - name: RABBITMQ_DEFAULT_PASS
          value: password
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: amqp-management
        volumeMounts:
        - mountPath: /var/lib/rabbitmq
          name: volume
  volumeClaimTemplates:
  - metadata:
      name: volume
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
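To make the first step of the postStart hook easier to follow, here is a minimal sketch of what its sed substitution does to a resolv.conf search line. It reads from stdin instead of editing /etc/resolv.conf, the function name `add_rabbitmq_search_domain` is made up, and the sample file contents are an assumption (a typical pod in the default namespace), not taken from this cluster. The apparent intent is that short names such as rabbitmq-0 resolve through the headless `rabbitmq` Service once its domain is first in the search list.

```shell
#!/bin/sh
# Same substitution as the postStart hook, applied to stdin.
# [^ ][^ ]* is a portable spelling of the hook's [^ ]\+ .
add_rabbitmq_search_domain() {
  sed 's/^search \([^ ][^ ]*\)/search rabbitmq.\1 \1/'
}

# Assumed sample contents of a pod's /etc/resolv.conf (not from the post).
sample='nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5'

# Only the "search" line changes: the first domain is kept and a copy
# prefixed with "rabbitmq." is inserted in front of it.
printf '%s\n' "$sample" | add_rabbitmq_search_domain
```

With the sample input, the search line gains `rabbitmq.default.svc.cluster.local` as its first entry and the other lines pass through unchanged.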
Result of kubectl describe pod rabbitmq-0:
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-91@rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: unable to connect to node 'rabbit@rabbitmq-0': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-0']
rabbit@rabbitmq-0:
* connected to epmd (port 4369) on rabbitmq-0
* epmd reports: node 'rabbit' not running at all
no other nodes on rabbitmq-0
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-26@rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}
Error: rabbit application is not running on node rabbit@rabbitmq-0.
* Suggestion: start it with "rabbitmqctl start_app" and try again
, message: "Timeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nTimeout: 70.0 seconds ...\nChecking health of node 'rabbit@rabbitmq-0' ...\nError: unable to connect to node 'rabbit@rabbitmq-0': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq-0']\n\nrabbit@rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-91@rabbitmq-0'\n- home dir: /var/lib/rabbitmq\n- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\nError: unable to connect to node 'rabbit@rabbitmq-0': nodedown\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@rabbitmq-0']\n\nrabbit@rabbitmq-0:\n * connected to epmd (port 4369) on rabbitmq-0\n * epmd reports: node 'rabbit' not running at all\n no other nodes on rabbitmq-0\n * suggestion: start the node\n\ncurrent node details:\n- node name: 'rabbitmq-cli-26@rabbitmq-0'\n- home dir: /var/lib/rabbitmq\n- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==\n\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: 
{aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: {aborted,{no_exists,[rabbit_vhost,[{{vhost,'$1','_'},[],['$1']}]]}}\nError: rabbit application is not running on node rabbit@rabbitmq-0.\n * Suggestion: start it with \"rabbitmqctl start_app\" and try again\n"
Warning FailedPostStartHook 23m kubelet Exec lifecycle hook ([/bin/sh -c if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit@rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
]) for Container "rabbitmq" in Pod "rabbitmq-0_default(3ac91d73-de7b-4cde-81f6-c31bacd10252)" failed - error: command '/bin/sh -c if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
cat /etc/resolv.conf.new > /etc/resolv.conf;
rm /etc/resolv.conf.new;
fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
rabbitmqctl stop_app;
rabbitmqctl join_cluster rabbit@rabbitmq-0;
rabbitmqctl start_app;
fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
' exited with 137: Error: unable to connect to node 'rabbit@rabbitmq-0': nodedown
Result of kubectl logs rabbitmq-0:
=CRASH REPORT==== 18-Jul-2021::11:06:01 ===
crasher:
initial call: application_master:init/4
pid: <0.156.0>
registered_name: []
exception exit: {{timeout_waiting_for_tables,
[rabbit_user,rabbit_user_permission,rabbit_vhost,
rabbit_durable_route,rabbit_durable_exchange,
rabbit_runtime_parameters,rabbit_durable_queue]},
{rabbit,start,[normal,[]]}}
in function application_master:init/4 (application_master.erl, line 134)
ancestors: [<0.155.0>]
messages: [{'EXIT',<0.157.0>,normal}]
links: [<0.155.0>,<0.31.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 987
stack_size: 27
reductions: 98
neighbours:
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: rabbit
exited: {{timeout_waiting_for_tables,
[rabbit_user,rabbit_user_permission,rabbit_vhost,
rabbit_durable_route,rabbit_durable_exchange,
rabbit_runtime_parameters,rabbit_durable_queue]},
{rabbit,start,[normal,[]]}}
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: amqp_client
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: rabbit_common
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: xmerl
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: os_mon
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: inets
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: asn1
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: syntax_tools
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: mnesia
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: crypto
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: ranch
exited: stopped
type: temporary
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
application: compiler
exited: stopped
type: temporary
BOOT FAILED
===========
Timeout contacting cluster nodes: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2'].
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2']
rabbit@rabbitmq-1:
* unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain)
rabbit@rabbitmq-2:
* unable to connect to epmd (port 4369) on rabbitmq-2: nxdomain (non-existing domain)
current node details:
- node name: 'rabbit@rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
=INFO REPORT==== 18-Jul-2021::11:06:01 ===
Timeout contacting cluster nodes: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2'].
BACKGROUND
==========
This cluster node was shut down while other nodes were still running.
To avoid losing data, you should start the other nodes first, then
start this one. To force this node to start, first invoke
"rabbitmqctl force_boot". If you do so, any changes made on other
cluster nodes after this one was shut down may be lost.
DIAGNOSTICS
===========
attempted to contact: ['rabbit@rabbitmq-1','rabbit@rabbitmq-2']
rabbit@rabbitmq-1:
* unable to connect to epmd (port 4369) on rabbitmq-1: nxdomain (non-existing domain)
rabbit@rabbitmq-2:
* unable to connect to epmd (port 4369) on rabbitmq-2: nxdomain (non-existing domain)
current node details:
- node name: 'rabbit@rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: P1XNOe5pN3Ug2FCRFzH7Xg==
{"init terminating in do_boot",timeout_waiting_for_tables}
init terminating in do_boot (timeout_waiting_for_tables)
Crash dump is being written to: erl_crash.dump...
What I tried that didn't work:
rabbitmqctl stop_app
rabbitmqctl force_boot
Remove StatefulSet and re-install
Re-configure the yaml file
Please try a force boot in the postStart script:
...
fi;
if [[ "$HOSTNAME" == "rabbitmq-0" ]]; then
  rabbitmqctl stop_app;
  rabbitmqctl force_boot;
fi;
until rabbitmqctl node_health_check; do sleep 1; done; ...
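The branch above can be tried outside a pod with a small dry-run sketch. `RABBITMQCTL` and `force_boot_first_node` are hypothetical names introduced here for illustration: setting `RABBITMQCTL` to `echo rabbitmqctl` prints the commands instead of executing them, and the hostname is passed as an argument rather than read from `$HOSTNAME`.

```shell
#!/bin/sh
# Hypothetical override so the logic can be dry-run without RabbitMQ installed.
RABBITMQCTL=${RABBITMQCTL:-rabbitmqctl}

# Force-boot only the first pod of the StatefulSet, as in the snippet above.
force_boot_first_node() {
  # $1: pod hostname, e.g. rabbitmq-0
  if [ "$1" = "rabbitmq-0" ]; then
    $RABBITMQCTL stop_app
    $RABBITMQCTL force_boot
  fi
}

# Dry run: print what would be executed on each pod.
RABBITMQCTL="echo rabbitmqctl"
for host in rabbitmq-0 rabbitmq-1 rabbitmq-2; do
  force_boot_first_node "$host"
done
```

Only rabbitmq-0 produces output in the dry run. As the BACKGROUND section of the boot log warns, force_boot may lose changes made on other cluster nodes after this one was shut down.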