RabbitMQ nodes not able to discover each other and join cluster

5/1/2018

I'm new to RabbitMQ and am trying to set up a highly available queue using StatefulSets. The tutorial I followed is here

After deploying the StatefulSet and Service to Kubernetes, the nodes cannot discover each other in the cluster and the pod goes into status CrashLoopBackOff. It seems peer discovery is not working as expected, so the node cannot join the cluster.

My cluster nodes are rabbit@rabbitmq-0, rabbit@rabbitmq-1 and rabbit@rabbitmq-2

$ kubectl exec -it rabbitmq-0 /bin/sh

/ # rabbitmqctl status
Status of node 'rabbit@rabbitmq-0'
Error: unable to connect to node 'rabbit@rabbitmq-0': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@rabbitmq-0']

rabbit@rabbitmq-0:
  * connected to epmd (port 4369) on rabbitmq-0
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on rabbitmq-0
  * suggestion: start the node

current node details:
- node name: 'rabbitmq-cli-22@rabbitmq-0'
- home dir: /var/lib/rabbitmq
- cookie hash: 5X3n5Gy+r4FL+M53FHwv3w==

rabbitmq.conf

[
 { rabbit, [
  { loopback_users, [ ] },
  { tcp_listeners, [ 5672 ] },
  { ssl_listeners, [ ] },
  { hipe_compile, false },
  { cluster_nodes, { [ rabbit@rabbitmq-0, rabbit@rabbitmq-1, rabbit@rabbitmq-2], disc } },
  {ssl_listeners, [5671]},
  {ssl_options, [{cacertfile,"/etc/rabbitmq/ca_certificate.pem"},
    {certfile,"/etc/rabbitmq/server_certificate.pem"},
    {keyfile,"/etc/rabbitmq/server_key.pem"},
    {verify,verify_peer},
    {versions, ['tlsv1.2', 'tlsv1.1']},
    {fail_if_no_peer_cert,false}]}
] },
  { rabbitmq_management, [ { listener, [
  { port, 15672 },
  { ssl, false }
] } ] }
].

$ kubectl get statefulset rabbitmq -o yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: rabbitmq
  name: rabbitmq
  namespace: development
  resourceVersion: "119265565"
  selfLink: /apis/apps/v1/namespaces/development/statefulsets/rabbitmq
  uid: 10c2fabc-cbb3-11e7-8821-00505695519e
spec:
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rabbitmq
  serviceName: rabbitmq
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: rabbitmq
    spec:
      containers:
      - env:
        - name: RABBITMQ_ERLANG_COOKIE
          valueFrom:
            secretKeyRef:
              key: rabbitmq-erlang-cookie
              name: rabbitmq-erlang-cookie
        image: rabbitmq:1.0
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                if [ -z "$(grep rabbitmq /etc/resolv.conf)" ]; then
                  sed "s/^search \([^ ]\+\)/search rabbitmq.\1 \1/" /etc/resolv.conf > /etc/resolv.conf.new;
                  cat /etc/resolv.conf.new > /etc/resolv.conf;
                  rm /etc/resolv.conf.new;
                fi; until rabbitmqctl node_health_check; do sleep 1; done; if [[ "$HOSTNAME" != "rabbitmq-0" && -z "$(rabbitmqctl cluster_status | grep rabbitmq-0)" ]]; then
                  rabbitmqctl stop_app;
                  rabbitmqctl join_cluster rabbit@rabbitmq-0;
                  rabbitmqctl start_app;
                fi; rabbitmqctl set_policy ha-all "." '{"ha-mode":"exactly","ha-params":3,"ha-sync-mode":"automatic"}'
        name: rabbitmq
        ports:
        - containerPort: 5672
          protocol: TCP
        - containerPort: 5671
          protocol: TCP
        - containerPort: 15672
          protocol: TCP
        - containerPort: 25672
          protocol: TCP
        - containerPort: 4369
          protocol: TCP
        resources:
          limits:
            cpu: 400m
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/rabbitmq
          name: rabbitmq-persistent-data-storage
        - mountPath: /etc/rabbitmq
          name: rabbitmq-config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
      volumes:
      - name: rabbitmq-config
        secret:
          defaultMode: 420
          secretName: rabbitmq-config
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: rabbitmq-persistent-data-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    status:
      phase: Pending
status:
  currentReplicas: 1
  currentRevision: rabbitmq-4234207235
  observedGeneration: 1
  replicas: 1
  updateRevision: rabbitmq-4234207235

$ kubectl get service rabbitmq -o yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: rabbitmq
  name: rabbitmq
  namespace: develop
  resourceVersion: "59968950"
  selfLink: /api/v1/namespaces/develop/services/rabbitmq
  uid: ced85a60-cbae-11e7-8821-00505695519e
spec:
  clusterIP: None
  ports:
  - name: tls-amqp
    port: 5671
    protocol: TCP
    targetPort: 5671
  - name: management
    port: 15672
    protocol: TCP
    targetPort: 15672
  selector:
    app: rabbitmq
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

$ kubectl describe pod rabbitmq-0

Name:           rabbitmq-0
Namespace:      development
Node:           node9/170.XX.X.Xx
Labels:         app=rabbitmq
                controller-revision-hash=rabbitmq-4234207235
Status:         Running
IP:             10.25.128.XX
Controlled By:  StatefulSet/rabbitmq
Containers:
  rabbitmq:
    Container ID:   docker://f60b06283d3974382a068ded54782b24de4b6da3203c05772a77c65d76aa2e2f
    Image:          rabbitmq:1.0
    Image ID:       rabbitmq@sha256:6245a81a1fc0fb
    Ports:          5672/TCP, 5671/TCP, 15672/TCP, 25672/TCP, 4369/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
    Ready:          False
    Restart Count:  104
    Limits:
      cpu:     400m
      memory:  2Gi
    Requests:
      cpu:     200m
      memory:  1Gi
    Environment:
      RABBITMQ_ERLANG_COOKIE:  <set to the key 'rabbitmq-erlang-cookie' in secret 'rabbitmq-erlang-cookie'>  Optional: false
    Mounts:
      /etc/rabbitmq from rabbitmq-config (rw)
      /var/lib/rabbitmq from rabbitmq-persistent-data-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lqbp6 (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  rabbitmq-persistent-data-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-persistent-data-storage-rabbitmq-0
    ReadOnly:   false
  rabbitmq-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-config
    Optional:    false
  default-token-lqbp6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lqbp6
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     <none>
Events:          <none>
-- CloudJedi
kubernetes
rabbitmq
rabbitmqctl

1 Answer

9/23/2019

This problem is caused by failed DNS resolution inside the pod. The pods cannot contact each other because there are no valid DNS records for them.
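
For context, the postStart hook in the question rewrites the search line of /etc/resolv.conf so that short names such as rabbitmq-0 are looked up under the rabbitmq service subdomain. The sketch below shows what that sed expression does to one sample line. The sample search line is an assumption (a typical value for a pod in namespace development), and rewrite_search_line is a hypothetical helper name, not something from the question.

```shell
#!/bin/sh
# Apply the same rewrite as the postStart hook to a resolv.conf "search" line read from stdin.
# [^ ][^ ]* is the POSIX spelling of the hook's [^ ]\+ (one or more non-space characters).
rewrite_search_line() {
  sed 's/^search \([^ ][^ ]*\)/search rabbitmq.\1 \1/'
}

# Hypothetical search line for a pod in namespace "development" (assumed, not from the question).
sample='search development.svc.cluster.local svc.cluster.local cluster.local'

# Print the rewritten line: the first search domain is duplicated with a "rabbitmq." prefix.
printf '%s\n' "$sample" | rewrite_search_line
```

With the rabbitmq-prefixed domain first in the search list, a lookup for rabbitmq-0 is tried under that subdomain, and it should only succeed if a headless Service named rabbitmq exists in the pod's own namespace and selects the pod.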

To solve this, try creating an additional Service, or editing the existing one, so that it provides DNS records for the pods.

An additional Service for this purpose can be created as follows:

kind: Service
apiVersion: v1
metadata:
  namespace: default
  name: rabbitmq
  labels:
    app: rabbitmq
    type: Service
spec:
  ports:
  - name: http
    protocol: TCP
    port: 15672
    targetPort: 15672
  - name: amqp
    protocol: TCP
    port: 5672
    targetPort: 5672
  selector:
    app: rabbitmq
  type: ClusterIP
  clusterIP: None

Here the Service spec declares type ClusterIP with clusterIP set to None, which makes it a headless Service. This should help the pods resolve each other's DNS names.
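
To see which names a headless Service is expected to provide, here is a minimal sketch that prints the per-pod DNS names for the three replicas in the question. It assumes the default cluster domain cluster.local and takes the namespace development from the StatefulSet. pod_fqdn and CLUSTER_DOMAIN are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Print the DNS name Kubernetes gives one StatefulSet pod behind a headless Service.
# Arguments: StatefulSet name, pod ordinal, headless Service name, namespace.
# Assumption: the cluster domain is the default "cluster.local" unless CLUSTER_DOMAIN is set.
pod_fqdn() {
  printf '%s-%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "$4" "${CLUSTER_DOMAIN:-cluster.local}"
}

# Names for the three replicas from the question (namespace taken from the StatefulSet).
for i in 0 1 2; do
  pod_fqdn rabbitmq "$i" rabbitmq development
done
```

Each printed name can then be looked up from inside a pod, for example with kubectl exec rabbitmq-0 -- nslookup followed by the name, assuming nslookup is available in the image. Note that these records only exist when the Service is in the same namespace as the pods. The question lists the Service under develop and the StatefulSet under development, so that is worth checking if it is not just a transcription slip.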

Cheers!!

Rishabh

-- Rishabh Jain
Source: StackOverflow