RabbitMQ only shows one node

7/10/2019

I have been trying to set up RabbitMQ on a k8s cluster. I finally got everything set up, but only one node shows up in the management UI. Here are my steps:

1. Dockerfile Setup

I do this to enable autoclustering via the Kubernetes peer discovery plugin:

FROM rabbitmq:3.8-rc-management-alpine

LABEL maintainer="kevlai"

RUN rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
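
For reference, the image used in the StatefulSet below can be built and pushed like this (the registry host and tag are taken from the manifest in step 5; adjust them for your own registry):

# Build the image and push it to the private registry referenced by the StatefulSet
docker build -t docker.borecast.com/borecast-rabbitmq:v1.0.3 .
docker push docker.borecast.com/borecast-rabbitmq:v1.0.3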

2. Set up RBAC

apiVersion: v1
kind: ServiceAccount
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
rules:
  - apiGroups:
    - ""
    resources:
      - endpoints
    verbs:
      - get
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: borecast-rabbitmq
subjects:
- kind: ServiceAccount
  name: borecast-rabbitmq
  namespace: borecast-production
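
To sanity-check the binding, you can impersonate the service account and confirm it is allowed to read endpoints (a quick check, assuming your own kubectl context has sufficient rights):

kubectl auth can-i get endpoints \
  --namespace borecast-production \
  --as system:serviceaccount:borecast-production:borecast-rabbitmq
# Should print "yes"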

3. Set up Secrets

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-secret
  namespace: borecast-production
type: Opaque
data:
  username: a2V2
  password: Ym9yZWNhc3RydWx6
  secretCookie: c2VjcmV0Y29va2llaGVyZQ==
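
The data values must be base64-encoded. For reference, they can be produced like this (placeholder values shown; the -n flag matters, because a trailing newline would otherwise end up inside the credential):

echo -n 'myuser' | base64          # value for username
echo -n 'mypassword' | base64      # value for password
echo -n 'mysecretcookie' | base64  # value for secretCookie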

4. Set up StorageClass

I'm setting up a StorageClass so k8s will automatically provision EBS volumes for me on AWS.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rabbitmq-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-2a
reclaimPolicy: Retain
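
Once the StatefulSet in step 5 is created, each replica should get a dynamically provisioned EBS volume through this class. The claims follow the <template>-<statefulset>-<ordinal> naming convention and can be checked with:

kubectl get pvc -n borecast-production
# Expect rabbitmq-volume-borecast-rabbitmq-0 through -2, all Bound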

5. Set up StatefulSets and Services

You can see there are two services. The headless service is for the pods themselves, while the management service will be exposed to an Ingress controller so it is accessible from outside.

---
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-management-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  ports:
  - port: 15672
    targetPort: 15672
    name: http
  - port: 5672
    targetPort: 5672
    name: amqp
  selector:
    app: borecast-rabbitmq
---
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  clusterIP: None
  ports:
  - port: 5672
    name: amqp
  selector:
    app: borecast-rabbitmq
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
spec:
  serviceName: borecast-rabbitmq-service
  replicas: 3
  selector:
    matchLabels:
      app: borecast-rabbitmq
  template:
    metadata:
      labels:
        app: borecast-rabbitmq
    spec:
      serviceAccountName: borecast-rabbitmq
      containers:
      - image: docker.borecast.com/borecast-rabbitmq:v1.0.3
        name: borecast-rabbitmq
        imagePullPolicy: Always
        resources:
          requests:
            memory: "256Mi"
            cpu: "150m"
          limits:
            memory: "512Mi"
            cpu: "250m"
        ports:
        - containerPort: 5672
          name: amqp
        env:
          - name: RABBITMQ_DEFAULT_USER
            valueFrom:
              secretKeyRef:
                name: rabbitmq-secret
                key: username
          - name: RABBITMQ_DEFAULT_PASS
            valueFrom:
              secretKeyRef:
                name: rabbitmq-secret
                key: password
          - name: RABBITMQ_ERLANG_COOKIE
            valueFrom:
              secretKeyRef:
                name: rabbitmq-secret
                key: secretCookie
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: K8S_SERVICE_NAME
            # value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
            value: borecast-rabbitmq-service
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: RABBITMQ_NODENAME
            value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
            # value: rabbit@$(MY_POD_NAME).borecast-rabbitmq-service.borecast-production.svc.cluster.local
          - name: RABBITMQ_NODE_TYPE
            value: disc
          - name: AUTOCLUSTER_TYPE
            value: "k8s"
          - name: AUTOCLUSTER_DELAY
            value: "10"
          - name: AUTOCLUSTER_CLEANUP
            value: "true"
          - name: CLEANUP_WARN_ONLY
            value: "false"
          - name: K8S_ADDRESS_TYPE
            value: "hostname"
          - name: K8S_HOSTNAME_SUFFIX
            value: ".$(K8S_SERVICE_NAME)"
            # value: .borecast-rabbitmq-service.borecast-production.svc.cluster.local
        volumeMounts:
        - name: rabbitmq-volume
          mountPath: /var/lib/rabbitmq
      imagePullSecrets: 
        - name: regcred
  volumeClaimTemplates:
  - metadata:
      name: rabbitmq-volume
      namespace: borecast-production
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rabbitmq-sc
      resources:
        requests:
          storage: 5Gi
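
Once the pods are up, cluster membership can be verified directly with rabbitmqctl, and the management UI can be reached without an Ingress via a port-forward (a quick verification sketch; pod and service names are taken from the manifests above):

# Check which nodes have actually joined the cluster
kubectl exec -n borecast-production borecast-rabbitmq-0 -- rabbitmqctl cluster_status

# Tunnel to the management UI locally on http://localhost:15672
kubectl port-forward -n borecast-production svc/borecast-rabbitmq-management-service 15672:15672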

Problem

Everything is working. However, when I access the management UI (i.e. the borecast-rabbitmq-management-service on port 15672), I only see one node showing up when there should be three:

The management UI

Also notice that the cluster name is

rabbit@borecast-rabbitmq-0.borecast-rabbitmq-service.borecast-production.svc.cluster.local

but when I log out and log in again, the 0 in borecast-rabbitmq-0 sometimes changes to 1 or 2.

Also notice that the node name is

rabbit@borecast-rabbitmq-1.borecast-rabbitmq-service

And, as you might have guessed, the 1 is sometimes 2 or 0 instead.

I have been trying to debug, but to no avail. The logs for each pod don't raise any suspicions, and every Service and StatefulSet is working normally. I repeated the five steps multiple times, and if your cluster is on AWS, you can replicate my setup by following the steps above (after creating the borecast-production namespace, of course). If anybody can shed some light on the matter, I'll be eternally grateful.

-- kevguy
amazon-web-services
docker
kubernetes
rabbitmq

1 Answer

7/11/2019

The problem is with the headless service name definition:

  - name: K8S_SERVICE_NAME
    # value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
    value: borecast-rabbitmq-service

which is a building block of the node name:

  - name: RABBITMQ_NODENAME
    value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME)"

whereas the proper node name should be the FQDN of the Pod (<statefulset name>-<ordinal index>.<headless_svc_name>.<namespace>.svc.cluster.local):

  - name: RABBITMQ_NODENAME
    value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"

Therefore you ended up with the node name

borecast-rabbitmq-1.borecast-rabbitmq-service

instead of:

borecast-rabbitmq-1.borecast-rabbitmq-service.borecast-production.svc.cluster.local

Look up the FQDN of the Pods created by the borecast-rabbitmq StatefulSet (in other words, the SRV records of the Pods) with the nslookup utility from inside your cluster, as explained here, to see what form RABBITMQ_NODENAME is expected to have.
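
For example, from a throwaway pod inside the cluster (busybox:1.28 assumed, since its nslookup output is well-behaved for this kind of check):

kubectl run -it --rm dns-test -n borecast-production --image=busybox:1.28 --restart=Never -- \
  nslookup borecast-rabbitmq-service
# Expect answers like:
# borecast-rabbitmq-0.borecast-rabbitmq-service.borecast-production.svc.cluster.local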

-- Nepomucen
Source: StackOverflow