Gluster cluster in Kubernetes: Glusterd inactive (dead) after node reboot. How to debug?

2/10/2019

I don't know how to debug this. I have one Kubernetes master node and three slave nodes. I deployed a Gluster cluster on the three slave nodes without problems by following this guide: https://github.com/gluster/gluster-kubernetes/blob/master/docs/setup-guide.md

I created volumes and everything works. But when I reboot a slave node and it reconnects to the master node, glusterd.service inside that slave node shows as dead, and nothing works after that.

[root@kubernetes-node-1 /]# systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

I don't know where to go from here. For example, /var/log/glusterfs/glusterd.log was last updated 3 days ago; no errors are written to it after a reboot or after the pod is deleted and recreated.

I just want to know where glusterd crashes so I can find out why.

How can I debug this crash?
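A minimal sketch of where to look next, assuming (as the `systemctl status` prompt above suggests) that glusterd runs under systemd inside the GlusterFS pod and that `journalctl` is available in the container image. `gluster_debug` is a hypothetical helper, not part of gluster-kubernetes. Note that `inactive (dead)`, unlike `failed`, can also mean the unit was never started in this boot at all, so the sketch checks failed units and the unit's dependencies as well as the journal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect glusterd diagnostics from one GlusterFS pod.
# Assumption: glusterd is managed by systemd inside the container.
gluster_debug() {
  local ns="${1:-glusterfs}"
  local pod="${2:?usage: gluster_debug NAMESPACE POD}"

  # Current unit state (exits non-zero when inactive; we keep going anyway).
  kubectl exec -n "$ns" "$pod" -- systemctl status glusterd.service --no-pager

  # Last 200 journal lines for glusterd; may explain a stop that glusterd.log misses.
  kubectl exec -n "$ns" "$pod" -- journalctl -u glusterd.service --no-pager -n 200

  # Units that failed; a failed dependency can keep glusterd from starting.
  kubectl exec -n "$ns" "$pod" -- systemctl --failed --no-pager

  # Units glusterd.service depends on.
  kubectl exec -n "$ns" "$pod" -- systemctl list-dependencies glusterd.service --no-pager
}
```

Usage, with a pod name from the `kubectl get all` output in this question: `gluster_debug glusterfs glusterfs-7nl8l`. To see startup errors on the console rather than in a log file, glusterd can also be started by hand in the foreground with `kubectl exec -it -n glusterfs glusterfs-7nl8l -- glusterd --debug` (stop it with Ctrl-C).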

All the nodes (master and slaves) run Ubuntu Desktop 18.04 LTS (64-bit) in VirtualBox VMs.

Requested output of kubectl get all --all-namespaces:

NAMESPACE     NAME                                                 READY   STATUS              RESTARTS   AGE
glusterfs     pod/glusterfs-7nl8l                                  0/1     Running             62         22h
glusterfs     pod/glusterfs-wjnzx                                  1/1     Running             62         2d21h
glusterfs     pod/glusterfs-wl4lx                                  1/1     Running             112        41h
glusterfs     pod/heketi-7495cdc5fd-hc42h                          1/1     Running             0          22h
kube-system   pod/coredns-86c58d9df4-n2hpk                         1/1     Running             0          6d12h
kube-system   pod/coredns-86c58d9df4-rbwjq                         1/1     Running             0          6d12h
kube-system   pod/etcd-kubernetes-master-work                      1/1     Running             0          6d12h
kube-system   pod/kube-apiserver-kubernetes-master-work            1/1     Running             0          6d12h
kube-system   pod/kube-controller-manager-kubernetes-master-work   1/1     Running             0          6d12h
kube-system   pod/kube-flannel-ds-amd64-785q8                      1/1     Running             5          3d19h
kube-system   pod/kube-flannel-ds-amd64-8sj2z                      1/1     Running             8          3d19h
kube-system   pod/kube-flannel-ds-amd64-v62xb                      1/1     Running             0          3d21h
kube-system   pod/kube-flannel-ds-amd64-wx4jl                      1/1     Running             7          3d21h
kube-system   pod/kube-proxy-7f6d9                                 1/1     Running             5          3d19h
kube-system   pod/kube-proxy-7sf9d                                 1/1     Running             0          6d12h
kube-system   pod/kube-proxy-n9qxq                                 1/1     Running             8          3d19h
kube-system   pod/kube-proxy-rwghw                                 1/1     Running             7          3d21h
kube-system   pod/kube-scheduler-kubernetes-master-work            1/1     Running             0          6d12h

NAMESPACE     NAME                                                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
default       service/kubernetes                                               ClusterIP   10.96.0.1        <none>        443/TCP         6d12h
elastic       service/glusterfs-dynamic-9ad03769-2bb5-11e9-8710-0800276a5a8e   ClusterIP   10.98.38.157     <none>        1/TCP           2d19h
elastic       service/glusterfs-dynamic-a77e02ca-2bb4-11e9-8710-0800276a5a8e   ClusterIP   10.97.203.225    <none>        1/TCP           2d19h
elastic       service/glusterfs-dynamic-ad16ed0b-2bb6-11e9-8710-0800276a5a8e   ClusterIP   10.105.149.142   <none>        1/TCP           2d19h
glusterfs     service/heketi                                                   ClusterIP   10.101.79.224    <none>        8080/TCP        2d20h
glusterfs     service/heketi-storage-endpoints                                 ClusterIP   10.99.199.190    <none>        1/TCP           2d20h
kube-system   service/kube-dns                                                 ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   6d12h

NAMESPACE     NAME                                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
glusterfs     daemonset.apps/glusterfs                 3         3         0       3            0           storagenode=glusterfs             2d21h
kube-system   daemonset.apps/kube-flannel-ds-amd64     4         4         4       4            4           beta.kubernetes.io/arch=amd64     3d21h
kube-system   daemonset.apps/kube-flannel-ds-arm       0         0         0       0            0           beta.kubernetes.io/arch=arm       3d21h
kube-system   daemonset.apps/kube-flannel-ds-arm64     0         0         0       0            0           beta.kubernetes.io/arch=arm64     3d21h
kube-system   daemonset.apps/kube-flannel-ds-ppc64le   0         0         0       0            0           beta.kubernetes.io/arch=ppc64le   3d21h
kube-system   daemonset.apps/kube-flannel-ds-s390x     0         0         0       0            0           beta.kubernetes.io/arch=s390x     3d21h
kube-system   daemonset.apps/kube-proxy                4         4         4       4            4           <none>                            6d12h

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
glusterfs     deployment.apps/heketi    1/1     1            0           2d20h
kube-system   deployment.apps/coredns   2/2     2            2           6d12h

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
glusterfs     replicaset.apps/heketi-7495cdc5fd    1         1         0       2d20h
kube-system   replicaset.apps/coredns-86c58d9df4   2         2         2       6d12h
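The listing shows the glusterfs DaemonSet with 0 of 3 pods ready and restart counts between 62 and 112. A hedged sketch of standard kubectl commands that usually show why a pod keeps restarting; `pod_restart_info` is a hypothetical helper name. `describe` lists probe failures and the container's last termination state, and `logs --previous` prints the output of the container instance that ran before the latest restart, which a plain `kubectl logs` call does not show.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show why a pod is not ready / keeps restarting.
pod_restart_info() {
  local ns="${1:-glusterfs}"
  local pod="${2:?usage: pod_restart_info NAMESPACE POD}"

  # Probe failures, restart count, last state and termination reason.
  kubectl describe pod -n "$ns" "$pod"

  # Output of the previous container instance (before the last restart).
  kubectl logs -n "$ns" "$pod" --previous

  # Events in the namespace that refer to this pod.
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

For example: `pod_restart_info glusterfs glusterfs-7nl8l`.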

Requested pod log:

tasos@kubernetes-master-work:~$ kubectl logs -n glusterfs glusterfs-7nl8l
env variable is set. Update in gluster-blockd.service
-- Tasos
glusterfs
kubernetes
ubuntu-18.04

1 Answer

2/11/2019

Please check these similar topics:

GlusterFS deployment on k8s cluster-- Readiness probe failed: /usr/local/bin/status-probe.sh

and

https://github.com/gluster/gluster-kubernetes/issues/539
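Both links revolve around the pod's readiness probe. A small sketch for printing the exact probe commands the DaemonSet runs, so they can be executed by hand inside a pod. It assumes the DaemonSet is named `glusterfs` in namespace `glusterfs` (as in the listing in the question), that the gluster container is the first container in the pod template, and that both probes are `exec` probes; `show_probes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the readiness and liveness probe commands
# of the glusterfs DaemonSet, one per line.
# Assumptions: the gluster container is containers[0]; probes are exec probes.
show_probes() {
  local ns="${1:-glusterfs}"
  local ds="${2:-glusterfs}"
  kubectl get daemonset -n "$ns" "$ds" -o jsonpath=\
'{.spec.template.spec.containers[0].readinessProbe.exec.command}{"\n"}'\
'{.spec.template.spec.containers[0].livenessProbe.exec.command}{"\n"}'
}
```

The printed command can then be run inside a pod with `kubectl exec -n glusterfs glusterfs-7nl8l -- ` followed by that command, which shows the probe's own error output instead of only "Readiness probe failed".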

Check the tcmu-runner.log file to debug it.
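The answer does not say where tcmu-runner.log lives, so this sketch simply searches the pod's /var/log tree and prints the tail of whatever it finds. The search root and the helper name `tail_tcmu_log` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate tcmu-runner log files in a pod and show their tails.
# Assumption: the log is somewhere under /var/log inside the container.
tail_tcmu_log() {
  local ns="${1:-glusterfs}"
  local pod="${2:?usage: tail_tcmu_log NAMESPACE POD}"
  # Print each matching file name, then its last 50 lines.
  kubectl exec -n "$ns" "$pod" -- \
    sh -c 'find /var/log -type f -name "tcmu-runner*.log" -print -exec tail -n 50 {} \;'
}
```

For example: `tail_tcmu_log glusterfs glusterfs-7nl8l`. If nothing is printed, the file may only exist on the host, in which case the same `find` can be run on the node directly.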

UPDATE:

I think this is your issue: https://github.com/gluster/gluster-kubernetes/pull/557

The PR is prepared but not yet merged.

UPDATE 2:

https://github.com/gluster/glusterfs/issues/417

Be sure that rpcbind is installed.
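The answer does not say whether rpcbind is needed on the host or inside the GlusterFS container, so this sketch checks inside the pod; `check_rpcbind` is a hypothetical helper. On the Ubuntu nodes themselves, `dpkg -s rpcbind` shows whether the package is installed and `systemctl is-active rpcbind` whether it is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check rpcbind inside a GlusterFS pod.
check_rpcbind() {
  local ns="${1:-glusterfs}"
  local pod="${2:?usage: check_rpcbind NAMESPACE POD}"
  # Is the rpcbind binary present in the container at all?
  kubectl exec -n "$ns" "$pod" -- sh -c 'command -v rpcbind || echo "rpcbind: not installed"'
  # Is the rpcbind service running? (prints e.g. active or inactive)
  kubectl exec -n "$ns" "$pod" -- systemctl is-active rpcbind.service
}
```

For example: `check_rpcbind glusterfs glusterfs-7nl8l`.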

--
Source: StackOverflow