Running StorageOS on Kubernetes

9/12/2018

I am trying to use StorageOS for dynamic volume provisioning on a self-hosted Kubernetes cluster (v1.11.1).

The StorageOS docs offer two options: one uses the new Container Storage Interface (CSI), the other uses the StorageOS volume driver built into K8s. I first tried the CSI-based approach, but that failed. From what I could gather, getting CSI to work in K8s requires several preparatory steps (according to this reference), which seemed too advanced for me, so I switched to the non-CSI route.

I followed the docs and got the pods, services, etc. created, but the pods keep restarting. Running kubectl describe on a pod shows this error:

Liveness probe failed: HTTP probe failed with statuscode: 500

Looking at the logs, I find tons of these:

time="2018-09-12T12:14:20Z" level=info msg="not first cluster node, joining first node" action=create address=192.168.34.201 category=etcd host=worker21 module=cp target=192.168.33.101
time="2018-09-12T12:14:20Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.33.101,192.168.33.201,192.168.34.201,192.168.34.202" error="Get http://192.168.33.101:5705/v1/members: dial tcp 192.168.33.101:5705: connect: connection refused" module=cp

Since the log mentions etcd, it seems StorageOS cannot find it. I had assumed it would use my cluster's etcd. Unfortunately, I couldn't find any instructions for setting up etcd specifically for StorageOS. On the other hand, port 5705 is the StorageOS REST API port, so maybe etcd isn't the problem at all. Any pointers welcome!
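To narrow down which peer is refusing connections, one option is to pull the addresses out of the endpoint="..." field of the error line and test each one on port 5705. The following is only a rough sketch: the helper names join_endpoints and port_open are made up for illustration, and the log line is shortened to the relevant fields.

```python
import re
import socket

# Error line from the logs above, shortened to the relevant fields.
LOG_LINE = (
    'level=error msg="failed to join existing cluster" '
    'endpoint="192.168.33.101,192.168.33.201,192.168.34.201,192.168.34.202"'
)


def join_endpoints(line):
    """Return the addresses listed in the endpoint="..." field, or []."""
    match = re.search(r'endpoint="([^"]*)"', line)
    if not match:
        return []
    return [addr for addr in match.group(1).split(",") if addr]


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


# Parsing is purely local; probing is left to the caller.
print(join_endpoints(LOG_LINE))
```

Calling port_open(addr, 5705) for each returned address, from a machine that can reach the node network, would show which nodes have nothing listening on the StorageOS API port.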

-- PalatinateJ
kubernetes

1 Answer

9/13/2018

Well, it turns out that reading the README helped. ;-) By default, the scripts try to deploy the StorageOS DaemonSet on all nodes, including the master(s). If the masters are configured not to accept workloads, the whole deployment fails. The solution is to maintain the JOIN variable in the deploy-storageos script manually.
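As a rough illustration of what maintaining JOIN by hand amounts to, here is a minimal sketch that builds the value from only the nodes that accept workloads. Several things are assumptions: that JOIN takes a comma-separated list of node IPs (as the endpoint list in the log suggests), that 192.168.33.101 is the master, and every node name other than worker21. The helper join_value is hypothetical and not part of the StorageOS scripts.

```python
# Hypothetical inventory. Only worker21 / 192.168.34.201 is named in the
# logs; the other names and the "schedulable" flags are assumptions.
NODES = [
    {"name": "master1", "ip": "192.168.33.101", "schedulable": False},
    {"name": "worker11", "ip": "192.168.33.201", "schedulable": True},
    {"name": "worker21", "ip": "192.168.34.201", "schedulable": True},
    {"name": "worker22", "ip": "192.168.34.202", "schedulable": True},
]


def join_value(nodes):
    """Comma-separated IPs of the nodes that accept workloads."""
    return ",".join(n["ip"] for n in nodes if n["schedulable"])


print(join_value(NODES))
# prints 192.168.33.201,192.168.34.201,192.168.34.202
```

The point is simply that the master's address is left out, so no StorageOS node tries to join a peer that never runs the DaemonSet pod.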

-- PalatinateJ
Source: StackOverflow