helm cockroachdb GKE - volume full and un-resizable

9/9/2018

I deployed CockroachDB with the stable Helm chart. Unfortunately, I didn't realize that the default configuration creates a very small (1Gi) persistent volume that cannot be resized. I also didn't realize that CockroachDB uses quite a lot of space for the time series data it records to monitor itself.

Now my persistent volumes are full and my CockroachDB pods are crashing:

log: exiting because of error: log: cannot create log: open /cockroach/cockroach-data/logs/cockroach.ckdb-cockroachdb-0.root.2018-09-09T14_53_47Z.000001.log: no space left on device

And I can't resize the volume:

kubectl patch pvc datadir-ckdb-cockroachdb-0 -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'
The PersistentVolumeClaim "datadir-ckdb-cockroachdb-0" is invalid: spec: Forbidden: field is immutable after creation

Now I'm stuck, as I can't start a node to get my data back. Is there any way out of this? I would at least like to retrieve my data. My service is down anyway.

Second question: if I want to avoid this in the future, what values should be used to have dynamically resizable volumes on GKE?

Third question: should the default in the helm chart really stay like that?

-- VsM
cockroachdb
gcloud
google-kubernetes-engine
kubernetes

2 Answers

9/10/2018

As https://stackoverflow.com/users/9231144/patrick-w mentioned, automatic resize of the volumes isn't possible until Kubernetes/GKE version 1.11.

In the meantime, it is possible to resize them manually by editing the disks in the GCE management console. Go there, click on the disk you want to resize, click the Edit button near the top of the page, type in the new size of the disk in GB, and click "Save". You'll then have to open a shell in each relevant pod (e.g. kubectl exec -it ckdb-cockroachdb-0 bash) and grow the filesystem to use the new disk capacity with a command like resize2fs.
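
The console steps above can also be scripted. What follows is a minimal sketch, assuming the gcloud and kubectl CLIs are already configured for the cluster's project. The function names, the zone, and the target size are illustrative placeholders and not part of the original answer. The sketch also assumes the pod is running long enough to exec into and that the container is allowed to access the block device; if it is not, the same resize2fs step may have to be run from the node instead.

```shell
#!/bin/sh
# Sketch only: function names, zone, and sizes are illustrative assumptions.

# Print the name of the GCE persistent disk backing a PVC.
pd_name_for_pvc() {
  pvc="$1"
  pv=$(kubectl get pvc "$pvc" -o jsonpath='{.spec.volumeName}')
  kubectl get pv "$pv" -o jsonpath='{.spec.gcePersistentDisk.pdName}'
}

# Grow the GCE disk itself (the scripted equivalent of Edit > Save in the console).
resize_pd() {
  disk="$1"; size_gb="$2"; zone="$3"
  gcloud compute disks resize "$disk" --size="${size_gb}GB" --zone="$zone" --quiet
}

# Grow the filesystem inside the pod so it fills the resized disk.
grow_fs_in_pod() {
  pod="$1"; mount_path="$2"
  # df -P prints one header line, then one data line whose first column is the device.
  dev=$(kubectl exec "$pod" -- df -P "$mount_path" | awk 'NR==2 {print $1}')
  kubectl exec "$pod" -- resize2fs "$dev"
}

# Example usage (all values are placeholders):
#   disk=$(pd_name_for_pvc datadir-ckdb-cockroachdb-0)
#   resize_pd "$disk" 10 us-central1-a
#   grow_fs_in_pod ckdb-cockroachdb-0 /cockroach/cockroach-data
```

Looking up the device with df rather than hard-coding it avoids guessing how the disk is exposed inside the container.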

As for your question about changing the default disk size in the Helm chart, it's a fair question. But what would a good default size be? Too low, and it's easy for this to happen. Too high, and the deployment won't succeed in environments that don't have large enough disks. In particular, minikube uses tmpfs-backed volumes, so their size is limited by the memory of your machine. At the very least, a warning in the output printed after installing the chart seems warranted.

-- Alex Robinson
Source: StackOverflow

9/10/2018

Disk resize is not yet available for gce-pd on 1.10.

You need to have the "allowVolumeExpansion" value in the StorageClass set to "true". Unfortunately, GKE 1.10 does not recognize this field and leaves it as <unset>. With the release of 1.11, you should be able to resize PVCs dynamically.
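
For clusters on 1.11 or later, the setup described above could look roughly like the following sketch. The StorageClass name "expandable-pd", the pd-standard disk type, and the function names are assumptions made for illustration. How the Helm chart is pointed at a particular StorageClass depends on the chart version, so check its values.yaml rather than relying on this example.

```shell
#!/bin/sh
# Sketch only: the StorageClass name and function names are hypothetical placeholders.

# Emit a StorageClass manifest for GCE persistent disks that permits expansion.
expandable_storageclass() {
  name="$1"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
allowVolumeExpansion: true
EOF
}

# Request a larger size on an existing PVC whose StorageClass allows expansion.
expand_pvc() {
  pvc="$1"; size="$2"
  kubectl patch pvc "$pvc" -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"${size}\"}}}}"
}

# Example usage (all values are placeholders):
#   expandable_storageclass expandable-pd | kubectl apply -f -
#   expand_pvc datadir-ckdb-cockroachdb-0 10Gi
```

The patch is the same one that was rejected in the question. It is only accepted when the PVC's StorageClass has allowVolumeExpansion enabled and the cluster version supports it, and the pod may need a restart before the filesystem reflects the new size.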

In the meantime, to avoid losing your data, you can make sure the PV's reclaim policy is set to "Retain", unbind the PVC, create a snapshot of the GCE-PD that the PV was using, and create a new, larger disk from that snapshot. Alternatively, you can mount the GCE-PD on another VM instance to recover the data.
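
The recovery path above can be sketched with kubectl and gcloud as follows. This is an outline under stated assumptions, not a tested procedure: the function names, the "-rescue" snapshot suffix, the disk and VM names, and the zone are all placeholders. A disk generally has to be detached from the GKE node before it can be attached read-write elsewhere, for example by scaling the StatefulSet down first.

```shell
#!/bin/sh
# Sketch only: function names, snapshot/disk names, and zone are illustrative assumptions.

# Keep the underlying disk even if the PVC or PV is later deleted.
retain_pv() {
  pv="$1"
  kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Snapshot the full disk, then create a larger disk from that snapshot.
clone_disk_larger() {
  disk="$1"; new_disk="$2"; size_gb="$3"; zone="$4"
  snap="${disk}-rescue"
  gcloud compute disks snapshot "$disk" --snapshot-names="$snap" --zone="$zone" &&
  gcloud compute disks create "$new_disk" --source-snapshot="$snap" \
    --size="${size_gb}GB" --zone="$zone"
}

# Alternative: attach the original disk to another VM to copy the data off.
attach_to_rescue_vm() {
  instance="$1"; disk="$2"; zone="$3"
  gcloud compute instances attach-disk "$instance" --disk="$disk" --zone="$zone"
}

# Example usage (all values are placeholders):
#   retain_pv <pv-name>
#   clone_disk_larger <old-disk> <new-disk> 10 us-central1-a
#   attach_to_rescue_vm <rescue-vm> <old-disk> us-central1-a
```

A disk created from a snapshot at a larger size still carries the original, smaller filesystem, so a filesystem resize (for example with resize2fs) is needed once the new disk is mounted.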

-- Patrick W
Source: StackOverflow