Recovering from Kubernetes node failure running Cassandra

2/8/2018

I'm looking for a good way to replace a dead Kubernetes worker node that was running a Cassandra pod.

Scenario:

  • Cassandra cluster built from 3 pods
  • Failure occurs on one of the Kubernetes worker nodes
  • A replacement worker node joins the Kubernetes cluster
  • A new pod from the StatefulSet is scheduled on the new node
  • As the pod's IP address has changed, the new pod is visible as a new Cassandra node (4 nodes in the cluster in total) and is unable to bootstrap until the dead one is removed (see the sketch after this list).
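
Roughly, the manual recovery at this point boils down to removing the dead node by hand; a minimal sketch, assuming the StatefulSet is named cassandra and cassandra-1 is a surviving pod (the names are illustrative):

    # Find the host ID of the dead node (it shows up as DN in the status output).
    kubectl exec cassandra-1 -- nodetool status

    # Remove the dead node by its host ID so the replacement pod can bootstrap.
    kubectl exec cassandra-1 -- nodetool removenode <host-id-of-dead-node>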

It's very difficult to follow the official node-replacement procedure, as Cassandra is running as a StatefulSet.

One completely hacky workaround I've found is to use a ConfigMap to supply JAVA_OPTS. As changing a ConfigMap doesn't recreate pods (yet), you can manipulate the running pods in a way that lets you follow the procedure.
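
For context, the shape of that hack might look something like the sketch below; the ConfigMap name (cassandra-env), the JAVA_OPTS key, and the assumption that the container entrypoint picks JAVA_OPTS up from that ConfigMap are all illustrative, not tied to any particular chart or manifest:

    # Hypothetical sketch: supply Cassandra's replace_address flag through the
    # ConfigMap the StatefulSet is assumed to read JAVA_OPTS from (cassandra-env).
    kubectl create configmap cassandra-env \
      --from-literal=JAVA_OPTS="-Dcassandra.replace_address_first_boot=<dead-node-ip>" \
      --dry-run -o yaml | kubectl apply -f -

    # Delete the stuck replacement pod so it restarts with the new JAVA_OPTS and
    # bootstraps in place of the dead node; revert the ConfigMap once it has joined.
    kubectl delete pod cassandra-2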

However, as I mentioned, that's super hacky. I'm wondering if anyone running Cassandra on top of Kubernetes has a better idea of how to deal with such a failure?

-- Kamil Szczygieł
cassandra
kubernetes

2 Answers

2/9/2018

Jetstack Navigator supports this, but it's currently in alpha:

https://github.com/jetstack/navigator

-- jaxxstorm
Source: StackOverflow

2/9/2018

"unable to bootstrap until the dead one is removed." Why is that? I use a StatefulSet and I'm able to kill a pod and have a new one join in.

-- VinceMD
Source: StackOverflow