Where/How to configure Cassandra.yaml when deployed by Google Kubernetes Engine

1/6/2021

I can't find the answer to a pretty easy question: Where can I configure Cassandra (normally using Cassandra.yaml) when its deployed on a cluster with kubernetes using the Google Kubernetes Engine?

So I'm completely new to distributed databases, Kubernetes etc. and I'm setting up a cassandra cluster (4VMs, 1 pod each) using the GKE for a university course right now.

I used the official example on how to deploy Cassandra on Kubernetes that can be found on the Kubernetes homepage (https://kubernetes.io/docs/tutorials/stateful-application/cassandra/) with a StatefulSet, persistent volume claims, central load balancer etc. Everything seems to work fine and I can connect to the DB via my java application (using the datastax java/cassandra driver) and via Google CloudShell + CQLSH on one of the pods directly. I created a keyspace, some tables and started filling them with data (~100million of entries planned), but as soon as the DB reaches some size, expensive queries result in a timeout exception (via datastax and via cql), just as expected. Speed isn't necessary for these queries right now, its just for testing.

Normally I would start with trying to increase the timeouts in the cassandra.yaml, but I'm unable to locate it on the VMs and have no clue where to configure Cassandra at all. Can someone tell me if these configuration files even exist on the VMs when deploying with GKE and where to find them? Or do I have to configure those Cassandra details via Kubectl/CQL/StatefulSet or somewhere else?

-- Medpyx
cassandra
configuration
google-kubernetes-engine
kubernetes
timeout

1 Answer

1/6/2021

I think the faster way to configure cassandra in Kubernetes Engine, you could use the next deployment of Cassandra from marketplace, there you could configure your cluster and you could follow this guide that is also marked there to configure it correctly.

======

The timeout config seems to be a configuration that require to be modified inside the container (Cassandra configuration itself).

you could use the command: kubectl exec -it POD_NAME -- bash in order to open a Cassandra container shell, that will allow you to get into the container configurations and you could look up for the configuration and change it for what you require.

after you have the configuration that you require you will need to automate it in order to avoid manual intervention every time that one of your pods get recreated (as configuration will not remain after a container recreation). Next options are only suggestions:

  1. Create you own Cassandra image from am own Docker file, changing the value of the configuration you require from there, because the image that you are using right now is a public image and the container will always be started with the config that the pulling image has.

  2. Editing the yaml of your Satefulset where Cassandra is running you could add an initContainer, which will allow to change configurations of your running container (Cassandra) this will make change the config automatically with a script ever time that your pods run.

choose the option that better fits for you.

-- Jorge P.
Source: StackOverflow