Do I need to pass all nodes to the Cassandra client?

4/4/2019

I saw that the Cassandra client needs an array of hosts.

For example, Python uses this:

from cassandra.cluster import Cluster

cluster = Cluster(['192.168.0.1', '192.168.0.2'])

Question 1: Why do I need to pass these nodes?

Question 2: Do I need to pass all nodes? Or is one sufficient? (All nodes have the information about all other nodes, right?)

Question 3: Does the client choose the best node to connect to, knowing all nodes? Does the client know what data is stored in each node?

Question 4: I'm starting to use Cassandra for the first time, and I'm using Kubernetes for the first time. I deployed a Cassandra cluster with 3 Cassandra nodes. I deployed one more machine, and from it I want to connect to Cassandra with a Python Cassandra client. Do I need to pass all the Cassandra IPs to the Python Cassandra client? Or is it sufficient to pass the Cassandra DNS name given by Kubernetes?

For example, when I run a dig command, I can see all the Cassandra IPs, but I don't know if it's sufficient to pass this DNS name to the client.

# dig cassandra.default.svc.cluster.local

The IPs are 10.32.1.19, 10.32.1.24, 10.32.2.24

; <<>> DiG 9.10.3-P4-Debian <<>> cassandra.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18340
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;cassandra.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
cassandra.default.svc.cluster.local. 30 IN A 10.32.1.19
cassandra.default.svc.cluster.local. 30 IN A 10.32.1.24
cassandra.default.svc.cluster.local. 30 IN A 10.32.2.24

;; Query time: 2 msec
;; SERVER: 10.35.240.10#53(10.35.240.10)
;; WHEN: Thu Apr 04 16:08:06 UTC 2019
;; MSG SIZE  rcvd: 125

What are the disadvantages of using for example:

from cassandra.cluster import Cluster

cluster = Cluster(['cassandra.default.svc.cluster.local'])
-- Rui Martins
cassandra
kubernetes

1 Answer

4/4/2019

Question 1: Why do I need to pass these nodes?

To make initial contact with the cluster. Once a connection is established, the driver discovers the remaining nodes from the cluster itself, so the contact points are only needed for bootstrapping.

Question 2: Do I need to pass all nodes? Or is one sufficient? (All nodes have the information about all other nodes, right?)

You can pass only one node as a contact point, but if that node is down when the driver tries to connect, it won't be able to reach the cluster. If you provide additional contact points, the driver will try them when the first one fails. It is better to use your Cassandra seed list as contact points.
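
To illustrate why more than one contact point helps, here is a minimal sketch of the failover idea using only the standard library. It is not the driver's actual connection logic; `first_reachable` is a hypothetical helper, and 9042 is assumed to be the native transport port (the Cassandra default).

```python
import socket


def first_reachable(contact_points, port=9042, timeout=2.0):
    """Return the first contact point that accepts a TCP connection, or None."""
    for host in contact_points:
        try:
            # Open and immediately close a TCP connection to test reachability.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Node is down or unreachable: move on to the next contact point.
            continue
    return None
```

With `['192.168.0.1', '192.168.0.2']`, the second address is tried only if the first does not respond; with a single contact point there is nothing to fall back to.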

Question 3: Does the client choose the best node to connect to, knowing all nodes? Does the client know what data is stored in each node?

Once the initial connection is made, the client driver has the metadata about the cluster. With that metadata, the client can know which nodes hold the replicas for a given partition and which nodes are expected to respond with lower latency (for example, those in the local datacenter). You can configure all of this using load balancing policies.

Refer: https://docs.datastax.com/en/developer/python-driver/3.10/api/cassandra/policies/
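
As a rough sketch of what configuring such a policy can look like with the DataStax Python driver, the function below wraps a datacenter-aware round-robin policy in a token-aware policy. The helper name `build_cluster` and the datacenter name `dc1` are assumptions for illustration; the right `local_dc` value depends on your cluster. The imports are placed inside the function only so that the snippet loads without the driver installed.

```python
def build_cluster(contact_points, local_dc="dc1"):
    """Create a Cluster whose default profile uses token-aware, DC-aware routing.

    `local_dc` is a placeholder; use the datacenter name your nodes report.
    """
    # Imported here so this module can be loaded without cassandra-driver.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # TokenAwarePolicy prefers replicas that own the partition being queried;
    # the wrapped child policy decides the order among the remaining hosts.
    policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc=local_dc))
    profile = ExecutionProfile(load_balancing_policy=policy)
    return Cluster(contact_points, execution_profiles={EXEC_PROFILE_DEFAULT: profile})
```

Calling `build_cluster(['cassandra.default.svc.cluster.local']).connect()` would then give a session that routes requests with this policy.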

Question 4: I'm starting to use Cassandra for the first time, and I'm using Kubernetes for the first time. I deployed a Cassandra cluster with 3 Cassandra nodes. I deployed one more machine, and from it I want to connect to Cassandra with a Python Cassandra client. Do I need to pass all the Cassandra IPs to the Python Cassandra client? Or is it sufficient to pass the Cassandra DNS name given by Kubernetes?

If the hostname can be resolved, it is better to use the DNS name than IPs: in Kubernetes, pod IPs can change when pods are rescheduled, while the service name stays the same. I don't see any disadvantage.
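
To see what a single DNS name expands to from the client's point of view, a minimal sketch using the standard library resolver is shown here. `resolve_contact_points` is a hypothetical helper, not part of the driver, and port 9042 is assumed.

```python
import socket


def resolve_contact_points(hostname, port=9042):
    """Return the sorted, de-duplicated IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Inside the Kubernetes cluster from the question, `resolve_contact_points('cassandra.default.svc.cluster.local')` should return the same pod IPs that `dig` printed, provided the name resolves from where the client runs.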

-- Nama
Source: StackOverflow