How does Kubernetes etcd run the Raft consensus protocol?

7/29/2020

While studying the Kubernetes architecture I noticed that etcd runs the Raft consensus protocol to maintain a reliable key-value store for all the state in the Kubernetes cluster. I studied Raft in depth when learning distributed systems, but due to my lack of Kubernetes knowledge I'm confused about how Raft actually comes into play here:

I have searched through the online resources that try to explain the role of Raft here, and my current mental picture is that a single etcd cluster consists of multiple nodes, and when we talk about Raft here we mean Raft running among these etcd nodes, not among the master and worker nodes we usually talk about in Kubernetes. Am I right?

If what I said above is correct, then the whole point of etcd should be fault tolerance. Yet when people draw the Kubernetes architecture, etcd usually looks like a standalone database. I'm curious how and where these etcd nodes get deployed in practice. They have to be deployed separately, otherwise etcd is still a single point of failure, isn't it? And how many etcd nodes are typically used in one Kubernetes cluster deployment?

Thanks for any pointers!

-- Boooooo
kubernetes

1 Answer

8/7/2020

This is a community wiki answer based on the conversation from the chat. Posting it for clarity and to record the answers to the OP's questions raised there. I do not take any reputation gains for it. Feel free to edit and expand it.

  • Kubernetes does not really care how etcd is deployed: that is entirely up to you or your deployment tools. As long as the Kubernetes API server can connect to it, you're good.

  • Does Kubernetes often use the actual worker nodes as members of the etcd cluster? - I would not say that. etcd runs wherever the sysadmin decides it should run. In my clusters, etcd daemons run on every Kubernetes master node (I deploy Kubernetes with kubeadm). In such setups the etcd members are static pods, which are not handled by the scheduler and always run on the same machines.

  • So if in your case etcd runs on all the master nodes, wouldn't that mean we need a certain number of master nodes to get a fault-tolerant etcd? But I assume we are also allowed to have only one master node, right? - The number of master nodes and the number of etcd members are tangential. For the etcd cluster to be healthy, a majority of its members must be online. For Kubernetes to be healthy, at least one master must be alive. Don't confuse yourself further: treat etcd as a database that Kubernetes uses as its storage, and that database runs somewhere.
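To make the "majority online" point concrete: Raft requires a quorum of floor(n/2) + 1 members, so a cluster of n members tolerates n - (floor(n/2) + 1) failures. This is why etcd clusters are typically run with an odd member count such as 3 or 5 (an even count adds no fault tolerance over the odd count below it). A minimal sketch of the arithmetic:

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster keeps a quorum."""
    return members - quorum(members)

for n in (1, 2, 3, 4, 5):
    print(f"{n} member(s): quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that a single-member etcd (as in a one-master kubeadm cluster) has a quorum of 1 and tolerates zero failures, which is exactly the single-point-of-failure concern the OP raised; 4 members tolerate no more failures than 3, so odd sizes are preferred.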

-- Wytrzymały Wiktor
Source: StackOverflow