How does Elasticsearch's model translate into these High-Availability patterns?

11/3/2019

I've been studying Elasticsearch's availability model, in which you create a cluster with master nodes and data nodes [1]: master nodes control the cluster and data nodes hold data. For each index you can also set a number of shards and replicas, which are distributed across these data nodes.

I have also seen [2] that High-Availability patterns are usually some form of Fail-Over (Active-Passive or Active-Active) and/or Replication (Master-Slave or Master-Master). But I couldn't fit this information together. How can I classify this model in terms of these patterns?

There are also [3] other NoSQL databases, such as MongoDB, that have a similar HA model and are deployed as a cluster using StatefulSets in Kubernetes. I want to understand more about how this works. Any hints?

References:

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

[2] https://www.slideshare.net/jboner/scalability-availability-stability-patterns/33-What_do_we_mean_withAvailability

[3] https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/

-- staticdev
cluster-computing
elasticsearch
high-availability
kubernetes-statefulset
nosql

1 Answer

11/3/2019

StatefulSet in Kubernetes

In a distributed system, it is much easier to handle a stateless workload: since it holds no state, it is trivial to scale the service to any number of replicas. In Kubernetes, stateless workloads are managed by a ReplicaSet (created by a Deployment).

Most services require some kind of state. A StatefulSet manages stateful workloads on Kubernetes. It differs from a ReplicaSet in that pods managed by a StatefulSet have a unique identity composed of an ordinal, a stable network identity, and stable storage.
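To make the "ordinal, stable network identity, stable storage" triple concrete, here is a minimal sketch of the naming scheme Kubernetes applies to StatefulSet pods. The names `mongo`, `mongo-svc` and `data` are hypothetical, and the sketch assumes the default cluster domain `cluster.local` and a headless Service governing the set; it only builds strings and does not talk to a cluster.

```python
from typing import List, NamedTuple


class PodIdentity(NamedTuple):
    ordinal: int
    pod_name: str   # <statefulset>-<ordinal>
    dns_name: str   # <pod>.<service>.<namespace>.svc.<cluster domain>
    pvc_name: str   # <volumeClaimTemplate name>-<pod>


def stateful_pod_identities(
    statefulset: str,
    service: str,
    namespace: str,
    replicas: int,
    claim_template: str,
    cluster_domain: str = "cluster.local",  # assumption: default cluster domain
) -> List[PodIdentity]:
    """Derive the stable names each StatefulSet pod keeps across restarts."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"
        dns = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"
        pvc = f"{claim_template}-{pod}"
        identities.append(PodIdentity(ordinal, pod, dns, pvc))
    return identities


if __name__ == "__main__":
    # Hypothetical names, chosen only for illustration.
    for identity in stateful_pod_identities("mongo", "mongo-svc", "default", 3, "data"):
        print(identity)
```

Because these names depend only on the ordinal, a rescheduled pod comes back with the same hostname and the same volume claim, which is what lets database members find each other and their data again. Pods of a ReplicaSet, by contrast, get random name suffixes.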

Failover and Replication

I have also seen that High-Availability patterns are usually some form of Fail-Over (Active-Passive or Active-Active) and/or Replication (Master-Slave or Master-Master). But I couldn't fit this information together. How can I classify this model in terms of these patterns?

These are pretty outdated patterns. Today, consensus algorithms are the norm for High-Availability and Fail-over, since both problems come down to replication and leader election. Raft (2013) is one of the most popular consensus algorithms, and I can recommend the book Designing Data-Intensive Applications if you want to learn more about the problems with High-Availability, Fail-over, Replication and Consensus.

Elasticsearch seems to use a consensus algorithm for its clustering. Any of the master-eligible nodes may be elected as master, and it is recommended to have at least three of them (for high availability).
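The "at least three" recommendation follows from majority-quorum arithmetic, which consensus-based leader election generally relies on. The sketch below is that generic arithmetic only, not Elasticsearch's actual election code; the function names are made up for illustration.

```python
def quorum(master_eligible: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With one or two master-eligible nodes, losing a single node already makes a majority impossible. Three is the smallest count that survives one failure, and five survives two.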

Role of nodes

a cluster with master nodes and data nodes, where master nodes control the cluster and data nodes hold data. You can also set for each index, a number of shards and replicas that are distributed through these data nodes.

Nodes can have several roles in an Elasticsearch cluster. In a small cluster, a single node can combine roles, e.g. be both master-eligible and data, but as your cluster grows to more nodes it is recommended to have dedicated nodes for each role.

-- Jonas
Source: StackOverflow