Containers for database and scalability

9/27/2019

Consider TiDB and the TiDB Operator as examples for this question.

TiDB

TiDB ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability.

TiDB Operator

The TiDB Operator automatically deploys, operates, and manages a TiDB cluster in any Kubernetes-enabled cloud environment.

Once the database is live, there are broadly two scenarios:

  1. Very high rate of read only queries.
  2. Very high rate of write queries.

In either of the scenarios, which component of the containerized database scales? Read replicas? Database 'engine' itself? Persistent volumes? All of the above?

-- cogitoergosum
database
kubernetes-pod
openshift
scalability

1 Answer

9/30/2019

Containerized infrastructure abstracts storage and compute resources (for example, PVs and Pods in Kubernetes), and these resources scale as the database scales. The form of scaling therefore depends on the database itself.

For TiDB, while it offers a MySQL-compatible SQL interface, its architecture is very different from that of MySQL and other traditional relational databases:

  • The SQL layer (TiDB) serves SQL queries and interacts with the storage layer according to the computed query plan. It is stateless and scales on demand for both read and write queries. Typically, you scale the SQL layer out or up to get more compute for query planning, joins, and aggregations, and to serve more connections.
  • The storage layer (TiKV) stores the data and serves a key-value (KV) API to the SQL layer. The most interesting part of TiKV is its Multi-Raft replication: the storage layer automatically splits data into pieces (called Regions) and distributes them evenly across containers. Each piece is a Raft group whose leader serves read and write requests. When the storage layer is scaled in or out, it automatically migrates these pieces to rebalance the load. So scaling out the storage layer gives you higher read/write throughput and larger data capacity.
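To make the storage-layer behavior concrete, here is a toy Python model of splitting a sorted keyspace into Regions and rebalancing Region leaders when a new TiKV store joins. It is a simplified sketch, not PD's actual scheduling algorithm: it ignores follower replicas, Region size in bytes, and hotspot scheduling. The names `MAX_KEYS_PER_REGION`, `split_into_regions`, `rebalance`, and the `tikv-N` store names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical split threshold; real TiKV splits Regions by size, not key count.
MAX_KEYS_PER_REGION = 4


@dataclass
class Region:
    keys: list        # a contiguous range of sorted keys
    leader: str = ""  # store whose Raft leader serves reads/writes for this Region


def split_into_regions(keys, max_keys=MAX_KEYS_PER_REGION):
    """Cut the sorted keyspace into contiguous Regions of at most max_keys keys."""
    keys = sorted(keys)
    return [Region(keys[i:i + max_keys]) for i in range(0, len(keys), max_keys)]


def rebalance(regions, stores):
    """Move leaders from the busiest store to the idlest one until the
    leader counts differ by at most one. Returns (load per store, moves made)."""
    moves = 0
    while True:
        load = Counter({s: 0 for s in stores})  # include stores with no leaders yet
        load.update(r.leader for r in regions)
        busiest = max(stores, key=lambda s: load[s])
        idlest = min(stores, key=lambda s: load[s])
        if load[busiest] - load[idlest] <= 1:
            return load, moves
        region = next(r for r in regions if r.leader == busiest)
        region.leader = idlest
        moves += 1


keys = [f"k{i:03d}" for i in range(48)]  # 48 keys -> 12 Regions of 4 keys each
regions = split_into_regions(keys)
stores = ["tikv-0", "tikv-1", "tikv-2"]
for i, region in enumerate(regions):
    region.leader = stores[i % len(stores)]  # start balanced: 4 leaders per store

stores.append("tikv-3")  # scale out: a new TiKV pod joins the cluster
load, moves = rebalance(regions, stores)
print(dict(load), moves)  # {'tikv-0': 3, 'tikv-1': 3, 'tikv-2': 3, 'tikv-3': 3} 3
```

Only 3 of the 12 Regions move to the new store; the rest stay put. Incremental migration like this, rather than reshuffling everything, is what lets a scale-out spread read/write load without a full data reload.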

Back to the question: all of the components mentioned in the question scale. The read/write replicas serving SQL queries (the SQL layer) can scale, the database "engine" (the storage layer) serving KV requests can scale, and the PVs are scaled out along with the storage layer.
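The bookkeeping behind that summary can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not TiDB Operator code: it assumes each TiKV pod owns exactly one PV, that TiDB (SQL layer) pods need none, and that every Region keeps three Raft replicas (PD's default); it ignores PD's own pods and volumes. `ClusterSize` and the 100 GiB PV size are hypothetical.

```python
from dataclasses import dataclass, replace

# Assumption: PD's default of 3 Raft replicas per Region.
RAFT_REPLICAS = 3


@dataclass(frozen=True)
class ClusterSize:
    tidb_pods: int    # stateless SQL layer; assumed to need no PV
    tikv_pods: int    # storage layer; assumed one PV per TiKV pod
    pv_size_gib: int  # hypothetical size of each TiKV PV

    def persistent_volumes(self) -> int:
        return self.tikv_pods

    def logical_capacity_gib(self) -> float:
        # Every Region is stored RAFT_REPLICAS times, so usable space shrinks accordingly.
        return self.tikv_pods * self.pv_size_gib / RAFT_REPLICAS


base = ClusterSize(tidb_pods=2, tikv_pods=3, pv_size_gib=100)
more_sql = replace(base, tidb_pods=4)      # scale the SQL layer only
more_storage = replace(base, tikv_pods=6)  # scale the storage layer only

print(more_sql.persistent_volumes(), more_sql.logical_capacity_gib())          # 3 100.0
print(more_storage.persistent_volumes(), more_storage.logical_capacity_gib())  # 6 200.0
```

Adding TiDB pods leaves the PV count and data capacity unchanged, while adding TiKV pods grows both, which is why the PVs scale with the storage layer rather than with the SQL layer.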

-- Aylei
Source: StackOverflow