How to connect a web server to a Kubernetes statefulset and headless service

11/4/2021

I have been learning Kubernetes for a few weeks, and now I am trying to figure out the right way to connect a web server to a statefulset.

Let's say I deployed a master-slave Postgres statefulset and now want to connect my web server to it. If I use a ClusterIP service, requests will be load balanced across the master and the slaves for both reads (SELECT) and writes (UPDATE, INSERT, DELETE), right? I can't do that, because writes must be handled by the master. However, if I point my web server at the master using the headless service, which gives us a DNS entry for each pod, I get no load balancing across the slave replicas and every request is handled by a single instance, the master. So how am I supposed to connect them the right way, so that reads are load balanced across all instances, including the slaves, while writes are forwarded to the master?

Should I use two endpoints in the web server, one configured for writes and one for reads?

Or maybe I am using headless services and statefulsets the wrong way since I am new to Kubernetes?

-- joe1531
kubernetes

1 Answer

11/5/2021

Well, your thinking is correct - the master should be read-write and the replicas should be read-only. How do you configure it properly? There are several possible approaches.


The first approach is the one you are thinking about: set up two headless services, one for accessing the primary instance and a second for accessing the replica instances. A good example is Kubegres:

In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in YAML (e.g. "mypostgres"):

  • a Kubernetes service "mypostgres" allowing to access to the Primary PostgreSql instances
  • a Kubernetes service "mypostgres-replica" allowing to access to the Replica PostgreSql instances

Then you will have two endpoints:

Consequently, a client app running inside a Kubernetes cluster, would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests, and optionally it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql for read requests.
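As a rough sketch of how an application might use the two endpoints, the Python snippet below picks a hostname per statement. The service names come from the Kubegres example above; the function names, database name, and port are assumptions for illustration, and the keyword check is deliberately naive (anything that is not a plain SELECT goes to the primary).

```python
# Service names from the Kubegres example; everything else here is hypothetical.
PRIMARY_HOST = "mypostgres"          # read and write
REPLICA_HOST = "mypostgres-replica"  # read only


def host_for(sql: str) -> str:
    """Pick the service hostname for one SQL statement (naive keyword check)."""
    text = sql.strip().upper()
    # Only plain SELECTs are treated as reads. SELECT ... FOR UPDATE takes row
    # locks, so it is sent to the primary like any other write.
    if text.startswith("SELECT") and "FOR UPDATE" not in text:
        return REPLICA_HOST
    # INSERT, UPDATE, DELETE, DDL, and anything unrecognized: play it safe.
    return PRIMARY_HOST


def dsn_for(sql: str, dbname: str = "app", port: int = 5432) -> str:
    """Build a libpq-style connection string for the chosen endpoint."""
    return f"host={host_for(sql)} port={port} dbname={dbname}"


if __name__ == "__main__":
    for stmt in ("SELECT * FROM users", "INSERT INTO users (name) VALUES ('a')"):
        print(stmt, "->", dsn_for(stmt))
```

In practice it is often simpler to keep two connection pools and choose one explicitly in each code path rather than parse SQL. Also keep in mind that replicas may lag slightly behind the primary, so a read that must see a just-committed write may need to go to the primary as well.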

Check the Kubegres getting started guide for more details.

It's worth noting that many database solutions use this approach - another example is MySQL. The Kubernetes documentation has a good article about setting up MySQL using a StatefulSet.
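For the plain StatefulSet case, a headless service gives each pod a stable DNS name of the form <pod>.<service>.<namespace>.svc.<cluster-domain>, and StatefulSet pods are named <statefulset>-<ordinal>. A minimal sketch of building those names follows. The helper name is hypothetical, and the "mysql" names, the "default" namespace, and the "cluster.local" domain are assumptions (the last being the usual default).

```python
def pod_dns(statefulset: str, ordinal: int, service: str,
            namespace: str = "default",
            cluster_domain: str = "cluster.local") -> str:
    """Return the stable DNS name of one StatefulSet pod behind a headless service."""
    pod_name = f"{statefulset}-{ordinal}"  # StatefulSet pods get ordinal suffixes
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # Assumes a StatefulSet and a headless service that are both named "mysql".
    print(pod_dns("mysql", 0, "mysql"))
```

If ordinal 0 is the primary, as assumed here, writes could target that single name while reads go through a separate service that selects the replicas.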

Another approach is to use a middleware component that acts as a gatekeeper to the cluster, for example Pg-Pool:

Pg pool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: Load balancing & Limiting the requests.

  1. Load Balancing: Pg pool takes connection requests and queries. It analyzes the query to decide where the query should be sent. Read-only queries can be handled by read-replicas. Write operations can only be handled by the primary server. In this way, it load balances the cluster.
  2. Limits the requests: Like any other system, Postgres has a limit on no. of concurrent connections it can handle gracefully. Pg-pool limits the no. of connections it takes up and queues up the remaining. Thus, gracefully handling the overload.

Then you will have one endpoint for all operations - the Pg-Pool service. Check this article for more details, including the whole setup process.
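With a single endpoint the application needs no routing logic of its own. To make the request-limiting purpose described above more concrete, here is a toy Python model that admits up to a fixed number of connections and queues the rest. It only illustrates the idea: the class and method names are hypothetical, and this is not how Pg-Pool is implemented or configured.

```python
from collections import deque


class ToyGatekeeper:
    """Toy model of a connection-limiting proxy. Not Pg-Pool code or its API."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.active = set()     # clients currently holding a connection slot
        self.waiting = deque()  # clients queued in arrival order

    def connect(self, client: str) -> bool:
        """Admit the client if a slot is free, otherwise queue it."""
        if len(self.active) < self.max_connections:
            self.active.add(client)
            return True
        self.waiting.append(client)
        return False

    def disconnect(self, client: str) -> None:
        """Free the client's slot and admit the longest-waiting client, if any."""
        self.active.discard(client)
        if self.waiting and len(self.active) < self.max_connections:
            self.active.add(self.waiting.popleft())


if __name__ == "__main__":
    gk = ToyGatekeeper(max_connections=2)
    for name in ("a", "b", "c"):
        print(name, "admitted" if gk.connect(name) else "queued")
    gk.disconnect("a")  # frees a slot, so "c" is admitted from the queue
    print(sorted(gk.active))
```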

-- Mikolaj S.
Source: StackOverflow