Postgres subchart not recommended for production environment for Airflow in Kubernetes

10/8/2020

I am new to Airflow and Kubernetes, and I am trying to run Apache Airflow on Kubernetes.

To deploy it, I used this chart: https://github.com/apache/airflow/tree/master/chart.

When I deploy it as described in the link above, a PostgreSQL database is created. When I explored the chart's values.yaml file, I found this:

# Configuration for postgresql subchart
# Not recommended for production
postgresql:
  enabled: true
  postgresqlPassword: postgres
  postgresqlUsername: postgres

I cannot find out why it is not recommended for production.

I also found this:

data:
  # If secret names are provided, use those secrets
  metadataSecretName: ~
  resultBackendSecretName: ~
  # Otherwise pass connection values in
  metadataConnection:
    user: postgres
    pass: postgres
    host: ~
    port: 5432
    db: postgres
    sslmode: disable
  resultBackendConnection:
    user: postgres
    pass: postgres
    host: ~
    port: 5432
    db: postgres
    sslmode: disable

What is recommended for production? Should I use my own PostgreSQL database outside Kubernetes? If so, how do I use it instead of this one, and how do I have to modify the configuration to point at my own PostgreSQL?

-- J.C Guzman
airflow
kubernetes
postgresql

2 Answers

10/9/2020

Managing databases in Kubernetes is a pain and is not recommended, because scaling, replication, backups, and other common tasks are not as easy to do there. What you should do instead is set up your own Postgres on a VM or use a managed cloud service such as Amazon RDS or Cloud SQL on GCP. For more information:

https://cloud.google.com/blog/products/databases/to-run-or-not-to-run-a-database-on-kubernetes-what-to-consider
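A minimal sketch of a values override, assuming you already have an external PostgreSQL instance (on a VM or a managed service) that the cluster can reach. It only uses keys that appear in the chart's values shown in the question: it disables the bundled subchart and fills in `data.metadataConnection` and `data.resultBackendConnection`. The file name, host, user, password, and database name are hypothetical placeholders, and `sslmode: require` is an assumption that your server accepts TLS connections.

```yaml
# my-values.yaml (hypothetical file name)
# Override for the Airflow chart: use an external PostgreSQL instead of the subchart.

postgresql:
  enabled: false                    # do not deploy the bundled PostgreSQL subchart

data:
  # No secret names given, so the inline connection values below are used
  metadataSecretName: ~
  resultBackendSecretName: ~
  metadataConnection:
    user: airflow                   # placeholder user
    pass: change-me                 # placeholder password
    host: my-postgres.example.com   # placeholder: your external PostgreSQL host
    port: 5432
    db: airflow                     # placeholder database name
    sslmode: require                # assumption: the server supports TLS
  resultBackendConnection:
    user: airflow
    pass: change-me
    host: my-postgres.example.com
    port: 5432
    db: airflow
    sslmode: require
```

You would pass such a file to Helm with `-f my-values.yaml` on `helm install` or `helm upgrade`. Putting the password in a values file is only for illustration: as the comment in the chart's values says, if `metadataSecretName` and `resultBackendSecretName` are provided, those Secrets are used instead, which keeps credentials out of the values file (check the chart documentation for the key each Secret must contain).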

-- paltaa
Source: StackOverflow

10/9/2020

It is not recommended for production because the chart provides a very basic Postgres setup (note, for example, the default `postgres`/`postgres` credentials in the values shown in the question).

In the container world, containers are transient, unlike processes in the VM world, so the likelihood of the database being restarted or killed is high. If we run stateful components in K8s, something needs to make sure that the Pod is always running with its configured storage backend.
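To make that point concrete, here is a minimal sketch of what plain Kubernetes gives you on its own: a single-replica StatefulSet whose Pod is recreated if it dies and is reattached to the same PersistentVolumeClaim. All names, the `postgres:12` image tag, the Secret `airflow-postgres-secret`, and the 10Gi size are hypothetical. This keeps the data across restarts, but it still provides no replication or automatic failover.

```yaml
# Hypothetical single-instance Postgres on plain Kubernetes.
# The Pod is restarted/recreated and reattached to the same PVC,
# but there is no replication and no automatic failover.
apiVersion: v1
kind: Service
metadata:
  name: airflow-postgres
spec:
  clusterIP: None                 # headless Service used as the StatefulSet's serviceName
  selector:
    app: airflow-postgres
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: airflow-postgres
spec:
  serviceName: airflow-postgres
  replicas: 1
  selector:
    matchLabels:
      app: airflow-postgres
  template:
    metadata:
      labels:
        app: airflow-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:12
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: airflow-postgres-secret   # hypothetical Secret
                  key: password
            - name: PGDATA                        # subdirectory avoids lost+found on the volume root
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:           # one PVC per replica, kept across Pod restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```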

The following tools help run Postgres with high availability on K8s/containers and provide various other benefits:

  1. Patroni
  2. Stolon
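As a rough illustration of how such tools provide HA, Patroni can use the Kubernetes API itself as its distributed configuration store, holding a leader lock that replicas compete for when the primary disappears. The following is a trimmed sketch of a Patroni configuration file for one cluster member, assuming Patroni's Kubernetes backend; the cluster name, addresses, data directory, and passwords are placeholders, and a real deployment needs more settings than shown (check Patroni's YAML configuration documentation).

```yaml
# Hypothetical Patroni config for one member of a cluster named "airflow-pg".
scope: airflow-pg                   # cluster name shared by all members
name: airflow-pg-0                  # unique name of this member (e.g. the Pod name)

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.10:8008   # placeholder: this Pod's IP

kubernetes:
  use_endpoints: true               # keep the leader lock in an Endpoints object
  labels:
    application: patroni
    cluster-name: airflow-pg

bootstrap:
  dcs:
    ttl: 30                         # leader lock TTL in seconds
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # max replica lag (bytes) to be eligible for promotion

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.10:5432   # placeholder: this Pod's IP
  data_dir: /home/postgres/pgdata/data
  authentication:
    superuser:
      username: postgres
      password: change-me           # placeholder
    replication:
      username: standby
      password: change-me           # placeholder
```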

We have used Stolon to run 80+ Postgres instances on Kubernetes in a microservices environment. These are for public-facing products, so the services are heavily loaded as well.

It's very easy to set up a Stolon cluster once you understand its architecture. Apart from HA, it also provides replication, standby clusters, and a CLI for cluster administration.

Please also consider this blog post when making your decision. It brings in the perspective of how much Ops work each solution involves.

-- arunvelsriram