I am new working with Airflow and Kubernetes. I am trying to use apache Airflow in Kubernetes.
To deploy it I used this chart: https://github.com/apache/airflow/tree/master/chart.
When I deploy it like in the link above a PostgreSQL database is created. When I explore the value.yml file of the chart I found this:
# Configuration for postgresql subchart
# Not recommended for production
postgresql:
enabled: true
postgresqlPassword: postgres
postgresqlUsername: postgres
I cannot find why is not recommended for production.
and also this:
data:
# If secret names are provided, use those secrets
metadataSecretName: ~
resultBackendSecretName: ~
# Otherwise pass connection values in
metadataConnection:
user: postgres
pass: postgres
host: ~
port: 5432
db: postgres
sslmode: disable
resultBackendConnection:
user: postgres
pass: postgres
host: ~
port: 5432
db: postgres
sslmode: disable
What is recommended for production? use my own PostgreSQL database outside Kubernetes? If it is correct, how can I use it instead this one? How I have to modify it to use my own postgresql?
Managing databases in Kubernetes its a pain and not recommended due to scaling, replicating, backups, among other common tasks are not as easy to do, what you should do is set up your own Postgres in VM or a managed cloud service as RDS or GCP, more information:
The reason why it is not recommended for production is because the chart provides a very basic Postgres setup.
In container world containers are transient unlike processes in the VM world. So likelihood of database getting restarted or killed is high. So if we are running stateful components in K8s, someone needs to make sure that the Pod is always running with its configured storage backend.
The following tools help to run Postgres with High Availablity on K8s/containers and provides various other benefits:
We have used Stolon to run 80+ Postgres instances on Kubernetes in a microservices environment. These are for public facing products so services are heavily loaded as well.
Its very easy to setup a Stolon cluster once you understand its architecture. Apart from HA it also provides replication, standby clusters and CLI for cluster administration.
Also please consider this blog as well for making your decision. It brings in the perspective of how much Ops will be involved in different solutions.