How to backup a Postgres database in Kubernetes on Google Cloud?

2/21/2017

What is the best practice for backing up a Postgres database running on Google Cloud Container Engine?

My thinking is to store the backups in Google Cloud Storage, but I am unsure how to connect the Disk/Pod to a Storage Bucket.

I am running Postgres in a Kubernetes cluster using the following configuration:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: postgres-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data
            - name: POSTGRES_DB
              value: my-database-name
            - name: POSTGRES_PASSWORD
              value: my-password
            - name: POSTGRES_USER
              value: my-database-user
          name: postgres-container
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql
              name: my-postgres-volume
      volumes:
        - gcePersistentDisk:
            fsType: ext4
            pdName: my-postgres-disk
          name: my-postgres-volume

I have attempted to create a Job to run a backup:

apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-dump-job
spec:
  template:
    metadata:
      labels:
        app: postgres-dump
    spec:
      containers:
        - command:
            - pg_dump
            - my-database-name
          # `env` value matches `env` from previous configuration.
          image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          name: my-postgres-dump-container
          volumeMounts:
            - mountPath: /var/lib/postgresql
              name: my-postgres-volume
              readOnly: true
      restartPolicy: Never
      volumes:
        - gcePersistentDisk:
            fsType: ext4
            pdName: my-postgres-disk
          name: my-postgres-volume

(As far as I understand) this should run the pg_dump command and write the backup data to stdout, which should then appear in kubectl logs.

As an aside, when I inspect the Pods (with kubectl get pods), it shows the Pod never gets out of the "Pending" state, which I gather is due to there not being enough resources to start the Job.

Is it correct to run this process as a Job? How do I connect the Job to Google Cloud Storage? Or should I be doing something completely different?

I'm guessing it would be unwise to run pg_dump in the database Container (with kubectl exec) due to the performance hit, but maybe this is acceptable on a dev/staging server?

-- MattMS
google-cloud-storage
google-kubernetes-engine
kubernetes
postgresql

3 Answers

2/23/2017

I think running pg_dump as a job is a good idea, but connecting directly to your DB's persistent disk is not. Try having pg_dump connect to your DB over the network! You could then have a second disk onto which your pg_dump command dumps the backups. To be on the safe side, you can create regular snapshots of this second disk.
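A minimal sketch of this approach, assuming a Service named postgres-service in front of the database and a bucket my-backup-bucket (both hypothetical names), might look like the Job below. It connects over the network and pipes the dump straight into Cloud Storage instead of touching the database's disk:

```yaml
# Sketch only: the service name, bucket, and credentials handling are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-network-dump
spec:
  template:
    spec:
      containers:
        - name: dump
          # google/cloud-sdk provides gsutil; the postgres client is installed at runtime
          image: google/cloud-sdk:alpine
          command:
            - /bin/sh
            - -c
            - |
              apk add --no-cache postgresql-client
              pg_dump -h postgres-service -U my-database-user my-database-name \
                | gsutil cp - gs://my-backup-bucket/backup-$(date +%Y%m%d%H%M).sql
          env:
            - name: PGPASSWORD
              value: my-password
      restartPolicy: Never
```

Note that the node's service account (or whatever credentials the Pod uses) would need write access to the bucket for gsutil to succeed.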

-- Marco Lamina
Source: StackOverflow

9/15/2018

As @Marco Lamina said, you can run pg_dump against the postgres pod like this:

DUMP
# pod-name         name of the postgres pod
# postgres-user    database user that is able to access the database
# database-name    name of the database
kubectl exec [pod-name] -- bash -c "pg_dump -U [postgres-user] [database-name]" > database.sql


RESTORE
# pod-name         name of the postgres pod
# postgres-user    database user that is able to access the database
# database-name    name of the database
cat database.sql | kubectl exec -i [pod-name] -- psql -U [postgres-user] -d [database-name]

You can have a Job pod that runs this command and exports the dump to a file storage system such as AWS S3.
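For example (a hedged sketch; the pod, user, database, and bucket names are all placeholders):

```shell
# Dump the database through kubectl exec, redirecting stdout to a local file.
kubectl exec my-postgres-pod -- \
  bash -c "pg_dump -U my-database-user my-database-name" > database.sql

# Upload to Google Cloud Storage (for S3 the equivalent would be `aws s3 cp`).
gsutil cp database.sql gs://my-backup-bucket/database-$(date +%F).sql
```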

-- ganesan dharmalingam
Source: StackOverflow

2/21/2017

The reason the Job's Pod stays in the Pending state is that it repeatedly tries to attach/mount the GCE persistent disk and fails, because the disk is already attached/mounted to another Pod.

Attaching a persistent disk to multiple Pods is only supported if all of them attach/mount the volume in ReadOnly mode, which is of course not a viable solution for you.

I have never worked with GCE, but it should be easy to create a snapshot of the PD from within GCE. This would not give a very clean backup; it would be more like a "crashed in the middle, but recoverable" state, which is probably acceptable for you.
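A disk snapshot can be taken with a single gcloud command; the disk name matches the question's configuration, while the zone and snapshot name here are assumptions:

```shell
# Snapshot the persistent disk that backs the database volume.
gcloud compute disks snapshot my-postgres-disk \
  --zone us-central1-a \
  --snapshot-names postgres-backup-$(date +%Y%m%d)
```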

Running pg_dump inside the database Pod is a viable solution, with the drawbacks you already noticed, especially performance. You would also have to move the resulting backup out of the Pod afterwards, e.g. with kubectl cp, followed by another exec to clean up the backup inside the Pod.
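That flow might look like this (the pod and user names are placeholders):

```shell
# 1. Dump inside the Pod, 2. copy the file out, 3. clean up inside the Pod.
kubectl exec my-postgres-pod -- \
  bash -c "pg_dump -U my-database-user my-database-name > /tmp/backup.sql"
kubectl cp my-postgres-pod:/tmp/backup.sql ./backup.sql
kubectl exec my-postgres-pod -- rm /tmp/backup.sql
```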

-- Alexander Block
Source: StackOverflow