Prevent Kubernetes from restarting container when I shut down PostgreSQL

11/6/2021

I'm maintaining a Kubernetes cluster that includes two PostgreSQL servers in two different pods, a primary and a replica. The replica is synced from the primary via log shipping.

A glitch caused the log shipping to start failing, so the replica is no longer in sync with the primary.

The process for bringing a replica back into sync with the primary requires, amongst other things, stopping the postgres service of the replica. And this is where I'm having trouble.

It appears that Kubernetes restarts the container as soon as I shut down the postgres service, which immediately starts postgres again. I need the container to keep running with the postgres service inside it stopped, so that I can perform the next steps in fixing the broken replication.

How can I get Kubernetes to allow me to shut down the postgres service without restarting the container?

Further Details:

To stop the replica I run a shell on the replica pod via kubectl exec -it <pod name> -- /bin/sh, then run pg_ctl stop from the shell. I get the following response:

server shutting down
command terminated with exit code 137

and I'm kicked out of the shell.
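
Exit code 137 is 128 + 9, i.e. the shell was killed with SIGKILL, presumably when the container was torn down after postgres stopped. For reference, the container's last terminated state and restart count can be inspected with something like the following (pod name and namespace taken from the describe output below; the jsonpath fields are standard pod status fields):

kubectl get pod pgset-primary-1 -n qa -o jsonpath='{.status.containerStatuses[0].lastState.terminated}{"\n"}'
kubectl get pod pgset-primary-1 -n qa -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'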

When I run kubectl describe pod I see the following:

Name:         pgset-primary-1
Namespace:    qa
Priority:     0
Node:         aks-nodepool1-95718424-0/10.240.0.4
Start Time:   Fri, 09 Jul 2021 13:48:06 +1200
Labels:       app=pgset-primary
			  controller-revision-hash=pgset-primary-6d7d65c8c7
			  name=pgset-replica
			  statefulset.kubernetes.io/pod-name=pgset-primary-1
Annotations:  <none>
Status:       Running
IP:           10.244.1.42
IPs:
  IP:           10.244.1.42
Controlled By:  StatefulSet/pgset-primary
Containers:
  pgset-primary:
	Container ID:   containerd://bc00b4904ab683d9495ad020328b5033ecb00d19c9e5b12d22de18f828918455
	Image:          *****/crunchy-postgres:centos7-9.6.8-1.6.0
	Image ID:       docker.io/*****/crunchy-postgres@sha256:2850e00f9a619ff4bb6ff889df9bcb2529524ca8110607e4a7d9e36d00879057
	Port:           5432/TCP
	Host Port:      0/TCP
	State:          Running
	  Started:      Sat, 06 Nov 2021 18:29:34 +1300
	Last State:     Terminated
	  Reason:       Completed
	  Exit Code:    0
	  Started:      Sat, 06 Nov 2021 18:28:09 +1300
	  Finished:     Sat, 06 Nov 2021 18:29:18 +1300
	Ready:          True
	Restart Count:  6
	Limits:
	  cpu:     250m
	  memory:  512Mi
	Requests:
	  cpu:     10m
	  memory:  256Mi
	Environment:
	  PGHOST:                 /tmp
	  PG_PRIMARY_USER:        primaryuser
	  PG_MODE:                set
	  PG_PRIMARY_HOST:        pgset-primary
	  PG_REPLICA_HOST:        pgset-replica
	  PG_PRIMARY_PORT:        5432
	  [...]
	  ARCHIVE_TIMEOUT:        60
	  MAX_WAL_KEEP_SEGMENTS:  400
	Mounts:
	  /backrestrepo from backrestrepo (rw)
	  /pgconf from pgbackrestconf (rw)
	  /pgdata from pgdata (rw)
	  /var/run/secrets/kubernetes.io/serviceaccount from pgset-sa-token-nh6ng (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  pgdata:
	Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
	ClaimName:  pgdata-pgset-primary-1
	ReadOnly:   false
  backrestrepo:
	Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
	ClaimName:  backrestrepo-pgset-primary-1
	ReadOnly:   false
  pgbackrestconf:
	Type:      ConfigMap (a volume populated by a ConfigMap)
	Name:      pgbackrest-configmap
	Optional:  false
  pgset-sa-token-nh6ng:
	Type:        Secret (a volume populated by a Secret)
	SecretName:  pgset-sa-token-nh6ng
	Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
				 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
				 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                 From     Message
  ----     ------   ----                ----     -------
  Warning  BackOff  88m (x3 over 3h1m)  kubelet  Back-off restarting failed container
  Normal   Pulled   88m (x7 over 120d)  kubelet  Container image "*****/crunchy-postgres:centos7-9.6.8-1.6.0" already present on machine
  Normal   Created  88m (x7 over 120d)  kubelet  Created container pgset-primary
  Normal   Started  88m (x7 over 120d)  kubelet  Started container pgset-primary

The events suggest the container is being restarted by Kubernetes itself (the kubelet).

The pod has no liveness or readiness probes, so I don't know what would prompt Kubernetes to restart the container when I shut down the postgres service running inside it.
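
If it helps, this is how the absence of probes and the pod's restart policy can be double-checked (the jsonpath paths are standard pod spec fields; the probe queries print nothing when no probes are defined):

kubectl get pod pgset-primary-1 -n qa -o jsonpath='{.spec.containers[0].livenessProbe}{"\n"}{.spec.containers[0].readinessProbe}{"\n"}'
kubectl get pod pgset-primary-1 -n qa -o jsonpath='{.spec.restartPolicy}{"\n"}'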

-- Simon Tewsi
database-replication
kubernetes
postgresql
postgresql-9.6

1 Answer

11/6/2021

This happens because of the pod's restartPolicy. The container terminates when its main process completes, and Kubernetes then restarts it according to that policy. If you do not want a new container to be created, you need to change the restart policy for these pods.

If this pod is part of a deployment, look into kubectl explain deployment.spec.template.spec.restartPolicy
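
The describe output above shows this pod is actually controlled by a StatefulSet, so, as a sketch, the equivalent field and the value currently applied to the pod can be checked with:

kubectl explain statefulset.spec.template.spec.restartPolicy
kubectl get pod pgset-primary-1 -n qa -o jsonpath='{.spec.restartPolicy}{"\n"}'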

-- jabbson
Source: StackOverflow