Unable to resize Postgres 10 /dev/shm due to Kubernetes limiting shared memory

8/5/2020

Encountered the following error while reading from a PG 10 table with 10 parallel threads:

ERROR: could not resize shared memory segment "/PostgreSQL.1110214013" to 3158016 bytes: No space left on device

This seems to be the result of Kubernetes limiting the maximum size of /dev/shm/ to 64MB; attempting to set it any higher still results in 64MB.

Parallel reads are being carried out by Spark tasks, partitioned on the hashed value of an identifying column. I'm wondering whether unbalanced partitions could be causing a particular task to exceed the Postgres work_mem for its backend process, forcing a write to disk.

I am seeing a corresponding error log for each of my threads, so this shared memory segment resize is occurring multiple times (presumably the requested resizes are pushing past the 64MB cap).

I have tried upping work_mem from 4MB to 32MB, 64MB, 128MB and finally 256MB, but have seen the error at each stage. Below is the full set of PG settings that I believe can be tweaked to avoid the problematic disk usage (a sketch of how these are applied to the Postgres container follows the list):

  • effective_cache_size: "750MB"
  • shared_buffers: "2GB"
  • min_wal_size: "80MB"
  • max_wal_size: "5GB"
  • work_mem: "4MB, 32MB, 64MB, 128MB, 256MB" (all tried)
  • random_page_cost: 4 (wondering if this setting could be of use?)
  • max_connections: 100
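
For reference, here is a rough sketch of how settings like these could be applied to the Postgres container (this assumes the official postgres image, which forwards extra args straight to the server; the pod name and values are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: pg10                 # illustrative name
    spec:
      containers:
        - name: postgres
          image: postgres:10
          # the official image passes extra args through to the postgres server,
          # so per-deployment overrides can be supplied here
          args:
            - "-c"
            - "shared_buffers=2GB"
            - "-c"
            - "work_mem=64MB"
            - "-c"
            - "effective_cache_size=750MB"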

I have a potential workaround that involves mounting a directory to /dev/shm/, but I would prefer to avoid this solution as I would be left unable to limit the size the directory could grow to; ideally I would find a solution that works within the 64MB.

Thanks.

-- Mackey LK
apache-spark
kubernetes
postgresql

1 Answer

4/18/2021

It seems that (according to this explanation) if you want to avoid the issue while leaving /dev/shm limited to 64MB, you'll need to set shared_buffers to less than 64MB. However, mounting an emptyDir volume to /dev/shm is probably the best option if there is more memory physically available to your Kubernetes node.

It's true that as of Kubernetes 1.21 you can't constrain the size of the emptyDir volume (unless you have access to configure feature gates: the new SizeMemoryBackedVolumes feature gate is still in alpha), but this probably doesn't matter for the Postgres use case.

If Postgres is the only application running in the pod, and you've configured shared_buffers to around 25% of available memory as recommended by the Postgres documentation, the current behavior of offering up to 50% of node memory to the emptyDir volume before eviction should be fine. You'd need to trigger some bug in Postgres for it to consume much more of that memory than the shared_buffers setting.

So the best solution is likely to set shared_buffers to ~25% of available node memory, then mount an emptyDir volume to /dev/shm.
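
As a rough illustration (not a drop-in manifest: the pod and volume names, image tag, and sizes are placeholders, and the 1GB value assumes a node with roughly 4GB of memory), that combination might look like:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pg10
    spec:
      containers:
        - name: postgres
          image: postgres:10
          args: ["-c", "shared_buffers=1GB"]   # ~25% of an assumed 4GB of node memory
          volumeMounts:
            - name: dshm
              mountPath: /dev/shm              # replaces the default 64MB shm mount
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory                     # tmpfs backed by node RAM
            # sizeLimit: 1Gi                   # only constrains the tmpfs mount once the
            #                                  # SizeMemoryBackedVolumes gate is enabled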

-- Jason Dreyzehner
Source: StackOverflow