Copy-on-write style memory reuse for Kubernetes pods? To make pods spawn faster and more memory-efficient

10/17/2021
  • Can Kubernetes pods share a significant amount of memory?

  • Does copy-on-write style forking exist for pods?

The purpose is to make pods spawn faster and use less memory.

Our scenario is that we have a dedicated game server to host in Kubernetes. The problem is that one instance of the dedicated game server takes up a few GB of memory upfront (e.g. 3 GB).

Also, we have a few such Docker images of game servers, one for each game: game A, game B... Let's call a pod running game A's image "pod A".

Let's say we now have 3 x pod A and 5 x pod B. Now players are rushing into game B, so I urgently need, say, another 4 x pod B.

I can surely spawn 4 more of pod B; Kubernetes supports this perfectly. However, there are 2 problems:

  • The booting of my game server is very slow (30s - 1min). Players don't want to wait.
  • More importantly for us, the cost of running this many pods that each take up so much memory is very high, because pods do not share memory as far as I know. Whereas on a plain old EC2 machine or bare metal, processes can share memory because they can fork and then copy-on-write.

Copy-on-write style forking and memory sharing seem to solve both problems.

-- Boyang
copy-on-write
kubernetes
memory
shared-memory

2 Answers

10/20/2021

One of Kubernetes' assumptions is that pods may be scheduled on different nodes, which works against the idea of sharing common resources (storage is an exception, with many options and plenty of documentation available). The situation is different when it comes to sharing resources between containers in one pod, but that does not apply to your issue.

However, it seems that there is some possibility to share memory - it is not well documented and, I would guess, very uncommon in Kubernetes. See my answers with more details below:

Can Kubernetes pods share a significant amount of memory?

What I found is that pods can share a common IPC namespace with the host (node). You can check Pod Security Policies, especially the hostIPC field:

HostIPC - Controls whether the pod containers can share the host IPC namespace.

Some usage examples and possible security issues can be found here:

Keep in mind that this solution is not common in Kubernetes, and pods with elevated privileges are easily granted broader permissions than needed:

The way PSPs are applied to Pods has proven confusing to nearly everyone that has attempted to use them. It is easy to accidentally grant broader permissions than intended, and difficult to inspect which PSP(s) apply in a given situation.

That's why the Kubernetes team marked Pod Security Policies as deprecated in Kubernetes v1.21 - you can find more information in this article.

Also, if you are using multiple nodes in your cluster, you should use a nodeSelector to make sure the pods are assigned to the same node; that way they will be able to share the one (host's) IPC namespace (a rough sketch follows below).
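This is not from the original answer, but as a rough illustration of the two fields mentioned above: a minimal sketch using the official Kubernetes Python client, with a hypothetical image name, namespace and node label. The host_ipc and node_selector arguments map to hostIPC and nodeSelector in the pod spec.

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="game-b-extra-1", labels={"app": "game-b"}),
        spec=client.V1PodSpec(
            host_ipc=True,                          # share the node's IPC namespace (hostIPC)
            node_selector={"game-pool": "game-b"},  # hypothetical label to keep pods on the same node pool
            containers=[
                client.V1Container(
                    name="game-b",
                    image="registry.example.com/game-b:latest",  # hypothetical image
                    resources=client.V1ResourceRequirements(requests={"memory": "3Gi"}),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="games", body=pod)

Note that a nodeSelector only narrows scheduling to nodes with matching labels; to guarantee that pods really land on the same node you would have to select a label unique to one node (or use pod affinity).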

Does copy-on-write style forking exist for pods?

I did some research and I didn't find any information about this possibility, so I think it is not possible.


I think the main issue is that your game architecture is not "very suitable" for Kubernetes. Check these articles and websites about dedicated game servers in Kubernetes - maybe you will find them useful:

-- Mikolaj S.
Source: StackOverflow

10/23/2021

A different way to resolve the issue would be to bake some of the initialisation into the image.

As part of the Docker image build, start up the game server and do as much of the 30s - 1min initialisation as possible, then dump that part of the memory into a file in the image. On game server boot-up, use mmap (with MAP_PRIVATE and possibly even MAP_FIXED) to map the pre-calculated file into memory.
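Not part of the original answer, but a minimal sketch of the boot-up side of this idea, in Python for brevity (a real game server would more likely do this in C/C++). The file path is hypothetical, and Python's mmap module does not expose MAP_FIXED, so only MAP_PRIVATE is shown:

    import mmap
    import os

    # Hypothetical file produced at image build time: the server ran its expensive
    # initialisation once and dumped the resulting state into the image.
    STATE_FILE = "/opt/game/preinit.bin"

    fd = os.open(STATE_FILE, os.O_RDONLY)
    size = os.fstat(fd).st_size

    # MAP_PRIVATE gives a copy-on-write mapping: pages stay shared (via the page
    # cache) until this process writes to them, and writes never reach the file.
    state = mmap.mmap(fd, size,
                      flags=mmap.MAP_PRIVATE,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
    os.close(fd)

    # The server can now read (and locally modify) the pre-computed state
    # without re-running the slow initialisation.
    header = state[:16]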

That would solve the problem with the game server boot-up time, and probably also with the memory use; everything in the stack should be doing copy-on-write all the way from the image through to the pod (although you'd have to confirm whether it actually does).
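One rough way to confirm the sharing (again not from the original answer, and assuming the hypothetical preinit.bin mapping from the sketch above) is to look at the mapping's counters in /proc/<pid>/smaps: a large Shared_Clean and a small Private_Dirty mean copy-on-write has not yet duplicated the pages for that process.

    import os
    import re

    def mapping_stats(pid, path_fragment="preinit.bin"):
        """Sum Rss/Shared_Clean/Private_Dirty (kB) for mappings whose path matches path_fragment."""
        stats = {"Rss": 0, "Shared_Clean": 0, "Private_Dirty": 0}
        in_mapping = False
        with open(f"/proc/{pid}/smaps") as f:
            for line in f:
                if re.match(r"^[0-9a-f]+-[0-9a-f]+ ", line):
                    # New mapping header; remember whether it belongs to our file.
                    in_mapping = path_fragment in line
                elif in_mapping:
                    key, _, rest = line.partition(":")
                    if key in stats:
                        stats[key] += int(rest.split()[0])  # values are in kB
        return stats

    print(mapping_stats(os.getpid()))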

It would also have the benefit that it's plain k8s with no special tricks; no requirements for special permissions or node selection or anything, nothing to break or require reimplementation on upgrades or otherwise get in the way. You will be able to run it on any k8s cluster, whether your own or any of the cloud offerings, as well as in your CI/CD pipeline and dev machine.

-- Jiří Baum
Source: StackOverflow