Kubernetes & Docker Efficiency

2/24/2020

I've been looking for information on how efficient Kubernetes and Docker are in terms of machine resource usage, but I haven't found much so far. Here are my three questions, all about Kubernetes + Docker:

  • If multiple containers on the same node are running the same binary, are the code pages shared between all these instances? That is, is there a single set of physical pages allocated on the node for all these processes? For example, if I'm running a service mesh like Istio, which runs Envoy in every pod, is the system smart enough to load the Envoy code into memory only once, or do all the layers of indirection prevent the Linux kernel from recognizing that sharing is possible?

  • In a large Kubernetes deployment, each node ends up redundantly downloading its own copies of the same Docker images. It would seem more effective to have a single in-cluster repository for these images that all nodes can fetch from. I've seen suggestions about having Docker use NFS for a common image store. Is this the only answer?

  • I heard there's a practical limit to the number of pods Kubernetes will schedule on a single node (30). Such a small limit forces you to use smaller VMs in order to fully saturate them. Does anybody know why this limit exists and whether it will eventually be raised? I ask in the context of running Kubernetes on bare metal, where VMs aren't used at all; in that world, I'd want to pack far more than 30 pods onto a (large) physical machine.

Thank you for any insights or pointers.

-- Geeknoid
Tags: docker, kubernetes

1 Answer

2/25/2020

You phrase your question as though Docker is the only container runtime you would use with Kubernetes. That is fine, but there are more choices, and the answers change depending on the runtime. In general, Kubernetes provides an abstraction over the actual scheduling and running of pods/containers. You may also be investing too much human time in details that can be solved with more metal, which is cheap.
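For instance, you can check which runtime each node in a cluster actually uses; kubectl prints it in the CONTAINER-RUNTIME column:

    # Show each node's container runtime, e.g. docker://19.3.5,
    # containerd://1.3.3, or cri-o://1.17.0.
    kubectl get nodes -o wide

    # Or query the field directly:
    kubectl get nodes \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'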

Multiple containers on a single node are, with the usual runtimes (Docker, containerd, CRI-O), just ordinary system processes, much like launching your Apache httpd multiple times yourself. File-backed code pages are shared automatically through the page cache when processes map the same binary, and containers started from the same image layer do map the same underlying files (e.g. via overlayfs); beyond that, if the kernel runs memory deduplication (KSM), it can merge identical anonymous pages as well. If you use a container runtime that launches micro-VMs (Firecracker, Kata Containers, ...), I doubt memory deduplication will be possible.
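You can verify the sharing on a node yourself. A minimal sketch, assuming a runc-based runtime, root access on the node, and that envoy and the PIDs shown are placeholders for your own processes:

    # Find two envoy processes belonging to different containers.
    pgrep -x envoy
    # Suppose it prints 1234 and 5678 (placeholder PIDs).
    # Shared_Clean counts file-backed pages (such as executable code)
    # shared with other processes; Pss divides shared pages among all
    # sharers, so Pss well below Rss means most memory is shared.
    grep -E '^(Rss|Pss|Shared_Clean)' /proc/1234/smaps_rollup
    grep -E '^(Rss|Pss|Shared_Clean)' /proc/5678/smaps_rollup

Note that /proc/<pid>/smaps_rollup requires Linux 4.14 or newer; on older kernels you can sum the same fields from /proc/<pid>/smaps.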

I would not recommend sharing storage for the container images, e.g. via NFS. I have had to diagnose issues caused by this in several customer setups, including deadlocks. Essentially, you would be reducing the robustness of your cluster in order to save disk space. Just use more metal.
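If the concern is redundant downloads rather than disk space, a more robust pattern is a pull-through cache (registry mirror) on the cluster's network: every node still keeps its own on-disk copy, but each image crosses the WAN only once. A minimal sketch for Docker, assuming a hypothetical mirror at https://registry-mirror.internal:5000:

    # /etc/docker/daemon.json on each node
    {
      "registry-mirrors": ["https://registry-mirror.internal:5000"]
    }

followed by a daemon restart (sudo systemctl restart docker) on each node. Be aware that Docker's registry-mirrors setting only applies to pulls from Docker Hub; private registries need their own mirroring or proxying setup.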

The usual limit is 110 pods per node, which is usually plenty. You can change it with the --max-pods flag on the kubelet process, or with the maxPods field in the kubelet configuration file. The limit exists because managing each pod incurs work on the kubelet side and on the etcd/apiserver side.
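A minimal sketch of raising the limit through the kubelet configuration file; the path and the value 250 are examples, and you should watch kubelet and apiserver/etcd load before running such densities in production:

    # /var/lib/kubelet/config.yaml (path varies by distribution)
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    maxPods: 250

    # Or as a flag on the kubelet process:
    #   kubelet --max-pods=250 ...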

-- Thomas
Source: StackOverflow