Rightsizing Kubernetes Nodes | How much cost do we save when we switch from VMs to containers?

7/5/2021

We are running 4 different microservices on 4 different EC2 Auto Scaling groups:

service-1 - vCPU: 4, RAM: 32 GB, VM count: 8

service-2 - vCPU: 4, RAM: 32 GB, VM count: 8

service-3 - vCPU: 4, RAM: 32 GB, VM count: 8

service-4 - vCPU: 4, RAM: 32 GB, VM count: 16

We are planning to migrate this workload to EKS (in containers).

We need help deciding on the right node configuration (in EKS) to start with. We could start with a small machine (vCPU: 4, RAM: 32 GB), but then we would see no cost saving, since each container would effectively need a separate VM. We could use a large machine (vCPU: 16, RAM: 128 GB), but when these machines scale out, each newly added machine is large and can therefore be underutilized. Or we could go with a medium machine, such as vCPU: 8, RAM: 64 GB.
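To compare the options, here is a rough Python sketch of how many nodes of each candidate size would carry the same raw capacity as our current 40 VMs. It assumes the containerized workload keeps roughly the same aggregate requirements, which is an assumption, not a measurement:

```python
# Rough sketch: node count per candidate size to match the current fleet.
# Assumes the containerized services need about the same aggregate resources.

total_vcpu = 4 * (8 + 8 + 8 + 16)       # 160 vCPU across all four services
total_ram_gb = 32 * (8 + 8 + 8 + 16)    # 1280 GB across all four services

candidates = {
    "small (4 vCPU / 32 GB)":   (4, 32),
    "medium (8 vCPU / 64 GB)":  (8, 64),
    "large (16 vCPU / 128 GB)": (16, 128),
}

for name, (vcpu, ram_gb) in candidates.items():
    # Ceiling division: enough whole nodes to cover both CPU and RAM.
    nodes = max(-(-total_vcpu // vcpu), -(-total_ram_gb // ram_gb))
    print(f"{name}: ~{nodes} nodes for the same raw capacity")
```

This only compares raw capacity (40 small, 20 medium or 10 large nodes); it does not account for how well the individual containers pack onto each node shape.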

Besides this recommendation, we are also evaluating the cost savings of moving to containers. As per our understanding, every VM comes with the following overhead:

  • Overhead of running the hypervisor/virtualization layer
  • Overhead of running a separate operating system

Note: one large VM and many small VMs cost the same on a public cloud, since pricing is based on the number of vCPUs plus RAM.

The hypervisor/virtualization cost only matters if we are running on-prem, so there is no need to consider it here. On the second point, how many resources does a typical Linux machine need just to run the OS? If we provision a small machine (vCPU: 2, RAM: 4 GB), approximate CPU usage is 0.2% and memory consumption (outside user space) is about 500 MB. So running large instances (5 of them instead of 40 small instances) would save 35 times this CPU and RAM, which does not seem significant.
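A quick back-of-the-envelope check of that claim, using the per-VM figures above (which are rough assumptions, not measurements):

```python
# OS overhead saved by consolidating 40 small VMs into 5 large ones.
# Per-VM overhead figures are the rough assumptions stated above.

per_vm_cpu_overhead = 0.002 * 2     # 0.2% of a 2-vCPU machine ≈ 0.004 vCPU
per_vm_ram_overhead_gb = 0.5        # ~500 MB outside user space

vms_saved = 40 - 5                  # 35 fewer OS instances to run

cpu_saved = vms_saved * per_vm_cpu_overhead         # ≈ 0.14 vCPU in total
ram_saved_gb = vms_saved * per_vm_ram_overhead_gb   # 17.5 GB in total

print(f"CPU saved: ~{cpu_saved:.2f} vCPU, RAM saved: ~{ram_saved_gb:.1f} GB")
```

Even the 17.5 GB of RAM is small relative to the roughly 1,280 GB provisioned across our current 40 VMs, which is why the saving does not look significant to us.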

-- Pragmatic
containers
kubernetes
microservices

1 Answer

7/5/2021

You are unlikely to see any resource cost savings when you move from applications running directly on VMs to containers in EKS.

A Linux container is just an isolated Linux process with specified resource limits; it is no different from a normal process when it comes to resource consumption. EKS still uses virtual machines to provide compute to the cluster, so you will still be running processes on a VM, containerized or not, and from a resource point of view it will be equal. (See this answer for a more detailed comparison of VMs and containers.)

When you add Kubernetes to the mix you are actually adding more overhead compared to running directly on VMs. The Kubernetes control plane runs on a set of dedicated VMs. In EKS those are fully managed as a service, but Amazon charges a small hourly fee for each cluster.

In addition to the dedicated control plane nodes, each worker node in the cluster needs a set of programs (system pods) to function properly (kube-proxy, kubelet, etc.), and you may also define containers that must run on each node (DaemonSets), such as log collectors and security agents.
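To put rough numbers on this fixed overhead, here is a hedged sketch. The per-node reservation figures are illustrative assumptions (measure kube-proxy, kubelet and your own DaemonSets in practice), and the roughly $0.10 per cluster-hour EKS fee should be checked against current pricing:

```python
# Sketch of fixed Kubernetes overhead on EKS, under assumed figures.

hours_per_month = 730
cluster_fee_per_hour = 0.10   # approximate EKS cluster fee; check current pricing
per_node_cpu = 0.2            # vCPU used by system pods and DaemonSets (assumed)
per_node_ram_gb = 0.5         # RAM used by system pods and DaemonSets (assumed)

for node_count in (10, 20, 40):
    print(
        f"{node_count} nodes: "
        f"cluster fee ~${cluster_fee_per_hour * hours_per_month:.0f}/month, "
        f"system overhead ~{node_count * per_node_cpu:.1f} vCPU "
        f"and ~{node_count * per_node_ram_gb:.1f} GB"
    )
```

The cluster fee is constant, but the per-node overhead grows linearly with the number of worker nodes, which is what drives the sizing trade-off below.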

When it comes to sizing the nodes you need to find a balance between scaling and cost optimization.

  • The larger the worker node is, the smaller the relative overhead of system pods and DaemonSets becomes. In theory, a worker node large enough to accommodate all your containers would maximize the resources consumed by your applications compared to the supporting applications on the node.
  • The smaller the worker nodes are, the smaller the horizontal scaling steps can be, which is likely to reduce waste when scaling (see the sketch after this list). It also provides better resilience, as a node failure will impact fewer containers.
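As a rough illustration of the scaling-step point, the sketch below (with made-up demand figures) compares how much capacity sits idle after rounding demand up to whole nodes of different sizes:

```python
# Sketch: waste from scaling-step granularity for different node sizes.
# Demand figures are illustrative, not taken from the workload above.

def wasted_vcpu(demand_vcpu: int, node_vcpu: int) -> int:
    """vCPU provisioned but unused after rounding demand up to whole nodes."""
    nodes = -(-demand_vcpu // node_vcpu)   # ceiling division
    return nodes * node_vcpu - demand_vcpu

for demand in (10, 50, 90):
    for node_vcpu in (4, 8, 16):
        print(f"demand {demand} vCPU, {node_vcpu}-vCPU nodes: "
              f"{wasted_vcpu(demand, node_vcpu)} vCPU idle")
```

Smaller nodes leave at most one small node's worth of capacity idle at the margin, while larger nodes can leave much more unused right after a scale-out.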

I tend to prefer nodes that are small, so that scaling can be handled efficiently. They should be slightly larger than what is required by the largest containers, so that system pods and DaemonSets can also fit.
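As a sketch of that rule of thumb (the pod and system reservation figures below are assumptions, not measurements from your workload):

```python
# Sketch: pick the smallest node shape that fits the largest container
# plus headroom for system pods and DaemonSets. Figures are assumed.

largest_pod_cpu = 4.0       # vCPU requested by the biggest container (assumed)
largest_pod_ram_gb = 30.0   # RAM requested by the biggest container (assumed)
system_cpu = 0.3            # kubelet, kube-proxy, DaemonSets (assumed)
system_ram_gb = 1.5         # system and kube-reserved memory (assumed)

min_node_cpu = largest_pod_cpu + system_cpu
min_node_ram = largest_pod_ram_gb + system_ram_gb

# Candidate instance shapes (vCPU, GB RAM), smallest first.
shapes = [(4, 32), (8, 64), (16, 128)]
fit = next((s for s in shapes if s[0] >= min_node_cpu and s[1] >= min_node_ram), None)

print(f"need at least {min_node_cpu} vCPU / {min_node_ram} GB -> smallest fit: {fit}")
```

With these assumed numbers, a 4-vCPU node would be fully consumed by the largest container alone once the system pods are reserved, so the smallest shape that still fits is the next size up.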

-- danielorn
Source: StackOverflow