Horizontal scaling approach in Kubernetes

1/13/2020

First of all, I'm pretty new to Kubernetes, and the number of different architectures and solutions out there makes it very difficult to find a good source that fits my specific needs.

So I have a cluster that runs many clones of the same application, a stateless, heavy-load Python application. I've enabled auto-scaling to add more nodes at peak times, which should help deal with heavier traffic effectively. The thing I'm not sure about is the pod allocation strategy.

What I thought of doing is keeping the maximum number of idle pods running on my nodes, waiting for requests before they start operating. Is this approach even conceptually right/solid? Is it "in the spirit" of Kubernetes, or am I misusing it somehow?

The reason I'm inclined to avoid pod auto-scaling is that it's hard to determine a rule by which to perform the scaling, and I don't see the benefit, since each pod basically has two states: idle or running at full power.

-- Itay Davidson
azure-kubernetes
kubernetes

1 Answer

1/14/2020

You can use the cluster autoscaler and keep some resources idle if you want to avoid application errors at peak times, for example.

The cluster autoscaler will increase your cluster size based on your resource usage, but this scaling isn't very fast and can sometimes take a few minutes, so keep this in mind when you configure the cluster autoscaler.
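One common way to keep that idle headroom is to run a low-priority placeholder deployment: the scheduler evicts the placeholders as soon as your real pods need the room, and the cluster autoscaler then adds a node to reschedule them, so spare capacity stays ahead of demand. A minimal sketch of that pattern (the names, image, and sizes are placeholders to adapt):

```yaml
# Negative-priority class so these pods are evicted first
# whenever real workloads need the capacity.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning        # hypothetical name
value: -10
globalDefault: false
description: "Placeholder pods that reserve spare capacity."
---
# Placeholder pods that do nothing but reserve CPU/memory.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                   # tune to the headroom you want
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
```

The bigger the placeholder requests and replica count, the more headroom you pay for while idle, so it is a direct cost/latency trade-off.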

If you already know your peak times, you could instead schedule an increase in the number of nodes in the cluster ahead of the peak.
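If the peaks are predictable, one way to do that from inside the cluster is a CronJob that scales up a placeholder deployment (like the overprovisioning one sketched above) shortly before the peak, with a matching CronJob scaling it back down afterwards; the cluster autoscaler then adds the nodes ahead of time. A rough sketch; the schedule, the bitnami/kubectl image, and the ServiceAccount with permission to scale deployments are assumptions to adapt:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-before-peak        # hypothetical name
spec:
  schedule: "30 8 * * 1-5"          # e.g. 08:30 on weekdays, before a 09:00 peak
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-scaler   # assumed SA with RBAC to scale deployments
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - scale
            - deployment/overprovisioning
            - --replicas=10
```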

Autoscaling is always complex to set up in the beginning because you never know what your customers will do. There is no magic formula for this; my advice is to test all the options you have and try to find the approach that best fits your workload.

Here you can see how to configure the cluster autoscaler on the most common providers:

Autoscaler GCE

Autoscaler GKE

Autoscaler AWS

Autoscaler Azure

Here is an article that could help you.

Regarding pod resource allocation, the documentation mentions:

If you do not specify a CPU limit for a Container, then one of these situations applies:

- The Container has no upper bound on the CPU resources it can use. The Container could use all of the CPU resources available on the Node where it is running.

- The Container is running in a namespace that has a default CPU limit, and the Container is automatically assigned the default limit. Cluster administrators can use a LimitRange to specify a default value for the CPU limit.
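For example, a LimitRange like the one below (the namespace and values are only illustrative) gives every container in that namespace a default CPU request and limit when the container doesn't declare its own:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults            # illustrative name
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:             # applied when a container sets no CPU request
      cpu: "250m"
    default:                    # applied when a container sets no CPU limit
      cpu: "500m"
```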

A container will not consume resources it doesn't need, but the moment it does ask for resources, all of the resources available on your node can be taken by that pod.

You could create replicas of your container if you want to balance the workload across your application; this only makes sense if you limit the resources of each container, or if you know that each container/application can only handle a limited number of requests.
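As a rough illustration of that last point, the sketch below (names, image, and numbers are placeholders) runs several replicas of the application, each with its own requests and limits, so the scheduler can spread them across nodes and no single pod can take over a whole node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-app              # placeholder name
spec:
  replicas: 4                   # balance the workload across several pods
  selector:
    matchLabels:
      app: python-app
  template:
    metadata:
      labels:
        app: python-app
    spec:
      containers:
      - name: app
        image: python-app:latest   # placeholder image
        resources:
          requests:                # what the scheduler reserves per pod
            cpu: "500m"
            memory: "512Mi"
          limits:                  # hard cap so one pod can't take the whole node
            cpu: "1"
            memory: "1Gi"
```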

-- KoopaKiller
Source: StackOverflow