I'm planning to deploy custom WebRTC videoconferencing software (built on Node.js, using WebSockets) on Kubernetes, but I have some doubts about scaling this environment down.
I'm planning to use a cloud-hosted Kubernetes service (GKE, EKS, AKS or any other) so that the cluster can auto-scale its nodes as demand increases and decreases. Scaling up is not the problem; scaling down is.
As I understand it, the cluster will scale down based on average CPU usage metrics across the cluster, and when it tries to remove a node, that node will start draining connections and stop receiving new ones, right? But now, imagine that there's a videoconference still running on this "pending deletion" node. There are two problems:
1 - The node is stopped before the videoconference finishes (which drops the meeting)
2 - Because of the draining behaviour when scale-down starts, the node stops receiving new connections, so if someone tries to join this running videoconference, they will get a timeout, right?
So, what is the best strategy for scaling down nodes in a videoconference solution? Any ideas?
Thanks
I would say this is not something to solve at the Kubernetes level with a specific scaling strategy, but rather a matter of the application's ability to handle such situations. It isn't even specific to Kubernetes. Imagine that you deploy it directly on compute instances which are also subject to autoscaling: you'll end up in exactly the same situation when the load decreases and one of the instances is removed from the set.
You should rather ask yourself whether such an application is suitable to be deployed as a Kubernetes workload. I can imagine that a videoconference session doesn't have to rely on a backend deployed on a single node only. You can even define affinity or anti-affinity rules to prevent your Pods from being scheduled on the same node. So if the application as a whole is still up and running (its Pods are running on different nodes), eviction of a limited subset of Pods should not have a big impact.
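To make the anti-affinity idea concrete, here is a minimal TypeScript sketch that builds the `affinity` fragment of a Pod spec and prints it as JSON. The label value `videoconf-backend` and the helper name `buildAntiAffinity` are made up for illustration; only the field names and the `kubernetes.io/hostname` topology key come from Kubernetes itself.

```typescript
// Hypothetical label value; use whatever `app` label your Pods actually carry.
const BACKEND_LABEL = "videoconf-backend";

// Build the `affinity` section of a Pod spec that forbids two Pods carrying
// the same `app` label from being scheduled onto the same node.
function buildAntiAffinity(app: string) {
  return {
    podAntiAffinity: {
      requiredDuringSchedulingIgnoredDuringExecution: [
        {
          labelSelector: { matchLabels: { app } },
          // Each node has its own hostname label value, so the rule
          // applies per node.
          topologyKey: "kubernetes.io/hostname",
        },
      ],
    },
  };
}

// This fragment would go under spec.template.spec.affinity of a Deployment.
console.log(JSON.stringify(buildAntiAffinity(BACKEND_LABEL), null, 2));
```

Note that with the "required" form, a replica stays Pending if there is no node left without a matching Pod, so whether to use it or the softer "preferred" form is a judgment call.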
You can actually face the same issue with any other application, as the vast majority of them rely on some session that needs to be established between the client software and the server part. I would say it's the application's responsibility to handle such scenarios. If some users unexpectedly lose the connection, it should be possible to immediately redirect them to a running instance, e.g. a different Pod which is still able to accept new requests.
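As a rough illustration of that redirect step, here is a small TypeScript sketch of how a signalling layer might choose where to send a reconnecting (or newly joining) participant. The `Instance` shape, its `draining` flag and the `pickInstance` helper are all hypothetical; how you actually track instance state depends on your application.

```typescript
// Hypothetical description of one backend instance (e.g. one Pod).
interface Instance {
  id: string;
  draining: boolean; // true once the instance has been told to shut down
  sessions: number;  // participants currently connected to it
}

// Choose the instance a (re)connecting client should be sent to:
// the least-loaded instance that is not draining, or undefined if none.
function pickInstance(instances: Instance[]): Instance | undefined {
  return instances
    .filter((i) => !i.draining)               // skip instances being removed
    .sort((a, b) => a.sessions - b.sessions)[0]; // fewest sessions first
}

const target = pickInstance([
  { id: "pod-a", draining: true, sessions: 1 },
  { id: "pod-b", draining: false, sessions: 7 },
  { id: "pod-c", draining: false, sessions: 3 },
]);
console.log(target?.id); // prints "pod-c"
```

The client side would then reopen its WebSocket against the chosen instance and renegotiate the WebRTC session there; that part is application-specific and is not shown.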
So basically, if the application is designed to be highly available, scaling in (when we talk about horizontal scaling we actually talk about scaling in and scaling out) the underlying VMs, or more specifically the Kubernetes nodes, shouldn't affect its high availability capabilities. On the other hand, if it is not designed to be highly available, a solution such as Kubernetes probably won't help much.
There is no best strategy for your use case. When a cloud provider scales down, it is going to pick a node at random and kill it. It's not going to check which node has the lowest resource consumption and kill that one. It might end up killing the node with the most pods running on it.
I would focus on how you want to schedule your pods. If possible, I would try to schedule them on nodes that already have running pods (inter-pod affinity), and I would set up a Pod Disruption Budget for all Deployments/StatefulSets/etc. (depending on how you want to run the pods). As a result, the cluster would only scale down when there are no pods running on a specific node, and it would kill that node, because the pods on the other nodes are protected by a PDB.
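A minimal sketch of those two pieces, written as TypeScript that emits JSON manifests (which `kubectl apply -f` accepts alongside YAML). The names `videoconf-pdb` and `videoconf-backend` and both helper functions are assumptions for illustration. Setting `maxUnavailable` to 0 is one possible choice: it blocks all voluntary evictions of the selected pods, which also means a manual `kubectl drain` will wait until those pods are gone.

```typescript
// PodDisruptionBudget for all pods carrying the given `app` label.
function buildPdb(name: string, app: string, maxUnavailable: number) {
  return {
    apiVersion: "policy/v1",
    kind: "PodDisruptionBudget",
    metadata: { name },
    spec: {
      maxUnavailable, // 0 means no voluntary eviction is allowed
      selector: { matchLabels: { app } },
    },
  };
}

// Soft inter-pod affinity: prefer nodes that already run pods with the
// same label, so that lightly used nodes tend to end up empty.
function buildPodAffinity(app: string) {
  return {
    podAffinity: {
      preferredDuringSchedulingIgnoredDuringExecution: [
        {
          weight: 100, // valid weights are 1 to 100
          podAffinityTerm: {
            labelSelector: { matchLabels: { app } },
            topologyKey: "kubernetes.io/hostname",
          },
        },
      ],
    },
  };
}

// Hypothetical names; adjust to your own Deployment/StatefulSet labels.
console.log(JSON.stringify(buildPdb("videoconf-pdb", "videoconf-backend", 0), null, 2));
// This fragment would go under spec.template.spec.affinity.
console.log(JSON.stringify(buildPodAffinity("videoconf-backend"), null, 2));
```

The affinity here is only a preference, so the scheduler can still place pods on other nodes when the preferred ones are full.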