I've got an AKS cluster configured with two fairly small VM worker nodes, and then a virtual node to use ACI. What I really want to happen is for pods to get scheduled on the two VM nodes until they are full, then use the virtual node, but I cannot get this to work.
I've tried using node affinity, as suggested here, but it just doesn't work: pods get scheduled on the virtual node first. If I use a required node affinity, then they do get scheduled only on the VM nodes, but that is not what I want. I am guessing the issue is that the resource availability on my VM nodes is significantly lower than on the virtual node (as you would expect), so the virtual node ends up with a much higher score, which counteracts the affinity rule, but I can't confirm this as I can't see any way to inspect that score.
So, does anyone have a way to make this scenario work?
nodeAffinity is the right way to go, but you have to get the requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution parameters right.
For example:
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: KEY-FOR-ALL-THREE-NODES
            operator: In
            values:
            - VALUE-FOR-ALL-THREE-NODES
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: KEY-FOR-THE-TWO-SMALL-NODES
            operator: In
            values:
            - VALUE-FOR-THE-TWO-SMALL-NODES
  containers:
  - name: nginx
    image: nginx
This pod can run only on nodes that have the key/value pair stated in the requirement (so all three nodes), but it is given a preference to run on the small nodes (if there is room) with a weight of 100. The weight only matters relative to other preferred terms, so with a single term a weight of 1 behaves the same as a weight of 100.
Also, since you only have three nodes, you can skip the requirement part and set just the preference, for example as sketched below.
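A preference-only variant might look like the following sketch. It assumes your two VM nodes carry the agentpool label that AKS usually applies to node pools; the key and the value nodepool1 are placeholders, so check kubectl get nodes --show-labels for the real label on your cluster:

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-preferred-only
spec:
  affinity:
    nodeAffinity:
      # No required term, so the pod can still fall back to the virtual node
      # when the VM nodes are full.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: agentpool       # assumed AKS node-pool label; verify on your nodes
            operator: In
            values:
            - nodepool1          # placeholder pool name
  containers:
  - name: nginx
    image: nginx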
https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/ goes over the different scoring options used by the scheduler and https://github.com/kubernetes/examples/blob/master/staging/scheduler-policy/scheduler-policy-config.json shows how to customize them.
I suspect what you want is a preferred affinity combined with increasing the scoring factor for NodeAffinityPriority.
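For illustration, a trimmed-down policy file along those lines might look like the sketch below. It follows the format of the linked scheduler-policy example and raises the NodeAffinityPriority weight relative to the resource-based priorities; the exact weights are placeholders to experiment with:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "priorities": [
    {"name": "NodeAffinityPriority", "weight": 10},
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1}
  ]
}

The idea is that the node-affinity preference for the small VM nodes then contributes more to the final score than the free-resource priorities that currently favour the much larger virtual node.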