Prefer certain nodes until full

11/30/2019

I've got an AKS cluster configured with two fairly small VM worker nodes, plus a virtual node backed by ACI. What I want is for pods to be scheduled on the two VM nodes until they are full, and only then spill over to the virtual node, but I cannot get this to work.

I've tried using node affinity, as suggested here, but it just doesn't work: pods get scheduled on the virtual node first. If I use a required node affinity, pods are scheduled only on the VM nodes, but that is not what I want. My guess is that because resource availability on my VM nodes is significantly lower than on the virtual node (as you would expect), the virtual node is getting a much higher score, which outweighs the affinity preference. I can't confirm this, though, as I can't see any way to inspect these scores.

So, does anyone have a way to make this scenario work?

-- Sam Cogan
azure-aks
azure-kubernetes
kubernetes

2 Answers

12/1/2019

nodeAffinity is the right way to go, but you have to use the requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution fields correctly.

For example:

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: KEY-FOR-ALL-THREE-NODES
            operator: In
            values:
            - VALUE-FOR-ALL-THREE-NODES
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: KEY-FOR-THE-TWO-SMALL-NODES
            operator: In
            values:
            - VALUE-FOR-THE-TWO-SMALL-NODES
  containers:
  - name: nginx
    image: nginx

This pod can run only on nodes carrying the key/value pair stated in the requirement (so all three nodes), but it is given a preference, with a weight of 100, to run on the small nodes if there is room. The weight only matters relative to other preferences, so with a single preference it should work the same with a weight of 1 as with 100.

Also, since the requirement matches all three nodes anyway, you can skip the requirement part and set only the preference.
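A minimal sketch of the preference-only variant. The label key and value here are assumptions (AKS typically sets an agentpool label on VM nodes; the virtual node carries different labels), so adjust them to whatever labels are actually on your nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-preferred
spec:
  affinity:
    nodeAffinity:
      # No required term: the pod may land anywhere, including
      # the virtual node, but the small VM nodes are preferred.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: agentpool          # assumed label key on the VM nodes
            operator: In
            values:
            - nodepool1             # assumed label value
  containers:
  - name: nginx
    image: nginx
```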

-- suren
Source: StackOverflow

12/1/2019

https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/ goes over the different scoring options used by the scheduler and https://github.com/kubernetes/examples/blob/master/staging/scheduler-policy/scheduler-policy-config.json shows how to customize them.

I suspect what you want is a preferred affinity combined with increasing the scoring factor for NodeAffinityPriority.
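Following the linked scheduler-policy example, increasing the NodeAffinityPriority factor might look like the sketch below. The specific weights are illustrative, not tuned values, and note that a managed offering like AKS may not let you pass a custom policy file to kube-scheduler at all:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "priorities": [
    {"name": "NodeAffinityPriority", "weight": 10},
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "BalancedResourceAllocation", "weight": 1}
  ]
}
```

With NodeAffinityPriority weighted well above the resource-based priorities, a preferred affinity for the small nodes should dominate the virtual node's larger free capacity in the final score.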

-- coderanger
Source: StackOverflow