Kubernetes: preferredDuringScheduling

8/29/2018

Background

I'm running a Kubernetes cluster on Google Cloud Platform. I have 2 node pools in my cluster: A and B. B is cheaper (because of its hardware). I would prefer my deployment to run on B, unless B has no free resources; in that case, new pods should be deployed to A.

So I added this section to deployment YAML:

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: B
            operator: Exists
        weight: 100

So I'm giving more weight to node pool B.
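For what it's worth, GKE automatically labels every node with cloud.google.com/gke-nodepool set to the name of its node pool, so the same preference can also be expressed against that built-in label. A minimal sketch, assuming the pool is literally named B:

  # Sketch: prefer GKE node pool "B" via the built-in node pool label.
  # Assumes the pool's name is exactly "B".
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: cloud.google.com/gke-nodepool
            operator: In
            values:
            - B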

At first it worked well. Then I came back after 24 hours and found that some pods were deployed to node pool A while I still had free resources (unallocated machines) in node pool B. This is a waste of money.

So, how did this happen?

I'm sure that the nodeAffinity property is working correctly. I suspect that at some point node pool B had no free resources left. At that point the cluster wanted to grow... and the new pods were deployed to node pool A. Up to here, everything is fine...

What do I want to achieve?

Let's say that an hour after node pool B ran out of resources, there are plenty of resources free for allocation again. I want Kubernetes to move the existing pods from A to their new home in node pool B.

I'm looking for something like preferredDuringSchedulingPreferredDuringExecution.

Question

Is this possible?

Update

Based on @Hitobat's answer, I tried this configuration:

  spec:
    tolerations:
    - key: A
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 60

Unfortunately, after waiting long enough, I still see pods in node pool A. Did I do something wrong?
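(For reference, tolerationSeconds only leads to an eviction if the A nodes actually carry a matching NoExecute taint. A quick way to check, assuming the pool is named A and using GKE's standard node pool label:)

  # Sketch: list the taints on the pool A nodes and confirm a NoExecute
  # taint with key "A" is really present there.
  kubectl get nodes -l cloud.google.com/gke-nodepool=A \
    -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints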

-- No1Lives4Ever
google-cloud-platform
google-kubernetes-engine
kubernetes

2 Answers

9/3/2018

Pod affinity, node affinity and resource requests are only used by the scheduler to decide where to place a new pod. As long as a Pod is running, it will not be moved, so the answer to this question is no. I would recommend filing a feature request on GitHub. For more information on this topic, see the nodeAffinity section of the Kubernetes documentation on assigning Pods to nodes.
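(A hedged aside consistent with this answer: because affinity is only honoured at scheduling time, the only way to get already-running pods re-evaluated is to recreate them, e.g. by deleting them so the Deployment's ReplicaSet replaces them. The label selector below is a placeholder:)

  # Sketch: recreate the pods so the scheduler re-applies the node affinity.
  # "app=my-app" is a placeholder for the deployment's pod label.
  kubectl delete pods -l app=my-app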

-- arp-sunny.
Source: StackOverflow

9/3/2018

You can taint pool A, then configure all your pods to tolerate the taint, but with a tolerationSeconds set to the duration you want. This is in addition to the configuration you already have for pool B.

The effect will be that a pod is scheduled to A if it won't fit on B, but after a while it will be evicted (and hopefully rescheduled onto B again).

See: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#taint-based-evictions
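A minimal sketch of the taint side, assuming the taint key is simply A (to match the toleration in the question's update) and that the pool A nodes can be selected by GKE's cloud.google.com/gke-nodepool label. Note that a taint added with kubectl only applies to the nodes that exist right now; on GKE, setting the taint on the node pool itself (gcloud's --node-taints at pool creation) survives node recreation:

  # Sketch: taint every current node in pool A with a NoExecute taint whose
  # key ("A") matches the toleration shown in the question's update.
  kubectl taint nodes -l cloud.google.com/gke-nodepool=A A=true:NoExecute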

-- Hitobat
Source: StackOverflow