Kubernetes: achieving uneven/weighted pod distribution in EKS

12/29/2020

We plan to use AWS EKS to run a stateless application.

One goal is to optimize cost by using spot instances and preferring them over on-demand ones.

Per AWS recommendations, we plan to have two Managed Node Groups: one with on-demand instances and one with spot instances, plus Cluster Autoscaler to adjust group sizes.

Now, the problem to solve is achieving two somewhat conflicting requirements:

  • Prefer spot nodes over on-demand ones, e.g. run 90% of pods on spot instances and 10% on on-demand instances
  • But still ensure that some pods always run within the on-demand group, so that even in the case of a massive spot-instance drain there will still be some pods able to process requests

After some research, I found the following possible approaches to solving it:

Approach A: use preferredDuringSchedulingIgnoredDuringExecution with weights based on the node group's capacity-type label. E.g. one preferredDuringSchedulingIgnoredDuringExecution rule with weight 90 would prefer nodes with the spot capacity type, and another rule with weight 1 would prefer on-demand ones:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 90
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - SPOT
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: NotIn
              values:
                - SPOT

The downside is that, as I understand it, you are not guaranteed to have any pods running on the least-preferred group, since weights only influence scheduler scoring; they do not enforce an exact distribution.

Another approach, which in theory could be combined with the one above (?), is to also use topologySpreadConstraints, e.g.:

spec:
  topologySpreadConstraints:
  - maxSkew: 20
    topologyKey: eks.amazonaws.com/capacityType
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        foo: bar

This would distribute pods across nodes with different capacity types while allowing a skew of, say, 20 pods between them, and probably should (?) be combined with preferredDuringSchedulingIgnoredDuringExecution to achieve the desired effect.
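For reference, here is a minimal sketch of what the combined Deployment spec might look like. This assumes the standard EKS capacity-type label values (SPOT / ON_DEMAND); the deployment name, replica count, maxSkew value, and the app: my-app label are made up for illustration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        nodeAffinity:
          # Soft preference: score spot nodes higher, but allow any node
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: eks.amazonaws.com/capacityType
                    operator: In
                    values:
                      - SPOT
      topologySpreadConstraints:
        # Cap the imbalance between capacity types; note this limits the
        # size of the skew, not which group ends up with more pods
        - maxSkew: 5
          topologyKey: eks.amazonaws.com/capacityType
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app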

How feasible is the approach above? Are those the right tools to achieve the goals? I would very much appreciate any advice on the case!

-- dusty
amazon-eks
amazon-web-services
autoscaling
aws-auto-scaling
kubernetes

1 Answer

12/30/2020

This is not something the Kubernetes scheduler supports. Weights in affinities are more like score multipliers, and maxSkew is a very general cap on how out of balance things can get, but it does not control the direction of that imbalance.

You would have to write something custom AFAIK, or at least I hadn't seen anything for this when I last went looking. Check out the scheduler extender webhook system for a somewhat easy way to implement it.
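For illustration, a scheduler extender is wired in through the kube-scheduler configuration, which then calls your HTTP service to re-score candidate nodes. A minimal sketch of the wiring might look like the following (the service name, URL, and weight are assumptions; the extender service implementing your 90/10 scoring logic would have to be written separately):

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
extenders:
  # Hypothetical in-cluster service implementing the prioritize endpoint
  - urlPrefix: "http://capacity-extender.kube-system.svc:8080/scheduler"
    prioritizeVerb: "prioritize"   # scheduler POSTs candidate nodes here for scoring
    weight: 10
    nodeCacheCapable: false
    ignorable: true                # don't block scheduling if the extender is down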

-- coderanger
Source: StackOverflow