NodeSelector does not work for multiple node pools?

10/19/2021

TL;DR: NodeSelector ignores nodes from the other NodePool. How can I distribute pods across multiple NodePools using a nodeSelector label or some other technique?

I have two node pools like this:

...
# Spot node pool
resource "azurerm_kubernetes_cluster_node_pool" "aks_staging_np_compute_spot" {
  name                  = "computespot"
  (...)
  vm_size               = "Standard_F8s_v2"
  max_count             = 2
  min_count             = 2
  (...)
  priority = "Spot"
  eviction_policy = "Delete"
  (...)
  node_labels = {
    "pool_type" = "compute"
  }

# Regular node pool
resource "azurerm_kubernetes_cluster_node_pool" "aks_staging_np_compute_base" {
  name                  = "computebase"
  (...)
  vm_size               = "Standard_F8s_v2"
  max_count             = 2
  min_count             = 2
  node_labels = {
    "pool_type" = "compute"
  }

Both pools are deployed in AKS and all of their nodes are in the Ready state. Please note two things:

  • Both have the label pool_type: compute
  • Both use the same VM size, Standard_F8s_v2

(There are also 20 other nodes with different labels in my cluster which are not important.)

Then I have a deployment like this (irrelevant lines omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  (...)
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    (...)
    spec:
      nodeSelector:
        pool_type: compute
      (...)
      containers:
        (...)

There is also an entry in tolerations for accepting Azure spot instances, and it apparently works:

tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"

The problem is that the app gets deployed only on one node pool ("computespot" in this case) and never touches the other one ("computebase"), even though the label and the size of the individual nodes are the same.

  • 2 pods are running on computespot nodes, one pod per node.
  • The other 2 pods aren't scheduled, with the classic error 0/24 nodes are available: 14 Insufficient cpu, 17 Insufficient memory, 4 node(s) didn't match node selector. That's an absolute lie, because I can see the computebase nodes just sitting there, entirely empty.

How can this be solved?

-- rudolfdobias
azure-aks
kubernetes
kubernetes-pod
spot-instances

1 Answer

10/19/2021

Found a solution using node affinity.

spec:
  # This didn't work:
  #
  # nodeSelector:
  #   pool_type: compute
  #
  # But this does:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: pool_type
                operator: In
                values:
                  - compute

I don't know why, since we're still dealing with one single label. If someone knows, please share.
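
For completeness: if the goal is to spread the replicas evenly across both pools rather than just allow them, a topology spread constraint keyed on the pool name label might work too. A sketch, assuming the agentpool label that AKS puts on every node (verify the exact key with kubectl get nodes --show-labels):

spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          # each node pool forms one topology domain via its agentpool label
          topologyKey: agentpool
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: myapp   # the Deployment's own pod label from above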

-- rudolfdobias
Source: StackOverflow