Kubernetes AntiAffinity over Labels - Spread Replicas via Node-Labels

11/14/2019

We've got 3 ESXi hosts, with 2 Kubernetes workers on each. All nodes are labeled with "esxhost: esxN", and I want to spread replicas across those hosts. It's easy to spread the replicas across the workers, so the same service doesn't land twice on one worker, but I want to spread across the ESXi hosts, to keep HA even if two workers die because their ESXi host dies.

How can I manage this? I tried some selectors, but without success.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
  namespace: someNS
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:    
      containers:
        - name: demo-mos-node
          image: registry.docker.dev...../demo:2.1.2
          ports:
            - containerPort: 80
          env:
            - name: CONFIG
              value: "https://config.git.dev....."
-- Michael
high-availability
kubernetes

2 Answers

11/14/2019

You can define anti-affinity rules; they are used to keep pods away from each other. There are 2 variants:

  • soft (preferredDuringSchedulingIgnoredDuringExecution)

  • hard (requiredDuringSchedulingIgnoredDuringExecution)

If you specify the hard variant, the pod will not be scheduled onto a node if a matching pod is already running there.

If you specify the soft variant, the pod prefers not to be scheduled onto a node that is already running a pod with the label app=demo.

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - demo
          topologyKey: "kubernetes.io/hostname"
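The hard variant looks much the same, but the term sits directly under `requiredDuringSchedulingIgnoredDuringExecution` with no `weight`/`podAffinityTerm` wrapper. A sketch, reusing the same app=demo labels from the question:

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # Hard rule: never co-locate two app=demo pods on the same node.
        # If no eligible node remains, the pod stays Pending.
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - demo
          topologyKey: "kubernetes.io/hostname"
```

Note that with the hard variant and only 2 replicas on 6 workers, scheduling will always succeed; but if replicas ever exceed the number of nodes in a topology domain, the extra pods stay Pending.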

Furthermore, if you want to schedule pods on the master node, you have to remove its default taint (the taint key is typically node-role.kubernetes.io/master; check it with `kubectl describe node`):

 kubectl get nodes 

 kubectl describe node master_node_name

 kubectl taint nodes master_node_name key:NoSchedule-

https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity

-- iliefa
Source: StackOverflow

11/14/2019

As @iliefa said, you should use pod anti-affinity, but you then need to define 2 anti-affinity terms. The first one will prevent (soft or hard) scheduling the pods on the same nodes; the second will prevent (soft or hard) scheduling the pods in the same availability zones (in your case, the ESXi hosts). You could use the built-in labels, more specifically failure-domain.beta.kubernetes.io/zone, or alternatively label your nodes yourself according to the availability zone. Here is an example of what I mean:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: <define-importance-here>
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              <insert-your-pod-labels-here>
      - weight: <define-importance-here>
        podAffinityTerm:
          topologyKey: failure-domain.beta.kubernetes.io/zone
          labelSelector:
            matchLabels:
              <insert-your-pod-labels-here>

The weight you assign defines the importance of each anti-affinity rule relative to the other.
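Since the question says the nodes already carry the label esxhost: esxN, that label can serve directly as the second topologyKey. A sketch filling in the template above with the asker's labels (the weights 100/90 are illustrative choices, not prescribed values):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      # Prefer spreading replicas across individual worker nodes.
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: demo
      # Also prefer spreading across ESXi hosts, using the custom
      # node label esxhost=esxN mentioned in the question.
      - weight: 90
        podAffinityTerm:
          topologyKey: esxhost
          labelSelector:
            matchLabels:
              app: demo
```

If a node were missing the esxhost label, it could be added with, e.g., `kubectl label node <worker-name> esxhost=esx1` (node name hypothetical).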

-- Lachezar Balev
Source: StackOverflow