Multizone Kubernetes cluster and affinity. How to distribute application replicas across zones?

9/22/2018

I have a multizone (3 zones) GKE cluster (1.10.7-gke.1) of 6 nodes and want each zone to have at least one replica of my application.

So I've tried preferred podAntiAffinity:

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: component
              operator: In
              values:
              - app
          topologyKey: failure-domain.beta.kubernetes.io/zone

Everything looks good the first time I install my application (scaling from 1 to 3 replicas). But after the next rolling update everything gets mixed up, and I can end up with 3 copies of my application in one zone, because the additional replicas are created before the old ones are terminated.

When I try the same term with requiredDuringSchedulingIgnoredDuringExecution, the initial placement looks good, but rolling updates don't work: the new replicas can't be scheduled, because pods with "component" = "app" already exist in every zone.

How can I configure my deployment to make sure I have a replica in each availability zone?

UPDATED:

My current workaround is to use hard anti-affinity and deny additional pods (more than 3) during the rolling update:

  replicaCount: 3 

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: component
            operator: In
            values:
            - app
        topologyKey: failure-domain.beta.kubernetes.io/zone

  deploymentStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
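For reference, here is a minimal sketch of what the rendered Deployment could look like with this workaround applied (the name, labels, and image are placeholders, not taken from the actual chart):

```yaml
# Hypothetical rendered manifest for the workaround above:
# hard zone anti-affinity plus maxSurge: 0, so the update never
# needs a 4th pod that would have no free zone to schedule into.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                # placeholder name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0          # never run more than 3 pods at once
      maxUnavailable: 1    # free up one zone before the replacement is scheduled
  selector:
    matchLabels:
      component: app
  template:
    metadata:
      labels:
        component: app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: component
                operator: In
                values:
                - app
            topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: app
        image: app:latest  # placeholder image
```

With maxSurge: 0 the update terminates one old pod first, which frees its zone so the replacement can satisfy the hard anti-affinity rule.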
-- Sergey Baranov
kubernetes

3 Answers

11/19/2018

If you have two nodes in each zone, you can use the affinity rules below to make sure rolling updates work and you still have a pod in each zone. The required hostname anti-affinity only forbids two replicas on the same node, so a surge pod can start on the second node of a zone, while the preferred zone anti-affinity still steers replicas toward different zones.

  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: component
            operator: In
            values:
            - app
        topologyKey: "kubernetes.io/hostname"
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: component
              operator: In
              values:
              - app
          topologyKey: failure-domain.beta.kubernetes.io/zone
-- Hitesh Agrawal
Source: StackOverflow

10/8/2018

The key issue here is the rolling update: during a rolling update, the old replica is kept until the new one is launched, but the new one can't be scheduled because it conflicts with its old replica in the same zone.

So if rolling updates aren't a concern, a workaround is to change the strategy type to Recreate:

apiVersion: apps/v1
kind: Deployment
...
spec:
...
  strategy:
    type: Recreate
...

Then the podAntiAffinity/requiredDuringSchedulingIgnoredDuringExecution rules will work, since all old pods are terminated before the new ones are created.
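Putting the two pieces together, a minimal self-contained Deployment sketch could look like this (the name, labels, and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                # placeholder name
spec:
  replicas: 3
  strategy:
    type: Recreate         # all old pods are removed before new ones start,
                           # so the hard anti-affinity never blocks scheduling
  selector:
    matchLabels:
      component: app
  template:
    metadata:
      labels:
        component: app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: component
                operator: In
                values:
                - app
            topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: app
        image: app:latest  # placeholder image
```

The trade-off is downtime during each update, since Recreate terminates every replica before creating the new ones.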

-- Wei Huang
Source: StackOverflow

9/22/2018

I don't think the Kubernetes scheduler provides a way to guarantee pods in all availability zones. I believe it's a best-effort approach, and there may be some limitations.

I've opened an issue to check whether this can be supported either through NodeAffinity or PodAffinity/PodAntiAffinity.

-- Rico
Source: StackOverflow