We've got 3 ESXi hosts, with 2 Kubernetes workers on each. All nodes are labeled with "esxhost: esxN" and I want to spread replicas over those hosts. It's easy to spread the replicas over the workers so the same service doesn't run twice on one worker, but I want to spread over the ESXi hosts to have HA, so the service survives even if two workers die because their ESXi host dies.
How can I manage this? I tried some selectors, but without success.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
  namespace: someNS
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo-mos-node
        image: registry.docker.dev...../demo:2.1.2
        ports:
        - containerPort: 80
        env:
        - name: CONFIG
          value: "https://config.git.dev....."
You can define podAntiAffinity rules. These are used to keep pods away from each other. There are 2 variants:
soft (preferredDuringSchedulingIgnoredDuringExecution)
hard (requiredDuringSchedulingIgnoredDuringExecution)
If you specify the hard variant, a pod will not be scheduled onto a node that is already running a pod matching the rule.
If you specify the soft variant, a pod prefers not to be scheduled onto a node that is already running a pod with the label key "app" and value "demo", but may still land there if no other node fits.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - demo
          topologyKey: "kubernetes.io/hostname"
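For comparison, here is a sketch of the hard variant for the same pod label. Note that required terms take no weight; the scheduler simply refuses to co-locate matching pods, so with more replicas than nodes the extra pods stay Pending:

```yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # no weight here - the rule is a hard constraint, not a preference
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - demo
        topologyKey: "kubernetes.io/hostname"
```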
Furthermore, if you want to schedule pods on the master node, you have to remove the default taint for the master:
kubectl get nodes
kubectl describe node master_node_name
kubectl taint nodes master_node_name key:NoSchedule-
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
As @iliefa said, you should use pod anti-affinity, but you must then define 2 affinity terms. The first one will prevent (soft or hard) scheduling of the pods on the same nodes; the second will prevent (soft or hard) scheduling of the pods in the same availability zones (as you call them, ESXi hosts). Maybe you can use the built-in labels, or more specifically failure-domain.beta.kubernetes.io/zone. Another option is to label your nodes according to the availability zone. Here is an example of what I mean:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: <define-importance-here>
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            <insert-your-pod-labels-here>
    - weight: <define-importance-here>
      podAffinityTerm:
        topologyKey: failure-domain.beta.kubernetes.io/zone
        labelSelector:
          matchLabels:
            <insert-your-pod-labels-here>
The weights you assign define the relative importance of each anti-affinity rule compared to the other.
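Since your nodes already carry the "esxhost: esxN" label from the question, that label key can itself serve as the topology key. A sketch with the placeholders filled in (the pod label app: demo is taken from the question; the weight values are illustrative assumptions):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # lower-priority preference: avoid putting two replicas on the same worker
    - weight: 50
      podAffinityTerm:
        topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: demo
    # higher-priority preference: avoid putting two replicas on the same ESXi host
    - weight: 100
      podAffinityTerm:
        topologyKey: esxhost
        labelSelector:
          matchLabels:
            app: demo
```

Giving the esxhost term the higher weight makes the scheduler prefer spreading across ESXi hosts first, and only then across individual workers.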