We have a multi-node setup of our product where we need to deploy multiple Elasticsearch pods. As all these are data nodes and have volume mounts for persistent storage, we don't want to bring two pods up on the same node. I'm trying to use the anti-affinity feature of Kubernetes, but to no avail.
The cluster deployment is done through Rancher. We have 5 nodes in the cluster, and three of them (say node-1, node-2 and node-3) have the label test.service.es-master: "true". So, when I deploy the Helm chart and scale it up to 3, Elasticsearch pods are up and running on all three of these nodes. But if I scale it to 4, the 4th data node lands on one of the above-mentioned nodes. Is that correct behavior? My understanding was that imposing a strict anti-affinity should prevent the pods from coming up on the same node. I've referred to multiple blogs and forums (e.g. this and this), and they suggest changes similar to mine. I'm attaching the relevant section of the helm chart.
The requirement is that we bring up ES only on those nodes which are labelled with the specific key-value pair mentioned above, and that each of those nodes hosts only one pod. Any feedback is appreciated.
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  clusterIP: None
  ports:
  ...
  selector:
    test.service.es-master: "true"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  selector:
    matchLabels:
      test.service.es-master: "true"
  serviceName: {{ .Values.service.name }}
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: test.service.es-master
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      creationTimestamp: null
      labels:
        test.service.es-master: "true"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                - "true"
            topologyKey: kubernetes.io/hostname
      securityContext:
        ...
      volumes:
        ...
      ...
status: {}
Update-1
As per the suggestions in the comments and answers, I've added the anti-affinity section in template.spec. Unfortunately, the issue still remains. The updated YAML looks as follows:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: {{ .Values.service.httpport | quote }}
    port: {{ .Values.service.httpport }}
    targetPort: {{ .Values.service.httpport }}
  - name: {{ .Values.service.tcpport | quote }}
    port: {{ .Values.service.tcpport }}
    targetPort: {{ .Values.service.tcpport }}
  selector:
    test.service.es-master: "true"
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    test.service.es-master: "true"
  name: {{ .Values.service.name }}
  namespace: default
spec:
  selector:
    matchLabels:
      test.service.es-master: "true"
  serviceName: {{ .Values.service.name }}
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      creationTimestamp: null
      labels:
        test.service.es-master: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                - "true"
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                - "true"
            topologyKey: kubernetes.io/hostname
      securityContext:
        readOnlyRootFilesystem: false
      volumes:
      - name: elasticsearch-data-volume
        hostPath:
          path: /opt/ca/elasticsearch/data
      initContainers:
      - name: elasticsearch-data-volume
        image: busybox
        securityContext:
          privileged: true
        command: ["sh", "-c", "chown -R 1010:1010 /var/data/elasticsearch/nodes"]
        volumeMounts:
        - name: elasticsearch-data-volume
          mountPath: /var/data/elasticsearch/nodes
      containers:
      - env:
        {{- range $key, $val := .Values.data }}
        - name: {{ $key }}
          value: {{ $val | quote }}
        {{- end }}
        image: {{ .Values.image.registry }}/analytics/{{ .Values.image.repository }}:{{ .Values.image.tag }}
        name: {{ .Values.service.name }}
        ports:
        - containerPort: {{ .Values.service.httpport }}
        - containerPort: {{ .Values.service.tcpport }}
        volumeMounts:
        - name: elasticsearch-data-volume
          mountPath: /var/data/elasticsearch/nodes
        resources:
          limits:
            memory: {{ .Values.resources.limits.memory }}
          requests:
            memory: {{ .Values.resources.requests.memory }}
      restartPolicy: Always
status: {}
As Egor suggested, you need podAntiAffinity:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
So, with your current label, it might look like this:
spec:
  affinity:
    nodeAffinity:
      # node affinity stuff here
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: "test.service.es-master"
            operator: In
            values:
            - "true"
        topologyKey: "kubernetes.io/hostname"
Ensure that you put this in the correct place in your YAML (under spec.template.spec, i.e. in the pod template's spec), or else it won't work.
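For orientation, here is a minimal skeleton (not a complete manifest, most values elided) showing where that block has to sit inside the Deployment, using the label from your question:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:                      # the pod template's spec, not the Deployment's own spec
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                test.service.es-master: "true"
            topologyKey: kubernetes.io/hostname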
Firstly, both in your initial manifest and in the updated manifest, you are using topologyKey under nodeAffinity. This will give you an error when you try to deploy those manifests with kubectl create or kubectl apply, because there is no API field called topologyKey for nodeAffinity (Ref doc).
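As a sketch based on your updated manifest (not a complete spec), the nodeAffinity block should look like this once the stray topologyKey is dropped; topologyKey is only valid inside podAffinity/podAntiAffinity terms:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: test.service.es-master
          operator: In
          values:
          - "true"
      # no topologyKey here; node affinity already selects individual nodes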
Secondly, you are using a key called test.service.es-master for your nodeAffinity. Are you sure your nodes actually carry that label? Please confirm with this command: kubectl get nodes --show-labels
Lastly, augmenting @Laszlo's answer and your @bitswazsky comment on it, to simplify things you can use the code below. Here I have used a node label (as key) called role to identify the node; you can add it to a node of your existing cluster by executing this command: kubectl label nodes <node-name> role=platform
selector:
  matchLabels:
    component: nginx
template:
  metadata:
    labels:
      component: nginx
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - platform
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: component
              operator: In
              values:
              - nginx
          topologyKey: kubernetes.io/hostname
This works for me with Kubernetes 1.11.5:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      test.service.es-master: "true"
  template:
    metadata:
      labels:
        test.service.es-master: "true"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                - "true"
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test.service.es-master
                operator: In
                values:
                - "true"
      containers:
      - image: nginx:1.7.10
        name: nginx
I don't know why you chose the same key/value for the pod deployment selector label as for the node selector. It is confusing, to say the least.
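As a sketch of how the two concerns could be kept apart (the label names role=es-data and app=es-data are made up for illustration, they are not from your chart), you could use one label for node selection and a different one for the pod selector and the anti-affinity rule:

spec:
  selector:
    matchLabels:
      app: es-data                  # pod label: Deployment selector, Service selector, anti-affinity
  template:
    metadata:
      labels:
        app: es-data
    spec:
      affinity:
        nodeAffinity:               # schedule only onto nodes labelled role=es-data
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role
                operator: In
                values:
                - es-data
        podAntiAffinity:            # at most one such pod per node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: es-data
            topologyKey: kubernetes.io/hostname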