nodeSelector does not reliably place pods on the correct EKS worker nodes

3/10/2020

I'm running a Kubernetes cluster in EKS, but for some reason the nodeSelector attribute on a deployment isn't always being followed.

Three workloads: 1 - Cassandra (a StatefulSet):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
...
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
...
      nodeSelector:
        layer: "backend"

2 - Kafka

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: kafka
...
    spec:
      containers:
      - image: strimzi/kafka:0.11.3-kafka-2.1.0
...
      nodeSelector:
        layer: "backend"
...

3 - Zookeeper

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: zookeeper
...
    spec:
      containers:
      - image: strimzi/kafka:0.11.3-kafka-2.1.0
...
      nodeSelector:
        layer: "backend"
...

Note - all three have the nodeSelector "layer=backend" in the pod spec. I only have two nodes labeled "backend"; however, when I look at the pods I see:

% kubectl get all -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
pod/cassandra-0                  1/1     Running   0          9m32s   10.1.150.39    ip-...-27.us-west-2.compute.internal    <none>           <none>
pod/cassandra-1                  1/1     Running   0          7m56s   10.1.100.7     ip-...-252.us-west-2.compute.internal   <none>           <none>
pod/cassandra-2                  1/1     Running   0          6m46s   10.1.150.254   ip-...-27.us-west-2.compute.internal    <none>           <none>
pod/kafka-56dcd8665d-hfvz4       1/1     Running   0          9m32s   10.1.100.247   ip-...-252.us-west-2.compute.internal   <none>           <none>
pod/zookeeper-7f74f96f56-xwjjt   1/1     Running   0          9m32s   10.1.100.128   ip-...-154.us-west-2.compute.internal   <none>           <none>

They are placed on three different nodes - 27, 252 and 154. Looking at the "layer" label on each of those:

> kubectl describe node ip-...-27.us-west-2.compute.internal | grep layer
                    layer=backend
> kubectl describe node ip-...-252.us-west-2.compute.internal | grep layer
                    layer=backend
> kubectl describe node ip-...-154.us-west-2.compute.internal | grep layer
                    layer=perf

The 154 node is labeled "perf", not "backend", so per my understanding of nodeSelector the zookeeper pod should never have been scheduled there. I've deleted everything (including the nodes themselves) and tried a few times; sometimes it's kafka that lands on the wrong node, sometimes zookeeper, but reliably something gets put where it shouldn't be.
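
A quicker way to audit that label across all nodes than grepping describe output is the label-column and selector flags on kubectl get nodes (a small sketch, nothing cluster-specific assumed):

# Show every node with the value of its "layer" label as an extra column
kubectl get nodes -L layer
# List only the nodes that actually carry layer=backend
kubectl get nodes -l layer=backend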

As near as I can tell, the nodes I do want have plenty of capacity, and even if they didn't, I would expect the pod to stay Pending with a scheduling error rather than have the nodeSelector silently ignored.
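
To check both assumptions, these commands show pending pods, their scheduling events, and a node's allocatable versus allocated resources (the pod and node names below are placeholders):

# Any pod the scheduler cannot place stays in Pending
kubectl get pods --field-selector=status.phase=Pending
# The Events section at the bottom lists FailedScheduling reasons, if any
kubectl describe pod <pod-name>
# Shows Allocatable and Allocated resources for the node
kubectl describe node <node-name>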

What am I missing? Is nodeSelector not 100% reliable? Is there another way I can force pods to only be placed on nodes with specific labels?
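
For reference, the same hard constraint can also be written as required node affinity; for a single label it behaves like nodeSelector. A sketch of the relevant pod template spec fragment, reusing the layer=backend label from above:

spec:
  affinity:
    nodeAffinity:
      # "required...IgnoredDuringExecution" is a hard rule: the scheduler
      # will not place the pod on a node that lacks the matching label
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: layer
            operator: In
            values:
            - backend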

-- DrTeeth
amazon-eks
eks
kubernetes

1 Answer

3/11/2020

Close as user error.

A separate process had reverted my git changes, so the deployment manifest I was looking at in my IDE was stale; the nodeSelector I thought was applied had never actually reached the cluster.
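
The quickest way I know to catch this kind of drift is to compare the local file against what the cluster is actually running (the file name below is just an example):

# Print the nodeSelector the live object actually carries
kubectl get deployment zookeeper -o jsonpath='{.spec.template.spec.nodeSelector}'
# Or diff the local manifest against the live cluster state
kubectl diff -f zookeeper-deployment.yaml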

-- DrTeeth
Source: StackOverflow