I am running Kubernetes with Rancher, and I am seeing weird behavior with the kube-scheduler. After adding a third node, I expected pods to start being scheduled onto it. However, the kube-scheduler gives this new third node, node3, the lowest score, even though it has almost no pods running on it and I expected it to receive the highest score.
Here are the logs from the kube-scheduler:
scheduling_queue.go:815] About to try and schedule pod namespace1/pod1
scheduler.go:456] Attempting to schedule pod: namespace1/pod1
predicates.go:824] Schedule Pod namespace1/pod1 on Node node1 is allowed, Node is running only 94 out of 110 Pods.
predicates.go:1370] Schedule Pod namespace1/pod1 on Node node1 is allowed, existing pods anti-affinity terms satisfied.
predicates.go:824] Schedule Pod namespace1/pod1 on Node node3 is allowed, Node is running only 4 out of 110 Pods.
predicates.go:1370] Schedule Pod namespace1/pod1 on Node node3 is allowed, existing pods anti-affinity terms satisfied.
predicates.go:824] Schedule Pod namespace1/pod1 on Node node2 is allowed, Node is running only 95 out of 110 Pods.
predicates.go:1370] Schedule Pod namespace1/pod1 on Node node2 is allowed, existing pods anti-affinity terms satisfied.
resource_allocation.go:78] pod1 -> node1: BalancedResourceAllocation, capacity 56000 millicores 270255251456 memory bytes, total request 40230 millicores 122473676800 memory bytes, score 7
resource_allocation.go:78] pod1 -> node1: LeastResourceAllocation, capacity 56000 millicores 270255251456 memory bytes, total request 40230 millicores 122473676800 memory bytes, score 3
resource_allocation.go:78] pod1 -> node3: BalancedResourceAllocation, capacity 56000 millicores 270255251456 memory bytes, total request 800 millicores 807403520 memory bytes, score 9
resource_allocation.go:78] pod1 -> node3: LeastResourceAllocation, capacity 56000 millicores 270255251456 memory bytes, total request 800 millicores 807403520 memory bytes, score 9
resource_allocation.go:78] pod1 -> node2: BalancedResourceAllocation, capacity 56000 millicores 270255247360 memory bytes, total request 43450 millicores 133693440000 memory bytes, score 7
resource_allocation.go:78] pod1 -> node2: LeastResourceAllocation, capacity 56000 millicores 270255247360 memory bytes, total request 43450 millicores 133693440000 memory bytes, score 3
generic_scheduler.go:748] pod1_namespace1 -> node1: TaintTolerationPriority, Score: (10)
generic_scheduler.go:748] pod1_namespace1 -> node3: TaintTolerationPriority, Score: (10)
generic_scheduler.go:748] pod1_namespace1 -> node2: TaintTolerationPriority, Score: (10)
selector_spreading.go:146] pod1 -> node1: SelectorSpreadPriority, Score: (10)
selector_spreading.go:146] pod1 -> node3: SelectorSpreadPriority, Score: (10)
selector_spreading.go:146] pod1 -> node2: SelectorSpreadPriority, Score: (10)
generic_scheduler.go:748] pod1_namespace1 -> node1: SelectorSpreadPriority, Score: (10)
generic_scheduler.go:748] pod1_namespace1 -> node3: SelectorSpreadPriority, Score: (10)
generic_scheduler.go:748] pod1_namespace1 -> node2: SelectorSpreadPriority, Score: (10)
generic_scheduler.go:748] pod1_namespace1 -> node1: NodeAffinityPriority, Score: (0)
generic_scheduler.go:748] pod1_namespace1 -> node3: NodeAffinityPriority, Score: (0)
generic_scheduler.go:748] pod1_namespace1 -> node2: NodeAffinityPriority, Score: (0)
interpod_affinity.go:232] pod1 -> node1: InterPodAffinityPriority, Score: (0)
interpod_affinity.go:232] pod1 -> node3: InterPodAffinityPriority, Score: (0)
interpod_affinity.go:232] pod1 -> node2: InterPodAffinityPriority, Score: (10)
generic_scheduler.go:803] Host node1 => Score 100040
generic_scheduler.go:803] Host node3 => Score 100038
generic_scheduler.go:803] Host node2 => Score 100050
scheduler_binder.go:256] AssumePodVolumes for pod "namespace1/pod1", node "node2"
scheduler_binder.go:266] AssumePodVolumes for pod "namespace1/pod1", node "node2": all PVCs bound and nothing to do
factory.go:727] Attempting to bind pod1 to node2
I can tell from the logs that your pod will always be scheduled on node2, because it seems you have some sort of PodAffinity that adds an extra 10 to that node's score (the InterPodAffinityPriority line), bringing it to 50.
What's odd is that I get 48 for node3 when I add up the logged scores, but it seems a -10 is being applied somewhere (totaling 38). Perhaps it is because of the affinity, or some entry not shown in the logs, or simply a bug in how the scheduler does the calculation. You'll probably have to dig into the kube-scheduler code if you'd like to find out more.
This is what I have:
node1 7 + 3 + 10 + 10 + 10 = 40
node2 7 + 3 + 10 + 10 + 10 + 10 = 50
node3 9 + 9 + 10 + 10 + 10 = 48
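
If you want to re-check this arithmetic against the logged totals, here is a minimal Python sketch. The per-node terms are copied from the tally above and the totals from the "Host ... => Score" log lines. It assumes the leading 100000 in those totals is a constant offset shared by all three nodes (the logs do not say where it comes from), and the names `tally`, `logged`, `OFFSET`, and `discrepancies` are just illustrative.

```python
# Terms copied from the hand tally above (one list per node).
tally = {
    "node1": [7, 3, 10, 10, 10],
    "node2": [7, 3, 10, 10, 10, 10],
    "node3": [9, 9, 10, 10, 10],
}

# Final "Host ... => Score" values copied from the scheduler log.
logged = {"node1": 100040, "node2": 100050, "node3": 100038}

# Assumption: the leading 100000 is a constant offset common to all nodes.
OFFSET = 100000


def discrepancies(tally, logged, offset=OFFSET):
    """Return {node: (logged score - offset) - sum of tallied terms}."""
    return {
        node: logged[node] - offset - sum(terms)
        for node, terms in tally.items()
    }


if __name__ == "__main__":
    for node, diff in discrepancies(tally, logged).items():
        summed = sum(tally[node])
        final = logged[node] - OFFSET
        print(f"{node}: summed={summed} logged={final} diff={diff}")
```

Under that assumption, node1 and node2 come out with a difference of 0 and node3 with -10, which is the unexplained gap described above.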