Prevent kube-system pods from running on a specific node

7/16/2019

I have a cluster running on GKE. I created 2 separate node pools. My first node pool (let's call it main-pool) scales from 1 to 10 nodes. The second one (let's call it db-pool) scales from 0 to 10 nodes. The db-pool nodes have specific needs, as I have to dynamically create some pretty big databases that request a lot of memory, while the main-pool is for "light" workers. I use node selectors so that my workers are created on the right nodes, and everything works fine.
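For reference, the database workloads target the db-pool with a node selector roughly like this (the pod spec below is only a sketch; the image and memory request are placeholders, and I select on the node pool label that GKE adds to every node):

apiVersion: v1
kind: Pod
metadata:
  name: big-database
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: db-pool   # built-in GKE label carrying the pool name
  containers:
  - name: db
    image: postgres:11                       # placeholder image
    resources:
      requests:
        memory: 32Gi                         # placeholder; these databases need a lot of memory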

The problem I have is that the db-pool nodes, because they have a lot of memory, are way more expensive, and I want the pool to scale down to 0 when no database is running. Scaling down was working fine until I added the node selectors (I am not 100% sure, but that seems to be when it started), but now the pool will not scale down below 1 node. I believe it is because some kube-system pods are running on this node:

NAME                                                       READY     STATUS    RESTARTS   AGE       IP              NODE                                            NOMINATED NODE
heapster-v1.6.0-beta.1-6c9dfdb9f5-2htn7                    3/3       Running   0          39m       10.56.18.22     gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm   <none>
metrics-server-v0.3.1-5b4d6d8d98-7h659                     2/2       Running   0          39m       10.56.18.21     gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm   <none>
fluentd-gcp-v3.2.0-jmlcv                                   2/2       Running   0          1h        10.132.15.241   gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm   <none>
kube-proxy-gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm   1/1       Running   0          1h        10.132.15.241   gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm   <none>
prometheus-to-sd-stfz4                                     1/1       Running   0          1h        10.132.15.241   gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm   <none>

Is there any way to prevent it from happening?

-- Silveris
google-cloud-platform
google-kubernetes-engine
kubernetes

1 Answer

7/16/2019

System pods like fluentd (and kube-proxy) run as DaemonSets or static pods and are required on each node; they do not stop the cluster autoscaler from scaling down. Pods like Heapster and metrics-server are not node-critical, though, and those can block the node pool from scaling down to 0.

The best way to keep these non-node-critical system pods from scheduling onto your expensive node pool is to use taints and tolerations. The taint will prevent pods from being scheduled onto those nodes; you just need to make sure the db pods still land on the db-pool by giving them a toleration along with the node selector.
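As a rough sketch (the taint key and value dedicated=db are placeholders; use whatever you taint the db-pool with), the database pod spec would carry a toleration next to its existing node selector:

spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: db-pool
  tolerations:
  - key: dedicated          # must match the taint key on the db-pool nodes
    operator: Equal
    value: db               # must match the taint value
    effect: NoSchedule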

You should configure the node taints when you create the node pool so that new nodes are created with the taint already in place. With proper taints and tolerations, your node pool should be able to scale down to 0 without issue.
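On GKE the taint can be set when the pool is created, for example (cluster name and taint are illustrative; the autoscaling bounds match your 0-10 setup):

gcloud container node-pools create db-pool \
  --cluster=my-cluster \
  --node-taints=dedicated=db:NoSchedule \
  --enable-autoscaling --min-nodes=0 --max-nodes=10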

-- Patrick W
Source: StackOverflow