I have a cluster running on GKE with two separate node pools. My first node pool (let's call it main-pool) scales from 1 to 10 nodes. The second one (let's call it db-pool) scales from 0 to 10 nodes. The db-pool nodes have specific needs, as I have to dynamically create some pretty big databases that request a lot of memory, while the main-pool is for "light" workers. I used node selectors so my workers are created on the right nodes, and everything works fine.
The problem is that the db-pool nodes, because they request a lot of memory, are much more expensive, and I want them to scale down to 0 when no database is running. This was working fine until I added the node selectors (I am not 100% sure that is when it started, but it seems so); now the pool will not scale down below 1 node. I believe it is because some kube-system pods are running on this node:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
heapster-v1.6.0-beta.1-6c9dfdb9f5-2htn7 3/3 Running 0 39m 10.56.18.22 gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm <none>
metrics-server-v0.3.1-5b4d6d8d98-7h659 2/2 Running 0 39m 10.56.18.21 gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm <none>
fluentd-gcp-v3.2.0-jmlcv 2/2 Running 0 1h 10.132.15.241 gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm <none>
kube-proxy-gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm 1/1 Running 0 1h 10.132.15.241 gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm <none>
prometheus-to-sd-stfz4 1/1 Running 0 1h 10.132.15.241 gke-padawan-cluster-ipf-db-pool-bb2827a7-99pm <none>
Is there any way to prevent it from happening?
System pods like fluentd (and, effectively, kube-proxy) run as DaemonSets and are required on every node; these should not block scale-down. Pods like Heapster and metrics-server, on the other hand, are not required on any particular node, and they can block the node pool from scaling down to 0.
The best way to keep these non-node-critical system pods off your expensive node pool is to use taints and tolerations. A taint on the db-pool nodes prevents pods from being scheduled there unless they tolerate it; you just need to make sure your database pods carry a matching toleration, alongside the existing node selector, so they still land on the large db-pool nodes.
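As a rough sketch, a database pod spec could look like the following. The taint key/value dedicated=db and the pod, container, and resource values are placeholders, and cloud.google.com/gke-nodepool is the label GKE adds automatically, so swap in whatever selector you already use:

apiVersion: v1
kind: Pod
metadata:
  name: big-database                          # placeholder name
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: db-pool    # or the label you already select on
  tolerations:
  - key: "dedicated"                          # must match the taint on the db-pool nodes
    operator: "Equal"
    value: "db"
    effect: "NoSchedule"
  containers:
  - name: database
    image: postgres:11                        # placeholder image
    resources:
      requests:
        memory: "16Gi"                        # placeholder; size to your databases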
You should configure the node taints when you create the node pool so that new nodes are created with the taint already in place. With proper taints and tolerations, your node pool should be able to scale down to 0 without issue.
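For example, recreating the pool with the taint applied could look something like this (cluster name, taint key/value, and machine type are placeholders; adjust to your setup):

# cluster name, taint key/value, and machine type below are placeholders
gcloud container node-pools create db-pool \
    --cluster=my-cluster \
    --node-taints=dedicated=db:NoSchedule \
    --enable-autoscaling --min-nodes=0 --max-nodes=10 \
    --machine-type=n1-highmem-8

Nodes that already exist in the current db-pool won't pick up the taint, which is why recreating the pool (or adding a new tainted pool and migrating the workloads) is the usual approach.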