I am using Cloud Composer (Apache Airflow) on Google Cloud. Some of our processes require more resources than Composer's default node pool provides, so I created an additional node pool in the cluster. The resource-intensive DAGs use the KubernetesPodOperator and target the special node pool through its affinity={'nodeAffinity': ...} attribute.
My issue is that since creating the new node pool, ALL of my workloads are being scheduled on it. How can I keep my normal workloads running on the default pool while reserving the new node pool for special use cases?
Here is an example of a KubernetesPodOperator definition that targets the special pool. The regular KubernetesPodOperator tasks don't set the affinity attribute:
KubernetesPodOperator(
    namespace='default',
    image="image_name",
    image_pull_policy='Always',
    name="example_name",
    task_id="example_name",
    get_logs=True,
    affinity={
        'nodeAffinity': {
            'requiredDuringSchedulingIgnoredDuringExecution': {
                'nodeSelectorTerms': [{
                    'matchExpressions': [{
                        'key': 'cloud.google.com/gke-nodepool',
                        'operator': 'In',
                        'values': ['datascience-pool']
                    }]
                }]
            }
        }
    },
    is_delete_operator_pod=True,
    dag=dag)
The KubernetesPodOperator has no default affinity preferences, so the decision to schedule your normal workloads onto the new node pool was made by the Kubernetes scheduler itself. To avoid this, you will now have to set affinity explicitly on every instance of KubernetesPodOperator (which you can make somewhat less painful by using default_args and the apply_defaults Airflow decorator).
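A minimal sketch of that approach: build the affinity dict once and share it through default_args so every KubernetesPodOperator in the DAG inherits it. The helper name node_pool_affinity is hypothetical; only the affinity structure itself comes from the example above.

```python
def node_pool_affinity(pool_name):
    """Build a Kubernetes nodeAffinity dict pinning pods to one GKE node pool."""
    return {
        'nodeAffinity': {
            'requiredDuringSchedulingIgnoredDuringExecution': {
                'nodeSelectorTerms': [{
                    'matchExpressions': [{
                        'key': 'cloud.google.com/gke-nodepool',
                        'operator': 'In',
                        'values': [pool_name],
                    }]
                }]
            }
        }
    }

# Shared defaults for all tasks in the DAG (pass as default_args=... to DAG()).
# Individual operators can still override 'affinity' to target the special pool.
default_args = {'affinity': node_pool_affinity('default-pool')}
```

With this in place, only the resource-intensive tasks need to pass affinity=node_pool_affinity('datascience-pool') explicitly.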
At least in versions of Cloud Composer up to v1.8.3, the Composer system pods always run in the node pool default-pool. You can therefore target default-pool to ensure your normal pods run in the Composer node pool instead of the custom one.
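Concretely, that means pinning your regular workloads with the same affinity structure as in the question, but targeting default-pool instead of the custom pool (a sketch; verify the pool name in your own cluster):

```python
# Affinity dict pinning pods to Composer's default node pool.
# Same shape as the question's example, with 'default-pool' substituted.
default_pool_affinity = {
    'nodeAffinity': {
        'requiredDuringSchedulingIgnoredDuringExecution': {
            'nodeSelectorTerms': [{
                'matchExpressions': [{
                    'key': 'cloud.google.com/gke-nodepool',
                    'operator': 'In',
                    'values': ['default-pool'],
                }]
            }]
        }
    }
}
```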