Prometheus alert rules for cluster autoscaler metrics

9/13/2020

I want to create Prometheus alert rules for the following scenarios:

Max capacity reached for the cluster

Unusual Scaling activity

I think "Max capacity reached for the cluster" can be detected with a combination of the following metrics:

1. cluster_autoscaler_unschedulable_pods_count > 0

2. sum(cluster_autoscaler_unneeded_nodes_count) == 0

And "Unusual Scaling activity" can be derived from sum(cluster_autoscaler_scaled_up_nodes_total).
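As a rough, untested sketch, the two conditions above might be written as a PrometheusRule for the Prometheus Operator along these lines. The resource name, namespace, `release: prometheus` label, `for` durations, the one-hour window, and the threshold of 10 nodes are all placeholder assumptions, not recommended values. The first rule simply combines the two expressions listed above with `and`. The second wraps the counter in `increase()`, since the raw total of a counter only ever grows and so cannot show a burst of activity by itself.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-autoscaler-alerts        # hypothetical name
  namespace: monitoring                  # assumption: adjust to your setup
  labels:
    release: prometheus                  # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: cluster-autoscaler
      rules:
        # Pods are pending and no nodes are marked as unneeded.
        - alert: ClusterAutoscalerMaxCapacityReached
          expr: |
            sum(cluster_autoscaler_unschedulable_pods_count) > 0
            and
            sum(cluster_autoscaler_unneeded_nodes_count) == 0
          for: 15m                       # placeholder duration
          labels:
            severity: warning
          annotations:
            summary: Unschedulable pods exist and no nodes are unneeded.
        # More nodes were added in the last hour than the chosen threshold.
        - alert: ClusterAutoscalerUnusualScaleUp
          expr: sum(increase(cluster_autoscaler_scaled_up_nodes_total[1h])) > 10
          for: 5m                        # placeholder duration
          labels:
            severity: warning
          annotations:
            summary: More than 10 nodes were added in the last hour.
```

Whether these two conditions really indicate that the cluster is at maximum capacity depends on the autoscaler configuration, so the expressions are a starting point to tune rather than a verified rule.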

I have enabled metrics for the Cluster Autoscaler. However, I am not sure how to write Prometheus rule expressions with these metrics. Should I create any ServiceMonitors? How do I combine these metrics for the scenarios mentioned above? Are there existing examples of Prometheus rules for the Cluster Autoscaler metrics?

-- Rad4
kubernetes
prometheus-operator

0 Answers