I want to create Prometheus alert rules for the scenarios below:
Max capacity reached for the cluster
Unusual Scaling activity
I think "Max capacity reached for the cluster" can be detected with a combination of the following metrics:
1. cluster_autoscaler_unschedulable_pods_count > 0
2. sum(cluster_autoscaler_unneeded_nodes_count) == 0
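A rough, untested sketch of how the two conditions above might be combined in a Prometheus rule file. The group name, alert name, `for` duration, and severity label are placeholders I made up, not values from any official example. Both sides are wrapped in `sum()` so that the `and` operator compares two label-less series.

```yaml
groups:
  - name: cluster-autoscaler-capacity            # hypothetical group name
    rules:
      - alert: ClusterAutoscalerMaxCapacityReached   # hypothetical alert name
        # Fires when pods are pending AND the autoscaler sees no spare nodes.
        expr: |
          sum(cluster_autoscaler_unschedulable_pods_count) > 0
          and
          sum(cluster_autoscaler_unneeded_nodes_count) == 0
        for: 15m                                 # assumed hold time, tune as needed
        labels:
          severity: warning                      # assumed label
        annotations:
          summary: Unschedulable pods exist and no nodes are marked unneeded.
```

If the Prometheus Operator is in use, the same `groups` section would presumably go under `spec` of a `PrometheusRule` resource. Note that this condition only suggests that capacity is exhausted; it does not by itself prove that the node group maximum has been hit.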
And "Unusual Scaling activity" can be derived from sum(cluster_autoscaler_scaled_up_nodes_total).
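Since `cluster_autoscaler_scaled_up_nodes_total` is a counter, its raw sum only ever grows, so an alert would probably need `increase()` over a time window instead. A minimal sketch, where the 15-minute window, the threshold of 5 nodes, and the alert name are all assumptions chosen for illustration:

```yaml
groups:
  - name: cluster-autoscaler-scaling             # hypothetical group name
    rules:
      - alert: ClusterAutoscalerUnusualScaleUp   # hypothetical alert name
        # Nodes added across all node groups during the last 15 minutes.
        expr: sum(increase(cluster_autoscaler_scaled_up_nodes_total[15m])) > 5
        labels:
          severity: warning                      # assumed label
        annotations:
          summary: More than 5 nodes were added in the last 15 minutes.
```

What counts as "unusual" depends on the cluster, so the threshold would have to be picked from the normal scaling pattern.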
I have enabled metrics for the Cluster Autoscaler. However, I am not sure how to create Prometheus rule expressions with these metrics. Should I create any ServiceMonitors? How should I combine these metrics for the scenarios mentioned above? Do you already have examples of Prometheus rules for the Cluster Autoscaler metrics?