How to divide after grouping two different metrics in Prometheus?


I'm currently trying to alert on Kubernetes pods stacking within a single availability zone (AZ). By combining two different metrics, I can already see how many pods of each application are running in each AZ. However, because the number of pods changes with scaling, I want a percentage-based alert: one that fires when more than a given share of an application's pods (e.g. over 70%) are running in one AZ.

My current query:

sum by (created_by_name, az_info) (
    count by (created_by_name, node) (
      kube_pod_info{namespace="somenamespace", created_by_kind="StatefulSet"}
    )
  * on (node) group_left(az_info)
    kube_node_labels
)
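To see what each stage of this query does, here is a minimal Python sketch that mimics it on plain dictionaries. The node names node-a, node-b and node-c and the pod placement are hypothetical, chosen only so that the result matches the sample output in this question. It also assumes each node has exactly one kube_node_labels series carrying an az_info label; as in PromQL, a pod count whose node has no matching series is dropped.

```python
from collections import Counter

# HYPOTHETICAL data: the node each pod of each StatefulSet runs on.
# Each list entry stands for one kube_pod_info series (value 1).
pod_nodes = {
    "some-db-1": ["node-a", "node-b", "node-b", "node-c", "node-c"],
    "some-db-2": ["node-a", "node-a", "node-b", "node-c", "node-c"],
}

# HYPOTHETICAL kube_node_labels series (value 1), reduced to node -> az_info.
node_az = {"node-a": "az1", "node-b": "az2", "node-c": "az2"}

# count(kube_pod_info{...}) by (created_by_name, node)
per_node = Counter(
    (name, node) for name, nodes in pod_nodes.items() for node in nodes
)

# * on (node) group_left(az_info) kube_node_labels
# Several pod counts can match the same node series (hence group_left).
# Each count is multiplied by that series' value (1) and picks up its
# az_info label; counts whose node has no kube_node_labels series are dropped.
joined = {
    (name, node, node_az[node]): count * 1
    for (name, node), count in per_node.items()
    if node in node_az
}

# sum(...) by (created_by_name, az_info): drop the node label and add up.
per_az = Counter()
for (name, _node, az), count in joined.items():
    per_az[(name, az)] += count

for (name, az), count in sorted(per_az.items()):
    print(f'{{created_by_name="{name}",az_info="{az}"}} {count}')
```

With this hypothetical placement the script prints the same four lines as the sample output.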

And some selected output:

{created_by_name="some-db-1",az_info="az1"} 1
{created_by_name="some-db-1",az_info="az2"} 4
{created_by_name="some-db-2",az_info="az1"} 2
{created_by_name="some-db-2",az_info="az2"} 3

For example, the output above shows 4 db-1 pods stacked on az2 versus 1 pod on az1. In this scenario we would want to alert, because 80% (4 of 5) of the db-1 pods are in a single AZ.

Because the output spans several StatefulSets across several AZs, it seems difficult to get the percentage with a single Prometheus query. Could anyone with more experience suggest a solution?


-- Alistair Webster

1 Answer

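  your_expression
/ ignoring(az_info) group_left
  sum without(az_info)(your_expression)

where your_expression is the query from the question, divides each series by the total number of pods for its StatefulSet across all AZs. This gives, for each StatefulSet, the fraction of its pods in each AZ. You can then append > 0.7 (or > 0.8) to keep only the series above the threshold and alert on those.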
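Dividing each series by its StatefulSet's total across AZs (sum without(az_info) on the right, matched with ignoring(az_info) group_left) can be checked by hand on the sample output from the question. The minimal Python sketch below only mimics the PromQL semantics on plain dictionaries; it is not a Prometheus client, and the names totals, ratios and alerting are illustrative. It uses the question's 70% threshold.

```python
from collections import defaultdict

# Sample output from the question: (created_by_name, az_info) -> pod count.
per_az = {
    ("some-db-1", "az1"): 1,
    ("some-db-1", "az2"): 4,
    ("some-db-2", "az1"): 2,
    ("some-db-2", "az2"): 3,
}

# sum without(az_info)(...): drop az_info and add up, leaving one total
# per created_by_name.
totals = defaultdict(int)
for (name, _az), count in per_az.items():
    totals[name] += count

# / ignoring(az_info) group_left ...: each left-hand series is matched to the
# right-hand series with the same created_by_name (az_info is ignored) and
# keeps its own labels. Many left series share one right series, hence
# group_left.
ratios = {(name, az): count / totals[name] for (name, az), count in per_az.items()}

# > 0.7: a comparison without the bool modifier drops series that fail the
# test and keeps the original value of those that pass.
THRESHOLD = 0.7
alerting = {key: value for key, value in ratios.items() if value > THRESHOLD}

print(alerting)
```

Only the some-db-1 / az2 series (4 / 5 = 0.8) survives the filter, which is the 80% case described in the question.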

-- brian-brazil
Source: StackOverflow