My goal is to create a Grafana dashboard with repeated graphs (driven by a multi-select variable) that show the total memory/CPU requests for a given node alongside the actual memory/CPU usage for that node.
To sum the total requests I'm using the query:
sum(kube_pod_container_resource_requests_memory_bytes{node="${node:pipe}"})
where ${node:pipe} comes from the variable in Grafana, and to get the actual usage I use:
node_memory_MemTotal_bytes{instance="${node:pipe}"} - node_memory_MemFree_bytes{instance="${node:pipe}"}
Both of them get the info I need; the issue is with the label used to select on.
Since they come from different sources, the first one has a label node that uses this format: ip-10-10-12-12.ec2.internal, while the second has a label instance, which looks like this: 10.10.12.12:9100.
There is a clear relationship between the two, but a multi-select variable in Grafana takes its values from a single source. If I populate it from the first metric, the second query will not work unless I find some way to translate between the two formats.
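To make the relationship concrete, here is a minimal Python sketch of the translation I am after. The function names are hypothetical, and the port 9100 and the .ec2.internal suffix are assumptions taken only from the two example label values above.

```python
import re

# Patterns for the two label formats shown above (assumed, based on one example each).
NODE_RE = re.compile(r"ip-(\d+)-(\d+)-(\d+)-(\d+)\.ec2\.internal")
INSTANCE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+):\d+")


def node_to_instance(node: str, port: int = 9100) -> str:
    """Turn a `node` label value into the matching `instance` label value."""
    m = NODE_RE.fullmatch(node)
    if m is None:
        raise ValueError(f"unexpected node format: {node!r}")
    return ".".join(m.groups()) + f":{port}"


def instance_to_node(instance: str) -> str:
    """Turn an `instance` label value into the matching `node` label value."""
    m = INSTANCE_RE.fullmatch(instance)
    if m is None:
        raise ValueError(f"unexpected instance format: {instance!r}")
    return "ip-" + "-".join(m.groups()) + ".ec2.internal"
```

What I need is for Prometheus or Grafana to apply the equivalent of one of these two functions, so that a single variable can drive both queries.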
The first approach I tried was to create an additional label using the label_replace function in PromQL. That created a new label on the second metric (using a regex based on the "instance" label) that matched the format of the first metric, but I had no way to use it, because Prometheus does not allow filtering on the result of a function.
The second approach was to use relabel_configs directives in the Prometheus config. My attempt looked like this:
...
- source_labels: [instance]
  regex: ([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)
  replacement: ip-${1}-${2}-${3}-${4}.ec2.internal
  target_label: nodename
...
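One thing I am unsure about is whether the port in the instance value is part of the problem. As far as I understand, Prometheus anchors relabel regexes at both ends, so a pattern that covers only the IP would not match 10.10.12.12:9100. Below is a rough Python emulation of the replace action for checking this outside Prometheus. The relabel_replace helper and the WITH_PORT variant are hypothetical and mine, and re.fullmatch only approximates Prometheus's RE2 matching.

```python
import re


def relabel_replace(value, regex, replacement):
    """Rough emulation of a relabel `replace` action on a single label value.

    Returns the new label value, or None when the anchored regex does not
    match (in which case no replacement takes place).
    """
    m = re.fullmatch(regex, value)  # fullmatch mimics anchoring at both ends
    if m is None:
        return None
    # Expand ${N} references in the replacement with the captured groups.
    return re.sub(r"\$\{(\d+)\}", lambda ref: m.group(int(ref.group(1))), replacement)


ORIGINAL = r"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)"
WITH_PORT = ORIGINAL + r":[0-9]+"  # hypothetical variant that also consumes the port
REPLACEMENT = "ip-${1}-${2}-${3}-${4}.ec2.internal"

print(relabel_replace("10.10.12.12:9100", ORIGINAL, REPLACEMENT))
print(relabel_replace("10.10.12.12:9100", WITH_PORT, REPLACEMENT))
```

If the anchoring assumption holds, the first call prints None and the second prints the node-style name, which would explain why my rule never produced a nodename label even if it was attached to the right job.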
I clearly did something wrong, though, because that didn't work (possibly I didn't add it to the correct job, since I'm not sure where these metrics come from).
Is there any way to fix either of my attempts so that it works? Or is there a simpler way to do this that I missed?