1) In Kubernetes, many components (e.g. nodes) have metadata that you want to view by group. Examples:
And so on: for any metric measured on a node, you might want to view or query it by arbitrary labels or taints that exist on that node.
In all of these cases, the underlying problem is the same: metrics aren't emitted with labels for all of this data.
One solution: many Prometheus masters
So far I've thought of one solution: a separate Prometheus master for each logical group of nodes. This would allow an administrator to create masters that roll up metrics by an arbitrary label.
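As a rough sketch of what one such per-group server might look like, the following `prometheus.yml` fragment uses Kubernetes node service discovery and a `keep` relabel rule to scrape only nodes carrying a particular label. The label name `group` and value `gpu` are assumptions for illustration; substitute whatever node label defines your logical group.

```yaml
# Hypothetical config for one per-group Prometheus server.
# It discovers all nodes, then keeps only those labeled group=gpu.
scrape_configs:
  - job_name: 'nodes-gpu-group'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # __meta_kubernetes_node_label_<name> exposes each node label
      # to relabeling; drop any target whose group label isn't "gpu".
      - source_labels: [__meta_kubernetes_node_label_group]
        regex: gpu
        action: keep
```

Each logical group would get its own copy of this config (and its own server), differing only in the `regex` value, which is what makes the approach feel like a lot of duplicated machinery.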
2) Are there any more elegant solutions to this problem?
The above solution is fraught with terror - you are doing a lot of work just to "hack" the Grafana "data source" concept as a way to shard your metrics up.
3) A few more crazy ideas... just to help seed a broader conversation on how to shard metrics in Kubernetes by hosts...
Generally you'd have one Prometheus per datacenter, to keep things within the same failure domain. You may split that out in future if there's load issues, but for just node exporter stats that's unlikely.
https://www.robustperception.io/scaling-and-federating-prometheus/ describes the general scaling approach.
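For reference, the federation approach described there boils down to a global Prometheus scraping the `/federate` endpoint of each per-datacenter server. This is a minimal sketch; the server hostnames and the `match[]` selector are assumptions, and in practice you'd federate pre-aggregated recording rules rather than raw series.

```yaml
# Hypothetical global Prometheus federating from per-DC servers.
scrape_configs:
  - job_name: 'federate'
    honor_labels: true          # preserve labels from the source servers
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'        # selector for which series to pull up
    static_configs:
      - targets:
          - 'prometheus-dc1:9090'
          - 'prometheus-dc2:9090'
```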
https://www.robustperception.io/how-to-have-labels-for-machine-roles/ addresses how to aggregate based on things like GPU presence.
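The technique in that article is to export a small info-style metric per machine (e.g. `machine_role{role="gpu"} 1`) and join it onto other series at query time, rather than baking role labels into every metric. A sketch of the join in PromQL, assuming node exporter metrics and a `machine_role` metric keyed by `instance`:

```yaml
# PromQL (shown in a rule-file fragment): aggregate non-idle CPU
# usage by machine role via a group_left join on the info metric.
groups:
  - name: role-aggregation
    rules:
      - record: role:node_cpu:rate5m
        expr: |
          sum by (role) (
            rate(node_cpu_seconds_total{mode!="idle"}[5m])
            * on (instance) group_left(role)
            machine_role
          )
```

This keeps the role-to-machine mapping in one place, so metrics don't need to be re-emitted when a node's role or labels change.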
I would expect zone to end up as a target label, so no special consideration is required there.