I'm completely new to Prometheus and trying to implement an HPA for a use case.
Our use case is that each application pod processes jobs from a queue asynchronously. Each pod pulls as many jobs as it can handle, and once it has reached that limit we must start to autoscale.
To achieve this, I see two approaches:
1) Each pod exposes a Gauge metric, say "state", which is 0 by default (free to process jobs) and is set to 1 once the pod has pulled as many jobs as it can process. The metric is therefore only ever 0 or 1 at any point in time, and an average over the past 10 minutes can be taken to determine the load on a pod. If the average is above, say, 0.7, we can assume the pod was occupied for more than 70% of the last 10 minutes and must be scaled (see the sketch after this list).
2) Each pod exposes a Histogram metric "state" with two buckets, 0 and 1. Each time the pod becomes completely occupied, a constant value of 1 is observed. To decide when to scale, we can look at the 90th-percentile value over the past 10 minutes: if it is not zero, the pod was completely occupied for 90% of the time and has to scale up.
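To make option 1 concrete, here is a minimal sketch of how a pod could expose such a gauge with the Python `prometheus_client` library. The metric name `worker_busy_state`, the port, and `current_job_count()` are assumptions for illustration, not something from your setup:

```python
# Minimal sketch (assumed names): expose a 0/1 "busy" gauge per pod.
from prometheus_client import Gauge, start_http_server
import time

MAX_JOBS = 10  # hypothetical per-pod job limit

# 1 = pod has pulled as many jobs as it can handle, 0 = free to take more
busy_state = Gauge(
    "worker_busy_state",
    "1 when the pod is at its job limit, 0 otherwise",
)

def current_job_count():
    """Placeholder for however the pod tracks its in-flight jobs."""
    return 0

if __name__ == "__main__":
    start_http_server(8000)  # /metrics endpoint for Prometheus to scrape
    while True:
        busy_state.set(1 if current_job_count() >= MAX_JOBS else 0)
        time.sleep(5)
```

The scaling rule for approach 1 would then be a query along the lines of `avg_over_time(worker_busy_state[10m]) > 0.7` per pod, fed to the HPA through something like the Prometheus adapter.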
The first approach is more straightforward and makes more sense to me, but averages can be misleading. As for histograms, I'm not sure whether they can be used for such a use case.
If I had to choose one of your approaches, I would probably pick the first one.
But I'd probably take a different path here.
Instead of using the application's metrics to decide how many pods you need, I would use the queue's metrics.
For that I used KEDA, and I recommend it. KEDA can handle the scaling of your solution for you, while Prometheus stays in place only to keep track of what's happening.
KEDA supports both Jobs and Deployments. Jobs (ScaledJob) have advantages over Deployments (ScaledObject) in some cases; for example, if you can use jobs, you can also take advantage of ephemeral nodes and scale from zero nodes up to the node count you need.
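As an illustration only (the Deployment name, Prometheus address, queue metric and threshold below are assumptions you would replace with your own), a ScaledObject with a Prometheus trigger on the queue depth could look roughly like this:

```yaml
# Hedged sketch: scale a hypothetical "job-worker" Deployment on queue depth.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-worker-scaler
spec:
  scaleTargetRef:
    name: job-worker                     # your worker Deployment
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(queue_pending_jobs)   # hypothetical queue-depth metric
        threshold: "10"                  # target pending jobs per replica
```

KEDA then creates and drives the HPA for you, so the scaling decision follows the backlog in the queue rather than a per-pod load signal.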