Kubernetes HPA gets wrong current value for a custom metric

4/13/2018

The HPA status displays 132500m / 500 in a situation where the actual metric value is lower than 100 / 500 (according to Prometheus).

$ kubectl get hpa -n frontend --context testing
NAME       REFERENCE              TARGETS                               MINPODS   MAXPODS   REPLICAS   AGE
frontend   Deployment/streaming   50237440 / 629145600, 132500m / 500   2         5         2          4d

HPA manifest is:

---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
  namespace: streaming
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: streaming
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metricName: redis_memory_used_rss_bytes
      targetAverageValue: 629145600
  - type: Pods
    pods:
      metricName: redis_db_keys
      targetAverageValue: 500

It should print normal results, like:

$ kubectl get hpa -n streaming --context streaming-eu
NAME       REFERENCE              TARGETS                               MINPODS   MAXPODS   REPLICAS   AGE
frontend   Deployment/streaming   50237440 / 629145600, 87 / 500        2         5         2          4d

The problem is that 132500m value, which is wrong (the Prometheus query reports a normal value). And since the HPA didn't scale up on that metric, I suppose it saw its value as something different.

To reproduce this issue, use oliver006/redis_exporter and its metrics as custom Pods metrics with an HPA.
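
For debugging, the raw value the HPA sees can also be read straight from the custom metrics API (a sketch; the namespace, context and metric name below are taken from the manifest and commands above, and jq is only used for pretty-printing):

$ kubectl get --raw \
    "/apis/custom.metrics.k8s.io/v1beta1/namespaces/streaming/pods/*/redis_db_keys" \
    --context testing | jq .

The items[].value field in the response shows the quantity exactly as the HPA consumes it, i.e. a milli-quantity such as "132500m" in this case.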

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.4-gke.1", GitCommit:"10e47a740d0036a4964280bd663c8500da58e3aa", GitTreeState:"clean", BuildDate:"2018-03-13T18:00:36Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider:

GKE 1.9.4
-- cardinal-gray
autoscaling
docker
google-kubernetes-engine
kubernetes

1 Answer

4/13/2018

I think this is a metric conversion problem.

Here is a good comment from a contributor on a related issue; it is about the http_requests metric, but the same conversion applies here:

if you look at the documentation for the Prometheus adapter, you'll see that all cumulative (counter) metrics are converted to rate metrics, since the HPA's algorithm is fundamentally incompatible with scaling on cumulative metrics directly (scaling on cumulative metrics directly doesn't make much sense in general).

In your case, your http_requests_total is being converted into http_requests, so it will always show up as milli-requests from the metrics API when using the Prometheus adapter.

So, in your case, it is returning something like 132500 milli-units (132500m). Just divide the value by 1000 and you will get the correct average value.
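
As a quick sanity check (a sketch; the namespace and context are assumed from the question, and the [1] index assumes redis_db_keys is the second metric in the spec), you can pull the raw quantity out of the HPA status and do the division yourself:

$ kubectl get hpa frontend -n streaming --context testing \
    -o jsonpath='{.status.currentMetrics[1].pods.currentAverageValue}'
132500m
$ # 132500m means 132500 / 1000 = 132.5 keys per pod on average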

-- Anton Kostenko
Source: StackOverflow