I am setting up HPA on custom metrics - basically on the number of threads of a deployment.
I have created a PrometheusRule that records the 5-minute average of the thread count. On the container, I am running a continuous load to increase the threads, and the average value is also going up.
I started with 2 replicas, and even though the current value has crossed the target value, I am not seeing my deployment scale out.
As you can see, I have set the target at 44 and the current value has been 51.55 for more than 10 minutes, but there is still no scale-up.
Version Info
Autoscaling API version: autoscaling/v2beta2
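HPA (rough sketch)
My actual HPA manifest is not pasted here, but it is roughly along these lines - the metric name, the target of 44 and the 2 starting replicas come from the description above; the object name and maxReplicas are assumed for illustration:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: rdp-rest          # assumed name, for illustration only
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rdp-rest
  minReplicas: 2           # started with 2 replicas
  maxReplicas: 10          # assumed upper bound
  metrics:
  - type: Pods
    pods:
      metric:
        name: hpa_custom_metrics_container_threads_rdp_rest
      target:
        type: AverageValue
        averageValue: "44"   # target value mentioned above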
Prometheus Rule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rdp-rest
  namespace: default
  labels:
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/version: 1.0.1
    prometheus: k8s
    role: alert-rules
    run: rdp-rest
    app: rdp-rest
spec:
  groups:
  - name: hpa-rdp-rest
    interval: 10s
    rules:
    - expr: 'avg_over_time(container_threads{container="rdp-rest"}[5m])'
      record: hpa_custom_metrics_container_threads_rdp_rest
      labels:
        service: rdp-rest
Manifests - https://github.com/prometheus-operator/kube-prometheus/tree/release-0.7/manifests
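For anyone checking a similar setup: the recording rule on its own is not what the HPA reads - prometheus-adapter (part of the kube-prometheus manifests above) has to expose the series through the custom metrics API, depending on its rules config. A quick way to verify the metric actually reaches the HPA (the HPA object name rdp-rest below is my assumption):
# List the metrics prometheus-adapter currently exposes via the custom metrics API
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

# Query the recorded thread metric for pods in the default namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/hpa_custom_metrics_container_threads_rdp_rest"

# See what the HPA itself reports (current vs. target, events, scaling decisions)
kubectl describe hpa rdp-rest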
Update (6th July) - HPA with custom metrics works fine for other technologies like Node.js/nginx, etc., but not for the Netty API.
Any thoughts?
Finally, after a week, I found the root cause.
The issue was with the labels. I had 2 deployments with the same label, so internally the HPA was fetching stats for all pods carrying that label and basing its scale up/down decision on all of them. As soon as I corrected the labels, the HPA worked as expected.
However, the same query in the Prometheus UI shows stats for ONLY one set of pods. It looks like some internal bug or quirk - I don't understand why, when we provide the name, it still goes and fetches stats based on the label.
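An easy way to spot this kind of overlap (illustrative only - I am assuming the shared label was app: rdp-rest, based on the manifests above):
# If two Deployments share the same selector labels, this returns pods from both,
# and the HPA averages the metric over all of them:
kubectl get pods -l app=rdp-rest --show-labels

# Compare the selectors of the Deployments to find the overlap
kubectl get deployments -o custom-columns=NAME:.metadata.name,SELECTOR:.spec.selector.matchLabels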
Point to remember: always double-check your labels.