I have a running GKE cluster with an HPA using a target CPU utilisation metric. This works, but CPU utilisation is not the best scaling metric for us: analysis suggests that active connection count is a good indicator of general platform load, so we would like to use this as our primary scaling metric.
To this end I have enabled custom metrics for the NGINX ingress that we use. From here we can see active connection counts, request rates, etc.
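As a sanity check before involving Stackdriver, it can be worth confirming that the controller itself is exposing these metrics. A minimal sketch, assuming the default ingress-nginx setup where the controller serves Prometheus metrics on port 10254 (the pod name is taken from the metric labels shown further down):

kubectl -n ingress-nginx port-forward pod/nginx-ingress-controller-54f84b8dff-sml6l 10254:10254
# in another shell:
curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_nginx_process_connections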
Here is the HPA specification using the NGINX custom metric:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-uat-active-connections
  namespace: default
spec:
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections
      selector:
        matchLabels:
          metric.labels.state: active
          resource.labels.cluster_name: "[redacted]"
      targetAverageValue: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "[redacted]"
However, while this specification does deploy OK, I always get this output from the HPA:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpa-uat-active-connections Deployment/[redacted] <unknown>/5 3 6 3 31s
In short, the target value is "unknown" and I have so far failed to understand or resolve why. The custom metric is indeed present:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections?labelSelector=metric.labels.state%3Dactive,resource.labels.cluster_name%3D[redacted]" | jq
Which gives:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/custom.googleapis.com%7Cnginx-ingress-controller%7Cnginx_ingress_controller_nginx_process_connections"
  },
  "items": [
    {
      "metricName": "custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections",
      "metricLabels": {
        "metric.labels.controller_class": "nginx",
        "metric.labels.controller_namespace": "ingress-nginx",
        "metric.labels.controller_pod": "nginx-ingress-controller-54f84b8dff-sml6l",
        "metric.labels.state": "active",
        "resource.labels.cluster_name": "[redacted]",
        "resource.labels.container_name": "",
        "resource.labels.instance_id": "[redacted]-eac4b327-stqn",
        "resource.labels.namespace_id": "ingress-nginx",
        "resource.labels.pod_id": "nginx-ingress-controller-54f84b8dff-sml6l",
        "resource.labels.project_id": "[redacted]",
        "resource.labels.zone": "[redacted]",
        "resource.type": "gke_container"
      },
      "timestamp": "2019-12-30T14:11:01Z",
      "value": "1"
    }
  ]
}
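When an HPA target shows <unknown>, describing the HPA is usually the quickest way to see why it cannot fetch the metric; the Events section typically contains a warning (for a Pods metric, something like FailedGetPodsMetric) with the underlying error:

kubectl describe hpa hpa-uat-active-connections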
So I have two questions, really: why can the HPA not read this metric when it is clearly available through the metrics API, and am I going about this in the right way?
Many thanks in advance, Ben
Edit 1
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/[redacted]-deployment-7f5fbc9ddf-l9tqk 1/1 Running 0 34h
pod/[redacted]-uat-deployment-7f5fbc9ddf-pbcns 1/1 Running 0 34h
pod/[redacted]-uat-deployment-7f5fbc9ddf-tjfrm 1/1 Running 0 34h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/[redacted]-webapp-service NodePort [redacted] <none> [redacted] 57d
service/kubernetes ClusterIP [redacted] <none> [redacted] 57d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/[redacted]-uat-deployment 3/3 3 3 57d
NAME DESIRED CURRENT READY AGE
replicaset.apps/[redacted]-uat-deployment-54b6bd5f9c 0 0 0 12d
replicaset.apps/[redacted]-uat-deployment-574c778cc9 0 0 0 35h
replicaset.apps/[redacted]-uat-deployment-66546bf76b 0 0 0 11d
replicaset.apps/[redacted]-uat-deployment-698dfbb6c4 0 0 0 4d
replicaset.apps/[redacted]-uat-deployment-69b5c79d54 0 0 0 6d17h
replicaset.apps/[redacted]-uat-deployment-6f67ff6599 0 0 0 10d
replicaset.apps/[redacted]-uat-deployment-777bfdbb9d 0 0 0 3d23h
replicaset.apps/[redacted]-uat-deployment-7f5fbc9ddf 3 3 3 34h
replicaset.apps/[redacted]-uat-deployment-9585454ff 0 0 0 6d21h
replicaset.apps/[redacted]-uat-deployment-97cbcfc6 0 0 0 17d
replicaset.apps/[redacted]-uat-deployment-c776f648d 0 0 0 10d
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/[redacted]-uat-deployment Deployment/[redacted]-uat-deployment 4%/80% 3 6 3 9h
OK, I managed to figure this out by looking up the schema for the HPA (https://docs.okd.io/latest/rest_api/apis-autoscaling/v2beta1.HorizontalPodAutoscaler.html).
In short, I was using the wrong metric type: as you can see above I was using "Pods", but I should have been using "External".
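For anyone hitting the same thing: Pods (and Object) metrics are resolved through the custom metrics API, whereas a Stackdriver metric like this one is exposed through the external metrics API, which is exactly what the raw query above was hitting. A quick way to see which API group actually serves your metric, assuming the Stackdriver metrics adapter is registered for both groups:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .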
The correct HPA specification is:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-uat-active-connections
  namespace: default
spec:
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: External
    external:
      metricName: custom.googleapis.com|nginx-ingress-controller|nginx_ingress_controller_nginx_process_connections
      metricSelector:
        matchLabels:
          metric.labels.state: active
          resource.labels.cluster_name: [redacted]
      targetAverageValue: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: [redacted]
As soon as I did this, things worked right away:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpa-uat-active-connections Deployment/[redacted]-uat-deployment 334m/5 (avg) 3 6 3 30s
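A small note on reading that output, in case it helps anyone: 334m is Kubernetes quantity notation for 0.334, i.e. roughly the one active connection reported at the time averaged over the three current replicas, compared against the targetAverageValue of 5. You can watch the HPA react as load changes with:

kubectl get hpa hpa-uat-active-connections --watch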