GKE Kubernetes HPA is unable to get Stackdriver metric

1/9/2020

We have a GKE Kubernetes cluster, and we want our pods to be scaled via a custom metric that our application logic exposes to Stackdriver.

I am able to push the metric and can see it in Metrics Explorer.

We are also able to see the metric in the Kubernetes custom metrics list:

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | python -m json.tool | grep -a10 num_drivers_per_pod

        {
            "kind": "MetricValueList",
            "name": "*/custom.googleapis.com|num_drivers_per_pod",
            "namespaced": true,
            "singularName": "",
            "verbs": [
                "get"
            ]
        }

We have successfully installed the Stackdriver adapter, and it is running alongside Heapster.

But when we deploy the following HPA manifest:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-sd-num-drivers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: test-ws-api-server
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metricName: "num_drivers_per_pod"
        targetAverageValue: 2

the cluster is unable to fetch the metric, and the HPA shows the following message:

Name:               custom-metric-sd-num-drivers
Namespace:          default
Labels:             <none>
Annotations:        autoscaling.alpha.kubernetes.io/conditions:
                      [{"type":"AbleToScale","status":"True","lastTransitionTime":"2020-01-07T14:26:25Z","reason":"SucceededGetScale","message":"the HPA control...
                    autoscaling.alpha.kubernetes.io/current-metrics:
                      [{"type":"External","external":{"metricName":"custom.googleapis.com|num_drivers_per_pod","currentValue":"0","currentAverageValue":"1"}}]
                    autoscaling.alpha.kubernetes.io/metrics: [{"type":"Pods","pods":{"metricName":"num_drivers_per_pod","targetAverageValue":"2"}}]
                    kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"custom-metric-sd-num-drivers","n...
CreationTimestamp:  Tue, 07 Jan 2020 19:56:10 +0530
Reference:          Deployment/test-ws-api-server
Min replicas:       1
Max replicas:       5
Deployment pods:    1 current / 1 desired
Events:
  Type     Reason               Age                   From                       Message
  ----     ------               ----                  ----                       -------
  Warning  FailedGetPodsMetric  47s (x6237 over 27h)  horizontal-pod-autoscaler  unable to get metric num_drivers_per_pod: no metrics returned from custom metrics API

The following is the code we use to push our metrics:

import logging
import os
import time
import traceback

from google.cloud import monitoring_v3

logger = logging.getLogger(__name__)


# Written against the older (pre-2.0) google-cloud-monitoring client API.
def put_k8_pod_metric(metric_name, value, metric_type="k8s_pod"):
    """Write one data point for a custom metric to Stackdriver."""
    try:
        client = monitoring_v3.MetricServiceClient()
        series = monitoring_v3.types.TimeSeries()
        series.metric.type = f'custom.googleapis.com/{metric_name}'
        # Monitored resource the data point is attached to.
        series.resource.type = metric_type
        series.resource.labels['project_id'] = os.getenv("PROJECT_NAME")
        series.resource.labels['location'] = os.getenv("POD_LOCATION", "asia-south1")
        series.resource.labels['cluster_name'] = os.getenv("CLUSTER_NAME", "data-k8cluster")
        series.resource.labels['namespace_name'] = "default"
        series.resource.labels['pod_name'] = os.getenv("MY_POD_NAME", "wrong_pod")
        # Single point stamped with the current time (seconds plus nanos).
        point = series.points.add()
        point.value.double_value = value
        now = time.time()
        point.interval.end_time.seconds = int(now)
        point.interval.end_time.nanos = int(
            (now - point.interval.end_time.seconds) * 10**9)
        project_name = client.project_path(os.getenv('PROJECT_NAME'))
        client.create_time_series(project_name, [series], timeout=2)
        logger.info(f"successfully sent the metric {metric_name} with value {value}")
    except Exception:
        traceback.print_exc()
        logger.error(f"failed to send the metric {metric_name} with value {value}")

Can you give some pointers on where to look and what could be causing the problem?

Update: I just solved the problem by bumping the Deployment apiVersion and moving back to the gke_container resource type. I have published a simple Python repo that does the same: gke-hpa-custom-metric-python
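
For readers adapting the metric-pushing function above, here is a minimal sketch of what the label side of "moving back to gke_container" could look like. It assumes the legacy resource model described for the Stackdriver custom metrics adapter, where pod_id carries the pod UID and container_name is left empty. The helper name and the MY_POD_UID and CLUSTER_ZONE environment variables are hypothetical and are not taken from the repo mentioned above.

```python
import os


def gke_container_labels(env=None):
    """Build resource labels for the legacy gke_container resource type.

    Assumption: MY_POD_UID and CLUSTER_ZONE are hypothetical variable names
    that would have to be injected into the pod (for example through the
    Downward API); PROJECT_NAME and CLUSTER_NAME are the ones the question uses.
    """
    env = os.environ if env is None else env
    return {
        "project_id": env.get("PROJECT_NAME", ""),
        "zone": env.get("CLUSTER_ZONE", ""),
        "cluster_name": env.get("CLUSTER_NAME", ""),
        "namespace_id": "default",
        "instance_id": env.get("HOSTNAME", ""),
        # The pod UID, not the pod name, identifies the pod in this model.
        "pod_id": env.get("MY_POD_UID", ""),
        # Left empty on purpose: the metric is attached to the pod as a whole.
        "container_name": "",
    }
```

Inside the function from the question, this would replace the k8s_pod labels: set series.resource.type to "gke_container" and copy each key and value from the returned dictionary into series.resource.labels.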

-- navdeep
google-cloud-platform
google-kubernetes-engine
kubernetes

1 Answer

1/10/2020

I found this post that describes a similar issue to yours.

There may be a mix-up between "external.metrics" and "custom.metrics".

Here the type is set to "External", but the metric name indicates "custom":

[{"type":"External","external":{"metricName":"custom.googleapis.com

Check the "type:" value in your HorizontalPodAutoscaler.

For custom metrics it should be "type: Object", and for external metrics it should be "type: External", as mentioned here.
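
To make the mismatch easier to see, the two annotations from the HPA description in the question can be compared side by side. This is only an illustrative sketch: the JSON strings are copied from that output, and the helper name metric_types is made up for this example.

```python
import json

# Copied from the HPA annotations shown in the question.
CURRENT_METRICS = (
    '[{"type":"External","external":{"metricName":'
    '"custom.googleapis.com|num_drivers_per_pod",'
    '"currentValue":"0","currentAverageValue":"1"}}]'
)
SPEC_METRICS = (
    '[{"type":"Pods","pods":{"metricName":"num_drivers_per_pod",'
    '"targetAverageValue":"2"}}]'
)


def metric_types(annotation_json):
    """Return (type, metricName) pairs from an HPA metrics annotation."""
    pairs = []
    for entry in json.loads(annotation_json):
        # The details live under the lower-cased type name, e.g. "pods".
        details = entry.get(entry["type"].lower(), {})
        pairs.append((entry["type"], details.get("metricName")))
    return pairs


print(metric_types(SPEC_METRICS))     # what the manifest asks for
print(metric_types(CURRENT_METRICS))  # what the HPA last recorded
```

The manifest requests a "Pods" metric, while the recorded current metric is of type "External", which is the inconsistency pointed out above.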

Edit.

From what I can understand, there are four things involved here:
- MetricValueList
- Stackdriver Metrics Explorer
- The HorizontalPodAutoscaler
- The Python script

Since you can view the metric in the Metric Explorer, this rules out an issue with the MetricValueList and your Python script.

Knowing this, the issue is most likely in the HorizontalPodAutoscaler or around it.

The fact that this command did not return any items is an issue:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/num_drivers_per_pod"
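
If you want to script that check, a small sketch along these lines could help. It only builds the request path and inspects a response you have already fetched (for example with the kubectl command above); it assumes the response is a MetricValueList JSON document with an "items" array, and both function names are hypothetical.

```python
import json


def pods_metric_path(namespace, metric_name, api_version="v1beta1"):
    """Build the custom metrics API path for a per-pod metric."""
    return (
        f"/apis/custom.metrics.k8s.io/{api_version}"
        f"/namespaces/{namespace}/pods/*/{metric_name}"
    )


def has_metric_values(raw_response):
    """Return True if the MetricValueList response contains any items."""
    return bool(json.loads(raw_response).get("items"))


path = pods_metric_path("default", "num_drivers_per_pod")
print(path)

# An empty list here corresponds to the HPA's "no metrics returned" warning.
print(has_metric_values('{"kind": "MetricValueList", "items": []}'))
```

Pass the printed path to kubectl get --raw and feed the output to has_metric_values; a False result means the adapter found no time series for the pods in that namespace.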
-- Frederic G
Source: StackOverflow