Why is Kubernetes HPA scaling not down (Memory)?

6/23/2020

Summary

In our Kubernetes cluster we introduced an HPA with memory and CPU limits. Right now we do not understand why one of our services has 2 replicas.

The service in question reports 57% / 85% memory and has 2 replicas instead of one. We think it is because the memory of both pods summed up is more than 85% of one pod's request, but it would not be if there were only one pod. Is this preventing the scale-down? What can we do here?

We also observe a peak in memory usage when we deploy a service. We are running Spring Boot services on AKS (Azure) and think the HPA may scale up during that peak and never scale back down. Are we missing something, or does anyone have a suggestion?

Helm

hpa:

{{- $fullName := include "app.fullname" . -}}

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ $fullName }}-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "app.name" . }}
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 50
    - type: Resource
      resource:
        name: memory
        targetAverageUtilization: 85
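
For reference, autoscaling/v2beta1 is deprecated on newer clusters; under autoscaling/v2beta2 the same HPA would look like the sketch below (same template values, with targetAverageUtilization moved into a target block):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ $fullName }}-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "app.name" . }}
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization        # percentage of the pod's resource request
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 85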

and in the deployment:

# Horizontal-Pod-Auto-Scaler
          resources:
            requests:
              memory: {{ $requestedMemory }}
              cpu: {{ $requestedCpu }}
            limits:
              memory: {{ $limitMemory }}
              cpu: {{ $limitCpu }}

with service defaults:

hpa:
  resources:
    requests:
      memory: 500Mi
      cpu: 300m
    limits:
      memory: 1000Mi
      cpu: 999m

kubectl get hpa -n dev

NAME                            REFERENCE                              TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
xxxxxxxx-load-for-cluster-hpa   Deployment/xxxxxxxx-load-for-cluster   34%/85%, 0%/50%   1         10        1          4d7h
xxx5-ccg-hpa                    Deployment/xxx5-ccg                    58%/85%, 0%/50%   1         10        1          4d12h
iotbootstrapping-service-hpa    Deployment/iotbootstrapping-service    54%/85%, 0%/50%   1         10        1          4d12h
mocks-hpa                       Deployment/mocks                       41%/85%, 0%/50%   1         10        1          4d12h
user-pairing-service-hpa        Deployment/user-pairing-service        41%/85%, 0%/50%   1         10        1          4d12h
aaa-registration-service-hpa    Deployment/aaa-registration-service    57%/85%, 0%/50%   1         10        2          4d12h
webshop-purchase-service-hpa    Deployment/webshop-purchase-service    41%/85%, 0%/50%   1         10        1          4d12h

kubectl describe hpa -n dev

Name:                                                     xxx-registration-service-hpa
Namespace:                                                dev
Labels:                                                   app.kubernetes.io/managed-by=Helm
Annotations:                                              meta.helm.sh/release-name: vwg-registration-service
                                                          meta.helm.sh/release-namespace: dev
CreationTimestamp:                                        Thu, 18 Jun 2020 22:50:27 +0200
Reference:                                                Deployment/xxx-registration-service
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  57% (303589376) / 85%
  resource cpu on pods  (as a percentage of request):     0% (1m) / 50%
Min replicas:                                             1
Max replicas:                                             10
Deployment pods:                                          2 current / 2 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
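
(The memory percentage here is computed against the pod's request: the 303589376 bytes shown are about 290Mi, and 290Mi / 500Mi ≈ 58%, which lines up with the 57% / 85% figure once averaged across both pods and rounded down.)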

If any further information is needed, please feel free to ask!

Thank you so much for taking the time!

Cheers Robin

-- Robin
azure-aks
hpa
kubernetes

1 Answer

6/23/2020

The formula for determining the desired replica count is:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

The important part for your question is the ceil[...] wrapper: the result is always rounded up to the next whole number of replicas. If currentReplicas is 2 and desiredMetricValue is 85%, then currentMetricValue must drop to 42.5% or lower to trigger a scale-down.
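
Plugging that threshold back in confirms it: at exactly 42.5%,

desiredReplicas = ceil[2 * (42.5 / 85)]
                = ceil[2 * 0.5]
                = ceil[1.0]
                = 1

and anything above 42.5% rounds back up to 2.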

In your example, currentMetricValue is 57%, so you get

desiredReplicas = ceil[2 * (57 / 85)]
                = ceil[2 * 0.671]
                = ceil[1.341]
                = 2

You are right that, if currentReplicas were 1, HPA also wouldn't feel a need to scale up; actual utilization would need to climb above 85% to trigger it.
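
If your cluster is on Kubernetes 1.18 or newer, the autoscaling/v2beta2 API also exposes a behavior section that makes scale-down explicit. Note that it only controls how quickly pods are removed once the formula above recommends fewer replicas; it does not change the 42.5% threshold itself. A minimal sketch, with example values rather than recommendations:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: aaa-registration-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aaa-registration-service
  minReplicas: 1
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # act on the highest recommendation from the last 5 minutes
      policies:
        - type: Pods
          value: 1            # remove at most 1 pod...
          periodSeconds: 60   # ...per minute
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Separately, JVM processes rarely hand heap back to the operating system, so memory utilization on Spring Boot pods tends to stay near its deploy-time peak. If that peak is what scales you up, raising the memory request (so steady state is a smaller percentage of it) or scaling on CPU alone are common workarounds.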

-- David Maze
Source: StackOverflow