Kubernetes HPA doesn't scale down after the load decreases


The Kubernetes HPA scales up correctly when the load on the pods increases, but after the load decreases, the deployment is not scaled back down. This is my HPA file:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: baseinformationmanagement
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: baseinformationmanagement
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

My Kubernetes version:

> kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

And this is the kubectl describe output for my HPA:

> kubectl describe hpa baseinformationmanagement
Name:                                                     baseinformationmanagement
Namespace:                                                default
Labels:                                                   <none>
Annotations:                                              kubectl.kubernetes.io/last-applied-configuration:
CreationTimestamp:                                        Sun, 27 Sep 2020 06:09:07 +0000
Reference:                                                Deployment/baseinformationmanagement
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  49% (1337899008) / 70%
  resource cpu on pods  (as a percentage of request):     2% (13m) / 50%
Min replicas:                                             1
Max replicas:                                             3
Deployment pods:                                          2 current / 2 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
Your HPA specifies both memory and CPU targets. The Horizontal Pod Autoscaler documentation notes:

If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen.

The actual replica target is a function of the current replica count and the current and target utilization (from the same documentation page):

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

For memory in particular: currentReplicas is 2; currentMetricValue is 49; desiredMetricValue is 80 (the describe output shows a 70% target instead, which leads to the same result). So the target replica count is

desiredReplicas = ceil[       2        * (         49        /         80         )]
desiredReplicas = ceil[       2        *                 0.6125                    ]
desiredReplicas = ceil[                          1.225                             ]
desiredReplicas = 2

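Even if your service is totally idle, the memory metric alone keeps the deployment at (at least) 2 replicas: with an 80% target, average memory utilization would have to fall to 40% or below before the formula yields 1. That only happens if the service releases memory back to the OS, which is usually up to the language runtime and largely out of your control.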

Removing the memory target and autoscaling on CPU alone might better match what you expect.
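To make the calculation concrete, here is a minimal Python sketch of the documented formula applied per metric, taking the largest result. The function name `desired_replicas` and its parameters are made up for illustration; this is not the controller's actual code, and it ignores the controller's tolerance and stabilization behaviour.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas=1, max_replicas=3):
    """Hypothetical helper: apply the documented HPA formula to each metric.

    metrics maps a metric name to (current_utilization, target_utilization),
    both expressed as percentages of the resource request.
    """
    proposals = []
    for current, target in metrics.values():
        # desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
        proposals.append(math.ceil(current_replicas * (current / target)))
    # With several metrics, the largest proposal wins.
    desired = max(proposals)
    # Clamp to the minReplicas / maxReplicas bounds of the HPA.
    return max(min_replicas, min(max_replicas, desired))


# Values from the question: 2 replicas, memory at 49%, CPU at 2%, targets of 80%.
print(desired_replicas(2, {"memory": (49, 80), "cpu": (2, 80)}))  # 2

# Same load with the memory metric removed: CPU alone allows scaling down.
print(desired_replicas(2, {"cpu": (2, 80)}))  # 1

# Targets as reported by kubectl describe (70% memory, 50% CPU): still 2.
print(desired_replicas(2, {"memory": (49, 70), "cpu": (2, 50)}))  # 2
```

With both metrics, memory proposes 2 replicas and CPU proposes 1, so the HPA stays at 2; with only the CPU metric, the same idle load yields 1.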

