Kubernetes HPA based on JVM Heap memory

8/26/2021

I have an openjdk:8 image running on a Kubernetes cluster. I added a memory-based HPA (Horizontal Pod Autoscaler), which scales up fine, but since the JVM doesn't release memory from the heap back to the OS, the pods do not scale down. The following is the hpa.yaml:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: image-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60

One way to solve this is to use the right GC and make it release the memory, but since JVM has been designed to not release from the heap frequently for performance reasons, doing this isn't a good idea. Is there a way to handle this from Kubernetes? Like instead of checking OS's memory usage, can we not just check the memory usage from heap and scale on that?

-- Manoj Suthar
garbage-collection
jvm
kubernetes

1 Answer

8/26/2021

Scaling Java applications in Kubernetes is a bit tricky. The HPA looks at system memory only, and as pointed out, the JVM generally does not release committed heap space back to the OS (at least not immediately).

There are two main approaches one could take to solve this:

1. Tune JVM parameters so that the committed heap follows the used heap more closely

Depending on which JVM and GC are in use, the tuning options may differ slightly, but the most important ones are:

  • MaxHeapFreeRatio - the maximum percentage of the committed heap allowed to be free before the heap is shrunk
  • GCTimeRatio - the target ratio of application time to GC time; lower values let the GC run more often (which impacts performance)
  • AdaptiveSizePolicyWeight - how much weight to give recent GC runs versus older ones when calculating the new heap size

Giving exact values for these is not easy; it is a compromise between releasing memory quickly and application performance. The best settings depend on the load characteristics of the application.
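As a minimal sketch, these flags could be passed to an openjdk:8 container through the JAVA_TOOL_OPTIONS environment variable in the Deployment. The values below (and the added MinHeapFreeRatio, which pairs with MaxHeapFreeRatio) are illustrative assumptions, not recommendations:

```yaml
# Fragment of a hypothetical image-server Deployment spec.
# All flag values are illustrative starting points and need
# tuning for the actual workload.
containers:
- name: image-server
  image: openjdk:8
  env:
  - name: JAVA_TOOL_OPTIONS
    value: >-
      -XX:MinHeapFreeRatio=10
      -XX:MaxHeapFreeRatio=30
      -XX:GCTimeRatio=4
      -XX:AdaptiveSizePolicyWeight=90
```

A lower MaxHeapFreeRatio makes the JVM give committed-but-unused heap back sooner, which is what lets the OS-level memory metric (and therefore the HPA) see the decrease.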

Patrick Dillon has written an article published by Red Hat called Scaling Java containers that deep-dives into this subject.

2. Custom scaling logic

Instead of using the HPA you could create your own scaling logic and deploy it into Kubernetes as a job that runs periodically to:

  1. Check the heap usage in all pods (for example by running jstat inside the pod)
  2. Scale out new pods if the max threshold is reached
  3. Scale in pods if the min threshold is reached
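The steps above can be sketched as a small shell script run periodically (e.g. as a CronJob). Everything here is a hypothetical illustration: the deployment name image-server, the label selector, the thresholds, and the assumption that the JVM is PID 1 in each container are not from the original answer.

```shell
#!/bin/sh
# Hypothetical periodic scaling job; all names and thresholds are assumptions.

MAX_HEAP_PCT=75    # scale out when average old-gen utilisation exceeds this
MIN_HEAP_PCT=30    # scale in when it drops below this
MIN_REPLICAS=2
DEPLOY=image-server

# Pure decision step: given average heap utilisation (%) and the current
# replica count, print the desired replica count (never below the floor).
decide_scale() {
  avg=$1; max=$2; min=$3; cur=$4; floor=$5
  if [ "$(awk -v a="$avg" -v m="$max" 'BEGIN{print (a > m) ? 1 : 0}')" = "1" ]; then
    echo $((cur + 1))
  elif [ "$(awk -v a="$avg" -v m="$min" 'BEGIN{print (a < m) ? 1 : 0}')" = "1" ]; then
    echo $((cur > floor ? cur - 1 : floor))
  else
    echo "$cur"
  fi
}

# Step 1: collect old-gen utilisation (column O of `jstat -gcutil`) from
# every pod; steps 2/3: scale the deployment toward the decided count.
run_once() {
  total=0; count=0
  for pod in $(kubectl get pods -l app="$DEPLOY" -o jsonpath='{.items[*].metadata.name}'); do
    # Assumes the JVM is PID 1 inside the container.
    pct=$(kubectl exec "$pod" -- jstat -gcutil 1 | tail -1 | awk '{print $4}')
    total=$(awk -v t="$total" -v p="$pct" 'BEGIN{print t + p}')
    count=$((count + 1))
  done
  avg=$(awk -v t="$total" -v c="$count" 'BEGIN{print t / c}')
  cur=$(kubectl get deploy "$DEPLOY" -o jsonpath='{.spec.replicas}')
  kubectl scale deploy "$DEPLOY" \
    --replicas="$(decide_scale "$avg" "$MAX_HEAP_PCT" "$MIN_HEAP_PCT" "$cur" "$MIN_REPLICAS")"
}
```

The decision logic is kept separate from the kubectl plumbing so it can be tested on its own; a production version would also need error handling for pods where jstat is unavailable.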

This approach has the benefit of looking at the actual heap usage, but requires a custom component.

An example of this can be found in the article Autoscaling based on CPU/Memory in Kubernetes — Part II by powercloudup.

-- danielorn
Source: StackOverflow