I have an openjdk:8 image running on a Kubernetes cluster. I added a memory-based HPA (Horizontal Pod Autoscaler), which scales up fine, but since the JVM doesn't release memory from the heap back to the OS, the pods do not scale down. Following is the hpa.yaml:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: image-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: image-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60
One way to solve this is to use the right GC and make it release the memory, but since the JVM has been designed not to release heap memory frequently for performance reasons, doing this isn't a good idea. Is there a way to handle this from Kubernetes? For example, instead of checking the OS's memory usage, can we not just check the memory usage of the heap and scale on that?
Scaling Java applications in Kubernetes is a bit tricky. The HPA looks at system memory only and, as pointed out, the JVM generally does not release committed heap space back to the OS (at least not immediately).
There are two main approaches one could take to solve this: tune the JVM garbage collector so it gives committed memory back, or implement your own scaling logic based on the actual heap usage.
Depending on which JVM and GC are in use, the tuning options may differ slightly, but the most important ones are:
- MaxHeapFreeRatio - how much of the committed heap is allowed to remain unused before it is shrunk
- GCTimeRatio - how often the GC is allowed to run (impacts performance)
- AdaptiveSizePolicyWeight - how to weigh older vs. newer GC runs when calculating the new heap size
Giving exact values for these is not easy; it is a compromise between releasing memory quickly and application performance. The best settings depend on the load characteristics of the application; a sketch of how the flags could be wired into the Deployment is shown below.
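As an illustration only, the flags could be passed through the JAVA_TOOL_OPTIONS environment variable, which OpenJDK 8 picks up automatically at startup. The Deployment below is a minimal sketch: the image name, resource request and flag values are illustrative assumptions, not recommendations, and would need tuning for the real workload.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: image-server
  template:
    metadata:
      labels:
        app: image-server
    spec:
      containers:
      - name: image-server
        image: image-server:latest           # illustrative image
        resources:
          requests:
            memory: 1Gi                      # HPA memory utilization is measured against this request
        env:
        - name: JAVA_TOOL_OPTIONS            # read automatically by the JVM at startup
          value: >-
            -XX:+UseParallelGC
            -XX:MinHeapFreeRatio=20
            -XX:MaxHeapFreeRatio=40
            -XX:GCTimeRatio=4
            -XX:AdaptiveSizePolicyWeight=90
The intent of this combination is to let the heap shrink again when usage drops, at the cost of more frequent GC cycles.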
Patrick Dillon has written an article published by Red Hat called Scaling Java containers that takes a deep dive into this subject.
Instead of using the HPA you could create your own scaling logic and deploy it into Kubernetes as a job running periodically to do the following:
1. Check the heap usage in all pods (for example by running jstat inside the pod)
2. Scale out new pods if the max threshold is reached
3. Scale in pods if the min threshold is reached
This approach has the benefit of looking at the actual heap usage, but requires a custom component.
An example of this can be found in the article Autoscaling based on CPU/Memory in Kubernetes — Part II by powercloudup.
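To make the custom scaler more concrete, below is a rough sketch of such a job as a CronJob that reads heap usage with jstat and scales the Deployment with kubectl. The heap-scaler ServiceAccount (which would need RBAC permissions for pods, pods/exec and deployments/scale), the 80%/40% thresholds, the 2-10 replica bounds, the five-minute schedule and the assumption that the JVM runs as PID 1 in the application container are all assumptions made for illustration.
apiVersion: batch/v1                          # use batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: heap-based-scaler
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: heap-scaler     # hypothetical ServiceAccount with exec + scale rights
          restartPolicy: OnFailure
          containers:
          - name: scaler
            image: bitnami/kubectl:latest     # any image with kubectl and a POSIX shell
            command: ["/bin/sh", "-c"]
            args:
            - |
              total=0; count=0
              # Average heap utilization over all image-server pods.
              for pod in $(kubectl get pods -l app=image-server -o name); do
                # jstat -gc prints capacities and used sizes for survivor, eden and old gen;
                # sum used ($3+$4+$6+$8) over capacity ($1+$2+$5+$7), assuming the JVM is PID 1.
                pct=$(kubectl exec "$pod" -- jstat -gc 1 \
                  | awk 'NR==2 {printf "%d", ($3+$4+$6+$8)*100/($1+$2+$5+$7)}')
                total=$((total + pct)); count=$((count + 1))
              done
              [ "$count" -eq 0 ] && exit 0
              avg=$((total / count))
              replicas=$(kubectl get deployment image-server -o jsonpath='{.spec.replicas}')
              echo "average heap utilization: ${avg}% across ${count} pods"
              if [ "$avg" -gt 80 ] && [ "$replicas" -lt 10 ]; then
                kubectl scale deployment image-server --replicas=$((replicas + 1))
              elif [ "$avg" -lt 40 ] && [ "$replicas" -gt 2 ]; then
                kubectl scale deployment image-server --replicas=$((replicas - 1))
              fi
Note that something like this should replace the memory metric in the HPA rather than run alongside it, otherwise the two controllers will fight over the replica count.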