How to manage Java Spring applications autoscaling in Kubernetes PROPERLY?

5/29/2019

I'm trying to set up autoscaling in Kubernetes (hosted in Google Kubernetes Engine) for my Java Spring application. I have faced two problems:

  1. The Spring application uses a lot of CPU at startup (around 250 mCPU*, sometimes even 500 mCPU), which really breaks autoscaling, because after roughly a minute (once the Spring context has started, etc.) an instance uses only 50 mCPU. In some environments the application uses very little CPU (and at night this is true in almost every environment), so I would like to set the CPU request to at most 200 mCPU (= 80% of the CPU limit), or even less. Autoscaling would then make much more sense. But I can't really do that because of Spring's heavy startup, which won't finish if I give it too little CPU.

  2. When the application starts receiving traffic (when a new pod is created by an autoscaling event), its CPU usage can initially jump to something like 200% of normal usage and then drop back to 100%. It doesn't look like this is caused by too many requests being pushed to the new pod; it looks more like the JVM is simply slower at the start and receives too much traffic at the beginning. It seems the JVM needs some kind of warm-up (so instead of suddenly pushing 1/n of the traffic to the new pod, traffic should be shifted to it gradually). Because of this behaviour, autoscaling sometimes goes crazy: when it really needs just one more pod, it can scale up by many of them and then scale back down...

* in GKE 1000mCPU = 1 core
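
To put numbers on problem 1, here is a minimal sketch. It assumes the rule documented for the Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), and ignores the controller's tolerance band and its special handling of pods that are not ready yet. The class and method names are hypothetical, and the 80% target utilization is an assumption chosen for illustration.

```java
// Sketch only: a simplified model of the documented HPA rule, not the controller itself.
final class HpaSketch {
    /** Average CPU utilization across pods, in percent of the per-pod CPU request. */
    static double utilizationPercent(int[] usageMilli, int requestMilli) {
        long total = 0;
        for (int u : usageMilli) {
            total += u;
        }
        return 100.0 * total / ((long) usageMilli.length * requestMilli);
    }

    /** desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). */
    static int desiredReplicas(int currentReplicas, double currentUtil, double targetUtil) {
        return (int) Math.ceil(currentReplicas * currentUtil / targetUtil);
    }

    public static void main(String[] args) {
        // Two warmed-up pods at 50 mCPU and one pod still starting at 500 mCPU,
        // each with a 200 mCPU request (assumed values based on the question).
        int[] usageMilli = {50, 50, 500};
        double util = utilizationPercent(usageMilli, 200);
        System.out.println(desiredReplicas(usageMilli.length, util, 80.0)); // prints 4
    }
}
```

With all three pods idling at 50 mCPU, the same calculation asks for a single replica, so one starting pod is enough to turn a scale-down signal into a scale-up.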

The uploaded images show CPU charts. In the first, CPU usage after startup is much lower than at the beginning. In the second, we can spot both problems: high CPU usage at startup, then a grace period (the readiness probe initial* delay hasn't elapsed yet), and then a high peak when the pod starts receiving traffic.

* I have set the readiness probe initial delay to be longer than the context loading time.
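
The warm-up idea from problem 2 could be tied to readiness roughly like this: the pod reports ready only after it has exercised some representative code paths. This is a sketch under assumptions, not a Spring or Kubernetes API. WarmUpGate and its methods are hypothetical names, and wiring isReady() into whatever endpoint the readiness probe calls is left out.

```java
// Hypothetical helper: report "ready" only after hot paths have been exercised.
final class WarmUpGate {
    private volatile boolean ready = false;

    /** Runs the task the given number of times, then opens the gate. */
    void warmUp(Runnable representativeWork, int iterations) {
        for (int i = 0; i < iterations; i++) {
            representativeWork.run(); // gives the JIT a chance to compile hot code
        }
        ready = true;
    }

    /** A readiness endpoint would report success only when this returns true. */
    boolean isReady() {
        return ready;
    }

    public static void main(String[] args) {
        WarmUpGate gate = new WarmUpGate();
        StringBuilder sink = new StringBuilder();
        // Stand-in workload; a real service would replay typical requests here.
        gate.warmUp(() -> { sink.setLength(0); sink.append(Integer.toString(42)); }, 10_000);
        System.out.println("ready = " + gate.isReady()); // prints "ready = true"
    }
}
```

This moves the warm-up CPU cost in front of the first real request, but it does not shift traffic gradually. That part would still have to come from the load balancer or service mesh.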

Chart 1 Chart 2

The only thing I've found on the internet is to add a container to the pod that does nothing but "sleep x" and then dies, and to set that container's requested mCPU to the amount the Spring app uses at startup (I would then have to increase the CPU limit for the Spring app container, but that shouldn't do any harm, because autoscaling should prevent the Spring app from starving other apps on the node).

I would really appreciate any advice.

-- DawPawel
autoscaling
java
jvm
kubernetes
spring

1 Answer

1/31/2020

It is true that Spring (or actually Java) applications are not the most container-friendly thing out there, but there are a few things you can try:

  1. On startup, Spring autowires the beans and performs dependency injection, creates objects in memory, etc. All of those things are CPU intensive. If you assign less CPU to your pod, it will logically increase the startup time. Things you can do here are:

    • Use a startupProbe and give your application time to start. How to calculate the delays and thresholds is explained pretty well here.

    • Adjust the maxSurge and maxUnavailable in your deployment strategy to best fit your case (for example, maybe you have 10 replicas and a maxSurge/maxUnavailable of 10%, so your pods will roll out slowly, one by one). This helps reduce traffic spikes across the application's replicas (docs are here).

    • If your use case allows, you can look into lazy loading your Spring application, meaning that it will not create all objects upon startup but will wait until they are first used (in Spring Boot 2.2 and later this can be switched on globally with spring.main.lazy-initialization=true). This can be somewhat dangerous, because some issues will no longer be discovered at startup.
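
The arithmetic behind the startupProbe and maxSurge/maxUnavailable bullets can be sketched as follows. It assumes the rules stated in the Kubernetes docs: a startup probe allows roughly failureThreshold * periodSeconds for startup, a percentage maxSurge is rounded up, and a percentage maxUnavailable is rounded down. RolloutMath and its method names are hypothetical, and the probe values in main are illustrative, not recommendations.

```java
// Sketch only: helper arithmetic for probe and rolling-update settings.
final class RolloutMath {
    /** Approximate time a startup probe gives the app before the container is restarted. */
    static int startupBudgetSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }

    /** Extra pods allowed during a rollout; a percentage maxSurge is rounded up. */
    static int maxSurgePods(int replicas, int percent) {
        return (replicas * percent + 99) / 100;
    }

    /** Pods that may be unavailable during a rollout; a percentage is rounded down. */
    static int maxUnavailablePods(int replicas, int percent) {
        return replicas * percent / 100;
    }

    public static void main(String[] args) {
        System.out.println(startupBudgetSeconds(10, 30)); // prints 300
        System.out.println(maxSurgePods(10, 10));         // prints 1
        System.out.println(maxUnavailablePods(10, 10));   // prints 1
    }
}
```

Note that with only 3 replicas and 10%, maxSurge is still 1 but maxUnavailable becomes 0, so the rollout can only proceed by adding a pod first.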

  2. If you have HPA enabled and also define a replicas value in the deployment, you might experience issues when deploying. I can't find the relevant GH issue ATM, but you might want to run some tests on how it behaves (scaling more than it should, etc.). Things you can do here are:

    • Tweak the autoscaling thresholds and times (default is 3min, afaik) to allow your deployments to roll out smoothly without triggering the autoscaler.

    • Write a custom autoscaling metric instead of scaling by CPU. This one requires some work but might solve your scaling issues for good (relevant docs).
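
To show why tweaking the times can damp the over-scaling, here is a simplified model of a stabilization window (on clusters where the HPA exposes a behavior section, this corresponds to settings such as behavior.scaleUp.stabilizationWindowSeconds). The assumption is that the autoscaler only scales up to the smallest recommendation seen within the window and only scales down to the largest. StabilizationSketch is a hypothetical name, and the real controller applies further rules that are not modelled here.

```java
// Sketch only: a simplified model of a stabilization window, not the real controller.
final class StabilizationSketch {
    /**
     * Scale up only to the smallest recommendation in the window, scale down
     * only to the largest one; otherwise keep the current replica count.
     */
    static int stabilize(int currentReplicas, int[] recentRecommendations) {
        if (recentRecommendations.length == 0) {
            return currentReplicas;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int r : recentRecommendations) {
            min = Math.min(min, r);
            max = Math.max(max, r);
        }
        if (min > currentReplicas) {
            return min; // every sample in the window asked for more pods
        }
        if (max < currentReplicas) {
            return max; // every sample in the window asked for fewer pods
        }
        return currentReplicas;
    }

    public static void main(String[] args) {
        // A short spike to 7 while a new pod warms up is ignored in favour of 4.
        System.out.println(stabilize(3, new int[] {4, 7, 4})); // prints 4
    }
}
```

In this model a brief CPU spike from a pod that is still warming up cannot by itself cause several extra pods to be created, as long as the window is longer than the spike.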

Lastly, what you are suggesting with a sidecar looks like a hack :) Haven't tried it though so can't really tell the pros and cons.

Unfortunately, there is no silver bullet for Spring Boot (or Java) + K8s, but things are getting better than they were a few years back. If I find some helpful resources, I will come back and link them here.

Hope the above helps.

Cheers

-- Urosh T.
Source: StackOverflow