In general, is it better to configure a Tomcat REST service with a rational limit of max threads, or set it to an effectively infinite value?

1/16/2020

There isn't a single answer for this, but I don't know where else to ask this question.

I work on a large enterprise system that uses Tomcat to run REST services running in containers, managed by kubernetes.

Tomcat, or really any request processor, has a "max threads" property. If enough requests come in to drive thread creation up to that limit, additional requests are put into a queue (whose size is limited by another property), and once that queue is full, further requests may be rejected.
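For concreteness, this is what those two properties look like in Tomcat's `server.xml`. The port and values here are just illustrative, not a recommendation:

```xml
<!-- HTTP connector: maxThreads caps the request-processing thread pool;
     acceptCount limits the queue of waiting connections once all threads
     are busy. Connections beyond both are refused. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000" />
```

(200 and 100 happen to be Tomcat's defaults for these attributes; the question is whether to leave them at realistic values like these or raise them to effective infinity.)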

It's reasonable to ask whether this property should be set to a value that could realistically be reached, or to an effectively infinite value.

There are many scenarios to consider, although the only interesting ones are when traffic is far higher than normal, whether from real customer traffic or malicious DDoS traffic.

In managed container environments, and other similar cases, this also raises the question of how many instances, pods, or containers should be running copies of the service. I would assume you want as few of these as possible, to avoid duplicating per-pod resource overhead, which would increase the average number of threads in each container; I would assume that's better than spreading them thinly across a larger set of containers.

Some members of my team think it's better to set the "max threads" property to effective infinity.

What are some reasonable thoughts about this?

-- David M. Karr
java
kubernetes
multithreading
rest
tomcat

1 Answer

1/17/2020

As a general rule, I'd suggest trying to scale by running more pods (which can easily be scheduled on multiple hosts) rather than by running more threads. It's also easier for the cluster to schedule 16 1-core pods than to schedule 1 16-core pod.

In terms of thread count, it depends a little bit on how much work your process is doing. A typical Web application spends most of its time talking to the database, and does a little bit of local computation, so you could often set it to run 50 or 100 threads but still with a limit of 1.0 CPU, and be effectively using resources. If it's very computation-heavy (it's doing real image-processing or machine-learning work, say) you might be limited to 1 thread per CPU. The bad case is where your process is allocating 16 threads, but the system only actually has 4 cores available, in which case your process will get throttled but you really want it to scale up.
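The "many threads but 1.0 CPU" pattern described above corresponds to a pod spec along these lines (the container name and image are placeholders; the CPU figures are just one example of the trade-off, not a recommendation):

```yaml
# Pod template fragment: the container may run 50-100 Tomcat threads,
# but Kubernetes caps it at one core. That's fine for I/O-bound work
# (threads mostly wait on the database); for CPU-bound work the threads
# would contend for the single core and get throttled.
spec:
  containers:
    - name: rest-service        # placeholder name
      image: example/rest-service:latest   # placeholder image
      resources:
        requests:
          cpu: "1"              # what the scheduler reserves
        limits:
          cpu: "1"              # hard throttle ceiling
```

Keeping requests and limits small and scaling out with more replicas is what makes it easy for the scheduler to place "16 1-core pods" across the cluster.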

The other important bad state to be aware of is the thread pool filling up. If it does, requests will get queued up, as you note, but if some of those requests are Kubernetes health-check probes, that can result in the cluster recording your service as unhealthy. This can actually lead to a bad spiral where an overloaded replica gets killed off (because it's not answering health checks promptly), so its load gets sent to other replicas, which also become overloaded and stop answering health checks. You can escape this by running more pods, or more threads. (...or by rewriting your application in a runtime which doesn't have a fixed upper capacity like this.)
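One mitigation for the spiral described above is to make the health-check probes tolerant of slow responses, so a briefly overloaded replica isn't killed immediately. A sketch, assuming an HTTP health endpoint at `/healthz` on port 8080 (both placeholders):

```yaml
# Liveness probe fragment: with these (illustrative) settings, a replica
# is only restarted after 3 consecutive probe failures, each allowed up
# to 5 seconds, giving a queue-backed-up pod time to drain before the
# cluster declares it dead.
livenessProbe:
  httpGet:
    path: /healthz    # assumed health endpoint
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```

This doesn't fix the underlying capacity problem, but it blunts the feedback loop where killing one overloaded replica overloads the rest.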

It's also worth reading about the horizontal pod autoscaler. If you can connect some metric (CPU utilization, thread pool count) to say "I need more pods", then Kubernetes can automatically create more for you, and scale them down when they're not needed.
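A minimal example of the CPU-driven case, assuming the service runs as a Deployment named `rest-service` (the name and thresholds are illustrative):

```yaml
# HorizontalPodAutoscaler: adds replicas when average CPU utilization
# across pods exceeds 70% of the requested CPU, and removes them again
# when load drops, within the min/max bounds.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rest-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rest-service   # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Scaling on thread-pool occupancy instead would require exposing that number as a custom metric (e.g. via Tomcat's JMX beans and a metrics adapter), which the v2 API also supports.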

-- David Maze
Source: StackOverflow