I'm running an ASP.NET Core API on Linux, on Kubernetes in the Google Cloud.
This is a high-load API, and on every request it calls into a library that performs a long (1-5 second), CPU-intensive operation.
What I see is that after deployment the API works properly for a while, but after 10-20 minutes it becomes unresponsive, and even the health check endpoint (which just returns a hardcoded `200 OK`) stops working and times out. (This makes Kubernetes kill the pods.)
Sometimes I'm also seeing the infamous `Heartbeat took longer than "00:00:01"` error message in the logs.
Googling these phenomena points me to "thread starvation": too many thread pool threads have been started, or too many threads are blocked waiting on something, so no threads are left in the pool to pick up ASP.NET Core requests (hence the timeout of even the health check endpoint).
What is the best way to troubleshoot this issue? I started monitoring the numbers returned by `ThreadPool.GetMaxThreads` and `ThreadPool.GetAvailableThreads`, but they stayed constant (the completion port count is always 1000 for both max and available, and the worker count is always 32767).
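For reference, this is roughly how I'm sampling those counters, as a minimal hosted-service sketch (the interval and the log format are arbitrary):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Background service that periodically logs the thread pool counters.
public class ThreadPoolMonitor : BackgroundService
{
    private readonly ILogger<ThreadPoolMonitor> _logger;

    public ThreadPoolMonitor(ILogger<ThreadPoolMonitor> logger) => _logger = logger;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
            ThreadPool.GetAvailableThreads(out int availWorker, out int availIo);

            _logger.LogInformation(
                "Worker: {AvailWorker}/{MaxWorker}, IOCP: {AvailIo}/{MaxIo}",
                availWorker, maxWorker, availIo, maxIo);

            await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
        }
    }
}
```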
Is there any other property I should monitor?
Are you sure your ASP.NET Core web app is running out of threads? It may simply be saturating all the resources available to the pod, causing Kubernetes to kill the pod, and with it your web app.
I experienced a very similar scenario with an ASP.NET Core web API running on Red Hat Linux within an OpenShift environment, which supports the pod concept just as Kubernetes does: one call took approximately 1 second to complete and, under heavy load, the API first became slower and then unresponsive, causing OpenShift to kill the pod, and with it my web app.
It may be that your ASP.NET Core web app is not running out of threads, especially considering the high number of worker threads available in the `ThreadPool`. Instead, the number of active threads combined with their CPU needs is probably too large for the millicores actually available to the pod they run in: there are so many active threads for the available CPU that most of them end up queued by the scheduler, waiting for execution, while only a handful actually run. The scheduler then does its job, making sure the CPU is shared fairly among threads by frequently switching between them. In your case, where threads perform heavy and long CPU-bound operations, resources get saturated over time and the web app becomes unresponsive.
A mitigation may be to provide more capacity to your pods, especially millicores, or to increase the number of pods Kubernetes deploys on demand (see the manifest fragment below). However, in my particular scenario this approach did not help much. Instead, improving the API itself by cutting the execution of one request from 1 s to 300 ms noticeably improved overall web application performance and actually solved the issue.
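For the capacity route, the relevant knobs are the CPU request and limit in the pod spec. A fragment along these lines, where the values are placeholders rather than recommendations:

```yaml
resources:
  requests:
    cpu: "500m"    # what the scheduler reserves for the pod
  limits:
    cpu: "2000m"   # hard cap; the pod gets CPU-throttled beyond this
```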
As for improving the API itself: if your library performs the same calculations in more than one request, you may consider caching the results in order to gain speed at a slight cost in memory (which worked for me), especially since your operations are mainly CPU-bound and your web app is under such request demand. You may also consider enabling response caching in ASP.NET Core if that makes sense for the workload and responses of your API. With caching, you make sure your web app does not perform the same task twice, freeing up CPU and reducing the risk of queued threads.
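As a sketch of the data-structure caching idea, a memoizing wrapper around the expensive call could look something like this (`HeavyLibrary`, `Result`, the key scheme, and the expiration are all illustrative, not from the question):

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Illustrative stand-ins for the real library call and its result type.
public record Result(string Value);
public static class HeavyLibrary
{
    public static Result Compute(string input) => new(input.ToUpperInvariant());
}

// Memoizes the expensive call so identical requests hit the cache.
public class CachedComputeService
{
    private readonly IMemoryCache _cache;

    public CachedComputeService(IMemoryCache cache) => _cache = cache;

    public Result Compute(string input) =>
        _cache.GetOrCreate($"compute:{input}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            return HeavyLibrary.Compute(input); // the 1-5 s CPU-bound operation
        })!;
}
```

Registering it is just `services.AddMemoryCache()` plus adding the wrapper to DI, so the cache is shared across requests.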
Processing each request faster makes your web app less likely to fill up the available CPU, and therefore reduces the risk of having too many threads queued and waiting for execution.
Generally speaking, long-running work is anathema to web applications. You want sub-second response times for a healthy web app. This is particularly true if the work you need to do is synchronous or CPU-bound. Async can at least free up threads while I/O is in flight, but with CPU-bound work, the thread is hog-tied.
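To make that distinction concrete, a contrived contrast (the URL fetch and the checksum loop are just stand-ins):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class AsyncVersusCpu
{
    private static readonly HttpClient Http = new();

    public static async Task<int> Run(string url)
    {
        // I/O-bound: the thread returns to the pool while the request is in flight.
        string data = await Http.GetStringAsync(url);

        // CPU-bound: the thread is occupied for the whole loop, async or not.
        int checksum = 0;
        foreach (char c in data) checksum += c;
        return checksum;
    }
}
```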
You should off-load whatever you're doing to a different process and then monitor the progress. For an API, the typical approach here is to schedule the work on a different process and then immediately return a 202 Accepted, with an endpoint in the response body the client can utilize to monitor the progress/get the eventual completed result. You could also implement a webhook, which the client may register to receive notification that the process has completed, without having to constantly check on it.
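A minimal sketch of that request/poll shape (`JobRequest`, `JobResult`, the in-memory store, and the routes are all invented for illustration; a real implementation would hand the work to a separate worker process or queue):

```csharp
using System;
using System.Collections.Concurrent;
using Microsoft.AspNetCore.Mvc;

public record JobRequest(string Input);
public record JobResult(string Output);

// Placeholder store; swap for something durable and shared in production.
public class JobStore
{
    private readonly ConcurrentDictionary<Guid, JobResult?> _jobs = new();
    public void MarkPending(Guid id) => _jobs[id] = null;
    public void Complete(Guid id, JobResult result) => _jobs[id] = result;
    public bool TryGetResult(Guid id, out JobResult? result) =>
        _jobs.TryGetValue(id, out result) && result is not null;
}

[ApiController]
[Route("api/jobs")]
public class JobsController : ControllerBase
{
    private readonly JobStore _store;
    public JobsController(JobStore store) => _store = store;

    [HttpPost]
    public IActionResult Start([FromBody] JobRequest request)
    {
        var id = Guid.NewGuid();
        _store.MarkPending(id);
        // Enqueue (id, request) for an out-of-process worker here (omitted).
        return AcceptedAtAction(nameof(Status), new { id }, new { id });
    }

    [HttpGet("{id}")]
    public IActionResult Status(Guid id) =>
        _store.TryGetResult(id, out var result)
            ? Ok(result)                               // done: return the result
            : Accepted(new { status = "processing" }); // still running (or unknown id)
}
```

The POST returns immediately with a Location header pointing at the status endpoint, so no request ever holds a thread for the full duration of the computation.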
Your only other option is to throw more resources at the problem. For example, you could stage multiple instances behind a load balancer, divvying requests among them to reduce the overall load on each.
It's also entirely possible that there's some inefficiency or issue in your code that could be corrected to reduce the amount of time the process takes and/or the resources being consumed. As a trivial example, if you're using something like `Task.Run`, you could potentially free up a ton of threads by not doing that. `Task.Run` should pretty much never be used within the context of a web application. However, you haven't posted any code, so it's impossible to give you exact guidance there.
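To illustrate that last point anyway, a hedged before/after of the `Task.Run` anti-pattern (the controller and the workload are invented):

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/compute")]
public class ComputeController : ControllerBase
{
    // Anti-pattern: Task.Run occupies a second pool thread with the CPU-bound
    // work while the request still can't complete until it finishes.
    [HttpGet("offloaded")]
    public async Task<IActionResult> Offloaded() =>
        Ok(await Task.Run(() => ExpensiveCalculation()));

    // Same latency, one fewer thread per request: run it where you are.
    [HttpGet("direct")]
    public IActionResult Direct() => Ok(ExpensiveCalculation());

    // Stand-in for the CPU-intensive library operation.
    private static long ExpensiveCalculation()
    {
        long sum = 0;
        for (long i = 0; i < 2_000_000_000; i++) sum += i % 7;
        return sum;
    }
}
```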