I have a .net core API using Entity Framework Core. The DB context is registered in startup.cs like this:
services.AddDbContext<AppDBContext>(options =>
options.UseSqlServer(connectionString,
providerOptions => providerOptions.CommandTimeout(60)));
In connection string I set
Pooling=true;Max Pool Size=100;Connection Timeout=300
The controller calls methods in a service which in turn makes calls to aysnc methods in a repo for data retrieval and processing.
All worked well if concurrent user is under 500 during load testing. However beyond that number I start to see a lot of timeout expired errors. When I checked the database, there's no deadlock but I could see well over 100 connections in sleeping mode(the API is hosted on two kubernetes pods). I monitored these connections during the testing and it appeared that instead of current sleeping connections being reused, new ones were added to the pool. My understanding is entity framework core manages opening and closing connections but this didn't seem to be the case. Or am I missing anything?
The error looks like this:
StatusCode":500,"Message":"Error:Timeout expired
. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached. Stack Trace:
at
Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)\n
at
Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource
1 retry, DbConnectionOptions userOptions)\n at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource
1 retry, SqlConnectionOverrides overrides)\n at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)\n
at Microsoft.Data.SqlClient.SqlConnection.Open()\n at
Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternal(Boolean errorsExpected)\n
at
Microsoft.EntityFrameworkCore.Storage.RelationalConnection.Open(Boolean errorsExpected)\n at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransaction(IsolationLevel isolationLevel)\n.....................
An example of how the dbcontext
was used:
the controller calls a method in a service class:
var result = await _myservice.SaveUserStatusAsync(userId, status);
then in 'myservice'
:
var user = await _userRepo.GetUserAsync(userId);
....set user status to new value and then
return await _userRepo.UpdateUserAsync(user);
then in 'userrepo'
:
_context.user.Update(user);
var updated = await _context.SaveChangesAsync();
return updated > 0;
Update:
Thanks very much to Ivan Yang who generously offered the bounty. Although I'm still investigating, I've learned a lot by reading all the comments and answers below. Here is what I've tried so far: I increased the pool size to 200 (I know it's not the right way to deal with the issue), increased the number of pods so that the API now runs on 4 pods and allocated more memory to each pod. The end result so far has been good:500 errors disappear completely with up to 2000 concurrent users. I will update this question with my findings after I try other options.
Error:Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
This is almost always a connection leak. And here the fact that your queries are short-running, and you see idle connections on the server confirms it. Somewhere you're leaving an open connection.
A DbContext will open/close the underlying connection, and return it to the pool on Dispose. But if you start a transaction on a connection and don't commit or rollback, the connection will be segregated in the pool and won't be reused. Or if you return an IEnumerable
or a DataReader
that never gets iterated and disposed, the connection can't be reused.
Look at the "sleeping" sessions to see what their last query was, and cross-reference that with your code to track down the call site that leaked the connection. First try the DMVs, eg
select s.session_id, s.open_transaction_count, ib.event_info
from sys.dm_exec_sessions s
cross apply sys.dm_exec_input_buffer(s.session_id,null) ib
Or start an Extended Events trace if necessary.
connection leaks cause such problems, connections probably is not getting closed properly unless Garbage collector is used to trash all those dangling connections using IDisposable
, finally clause can be added to ensure that connections are closed after its use.
link is helpful to understand this issue.
As far as entity framework concerned, max pool size can be achieved by keeping many object in database context, while you can materialize them using FirstOrDefault
or ToList
functions, as queries can hold connections to database server
Note that DbContext
implements IDisposable
.
Best practice (for many reasons, not just connection management) is to new up your DbContext
in a using statement:
using(MyContext context = new MyContext())
{
// do your work
}
I wrote a small library that assists you in implementing and enforcing a pattern such as this.
Try to turn off the pooling. The pooling trys to keep the connections open. If you turn it off, it uses the provider timeout of 60 seconds you set in the DI