I have a very simple query against a very simple table:
```python
def get_defaults(domain):
    defaults = BaseDefaults.query.filter_by(domain=domain).first()
    return defaults
```
The table has (no kidding) 3 rows and 34 columns, with 1 PK, 1 unique constraint and 2 FKs, of the following types:
Timestamp: 2 Cols
Integer: 5 Cols
Booleans: 8 Cols
VarChar(100) & VarChar(250): 19 Cols
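For context, here is an abridged, hypothetical sketch of what such a model might look like (only the column types match the description above; the names are made up for illustration):

```python
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class BaseDefaults(db.Model):
    # Abridged, illustrative sketch - the real table has 34 columns.
    __tablename__ = "base_defaults"

    id = db.Column(db.Integer, primary_key=True)                      # PK
    domain = db.Column(db.String(100), unique=True, nullable=False)   # Unique
    account_id = db.Column(db.Integer, db.ForeignKey("accounts.id"))  # FK
    created_at = db.Column(db.DateTime)
    updated_at = db.Column(db.DateTime)
    is_active = db.Column(db.Boolean, default=True)
    default_locale = db.Column(db.String(250))
    # ...plus the remaining Integer / Boolean / VarChar(100/250) columns
```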
Usually, around 5-8 of these columns have null values. This function runs in the context of a Celery task that was exploding in memory (1.2Gb), while another running task uses around 110Mb.
So I profiled the code with `memory_profiler`, and it turns out that this specific line, `defaults = get_defaults(domain)`, is eating around 800-900Mb of RAM, which makes absolutely no sense.
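For reference, this is roughly how the profiling was set up, a minimal sketch using `memory_profiler`'s line-by-line decorator (the task function name here is illustrative):

```python
from memory_profiler import profile

@profile
def process_domain(domain):
    # memory_profiler prints each line's memory increment when the function
    # returns; that is how the get_defaults() call was singled out.
    defaults = get_defaults(domain)
    return defaults
```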
I don't see the same behavior locally - it only happens within a Kubernetes Cluster (hosted on DigitalOcean), so it's been hard to understand what might be going on.
The function queries a Postgres DB hosted on RDS, which seems to be working fine (queries from my local PC or from a SQL client complete in under 200ms).
I have also found that the bulk of the high memory consumption happens the first time the query runs; on the second run (another task execution), the same line only increases usage by about 120Mb, which seems more plausible. So it might also be that the session management of Flask-SQLAlchemy within Celery is getting in the way.
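If the session is the culprit, one way to rule it out would be to drop the scoped session after every task so that rows loaded by one run aren't kept alive for the next. This is only a sketch, assuming `db` is the Flask-SQLAlchemy instance; I haven't confirmed this is the cause:

```python
from celery.signals import task_postrun

@task_postrun.connect
def cleanup_sqlalchemy_session(*args, **kwargs):
    # Return the connection to the pool and clear the session's identity map
    # after each task, so objects loaded during one task run are not kept
    # referenced for the next one. `db` is the Flask-SQLAlchemy object.
    db.session.remove()
```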
It's consuming a great deal of RAM within the cluster and threatening the node's health. If I limit the pod's memory to 512Mb, the pods get created but die right afterward.
Any ideas on how to troubleshoot, optimize, or fix this issue?
For anyone who's going through something similar, I fixed my problem with `lazy=True` in the backref of the model declaration.

This wasn't a problem until a completely different table in the database started to grow fast - we were using `lazy='joined'`, which would automatically join every table that had a relationship declared with `BaseDefaults`.

By using `lazy=True` you only load the table you've queried, so memory consumption in the pod dropped from 1.2Gb to 140Mb.
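Concretely, the change is the `lazy` argument on the backref. A sketch of the before/after (the `Item` model and attribute names are illustrative, not the real schema):

```python
class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    defaults_id = db.Column(db.Integer, db.ForeignKey("base_defaults.id"))

    # Before: lazy='joined' on the backref meant every BaseDefaults query
    # also joined and loaded all related Item rows.
    # defaults = db.relationship(
    #     "BaseDefaults", backref=db.backref("items", lazy="joined"))

    # After: lazy=True (equivalent to the default 'select' strategy) loads
    # the related rows only when the .items attribute is actually accessed.
    defaults = db.relationship(
        "BaseDefaults", backref=db.backref("items", lazy=True))
```

With `lazy=True` the related rows are fetched with a separate SELECT only on attribute access, so a plain `BaseDefaults` query stays small no matter how large the related tables grow.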