Small Flask-SQLAlchemy Query Taking 900 MB of RAM Within a Celery Task (Kubernetes Cluster)

2/12/2021

I have a very simple query, in a very simple table:

def get_defaults(domain):
    defaults = BaseDefaults.query.filter_by(domain=domain).first()
    return defaults

The table has (no kidding) 3 rows and 34 columns, with 1 PK, 1 unique column, and 2 FKs, of the following types (a rough model sketch follows the list):

Timestamp: 2 columns
Integer: 5 columns
Boolean: 8 columns
VarChar(100) & VarChar(250): 19 columns
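For illustration only, a made-up sketch of what the model might look like (every name other than BaseDefaults is hypothetical; only the column counts are taken from the description above):

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class BaseDefaults(db.Model):
    # hypothetical sketch of the table described above
    id = db.Column(db.Integer, primary_key=True)                # the PK
    domain = db.Column(db.String(100), unique=True)             # the unique column
    created_at = db.Column(db.DateTime)                         # 2 timestamp columns in total
    updated_at = db.Column(db.DateTime)
    retry_limit = db.Column(db.Integer)                         # 5 integer columns in total
    is_active = db.Column(db.Boolean)                           # 8 boolean columns in total
    display_name = db.Column(db.String(250))                    # 19 varchar columns in total
    plan_id = db.Column(db.Integer, db.ForeignKey("plan.id"))   # 2 FKs in total
    organization_id = db.Column(db.Integer, db.ForeignKey("organization.id"))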

Usually, around 5-8 of these columns have null values. This function runs in the context of a Celery task, which was exploding in memory (1.2 GB), while another task uses around 110 MB.

So I profiled the code with memory_profiler, and it turns out that this specific line, defaults = get_defaults(domain), is eating around 800-900 MB of RAM, which makes absolutely no sense.
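For context, the per-line numbers come from memory_profiler's line-by-line decorator; a minimal version of that setup (the task function name here is illustrative) looks like:

from memory_profiler import profile

@profile
def run_task(domain):
    # memory_profiler prints the memory increment attributed to each line
    defaults = get_defaults(domain)  # the line reporting the ~800-900 MB jump
    return defaults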

I don't see the same behavior locally; it only happens within the Kubernetes cluster (hosted on DigitalOcean), so it's been hard to understand what might be going on.

The function queries a Postgres DB hosted on RDS, which seems to be working fine (queries from my local PC or from a SQL client complete in less than 200 ms).

I have also found that the bulk of the high memory consumption happens the first time the query is run; on the second run (another task), the same line increases usage by about 120 MB, which seems more plausible. So it might also be that Flask-SQLAlchemy's session management within Celery is getting in the way.
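One thing I'm considering (just a sketch, I haven't verified it helps here) is clearing the scoped session after every task, so objects loaded during a task don't stay referenced by the session:

from celery.signals import task_postrun

@task_postrun.connect
def cleanup_session(*args, **kwargs):
    # remove the Flask-SQLAlchemy scoped session so objects loaded during the
    # task (and anything eagerly loaded with them) can be garbage collected
    db.session.remove()  # db is the Flask-SQLAlchemy instance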

It's consuming a great deal of RAM within the cluster and threatening the node's health. If I limit the pod to 512 MB, the pods get created but die right afterward.

Any ideas on how to troubleshoot, optimize, or fix this issue?

-- lowercase00
celery
flask
flask-sqlalchemy
kubernetes
python

1 Answer

2/13/2021

For anyone going through something similar: I fixed my problem with lazy=True in the backref declaration on the model.

This wasn't a problem until a completely different table in the database started to grow fast. We were using lazy='joined', which automatically joins every table that has a relationship declared with BaseDefaults into each query.

By using lazy=True you only load the table you've queried (related rows are fetched when the attribute is accessed), so memory consumption in the pod dropped from 1.2 GB to 140 MB.
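Roughly, the change looks like this (Account is a made-up name for the table that grew; only the lazy argument matters):

class Account(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    defaults_id = db.Column(db.Integer, db.ForeignKey("base_defaults.id"))

    # before: lazy='joined' made every BaseDefaults query eagerly join this table
    # defaults = db.relationship("BaseDefaults", backref=db.backref("accounts", lazy="joined"))

    # after: the backref collection is only loaded when it is actually accessed
    defaults = db.relationship("BaseDefaults", backref=db.backref("accounts", lazy=True))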

-- lowercase00
Source: StackOverflow