SQLAlchemy and Kubernetes - OperationalError 2013 - "Lost Connection to MYSQL server during query"

2/6/2020

Problem

Intermittent Operational Error 2013 - Lost connection during query... many times a day and apart from that it behaves normally.


Infra. Overview:

We're using K8s cluster (created using kops) to deploy our applications but our primary DB server is not yet migrated to kubernetes which means it still resides in a separate EC2 instance outside kubernetes cluster.

We use Flask_SQLAlchemy to connect our flask app to MySQL.

I've created mysql service in my namespace with endpoint pointing to private IP of Mysql's EC2 instance.


As I know, similar questions related to OperationalError have been already asked and I've applied almost all of the fixes suggested.

  • sqlalchemy_pool_recycle = <way-less-than-servers-wait_timeout> // 900 (mysql's is 28800)
  • sqlalchemy's pre_ping is set to true
  • kernel's nf_conntrack_tcp_timeout_close_wait = 3600 (but pool will be recycled in 900)

  • Flask_sqlAlchemy - v2.1
  • SqlAlchemy - v1.3.12
  • Flask - v0.11
  • Kops - v.1.15.1
  • kubernetes - 1.15.7
  • Mysql - 8.x

Please help me to understand it deeper. Suggestions welcomed.

-- oli
kops
kubernetes
mysql
python
sqlalchemy

1 Answer

3/11/2020

its flask app config issue. you can update in app config below settings that should fix the issue.

SQLALCHEMY_POOL_SIZE = 50 # scale app for pool size # default 10

SQLALCHEMY_POOL_RECYCLE = 250 # pool recycle time in seconds 

Flask_sqlAlchemy has issue to close session , you need update in mysql config as well .

[mysql]
lock_wait_timeout=300

It should less than SQLALCHEMY_POOL_RECYCLE as mentioned in flask documentation here :- https://flask-sqlalchemy.palletsprojects.com/en/2.x/config/#timeouts

"it is recommended that you set SQLALCHEMY_POOL_RECYCLE to a value less than your backend’s timeout".

-- ANISH KUMAR MOURYA
Source: StackOverflow