Preventing a Python process from being killed due to out of memory

3/19/2020

We're running into an issue where our Python process is killed by the OS / Kubernetes when it runs out of memory. I see there is a MemoryError exception, but it is never raised — the process is simply killed.

I see this is a fairly common problem with the OOM killer, and getting a proper exception instead of a kill -9 to the face seems nearly impossible: the kernel sends SIGKILL, which can't be caught or handled from inside the process.
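One workaround I've been experimenting with (assuming a Linux container) is to cap the process's own address space with the stdlib `resource` module, set somewhat below the container's memory limit. Then large allocations fail inside Python and raise a catchable MemoryError before the kernel OOM killer ever fires. The 1 GiB budget below is an illustrative value, not our real limit:

```python
import resource

# Illustrative sketch: cap our address space below the container's
# memory limit so allocations fail inside Python (raising MemoryError)
# instead of triggering the kernel OOM killer's SIGKILL.
# The 1 GiB budget is a made-up number for the example.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1 * 1024**3, hard))

try:
    buf = bytearray(2 * 1024**3)  # exceeds the cap, so allocation fails
except MemoryError:
    buf = None  # shed the job gracefully; the process survives
```

The catch is that RLIMIT_AS limits virtual address space, not resident memory, so the budget has to leave headroom for mappings that never touch RAM; it also doesn't help when memory creeps up through many small allocations that each fit under the cap.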

What are good design patterns to avoid this? We are using a pub/sub based job queue.

  • Running each job in a separate process - possible, but it stops us from caching the results of expensive model-load operations across jobs.
  • Keeping track of which jobs started but didn't finish, and acking a message as failed once a pod has seen it too many times - but this is quite inefficient across restarts and so on.
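A middle ground for the first option that I've been sketching: one long-lived worker process loads the model once and handles many jobs, so the reload cost is only paid when the worker actually dies. If the kernel OOM-kills the worker, the parent sees exit code -9 and can nack the in-flight message. `load_model` and `handle` below are hypothetical stand-ins, and the fork start method (Linux) is assumed:

```python
import multiprocessing as mp

def load_model():
    # hypothetical stand-in for an expensive model load
    return {"name": "model"}

def handle(model, job):
    # hypothetical stand-in for the real job handler
    return (model["name"], job)

def worker(jobs, results):
    model = load_model()  # paid once per worker process, not once per job
    while True:
        results.put(handle(model, jobs.get()))

def run_jobs(items):
    ctx = mp.get_context("fork")  # assumes Linux
    jobs, results = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=worker, args=(jobs, results), daemon=True)
    proc.start()
    out = []
    for item in items:
        jobs.put(item)
        # real code would handle a timeout here: if the worker was
        # OOM-killed, proc.exitcode is -9 (SIGKILL) and the message
        # can be nacked instead of lost
        out.append(results.get(timeout=30))
    proc.terminate()
    proc.join()
    return out
```

This keeps the parent small (it only shuttles messages), so the OOM killer targets the fat worker rather than the supervisor — but it doesn't remove the second problem of detecting poison messages that OOM the worker every time.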
-- Sander
kubernetes
linux
python

0 Answers