We have Azure Kubernetes clusters running (one per stage: DEV, TST, PRD) in which several Python scripts need to run periodically, which is what we use APScheduler (3.6.0) for. The default in-memory job store is used.
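For reference, this is roughly what such a setup looks like with APScheduler 3.x and its default in-memory MemoryJobStore (the job function and interval below are placeholders, not the actual scripts):

    from apscheduler.schedulers.blocking import BlockingScheduler

    def sync_data():
        # Placeholder for one of the periodic Python scripts
        print("running periodic task")

    # No job store is configured, so APScheduler falls back to the
    # default in-memory MemoryJobStore.
    scheduler = BlockingScheduler()
    scheduler.add_job(sync_data, "interval", minutes=5, id="sync_data")
    scheduler.start()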
A few days ago, however, I noticed that APScheduler is behaving differently than expected. This happens on all three stages.
Steps I have undertaken so far, without the desired result:
Increased the number of process_pool_max_workers and thread_pool_max_workers, and set misfire_grace_time:
    Executor:
        thread_pool_max_workers: 50
        process_pool_max_workers: 20
    Job defaults:
        job_defaults_coalesce: True
        job_defaults_max_instances: 3
        misfire_grace_time: 120
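In code, those settings map roughly onto the following APScheduler 3.x setup (a sketch, not our actual deployment code):

    from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor
    from apscheduler.schedulers.background import BackgroundScheduler

    executors = {
        # 50 threads for regular jobs, 20 processes for CPU-bound jobs
        "default": ThreadPoolExecutor(50),
        "processpool": ProcessPoolExecutor(20),
    }
    job_defaults = {
        "coalesce": True,           # collapse several missed runs into one
        "max_instances": 3,         # allow up to 3 concurrent runs per job
        "misfire_grace_time": 120,  # still run a job up to 120 s after its due time
    }
    scheduler = BackgroundScheduler(executors=executors, job_defaults=job_defaults)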
I have also checked the resources of the cluster, but the scheduler's CPU and memory usage are not even close to their limits. We also have quite a low average active pod count of 25, and even if this were to become an issue for our K8s cluster, autoscaling is enabled.
Does anyone here have a clue what might be going on?
Don't use the in-memory job store; go for a persistent store like Redis, MongoDB, etc. instead. If you need your jobs to persist over scheduler restarts or application crashes, you have to choose one of the persistent job stores.
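For example, a minimal sketch using a SQLite-backed SQLAlchemyJobStore (the database URL, job, and interval here are placeholders; it requires SQLAlchemy to be installed):

    from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
    from apscheduler.schedulers.background import BackgroundScheduler

    def my_task():
        # Placeholder for the real periodic task
        print("task ran")

    jobstores = {
        # Jobs are serialized into this database and survive restarts
        "default": SQLAlchemyJobStore(url="sqlite:///jobs.sqlite")
    }
    scheduler = BackgroundScheduler(jobstores=jobstores)

    # replace_existing=True avoids adding duplicate jobs when the pod restarts
    scheduler.add_job(my_task, "interval", minutes=5, id="my_task", replace_existing=True)
    scheduler.start()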
APScheduler supports the following persistent job stores.