I have a server I wrote that lets us "kick off" processing and/or manage cron jobs that run the processing on different schedules.
If I'm running my container in a horizontally scalable way, how do I recover from crashed containers, or inform new containers that a cron job is already being handled by another container?
Should I store my cron job definitions and their state in a database? If so, how do I record state if a container crashes before it has the opportunity to write that state to the DB?
Should I break the cron handling out into a separate container and only ever run one instance of it?
If you want your application code to be stateless, then yes, you will need to store the job definitions and their state in a database to be resilient to container crashes.
However, I think you're asking how other containers will pick up and retry the failed cron jobs if a container crashes. You're now looking at designing a distributed job scheduler. Rolling your own is a significant amount of work, and there are already many off-the-shelf solutions out there.
Thankfully, you're already running a distributed job scheduler: Kubernetes! You could take advantage of the Kubernetes CronJob feature. If you configure your application to be able to talk to the Kubernetes API, your application can create CronJob objects and leave the rest to the scheduler.
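For illustration, a CronJob object created this way might look something like the manifest below. The name, image, and schedule are placeholders; the `concurrencyPolicy: Forbid` setting is what stops two containers from handling the same run, and `backoffLimit` tells Kubernetes how many times to retry a failed run:

```yaml
# Sketch of a Kubernetes CronJob manifest; names and image are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-processing-job
spec:
  schedule: "*/15 * * * *"     # standard cron syntax
  concurrencyPolicy: Forbid    # don't start a new run while one is still active
  jobTemplate:
    spec:
      backoffLimit: 2          # retry a failed run up to twice
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: worker
              image: registry.example.com/processing-worker:latest
              args: ["--run-once"]
```

If a node (or the pod's container) crashes mid-run, the Job controller reschedules the run elsewhere, which is exactly the crash-recovery behavior you'd otherwise have to build yourself.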
Why do you think you need to do anything different just because you have virtualized your servers?
Scheduling probably lives in /etc/cron.d, and running state is tracked through PID files in /run/. Just share these directories with the containers that need them. The mechanisms should be transaction-based if several actions rely on each other, just like a normal cron job.
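To make the running-state idea concrete, here is a sketch of a shared /etc/cron.d entry that uses `flock(1)` so that only one container executes the job at a time. The file path, user, lock file, and job binary are all assumptions for illustration:

```
# /etc/cron.d/myjob -- mounted into each container from a shared volume (assumed path)
# flock -n exits immediately if another container already holds the lock,
# so the job never runs twice even though the schedule is shared.
*/5 * * * * app flock -n /run/lock/myjob.lock /usr/local/bin/myjob
```

A lock held by a crashed container is released automatically when its process dies, which is what makes this simpler than hand-rolled PID files.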
More important is how you orchestrate the containers.
I'd have a cronjob-etc container that launches a new container for each job at the right time. You don't need containers running while their jobs are not active.
If you have fewer than 1000 jobs, it would be sufficient to store the additional state for the scheduling/launcher container in an SQLite file.
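As a sketch of that last idea, the launcher container could track each job's schedule, last launch, and outcome in a small SQLite table. The schema and column names here are made up for illustration; Python's standard-library `sqlite3` module is enough:

```python
import sqlite3

def open_state_db(path):
    """Create (or open) the launcher's job-state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_state (
               name        TEXT PRIMARY KEY,  -- job identifier
               schedule    TEXT NOT NULL,     -- cron expression
               last_start  TEXT,              -- ISO timestamp of last launch
               last_status TEXT               -- 'running', 'ok', or 'failed'
           )"""
    )
    return conn

def record_start(conn, name, schedule, when):
    # Mark the job as 'running' *before* launching its container, so even if
    # the launcher crashes, the in-flight run leaves evidence in the DB.
    conn.execute(
        "INSERT INTO job_state (name, schedule, last_start, last_status) "
        "VALUES (?, ?, ?, 'running') "
        "ON CONFLICT(name) DO UPDATE SET "
        "schedule = excluded.schedule, "
        "last_start = excluded.last_start, "
        "last_status = 'running'",
        (name, schedule, when),
    )
    conn.commit()

def record_finish(conn, name, status):
    conn.execute(
        "UPDATE job_state SET last_status = ? WHERE name = ?", (status, name)
    )
    conn.commit()

# Example usage with an in-memory database:
db = open_state_db(":memory:")
record_start(db, "nightly-report", "0 2 * * *", "2024-01-01T02:00:00")
record_finish(db, "nightly-report", "ok")
row = db.execute(
    "SELECT last_status FROM job_state WHERE name = ?", ("nightly-report",)
).fetchone()
print(row[0])  # -> ok
```

On restart, any row still marked 'running' whose container no longer exists identifies a job that crashed mid-run and needs to be retried.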