As explained in the Kubernetes docs on the topic of Jobs:

> The `activeDeadlineSeconds` applies to the duration of the job, no matter how many Pods are created. Once a Job reaches `activeDeadlineSeconds`, all of its running Pods are terminated and the Job status will become `type: Failed` with `reason: DeadlineExceeded`.
However, what I want to do is limit the time of each pod. If a pod takes too long, I want it to fail, but I want the other pods to continue, and for the job to create more pods if necessary.
I'll explain a bit about my task, just to make the problem crystal clear. The job consists of taking items from a Redis database, where the database serves as a sort of queue. Each pod processes one item (well, the number might vary). If a pod takes too long processing an item, I want it to fail. However, the other pods should continue, and the job should continue creating pods and retrieving more items from the database.
Your use case seems identical to this example from the Kubernetes docs.

As you said, `activeDeadlineSeconds` is not the parameter you should be using here.
I'm not sure why you want the pod to fail if it can't process an item in a given time frame. There are a few different approaches you can take here, but more info on the nature of your problem is needed to know which one fits best. One approach would be to set the job's `parallelism` to the number of pods you'd like to run concurrently and enforce the per-item timeout in the worker code itself.
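A minimal sketch of what that could look like, assuming a plain `redis` client, a queue list named `job-queue`, a Redis service reachable at host `redis`, and a hypothetical `process()` function (all of those names are illustrative, not from your setup). The per-item limit is enforced with `signal.alarm`, so a slow item is recorded and skipped without failing the pod:

```python
import signal
import redis

ITEM_TIMEOUT_SECONDS = 60  # assumed per-item limit


class ItemTimeout(Exception):
    pass


def _on_alarm(signum, frame):
    raise ItemTimeout()


def process(item: bytes) -> None:
    # Placeholder for the real per-item work.
    ...


signal.signal(signal.SIGALRM, _on_alarm)
r = redis.Redis(host="redis", port=6379)  # assumed Redis service name

while True:
    # Block until an item is available; exit once the queue stays empty.
    popped = r.blpop("job-queue", timeout=30)
    if popped is None:
        break
    _, item = popped

    signal.alarm(ITEM_TIMEOUT_SECONDS)  # start the per-item deadline
    try:
        process(item)
    except ItemTimeout:
        # Item took too long: record it and move on instead of failing the pod.
        r.rpush("job-queue-failed", item)
    finally:
        signal.alarm(0)  # cancel the alarm before taking the next item
```

With this pattern the Job keeps running and pulling items; a timed-out item simply ends up in a "failed" list (or a log line/metric) rather than killing the worker pod.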
Another approach would be to fan out the messages in the queue so that a worker pod is spawned for each message, same as this example depicts.
Choosing this solution will indeed cause every pod that takes too long to process its item to fail, and if you set the `restartPolicy` of the pods you create to `Never`, you end up with a list of failed pods that corresponds to the number of items that failed processing.
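For illustration, a sketch of what each fanned-out Job could look like, assuming a hypothetical `queue-worker` image that receives a single item as an argument and a templating step that fills in `$ITEM` (as in the expansion example from the docs). Because each Job handles exactly one item, `activeDeadlineSeconds` here effectively acts as the per-item limit:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  # Placeholder filled in by a templating step, one Job per queue item.
  name: process-item-$ITEM
spec:
  # With one item per Job, this is effectively a per-item time limit.
  activeDeadlineSeconds: 120
  template:
    spec:
      containers:
      - name: worker
        image: queue-worker:latest   # assumed worker image
        args: ["$ITEM"]              # the single item to process
      # Pods are not restarted on failure, so every item that could not be
      # processed leaves a failed pod/Job behind that you can list afterwards.
      restartPolicy: Never
```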
Having said all that, I don't think failing the pods is the right approach here; keeping track of failed processing events should be done with instrumentation, either through container logs or metrics.