We are stuck on a big problem with queuing in Kubernetes.
We are submitting jobs through a workflow manager (Airflow) to a cluster manager (AWS Batch). AWS Batch has a limitation: irrespective of the number of jobs submitted to the queue, it concurrently executes only as many jobs as there are vCPUs available in the cluster. To overcome this limitation, we are planning to migrate from AWS Batch to Kubernetes.
But we are not sure how Kubernetes handles this problem. While exploring, we found examples of work queues in the following links:
https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/
https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/
In these examples, our own code is required to write tasks to a queue and read them back from it, which is not what we are looking for. In our case, Apache Airflow's KubernetesPodOperator will submit a pod/job to the Kubernetes cluster, and we expect Kubernetes to put these jobs in an internal queue, then pick them up and execute them on the cluster based on the available cluster capacity.
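For concreteness, here is a minimal sketch of that kind of submission. It assumes a recent Airflow with the apache-airflow-providers-cncf-kubernetes provider installed; the DAG id, pod name, namespace, and image are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(dag_id="batch_jobs", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # Submits a single pod to the Kubernetes cluster; Airflow itself does no
    # capacity-based queuing here, it just creates the pod and watches it.
    run_job = KubernetesPodOperator(
        task_id="run_job",
        name="example-job",                  # placeholder pod name
        namespace="default",                 # placeholder namespace
        image="my-registry/worker:latest",   # placeholder image
        cmds=["sh", "-c", "echo processing"],
        get_logs=True,
    )
```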
We want to know: does Kubernetes internally support a queue, i.e. will it hold jobs/pods in a queue and pick them up for execution on the cluster based on the available cluster capacity?
Is there a solution to this problem, or is it a limitation of Kubernetes for which we should develop our own solution?
You can configure container resource requests in your job YAMLs. See the following link on how to achieve this: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-requests-are-scheduled
This configuration ensures that jobs remain in the 'Pending' state until their resource requests can be satisfied. The Kubernetes scheduler maintains an internal queue that holds all 'Pending' pods and pods that previously failed to schedule.
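Since you are already submitting through Airflow's KubernetesPodOperator, you can attach those resource requests directly to the submitted pod. A minimal sketch, assuming a recent provider version that exposes the container_resources parameter (the names, image, and resource values are placeholders):

```python
from kubernetes.client import models as k8s
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

run_job = KubernetesPodOperator(
    task_id="run_job",
    name="example-job",                  # placeholder pod name
    namespace="default",                 # placeholder namespace
    image="my-registry/worker:latest",   # placeholder image
    cmds=["sh", "-c", "echo processing"],
    # Each pod asks the scheduler for 2 vCPUs and 4Gi of memory. Pods that
    # cannot be placed on any node stay 'Pending' in the scheduler's queue
    # until running pods finish and free up capacity.
    container_resources=k8s.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "4Gi"},
        limits={"cpu": "2", "memory": "4Gi"},
    ),
)
```

With requests like these on every pod, submitting more pods than the cluster can hold simply leaves the surplus pods 'Pending' until capacity frees up, which is the queuing behavior you are expecting from Kubernetes.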