I sent 2,000 short-lived jobs to my kube cluster very quickly, and I observed a couple of minutes of delay between a job being created and a pod for that job entering the Pending state. Does anybody have any clue about what the bottleneck may be?
Could etcd be the bottleneck?
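For reference, the delay can be measured roughly like this; a minimal sketch using the Python kubernetes client, where the `default` namespace and local kubeconfig access are assumptions:

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()
core = client.CoreV1Api()

# creation time of every job in the namespace (namespace is an assumption)
jobs = {j.metadata.name: j.metadata.creation_timestamp
        for j in batch.list_namespaced_job("default").items}

# the Job controller labels each of its pods with job-name=<job>
for pod in core.list_namespaced_pod("default").items:
    job_name = (pod.metadata.labels or {}).get("job-name")
    if job_name in jobs:
        delay = pod.metadata.creation_timestamp - jobs[job_name]
        print(f"{job_name}: pod appeared {delay.total_seconds():.0f}s after the job")
```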
From a 10,000-foot view, the process is:
Every time you create a pod/job, it gets added to a scheduling queue.
The scheduler reads that queue and assigns the pod to a node.
When a node receives a pod assignment, the kubelet handles it by calling the container runtime and requesting the pod's creation.
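To see where in that flow the time goes, here is a rough sketch that reads a pod's status conditions; the pod name and namespace below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = core.read_namespaced_pod("example-job-pod", "default")  # placeholder name/namespace
created = pod.metadata.creation_timestamp

for cond in pod.status.conditions or []:
    if cond.type == "PodScheduled" and cond.status == "True":
        # time the pod spent waiting before the scheduler bound it to a node
        print(f"queued for {(cond.last_transition_time - created).total_seconds():.0f}s")
    if cond.type == "Ready" and cond.status == "True":
        # time until the kubelet/runtime had the pod up and running
        print(f"ready after {(cond.last_transition_time - created).total_seconds():.0f}s")
```

A large gap before PodScheduled points at the queue/scheduler; a large gap after it points at the node runtime (image pulls, container start).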
Given the above, the delay is most likely either in the scheduling/work queue or in the per-node limit on running pods.
An etcd bottleneck might also be an issue, but it is less likely; if etcd were the problem, you would probably have noticed it while creating the jobs.
Also, it is worth mentioning that each node has a limit on how many pods it can run at the same time: on v1.14 the default is 110 pods per node, no matter how large the node is (the limit can be raised with the kubelet's --max-pods flag). In this case, you would need roughly 19-20 nodes to run all 2,000 pods at the same time, plus some headroom for system pods. If you are running k8s with a cloud provider, the limit may be different for each provider.
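As a sanity check on that capacity math, a small sketch that reads each node's allocatable pod slots and estimates how many nodes 2,000 concurrent pods would need (the 2,000 figure and kubeconfig access are assumptions):

```python
import math
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# "pods" in status.allocatable is the per-node pod limit the kubelet enforces
per_node = [int(n.status.allocatable["pods"]) for n in core.list_node().items]
print(f"{len(per_node)} nodes, {sum(per_node)} pod slots in total")
print(f"nodes needed for 2000 concurrent pods: ~{math.ceil(2000 / max(per_node))}"
      " (plus headroom for system pods)")
```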
Without investigating, it is hard to say where the problem is.
In summary:
There is a work queue to protect the reliability of the cluster (API server/scheduler/etcd) and to prevent bursts of calls from affecting the availability of those services. After the pods are scheduled, each node's runtime downloads the images and runs the pods as desired, at its own pace.
If the issue is the per-node limit on pods running at the same time, the slowdown is likely because the scheduler is waiting for a node to finish a pod before it can place another one; adding more nodes will improve the results.
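If you want to verify whether the nodes really are pod-saturated, here is a rough sketch that counts how many pod slots each node currently has in use (again assuming kubeconfig access):

```python
from collections import Counter
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# count pods that still occupy a slot (i.e. not Succeeded/Failed) per node
in_use = Counter()
for pod in core.list_pod_for_all_namespaces().items:
    if pod.spec.node_name and pod.status.phase not in ("Succeeded", "Failed"):
        in_use[pod.spec.node_name] += 1

for node in core.list_node().items:
    limit = int(node.status.allocatable["pods"])
    print(f"{node.metadata.name}: {in_use[node.metadata.name]}/{limit} pod slots used")
```

If most nodes sit at or near their limit, the scheduler is simply waiting for slots to free up.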
This link details some examples of k8s scheduler performance issues.
This link describes the entire flow in a bit more detail.