Kubernetes batch performance when launching thousands of pods using Jobs

5/10/2019

I am writing a pipeline with Kubernetes on Google Cloud.

I sometimes need to launch a few pods within a second, where each task in the pipeline runs inside its own pod.

I plan to create each task as a Kubernetes Job via kubectl, wait for it to complete (polling all running pods every second), and then activate the next step in the pipeline.
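A minimal sketch of one such task as a Job manifest, assuming a hypothetical image and Job name (not from the question):

```yaml
# task-job.yaml -- one pipeline task as a Kubernetes Job
# (the name and image are placeholders)
apiVersion: batch/v1
kind: Job
metadata:
  name: pipeline-task-1
spec:
  backoffLimit: 2          # retry a failed pod up to 2 times
  template:
    spec:
      restartPolicy: Never # Jobs require Never or OnFailure
      containers:
      - name: task
        image: gcr.io/my-project/my-task:latest
```

Instead of polling every second, `kubectl wait --for=condition=complete job/pipeline-task-1 --timeout=600s` blocks until the Job finishes, which may simplify the pipeline driver.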

I will also monitor the cluster size to make sure I am not exceeding the max CPU/RAM usage.

I can run tens of thousands of jobs at the same time.

I am not using standard pipelines because I need to create a dynamic number of tasks in the pipeline.

I am running the batch operation so I can handle the delay.

Is this the best approach? How long does it take to create a pod in Kubernetes?

-- asaf
kubectl
kubernetes

1 Answer

5/11/2019

If you want to run tens of thousands of jobs at the same time, you will definitely need to plan resource allocation. Start by estimating the number of nodes you need. Then either create all of those nodes up front, or use the GKE cluster autoscaler to add nodes automatically in response to resource demand. If you preallocate all nodes at once, pods can be created very quickly, but you will probably have a high bill at the end of the month. If you start with only a small number of nodes and rely on the cluster autoscaler, you will face large delays, because new nodes take several minutes to start. You must decide which trade-off suits you.

If you use the cluster autoscaler, do not forget to specify the maximum number of nodes in the cluster.
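On GKE that cap is set when enabling autoscaling on the cluster or node pool; a sketch with placeholder names and sizes (pick limits that match your own capacity estimate):

```shell
# Create a GKE cluster with autoscaling bounded at 100 nodes
# (cluster name, zone, and node counts are placeholders)
gcloud container clusters create my-batch-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 100
```

Without `--max-nodes`, a burst of tens of thousands of pending pods could scale the cluster, and your bill, far beyond what you intended.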

Another important thing: put your jobs in the Guaranteed quality-of-service class in Kubernetes. Otherwise, with BestEffort or Burstable pods, you can end up in an eviction nightmare, which is painful and hard to control.
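A pod lands in the Guaranteed class only when every container sets resource requests equal to limits for both CPU and memory; a minimal container `resources` stanza (the numbers are illustrative, not from the answer):

```yaml
# requests must equal limits for BOTH cpu and memory,
# in every container of the pod, to get Guaranteed QoS
resources:
  requests:
    cpu: "1"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
```

You can check the assigned class on a running pod with `kubectl get pod <name> -o jsonpath='{.status.qosClass}'`.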

-- Vasily Angapov
Source: StackOverflow