I want to scale my deployment depending on the number of requests. Each pod can handle only one request at a time. Scaling up is no problem, but when scaling down I want to make sure I am not killing a pod that is busy right now (e.g. encoding a large file).
I have the following pods:
If I reduce the replica value, Kubernetes will kill pod 3. It does not care whether the pod is busy or not. I could manually kill pod 2, so that Kubernetes would start a new one:
Once I know pod 2 has been killed, I could reduce the replica count, so that pod 4 is killed before it gets a task. But this solution seems very ugly, because someone else has to tell pod 2 to shut down.
So Kubernetes kills the most recently created pods, but is it possible to tell it that a pod is busy and that it has to wait before killing it?
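To make "busy" concrete, here is a minimal sketch of the behaviour being asked for, seen from inside the pod. It assumes the worker process receives SIGTERM when Kubernetes terminates the pod, and that the pod's `terminationGracePeriodSeconds` is set long enough for the current job to finish (otherwise the process would still be killed). The `Worker` class, `request_stop`, `install_handler` and the job functions are hypothetical names used only for illustration; this is not a confirmed solution to the scale-down ordering itself.

```python
import signal
import threading


class Worker:
    """Hypothetical single-request worker that finishes its current job before exiting."""

    def __init__(self):
        self.stop_requested = threading.Event()
        self.completed = []  # names of the jobs that ran to completion

    def request_stop(self, signum=None, frame=None):
        # Usable as a signal handler: it only records the request and
        # never interrupts the job that is currently running.
        self.stop_requested.set()

    def run(self, jobs):
        for job in jobs:
            if self.stop_requested.is_set():
                break  # shutdown was requested: do not pick up new work
            job()  # the job in progress always runs to the end
            self.completed.append(job.__name__)


def install_handler(worker):
    # Assumption: the pod is terminated with SIGTERM and the grace period
    # is long enough for the longest job.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

In a real pod, `install_handler(worker)` would be called once at startup before `worker.run(...)`. A stop request that arrives in the middle of a job lets that job finish and only prevents the next one from starting.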