Why does scaling down a deployment seem to always remove the newest pods?

7/22/2018

(Before I start, I'm using minikube v27 on Windows 10.)

I have created a deployment with the nginx 'hello world' container with a desired count of 2:

[screenshot: pods before scaling up]
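
For reference, the deployment was created with commands roughly like these (the exact commands and the deployment name aren't shown above, so "hello-world" is just a placeholder):

    kubectl create deployment hello-world --image=nginx
    kubectl scale deployment hello-world --replicas=2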

I actually went into the '2 hours' old pod and edited the index.html file, replacing the welcome message with "broken" - I want to play with k8s to see what it would look like if one pod were 'faulty'.
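
Concretely, I did something along these lines (the pod name below is made up - the real one comes from kubectl get pods - and /usr/share/nginx/html is the default docroot of the stock nginx image):

    # Overwrite the welcome page inside the older pod (hypothetical pod name).
    kubectl exec -it hello-world-5c689d88bb-abcde -- \
      sh -c 'echo broken > /usr/share/nginx/html/index.html'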

If I scale this deployment up to more instances and then scale it down again, I expected k8s to remove the oldest pods, but it consistently removes the newest:

[screenshot: pods after scaling down]
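
The scale up/down steps are just (using the placeholder deployment name from above; the last command lists pods oldest-first, which makes it easy to see which ones survived):

    kubectl scale deployment hello-world --replicas=4    # scale up
    kubectl scale deployment hello-world --replicas=2    # scale back down

    # List pods oldest-first to see which ones were removed.
    kubectl get pods --sort-by=.metadata.creationTimestamp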

How do I make it remove the oldest pods first?

(Ideally, I'd like to be able to just say "redeploy everything as the exact same version/image/desired count in a rolling deployment" if that is possible)

-- Neil Trodden
kubernetes

1 Answer

7/23/2018

Pod deletion preference is based on an ordered series of checks, defined in code here:

https://github.com/kubernetes/kubernetes/blob/release-1.11/pkg/controller/controller_utils.go#L737

Summarizing: precedence for deletion is given to pods (see the inspection commands after this list):

  • that are unassigned to a node, over those assigned to a node
  • that are in a pending or not-running state, over those running
  • that are not ready, over those ready
  • that have been in the ready state for fewer seconds
  • that have higher restart counts
  • that have newer creation times, over older ones

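A quick way to inspect roughly the fields these checks compare, using standard kubectl output options (the column names are just labels):

    # Node assignment, status, readiness, restart count and age per pod:
    kubectl get pods -o wide

    # Or pull the individual fields with JSONPath custom columns:
    kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PHASE:.status.phase,READY:.status.containerStatuses[0].ready,RESTARTS:.status.containerStatuses[0].restartCount,CREATED:.metadata.creationTimestamp
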
These checks are not directly configurable.

Given these rules, if you can make an old pod not ready, or cause an old pod to restart, it will be removed at scale-down time before a newer pod that is ready and has not restarted.
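
For example, with the nginx pods above (pod and deployment names are hypothetical; this relies on nginx exiting on SIGTERM, which the stock image does):

    # Bump the restart count of the old pod: signalling PID 1 (the nginx master)
    # makes the container exit, and the kubelet restarts it in place.
    kubectl exec hello-world-5c689d88bb-abcde -- sh -c 'kill 1'

    # Scaling down afterwards now prefers the restarted (older) pod for deletion.
    kubectl scale deployment hello-world --replicas=2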

There is discussion of use cases for the ability to control deletion priority, mostly involving workloads that are a mix of job and service, here:

https://github.com/kubernetes/kubernetes/issues/45509

-- Jonah Benton
Source: StackOverflow