I am currently running Kubernetes 1.9.7 and successfully using the Cluster Autoscaler and multiple Horizontal Pod Autoscalers.
However, I recently noticed that the HPA favours newer pods when scaling down replicas.
For example, I have 1 replica of Service A running on a node alongside several other services. This node has plenty of available resources. During load, the CPU utilisation of Service A rose above the configured target, so the HPA decided to scale it to 2 replicas. As there were no other nodes with capacity, the Cluster Autoscaler (CA) spun up a new node, on which the new replica was successfully scheduled - so far so good!
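For context, the HPA for Service A looks roughly like this (the names and the 70% target are placeholders, not my exact values; on 1.9 this uses the autoscaling/v2beta1 API):

```yaml
# Illustrative HPA manifest for Service A - names and threshold are
# examples, not my actual configuration.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: service-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-a
  minReplicas: 1
  maxReplicas: 2
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 70
```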
The problem is, when the CPU utilisation drops back below the configured target, the HPA decides to scale down to 1 replica. I would expect the new replica on the new node to be removed, allowing the CA to turn off that new node. Instead, the HPA removed the existing Service A replica that was running on the node with plenty of available resources. This means I now have Service A running by itself on a new node that the CA can't remove, even though there is plenty of room for Service A on the existing node.
Is this a problem with the HPA or the Kubernetes scheduler? Service A has now been running on the new node for 48 hours and still hasn't been rescheduled despite there being more than enough resources on the existing node.
After scouring my cluster configuration, I came to a conclusion as to why this was happening.
Service A was configured to run on a public subnet, and the new node created by the CA was public. The existing node running the original replica of Service A was private, which is why that replica was the one removed during scale-down.
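The subnet constraint was expressed as a node selector on the Deployment, along these lines (the label key and value here are placeholders for whatever your cluster actually labels its nodes with):

```yaml
# Illustrative Deployment fragment: Service A is pinned to
# public-subnet nodes via a nodeSelector. The "subnet: public"
# label is an example, not a standard Kubernetes label.
spec:
  template:
    spec:
      nodeSelector:
        subnet: public
```

With a constraint like this in place, the replica on the private node was never a valid placement, so it was the natural candidate for removal.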
I'm not sure how Service A was scheduled onto the private node in the first place, but that is a separate issue.