I have a Deployment with x gameservers (pods) running. I'm using Agones to make sure gameservers with players connected to them won't get stopped by downscaling. In addition, I use a Service ("connected" to all of the gameservers) which acts as a LoadBalancer for the pods; as I understand it, it randomly chooses a gameserver when a player connects to the service. This all works great when upscaling, but not so much when downscaling. Since Agones prevents gameservers with players on them from being scaled down, the number of pods will essentially never decrease, because the service doesn't take the desired replica count into account (the actual count stays higher because gameservers with players on them won't be removed).
Is there a way to prevent the LoadBalancer service from picking a gameserver (replica) that's no longer desired? For example: the current network load only requires 3 replicas, but there are currently 5 because all 5 servers have players on them, which prevents them from shutting down. I would like to spread new load across only the 3 desired replicas (gameservers), to give the other 2 the chance to reach 0 players so they can eventually shut down.
Instead of using a LoadBalancer to spread players across your game instances, I'd recommend using the Agones GameServerAllocation API to let Agones find an available game server for you.
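As a rough sketch of what such a request could look like, here is the body of a GameServerAllocation built in Python. The fleet name "my-fleet" and the helper name build_allocation are made up for illustration, and the "selectors" field assumes a reasonably recent Agones release, so check the allocation reference for the version you run.

```python
import json


def build_allocation(fleet_name: str) -> dict:
    """Build a GameServerAllocation body that asks Agones for a game server
    from the given fleet. build_allocation is a hypothetical helper."""
    return {
        "apiVersion": "allocation.agones.dev/v1",
        "kind": "GameServerAllocation",
        "spec": {
            # Assumption: the fleet's game servers carry the standard
            # "agones.dev/fleet" label with the fleet name as its value.
            "selectors": [
                {"matchLabels": {"agones.dev/fleet": fleet_name}}
            ]
        },
    }


if __name__ == "__main__":
    # "my-fleet" is a placeholder fleet name.
    print(json.dumps(build_allocation("my-fleet"), indent=2))
```

Your matchmaker or backend would submit this to the Kubernetes API (or the Agones allocator service) and hand the returned address and port to the player, instead of pointing players at the LoadBalancer.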
If you allow multiple players to connect to the same game server, check out the integration pattern for allocating based on player capacity. Agones will pack players onto game servers that still have capacity instead of spreading them out. That avoids ending up with a handful of players on every game server, which is what happens when you use a load balancer to assign players to game servers.
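To see why packing helps with scale-down, here is a small toy simulation. It is not how Agones is implemented; the function names, the capacity of 4, and the seed are all assumptions chosen for illustration. It compares random assignment (roughly what a load balancer does) with always filling the fullest server that still has room.

```python
import random


def assign_random(players: int, servers: int, capacity: int, seed: int = 0) -> list[int]:
    """Place each player on a random server that still has room."""
    rng = random.Random(seed)
    counts = [0] * servers
    for _ in range(players):
        open_servers = [i for i in range(servers) if counts[i] < capacity]
        counts[rng.choice(open_servers)] += 1
    return counts


def assign_packed(players: int, servers: int, capacity: int) -> list[int]:
    """Place each player on the fullest server that still has room."""
    counts = [0] * servers
    for _ in range(players):
        open_servers = [i for i in range(servers) if counts[i] < capacity]
        # max() returns the first fullest server, so ties go to the lowest index.
        target = max(open_servers, key=lambda i: counts[i])
        counts[target] += 1
    return counts


def empty_servers(counts: list[int]) -> int:
    """Number of servers with no players, i.e. candidates for scale-down."""
    return sum(1 for c in counts if c == 0)


if __name__ == "__main__":
    spread = assign_random(players=6, servers=5, capacity=4)
    packed = assign_packed(players=6, servers=5, capacity=4)
    print("random:", spread, "empty:", empty_servers(spread))
    print("packed:", packed, "empty:", empty_servers(packed))
```

With 6 players, 5 servers, and room for 4 players each, packing leaves 3 servers empty, while random assignment can never leave more than that and usually leaves fewer. Those empty servers are the ones that can actually be removed when the fleet scales down.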