What is the most efficient way to keep "idempotency" when applying manifest files to deployments that have horizontal autoscalers applied to them?

2/11/2019

Let's say you have a CI/CD pipeline, and as part of that pipeline you have a deployment manifest file where you change some values and redeploy.

On the other hand, your deployment is also horizontally autoscaled, which works by modifying the replica count in your deployed manifest, up to the maximum number of replicas you configured.

Then one day your horizontal autoscaler has spun up 40 pods to meet demand, and you run your pipeline, which has a deployment manifest set to one replica. The moment you apply this file, it will kill your scaled-up pod replicas, meaning your scaler would need to bring them back up. This could potentially affect data and service, which is not good.

How can you apply a manifest file so that it doesn't affect your scaled replicas? All I can think of is automating a check of the current number of replicas and changing the deployment manifest to reflect that, but that seems like an extremely ugly solution.

I am guessing there must be a better solution to this. I just couldn't find it, or maybe I am looking in the wrong places...

-- Ulukai
continuous-deployment
continuous-integration
google-kubernetes-engine
kubernetes
kubernetes-hpa

1 Answer

2/16/2019

Regardless of how ugly it may appear, I don't think you have many alternatives, because your autoscaler works by modifying your deployed manifest.

So you have to somehow merge the changes the autoscaler made to the deployed manifest into the version of the manifest that your CI/CD pipeline deploys, so that the cluster has sufficient "capacity" to handle the traffic load at deployment time.
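
As a rough illustration of that merge step, here is a minimal Python sketch. It assumes the pipeline has already loaded the manifest into a dict and has obtained the live replica count by some other means (for example with `kubectl get deployment web -o jsonpath='{.spec.replicas}'`). The function name `merge_live_replicas`, the deployment name `web`, and the sample manifest are all hypothetical, not something from your setup.

```python
import copy
from typing import Optional


def merge_live_replicas(manifest: dict, live_replicas: Optional[int]) -> dict:
    """Return a copy of a Deployment manifest whose spec.replicas
    matches the replica count currently running in the cluster.

    If live_replicas is None (e.g. the deployment does not exist yet),
    the manifest's own value is kept unchanged.
    """
    merged = copy.deepcopy(manifest)  # never mutate the pipeline's copy
    if live_replicas is not None:
        merged.setdefault("spec", {})["replicas"] = live_replicas
    return merged


# Hypothetical manifest as it might be stored in the pipeline's repository.
pipeline_manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replicas": 1, "template": {"spec": {"containers": []}}},
}

# Assumed: the autoscaler has scaled the live deployment to 40 pods.
merged = merge_live_replicas(pipeline_manifest, live_replicas=40)
print(merged["spec"]["replicas"])
```

The merged dict would then be serialized and applied in place of the original file, so the apply step no longer resets the replica count that the autoscaler chose.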

You might have an alternative if you're somehow able to keep both deployment versions alive simultaneously and gradually migrate traffic from the old one to the new one, which would give the new deployment's autoscaler time to spin up the needed number of replicas. I don't know if GKE offers something like this, but it is the recommended deployment strategy on GAE for such cases.

-- Dan Cornilescu
Source: StackOverflow