I need to understand what limits can cause a kubectl rollout to time out with "watch closed before Until timeout".
I'm rolling out a service that needs to load a database at startup. We're running this service on a node that unfortunately has a relatively slow connection to the db server. After 51 minutes, it still had quite a bit of data left to load, but that's when the rollout timed out, even though my "initial liveness delay" on the service was set to 90 minutes. What else might have caused it to time out before the initial liveness delay elapsed?
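For context, the "initial liveness delay" mentioned above corresponds to the probe's initialDelaySeconds. A minimal sketch, assuming an HTTP health endpoint (the path and port are placeholders; only the delay value comes from the question):

```yaml
livenessProbe:
  httpGet:
    path: /health             # placeholder; the question doesn't name the endpoint
    port: 8080                # placeholder port
  initialDelaySeconds: 5400   # the 90-minute initial liveness delay described above
```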
Update:
To answer both a comment and an answer:
```
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.5+coreos.0", GitCommit:"b8e596026feda7b97f4337b115d1a9a250afa8ac", GitTreeState:"clean", BuildDate:"2017-12-12T11:01:08Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
```
I don't control the platform, and I believe I'm limited to that client version because of the server version we run.
Update:
I set the property you specified (progressDeadlineSeconds) to 7200, but it didn't appear to make any difference.
I then decided to change how the liveness probe works. The service has a custom SpringBoot health check, which until today would only return UP once the database was fully loaded. I've now changed it so that the liveness probe calls the health check with a parameter indicating it's a liveness check, so the service can distinguish liveness checks from readiness checks. It now reports live unconditionally, but only reports ready when the database is loaded. Unfortunately, this didn't help the rollout. Even though I could see the liveness checks returning UP, the rollout apparently still waits for the pod to be ready. It timed out after 53 minutes, well before the service was ready.
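A minimal sketch of how the two probes might now be wired, assuming the SpringBoot health endpoint sits at /actuator/health and the liveness flag is passed as a hypothetical ?mode=live query parameter (neither detail is given in the question, and the delay values are illustrative):

```yaml
# Sketch only: path, port, and the mode=live parameter are assumptions.
livenessProbe:
  httpGet:
    path: /actuator/health?mode=live   # health check now returns UP unconditionally for live checks
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /actuator/health             # still returns UP only once the database is loaded
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```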
I'm now looking at a somewhat ugly compromise: have the service know it's in a "slow" environment and have the readiness check return ready even when it isn't. I suppose we'll add a large initial delay on that, at least.
I believe what you want is .spec.progressDeadlineSeconds in your Deployment. When that deadline is exceeded, the output of kubectl describe deployment <deployment-name> will show a condition with Type=Progressing, Status=False, and Reason=ProgressDeadlineExceeded.
You can set it to a very large number, larger than the time a pod/container takes to come up, e.g. 7200 seconds, which is 2 hours.
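A minimal sketch of where the field sits, assuming a simple single-container Deployment (the name, labels, and image are placeholders; note that on the v1.8 server above, the apiVersion would likely need to be apps/v1beta2 or extensions/v1beta1 rather than apps/v1):

```yaml
apiVersion: apps/v1                # older servers such as v1.8 may need apps/v1beta2 or extensions/v1beta1
kind: Deployment
metadata:
  name: slow-loading-service       # placeholder name
spec:
  progressDeadlineSeconds: 7200    # allow the rollout up to 2 hours to make progress
  replicas: 1
  selector:
    matchLabels:
      app: slow-loading-service
  template:
    metadata:
      labels:
        app: slow-loading-service
    spec:
      containers:
      - name: app
        image: example/app:latest  # placeholder image
```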