I have a flask app that includes some NLP packages and takes a while to initially build some vectors before it starts the server. I've noticed this in the past with Google App Engine and I was able to set a max timeout in the app.yaml file to fix this.
The problem is that when I start my cluster on Kubernetes with this app, I notice that the workers keep timing out in the logs. Which makes sense because I'm sure the default amount of time is not enough. However, I can't figure out how to configure GKE to allow the workers enough time to do everything it needs to do before it starts serving.
How do I increase the time the workers can take before they timeout?
I deleted the old instances so I can't get the logs right now, but I can start it up if someone wants to see the logs.
It's something like this:
I 2020-06-26T01:16:04.603060653Z Computing vectors for all products
E 2020-06-26T01:16:05.660331982Z
95it [00:05, 17.84it/s][2020-06-26 01:16:05 +0000] [220] [INFO] Booting worker with pid: 220
E 2020-06-26T01:16:31.198002748Z [nltk_data] Downloading package stopwords to /root/nltk_data...
E 2020-06-26T01:16:31.198056691Z [nltk_data] Package stopwords is already up-to-date!
100it 2020-06-26T01:16:35.696015992Z [CRITICAL] WORKER TIMEOUT (pid:220)
E 2020-06-26T01:16:35.696015992Z [2020-06-26 01:16:35 +0000] [220] [INFO] Worker exiting (pid: 220)
I also see this:
The node was low on resource: memory. Container thoughtful-sha256-1 was using 1035416Ki, which exceeds its request of 0.
Obviously I don't exactly know what I'm doing. Why does it say I'm requesting 0 memory and can I set a timeout amount for the Kubernetes nodes?
Thanks for the help!
One thing you can do is add some sort of delay in a startup script for your GCP instances. You could try a simple:
#!/bin/bash
sleep <time-in-seconds>
Another thing you can try is adding some sort of delay to when your containers start in your Kubernetes nodes. For example, a delay in an initContainer
apiVersion: v1
kind: Pod
metadata:
name: myapp-pod
labels:
app: myapp
spec:
containers:
- name: myapp-container
image: myapa:latest
initContainers:
- name: init-myservice
image: busybox:1.28
command: ['sh', '-c', "echo Waiting a bit && sleep 3600"]
Furthermore, you can try a StartupProbe combined with the Probe parameter initialDelaySeconds
on your actual application container that way it actually waits for some time before saying: I'm going to see if the application has started.:
startupProbe:
exec:
command:
- touch
- /tmp/started
initialDelaySeconds: 3600