NLP Flask app workers timing out on startup on Google Kubernetes Engine (GKE)

6/26/2020

I have a Flask app that includes some NLP packages and takes a while to build some vectors before it starts the server. I've seen this before on Google App Engine, where I was able to fix it by setting a max timeout in the app.yaml file.

The problem is that when I start my cluster on Kubernetes with this app, the logs show the workers repeatedly timing out. That makes sense, because I'm sure the default amount of time is not enough. However, I can't figure out how to configure GKE to give the workers enough time to do everything they need to do before they start serving.

How do I increase the time the workers can take before they time out?

I deleted the old instances so I can't get the logs right now, but I can start it up if someone wants to see the logs.

It's something like this:

I 2020-06-26T01:16:04.603060653Z Computing vectors for all products
E 2020-06-26T01:16:05.660331982Z 
95it [00:05, 17.84it/s][2020-06-26 01:16:05 +0000] [220] [INFO] Booting worker with pid: 220
E 2020-06-26T01:16:31.198002748Z [nltk_data] Downloading package stopwords to /root/nltk_data...
E 2020-06-26T01:16:31.198056691Z [nltk_data]   Package stopwords is already up-to-date!
100it 2020-06-26T01:16:35.696015992Z [CRITICAL] WORKER TIMEOUT (pid:220)
E 2020-06-26T01:16:35.696015992Z [2020-06-26 01:16:35 +0000] [220] [INFO] Worker exiting (pid: 220)

I also see this:

The node was low on resource: memory. Container thoughtful-sha256-1 was using 1035416Ki, which exceeds its request of 0.

Obviously I don't exactly know what I'm doing. Why does it say I'm requesting 0 memory, and can I set a timeout for the Kubernetes nodes?

Thanks for the help!

-- LukasDeco
flask
gke-networking
google-kubernetes-engine
kubernetes

1 Answer

6/26/2020

One thing you can do is add some sort of delay in a startup script for your GCP instances. You could try a simple:

#!/bin/bash

sleep <time-in-seconds>

Another thing you can try is delaying when your containers start on your Kubernetes nodes, for example with a sleep in an initContainer:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "echo Waiting a bit && sleep 3600"]

Furthermore, you can try a startupProbe combined with the probe parameter initialDelaySeconds on your actual application container, so that Kubernetes waits for some time before it checks whether the application has started:

startupProbe:
  exec:
    command:
    - touch
    - /tmp/started
  initialDelaySeconds: 3600
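
As a variation on the probe above, here is a minimal sketch of a fuller Pod spec. It is not from the original answer and rests on assumptions: the /healthz path, port 8080, and the memory figures are all hypothetical, so substitute whatever your app actually serves and needs. Instead of one long initialDelaySeconds, the startup budget here comes from failureThreshold × periodSeconds (60 × 10 s = 600 s), and the probe passes as soon as the endpoint responds successfully. The resources block relates to the "request of 0" message in the question: a container that sets neither memory requests nor limits is typically treated as requesting 0, so setting a request gives the scheduler something to plan around.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: myapp:latest
    ports:
    - containerPort: 8080      # assumed port; use the one your server listens on
    resources:
      requests:
        memory: "1536Mi"       # illustrative; the log showed roughly 1 GiB in use
      limits:
        memory: "2Gi"          # illustrative upper bound
    startupProbe:
      httpGet:
        path: /healthz         # hypothetical health endpoint
        port: 8080
      periodSeconds: 10        # check every 10 seconds
      failureThreshold: 60     # allow up to 60 failures, about 600 s in total
```

If the probe never succeeds within that window, the kubelet restarts the container, so pick a budget comfortably above the vector-building time you actually observe.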

-- Rico
Source: StackOverflow