I'm running a Flask application on Kubernetes that serves an ML model, which loads a 2 GB word-embeddings file. The file is mounted with gcsfuse, and the application has been running this way for about two years.
Since a recent restart of the pod, this setup no longer works, even though nothing has changed in our code or deployment settings. While debugging, I noticed that the problem occurs even with the following Dockerfile, which does not use the Python script at all:
Dockerfile:
FROM levkuznetsov/gcsfuse-docker
RUN apt-get update && apt-get install -y \
    build-essential
COPY . /app
WORKDIR /app
RUN /bin/bash -c "mkdir -p /app/wordembeddingtest"
COPY ./serviceacc.json /
ADD /serviceacc.json /etc/gcloud/serviceacc.json
ADD /serviceacc.json /etc/gcloud/service-account.json
EXPOSE 8080
ENTRYPOINT ["/bin/bash", "-c", "gcsfuse bucket_name wordembeddingtest ; ls wordembeddingtest"]
Even stranger, we have other deployments that use the same setup, and they can be restarted and still work.
The logs show the following error:
And with --foreground --debug_invariants --debug_http --debug_gcs --debug_fuse we get the following:
What I have checked so far:
Service account permissions are ok
What I have tried so far:
Different storage bucket
gcsfuse command with --implicit-dirs, -o allow_other
Different Kubernetes cluster
Other mount folder locations
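For reference, a check along these lines could be run inside the container to tell a failed mount apart from an empty bucket. It is only a rough sketch: the function name mount_diagnostics is made up for illustration, and the path /app/wordembeddingtest and the device /dev/fuse are assumptions based on the Dockerfile above and on how FUSE mounts usually work.

```python
import os


def mount_diagnostics(mount_dir, fuse_device="/dev/fuse"):
    """Collect basic facts about a (supposed) FUSE mount directory.

    mount_diagnostics is a hypothetical helper, not part of gcsfuse.
    """
    info = {
        # FUSE needs this device node to be visible inside the container.
        "fuse_device_exists": os.path.exists(fuse_device),
        "dir_exists": os.path.isdir(mount_dir),
        # False means the directory is a plain folder, i.e. nothing is mounted on it.
        "is_mount_point": os.path.ismount(mount_dir),
        "entries": [],
    }
    if info["dir_exists"]:
        try:
            info["entries"] = sorted(os.listdir(mount_dir))
        except OSError as exc:
            # Listing a broken mount can fail; keep the message for the report.
            info["list_error"] = str(exc)
    return info


if __name__ == "__main__":
    # Assumed mount path, taken from WORKDIR /app plus the folder in the ENTRYPOINT.
    for key, value in mount_diagnostics("/app/wordembeddingtest").items():
        print(f"{key}: {value}")
```

If is_mount_point comes back False while the directory exists, the ls in the ENTRYPOINT is just listing an empty local folder, which would point at the mount step itself rather than at the bucket contents.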