I have a Docker container that first downloads files and then uses them. I wanted to make my application scalable, so I used a Kubernetes Deployment, and I created a PersistentVolume so that all the pods of the Deployment have shared storage. I don't want every pod to download the files: I want only the first pod that runs to download them, and the other pods to only read them. My Docker container's start.sh file contains these two commands:
python download.py
gunicorn my_application.wsgi:application --bind 0.0.0.0:8000
The configuration file of my Deployment is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      component: api
  template:
    metadata:
      labels:
        component: api
    spec:
      volumes:
        - name: api-storage
          persistentVolumeClaim:
            claimName: storage-persistent-volume-claim
      containers:
        - name: dispatch-model
          image: dockeris/my-app-image:latest
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: api-storage
              mountPath: /dispatch_server/Assets/Data
              subPath: data_storage
      imagePullSecrets:
        - name: regcred
Any suggestions on how to approach this problem?
The absolute easiest approach is just to build this file into your Docker image.
# At image-build time
RUN python download.py
# Don't need to separately download at container startup time
CMD gunicorn my_application.wsgi:application --bind 0.0.0.0:8000
This doesn't need a PersistentVolumeClaim, or shared storage, or anything else special. You need to trigger your CI system whenever there's an updated copy of the file, but that should be fairly routine. As with anything else managed by a Deployment, it's possible to be running Pods that use two different versions of the file, and also possible to roll back to a setup that used an older version of the file.
But what if the file is multiple gigabytes, and it really doesn't make sense to build it into an image? Then there are several questions you need to think about: which pod downloads the file, when it gets refreshed, and what happens if two pods try to write to the shared volume at the same time.
In a comment, @Eugene suggests a Job and that could be a reasonable approach. There will be only one copy of the Job running, which helps address the concurrency considerations, and you can delete and recreate it whenever you need to. The Job can run the same image as the main service, if that helps your particular setup, and you can override the command it runs.
apiVersion: batch/v1
kind: Job
metadata:
  name: downloader
spec:
  template:
    spec:
      volumes: [ ... ]
      containers:
        - name: downloader
          image: registry.example.com/my-application:20220103
          args: # overrides Dockerfile CMD, one word to a list item
            - /app/downloader.py
          volumeMounts: [ ... ]
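If you go this route, the serving pods still need to avoid starting gunicorn before the data exists. One sketch, assuming you make the downloader touch a sentinel file in the shared volume when it finishes (the sentinel name here is an invented convention, not part of your current setup), is an init container on the Deployment that polls for it:

# Sketch: add to the Deployment's pod spec, alongside "containers:".
# Assumes the downloader Job creates .download-complete when done;
# the volume and subPath match the Deployment in the question.
spec:
  initContainers:
    - name: wait-for-data
      image: busybox:1.35
      command:
        - sh
        - -c
        - until [ -f /data/.download-complete ]; do echo waiting for data; sleep 5; done
      volumeMounts:
        - name: api-storage
          mountPath: /data
          subPath: data_storage

This keeps start.sh simple (it can drop the `python download.py` line entirely), and the kubelet won't start the main container until the init container exits successfully.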