Make only one pod of a Kubernetes Deployment download files, and have the others only read them

1/3/2022

I have a Docker container that first downloads some files and then uses them. I wanted to make my application scalable, so I used a Kubernetes Deployment and created a PersistentVolume so that all the pods of the Deployment share the same storage. I don't want every pod to download the files; I want only the first pod that runs to download them, and the other pods to only read them. My Docker container's start.sh file contains these two commands:

python download.py
gunicorn my_application.wsgi:application --bind 0.0.0.0:8000

The configuration file of my Deployment is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      component: api
  template:
    metadata:
      labels:
        component: api
    spec:
      volumes:
        - name: api-storage
          persistentVolumeClaim:
            claimName: storage-persistent-volume-claim
      containers:
        - name: dispatch-model
          image: dockeris/my-app-image:latest
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: api-storage
              mountPath: /dispatch_server/Assets/Data
              subPath: data_storage
      imagePullSecrets:
        - name: regcred 

Any suggestions on how to approach this problem?

-- MUSTAPHAAMINE DEBBIH
cicd
docker
kubernetes

1 Answer

1/3/2022

The absolute easiest approach is just to build this file into your Docker image.

# At image-build time
RUN ./download.py
# Don't need to separately download at container startup time
CMD gunicorn my_application.wsgi:application --bind 0.0.0.0:8000

This doesn't need a PersistentVolumeClaim, or shared storage, or anything else special. You need to trigger your CI system whenever there's an updated copy of the file but that should be fairly routine. As with anything else managed by a Deployment, it's possible to be running Pods that use two different versions of the file, and also possible to roll back to a setup that used an older version of the file.

But what if the file is multiple gigabytes, and it really doesn't make sense to build it into an image? There are a bunch of questions you need to think about:

  • In the steady state (the system has been running successfully for months), when and how does the file get updated?
  • What happens if multiple processes try to do the update at the same time?
  • Mechanically, just looking through the list of Kubernetes volume types, do you have access to something that supports ReadWriteMany access?
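To make the concurrency question concrete, here is a minimal sketch of how download.py could guard the download so that only one process on the shared volume performs it. It is an illustration under assumptions, not something from the question: the function name `ensure_downloaded` and the `.download-complete` and `.download-lock` file names are hypothetical, and it assumes the shared filesystem honors exclusive file creation (most do, but check your volume type).

```python
import os
import time
from pathlib import Path


def ensure_downloaded(data_dir, download, poll_seconds=1.0):
    """Call download(data_dir) in at most one process sharing data_dir.

    Returns True if this process did the download, False if the files
    were already there or another process produced them.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    done = data_dir / ".download-complete"  # hypothetical marker file name
    lock = data_dir / ".download-lock"      # hypothetical lock file name

    while True:
        if done.exists():
            return False
        try:
            # O_CREAT | O_EXCL fails if the file exists, so only one caller wins.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone else holds the lock; wait and look again.
            time.sleep(poll_seconds)
            continue
        os.close(fd)
        try:
            if done.exists():
                # Another process finished between our check and our lock.
                return False
            download(data_dir)
            done.touch()  # publish the marker only after a successful download
            return True
        finally:
            # Release the lock even if the download raised, so others can retry.
            os.unlink(lock)
```

The existing script would call something like `ensure_downloaded("/dispatch_server/Assets/Data", fetch_files)` before gunicorn starts, where `fetch_files` stands in for whatever download.py does today. One known gap: if a pod is killed mid-download, the lock file is left behind and the other pods wait forever, so a real version would need a timeout or a way to detect a stale lock.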

In a comment, @Eugene suggests a Job and that could be a reasonable approach. There will be only one copy of the Job running, which helps address the concurrency considerations, and you can delete and recreate it whenever you need to. The Job can run the same image as the main service, if that helps your particular setup, and you can override the command it runs.

apiVersion: batch/v1
kind: Job
metadata:
  name: downloader
spec:
  template:
    spec:
      restartPolicy: Never # a Job's Pod template must set Never or OnFailure
      volumes: [ ... ]
      containers:
        - name: downloader
          image: registry.example.com/my-application:20220103
          args: # overrides Dockerfile CMD, one word to a list item
            - /app/downloader.py
          volumeMounts: [ ... ]
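
If the Job is re-run while the main service is up, the reading pods could otherwise see a half-written file. A possible way to avoid that, sketched here with a hypothetical helper name `write_atomically`, is to write to a temporary file in the same directory and rename it into place. This assumes the volume's filesystem makes rename atomic, which holds for local POSIX filesystems and may or may not hold for a given network volume.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target, chunks):
    """Write an iterable of bytes chunks to target without exposing partial data."""
    target = Path(target)
    # The temporary file lives in the same directory (same filesystem),
    # so os.replace is a rename rather than a copy.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to storage before renaming
        # Readers see either the old file or the complete new one.
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        os.unlink(tmp_name)
        raise
```

Note that `tempfile.mkstemp` creates the file readable only by its owner, so if the reading pods run as a different user, the downloader would need an `os.chmod` before the rename.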
-- David Maze
Source: StackOverflow