Remove files periodically from Volume Mount in Kubernetes Pod

7/12/2021

I have a Kubernetes pod with two containers, say A and B. They share a common volume of type emptyDir. In my use case, the application running in container A takes files uploaded by clients and places them at the mount point. It then signals container B to process the file. Container B sends a response back to A (the response is instantaneous, not batched). Once a file is processed it can be deleted (and it must be, due to obvious storage constraints). Deleting files one by one as they are processed could mean a lot of deletions (or not?), so I am considering periodic batch deletions instead.

  1. What is the best way to delete these files?
  2. Is this volume type a good fit for this use case? If not, what is?
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-processor   # object names must be lowercase RFC 1123 names
  labels:
    app: FileProcessor
spec:
  selector:
    matchLabels:
      app: FileProcessor
  template:
    metadata:
      labels:
        app: FileProcessor
    spec:
      containers:
      # container and port names: lowercase letters, digits and '-' only
      # (port names are also limited to 15 characters)
      - image: processor:0.2
        name: container-b
        ports:
        - containerPort: 80
          name: container-b
        volumeMounts:
        - name: upload-files
          mountPath: /app/upload-files
      - image: sample:0.1
        name: container-a
        ports:
        - containerPort: 8000
          name: container-a
        volumeMounts:
        - name: upload-files
          mountPath: /app/uploads
      volumes:
      - name: upload-files
        emptyDir: {}

PS: This will be deployed on GKE.

Update 1: I am still looking for a better solution.

-- Vineeth

1 Answer

7/12/2021

Since you are using emptyDir, the files must be deleted by one of the containers in the pod. Let's look at your options:

  1. Container A or B deletes each file right after processing it.
  2. Container A or B deletes the files once they reach a certain total size (say 1Gi).
  3. Add another container, C, that periodically cleans up the files.
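A minimal sketch of option 1, written in Python only for illustration (the question does not say what language containers A and B use). `process_file` is a hypothetical stand-in for B's real work; the file is removed only after processing succeeds, so a failed file stays on the volume for a retry.

```python
import tempfile
from pathlib import Path


def process_file(path: Path) -> str:
    """Hypothetical stand-in for container B's real processing step."""
    return f"processed {len(path.read_bytes())} bytes"


def process_and_delete(path: Path) -> str:
    """Option 1: delete the uploaded file as soon as it has been processed."""
    result = process_file(path)
    # Only reached if processing did not raise.
    # missing_ok=True (Python 3.8+) tolerates the file already being gone.
    path.unlink(missing_ok=True)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as upload_dir:
        upload = Path(upload_dir) / "upload-1.bin"
        upload.write_bytes(b"hello")
        print(process_and_delete(upload))  # processed 5 bytes
        print(upload.exists())             # False
```

The cost per file is a single `unlink` call, which is the "little extra work" of option 1.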

Now, let's look at the advantages and disadvantages of each solution:

  • If you go with solution 1, container A or B has to do a little extra work after processing each file. If the files are not very large, this extra time shouldn't be significant.

  • If you go with solution 2, you save that extra work after each file. However, every time the threshold is reached, container A or B will need a relatively long time to clean up the accumulated files. You also have to add logic that decides when to clean up. If you can do this intelligently, for example when your containers are idle, this solution should fit best.

  • If you go with solution 3, you have to ensure that container C does not delete files that container B is still processing.
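Options 2 and 3 can share one sweep routine. The sketch below is hypothetical Python, and to respect the constraint of solution 3 it swaps in a simple heuristic that is not part of the answer: a file is deleted only once its modification time is at least `min_age_s` seconds old, assuming no upload plus processing takes longer than that. `cleanup_if_over` corresponds to option 2 with a size limit such as 1Gi (`1 << 30` bytes); `sidecar_loop` is what a container C could run for option 3. All function names are illustrative.

```python
import time
from pathlib import Path
from typing import List, Optional


def _size_or_zero(p: Path) -> int:
    try:
        return p.stat().st_size if p.is_file() else 0
    except FileNotFoundError:
        return 0  # deleted by another container while we were iterating


def dir_size_bytes(upload_dir: Path) -> int:
    """Total size of the regular files directly inside upload_dir."""
    return sum(_size_or_zero(p) for p in upload_dir.iterdir())


def cleanup(upload_dir: Path, min_age_s: float, now: Optional[float] = None) -> List[str]:
    """Delete files not modified for at least min_age_s seconds.

    Assumption: anything newer is still being uploaded by A or processed by B,
    so it is skipped. This only holds if no file takes longer than min_age_s.
    """
    now = time.time() if now is None else now
    removed = []
    for p in upload_dir.iterdir():
        try:
            if p.is_file() and now - p.stat().st_mtime >= min_age_s:
                p.unlink()
                removed.append(p.name)
        except FileNotFoundError:
            continue  # removed concurrently, nothing left to do
    return sorted(removed)


def cleanup_if_over(upload_dir: Path, limit_bytes: int, min_age_s: float) -> List[str]:
    """Option 2: sweep only once the directory holds at least limit_bytes."""
    if dir_size_bytes(upload_dir) < limit_bytes:
        return []
    return cleanup(upload_dir, min_age_s)


def sidecar_loop(upload_dir: Path, interval_s: float, min_age_s: float) -> None:
    """Option 3: the loop a hypothetical container C could run forever."""
    while True:
        cleanup(upload_dir, min_age_s)
        time.sleep(interval_s)
```

For example, container C could mount the shared volume and call `sidecar_loop(Path("/app/upload-files"), interval_s=300, min_age_s=600)`, where both intervals are placeholder values. If container B can report which files it has finished, deleting exactly those is safer than guessing by age.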

If you want to use a different type of volume, one that can also be mounted by another pod, you can run a CronJob to clean up the data periodically. In this case, the same constraint as in solution 3 applies.

-- Emruz Hossain
Source: StackOverflow