What is the preferred way to run a job that needs large tmp disk space?

8/7/2020

I have a service that needs to scan large files, process them, and upload them back to the file server. My problem is that the default available space in a pod is 10G, which is not enough. I have 3 options:

  1. use a hostPath/emptyDir volume, but this way I can't specify how much space I need; my pods could be scheduled to a node that doesn't have enough disk space.
  2. use a hostPath persistent volume, but the documentation says it is for "Single node testing only".
  3. use a local persistent volume, but according to the documentation, dynamic provisioning is not supported yet. I would have to manually create a PV on each node, which seems unacceptable to me; but if there are no other options, this will be the only way to go.
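On reasonably recent Kubernetes versions, option 1 can be partially mitigated: an `emptyDir` volume accepts a `sizeLimit`, and an `ephemeral-storage` resource request makes the scheduler place the pod only on nodes with enough free local disk. A minimal sketch (the pod name, image, and sizes are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scan-job
spec:
  containers:
  - name: scanner
    image: my-scanner:latest        # placeholder image
    resources:
      requests:
        ephemeral-storage: "20Gi"   # scheduler only picks nodes with this much free disk
      limits:
        ephemeral-storage: "20Gi"
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 20Gi               # pod is evicted if usage exceeds this
```

Note that `sizeLimit` is enforced by eviction rather than by a hard filesystem quota, so it caps usage after the fact instead of reserving space up front.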

Are there any simpler options than a local persistent volume?

-- Nick Allen
kubernetes

1 Answer

8/11/2020

Depending on your cloud provider, you can mount their block storage offerings, e.g. GCE Persistent Disk on Google Cloud, Azure Disk on Azure, or Elastic Block Store on AWS. This way you won't depend on a single node's local storage. All of them are supported in Kubernetes via volume plugins and can be consumed through persistent volume claims. For example:

gcePersistentDisk

A gcePersistentDisk volume mounts a Google Compute Engine (GCE) Persistent Disk into your Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of a PD are preserved and the volume is merely unmounted. This means that a PD can be pre-populated with data, and that data can be "handed off" between Pods.
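A minimal sketch, closely following the Kubernetes docs example; the disk `my-data-disk` must already exist in GCE, in the same zone as the node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    gcePersistentDisk:
      pdName: my-data-disk
      fsType: ext4
```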

The same applies to awsElasticBlockStore and azureDisk.
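In practice you would usually not reference the disk in the pod spec directly, but let a StorageClass provision it dynamically through a PersistentVolumeClaim. A sketch using AWS EBS (the class name `fast-ebs` and claim name are placeholders; `gp2` is a common EBS volume type):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ebs
  resources:
    requests:
      storage: 50Gi
```

The claim then goes into the pod spec as a `persistentVolumeClaim` volume, and the EBS volume is created and attached for you.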


If you want to use AWS S3, there is an S3 Operator which you may find interesting.

The AWS S3 Operator deploys the AWS S3 Provisioner, which dynamically or statically provisions AWS S3 bucket storage and access.
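With the provisioner installed, a bucket is requested through an ObjectBucketClaim rather than a PVC. A sketch following the aws-s3-provisioner examples (the claim name, bucket prefix, and storage class name are assumptions; check the operator's own docs for the exact fields):

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket-claim
spec:
  generateBucketName: scan-results   # prefix for the generated bucket name
  storageClassName: s3-buckets       # class backed by the S3 provisioner
```

The provisioner then creates the bucket and exposes its endpoint and credentials to the workload via a generated ConfigMap and Secret.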

-- acid_fuji
Source: StackOverflow