I'm looking to find the GPU memory-fraction limiting parameters for spaCy, PyTorch, and other common model frameworks.
I know that for TensorFlow models we can set per_process_gpu_memory_fraction, but is there an equivalent in PyTorch or spaCy?
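For reference, this is the TensorFlow knob I mean; a minimal sketch using the TF 1.x session config (in TF 2.x the same options are reachable under tf.compat.v1):

```python
import tensorflow as tf

# TF 1.x style: cap this process at 40% of the GPU's memory.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.Session(config=config)
```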
Bonus points if there's a way to allocate memory on a shared vGPU to multiple different kinds of models simultaneously, i.e. spaCy models consume 0.20 of GPU memory, TensorFlow models consume 0.40, and PyTorch models consume 0.30, so we can avoid out-of-memory errors while serving all of our models on the same cluster with shared vGPU resources.
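Here's a minimal sketch of the kind of per-framework capping I'm imagining, assuming a recent PyTorch (>= 1.8, which exposes torch.cuda.set_per_process_memory_fraction) and assuming spaCy's GPU allocations go through CuPy's default memory pool; the exact fractions and the CuPy assumption are placeholders, not something I've verified:

```python
import cupy
import spacy
import tensorflow as tf
import torch

# TensorFlow: cap at 40% of the GPU (TF 1.x style config).
tf_config = tf.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.40
tf_session = tf.Session(config=tf_config)

# PyTorch: hard cap the caching allocator at 30% of device 0
# (allocations beyond this raise an out-of-memory error).
torch.cuda.set_per_process_memory_fraction(0.30, device=0)

# spaCy: assuming its GPU tensors are allocated via CuPy's default
# memory pool, limit that pool to 20% of the device's memory.
cupy.get_default_memory_pool().set_limit(fraction=0.20)
spacy.require_gpu()
```

Is stacking per-process limits like this the right approach, or is there a cleaner way?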
For a little more context, we're using this virtual GPU plugin https://github.com/awslabs/aws-virtual-gpu-device-plugin with Kubeflow.