I'm new to Kubernetes and want to know the best approach to this problem.
I have a varying set of large models (~5GB each) that need to be loaded into memory for my application to run. Each request specifies which model it needs, but the actual task is the same for every model. I don't want to load all of the models in a single pod, both for cost reasons and so that models can be added or removed more easily. Can I have a single service whose pods each have a different subset of the models loaded (1 or 2 each), with each request being directed to a pod that has the model it needs? Or do I need to make a service for each model, and then a gateway service in front of all of them?
I'm thinking it might be possible with process namespaces, but I'm not sure how customizable a service master can be in terms of parsing a request parameter and sending the request to the right namespace.
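To make the second option (one service per model plus a gateway) concrete, here is a rough sketch of the routing decision I imagine the gateway making. Everything specific in it is an assumption for illustration: the `model` query parameter, the service names in `MODEL_SERVICES`, the `default` namespace, and port 8080 are all hypothetical. The only part I believe is standard is the in-cluster DNS form `<service>.<namespace>.svc.cluster.local`.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping from model name to the per-model Service that serves it.
MODEL_SERVICES = {
    "model-a": "model-a-svc",
    "model-b": "model-b-svc",
}

NAMESPACE = "default"  # assumed namespace
PORT = 8080            # assumed container port


def backend_url(request_url: str) -> str:
    """Return the in-cluster URL the gateway would forward this request to."""
    parts = urlsplit(request_url)
    values = parse_qs(parts.query).get("model")
    if not values:
        raise ValueError("request does not specify a model")

    model = values[0]
    service = MODEL_SERVICES.get(model)
    if service is None:
        raise KeyError(f"no service registered for model {model!r}")

    # Standard Kubernetes Service DNS name: <service>.<namespace>.svc.cluster.local
    host = f"{service}.{NAMESPACE}.svc.cluster.local"
    target = f"http://{host}:{PORT}{parts.path}"
    if parts.query:
        target += f"?{parts.query}"
    return target
```

With this, a request to `/predict?model=model-a` would be forwarded to the `model-a-svc` service, and adding or removing a model would mean adding or removing one entry in the mapping along with its service. What I can't tell is whether Kubernetes can do this kind of lookup for me, or whether I have to write and run the gateway myself.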