I'm wondering about the best way to set up my GKE architecture. My application will serve different clients, and each client will have its own storage, PostgreSQL, and Elasticsearch service. The application basically stores multimedia files and runs machine learning APIs on them, and the results of that processing go to storage as JSON, to Cloud SQL, and to Elasticsearch. We have a job manager to deal with the various APIs, and it's deployed as microservices. We also have jobs to process those multimedia files, like applying transformations to videos with ffmpeg.
Today we have about 6 clients running within the same cluster, in the same node pool. This node pool has several nodes, and each node has several pods. We don't have clients running on specific node pools or specific pods. We set up a lot of pods and differentiate clients only by setting up a namespace for each one. I want to prepare for 100, 500, or 1000 clients.
I'm not sure that's the best way when it comes to optimizing performance and costs. On the same node, for example, we can have pods of the Elasticsearch service for clients A and B, separated only by namespace.
I understand that you have a multi-tenant GKE cluster. My first advice is to read about GKE multi-tenancy [1]. If you want to optimize performance and costs, you have to place quotas on how many resources your users can consume. Remember:
“Unlimited wants essentially mean that people never get enough, that there is always something else that they would like to have.”
For example, if your application has 6 use cases, measure how many resources each use case needs to run properly. Then establish how often each use case runs: if it runs very frequently (like a login process or a cron-based process), count its resources as part of your baseline. Next, define the amount of resources you expect a user to consume on a normal day (the mean). Lastly, define the highest amount of resources you will allow a user to consume. Once you have all three figures (baseline, mean, and highest), you can work out how much infrastructure you need.
My advice is to always size your infrastructure for the “mean” amount of resources, because not all of your users are going to use the highest amount at the same time.
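To make the sizing concrete, here is a rough Python sketch of that arithmetic. The per-client figures (0.5, 2 and 8 vCPUs) and the 16-vCPU node size are made-up assumptions for illustration, not measurements; replace them with what you actually observe. It only looks at CPU and ignores system overhead on each node.

```python
import math
from dataclasses import dataclass


@dataclass
class TenantProfile:
    """CPU demand of one client, in vCPUs (hypothetical numbers)."""
    baseline: float  # demand from very frequent use cases
    mean: float      # expected demand on a normal day
    peak: float      # highest demand the quota will allow


def nodes_needed(total_vcpu: float, vcpu_per_node: float) -> int:
    """Round the required capacity up to whole nodes."""
    return math.ceil(total_vcpu / vcpu_per_node)


def plan(profile: TenantProfile, tenants: int, vcpu_per_node: float) -> dict:
    """Node counts if every client sat at baseline, mean, or peak at once."""
    return {
        "baseline_nodes": nodes_needed(profile.baseline * tenants, vcpu_per_node),
        "mean_nodes": nodes_needed(profile.mean * tenants, vcpu_per_node),
        "peak_nodes": nodes_needed(profile.peak * tenants, vcpu_per_node),
    }


# Assumed profile: 0.5 vCPU baseline, 2 vCPU mean, 8 vCPU peak per client.
profile = TenantProfile(baseline=0.5, mean=2.0, peak=8.0)
for tenants in (100, 500, 1000):
    print(tenants, plan(profile, tenants, vcpu_per_node=16.0))
```

The gap between `mean_nodes` and `peak_nodes` is what you save by not provisioning for everyone's worst case at the same time.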
Remember: the key to success is to always measure and to set quotas.
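Since you already have one namespace per client, a per-namespace `ResourceQuota` is one way to enforce those limits. Below is a minimal Python sketch that builds such a manifest and prints it as JSON, which `kubectl apply -f` accepts. The namespace name `client-a`, the quota name `tenant-quota`, and all the amounts are assumptions for illustration; pick values from your own measurements.

```python
import json


def resource_quota(namespace: str, cpu_requests: str, memory_requests: str,
                   cpu_limits: str, memory_limits: str, max_pods: int) -> dict:
    """Build a Kubernetes ResourceQuota manifest for one client namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # "tenant-quota" is a hypothetical name chosen for this example.
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu_requests,
                "requests.memory": memory_requests,
                "limits.cpu": cpu_limits,
                "limits.memory": memory_limits,
                "pods": str(max_pods),
            }
        },
    }


# Hypothetical amounts for a client whose namespace is "client-a".
manifest = resource_quota("client-a", "4", "8Gi", "8", "16Gi", 50)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code like this makes it easier to keep quotas consistent as you grow from a handful of namespaces to hundreds.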
[1] https://cloud.google.com/kubernetes-engine/docs/concepts/multitenancy-overview