How to improve response time for a model deployed in Kubernetes?
I get a response time of 0.18 sec from a TensorFlow model on localhost, whereas the same model hosted on a Kubernetes cluster gives a response time of 4 sec.
Kubernetes cluster: I created a simple Deployment and Service to host the model. There is only one active node for this deployment.
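For comparing the two setups, it helps to measure the median over several requests rather than a single one. Below is a minimal sketch of how such timings can be collected; the actual HTTP call (e.g. `requests.post` to a TensorFlow Serving REST endpoint) is stubbed out with `time.sleep` so the snippet is self-contained, and the endpoint name is hypothetical.

```python
import time

def predict(payload):
    """Stub for a call to the model's REST endpoint.

    In practice this would be something like:
    requests.post("http://<service>:8501/v1/models/my_model:predict", json=payload)
    """
    time.sleep(0.01)  # simulated network + inference time
    return {"predictions": [0.5]}

def median_latency(n=20):
    """Time n predictions and return the median latency in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        predict({"instances": [[1.0, 2.0]]})
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

print(f"median latency: {median_latency():.3f} s")
```

Running this once against localhost and once against the cluster Service makes it clear whether the 4 s figure is consistent overhead (e.g. networking, CPU limits) or occasional spikes (e.g. cold starts).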
Any help is much appreciated. Thanks
It is normal behaviour for response latency to be higher on a Kubernetes cluster than on localhost: each request passes through extra network hops (the Service, kube-proxy, and possibly an Ingress), and the Pod may be constrained by CPU requests and limits. Kubernetes' main goal is to manage resources, not to minimise single-request latency.
Prediction speed has a direct relationship to the cost of serving, since it is directly related to the amount of compute resources necessary to make a prediction. The time it takes to make a prediction will always be a critical variable in any formula that measures prediction throughput. Faster predictions means more prediction throughput on the same hardware, translating into reduced cost.
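As a rough illustration of that relationship, using the latencies reported in the question (with a single request in flight at a time, throughput is simply the inverse of latency):

```python
# Latencies reported in the question above.
local_latency = 0.18   # seconds per prediction on localhost
cluster_latency = 4.0  # seconds per prediction on the cluster

# With one request in flight at a time, throughput is the inverse of latency.
local_throughput = 1.0 / local_latency      # predictions per second
cluster_throughput = 1.0 / cluster_latency  # predictions per second

print(f"localhost: {local_throughput:.2f} predictions/s")
print(f"cluster:   {cluster_throughput:.2f} predictions/s")
print(f"slowdown factor: {cluster_latency / local_latency:.1f}x")
```

At 4 s per prediction you would need roughly 22 times the hardware to match the localhost throughput, which is why closing that latency gap translates directly into reduced serving cost.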
You can find more information here: tensorflow-performance.
I hope it helps.