Delay in response from TensorFlow model deployed into Kubernetes

6/26/2019

How to improve response time for a model deployed in Kubernetes?

I get a response time of 0.18 sec from the TensorFlow model on localhost, whereas the same model hosted on a Kubernetes cluster gives me a response time of 4 sec.

Kubernetes cluster: I created a simple Deployment and Service to host the model. There is only one active node for this Deployment.
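For reference, a minimal Deployment and Service along these lines might look like the following. The names, image, model path, and resource limits are illustrative, not the asker's actual manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-model            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-model
  template:
    metadata:
      labels:
        app: tf-model
    spec:
      containers:
      - name: serving
        image: tensorflow/serving   # official TF Serving image; REST on 8501
        args:
        - --model_name=my_model
        - --model_base_path=/models/my_model
        ports:
        - containerPort: 8501
        resources:
          requests:
            cpu: "1"        # too-low CPU requests/limits are a common cause of slow inference
---
apiVersion: v1
kind: Service
metadata:
  name: tf-model
spec:
  selector:
    app: tf-model
  ports:
  - port: 8501
    targetPort: 8501
```

Checking the container's CPU requests/limits in a manifest like this is a good first step when cluster latency is far above localhost latency.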

Any help is much appreciated. Thanks

-- ashwini prakash
kubernetes
tensorflow-serving

1 Answer

7/1/2019

It is normal behaviour for the response delay to be higher on a Kubernetes cluster. Kubernetes's main goal is to manage resources, and a request to a pod passes through extra layers (the Service and kube-proxy, and possibly an ingress) that a direct localhost call does not, while the container itself may be throttled by its CPU requests and limits.

Prediction speed has a direct relationship to the cost of serving, since it is directly related to the amount of compute resources necessary to make a prediction. The time it takes to make a prediction will always be a critical variable in any formula that measures prediction throughput. Faster predictions mean more prediction throughput on the same hardware, translating into reduced cost.
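To compare the two setups fairly, it helps to measure latency over many requests rather than a single call. Below is a minimal sketch that times repeated POSTs to a TF Serving REST endpoint and reports mean and 95th-percentile latency; the URL and payload shape are assumptions you would replace with your own model's endpoint and input:

```python
import json
import statistics
import time
import urllib.request


def summarize_latencies(samples):
    """Return (mean, p95) latency in seconds from a list of per-request timings."""
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return statistics.mean(samples), p95


def measure(url, payload, n=20):
    """Time n POST requests against a TF Serving-style REST predict endpoint."""
    body = json.dumps(payload).encode()
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).read()
        timings.append(time.perf_counter() - start)
    return summarize_latencies(timings)
```

Usage would be something like `measure("http://<service-ip>:8501/v1/models/my_model:predict", {"instances": [[1.0, 2.0, 3.0]]})`, run once against localhost and once against the cluster Service. If the p95 gap stays large, look at CPU limits, node placement, and the network path rather than the model itself.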

You can find more information here: tensorflow-performance.

I hope it helps.

-- MaggieO
Source: StackOverflow