Object detection model served with TensorFlow Serving on Kubernetes: long response time (3-4 seconds)

10/3/2019

I have a TensorFlow object detection model, served with TensorFlow Serving and deployed to an Azure Kubernetes cluster. It runs on an NVIDIA K80 GPU with the tensorflow/serving:1.12.3-gpu image.

The model is deployed and responds correctly, but the response time is huge: 3-4 seconds for a 500x375 (135 KB) image.

Can anyone help me to understand what can be improved?

-- ashwini prakash
azure
azure-kubernetes
kubernetes
tensorflow
tensorflow-serving

1 Answer

10/14/2019

If this was the first prediction request after the server started, that latency is normal: the first request triggers one-time initialization. You may need a warm-up request.
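A warm-up request can be as simple as posting one dummy image to the model's REST predict endpoint right after the server starts, so the one-time initialization cost is paid before real traffic arrives. A minimal sketch in Python, assuming a hypothetical model name `detector` and the default TensorFlow Serving REST port 8501 (adjust both to your deployment):

```python
import base64
import json

def build_warmup_request(image_bytes):
    # TensorFlow Serving's REST predict API accepts binary inputs as
    # base64-encoded strings wrapped in a {"b64": ...} object.
    payload = {
        "instances": [
            {"b64": base64.b64encode(image_bytes).decode("utf-8")}
        ]
    }
    return json.dumps(payload)

# Hypothetical endpoint; substitute your service host and model name.
WARMUP_URL = "http://localhost:8501/v1/models/detector:predict"

# Send it once at startup, e.g. from an init container or a script:
#
#   import requests
#   with open("sample.jpg", "rb") as f:
#       requests.post(WARMUP_URL, data=build_warmup_request(f.read()))
```

TensorFlow Serving also supports baking warm-up requests into the SavedModel itself (a `tf_serving_warmup_requests` file under `assets.extra/`), which replays them automatically at model load time; that avoids needing an external warm-up client at all.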

-- zzachimonde
Source: StackOverflow