TensorFlow Serving error when loading a new version of a model from S3

7/18/2019

I'm serving TF models through TF Serving with a dedicated deployment in a K8S cluster. I'm using a MinIO instance deployed in the same namespace as TF Serving.

I configured it with the following environment variables so that TF Serving can access S3 and sync the proper files:

        - name: MODEL_NAME
          value: model-name
        - name: S3_ENDPOINT
          value: minio:9000
        - name: S3_USE_HTTPS
          value: '0'
        - name: S3_VERIFY_SSL
          value: '0'
        - name: AWS_REGION
          value: 'us-west-1'
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: minio-secret
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: minio-secret
        - name: MODEL_BASE_PATH
          value: s3://ROOT-BUCKET
        - name: TF_CPP_MIN_LOG_LEVEL
          value: '2'
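
For reference, this is roughly how I check what ends up in the bucket (a minimal sketch, assuming boto3 and the same MinIO endpoint/credentials as above; as far as I understand, with these env vars TF Serving looks for versions under ROOT-BUCKET/model-name/&lt;version&gt;/):

    import boto3

    # Sketch: list what TF Serving should see under the model base path.
    # Endpoint and credentials mirror the env vars above; placeholders are illustrative.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://minio:9000",        # S3_ENDPOINT, plain HTTP (S3_USE_HTTPS=0)
        aws_access_key_id="<AWS_ACCESS_KEY_ID>",
        aws_secret_access_key="<AWS_SECRET_ACCESS_KEY>",
        region_name="us-west-1",
    )

    # Expected layout: ROOT-BUCKET/model-name/<version>/saved_model.pb (+ variables/)
    resp = s3.list_objects_v2(Bucket="ROOT-BUCKET", Prefix="model-name/")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])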

Everything went fine and I've been able to query the server and predict properly, but when I load a new version I always get the following error:

'{ "error": "Failed to process element: 0 key: decoder_state_input_h:0 of \\\'instances\\\' list. Error: Invalid argument: JSON object: does not have named input: decoder_state_input_h:0" }'

Killing the Pod (i.e. restarting TF Serving) results in the new version being restored properly. The problem seems to be due to a sync issue, where TF Serving starts loading the servable even before the S3 sync has completed.
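
One workaround I'm considering (untested) is to control the upload order so that saved_model.pb, which the server needs to resolve the signature, is the last object written to the new version prefix. A minimal sketch, assuming boto3 and hypothetical local paths:

    import os
    import boto3

    # Sketch: upload a new model version so that saved_model.pb appears last.
    # Bucket, version number and local paths are illustrative.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://minio:9000",
        aws_access_key_id="<AWS_ACCESS_KEY_ID>",
        aws_secret_access_key="<AWS_SECRET_ACCESS_KEY>",
    )

    local_dir = "/tmp/export/2"          # hypothetical local export of version 2
    prefix = "model-name/2/"             # version directory under ROOT-BUCKET

    # Collect all files, then sort so saved_model.pb is uploaded last
    # (variables and assets go first).
    files = []
    for root, _, names in os.walk(local_dir):
        for name in names:
            path = os.path.join(root, name)
            key = prefix + os.path.relpath(path, local_dir)
            files.append((path, key))

    files.sort(key=lambda f: f[1].endswith("saved_model.pb"))
    for path, key in files:
        s3.upload_file(path, "ROOT-BUCKET", key)

I'm not sure this fully closes the race, which is part of why I'm asking.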

Any clue about what's happening? Thanks!

-- luke035
kubernetes
tensorflow
tensorflow-serving

0 Answers