I'm serving TF models with TF Serving, running as a dedicated Deployment in a K8s cluster, and the models are stored in a MinIO instance deployed in the same namespace as TF Serving.
I configured the TF Serving container with the following environment variables so it can reach the S3 (MinIO) endpoint and sync the model files:
- name: MODEL_NAME
  value: model-name
- name: S3_ENDPOINT
  value: minio:9000
- name: S3_USE_HTTPS
  value: '0'
- name: S3_VERIFY_SSL
  value: '0'
- name: AWS_REGION
  value: 'us-west-1'
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      key: AWS_ACCESS_KEY_ID
      name: minio-secret
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      key: AWS_SECRET_ACCESS_KEY
      name: minio-secret
- name: MODEL_BASE_PATH
  value: s3://ROOT-BUCKET
- name: TF_CPP_MIN_LOG_LEVEL
  value: '2'
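With the stock serving image, MODEL_BASE_PATH and MODEL_NAME are combined, so the server polls s3://ROOT-BUCKET/model-name/<version>/ for numbered version directories. New versions get uploaded along these lines (just a sketch, not my exact upload script; boto3, the local export path, and the version number are illustrative):

import os
import boto3

# Client pointed at the in-cluster MinIO endpoint (S3_ENDPOINT with S3_USE_HTTPS=0).
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio:9000",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name="us-west-1",
)

local_export = "export/2"  # hypothetical local SavedModel export for version 2
for root, _, files in os.walk(local_export):
    for f in files:
        local_path = os.path.join(root, f)
        # TF Serving polls s3://ROOT-BUCKET/model-name/<version>/...
        key = "model-name/2/" + os.path.relpath(local_path, local_export)
        s3.upload_file(local_path, "ROOT-BUCKET", key)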
Everything works fine and I can query the server and get predictions, but whenever I load a new model version I always get the following error:
'{ "error": "Failed to process element: 0 key: decoder_state_input_h:0 of \\\'instances\\\' list. Error: Invalid argument: JSON object: does not have named input: decoder_state_input_h:0" }'
Killing the pod (i.e. restarting TF Serving) makes the new version load correctly. The problem seems to be a sync issue where TF Serving starts loading the servable before the S3 upload is complete.
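To check whether it's really a race between the S3 upload and the load, I can poll the model status endpoint right after pushing the new version (again just a sketch, with an assumed in-cluster host/port):

import time
import requests

# Poll which versions TF Serving has loaded and their state after an upload.
for _ in range(10):
    resp = requests.get("http://tf-serving:8501/v1/models/model-name")
    # e.g. {"model_version_status": [{"version": "2", "state": "AVAILABLE", ...}]}
    print(resp.json())
    time.sleep(5)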
Any clue about what's happening? Thanks!