Training and evaluating big neural network models (tensorflow) on google cloud.
Got the following error when evaluating my model:
W Resource exhausted: OOM when allocating tensor with shape[38633472,17]
W Ran out of memory trying to allocate 2.45GiB. See logs for memory state.
undefined
I think it has to do with the container memory limit.
Any help on that?
Which scale tier and machine type were you using? In case of OOM, you can try to use larger machine in CUSTOM tier.
If you still have the issue, please send us your project number and job id to cloudml-feedback@google.com, so that we can take a close look.