Simplest way to distribute TensorFlow training on premises?

12/27/2017

What is the simplest way to train TensorFlow models (using the Estimator API) distributed across the machines on a home network? It doesn't look like ml-engine local train lets you specify IP addresses.

-- rodrigo-silveira
distributed-computing
google-cloud-ml
kubernetes
machine-learning
tensorflow

2 Answers

12/28/2017

Your best bet is to use something like Kubernetes. The TensorFlow-on-Kubernetes project at https://github.com/tensorflow/k8s is a work in progress, but I believe it supports distributed training as well.

Alternatively, a couple of lower-tech automation options come to mind:

  1. You could have a launcher script that uses SSH to run the training script on each machine.
  2. You could have the individual workers poll a shared location for a file that signals them to download and execute a script.
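
As a rough sketch of the first option, the Python below builds one ssh command per machine. The host names, the script name train.py, and the ROLE variable are hypothetical placeholders, and it assumes key-based SSH login is already set up on every machine. The example only prints the commands, so it is safe to run anywhere.

```python
import shlex


def build_ssh_command(host, script_path, env=None):
    """Return an argv list that runs script_path on host over SSH.

    env is an optional dict of environment variables to set for the
    remote process; values are shell-quoted for the remote shell.
    """
    env = env or {}
    exports = " ".join(
        f"{name}={shlex.quote(value)}" for name, value in sorted(env.items())
    )
    remote = f"{exports} python {shlex.quote(script_path)}".strip()
    return ["ssh", host, remote]


# Hypothetical machines on the home network.
hosts = ["user@192.168.1.11", "user@192.168.1.12"]

commands = [
    build_ssh_command(host, "train.py", {"ROLE": "worker"}) for host in hosts
]

for argv in commands:
    # To actually launch the job, pass argv to subprocess.Popen(argv).
    print(" ".join(shlex.quote(part) for part in argv))
```

Starting each command with subprocess.Popen rather than subprocess.run would let all machines start without waiting for one another.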
-- Nikhil Kothari
Source: StackOverflow

12/27/2017

You can set the environment variable TF_CONFIG on each machine. Estimators parse it to learn the cluster layout and that machine's own role in it.
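
A minimal sketch of what that could look like, assuming a small cluster with one chief, two workers, and one parameter server. The IP addresses, the port, and the make_tf_config helper are made up for illustration; only the "cluster" and "task" keys are part of the TF_CONFIG format itself.

```python
import json
import os


def make_tf_config(cluster, task_type, task_index):
    """Serialize a TF_CONFIG value for one machine in the cluster."""
    return json.dumps(
        {
            "cluster": cluster,
            "task": {"type": task_type, "index": task_index},
        }
    )


# Hypothetical home-network addresses; every machine uses the same cluster dict.
cluster = {
    "chief": ["192.168.1.10:2222"],
    "worker": ["192.168.1.11:2222", "192.168.1.12:2222"],
    "ps": ["192.168.1.13:2222"],
}

# On the first worker machine: only the "task" entry differs per machine.
os.environ["TF_CONFIG"] = make_tf_config(cluster, "worker", 0)

print(os.environ["TF_CONFIG"])
```

The variable has to be set before the Estimator's RunConfig is created, for example at the top of the training script or in the shell that launches it. The same script then runs unchanged on every machine, typically through tf.estimator.train_and_evaluate.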

-- Guoqing Xu
Source: StackOverflow