Nightly TF / Cloned TFX - how to manage Image for Kubeflow?

7/21/2019

When I access my Kubeflow endpoint to upload and run a pipeline using a cloned TFX, the process starts hanging at the first step producing this message:

"This step is in Pending state with this message: ImagePullBackOff: Back-off pulling image "tensorflow/tfx:0.14.0dev", which is the same image used in the created pipeline yaml file.

My overall goal is to build an ExampleGen for tfrecords files, just as described in the guide here. The most recent tfx version in pip is 0.13 and does not yet include the necessary functions. For this reason, I install tf-nightly and clone/build tfx (dev-version 0.14). Doing so and installing some additional modules, e.g. tensorflow_data_validation, I can now create my pipeline using the tfx components and including an ExampleGen for tfrecords files. I finally build the pipeline with the KubeflowRunner. Yet this yields the error stated above.

I now wonder about an appropriate way to address this. I guess one way would be to build an image myself with the specified versions, but maybe there is a more practical way?

-- ronbor
kubeflow
kubernetes
tensorflow
tfx

1 Answer

7/24/2019

TFX doesn't have a nightly image build as yet. Currently, it defaults to using the image tagged with the version of the library you use to build the pipeline, hence the reason the tag is 0.14dev0. This is the current version at HEAD, see here: https://github.com/tensorflow/tfx/blob/a1f43af5e66f9548ae73eb64813509445843eb53/tfx/version.py#L17

You can build your own image and push it somewhere, for example gcr.io/your-gcp-project/your-image-name:tag, and specify that the pipeline use this image instead, by customizing the tfx_image argument to the pipeline: https://github.com/tensorflow/tfx/blob/74f9b6ab26c51ebbfb5d17826c5d5288a67dcf85/tfx/orchestration/kubeflow/base_component.py#L54

See for example: https://github.com/tensorflow/tfx/blob/b3796fc37bd4331a4e964c822502ba5096ad4bb6/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_kubeflow.py#L243

-- ajay
Source: StackOverflow