When I access my Kubeflow endpoint to upload and run a pipeline built with a cloned TFX, the run hangs at the first step with this message:
This step is in Pending state with this message: ImagePullBackOff: Back-off pulling image "tensorflow/tfx:0.14.0dev"
This is the same image referenced in the generated pipeline YAML file.
My overall goal is to build an ExampleGen for TFRecord files, as described in the guide here. The most recent TFX version on pip is 0.13, which does not yet include the necessary functionality. For this reason, I install tf-nightly and clone and build TFX from source (dev version 0.14). After installing some additional modules, e.g. tensorflow_data_validation, I can create my pipeline from the TFX components, including an ExampleGen for TFRecord files. Finally, I build the pipeline with the KubeflowRunner, which yields the error stated above.
How should I address this? One option would be to build an image myself with the specified versions, but is there a more practical way?
TFX does not publish nightly images yet. By default, it uses the image tagged with the version of the library you used to build the pipeline, which is why the tag is 0.14.0dev. This is the current version at HEAD, see here: https://github.com/tensorflow/tfx/blob/a1f43af5e66f9548ae73eb64813509445843eb53/tfx/version.py#L17
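The default described above can be illustrated with a small sketch. This is not TFX's actual source: the function name resolve_tfx_image is hypothetical, and it assumes the Kubeflow runner takes an optional tfx_image setting and otherwise falls back to tensorflow/tfx:<library version>, as the image name in the error message suggests.

```python
from typing import Mapping, Optional

# Image repository the runner falls back to (assumption, inferred from the
# image name in the ImagePullBackOff message).
DEFAULT_IMAGE_REPO = "tensorflow/tfx"


def resolve_tfx_image(library_version: str,
                      pipeline_args: Optional[Mapping[str, object]] = None) -> str:
    """Return the container image a TFX step would run in (hypothetical sketch).

    An explicit 'tfx_image' entry wins; otherwise the image is tagged with
    the version string of the installed tfx library.
    """
    if pipeline_args and pipeline_args.get("tfx_image"):
        return str(pipeline_args["tfx_image"])
    return f"{DEFAULT_IMAGE_REPO}:{library_version}"


# A tfx build from HEAD reports a dev version, so the default tag points at
# an image that is presumably not published, hence ImagePullBackOff.
print(resolve_tfx_image("0.14.0dev"))
# -> tensorflow/tfx:0.14.0dev
print(resolve_tfx_image("0.14.0dev",
                        {"tfx_image": "gcr.io/your-gcp-project/your-image-name:tag"}))
# -> gcr.io/your-gcp-project/your-image-name:tag
```

The point of the sketch is that the tag comes from the locally installed library, not from anything you configure, so a source build silently changes which image Kubeflow tries to pull.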
You can build your own image, push it to a registry your cluster can pull from (for example gcr.io/your-gcp-project/your-image-name:tag), and have the pipeline use this image instead by customizing the tfx_image argument to the pipeline: https://github.com/tensorflow/tfx/blob/74f9b6ab26c51ebbfb5d17826c5d5288a67dcf85/tfx/orchestration/kubeflow/base_component.py#L54
See for example: https://github.com/tensorflow/tfx/blob/b3796fc37bd4331a4e964c822502ba5096ad4bb6/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_kubeflow.py#L243
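A rough sketch of the whole workflow, assuming you have written a Dockerfile at the root of your TFX clone that installs tf-nightly and your locally built tfx. The Dockerfile path, the image URI, and the helper name custom_image_workflow are placeholders, not part of TFX. The helper only assembles the docker build and docker push commands and the extra pipeline argument; passing it as additional_pipeline_args={'tfx_image': ...} follows the taxi example linked above at that commit and may differ in other TFX versions.

```python
import shlex
from typing import Dict, List, Tuple


def custom_image_workflow(image: str,
                          context_dir: str = ".",
                          dockerfile: str = "Dockerfile",
                          ) -> Tuple[List[List[str]], Dict[str, str]]:
    """Return (shell commands, extra pipeline args) for a custom TFX image.

    Hypothetical helper: `dockerfile` is assumed to describe an image that
    installs your locally built tfx and tf-nightly.
    """
    commands = [
        # Build the image from your clone and tag it with the target URI.
        ["docker", "build", "-t", image, "-f", dockerfile, context_dir],
        # Push it to a registry the Kubeflow cluster can pull from.
        ["docker", "push", image],
    ]
    # Tells the pipeline to run every step in this image instead of the
    # default tensorflow/tfx:<version>.
    pipeline_args = {"tfx_image": image}
    return commands, pipeline_args


if __name__ == "__main__":
    cmds, args = custom_image_workflow("gcr.io/your-gcp-project/your-image-name:tag")
    for cmd in cmds:
        # Print each command in a copy-pasteable, shell-quoted form.
        print(" ".join(shlex.quote(part) for part in cmd))
    print(args)
```

You would then merge these arguments with any existing entries such as beam_pipeline_args when constructing the pipeline, as in the linked taxi example. Pushing to gcr.io typically requires running gcloud auth configure-docker once beforehand.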