Kubeflow example: understanding the context

11/13/2020

In this Kubeflow example https://github.com/kubeflow/examples/blob/master/financial_time_series/tensorflow_model/ml_pipeline.py, an ML pipeline is constructed whose steps trigger Python scripts via the command line.

This means that all the .py files being called (e.g. "python3 train.py --param value") have to be in the directory where the process runs. What I don't understand is where exactly I should put the .py files in the context of GCP.

Should I just copy them using Cloud shell? Or should I add git clone <repository with .py files> into my Dockerfile?

-- OlgaPp
google-cloud-platform
kubeflow
kubernetes
python

1 Answer

1/4/2021

To kickstart KFP development in Python, try the following tutorial: Data passing in Python components
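As a minimal sketch of the pattern that tutorial covers, assuming the KFP v1 SDK (the function and pipeline names here are made up for illustration): two function-based components, with the output of one passed to the other.

```python
import kfp
from kfp.components import create_component_from_func

def produce_number() -> int:
    """Produces a value for a downstream component."""
    return 42

def consume_number(number: int):
    """Consumes the upstream component's output."""
    print('Received:', number)

produce_op = create_component_from_func(produce_number)
consume_op = create_component_from_func(consume_number)

@kfp.dsl.pipeline(name='data-passing-demo')
def demo_pipeline():
    # KFP wires the producer's output into the consumer at run time.
    consume_op(number=produce_op().output)
```

The pipeline can then be compiled with kfp.compiler.Compiler().compile(demo_pipeline, 'pipeline.yaml') and uploaded to your KFP cluster.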

Should I just copy them using Cloud shell? Or should I add git clone <repository with .py files> into my Dockerfile?

Ideally, the files should be inside the container image (the Dockerfile method). This ensures maximum reproducibility.
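For example, a minimal sketch of that approach (the image name, script names, and dependency below are placeholders, not taken from the example repo): copy the scripts into the image at build time, then reference that image from the pipeline step.

```dockerfile
FROM python:3.8-slim
WORKDIR /app
# Bake the pipeline scripts into the image so every run
# executes exactly this version of the code.
COPY train.py preprocess.py ./
RUN pip install --no-cache-dir tensorflow
```

After building and pushing the image (e.g. with gcloud builds submit --tag gcr.io/<your-project>/financial-model:v1), a KFP v1 step can reference it, in the same ContainerOp style as ml_pipeline.py:

```python
import kfp.dsl as dsl

def train_step():
    # The scripts are already inside the image, so the command
    # can call them directly from the working directory (/app).
    return dsl.ContainerOp(
        name='train',
        image='gcr.io/<your-project>/financial-model:v1',  # hypothetical image
        command=['python3', 'train.py', '--param', 'value'],
    )
```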

For Python scripts that are not very complex, the lightweight Python components feature allows you to create a component from a Python function. In this case the script's code is stored in the component's command line, so you do not need to upload the code anywhere.
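A sketch of that, again assuming the KFP v1 SDK (the function and output file name are illustrative):

```python
from kfp.components import create_component_from_func

def add(a: float, b: float) -> float:
    """A trivial function turned into a self-contained component."""
    return a + b

# The function's source code is serialized into the component's
# command line; writing the spec out lets you inspect it.
add_op = create_component_from_func(
    add,
    base_image='python:3.8',
    output_component_file='add_component.yaml',
)
```

Opening add_component.yaml shows the function's source embedded in the component's command, which is why no separate upload of the code is needed.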

Putting scripts somewhere remote (e.g. cloud storage or website) is possible, but can reduce reliability and reproducibility.

-- Ark-kun
Source: StackOverflow