How to create a Kubeflow component from a PyTorch job?

4/16/2020

I've been ramping up on Kubeflow recently. My goal is to get PyTorch running in Kubeflow. I've gone through the documentation on creating a distributed PyTorch job here. I've also read through all the documentation on how to create pipelines / components in Kubeflow.

My question is: how can I now take a PyTorchJob, which is a Kubernetes resource, and run it as a component? The ultimate goal is to have my PyTorch code, which performs distributed training of a model, run within the component / pipeline framework of Kubeflow. How do multi-worker jobs fit into that framework?
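For concreteness, here is a minimal sketch of the kind of resource I mean, built as a Python dictionary. The job name, image, namespace, and worker count are hypothetical placeholders, and the field layout assumes the `kubeflow.org/v1` PyTorchJob schema used by the PyTorch operator; adjust it if your cluster runs a different version.

```python
import json


def build_pytorch_job(name, image, num_workers, namespace="kubeflow"):
    """Return a PyTorchJob manifest as a plain dict.

    Assumption: the kubeflow.org/v1 PyTorchJob schema. All argument
    values passed in are illustrative, not taken from a real deployment.
    """

    def replica(replicas):
        # Master and Worker share the same pod template in this sketch.
        return {
            "replicas": replicas,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # The PyTorch operator expects the container to be named "pytorch".
                    "containers": [{"name": "pytorch", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),  # exactly one master
                "Worker": replica(num_workers),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, used only to show the manifest shape.
    manifest = build_pytorch_job(
        "dist-train-example", "example.com/my-train:latest", num_workers=3
    )
    print(json.dumps(manifest, indent=2))
```

A manifest like this is what a pipeline step would ultimately have to submit to the cluster. The Kubeflow Pipelines v1 SDK has `kfp.dsl.ResourceOp`, which accepts a `k8s_resource` dict, but whether that is the intended route for a PyTorchJob is an assumption on my part and exactly what I am unsure about.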

The documentation gives plenty of information on how to run components from Python code, Docker containers, etc., but nothing on how to do it from a PyTorchJob or a Kubernetes job. This seems like an obvious use case to me, and I feel like I'm missing something, but I've gone through all the Kubeflow documentation I could find and searched further without finding anything on how to do this.

Would appreciate any help, thank you!

-- brenmcnamara
kubeflow
kubernetes
pytorch

0 Answers