I've been ramping up on Kubeflow recently. My goal is to get PyTorch running in Kubeflow. I've gone through the documentation on creating a distributed PyTorch job here. I've also read through all the documentation on how to create pipelines / components in Kubeflow.
My question is: how can I take a PyTorchJob, which is a Kubernetes custom resource, and run it as a component? The ultimate goal is to have my PyTorch code, which does distributed training of a model, run within the component / pipeline framework of Kubeflow. How do multi-worker jobs fit into that framework?
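For concreteness, here is roughly the shape of the resource I mean, written as a Python dict rather than YAML (the image name, command, and replica counts are placeholders, not my real training setup):

```python
# A minimal PyTorchJob manifest expressed as a Python dict.
# Image and command are placeholders for my actual training code.
# (The training operator expects the container to be named "pytorch".)
pytorch_job = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "my-distributed-training"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {
                "replicas": 1,
                "restartPolicy": "OnFailure",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "pytorch",
                            "image": "my-registry/my-training-image:latest",
                            "command": ["python", "train.py"],
                        }]
                    }
                },
            },
            "Worker": {
                "replicas": 3,
                "restartPolicy": "OnFailure",
                "template": {
                    "spec": {
                        "containers": [{
                            "name": "pytorch",
                            "image": "my-registry/my-training-image:latest",
                            "command": ["python", "train.py"],
                        }]
                    }
                },
            },
        }
    },
}
```

What I can't figure out is where a resource like this plugs into a pipeline definition.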
The documentation gives plenty of information on how to build components from Python code, Docker containers, etc., but nothing on how to build one from a PyTorchJob or any other Kubernetes resource. This seems like an obvious use case, and I feel like I'm missing something, but I've gone through all the Kubeflow documentation I could find and searched around for anything describing how to do this.
Would appreciate any help, thank you!