I'm running Argo Workflows on Kubernetes. After I followed this blog to set up JupyterHub, I started getting the following error on the Argo pods (I never had this issue before using JupyterHub):
failed to save outputs: timed out waiting for the condition
The job always fails if I add:
    resources:
      limits:
        nvidia.com/gpu: 1
If no GPU is requested, it sometimes succeeds (with retryStrategy, after occasional failures).
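For reference, here is a minimal sketch of where these two settings would sit in a Workflow manifest. It is not the actual manifest: the workflow name, template name, image, command, and retry limit are all placeholder assumptions chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: gpu-step-        # placeholder name (assumption)
spec:
  entrypoint: main
  templates:
    - name: main                 # placeholder template name (assumption)
      retryStrategy:
        limit: 3                 # illustrative retry count (assumption)
      container:
        image: my-image:latest   # placeholder image (assumption)
        command: ["python", "train.py"]   # placeholder command (assumption)
        resources:
          limits:
            nvidia.com/gpu: 1    # the GPU limit that makes the job fail every time
```

Removing the `resources` block from a template like this corresponds to the case where the job sometimes succeeds after retries.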
Could someone help me out?