I have an Argo workflow with two steps; the first runs on Linux and the second runs on Windows:
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: my-workflow-v1.13
spec:
  entrypoint: process
  volumeClaimTemplates:
    - metadata:
        name: workdir
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
  arguments:
    parameters:
      - name: jobId
        value: 0
  templates:
    - name: process
      steps:
        - - name: prepare
            template: prepare
        - - name: win-step
            template: win-step
    - name: win-step
      nodeSelector:
        kubernetes.io/os: windows
      container:
        image: mcr.microsoft.com/windows/nanoserver:1809
        command: ["cmd", "/c"]
        args: ["dir", "C:\\workdir\\source"]
        volumeMounts:
          - name: workdir
            mountPath: /workdir
    - name: prepare
      nodeSelector:
        kubernetes.io/os: linux
      inputs:
        artifacts:
          - name: src
            path: /opt/workdir/source.zip
            s3:
              endpoint: minio:9000
              insecure: true
              bucket: "{{workflow.parameters.jobId}}"
              key: "source.zip"
              accessKeySecret:
                name: my-minio-cred
                key: accesskey
              secretKeySecret:
                name: my-minio-cred
                key: secretkey
      script:
        image: garthk/unzip:latest
        imagePullPolicy: IfNotPresent
        command: [sh]
        source: |
          unzip /opt/workdir/source.zip -d /opt/workdir/source
        volumeMounts:
          - name: workdir
            mountPath: /opt/workdir
Both steps share a volume.
To achieve that in Azure Kubernetes Service, I had to create two node pools, one for Linux nodes and another for Windows nodes.
The problem is that when I queue the workflow, it sometimes completes, and sometimes win-step (the step that runs in the Windows container) hangs or fails and shows this message:
1 node(s) had volume node affinity conflict
I've read that this can happen because the volume gets provisioned in a specific zone and the Windows container, since it is in a different pool, gets scheduled in a different zone that doesn't have access to that volume, but I couldn't find a solution for that.
Please help.
"the first runs on Linux and the second runs on Windows"
I doubt that you can mount the same volume on both a Linux node, which typically uses the ext4 file system, and a Windows node; Azure Windows containers use the NTFS file system.
So the volume that you try to mount in the second step is located on a node pool that does not match your nodeSelector.
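One direction that might be worth trying is to back the shared workdir with a network file share instead of a disk that is formatted by and attached to a single node. The fragment below is only a sketch of how the volumeClaimTemplates block could look in that case. The StorageClass name azurefile-csi and the ReadWriteMany access mode are assumptions that are not taken from the question, so check what your cluster actually offers (for example with kubectl get storageclass), and note that I have not tested this against the workflow above.

```yaml
# Sketch only: a possible replacement for the volumeClaimTemplates block
# in the WorkflowTemplate spec. Nothing else in the workflow changes.
volumeClaimTemplates:
  - metadata:
      name: workdir                      # same claim name the two steps already mount
    spec:
      # Assumption: a file share that pods on different nodes can mount,
      # rather than a ReadWriteOnce disk tied to one node.
      accessModes: [ "ReadWriteMany" ]
      # Assumption: a file-share StorageClass with this name exists in the cluster;
      # substitute whichever class your cluster provides.
      storageClassName: azurefile-csi
      resources:
        requests:
          storage: 1Gi
```

Because the share is not formatted by whichever node mounts it first, this would sidestep the ext4 versus NTFS question, but whether it also removes the node affinity conflict depends on how the chosen StorageClass provisions its volumes.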