While trying to setup my own kubeflow pipeline I ran into a problem when one step is finished and the outputs should be saved. After finishing the step kubeflow always throws an error with the message This step is in Error state with this message: failed to save outputs: Error response from daemon: No such container: <container-id>
First I thought I would have made a mistake with my pipeline, but it's the same with the preexisting examples pipeline, e.g. for "[Sample] Basic - Conditional execution" I get this message after the first step (flip-coin) is finished.
The main container shows output:
heads
So it seems to have run successfully.
The wait container shows following output:
time="2019-06-07T11:41:35Z" level=info msg="Creating a docker executor"
time="2019-06-07T11:41:35Z" level=info msg="Executor (version: v2.2.0, build_date: 2018-08-30T08:52:54Z) initialized with template:\narchiveLocation:\n s3:\n accessKeySecret:\n key: accesskey\n name: mlpipeline-minio-artifact\n bucket: mlpipeline\n endpoint: minio-service.kubeflow:9000\n insecure: true\n key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666\n secretKeySecret:\n key: secretkey\n name: mlpipeline-minio-artifact\ncontainer:\n args:\n - python -c \"import random; result = 'heads' if random.randint(0,1) == 0 else 'tails';\n print(result)\" | tee /tmp/output\n command:\n - sh\n - -c\n image: python:alpine3.6\n name: \"\"\n resources: {}\ninputs: {}\nmetadata: {}\nname: flip-coin\noutputs:\n artifacts:\n - name: mlpipeline-ui-metadata\n path: /mlpipeline-ui-metadata.json\n - name: mlpipeline-metrics\n path: /mlpipeline-metrics.json\n parameters:\n - name: flip-coin-output\n valueFrom:\n path: /tmp/output\n"
time="2019-06-07T11:41:35Z" level=info msg="Waiting on main container"
time="2019-06-07T11:41:36Z" level=info msg="main container started with container ID: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c"
time="2019-06-07T11:41:36Z" level=info msg="Starting annotations monitor"
time="2019-06-07T11:41:36Z" level=info msg="docker wait 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c"
time="2019-06-07T11:41:36Z" level=info msg="Starting deadline monitor"
time="2019-06-07T11:41:37Z" level=error msg="`docker wait 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c` failed: Error response from daemon: No such container: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c\n"
time="2019-06-07T11:41:37Z" level=info msg="Main container completed"
time="2019-06-07T11:41:37Z" level=info msg="No sidecars"
time="2019-06-07T11:41:37Z" level=info msg="Saving output artifacts"
time="2019-06-07T11:41:37Z" level=info msg="Annotations monitor stopped"
time="2019-06-07T11:41:37Z" level=info msg="Saving artifact: mlpipeline-ui-metadata"
time="2019-06-07T11:41:37Z" level=info msg="Archiving 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-ui-metadata.json to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-ui-metadata.json - | gzip > /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Archiving completed"
time="2019-06-07T11:41:37Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2019-06-07T11:41:37Z" level=info msg="Saving from /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666/mlpipeline-ui-metadata.tgz)"
time="2019-06-07T11:41:37Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Saving artifact: mlpipeline-metrics"
time="2019-06-07T11:41:37Z" level=info msg="Archiving 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-metrics.json to /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/mlpipeline-metrics.json - | gzip > /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Archiving completed"
time="2019-06-07T11:41:37Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2019-06-07T11:41:37Z" level=info msg="Saving from /argo/outputs/artifacts/mlpipeline-metrics.tgz to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/conditional-execution-pipeline-vmdhx/conditional-execution-pipeline-vmdhx-2104306666/mlpipeline-metrics.tgz)"
time="2019-06-07T11:41:37Z" level=info msg="Successfully saved file: /argo/outputs/artifacts/mlpipeline-metrics.tgz"
time="2019-06-07T11:41:37Z" level=info msg="Saving output parameters"
time="2019-06-07T11:41:37Z" level=info msg="Saving path output parameter: flip-coin-output"
time="2019-06-07T11:41:37Z" level=info msg="[sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output - | tar -ax -O]"
time="2019-06-07T11:41:37Z" level=error msg="`[sh -c docker cp -a 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output - | tar -ax -O]` stderr:\nError: No such container:path: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c:/tmp/output\ntar: This does not look like a tar archive\ntar: Exiting with failure status due to previous errors\n"
time="2019-06-07T11:41:37Z" level=info msg="Alloc=4338 TotalAlloc=11911 Sys=10598 NumGC=4 Goroutines=11"
time="2019-06-07T11:41:37Z" level=fatal msg="exit status 2\ngithub.com/argoproj/argo/errors.Wrap\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:87\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:70\ngithub.com/argoproj/argo/workflow/executor/docker.(*DockerExecutor).GetFileContents\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/docker/docker.go:40\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/executor.go:343\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:49\ngithub.com/argoproj/argo/cmd/argoexec/commands.glob..func4\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:19\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:15\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:198\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361"
So it seems that there is a problem with either kubeflow or my docker daemon. The output of kubectl describe pods
for the created pod is following:
Name: conditional-execution-pipeline-vmdhx-2104306666
Namespace: kubeflow
Priority: 0
PriorityClassName: <none>
Node: root-nuc8i5beh/9.233.5.90
Start Time: Fri, 07 Jun 2019 13:41:29 +0200
Labels: workflows.argoproj.io/completed=true
workflows.argoproj.io/workflow=conditional-execution-pipeline-vmdhx
Annotations: workflows.argoproj.io/node-message:
Error response from daemon: No such container: 7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c
workflows.argoproj.io/node-name: conditional-execution-pipeline-vmdhx.flip-coin
workflows.argoproj.io/template:
{"name":"flip-coin","inputs":{},"outputs":{"parameters":[{"name":"flip-coin-output","valueFrom":{"path":"/tmp/output"}}],"artifacts":[{"na...
Status: Failed
IP: 10.1.1.30
Controlled By: Workflow/conditional-execution-pipeline-vmdhx
Containers:
main:
Container ID: containerd://7e3064415736db584cac5598a2b2a28728e11c03014ac67a05d008ad8119b13c
Image: python:alpine3.6
Image ID: docker.io/library/python@sha256:766a961bf699491995cc29e20958ef11fd63741ff41dcc70ec34355b39d52971
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
python -c "import random; result = 'heads' if random.randint(0,1) == 0 else 'tails'; print(result)" | tee /tmp/output
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 07 Jun 2019 13:41:35 +0200
Finished: Fri, 07 Jun 2019 13:41:35 +0200
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-xh2p7 (ro)
wait:
Container ID: containerd://f0449dc70c0a651c09aeb883edda9ce0ec5e415fa15a5468fe5b360fb06637c2
Image: argoproj/argoexec:v2.2.0
Image ID: docker.io/argoproj/argoexec@sha256:eea81e0b0d8899a0b7f9815c9c7bd89afa73ab32e5238430de82342b3bb7674a
Port: <none>
Host Port: <none>
Command:
argoexec
Args:
wait
State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 07 Jun 2019 13:41:35 +0200
Finished: Fri, 07 Jun 2019 13:41:37 +0200
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: conditional-execution-pipeline-vmdhx-2104306666 (v1:metadata.name)
Mounts:
/argo/podmetadata from podmetadata (rw)
/var/lib/docker from docker-lib (ro)
/var/run/docker.sock from docker-sock (ro)
/var/run/secrets/kubernetes.io/serviceaccount from pipeline-runner-token-xh2p7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
docker-lib:
Type: HostPath (bare host directory volume)
Path: /var/lib/docker
HostPathType: Directory
docker-sock:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType: Socket
pipeline-runner-token-xh2p7:
Type: Secret (a volume populated by a Secret)
SecretName: pipeline-runner-token-xh2p7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m1s default-scheduler Successfully assigned kubeflow/conditional-execution-pipeline-vmdhx-2104306666 to root-nuc8i5beh
Normal Pulling 8m1s kubelet, root-nuc8i5beh Pulling image "python:alpine3.6"
Normal Pulled 7m56s kubelet, root-nuc8i5beh Successfully pulled image "python:alpine3.6"
Normal Created 7m56s kubelet, root-nuc8i5beh Created container main
Normal Started 7m55s kubelet, root-nuc8i5beh Started container main
Normal Pulled 7m55s kubelet, root-nuc8i5beh Container image "argoproj/argoexec:v2.2.0" already present on machine
Normal Created 7m55s kubelet, root-nuc8i5beh Created container wait
Normal Started 7m55s kubelet, root-nuc8i5beh Started container wait
So probably there is a problem with the argoexec container image? I see it tries to mount /var/run/docker.sock. When I try to read this file with cat
I get a "No such device or address" even though I can see the file with ls /var/run
. When I try to open it with vi
it mentions that the Permissions were denied, so I cannot see inside of the file. Is this the usual behavior with this file or does it seem like there are any problems with it?
I would really appreciate any help I can get! Thank you guys!
The problem is an upstream issue with kubeflow pipelines on microk8s, which is not working well together: https://github.com/kubeflow/kubeflow/issues/2347
I switched to Minikube, on which Kubeflow pipelines is running fine.