I have set a Jenkins master (on a VM) and this is provisioning jnlp
slaves as kubernetes
pods.
In very rare occasions, the pipeline fails, with this message:
java.io.IOException: Pipe closed
at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
at java.io.PipedInputStream.receive(PipedInputStream.java:226)
at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
at java.io.OutputStream.write(OutputStream.java:75)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.setupEnvironmentVariable(ContainerExecDecorator.java:510)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:474)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:333)
at hudson.Launcher$ProcStarter.start(Launcher.java:455)
Viewing kubernetes
logs Stackdriver
in Stackdriver, one can see that the pod does manage to connect to the master, e.g.
Handshaking
Agent discovery successful
Trying protocol: JNLP4-Connect
Remote Identity confirmed: <some_hash_here>
Connecting to <jenkins-master-url>:49187
started container
loading plugin ...
but after a while it fails and here are the relevant logs:
org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled slave engine reconnects.
hudson.remoting.jnlp.Main$CuiListener status
Terminated
hudson.remoting.Request$2 run
Failed to send back a reply to the request hudson.remoting.Request$2@336ec321: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@29d0e8b2:JNLP4-connect connection to <jenkins-master-url>/<public-ip-of-jenkins-master>:49187": channel is already closed
"Processing signal 'terminated'"
.
.
.
How can I further troubleshoot this random error?
Can you take a look at the Kubernetes Pod-Events with Stackdriver? We had a similar behavior with a different CI-System (GitlabCI). Our builds where also randomly failing. It turned out that the JVM inside the Container exceeded its memory limitation and was killed by Kubernetes (OOMKilled) and the CI-System recognised this as a build error.