Jenkins slave pod on Kubernetes randomly failing

2/10/2020

I have set up a Jenkins master (on a VM), and it provisions jnlp slaves as Kubernetes pods.

On very rare occasions, the pipeline fails with this message:

java.io.IOException: Pipe closed
    at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:260)
    at java.io.PipedInputStream.receive(PipedInputStream.java:226)
    at java.io.PipedOutputStream.write(PipedOutputStream.java:149)
    at java.io.OutputStream.write(OutputStream.java:75)
    at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.setupEnvironmentVariable(ContainerExecDecorator.java:510)
    at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:474)
    at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:333)
    at hudson.Launcher$ProcStarter.start(Launcher.java:455)

Viewing the Kubernetes logs in Stackdriver, one can see that the pod does manage to connect to the master, e.g.

Handshaking
Agent discovery successful
Trying protocol: JNLP4-Connect
Remote Identity confirmed: <some_hash_here>
Connecting to <jenkins-master-url>:49187
started container
loading plugin ...

but after a while it fails; here are the relevant logs:

org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave$SlaveDisconnector call
INFO: Disabled slave engine reconnects.
hudson.remoting.jnlp.Main$CuiListener status
Terminated
hudson.remoting.Request$2 run
 Failed to send back a reply to the request hudson.remoting.Request$2@336ec321: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@29d0e8b2:JNLP4-connect connection to <jenkins-master-url>/<public-ip-of-jenkins-master>:49187": channel is already closed
"Processing signal 'terminated'"
.
.
.

How can I further troubleshoot this random error?

-- pkaramol
jenkins
kubernetes

1 Answer

2/10/2020

Can you take a look at the Kubernetes pod events in Stackdriver? We had similar behavior with a different CI system (GitLab CI): our builds were also randomly failing. It turned out that the JVM inside the container exceeded its memory limit and was killed by Kubernetes (OOMKilled), and the CI system recognised this as a build error.
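
To confirm this, check the agent pod's container status and events. A minimal sketch, assuming kubectl access to the cluster; the pod name jenkins-agent-xxxxx is a placeholder, and the pod must still exist (if Jenkins has already deleted it, the Stackdriver logs are the only record):

# Prints "OOMKilled" if a container in the pod was killed for exceeding its memory limit
kubectl get pod jenkins-agent-xxxxx -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

# Shows recent events for the pod (kills, evictions, scheduling issues)
kubectl describe pod jenkins-agent-xxxxx

# Lists cluster events in chronological order
kubectl get events --sort-by=.metadata.creationTimestamp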

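If the jnlp container is indeed being OOMKilled, giving it more memory in the pod template is one way to address it. A minimal sketch of a scripted pipeline using the Jenkins Kubernetes plugin (assuming a plugin version that exposes POD_LABEL); the image and memory values are placeholders to adapt, not recommendations:

// Hypothetical pod template; raise the limit above the JVM's observed peak usage
podTemplate(containers: [
    containerTemplate(
        name: 'jnlp',
        image: 'jenkins/inbound-agent:latest',
        resourceRequestMemory: '512Mi',  // placeholder request
        resourceLimitMemory: '1Gi'       // placeholder limit
    )
]) {
    node(POD_LABEL) {
        stage('Build') {
            sh 'echo build steps here'
        }
    }
}

Note that the JVM also has to respect the limit: on Java 8u191+ the container support enabled by default (-XX:+UseContainerSupport) sizes the default heap from the container's memory limit, while older JVMs need an explicit -Xmx.
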
-- Alex
Source: StackOverflow