Jenkins slave pod is offline

9/24/2020

I am having issue with slave pods not being able to connect to Jenkins master.

This is the Jenkins build output

[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Still waiting to schedule task
‘ci-xprj2-2z8qp’ is offline

I can see this in the Jenkins pod log

2020-09-24 20:16:57.778+0000 [id=6228]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Created Pod: infrastructure/ci-xprj2-2tqzn
2020-09-24 20:16:57.778+0000 [id=24]	INFO	hudson.slaves.NodeProvisioner#lambda$update$6: Kubernetes Pod Template provisioning successfully completed. We have now 2 computer(s)
2020-09-24 20:16:57.779+0000 [id=24]	INFO	o.c.j.p.k.KubernetesCloud#provision: Excess workload after pending Kubernetes agents: 0
2020-09-24 20:16:57.779+0000 [id=24]	INFO	o.c.j.p.k.KubernetesCloud#provision: Template for label ci: Kubernetes Pod Template
2020-09-24 20:16:57.839+0000 [id=5801]	INFO	o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-09-24 20:16:59.902+0000 [id=6228]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Pod is running: infrastructure/ci-xprj2-2tqzn
2020-09-24 20:16:59.906+0000 [id=6228]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (0/100): ci-xprj2-2tqzn
2020-09-24 20:17:00.911+0000 [id=6228]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (1/100): ci-xprj2-2tqzn
2020-09-24 20:17:01.917+0000 [id=6228]	INFO	o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (2/100): ci-xprj2-2tqzn

The log from ci-xprj2-2tqzn shows this:

Sep 24, 2020 8:18:59 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: ci-xprj2-29g0p
Sep 24, 2020 8:18:59 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Sep 24, 2020 8:18:59 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Sep 24, 2020 8:18:59 PM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Sep 24, 2020 8:18:59 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins1:8080/]
Sep 24, 2020 8:19:19 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
java.io.IOException: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:217)
	at hudson.remoting.Engine.innerRun(Engine.java:693)
	at hudson.remoting.Engine.run(Engine.java:518)
Caused by: java.net.UnknownHostException: jenkins1
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
	at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:214)
	... 2 more

My Jenkins config looks like this: enter image description here

enter image description here

Any help?

-- Bob
jenkins
jenkins-pipeline
kubernetes

1 Answer

9/25/2020

It looks like the error to focus on would be:

SEVERE: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
java.io.IOException: Failed to connect to http://jenkins1:8080/tcpSlaveAgentListener/: jenkins1
...
Caused by: java.net.UnknownHostException

which means jenkins1 can't be resolved.

  • If jenkins1 corresponds to a Kubernetes service name, I would double check its name and details and then spin up another pod in your namespace that sleeps for a while so that you can exec in and see if you can resolve jenkins1.
kubectl exec -it <sleep-test-pod-name> /bin/bash

ping jenkins1
nslookup jenkins1  #install nslookup if not already installed
  • If jenkins1 corresponds to one of those single word domains you sometimes see at corporations, then I would double check your search prefixes in /etc/resolv.conf in your pods:
cat /etc/resolv.conf
-- Howard_Roark
Source: StackOverflow