Jenkins agent pod fails to run on Kubernetes

1/28/2020

I have installed Jenkins on a GKE cluster using the stable Helm chart.

I am able to access and login to the UI.

However, when I try to run a simple job, the agent pod fails to be created.

The Jenkins master logs are not very informative about the cause:

jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.523+0000 [id=184]  WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: default-ld008, template=PodTemplate{inheritFrom='', name='default', slaveConnectTimeout=0, label='jenkins-kos-jenkins-slave ', serviceAccount='default', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], containers=[ContainerTemplate{name='jnlp', image='jenkins/jnlp-slave:3.27-1', workingDir='/home/jenkins/agent', command='', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='500m', resourceRequestMemory='1Gi', resourceLimitCpu='4000m', resourceLimitMemory='8Gi', envVars=[ContainerEnvVar [getValue()=http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins, getKey()=JENKINS_URL]]}]}
jenkins-kos-58586644f9-vh278 jenkins java.lang.IllegalStateException: Pod has terminated containers: jenkins/default-ld008 (jnlp)
jenkins-kos-58586644f9-vh278 jenkins    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:166)
jenkins-kos-58586644f9-vh278 jenkins    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:187)
jenkins-kos-58586644f9-vh278 jenkins    at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:127)
jenkins-kos-58586644f9-vh278 jenkins    at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:132)
jenkins-kos-58586644f9-vh278 jenkins    at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:290)
jenkins-kos-58586644f9-vh278 jenkins    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
jenkins-kos-58586644f9-vh278 jenkins    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
jenkins-kos-58586644f9-vh278 jenkins    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
jenkins-kos-58586644f9-vh278 jenkins    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
jenkins-kos-58586644f9-vh278 jenkins    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
jenkins-kos-58586644f9-vh278 jenkins    at java.lang.Thread.run(Thread.java:748)
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.524+0000 [id=184]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent default-ld008
jenkins-kos-58586644f9-vh278 jenkins Terminated Kubernetes instance for agent jenkins/default-ld008
jenkins-kos-58586644f9-vh278 jenkins Disconnected computer default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.559+0000 [id=184]  INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:46.560+0000 [id=184]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer default-ld008
jenkins-kos-58586644f9-vh278 jenkins 2020-01-28 18:30:56.009+0000 [id=53

Here are the Kubernetes events for the agent pod:

0s          Normal    Scheduled                pod/default-zkwp4                   Successfully assigned jenkins/default-zkwp4 to gke-kos-nodepool1-kq69
0s          Normal    Pulled                   pod/default-zkwp4                   Container image "docker.io/istio/proxyv2:1.4.0" already present on machine
0s          Normal    Created                  pod/default-zkwp4                   Created container
0s          Normal    Started                  pod/default-zkwp4                   Started container
0s          Normal    Pulled                   pod/default-zkwp4                   Container image "jenkins/jnlp-slave:3.27-1" already present on machine
0s          Normal    Created                  pod/default-zkwp4                   Created container
0s          Normal    Started                  pod/default-zkwp4                   Started container
0s          Normal    Pulled                   pod/default-zkwp4                   Container image "docker.io/istio/proxyv2:1.4.0" already present on machine
1s          Normal    Created                  pod/default-zkwp4                   Created container
0s          Normal    Started                  pod/default-zkwp4                   Started container
0s          Warning   Unhealthy                pod/default-zkwp4                   Readiness probe failed: Get http://10.15.2.113:15020/healthz/ready: dial tcp 10.15.2.113:15020: connect: connection refused
0s          Warning   Unhealthy                pod/default-zkwp4                   Readiness probe failed: Get http://10.15.2.113:15020/healthz/ready: dial tcp 10.15.2.113:15020: connect: connection refused
0s          Normal    Killing                  pod/default-zkwp4                   Killing container with id docker://istio-proxy:Need to kill Pod

The TCP port for agent communication is fixed at 50000.


I am using jenkins/jnlp-slave:3.27-1 as the agent image.

Any ideas what might be causing this?

UPDATE 1: Here is a gist with the description of the failed agent.

UPDATE 2: I managed to pinpoint the actual error in the jnlp container logs using Stackdriver (although I am not yet aware of the root cause):

SEVERE: Failed to connect to http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/: Connection refused (Connection refused)

UPDATE 3: Here comes the weirdest part: from a pod I spun up within the jenkins namespace, the Jenkins service is perfectly reachable:

/ # dig +short jenkins-kos.jenkins.svc.cluster.local
10.14.203.189
/ # nc -zv -w 3 jenkins-kos.jenkins.svc.cluster.local 8080
jenkins-kos.jenkins.svc.cluster.local (10.14.203.189:8080) open
/ # curl http://jenkins-kos.jenkins.svc.cluster.local:8080/jenkins/tcpSlaveAgentListener/


  Jenkins

UPDATE 4: I can confirm that this occurs on a GKE cluster using Istio 1.4.0 but NOT on another one using Istio 1.1.15.
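To verify on each cluster that the difference really is sidecar injection, you can check the namespace's injection label and inspect the containers of a failing agent pod (the pod name below is taken from the events above; adjust to the pod you see failing):

```shell
# Check whether automatic sidecar injection is enabled for the jenkins namespace
kubectl get namespace jenkins --show-labels

# List the containers of a failed agent pod; an injected pod will include istio-proxy
kubectl -n jenkins get pod default-zkwp4 -o jsonpath='{.spec.containers[*].name}'
```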

-- pkaramol
jenkins
kubernetes

1 Answer

2/12/2020

You can disable Istio sidecar injection for the agent pods.

Go to Manage Jenkins -> Configuration -> Kubernetes Cloud.

In the pod template, open the Annotations section and add the following annotation:

sidecar.istio.io/inject: "false"
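If you manage the Kubernetes cloud as code instead of through the UI (for example via the helm chart's Configuration as Code support), the same annotation can be set on the pod template. The fragment below is a sketch only; the exact key names are assumptions and should be checked against your kubernetes-plugin and chart versions:

```yaml
# Hypothetical JCasC fragment for the kubernetes plugin; verify key names for your version
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        templates:
          - name: "default"
            annotations:
              # Tell Istio's mutating webhook not to inject the sidecar into agent pods
              - key: "sidecar.istio.io/inject"
                value: "false"
```

This keeps the jnlp container from being killed along with the istio-proxy sidecar while still leaving injection enabled for the rest of the namespace.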
-- Bharathi
Source: StackOverflow