Jenkins suddenly started failing to provision agents in Amazon EKS

8/30/2019

We are using the Kubernetes plugin to provision agents in EKS. Around 8:45 pm EST yesterday, with no apparent changes on our end (I'm the only admin, and I certainly wasn't doing anything then), agent provisioning started failing. I have rebooted the EKS node and the Jenkins master, and I can confirm that kubectl works fine and lists one node running.
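For reference, the checks from the Jenkins master were roughly as follows (approximate; the jenkins namespace matches the pod template in the log below):

    # Confirm the master can reach the EKS API server and see the node
    kubectl get nodes
    # Client and server versions, for reference
    kubectl version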

I'm suspecting something must have changed on the AWS side of things.

What's odd is that those ALPN errors don't show up anywhere else in our logs until just before this started happening. Googling around, I see people saying to ignore these "info" messages because the Java version doesn't support ALPN, but the fact that it complains about "HTTP/2" makes me wonder whether Amazon changed something on their end to require HTTP/2.

I know this might seem too specific for an SO question, but if something did change on the AWS side that broke compatibility, I think this would be the right place to ask.

From the Jenkins log at around 8:45:

INFO: Docker Container Watchdog check has been completed
Aug 29, 2019 8:42:05 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished DockerContainerWatchdog Asynchronous Periodic Work. 0 ms
Aug 29, 2019 8:45:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Excess workload after pending Kubernetes agents: 1
Aug 29, 2019 8:45:04 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud provision
INFO: Template for label eks: Kubernetes Pod Template
Aug 29, 2019 8:45:04 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 29, 2019 8:45:04 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning Kubernetes Pod Template from eks with 1 executors. Remaining excess workload: 0
Aug 29, 2019 8:45:14 PM hudson.slaves.NodeProvisioner$2 run
INFO: Kubernetes Pod Template provisioning successfully completed. We have now 3 computer(s)
Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
INFO: Created Pod: jenkins-eks-39hfp in namespace jenkins
Aug 29, 2019 8:45:14 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Aug 29, 2019 8:45:14 PM io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1 onFailure
WARNING: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
        at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
        at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher launch
WARNING: Error in provisioning; agent=KubernetesSlave name: jenkins-eks-39hfp, template=PodTemplate{inheritFrom='', name='jenkins-eks', namespace='jenkins', slaveConnectTimeout=300, label='eks', nodeSelector='', nodeUsageMode=NORMAL, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], volumes=[HostPathVolume [mountPath=/var/run/docker.sock, hostPath=/var/run/docker.sock], EmptyDirVolume [mountPath=/tmp/build, memory=false]], containers=[ContainerTemplate{name='jnlp', image='infra-docker.artifactory.mycompany.io/jnlp-docker:latest', alwaysPullImage=true, workingDir='/home/jenkins/work', command='', args='-url http://jenkins.mycompany.io:8080 ${computer.jnlpmac} ${computer.name}', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceLimitCpu='', resourceLimitMemory='', envVars=[KeyValueEnvVar [getValue()=/home/jenkins, getKey()=HOME]], livenessProbe=org.csanchez.jenkins.plugins.kubernetes.ContainerLivenessProbe@2043f440}], envVars=[KeyValueEnvVar [getValue()=/tmp/build, getKey()=BUILDDIR]], imagePullSecrets=[org.csanchez.jenkins.plugins.kubernetes.PodImagePullSecret@40ba07e2]}
io.fabric8.kubernetes.client.KubernetesClientException:
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:198)
        at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
        at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Aug 29, 2019 8:45:14 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminating Kubernetes instance for agent jenkins-eks-39hfp
Aug 29, 2019 8:45:14 PM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (OkHttp Dispatcher/255634) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@2c315338 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2bddc643[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
        at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
        at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:300)
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:48)
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:213)
        at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
        at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
-- Isaac Freeman
amazon-web-services
jenkins
kubernetes

4 Answers

8/30/2019

It's a bit difficult to say without direct troubleshooting, but the log points to Jenkins (through the Java Kubernetes client library) not being able to talk to the kube-apiserver because its requests are being denied (the HTTP 403 on the watch connection).

It would be worth checking whether you can still talk to the cluster with a previously working KUBECONFIG using standard kubectl.
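For example, something along these lines (the kubeconfig path and namespace are placeholders):

    # Point kubectl at the kubeconfig that worked before the failure
    KUBECONFIG=/path/to/previously-working/kubeconfig kubectl get nodes
    # Check that those credentials can still create pods where Jenkins launches agents
    KUBECONFIG=/path/to/previously-working/kubeconfig kubectl auth can-i create pods -n jenkins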

I speculate that the reason for the changed behavior could be due to an automatic EKS upgrade of your minor version. For example, EKS recently released a patch (~08/30/19) to address CVE-2019-9512 and CVE-2019-9514.
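If you want to confirm whether the control plane changed, the cluster's Kubernetes and EKS platform versions can be checked like this (cluster name and region are placeholders):

    # Kubernetes version of the EKS control plane
    aws eks describe-cluster --name my-cluster --region us-east-1 --query 'cluster.version'
    # EKS platform version, which is what the automatic patching bumps
    aws eks describe-cluster --name my-cluster --region us-east-1 --query 'cluster.platformVersion'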

P.S. I don't think the issue is related to the HTTP/2 connection being dropped.

-- Rico
Source: StackOverflow

9/5/2019

Just update to the latest version of the Kubernetes plugin (1.18.3) and restart Jenkins. This worked for me.
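For anyone doing it from the command line instead of the plugin manager UI, something like the following should work (the Jenkins URL and credentials are placeholders):

    # Install/upgrade the Kubernetes plugin by its plugin ID, then restart Jenkins
    java -jar jenkins-cli.jar -s http://jenkins.example.com:8080/ -auth admin:API_TOKEN install-plugin kubernetes -restart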

-- Raghu Ram
Source: StackOverflow

8/30/2019

Ran into this today as AWS just pushed the patch for the Go net/http CVE to K8s 1.12.x clusters. That patch apparently broke the version of the Kubernetes plugin we were on. Updating to the latest version of the plugin (1.18.3) resolved the issue.
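One rough way to tell whether your control plane has picked up a new build is to look at the server version string reported by kubectl, which includes the EKS build and may change when AWS patches the control plane (output format varies by kubectl version):

    # The Server Version line shows the build the EKS control plane is running
    kubectl version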

https://issues.jenkins-ci.org/browse/JENKINS-59000?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel

-- DiamonW
Source: StackOverflow

8/30/2019

Just realized my Kubernetes plugin had an update available. Applied that and it seems to be working fine now.

-- Isaac Freeman
Source: StackOverflow