Endpoints are not being updated with new IP address of a pod

8/13/2019

Platform: AWS EKS

Output of helm version:

Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.2", GitCommit:"a8b13cc5ab6a7dbef0a58f5061bcc7c0c61598e7", GitTreeState:"clean"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.10-eks-2e569f", GitCommit:"2e569fd887357952e506846ed47fc30cc385409a", GitTreeState:"clean", BuildDate:"2019-07-25T23:13:33Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): AWS EKS

The problem: after the Jenkins pod restarted, it got a new IP address. I expected the endpoints to be updated once the readinessProbe passes, but they are not.

kubectl get endpoints
jenkins                                  <none>
jenkins-agent                            <none>

Error:

Readiness probe failed: Get http://192.168.0.109:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I can successfully access the above URL from all pods and worker nodes, and I get the correct headers.

This started after a helm upgrade of Jenkins failed and I rolled back the release. The rollback succeeded, except that the endpoints are no longer updated. Now I have to edit the Endpoints manually to point them at the pod's correct IP address.

The current readinessProbe from the deployment is:

    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /login
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

Events from the Jenkins pod:

Events:
  Type     Reason                  Age    From                                                  Message
  ----     ------                  ----   ----                                                  -------
  Normal   Scheduled               8m13s  default-scheduler                                     Successfully assigned default/jenkins-pod-id to <ip>.<region>.compute.internal
  Normal   SuccessfulAttachVolume  8m6s   attachdetach-controller                               AttachVolume.Attach succeeded for volume "jenkins"
  Normal   Pulling                 8m4s   kubelet, <ip>.<region>.compute.internal  pulling image "jenkins/jenkins:2.176.2-alpine"
  Normal   Pulled                  7m57s  kubelet, <ip>.<region>.compute.internal  Successfully pulled image "jenkins/jenkins:2.176.2-alpine"
  Normal   Created                 7m56s  kubelet, <ip>.<region>.compute.internal  Created container
  Normal   Started                 7m56s  kubelet, <ip>.<region>.compute.internal  Started container
  Normal   Pulling                 7m43s  kubelet, <ip>.<region>.compute.internal  pulling image "jenkins/jenkins:2.176.2-alpine"
  Normal   Pulled                  7m42s  kubelet, <ip>.<region>.compute.internal  Successfully pulled image "jenkins/jenkins:2.176.2-alpine"
  Normal   Created                 7m42s  kubelet, <ip>.<region>.compute.internal  Created container
  Normal   Started                 7m42s  kubelet, <ip>.<region>.compute.internal  Started container
  Warning  Unhealthy               6m40s  kubelet, <ip>.<region>.compute.internal  Readiness probe failed: Get http://<IP>:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

The pod gets an IP almost instantly, but the container takes a few minutes to start. How can I get the readinessProbe to update the Endpoints, or at least get the readinessProbe logs? This is running on AWS EKS, so I have no access to the controller to get more logs.

If I update the endpoints quickly enough, the readinessProbe does not fail, but this doesn't help the next time the pod restarts.

Update: I just enabled EKS logs and got this:

deployment_controller.go:484] Error syncing deployment default/jenkins: Operation cannot be fulfilled on deployments.apps "jenkins": the object has been modified; please apply your changes to the latest version and try again
-- tr53
amazon-web-services
aws-eks
eks
kubernetes

1 Answer

8/13/2019

The steps below helped. The readiness probe is still failing, but that is because Jenkins takes 90s to start. I will update this.

helm delete jenkins
release "jenkins" deleted


helm rollback jenkins 25
Rollback was a success! Happy Helming!
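
The remaining probe failures seem to come from Jenkins needing about 90s to start, while the probe begins after 60s and allows only 1s per request, so relaxing the probe timing may be enough. A minimal sketch, assuming the same /login check as in the question; the values 120 and 5 are illustrative guesses, not tested settings:

```yaml
readinessProbe:
  httpGet:
    path: /login
    port: http
    scheme: HTTP
  initialDelaySeconds: 120  # assumed value: longer than the ~90s Jenkins startup
  periodSeconds: 10
  timeoutSeconds: 5         # assumed value: the original 1s timed out awaiting headers
  successThreshold: 1
  failureThreshold: 3
```

A pod is listed in the Service's Endpoints only while its readiness probe passes, which would be consistent with kubectl get endpoints showing <none> while the probe was failing.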
-- tr53
Source: StackOverflow