When starting cert-manager I get the following message:
TLS handshake error from 10.42.152.128:38676: EOF
$ kubectl -n cert-manager logs cert-manager-webhook-8575f88c85-l4tlw
I0214 19:41:28.147106 1 main.go:64] "msg"="enabling TLS as certificate file flags specified"
I0214 19:41:28.147365 1 server.go:126] "msg"="listening for insecure healthz connections" "address"=":6080"
I0214 19:41:28.147418 1 server.go:138] "msg"="listening for secure connections" "address"=":10250"
I0214 19:41:28.147437 1 server.go:155] "msg"="registered pprof handlers"
I0214 19:41:28.147570 1 tls_file_source.go:144] "msg"="detected private key or certificate data on disk has changed. reloading certificate"
2020/02/14 19:43:32 http: TLS handshake error from 10.42.152.128:38676: EOF
Interestingly, there is no pod with that IP:
$ kubectl get pod -o wide --all-namespaces | grep 128
cert-manager cert-manager-webhook-8575f88c85-l4tlw 1/1 Running 0 4m56s 10.42.112.128 node002 <none> <none>
There is a similar error in the cert-manager pod:
E0214 19:38:22.540589 1 controller.go:131] cert-manager/controller/ingress-shim "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout" "key"="kube-system/dashboard-kubernetes-dashboard"
I have two ClusterIssuers:
$ kubectl get ClusterIssuer --namespace cert-manager
NAME READY AGE
letsencrypt-prd True 42d
letsencrypt-stg True 42d
But no certificate yet:
$ kubectl get certificate --all-namespaces
No resources found
When I try to request a certificate, I get the same error:
$ kubectl apply -f mycert.yml
Error from server (InternalError): error when creating "cert-wyssmann-dev.yml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout
I am not sure how exactly I can get to the bottom of the problem. I ran sonobuoy to see if it would help, but the tests failed on 2 of my 3 nodes.
Plugin: e2e
Status: failed
Total: 1
Passed: 0
Failed: 1
Skipped: 0
Failed tests:
Container e2e is in a terminated state (exit code 1) due to reason: Error:
Plugin: systemd-logs
Status: failed
Total: 3
Passed: 1
Failed: 2
Skipped: 0
Failed tests:
timeout waiting for results
For the failing nodes I can see this in the sonobuoy logs:
E0214 19:38:22.540589 1 controller.go:131] cert-manager/controller/ingress-shim "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout" "key"="kube-system/dashboard-kubernetes-dashboard"
If you really don't need the webhook, one quick way to solve this is to disable it, as described in the documentation.
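As a rough sketch of what disabling the webhook might look like, assuming cert-manager was installed from the jetstack/cert-manager Helm chart and that your chart version still exposes the webhook.enabled value (older v0.x charts did; check the documentation for your version, since later releases may require the webhook). The release name and namespace below are assumptions, not taken from the question, and the helper name disable_webhook_cmd is made up for illustration. It only prints the command so you can review it before running anything:

```shell
#!/bin/sh
# Print (do not run) a Helm command that switches the cert-manager webhook off.
# ASSUMPTIONS: install was done via the jetstack/cert-manager chart, and the
# chart version in use supports the `webhook.enabled` value.
# `disable_webhook_cmd` is a hypothetical helper, not part of any tool.
disable_webhook_cmd() {
  release="${1:-cert-manager}"     # assumed Helm release name
  namespace="${2:-cert-manager}"   # assumed namespace
  printf 'helm upgrade %s jetstack/cert-manager --namespace %s --reuse-values --set webhook.enabled=false\n' \
    "$release" "$namespace"
}

# Show the command for the default release and namespace.
disable_webhook_cmd
```

If the printed command matches your setup, you can run it by hand. Note that without the webhook, resources are no longer validated when they are created, so mistakes in a Certificate or Issuer only show up later in the controller logs.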