kubeadm init fails when using an AWS ELB as the control-plane endpoint

4/8/2020

I'm trying to configure an HA Kubernetes cluster in AWS, and I've had no luck using an ELB for the control plane (currently an NLB with a TLS listener, though I've also tried an ALB with HTTPS; the listener setup is sketched below). No matter what I do, init always fails at the wait-control-plane step. If I bump up the verbosity of the output, I can see it curl my load balancer endpoint every second during this step, and after 4 minutes it gives up. There's no indication of what response the load balancer returns; here's an example of the output I'm seeing:

curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.17.3 (linux/amd64) kubernetes/06ad960" 'https://<load-balancer-dns>:443/healthz?timeout=10s'
I0408 13:51:07.899477   27075 round_trippers.go:443] GET https://<load-balancer-dns>:443/healthz?timeout=10s  in 4 milliseconds
I0408 13:51:07.899497   27075 round_trippers.go:449] Response Headers:

(There is nothing after response headers).
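
In case it matters, the NLB is set up roughly like this (the ARNs, IDs, and names below are placeholders; this is a reconstruction of my setup, not the exact commands):

# Target group forwards to the apiserver bind port (30400) on the control-plane instances
aws elbv2 create-target-group --name <tg-name> --protocol TLS --port 30400 --vpc-id <vpc-id> --target-type instance
# TLS listener on 443 terminates with my certificate and forwards to that target group
aws elbv2 create-listener --load-balancer-arn <nlb-arn> --protocol TLS --port 443 --certificates CertificateArn=<cert-arn> --default-actions Type=forward,TargetGroupArn=<tg-arn>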

The odd thing is that while init is running, I can pull up that /healthz endpoint in a browser and get a page that just says "ok". I can also curl it from another terminal window and get an HTTP 200; everything looks fine.
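
For reference, this is roughly the check I run from the second terminal while init is still polling (the DNS name is a placeholder):

curl -k -i 'https://<load-balancer-dns>:443/healthz'

That returns an HTTP/1.1 200 with a body of just "ok".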

Further details: after init fails, there are no crashed Docker containers. kubeadm suggests checking the kubelet service status and journal, where I'm seeing lines like this:

E0408 14:50:36.738997   11649 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.CSIDriver: Get https://<load-balancer-dns>:443/apis/storage.k8s.io/v1beta1/csidrivers?limit=500&resourceVersion=0: x509: certificate signed by unknown authority

Curling that address does not give me any certificate errors, though it does return a 403. The certificate should be trusted, as its chain is (I believe) imported correctly, so I'm not sure why the kubelet is complaining about it.
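
In case it helps narrow this down, this is roughly how I've been comparing what the load balancer presents versus what the apiserver presents directly on the node (hostnames and ports are placeholders):

openssl s_client -connect <load-balancer-dns>:443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
openssl s_client -connect <node-ip>:30400 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer

The first shows the certificate the NLB terminates with; the second should show the kubeadm-generated apiserver certificate.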

The problem seems somehow related to the --control-plane-endpoint flag I'm using. If I instead let it default to the IP of the single instance, kubeadm init completes successfully, the cluster is initialized, and I'm able to join workers to it, etc.

FWIW, my init command looks like this:

kubeadm init --control-plane-endpoint "<load-balancer-dns>:<port>" --ignore-preflight-errors=ImagePull --apiserver-bind-port=30400 --v=10
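
In case it's relevant, my understanding is that the rough config-file equivalent would look like this (I haven't verified this form myself; the placeholders are mine):

cat >kubeadm-config.yaml <<'EOF'
# Equivalent of --apiserver-bind-port=30400
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  bindPort: 30400
---
# Equivalent of --control-plane-endpoint
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controlPlaneEndpoint: "<load-balancer-dns>:<port>"
EOF
kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=ImagePull --v=10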

What can I check to try to identify exactly what the problem is?

-- Dan Potter
kubeadm
kubelet
kubernetes

0 Answers