kube-apiserver unable to communicate with TLS enabled etcd

12/10/2016

I'm trying to set up a Kubernetes cluster on some Raspberry Pis. I have successfully set up an etcd cluster with TLS enabled, and I can access this cluster via etcdctl and curl.

However, when I try to run kube-apiserver against it with the same CA file, I get messages saying the etcd cluster is unavailable or misconfigured.

My question is: why can curl and etcdctl check cluster health and add keys using the same CA file that kube-apiserver is given, while kube-apiserver cannot?

If I point kube-apiserver at etcd on 127.0.0.1 over HTTP instead of HTTPS, the API server starts fine.

If this and the information below are not enough to understand the problem, please let me know. I'm not experienced with TLS/x509 certificates at all. I've been following Kelsey Hightower's Kubernetes The Hard Way mixed with the CoreOS docs for spinning up a Kubernetes cluster, as well as looking at GitHub issues and the like.

Here is my etcd unit file:

[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd

[Service]
Environment=ETCD_UNSUPPORTED_ARCH=arm
ExecStart=/usr/bin/etcd \
--name etcd-master1 \
--cert-file=/etc/etcd/etcd.pem  \
--key-file=/etc/etcd/etcd-key.pem \
--peer-cert-file=/etc/etcd/etcd.pem \
--peer-key-file=/etc/etcd/etcd-key.pem \
--trusted-ca-file=/etc/etcd/ca.pem \
--peer-trusted-ca-file=/etc/etcd/ca.pem \
--initial-advertise-peer-urls=https://10.0.1.200:2380 \
--listen-peer-urls https://10.0.1.200:2380  \
--listen-client-urls https://10.0.1.200:2379,http://127.0.0.1:2379  \
--advertise-client-urls https://10.0.1.200:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-master1=https://10.0.1.200:2380,etcd-master2=https://10.0.1.201:2380 \
--initial-cluster-state new \
--data-dir=var/lib/etcd \
--debug
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
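One thing I'm not sure how to rule out: kube-apiserver's Go TLS client, as I understand it, requires the etcd server certificate to carry an IP SAN matching the address it dials (here 10.0.1.200 / 10.0.1.201). A throwaway sketch with openssl showing how to inspect which SANs a cert carries; the file names are stand-ins for the real files under /etc/etcd, and the generated cert is just for illustration:

```shell
set -e
dir=$(mktemp -d)
# Generate a throwaway self-signed cert with IP SANs, mimicking what
# the real etcd server cert needs (requires OpenSSL 1.1.1+ for -addext).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$dir/etcd-key.pem" -out "$dir/etcd.pem" \
  -subj "/CN=etcd-master1" \
  -addext "subjectAltName = IP:10.0.1.200,IP:127.0.0.1"
# Inspect the SANs; run the same inspection on the real /etc/etcd/etcd.pem.
openssl x509 -in "$dir/etcd.pem" -noout -text | grep -A1 "Subject Alternative Name"
```

If the real etcd.pem turns out not to list the IPs kube-apiserver dials, regenerating it with those hosts included (Kubernetes The Hard Way does this via cfssl's `-hostname` flag) would presumably be the fix.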

Here is the kube-apiserver command I'm trying to run:

#!/bin/bash
./kube-apiserver \
  --etcd-cafile=/etc/etcd/ca.pem \
  --etcd-certfile=/etc/etcd/etcd.pem \
  --etcd-keyfile=/etc/etcd/etcd-key.pem \
  --etcd-servers=https://10.0.1.200:2379,https://10.0.1.201:2379 \
  --service-cluster-ip-range=10.32.0.0/24

Here is some output from that attempt. It seems odd that the traces say the etcd node was listed, yet nothing is actually printed for it:

deploy@master1:~$ sudo ./run_kube_apiserver.sh
I1210 00:11:35.096887   20480 config.go:499] Will report 10.0.1.200 as public IP address.
I1210 00:11:35.842049   20480 trace.go:61] Trace "List *api.PodTemplateList" (started 2016-12-10 00:11:35.152287704 -0500 EST):
[134.479µs] [134.479µs] About to list etcd node
[688.376235ms] [688.241756ms] Etcd node listed
[689.062689ms] [686.454µs] END
E1210 00:11:35.860221   20480 cacher.go:261] unexpected ListAndWatch error: pkg/storage/cacher.go:202: Failed to list *api.PodTemplate: client: etcd cluster is unavailable or misconfigured
I1210 00:11:36.588511   20480 trace.go:61] Trace "List *api.LimitRangeList" (started 2016-12-10 00:11:35.273714755 -0500 EST):
[184.478µs] [184.478µs] About to list etcd node
[1.314010127s] [1.313825649s] Etcd node listed
[1.314362833s] [352.706µs] END
E1210 00:11:36.596092   20480 cacher.go:261] unexpected ListAndWatch error: pkg/storage/cacher.go:202: Failed to list *api.LimitRange: client: etcd cluster is unavailable or misconfigured
I1210 00:11:37.286714   20480 trace.go:61] Trace "List *api.ResourceQuotaList" (started 2016-12-10 00:11:35.325895387 -0500 EST):
[133.958µs] [133.958µs] About to list etcd node
[1.96003213s] [1.959898172s] Etcd node listed
[1.960393274s] [361.144µs] END

A successful cluster-health query:

deploy@master1:~$ sudo etcdctl --cert-file /etc/etcd/etcd.pem --key-file /etc/etcd/etcd-key.pem  --ca-file /etc/etcd/ca.pem cluster-health
member 133c48556470c88d is healthy: got healthy result from https://10.0.1.200:2379
member 7acb9583fc3e7976 is healthy: got healthy result from https://10.0.1.201:2379

I am also seeing a lot of timeouts on the etcd servers themselves trying to send heartbeats back:

Dec 10 00:19:56 master1 etcd[19308]: failed to send out heartbeat on time (exceeded the 100ms timeout for 790.808604ms)
Dec 10 00:19:56 master1 etcd[19308]: server is likely overloaded
Dec 10 00:22:40 master1 etcd[19308]: failed to send out heartbeat on time (exceeded the 100ms timeout for 122.586925ms)
Dec 10 00:22:40 master1 etcd[19308]: server is likely overloaded
Dec 10 00:22:41 master1 etcd[19308]: failed to send out heartbeat on time (exceeded the 100ms timeout for 551.618961ms)
Dec 10 00:22:41 master1 etcd[19308]: server is likely overloaded

I can still do etcd operations like gets and puts, but I'm wondering if this could be a contributing factor. Can I tell kube-apiserver to wait longer for etcd? I've been trying to figure this out myself, but in my opinion the technical details of the Kubernetes components aren't well documented, and a lot of the examples are very turnkey, without really explaining what everything is doing and why. I can find all kinds of diagrams and blog posts about the high-level stuff, but practical details, e.g. how to run the actual binaries and which flags are and aren't required, are kind of lacking.
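On the heartbeat timeouts: etcd does expose `--heartbeat-interval` and `--election-timeout` flags (both in milliseconds, defaulting to 100 and 1000), and I gather the defaults can be too tight for SD-card-backed hardware like a Pi. A fragment that could be added to the etcd ExecStart in the unit file above; the values are illustrative, and etcd's tuning guidance suggests keeping the election timeout roughly 10x the heartbeat interval:

```
--heartbeat-interval 500 \
--election-timeout 5000 \
```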

-- Tom
Tags: etcd, kubernetes, x509
