kubeadm join command fails to join the HA k8s master cluster

4/11/2019

I am setting up a Kubernetes 1.14 HA cluster on AWS.

I am using the stacked etcd topology with 3 master and 5 worker nodes. I was able to run the kubeadm init command on the first master node and the kubeadm join command on the 2nd master node; both were successful, and both nodes show up in kubectl get nodes.
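
For context, this is roughly the shape of the commands I am following from the stacked etcd HA guide (the load balancer endpoint, token, hash, and certificate key below are placeholders, not my real values):

# kubeadm-config.yaml used on the first master
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
controlPlaneEndpoint: "LOAD_BALANCER_DNS:6443"
EOF

# first master
sudo kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs

# 2nd and 3rd masters, using the values printed by kubeadm init
sudo kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --experimental-control-plane --certificate-key <key>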

However, the same kubeadm join command fails on the 3rd master node.

[mark-control-plane] Marking the node ip-10-169-50-168 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node ip-10-169-50-168 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/mark-control-plane: error applying control-plane label and taints: timed out waiting for the condition
NAME                           STATUS     ROLES    AGE     VERSION
ip-XX-XX-XX-XX                 Ready      master   74m     v1.14.0
ip-XX-XX-XX-XX                 Ready      master   70m     v1.14.0
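
To dig further on the 3rd master, I look at the kubelet and the control-plane containers with the usual commands (nothing here is cluster-specific; the container ID is a placeholder):

# on the failing master
systemctl status kubelet
journalctl -u kubelet --no-pager | tail -n 50
docker ps | grep etcd
docker logs <etcd-container-id>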

When I check the docker logs, I see that etcd on the 3rd node was able to join the cluster, but the connection is later rejected. Below are the logs.

2019-04-10 17:44:59.409307 I | etcdserver/membership: added member 8ee1c831d170ef7f [https://XX.XX.XX.XX:2380] to cluster bedd10c18e149ae2
2019-04-10 17:44:59.409447 N | etcdserver/membership: set the initial cluster version to 3.3
2019-04-10 17:44:59.409506 I | etcdserver/api: enabled capabilities for version 3.3
2019-04-10 17:44:59.414195 I | rafthttp: established a TCP streaming connection with peer aa2e639fdfb57216 (stream Message reader)
2019-04-10 17:44:59.426797 I | etcdserver/membership: added member aa2e639fdfb57216 [https://XX.XX.XX.XX:2380] to cluster bedd10c18e149ae2
2019-04-10 17:44:59.428027 I | etcdserver/membership: added member 4d402309132b25d3 [https://XX.XX.XX.XX:2380] to cluster bedd10c18e149ae2
2019-04-10 17:44:59.436291 I | etcdserver: 4d402309132b25d3 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
2019-04-10 17:44:59.448880 I | etcdserver: published {Name:ip-10-169-50-178 ClientURLs:[https://XX.XX.XX.XX:2379]} to cluster bedd10c18e149ae2
2019-04-10 17:44:59.448959 I | embed: ready to serve client requests
2019-04-10 17:44:59.449247 I | embed: ready to serve client requests
2019-04-10 17:44:59.450469 I | embed: serving client requests on XX.XX.XX.XX:2379
2019-04-10 17:44:59.450817 I | embed: serving client requests on 127.0.0.1:2379
2019-04-10 17:45:01.533145 I | embed: rejected connection from "127.0.0.1:46992" (error "EOF", ServerName "")
2019-04-10 17:45:03.146800 I | embed: rejected connection from "XX.XX.XX.XX:48888" (error "remote error: tls: bad certificate", ServerName "")
2019-04-10 17:45:03.788293 I | raft: 4d402309132b25d3 [logterm: 8, index: 892, vote: 0] ignored MsgVote from 8ee1c831d170ef7f [logterm: 8, index: 892] at term 8: lease is not expired (remaining ticks: 10)
2019-04-10 17:45:04.312725 W | wal: sync duration of 1.985619012s, expected less than 1s
2019-04-10 17:45:05.588410 I | raft: 4d402309132b25d3 [logterm: 8, index: 892, vote: 0] ignored MsgVote from 8ee1c831d170ef7f [logterm: 8, index: 892] at term 8: lease is not expired (remaining ticks: 3)
2019-04-10 17:45:05.589745 I | raft: 4d402309132b25d3 [term: 8] received a MsgApp message with higher term from 8ee1c831d170ef7f [term: 10]
2019-04-10 17:45:05.589762 I | raft: 4d402309132b25d3 became follower at term 10
2019-04-10 17:45:05.589781 I | raft: raft.node: 4d402309132b25d3 changed leader from aa2e639fdfb57216 to 8ee1c831d170ef7f at term 10
proto: no coders for int
proto: no encoder for ValueSize int [GetProperties]
2019-04-10 17:50:43.108887 I | mvcc: store.index: compact 978
2019-04-10 17:50:43.110211 I | mvcc: finished scheduled compaction at 978 (took 960.176µs)
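
In case it is useful, this is how I list the etcd membership as seen from the 3rd node, from inside the etcd container (the container name filter and certificate paths below are the kubeadm defaults, so adjust them if yours differ):

docker exec -e ETCDCTL_API=3 $(docker ps -qf name=k8s_etcd) etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/peer.crt \
    --key=/etc/kubernetes/pki/etcd/peer.key \
    member list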

Could you share some pointers to resolve the issue?

-- Sunil Gajula
high-availability
kubeadm
kubernetes

2 Answers

4/11/2019

rejected connection from "XX.XX.XX.XX:48888" (error "remote error: tls: bad certificate"

I'd start by looking into the certs on the third master. Compare them with the certs from the second master, the one that was successfully bootstrapped into the cluster.
In the end you should have this list of files:

(screenshot: the expected list of certificate files)

Hopefully you didn't miss a step in the setup instructions you followed.
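
For example, you could run something like this on both the second and the third master and compare the output (the paths are the kubeadm defaults):

ls -l /etc/kubernetes/pki /etc/kubernetes/pki/etcd
# check which CA signed the etcd peer cert and whether it is still valid
openssl x509 -in /etc/kubernetes/pki/etcd/peer.crt -noout -issuer -subject -dates
openssl verify -CAfile /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/peer.crt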

-- A_Suh
Source: StackOverflow

4/15/2019

I was able to resolve the issue: I had to pass apiServer.authorization-mode as Node,RBAC.
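
For anyone hitting the same problem, this is roughly how that looks in the kubeadm v1beta1 ClusterConfiguration, set via apiServer.extraArgs (the endpoint below is a placeholder):

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
controlPlaneEndpoint: "LOAD_BALANCER_DNS:6443"
apiServer:
  extraArgs:
    authorization-mode: "Node,RBAC"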

-- Sunil Gajula
Source: StackOverflow