I have been fighting with a Kubernetes setup on Ubuntu 20.04 for two days. I created a so-called template VM on vSphere and cloned three VMs from it.
Each master node has the following hosts configuration:
127.0.0.1 localhost
127.0.1.1 kubernetes-master1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.255.200 kubernetes-cluster.homelab01.local
192.168.255.201 kubernetes-master1.homelab01.local
192.168.255.202 kubernetes-master2.homelab01.local
192.168.255.203 kubernetes-master3.homelab01.local
192.168.255.204 kubernetes-worker1.homelab01.local
192.168.255.205 kubernetes-worker2.homelab01.local
192.168.255.206 kubernetes-worker3.homelab01.local
The loopback entry is 127.0.1.1 kubernetes-master1 on the first master, 127.0.1.1 kubernetes-master2 on the second, and 127.0.1.1 kubernetes-master3 on the third.
I am using Docker 19.03.11, which is the latest version supported by Kubernetes according to the documentation.
Client: Docker Engine - Community
Version: 19.03.11
API version: 1.40
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:12:34 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.11
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:11:07 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
I used the following commands to install Docker:
sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
I put all the necessary packages on hold:
sudo apt-mark hold kubelet kubeadm kubectl docker-ce containerd.io docker-ce-cli
Some details about the VMs:
sudo cat /sys/class/dmi/id/product_uuid
f09c3242-c8f7-c97e-bc6a-b2065c286ea9
IP: 192.168.255.201
sudo cat /sys/class/dmi/id/product_uuid
b4fe3242-ba37-a533-c12f-b30b735cbe9f
IP: 192.168.255.202
sudo cat /sys/class/dmi/id/product_uuid
c3cc3242-4115-8c38-8e46-166190620249
IP: 192.168.255.203
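Because the three VMs are clones of one template, it is worth confirming that the product_uuid values really differ between nodes. The following is only a minimal sketch: find_duplicate_uuids is a hypothetical helper name, and it assumes the values have been collected by hand from each VM, one per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): reads one
# product_uuid per line on stdin and prints any value that occurs
# more than once. Empty output means all values are unique.
find_duplicate_uuids() {
    sort | uniq -d
}

# Example use with values copied from each VM (variable names are placeholders):
# printf '%s\n' "$uuid_master1" "$uuid_master2" "$uuid_master3" | find_duplicate_uuids
```

With the three values listed above, the helper should print nothing, since each one is different.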
IP addresses and name resolution work flawlessly on all hosts:
192.168.255.200 kubernetes-cluster.homelab01.local
192.168.255.201 kubernetes-master1.homelab01.local
192.168.255.202 kubernetes-master2.homelab01.local
192.168.255.203 kubernetes-master3.homelab01.local
192.168.255.204 kubernetes-worker1.homelab01.local
192.168.255.205 kubernetes-worker2.homelab01.local
192.168.255.206 kubernetes-worker3.homelab01.local
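A check like this can be scripted so it is repeatable on every host. The sketch below is an assumption about how one might do it, not what I actually ran: check_hosts and resolve_ip are hypothetical helper names, and resolve_ip simply wraps getent hosts.

```shell
#!/bin/sh
# Hypothetical wrapper around the system resolver: prints the first
# address that "getent hosts" returns for the given name.
resolve_ip() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Hypothetical helper: reads "IP hostname" pairs on stdin and prints one
# OK or MISMATCH line per pair. Returns 1 if any pair did not match.
check_hosts() {
    status=0
    while read -r ip name; do
        [ -z "$ip" ] && continue
        actual=$(resolve_ip "$name")
        if [ "$actual" = "$ip" ]; then
            echo "OK $name $ip"
        else
            echo "MISMATCH $name expected=$ip actual=${actual:-none}"
            status=1
        fi
    done
    return $status
}

# Example use on a node, feeding it the expected entries:
# check_hosts <<'EOF'
# 192.168.255.200 kubernetes-cluster.homelab01.local
# 192.168.255.201 kubernetes-master1.homelab01.local
# EOF
```

Keeping the resolver in its own function makes it easy to swap in another lookup tool if getent is not the one of interest.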
The keepalived configuration below is from master1. On master2 it has state=backup and priority 100; on master3, state=backup and priority 89.
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
$STATE=MASTER
$INTERFACE=ens160
$ROUTER_ID=51
$PRIORITY=255
$AUTH_PASS=Kub3rn3t3S!
$APISERVER_VIP=192.168.255.200/24
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
state $STATE
interface $INTERFACE
virtual_router_id $ROUTER_ID
priority $PRIORITY
authentication {
auth_type PASS
auth_pass $AUTH_PASS
}
virtual_ipaddress {
$APISERVER_VIP
}
track_script {
check_apiserver
}
}
/etc/keepalived/check_apiserver.sh
#!/bin/sh
APISERVER_VIP=192.168.255.200
APISERVER_DEST_PORT=6443
errorExit() {
echo "*** $*" 1>&2
exit 1
}
curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
sudo service keepalived status
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-01-06 16:41:38 CET; 1min 26s ago
Main PID: 804 (keepalived)
Tasks: 2 (limit: 4620)
Memory: 4.7M
CGroup: /system.slice/keepalived.service
├─804 /usr/sbin/keepalived --dont-fork
└─840 /usr/sbin/keepalived --dont-fork
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering Kernel netlink reflector
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering Kernel netlink command channel
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Opening file '/etc/keepalived/keepalived.conf>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: WARNING - default user 'keepalived_script' fo>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (Line 29) Truncating auth_pass to 8 characters
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: SECURITY VIOLATION - scripts are being execut>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (VI_1) ignoring tracked script check_apiserve>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Warning - script check_apiserver is not used
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering gratuitous ARP shared channel
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (VI_1) Entering MASTER STATE
# /etc/haproxy/haproxy.cfg
#
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log /dev/log local0
log /dev/log local1 notice
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 20s
timeout server 20s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the masters
#---------------------------------------------------------------------
frontend apiserver
bind *:8443
mode tcp
option tcplog
default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
server kubernetes-master1 192.168.255.201:6443 check
server kubernetes-master2 192.168.255.202:6443 check
server kubernetes-master3 192.168.255.203:6443 check
sudo service haproxy status
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-01-06 16:41:38 CET; 3min 12s ago
Docs: man:haproxy(1)
file:/usr/share/doc/haproxy/configuration.txt.gz
Process: 847 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUC>
Main PID: 849 (haproxy)
Tasks: 3 (limit: 4620)
Memory: 4.7M
CGroup: /system.slice/haproxy.service
├─849 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/hapro>
└─856 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/hapro>
Jan 06 16:41:38 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master1 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [WARNING] 005/164139 (856) : Server apiserver/kuberne>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master2 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master2 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [WARNING] 005/164139 (856) : Server apiserver/kuberne>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [ALERT] 005/164139 (856) : backend 'apiserver' has no>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master3 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master3 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: backend apiserver has no server available!
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: backend apiserver has no server available!
I create the first Kubernetes node with the following command:
sudo kubeadm init --control-plane-endpoint kubernetes-cluster.homelab01.local:8443 --upload-certs
This works well, and I then apply the Calico CNI plugin with this command:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
After that I attempt the join from master2.
Keepalived works perfectly fine: I tested it on all three nodes by stopping the service and observing failover to the other nodes. Once I had initialized Kubernetes on master1, HAProxy reported that the backend was visible.
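The failover test can be sketched roughly as follows. This is an illustration under assumptions, not a transcript of what I ran: vip_present is a hypothetical helper name, and it expects the one-line-per-address output of ip -o -4 addr show on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given IPv4 address appears
# as an "inet" entry in "ip -o -4 addr show" output read from stdin.
# The prefix length (for example /24) is stripped before comparing, so only
# an exact address match counts.
vip_present() {
    awk -v vip="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}

# Example use on each master (interface and VIP taken from the config above):
# ip -o -4 addr show dev ens160 | vip_present 192.168.255.200 && echo "VIP is on this node"
#
# To exercise failover, stop keepalived on the node that holds the VIP and
# repeat the check on the other masters:
# sudo systemctl stop keepalived
```

Matching the whole address avoids a false positive that a plain substring grep could give for an address sharing the same prefix.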
sudo kubeadm init --control-plane-endpoint kubernetes-cluster.homelab01.local:8443 --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-cluster.homelab01.local kubernetes-master1 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.255.201]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master1 localhost] and IPs [192.168.255.201 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master1 localhost] and IPs [192.168.255.201 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 18.539325 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
[mark-control-plane] Marking the node kubernetes-master1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node kubernetes-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 2cu336.rjxs8i0svtna27ke
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
--discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54 \
--control-plane --certificate-key 57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
--discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54
Everything is up and running on master1:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-744cfdf676-mks4d 1/1 Running 0 36s
kube-system pod/calico-node-bnvmz 1/1 Running 0 37s
kube-system pod/coredns-74ff55c5b-skdzk 1/1 Running 0 3m11s
kube-system pod/coredns-74ff55c5b-tctl9 1/1 Running 0 3m11s
kube-system pod/etcd-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-apiserver-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-controller-manager-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-proxy-smmmx 1/1 Running 0 3m11s
kube-system pod/kube-scheduler-kubernetes-master1 1/1 Running 0 3m4s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3m17s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 3m11s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/calico-node 1 1 1 1 1 kubernetes.io/os=linux 38s
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 3m11s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/calico-kube-controllers 1/1 1 1 38s
kube-system deployment.apps/coredns 2/2 2 2 3m11s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/calico-kube-controllers-744cfdf676 1 1 1 37s
kube-system replicaset.apps/coredns-74ff55c5b 2 2 2 3m11s
Immediately after I attempt to join master2 to the cluster, Kubernetes on master1 dies.
wojcieh@kubernetes-master2:~$ sudo kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
> --discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54 \
> --control-plane --certificate-key 57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-cluster.homelab01.local kubernetes-master2 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.255.202]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master2 localhost] and IPs [192.168.255.202 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master2 localhost] and IPs [192.168.255.202 127.0.0.1 ::1]
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
Broadcast message from systemd-journald@kubernetes-master2 (Wed 2021-01-06 16:53:04 CET):
haproxy[870]: backend apiserver has no server available!
Broadcast message from systemd-journald@kubernetes-master2 (Wed 2021-01-06 16:53:04 CET):
haproxy[870]: backend apiserver has no server available!
^C
wojcieh@kubernetes-master2:~$
Here are some logs which might be relevant
Logs from master1 https://pastebin.com/Y1zcwfWt
Logs from master2 https://pastebin.com/rBELgK1Y