Following the Kubernetes v1.11 documentation, I have managed to set up Kubernetes high availability using kubeadm with stacked control plane nodes, with 3 masters running on-premises on CentOS 7 VMs. Since no load balancer was available, I used Keepalived to set up a failover virtual IP (10.171.4.12) for the apiserver, as described in the Kubernetes v1.10 documentation (a sketch of that Keepalived configuration follows the config header below). As a result, the "kubeadm-config.yaml" used to bootstrap the control planes had the following header:
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "10.171.4.12"
api:
  controlPlaneEndpoint: "10.171.4.12:6443"
etcd:
...
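For completeness, the failover VIP itself is managed by Keepalived outside of kubeadm. A minimal sketch of the kind of keepalived.conf running on each master (the interface name, virtual_router_id, priority and password here are illustrative placeholders, not my exact values):

vrrp_instance VI_1 {
    state MASTER                  # BACKUP on the other two masters
    interface eth0                # placeholder: the masters' actual NIC name
    virtual_router_id 51          # illustrative; must be identical on all masters
    priority 101                  # lower values (e.g. 100, 99) on the backup masters
    authentication {
        auth_type PASS
        auth_pass k8s-vip         # illustrative shared secret
    }
    virtual_ipaddress {
        10.171.4.12               # the apiserver failover VIP
    }
}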
The configuration went fine, with the following warning appearing when bootstrapping all 3 Masters:
[endpoint] WARNING: port specified in api.controlPlaneEndpoint overrides api.bindPort in the controlplane address
And this warning appeared when joining Workers:
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}] you can solve this problem with following methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
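This warning seems harmless (kube-proxy simply falls back to iptables mode), but if IPVS were wanted, the modules listed in the warning could presumably be loaded by hand on each CentOS 7 node, for example:

# load the kernel modules the preflight check complained about (run on every node)
for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do modprobe $m; done
# optionally make them persistent across reboots
printf '%s\n' ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4 > /etc/modules-load.d/ipvs.conf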
I am running Kubernetes v1.11.1, but kubeadm-config.yaml mentions v1.11.0. Is this something I should worry about?
Should I not follow the official documentation and instead go for an alternative such as the one described at https://medium.com/@bambash/ha-kubernetes-cluster-via-kubeadm-b2133360b198?
Note: I hit the same issue with a new Kubernetes HA installation using the latest version, 1.11.2 (three masters + one worker), with the latest nginx ingress controller release, 0.18.0, deployed.
Normal Pulled 28m (x38 over 2h) kubelet, node3.local Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
Warning Unhealthy 7m (x137 over 2h) kubelet, node3.local Liveness probe failed: Get http://10.240.3.14:10254/healthz: dial tcp 10.240.3.14:10254: connect: connection refused
Warning BackOff 2m (x502 over 2h) kubelet, node3.local Back-off restarting failed container
nginx version: nginx/1.13.12
W0809 14:05:46.171066 5 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0809 14:05:46.171748 5 main.go:191] Creating API client for https://10.250.0.1:443
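The connection-refused liveness failures can presumably be reproduced by hand from the node, which helps distinguish a crashing controller process from a networking problem (the IP and port below are taken from the kubelet event above):

# run from node3.local; 10.240.3.14:10254 is the ingress controller pod IP and healthz port
curl -v http://10.240.3.14:10254/healthz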
# helm install ...
Error: no available release name found
# helm list
Error: Get https://10.250.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.250.0.1:443: i/o timeout
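The i/o timeout suggests that pods (Tiller in this case) cannot reach the kubernetes Service ClusterIP at all. A rough way to check this from inside any running pod, assuming its image ships curl (the pod name is a placeholder):

# any HTTP response, even 401/403, proves the Service ClusterIP is reachable from pods;
# a hang ending in a timeout reproduces the Tiller error above
kubectl exec <any-running-pod> -- curl -k -m 5 https://10.250.0.1:443/version
# the same check against the Keepalived VIP (the real apiserver endpoint), for comparison
kubectl exec <any-running-pod> -- curl -k -m 5 https://10.171.4.12:6443/version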
# kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.250.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443
Session Affinity: None
Events: <none>
# kubectl get endpoints --all-namespaces
NAMESPACE       NAME                      ENDPOINTS                                               AGE
default         bc-svc                    10.240.3.27:8080                                        6d
default         kubernetes                10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443        7d
ingress-nginx   default-http-backend      10.240.3.24:8080                                        4d
kube-system     kube-controller-manager   <none>                                                  7d
kube-system     kube-dns                  10.240.2.4:53,10.240.2.5:53,10.240.2.4:53 + 1 more...   7d
kube-system     kube-scheduler            <none>                                                  7d
kube-system     tiller-deploy             10.240.3.25:44134                                       5d
As you can see in the Kubernetes client-go code, the API server's IP address and port are read from environment variables inside the container:
host, port := os.Getenv("KUBERNETES_SERVICE_HOST"), os.Getenv("KUBERNETES_SERVICE_PORT")
You can check these variables by running the following command against any healthy pod:
$ kubectl exec <healthy-pod-name> -- printenv | grep SERVICE
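For illustration only, on a cluster set up like the one described above I would expect that output to contain something like the following (the values are taken from the kubernetes Service shown earlier, not captured from your cluster):

KUBERNETES_SERVICE_HOST=10.250.0.1
KUBERNETES_SERVICE_PORT=443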
I think the cause of the problem is that the variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are set to 10.250.0.1:443 instead of 10.171.4.12:6443. Could you confirm this by checking these variables in your cluster?
Important Additional Notes:
After running a couple of labs, I got the same issue with:
- a new Kubernetes HA installation using the latest version, 1.11.2 (three masters + one worker), and the latest nginx ingress controller release, 0.18.0;
- a standalone Kubernetes master with a few workers using version 1.11.1 (one master + two workers) and the latest nginx ingress controller release, 0.18.0;
- but with a standalone Kubernetes master on version 1.11.0 (one master + two workers), nginx ingress controller 0.17.1 worked with no complaints, while 0.18.0 complained that the Readiness probe failed, although the pod still went into the Running state.
=> As a result, I think the issue may be related to the way Kubernetes releases 1.11.1 and 1.11.2 interpret the health probes.
Problems were solved when I switched my pod network from Flannel to Calico (tested on Kubernetes 1.11.0; I will repeat the tests tomorrow on the latest k8s version, 1.11.2).
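For anyone wanting to reproduce the fix, a rough sketch of the swap is below. The manifest file names are placeholders for whatever Flannel/Calico manifests match your cluster version, and in practice it can be cleaner to kubeadm reset and rebuild with Calico from the start; also note that the pod-network-cidr passed to kubeadm has to match the pod CIDR used by the Calico manifest (192.168.0.0/16 by default).

# remove the Flannel DaemonSet/ConfigMap applied earlier (placeholder file name)
kubectl delete -f kube-flannel.yml
# apply the Calico manifest matching your Kubernetes version (placeholder file name)
kubectl apply -f calico.yaml
# watch kube-system until the calico-node pods are Running on every node,
# then restart any pods that kept addresses from the old pod network
kubectl -n kube-system get pods -o wide -w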