Failure on kubernetes cluster creation with kops

2/11/2021

I am trying to create a very simple cluster on AWS with kops, with one master and two worker nodes. But after creating it, kops validate cluster complains that the cluster is not healthy.

The cluster was created with:

kops create cluster --name=mycluster --zones=ap-south-1a --master-size="t2.micro" --node-size="t2.micro" --node-count="2" --cloud aws --ssh-public-key="~/.ssh/id_rsa.pub"
Output from kops validate cluster:

VALIDATION ERRORS
KIND    NAME                                                                                    MESSAGE
Pod     kube-system/kops-controller-xxxtk                                                       system-node-critical pod "kops-controller-xxxtk" is not ready (kops-controller)
Pod     kube-system/kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal        system-cluster-critical pod "kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal" is not ready (kube-controller-manager)

Validation Failed

Validation failed: cluster not yet healthy

Getting the resources in kube-system namespace shows:

NAME                                                                       READY   STATUS             RESTARTS   AGE
pod/dns-controller-8d8889c4b-rwnkd                                         1/1     Running            0          47m
pod/etcd-manager-events-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal      1/1     Running            0          72m
pod/etcd-manager-main-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal         1/1     Running            0          72m
pod/kops-controller-xxxtk                                                  1/1     Running            11         70m
pod/kube-apiserver-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal            2/2     Running            1          72m
pod/kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal   0/1     CrashLoopBackOff   15         72m
pod/kube-dns-696cb84c7-qzqf2                                               3/3     Running            0          16h
pod/kube-dns-696cb84c7-tt7ng                                               3/3     Running            0          16h
pod/kube-dns-autoscaler-55f8f75459-7jbjb                                   1/1     Running            0          16h
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal                1/1     Running            0          16h
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal                1/1     Running            0          72m
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal                1/1     Running            0          16h
pod/kube-scheduler-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal            1/1     Running            15         72m

NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
service/kube-dns   ClusterIP   100.64.0.10   <none>        53/UDP,53/TCP   16h

NAME                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                      AGE
daemonset.apps/kops-controller   1         1         1       1            1           kops.k8s.io/kops-controller-pki=,node-role.kubernetes.io/master=   16h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dns-controller        1/1     1            1           16h
deployment.apps/kube-dns              2/2     2            2           16h
deployment.apps/kube-dns-autoscaler   1/1     1            1           16h

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/dns-controller-8d8889c4b         1         1         1       16h
replicaset.apps/kube-dns-696cb84c7               2         2         2       16h
replicaset.apps/kube-dns-autoscaler-55f8f75459   1         1         1       16h

Getting the logs from the kube-scheduler shows:

I0211 04:26:45.546427       1 flags.go:59] FLAG: --vmodule=""
I0211 04:26:45.546442       1 flags.go:59] FLAG: --write-config-to=""
I0211 04:26:46.306497       1 serving.go:331] Generated self-signed cert in-memory
W0211 04:26:47.736258       1 authentication.go:368] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0211 04:26:47.765649       1 authentication.go:265] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0211 04:26:47.783852       1 authentication.go:289] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0211 04:26:47.798838       1 authorization.go:187] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0211 04:26:47.831825       1 authorization.go:156] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0211 04:26:55.344064       1 factory.go:210] Creating scheduler from algorithm provider 'DefaultProvider'
I0211 04:26:55.370766       1 registry.go:173] Registering SelectorSpread plugin
I0211 04:26:55.370802       1 registry.go:173] Registering SelectorSpread plugin
I0211 04:26:55.504324       1 server.go:146] Starting Kubernetes Scheduler version v1.19.7
W0211 04:26:55.607516       1 authorization.go:47] Authorization is disabled
W0211 04:26:55.607537       1 authentication.go:40] Authentication is disabled
I0211 04:26:55.618714       1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0211 04:26:55.741863       1 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1613017606" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1613017605" (2021-02-11 03:26:45 +0000 UTC to 2022-02-11 03:26:45 +0000 UTC (now=2021-02-11 04:26:55.741788572 +0000 UTC))
I0211 04:26:55.746888       1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1613017607" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1613017607" (2021-02-11 03:26:46 +0000 UTC to 2022-02-11 03:26:46 +0000 UTC (now=2021-02-11 04:26:55.7468713 +0000 UTC))
I0211 04:26:55.757881       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0211 04:26:55.771581       1 secure_serving.go:197] Serving securely on [::]:10259
I0211 04:26:55.793134       1 reflector.go:207] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.815641       1 reflector.go:207] Starting reflector *v1.CSINode (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.841309       1 reflector.go:207] Starting reflector *v1beta1.PodDisruptionBudget (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.857460       1 reflector.go:207] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.875096       1 reflector.go:207] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.894283       1 reflector.go:207] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.894615       1 reflector.go:207] Starting reflector *v1.PersistentVolume (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.895000       1 reflector.go:207] Starting reflector *v1.ReplicationController (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.895250       1 reflector.go:207] Starting reflector *v1.ReplicaSet (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.902323       1 reflector.go:207] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.902572       1 reflector.go:207] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.905927       1 reflector.go:207] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188
I0211 04:26:56.355570       1 node_tree.go:86] Added node "ip-172-20-43-190.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.357441       1 node_tree.go:86] Added node "ip-172-20-63-116.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.357578       1 node_tree.go:86] Added node "ip-172-20-60-103.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.377402       1 leaderelection.go:243] attempting to acquire leader lease  kube-system/kube-scheduler...
I0211 04:27:12.368681       1 leaderelection.go:253] successfully acquired lease kube-system/kube-scheduler
I0211 04:27:12.436915       1 scheduler.go:597] "Successfully bound pod to node" pod="default/nginx-deployment-66b6c48dd5-w4hb5" node="ip-172-20-63-116.ap-south-1.compute.internal" evaluatedNodes=3 feasibleNodes=2
I0211 04:27:12.451792       1 scheduler.go:597] "Successfully bound pod to node" pod="default/nginx-deployment-66b6c48dd5-4xz8l" node="ip-172-20-43-190.ap-south-1.compute.internal" evaluatedNodes=3 feasibleNodes=2
E0211 04:32:20.487059       1 leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Get "https://127.0.0.1/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s": context deadline exceeded
I0211 04:32:20.633059       1 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F0211 04:32:20.673521       1 server.go:199] leaderelection lost
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc0005c2d01, 0xc000900800, 0x41, 0x1fd)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
... (remainder of the Go runtime stack trace elided)
-- Divick
kops
kubernetes

2 Answers

2/11/2021

I don't see anything particularly wrong with the command you are running. However, t2.micro instances are very small, and may be too small for the cluster to function.

You can have a look at the kops-controller logs to see why it is not starting. Try kubectl logs kops-controller-xxxx and kubectl describe pod kops-controller-xxxx.
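
A minimal sketch of those diagnostic commands, assuming the pod names from the question's output and that the pods live in the kube-system namespace:

# Inspect the pod that validation reports as not ready (adjust the name suffix to your cluster)
kubectl -n kube-system describe pod kops-controller-xxxtk

# Tail its logs; --previous shows the last crashed container if it has restarted
kubectl -n kube-system logs kops-controller-xxxtk
kubectl -n kube-system logs kops-controller-xxxtk --previous

# The crash-looping controller manager deserves the same treatment
kubectl -n kube-system logs kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal --previous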

-- Ole Markus With
Source: StackOverflow

2/19/2021

After @Markus's answer and your comments, I started to dig deeper for more information, and this is what I found.

First, the Running Kubernetes on AWS T2 Instances article: an example of using t2.medium, with very detailed steps and a timeline describing what was going on there.

The article's conclusion:

We’ve shown that the unpredictable nature of deployments on Kubernetes clusters isn’t a good fit for the T2/3 family of instances. There is the potential to have instances throttled due to pods consuming vast amounts of resources. At best this will limit the performance of your applications and at worst could cause the cluster to fail (if using T2/3s for master nodes) due to ETCD issues. Furthermore, this condition will only be picked up if we are monitoring CloudWatch carefully or performing application performance monitoring on the pods.

To this end it is advisable to avoid using T2/3 instance type families for Kubernetes deployments, if you would like to save money whilst using more traditional instance families (such as Ms and Rs) then take a look at our blog on spot instances.

And next, the official information:

1) t2.micro spec: a t2.micro has 1 vCPU and 1 GB of memory.

2) Minimal required memory & CPU (cores) for Kubernetes in general:

  • The master node's minimal required memory is 2 GB, and the worker node needs a minimum of 1 GB.

  • The master node needs at least 1.5 cores, and the worker node needs at least 0.7 cores.

A t2.micro does not have enough resources. Please use at least a t2.medium for the masters.
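
If you keep the cluster, here is a rough sketch of how the master size can be bumped with kops. The instance group name below is an assumption based on the single ap-south-1a zone; check what kops get instancegroups actually reports:

# List instance groups to find the master group's exact name
kops get instancegroups --name=mycluster

# Edit the master group and change machineType from t2.micro to t2.medium (or larger)
kops edit instancegroup master-ap-south-1a --name=mycluster

# Apply the change and roll the master so it comes back on the new instance type
kops update cluster --name=mycluster --yes
kops rolling-update cluster --name=mycluster --yes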

-- Vit
Source: StackOverflow