I am trying to create a very simple cluster on AWS with kops, with one master and two worker nodes. But after creation, kops validate cluster complains that the cluster is not healthy.
The cluster was created with:
kops create cluster --name=mycluster --zones=ap-south-1a --master-size="t2.micro" --node-size="t2.micro" --node-count="2" --cloud aws --ssh-public-key="~/.ssh/id_rsa.pub"
Output from kops validate cluster:
VALIDATION ERRORS
KIND NAME MESSAGE
Pod kube-system/kops-controller-xxxtk system-node-critical pod "kops-controller-xxxtk" is not ready (kops-controller)
Pod kube-system/kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal system-cluster-critical pod "kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal" is not ready (kube-controller-manager)
Validation Failed
Validation failed: cluster not yet healthy
Getting the resources in the kube-system namespace shows:
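One way to produce the listing below is, for example:
kubectl get all -n kube-system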
NAME READY STATUS RESTARTS AGE
pod/dns-controller-8d8889c4b-rwnkd 1/1 Running 0 47m
pod/etcd-manager-events-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 72m
pod/etcd-manager-main-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 72m
pod/kops-controller-xxxtk 1/1 Running 11 70m
pod/kube-apiserver-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 2/2 Running 1 72m
pod/kube-controller-manager-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 0/1 CrashLoopBackOff 15 72m
pod/kube-dns-696cb84c7-qzqf2 3/3 Running 0 16h
pod/kube-dns-696cb84c7-tt7ng 3/3 Running 0 16h
pod/kube-dns-autoscaler-55f8f75459-7jbjb 1/1 Running 0 16h
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 16h
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 72m
pod/kube-proxy-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 0 16h
pod/kube-scheduler-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal 1/1 Running 15 72m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 100.64.0.10 <none> 53/UDP,53/TCP 16h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kops-controller 1 1 1 1 1 kops.k8s.io/kops-controller-pki=,node-role.kubernetes.io/master= 16h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dns-controller 1/1 1 1 16h
deployment.apps/kube-dns 2/2 2 2 16h
deployment.apps/kube-dns-autoscaler 1/1 1 1 16h
NAME DESIRED CURRENT READY AGE
replicaset.apps/dns-controller-8d8889c4b 1 1 1 16h
replicaset.apps/kube-dns-696cb84c7 2 2 2 16h
replicaset.apps/kube-dns-autoscaler-55f8f75459 1 1 1 16h
Getting the logs from the kube-scheduler shows:
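These logs can be pulled with something like the following (pod name masked as above):
kubectl logs -n kube-system kube-scheduler-ip-xxx-xxx-xxx-xxx.ap-south-1.compute.internal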
I0211 04:26:45.546427 1 flags.go:59] FLAG: --vmodule=""
I0211 04:26:45.546442 1 flags.go:59] FLAG: --write-config-to=""
I0211 04:26:46.306497 1 serving.go:331] Generated self-signed cert in-memory
W0211 04:26:47.736258 1 authentication.go:368] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0211 04:26:47.765649 1 authentication.go:265] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0211 04:26:47.783852 1 authentication.go:289] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0211 04:26:47.798838 1 authorization.go:187] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0211 04:26:47.831825 1 authorization.go:156] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0211 04:26:55.344064 1 factory.go:210] Creating scheduler from algorithm provider 'DefaultProvider'
I0211 04:26:55.370766 1 registry.go:173] Registering SelectorSpread plugin
I0211 04:26:55.370802 1 registry.go:173] Registering SelectorSpread plugin
I0211 04:26:55.504324 1 server.go:146] Starting Kubernetes Scheduler version v1.19.7
W0211 04:26:55.607516 1 authorization.go:47] Authorization is disabled
W0211 04:26:55.607537 1 authentication.go:40] Authentication is disabled
I0211 04:26:55.618714 1 deprecated_insecure_serving.go:51] Serving healthz insecurely on [::]:10251
I0211 04:26:55.741863 1 tlsconfig.go:200] loaded serving cert ["Generated self signed cert"]: "localhost@1613017606" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer="localhost-ca@1613017605" (2021-02-11 03:26:45 +0000 UTC to 2022-02-11 03:26:45 +0000 UTC (now=2021-02-11 04:26:55.741788572 +0000 UTC))
I0211 04:26:55.746888 1 named_certificates.go:53] loaded SNI cert [0/"self-signed loopback"]: "apiserver-loopback-client@1613017607" [serving] validServingFor=[apiserver-loopback-client] issuer="apiserver-loopback-client-ca@1613017607" (2021-02-11 03:26:46 +0000 UTC to 2022-02-11 03:26:46 +0000 UTC (now=2021-02-11 04:26:55.7468713 +0000 UTC))
I0211 04:26:55.757881 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0211 04:26:55.771581 1 secure_serving.go:197] Serving securely on [::]:10259
I0211 04:26:55.793134 1 reflector.go:207] Starting reflector *v1.StorageClass (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.815641 1 reflector.go:207] Starting reflector *v1.CSINode (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.841309 1 reflector.go:207] Starting reflector *v1beta1.PodDisruptionBudget (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.857460 1 reflector.go:207] Starting reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.875096 1 reflector.go:207] Starting reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.894283 1 reflector.go:207] Starting reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.894615 1 reflector.go:207] Starting reflector *v1.PersistentVolume (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.895000 1 reflector.go:207] Starting reflector *v1.ReplicationController (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.895250 1 reflector.go:207] Starting reflector *v1.ReplicaSet (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.902323 1 reflector.go:207] Starting reflector *v1.StatefulSet (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.902572 1 reflector.go:207] Starting reflector *v1.PersistentVolumeClaim (0s) from k8s.io/client-go/informers/factory.go:134
I0211 04:26:55.905927 1 reflector.go:207] Starting reflector *v1.Pod (0s) from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:188
I0211 04:26:56.355570 1 node_tree.go:86] Added node "ip-172-20-43-190.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.357441 1 node_tree.go:86] Added node "ip-172-20-63-116.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.357578 1 node_tree.go:86] Added node "ip-172-20-60-103.ap-south-1.compute.internal" in group "ap-south-1:\x00:ap-south-1a" to NodeTree
I0211 04:26:56.377402 1 leaderelection.go:243] attempting to acquire leader lease kube-system/kube-scheduler...
I0211 04:27:12.368681 1 leaderelection.go:253] successfully acquired lease kube-system/kube-scheduler
I0211 04:27:12.436915 1 scheduler.go:597] "Successfully bound pod to node" pod="default/nginx-deployment-66b6c48dd5-w4hb5" node="ip-172-20-63-116.ap-south-1.compute.internal" evaluatedNodes=3 feasibleNodes=2
I0211 04:27:12.451792 1 scheduler.go:597] "Successfully bound pod to node" pod="default/nginx-deployment-66b6c48dd5-4xz8l" node="ip-172-20-43-190.ap-south-1.compute.internal" evaluatedNodes=3 feasibleNodes=2
E0211 04:32:20.487059 1 leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Get "https://127.0.0.1/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s": context deadline exceeded
I0211 04:32:20.633059 1 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
F0211 04:32:20.673521 1 server.go:199] leaderelection lost
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc0005c2d01, 0xc000900800, 0x41, 0x1fd)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
....
... stack trace from go runtime
I don't see anything particularly wrong with the command you are running. However, t2.micro instances are very small and may be too small for the cluster to function.
You can have a look at the kops-controller logs to see why it is not starting. Try kubectl logs -n kube-system kops-controller-xxxx
and kubectl describe pod -n kube-system kops-controller-xxx
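Since the pod has been restarting, the logs of the previous (crashed) container are often more telling; a possible variant, keeping the masked pod name from your listing:
kubectl logs -n kube-system kops-controller-xxxtk --previous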
You know, after @Markus's and your comments I started to dig deeper, and this is what I found.
First, the Running Kubernetes on AWS T2 Instances article: an example of using t2.medium, with very detailed steps and a timeline describing what was going on there.
Its conclusion:
We’ve shown that the unpredictable nature of deployments on Kubernetes clusters isn’t a good fit for the T2/3 family of instances. There is the potential to have instances throttled due to pods consuming vast amounts of resources. At best this will limit the performance of your applications and at worst could cause the cluster to fail (if using T2/3s for master nodes) due to ETCD issues. Furthermore, this condition will only be picked up if we are monitoring CloudWatch carefully or performing application performance monitoring on the pods.
To this end it is advisable to avoid using T2/3 instance type families for Kubernetes deployments, if you would like to save money whilst using more traditional instance families (such as Ms and Rs) then take a look at our blog on spot instances.
And next, the official info:
1) t2.micro specs: 1 vCPU and 1 GB of memory.
2) Minimal required memory & CPU (cores) for Kubernetes in general:
The master node needs at least 2 GB of memory and the worker node at least 1 GB.
The master node needs at least 1.5 cores and the worker node at least 0.7 cores.
Not enough resources: a t2.micro's 1 vCPU and 1 GB of memory fall below the roughly 1.5 cores and 2 GB a master needs. Please use at least t2.medium for the master nodes.
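One possible way to move the masters to a bigger instance type with kops (the instance group name below follows kops' default master-<zone> naming, so treat it as an assumption and check kops get instancegroups for the actual name):
kops get instancegroups --name=mycluster
kops edit instancegroup master-ap-south-1a --name=mycluster   # set spec.machineType to t2.medium
kops update cluster --name=mycluster --yes
kops rolling-update cluster --name=mycluster --yes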