Azure AKS: no nodes found

2/5/2018

I created an Azure AKS cluster with 3 nodes (Standard DS3 v2: 4 vCPUs, 14 GB memory). While experimenting with the cluster, I created a Deployment with 1000 replicas. After this, the complete cluster went down.

azureuser@saa:~$ k get cs
NAME                 STATUS      MESSAGE                                                                                        ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: getsockopt: connection refused   
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: getsockopt: connection refused   
etcd-0               Healthy     {"health": "true"}  

From debugging, it seems both the scheduler and the controller-manager went down. How can I fix this?

What exactly happened when I created a Deployment with 1000 replicas? Shouldn't Kubernetes (k8s) handle this?

Output of a few debugging commands:

  kubectl cluster-info
    Kubernetes master is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443
    Heapster is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/heapster/proxy
    KubeDNS is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
    kubernetes-dashboard is running at https://cg-games-e5252212.hcp.eastus.azmk8s.io:443/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy

Output of kubectl cluster-info dump: http://termbin.com/e6wb

azureuser@sim:~$ az aks scale -n cg -g cognitive-games -c 4 --verbose
Deployment failed. Correlation ID: 4df797b2-28bf-4c18-a26a-4e341xxxxx. Operation failed with status: 200. Details: Resource state Failed

No nodes are displayed:

azureuser@si:~$ k get nodes
No resources found
-- StateLess
Tags: azure, azure-kubernetes, kubernetes

1 Answer

2/13/2018

It looks silly, but when an AKS cluster is created in a resource group (RG), a second RG is surprisingly created as well: one RG holds the AKS resource, and the other, named with what looks like a random hash, holds all the VMs. I had deleted the second RG, and the basic AKS cluster stopped working.
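The second resource group can be identified before anything in it is deleted. The sketch below is a minimal illustration, assuming the default naming scheme that current AKS versions use for the node resource group (MC_<resource-group>_<cluster-name>_<location>); clusters created with a custom node resource group name, or older clusters like the one in the question, may be named differently. The helper node_rg_name is hypothetical, and the region eastus is taken from the API server URL in the question. On a live cluster, asking Azure with az aks show and the nodeResourceGroup query (shown in the comments) is more reliable than guessing.

```shell
#!/bin/sh
# Hypothetical helper: builds the default AKS node resource group name,
# ASSUMING the MC_<resource-group>_<cluster-name>_<location> convention.
# Clusters with a custom or older node resource group name will not match.
node_rg_name() {
  rg="$1"        # resource group that holds the AKS resource itself
  cluster="$2"   # AKS cluster name
  location="$3"  # Azure region, e.g. eastus
  printf 'MC_%s_%s_%s\n' "$rg" "$cluster" "$location"
}

# Cluster from the question: -n cg -g cognitive-games, region eastus.
node_rg_name cognitive-games cg eastus

# On a live cluster, ask Azure for the real name instead (needs the Azure CLI):
#   az aks show -g cognitive-games -n cg --query nodeResourceGroup -o tsv
# The group it returns holds the node VMs; deleting it leaves the cluster
# with no nodes, as in the question.
```

With these inputs the helper prints MC_cognitive-games_cg_eastus. Treat that group as managed by AKS: remove the cluster with az aks delete rather than deleting the group by hand.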

-- StateLess
Source: StackOverflow