This container service is in a failed state

2/11/2019

All of our AKS clusters report the following error in the Azure Portal:

This container service is in a failed state. Click here to open a new support request.

It seems we also cannot edit the cluster. When trying to scale out the nodes, I am getting the following error:

Failed to save container service 'test-aks'. Error: Operation is not allowed while cluster is being upgrading or failed in upgrade

When looking into the AKS properties, I see there is a provisioning state of "Failed":

We don't know how to troubleshoot this problem.

-- Dave New
azure
azure-aks
azure-container-instances
azure-container-service
azure-kubernetes

2 Answers

2/13/2019

For the error you are seeing:

This container service is in a failed state. Click here to open a new support request.

It also happened to me. Usually there is a limit on the resources a subscription can use. In my case, I can only use 10 vCPUs, so I got the error when I tried to scale up to more nodes with no vCPUs left. I think that is a possible reason for you as well, so it is worth checking your quota.
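One way to check the remaining vCPU quota is with `az vm list-usage`. The following is a minimal sketch, assuming the Azure CLI is installed and logged in. The helper name `remaining_vcpus` is hypothetical, and the filter assumes the regional vCPU entry is reported under the name `cores`; if that does not match your output, run `az vm list-usage --location <region> --output table` and read the values directly.

```shell
# Hypothetical helper: print how many regional vCPUs are still available.
# Assumption: the regional vCPU quota entry has name.value == 'cores'.
remaining_vcpus() {
  location="$1"
  az vm list-usage --location "$location" \
    --query "[?name.value=='cores'].[currentValue, limit]" \
    --output tsv |
    awk '{ print $2 - $1 }'   # limit minus current usage
}

# Example (the region is a placeholder):
#   remaining_vcpus eastus
```

If the number printed is smaller than the vCPU count of the nodes you want to add, the scale operation is likely to fail on quota.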

-- Charles Xu
Source: StackOverflow

2/13/2019

Use the az aks scale command to scale the cluster nodes using Azure CLI as described here: https://docs.microsoft.com/en-us/azure/aks/scale-cluster#scale-the-cluster-nodes

az aks show --resource-group myResourceGroup --name myAKSCluster --query agentPoolProfiles

The Azure CLI output includes a more descriptive error message than the portal. You have most likely exceeded your core quota. More details are discussed in this thread: https://github.com/Azure/AKS/issues/542
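As a rough sketch of the scale step itself, the function below prints the cluster's provisioning state and then attempts the scale so that the CLI can report its own error. The function name `scale_and_report` is hypothetical, and the resource group, cluster name, and node count are placeholders. Clusters with more than one node pool may also need `--nodepool-name`.

```shell
# Hypothetical helper: show the provisioning state, then try to scale.
# Arguments: resource group, cluster name, desired node count.
scale_and_report() {
  rg="$1"
  cluster="$2"
  count="$3"

  # Read the current provisioning state (for example "Succeeded" or "Failed").
  state=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query provisioningState --output tsv)
  echo "Provisioning state: $state"

  # Attempt the scale; on failure the CLI prints the underlying error.
  az aks scale --resource-group "$rg" --name "$cluster" --node-count "$count"
}

# Example (placeholder names):
#   scale_and_report myResourceGroup myAKSCluster 3
```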

-- Karishma Tiwari - MSFT
Source: StackOverflow