Azure AKS Public IP in Non-standard Resource Group

5/13/2019

I've been trying to manage an Azure Kubernetes Service (AKS) instance via Terraform. When I create the AKS instance via the Azure CLI per this MS tutorial, then install an ingress controller with a static public IP, per this MS tutorial, everything works fine. This method implicitly creates a service principal (SP).

When I create an otherwise exact duplicate of the AKS cluster via Terraform, I am forced to supply the service principal explicitly. I gave this new SP "Contributor" access to the cluster's entire resource group yet, when I get to the step to create the ingress controller (using the same command that tutorial 2 provided, above: helm install stable/nginx-ingress --set controller.replicaCount=2 --set controller.service.loadBalancerIP="XX.XX.XX.XX"), the ingress service comes up but it never acquires its public IP. The IP status remains "<pending>" indefinitely, and I can find nothing in any log about why. Are there logs that should tell me why my IP is still pending?

Again, I am fairly certain that, other than the SP, the Terraform AKS cluster is an exact duplicate of the one created based on the MS tutorial. Running terraform plan finds no differences between the two. Does anyone have any idea what permission my AKS SP might need or what else I might be missing here? Strangely, I can't find ANY permissions assigned to the implicitly created principal via the Azure portal, but I can't think of anything else that might be causing this behavior.

Not sure if it's a red herring or not, but other users have complained about a similar problem in the context of issues opened against the second tutorial. Their fix always appears to be "tear down your cluster and retry", but that isn't an acceptable solution in this context. I need a reproducible working cluster and azurerm_kubernetes_cluster doesn't currently allow for building an AKS instance with an implicitly created SP.

-- Derek
azure
azure-kubernetes
service-principal
terraform
terraform-provider-azure

1 Answer

5/13/2019

I'm going to answer my own question, for posterity. It turns out the problem was the resource group where I created the static public IP. AKS clusters use two resource groups: the group that you explicitly created the cluster in, and a second group which is implicitly created by the cluster. That second, implicit resource group always gets a name starting with "MC_" (the rest of the name is derivative of the explicit RG, the cluster name, and the region).

Anyhow, the default AKS configuration requires that the public IP be created within that implicit resource group. Assuming that you created the AKS cluster with Terraform, its name will be exported in ${azurerm_kubernetes_cluster.NAME.node_resource_group}.

EDIT 2019-05-23

Since writing this, we found a use case that the workaround of using the MC_* resource group wasn't good enough for. I opened a support ticket with MS and they directed me to this solution. Add the following annotation to your LoadBalancer (or Ingress controller), and make sure that the AKS SP has at least Network Contributor rights in the destination resource group (myResourceGroup in the example below):

metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup

This solved it completely for us.

-- Derek
Source: StackOverflow