KEDA scaler not working on AKS with trigger authentication using pod identity

10/10/2021

KEDA scaler not scales with scaled object defined with trigger using pod identity for authentication for service bus queue. I'm following this KEDA service bus triggered scaling project.
The scaling works fine with the connection string, but when I try to scale using the pod identity for KEDA scaler the keda operator fails to get the azure identity bound to it with the following keda operator error message log:

github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).isScaledObjectActive
        /workspace/pkg/scaling/scale_handler.go:228
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
        /workspace/pkg/scaling/scale_handler.go:211
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
        /workspace/pkg/scaling/scale_handler.go:145
2021-10-10T17:35:53.916Z        ERROR   azure_servicebus_scaler error   {"error": "failed to refresh token, error: adal: Refresh request failed. Status Code = '400'. Response body: {\"error\":\"invalid_request\",\"error_description\":\"Identity not found\"}\n"}

Edited on 11/09/2021 I opened a github issue at keda, and we did some troubleshoot. But it seems like an issue with AAD Pod Identity as @Tom suggests. The AD Pod Identity MIC pod gives logs like this:

E1109 03:15:34.391759       1 mic.go:1111] failed to update user-assigned identities on node aks-agentpool-14229154-vmss (add [2], del [0], update[0]), error: failed to update identities for aks-agentpool-14229154-vmss in MC_Arun_democluster_westeurope, error: compute.VirtualMachineScaleSetsClient#Update: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' with object id 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' has permission to perform action 'Microsoft.Compute/virtualMachineScaleSets/write' on scope '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourceGroups/MC_Arun_democluster_westeurope/providers/Microsoft.Compute/virtualMachineScaleSets/aks-agentpool-14229154-vmss'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourcegroups/arun/providers/microsoft.managedidentity/userassignedidentities/autoscaler-id' or the linked scope(s) are invalid."

Any clues how to fix it?

My scaler objects' definition is as below:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service-bus-orders
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1 
kind: ScaledObject
metadata:
  name: order-scaler
spec:
  scaleTargetRef:
    name: order-processor
  # minReplicaCount: 0 Change to define how many minimum replicas you want
  maxReplicaCount: 10
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: demodemobus
      queueName: orders
      messageCount: '5'
    authenticationRef:
      name: trigger-auth-service-bus-orders

Im deploying the azure identity to the namespace keda where my keda deployment resides. And installs KEDA with the following command to set the pod identity binding using helm:

helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=app-autoscaler --namespace keda

Expected Behavior The KEDA scaler should have worked fine with the assigned pod identity and access token to perform scaling

Actual Behavior The KEDA operator could not be able to find the azure identity assigned and scaling fails

Scaler Used Azure Service Bus

Steps to Reproduce the Problem 1. Create the azure identity and bindings for the KEDA 2. Install KEDA with the aadpodidentitybinding 3. Create the scaledobject and triggerauthentication using KEDA pod identity 4. The scaler fails to authenticate and scale

-- iarunpaul
autoscaling
azure
azure-aks
keda
kubernetes

2 Answers

1/24/2022

First and foremost, I am using AKS with kubenet plugin.

By default 'AAD Pod Identity is disabled by default on Clusters with Kubenet starting from release v1.7.'

This is because of the Kubenet is vulnerable to ARP Spoofing. Please read it here.

Even then you can have a workaround to enable the KEDA scaling in Kubenet powered AKS.(The script holds good for other CNI's also, except that you dont need to edit anything with the aad-pod-identity component nmi daemonset definition yaml, if it runs well with your cluster plugins.).

Below I'm adding an e2e script for the same. Please visit the github issue for access to all the discussions.

# Define aks name and resource group
$aksResourceGroup = "K8sScalingDemo"
$aksName = "K8sScalingDemo"

# Create resource group
az group create -n $aksResourceGroup -l centralindia

# Create the aks cluster with default kubenet plugin
az aks create -n $aksName -g $aksResourceGroup

# Resourcegroup where the aks resources will be deployed
$resourceGroup = "$(az aks show -g $aksResourceGroup -n $aksName --query nodeResourceGroup -otsv)"

# Set the kubectl context to the newly created aks cluster
az aks get-credentials -n $aksName -g $aksResourceGroup

# Install AAD Pod Identity into the aad-pod-identity namespace using helm
kubectl create namespace aad-pod-identity
helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
helm install aad-pod-identity aad-pod-identity/aad-pod-identity --namespace aad-pod-identity

# Check the status of installation 
kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=mic"
kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=nmi"

# the nmi components will Crashloop, ignore them for now. We will make them right later

# Get Resourcegroup Id of our $ResourceGroup
$resourceGroup_ResourceId = az group show --name $resourceGroup --query id -otsv

# Get the aks cluster kubeletidentity client id
$aad_pod_identity_clientid = az aks show -g $aksResourceGroup -n $aksName --query identityProfile.kubeletidentity.clientId -otsv

# Assign required roles for cluster over the resourcegroup
az role assignment create --role "Managed Identity Operator" --assignee $aad_pod_identity_clientid  --scope $resourceGroup_ResourceId
az role assignment create --role "Virtual Machine Contributor" --assignee $aad_pod_identity_clientid  --scope $resourceGroup_ResourceId

# Create autoscaler azure identity and get client id and resource id of the autoscaler identity
$autoScaleridentityName = "autoscaler-aad-identity"
az identity create --name $autoScaleridentityName  --resource-group $resourceGroup
$autoscaler_aad_identity_clientId = az identity show --name $autoScaleridentityName  --resource-group $resourceGroup --query clientId -otsv
$autoscaler_aad_identity_resourceId = az identity show --name $autoScaleridentityName  --resource-group $resourceGroup --query id -otsv

# Create the app azure identity and get client id and resource id of the app identity
$appIdentityName = "app-aad-identity"
az identity create --name app-aad-identity --resource-group $resourceGroup
$app_aad_identity_clientId = az identity show --name $appIdentityName --resource-group $resourceGroup --query clientId -otsv
$app_aad_identity_resourceId = az identity show --name $appIdentityName --resource-group $resourceGroup --query id -otsv

# Create service bus and queue
$servicebus = 'svcbusdemo'
az servicebus namespace create --name $servicebus --resource-group $resourceGroup --sku basic
$servicebus_namespace_resourceId = az servicebus namespace show --name $servicebus --resource-group $resourceGroup --query id -otsv

az servicebus queue create --namespace-name $servicebus --name orders --resource-group $resourceGroup
$servicebus_queue_resourceId = az servicebus queue show --namespace-name $servicebus --name orders --resource-group $resourceGroup --query id -otsv

# Assign Service Bus Data Receiver role to the app identity created
az role assignment create --role 'Azure Service Bus Data Receiver' --assignee $app_aad_identity_clientId  --scope $servicebus_queue_resourceId

# Create a namespace for order app deployment
kubectl create namespace keda-dotnet-sample

# Create a yaml deployment configuration variable
$app_with_identity_yaml= @"
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: $appIdentityName
  annotations:
    aadpodidentity.k8s.io/Behavior: namespaced
spec:
  type: 0 # 0 means User-assigned MSI
  resourceID: $app_aad_identity_resourceId
  clientID: $app_aad_identity_clientId
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: $appIdentityName-binding
spec:
  azureIdentity: $appIdentityName
  selector: order-processor
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  labels:
    app: order-processor
spec:
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
        aadpodidbinding: order-processor
    spec:
      containers:
      - name: order-processor
        image: ghcr.io/kedacore/sample-dotnet-worker-servicebus-queue:latest
        env:
        - name: KEDA_SERVICEBUS_AUTH_MODE
          value: ManagedIdentity
        - name: KEDA_SERVICEBUS_HOST_NAME
          value: $servicebus.servicebus.windows.net
        - name: KEDA_SERVICEBUS_QUEUE_NAME
          value: orders
        - name: KEDA_SERVICEBUS_IDENTITY_USERASSIGNEDID
          value: $app_aad_identity_clientId
"@

# Create the app deployment with identity bindings using kubectl apply
$app_with_identity_yaml | kubectl apply --namespace keda-dotnet-sample -f -

# Now the order processor app works with the pod identity and 
# processes the queues 
# You can refer the [project ](https://github.com/kedacore/sample-dotnet-worker-servicebus-queue/blob/main/pod-identity.md) for that.

# Now start installation of KEDA in namespace keda-system

kubectl create namespace keda-system

# Create a pod identity and binding for autoscaler azure identity
$autoscaler_yaml =@"
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: $autoScaleridentityName
spec:
  type: 0 # 0 means User-assigned MSI
  resourceID: $autoscaler_aad_identity_resourceId
  clientID: $autoscaler_aad_identity_clientId
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: $autoScaleridentityName-binding
spec:
  azureIdentity: $autoScaleridentityName
  selector: $autoScaleridentityName
"@
$autoscaler_yaml | kubectl apply --namespace keda-system -f -

# Install KEDA using helm
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=autoscaler-aad-identity --namespace keda-system

# Assign Service Bus Data Owner role to keda autoscaler identity
az role assignment create --role 'Azure Service Bus Data Owner' --assignee $autoscaler_aad_identity_clientId --scope $servicebus_namespace_resourceId

# Apply scaled object definition and trigger authentication provider as `azure`
$aap_autoscaling_yaml = @"
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service-bus-orders
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1 
kind: ScaledObject
metadata:
  name: order-scaler
spec:
  scaleTargetRef:
    name: order-processor
  # minReplicaCount: 0 Change to define how many minimum replicas you want
  maxReplicaCount: 10
  triggers:
  - type: azure-servicebus
    metadata:
      namespace: $servicebus
      queueName: orders
      messageCount: '5'
    authenticationRef:
      name: trigger-auth-service-bus-orders
"@

$aap_autoscaling_yaml | kubectl apply --namespace keda-dotnet-sample -f -

# Now the Keda is getting 401 unauthorized error as the AAD Pod Identity comnponent `nmi` is not runnig on the system
# To fix it edit the daemonset for `nmi` component
# add the container arg `--allow-network-plugin-kubenet=true` by editing the `daemonset.apps/aad-pod-identity-nmi`
kubectl edit daemonset.apps/aad-pod-identity-nmi -n aad-pod-identity

# the containe arg section should look like this after editing:
    spec:
      containers:
      - args:
        - --node=$(NODE_NAME)
        - --http-probe-port=8085
        - --enableScaleFeatures=true
        - --metadata-header-required=true
        - --operation-mode=standard
        - --kubelet-config=/etc/default/kubelet
        - --allow-network-plugin-kubenet=true
        env:

# Now the KEDA is authenticated by aad-pod-identity metadata endpoint and the orderapp should scale up 
# with the queue counts
# If the order app still falls back to errors please delete and redeploy it.
# And that's it you just scaled your app up using KEDA on Kubenet AKS cluster.
Note: Read this instruction before you run AAD Identity On a Kubenet powered AKS.
-- iarunpaul
Source: StackOverflow

10/13/2021

Unfortunately this looks like an issue with the identity itself and with AD Pod identities, they can be a bit flaky (based on my experiences)

-- Tom Kerkhove
Source: StackOverflow