KEDA scaler not scales with scaled object defined with trigger using pod identity for authentication for service bus queue.
I'm following this KEDA service bus triggered scaling project.
The scaling works fine with the connection string, but when I try to scale using the pod identity for KEDA scaler the keda operator fails to get the azure identity bound to it with the following keda operator error message log:
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).isScaledObjectActive
/workspace/pkg/scaling/scale_handler.go:228
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
/workspace/pkg/scaling/scale_handler.go:211
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
/workspace/pkg/scaling/scale_handler.go:145
2021-10-10T17:35:53.916Z ERROR azure_servicebus_scaler error {"error": "failed to refresh token, error: adal: Refresh request failed. Status Code = '400'. Response body: {\"error\":\"invalid_request\",\"error_description\":\"Identity not found\"}\n"}
Edited on 11/09/2021 I opened a github issue at keda, and we did some troubleshoot. But it seems like an issue with AAD Pod Identity as @Tom suggests. The AD Pod Identity MIC pod gives logs like this:
E1109 03:15:34.391759 1 mic.go:1111] failed to update user-assigned identities on node aks-agentpool-14229154-vmss (add [2], del [0], update[0]), error: failed to update identities for aks-agentpool-14229154-vmss in MC_Arun_democluster_westeurope, error: compute.VirtualMachineScaleSetsClient#Update: Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed" Message="The client 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' with object id 'fe0d7679-8477-48e3-ae7d-43e2a6fdb957' has permission to perform action 'Microsoft.Compute/virtualMachineScaleSets/write' on scope '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourceGroups/MC_Arun_democluster_westeurope/providers/Microsoft.Compute/virtualMachineScaleSets/aks-agentpool-14229154-vmss'; however, it does not have permission to perform action 'Microsoft.ManagedIdentity/userAssignedIdentities/assign/action' on the linked scope(s) '/subscriptions/f3786c6b-8dca-417d-af3f-23929e8b4129/resourcegroups/arun/providers/microsoft.managedidentity/userassignedidentities/autoscaler-id' or the linked scope(s) are invalid."
Any clues how to fix it?
My scaler objects' definition is as below:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: trigger-auth-service-bus-orders
spec:
podIdentity:
provider: azure
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-scaler
spec:
scaleTargetRef:
name: order-processor
# minReplicaCount: 0 Change to define how many minimum replicas you want
maxReplicaCount: 10
triggers:
- type: azure-servicebus
metadata:
namespace: demodemobus
queueName: orders
messageCount: '5'
authenticationRef:
name: trigger-auth-service-bus-orders
Im deploying the azure identity to the namespace keda
where my keda deployment resides.
And installs KEDA with the following command to set the pod identity binding
using helm:
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=app-autoscaler --namespace keda
Expected Behavior The KEDA scaler should have worked fine with the assigned pod identity and access token to perform scaling
Actual Behavior The KEDA operator could not be able to find the azure identity assigned and scaling fails
Scaler Used Azure Service Bus
Steps to Reproduce the Problem 1. Create the azure identity and bindings for the KEDA 2. Install KEDA with the aadpodidentitybinding 3. Create the scaledobject and triggerauthentication using KEDA pod identity 4. The scaler fails to authenticate and scale
First and foremost, I am using AKS with kubenet plugin.
By default 'AAD Pod Identity is disabled by default on Clusters with Kubenet starting from release v1.7.'
This is because of the Kubenet is vulnerable to ARP Spoofing. Please read it here.
Even then you can have a workaround to enable the KEDA scaling in Kubenet powered AKS.(The script holds good for other CNI's also, except that you dont need to edit anything with the aad-pod-identity
component nmi daemonset
definition yaml, if it runs well with your cluster plugins.).
Below I'm adding an e2e script for the same. Please visit the github issue for access to all the discussions.
# Define aks name and resource group
$aksResourceGroup = "K8sScalingDemo"
$aksName = "K8sScalingDemo"
# Create resource group
az group create -n $aksResourceGroup -l centralindia
# Create the aks cluster with default kubenet plugin
az aks create -n $aksName -g $aksResourceGroup
# Resourcegroup where the aks resources will be deployed
$resourceGroup = "$(az aks show -g $aksResourceGroup -n $aksName --query nodeResourceGroup -otsv)"
# Set the kubectl context to the newly created aks cluster
az aks get-credentials -n $aksName -g $aksResourceGroup
# Install AAD Pod Identity into the aad-pod-identity namespace using helm
kubectl create namespace aad-pod-identity
helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
helm install aad-pod-identity aad-pod-identity/aad-pod-identity --namespace aad-pod-identity
# Check the status of installation
kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=mic"
kubectl --namespace=aad-pod-identity get pods -l "app.kubernetes.io/component=nmi"
# the nmi components will Crashloop, ignore them for now. We will make them right later
# Get Resourcegroup Id of our $ResourceGroup
$resourceGroup_ResourceId = az group show --name $resourceGroup --query id -otsv
# Get the aks cluster kubeletidentity client id
$aad_pod_identity_clientid = az aks show -g $aksResourceGroup -n $aksName --query identityProfile.kubeletidentity.clientId -otsv
# Assign required roles for cluster over the resourcegroup
az role assignment create --role "Managed Identity Operator" --assignee $aad_pod_identity_clientid --scope $resourceGroup_ResourceId
az role assignment create --role "Virtual Machine Contributor" --assignee $aad_pod_identity_clientid --scope $resourceGroup_ResourceId
# Create autoscaler azure identity and get client id and resource id of the autoscaler identity
$autoScaleridentityName = "autoscaler-aad-identity"
az identity create --name $autoScaleridentityName --resource-group $resourceGroup
$autoscaler_aad_identity_clientId = az identity show --name $autoScaleridentityName --resource-group $resourceGroup --query clientId -otsv
$autoscaler_aad_identity_resourceId = az identity show --name $autoScaleridentityName --resource-group $resourceGroup --query id -otsv
# Create the app azure identity and get client id and resource id of the app identity
$appIdentityName = "app-aad-identity"
az identity create --name app-aad-identity --resource-group $resourceGroup
$app_aad_identity_clientId = az identity show --name $appIdentityName --resource-group $resourceGroup --query clientId -otsv
$app_aad_identity_resourceId = az identity show --name $appIdentityName --resource-group $resourceGroup --query id -otsv
# Create service bus and queue
$servicebus = 'svcbusdemo'
az servicebus namespace create --name $servicebus --resource-group $resourceGroup --sku basic
$servicebus_namespace_resourceId = az servicebus namespace show --name $servicebus --resource-group $resourceGroup --query id -otsv
az servicebus queue create --namespace-name $servicebus --name orders --resource-group $resourceGroup
$servicebus_queue_resourceId = az servicebus queue show --namespace-name $servicebus --name orders --resource-group $resourceGroup --query id -otsv
# Assign Service Bus Data Receiver role to the app identity created
az role assignment create --role 'Azure Service Bus Data Receiver' --assignee $app_aad_identity_clientId --scope $servicebus_queue_resourceId
# Create a namespace for order app deployment
kubectl create namespace keda-dotnet-sample
# Create a yaml deployment configuration variable
$app_with_identity_yaml= @"
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
name: $appIdentityName
annotations:
aadpodidentity.k8s.io/Behavior: namespaced
spec:
type: 0 # 0 means User-assigned MSI
resourceID: $app_aad_identity_resourceId
clientID: $app_aad_identity_clientId
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
name: $appIdentityName-binding
spec:
azureIdentity: $appIdentityName
selector: order-processor
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-processor
labels:
app: order-processor
spec:
selector:
matchLabels:
app: order-processor
template:
metadata:
labels:
app: order-processor
aadpodidbinding: order-processor
spec:
containers:
- name: order-processor
image: ghcr.io/kedacore/sample-dotnet-worker-servicebus-queue:latest
env:
- name: KEDA_SERVICEBUS_AUTH_MODE
value: ManagedIdentity
- name: KEDA_SERVICEBUS_HOST_NAME
value: $servicebus.servicebus.windows.net
- name: KEDA_SERVICEBUS_QUEUE_NAME
value: orders
- name: KEDA_SERVICEBUS_IDENTITY_USERASSIGNEDID
value: $app_aad_identity_clientId
"@
# Create the app deployment with identity bindings using kubectl apply
$app_with_identity_yaml | kubectl apply --namespace keda-dotnet-sample -f -
# Now the order processor app works with the pod identity and
# processes the queues
# You can refer the [project ](https://github.com/kedacore/sample-dotnet-worker-servicebus-queue/blob/main/pod-identity.md) for that.
# Now start installation of KEDA in namespace keda-system
kubectl create namespace keda-system
# Create a pod identity and binding for autoscaler azure identity
$autoscaler_yaml =@"
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
name: $autoScaleridentityName
spec:
type: 0 # 0 means User-assigned MSI
resourceID: $autoscaler_aad_identity_resourceId
clientID: $autoscaler_aad_identity_clientId
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
name: $autoScaleridentityName-binding
spec:
azureIdentity: $autoScaleridentityName
selector: $autoScaleridentityName
"@
$autoscaler_yaml | kubectl apply --namespace keda-system -f -
# Install KEDA using helm
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=autoscaler-aad-identity --namespace keda-system
# Assign Service Bus Data Owner role to keda autoscaler identity
az role assignment create --role 'Azure Service Bus Data Owner' --assignee $autoscaler_aad_identity_clientId --scope $servicebus_namespace_resourceId
# Apply scaled object definition and trigger authentication provider as `azure`
$aap_autoscaling_yaml = @"
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: trigger-auth-service-bus-orders
spec:
podIdentity:
provider: azure
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: order-scaler
spec:
scaleTargetRef:
name: order-processor
# minReplicaCount: 0 Change to define how many minimum replicas you want
maxReplicaCount: 10
triggers:
- type: azure-servicebus
metadata:
namespace: $servicebus
queueName: orders
messageCount: '5'
authenticationRef:
name: trigger-auth-service-bus-orders
"@
$aap_autoscaling_yaml | kubectl apply --namespace keda-dotnet-sample -f -
# Now the Keda is getting 401 unauthorized error as the AAD Pod Identity comnponent `nmi` is not runnig on the system
# To fix it edit the daemonset for `nmi` component
# add the container arg `--allow-network-plugin-kubenet=true` by editing the `daemonset.apps/aad-pod-identity-nmi`
kubectl edit daemonset.apps/aad-pod-identity-nmi -n aad-pod-identity
# the containe arg section should look like this after editing:
spec:
containers:
- args:
- --node=$(NODE_NAME)
- --http-probe-port=8085
- --enableScaleFeatures=true
- --metadata-header-required=true
- --operation-mode=standard
- --kubelet-config=/etc/default/kubelet
- --allow-network-plugin-kubenet=true
env:
# Now the KEDA is authenticated by aad-pod-identity metadata endpoint and the orderapp should scale up
# with the queue counts
# If the order app still falls back to errors please delete and redeploy it.
# And that's it you just scaled your app up using KEDA on Kubenet AKS cluster.
Unfortunately this looks like an issue with the identity itself and with AD Pod identities, they can be a bit flaky (based on my experiences)