I am setting up a cluster in Azure, using acs-engine to build a Kubernetes cluster with VMSS-backed agent pools. Once the cluster is up, I add the cluster-autoscaler to manage two dedicated agent pools, one CPU and one GPU. Both scale sets are configured to scale down to 0, and through acs-engine I have given them taints and custom labels. Scale-up and scale-down work as long as the scale set still has running VMs in it. Once a scale set has scaled down to 0, however, I cannot get the autoscaler to spin a node back up when a new pod is scheduled. I am not sure what I am doing wrong, or whether I am missing some config, label, taint, etc. I only started using Kubernetes recently.
Below are my acs-engine JSON, the pod definition, the autoscaler logs, and the output of kubectl describe pod.
Output from kubectl logs -n kube-system cluster-autoscaler-5967b96496-jnvjr
I0920 16:11:14.925761 1 scale_up.go:249] Pod default/my-test-pod is unschedulable
I0920 16:11:14.999323 1 utils.go:196] Pod my-test-pod can't be scheduled on k8s-pool2-24760778-vmss, predicate failed: GeneralPredicates predicate mismatch, cannot put default/my-test-pod on template-node-for-k8s-pool2-24760778-vmss-6220731686255962863, reason: node(s) didn't match node selector
I0920 16:11:14.999408 1 utils.go:196] Pod my-test-pod can't be scheduled on k8s-pool3-24760778-vmss, predicate failed: GeneralPredicates predicate mismatch, cannot put default/my-test-pod on template-node-for-k8s-pool3-24760778-vmss-3043543739698957784, reason: node(s) didn't match node selector
I0920 16:11:14.999442 1 scale_up.go:376] No expansion options
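To make the "didn't match node selector" message above more concrete, here is a minimal Python sketch of how a nodeSelector is compared with a node's labels: every key/value pair in the selector must be present on the node. The function name and the labels on the template node are hypothetical. I do not know which labels the autoscaler actually puts on the template node it builds for an empty scale set, so the single label below is only an assumption for illustration.

```python
# Sketch only: mimics the nodeSelector comparison, not the autoscaler's real code.

def unmatched_selector_keys(node_selector, node_labels):
    """Return the selector entries that the node's labels do not satisfy."""
    return {
        key: value
        for key, value in node_selector.items()
        if node_labels.get(key) != value
    }


# nodeSelector taken from the pod definition in this question.
pod_node_selector = {
    "agentpool": "pool2",
    "environment": "DEV",
    "hardware": "cpu-spec",
    "node-template": "k8s-pool2-24760778-vmss",
    "vmSize": "Standard_D4s_v3",
}

# Hypothetical template node: assumes only a generic instance-type label is set.
hypothetical_template_labels = {
    "beta.kubernetes.io/instance-type": "Standard_D4s_v3",
}

missing = unmatched_selector_keys(pod_node_selector, hypothetical_template_labels)
if missing:
    print("node(s) didn't match node selector:", sorted(missing))
```

If the template node really does lack the custom labels, any selector key that relies on them would fail this comparison, even though a real node from the same scale set would carry them.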
Output from kubectl describe pod my-test-pod
Name:               my-test-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"my-test-pod","namespace":"default"},"spec":{"affinity":{"nodeAffinity":{"preferred...
Status:             Pending
IP:
Containers:
  my-test-pod:
    Image:      ubuntu:latest
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -ec
      while :; do echo '.'; sleep 5; done
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qzm6s (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-qzm6s:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qzm6s
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  agentpool=pool2
                 environment=DEV
                 hardware=cpu-spec
                 node-template=k8s-pool2-24760778-vmss
                 vmSize=Standard_D4s_v3
Tolerations:     dedicated=pool2:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   2m (x273 over 17m)  default-scheduler   0/3 nodes are available: 3 node(s) didn't match node selector.
  Normal   NotTriggerScaleUp  2m (x89 over 17m)   cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
acs-engine config file (using terraform to render and generate)
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.11",
      "kubernetesConfig": {
        "networkPlugin": "azure",
        "clusterSubnet": "${cidr}",
        "privateCluster": {
          "enabled": true
        },
        "addons": [
          {
            "name": "nvidia-device-plugin",
            "enabled": true
          },
          {
            "name": "cluster-autoscaler",
            "enabled": true,
            "config": {
              "minNodes": "0",
              "maxNodes": "2",
              "image": "gcr.io/google-containers/cluster-autoscaler:1.3.1"
            }
          }
        ]
      }
    },
    "masterProfile": {
      "count": ${master_vm_count},
      "dnsPrefix": "${dns_prefix}",
      "vmSize": "${master_vm_size}",
      "storageProfile": "ManagedDisks",
      "vnetSubnetId": "${pool_subnet_id}",
      "firstConsecutiveStaticIP": "${first_master_ip}",
      "vnetCidr": "${cidr}"
    },
    "agentPoolProfiles": [
      {
        "name": "pool3",
        "count": ${dedicated_vm_count},
        "vmSize": "${dedicated_vm_size}",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 31,
        "vnetSubnetId": "${pool_subnet_id}",
        "customNodeLabels": {
          "vmSize": "${dedicated_vm_size}",
          "dedicatedOnly": "true",
          "environment": "${environment}",
          "hardware": "${dedicated_spec}"
        },
        "availabilityProfile": "VirtualMachineScaleSets",
        "scaleSetEvictionPolicy": "Delete",
        "kubernetesConfig": {
          "kubeletConfig": {
            "--register-with-taints": "dedicated=pool3:NoSchedule"
          }
        }
      },
      {
        "name": "pool2",
        "count": ${pool2_vm_count},
        "vmSize": "${pool2_vm_size}",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 31,
        "vnetSubnetId": "${pool_subnet_id}",
        "availabilityProfile": "VirtualMachineScaleSets",
        "customNodeLabels": {
          "vmSize": "${pool2_vm_size}",
          "environment": "${environment}",
          "hardware": "${pool_spec}"
        },
        "kubernetesConfig": {
          "kubeletConfig": {
            "--register-with-taints": "dedicated=pool2:NoSchedule"
          }
        }
      },
      {
        "name": "pool1",
        "count": ${pool1_vm_count},
        "vmSize": "${pool1_vm_size}",
        "storageProfile": "ManagedDisks",
        "OSDiskSizeGB": 31,
        "vnetSubnetId": "${pool_subnet_id}",
        "availabilityProfile": "VirtualMachineScaleSets",
        "customNodeLabels": {
          "vmSize": "${pool1_vm_size}",
          "environment": "${environment}",
          "hardware": "${pool_spec}"
        }
      }
    ],
    "linuxProfile": {
      "adminUsername": "${admin_user}",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "${ssh_key}"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "${service_principal_client_id}",
      "secret": "${service_principal_client_secret}"
    }
  }
}
Pod config file
apiVersion: v1
kind: Pod
metadata:
  name: my-test-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: vmSize
            operator: In
            values:
            - Standard_D4s_v3
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: hardware
            operator: In
            values:
            - cpu-spec
  nodeSelector:
    agentpool: pool2
    hardware: cpu-spec
    vmSize: Standard_D4s_v3
    environment: DEV
    node-template: k8s-pool2-24760778-vmss
  tolerations:
  - key: dedicated
    operator: Equal
    value: pool2
    effect: NoSchedule
  containers:
  - name: my-test-pod
    image: ubuntu:latest
    command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5; done"]
  restartPolicy: Never
I've tried variations of the nodeAffinity, nodeSelector, and tolerations, adding and removing each of them, all with the same outcome.
After the cluster is up, I add pool2 to the autoscaler. While searching the Internet for a solution, I keep running across posts about a node-template label, I think in the form of k8s.io/autoscaler/cluster-autoscaler/node-template/label/value, but that seems to be needed for AWS.
Can anyone provide me any direction with this on Azure?
Thank you.
Update.
I have figured out the answer to this. After removing the requiredDuringSchedulingIgnoredDuringExecution node affinity rule and using only preferredDuringSchedulingIgnoredDuringExecution, the autoscaler properly spins up a new VM in the scale set.
apiVersion: v1
kind: Pod
metadata:
  name: my-test-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: hardware
            operator: In
            values:
            - cpu-spec
  nodeSelector:
    agentpool: pool2
    hardware: cpu-spec
    vmSize: Standard_D4s_v3
    environment: DEV
    node-template: k8s-pool2-24760778-vmss
  tolerations:
  - key: dedicated
    operator: Equal
    value: pool2
    effect: NoSchedule
  containers:
  - name: my-test-pod
    image: ubuntu:latest
    command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5; done"]
  restartPolicy: Never
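For anyone comparing the two pod definitions, the difference between the two affinity rules can be sketched in Python: a required rule is a hard filter that a node must pass, while a preferred rule only adds weight to nodes that happen to match. This is a simplified illustration rather than the scheduler's or autoscaler's actual code. The function names are hypothetical, only the In operator is handled, and the empty label set stands in for a node that lacks the custom labels (an assumption, since I have not inspected the real template node).

```python
# Sketch only: simplified node affinity semantics, In operator only.

def matches_expression(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    if expr["operator"] != "In":
        raise ValueError("this sketch only supports the In operator")
    return labels.get(expr["key"]) in expr["values"]


def node_is_eligible(required_terms, labels):
    """Required terms are ORed; the expressions inside one term are ANDed."""
    if not required_terms:
        return True  # no required rule, so nothing is filtered out
    return any(
        all(matches_expression(e, labels) for e in term["matchExpressions"])
        for term in required_terms
    )


def preference_score(preferred_terms, labels):
    """Sum the weights of the preferred terms that the node satisfies."""
    return sum(
        term["weight"]
        for term in preferred_terms
        if all(
            matches_expression(e, labels)
            for e in term["preference"]["matchExpressions"]
        )
    )


# Rules taken from the original pod definition in this question.
required_terms = [
    {"matchExpressions": [
        {"key": "vmSize", "operator": "In", "values": ["Standard_D4s_v3"]},
    ]},
]
preferred_terms = [
    {"weight": 1, "preference": {"matchExpressions": [
        {"key": "hardware", "operator": "In", "values": ["cpu-spec"]},
    ]}},
]

bare_labels = {}  # hypothetical node without the custom labels

print(node_is_eligible(required_terms, bare_labels))  # required rule rejects it
print(node_is_eligible([], bare_labels))              # without it, node passes
print(preference_score(preferred_terms, bare_labels)) # preference only scores
```

Under these assumptions, a node without the vmSize label is rejected outright by the required rule, whereas the preferred rule merely leaves it with a lower score.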