I have a Kubernetes pod that should be replacing an older pod, but it is stuck in Pending with the following events on the pod:
    Normal   TriggeredScaleUp   14m (x56 over 1d)   cluster-autoscaler  pod triggered scale-up: [{nodes.sand.k8s.local 4->5 (max: 7)}]
    Normal   NotTriggerScaleUp  3m (x1838 over 1d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
    Warning  FailedScheduling   1m (x8556 over 1d)  default-scheduler   0/5 nodes are available: 2 Insufficient memory, 3 PodToleratesNodeTaints.
The "3 PodToleratesNodeTaints" part makes sense, but the "2 Insufficient memory" part doesn't, because as far as I can tell there's enough memory available on both of the untainted nodes.
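For context, I'm judging "enough memory" from node-level output along these lines (node name is a placeholder):

    kubectl top nodes
    kubectl describe node <node-name> | grep -A 7 'Allocated resources'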
Pod resource requests and limits:

    Limits:
      cpu:     1
      memory:  1717986918400m
    Requests:
      cpu:     100m
      memory:  600Mi
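(As an aside, that odd-looking memory limit is Kubernetes' milli-units notation for a fractional byte count; if my arithmetic is right it's just 1.6Gi:)

    # 1717986918400m bytes = 1,717,986,918.4 bytes; convert to MiB (integer math truncates)
    echo $(( 1717986918400 / 1000 / 1024 / 1024 ))   # prints 1638, i.e. ~1.6Gi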
Node #1 memory stats:

    Capacity:
      cpu:     1
      memory:  2050944Ki
      pods:    110
    Allocatable:
      cpu:     1
      memory:  1948544Ki
      pods:    110
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource  Requests      Limits
      --------  --------      ------
      cpu       320m (32%)    2 (200%)
      memory    1700Mi (89%)  3960261836800m (198%)
Node #2 memory stats:

    Capacity:
      cpu:     1
      memory:  2050944Ki
      pods:    110
    Allocatable:
      cpu:     1
      memory:  1948544Ki
      pods:    110
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource  Requests      Limits
      --------  --------      ------
      cpu       320m (32%)    2 (200%)
      memory    1700Mi (89%)  3960261836800m (198%)
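In case it's useful, the per-pod requests behind those "Allocated resources" totals can be listed with something like (node name is a placeholder):

    kubectl describe node <node-name> | sed -n '/Non-terminated Pods/,/Allocated resources/p'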
I've tried deleting some running pods to see whether that would re-trigger the rollout and let the pending pod schedule, but the deleted pods are just recreated in place.
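(Roughly what I ran, with the pod name as a placeholder:)

    kubectl delete pod <running-pod-name>
    kubectl get pods -w   # the ReplicaSet just spins up a replacement pod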
Recent output of kubectl get events shows the cluster autoscaler still retrying for the past day:
    LAST SEEN  FIRST SEEN  COUNT  NAME                                                      KIND  SUBOBJECT  TYPE     REASON             SOURCE              MESSAGE
    1m         1d          8575   foolish-dingo-sand-web-57c44b7b94-zm974.16062dd687c20c37  Pod              Warning  FailedScheduling   default-scheduler   0/5 nodes are available: 2 Insufficient memory, 3 PodToleratesNodeTaints.
    2m         1d          1850   foolish-dingo-sand-web-57c44b7b94-zm974.16062e4cb3a94085  Pod              Normal   NotTriggerScaleUp  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
    19m        1d          56     foolish-dingo-sand-web-57c44b7b94-zm974.16062e4f3ce0a190  Pod              Normal   TriggeredScaleUp   cluster-autoscaler  pod triggered scale-up: [{nodes.sand.k8s.local 4->5 (max: 7)}]
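(That table came from something along the lines of:)

    kubectl get events --sort-by='.lastTimestamp' | grep foolish-dingo-sand-web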
Is there something I'm missing, or is there another way to debug deeper or force the rollout somehow?