Kubernetes error: scheduler reports nodes have insufficient memory, but they actually have sufficient memory

4/17/2020

I have a Kubernetes pod that should be replacing an older pod, but it is stuck in Pending with these events on the pod:

  Normal   TriggeredScaleUp   14m (x56 over 1d)   cluster-autoscaler  pod triggered scale-up: [{nodes.sand.k8s.local 4->5 (max: 7)}]
  Normal   NotTriggerScaleUp  3m (x1838 over 1d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  Warning  FailedScheduling   1m (x8556 over 1d)  default-scheduler   0/5 nodes are available: 2 Insufficient memory, 3 PodToleratesNodeTaints.
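
These events can also be pulled straight off the pending pod; a minimal sketch, assuming the current namespace and the pod name that appears in the kubectl get events output further down:

    # full describe output, including the Events section quoted above
    kubectl describe pod foolish-dingo-sand-web-57c44b7b94-zm974
    # or just that pod's events, newest last
    kubectl get events --field-selector involvedObject.name=foolish-dingo-sand-web-57c44b7b94-zm974 --sort-by=.lastTimestamp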

The 3 PodToleratesNodeTaints failures make sense, but the 2 Insufficient memory ones don't, because there's enough memory available on both of those nodes.

Pod resource requests and limits:

    Limits:
      cpu:     1
      memory:  1717986918400m
    Requests:
      cpu:      100m
      memory:   600Mi
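
(The odd-looking limit value is just Kubernetes' milli notation: 1717986918400m is 1,717,986,918,400 millibytes, which works out to 1.6Gi.) The resources block can be read straight off the pod spec; a minimal sketch, again assuming the pod name from the events output below:

    # print the requests/limits as specified on the pod
    kubectl get pod foolish-dingo-sand-web-57c44b7b94-zm974 -o jsonpath='{.spec.containers[*].resources}'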

Node #1 memory stats:

Capacity:
 cpu:     1
 memory:  2050944Ki
 pods:    110
Allocatable:
 cpu:     1
 memory:  1948544Ki
 pods:    110


Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests      Limits
  --------  --------      ------
  cpu       320m (32%)    2 (200%)
  memory    1700Mi (89%)  3960261836800m (198%)

Node #2 memory stats:

Capacity:
 cpu:     1
 memory:  2050944Ki
 pods:    110
Allocatable:
 cpu:     1
 memory:  1948544Ki
 pods:    110


Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests      Limits
  --------  --------      ------
  cpu       320m (32%)    2 (200%)
  memory    1700Mi (89%)  3960261836800m (198%)
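
Both node summaries above come from kubectl describe node; to line allocatable memory up against what is already requested on each node, something like this works (node names are placeholders):

    # allocatable memory per node
    kubectl get nodes -o custom-columns=NAME:.metadata.name,MEM_ALLOCATABLE:.status.allocatable.memory
    # requests/limits already allocated on a specific node
    kubectl describe node <node-name> | grep -A 8 'Allocated resources'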

I've tried deleting some live pods to see if that might re-trigger the rollout properly for the pending pod, but it just re-creates new versions of the same pods.
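
Another nudge I could try, short of deleting pods, is restarting the owning Deployment; a minimal sketch, assuming the Deployment is named foolish-dingo-sand-web (inferred from the pod name prefix) and kubectl/cluster are 1.15+ so rollout restart exists:

    kubectl rollout restart deployment foolish-dingo-sand-web
    kubectl rollout status deployment foolish-dingo-sand-web

Presumably, though, if the scheduler still can't fit the replacement pod it will just sit in Pending again.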

Recent output of kubectl get events shows the cluster autoscaler continuing to try to roll this out for the past day:

LAST SEEN   FIRST SEEN   COUNT     NAME                                                           KIND      SUBOBJECT   TYPE      REASON              SOURCE               MESSAGE
1m          1d           8575      foolish-dingo-sand-web-57c44b7b94-zm974.16062dd687c20c37   Pod                   Warning   FailedScheduling    default-scheduler    0/5 nodes are available: 2 Insufficient memory, 3 PodToleratesNodeTaints.
2m          1d           1850      foolish-dingo-sand-web-57c44b7b94-zm974.16062e4cb3a94085   Pod                   Normal    NotTriggerScaleUp   cluster-autoscaler   pod didn't trigger scale-up (it wouldn't fit if a new node is added)
19m         1d           56        foolish-dingo-sand-web-57c44b7b94-zm974.16062e4f3ce0a190   Pod                   Normal    TriggeredScaleUp    cluster-autoscaler   pod triggered scale-up: [{nodes.sand.k8s.local 4->5 (max: 7)}]
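
To keep an eye on just the scheduling failures while experimenting, the events can be filtered by reason; a minimal sketch:

    # watch only the FailedScheduling events as they arrive
    kubectl get events --field-selector reason=FailedScheduling --watch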

Is there something I'm missing, another way to debug this more deeply, or a way to force the rollout somehow?

-- alyx
kubernetes
