Why do my deployments go into a pending state when one node in the cluster runs out of requestable memory?

12/27/2019

I'm new to Kubernetes and I'm playing around with K3s. I'm testing the capacity of my servers by turning up the replicas on a deployment. At 150 replicas, my weakest server hit its memory request ceiling: no more memory could be allocated to new pods, so they went into a "Pending" state. The node shows "Ready,SchedulingDisabled" as its status, which all makes sense.

What I don't understand is why the other server isn't picking up these pending pods and running them there.

# kubectl get nodes -A
NAME                 STATUS                     ROLES    AGE     VERSION
titan.zircon.local   Ready                      master   3d23h   v1.16.3-k3s.2
nexus.zircon.local   Ready,SchedulingDisabled   worker   2d21h   v1.16.3-k3s.2

# kubectl top nodes
NAME                 CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
nexus.zircon.local   170m         2%     3008Mi          25%       
titan.zircon.local   316m         0%     2313Mi          1%        

default       fedora-7c586469b4-ldq8l                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-sbzph                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-74mmh                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-ldq8l                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-k4669                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-6gg2c                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-sbzph                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-74mmh                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-k4669                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-6gg2c                   0/1     Pending             0          0s      <none>        <none>               <none>           <none>
default       fedora-7c586469b4-94tx6                   0/1     Pending             0          7m43s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-k8bdl                   0/1     Pending             0          7m43s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-8d64k                   0/1     Pending             0          7m43s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-ldq8l                   0/1     Pending             0          2m59s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-sbzph                   0/1     Pending             0          2m59s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-74mmh                   0/1     Pending             0          2m59s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-k4669                   0/1     Pending             0          2m59s   <none>        <none>               <none>           <none>
default       fedora-7c586469b4-6gg2c                   0/1     Pending             0          2m59s   <none>        <none>               <none>           <none>

Here is the deployment config:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"fedora"},"name":"fedora","namespace":"default"},"spec":{"replicas":1,"selector":{"matchLabels
  creationTimestamp: "2019-12-23T21:38:15Z"
  generation: 7
  labels:
    app: fedora
  name: fedora
  namespace: default
  resourceVersion: "281186"
  selfLink: /apis/apps/v1/namespaces/default/deployments/fedora
  uid: 729b502f-6b4e-4a44-b870-e0d6e6b9929e
spec:
  progressDeadlineSeconds: 600
  replicas: 155
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: fedora
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: fedora
    spec:
      containers:
      - args:
        - -c
        - while true; do echo hello; sleep 10;done
        command:
        - /bin/sh
        image: docker.io/fedora:latest
        imagePullPolicy: Always
        name: fedora
        resources:
          limits:
            cpu: 80m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 147
  conditions:
  - lastTransitionTime: "2019-12-23T21:38:15Z"
    lastUpdateTime: "2019-12-23T21:38:46Z"
    message: ReplicaSet "fedora-7c586469b4" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-12-27T19:48:41Z"
    lastUpdateTime: "2019-12-27T19:48:41Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 7
  readyReplicas: 147
  replicas: 155
  unavailableReplicas: 8
  updatedReplicas: 155
-- SkyVar
k3s
kubectl
kubernetes

1 Answer

12/27/2019

I believe I found the answer to my question. Each node has a ceiling of 110 pods (the kubelet's default maximum), and my master node (the one with memory to spare) has already hit it, so the scheduler treats it as full.
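
A quick way to compare each node's pod limit with what is already assigned to it, as a minimal sketch. The `pods_per_node` helper is a hypothetical name of my own, not part of kubectl. The kubectl calls only run if kubectl is installed, and the count includes completed pods, so it can come out slightly higher than the "Non-terminated Pods" figure from `kubectl describe node`.

```shell
#!/bin/sh
# Hypothetical helper: reads one node name per line on stdin (one line per pod)
# and prints "<node> <pod count>". Pods that have not been scheduled yet have an
# empty node name and are grouped under "<unscheduled>".
pods_per_node() {
  awk '{ node = ($1 == "" ? "<unscheduled>" : $1); count[node]++ }
       END { for (n in count) print n, count[n] }' | sort
}

if command -v kubectl >/dev/null 2>&1; then
  # Allocatable pod slots per node (the "pods: 110" line in describe output).
  kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods

  # Number of pods currently assigned to each node.
  kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | pods_per_node

  # Scheduler events explaining why pending pods could not be placed.
  kubectl get events -A --field-selector reason=FailedScheduling
fi
```

If a node's pod count equals its allocatable `pods` value, the scheduler will not place anything else there, no matter how much CPU or memory is free.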

# kubectl describe node titan.zircon.local
Name:               titan.zircon.local
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    k3s.io/hostname=titan.zircon.local
                    k3s.io/internal-ip=192.168.1.96
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=titan.zircon.local
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=true
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"d2:3d:24:54:f1:cb"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.96
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 23 Dec 2019 15:54:12 -0500
Taints:             <none>
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 27 Dec 2019 15:31:40 -0500   Fri, 27 Dec 2019 15:31:40 -0500   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Fri, 27 Dec 2019 15:55:49 -0500   Mon, 23 Dec 2019 15:54:12 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 27 Dec 2019 15:55:49 -0500   Mon, 23 Dec 2019 15:54:12 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 27 Dec 2019 15:55:49 -0500   Mon, 23 Dec 2019 15:54:12 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 27 Dec 2019 15:55:49 -0500   Fri, 27 Dec 2019 15:31:39 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.96
  Hostname:    titan.zircon.local
Capacity:
 cpu:                32
 ephemeral-storage:  7623778Mi
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             131837268Ki
 pods:               110 <============HERE
Allocatable:
 cpu:                32
 ephemeral-storage:  7594405102166
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             131837268Ki
 pods:               110 <============HERE
System Info:
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  containerd://1.3.0-k3s.5
 Kubelet Version:            v1.16.3-k3s.2
 Kube-Proxy Version:         v1.16.3-k3s.2
PodCIDR:                     10.42.0.0/24
PodCIDRs:                    10.42.0.0/24
ProviderID:                  k3s://titan.zircon.local
Non-terminated Pods:         (110 in total)
  Namespace                  Name                                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                       ------------  ----------  ---------------  -------------  ---
  kube-system                metrics-server-6d684c7b5-5kpbr             0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d
  kube-system                local-path-provisioner-58fb86bdfd-7b2ss    0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d
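
To see why the pod count is the limit that bites on titan rather than memory or CPU, here is a rough calculation using the Allocatable values above. It assumes each fedora pod requests 256Mi and 80m (when a container sets only limits, its requests default to those limits) and it ignores whatever is already running on the node, so treat the numbers as upper bounds.

```shell
#!/bin/sh
# Upper bound on fedora pods that fit on titan, per resource.
mem_ki=131837268        # Allocatable memory, in Ki
cpu_m=$((32 * 1000))    # 32 allocatable cores, in millicores
pod_slots=110           # Allocatable pods

req_mem_ki=$((256 * 1024))  # 256Mi per pod (assumed request = limit)
req_cpu_m=80                # 80m per pod (assumed request = limit)

by_mem=$((mem_ki / req_mem_ki))
by_cpu=$((cpu_m / req_cpu_m))

# The scheduler is bound by the tightest of the three.
fit=$by_mem
if [ "$by_cpu" -lt "$fit" ]; then fit=$by_cpu; fi
if [ "$pod_slots" -lt "$fit" ]; then fit=$pod_slots; fi

echo "by memory: $by_mem, by CPU: $by_cpu, by pod slots: $pod_slots, schedulable: $fit"
```

Memory would allow 502 pods and CPU 400, but the pod limit stops the node at 110. That limit comes from the kubelet's `--max-pods` setting. K3s can pass kubelet flags through `--kubelet-arg`, so something like `--kubelet-arg=max-pods=200` should raise it, though that is an assumption I have not tested on this cluster. Note that the node's PodCIDR is a /24, which caps the pod IPs available on it at roughly 254.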
-- SkyVar
Source: StackOverflow