How to troubleshoot: Kubernetes pods not creating or terminating

12/19/2020

I am new to K8s, so I am having trouble getting to the bottom of the issue. Last week I installed a cluster with 1 master and 2 nodes on CentOS with kubeadm:

kubectl get nodes

NAME             STATUS   ROLES                  AGE    VERSION
ardl-k8latam01   Ready    control-plane,master   7d2h   v1.20.0
ardl-k8latam02   Ready    <none>                 7d2h   v1.20.0
ardl-k8latam03   Ready    <none>                 7d2h   v1.20.0

At first it was working fine, but it started failing after I began working with Helm (I don't know if it's related). Now I cannot run any deployment, and I have a lot of pods stuck in "Terminating" status that never finish. Here I am trying to apply kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml as an example:

[root@ardl-k8latam01 ~]# kubectl get all --all-namespaces
NAMESPACE     NAME                                          READY   STATUS        RESTARTS   AGE
default       pod/nginx-deployment-66b6c48dd5-2xt7b         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-5cttk         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-8bz2f         0/1     Pending       0          18h
default       pod/nginx-deployment-66b6c48dd5-dksqx         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-fj9kl         0/1     Pending       0          18h
default       pod/nginx-deployment-66b6c48dd5-j4hqv         0/1     Pending       0          18h
kube-system   pod/calico-kube-controllers-bcc6f659f-bgmkb   1/1     Running       0          18h
kube-system   pod/calico-kube-controllers-bcc6f659f-pksws   1/1     Terminating   0          7d21h
kube-system   pod/calico-node-fns6d                         0/1     Running       2          7d21h
kube-system   pod/calico-node-t854c                         1/1     Running       0          7d21h
kube-system   pod/calico-node-vbsdr                         1/1     Running       0          7d21h
kube-system   pod/coredns-74ff55c5b-gw8j2                   1/1     Running       1          18h
kube-system   pod/coredns-74ff55c5b-xhvqb                   1/1     Terminating   0          7d21h
kube-system   pod/coredns-74ff55c5b-xr9mb                   1/1     Terminating   0          7d21h
kube-system   pod/coredns-74ff55c5b-zhhkx                   1/1     Running       1          18h
kube-system   pod/etcd-ardl-k8latam01                       1/1     Running       2          7d21h
kube-system   pod/kube-apiserver-ardl-k8latam01             1/1     Running       4          7d21h
kube-system   pod/kube-controller-manager-ardl-k8latam01    1/1     Running       2          7d21h
kube-system   pod/kube-proxy-2lmpb                          1/1     Running       0          7d21h
kube-system   pod/kube-proxy-fchv8                          1/1     Running       2          7d21h
kube-system   pod/kube-proxy-xks7h                          1/1     Running       0          7d21h
kube-system   pod/kube-scheduler-ardl-k8latam01             1/1     Running       2          7d21h
kube-system   pod/metrics-server-68b849498d-6q74v           1/1     Terminating   0          7d20h
kube-system   pod/metrics-server-68b849498d-7lpz8           0/1     Pending       0          18h

NAMESPACE     NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/dashboardlb      ClusterIP   10.100.82.105   <none>        8001/TCP                 7d20h
default       service/kubernetes       ClusterIP   10.96.0.1       <none>        443/TCP                  7d21h
kube-system   service/kube-dns         ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   7d21h
kube-system   service/metrics-server   ClusterIP   10.101.85.63    <none>        443/TCP                  7d20h

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.apps/calico-node   3         3         0       3            0           beta.kubernetes.io/os=linux   7d21h
kube-system   daemonset.apps/kube-proxy    3         3         1       3            1           kubernetes.io/os=linux        7d21h

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
default       deployment.apps/nginx-deployment          0/3     3            0           18h
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           7d21h
kube-system   deployment.apps/coredns                   2/2     2            2           7d21h
kube-system   deployment.apps/metrics-server            0/1     1            0           7d20h

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
default       replicaset.apps/nginx-deployment-66b6c48dd5         3         3         0       18h
kube-system   replicaset.apps/calico-kube-controllers-bcc6f659f   1         1         1       7d21h
kube-system   replicaset.apps/coredns-74ff55c5b                   2         2         2       7d21h
kube-system   replicaset.apps/metrics-server-68b849498d           1         1         0       7d20h

In the output of kubectl cluster-info dump I get:

==== START logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====
Request log error: the server rejected our request for an unknown reason (get pods second-app-deployment-7f794d896f-q6zn5)
==== END logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====

With kubectl describe:

[root@ardl-k8latam01 testwordpress]# kubectl describe pod nginx-deployment-66b6c48dd5-5cttk
Name:           nginx-deployment-66b6c48dd5-5cttk
Namespace:      default
Priority:       0
Node:           ardl-k8latam02/10.48.41.12
Start Time:     Fri, 18 Dec 2020 17:06:57 -0300
Labels:         app=nginx
                pod-template-hash=66b6c48dd5
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/nginx-deployment-66b6c48dd5
Containers:
  nginx:
    Container ID:
    Image:          nginx:1.14.2
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9rnk6 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-9rnk6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9rnk6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedCreatePodSandBox  22m                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to set up pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host, failed to clean up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to teardown pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host]
  Normal   Scheduled               21m                   default-scheduler  Successfully assigned default/nginx-deployment-66b6c48dd5-5cttk to ardl-k8latam02
  Normal   SandboxChanged          2m27s (x93 over 22m)  kubelet            Pod sandbox changed, it will be killed and re-created.

I also tried rebooting the nodes and the master, but nothing changed. When I try to describe a "Terminating" pod, it tells me that the pod does not exist.
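In case it matters, this is the cleanup I was considering for the stuck pods, but I have not run it yet because I want to understand the root cause first (pod name is just one of the stuck ones from above):

```shell
# Force-delete a pod stuck in Terminating, skipping the graceful shutdown.
# This only removes the API object; it will not fix whatever is breaking
# pod sandbox creation.
kubectl delete pod nginx-deployment-66b6c48dd5-5cttk \
  --grace-period=0 --force

# Restart the kubelet on the affected node, in case it is the component
# failing to finish the termination.
systemctl restart kubelet
```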

Is my problem related to Calico? How can I dig deeper into the "Request log error: the server rejected our request for an unknown reason" message?
How should I continue the investigation?
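Since the events show "dial tcp 10.96.0.1:443: connect: no route to host", these are the checks I am planning to run from a worker node to see if it is a firewall or kube-proxy problem (these commands are my best guesses at the right diagnostics, not something from a guide):

```shell
# Can the node reach the kube-apiserver through the ClusterIP at all?
# A working cluster should return a JSON version object here.
curl -k --max-time 5 https://10.96.0.1:443/version

# On CentOS, firewalld or leftover iptables REJECT rules are a common
# cause of "no route to host" between pods and the service network.
systemctl status firewalld
iptables -L -n | grep -i reject

# Check that kube-proxy has programmed NAT rules for the kubernetes
# service (10.96.0.1).
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1
```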

-- Leandro De Mestico
devops
kubernetes
kubernetes-helm
project-calico

0 Answers