I am new to Kubernetes, so I am having trouble getting to the bottom of this issue. Last week I installed a cluster with 1 master and 2 worker nodes on CentOS with kubeadm:
kubectl get nodes
NAME             STATUS   ROLES                  AGE    VERSION
ardl-k8latam01   Ready    control-plane,master   7d2h   v1.20.0
ardl-k8latam02   Ready    <none>                 7d2h   v1.20.0
ardl-k8latam03   Ready    <none>                 7d2h   v1.20.0
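For context, this is roughly how I set up the cluster (from memory, so the exact flags and the Calico manifest URL may not be exactly what I ran):

# on the master, with the pod CIDR that matches the Calico default
kubeadm init --pod-network-cidr=192.168.0.0/16

# install Calico as the CNI plugin
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# on each worker node, the join command printed by kubeadm init
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>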
At first it was working fine, but it started failing after I began working with Helm (I don't know if it's related).
Now I cannot run any deployment, and I have a lot of pods stuck in "Terminating" status that never finish. As an example, here is what I get after running kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml:
[root@ardl-k8latam01 ~]# kubectl get all --all-namespaces
NAMESPACE     NAME                                          READY   STATUS        RESTARTS   AGE
default       pod/nginx-deployment-66b6c48dd5-2xt7b         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-5cttk         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-8bz2f         0/1     Pending       0          18h
default       pod/nginx-deployment-66b6c48dd5-dksqx         1/1     Terminating   0          19h
default       pod/nginx-deployment-66b6c48dd5-fj9kl         0/1     Pending       0          18h
default       pod/nginx-deployment-66b6c48dd5-j4hqv         0/1     Pending       0          18h
kube-system   pod/calico-kube-controllers-bcc6f659f-bgmkb   1/1     Running       0          18h
kube-system   pod/calico-kube-controllers-bcc6f659f-pksws   1/1     Terminating   0          7d21h
kube-system   pod/calico-node-fns6d                         0/1     Running       2          7d21h
kube-system   pod/calico-node-t854c                         1/1     Running       0          7d21h
kube-system   pod/calico-node-vbsdr                         1/1     Running       0          7d21h
kube-system   pod/coredns-74ff55c5b-gw8j2                   1/1     Running       1          18h
kube-system   pod/coredns-74ff55c5b-xhvqb                   1/1     Terminating   0          7d21h
kube-system   pod/coredns-74ff55c5b-xr9mb                   1/1     Terminating   0          7d21h
kube-system   pod/coredns-74ff55c5b-zhhkx                   1/1     Running       1          18h
kube-system   pod/etcd-ardl-k8latam01                       1/1     Running       2          7d21h
kube-system   pod/kube-apiserver-ardl-k8latam01             1/1     Running       4          7d21h
kube-system   pod/kube-controller-manager-ardl-k8latam01    1/1     Running       2          7d21h
kube-system   pod/kube-proxy-2lmpb                          1/1     Running       0          7d21h
kube-system   pod/kube-proxy-fchv8                          1/1     Running       2          7d21h
kube-system   pod/kube-proxy-xks7h                          1/1     Running       0          7d21h
kube-system   pod/kube-scheduler-ardl-k8latam01             1/1     Running       2          7d21h
kube-system   pod/metrics-server-68b849498d-6q74v           1/1     Terminating   0          7d20h
kube-system   pod/metrics-server-68b849498d-7lpz8           0/1     Pending       0          18h

NAMESPACE     NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/dashboardlb      ClusterIP   10.100.82.105   <none>        8001/TCP                 7d20h
default       service/kubernetes       ClusterIP   10.96.0.1       <none>        443/TCP                  7d21h
kube-system   service/kube-dns         ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   7d21h
kube-system   service/metrics-server   ClusterIP   10.101.85.63    <none>        443/TCP                  7d20h

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.apps/calico-node   3         3         0       3            0           beta.kubernetes.io/os=linux   7d21h
kube-system   daemonset.apps/kube-proxy    3         3         1       3            1           kubernetes.io/os=linux        7d21h

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
default       deployment.apps/nginx-deployment          0/3     3            0           18h
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           7d21h
kube-system   deployment.apps/coredns                   2/2     2            2           7d21h
kube-system   deployment.apps/metrics-server            0/1     1            0           7d20h

NAMESPACE     NAME                                                DESIRED   CURRENT   READY   AGE
default       replicaset.apps/nginx-deployment-66b6c48dd5         3         3         0       18h
kube-system   replicaset.apps/calico-kube-controllers-bcc6f659f   1         1         1       7d21h
kube-system   replicaset.apps/coredns-74ff55c5b                   2         2         2       7d21h
kube-system   replicaset.apps/metrics-server-68b849498d           1         1         0       7d20h
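Since the calico-node DaemonSet above reports 0 ready, these are the commands I was planning to use to look at those pods directly (not sure this is the right place to look):

kubectl -n kube-system get pods -o wide | grep calico
kubectl -n kube-system describe pod calico-node-fns6d
kubectl -n kube-system logs calico-node-fns6d -c calico-node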
In the output of kubectl cluster-info dump I get:
==== START logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====
Request log error: the server rejected our request for an unknown reason (get pods second-app-deployment-7f794d896f-q6zn5)
==== END logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====
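To get more detail on that rejection, my plan was to re-run the same log request with client-side verbosity turned up, so I can see the raw API request and response (I am not sure this is the right approach):

kubectl logs second-app-deployment-7f794d896f-q6zn5 -c second-node -v=8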
With kubectl describe:
[root@ardl-k8latam01 testwordpress]# kubectl describe pod nginx-deployment-66b6c48dd5-5cttk
Name: nginx-deployment-66b6c48dd5-5cttk
Namespace: default
Priority: 0
Node: ardl-k8latam02/10.48.41.12
Start Time: Fri, 18 Dec 2020 17:06:57 -0300
Labels: app=nginx
pod-template-hash=66b6c48dd5
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-deployment-66b6c48dd5
Containers:
nginx:
Container ID:
Image: nginx:1.14.2
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9rnk6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-9rnk6:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9rnk6
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 22m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to set up pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host, failed to clean up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to teardown pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host]
Normal Scheduled 21m default-scheduler Successfully assigned default/nginx-deployment-66b6c48dd5-5cttk to ardl-k8latam02
Normal SandboxChanged 2m27s (x93 over 22m) kubelet Pod sandbox changed, it will be killed and re-created.
I also tried rebooting the nodes and the master, but nothing changed. When I try to describe one of the "Terminating" pods, it tells me that the pod does not exist.
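Because of the "no route to host" errors against 10.96.0.1:443 in the events above, these are the checks I was planning to run next from the failing worker node (ardl-k8latam02); I am not sure they are the right ones:

# can the node reach the kubernetes service ClusterIP at all?
curl -k https://10.96.0.1:443/version

# are the kube-proxy iptables rules for the kubernetes service present?
sudo iptables-save | grep 10.96.0.1

# is firewalld running on the node? (CentOS enables it by default)
systemctl status firewalld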
Is my problem related to Calico?
How can I dig deeper into the "Request log error: the server rejected our request for an unknown reason" message?
How should I continue the investigation?