Service won't route traffic to pods after re-creating the deployment/pods on a k8s CentOS 7 bare-metal cluster: No route to host

5/3/2020

I created a bare-metal multi-master cluster on release 1.18:

NAME            STATUS   ROLES    AGE   VERSION
master1   Ready    master   9d    v1.18.0
master2   Ready    master   9d    v1.18.0
master3   Ready    master   9d    v1.18.0
worker1   Ready    <none>   9d    v1.18.0
worker2   Ready    <none>   9d    v1.18.0
worker3   Ready    <none>   9d    v1.18.0

kubectl create deploy la --image=linuxacademycontent/kubeserve:v1
kubectl scale deploy la --replicas=6
kubectl get pod

NAME                   READY   STATUS    RESTARTS   AGE
curlpod                1/1     Running   0          55m
la-65776bc88f-4kp95    1/1     Running   0          42m
la-65776bc88f-7lb9f    1/1     Running   0          42m
la-65776bc88f-dgzsz    1/1     Running   0          42m
la-65776bc88f-l2hs2    1/1     Running   0          43m
la-65776bc88f-vkcts    1/1     Running   0          42m
la-65776bc88f-wkm2k    1/1     Running   0          42m

kubectl expose deploy la --port=80 --target-port=80
kubectl get svc

NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP   75m
la           ClusterIP   10.103.199.187   <none>        80/TCP    46m

kubectl get ep

NAME         ENDPOINTS                                               AGE
kubernetes   172.31.16.67:6443                                       78m
la           10.244.1.6:80,10.244.1.7:80,10.244.2.5:80 + 3 more...   49m
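A quick sanity check (not from the original session) is kubectl describe, which shows the selector, ClusterIP, and endpoints together, so a selector/label mismatch would be visible in one place:

# Selector should read app=la (the label kubectl create deploy generates),
# and Endpoints should list the same pod IPs as kubectl get ep.
kubectl describe svc la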

I can verify everything is working by curling from a test pod (curlpod):

kubectl exec -it curlpod bash

root@curlpod:/app# curl la
This is v1 pod la-65776bc88f-dgzsz

The problem is that if I now delete the deployment (or delete all the pods) and recreate it, the service won't route traffic to the new pods. Here is how to reproduce it:

kubectl delete deploy la

NAME                   READY   STATUS        RESTARTS   AGE
curlpod                1/1     Running       0          71m
la-65776bc88f-4kp95    1/1     Terminating   0          58m
la-65776bc88f-7lb9f    1/1     Terminating   0          58m
la-65776bc88f-dgzsz    1/1     Terminating   0          58m
la-65776bc88f-l2hs2    1/1     Terminating   0          59m
la-65776bc88f-vkcts    1/1     Terminating   0          58m
la-65776bc88f-wkm2k    1/1     Terminating   0          58m

kubectl create deploy la --image=linuxacademycontent/kubeserve:v1
kubectl scale deploy la --replicas=6
kubectl get pod

NAME                   READY   STATUS    RESTARTS   AGE
curlpod                1/1     Running   0          73m
la-65776bc88f-5q7d8    1/1     Running   0          6s
la-65776bc88f-b69z2    1/1     Running   0          10s
la-65776bc88f-j2t4t    1/1     Running   0          6s
la-65776bc88f-ndxs5    1/1     Running   0          6s
la-65776bc88f-ns2gt    1/1     Running   0          6s
la-65776bc88f-vjp8x    1/1     Running   0          6s

All looks fine. However, the service won't route any traffic to the new pods:
kubectl exec -it curlpod bash

root@curlpod:/app# curl la
curl: (7) Failed to connect to la port 80: No route to host
root@curlpod:/app# curl 10.103.199.187   #name resolution seems fine
curl: (7) Failed to connect to la port 80: No route to host

I can still reach the pods individually, e.g. la-65776bc88f-5q7d8 has IP 10.244.5.13:

root@test-pod:/app# curl 10.244.5.13
This is v1 pod la-65776bc88f-5q7d8
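Since the pod IPs answer directly but the ClusterIP does not, the VIP-to-pod translation is the suspect. With kube-proxy in IPVS mode (which the kube-proxy log at the bottom suggests), the programmed real servers can be inspected on any node, assuming ipvsadm is installed there:

# Lists the real servers behind the service VIP. If the new pod IPs are
# missing here while kubectl get ep shows them, kube-proxy failed to
# sync the endpoints into IPVS.
ipvsadm -Ln -t 10.103.199.187:80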

So I thought it was the service that wasn't routing the traffic, but the service is unchanged and all the new pods are listed as its endpoints.

kubectl get svc

NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP   93m
la           ClusterIP   10.103.199.187   <none>        80/TCP    64m

kubectl get ep

NAME         ENDPOINTS                                               AGE
kubernetes   172.31.16.67:6443                                       93m
la           10.244.5.13:80,10.244.1.9:80,10.244.2.7:80 + 3 more...   64m

Something, somewhere, is blocking the traffic from the service to the new pods.
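"No route to host" is usually an ICMP reject from a firewall rule rather than a silent drop, so the node-level iptables rules are worth a look (generic checks, not from the original session):

# Run on the node hosting curlpod: a REJECT rule (e.g. the CentOS 7
# default "icmp-host-prohibited") would answer the SYN with exactly
# this error.
iptables -nL FORWARD
iptables -t filter -nL | grep -i reject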

Now, if I delete and recreate the service, the problem goes away!
kubectl delete svc la
service "la" deleted

kubectl expose deploy la --port=80 --target-port=80
service/la exposed

kubectl get svc

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   108m
la           ClusterIP   10.96.124.200   <none>        80/TCP    71s

kubectl exec -it curlpod bash

root@curlpod:/app# curl la
This is v1 pod la-65776bc88f-z24qn
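Recreating the service assigns a new ClusterIP (10.96.124.200 above) and forces kube-proxy to program fresh rules for it, which points at stale proxy state rather than the Service object itself. One way to test that hypothesis (my assumption, not something from the session above) is to restart kube-proxy instead of the service:

# If restarting kube-proxy alone also fixes it, the stale state lives
# in kube-proxy, not in the Service/Endpoints objects.
kubectl -n kube-system rollout restart daemonset kube-proxy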

I tried a few alternatives, such as using YAML manifests to create the deployment and service (see the sketch below): same problem. I also tried using kubectl scale to scale down and back up. Whenever new pods are created in the deployment, traffic from the service to those pods is blocked.
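For reference, the YAML route is equivalent to the imperative commands; a minimal sketch, assuming the app=la label that kubectl create deploy generates:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: la
spec:
  replicas: 6
  selector:
    matchLabels:
      app: la
  template:
    metadata:
      labels:
        app: la
    spec:
      containers:
      - name: kubeserve
        image: linuxacademycontent/kubeserve:v1
---
apiVersion: v1
kind: Service
metadata:
  name: la
spec:
  selector:
    app: la
  ports:
  - port: 80
    targetPort: 80
EOF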

Any help or suggestions are welcome!

Update: adding the Flannel log.

kubectl logs --namespace kube-system kube-flannel-ds-amd64-tbglf
I0423 15:15:52.311421       1 main.go:518] Determining IP address of default interface
I0423 15:15:52.312644       1 main.go:531] Using interface with name eth0 and address 10.30.193.13
I0423 15:15:52.312669       1 main.go:548] Defaulting external address to interface address (10.30.193.13)
W0423 15:15:52.312689       1 client_config.go:517] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0423 15:15:52.323517       1 kube.go:119] Waiting 10m0s for node controller to sync
I0423 15:15:52.324491       1 kube.go:306] Starting kube subnet manager
I0423 15:15:53.323919       1 kube.go:126] Node controller sync successful
I0423 15:15:53.323976       1 main.go:246] Created subnet manager: Kubernetes Subnet Manager - uk02sybk8smp3
I0423 15:15:53.323983       1 main.go:249] Installing signal handlers
I0423 15:15:53.324084       1 main.go:390] Found network config - Backend type: vxlan
I0423 15:15:53.324176       1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I0423 15:15:53.365832       1 main.go:355] Current network or subnet (10.244.0.0/16, 10.244.2.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I0423 15:15:53.367815       1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0423 15:15:53.409089       1 iptables.go:167] Deleting iptables rule: -s 0.0.0.0/0 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0423 15:15:53.410118       1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN
I0423 15:15:53.411017       1 iptables.go:167] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -j MASQUERADE --random-fully
I0423 15:15:53.412076       1 main.go:305] Setting up masking rules
I0423 15:15:53.412916       1 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0423 15:15:53.412999       1 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0423 15:15:53.413009       1 main.go:325] Running backend.
I0423 15:15:53.413017       1 main.go:343] Waiting for all goroutines to exit
I0423 15:15:53.413038       1 vxlan_network.go:60] watching for new subnet leases
I0423 15:15:53.510136       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0423 15:15:53.510158       1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0423 15:15:53.510293       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
I0423 15:15:53.510301       1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.511096       1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0423 15:15:53.512091       1 iptables.go:167] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.2.0/24 -j RETURN
I0423 15:15:53.512206       1 iptables.go:167] Deleting iptables rule: -d 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.512998       1 iptables.go:167] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully
I0423 15:15:53.513970       1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0423 15:15:53.609595       1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.612312       1 iptables.go:155] Adding iptables rule: -d 10.244.0.0/16 -j ACCEPT
I0423 15:15:53.613132       1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
I0423 15:15:53.710527       1 iptables.go:155] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.2.0/24 -j RETURN
I0423 15:15:53.712938       1 iptables.go:155] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully

kube-proxy is flooded with errors like these:

E0504 08:25:36.251542       1 proxier.go:1533] Failed to sync endpoint for service: 10.30.218.46:30080/TCP, err: parseIP Error ip=[10 244 5 3 0 0 0 0 0 0 0 0 0 0 0 0]
E0504 08:25:36.251632       1 proxier.go:1950] Failed to list IPVS destinations, error: parseIP Error ip=[10 244 5 3 0 0 0 0 0 0 0 0 0 0 0 0]
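Those parseIP errors suggest kube-proxy's IPVS backend cannot parse the addresses the kernel returns over netlink, so it can neither list nor sync destinations for the service. Two quick checks that would narrow this down (assuming the standard kubeadm kube-proxy ConfigMap):

# Confirm the proxy mode; kubeadm stores it in the kube-proxy ConfigMap.
kubectl -n kube-system get cm kube-proxy -o yaml | grep "mode:"
# CentOS 7 ships a 3.10 kernel; the kernel/IPVS combination matters for
# how IPVS netlink responses are parsed.
uname -r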
-- Kai Zhang
kubernetes

0 Answers