Kubernetes: high latency accessing Service ClusterIP from other nodes, but NodePort works fine

3/31/2020

My k8s env:

NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master01   Ready    master   46h   v1.18.0   172.18.90.100   <none>        CentOS Linux 7 (Core)   3.10.0-1062.12.1.el7.x86_64   docker://19.3.8
k8s-node01     Ready    <none>   46h   v1.18.0   172.18.90.111   <none>        CentOS Linux 7 (Core)   3.10.0-1062.12.1.el7.x86_64   docker://19.3.8

kube-system:

kubectl get pod -o wide -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
coredns-66bff467f8-9dg27               1/1     Running   0          16h   10.244.1.62     k8s-node01     <none>           <none>
coredns-66bff467f8-blgch               1/1     Running   0          16h   10.244.0.5      k8s-master01   <none>           <none>
etcd-k8s-master01                      1/1     Running   0          46h   172.19.90.189   k8s-master01   <none>           <none>
kube-apiserver-k8s-master01            1/1     Running   0          46h   172.19.90.189   k8s-master01   <none>           <none>
kube-controller-manager-k8s-master01   1/1     Running   0          46h   172.19.90.189   k8s-master01   <none>           <none>
kube-flannel-ds-amd64-scgkt            1/1     Running   0          17h   172.19.90.194   k8s-node01     <none>           <none>
kube-flannel-ds-amd64-z6fk9            1/1     Running   0          44h   172.19.90.189   k8s-master01   <none>           <none>
kube-proxy-8pbmz                       1/1     Running   0          16h   172.19.90.194   k8s-node01     <none>           <none>
kube-proxy-sgpds                       1/1     Running   0          16h   172.19.90.189   k8s-master01   <none>           <none>
kube-scheduler-k8s-master01            1/1     Running   0          46h   172.19.90.189   k8s-master01   <none>           <none>

My Deployment and Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hostnames
spec:
  selector:
    matchLabels:
      app: hostnames
  replicas: 3
  template:
    metadata:
      labels:
        app: hostnames
    spec:
      containers:
      - name: hostnames
        image: k8s.gcr.io/serve_hostname
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9376
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: hostnames
spec:
  selector:
    app: hostnames
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 9376

My svc info:

kubectl get svc 
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
hostnames    ClusterIP   10.106.24.115   <none>        80/TCP    42m
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   46h

The problem:

When I curl 10.106.24.115 on k8s-master01, the response is delayed by about a minute, but on k8s-node01 I get a response right away.
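
One way to quantify the delay is curl's built-in timing output; this is just a diagnostic sketch against the ClusterIP above:

# print connect and total time instead of the response body
curl -o /dev/null -s -w 'connect: %{time_connect}s  total: %{time_total}s\n' http://10.106.24.115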

I edited my svc and changed its type from ClusterIP to NodePort:

kubectl edit svc hostnames
spec:
  clusterIP: 10.106.24.115
  ports:
  - name: default
    port: 80
    protocol: TCP
    targetPort: 9376
    nodePort: 30888
  selector:
    app: hostnames
  sessionAffinity: None
  type: NodePort

kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
hostnames    NodePort    10.106.24.115   <none>        80:30888/TCP   64m
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        46h

Now, when I curl each node at nodeIp:30888, it works well and responds right away. Why does the high delay occur when I access the ClusterIP from another node? I also have another k8s cluster, and it has no such problem. Strangely, I get the same delayed response using curl 127.0.0.1:30555 on k8s-master01. So weird!

There are no errors in my kube-controller-manager:

'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-mbh4k
I0330 09:11:20.953439       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"coredns-66bff467f8", UID:"df14e2c6-faf1-4f6a-8b97-8d519b390c73", APIVersion:"apps/v1", ResourceVersion:"986", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: coredns-66bff467f8-7pd8r
I0330 09:11:36.488237       1 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-dns", UID:"f42d9cbc-c757-48f0-96a4-d15f75082a88", APIVersion:"v1", ResourceVersion:"250956", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpoint' Failed to update endpoint kube-system/kube-dns: Operation cannot be fulfilled on endpoints "kube-dns": the object has been modified; please apply your changes to the latest version and try again
I0330 09:11:44.753349       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"coredns-66bff467f8", UID:"df14e2c6-faf1-4f6a-8b97-8d519b390c73", APIVersion:"apps/v1", ResourceVersion:"250936", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: coredns-66bff467f8-z7fps
I0330 09:12:46.690043       1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-flannel-ds-amd64", UID:"12cda6e4-fd07-4328-887d-6dd9ca8a86d7", APIVersion:"apps/v1", ResourceVersion:"251183", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-flannel-ds-amd64-scgkt
I0330 09:19:35.915568       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"coredns-66bff467f8", UID:"df14e2c6-faf1-4f6a-8b97-8d519b390c73", APIVersion:"apps/v1", ResourceVersion:"251982", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: coredns-66bff467f8-9dg27
I0330 09:19:42.808373       1 event.go:278] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-dns", UID:"f42d9cbc-c757-48f0-96a4-d15f75082a88", APIVersion:"v1", ResourceVersion:"252221", FieldPath:""}): type: 'Warning' reason: 'FailedToUpdateEndpoint' Failed to update endpoint kube-system/kube-dns: Operation cannot be fulfilled on endpoints "kube-dns": the object has been modified; please apply your changes to the latest version and try again
I0330 09:19:52.606633       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"kube-system", Name:"coredns-66bff467f8", UID:"df14e2c6-faf1-4f6a-8b97-8d519b390c73", APIVersion:"apps/v1", ResourceVersion:"252222", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: coredns-66bff467f8-blgch
I0330 09:20:36.488412       1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-proxy", UID:"33fa53f5-2240-4020-9b1f-14025bb3ab0b", APIVersion:"apps/v1", ResourceVersion:"252365", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-proxy-sgpds
I0330 09:20:46.686463       1 event.go:278] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-system", Name:"kube-proxy", UID:"33fa53f5-2240-4020-9b1f-14025bb3ab0b", APIVersion:"apps/v1", ResourceVersion:"252416", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: kube-proxy-8pbmz
I0330 09:24:31.015395       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"default", Name:"hostnames", UID:"b54625e7-6f84-400a-9048-acd4a9207d86", APIVersion:"apps/v1", ResourceVersion:"252991", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set hostnames-68b5ff98ff to 3
I0330 09:24:31.020097       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"hostnames-68b5ff98ff", UID:"5b4bba3e-e15e-45a6-b33e-055cdb1beca4", APIVersion:"apps/v1", ResourceVersion:"252992", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-gzvxb
I0330 09:24:31.024513       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"hostnames-68b5ff98ff", UID:"5b4bba3e-e15e-45a6-b33e-055cdb1beca4", APIVersion:"apps/v1", ResourceVersion:"252992", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-kl29m
I0330 09:24:31.024538       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"hostnames-68b5ff98ff", UID:"5b4bba3e-e15e-45a6-b33e-055cdb1beca4", APIVersion:"apps/v1", ResourceVersion:"252992", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-czrqx
I0331 00:56:33.245614       1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"default", Name:"hostnames", UID:"10e9b06c-9e0c-4303-aff9-9ec03f5c5919", APIVersion:"apps/v1", ResourceVersion:"381792", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set hostnames-68b5ff98ff to 3
I0331 00:56:33.251743       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"hostnames-68b5ff98ff", UID:"aaa4d5ac-b7f4-4bcb-b6ea-959ecee00e0e", APIVersion:"apps/v1", ResourceVersion:"381793", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-7z4bb
I0331 00:56:33.256083       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"hostnames-68b5ff98ff", UID:"aaa4d5ac-b7f4-4bcb-b6ea-959ecee00e0e", APIVersion:"apps/v1", ResourceVersion:"381793", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-2zwxf
I0331 00:56:33.256171       1 event.go:278] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"hostnames-68b5ff98ff", UID:"aaa4d5ac-b7f4-4bcb-b6ea-959ecee00e0e", APIVersion:"apps/v1", ResourceVersion:"381793", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: hostnames-68b5ff98ff-x289b

The output of describe ep kube-dns:

kubectl describe ep kube-dns --namespace=kube-system
Name:         kube-dns
Namespace:    kube-system
Labels:       k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=KubeDNS
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-03-31T04:27:42Z
Subsets:
  Addresses:          10.244.0.2,10.244.0.3
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns-tcp  53    TCP
    metrics  9153  TCP
    dns      53    UDP

Events:  <none>
-- Jerold Tsao
kubernetes

1 Answer

4/20/2020

Based on the information you provided, there are a couple of things that can be checked/done:

Your kube-controller-manager reports an error with endpoints:

Failed to update endpoint kube-system/kube-dns: Operation cannot be fulfilled on endpoints "kube-dns": the object has been modified; please apply your changes to the latest version and try again

Going further, you may also notice that your kube-dns endpoints do not match your CoreDNS pod IP addresses.
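
You can compare them side by side with something like this (a quick sketch; the k8s-app=kube-dns label is the one the default CoreDNS deployment uses, so adjust if yours differs):

# CoreDNS pod IPs
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide

# addresses currently listed in the kube-dns endpoints
kubectl get ep kube-dns -n kube-system -o wide

In your case the pods are running on 10.244.1.62 and 10.244.0.5, while the endpoints still list 10.244.0.2 and 10.244.0.3.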

This could be caused by a previous kubeadm installation that was not entirely cleaned up and did not remove the cni and flannel interfaces.

I would check for any virtual NICs created by flannel during a previous installation. You can list them with the ip link command and then delete them:

ip link delete cni0 
ip link delete flannel.1

Alternatively, use the brctl command (brctl delbr cni0).
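
After removing the stale interfaces, the flannel and CoreDNS pods need to be recreated so the interfaces are rebuilt; a rough sketch (the label selectors below are the ones the upstream flannel and CoreDNS manifests use, so verify them on your cluster):

# confirm the interfaces first
ip link show cni0
ip link show flannel.1

# then recreate the CNI and DNS pods
kubectl delete pod -n kube-system -l app=flannel
kubectl delete pod -n kube-system -l k8s-app=kube-dns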

Please also note that you reported initializing the cluster with 10.244.0.0/16, but I can see that your system pods are running with a different subnet (except the CoreDNS pods, which have the correct one). All the system pods should be on the pod subnet that you specified with the --pod-network-cidr flag, and your pod network must not overlap with any of the host networks. Since your system pods show the same subnet as the hosts, this may also be the reason for the problem.
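
To verify, you can check the podCIDR assigned to each node and the cluster CIDR the controller-manager is using (a quick sketch; the grep assumes a kubeadm-style static pod manifest):

# podCIDR assigned to each node (should be subnets of 10.244.0.0/16)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# cluster CIDR configured on the controller-manager
grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml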

The second thing is to compare the iptables-save output on the master and the worker. You reported that you don't experience latency with NodePort. I would assume that is because with NodePort you bypass the flannel overlay networking and go straight to the pod running on the worker (I can see that you have only one worker). This also points to an issue with the CNI.
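
For example, you could dump the rules kube-proxy programs for this Service on both nodes and diff them (a sketch; 10.106.24.115 is your hostnames ClusterIP):

# run on both k8s-master01 and k8s-node01, then compare the output
iptables-save | grep -E 'hostnames|10.106.24.115'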

-- acid_fuji
Source: StackOverflow