Kubernetes communication is not working across nodes

10/14/2018

I have a procedure for installing a Kubernetes cluster via kubeadm, and it has worked multiple times.

Now I have a cluster installed the same way, but for some reason the nodes are having trouble communicating.

The problem shows up in a couple of ways: sometimes the cluster is unable to resolve external DNS records such as mirrorlist.centos.org, and sometimes a pod on one node has no connectivity to a pod on a different node.

My Kubernetes version is 1.9.2, my hosts run CentOS 7.4, I use flannel as the CNI plugin (version 0.9.1), and the cluster is built on AWS.

My debugging so far:

I ran kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}' to see the pod subnets; it returned 10.244.0.0/24 and 10.244.1.0/24.
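For reference, a quick way to sanity-check the overlay itself is to look at the flannel interface and routes on each node (the commands below are only an illustrative sketch of that check):

# On each node, confirm the flannel VXLAN interface exists
ip -d link show flannel.1

# Confirm there is a route to the other node's pod subnet via flannel.1
# (e.g. the node owning 10.244.0.0/24 should have a route for 10.244.1.0/24)
ip route | grep 10.244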

  1. I tried adding configuration to kube-dns (even though none of my other clusters need it), following https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configure-stub-domain-and-upstream-dns-servers

  2. I tried installing busybox and doing an nslookup for kubernetes.default, and it only works if busybox is on the same node as the DNS pod (I followed https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/); a rough sketch of that check is below.
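The DNS check from that guide looks roughly like this (the pod name, image tag and <node-name> are placeholders, not values from my cluster):

# Run a throwaway busybox pod and resolve the cluster service name
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default

# Pin the test pod to a specific node to compare same-node vs cross-node behaviour
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"<node-name>"}}' \
  -- nslookup kubernetes.default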

I even tried creating an AMI from another running environment and deploying it as a node in this cluster, and it still fails.

I checked whether some port might be blocked, and even opened all ports between the nodes.

I also disabled iptables and firewalld on all nodes, just to make sure they were not the cause.
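On CentOS 7 that amounts to something like the following (the exact commands are an approximation for a temporary test, not a production recommendation):

# Stop and disable firewalld on every node
systemctl stop firewalld
systemctl disable firewalld

# Flush the iptables filter rules to rule out host-level filtering
# (temporary test only: kube-proxy and flannel also manage iptables rules)
iptables -F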

Nothing helps.

Any tip would be appreciated.

Edit: here is my flannel configuration:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.9.1-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conf
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.9.1-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
-- eran meiri
flannel
kubernetes

1 Answer

10/21/2018

The issue was that the AWS machines were not provisioned by me, and the team that provisioned them had assured me that all internal traffic was open.

After a lot of debugging with nmap, I found out that the UDP ports were not open, and since flannel requires UDP traffic between the nodes, the communication was not working properly.

Once UDP was opened, the issues were solved.
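For anyone hitting the same thing: with the vxlan backend, flannel encapsulates pod traffic in UDP on port 8472, so that port must be reachable between all nodes. A rough way to verify and fix this on AWS (the security group ID below is a placeholder) is:

# From one node, probe flannel's VXLAN port on another node
# (a "filtered" result suggests the packets are being dropped)
nmap -sU -p 8472 <other-node-private-ip>

# Allow UDP 8472 between members of the nodes' security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol udp --port 8472 \
  --source-group sg-0123456789abcdef0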

-- eran meiri
Source: StackOverflow