Calico CNI pod networking not working across different hosts on EKS Kubernetes worker nodes

9/13/2019

I am running vanilla EKS Kubernetes at version 1.12.

I've used CNI Genie to allow custom per-pod selection of the CNI used at startup, and I've installed the standard Calico CNI setup.

With CNI Genie I configured the default CNI to be the AWS CNI (aws-node) and all pods start up as usual and get assigned an IP from my VPC subnets.

I then selectively use Calico as the CNI for some basic pods I am testing with. I'm using the default Calico 192.168.0.0/16 CIDR range. Everything works great if the pods are on the same EKS worker node.
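
For reference, this is roughly how a pod is placed onto Calico via CNI Genie's annotation (a minimal sketch; the names and image are illustrative, and in my case the annotation lives in a Deployment's pod template):

apiVersion: v1
kind: Pod
metadata:
  name: hello-node1
  annotations:
    cni: "calico"        # CNI Genie hands this pod to the Calico CNI instead of the default aws-node
spec:
  containers:
  - name: hello-node
    image: nginx         # placeholder image; my real test image listens on port 8080
    ports:
    - containerPort: 8080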

CoreDNS is working great too (as long as I keep the coredns pods running on the AWS CNI).

However, if a pod moves to a different worker node, then networking between them does not work inside the cluster.

I've checked the routing tables that Calico auto-configures on the worker nodes, and they look correct to me.

Here is my wide pod listing across all namespaces:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE   IP                NODE                                       NOMINATED NODE
default       hello-node1-865588ccd7-64p5x               1/1     Running   0          31m   192.168.106.129   ip-10-0-2-31.eu-west-2.compute.internal    <none>
default       hello-node2-dc7bbcb74-gqpwq                1/1     Running   0          17m   192.168.25.193    ip-10-0-3-222.eu-west-2.compute.internal   <none>
kube-system   aws-node-cm2dp                             1/1     Running   0          26m   10.0.3.222        ip-10-0-3-222.eu-west-2.compute.internal   <none>
kube-system   aws-node-vvvww                             1/1     Running   0          31m   10.0.2.31         ip-10-0-2-31.eu-west-2.compute.internal    <none>
kube-system   calico-kube-controllers-56bfccb786-fc2j4   1/1     Running   0          30m   10.0.2.41         ip-10-0-2-31.eu-west-2.compute.internal    <none>
kube-system   calico-node-flmnl                          1/1     Running   0          31m   10.0.2.31         ip-10-0-2-31.eu-west-2.compute.internal    <none>
kube-system   calico-node-hcmqd                          1/1     Running   0          26m   10.0.3.222        ip-10-0-3-222.eu-west-2.compute.internal   <none>
kube-system   coredns-6c64c9f456-g2h9k                   1/1     Running   0          30m   10.0.2.204        ip-10-0-2-31.eu-west-2.compute.internal    <none>
kube-system   coredns-6c64c9f456-g5lhl                   1/1     Running   0          30m   10.0.2.200        ip-10-0-2-31.eu-west-2.compute.internal    <none>
kube-system   genie-plugin-hspts                         1/1     Running   0          26m   10.0.3.222        ip-10-0-3-222.eu-west-2.compute.internal   <none>
kube-system   genie-plugin-vqd2d                         1/1     Running   0          31m   10.0.2.31         ip-10-0-2-31.eu-west-2.compute.internal    <none>
kube-system   kube-proxy-jm7f7                           1/1     Running   0          26m   10.0.3.222        ip-10-0-3-222.eu-west-2.compute.internal   <none>
kube-system   kube-proxy-nnp76                           1/1     Running   0          31m   10.0.2.31         ip-10-0-2-31.eu-west-2.compute.internal    <none>

As you can see, the two hello-node pods are using the Calico CNI.

I've exposed the hello-node pods with two services:

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
hello-node1   ClusterIP   172.20.90.83    <none>        8081/TCP   43m
hello-node2   ClusterIP   172.20.242.22   <none>        8082/TCP   43m
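
For completeness, the Services were created along these lines (a sketch; the exact commands may have differed slightly, but the port mappings match the listing above):

kubectl expose deployment hello-node1 --name=hello-node1 --port=8081 --target-port=8080
kubectl expose deployment hello-node2 --name=hello-node2 --port=8082 --target-port=8080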

I've confirmed that if I start the hello-node pods with the AWS CNI, I can ping/curl between them using the cluster service names, even when they run on separate hosts.

Things stop working when I use the Calico CNI as described above.

I only have two EKS worker hosts in this test cluster. Here is the routing for each:

K8s Worker 1 routes

[ec2-user@ip-10-0-3-222 ~]$ ip route
default via 10.0.3.1 dev eth0
10.0.3.0/24 dev eth0 proto kernel scope link src 10.0.3.222
169.254.169.254 dev eth0
blackhole 192.168.25.192/26 proto bird
192.168.25.193 dev calia0da7d91dc2 scope link
192.168.106.128/26 via 10.0.2.31 dev tunl0 proto bird onlink

K8s Worker 2 routes

[ec2-user@ip-10-0-2-31 ~]$ ip route
default via 10.0.2.1 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.31
10.0.2.41 dev enif4cf9019f11 scope link
10.0.2.200 dev eni412af1a0e55 scope link
10.0.2.204 dev eni04260ebbbe1 scope link
169.254.169.254 dev eth0
192.168.25.192/26 via 10.0.3.222 dev tunl0 proto bird onlink
blackhole 192.168.106.128/26 proto bird
192.168.106.129 dev cali19da7817849 scope link

To me, the route: 192.168.25.192/26 via 10.0.3.222 dev tunl0 proto bird onlink

tells me that traffic destined for the 192.168.25.192/26 subnet from this worker (and its containers/pods) should go to 10.0.3.222 (the AWS VPC ENI IP of the other EC2 host) over the tunl0 interface.

This route is on the EC2 host 10.0.2.31. So in other words, when this host's containers talk to containers in the Calico subnet 192.168.25.192/26, network traffic should route to 10.0.3.222 (the ENI IP of my other EKS worker node, where the containers using Calico in that subnet run).
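
A quick way to double-check which route the kernel actually selects on this host for the hello-node2 pod IP (a sketch; I have not pasted real output):

[ec2-user@ip-10-0-2-31 ~]$ ip route get 192.168.25.193
# if the bird-installed route is in effect, this should report "... via 10.0.3.222 dev tunl0 ..."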

To clarify my testing procedure:

  1. Exec into the hello-node1 pod, and curl http://hello-node2:8082 (or ping the Calico-assigned IP address of the hello-node2 pod); see the sketch below.
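
Concretely, the test looks roughly like this (the pod name is taken from the listing above; the second form bypasses the Service/DNS and hits the Calico pod IP and container port directly):

kubectl exec -it hello-node1-865588ccd7-64p5x -- curl -v http://hello-node2:8082
kubectl exec -it hello-node1-865588ccd7-64p5x -- curl -v http://192.168.25.193:8080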

EDIT

To further test this, I've run tcpdump on the host where the hello-node2 pod is running, capturing on port 8080 (the container listens on this port).

I do get activity on the destination host where the test container I am curling is running, but nothing in the capture indicates dropped traffic.

[ec2-user@ip-10-0-3-222 ~]$ sudo tcpdump -vv -x -X -i tunl0 'port 8080'
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
14:32:42.859238 IP (tos 0x0, ttl 254, id 63813, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.2.31.29192 > 192.168.25.193.webcache: Flags [S], cksum 0xf932 (correct), seq 3206263598, win 28000, options [mss 1400,sackOK,TS val 2836614698 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c f945 4000 fe06 9ced 0a00 021f  E..<.E@.........
        0x0010:  c0a8 19c1 7208 1f90 bf1b b32e 0000 0000  ....r...........
        0x0020:  a002 6d60 f932 0000 0204 0578 0402 080a  ..m`.2.....x....
        0x0030:  a913 4e2a 0000 0000 0103 0307            ..N*........
14:32:43.870168 IP (tos 0x0, ttl 254, id 63814, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.2.31.29192 > 192.168.25.193.webcache: Flags [S], cksum 0xf53f (correct), seq 3206263598, win 28000, options [mss 1400,sackOK,TS val 2836615709 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c f946 4000 fe06 9cec 0a00 021f  E..<.F@.........
        0x0010:  c0a8 19c1 7208 1f90 bf1b b32e 0000 0000  ....r...........
        0x0020:  a002 6d60 f53f 0000 0204 0578 0402 080a  ..m`.?.....x....
        0x0030:  a913 521d 0000 0000 0103 0307            ..R.........
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Even the calia0da7d91dc2 interface on the host running my target/test pod shows increased RX packets and byte counts whenever I run the curl from the other pod on the other host. Traffic is definitely traversing.

[ec2-user@ip-10-0-3-222 ~]$ ifconfig
calia0da7d91dc2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1440
        inet6 fe80::ecee:eeff:feee:eeee  prefixlen 64  scopeid 0x20<link>
        ether ee:ee:ee:ee:ee:ee  txqueuelen 0  (Ethernet)
        RX packets 84  bytes 5088 (4.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

What is preventing the networking from working between hosts here? Am I missing something obvious?

Edit 2 - information for Arjun Pandey (parjun8840)

Here is some more info about my Calico configuration:

  • I have disabled source/destination checking on all AWS EC2 worker nodes (the exact command is sketched just after this list)
  • I've followed the latest Calico docs to configure the IP pool for cross-subnet use and NAT for traffic leaving the cluster
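
For reference, the source/destination check was disabled with something like the following (the instance ID is a placeholder); the resulting IPPool configuration is shown further down:

aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check
# repeated for each worker node's instance ID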

calicoctl configs. Note: it seems that the workloadendpoints are non-existent...

 me@mine ~ aws-vault exec my-vault-entry -- kubectl get IPPool --all-namespaces
NAME                  AGE
default-ipv4-ippool   1d

 me@mine ~ aws-vault exec my-vault-entry -- kubectl get IPPool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"uid":"41bd2c82-d576-11e9-b1ef-121f3d7b4d4e","creationTimestamp":"2019-09-12T15:59:09Z"}'
  creationTimestamp: "2019-09-12T15:59:09Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "500448"
  selfLink: /apis/crd.projectcalico.org/v1/ippools/default-ipv4-ippool
  uid: 41bd2c82-d576-11e9-b1ef-121f3d7b4d4e
spec:
  blockSize: 26
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never

 me@mine ~ aws-vault exec my-vault-entry -- calicoctl get nodes
NAME
ip-10-254-109-184.ec2.internal
ip-10-254-109-237.ec2.internal
ip-10-254-111-147.ec2.internal

 me@mine ~ aws-vault exec my-vault-entry -- calicoctl get workloadendpoints
WORKLOAD   NODE   NETWORKS   INTERFACE


 me@mine ~

Here is some network info for a sample host in the cluster and for the network namespace of one of the test containers:

host ip a

[ec2-user@ip-10-254-109-184 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 02:1b:79:d1:c5:bc brd ff:ff:ff:ff:ff:ff
    inet 10.254.109.184/26 brd 10.254.109.191 scope global dynamic eth0
       valid_lft 2881sec preferred_lft 2881sec
    inet6 fe80::1b:79ff:fed1:c5bc/64 scope link
       valid_lft forever preferred_lft forever
3: eni808caba7453@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether c2:be:80:d4:6a:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::c0be:80ff:fed4:6af3/64 scope link
       valid_lft forever preferred_lft forever
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.29.128/32 brd 192.168.29.128 scope global tunl0
       valid_lft forever preferred_lft forever
6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 02:12:58:bb:c6:1a brd ff:ff:ff:ff:ff:ff
    inet 10.254.109.137/26 brd 10.254.109.191 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::12:58ff:febb:c61a/64 scope link
       valid_lft forever preferred_lft forever
7: enia6f1918d9e2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 96:f5:36:53:e9:55 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::94f5:36ff:fe53:e955/64 scope link
       valid_lft forever preferred_lft forever
8: enia32d23ac2d1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 36:5e:34:a7:82:30 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::345e:34ff:fea7:8230/64 scope link
       valid_lft forever preferred_lft forever
9: cali5e7dde1e39e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
[ec2-user@ip-10-254-109-184 ~]$

Running nsenter against the test container's PID to get its ip a info:

[ec2-user@ip-10-254-109-184 ~]$ sudo nsenter -t 15715 -n ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
    link/ether 9a:6d:db:06:74:cb brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.29.129/32 scope global eth0
       valid_lft forever preferred_lft forever
-- Shogan
aws-eks
kubernetes
project-calico

1 Answer

9/14/2019

I am not sure about the exact solution right now (I haven't tested Calico on AWS; normally I use amazon-vpc-cni-k8s on AWS and Calico on physical clusters), but below are the quick things we can look into.

Calico AWS requirements: https://docs.projectcalico.org/v2.3/reference/public-cloud/aws

kubectl get IPPool --all-namespaces
NAME                  AGE
default-ipv4-ippool   15d

kubectl get IPPool default-ipv4-ippool -o yaml


~ calicoctl get nodes
NAME
node1
node2
node3
node4

~ calicoctl get workloadendpoints

NODE    ORCHESTRATOR   WORKLOAD                                                NAME
node2   k8s            default.myapp-569c54f85-xtktk                           eth0
node1   k8s            kube-system.calico-kube-controllers-5cbcccc885-b9x8s    eth0
node1   k8s            kube-system.coredns-fb8b8dcde-2zpw8                     eth0
node1   k8s            kube-system.coredns-fb8b8dcfg-hc6zv                     eth0

Also, it would help to get the details of the container network: nsenter -t <pid> -n ip a

And for the host as well: ip a
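
One note on the empty workloadendpoint list in the question: if calicoctl is not pointed at the same datastore that Calico is actually using (on EKS this is typically the Kubernetes API datastore), it can come back empty even though pods exist. A quick way to re-run the check against the Kubernetes datastore (a sketch; the kubeconfig path is a placeholder):

DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config calicoctl get workloadendpoints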

-- Arjun Pandey- parjun8840
Source: StackOverflow