Our production cluster runs fine on k8s 1.12.3-rancher1-1 with nodes in two different networks: 192.168.225.0/24 (2 nodes) and 172.30.0.0/24 (6 nodes). After upgrading the cluster to a newer version of k8s (verified with 1.16.4-rancher1-1 and 1.17.5-rancher1-1), communication between nodes in these networks fails.
To reproduce the issue, set up the following environment. An upgrade from 1.12.3 is not necessary; a clean install of a newer version seems to produce the same result:
nodes:
  # frontend nodes
  - address: 192.168.225.2
    role:
      - worker
    hostname_override: frontend01
    labels:
      tier: frontend
      environment: Production
    user: deployuser
    ssh_key_path: ./frontend.key
    # note: for support of a key with a passphrase see https://rancher.com/docs/rke/v0.1.x/en/config-options/#ssh-agent
  # core nodes
  - address: 172.30.0.2
    role:
      - controlplane
      - etcd
      - worker
    hostname_override: core01
    labels:
      tier: core
      environment: Production
    user: deployuser
    ssh_key_path: ./backend.key
    # note: for support of a key with a passphrase see https://rancher.com/docs/rke/v0.1.x/en/config-options/#ssh-agent
# Cluster Level Options
cluster_name: production
ignore_docker_version: false
kubernetes_version: "v1.16.4-rancher1-1"
# SSH Agent
ssh_agent_auth: false # use the rke built agent
# deploy an ingress controller on all ''
ingress:
  provider: nginx
  options:
    server-tokens: false
    ssl-redirect: false
Firewall-Rules
FRONTEND01 allow 8472/udp from 172.30.0.2
FRONTEND01 allow 10250/tcp from 172.30.0.2
FRONTEND01 allow ssh
CORE01 allow 6443/tcp from 192.168.225.2
CORE01 allow 8472/udp from 192.168.225.2
CORE01 allow ssh
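The rules above could be applied roughly as follows. This is only a sketch that assumes ufw is the firewall in use on both hosts, which the report does not state; the function names `run`, `frontend01_rules` and `core01_rules` are made up for illustration. By default the commands are only printed, so nothing changes unless `RUN=1` is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to actually apply the rules (requires root and ufw; ufw is an assumption).
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Rules for FRONTEND01 (192.168.225.2)
frontend01_rules() {
  run ufw allow proto udp from 172.30.0.2 to any port 8472
  run ufw allow proto tcp from 172.30.0.2 to any port 10250
  run ufw allow ssh
}

# Rules for CORE01 (172.30.0.2)
core01_rules() {
  run ufw allow proto tcp from 192.168.225.2 to any port 6443
  run ufw allow proto udp from 192.168.225.2 to any port 8472
  run ufw allow ssh
}
```

Calling `frontend01_rules` on frontend01 and `core01_rules` on core01 prints the commands for review; running them again with `RUN=1` as root would apply them.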
Deploy the cluster with rke (v1.0.8) and wait for it to be ready. Then start a pod pinned to core01:
kubectl run -it centos1 --rm --image=centos --restart=Never --overrides='{"apiVersion":"v1","spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchFields":[{"key":"metadata.name","operator":"In","values":["core01"]}]}]}}}}}' --kubeconfig kube_config_cluster.yml -- /bin/bash
Inside the pod, run:
for i in {1..100}; do ping -c 1 wikipedia.com; done
Notice that name resolution is very slow and often fails completely.
Expected result (as observed on 1.12.3-rancher1-1): name resolution is fast and ping succeeds every time.
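To put a number on the failures instead of watching ping output, something like the following could be run inside the test pod. It is a rough sketch: the function name `count_dns_failures` and the `LOOKUP_CMD` variable are made up for illustration, and it assumes `getent` is available in the centos image.

```shell
# Count how many of N lookups of HOST fail.
# LOOKUP_CMD is the resolver command (default: "getent hosts"); it is a
# hypothetical knob so the function can be exercised without network access.
count_dns_failures() {
  n="$1"
  host="$2"
  failures=0
  i=1
  while [ "$i" -le "$n" ]; do
    if ! ${LOOKUP_CMD:-getent hosts} "$host" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}
```

For example, `time count_dns_failures 100 wikipedia.com` in bash prints the number of failed lookups together with the total time taken, which makes the 1.12.3 and 1.16.4 runs easier to compare.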
OS: Ubuntu 16.04.6
docker: 19.03.1 (docker-ce, docker-ce-cli)
k8s: 1.12.3-rancher1-1 (ok); 1.16.4-rancher1-1 (failed), 1.17.5-rancher1-1 (failed)
rke: 1.0.8
kubectl: 1.16.1