I am going to upgrade the Calico node and CNI components as described in this link under "Upgrading Components Individually".
The directions are very clear (I will cordon each node and perform the steps for calico/cni and calico/node), but I am not sure what is meant by
Update the image in your process management to reference the new version
with respect to upgrading the calico/node container.
Otherwise, I see no other issues with the directions. Our environment is a Kubernetes kubeadm cluster.
I suppose the real question is: where do I tell Kubernetes to use the newer version of the calico/node image?
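From what I can tell, in a kubeadm cluster installed from the hosted manifests, calico/node runs as a DaemonSet in kube-system, so I assume "process management" means that DaemonSet. Something like this sketch would point it at the new image (the calico-node DaemonSet/container names and the quay.io/calico/node:v3.3.2 tag are assumptions to check against your own manifest):

# Point the calico-node DaemonSet at the new calico/node image.
kubectl -n kube-system set image daemonset/calico-node \
  calico-node=quay.io/calico/node:v3.3.2

# Watch the rollout (only meaningful when the updateStrategy is RollingUpdate;
# with OnDelete you delete each pod yourself after cordoning its node).
kubectl -n kube-system rollout status daemonset/calico-node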
EDIT
To answer the above:
I just did a kubectl delete -f on both calico.yaml and rbac-kdd.yaml, and then did a kubectl create -f on the newest version of these files.
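Concretely, that was something along these lines (a sketch; calico.yaml and rbac-kdd.yaml here are local copies of the manifests):

# Remove the old Calico resources...
kubectl delete -f calico.yaml
kubectl delete -f rbac-kdd.yaml

# ...then recreate them from the freshly downloaded v3.3.2 versions of the same files.
kubectl create -f rbac-kdd.yaml
kubectl create -f calico.yaml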
Everything now appears to be at version 3.3.2, but I am getting this error on all the calico-node pods:
Warning Unhealthy 84s (x181 over 31m) kubelet, thalia4 Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with <node IP addresses here
I ran calicoctl node status and got:
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+---------------+-------------------+-------+----------+--------------------------------+
| 134.x.x.163   | node-to-node mesh | start | 02:36:29 | Connect                        |
| 134.x.x.164   | node-to-node mesh | start | 02:36:29 | Connect                        |
| 134.x.x.165   | node-to-node mesh | start | 02:36:29 | Connect                        |
| 134.x.x.168   | node-to-node mesh | start | 02:36:29 | Active   Socket: Host is       |
|               |                   |       |          |          unreachable           |
+---------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
I would assume 134.x.x.168 being unreachable is why I am getting the above health check warning.
Not exactly sure what to do, though. This node is available in the k8s cluster (this is node thalia4):
[gms@thalia0 calico]$ kubectl get nodes
NAME      STATUS   ROLES    AGE   VERSION
thalia0   Ready    master   87d   v1.13.1
thalia1   Ready    <none>   48d   v1.13.1
thalia2   Ready    <none>   30d   v1.13.1
thalia3   Ready    <none>   87d   v1.13.1
thalia4   Ready    <none>   48d   v1.13.1
EDIT 2
calicoctl node status on thalia4 gave:
[sudo] password for gms:
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+---------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |  INFO   |
+---------------+-------------------+-------+----------+---------+
| 134.xx.xx.162 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.163 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.164 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.165 | node-to-node mesh | start | 02:36:29 | Connect |
+---------------+-------------------+-------+----------+---------+
while kubectl describe node thalia4 gave:
Name: thalia4.domain
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
dns=dns4
kubernetes.io/hostname=thalia4
node_name=thalia4
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 134.xx.xx.168/26
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 03 Dec 2018 14:17:07 -0600
Taints: <none>
Unschedulable: false
Conditions:
Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                       Message
----             ------    -----------------                 ------------------                ------                       -------
OutOfDisk        Unknown   Fri, 21 Dec 2018 11:58:38 -0600   Sat, 12 Jan 2019 16:44:10 -0600   NodeStatusUnknown            Kubelet stopped posting node status.
MemoryPressure   False     Mon, 21 Jan 2019 20:54:38 -0600   Sat, 12 Jan 2019 16:50:18 -0600   KubeletHasSufficientMemory   kubelet has sufficient memory available
DiskPressure     False     Mon, 21 Jan 2019 20:54:38 -0600   Sat, 12 Jan 2019 16:50:18 -0600   KubeletHasNoDiskPressure     kubelet has no disk pressure
PIDPressure      False     Mon, 21 Jan 2019 20:54:38 -0600   Sat, 12 Jan 2019 16:50:18 -0600   KubeletHasSufficientPID      kubelet has sufficient PID available
Ready            True      Mon, 21 Jan 2019 20:54:38 -0600   Sun, 20 Jan 2019 20:27:10 -0600   KubeletReady                 kubelet is posting ready status
Addresses:
InternalIP: 134.xx.xx.168
Hostname: thalia4
Capacity:
cpu: 4
ephemeral-storage: 6878Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8009268Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 6490895145
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7906868Ki
pods: 110
System Info:
Machine ID: c011569a40b740a88a672a5cc526b3ba
System UUID: 42093037-F27E-CA90-01E1-3B253813B904
Boot ID: ffa5170e-da2b-4c09-bd8a-032ce9fca2ee
Kernel Version: 3.10.0-957.1.3.el7.x86_64
OS Image: Red Hat Enterprise Linux
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.13.1
Kube-Proxy Version: v1.13.1
PodCIDR: 192.168.4.0/24
Non-terminated Pods: (3 in total)
Namespace     Name                       CPU Requests   CPU Limits   Memory Requests   Memory Limits   AGE
---------     ----                       ------------   ----------   ---------------   -------------   ---
kube-system   calico-node-8xqbs          250m (6%)      0 (0%)       0 (0%)            0 (0%)          24h
kube-system   coredns-786f4c87c8-sbks2   100m (2%)      0 (0%)       70Mi (0%)         170Mi (2%)      47h
kube-system   kube-proxy-zp4fk           0 (0%)         0 (0%)       0 (0%)            0 (0%)          31d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource            Requests    Limits
--------            --------    ------
cpu                 350m (8%)   0 (0%)
memory              70Mi (0%)   170Mi (2%)
ephemeral-storage   0 (0%)      0 (0%)
Events: <none>
I'm thinking this is a firewall problem, but I was told on the Slack channel that "If you're not using host endpoints then we don't mess with your host's connectivity. It sounds like you've got something blocking port 179 on that host."
I am not sure where that would be, though; the iptables rules look the same across all nodes.
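One thing I can try to confirm whether BGP traffic is being blocked (a sketch; substitute the real node IPs, and nc needs to be installed):

# From another node, test whether thalia4 accepts connections on the BGP port (TCP 179).
nc -vz 134.xx.xx.168 179

# On thalia4 itself, look for anything mentioning port 179 or Calico's failsafe chains.
sudo iptables-save | grep -E '179|cali-failsafe'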
--network-plugin=cni tells the kubelet to use the CNI network plugin, with the actual CNI plugin binaries located in --cni-bin-dir (default /opt/cni/bin) and the CNI plugin configuration located in --cni-conf-dir (default /etc/cni/net.d).
For example:
--network-plugin=cni
--cni-bin-dir=/opt/cni/bin    # there may be multiple CNI binaries here, such as calico, weave, ...; you can run '/opt/cni/bin/calico -v' to show the calico version
--cni-conf-dir=/etc/cni/net.d # this directory holds the detailed CNI plugin config, such as the one below:
{
  "name": "calico-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "mtu": 8950,
      "policy": {
        "type": "k8s"
      },
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv6": "false",
        "assign_ipv4": "true"
      },
      "etcd_endpoints": "https://172.16.1.5:2379,https://172.16.1.9:2379,https://172.16.1.15:2379",
      "etcd_key_file": "/etc/etcd/ssl/etcd-client-key.pem",
      "etcd_cert_file": "/etc/etcd/ssl/etcd-client.pem",
      "etcd_ca_cert_file": "/etc/etcd/ssl/ca.pem",
      "kubernetes": {
        "kubeconfig": "/etc/kubernetes/cluster-admin.kubeconfig"
      }
    }
  ]
}
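If you want to confirm which directories and config your kubelet is actually using, something like this should show it (a sketch; the exact flags depend on how kubeadm configured the kubelet on your hosts):

# Show the CNI-related flags the running kubelet was started with.
ps -ef | grep [k]ubelet | tr ' ' '\n' | grep -E 'network-plugin|cni-bin-dir|cni-conf-dir'

# List the installed CNI binaries and the active network config, and check the calico binary version.
ls /opt/cni/bin /etc/cni/net.d
/opt/cni/bin/calico -v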
I figured out the issue. I had to add an explicit iptables rule to the cali-failsafe-in chain on all nodes:
sudo iptables -A cali-failsafe-in -p tcp --match multiport --dport 179 -j ACCEPT
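To double-check that the rule actually landed in the chain on each node, listing it works (a sketch):

# List the cali-failsafe-in chain; the ACCEPT rule for tcp dpt:179 should now be present.
sudo iptables -L cali-failsafe-in -n -v --line-numbers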
Now, everything appears to be functional across all nodes:
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 134.xx.xx.163 | node-to-node mesh | up    | 19:33:58 | Established |
| 134.xx.xx.164 | node-to-node mesh | up    | 19:33:40 | Established |
| 134.xx.xx.165 | node-to-node mesh | up    | 19:35:07 | Established |
| 134.xx.xx.168 | node-to-node mesh | up    | 19:35:01 | Established |
+---------------+-------------------+-------+----------+-------------+