My EKS cluster has become unhealthy: all pods are stuck in "ContainerCreating", which might be related to the CNI.
The new worker nodes I launched never reach the "Ready" state and report the following error:
"couldn't get current server API group list; will keep using cached value. (Get https://172.20.0.1:443/api?timeout=32s: dial tcp
172.20.0.1:443: i/o timeout) Failed to communicate with K8S Server. Please check instance security groups or http proxy setting"
I'm not using an HTTP proxy, and the security groups allow traffic from the private CIDR (Telnet to the API server on port 443 works).
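The Telnet test can also be repeated from an affected node against the exact address in the error (172.20.0.1:443). Below is a minimal sketch, assuming Python 3 is available on the node; the helper name can_connect and the 5-second default timeout are illustrative choices, not part of any EKS tooling.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake, the same thing Telnet tests.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Calling can_connect("172.20.0.1", 443) on the node would show whether the address from the ipamd error is reachable from there, which may not be the same address I reached with Telnet.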
My CNI version is 1.5.5. Following some threads about this issue, I tried downgrading the CNI to 1.5.3 (nodes still didn't connect) and to 1.5.1 (nodes connected, since the /etc/cni/net.d/10-aws.conflist file exists, but pods couldn't connect to them).
In version 1.5.5 the location of the conflist file changed to /etc/cni/10-aws.conflist, but the nodes are still in the "NotReady" state.
My EKS version is 1.14 and the platform version is eks.2.
Ipamd log:
2019-11-27T09:09:13.446Z [INFO] Starting L-IPAMD v1.5.5 ...
2019-11-27T09:09:43.447Z [INFO] Testing communication with server
2019-11-27T09:10:13.448Z [INFO] Failed to communicate with K8S Server. Please check instance security groups or http proxy setting
2019-11-27T09:10:13.448Z [ERROR] Failed to create client: error communicating with apiserver: Get https://172.20.0.1:443/version?timeout=32s: dial tcp 172.20.0.1:443: i/o timeout
The errors from the containers are:
Warning FailedCreatePodSandBox 17m kubelet, ip-10-1-1-144.eu-west-1.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "b02f175d5e68011332655e0d6e6aa3ae226bbd7bf447c7461c0140a7e026d831" network for pod "coredns-759d6fc95f-zx292": NetworkPlugin cni failed to set up pod "coredns-759d6fc95f-zx292_kube-system" network: failed to find plugin "aws-cni" in path [/opt/cni/bin], failed to clean up sandbox container "b02f175d5e68011332655e0d6e6aa3ae226bbd7bf447c7461c0140a7e026d831" network for pod "coredns-759d6fc95f-zx292": NetworkPlugin cni failed to teardown pod "coredns-759d6fc95f-zx292_kube-system" network: failed to find plugin "aws-cni" in path [/opt/cni/bin]]
Normal SandboxChanged 2m47s (x70 over 17m) kubelet, ip-10-1-1-144.eu-west-1.compute.internal Pod sandbox changed, it will be killed and re-created.
CNI Image: 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon-k8s-cni:v1.5.5
/opt/cni/bin/aws-cni-support.sh script output:
/opt/cni/bin/aws-cni-support.sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61679: Connection refused
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 61678: Connection refused
tar: Removing leading `/' from member names
/var/log/aws-routed-eni/
/var/log/aws-routed-eni/ipamd.log.2019-11-27-09
/var/log/aws-routed-eni/ipamd.log.2019-11-27-10
/var/log/aws-routed-eni/eni.out
/var/log/aws-routed-eni/pod.out
/var/log/aws-routed-eni/networkutils-env.out
/var/log/aws-routed-eni/ipamd-env.out
/var/log/aws-routed-eni/eni-configs.out
/var/log/aws-routed-eni/metrics.out
/var/log/aws-routed-eni/ifconfig.out
/var/log/aws-routed-eni/iprule.out
/var/log/aws-routed-eni/iptables-save.out
/var/log/aws-routed-eni/iptables.out
/var/log/aws-routed-eni/iptables-nat.out
/var/log/aws-routed-eni/iptables-mangle.out
/var/log/aws-routed-eni/cni/
/var/log/aws-routed-eni/cni/10-aws.conflist
/var/log/aws-routed-eni/messages
/var/log/aws-routed-eni/route.out
/var/log/aws-routed-eni/sysctls.out
Also, /var/log/aws-routed-eni/messages contains many of the following errors: network: failed to find plugin "aws-cni" in path [/opt/cni/bin]
There is no /opt/cni/bin/aws-cni file.
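To confirm on each node which CNI files are actually missing, the check can be scripted. This is a minimal sketch, assuming Python 3 on the node; the helper name missing_files is hypothetical, and the two paths are simply the ones mentioned in the messages above.

```python
from pathlib import Path

# Paths taken from the error messages and the description above.
EXPECTED_CNI_FILES = [
    "/opt/cni/bin/aws-cni",
    "/etc/cni/net.d/10-aws.conflist",
]


def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]
```

Running missing_files(EXPECTED_CNI_FILES) on a node returns the paths that are absent there, which makes it easier to compare a "NotReady" node with one that joined under 1.5.1.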
Does anyone have any idea what the issue could be?