I have a vanilla EKS cluster deployed with Terraform
at version 1.14 with RBAC enabled, but nothing installed into the cluster. I just executed linkerd install | kubectl apply -f -.
After that completed, I waited about four minutes for things to stabilize. Running kubectl get pods -n linkerd shows the following:
NAME                                      READY   STATUS    RESTARTS   AGE
linkerd-destination-8466bdc8cc-5mt5f      2/2     Running   0          4m20s
linkerd-grafana-7b9b6b9bbf-k5vc2          1/2     Running   0          4m19s
linkerd-identity-6f78cd5596-rhw72         2/2     Running   0          4m21s
linkerd-prometheus-64df8d5b5c-8fz2l       2/2     Running   0          4m19s
linkerd-proxy-injector-6775949867-m7vdn   1/2     Running   0          4m19s
linkerd-sp-validator-698479bcc8-xsxnk     1/2     Running   0          4m19s
linkerd-tap-64b854cdb5-45c2h              2/2     Running   0          4m18s
linkerd-web-bdff9b64d-kcfss               2/2     Running   0          4m20s
For some reason linkerd-proxy-injector, linkerd-sp-validator, linkerd-controller, and linkerd-grafana are not fully started.
Any ideas as to what I should check? The linkerd check command is hanging.
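In the meantime, the stuck pods can be poked at directly with kubectl; a rough sketch (the pod name is just the proxy-injector pod from the listing above):

# Per-container readiness and recent events for one of the stuck pods
kubectl describe pod linkerd-proxy-injector-6775949867-m7vdn -n linkerd
# All recent events in the namespace, oldest first
kubectl get events -n linkerd --sort-by=.metadata.creationTimestamp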
The logs for the linkerd-controller pod show:
linkerd-controller-68d7f67bc4-kmwfw linkerd-proxy ERR! [ 335.058670s] admin={bg=identity} linkerd2_proxy::app::identity Failed to certify identity: grpc-status: Unknown, grpc-message: "the request could not be dispatched in a timely fashion"
and
linkerd-proxy ERR! [ 350.060965s] admin={bg=identity} linkerd2_proxy::app::identity Failed to certify identity: grpc-status: Unknown, grpc-message: "the request could not be dispatched in a timely fashion"
time="2019-10-18T21:57:49Z" level=info msg="starting admin server on :9996"
Deleting the pods and restarting the deployments results in different components becoming ready, but the entire control plane never becomes fully ready.
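"Restarting" here was nothing fancier than deleting the pods and letting the deployments recreate them, roughly:

kubectl delete pods --all -n linkerd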
A Linkerd community member answered with:
Which VPC CNI version do you have installed? I ask because of:
- https://github.com/aws/amazon-vpc-cni-k8s/issues/641
- https://github.com/mogren/amazon-vpc-cni-k8s/commit/7b2f7024f19d041396f9c05996b70d057f96da11
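(For anyone checking their own cluster: the installed VPC CNI version shows up in the image tag of the aws-node DaemonSet in kube-system; a minimal sketch:)

# The amazon-k8s-cni image tag is the installed CNI version, e.g. ...:v1.5.4
kubectl describe daemonset aws-node -n kube-system | grep Image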
And after testing, this was the solution:
Sure enough, downgrading the AWS VPC CNI to v1.5.3 fixed everything in my cluster. Not sure why, but it did. It seems that admission controllers do not work with v1.5.4.
So, the solution is to use AWS VPC CNI v1.5.3 until the root cause in AWS VPC CNI v1.5.4 is determined.
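For completeness, a minimal sketch of the downgrade, assuming the manifest lives at the usual path in the amazon-vpc-cni-k8s repo (verify the URL against the v1.5.3 release before applying):

# Apply the v1.5.3 CNI manifest (URL assumed from the repo's usual layout)
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.5.3/config/v1.5/aws-k8s-cni.yaml
# Confirm aws-node rolled over to the v1.5.3 image
kubectl describe daemonset aws-node -n kube-system | grep Image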