How can I fix frequent but intermittent TLS handshake timeouts in kubectl?

9/5/2019

I'm encountering TLS handshake timeouts when running operations against a local Kubernetes cluster on macOS 10.14.6. The errors show up on any kubectl action, any helm action (including helm init and helm version), and during deployments.

I've tried restarting Docker for Mac, rebooting the physical host machine, and wiping and recreating the cluster (which is difficult, since deployments spontaneously fail because of the TLS handshake issue). I've also made sure that my major/minor/patch versions for kubectl (1.14.3 client, 1.14.6 server) and helm (2.9.1) all match those used by known-good local deployments in the office.

I've also reviewed the firewall rules on my machine, but haven't found anything that would obviously cause this kind of issue.

Additionally, I've browsed many of the threads discussing this on the issue trackers for k8s itself as well as Helm, plus the questions already on SO, but these overwhelmingly concern Azure's AKS, while I'm working on a local setup.

Finally, I've made sure that enough resources are allocated to actually run the target applications -- in this case 16GB of RAM (which I've tried increasing to 24GB, with no improvement) as well as 8 CPU cores.

This problem seems to show up at random. It most often manifests as a TLS handshake timeout, but it also occasionally interrupts an established connection, with skaffold run commands sometimes crashing out with "transport closed." It doesn't seem to be caused by missing certs, since the commands do eventually succeed -- but the success rate is very low, on the order of 1 in 10 calls.
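
The failure rate could be measured independently of kubectl and helm with a small probe like the one below, which repeatedly attempts a bare TLS handshake against the API server and records how many succeed. This is only a sketch under assumptions: the function names (try_handshake, summarize, probe) are hypothetical, and the address localhost:6443 mentioned afterwards is assumed to be where Docker for Mac exposes the API server, which has not been confirmed for this setup.

```python
import socket
import ssl
import time


def try_handshake(host, port, timeout=10.0):
    """Attempt one TCP connect plus TLS handshake.

    Returns (ok, seconds_elapsed, error_text_or_None).
    """
    # Certificate verification is disabled on purpose: only the handshake
    # timing matters here, and a local cluster's CA is typically not in
    # the system trust store.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the TLS handshake before returning.
            with context.wrap_socket(sock, server_hostname=host):
                pass
        return True, time.monotonic() - start, None
    except OSError as exc:  # covers timeouts, refused connections, SSL errors
        return False, time.monotonic() - start, repr(exc)


def summarize(results):
    """Reduce a list of (ok, seconds, error) tuples to simple counts."""
    attempts = len(results)
    successes = sum(1 for ok, _, _ in results if ok)
    rate = successes / attempts if attempts else 0.0
    return {"attempts": attempts, "successes": successes, "success_rate": rate}


def probe(host, port, attempts=20, timeout=10.0, pause=0.5):
    """Run several handshake attempts in a row and return the summary."""
    results = []
    for _ in range(attempts):
        results.append(try_handshake(host, port, timeout))
        time.sleep(pause)  # small gap so attempts are not back-to-back
    return summarize(results)
```

Calling probe("localhost", 6443) would return a dict with the attempt count, the number of successful handshakes, and the success rate. If the rate is as low here as with kubectl, that would point at the network path or the API server itself rather than at the client tools.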

-- verandaguy
kubectl
kubernetes
kubernetes-helm
timeout

0 Answers