We have a service that sends tons of events in bulk. It basically opens multiple HTTP POST connections.
Since we moved the service to Kubernetes, we've been getting "getaddrinfo: Temporary failure in name resolution"
errors from time to time. Most calls work, but some fail, which is weird.
Can anyone explain why and how to fix?
Thanks!
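One way to confirm that the failures come from intermittent DNS lookups rather than from the HTTP side is to hammer the resolver from inside a pod. Below is a minimal Python sketch. It assumes the service resolves names through getaddrinfo, and the hostname, port, and count_lookup_failures helper are hypothetical placeholders. The sketch fires many concurrent lookups and tallies the outcomes. socket.EAI_AGAIN is the error code whose message is "Temporary failure in name resolution".

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical target; replace with the host your pods actually POST to.
HOST = "events.example.internal"
PORT = 443


def count_lookup_failures(
    host: str,
    port: int,
    attempts: int = 500,
    workers: int = 50,
    resolver: Callable = socket.getaddrinfo,
) -> Counter:
    """Run many concurrent lookups and tally the outcomes.

    Keys are "ok" for a successful lookup, otherwise the gaierror errno
    (e.g. socket.EAI_AGAIN for "Temporary failure in name resolution").
    The resolver is injectable so the tally logic can be tested offline.
    """

    def one_lookup(_: int):
        try:
            resolver(host, port, type=socket.SOCK_STREAM)
            return "ok"
        except socket.gaierror as exc:
            return exc.errno

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(one_lookup, range(attempts)))


if __name__ == "__main__":
    results = count_lookup_failures(HOST, PORT)
    print(results)
    print("EAI_AGAIN failures:", results.get(socket.EAI_AGAIN, 0))
```

If a run inside the pod shows a small but non-zero EAI_AGAIN count while the same run on the node shows none, the problem is probably in the cluster DNS path rather than in the service itself.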
Check the Tinder post; they had a similar problem:
https://medium.com/tinder-engineering/tinders-move-to-kubernetes-cda2a6372f44
And here is the source for their DNS findings:
https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
TL;DR: check the ARP (neighbor) table cache gc_* parameters on your hosts (e.g. net.ipv4.neigh.default.gc_thresh1/2/3), try disabling AAAA queries via /etc/gai.conf in the containers, and move DNS to a DaemonSet, injecting the host IP into the pods as their DNS server.
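The AAAA part can also be approximated in application code. Below is a minimal Python sketch. It assumes the service resolves names through glibc's getaddrinfo; musl-based images such as Alpine behave differently. Passing family=AF_INET makes glibc send only an A query, so no AAAA query goes out in parallel with it, and that parallel A/AAAA pair is what the Weave post ties to the conntrack race. The bounded retry on EAI_AGAIN is an extra workaround of my own and not something the linked posts prescribe. resolve_ipv4 and its parameters are hypothetical.

```python
import socket
import time
from typing import Callable, List


def resolve_ipv4(
    host: str,
    port: int,
    retries: int = 3,
    backoff: float = 0.05,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Resolve host to IPv4 addresses only, retrying on EAI_AGAIN.

    family=AF_INET restricts glibc to A records (no AAAA query).
    Errors other than EAI_AGAIN (e.g. EAI_NONAME) are raised immediately;
    EAI_AGAIN is raised once `retries` extra attempts are used up.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            infos = resolver(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # for AF_INET, sockaddr is (ip, port).
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            if exc.errno != socket.EAI_AGAIN or attempt >= retries:
                raise
            sleep(backoff * (2 ** attempt))  # exponential backoff
            attempt += 1
```

Retrying only hides the symptom. The host-level and DNS-placement changes above deal with the cause.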