Recurring "Can't connect to MySQL server .. Temporary failure in name resolution" in GKE cluster

3/10/2019

I deployed a MySQL server using the mysql:5.7 image on my GKE cluster. It's deployed with one replica and exposed through a ClusterIP service named "mysql-server".

For the last few hours I've been seeing recurring, intermittent errors from other pods that run Python servers:

sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'mysql-server' ([Errno -3] Temporary failure in name resolution)")

I've gone over Kubernetes DNS debugging and found no errors or other issues, except for CoreDNS not running at all in any of my clusters.

When I run nslookup mysql-server from another pod, I get healthy output:

Server:     10.39.240.10
Address:    10.39.240.10#53

Name:   mysql-server.default.svc.cluster.local
Address: 10.39.245.88

However, ping mysql-server never returns; I don't know if that's relevant.

PING mysql-server.default.svc.cluster.local (10.39.245.88) 56(84) bytes of data.
^C
--- mysql-server.default.svc.cluster.local ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
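
Since the error is intermittent, a single nslookup could easily miss it. Below is a minimal sketch of how the lookup might be repeated from inside one of the affected pods, assuming the failing path goes through Python's standard socket.getaddrinfo (the errno -3 in the traceback matches EAI_AGAIN from that call). The function name probe_dns and its parameters are hypothetical, chosen only for illustration.

```python
import socket
import time


def probe_dns(host, attempts=100, delay=0.5, resolver=socket.getaddrinfo):
    """Resolve `host` repeatedly and return a list of (attempt, error) failures.

    `resolver` is injectable only so the loop can be exercised without real DNS;
    by default it is the standard library's socket.getaddrinfo.
    """
    failures = []
    for attempt in range(attempts):
        try:
            # 3306 is the default MySQL port; only the name lookup matters here.
            resolver(host, 3306)
        except socket.gaierror as exc:
            # errno -3 (EAI_AGAIN) is "Temporary failure in name resolution".
            failures.append((attempt, exc))
        time.sleep(delay)
    return failures
```

Calling probe_dns("mysql-server") from a Python shell in an affected pod would show whether lookups fail sporadically there, and comparing the failing attempts against timestamps in the kube-dns pod logs may help narrow down whether the resolver or the client side is at fault.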

Would this be an issue with MySQL or with GKE? How can I debug it further?

-- Mugen
google-kubernetes-engine
kube-dns
kubernetes
mysql

0 Answers