Spark is unable to connect to MySQL when deployed on GKE

10/21/2019

I am deploying a batch Spark job on Kubernetes on GKE. The job tries to fetch some data from MySQL (Google Cloud SQL), but it fails with a communications link failure. I tried connecting to MySQL manually by installing the mysql client inside the pod, and it connected fine. Is there anything additional I need to configure?
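For context, the read is a plain Spark JDBC load, roughly like the sketch below (the URL, credentials, and table name are placeholders, not my real values):

```scala
import org.apache.spark.sql.SparkSession

object MySqlReadJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mysql-batch-read")
      .getOrCreate()

    // Placeholder connection details -- the real job reads these from config.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://<CLOUD_SQL_IP>:3306/<DATABASE>")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "<TABLE>")
      .option("user", "<USER>")
      .option("password", "<PASSWORD>")
      .load()

    df.show()
    spark.stop()
  }
}
```

The exception is thrown on the `load()` call, before any rows are fetched.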

Exception:

Exception in thread "main" com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

        at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:590)
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:57)
        at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:1606)
        at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:633)
        at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:347)
        at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:219)
        at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:63)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
-- Durgesh Choudhary
apache-spark
apache-spark-sql
google-kubernetes-engine
kubernetes

1 Answer

10/22/2019

The issue was actually with the firewall rules in GCP. After fixing them, the job works fine.
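For anyone who lands here with the same symptom: a CommunicationsException where "the last packet sent successfully to the server was 0 milliseconds ago" typically means the TCP connection to MySQL was never established at all, which points at networking rather than Spark. Assuming the Cloud SQL instance is reached over its public IP, an egress rule along these lines is the kind of change involved (the rule name, network, and address below are placeholders, not the exact rule I used):

```sh
# Hypothetical rule: allow the cluster's nodes to open TCP connections
# to the Cloud SQL instance on the MySQL port.
gcloud compute firewall-rules create allow-mysql-egress \
    --direction=EGRESS \
    --action=ALLOW \
    --rules=tcp:3306 \
    --network=<VPC_NETWORK> \
    --destination-ranges=<CLOUD_SQL_IP>/32
```

Alternatively, if the instance restricts clients via authorized networks, the cluster's egress range needs to be added there (gcloud sql instances patch <INSTANCE> --authorized-networks=<CIDR>).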

-- Durgesh Choudhary
Source: StackOverflow