So I have a cluster on Google Kubernetes Engine and I run spark-submit to execute a Spark job. (I don't use the spark-submit script exactly; I launch the submit from Java code, but it essentially invokes the same Scala class, SparkSubmit.class.)
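Roughly, the Java side does something like this (a simplified sketch of what I mean; the job class, image and jar path are placeholders, not my real values):

// Simplified sketch of how I trigger the submit from Java; all values are placeholders.
String[] submitArgs = {
        "--master", "k8s://https://<IP of cluster-1>:443",
        "--deploy-mode", "cluster",
        "--class", "com.example.MySparkJob",                      // placeholder job class
        "--conf", "spark.kubernetes.container.image=<my image>",  // placeholder image
        "local:///opt/spark/jars/my-spark-job.jar"                // placeholder jar inside the image
};
org.apache.spark.deploy.SparkSubmit.main(submitArgs);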
In my case, I have two clusters I can connect to from my laptop using the gcloud command, e.g.
gcloud container clusters get-credentials cluster-1
gcloud container clusters get-credentials cluster-2
When I connect to cluster-1 and spark-submit submits to cluster-1, it works. But when I run the second gcloud command and then still submit to cluster-1, it no longer works, and the following stack trace appears (abridged):
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:194)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:208)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:148)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
I've been searching for a while without success. The main issue is probably that when spark-submit launches, it looks for some kind of Kubernetes credential on the local machine, and switching contexts with the two gcloud commands above messed that up.
I'm just curious: when we do spark-submit, how exactly does the remote K8s server know who I am? What's the auth process involved in all of this?
Thank you in advance.
If you want to see what the gcloud container clusters get-credentials cluster-1 command does, you can start from scratch and look at the contents of ~/.kube/config:
rm -rf ~/.kube
gcloud container clusters get-credentials cluster-1
cat ~/.kube/config
gcloud container clusters get-credentials cluster-2
cat ~/.kube/config
Something is probably not matching or is conflicting, perhaps the users or contexts. For example, you may have credentials for both clusters but be using the context for cluster-1 to access cluster-2:
$ kubectl config get-contexts
$ kubectl config get-clusters
The structure of the ~/.kube/config file should be something like this:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: <redacted> or file
server: https://<IP>:6443
name: cluster-1
- cluster:
certificate-authority: <redacted> or file
server: https://<IP>:8443
name: cluster-2
contexts:
- context:
cluster: cluster-1
user: youruser
name: access-to-cluster-1
- context:
cluster: cluster-2
user: youruser
name: access-to-cluster-2
current-context: access-to-cluster-1
kind: Config
preferences: {}
users:
- name: ....
user:
...
- name: ....
user:
...
In the code, it looks like Spark uses the io.fabric8.kubernetes.client.KubernetesClient library; see, for example, KubernetesDriverBuilder.scala.
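To see which cluster and user the fabric8 client will actually pick up on your machine, you can reproduce its auto-configuration in a few lines of Java. This is just a sketch (the client version bundled with your Spark distribution may differ slightly); it reads KUBECONFIG or ~/.kube/config and honours the current-context entry:

import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class KubeconfigCheck {
    public static void main(String[] args) throws Exception {
        // Reads KUBECONFIG / ~/.kube/config and resolves current-context into a
        // cluster (server URL + CA) and a user (client cert/key or token).
        Config config = Config.autoConfigure(null);
        System.out.println("Master URL : " + config.getMasterUrl());
        System.out.println("Namespace  : " + config.getNamespace());
        System.out.println("CA data set: " + (config.getCaCertData() != null));

        try (KubernetesClient client = new DefaultKubernetesClient(config)) {
            // Any simple API call will fail with the same SSLHandshakeException if the
            // CA or credentials from the current context don't match the target server.
            System.out.println("Pods visible: " + client.pods().list().getItems().size());
        }
    }
}

If the master URL you point spark-submit at belongs to cluster-1 while the current context (and therefore the CA and user picked up here) belongs to cluster-2, you end up with exactly this kind of handshake failure.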
A PKIX path building failed error means Java tried to open an SSL connection but was unable to find a chain of certificates (the path) that validates the certificate the server offered. The code you're running does not trust the certificate offered by the cluster; the clusters are probably using self-signed certificates.
When run from the command line, Java looks for the chain in the truststore located at jre/lib/security/cacerts. When run as part of a larger environment (Tomcat, Glassfish, etc.), it will use that environment's certificate truststore.
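You can check that it really is a trust problem (and not, say, a bad token) with a small probe that performs a TLS handshake against the API server using the JVM's default truststore. The host and port below are placeholders; take them from the server: entry in ~/.kube/config:

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsProbe {
    public static void main(String[] args) throws Exception {
        String apiServer = "<IP from the server entry in ~/.kube/config>"; // placeholder
        int port = 443;                                                    // or 6443/8443, per your config

        // Uses the JVM's default SSLContext, i.e. jre/lib/security/cacerts unless
        // -Djavax.net.ssl.trustStore=... points somewhere else.
        try (SSLSocket socket = (SSLSocket) SSLSocketFactory.getDefault()
                .createSocket(apiServer, port)) {
            socket.startHandshake(); // throws SSLHandshakeException on a PKIX failure
            System.out.println("Handshake OK: the server certificate is trusted");
        }
    }
}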
Since you started spark-submit manually, you're likely missing an option telling it where to find the keystore (server certificate and private key) and truststore (CA certificates). These are usually specified as:
-Djavax.net.ssl.trustStore=/somepath/truststore.jks
-Djavax.net.ssl.keyStore=/somepath/keystore.jks
If you're running on Java 9+, you will also need to specify the store type:
-Djavax.net.ssl.keyStoreType=<TYPE>
-Djavax.net.ssl.trustStoreType=<TYPE>
Up through Java 8, the keystores were always JKS. Since Java 9 they can also be PKCS12.
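Since you trigger the submit from Java code rather than the spark-submit script, you can also set the same values as system properties before the first SSL connection is made. This is only an equivalent of the flags above, with placeholder paths and password:

// Must run before any HTTPS/TLS connection is opened in the JVM.
System.setProperty("javax.net.ssl.trustStore", "/somepath/truststore.jks"); // placeholder path
System.setProperty("javax.net.ssl.trustStorePassword", "changeit");         // placeholder password
System.setProperty("javax.net.ssl.trustStoreType", "JKS");                  // or "PKCS12"
// ... then invoke SparkSubmit as you already do.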
In the case of a self-signed certificate, you can export it from the keystore and import it into the truststore as a trusted certificate. There are several sites with instructions on how to do this; I find Jakob Jenkov's site quite readable.
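If you prefer doing that import from code instead of keytool, the same step can be sketched with the java.security.KeyStore API. The alias, paths and password below are placeholders; the PEM file would typically be the base64-decoded certificate-authority-data from ~/.kube/config:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

public class TrustClusterCa {
    public static void main(String[] args) throws Exception {
        // Load the cluster's CA certificate from a PEM file (placeholder path).
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        X509Certificate caCert;
        try (FileInputStream in = new FileInputStream("/somepath/cluster-ca.pem")) {
            caCert = (X509Certificate) cf.generateCertificate(in);
        }

        // Add it to an existing truststore as a trusted entry
        // (this is what keytool -importcert does under the hood).
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("/somepath/truststore.jks")) {
            trustStore.load(in, "changeit".toCharArray());
        }
        trustStore.setCertificateEntry("k8s-cluster-ca", caCert);
        try (FileOutputStream out = new FileOutputStream("/somepath/truststore.jks")) {
            trustStore.store(out, "changeit".toCharArray());
        }
    }
}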