Unable to pull jar from GCS bucket when running Spark jobs on k8s

4/1/2020

I am running Spark 2.4.5 on Kubernetes. I have stored the Spark images in GCR (Container Registry), and they are pulled using the spark.kubernetes.container.image.pullSecrets config. I am also storing the Spark application jar in a GCS bucket. When I make the bucket public, spark-submit works fine. My question is: how can I access the jar in a private bucket? Is there any config I can pass to Spark? I have a service account created in GCP and also have the JSON key file. Below is the spark-submit command:

bin/spark-submit \
  --master k8s://https://host:port \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image.pullSecrets=cr-k8s-key \
  --conf spark.kubernetes.container.image=eu.gcr.io/Project-ID/spark-image/spark_2.4.5/spark:0.1.0 \
  https://storage.googleapis.com/Bucket-name/spark-examples_2.11-2.4.5.jar
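
For reference, a config-only route that is often suggested for private buckets is to let Spark read the jar over gs:// via the GCS connector. This is only a sketch, assuming the image bundles the GCS connector jar; the Kubernetes secret name gcs-key (holding the service-account JSON) and the mount path /mnt/secrets are hypothetical:

  # Mount the service-account key into driver and executor pods, then
  # point the GCS connector at it so gs:// paths resolve with auth.
  bin/spark-submit \
    --master k8s://https://host:port \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
    --conf spark.kubernetes.container.image.pullSecrets=cr-k8s-key \
    --conf spark.kubernetes.container.image=eu.gcr.io/Project-ID/spark-image/spark_2.4.5/spark:0.1.0 \
    --conf spark.kubernetes.driver.secrets.gcs-key=/mnt/secrets \
    --conf spark.kubernetes.executor.secrets.gcs-key=/mnt/secrets \
    --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
    --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/mnt/secrets/key.json \
    gs://Bucket-name/spark-examples_2.11-2.4.5.jar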

-- Susanta Adhikary
apache-spark
google-cloud-platform
kubernetes

1 Answer

4/1/2020

I used a gsutil signed URL to solve the issue:

  1. Generated a signed URL: gsutil signurl -d 10m -r eu /home/centos/private-key.json gs://bucket-name/spark-examples_2.11-2.4.5.jar (where -r eu is my region, the europe multi-region).

  2. Extracted the URL with an awk transformation, awk -F '\t' 'FNR==2 {print $4}', by piping the output of step 1 into it (the combined pipeline is sketched after this list).

  3. The signed URL can then be used from anywhere (for 10 minutes, in my case) to access the bucket object.
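
Put together, steps 1 and 2 are a single shell pipeline. A minimal sketch, reusing the key path, region, and bucket from the steps above; the SIGNED_URL variable name is mine:

  # Generate a 10-minute signed URL and pull the URL out of the
  # tab-separated signurl output (second line, fourth column).
  SIGNED_URL=$(gsutil signurl -d 10m -r eu /home/centos/private-key.json \
      gs://bucket-name/spark-examples_2.11-2.4.5.jar | awk -F '\t' 'FNR==2 {print $4}')

  # Hand the signed URL to spark-submit in place of the public object URL.
  bin/spark-submit \
    --master k8s://https://host:port \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
    --conf spark.executor.instances=3 \
    --conf spark.kubernetes.container.image.pullSecrets=cr-k8s-key \
    --conf spark.kubernetes.container.image=eu.gcr.io/Project-ID/spark-image/spark_2.4.5/spark:0.1.0 \
    "$SIGNED_URL"

Quoting the URL matters: signed URLs contain & and = characters that the shell would otherwise split or interpret.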

-- Susanta Adhikary
Source: StackOverflow