Kubernetes spark-submit

2/10/2020

I am trying to use Kubernetes as the cluster manager for Spark, and I also want to ship the container logs to Splunk. I have a monitoring stack (fluent-bit, Prometheus, etc.) deployed in the same namespace, and the way it works is: if your pod has a certain environment variable set, the stack will start reading the pod's logs and pushing them to Splunk. The thing I am not able to find is how to set such an environment variable and populate it:

bin/spark-submit \
   --deploy-mode cluster \
   --class org.apache.spark.examples.SparkPi \
   --master k8s://https://my-kube-cluster.com \
   --conf spark.executor.instances=2 \
   --conf spark.app.name=spark-pi \
   ....
   ....
   ....
   --conf spark.kubernetes.driverEnv.UID="set it to spark driver pod id" \
   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
-- devnull
apache-spark
kubernetes
scala

1 Answer

2/11/2020

To configure additional environment variables on the Spark driver pod, you can pass additional --conf spark.kubernetes.driverEnv.[EnvironmentVariableName]=[EnvironmentVariableValue] flags (please refer to the docs for more details).
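For example, a minimal sketch based on your command, assuming your fluent-bit setup keys off a variable named SPLUNK_INDEX (both the variable name and its value here are placeholders for whatever your monitoring stack actually expects):

bin/spark-submit \
   --deploy-mode cluster \
   --class org.apache.spark.examples.SparkPi \
   --master k8s://https://my-kube-cluster.com \
   --conf spark.executor.instances=2 \
   --conf spark.app.name=spark-pi \
   --conf spark.kubernetes.driverEnv.SPLUNK_INDEX=my-app-logs \
   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar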

To configure additional environment variables on the Spark executor pods, you can pass additional --conf spark.executorEnv.[EnvironmentVariableName]=[EnvironmentVariableValue] flags (please refer to the docs for more details).
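The executor side works the same way, only the property prefix changes, and the two can be combined so that both the driver and the executors expose the same placeholder variable:

bin/spark-submit \
   --deploy-mode cluster \
   --class org.apache.spark.examples.SparkPi \
   --master k8s://https://my-kube-cluster.com \
   --conf spark.kubernetes.driverEnv.SPLUNK_INDEX=my-app-logs \
   --conf spark.executorEnv.SPLUNK_INDEX=my-app-logs \
   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar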

Hope it helps.

-- Aliaksandr Sasnouskikh
Source: StackOverflow