I am writing a Kubernetes SparkApplication using the GCP Spark on Kubernetes operator (spark-on-k8s-operator).
Currently, I am stuck: I cannot inject environment variables into my containers.
I am following the documentation here.
Manifest:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-search-indexer
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.5"
  imagePullPolicy: Always
  mainClass: com.quid.indexer.news.jobs.ESIndexingJob
  mainApplicationFile: "https://lala.com/baba-0.0.43.jar"
  arguments:
    - "--esSink"
    - "http://something:9200/mo-sn-{yyyy-MM}-v0.0.43/searchable-article"
    - "-streaming"
    - "--kafkaTopics"
    - "annotated_blogs,annotated_ln_news,annotated_news"
    - "--kafkaBrokers"
    - "10.1.1.1:9092"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    env:
      - name: "DEMOGRAPHICS_ES_URI"
        value: "somevalue"
    labels:
      version: 2.4.5
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    env:
      - name: "DEMOGRAPHICS_ES_URI"
        value: "somevalue"
    labels:
      version: 2.4.5
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
Environment variables actually set on the pod (DEMOGRAPHICS_ES_URI is missing):
Environment:
  SPARK_DRIVER_BIND_ADDRESS:  (v1:status.podIP)
  SPARK_LOCAL_DIRS:           /var/data/spark-1ed8539d-b157-4fab-9aa6-daff5789bfb5
  SPARK_CONF_DIR:             /opt/spark/conf
It turns out that the env field only takes effect when the operator's webhook is enabled
(the quick-start guide here explains how to set it up).
The other approach is to use envVars, which takes a plain map of variable names to values instead of a list of name/value entries.
Example:
spec:
  executor:
    envVars:
      DEMOGRAPHICS_ES_URI: "somevalue"
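The manifest in the question sets the variable on both the driver and the executor, so the envVars form would presumably be needed under both. Below is a minimal sketch of that fragment. It assumes envVars is accepted under driver in the same way as under executor; the example above only shows executor, so check this against the operator version you run. The rest of the spec is unchanged from the question and omitted here.

```yaml
# Sketch only: envVars under both driver and executor.
# Assumption: the driver spec accepts the same envVars map as the executor.
spec:
  driver:
    envVars:
      DEMOGRAPHICS_ES_URI: "somevalue"
  executor:
    envVars:
      DEMOGRAPHICS_ES_URI: "somevalue"
```

After re-applying the manifest, describing the driver pod again should show whether DEMOGRAPHICS_ES_URI now appears in its Environment section.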
Ref: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/978