I am trying to run a sample Spark job in Kubernetes by following the steps mentioned here: https://spark.apache.org/docs/latest/running-on-kubernetes.html.
I want to send the Spark driver and executor logs to Splunk. Does Spark provide any configuration for this? How do I pass the Splunk configuration (the HEC endpoint, port, token, etc.) in the spark-submit command?
I did try passing it as args to the spark driver as

```shell
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.JavaSparkPi \
  --master k8s://http://127.0.0.1:8001 \
  --conf spark.executor.instances=2 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.4 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<account> \
  --conf spark.kubernetes.docker.image.pullPolicy=Always \
  --conf spark.kubernetes.namespace=default \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar \
  --log-driver=splunk \
  --log-opt splunk-url=<url:port> \
  --log-opt splunk-token=<token> \
  --log-opt splunk-index=<index> \
  --log-opt splunk-sourcetype=<sourceType> \
  --log-opt splunk-format=json
```
but the logs were not forwarded to the desired index.
I am using Spark version 2.4.4 to run spark-submit.
Thanks in advance for any inputs!!
Hi, and welcome to Stack Overflow.
I've searched the web for cases of Spark + Splunk usage similar to yours, and it looks like you may be mixing up several things. Judging by the Docker docs on the Splunk logging driver, you're trying to reproduce the same steps with `spark-submit`. Unfortunately, that doesn't work: `--log-driver` and `--log-opt` are options of `docker run`, not of `spark-submit`.
Everything after local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
in your command is passed as program arguments to the org.apache.spark.examples.JavaSparkPi#main
method, which (unless you customize it) simply ignores them.
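To see why those options are silently dropped, here is a minimal sketch of an entry point in the same spirit (this is illustrative, not the actual `JavaSparkPi` source): everything after the application jar arrives in `args`, and nothing ever reads the `--log-*` entries.

```java
// Hypothetical sketch of how a Spark example app sees your extra options.
// Everything after the application jar on the spark-submit line lands in args.
public class ArgsDemo {

    // Counts how many program arguments look like Docker logging flags.
    // JavaSparkPi performs no such check, so these arguments are never inspected at all.
    static long countLogOpts(String[] args) {
        return java.util.Arrays.stream(args)
                .filter(a -> a.startsWith("--log-driver") || a.startsWith("--log-opt"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Received " + args.length + " program argument(s); "
                + countLogOpts(args) + " look like Docker logging flags and are ignored.");
    }
}
```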
What you need to do instead is connect your Kubernetes cluster to the Splunk API. One way of doing that is to install Splunk Connect for Kubernetes in your cluster. Depending on your environment specifics there may be other ways, but reading the docs is a good place to start.
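For reference, Splunk Connect for Kubernetes is typically installed via Helm. The sketch below uses Helm 3 syntax; `my-splunk-connect` is a placeholder release name, the `<...>` values are your own HEC details, and the exact `--set` keys should be double-checked against the chart's `values.yaml` for the version you install:

```shell
# Add the Splunk Connect for Kubernetes chart repository
helm repo add splunk https://splunk.github.io/splunk-connect-for-kubernetes/
helm repo update

# Install the chart, pointing log forwarding at your Splunk HEC endpoint.
# Verify these value keys against the chart's values.yaml for your chart version.
helm install my-splunk-connect splunk/splunk-connect-for-kubernetes \
  --set global.splunk.hec.host=<hec-host> \
  --set global.splunk.hec.port=8088 \
  --set global.splunk.hec.token=<token> \
  --set global.splunk.hec.indexName=<index>
```

Once the connector's DaemonSet is running, it tails container stdout/stderr across the cluster, so your Spark driver and executor pod logs reach Splunk without any changes to the `spark-submit` command.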
Hope this points you in the right direction.