I am trying to run a sample Spark job in Kubernetes by following the steps mentioned here: https://spark.apache.org/docs/latest/running-on-kubernetes.html.
I want to send the Spark driver and executor logs to Splunk. Does Spark provide any configuration for this? How do I pass the Splunk configuration (the HEC endpoint, port, token, etc.) in the spark-submit command?
I did try passing it as args to the spark driver as

```shell
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.JavaSparkPi \
  --master k8s://http://127.0.0.1:8001 \
  --conf spark.executor.instances=2 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v2.4.4 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<account> \
  --conf spark.kubernetes.docker.image.pullPolicy=Always \
  --conf spark.kubernetes.namespace=default \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar \
  --log-driver=splunk \
  --log-opt splunk-url=<url:port> \
  --log-opt splunk-token=<token> \
  --log-opt splunk-index=<index> \
  --log-opt splunk-sourcetype=<sourceType> \
  --log-opt splunk-format=json
```
but the logs were not forwarded to the desired index.
I am using Spark version 2.4.4 to run spark-submit.
Thanks in advance for any inputs!!
Hi, and welcome to Stack Overflow.
I've searched the web for cases of Spark + Splunk usage similar to yours, and it looks like you may be mixing up several things. Judging by the Docker docs on the Splunk logging driver, you're trying to reproduce the same steps with `spark-submit`. Unfortunately, that doesn't work: `--log-driver` and `--log-opt` are options of `docker run`, not of `spark-submit`.
Everything after local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
in your command is passed as program arguments to the org.apache.spark.examples.JavaSparkPi#main
method, which (unless you customize it) simply ignores them.
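To see why those options are silently dropped, here is a minimal sketch of an entry point in the same spirit (this is illustrative, not the actual `JavaSparkPi` source): everything after the application jar arrives in `args`, and nothing ever reads the `--log-*` entries.

```java
// Hypothetical sketch of how a Spark example app sees your extra options.
// Everything after the application jar on the spark-submit line lands in args.
public class ArgsDemo {

    // Counts how many program arguments look like Docker logging flags.
    // JavaSparkPi performs no such check, so these arguments are never inspected at all.
    static long countLogOpts(String[] args) {
        return java.util.Arrays.stream(args)
                .filter(a -> a.startsWith("--log-driver") || a.startsWith("--log-opt"))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Received " + args.length + " program argument(s); "
                + countLogOpts(args) + " look like Docker logging flags and are ignored.");
    }
}
```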
What you need to do instead is connect your Kubernetes cluster to the Splunk API. One way of doing that is to install Splunk Connect for Kubernetes in your cluster. Depending on your environment specifics there may be other ways, but reading the docs is a good place to start.
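For reference, Splunk Connect for Kubernetes is typically installed via Helm. The sketch below uses Helm 3 syntax; `my-splunk-connect` is a placeholder release name, the `<...>` values are your own HEC details, and the exact `--set` keys should be double-checked against the chart's `values.yaml` for the version you install:

```shell
# Add the Splunk Connect for Kubernetes chart repository
helm repo add splunk https://splunk.github.io/splunk-connect-for-kubernetes/
helm repo update

# Install the chart, pointing log forwarding at your Splunk HEC endpoint.
# Verify these value keys against the chart's values.yaml for your chart version.
helm install my-splunk-connect splunk/splunk-connect-for-kubernetes \
  --set global.splunk.hec.host=<hec-host> \
  --set global.splunk.hec.port=8088 \
  --set global.splunk.hec.token=<token> \
  --set global.splunk.hec.indexName=<index>
```

Once the connector's DaemonSet is running, it tails container stdout/stderr across the cluster, so your Spark driver and executor pod logs reach Splunk without any changes to the `spark-submit` command.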
Hope this points you in the right direction.