We would like to test Spark job submission on a Kubernetes cluster; however, the official documentation seems ambiguous:
Spark can run on clusters managed by Kubernetes. This feature makes use of native Kubernetes scheduler that has been added to Spark.
The Kubernetes scheduler is currently experimental. In future versions, there may be behavioral changes around configuration, container images and entrypoints.
Does this mean that the Kubernetes scheduler itself is experimental, or only Spark's integration with it?
Does it make sense to run Spark on Kubernetes in production-grade environments?
Yes, it is experimental if you are using Spark's native Kubernetes scheduler integration, as in the documentation you quoted; it is Spark's integration that is experimental, not Kubernetes' own scheduler. Use it at your own risk.
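For reference, a submission through the native integration looks roughly like this; the API server address, container image, and application jar path are placeholders, and the exact configuration property names can vary between Spark versions:

    spark-submit \
      --master k8s://https://<k8s-apiserver-host>:<port> \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.container.image=<your-spark-image> \
      local:///path/to/spark-examples.jar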
Not really, if you are running a standalone Spark cluster in Kubernetes without the Kubernetes scheduler integration. That means creating the master in a Kubernetes pod and then allocating a number of worker pods that register with that master. You then submit your jobs with the good old spark-submit, using the usual --master spark:// URL instead of --master k8s://. The downside of this is basically that your Spark cluster in Kubernetes is static: Spark does not create or remove executor pods on demand, so you have to scale the worker pods yourself.