I am trying to use Spark on Kubernetes. The idea is to use spark-submit against a k8s cluster which is running the Prometheus Operator. Now I know that the Prometheus Operator can respond to a ServiceMonitor YAML, but I am confused how to provide some of the things required in that YAML using spark-submit.
Here is the YAML:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sparkloads-metrics
  namespace: runspark
spec:
  selector:
    matchLabels:
      app: runspark
  namespaceSelector:
    matchNames:
      - runspark
  endpoints:
    - port: 8192 ---> How to provide the name to the port using `spark-submit`?
      interval: 30s
      scheme: http
You cannot provide additional ports and their names to the Service created by SparkSubmit yet (Spark v2.4.4); things may change in later versions.
What you can do is create an additional Kubernetes Service (a Spark Monitoring Service, e.g. of type ClusterIP) per Spark job after the job submission with SparkSubmit, for instance by running spark-submit ... && kubectl apply .... Or use any of the available Kubernetes clients with the language of your choice.
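For illustration, a minimal sketch of such a Spark Monitoring Service follows; the name, label, selector and port numbers are assumptions and have to match the labels on your actual Spark Driver Pod and the metrics port you expose:

apiVersion: v1
kind: Service
metadata:
  name: spark-metrics              # <- assumed Spark Monitoring Service name
  namespace: runspark
  labels:
    k8s-app: spark-metrics         # <- label the ServiceMonitor will match on
spec:
  type: ClusterIP
  selector:
    spark-role: driver             # <- assumed label of your Spark Driver Pod
  ports:
    - name: metrics                # <- port name referenced by the ServiceMonitor
      port: 8088
      targetPort: 8088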
Note that you can use a Kubernetes OwnerReference to configure automatic Service deletion/GC on Spark Driver Pod deletion.
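As a sketch, the ownerReferences stanza on that Service could look like the following; the Driver Pod name is an assumption and the uid is a placeholder that must be read from the live Pod (e.g. with kubectl get pod <driver-pod> -o jsonpath='{.metadata.uid}'):

metadata:
  name: spark-metrics
  namespace: runspark
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: my-spark-app-driver                    # <- assumed Driver Pod name
      uid: 11111111-2222-3333-4444-555555555555    # <- placeholder, take it from the live Pod
      controller: false
      blockOwnerDeletion: false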
Then you can supply the ServiceMonitors via the Prometheus Operator Helm values:
prometheus:
  additionalServiceMonitors:
    - name: spark-metrics            # <- Spark Monitoring Service name
      selector:
        matchLabels:
          k8s-app: spark-metrics     # <- Spark Monitoring Service label
      namespaceSelector:
        any: true
      endpoints:
        - interval: 10s
          port: metrics              # <- Spark Monitoring Service port name
Be aware of the fact that Spark doesn't provide a way to customize Spark Pods yet, so the Pod ports that should expose metrics are not exposed at the Pod level and won't be accessible via the Service. To overcome this you can add an additional EXPOSE ... 8088 statement in the Dockerfile and rebuild the Spark image.
This guide should help you to set up Spark monitoring with the PULL strategy, using for example the JMX Exporter.
There is an alternative: use the Prometheus Pushgateway (though it is recommended only for short-running Spark jobs, you can try it in your environment if you do not run huge workloads).
By doing that your Spark Pods will PUSH metrics to the Gateway, and Prometheus will in turn PULL them from the Gateway.
You can refer to the Spark Monitoring Helm chart example with the Prometheus Operator and Prometheus Pushgateway combined.
Hope it helps.