How to get access to Spark shell from Kubernetes?

2/13/2020

I've used the Helm chart to deploy Spark to Kubernetes in GCE. According to the default configuration in values.yaml, Spark is deployed to the path /opt/spark. I've checked that Spark deployed successfully by running kubectl --namespace=my-namespace get pods -l "release=spark". One master and three workers are running.

However, when I tried to check the Spark version by executing spark-submit --version from the Google Cloud console, it returned -bash: spark-submit: command not found.

I've navigated to the /opt directory, and the spark folder is missing. What should I do to be able to open a Spark shell terminal and execute Spark commands?

-- samba
apache-spark
bash
google-cloud-platform
kubernetes
scala

2 Answers

2/13/2020

You can verify by checking the services:

kubectl get services -n <namespace>
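For example, with a release named spark, the output might look something like this (the service names, IPs, and ports here are purely illustrative; yours depend on the chart and release):

NAME           TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)          AGE
spark-master   ClusterIP      10.0.12.34   <none>         7077/TCP         5m
spark-webui    LoadBalancer   10.0.56.78   35.190.10.20   8080:31234/TCP   5m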

You can port-forward a particular service and try running it locally to check:

kubectl port-forward svc/<service name> <external port>:<internal port or spark running port>
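For example, assuming the chart created a master service named spark-master listening on port 7077 (both the service name and the port are assumptions; check them against the output of kubectl get services):

kubectl port-forward svc/spark-master 7077:7077 -n my-namespace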

Locally, you can try running a Spark terminal; it will be connected to the Spark instance running on GCE.
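With the port-forward above in place, a minimal sketch of connecting a local shell to the standalone master would be:

./bin/spark-shell --master spark://localhost:7077

Note that a standalone master may advertise cluster-internal worker and driver addresses, so a fully working session can require additional network setup beyond the port-forward.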

If you check the Helm chart documentation, there are also options for the UI; you can do the same to access the UI via port-forward.

Access via a shell inside the pod

kubectl exec -it <spark pod name> -- /bin/bash

Here you can run Spark commands directly, e.g. spark-submit --version

Access UI

Access the UI via port-forwarding if you have enabled the UI in the Helm chart:

kubectl port-forward svc/<spark service name> <external port>:<internal port or spark running port>
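For example, assuming the chart exposed the web UI through a service named spark-webui on port 8080 (both are assumptions; verify with kubectl get services):

kubectl port-forward svc/spark-webui 8080:8080 -n my-namespace

Then open http://localhost:8080 in a browser.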

External Load balancer

This particular Helm chart also creates an external load balancer; you can get the external IP using:

kubectl get svc -n <namespace>
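For a service named spark-master (an assumed name; use the one from the listing above), you can also extract just the external IP with a JSONPath query:

kubectl get svc spark-master -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'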

Access Shell

If you want to connect via the load balancer IP and port:

./bin/spark-shell --conf spark.cassandra.connection.host=<Load balancer IP> --conf spark.cassandra.connection.native.port=<Port>

Creating a connection using port-forward:

kubectl port-forward svc/<spark service name> <external(local) port>:<internal port or spark running port>

./bin/spark-shell --conf spark.cassandra.connection.host=localhost --conf spark.cassandra.connection.native.port=<local Port>
-- Harsh Manvar
Source: StackOverflow

2/13/2020

One way would be to log in to the pod and then run Spark commands:

  1. List the pods: kubectl --namespace=my-namespace get pods -l "release=spark"

  2. Now, log in to the pod using the following command: kubectl exec -it <pod-id> -- /bin/bash

  3. Now you should be inside the pod and can run Spark commands, e.g. spark-submit --version (see the combined example below)
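Putting the steps together (the pod name spark-master-0 is hypothetical; substitute a name returned by step 1):

kubectl --namespace=my-namespace get pods -l "release=spark"
kubectl --namespace=my-namespace exec -it spark-master-0 -- /bin/bash
spark-submit --version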

Ref: https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/#getting-a-shell-to-a-container

Hope this helps.

-- pradeep
Source: StackOverflow