I've used the Helm chart to deploy Spark to Kubernetes on GCE. According to the default configuration in values.yaml, Spark is deployed to the path /opt/spark. I've checked that Spark deployed successfully by running kubectl --namespace=my-namespace get pods -l "release=spark". There is 1 master and 3 workers running.
However, when I tried to check the Spark version by executing spark-submit --version from the Google Cloud console, it returned -bash: spark-submit: command not found.
I've navigated to the /opt directory and the spark folder is missing. What should I do to be able to open a Spark shell and execute Spark commands?
The Helm chart installs Spark inside the pods running on the cluster, not on the machine you run kubectl from, which is why spark-submit is not found in the Cloud console. You can verify the deployment by checking its services:
kubectl get services -n <namespace>
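For this deployment, assuming the chart labels its services the same way the question filters the pods, that might look like:
kubectl get services -n my-namespace -l "release=spark"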
You can port-forward a particular service and try connecting locally to check:
kubectl port-forward svc/<service name> <local port>:<service port or Spark port>
Locally, you can then start a Spark shell; it will be connected to the Spark cluster running on the GCE instance.
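A minimal sketch, assuming the master is exposed through a (hypothetical) service named spark-master on Spark's default standalone port 7077:
kubectl port-forward svc/spark-master 7077:7077 -n my-namespace
./bin/spark-shell --master spark://localhost:7077
Note that for jobs to actually run, the executors must also be able to reach your local driver, so this works best as a quick connectivity check.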
If you check the Helm chart documentation, there are also options for the UI; you can do the same to access the UI via port-forward.
Access a shell inside the pod
kubectl exec -it <spark pod name> -- /bin/bash
Here you can directly run Spark commands, e.g. spark-submit --version
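For example, with the pods from the question (the pod name spark-master-0 is hypothetical; take the real one from kubectl get pods):
kubectl --namespace=my-namespace exec -it spark-master-0 -- /bin/bash
spark-submit --version
If spark-submit is not on the PATH inside the container, call it by its full path, /opt/spark/bin/spark-submit, since the chart installs Spark under /opt/spark.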
Access UI
Access the UI via port-forwarding if you have enabled the UI in the Helm chart.
kubectl port-forward svc/<spark service name> <local port>:<UI port>
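For example, assuming a (hypothetical) UI service named spark-webui serving on Spark's default master UI port 8080:
kubectl port-forward svc/spark-webui 8080:8080 -n <namespace>
Then open http://localhost:8080 in a browser.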
External Load balancer
This particular Helm chart also creates an external load balancer. You can get its external IP using:
kubectl get svc -n <namespace>
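For example, to pull just the external IP of the (hypothetical) spark-master service with a jsonpath query:
kubectl get svc spark-master -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'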
Access Shell
If you want to connect via the load balancer IP and port:
./bin/spark-shell --conf spark.cassandra.connection.host=<Load balancer IP> --conf spark.cassandra.connection.native.port=<Port>
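The Cassandra settings above only matter if your chart bundles Cassandra; if you simply want the shell to attach to the Spark master itself, the usual form is (assuming the default standalone master port 7077):
./bin/spark-shell --master spark://<Load balancer IP>:7077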
Creating a connection using port-forward:
kubectl port-forward svc/<spark service name> <local port>:<service port or Spark port>
./bin/spark-shell --conf spark.cassandra.connection.host=localhost --conf spark.cassandra.connection.native.port=<local port>
One way would be to log in to the pod and then run Spark commands.
List the pods: kubectl --namespace=my-namespace get pods -l "release=spark"
Now, log in to the pod using the following command: kubectl exec -it <pod-id> -- /bin/bash
Now you should be inside the pod and can run Spark commands: spark-submit --version
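You can also run the command without opening an interactive shell, for example:
kubectl --namespace=my-namespace exec <pod-id> -- spark-submit --version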
Hope this helps.