I'm using the Helm Chart to deploy Spark Operator to GKE. Then I define a SparkApplication
specification in a YAML file. But after reading the User Guide I still don't understand:
Where should I store SparkApplication YAML files: on the Kubernetes cluster or in Google Storage?
Is it OK to upload SparkApplication configurations to Google Storage and then run kubectl apply -f <YAML GS file path>?
What are the best practices for storing SparkApplication
configurations on Kubernetes cluster or GS that I may be missing?
To address your questions:
There are many ways to store your YAML files. You can keep them locally on your PC or laptop, or you can store them in the cloud. Going further, keeping your YAML files in a version control system (for example Git) would be one of the better options, because you get a full history of changes, can check exactly what you changed, and can roll back if something fails. The main thing is that kubectl needs access to these files.
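As a minimal sketch of the version-control approach, the function below applies a manifest from a local Git clone only when the file has no uncommitted changes, so whatever reaches the cluster is recorded in history. The function name apply_tracked_manifest, the repository path, and the manifest name are hypothetical; only git status --porcelain and kubectl apply -f are real commands.

```shell
# Sketch only: apply a manifest from a local Git clone.
# apply_tracked_manifest and its arguments are illustrative names.
apply_tracked_manifest() {
  local repo_dir="$1"   # path to a local clone of your config repository
  local manifest="$2"   # manifest path relative to the repository root

  # "git status --porcelain" prints one line per modified or untracked path;
  # empty output means the file matches what is committed.
  if [ -n "$(git -C "$repo_dir" status --porcelain -- "$manifest")" ]; then
    echo "refusing to apply: $manifest has uncommitted changes" >&2
    return 1
  fi

  # kubectl reads the manifest from the local filesystem.
  kubectl apply -f "$repo_dir/$manifest"
}
```

With a clone in ./spark-configs (an assumed path), this could be called as apply_tracked_manifest ./spark-configs spark-pi.yaml.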
There is no such thing as a master container in Kubernetes. There is a master node: a machine that controls and manages a set of worker nodes (the workload runtime). Please check the official documentation about Kubernetes components.
You can put your YAML files in Google Storage (a bucket), but you would not be able to run kubectl apply -f FILE against them directly. kubectl cannot interpret a file location like gs://NAME_OF_THE_BUCKET/magical-deployment.yaml.
One way to run kubectl apply -f FILE_NAME.yaml would be to keep the file locally and sync it to external storage.
You can access the data inside a bucket through gsutil. You could try piping the output of gsutil cat gs://NAME_OF_THE_BUCKET/magical-deployment.yaml into kubectl, but I would not recommend that approach.
Please refer to the gsutil tool documentation in this case, and be aware of the following:
The gsutil cat command does not compute a checksum of the downloaded data. Therefore, we recommend that users either perform their own validation of the output of gsutil cat or use gsutil cp or rsync (both of which perform integrity checking automatically).
-- https://cloud.google.com/storage/docs/gsutil/commands/cat
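If the manifest does have to live in a bucket, one option consistent with the note above is to download it with gsutil cp (which performs integrity checking) and then apply the local copy. This is a rough sketch, not a recommended workflow: apply_from_bucket is a hypothetical helper name, and the bucket path is the placeholder used earlier.

```shell
# Sketch only: fetch a manifest from a bucket, then apply the local copy.
# apply_from_bucket is an illustrative name, not part of gsutil or kubectl.
apply_from_bucket() {
  local gs_uri="$1"     # e.g. gs://NAME_OF_THE_BUCKET/magical-deployment.yaml
  local tmp_dir
  tmp_dir="$(mktemp -d)" || return 1

  # Use "gsutil cp" rather than "gsutil cat" so the download is validated.
  if ! gsutil cp "$gs_uri" "$tmp_dir/manifest.yaml"; then
    rm -rf "$tmp_dir"
    return 1
  fi

  # Apply the downloaded file and remember kubectl's exit code.
  local rc=0
  kubectl apply -f "$tmp_dir/manifest.yaml" || rc=$?

  # Clean up the temporary copy whether or not the apply succeeded.
  rm -rf "$tmp_dir"
  return "$rc"
}
```

It could be invoked as apply_from_bucket gs://NAME_OF_THE_BUCKET/magical-deployment.yaml, assuming gsutil and kubectl are both installed and authenticated.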
Let me know if you have any questions about this.