Where to store SparkApplication YAML files on Kubernetes cluster?

2/18/2020

I'm using the Helm Chart to deploy Spark Operator to GKE. Then I define a SparkApplication specification in a YAML file. But after reading the User Guide I still don't understand:

  1. Where should I store SparkApplication YAML files: on the Kubernetes cluster or in Google Storage?
  2. Is it OK/possible to deploy them along with the Spark Operator Helm chart to the Spark Master container?
  3. Is it a good approach to load the SparkApplication configurations to Google Storage and then run kubectl apply -f <YAML GS file path>?

What are the best practices for storing SparkApplication configurations on a Kubernetes cluster or in GS that I may be missing?

-- samba
apache-spark
google-cloud-platform
kubernetes
kubernetes-helm

1 Answer

2/19/2020

To address your questions:

  1. There are many places to store your YAML files. You can keep them locally on your PC or laptop, or you can keep them in the cloud. Going further, keeping your YAML files in a version control system (for example Git) is one of the better options, because you get a full history of changes, can check exactly what you changed, and can roll back if something fails. The main requirement is that kubectl has access to these files.

  2. There is no such thing as a master container in Kubernetes. There is a master node: a machine that controls and manages a set of worker nodes (the workload runtime). Please check the official documentation about Kubernetes components.

  3. You can put your YAML files in Google Storage (a bucket), but you would not be able to run kubectl apply -f FILE against them directly. kubectl cannot interpret a file location like gs://NAME_OF_THE_BUCKET/magical-deployment.yaml.

    One way to run kubectl apply -f FILE_NAME.yaml is to keep the file stored locally and synced with an external location.

    You can access the data inside a bucket through gsutil. You could try piping the output of gsutil cat gs://NAME_OF_THE_BUCKET/magical-deployment.yaml into kubectl, but I would not recommend that approach.

    Please refer to the gsutil tool documentation in this case and be aware of the following:

    The gsutil cat command does not compute a checksum of the downloaded data. Therefore, we recommend that users either perform their own validation of the output of gsutil cat or use gsutil cp or rsync (both of which perform integrity checking automatically).

    -- https://cloud.google.com/storage/docs/gsutil/commands/cat
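
As a rough illustration of the safer path from point 3 (copy the file locally with gsutil cp, which performs integrity checking, then apply the local copy), here is a minimal shell sketch. The function name apply_from_bucket is hypothetical, and the bucket and file names are just the placeholders used above. It assumes gsutil and kubectl are installed and already authenticated against your project and cluster.

```shell
#!/usr/bin/env bash
# Sketch only: fetch a manifest from a bucket with an integrity-checked copy,
# then apply the local file. apply_from_bucket is a hypothetical helper name.

apply_from_bucket() {
  local bucket="$1"     # placeholder, e.g. NAME_OF_THE_BUCKET
  local manifest="$2"   # placeholder, e.g. magical-deployment.yaml
  local workdir local_file status

  # Temporary directory for the downloaded copy.
  workdir="$(mktemp -d)" || return 1
  local_file="${workdir}/$(basename "${manifest}")"

  # gsutil cp checks the integrity of the download, unlike gsutil cat.
  if ! gsutil cp "gs://${bucket}/${manifest}" "${local_file}"; then
    rm -rf "${workdir}"
    return 1
  fi

  # kubectl reads the manifest from the local path.
  kubectl apply -f "${local_file}"
  status=$?

  # Remove the temporary copy and report kubectl's exit status.
  rm -rf "${workdir}"
  return "${status}"
}
```

It could then be called as apply_from_bucket NAME_OF_THE_BUCKET magical-deployment.yaml. This still leaves the bucket as the source of truth, so a version control system as described in point 1 remains the better place for change history.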

Let me know if you have any questions about this.

-- Dawid Kruk
Source: StackOverflow