If I run Spark on a Kubernetes cluster, then I am using the Kubernetes resource manager in Spark.
If I run Spark on a Hadoop cluster, then I am using the YARN resource manager in Spark.
But my question is: if I spawn multiple Linux nodes in Kubernetes and use one node as the Spark master and the other three as workers, what resource manager should I use? Can I use YARN here?
Second question: in the case of a 4-node Linux Spark cluster (not on Kubernetes and not Hadoop, just plain connected Linux machines), can I use YARN as the resource manager even without HDFS? If not, what resource manager should be used for Spark?
Thanks.
> if I am spawning multiple Linux nodes in Kubernetes,
Then you'd use Kubernetes, since it's already available; layering YARN on top of it would just add a redundant layer of resource management.
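
For illustration, a submission against a Kubernetes cluster looks roughly like this; the API server address, image name, and example jar path/version are placeholders that depend on your setup:

```bash
# Submit in cluster mode directly against the Kubernetes API server.
# <k8s-apiserver-host> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
```

With this mode there is no fixed "master node" to pick out yourself: the driver and executors are created as pods, and Kubernetes itself does the scheduling.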
> in the case of a 4-node Linux Spark cluster (not on Kubernetes and not Hadoop, just plain connected Linux machines), can I use YARN even without HDFS
You can, or you can use the Spark Standalone scheduler instead. Note, however, that Spark requires a shared filesystem for reading and writing data, so while you could use NFS or S3/GCS for this, HDFS is generally faster.
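
For reference, a minimal standalone setup on your four machines might look like the following sketch; `<master-host>` is a placeholder, the example jar version depends on your distribution, and `start-worker.sh` is the Spark 3.x name (older releases call it `start-slave.sh`):

```bash
# On the node you picked as master (these scripts ship with the Spark distribution):
$SPARK_HOME/sbin/start-master.sh   # listens on spark://<master-host>:7077 by default

# On each of the three worker nodes, register with the master:
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077

# Submit an application against the standalone master:
spark-submit \
  --master spark://<master-host>:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.5.0.jar
```

Standalone is the simplest option here because it needs nothing beyond the Spark distribution itself, whereas YARN would require installing and running Hadoop daemons on every node.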