Spark and HDFS on Kubernetes data locality

8/22/2018

I'm trying to run Spark on Kubernetes and am struggling with data locality. I'm using the native Spark support, and I just watched https://databricks.com/session/hdfs-on-kubernetes-lessons-learned. I followed the steps there to set up my HDFS cluster (namenode on the first Kubernetes node, using host networking). Does anyone know whether the Spark driver fix presented in that talk has been merged into the mainline Spark code?

I ask because I still see locality level ANY for tasks where I'd expect NODE_LOCAL.

-- Paul Wolfe
apache-spark
hdfs
kubernetes

1 Answer

11/14/2019

The fix is part of version v2.2.0-kubernetes-0.4.0.
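As a rough illustration of why tasks can fall back to ANY, here is a toy model of the host matching involved. It is only a sketch, not Spark's scheduler code. It assumes (as I understand the talk) that datanodes on host networking report the node's address while executors are known by their pod IP, so the two never match unless the pod IP is translated to its node. The function `locality_level`, the `pod_to_node` mapping, and all addresses are hypothetical.

```python
from typing import Dict, List, Optional


def locality_level(block_hosts: List[str],
                   executor_host: str,
                   pod_to_node: Optional[Dict[str, str]] = None) -> str:
    """Toy model: NODE_LOCAL if the executor's host holds the block, else ANY."""
    host = executor_host
    if pod_to_node is not None:
        # Hypothetical fix: translate the executor's pod IP to its node address.
        host = pod_to_node.get(executor_host, executor_host)
    return "NODE_LOCAL" if host in block_hosts else "ANY"


# Hypothetical addresses: datanodes report node IPs, the executor a pod IP.
block_hosts = ["10.0.0.11", "10.0.0.12"]
pod_to_node = {"172.17.0.5": "10.0.0.11"}

print(locality_level(block_hosts, "172.17.0.5"))               # ANY
print(locality_level(block_hosts, "172.17.0.5", pod_to_node))  # NODE_LOCAL
```

If your executors and datanodes already report the same addresses and you still see ANY, the cause is likely something else, such as the Spark build in use not containing the fix.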

-- Pritam Sadhukhan
Source: StackOverflow