How to query HDFS from a Spark cluster (2.1) running on Kubernetes?

12/14/2018

I was trying to access HDFS files from a Spark cluster running inside Kubernetes containers.

However, I keep getting the error: AnalysisException: 'The ORC data source must be used with Hive support enabled;'

What am I missing here?

-- Alok Gogate
kubernetes
pyspark
python-3.x

1 Answer

12/14/2018

Did you create your SparkSession with enableHiveSupport()?
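A minimal sketch of what that looks like in PySpark; the app name and the HDFS namenode host, port, and path are placeholders, not values from the question:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdfs-orc-example")       # placeholder app name
    .enableHiveSupport()               # required for the ORC data source;
                                       # omitting it raises the AnalysisException above
    .getOrCreate()
)

# Placeholder HDFS URI -- substitute your namenode address and file path
df = spark.read.orc("hdfs://namenode:8020/path/to/data.orc")
df.show()
```

Note that enableHiveSupport() must be called on the builder before getOrCreate(); it has no effect on an already-created session.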

Similar issue: Spark can access Hive table from pyspark but not from spark-submit

--
Source: StackOverflow