How to connect to a kerberized HDFS from Spark on Kubernetes?

1/31/2019

I'm trying to connect to a kerberized HDFS, and the connection fails with the following error:

org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

What additional parameters do I need to add to the Spark setup, beyond the standard configuration needed to spawn Spark worker containers?

-- Alok Gogate
kubernetes
pyspark

2 Answers

2/7/2019

I have also asked a very similar question here.

Firstly, please verify whether this error is occurring on your driver pod or on the executor pods. You can do this by looking at the logs of the driver and the executors as they start running. While I don't hit any errors when my Spark job runs only on the master, I do face this error when I summon executors. The solution is to use a sidecar image. You can see an implementation of this in ifilonenko's project, which he referred to in his demo.

The premise of this approach is to store the delegation token (obtained by running kinit) in a shared persistent volume. This volume can then be mounted into your driver and executor pods, giving them access to the delegation token and, therefore, to the kerberized HDFS. I believe you're getting this error because your executors currently do not have the delegation token needed to access HDFS.
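To make this concrete, here is a hedged PySpark sketch of mounting such a shared volume into the driver and executor pods. The volume name tokens, claim name hdfs-tokens, mount path /mnt/tokens, and token file name are placeholders for whatever your sidecar actually writes; the volume and environment config keys come from Spark's Kubernetes documentation (2.4+), and HADOOP_TOKEN_FILE_LOCATION is the environment variable Hadoop's UserGroupInformation reads to pick up an existing token file.

from pyspark.sql import SparkSession

# Sketch only: mount the shared PVC holding the delegation token into
# both the driver and executor pods, then point Hadoop at the token file.
spark = (
    SparkSession.builder
    .appName("kerberized-hdfs-read")
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.tokens.mount.path", "/mnt/tokens")
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.tokens.options.claimName", "hdfs-tokens")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.tokens.mount.path", "/mnt/tokens")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.tokens.options.claimName", "hdfs-tokens")
    # Hadoop's UserGroupInformation reads this env var to load an existing token file
    .config("spark.kubernetes.driverEnv.HADOOP_TOKEN_FILE_LOCATION", "/mnt/tokens/hadoop.token")
    .config("spark.executorEnv.HADOOP_TOKEN_FILE_LOCATION", "/mnt/tokens/hadoop.token")
    .getOrCreate()
)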

P.S. I'm assuming you've already had a look at Spark's kubernetes documentation.
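For completeness: Spark 3.0 and later ship built-in Kerberos support for the Kubernetes backend, which largely replaces the sidecar workaround. A minimal sketch, assuming a reachable KDC; the principal, keytab path, and ConfigMap name are placeholders:

from pyspark.sql import SparkSession

# Sketch for Spark 3.0+: the submission client logs in with the keytab,
# obtains HDFS delegation tokens, and ships them to driver and executors.
spark = (
    SparkSession.builder
    .appName("kerberized-hdfs")
    .config("spark.kerberos.principal", "user@EXAMPLE.COM")
    .config("spark.kerberos.keytab", "/path/to/user.keytab")
    # krb5.conf is distributed to the pods via a pre-created ConfigMap
    .config("spark.kubernetes.kerberos.krb5.configMapName", "krb5-conf")
    .getOrCreate()
)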

-- K.Naga
Source: StackOverflow

1/31/2019

Check the hadoop.security.authentication property in your core-site.xml configuration file.
In your case it should have the value kerberos or token.
Alternatively, you can set the property explicitly in code:

import org.apache.hadoop.conf.Configuration;
// Switch client authentication from the default SIMPLE mode to Kerberos
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
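Since the question is tagged pyspark, the same property can also be set from PySpark via Spark's spark.hadoop.* configuration prefix, which forwards keys into the underlying Hadoop Configuration. A minimal sketch:

from pyspark.sql import SparkSession

# Any "spark.hadoop.*" key is copied into the Hadoop Configuration,
# so this enables Kerberos authentication without editing XML files.
spark = (
    SparkSession.builder
    .config("spark.hadoop.hadoop.security.authentication", "kerberos")
    .getOrCreate()
)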

You can find more information about secure connections to HDFS here

-- ruslangm
Source: StackOverflow