I'm currently in the process of setting up a Kerberized environment for submitting Spark jobs via Livy in Kubernetes.
What I've achieved so far:
To achieve this I used the following versions for the involved components:
What I'm currently struggling with:
The error message I'm currently getting when trying to access HDFS from the executor is the following:
org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "livy-session-0-1575455179568-exec-1/10.42.3.242"; destination host is: "hdfs-namenode-0.hdfs-namenode.hdfs.svc.cluster.local":8020;
The following is the current state:
Since KNIME places jar files on HDFS that have to be included in the dependencies of the Spark jobs, it is important to be able to access HDFS. (KNIME also requires this to retrieve preview data from DataSets, for example.)
I tried to find a solution to this but unfortunately haven't found any useful resources yet. I had a look at the code and checked UserGroupInformation.getCurrentUser().getTokens(), but that collection seems to be empty. That's why I assume that no delegation tokens are available.
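For reference, this is roughly the check I ran; a minimal sketch, assuming it is executed inside the driver or executor JVM (e.g. from a spark-shell):

    import scala.collection.JavaConverters._
    import org.apache.hadoop.security.UserGroupInformation

    // Dump the current user's authentication method and attached tokens.
    // On a working setup this should list at least an HDFS_DELEGATION_TOKEN;
    // in my case the collection was empty.
    val ugi = UserGroupInformation.getCurrentUser
    println(s"user=${ugi.getUserName} auth=${ugi.getAuthenticationMethod}")
    ugi.getTokens.asScala.foreach { token =>
      println(s"kind=${token.getKind} service=${token.getService}")
    }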
Has anybody ever managed to get something like this running and can help me with it?
Thank you all in advance!
For everybody struggling with this: It took a while to find the reason why this is not working, but basically it is related to Spark's Kubernetes implementation as of 2.4.4. There is no override defined for CoarseGrainedSchedulerBackend's fetchHadoopDelegationTokens in KubernetesClusterSchedulerBackend.
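To illustrate the gap: in Spark 2.4 the base class only ships a no-op default for this hook, and it is the YARN backend that overrides it. The following is a minimal sketch, not the actual Spark source, of roughly what such an override has to produce, namely serialized Credentials containing freshly obtained HDFS delegation tokens. The hadoopConf parameter and the "renewer" principal are illustrative assumptions of mine:

    import java.io.{ByteArrayOutputStream, DataOutputStream}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // Hedged sketch of what fetchHadoopDelegationTokens would need to return:
    // serialized Credentials holding fresh HDFS delegation tokens. Because
    // KubernetesClusterSchedulerBackend in 2.4.4 never produces these, the
    // executors end up with the empty token collection seen above.
    def fetchHadoopDelegationTokens(hadoopConf: Configuration): Option[Array[Byte]] =
      if (UserGroupInformation.isSecurityEnabled) {
        val creds = new Credentials()
        // "renewer" is a placeholder principal, not a fixed value.
        FileSystem.get(hadoopConf).addDelegationTokens("renewer", creds)
        val out = new ByteArrayOutputStream()
        creds.writeTokenStorageToStream(new DataOutputStream(out))
        Some(out.toByteArray)
      } else {
        None
      }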
There is a pull request that solves this by passing secrets containing the delegation tokens to the executors. It has already been merged into master and is available in Spark 3.0.0-preview, but it is not, at least not yet, available in the Spark 2.4 branch.
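Until an upgrade is possible, the tokens can in principle be created outside of Spark and shipped to the pods by hand. A hedged sketch, assuming a kinit'ed user on a machine with the Hadoop client configuration on its classpath; the object name, output path, and renewer principal are all illustrative:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.security.Credentials

    // Pre-create a Hadoop token storage file that can be packed into a
    // Kubernetes secret and mounted into the driver/executor pods.
    object CreateTokenFile {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration() // picks up core-site.xml / hdfs-site.xml
        val creds = new Credentials()
        FileSystem.get(conf).addDelegationTokens("yarn", creds)
        // Standard Hadoop token storage format, the same format Hadoop
        // reads via the HADOOP_TOKEN_FILE_LOCATION environment variable.
        creds.writeTokenStorageFile(new Path("file:///tmp/hadoop.token"), conf)
      }
    }

If I read the Spark 3.0 documentation correctly, such a pre-populated secret can then be referenced via spark.kubernetes.kerberos.tokenSecret.name and spark.kubernetes.kerberos.tokenSecret.itemKey; on 2.4 one would presumably have to mount the secret and point HADOOP_TOKEN_FILE_LOCATION at it manually. Keep in mind that delegation tokens expire and need periodic renewal either way.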