Spark application syncing with Hive metastore - "There is no primary group for UGI spark" error

9/22/2021

I'm running a simple Spark job on Kubernetes cluster that writes data to HDFS with Hive catologization. For whatever reason my app fails to run Spark SQL commands with the following exception:

21/09/22 09:23:54 ERROR SplunkStreamListener: |exception=org.apache.spark.sql.AnalysisException
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.io.IOException There is no primary group for UGI spark (auth:SIMPLE));
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
	at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:183)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createDatabase(ExternalCatalogWithListener.scala:47)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:211)
	at org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset$anonfun$6.apply(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$anonfun$6.apply(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$anonfun$52.apply(Dataset.scala:3370)
	at org.apache.spark.sql.execution.SQLExecution$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$withAction(Dataset.scala:3369)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)

I'm connecting to Hive metastore via Thrift URL. The docker container runs the application as non-root user. Are there some kind of groups I need the user to be added to sync with the metastore?

-- eugen-fried
apache-spark
apache-spark-sql
hive
hive-metastore
kubernetes

1 Answer

9/22/2021

Try add this before setting up the spark context

System.setProperty("HADOOP_USER_NAME", "root")
-- Funzo
Source: StackOverflow