We have a Spark Java application that reads from a database and publishes messages to Kafka. When we run the job locally from the Windows command line with the following arguments, it works as expected:
bin/spark-submit --class com.data.ingestion.DataIngestion --jars local:///opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 --class com.data.ingestion.DataIngestion data-ingestion-1.0-SNAPSHOT.jar
However, when we try to run the same job against the k8s master:
bin/spark-submit --master k8s://https://172.16.3.105:8443 --deploy-mode cluster --conf spark.kubernetes.container.image=localhost:5000/spark-example:0.2 --class com.data.ingestion.DataIngestion --jars local:///opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
It gives the following error:
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
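It seems the Scala version and the Spark Kafka connector version are not aligned. The artifact spark-sql-kafka-0-10_2.11-2.3.0 is built for Scala 2.11 and Spark 2.3.0, so the Spark distribution inside the container image needs to match both.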
Based on the error, it looks like at least one node in the cluster does not have /opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar (with the local:// scheme, the path has to exist inside the container image itself).
I suggest you create an uber jar that includes this Kafka Structured Streaming package, or use --packages rather than local files. In addition, consider setting up a solution like Rook or MinIO to provide a shared filesystem within k8s/Spark.
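To test the missing-jar theory before rebuilding anything, a small check like the one below could be run inside the same image (or called at the top of the job's main method). This is only a sketch: ClasspathCheck and isLoadable are made-up names, and the second class name is an assumption on my part. It stands in for the kafka-clients library, which spark-sql-kafka needs at runtime and which --jars does not pull in automatically (--packages resolves transitive dependencies, --jars does not).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports whether the classes the Kafka source needs are visible to this JVM. */
final class ClasspathCheck {

    /** Returns true if the class can be located without initializing it. */
    static boolean isLoadable(String className) {
        // Spark exposes jars added at submit time through the context class loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList(
            // Provider named in the error message.
            "org.apache.spark.sql.kafka010.KafkaSourceProvider",
            // Assumption: representative class from kafka-clients.
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        for (String name : required) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

If the provider is reported as found but the kafka-clients class is missing, the connector jar is present and its dependencies are not, which would point toward the uber jar or --packages route rather than a Scala version mismatch.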