Problem executing Spark on Kubernetes - accessing MinIO s3a endpoint - Executor lost due to .jar file

9/23/2021

Hi guys!

I'm having a problem when trying to execute a Spark job on Kubernetes (using the k8s spark-operator). I'm trying to access a local MinIO bucket through an s3a endpoint. My code works locally, but when I run kubectl apply -f on the .yaml file to start the cluster, I get the following error:

++ id -u
+ myuid=1001
++ id -g
+ mygid=0
+ set +e
++ getent passwd 1001
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '1001:x:1001:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.6 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///app/main.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/09/23 15:13:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/09/23 15:13:55 INFO SparkContext: Running Spark version 3.1.1
21/09/23 15:13:55 INFO ResourceUtils: ==============================================================
21/09/23 15:13:55 INFO ResourceUtils: No custom resources configured for spark.driver.
21/09/23 15:13:55 INFO ResourceUtils: ==============================================================
21/09/23 15:13:55 INFO SparkContext: Submitted application: job-a
21/09/23 15:13:55 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/09/23 15:13:55 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
21/09/23 15:13:55 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/09/23 15:13:55 INFO SecurityManager: Changing view acls to: 1001,root
21/09/23 15:13:55 INFO SecurityManager: Changing modify acls to: 1001,root
21/09/23 15:13:55 INFO SecurityManager: Changing view acls groups to:
21/09/23 15:13:55 INFO SecurityManager: Changing modify acls groups to:
21/09/23 15:13:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(1001, root); groups with view permissions: Set(); users  with modify permissions: Set(1001, root); groups with modify permissions: Set()
21/09/23 15:13:55 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
21/09/23 15:13:55 INFO SparkEnv: Registering MapOutputTracker
21/09/23 15:13:55 INFO SparkEnv: Registering BlockManagerMaster
21/09/23 15:13:55 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/09/23 15:13:55 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/09/23 15:13:55 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/09/23 15:13:55 INFO DiskBlockManager: Created local directory at /var/data/spark-db401653-53fc-4038-8a27-913af26bb90b/blockmgr-e5c8b6a8-76d5-4960-94c0-a30aa8a63997
21/09/23 15:13:55 INFO MemoryStore: MemoryStore started with capacity 516.0 MiB
21/09/23 15:13:55 INFO SparkEnv: Registering OutputCommitCoordinator
21/09/23 15:13:55 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/09/23 15:13:55 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://job-a-4901ff7c1337fcec-driver-svc.default.svc:4040
21/09/23 15:13:55 INFO SparkContext: Added JAR file:/tmp/spark-2396d3e2-06fb-40d1-a42f-a438d79ba31f/hadoop-aws-3.2.0.jar at spark://job-a-4901ff7c1337fcec-driver-svc.default.svc:7078/jars/hadoop-aws-3.2.0.jar with timestamp 1632410035212
21/09/23 15:13:55 INFO SparkContext: Added JAR file:/tmp/spark-2396d3e2-06fb-40d1-a42f-a438d79ba31f/aws-java-sdk-bundle-1.11.375.jar at spark://job-a-4901ff7c1337fcec-driver-svc.default.svc:7078/jars/aws-java-sdk-bundle-1.11.375.jar with timestamp 1632410035212
21/09/23 15:13:55 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
21/09/23 15:13:56 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1 running: 0.
21/09/23 15:13:56 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
21/09/23 15:13:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
21/09/23 15:13:56 INFO NettyBlockTransferService: Server created on job-a-4901ff7c1337fcec-driver-svc.default.svc:7079
21/09/23 15:13:56 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/23 15:13:56 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, job-a-4901ff7c1337fcec-driver-svc.default.svc, 7079, None)
21/09/23 15:13:56 INFO BlockManagerMasterEndpoint: Registering block manager job-a-4901ff7c1337fcec-driver-svc.default.svc:7079 with 516.0 MiB RAM, BlockManagerId(driver, job-a-4901ff7c1337fcec-driver-svc.default.svc, 7079, None)
21/09/23 15:13:56 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, job-a-4901ff7c1337fcec-driver-svc.default.svc, 7079, None)
21/09/23 15:13:56 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, job-a-4901ff7c1337fcec-driver-svc.default.svc, 7079, None)
21/09/23 15:14:01 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.17.0.7:33978) with ID 1,  ResourceProfileId 0
21/09/23 15:14:02 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
21/09/23 15:14:02 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.7:38165 with 413.9 MiB RAM, BlockManagerId(1, 172.17.0.7, 38165, None)
21/09/23 15:14:02 ERROR TaskSchedulerImpl: Lost an executor 1 (already removed): Unable to create executor due to ./hadoop-aws-3.2.0.jar

Below are the Dockerfile I used to build the image and the .yaml file with the Kubernetes configuration for Spark:

yaml file:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: job-a
  namespace: default
spec:
  type: Python
  mode: cluster
  image: "arthexbr77/sparkjob:latest"
  imagePullPolicy: Always
  mainApplicationFile: local:///app/main.py
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  hadoopConf:
    "fs.s3a.endpoint": "http://127.0.0.1:9000"
  driver:
    cores: 1
    coreLimit: "2000m"
    memory: "1200m"
    labels:
      version: 3.1.1
    serviceAccount: spark
  executor:
    cores: 1  
    instances: 1
    memory: "1024m"
    labels:
      version: 3.1.1
  deps:
    jars:
      - https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
      - https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
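
For reference, as far as I understand the operator passes each hadoopConf entry to Spark as a spark.hadoop.* property. The same s3a settings could also be set in code, roughly as in the sketch below. This is only an illustration, not my actual main.py: s3a_spark_conf and build_session are hypothetical helper names, the credentials are placeholders, and the endpoint is the one from the yaml above.

```python
def s3a_spark_conf(endpoint, access_key, secret_key):
    """Build spark.hadoop.* settings for an S3-compatible store such as MinIO.

    Hypothetical helper for illustration; not taken from the actual main.py.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is normally addressed as http://host:port/bucket (path style).
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Only enable TLS when the endpoint URL uses https.
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(
            endpoint.startswith("https://")
        ).lower(),
    }


def build_session(app_name, conf):
    """Create a SparkSession with the given settings (requires pyspark)."""
    # Imported lazily so s3a_spark_conf can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

It would be used as build_session("job-a", s3a_spark_conf("http://127.0.0.1:9000", "<access-key>", "<secret-key>")). These settings only take effect if the hadoop-aws and aws-java-sdk-bundle jars are actually available to both the driver and the executors.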

Dockerfile:

FROM spark-base/spark-py:1.0.0

USER root:root

WORKDIR /app

COPY requirements.txt . 

RUN pip install -r requirements.txt

COPY main.py .

USER 1001

It seems that the error is related to the two jars needed to connect to an s3a endpoint (hadoop-aws and aws-java-sdk-bundle), but both are listed in the yaml file, so I'm really lost as to how to fix the problem.
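
One thing that could be checked is whether the image itself already ships these jars in the directory that appears in SPARK_CLASSPATH in the log above (/opt/spark/jars). A minimal sketch of such a check, to be run inside the driver or executor container; missing_jars is a hypothetical helper name and the prefixes are simply the two artifact names from the yaml file:

```python
from pathlib import Path

# Artifact name prefixes of the two jars listed under deps.jars in the yaml.
REQUIRED_PREFIXES = ("hadoop-aws-", "aws-java-sdk-bundle-")


def missing_jars(jars_dir, prefixes=REQUIRED_PREFIXES):
    """Return the prefixes for which no matching .jar file exists in jars_dir."""
    directory = Path(jars_dir)
    # A missing directory is treated as containing no jars at all.
    names = [p.name for p in directory.glob("*.jar")] if directory.is_dir() else []
    return [pre for pre in prefixes if not any(n.startswith(pre) for n in names)]


if __name__ == "__main__":
    # /opt/spark/jars is the jar directory shown in SPARK_CLASSPATH in the log.
    print(missing_jars("/opt/spark/jars"))
```

An empty list would mean both jars are present in the image; any prefix in the result means that jar has to come from somewhere else at runtime.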

Thanks for helping!

-- Arthur Cesarino
apache-spark
kubernetes
microk8s
pyspark
