Capture Kubernetes Spark driver and executor logs in S3 and view them in the History Server

4/27/2021

I am running Spark 3.0.0 with Hadoop 2.7 on Kubernetes, submitting jobs with the spark-submit CLI as follows:

./spark-submit \
--master=k8s://https://api.k8s.my-domain.com \
--deploy-mode cluster \
--name sparkle \
--num-executors 2 \
--executor-cores 2 \
--executor-memory 2g \
--driver-memory 2g \
--class com.myorg.sparkle.Sparkle \
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties \
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties \
--conf spark.kubernetes.submission.waitAppCompletion=false \
--conf spark.kubernetes.allocation.batch.delay=10s \
--conf spark.kubernetes.appKillPodDeletionGracePeriod=20s \
--conf spark.kubernetes.node.selector.workloadType=spark \
--conf spark.kubernetes.driver.pod.name=sparkle-driver \
--conf spark.kubernetes.container.image=custom-registry/spark:latest \
--conf spark.kubernetes.namespace=spark \
--conf spark.eventLog.dir='s3a://my-bucket/spark-logs' \
--conf spark.history.fs.logDirectory='s3a://my-bucket/spark-logs' \
--conf spark.eventLog.enabled='true' \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
--conf spark.kubernetes.driver.annotation.iam.amazonaws.com/role=K8sRoleSpark \
--conf spark.kubernetes.executor.annotation.iam.amazonaws.com/role=K8sRoleSpark \
--conf spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secrets:key \
--conf spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secrets:secret \
--conf spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secrets:key \
--conf spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secrets:secret \
--conf spark.hadoop.fs.s3a.endpoint=s3.ap-south-1.amazonaws.com \
--conf spark.hadoop.com.amazonaws.services.s3.enableV4=true \
--conf spark.yarn.maxAppAttempts=4 \
--conf spark.yarn.am.attemptFailuresValidityInterval=1h \
s3a://dp-spark-jobs/sparkle/jars/sparkle.jar \
--commonConfigPath https://my-bucket.s3.ap-south-1.amazonaws.com/sparkle/configs/prod_main_configs.yaml \
--jobConfigPath https://my-bucket.s3.ap-south-1.amazonaws.com/sparkle/configs/cc_configs.yaml \
--filePathDate 2021-03-29 20000
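
For context, the event-log side of this setup works: with spark.eventLog.enabled=true the driver writes event files under s3a://my-bucket/spark-logs, which can be confirmed with something like this (illustrative only; assumes the AWS CLI and matching credentials are available):

# List the event logs written by the driver (one log file per application)
aws s3 ls s3://my-bucket/spark-logs/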

I have hosted the History Server in a separate pod using the same image. It is able to read all the event logs and show the application details, and the job itself completes successfully.
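
That pod is launched roughly along these lines (a simplified sketch, not the exact manifest; the /opt/spark paths and the use of SPARK_NO_DAEMONIZE to keep the process in the foreground are assumptions about the image):

# Run the History Server in the foreground, pointing it at the same S3 event-log directory
export SPARK_NO_DAEMONIZE=true
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://my-bucket/spark-logs \
  -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
/opt/spark/sbin/start-history-server.sh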

However, I do not see any driver or executor logs in the History Server, and no stage-level logs are available either. I am passing a log4j.properties with the following content:

# Define the root logger with appender file
log4j.rootLogger = INFO, console

# Redirect log messages to console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# Silence akka remoting
log4j.logger.Remoting=ERROR
log4j.logger.akka.event.slf4j=ERROR
log4j.logger.org.spark-project.jetty.server=ERROR
# log4j.logger.org.apache.spark=ERROR
log4j.logger.org.apache.spark.deploy=INFO
log4j.logger.com.anjuke.dm=${dm.logging.level}
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.org.apache.hive=ERROR
log4j.logger.org.apache.spark.sql.hive=ERROR
log4j.logger.org.apache.hadoop.hive=ERROR
log4j.logger.org.datanucleus=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# Silence Spark Streaming
log4j.logger.org.apache.spark.sql.execution.streaming.state=ERROR
log4j.logger.org.apache.spark.SparkContext=ERROR
log4j.logger.org.apache.spark.executor=ERROR
log4j.logger.org.apache.spark.storage=ERROR
log4j.logger.org.apache.spark.scheduler=ERROR
log4j.logger.org.apache.spark.ContextCleaner=ERROR
log4j.logger.org.apache.spark.sql=ERROR
log4j.logger.org.apache.parquet.hadoop=ERROR
log4j.logger.org.apache.spark.sql.kafka010.KafkaSource=ERROR

log4j.additivity.org.apache.spark=false
log4j.additivity.org.apache.hadoop=false
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR

# Silence Kafka
log4j.logger.org.apache.kafka=ERROR, console
log4j.additivity.org.apache.kafka=false
log4j.logger.org.apache.kafka.clients.consumer=ERROR
log4j.logger.org.apache.kafka.clients.consumer.internals.Fetcher=ERROR

The History Server shows no Logs column in the Executors tab, and the logs column in the Stages section is empty. It seems that only events are logged; no stdout or stderr is captured from any of the pods.
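
To illustrate where the console appender output currently goes: it only reaches each pod's stdout, so it can be read with kubectl while the pods are alive, but nothing is shipped to S3. For example (executor pod names are generated at runtime; the spark-role label is set by Spark on Kubernetes):

# Driver output (pod name fixed via spark.kubernetes.driver.pod.name)
kubectl logs sparkle-driver -n spark

# Executor output (look up the generated pod names first)
kubectl get pods -n spark -l spark-role=executor
kubectl logs <executor-pod-name> -n spark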

How do I ensure that driver and executor stdout/stderr are written to S3 and made available in the History Server?

-- Krishna Modi
apache-spark
kubernetes
log4j

0 Answers