I have a Spark Streaming job that I am trying to submit via the spark-on-k8s-operator. I have set the restart policy to Always. However, on manual deletion of the driver pod, the driver is not restarted. My YAML:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-v2
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "com/test:v1.0"
  imagePullPolicy: Never
  mainClass: com.test.TestStreamingJob
  mainApplicationFile: "local:///opt/spark-2.4.5/work-dir/target/scala-2.12/test-assembly-0.1.jar"
  sparkVersion: "2.4.5"
  restartPolicy:
    type: Always
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 2.4.5
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
    terminationGracePeriodSeconds: 60
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 2.4.5
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
Spark version: 2.4.5
API version: "sparkoperator.k8s.io/v1beta2"
Steps I followed (commands sketched below):
1. Create the resource via kubectl apply -f examples/spark-test.yaml. The driver pod was created successfully.
2. Delete the driver pod manually.
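For reference, these are roughly the commands I ran; the driver pod name test-v2-driver is an assumption based on the operator's default <app-name>-driver naming:

# Apply the SparkApplication and watch the driver pod come up
kubectl apply -f examples/spark-test.yaml
kubectl get pods -w

# Delete the driver pod manually (name assumed from the default <app-name>-driver convention)
kubectl delete pod test-v2-driver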
Expected behavior: A new driver pod is started, as per the restart policy.
Actual behavior: Driver and executor pods got deleted.
Environment: Testing this with Docker on Mac, with 4 CPUs and 8 GB of memory.
Logs from the spark-operator: {FAILING driver pod failed with ExitCode: 143, Reason: Error}
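In case it helps, this is how I pulled the operator logs and the application status; the operator's namespace and deployment name are assumptions based on a default Helm install and may differ in your setup:

# Operator logs (namespace/deployment name assumed)
kubectl -n spark-operator logs deployment/spark-operator

# Status of the SparkApplication resource
kubectl describe sparkapplication test-v2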
There was an issue with the spark-on-k8s-operator; it has now been fixed, and I can see the manually deleted driver getting restarted. Basically, the code was not handling default values:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/pull/898
Alternatively, just have the following config in place so that default values are not required:
restartPolicy:
  type: Always
  onFailureRetries: 3
  onFailureRetryInterval: 10
  onSubmissionFailureRetries: 3
  onSubmissionFailureRetryInterval: 10