Why are Spark executors not dying?

9/17/2019

Here is my setup:

A Kubernetes cluster runs Airflow, which submits the Spark job to the same Kubernetes cluster. The job runs fine, but the containers are supposed to die once the job is done, and instead they are still hanging around.

  1. The Airflow setup comes up on the K8S cluster.
  2. The DAG is baked into the Airflow Docker image because somehow I am not able to sync the DAGs from S3; for some reason the cron won't run.
  3. Airflow submits the Spark job to the K8S cluster and the job runs fine.
  4. But instead of dying once the job has finished, the containers still hang around.

Here is my SparkSubmitOperator task:

spark_submit_task = SparkSubmitOperator(
    task_id='spark_submit_job_from_airflow',
    conn_id='k8s_spark',
    java_class='com.dom.rom.mainclass',
    application='s3a://some-bucket/jars/demo-jar-with-dependencies.jar',
    application_args=['300000'],
    total_executor_cores='8',
    executor_memory='20g',
    num_executors='9',
    name='mainclass',
    verbose=True,
    driver_memory='10g',
    conf={
        'spark.hadoop.fs.s3a.aws.credentials.provider': 'com.amazonaws.auth.InstanceProfileCredentialsProvider',
        'spark.rpc.message.maxSize': '1024',
        'spark.hadoop.fs.s3a.impl': 'org.apache.hadoop.fs.s3a.S3AFileSystem',
        'spark.kubernetes.container.image': 'dockerhub/spark-image:v0.1',
        'spark.kubernetes.namespace': 'random',
        'spark.kubernetes.container.image.pullPolicy': 'IfNotPresent',
        'spark.kubernetes.authenticate.driver.serviceAccountName': 'airflow-spark'
        },
    dag=dag,
)
-- devnull
airflow
amazon-s3
apache-spark
kubernetes

1 Answer

9/24/2019

Figured out the problem. It was my mistake: I wasn't closing the Spark session in my job. Added the following:

session.stop();
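
For context, this is roughly where the call goes. On Kubernetes, the executor pods are only reclaimed once the driver's SparkContext is stopped, so leaving the session open keeps them alive after the work finishes. Below is a minimal sketch of a Java main class with the fix; the package and class names are placeholders, not the actual com.dom.rom.mainclass.

package com.example.demo; // placeholder, not the real package

import org.apache.spark.sql.SparkSession;

public class MainClass {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .appName("mainclass")
                .getOrCreate();
        try {
            // ... actual job logic goes here ...
        } finally {
            // Stopping the session shuts down the SparkContext, which tells
            // Kubernetes to tear down the driver's executor pods.
            session.stop();
        }
    }
}

Putting the stop() call in a finally block ensures the executors are released even if the job throws.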
-- devnull
Source: StackOverflow