I have an Airflow DAG whose tasks I'm attempting to run on an AWS EKS cluster. The DAG tasks are built as Docker images and uploaded to AWS ECR.
The DAG task:
task_1 = KubernetesPodOperator(
    namespace="default",
    image="XXXXX.dkr.ecr.us-west-2.amazonaws.com/com.YYYY/math-demo:v1",
    labels={"foo": "bar"},
    name="math-test",
    task_id="math-task",
    get_logs=True,
    dag=dag,
)
The Docker image on ECR is of the form "XXXXX.dkr.ecr.us-west-2.amazonaws.com/com.YYYY/math-demo:v1", and the local Docker image is math-demo:v1.
When I run this task, the pods always stay in a Pending state and never execute. I ran kubectl describe pods and got the following error:
Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "620ee0494e7aaf1776120df10351606c2203c194ca86079fd7198d56fabbc79b" network for pod "math-test-fbdab794": NetworkPlugin cni failed to set up pod "math-test-fbdab794_default" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused", failed to clean up sandbox container "620ee0494e7aaf1776120df10351606c2203c194ca86079fd7198d56fabbc79b" network for pod "math-test-fbdab794": NetworkPlugin cni failed to teardown pod "math-test-fbdab794_default" network: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"]
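To make the message easier to read, here is a minimal sketch that pulls out the parts that seem to matter: the network plugin, the pod, and the address the failed connection was dialing. The function name summarize_cni_error and the regular expressions are my own (hypothetical, not part of Airflow or kubectl), and EVENT is just a shortened copy of the event text above.

```python
import re

# Shortened copy of the event message from `kubectl describe pods` above.
EVENT = (
    'NetworkPlugin cni failed to set up pod "math-test-fbdab794_default" network: '
    'rpc error: code = Unavailable desc = all SubConns are in TransientFailure, '
    'latest connection error: connection error: desc = "transport: Error while '
    'dialing dial tcp 127.0.0.1:50051: connect: connection refused"'
)


def summarize_cni_error(message):
    """Return the failing plugin, pod, and dial target found in the message.

    Hypothetical helper for reading the event; any field that is not
    found in the message is returned as None.
    """
    plugin = re.search(r"NetworkPlugin (\S+) failed", message)
    pod = re.search(r'pod "([^"]+)" network', message)
    target = re.search(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+)", message)
    return {
        "plugin": plugin.group(1) if plugin else None,
        "pod": pod.group(1) if pod else None,
        "host": target.group(1) if target else None,
        "port": int(target.group(2)) if target else None,
    }


summary = summarize_cni_error(EVENT)
print(summary)
```

Read this way, the refused connection is to 127.0.0.1:50051, a loopback address on the worker node, and it happens while the CNI plugin is setting up the pod network. As far as I can tell, that means the failure occurs before the image from ECR is ever started, so the image name itself may not be the problem.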
Any ideas on how to solve this?