We are using Airflow to schedule Spark job on Kubernetes. Recently, I have encountered a scenario where:
What happened due to this is that airflow created another POD for running the same job which resulted in race condition.
Airflow is using Kubernetes Python client's read_namespaced_pod API()
def read_pod(self, pod):
"""Read POD information"""
try:
return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
except BaseHTTPError as e:
raise AirflowException(
'There was an error reading the kubernetes API: {}'.format(e)
)
I believe read_namespaced_pod() calls Kubernetes API. In order to investigate this further, I would like to like check logs of Kubernetes API server.
Can you please share steps to check what is happening on Kubernetes side ?
Note: Kubernetes version is 1.18 and Airflow version is 1.10.10.
Answering the question from the perspective of logs/troubleshooting:
I believe read_namespaced_pod() calls Kubernetes API. In order to investigate this further, I would like to like check logs of Kubernetes API server.
Yes, you are correct, this function calls the Kubernetes API. You can check the logs of Kubernetes API server by running:
$ kubectl logs -n kube-system KUBERNETES_API_SERVER_POD_NAME
I would also consider checking the kube-controller-manager
:
$ kubectl logs -n kube-system KUBERNETES_CONTROLLER_MANAGER_POD_NAME
The example output of it:
I0413 12:33:12.840270 1 event.go:291] "Event occurred" object="default/nginx-6799fc88d8" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: nginx-6799fc88d8-kchp7"
A side note!
Above commands will work assuming that your
kubernetes-apiserver
andkubernetes-controller-manager
Pod
is visible to you
Can you please share steps to check what is happening on Kubernetes side ?
This question targets the basics of troubleshooting/logs checking.
For that you can use following commands (and the ones mentioned earlier):
$ kubectl get RESOURCE RESOURCE_NAME
:$ kubectl get pod airflow-pod-name
also you can add -o yaml
for more information
$ kubectl describe RESOURCE RESOURCE_NAME
:$ kubectl describe pod airflow-pod-name
$ kubectl logs POD_NAME
:$ kubectl logs airflow-pod-name
Additional resources: