Airflow KubernetesExecutor scheduler kube watch process dies


I have a Kubernetes cluster on AWS and am trying to deploy the Airflow webserver and scheduler with KubernetesExecutor inside it. Unfortunately, every time I trigger a DAG in the webserver, the scheduler raises this error once the read_timeout period (defined in airflow.cfg) has elapsed:

[2019-11-27 11:25:26,607] {} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2019-11-27 11:25:26,617] {} INFO - Event: and now my watch begins starting at resource_version: 0
[2019-11-27 11:26:26,700] {} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/", line 294, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/", line 1840, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/", line 1646, in _raise_ssl_error
    raise WantReadError()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 360, in _error_catcher
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 666, in read_chunked
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 598, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/usr/local/lib/python3.7/", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/", line 307, in recv_into
    raise timeout('The read operation timed out')
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/", line 333, in run
    self.worker_uuid, self.kube_config)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/", line 357, in _run
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/", line 48, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 694, in read_chunked
  File "/usr/local/lib/python3.7/", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 365, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='', port=443): Read timed out.
Process KubernetesJobWatcher-16:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/", line 294, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/", line 1840, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/", line 1646, in _raise_ssl_error
    raise WantReadError()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 360, in _error_catcher
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 666, in read_chunked
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 598, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/usr/local/lib/python3.7/", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/", line 307, in recv_into
    raise timeout('The read operation timed out')
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/", line 297, in _bootstrap
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/", line 333, in run
    self.worker_uuid, self.kube_config)
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/", line 357, in _run
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/", line 48, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 694, in read_chunked
  File "/usr/local/lib/python3.7/", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/urllib3/", line 365, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='', port=443): Read timed out.
[2019-11-27 11:26:26,898] {} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2019-11-27 11:26:26,968] {} INFO - Event: and now my watch begins starting at resource_version: 0
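
In the log above, the watch starts at 11:25:26 and fails at 11:26:26, which is consistent with a read timeout of about 60 seconds. The question only says read_timeout, so the following is an assumption: in Airflow 1.10 this kind of timeout is typically set through the kube_client_request_args option in the [kubernetes] section of airflow.cfg, as a JSON string whose _request_timeout entry is either one number or a [connect, read] pair. A minimal sketch of how such a value could be read (the raw string and the helper name parse_request_timeout are illustrative, not taken from my config):

```python
import json

# Assumed example value; the real airflow.cfg from the question is not shown.
raw = '{"_request_timeout": [60, 60]}'


def parse_request_timeout(raw_value):
    """Return (connect_timeout, read_timeout) in seconds, or None if unset.

    Hypothetical helper for illustration only; it is not part of Airflow.
    """
    args = json.loads(raw_value) if raw_value else {}
    timeout = args.get("_request_timeout")
    if timeout is None:
        return None
    # A single number applies to both the connect and the read phase.
    if isinstance(timeout, (int, float)):
        return (timeout, timeout)
    connect, read = timeout
    return (connect, read)


connect_timeout, read_timeout = parse_request_timeout(raw)
print(f"connect={connect_timeout}s read={read_timeout}s")
```

If the second number is what I call read_timeout, an idle watch connection would be cut after that many seconds, which matches the timing in the traceback.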

PostgreSQL is installed via helm charts.

kubectl version:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.8", GitCommit:"4e209c9383fa00631d124c8adcc011d617339b3c", GitTreeState:"clean", BuildDate:"2019-02-28T18:40:05Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

The host in the error above is a Kubernetes service (cluster IP).

Any suggestions?

-- eserdk

1 Answer


As I noted in my comment on the question, this problem does not interfere with pods running. The error itself still occurs, though.
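
One plausible reading of why the pods are unaffected: the log shows the watcher process being started again right after it dies, so events keep being consumed after each timeout. Below is a rough, stdlib-only sketch of that restart pattern. It is not Airflow's code; watch_with_restart and fake_stream_factory are hypothetical names, and the fake stream stands in for the API server watch.

```python
import socket


def watch_with_restart(open_stream, max_restarts=3):
    """Consume events from open_stream(resource_version), reopening on read timeouts."""
    resource_version = "0"
    events = []
    restarts = 0
    while True:
        try:
            for event in open_stream(resource_version):
                events.append(event["name"])
                resource_version = event["resource_version"]
            return events, restarts
        except socket.timeout:
            # The connection went quiet for too long; open a new watch.
            restarts += 1
            if restarts > max_restarts:
                raise


def fake_stream_factory():
    """Hypothetical stand-in for a watch: the first call times out after one event."""
    calls = {"n": 0}

    def open_stream(resource_version):
        calls["n"] += 1
        if calls["n"] == 1:
            yield {"name": "pod-a", "resource_version": "101"}
            raise socket.timeout("The read operation timed out")
        # The second call continues from the last version seen before the timeout.
        yield {"name": "pod-b", "resource_version": "102"}

    return open_stream


events, restarts = watch_with_restart(fake_stream_factory())
print(events, restarts)  # ['pod-a', 'pod-b'] 1
```

Note that in my log the restarted watch begins at resource_version 0 again, whereas this sketch resumes from the last version it saw; that difference is a choice made for the sketch.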

-- eserdk
Source: StackOverflow