This is mostly my first time setting up Airflow and working with K8s, so I'm just trying to get it running locally and run a couple of simple tasks in a small DAG. Things ran fine with the other executors, but since I'd like to use K8s functionality once we are in production, I'm trying to get the KubernetesExecutor set up locally.
The setup is pretty simple: a generic testing DAG that ran fine with the other executors, and a relatively untouched Airflow config file. The main things to note are that it uses KubernetesExecutor, a postgresql+psycopg2 SQLAlchemy backend, and in_cluster set to False,
as we aren't running Airflow itself in K8s. Everything else is standard.
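For reference, the relevant settings look roughly like the fragment below. This is only a sketch: the connection string, user, password and database name are placeholders rather than my real values, and I'm assuming the usual Airflow 1.10 section names ([core] and [kubernetes]). The snippet parses the fragment with Python's configparser just to show which keys are involved.

```python
import configparser

# Illustrative airflow.cfg fragment; credentials and database name are placeholders.
CFG_TEXT = """
[core]
executor = KubernetesExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow

[kubernetes]
in_cluster = False
kube_client_request_args = {"_request_timeout" : [60,60] }
"""

# interpolation=None so that any '%' characters in values are left untouched.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CFG_TEXT)

executor = parser.get("core", "executor")
backend = parser.get("core", "sql_alchemy_conn")
in_cluster = parser.getboolean("kubernetes", "in_cluster")

print("executor:", executor)
print("backend driver:", backend.split("://", 1)[0])
print("in_cluster:", in_cluster)
```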
Airflow launches the local webserver and the scheduler just fine, and it starts scheduling tasks when I trigger a DAG run, but the tasks are put into a queued
state and never leave it. I am guessing it has something to do with the pod statuses that I am seeing for the tasks:
NAME                                                        READY   STATUS             RESTARTS   AGE
testinglocalprintingdate-00b9b3a324b04913bf98d935ae076885   0/1     InvalidImageName   0          79s
testinglocalprintingdate-2d4a912ac30c4987af69d9ce62e36989   0/1     InvalidImageName   0          81s
testinglocalprintingdate-5a655060809647c69f4258fc32d9513d   0/1     InvalidImageName   0          77s
testinglocalprintingdate-9c3ccfebb34b4d0a84d6e8f43e144e69   0/1     InvalidImageName   0          75s
testinglocalprintingdate-d1b8d59260954638b0bc018b7743985b   0/1     InvalidImageName   0          73s
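One thing that might be worth checking, given the InvalidImageName status, is what image string the worker pods are actually given. The sketch below rests on an assumption I have not verified: that the 1.10.x executor builds the image as repository:tag from the worker_container_repository and worker_container_tag settings in the [kubernetes] section. If both are left empty in a mostly untouched config, the result would be a bare ":", which is not a valid image reference. The pattern is a simplified check of my own, not the full Docker reference grammar (it does not handle registries with ports, for example), and the example image name is hypothetical.

```python
import re

# Simplified "repository[:tag]" pattern; NOT the full image reference grammar.
IMAGE_RE = re.compile(r"^[a-z0-9]+(?:[._/-][a-z0-9]+)*(?::\w[\w.-]{0,127})?$")


def build_image(repository: str, tag: str) -> str:
    """Join repository and tag the way the executor is assumed to do it."""
    return "{}:{}".format(repository, tag)


def looks_valid(image: str) -> bool:
    """Return True if the image string passes the simplified check."""
    return bool(IMAGE_RE.match(image))


# Both settings left empty, as in a mostly default config.
empty_image = build_image("", "")
# Hypothetical repository and tag, for comparison only.
filled_image = build_image("apache/airflow", "1.10.7")

print(repr(empty_image), looks_valid(empty_image))
print(repr(filled_image), looks_valid(filled_image))
```

Running kubectl describe on one of the pods should show the image string that was actually requested, which would confirm or rule this out.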
In addition, I am seeing these errors every minute or so. They seem linked to the kube_client_request_args = {"_request_timeout" : [60,60] } setting
in the Airflow config, although changing the numbers from 60,60 to anything else has no effect:
[2020-02-07 17:22:32,244] {kubernetes_executor.py:337} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 682, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run
    self.worker_uuid, self.kube_config)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run
    **kwargs):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='192.168.64.2', port=8443): Read timed out.
Process KubernetesJobWatcher-3:
Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 752, in read_chunked
    self._update_chunk_length()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 682, in _update_chunk_length
    line = self._fp.fp.readline()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 335, in run
    self.worker_uuid, self.kube_config)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 359, in _run
    **kwargs):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
    for line in iter_resp_lines(resp):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 781, in read_chunked
    self._original_response.close()
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/genericuser/.pyenv/versions/3.7.4/lib/python3.7/site-packages/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='192.168.64.2', port=8443): Read timed out.
[2020-02-07 17:22:32,597] {kubernetes_executor.py:442} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2020-02-07 17:22:32,615] {kubernetes_executor.py:346} INFO - Event: and now my watch begins starting at resource_version: 0
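To make clear what the kube_client_request_args value amounts to, here is a small sketch of how I understand it to be interpreted. The assumptions: the setting is parsed as JSON, a list under _request_timeout is turned into a tuple, and the Kubernetes Python client reads a pair as (connection timeout, read timeout) in seconds. The variable names are mine, not Airflow's.

```python
import json

# The value exactly as it appears in my Airflow config.
raw_value = '{"_request_timeout" : [60,60] }'

request_args = json.loads(raw_value)

# JSON has no tuple type, so a pair arrives as a list; convert it to a tuple.
timeout = request_args.get("_request_timeout")
if isinstance(timeout, list):
    request_args["_request_timeout"] = tuple(timeout)

connect_timeout, read_timeout = request_args["_request_timeout"]
print("connect timeout (s):", connect_timeout)
print("read timeout (s):", read_timeout)
```

If that reading is right, the second number is the one that applies while the watcher waits for events on the open stream, which is where the ReadTimeoutError above is raised.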
I've been trying to debug this for a couple of days to no avail - so any help would be appreciated.