Let me preface this by saying that I am not well versed in Kubernetes.
Yesterday, we had to install/update a Python dependency for one of our DAGs in Google Cloud Composer. I am not certain that this is the cause, but the whole Composer environment crashed afterwards.
When I look in Logs Explorer, I find the following error in the scheduler and workers:
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 4, in <module>
    __import__('pkg_resources').require('apache-airflow===1.10.2-composer')
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3105, in <module>
    @_call_aside
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
    f(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 580, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'urllib3<1.25,>=1.21.1' distribution was not found and is required by requests
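To make the requirement in the last line concrete, here is a minimal sketch of how a specifier such as `<1.25,>=1.21.1` could be checked against a candidate version. The helper `satisfies` is hypothetical and only handles plain dotted numeric versions; it is not how pkg_resources resolves requirements, and a real check would more likely use `packaging.specifiers.SpecifierSet`.

```python
import operator

# Comparison operators supported by this simplified checker.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def _parse(version):
    """Turn '1.24.3' into (1, 24, 3). Assumes purely numeric components."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(left, right):
    """Pad both tuples with zeros so that 1.25 and 1.25.0 compare as equal."""
    width = max(len(left), len(right))
    return (
        left + (0,) * (width - len(left)),
        right + (0,) * (width - len(right)),
    )


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so "<=" is not read as "<".
        for symbol in ("==", "!=", "<=", ">=", "<", ">"):
            if clause.startswith(symbol):
                left, right = _pad(_parse(version), _parse(clause[len(symbol):]))
                if not _OPS[symbol](left, right):
                    return False
                break
        else:
            raise ValueError("unsupported clause: " + clause)
    return True


# The specifier is taken from the error message above.
REQUESTS_NEEDS = "<1.25,>=1.21.1"
for candidate in ("1.24.3", "1.25.3"):
    print(candidate, satisfies(candidate, REQUESTS_NEEDS))
```

Under these assumptions, a pin like urllib3==1.24.3 falls inside the range that requests asks for, while any 1.25.x release falls outside it.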
So I tried adding urllib3==1.24.3 to the PyPI packages of the Composer environment, but the update failed with this message:
UPDATE operation on this environment failed 39 minutes ago with the following error message: Failed to create a web server in new version. Check the airflow-webserver logs for details.
Anyway, it's clear that I need to resolve the conflicting Python library dependencies, so I am following this article.
One of its steps is to connect to a worker and run pip freeze, which is problematic because when I run:
kubectl exec -itn composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm -- /bin/bash
I get:
Defaulting container name to airflow-worker.
Use 'kubectl describe pod/airflow-worker-6cdfc68fd4-4k4jm -n composer-1-7-5-airflow-1-10-2-2d974007' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("airflow-worker")
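For context, `kubectl exec` needs a running container, and a container that keeps crashing is stopped for most of each back-off interval, which would be consistent with the "container not found" error. A minimal sketch of an alternative that does not need a shell in the pod: asking `kubectl logs` for the output of the previous (crashed) container instance via its `--previous` flag. The helper names are hypothetical, and the sketch assumes `kubectl` is on the PATH and already pointed at the Composer cluster.

```python
import subprocess


def previous_logs_command(namespace, pod, container):
    """Build the kubectl invocation that reads logs from the last crashed container."""
    return [
        "kubectl", "logs", pod,
        "--namespace", namespace,
        "--container", container,
        "--previous",
    ]


def fetch_previous_logs(namespace, pod, container):
    """Run the command and return its output (stderr if kubectl printed nothing to stdout)."""
    result = subprocess.run(
        previous_logs_command(namespace, pod, container),
        capture_output=True,
        text=True,
        check=False,  # a failing kubectl call should not raise here
    )
    return result.stdout or result.stderr


# Names taken from the output above; only the command is printed, nothing is executed.
cmd = previous_logs_command(
    "composer-1-7-5-airflow-1-10-2-2d974007",
    "airflow-worker-6cdfc68fd4-4k4jm",
    "airflow-worker",
)
print(" ".join(cmd))
```

Calling `fetch_previous_logs` with the same arguments would actually run the command; it is kept separate so the command can be inspected first.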
Here is the output of kubectl get pods --all-namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE
composer-1-7-5-airflow-1-10-2-2d974007 airflow-scheduler-574bcfbd47-gqnkp 1/2 CrashLoopBackOff 234 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm 1/2 CrashLoopBackOff 233 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-fwz5h 1/2 CrashLoopBackOff 232 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-vl25g 1/2 CrashLoopBackOff 233 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-75ff8dbb56-qxc7j 0/2 Evicted 0 21d
default airflow-monitoring-5bd5f64896-g6q8v 1/1 Running 0 21d
default airflow-redis-0 1/1 Running 0 21d
default airflow-sqlproxy-577bbc7577-mxv5p 1/1 Running 0 21d
default composer-agent-7c388f77-840c-40c8-be09-66303d721742-xxqlf 0/1 Completed 0 66m
default composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s 0/1 Completed 0 19h
default composer-fluentd-daemon-gm7vr 1/1 Running 0 21d
default composer-fluentd-daemon-srw2h 1/1 Running 4 21d
default composer-fluentd-daemon-swzgc 1/1 Running 0 21d
kube-system heapster-gke-7b4f99dd5f-8d2fx 3/3 Running 0 21d
kube-system kube-dns-5995c95f64-7hn2s 4/4 Running 0 21d
kube-system kube-dns-5995c95f64-dwlfv 4/4 Running 0 21d
kube-system kube-dns-autoscaler-8687c64fc-fpvm9 1/1 Running 0 21d
kube-system kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-7zcs 1/1 Running 0 21d
kube-system kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-h2u4 1/1 Running 0 21d
kube-system kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-i3kz 1/1 Running 1 7d20h
kube-system l7-default-backend-fd59995cd-hkk6z 1/1 Running 0 21d
kube-system metrics-server-v0.3.1-5c6fbf777-27hgk 2/2 Running 0 21d
kube-system prometheus-to-sd-5z9sw 2/2 Running 0 21d
kube-system prometheus-to-sd-8dsr8 2/2 Running 2 21d
kube-system prometheus-to-sd-f55cl 2/2 Running 0 21d
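To pick the unhealthy pods out of a listing like this, a short sketch that parses the whitespace-separated columns and keeps the rows where fewer containers are ready than expected. The function name `unready_pods` is hypothetical, the sample rows are copied from the listing above, and the column layout (NAMESPACE, NAME, READY, STATUS first) is assumed to match.

```python
def unready_pods(listing):
    """Return (namespace, name, status) for pods whose READY count is below the total."""
    flagged = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, ready, status = fields[:4]
        running, _, total = ready.partition("/")
        if not (running.isdigit() and total.isdigit()):
            continue
        # Completed pods are finished jobs, so 0/1 is expected for them.
        if int(running) < int(total) and status != "Completed":
            flagged.append((namespace, name, status))
    return flagged


SAMPLE = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
composer-1-7-5-airflow-1-10-2-2d974007 airflow-scheduler-574bcfbd47-gqnkp 1/2 CrashLoopBackOff 234 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-75ff8dbb56-qxc7j 0/2 Evicted 0 21d
default airflow-redis-0 1/1 Running 0 21d
default composer-agent-7c388f77-840c-40c8-be09-66303d721742-xxqlf 0/1 Completed 0 66m
"""

for namespace, name, status in unready_pods(SAMPLE):
    print(namespace, name, status)
```

Applied to the full listing, this would single out the scheduler and the airflow-worker pods in the composer-1-7-5-airflow-1-10-2-2d974007 namespace, while everything in default and kube-system passes.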
A little googling suggests that CrashLoopBackOff errors can be hard to diagnose and resolve in Kubernetes. Since I am barely familiar with this technology, I am asking for your help on this matter.
If you can help, it would be great to get as many details as possible. Thank you.