Python dependency conflicts in Composer crash the whole Kubernetes cluster, how do I fix it?

11/6/2020

Let me preface this by saying that I am not well versed in Kubernetes.

Yesterday, we had to install/update a Python dependency for one of our DAGs in Google Cloud Composer. I am not certain that this is the cause, but the whole Composer environment crashed afterwards.

In Logs Explorer, I find the following errors in the scheduler and workers:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 4, in <module>
    __import__('pkg_resources').require('apache-airflow===1.10.2-composer')
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3105, in <module>
    @_call_aside
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3089, in _call_aside
    f(*args, **kwargs)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3118, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 580, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 593, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/opt/python3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 781, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'urllib3<1.25,>=1.21.1' distribution was not found and is required by requests
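
The last line of the traceback says that requests needs a urllib3 release matching `<1.25,>=1.21.1`. To make it clearer what that constraint accepts, here is a minimal sketch that checks a plain dotted version against such a specifier. The helper names (`satisfies`, `_as_tuple`) are hypothetical, and this is a simplified stand-in for what pkg_resources actually does: it only understands numeric releases such as 1.24.3, not pre-releases, wildcards or extras.

```python
import operator
import re

# Comparison operators this simplified checker understands.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

# One clause of a specifier, e.g. ">=1.21.1" (numeric releases only).
_CLAUSE = re.compile(r"\s*(<=|>=|==|!=|<|>)\s*(\d+(?:\.\d+)*)\s*")


def _as_tuple(version, width):
    """Turn '1.25' into (1, 25, 0) when width is 3, so tuples compare fairly."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, specifier):
    """Return True if version meets every comma-separated clause of specifier."""
    for clause in specifier.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, bound = match.groups()
        # Pad both sides to the same length before comparing.
        width = max(version.count("."), bound.count(".")) + 1
        if not _OPS[op](_as_tuple(version, width), _as_tuple(bound, width)):
            return False
    return True


if __name__ == "__main__":
    spec = "<1.25,>=1.21.1"  # taken from the DistributionNotFound message
    for candidate in ("1.24.3", "1.25.11"):
        print(candidate, satisfies(candidate, spec))
```

Under this check, 1.24.3 falls inside the range and 1.25.11 falls outside it, which matches the idea that a newer urllib3 pulled in by the updated dependency would trigger the error.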

So I tried adding urllib3==1.24.3 to the PyPI packages of the Composer environment, but the update failed with this message:

UPDATE operation on this environment failed 39 minutes ago with the following error message: Failed to create a web server in new version. Check the airflow-webserver logs for details.

Anyway, it's clear that I need to resolve the conflicts between Python library dependencies, so I followed this article.

One of its steps is to connect to a worker and run pip freeze, and this is where I get stuck, because when I run:

kubectl exec -itn composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm -- /bin/bash

I get:

Defaulting container name to airflow-worker.
Use 'kubectl describe pod/airflow-worker-6cdfc68fd4-4k4jm -n composer-1-7-5-airflow-1-10-2-2d974007' to see all of the containers in this pod.
error: unable to upgrade connection: container not found ("airflow-worker")
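
kubectl exec needs a running container, and the error suggests the airflow-worker container is not running at that moment. One thing that does not need a running container is reading the logs of its previous (crashed) instance with `kubectl logs --previous`. Below is a minimal sketch of that, assuming kubectl is on the PATH and already configured for the cluster. The function names are hypothetical, and the container name airflow-worker is taken from the "Defaulting container name" message above.

```python
import subprocess


def previous_logs_command(namespace, pod, container):
    """Build the kubectl argv that prints logs from the container's previous run."""
    return [
        "kubectl", "logs", pod,
        "--namespace", namespace,
        "--container", container,
        "--previous",
    ]


def fetch_previous_logs(namespace, pod, container):
    """Run the command and return its stdout.

    Raises subprocess.CalledProcessError if kubectl exits with an error,
    for example when the container has no previous instance.
    """
    result = subprocess.run(
        previous_logs_command(namespace, pod, container),
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

For example, `fetch_previous_logs("composer-1-7-5-airflow-1-10-2-2d974007", "airflow-worker-6cdfc68fd4-4k4jm", "airflow-worker")` would ask for the output of the last crashed airflow-worker container in that pod.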

Here is the output of kubectl get pods --all-namespaces:

NAMESPACE                                NAME                                                             READY   STATUS             RESTARTS   AGE
composer-1-7-5-airflow-1-10-2-2d974007   airflow-scheduler-574bcfbd47-gqnkp                               1/2     CrashLoopBackOff   234        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-4k4jm                                  1/2     CrashLoopBackOff   233        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-fwz5h                                  1/2     CrashLoopBackOff   232        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-6cdfc68fd4-vl25g                                  1/2     CrashLoopBackOff   233        19h
composer-1-7-5-airflow-1-10-2-2d974007   airflow-worker-75ff8dbb56-qxc7j                                  0/2     Evicted            0          21d
default                                  airflow-monitoring-5bd5f64896-g6q8v                              1/1     Running            0          21d
default                                  airflow-redis-0                                                  1/1     Running            0          21d
default                                  airflow-sqlproxy-577bbc7577-mxv5p                                1/1     Running            0          21d
default                                  composer-agent-7c388f77-840c-40c8-be09-66303d721742-xxqlf        0/1     Completed          0          66m
default                                  composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s        0/1     Completed          0          19h
default                                  composer-fluentd-daemon-gm7vr                                    1/1     Running            0          21d
default                                  composer-fluentd-daemon-srw2h                                    1/1     Running            4          21d
default                                  composer-fluentd-daemon-swzgc                                    1/1     Running            0          21d
kube-system                              heapster-gke-7b4f99dd5f-8d2fx                                    3/3     Running            0          21d
kube-system                              kube-dns-5995c95f64-7hn2s                                        4/4     Running            0          21d
kube-system                              kube-dns-5995c95f64-dwlfv                                        4/4     Running            0          21d
kube-system                              kube-dns-autoscaler-8687c64fc-fpvm9                              1/1     Running            0          21d
kube-system                              kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-7zcs   1/1     Running            0          21d
kube-system                              kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-h2u4   1/1     Running            0          21d
kube-system                              kube-proxy-gke-europe-west1-pipelin-default-pool-a8d0baad-i3kz   1/1     Running            1          7d20h
kube-system                              l7-default-backend-fd59995cd-hkk6z                               1/1     Running            0          21d
kube-system                              metrics-server-v0.3.1-5c6fbf777-27hgk                            2/2     Running            0          21d
kube-system                              prometheus-to-sd-5z9sw                                           2/2     Running            0          21d
kube-system                              prometheus-to-sd-8dsr8                                           2/2     Running            2          21d
kube-system                              prometheus-to-sd-f55cl                                           2/2     Running            0          21d
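
To pick the problem pods out of a listing like this one, a small parser is enough. The sketch below is only an illustration: the function name `unhealthy_pods` is hypothetical, it assumes the six-column layout shown above, and it treats anything other than a fully ready Running pod or a Completed pod as unhealthy.

```python
HEALTHY_STATUSES = {"Running", "Completed"}

# A few rows copied from the listing above, whitespace collapsed.
SAMPLE = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
composer-1-7-5-airflow-1-10-2-2d974007 airflow-scheduler-574bcfbd47-gqnkp 1/2 CrashLoopBackOff 234 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-6cdfc68fd4-4k4jm 1/2 CrashLoopBackOff 233 19h
composer-1-7-5-airflow-1-10-2-2d974007 airflow-worker-75ff8dbb56-qxc7j 0/2 Evicted 0 21d
default airflow-redis-0 1/1 Running 0 21d
default composer-agent-9fd51464-6ed2-4f3d-9714-762ea723cb61-5lc2s 0/1 Completed 0 19h
"""


def unhealthy_pods(listing):
    """Return (namespace, name, status, restarts) for each pod that looks unhealthy."""
    problems = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short lines and the header row.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, ready, status, restarts = fields[:5]
        ready_count, _, total = ready.partition("/")
        # A Running pod with fewer ready containers than expected is also flagged.
        not_fully_ready = status == "Running" and ready_count != total
        if status not in HEALTHY_STATUSES or not_fully_ready:
            problems.append((namespace, name, status, int(restarts)))
    return problems


if __name__ == "__main__":
    for pod in unhealthy_pods(SAMPLE):
        print(pod)
```

On the full listing, this would single out the scheduler and worker pods in the composer-1-7-5-airflow-1-10-2-2d974007 namespace, which is where the restart counts are piling up.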

A little googling shows me that CrashLoopBackOff errors can be hard to diagnose and resolve in Kubernetes. Since I'm barely familiar with this technology, I'm asking for your help on this matter.

  1. How do I connect to the worker?
  2. How do I install/update libraries in the Python environment that runs Airflow on this worker? Is this even the right approach to resolving Python dependency problems in Google Cloud Composer?

If you can help, it would be great to get as many details as possible. Thank you.

-- Imad
google-cloud-composer
google-cloud-platform
google-kubernetes-engine
kubernetes
