Hi, I'm in trouble with this one. I set up Airflow on Kubernetes infrastructure.
I'm using AWS EKS, with AWS EFS for the persistent volume.
Airflow: 2.2.3-python3.8
Kubernetes: 1.21
airflow uid: 50000, gid: 0
I followed this blog to deploy the infrastructure.
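Since everything persistent lives on EFS, here is roughly how the volume side is wired up. This is a sketch assuming static provisioning with the EFS CSI driver; fs-xxxxxxxx and every name except airflow-efs-pvc (which the Deployment below references) are placeholders, not my exact manifest:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-efs-pv        # placeholder name
spec:
  capacity:
    storage: 10Gi             # nominal; EFS does not enforce a size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxx # placeholder EFS filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-efs-pvc       # this is the claim the Deployment mounts
  namespace: airflow
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # bind to the statically provisioned PV
  resources:
    requests:
      storage: 10Gi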
My Dockerfile:
FROM apache/airflow:2.2.3-python3.8
RUN usermod -g 0 airflow
# install deps
USER root
RUN apt-get update -y && apt-get install -y \
    libczmq-dev \
    libssl-dev \
    inetutils-telnet \
    python3-dev \
    build-essential \
    postgresql postgresql-contrib \
    bind9utils \
    gcc \
    git \
    && apt-get clean
# vim
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    vim \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
USER airflow
RUN pip install --upgrade pip
COPY requirement.txt /tmp/requirement.txt
RUN pip install -r /tmp/requirement.txt
COPY airflow-test-env-init.sh /tmp/airflow-test-env-init.sh
COPY bootstrap.sh /bootstrap.sh
ENTRYPOINT ["/bootstrap.sh"]
My Kubernetes manifest file:
# Note: The airflow image used in this example is obtained by
# building the image from the local docker subdirectory.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow
  namespace: airflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: airflow
  name: airflow
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create", "get"]
  - apiGroups: ["batch", "extensions"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow
  namespace: airflow
subjects:
  - kind: ServiceAccount
    name: airflow # Name of the ServiceAccount
    namespace: airflow
roleRef:
  kind: Role # This must be Role or ClusterRole
  name: airflow # This must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      name: airflow
  template:
    metadata:
      labels:
        name: airflow
    spec:
      serviceAccountName: airflow
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: lifecycle
                    operator: NotIn
                    values:
                      - Ec2Spot
      containers:
        - name: webserver
          image: {{AIRFLOW_IMAGE}}:{{AIRFLOW_TAG}}
          imagePullPolicy: Always
          ports:
            - name: webserver
              containerPort: 8080
          args: ["webserver"]
          env:
            - name: AIRFLOW_KUBE_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-secrets
                  key: sql_alchemy_conn
          volumeMounts:
            - name: airflow-configmap
              mountPath: /opt/airflow/airflow.cfg
              subPath: airflow.cfg
            - name: {{POD_AIRFLOW_VOLUME_NAME}}
              mountPath: /opt/airflow/dags
            - name: {{POD_AIRFLOW_VOLUME_NAME}}
              mountPath: /opt/airflow/logs
        - name: scheduler
          image: {{AIRFLOW_IMAGE}}:{{AIRFLOW_TAG}}
          imagePullPolicy: Always
          args: ["scheduler"]
          env:
            - name: AIRFLOW_KUBE_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-secrets
                  key: sql_alchemy_conn
          volumeMounts:
            - name: airflow-configmap
              mountPath: /opt/airflow/airflow.cfg
              subPath: airflow.cfg
            - name: {{POD_AIRFLOW_VOLUME_NAME}}
              mountPath: /opt/airflow/dags
            - name: {{POD_AIRFLOW_VOLUME_NAME}}
              mountPath: /opt/airflow/logs
        - name: git-sync
          image: k8s.gcr.io/git-sync/git-sync:v3.4.0
          imagePullPolicy: IfNotPresent
          envFrom:
            - configMapRef:
                name: airflow-gitsync
            - secretRef:
                name: airflow-secrets
          volumeMounts:
            - name: {{POD_AIRFLOW_VOLUME_NAME}}
              mountPath: /git
      volumes:
        - name: {{POD_AIRFLOW_VOLUME_NAME}}
          persistentVolumeClaim:
            claimName: airflow-efs-pvc
        - name: airflow-dags-fake
          emptyDir: {}
        - name: airflow-configmap
          configMap:
            name: airflow-configmap
      securityContext:
        runAsUser: 50000
        fsGroup: 0
---
apiVersion: v1
kind: Service
metadata:
  name: airflow
  namespace: airflow
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: {{AOK_SSL_ENDPOINT}}
spec:
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
      nodePort: 30031
      name: http
    - protocol: TCP
      port: 443
      targetPort: 8080
      nodePort: 30032
      name: https
  selector:
    name: airflow
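For completeness, the Secret and git-sync ConfigMap referenced above were created along these lines (all values here are placeholders for the real ones; the GIT_SYNC_* keys are the standard git-sync v3 environment variables):
# Metadata DB connection string consumed as SQL_ALCHEMY_CONN above
kubectl create secret generic airflow-secrets -n airflow \
  --from-literal=sql_alchemy_conn='postgresql+psycopg2://user:pass@host:5432/airflow'

# Environment for the git-sync sidecar; it writes the repo under /git
kubectl create configmap airflow-gitsync -n airflow \
  --from-literal=GIT_SYNC_REPO='https://github.com/my-org/my-dags.git' \
  --from-literal=GIT_SYNC_BRANCH='main' \
  --from-literal=GIT_SYNC_ROOT='/git' \
  --from-literal=GIT_SYNC_DEST='repo' \
  --from-literal=GIT_SYNC_WAIT='60'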
Pod status, logs, and describe output:

kubectl get pods -n airflow
NAME                                                               READY   STATUS             RESTARTS   AGE
airflow-bfd79c998-d5gjf                                            3/3     Running            0          2m14s
examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa    0/1     CrashLoopBackOff   1          15s
examplebashoperatorrunme0.9fd08bc8182a4bb7ad3d41cbb57942ff         0/1     CrashLoopBackOff   1          17s
examplebashoperatorrunme1.20e9bd925aaf4b4eb7645ad181267a8f         0/1     CrashLoopBackOff   1          17s
examplebashoperatorrunme2.58fb15f683184e83b4e714bd0e27ccb8         0/1     CrashLoopBackOff   1          16s
examplebashoperatorthiswillskip.71370cbbaa324a21915d73f4e07dc307   0/1     CrashLoopBackOff   1          13s
kubectl logs -n airflow -f examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa --previous
unable to retrieve container logs for docker://b81e5ea6ffa99d21b62b46500a865fbc7bfb6560683f8d8bfba4786ea02f361a
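Since the previous container's logs are gone before I can pull them, the only other signals I can find are the namespace events and the raw worker pod spec (commands below; the describe output follows):
# Recent events across the namespace, newest last
kubectl get events -n airflow --sort-by=.lastTimestamp

# Full generated worker pod spec, to check its volumes and mounts
kubectl get pod examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa -n airflow -o yaml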
kubectl describe pod examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa -n airflow
Name: examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa
Namespace: airflow
Priority: 0
Node: ip-xxx.xxx.xxx.xxx.my-region.compute.internal/xxx.xxx.xxx.xxx
Start Time: Tue, 22 Feb 2022 22:22:27 +0900
Labels: airflow-worker=144
airflow_version=2.2.3
dag_id=example_bash_operator
kubernetes_executor=True
run_id=manual__2022-02-22T132224.6817590000-81c9256fb
task_id=also_run_this
try_number=1
Annotations: dag_id: example_bash_operator
kubernetes.io/psp: eks.privileged
run_id: manual__2022-02-22T13:22:24.681759+00:00
task_id: also_run_this
try_number: 1
Status: Running
IP: xxx.xxx.xxx.xxx
IPs:
IP: xxx.xxx.xxx.xxx
Containers:
base:
Container ID: docker://f2e0648c4a6a585b753529964d4bc26bc5c5c061e4c74a9c9e71aab00b1505e0
Image: xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/my-repo:latest
Image ID: docker-pullable://xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo@xxxxxxxxxxxx
Port: <none>
Host Port: <none>
Args:
airflow
tasks
run
example_bash_operator
also_run_this
manual__2022-02-22T13:22:24.681759+00:00
--local
--subdir
DAGS_FOLDER/example_bash_operator.py
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 22 Feb 2022 22:23:54 +0900
Finished: Tue, 22 Feb 2022 22:23:54 +0900
Ready: False
Restart Count: 4
Environment:
AIRFLOW_IS_K8S_EXECUTOR_POD: True
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-bh4kp (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-bh4kp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m7s default-scheduler Successfully assigned airflow/examplebashoperatoralsorunthis.26319976af6747c5a6b09a0b99b44bfa to ip-xxx.xxx.xxx.xx.my-region.compute.internal
Normal Pulled 2m5s kubelet Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 94.764374ms
Normal Pulled 2m4s kubelet Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 93.874971ms
Normal Pulled 108s kubelet Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 106.66327ms
Normal Created 81s (x4 over 2m5s) kubelet Created container base
Normal Started 81s (x4 over 2m5s) kubelet Started container base
Normal Pulled 81s kubelet Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 82.336875ms
Warning BackOff 54s (x7 over 2m3s) kubelet Back-off restarting failed container
Normal Pulling 40s (x5 over 2m5s) kubelet Pulling image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest"
Normal Pulled 40s kubelet Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/repo:latest" in 91.959453ms
The whole deployment seems to come up correctly (I think), but as soon as I start the DAG (manually or on a schedule) the worker pods crash immediately and never write a log file to the persistent volume I set up. The AWS console shows the same CrashLoopBackOff error, and kubectl get pods -n airflow keeps showing the output above.
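For reference, the executor-related parts of my real airflow.cfg look roughly like this (the repository and paths are placeholders for the real values):
[core]
executor = KubernetesExecutor
dags_folder = /opt/airflow/dags

[logging]
base_log_folder = /opt/airflow/logs

[kubernetes]
namespace = airflow
in_cluster = True
worker_container_repository = xxxxxxxxxxxx.dkr.ecr.my-region.amazonaws.com/my-repo
worker_container_tag = latest
# pod_template_file is not set; I'm not sure whether the worker pods
# need one to get the dags/logs volume mounts shown in the Deployment.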
I need help... please somebody help me out of this hell...