I am running a Django application in a Kubernetes cluster on gcloud. I implemented the database migration as a Helm pre-install hook that launches my app container and runs the database migration. I use cloud-sql-proxy in a sidecar pattern, as recommended in the official tutorial: https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine
Basically this launches my app and a cloud-sql-proxy container within the pod described by the job. The problem is that cloud-sql-proxy never terminates after my app has completed the migration, causing the pre-install job to time out and cancel my deployment. How do I gracefully exit the cloud-sql-proxy container after my app container completes, so that the job can finish?
Here is my Helm pre-install hook template definition:
apiVersion: batch/v1
kind: Job
metadata:
  name: database-migration-job
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  activeDeadlineSeconds: 230
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: {{ .Values.my-project.docker_repo }}{{ .Values.backend.image }}:{{ .Values.my-project.image.tag }}
          imagePullPolicy: {{ .Values.my-project.image.pullPolicy }}
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: "{{ .Values.backend.django_settings_module }}"
            - name: SENDGRID_API_KEY
              valueFrom:
                secretKeyRef:
                  name: sendgrid-api-key
                  key: sendgrid-api-key
            - name: DJANGO_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: django-secret-key
                  key: django-secret-key
            - name: DB_USER
              value: {{ .Values.postgresql.postgresqlUsername }}
            - name: DB_PASSWORD
              {{- if .Values.postgresql.enabled }}
              value: {{ .Values.postgresql.postgresqlPassword }}
              {{- else }}
              valueFrom:
                secretKeyRef:
                  name: database-password
                  key: database-pwd
              {{- end }}
            - name: DB_NAME
              value: {{ .Values.postgresql.postgresqlDatabase }}
            - name: DB_HOST
              {{- if .Values.postgresql.enabled }}
              value: "postgresql"
              {{- else }}
              value: "127.0.0.1"
              {{- end }}
          workingDir: /app-root
          command: ["/bin/sh"]
          args: ["-c", "python manage.py migrate --no-input"]
        {{- if eq .Values.postgresql.enabled false }}
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.17
          command:
            - "/cloud_sql_proxy"
            - "-instances=<INSTANCE_CONNECTION_NAME>=tcp:<DB_PORT>"
            - "-credential_file=/secrets/service_account.json"
          securityContext:
            # fsGroup: 65532
            runAsNonRoot: true
            runAsUser: 65532
          volumeMounts:
            - name: db-con-mnt
              mountPath: /secrets/
              readOnly: true
      volumes:
        - name: db-con-mnt
          secret:
            secretName: db-service-account-credentials
      {{- end }}
Funny enough, if I kill the job with "kubectl delete jobs database-migration-job" after the migration is done, the helm upgrade completes and my new app version gets installed.
Well, I have a solution that will work, but it might be hacky. First of all, Kubernetes is lacking a feature here, which is being discussed in this issue.
With Kubernetes v1.17, containers in the same Pod can share a process namespace. This enables us to kill the proxy process from the app container. Since this is a Kubernetes Job, there shouldn't be any problem with having the app container run one last command once the migration finishes.
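As an illustration only (a sketch against the Job spec from the question, not a tested template), sharing the process namespace is a single extra field on the pod spec:

    spec:
      shareProcessNamespace: true   # lets containers in this pod see and signal each other's processes
      restartPolicy: Never
      containers:
        - name: db-migrate
          # ... unchanged from the template above ...
        - name: cloud-sql-proxy
          # ... unchanged from the template above ...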
With this solution, when your app finishes and exits (normally or abnormally), the container runs one last command before dying, which in this case kills the proxy process. The job then completes with success or failure depending on how you kill the process: the exit code of that final step becomes the container exit code, which in turn becomes the job's result.
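To make that concrete, here is a minimal, hedged sketch of how the db-migrate container's command from the template above could be extended so the proxy is signalled once the migration ends. It assumes pgrep and kill are available in the app image, and that the app container is permitted to signal the proxy's process (for example, it runs as root or as the same UID, 65532, as the proxy):

    command: ["/bin/sh", "-c"]
    args:
      - |
        # run the migration and remember its exit code
        python manage.py migrate --no-input
        migration_status=$?
        # with shareProcessNamespace: true the proxy process is visible here;
        # send it SIGINT so it shuts down instead of keeping the pod alive
        sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT "$sql_proxy_pid"
        # propagate the migration result so the Job reports success or failure
        exit $migration_status

The final exit line is what ties the Job's outcome to the migration rather than to the kill command, matching the exit-code chain described above.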