how to shut down cloud-sql-proxy in a helm chart pre-install hook

6/21/2020

I am running a django application in a Kubernetes cluster on gcloud. I implemented the database migration as a helm pre-install hook that launches my app container and does the database migration. I use cloud-sql-proxy in a sidecar pattern as recommended in the official tutorial: https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine

Basically this launches my app container and a cloud-sql-proxy container within the pod described by the job. The problem is that cloud-sql-proxy never terminates after my app has completed the migration, causing the pre-install job to time out and cancel my deployment. How do I gracefully exit the cloud-sql-proxy container after my app container completes so that the job can complete?

Here is my helm pre-install hook template definition:

apiVersion: batch/v1
kind: Job
metadata:
  name: database-migration-job
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
spec:
  activeDeadlineSeconds: 230
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      containers:
      - name: db-migrate
        # "my-project" contains a dash, so it has to be accessed with index
        image: {{ index .Values "my-project" "docker_repo" }}{{ .Values.backend.image }}:{{ index .Values "my-project" "image" "tag" }}
        imagePullPolicy: {{ index .Values "my-project" "image" "pullPolicy" }}
        env:
        - name: DJANGO_SETTINGS_MODULE
          value: "{{ .Values.backend.django_settings_module }}"
        - name: SENDGRID_API_KEY
          valueFrom:
            secretKeyRef:
              name: sendgrid-api-key
              key: sendgrid-api-key
        - name: DJANGO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: django-secret-key
              key: django-secret-key
        - name: DB_USER
          value: {{ .Values.postgresql.postgresqlUsername }}
        - name: DB_PASSWORD
          {{- if .Values.postgresql.enabled }}
          value: {{ .Values.postgresql.postgresqlPassword }}
          {{- else }}
          valueFrom:
            secretKeyRef:
              name: database-password
              key: database-pwd
          {{- end }}
        - name: DB_NAME
          value: {{ .Values.postgresql.postgresqlDatabase }}
        - name: DB_HOST
          {{- if .Values.postgresql.enabled }}
          value: "postgresql"
          {{- else }}
          value: "127.0.0.1"
          {{- end }}
        workingDir: /app-root
        command: ["/bin/sh"]
        args: ["-c", "python manage.py migrate --no-input"]
      {{- if eq .Values.postgresql.enabled false }}
      - name: cloud-sql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command:
          - "/cloud_sql_proxy"
          - "-instances=<INSTANCE_CONNECTION_NAME>=tcp:<DB_PORT>"
          - "-credential_file=/secrets/service_account.json"
        securityContext:
          #fsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65532
        volumeMounts:
        - name: db-con-mnt
          mountPath: /secrets/
          readOnly: true
      volumes:
      - name: db-con-mnt
        secret:
          secretName: db-service-account-credentials
      {{- end }}

Funnily enough, if I kill the job with "kubectl delete jobs database-migration-job" after the migration is done, the helm upgrade completes and my new app version gets installed.

-- Vess Perfanov
cloud-sql-proxy
gcloud
kubernetes
kubernetes-helm

1 Answer

6/21/2020

Well, I have a solution that will work but might be hacky. First of all, this is a feature Kubernetes is lacking; it is being discussed in this issue.

With Kubernetes v1.17, containers in the same Pod can share a process namespace (shareProcessNamespace: true in the pod spec). This lets the app container signal the proxy process directly. Since this is a Kubernetes Job, there is nothing stopping you from making that kill the last step of the app container's command (note that Kubernetes has no "postStop" lifecycle handler, only postStart and preStop, and preStop does not fire when a container exits on its own).
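
For illustration, a minimal sketch of how the Job's pod spec from the question could be adapted. It is not a drop-in manifest: it assumes the app image provides a shell with pkill, that the proxy process is named cloud_sql_proxy (as in the gce-proxy image), and that the db-migrate container is allowed to signal it (e.g. by running as the same UID or with sufficient privileges):

spec:
  template:
    spec:
      # One shared PID namespace for all containers in the Pod, so
      # db-migrate can see and signal the proxy process (GA since v1.17).
      shareProcessNamespace: true
      restartPolicy: Never
      containers:
      - name: db-migrate
        # image, env, workingDir etc. unchanged from the question
        command: ["/bin/sh", "-c"]
        args:
          - |
            python manage.py migrate --no-input
            rc=$?
            # Terminate the sidecar so the Pod, and with it the Job, can finish.
            pkill -TERM cloud_sql_proxy || true
            exit $rc
      - name: cloud-sql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        # command, securityContext and volumeMounts unchanged from the question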

With this solution, when your app finishes and exits normally (or abnormally), the dying container runs one last command, which in this case kills the proxy process. The Job then succeeds or fails depending on the exit code your container finishes with: the process exit code becomes the container exit code, which in turn determines whether the Job is marked Complete or Failed.
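
To double-check the behaviour, standard kubectl commands are enough (names taken from the manifest in the question; note that the hook-delete-policy there removes the Job once it finishes, so inspect it while it is still around):

# Watch the hook Job while helm install/upgrade is running
kubectl get job database-migration-job -w

# Per-container exit codes of the Job's Pod
kubectl get pods -l job-name=database-migration-job \
  -o jsonpath='{range .items[0].status.containerStatuses[*]}{.name}{": "}{.state.terminated.exitCode}{"\n"}{end}'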

-- Akin Ozer
Source: StackOverflow