RunAsUser issue & Clicking external IP of load balancer -> Bad Request (400) on deploying Django app on GKE (Kubernetes) and db connection failing:

12/7/2019

Probem: I have 2 replicas for my app, library. I have configured a service to talk to my two replicas. I have a dockerfile that starts my app at port # 8080. The load balancer and the app containers all seem to be running. However, I am not able to connect to them. Every time I click on the external IP shown I get "Bad Request (400)"

This log from the container seems to indicate somethingw ith the db connection going wrong...

2019-12-07 07:33:56.669 IST Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection self.connect() File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect self.connection = self.get_new_connection(conn_params) File "/usr/local/lib/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection connection = Database.connect(**conn_params) File "/usr/local/lib/python3.8/site-packages/psycopg2/init.py", line 126, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: Connection refused

$ kubectl get services
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
kubernetes    ClusterIP      10.19.240.1     <none>         443/TCP        61m
library-svc   LoadBalancer   10.19.254.164   34.93.141.11   80:30227/TCP   50m

$  kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
libary-6f9b45fcdb-g5xfv    1/1     Running   0          54m
library-745d6798d8-m45gq   3/3     Running   0          12m

In settings.py

ALLOWED_HOSTS = ['library-259506.appspot.com', '127.0.0.1', 'localhost', '*']

Here is the library.yaml file I have to go with it.

# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace  with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
            # [START cloudsql_secrets]
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cloudsql
                  key: password
            # [END cloudsql_secrets]
        ports:
        - containerPort: 8080

      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql", 
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
          - name: cloudsql-oauth-credentials
            mountPath: /secrets/cloudsql
            readOnly: true
          - name: ssl-certs
            mountPath: /etc/ssl/certs
          - name: cloudsql
            mountPath: /cloudsql
      # [END proxy_container] 
      # [START volumes]
      volumes:
        - name: cloudsql-oauth-credentials
          secret:
            secretName: cloudsql-oauth-credentials
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs
        - name: cloudsql
          emptyDir:
      # [END volumes]        
# [END kubernetes_deployment]

---
    # [START service]
    # The library-svc service provides a load-balancing proxy over the polls app
    # pods. By specifying the type as a 'LoadBalancer', Container Engine will
    # create an external HTTP load balancer.
    # The service directs traffic to the deployment by matching the service's selector to the deployment's label
    #
    # For more information about external HTTP load balancing see:
    # https://cloud.google.com/container-engine/docs/load-balancer
    apiVersion: v1
    kind: Service
    metadata:
      name: library-svc
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: library

    # [END service]

The Dockerfile:

FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/

# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]

library-app container logs

2019-12-07T02:03:58.742639999Z Performing system checks...
 I 
2019-12-07T02:03:58.742701271Z 
 I 
2019-12-07T02:03:58.816567541Z System check identified no issues (0 silenced).
 I 
2019-12-07T02:03:59.338790311Z December 07, 2019 - 02:03:59
 I 
2019-12-07T02:03:59.338986187Z Django version 2.2.6, using settings 'locallibrary.settings'
 I 
2019-12-07T02:03:59.338995688Z Starting development server at http://0.0.0.0:8080/
 I 
2019-12-07T02:03:59.338999467Z Quit the server with CONTROL-C.
 I 
2019-12-07T02:04:00.814238478Z [07/Dec/2019 02:04:00] "GET / HTTP/1.1" 400 26
 E 
  undefined

library container logs:

2019-12-07T02:03:56.568839208Z Performing system checks...
 I 
2019-12-07T02:03:56.568912262Z 
 I 
2019-12-07T02:03:56.624835039Z System check identified no issues (0 silenced).
 I 
2019-12-07T02:03:56.669088750Z Exception in thread django-main-thread:
 E 
2019-12-07T02:03:56.669204639Z Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/usr/local/lib/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not connect to server: Connection refused
 E 
2019-12-07T02:03:56.672779570Z  Is the server running on host "127.0.0.1" and accepting
 E 
2019-12-07T02:03:56.672783910Z  TCP/IP connections on port 3306?
 E 
2019-12-07T02:03:56.672826889Z 
 E 
2019-12-07T02:03:56.672903098Z 
 E 
2019-12-07T02:03:56.672909494Z The above exception was the direct cause of the following exception:
 E 
2019-12-07T02:03:56.672913216Z 
 E 
2019-12-07T02:03:56.672962576Z Traceback (most recent call last):
  File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/utils/autoreload.py", line 54, in wrapper
    fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/django/core/management/commands/runserver.py", line 120, in inner_run
    self.check_migrations()
  File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 453, in check_migrations
    executor = MigrationExecutor(connections[DEFAULT_DB_ALIAS])
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/executor.py", line 18, in __init__
    self.loader = MigrationLoader(self.connection)
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/loader.py", line 49, in __init__
    self.build_graph()
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/loader.py", line 212, in build_graph
    self.applied_migrations = recorder.applied_migrations()
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/recorder.py", line 73, in applied_migrations
    if self.has_table():
  File "/usr/local/lib/python3.8/site-packages/django/db/migrations/recorder.py", line 56, in has_table
    return self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor())
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 256, in cursor
    return self._cursor()
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 233, in _cursor
    self.ensure_connection()
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 217, in ensure_connection
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.8/site-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/usr/local/lib/python3.8/site-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: Connection refused
 E 
2019-12-07T02:03:56.684080787Z  Is the server running on host "127.0.0.1" and accepting
 E 
2019-12-07T02:03:56.684085077Z  TCP/IP connections on port 3306?
 E 

Cloud SQL proxy container logs:

2019-12-07T02:03:57.648625670Z 2019/12/07 02:03:57 current FDs rlimit set to 1048576, wanted limit is 8500. Nothing to do here.
 E 
2019-12-07T02:03:57.660316236Z 2019/12/07 02:03:57 using credential file for authentication; email=library-service@library-259506.iam.gserviceaccount.com
 E 
2019-12-07T02:03:58.167649397Z 2019/12/07 02:03:58 Listening on 127.0.0.1:3306 for library-259506:asia-south1:library
 E 
2019-12-07T02:03:58.167761109Z 2019/12/07 02:03:58 Ready for new connections
 E 
2019-12-07T02:03:58.865747581Z 2019/12/07 02:03:58 New connection for "library-259506:asia-south1:library"
 E 
2019-12-07T03:03:29.000559014Z 2019/12/07 03:03:29 ephemeral certificate for instance library-259506:asia-south1:library will expire soon, refreshing now.
 E 
2019-12-07T04:02:59.004152307Z 2019/12/07 04:02:59 ephemeral certificate for instance library-259506:asia-south1:library will expire soon, refreshing now.
 E 
-- Sudhakar R
django
google-kubernetes-engine
kubernetes
kubernetes-pod

1 Answer

12/7/2019

I can't believe it the issue was my library.yaml file was missing this for the cloud proxy. I believe "RunAs" -- Controls which user ID the containers are run with -- this forces it to run as daemon user and allows us specify RunAsUser which can be overridden by RunAsUser in SecurityContext on a per Container basis.

https://cloud.google.com/solutions/best-practices-for-operating-containers#avoid_running_as_root

 # [START cloudsql_security_context]
    securityContext:
      runAsUser: 2  # non-root user
      allowPrivilegeEscalation: false
  # [END cloudsql_security_context]

From docs https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups: "MustRunAsNonRoot - Requires that the pod be submitted with a non-zero runAsUser or have the USER directive defined (using a numeric UID) in the image. Pods which have specified neither runAsNonRoot nor runAsUser settings will be mutated to set runAsNonRoot=true, thus requiring a defined non-zero numeric USER directive in the container. No default provided. Setting allowPrivilegeEscalation=false is strongly recommended with this strategy."

-- Sudhakar R
Source: StackOverflow