Kubernetes - nginx-ingress is crashing after file upload via php

12/1/2019

I'm running a Kubernetes cluster on Google Cloud Platform via Google Kubernetes Engine. The cluster version is 1.13.11-gke.14. The PHP application pod contains 2 containers: Nginx as a reverse proxy and php-fpm (7.2).

On Google Cloud, traffic enters through a TCP Load Balancer and is then routed internally via Nginx Ingress.

The problem: when I upload a larger file (17MB), the ingress crashes with this error:

W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory 
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX: 
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory 
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required. 
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory

E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused 
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b

E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused 
I 2019-12-01T14:26:13.156963Z Exiting with 0 

Here is the YAML configuration of the Nginx ingress. It is the default generated by GitLab's system, which creates the cluster on its own.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2019-11-24T17:35:04Z"
  generation: 3
  labels:
    app: nginx-ingress
    chart: nginx-ingress-1.22.1
    component: controller
    heritage: Tiller
    release: ingress
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
  resourceVersion: "2638973"
  selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
  uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-ingress
      release: ingress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: nginx-ingress
        component: controller
        release: ingress
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 33
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/nginx/modsecurity/modsecurity.conf
          name: modsecurity-template-volume
          subPath: modsecurity.conf
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
      - args:
        - /bin/sh
        - -c
        - tail -f /var/log/modsec/audit.log
        image: busybox
        imagePullPolicy: Always
        name: modsecurity-log
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx-ingress
      serviceAccountName: ingress-nginx-ingress
      terminationGracePeriodSeconds: 60
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: modsecurity.conf
            path: modsecurity.conf
          name: ingress-nginx-ingress-controller
        name: modsecurity-template-volume
      - emptyDir: {}
        name: modsecurity-log-volume

I have no idea what else to try. I'm running the cluster on 3 nodes (2x 1 vCPU, 1.5GB RAM and 1x preemptible 2 vCPU, 1.8GB RAM), all of them on SSD drives.

Any time I upload the image, disk I/O spikes heavily.

[Charts: Disk IOPS, Disk I/O]

Thanks for your help.

-- Jan Dominik
crash
docker
google-kubernetes-engine
kubernetes
nginx-ingress

1 Answer

2/27/2020

Found the solution. The nginx-ingress pod also contained ModSecurity. All requests were analyzed by ModSecurity, and larger uploaded files caused those crashes. It wasn't actually a crash: the analysis took so much CPU and I/O that health check responses slowed down for all other pods. The solution is to configure ModSecurity correctly or disable it.
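
For illustration, here is a rough sketch of what either option could look like in the controller's ConfigMap. The ConfigMap name and namespace are taken from the `--configmap` argument in the deployment above. `enable-modsecurity` is an ingress-nginx ConfigMap option and `SecRequestBodyAccess` is a ModSecurity directive, but whether GitLab's managed installation keeps manual edits, and which directive values suit a given application, are assumptions I have not verified. Treat this as a starting point, not the exact change I applied.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
data:
  # Option 1: turn ModSecurity off in the ingress controller entirely.
  enable-modsecurity: "false"

  # Option 2 (alternative to option 1): keep ModSecurity enabled but stop it
  # from buffering and inspecting request bodies, which is where large
  # uploads get expensive. Only the changed directive is shown; the rest of
  # the existing modsecurity.conf would have to stay in place.
  # modsecurity.conf: |
  #   SecRequestBodyAccess Off
```

Note that the deployment mounts `modsecurity.conf` with `subPath`, and Kubernetes does not refresh subPath mounts when the ConfigMap changes, so the controller pods would likely need a restart to pick up an edited file. Option 2 also means uploaded bodies are no longer checked by ModSecurity rules, which may or may not be acceptable.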

-- Jan Dominik
Source: StackOverflow