I'am running Kubernetes cluster on Google Cloud Platform via their Kubernetes Engine. Cluster version is 1.13.11-gke.14. PHP application pod contains 2 containers - Nginx as a reverse proxy and php-fpm (7.2).
In google cloud is used TCP Load Balancer and then internal routing via Nginx Ingress.
Problem is: when I upload some bigger file (17MB), ingress is crashing with this error:
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX:
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required.
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b
E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:13.156963Z Exiting with 0
Here is yaml configuration of Nginx ingress. Configuration is default by Gitlab's system that is creating cluster on their own.
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2019-11-24T17:35:04Z"
generation: 3
labels:
app: nginx-ingress
chart: nginx-ingress-1.22.1
component: controller
heritage: Tiller
release: ingress
name: ingress-nginx-ingress-controller
namespace: gitlab-managed-apps
resourceVersion: "2638973"
selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-ingress
release: ingress
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "10254"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: nginx-ingress
component: controller
release: ingress
spec:
containers:
- args:
- /nginx-ingress-controller
- --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
name: nginx-ingress-controller
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
resources: {}
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nginx/modsecurity/modsecurity.conf
name: modsecurity-template-volume
subPath: modsecurity.conf
- mountPath: /var/log/modsec
name: modsecurity-log-volume
- args:
- /bin/sh
- -c
- tail -f /var/log/modsec/audit.log
image: busybox
imagePullPolicy: Always
name: modsecurity-log
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/log/modsec
name: modsecurity-log-volume
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: ingress-nginx-ingress
serviceAccountName: ingress-nginx-ingress
terminationGracePeriodSeconds: 60
volumes:
- configMap:
defaultMode: 420
items:
- key: modsecurity.conf
path: modsecurity.conf
name: ingress-nginx-ingress-controller
name: modsecurity-template-volume
- emptyDir: {}
name: modsecurity-log-volume
I have no Idea what else to try. I'm running cluster on 3 nodes (2x 1vCPU, 1.5GB RAM and 1x Preemptile 2vCPU, 1,8GB RAM), all of them on SSD drives.
Anytime i upload the image, disk IO will get crazy.
Found solution. Nginx-ingress pod contained modsecurity too. All requests were analyzed by mod security and bigger uploaded files caused those crashes. It wasn't crash at all but took too much CPU and I/O, that caused longer healthcheck response to all other pods. Solution is to configure correctly modsecurity or disable.