I have been trying to find information on how best to configure the NGINX ingress controller for Kubernetes, since we are seeing a lot of pod restarts during performance testing, and the pods eventually end up in a crash loop.
I can't find any information on how to set worker_processes or what that value should be relative to the number of cores and ingress pods we are running. With a 4 vCore VM, worker_processes was 4, and when I changed to a 16 vCore VM it changed to 16, so I guess by default it takes the core count of the VM. Is this an OK value?
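For reference, a minimal sketch of how this value could be pinned instead of following the core count, assuming the controller is the Kubernetes ingress-nginx controller, which reads a `worker-processes` key from its ConfigMap. The ConfigMap name and namespace below are assumptions based on the deployment name and the namespace seen in the logs, and the value 4 is only illustrative, not a recommendation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed name and namespace; they must match the ConfigMap the
  # controller was started with (its --configmap argument).
  name: my-nginx-ingress-controller
  namespace: kube-system
data:
  # Illustrative value only. When unset, the number of workers follows
  # the CPU core count, which matches the 4 -> 16 change described above.
  worker-processes: "4"
```

Whether a fixed value is better than the default likely depends on the CPU limits of the pod and on how many ingress replicas share a node.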
When I enable autoscaling, I get <unknown> for "TARGETS", and I don't see it scale when load goes above the 50% threshold:
NAME                          REFERENCE                                TARGETS                        MINPODS   MAXPODS   REPLICAS   AGE
my-nginx-ingress-controller   Deployment/my-nginx-ingress-controller   <unknown>/50%, <unknown>/50%   2         12        6          2d
I have set this in my config:
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 12
  targetCPUUtilizationPercentage: 50
  targetMemoryUtilizationPercentage: 50
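One thing that may be relevant to the <unknown> targets: the Horizontal Pod Autoscaler computes utilization as a percentage of each container's resource requests, so it cannot report a value when no requests are set (or when the metrics API is unavailable). A sketch of what setting requests might look like, assuming the chart exposes a `resources` key at the same level as `autoscaling`; the key placement and the numbers are assumptions, not tested values.

```yaml
# Assumed to sit next to the autoscaling block shown above.
resources:
  requests:
    cpu: 500m      # illustrative; the 50% CPU target is measured against this
    memory: 256Mi  # illustrative; the 50% memory target is measured against this
```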
When I increase the number of replicas for the ingress I get fewer restarts, but according to the dashboard they also consume a lot of CPU, so I would prefer to run fewer of them.
What do I need to change to get a good setup? Please let me know if I should add any more information.
EDIT
I can see this in the log dumps I have. The timestamps of all entries of this type match the pod restart graph.
Logs
I0719 12:49:35.255169 8 leaderelection.go:187] attempting to acquire leader lease kube-system/ingress-controller-leader-nginx...
I0719 12:49:35.255314 8 nginx.go:279] Starting NGINX process
I0719 12:49:35.257764 8 controller.go:172] Configuration changes detected, backend reload required.
I0719 12:49:35.328405 8 leaderelection.go:196] successfully acquired lease kube-system/ingress-controller-leader-nginx
I0719 12:49:35.328744 8 status.go:148] new leader elected: xyz-nginx-ingress-controller-7d8c4474cb-xpg7c
I0719 12:49:35.599587 8 controller.go:190] Backend successfully reloaded.
I0719 12:49:35.599884 8 controller.go:202] Initial sync, sleeping for 1 second.
Query
KubePodInventory
| where Namespace == "kube-system"
| where ServiceName contains "ingress"
| summarize count() by bin(TimeGenerated, 10m), Name, PodRestartCount
| render timechart