Traefik v1.7.5, Kubernetes 1.10 (with kubenet networking on AWS)
I'm using Traefik as a Kubernetes ingress controller. It has been working well for my Elixir apps in production, but I'm now migrating a Ruby service that uses the Puma web server. Most requests (~200/s) are handled correctly, but some appear to cause Traefik to retry 120-200 times. The logs just show hundreds of lines like this:
172.58.x.x - - [11/Dec/2018:01:34:48 +0000] "PUT /users/123/game_results/234 HTTP/2.0" 500 21 "-" "okhttp/3.5.0" 610758 "www.example.com/" "http://100.96.13.37:5000" 329108ms
And there are zero corresponding errors in the Rails logs.
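To compare these entries against the Rails logs, it may help to split each access-log line into fields and count the 5xx responses per path and backend. The following is a minimal sketch, not a definitive parser: the field labels (`request_count`, `frontend`, `backend`, `duration_ms`) are my assumption about Traefik 1.x common-format access logs, and `parse_line` and `count_errors` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed field layout of a Traefik 1.x common-format access-log line.
LINE_RE = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<request_count>\d+) "(?P<frontend>[^"]*)" "(?P<backend>[^"]*)" '
    r'(?P<duration_ms>\d+)ms'
)


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the numeric fields so they can be compared and aggregated.
    for key in ("status", "size", "request_count", "duration_ms"):
        entry[key] = int(entry[key])
    return entry


def count_errors(lines):
    """Count 5xx responses per (method, path, backend)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] >= 500:
            counts[(entry["method"], entry["path"], entry["backend"])] += 1
    return counts


# The log line from the question, used as sample input.
sample = (
    '172.58.x.x - - [11/Dec/2018:01:34:48 +0000] '
    '"PUT /users/123/game_results/234 HTTP/2.0" 500 21 "-" "okhttp/3.5.0" '
    '610758 "www.example.com/" "http://100.96.13.37:5000" 329108ms'
)

if __name__ == "__main__":
    print(parse_line(sample))
    print(count_errors([sample]))
```

If the assumed layout is right, the last field of the line above is a request duration of 329108 ms (roughly 329 seconds), which is worth checking against the timestamps in the Rails logs.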
How can I troubleshoot this?
Traefik config template (using Helm):
defaultEntryPoints = ["http","https"]
debug = false
logLevel = "INFO"
# Do not verify backend certificates (use https backends)
InsecureSkipVerify = true
[entryPoints]
[entryPoints.traefik]
address = ":8080"
[entryPoints.http]
address = ":80"
compress = true
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
compress = true
[entryPoints.https.proxyProtocol]
trustedIPs = ["0.0.0.0/0"]
[entryPoints.https.tls]
sniStrict = true
minVersion = "VersionTLS12"
[accessLog]
[api]
[kubernetes]
[metrics]
[metrics.prometheus]
buckets=[0.1,0.3,1.2,5.0]
entryPoint = "traefik"
[ping]
entryPoint = "http"
[acme]
email = "{{ .Values.acme.email }}"
storage = "{{ .Values.acme.storage }}"
acmeLogging = true
entryPoint = "https"
OnHostRule = true
caServer = "https://acme-v02.api.letsencrypt.org/directory"
[acme.dnsChallenge]
provider = "route53"
delayBeforeCheck = 5
{{- range .Values.acme.domains }}
[[acme.domains]]
main = "{{ .main }}"
{{- end }}
[consul]
endpoint = "traefik-consul.ingress:8500"
watch = true
prefix = "traefik"
[retry]
attempts = 1
Edit:
I ran with debug-level logging long enough to see whether it gave any more information. Unfortunately, I just got a bunch of these:
msg=vulcand/oxy/forward: completed ServeHttp on request
Request={"Method":"PUT","URL":{"Scheme":"http","Opaque":"","User":null,"Host":"100.96.13.37:5000","Path":"","RawPath":"","ForceQuery":false,"RawQuery":"","Fragment":""},"Proto":"HTTP/2.0","ProtoMajor":2,"ProtoMinor":0,"Header":{"Accept":["application/json version=1"],"Accept-Encoding":["gzip"],"Authorization":["Bearer abc"],"Content-Length":["138"],"Content-Type":["application/json; charset=utf-8"],"User-Agent":["okhttp/3.5.0"]},"ContentLength":138,"TransferEncoding":null,"Host":"www.example.com","Form":null,"PostForm":null,"MultipartForm":null,"Trailer":null,"RemoteAddr":"1.2.3.4:55773","RequestURI":"/users/123/game_results/123","TLS":null}
So that didn't help me understand the problem. I also upgraded to Traefik 1.7.5, but that didn't help either.
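The debug entries embed the forwarded request as JSON after `Request=`, so they can at least be reduced to the fields needed to match them against access-log lines. This is a rough sketch that assumes every debug line has the same shape as the one quoted above. `parse_debug_request` and `summarize` are hypothetical helpers, and the sample is an abbreviated copy of that line.

```python
import json


def parse_debug_request(line):
    """Return the JSON object that follows 'Request=' in a debug line, or None."""
    marker = "Request="
    start = line.find(marker)
    if start == -1:
        return None
    # raw_decode parses one JSON value and ignores anything after it.
    obj, _end = json.JSONDecoder().raw_decode(line[start + len(marker):])
    return obj


def summarize(req):
    """Keep only the fields useful for matching against access-log lines."""
    return {
        "method": req["Method"],
        "backend": req["URL"]["Host"],
        "host": req["Host"],
        "uri": req["RequestURI"],
        "content_length": req["ContentLength"],
    }


# Abbreviated copy of the debug line from the question.
sample_debug = (
    'Request={"Method":"PUT","URL":{"Scheme":"http","Host":"100.96.13.37:5000",'
    '"Path":""},"Host":"www.example.com","ContentLength":138,'
    '"RequestURI":"/users/123/game_results/123"}'
)

if __name__ == "__main__":
    print(summarize(parse_debug_request(sample_debug)))
```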