We have been using Traefik for over a year and it has been great. We are currently on 1.6.6 and everything works as expected, but as soon as we try to upgrade to 1.7, all of our endpoints return 502s. Any idea why this could be happening?

We are trying to upgrade because 1.7 introduced NS1 as a DNS provider, and HTTP challenges no longer seem to work for us.
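For context, here is a minimal sketch of the ACME change that motivates the upgrade. It assumes the `[acme.dnsChallenge]` section and the `ns1` provider code from the Traefik 1.7 ACME documentation, and that the NS1 credential is supplied through the `NS1_API_KEY` environment variable listed in that provider table. How that variable reaches the container (for example, from a Kubernetes Secret) is left open here.

```toml
# Sketch (Traefik 1.7): use NS1 for the ACME DNS-01 challenge
# in place of the [acme.httpChallenge] block in traefik.toml.
# Assumption: NS1_API_KEY is set in the Traefik container's environment.
[acme.dnsChallenge]
  provider = "ns1"
```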
This is our current setup:
```yaml
apiVersion: v1
data:
  traefik.toml: |
    # traefik.toml
    defaultEntryPoints = ["http", "https"]
    [web]
    address = ":8080"
    [entryPoints]
      [entryPoints.http]
      address = ":80"
        [entryPoints.http.redirect]
        entryPoint = "https"
      [entryPoints.https]
      address = ":443"
        [entryPoints.https.redirect]
        [entryPoints.https.tls]
    [kubernetes]
    [acme]
    email = "devops@something.com"
    storage = "/acme/acme.json"
    entryPoint = "https"
    acmeLogging = true
    caServer = "https://acme-v02.api.letsencrypt.org/directory"
    [[acme.domains]]
      main = "something.com"
    [[acme.domains]]
      main = "something.com"
    [acme.httpChallenge]
      entryPoint = "http"
    [retry]
    attempts = 5
    [accessLog]
    [traefikLog]
    filePath = "/acme/traefik.log"
kind: ConfigMap
metadata:
  labels:
    app: traefik
  name: traefik
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: traefik
  name: traefik
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      hostNetwork: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        role: edge-routers
      containers:
      - args:
        - --configfile=/config/traefik.toml
        - --kubernetes
        - --api
        - -d
        image: traefik:1.6.6
        imagePullPolicy: Always
        livenessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 20
        name: traefik
        ports:
        - containerPort: 80
          protocol: TCP
        - containerPort: 443
          protocol: TCP
        - containerPort: 8080
          protocol: TCP
        securityContext:
          privileged: true
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 20
        volumeMounts:
        - mountPath: /config
          name: config
        - mountPath: /acme
          name: acme
      volumes:
      - configMap:
          name: traefik
        name: config
      - hostPath:
          path: /etc/traefik
        name: acme
```
It looks like we are getting this in the debug log:

```
time="2019-02-08T00:40:58Z" level=debug msg="'502 Bad Gateway' caused by: EOF"
```
It seems that Traefik 1.7 increased its time between bytes, and our internal cache was timing out because of it, closing the connection (which would explain the EOF above). We fixed it by increasing the cache's timeout.