Traefik cannot read from the k8s API

8/27/2018

This is now the fourth time I've set up a Kubernetes cluster. It's always the same setup: basic k8s, traefik as the reverse proxy, dashboard, prometheus, ELK stack. But this time something about the traefik deployment is odd...

So for all the other clusters I just deployed my default setup: some RBAC entries, a ConfigMap containing the TOML file, the actual Deployment, a Service, and the web UI:

RBAC:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: traefik-ingress-controller
  namespace: infra
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - secrets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
- kind: ServiceAccount
  name: traefik-ingress-controller
  namespace: infra
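
A quick sanity check (not part of the original setup) is to ask the API server whether the service account actually ended up with these permissions:

# each of these should print "yes" once the ClusterRoleBinding is applied
kubectl auth can-i list services --as=system:serviceaccount:infra:traefik-ingress-controller
kubectl auth can-i watch endpoints --as=system:serviceaccount:infra:traefik-ingress-controller
kubectl auth can-i get ingresses.extensions --as=system:serviceaccount:infra:traefik-ingress-controller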

ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: traefik-toml
  labels:
    name: traefik-toml
  namespace: infra
data:
  traefik.toml: |-
    defaultEntryPoints = ["http","https"]
    [entryPoints]
      [entryPoints.http]
      address = ":80"
        [entryPoints.http.redirect]
          entryPoint = "https"
      [entryPoints.https]
      address = ":443"
        [entryPoints.https.tls]
          [[entryPoints.https.tls.certificates]]
          CertFile = "/ssl/external/<EXTERNAL_URL>.crt"
          KeyFile = "/ssl/external/<EXTERNAL_URL>.key"
          [[entryPoints.https.tls.certificates]]
          CertFile = "/ssl/internal/<INTERNAL_URL>.crt"
          KeyFile = "/ssl/internal/<INTERNAL_URL>.key"
    [accessLog]
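
To double-check what traefik will actually read, the rendered file can be pulled straight from the ConfigMap (the backslash escapes the dot in the data key name):

# print traefik.toml exactly as stored in the cluster
kubectl -n infra get configmap traefik-toml -o jsonpath='{.data.traefik\.toml}'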

Deployment:

---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: traefik-ingress-controller
  namespace: infra
  labels:
    k8s-app: traefik-ingress-lb
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: traefik-ingress-lb
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
        name: traefik-ingress-lb
    spec:
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      containers:
      - image: traefik:v1.6.5
        name: traefik-ingress-lb
        volumeMounts:
        - mountPath: /ssl/external
          name: ssl-external
        - mountPath: /ssl/internal
          name: ssl-internal
        - name: traefik-toml
          subPath: traefik.toml
          mountPath: /config/traefik.toml
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        - name: admin
          containerPort: 8080
        args:
        - --configfile=/config/traefik.toml
        - --api          # serve the web UI on the admin port
        - --kubernetes   # enable the Kubernetes provider (watches the API server)
        - --logLevel=INFO
      volumes:
      - name: ssl-external
        secret:
          secretName: <EXTERNAL_URL>.cert
      - name: ssl-internal
        secret:
          secretName: <INTERNAL_URL>.cert
      - name: traefik-toml
        configMap:
          name: traefik-toml
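
The two secret volumes assume secrets whose keys match the file names referenced in the TOML above. A minimal sketch of creating them (the URLs are placeholders, same as everywhere else here):

# generic secrets, so the key names (<EXTERNAL_URL>.crt / .key) become the file names in the mount
kubectl -n infra create secret generic <EXTERNAL_URL>.cert \
  --from-file=<EXTERNAL_URL>.crt --from-file=<EXTERNAL_URL>.key
kubectl -n infra create secret generic <INTERNAL_URL>.cert \
  --from-file=<INTERNAL_URL>.crt --from-file=<INTERNAL_URL>.key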

Service:

---
kind: Service
apiVersion: v1
metadata:
  name: traefik-ingress-service
  namespace: infra
spec:
  selector:
    k8s-app: traefik-ingress-lb
  ports:
    - protocol: TCP
      port: 80
      name: web
    - protocol: TCP
      port: 443
      name: sweb
  externalIPs:   # kube-proxy accepts traffic for these node IPs on ports 80/443
    - <WORKER IP 1>
    - <WORKER IP 2>
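
Once everything is applied, a basic smoke test (hostname and IP are just examples) is to hit one of the worker IPs directly; -k skips certificate validation since the name won't match:

# expect a 404 from traefik as long as no Ingress matches the request
curl -k -H 'Host: some.example.com' https://<WORKER IP 1>/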

This works nicely on the other clusters, but on the new one (where I did not set up Kubernetes myself), the following errors show up in the logs every 30 seconds (the "Error checking new version" one less often):

E0827 14:29:49.566294       1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0827 14:29:49.572633       1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0827 14:29:49.592844       1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.Ingress: Get https://10.96.0.1:443/apis/extensions/v1beta1/ingresses?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
time="2018-08-27T14:30:00Z" level=warning msg="Error checking new version: Get https://update.traefik.io/repos/containous/traefik/releases: dial tcp: i/o timeout"
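
For reference, the timeout can be reproduced without traefik by testing the raw TCP path to the API server's ClusterIP from a throwaway pod (the pod name is arbitrary):

# in a healthy cluster this returns almost instantly; here it should time out after 5s
kubectl -n infra run apitest --rm -it --restart=Never --image=busybox \
  -- nc -z -w 5 10.96.0.1 443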

Does anyone have an idea? Is this a known issue? I could not find any known problems on this topic.

Thanks in advance!

-- razr
kubernetes
rbac
traefik

1 Answer

9/10/2018

I managed to fix the problem:

The problem was the iptables FORWARD chain policy, which newer Docker engines set to DROP: https://github.com/moby/moby/issues/35777

For now we have a workaround that keeps setting the policy back to ACCEPT, sketched below.
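
On each node it amounts to something like this (a sketch; the one-minute interval is arbitrary):

# check what the docker engine set the policy to
iptables -S FORWARD | head -n 1    # prints "-P FORWARD DROP" on affected nodes

# crontab entry that keeps forcing the policy back until there is a proper fix
* * * * * /sbin/iptables -P FORWARD ACCEPT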

Once we have a real fix I will hopefully remember to come back here and post it :)

-- razr
Source: StackOverflow