Monitor that service-to-pod iptables mappings are current

8/4/2016

The problem occurred on Kubernetes 1.2.3, but we are running 1.3.3 now.

We have had two situations where kube-proxy was running but was wedged and not updating iptables to reflect the current mapping of services to pods. This led to traffic destined for serviceA being routed to pods belonging to serviceB. After the fact, we improved our monitoring to query /healthz on kube-proxy. I'm wondering whether I should be monitoring anything beyond the existence of the kube-proxy process and the fact that it returns 200 from /healthz.
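For concreteness, the check we added is roughly the sketch below. It assumes kube-proxy serves /healthz on its default port 10249 and that the check runs somewhere that can reach it (by default the endpoint binds to localhost, so in practice this runs on each node); NODES is just a placeholder list.

# Rough sketch of the /healthz check described above. Port 10249 is
# kube-proxy's default healthz port; NODES is a hypothetical inventory.
import urllib.request

NODES = ["10.0.0.10", "10.0.0.11"]  # hypothetical node addresses

def kube_proxy_healthy(node, timeout=2):
    """Return True if kube-proxy on the node answers /healthz with HTTP 200."""
    url = "http://%s:10249/healthz" % node
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    for node in NODES:
        print("%s: %s" % (node, "ok" if kube_proxy_healthy(node) else "FAILED"))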

Are you monitoring anything additional to ensure that service-to-pod mappings are current? I realize that while the service landscape is changing there can be a window where not every host is accurate; I'm only interested in catching the scenario where, say, 3+ minutes have gone by and iptables is still not current on every node in the cluster, which would seem to indicate that something is broken somewhere.

I had thought about doing something like having a canary service whose backing deployment gets redeployed every 5 minutes, and then verifying from each node that I can reach all of the backing pods via the service cluster IP.
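The per-node check I have in mind would look roughly like this sketch. Everything specific here is hypothetical: the canary service's cluster IP and port, and the assumption that each canary pod replies with its own name; EXPECTED_PODS would come from the API or kubectl get endpoints after each redeploy.

# Rough sketch of the canary check, run from each node.
import urllib.request

CANARY_CLUSTER_IP = "10.3.0.42"   # hypothetical cluster IP of the canary service
CANARY_PORT = 8080                # hypothetical port
EXPECTED_PODS = {"canary-abc12", "canary-def34"}  # the freshly deployed pods

def sample_responders(attempts=50, timeout=2):
    """Hit the service cluster IP repeatedly and collect which pods answered."""
    seen = set()
    url = "http://%s:%d/" % (CANARY_CLUSTER_IP, CANARY_PORT)
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                seen.add(resp.read().decode().strip())
        except OSError:
            pass  # a failed sample is simply dropped
    return seen

if __name__ == "__main__":
    responders = sample_responders()
    stale = responders - EXPECTED_PODS        # answers from pods that should be gone
    unreachable = EXPECTED_PODS - responders  # current pods never reached via the VIP
    if stale or unreachable:
        print("iptables may be stale: stale=%s unreachable=%s" % (stale, unreachable))
    else:
        print("canary service routed only to the current pods")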

I'm not sure this is the right approach. It seems like it could catch the problem we had earlier, but I'm also wondering whether some simpler way exists, like just checking a timestamp of when iptables was last updated?
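As far as I can tell, iptables doesn't expose a last-modified timestamp, so the closest simple check I've come up with is comparing the node's nat table (where kube-proxy in iptables mode programs its service rules) against the cluster IPs the API server reports. A rough sketch, with SERVICE_CLUSTER_IPS as a placeholder you would fill from the API or kubectl get services:

# Rough sketch: verify every known service cluster IP appears in the nat table.
import subprocess

SERVICE_CLUSTER_IPS = {"10.3.0.42", "10.3.0.7"}  # hypothetical cluster IPs

def nat_table():
    """Dump the nat table, where kube-proxy (iptables mode) writes its rules."""
    return subprocess.check_output(["iptables-save", "-t", "nat"]).decode()

if __name__ == "__main__":
    rules = nat_table()
    missing = {ip for ip in SERVICE_CLUSTER_IPS if ip not in rules}
    if missing:
        print("no nat rules found for cluster IPs: %s" % missing)
    else:
        print("every known service cluster IP appears in this node's nat table")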

Thanks!

-- Guido Pepper
kubernetes

1 Answer

8/5/2016

You could run kube-proxy inside a pod (by dropping a manifest into /etc/kubernetes/manifests on each node), benefit from the health checking / liveness probes offered by Kubernetes, and let it take care of restarting the service for you in case of trouble.

Setting a very low threshold on the liveness probe will trigger a restart as soon as the /healthz endpoint takes too long to respond. It won't guarantee that the iptables rules are always up to date, but it will ensure that kube-proxy is always healthy (which in turn will keep the iptables rules consistent).

Example:

Check the healthz endpoint of kube-proxy every 10s. Restart the pod if it doesn't respond in less than 1s:

apiVersion: v1
kind: Pod

metadata:
  name: kube-proxy
  namespace: kube-system

spec:
  hostNetwork: true

  containers:

  - name: kube-proxy
    image: gcr.io/google_containers/hyperkube:v1.3.4
    command:
    - /hyperkube
    - proxy
    - --master=https://master.kubernetes.io:6443
    - --kubeconfig=/conf/kubeconfig
    - --proxy-mode=iptables

    livenessProbe:
      httpGet:
        path: /healthz
        port: 10249
      timeoutSeconds: 1
      periodSeconds: 10
      failureThreshold: 1

    securityContext:
      privileged: true

    volumeMounts:
    - mountPath: /conf/kubeconfig
      name: kubeconfig
      readOnly: true
    - mountPath: /ssl/kubernetes
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true

  volumes:
  - hostPath:
      path: /etc/kubernetes/proxy-kubeconfig.yml
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host
-- Antoine Cotten
Source: StackOverflow