Kubernetes starts giving errors after a few hours of uptime

12/21/2015

I have installed K8S on OpenStack following this guide.

The installation went fine and I was able to run pods, but after some time my applications stop working. I can still create pods, but requests no longer reach the services, either from outside the cluster or from within the pods. Basically, something in the networking gets messed up. iptables -L -vn -t nat still shows the proper configuration, but things won't work.
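
(For completeness, the commands I use to check the proxy's NAT rules are roughly the following; the KUBE-PORTALS-CONTAINER chain name is taken from the kube-proxy logs further down.)

    # List everything kube-proxy has programmed in the NAT table
    iptables -L -vn -t nat
    # Show just the portal chain that appears in the logs below
    iptables -t nat -L KUBE-PORTALS-CONTAINER -vn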

To get things working again I have to rebuild the cluster; removing all services and replication controllers doesn't help.

I tried to look into the logs. Below is the journal for kube-proxy:

    Dec 20 02:12:18 minion01.novalocal systemd[1]: Started Kubernetes Proxy.
    Dec 20 02:15:52 minion01.novalocal kube-proxy[1030]: I1220 02:15:52.269784    1030 proxier.go:487] Opened iptables from-containers public port for service "default/opensips:sipt" on TCP port 5060
    Dec 20 02:15:52 minion01.novalocal kube-proxy[1030]: I1220 02:15:52.278952    1030 proxier.go:498] Opened iptables from-host public port for service "default/opensips:sipt" on TCP port 5060
    Dec 20 03:05:11 minion01.novalocal kube-proxy[1030]: W1220 03:05:11.806927    1030 api.go:224] Got error status on WatchEndpoints channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [1433/544]) [2432] Reason: Details:<nil> Code:0}
    Dec 20 03:06:08 minion01.novalocal kube-proxy[1030]: W1220 03:06:08.177225    1030 api.go:153] Got error status on WatchServices channel: &{TypeMeta:{Kind: APIVersion:} ListMeta:{SelfLink: ResourceVersion:} Status:Failure Message:401: The event in requested index is outdated and cleared (the requested history has been cleared [1476/207]) [2475] Reason: Details:<nil> Code:0}
    ..
    ..
    ..
    Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448570    1030 proxier.go:161] Failed to ensure iptables: error creating chain "KUBE-PORTALS-CONTAINER": fork/exec /usr/sbin/iptables: too many open files:
    Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: W1220 16:01:23.448749    1030 iptables.go:203] Error checking iptables version, assuming version at least 1.4.11: %vfork/exec /usr/sbin/iptables: too many open files
    Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448868    1030 proxier.go:409] Failed to install iptables KUBE-PORTALS-CONTAINER rule for service "default/kubernetes:"
    Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.448906    1030 proxier.go:176] Failed to ensure portal for "default/kubernetes:": error checking rule: fork/exec /usr/sbin/iptables: too many open files:
    Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: W1220 16:01:23.449006    1030 iptables.go:203] Error checking iptables version, assuming version at least 1.4.11: %vfork/exec /usr/sbin/iptables: too many open files
    Dec 20 16:01:23 minion01.novalocal kube-proxy[1030]: E1220 16:01:23.449133    1030 proxier.go:409] Failed to install iptables KUBE-PORTALS-CONTAINER rule for service "default/repo-client:"

I found a few posts relating to "Failed to install iptables" errors, but they don't seem relevant here, since everything works initially and only gets messed up after a few hours.

-- user3275095
coreos
kubernetes

1 Answer

12/21/2015

What version of Kubernetes is this? A long time ago (~1.0.4) we had a bug in the kube-proxy where it leaked sockets/file-descriptors.
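
If you aren't sure which version you're running, something like this on a node should tell you (exact flags can vary a bit between releases):

    # Client and server versions as reported by kubectl
    kubectl version
    # Version of the kube-proxy binary itself
    kube-proxy --version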

If you aren't running a 1.1.3 binary, consider upgrading.

Also, you should be able to use lsof to figure out who has all of the files open.
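
As a rough sketch (PID 1030 is the kube-proxy PID from your journal; substitute your own):

    # Count the file descriptors kube-proxy currently has open
    lsof -p 1030 | wc -l
    # Or read them straight from /proc
    ls /proc/1030/fd | wc -l
    # Compare against the per-process open-files limit
    grep 'open files' /proc/1030/limits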

-- Brendan Burns
Source: StackOverflow