Kubernetes Replication Controller Integration Test Failure

9/20/2016

I am seeing the following Kubernetes integration tests fail consistently, about 90% of the time, on RHEL 7.2, Fedora 24, and CentOS 7.1:

test/integration/garbagecollector
test/integration/replicationcontroller

They seem to be due to an etcd failure. From what I have found online, an apiserver issue may also be involved. My setup is simple: I install and start docker, install go, clone the kubernetes repo from GitHub, use hack/install-etcd.sh from the repo and add it to my PATH, get ginkgo, gomega and go-bindata, then run 'make test-integration'. I don't manually change anything or add any custom files or configs. Has anyone run into these issues and found a solution? The only mention of this issue I have seen online was deemed a flake and has no listed solution, but I run into it on almost every single test run. Pieces of the error are below; I can give more if needed:

Garbage Collector:

*many lines from garbagecollector.go that look good*

I0920 14:42:39.725768   11823 garbagecollector.go:479] create storage for resource { v1 secrets}

I0920 14:42:39.725786   11823 garbagecollector.go:479] create storage for resource { v1 serviceaccounts}

I0920 14:42:39.725803   11823 garbagecollector.go:479] create storage for resource { v1 services}

I0920 14:43:09.565529   11823 trace.go:61] Trace "List *rbac.ClusterRoleList" (started 2016-09-20 14:42:39.565113203 -0400 EDT):

[2.564µs] [2.564µs] About to list etcd node

[30.000353492s] [30.000350928s] Etcd node listed

[30.000361771s] [8.279µs] END

E0920 14:43:09.566770   11823 cacher.go:258] unexpected ListAndWatch error: pkg/storage/cacher.go:198: Failed to list *rbac.RoleBinding: client: etcd cluster is unavailable or misconfigured

*repeats over and over, each time with a different resource that failed to list*

Replication Controller:

I0920 14:35:16.907283   10482 replication_controller.go:481] replication controller worker shutting down

I0920 14:35:16.907293   10482 replication_controller.go:481] replication controller worker shutting down

I0920 14:35:16.907298   10482 replication_controller.go:481] replication controller worker shutting down

I0920 14:35:16.907303   10482 replication_controller.go:481] replication controller worker shutting down

I0920 14:35:16.907307   10482 replication_controller.go:481] replication controller worker shutting down

E0920 14:35:16.948417   10482 util.go:45] Metric for replication_controller already registered

--- FAIL: TestUpdateLabelToBeAdopted (30.07s)

replicationcontroller_test.go:270: Failed to create replication controller rc: Timeout: request did not complete within allowed duration

E0920 14:44:06.820506   12053 storage_rbac.go:116] unable to initialize clusterroles: client: etcd cluster is unavailable or misconfigured

There are no files in /var/log that even start with kube.

Thanks in advance!

-- Johnny Bieren
kubernetes

1 Answer

9/21/2016

I increased the limits on the number of open file descriptors and haven't seen this issue since. So I'm going to go ahead and call this solved.
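
For anyone who wants to check the same thing on their own machine, here is a minimal Python sketch that reads the current open-file limit (RLIMIT_NOFILE) and raises the soft limit toward a target, capped at the hard limit. It is not part of the Kubernetes repo or the original test setup. The helper name `raise_nofile_limit` and the default target of 65536 are assumptions for illustration only, since the exact value used above was not recorded.

```python
import resource


def raise_nofile_limit(target=65536):
    """Raise the soft open-file limit toward `target` (a hypothetical default).

    The soft limit is never lowered and never pushed past the hard limit.
    Returns the (soft, hard) pair in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process may only raise its soft limit up to the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)

    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems enforce an extra kernel-level cap; keep the old limit.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_limit())
```

The limit belongs to the process and is inherited by its children, so a change like this only helps if it is made in the shell or process that goes on to launch 'make test-integration'. Raising the hard limit itself normally requires root or a system-level configuration change.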

-- Johnny Bieren
Source: StackOverflow