Sonobuoy DNS tests fail, but the failure can't be replicated manually

1/10/2019

I'm running Sonobuoy on my K8S cluster and two DNS conformance tests fail, but I can't reproduce the failure manually. How can I find out why they fail?

K8S version - v1.9.11

Infra - Azure, acs-engine v0.23.1

Sonobuoy version - v0.11.6 (latest to support K8S v1.9.11).

Run command -

sonobuoy run --kube-conformance-image "gcr.io/heptio-images/kube-conformance:v1.9"

Failed DNS tests:

[sig-network] DNS should provide DNS for services  [Conformance]
[sig-network] DNS should provide DNS for the cluster  [Conformance]

This is part of an effort to benchmark K8S configurations and test them before they go live.

The tests I ran manually (I extracted them from the log):

dig +notcp +noall +answer +search netperf-w2 A
dig +tcp +noall +answer +search netperf-w2 A
dig +notcp +noall +answer +search netperf-w2.network-test A
dig +tcp +noall +answer +search netperf-w2.network-test A
dig +notcp +noall +answer +search netperf-w2.network-test.svc A
dig +tcp +noall +answer +search netperf-w2.network-test.svc A
dig +notcp +noall +answer +search _http._tcp.netperf-w2.network-test.svc SRV
dig +tcp +noall +answer +search _http._tcp.netperf-w2.network-test.svc SRV 
dig +notcp +noall +answer +search _http._tcp.test-service-2.network-test.svc SRV
dig +tcp +noall +answer +search _http._tcp.test-service-2.network-test.svc SRV

#podARec=$(hostname -i| awk -F. '{print $1"-"$2"-"$3"-"$4".network-test.pod.cluster.local"}');
dig +notcp +noall +answer +search 10-240-1-76.network-test.pod.cluster.local A
dig +tcp +noall +answer +search 10-240-1-76.network-test.pod.cluster.local A
dig +notcp +noall +answer +search 103.187.0.10.in-addr.arpa. PTR
dig +tcp +noall +answer +search 103.187.0.10.in-addr.arpa. PTR
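The commented-out podARec line above derives the pod A record name from the pod's own IP. A minimal sketch of that derivation as reusable shell functions, so the names do not have to be hard-coded. pod_a_record and ptr_name are hypothetical helper names; the network-test namespace and the cluster.local domain are taken from the commands above.

```shell
#!/bin/sh
# Build the pod A record name for a pod IP and a namespace.
# $1 = pod IP (e.g. the output of `hostname -i`), $2 = namespace
pod_a_record() {
    echo "$1" | awk -F. -v ns="$2" '{print $1"-"$2"-"$3"-"$4"."ns".pod.cluster.local"}'
}

# Build the reverse-lookup (PTR) name for an IPv4 address.
# $1 = IPv4 address
ptr_name() {
    echo "$1" | awk -F. '{print $4"."$3"."$2"."$1".in-addr.arpa."}'
}

# Usage inside the pod (not executed here):
#   dig +tcp +noall +answer +search "$(pod_a_record "$(hostname -i)" network-test)" A
#   dig +tcp +noall +answer +search "$(ptr_name 10.0.187.103)" PTR
```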

The log shows many entries like:

Jan  8 10:17:47.221: INFO: Unable to read wheezy_tcp@PodARecord from pod dns-test-7dbef827-132e-11e9-bd06-e20b33d4fc6d: the server could not find the requested resource (get pods dns-test-7dbef827-132e-11e9-bd06-e20b33d4fc6d)
Jan  8 10:17:47.346: INFO: Unable to read jessie_tcp@PodARecord from pod dns-test-7dbef827-132e-11e9-bd06-e20b33d4fc6d: the server could not find the requested resource (get pods dns-test-7dbef827-132e-11e9-bd06-e20b33d4fc6d)
Jan  8 10:17:47.356: INFO: Lookups using dns-test-7dbef827-132e-11e9-bd06-e20b33d4fc6d failed for: [wheezy_tcp@PodARecord jessie_tcp@PodARecord]
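The labels in these log lines combine a protocol and a record name (for example tcp@PodARecord). A sketch that reports manual lookups in a similar form, assuming a lookup counts as successful when dig prints a non-empty answer; that is an assumption about the test, not something the log confirms. check_lookup and the DIG override variable are hypothetical names.

```shell
#!/bin/sh
# Run one lookup over UDP and over TCP and print OK or FAIL for each.
# $1 = name to resolve, $2 = record type (A, SRV, PTR)
# DIG can be set to another command name; it defaults to dig.
check_lookup() {
    name="$1"
    rtype="$2"
    for proto in udp tcp; do
        if [ "$proto" = udp ]; then flag="+notcp"; else flag="+tcp"; fi
        answer=$("${DIG:-dig}" "$flag" +noall +answer +search "$name" "$rtype")
        if [ -n "$answer" ]; then
            echo "$proto@$name OK"
        else
            echo "$proto@$name FAIL"
        fi
    done
}

# Usage inside the pod (not executed here):
#   check_lookup 10-240-1-76.network-test.pod.cluster.local A
```

Running the same function from the images the test uses, rather than from an arbitrary pod, keeps the comparison closer to what the test does.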

And it ends with:

• Failure [619.219 seconds]
[sig-network] DNS
/workspace/anago-v1.9.4-beta.0.53+bee2d1505c4fe8/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/framework.go:22
should provide DNS for services  [Conformance] [It]
/workspace/anago-v1.9.4-beta.0.53+bee2d1505c4fe8/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:648

Expected error:
    <*errors.errorString | 0xc42026ab20>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
not to have occurred

/workspace/anago-v1.9.4-beta.0.53+bee2d1505c4fe8/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/dns.go:170
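The test ends with a timeout, so it can help to see which lookups were reported as failing over the whole run. A sketch, assuming the "Lookups using ... failed for: [...]" line format shown earlier. failed_lookups is a hypothetical helper, and its argument is whatever file the e2e output was saved to.

```shell
#!/bin/sh
# Print the distinct lookup labels found in "failed for: [...]" log lines.
# $1 = path to a saved e2e log file
failed_lookups() {
    sed -n 's/.*Lookups using .* failed for: \[\(.*\)\].*/\1/p' "$1" \
        | tr ' ' '\n' \
        | sort -u
}

# Usage (not executed here):
#   failed_lookups ./e2e.log
```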

Ultimately I want to understand the root cause of the failure and learn how to debug this kind of problem more effectively.

The test in Kubernetes' repo.

-- BarakH
e2e-testing
kubernetes
standards-compliance
