Kubernetes challenge waiting for http-01 propagation: dial tcp: no such host


I am trying to create a Kubernetes cluster namespace with auto-generated DNS for ingress, secured with Let's Encrypt TLS certificates. Unfortunately, I'm running into some trouble and do not know where to look for the solution.

Deployment is done with a multi-stage YAML pipeline into an AKS cluster. I've set up an nginx ingress controller and cert-manager, both in a separate namespace. The deployment succeeds and everything seems to be running, but the hostnames exposed by the ingress are not reachable. When I look at the certificates, I see the following:

Name:         letsencrypt-tls-cd
Namespace:    myApp-dev
Labels:       app.kubernetes.io/instance=myApp
Annotations:  <none>
API Version:  cert-manager.io/v1alpha3
Kind:         Certificate
  Creation Timestamp:  2020-06-15T11:59:53Z
  Generation:          1
  Owner References:
    API Version:           extensions/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Ingress
    Name:                  myApp-cd
    UID:                   a6cbbf69-749e-4dd1-81cc-37a817051690
  Resource Version:        1218430
  Self Link:               /apis/cert-manager.io/v1alpha3/namespaces/myApp-dev/certificates/letsencrypt-tls-cd
  UID:                     46ac0acb-71bf-4dbc-a376-c024e92d68ca
  Dns Names:
  Issuer Ref:
    Group:      cert-manager.io
    Kind:       Issuer
    Name:       letsencrypt-prod
  Secret Name:  letsencrypt-tls-cd
    Last Transition Time:  2020-06-15T11:59:53Z
    Message:               Waiting for CertificateRequest "letsencrypt-tls-cd-95531636" to complete
    Reason:                InProgress
    Status:                False
    Type:                  Ready
  Type    Reason        Age   From          Message
  ----    ------        ----  ----          -------
  Normal  GeneratedKey  57m   cert-manager  Generated a new private key
  Normal  Requested     57m   cert-manager  Created new CertificateRequest resource "letsencrypt-tls-cd-95531636"

Looking into the certificate request:

Name:         letsencrypt-tls-cd-95531636
Namespace:    myApp-dev
Labels:       app.kubernetes.io/instance=myApp
Annotations:  cert-manager.io/certificate-name: letsencrypt-tls-cd
              cert-manager.io/private-key-secret-name: letsencrypt-tls-cd
API Version:  cert-manager.io/v1alpha3
Kind:         CertificateRequest
  Creation Timestamp:  2020-06-15T11:59:54Z
  Generation:          1
  Owner References:
    API Version:           cert-manager.io/v1alpha2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Certificate
    Name:                  letsencrypt-tls-cd
    UID:                   46ac0acb-71bf-4dbc-a376-c024e92d68ca
  Resource Version:        1218442
  Self Link:               /apis/cert-manager.io/v1alpha3/namespaces/myApp-dev/certificaterequests/letsencrypt-tls-cd-95531636
  UID:                     2bef5e93-6722-43c0-bd2c-283d70334b1c
  Csr:  mySecret
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   Issuer
    Name:   letsencrypt-prod
    Last Transition Time:  2020-06-15T11:59:54Z
    Message:               Waiting on certificate issuance from order myApp-dev/letsencrypt-tls-cd-95531636-1679437339: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
  Type    Reason        Age   From          Message
  ----    ------        ----  ----          -------
  Normal  OrderCreated  58m   cert-manager  Created Order resource myApp-dev/letsencrypt-tls-cd-95531636-1679437339

And the challenge:

Name:         letsencrypt-tls-cm-1259919220-2936945618-694921812
Namespace:    myApp-dev
Labels:       <none>
Annotations:  <none>
API Version:  acme.cert-manager.io/v1alpha3
Kind:         Challenge
  Creation Timestamp:  2020-06-15T11:59:55Z
  Generation:  1
  Owner References:
    API Version:           acme.cert-manager.io/v1alpha2
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Order
    Name:                  letsencrypt-tls-cm-1259919220-2936945618
    UID:                   4d8eab8e-449b-494e-a751-912a77671223
  Resource Version:        1218492
  Self Link:               /apis/acme.cert-manager.io/v1alpha3/namespaces/myApp-dev/challenges/letsencrypt-tls-cm-1259919220-2936945618-694921812
  UID:                     8b355336-309a-4192-83b7-41397ebc20ac
  Authz URL:  https://acme-v02.api.letsencrypt.org/acme/authz-v3/5253543313
  Dns Name:   cm-myApp-dev.dev
  Issuer Ref:
    Group:  cert-manager.io
    Kind:   Issuer
    Name:   letsencrypt-prod
  Key:      0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI.qZ3FGlVmwRY6MwBNqUR5iktM1fJWdXxFWZYFOpjSUkQ
        Class:  nginx
        Pod Template:
            Node Selector:
              kubernetes.io/os:  linux
  Token:                         0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI
  Type:                          http-01
  URL:                           https://acme-v02.api.letsencrypt.org/acme/chall-v3/5253543313/1eUG0g
  Wildcard:                      false
  Presented:   true
  Processing:  true
  Reason:      Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI': Get "http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI": dial tcp: lookup cm-myApp-dev.dev on no such host
  State:       pending
  Type    Reason     Age    From          Message
  ----    ------     ----   ----          -------
  Normal  Started    2m15s  cert-manager  Challenge scheduled for processing
  Normal  Presented  2m14s  cert-manager  Presented challenge using http-01 challenge mechanism

I'm quite new to Kubernetes and don't know where to look to fix the error below. Any help is greatly appreciated.

Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI': Get "http://cm-myApp-dev.dev/.well-known/acme-challenge/0USdpDsQg7_NY1FB138oj6O3AtVVKn6rkdxUSBQk4KI": dial tcp: lookup cm-myApp-dev.dev on no such host

Looking at the ingress controller logs, I get the following error:

    7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
W0616 06:24:29.033235       7 controller.go:1119] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cm": local SSL certificate myApp-dev/letsencrypt-tls-cm was not found. Using default certificate
W0616 06:24:29.033264       7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
I0616 06:24:50.355937       7 status.go:275] updating Ingress myApp-dev/cm-acme-http-solver-9z88h status from [] to [{ } { }]
W0616 06:24:50.363181       7 controller.go:1119] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cm": local SSL certificate myApp-dev/letsencrypt-tls-cm was not found. Using default certificate
W0616 06:24:50.363346       7 controller.go:1374] Error getting SSL certificate "myApp-dev/letsencrypt-tls-cd": local SSL certificate myApp-dev/letsencrypt-tls-cd was not found
I0616 06:24:50.363514       7 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"myApp-dev", Name:"cm-acme-http-solver-9z88h", UID:"1b53f4dc-1b52-4f11-9cd0-6ffe1d0d9d40", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"1451371", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress myApp-dev/cm-acme-http-solver-9z88h
-- Nick

Remember to add DNS records for the domain, such as an A or CNAME record, so that traffic is routed to the Kubernetes load balancer.

For example, add one for cm-myApp-dev.dev and for any other subdomain you expose.
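To check whether such a record actually exists, you can reproduce the first step of the failing self check (the DNS lookup) yourself. This is only a rough sketch: the function names `resolve_host` and `challenge_url` are made up for illustration, and it uses your machine's resolver, which may answer differently from the resolver inside the cluster.

```python
import socket


def resolve_host(hostname, port=80):
    """Return the sorted IP addresses a hostname resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        # gaierror corresponds to the "no such host" case in the challenge status;
        # UnicodeError covers names that cannot be encoded as a hostname at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def challenge_url(hostname, token):
    """Build the http-01 URL that the self check requests (path as seen in the error)."""
    return "http://{}/.well-known/acme-challenge/{}".format(hostname, token)
```

Calling `resolve_host("cm-myApp-dev.dev")` and getting an empty list back means there is no usable A/CNAME record yet, so the challenge cannot succeed regardless of how the ingress is configured.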

If you found this via Google: this issue can also be caused by DNS caching in your Kubernetes cluster. In that case the error is transient, but in some contexts speed matters (e.g. if you are a managed service provider).

I wrote about it here, but in summary:

  • cert-manager would emit the "no such host" error for a while, and eventually succeed
  • my CoreDNS ConfigMap (in the kube-system namespace) specified local DNS resolvers and a 30-second cache
  • you can fix the delay by (1) removing the cache, and (2) pointing the resolver to Google DNS (or another resolver, depending on your needs)
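As a rough illustration of the two changes in the last bullet, here is a small sketch that applies them to a Corefile held in a string. The sample Corefile is an assumption modeled on a common default, not my actual ConfigMap, and `tune_corefile` is a hypothetical helper that only handles single-line `cache` and `forward .` directives.

```python
# Assumed sample Corefile; the real one lives in the coredns ConfigMap.
SAMPLE_COREFILE = """\
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
}
"""


def tune_corefile(corefile, upstreams=("8.8.8.8", "8.8.4.4")):
    """Drop single-line `cache` directives and point `forward .` at the given upstreams."""
    out = []
    for line in corefile.splitlines():
        words = line.split()
        indent = line[: len(line) - len(line.lstrip())]
        if words and words[0] == "cache":
            continue  # (1) remove the cache directive
        if words[:2] == ["forward", "."]:
            # (2) forward to explicit upstream resolvers instead of the local ones
            line = indent + "forward . " + " ".join(upstreams)
        out.append(line)
    return "\n".join(out) + "\n"
```

Printing `tune_corefile(SAMPLE_COREFILE)` shows the same Corefile without the `cache 30` line and with `forward . 8.8.8.8 8.8.4.4`. Whether dropping the cache is acceptable depends on how much DNS traffic your cluster generates.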

Hope this pointer is helpful to someone.

The problem was that the top-level domain name we were using was not valid, so the ingress didn't refer to a valid domain and threw an error. Creating a valid domain name and using it in our deployment solved the problem.

You can refer to this link to configure cert-manager on AKS. It will also create the TLS secret automatically once the certificate is validated and reaches the Ready state.

