Kubernetes certbot standalone not working

10/20/2018

I'm trying to generate an SSL certificate with certbot/certbot docker container in kubernetes. I am using Job controller for this purpose which looks as the most suitable option. When I run the standalone option, I get the following error:

Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)

I've made sure that this isn't due to misconfigured DNS entries by running a simple nginx container, and it resolves properly. Following is my Jobs file:

apiVersion: batch/v1
kind: Job
metadata:
  #labels:
  #  app: certbot-generator
  name: certbot
spec:
  template:
    metadata:
      labels:
        app: certbot-generate
    spec:
      volumes:
        - name: certs
      containers:
        - name: certbot
          image: certbot/certbot
          command: ["certbot"]
          #command: ["yes"]
          args: ["certonly", "--noninteractive", "--agree-tos", "--staging", "--standalone", "-d", "staging.ishankhare.com", "-m", "me@ishankhare.com"]

          volumeMounts:
            - name: certs
              mountPath: "/etc/letsencrypt/"
              #- name: certs
              #mountPath: "/opt/"
          ports:
            - containerPort: 80
            - containerPort: 443
      restartPolicy: "OnFailure"

and my service:

apiVersion: v1
kind: Service
metadata:
  name: certbot-lb
  labels:
    app: certbot-lb
spec:
  type: LoadBalancer
  loadBalancerIP: 35.189.170.149
  ports:
    - port: 80
      name: "http"
      protocol: TCP
    - port: 443
      name: "tls"
      protocol: TCP
  selector:
    app: certbot-generator

the full error message is something like this:

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for staging.ishankhare.com
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)
IMPORTANT NOTES:
 - The following errors were reported by the server:

   Domain: staging.ishankhare.com
   Type:   connection
   Detail: Fetching
   http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
   Timeout during connect (likely firewall problem)

   To fix these errors, please make sure that your domain name was
   entered correctly and the DNS A/AAAA record(s) for that domain
   contain(s) the right IP address. Additionally, please check that
   your computer has a publicly routable IP address and that no
   firewalls are preventing the server from communicating with the
   client. If you're using the webroot plugin, you should also verify
   that you are serving files from the webroot path you provided.
 - Your account credentials have been saved in your Certbot
   configuration directory at /etc/letsencrypt. You should make a
   secure backup of this folder now. This configuration directory will
   also contain certificates and private keys obtained by Certbot so
   making regular backups of this folder is ideal.

I've also tried running this as a simple Pod but to no help. Although I still feel running it as a Job to completion is the way to go.

-- Ishan Khare
certbot
docker
kubernetes
lets-encrypt

1 Answer

11/20/2018

First, be aware your Job definition is valid, but the spec.template.metadata.labels.app: certbot-generate value does not match with your Service definition spec.selector.app: certbot-generator: one is certbot-generate, the second is certbot-generator. So the pod run by the job controller is never added as an endpoint to the service.

Adjust one or the other, but they have to match, and that might just work :)

Although, I'm not sure using a Service with a selector targeting short-lived pods from a Job controller would work, neither with a simple Pod as you tested. The certbot-randomId pod created by the job (or whatever simple pod you create) takes about 15 seconds total to run/fail, and the HTTP validation challenge is triggered after just a few seconds of the pod life: it's not clear to me that would be enough time for kubernetes proxying to be already working between the service and the pod.

We can safely assume that the Service is actually working because you mentioned that you tested DNS resolution, so you can easily ensure that's not a timing issue by adding a sleep 10 (or more!) to give more time for the pod to be added as an endpoint to the service and being proxied appropriately before the HTTP challenge is triggered by certbot. Just change your Job command and args for those:

command: ["/bin/sh"]
args: ["-c", "sleep 10 && certbot certonly --noninteractive --agree-tos --staging --standalone -d staging.ishankhare.com -m me@ishankhare.com"]

And here too, that might just work :)


That being said, I'd warmly recommend you to use cert-manager which you can install easily through its stable Helm chart: the Certificate custom resource that it introduces will store your certificate in a Secret which will make it straightforward to reuse from whatever K8s resource, and it takes care of renewal automatically so you can just forget about it all.

-- Clorichel
Source: StackOverflow