I'm trying to generate an SSL certificate with certbot/certbot
docker container in kubernetes. I am using Job
controller for this purpose which looks as the most suitable option. When I run the standalone option, I get the following error:
Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)
I've made sure that this isn't due to misconfigured DNS entries by running a simple nginx container, and it resolves properly. Following is my Jobs
file:
apiVersion: batch/v1
kind: Job
metadata:
#labels:
# app: certbot-generator
name: certbot
spec:
template:
metadata:
labels:
app: certbot-generate
spec:
volumes:
- name: certs
containers:
- name: certbot
image: certbot/certbot
command: ["certbot"]
#command: ["yes"]
args: ["certonly", "--noninteractive", "--agree-tos", "--staging", "--standalone", "-d", "staging.ishankhare.com", "-m", "me@ishankhare.com"]
volumeMounts:
- name: certs
mountPath: "/etc/letsencrypt/"
#- name: certs
#mountPath: "/opt/"
ports:
- containerPort: 80
- containerPort: 443
restartPolicy: "OnFailure"
and my service:
apiVersion: v1
kind: Service
metadata:
name: certbot-lb
labels:
app: certbot-lb
spec:
type: LoadBalancer
loadBalancerIP: 35.189.170.149
ports:
- port: 80
name: "http"
protocol: TCP
- port: 443
name: "tls"
protocol: TCP
selector:
app: certbot-generator
the full error message is something like this:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for staging.ishankhare.com
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)
IMPORTANT NOTES:
- The following errors were reported by the server:
Domain: staging.ishankhare.com
Type: connection
Detail: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
To fix these errors, please make sure that your domain name was
entered correctly and the DNS A/AAAA record(s) for that domain
contain(s) the right IP address. Additionally, please check that
your computer has a publicly routable IP address and that no
firewalls are preventing the server from communicating with the
client. If you're using the webroot plugin, you should also verify
that you are serving files from the webroot path you provided.
- Your account credentials have been saved in your Certbot
configuration directory at /etc/letsencrypt. You should make a
secure backup of this folder now. This configuration directory will
also contain certificates and private keys obtained by Certbot so
making regular backups of this folder is ideal.
I've also tried running this as a simple Pod
but to no help. Although I still feel running it as a Job
to completion is the way to go.
First, be aware your Job
definition is valid, but the spec.template.metadata.labels.app: certbot-generate
value does not match with your Service
definition spec.selector.app: certbot-generator
: one is certbot-generate
, the second is certbot-generator
. So the pod run by the job controller is never added as an endpoint to the service.
Adjust one or the other, but they have to match, and that might just work :)
Although, I'm not sure using a Service
with a selector targeting short-lived pods from a Job
controller would work, neither with a simple Pod
as you tested. The certbot-randomId
pod created by the job (or whatever simple pod you create) takes about 15 seconds total to run/fail, and the HTTP validation challenge is triggered after just a few seconds of the pod life: it's not clear to me that would be enough time for kubernetes proxying to be already working between the service and the pod.
We can safely assume that the Service
is actually working because you mentioned that you tested DNS resolution, so you can easily ensure that's not a timing issue by adding a sleep 10
(or more!) to give more time for the pod to be added as an endpoint to the service and being proxied appropriately before the HTTP challenge is triggered by certbot. Just change your Job
command and args for those:
command: ["/bin/sh"]
args: ["-c", "sleep 10 && certbot certonly --noninteractive --agree-tos --staging --standalone -d staging.ishankhare.com -m me@ishankhare.com"]
And here too, that might just work :)
That being said, I'd warmly recommend you to use cert-manager which you can install easily through its stable Helm chart: the Certificate
custom resource that it introduces will store your certificate in a Secret
which will make it straightforward to reuse from whatever K8s resource, and it takes care of renewal automatically so you can just forget about it all.