Nodes get no certificates when trying to join a cluster with `kubeadm`

1/20/2020

I was able to bootstrap the master node for a Kubernetes deployment using kubeadm, but I'm getting errors during the kubelet-start phase of kubeadm join:

kubeadm --v=5 join phase kubelet-start 192.168.1.198:6443 --token x4drpl.ie61lm4vrqyig5vg     --discovery-token-ca-cert-hash sha256:hjksdhjsakdhjsakdhajdka --node-name media-server         
W0118 23:53:28.414247   22327 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
I0118 23:53:28.414383   22327 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0118 23:53:28.414476   22327 join.go:441] [preflight] Discovering cluster-info
I0118 23:53:28.414744   22327 token.go:188] [discovery] Trying to connect to API Server "192.168.1.198:6443"
I0118 23:53:28.416434   22327 token.go:73] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.198:6443"
I0118 23:53:28.433749   22327 token.go:134] [discovery] Requesting info from "https://192.168.1.198:6443" again to validate TLS against the pinned public key
I0118 23:53:28.446096   22327 token.go:152] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.1.198:6443"
I0118 23:53:28.446130   22327 token.go:194] [discovery] Successfully established connection with API Server "192.168.1.198:6443"
I0118 23:53:28.446163   22327 discovery.go:51] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0118 23:53:28.446186   22327 join.go:455] [preflight] Fetching init configuration
I0118 23:53:28.446197   22327 join.go:493] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0118 23:53:28.461658   22327 interface.go:400] Looking for default routes with IPv4 addresses
I0118 23:53:28.461682   22327 interface.go:405] Default route transits interface "eno2"
I0118 23:53:28.462107   22327 interface.go:208] Interface eno2 is up
I0118 23:53:28.462180   22327 interface.go:256] Interface "eno2" has 2 addresses :[192.168.1.113/24 fe80::225:90ff:febe:5aaf/64].
I0118 23:53:28.462205   22327 interface.go:223] Checking addr  192.168.1.113/24.
I0118 23:53:28.462217   22327 interface.go:230] IP found 192.168.1.113
I0118 23:53:28.462228   22327 interface.go:262] Found valid IPv4 address 192.168.1.113 for interface "eno2".
I0118 23:53:28.462238   22327 interface.go:411] Found active IP 192.168.1.113 
I0118 23:53:28.462284   22327 kubelet.go:107] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0118 23:53:28.463384   22327 kubelet.go:115] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0118 23:53:28.465766   22327 kubelet.go:133] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
timed out waiting for the condition
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).BindToCommand.func1.1
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:348
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
        /workspace/anago-v1.17.1-beta.0.42+d224476cd0730b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357

Now, looking at the kubelet logs with journalctl -xeu kubelet:

Jan 19 00:04:38 media-server systemd[23817]: kubelet.service: Executing: /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --cgroup-driver=cgroupfs
Jan 19 00:04:38 media-server kubelet[23817]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 19 00:04:38 media-server kubelet[23817]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.706834   23817 server.go:416] Version: v1.17.1
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.707261   23817 plugins.go:100] No cloud provider specified.
Jan 19 00:04:38 media-server kubelet[23817]: I0119 00:04:38.707304   23817 server.go:821] Client rotation is on, will bootstrap in background
Jan 19 00:04:38 media-server kubelet[23817]: E0119 00:04:38.709106   23817 bootstrap.go:240] unable to read existing bootstrap client config: invalid configuration: [unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory, unable to read client-key /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory]
Jan 19 00:04:38 media-server kubelet[23817]: F0119 00:04:38.709153   23817 server.go:273] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jan 19 00:04:38 media-server systemd[1]: kubelet.service: Child 23817 belongs to kubelet.service.
Jan 19 00:04:38 media-server systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION

Interestingly, no kubelet-client-current.pem is found on the worker trying to join; in fact, the only files inside /var/lib/kubelet/pki are kubelet.{crt,key}.
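
Listing the directory on the worker confirms this (for reference, kubelet-client-current.pem would normally only appear after a successful TLS bootstrap):

~# ls /var/lib/kubelet/pki
kubelet.crt  kubelet.key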

If I run the following command on the node trying to join, it reports that all certificates are missing:

# kubeadm alpha certs check-expiration
W0119 00:06:35.088034   24017 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0119 00:06:35.088082   24017 validation.go:28] Cannot validate kubelet config - no validator is available
CERTIFICATE                          EXPIRES   RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
!MISSING! admin.conf                                                                   
!MISSING! apiserver                                                                    
!MISSING! apiserver-etcd-client                                                        
!MISSING! apiserver-kubelet-client                                                     
!MISSING! controller-manager.conf                                                      
!MISSING! etcd-healthcheck-client                                                      
!MISSING! etcd-peer                                                                    
!MISSING! etcd-server                                                                  
!MISSING! front-proxy-client                                                           
!MISSING! scheduler.conf
Error checking external CA condition for ca certificate authority: failure loading certificate for API server: failed to load certificate: couldn't load the certificate file /etc/kubernetes/pki/apiserver.crt: open /etc/kubernetes/pki/apiserver.crt: no such file or directory
To see the stack trace of this error execute with --v=5 or higher

The only file in /etc/kubernetes/pki is ca.crt.

Both master and worker run kubeadm and kubelet version 1.17.1, so a version mismatch doesn't look likely.

Something possibly unrelated, but also prone to causing errors: both worker and master nodes have Docker set up with Cgroup Driver: systemd, yet for some reason the kubelet is being passed --cgroup-driver=cgroupfs.
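
One way to check where each side gets its driver (a rough sketch, assuming the stock kubeadm paths shown elsewhere in this question) is to ask Docker directly and then grep every file the kubelet drop-in sources; the kubelet command line in the journal above carries the flag twice, so one of these should reveal the stray cgroupfs setting:

~# docker info 2>/dev/null | grep -i "cgroup driver"
~# grep -s cgroup-driver /var/lib/kubelet/kubeadm-flags.env \
     /etc/default/kubelet \
     /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
~# grep -s cgroupDriver /var/lib/kubelet/config.yaml

Note that the kubelet falls back to cgroupfs when neither a --cgroup-driver flag nor a cgroupDriver entry in config.yaml sets it explicitly.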

What could be causing this issue? And, more importantly, how do I fix it so I can successfully join nodes to the master?

Edit: more information

On the worker, the systemd files are:

~# cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
#Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

The systemd unit for kubelet:

~# cat /etc/systemd/system/multi-user.target.wants/kubelet.service 
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/home/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
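
Since the kubelet's effective command line is assembled from this unit plus the 10-kubeadm.conf drop-in above, systemctl cat kubelet prints the merged result, and any edit needs a reload and restart to take effect. A minimal sketch:

~# systemctl cat kubelet        # unit plus all drop-ins, as systemd resolves them
~# systemctl daemon-reload      # pick up edits to the unit or drop-in
~# systemctl restart kubelet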

And the kubelet config.yaml:

~# cat /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Contents of /var/lib/kubelet/kubeadm-flags.env on the worker node versus the master node:

worker:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1"

master:

KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-conf=/run/systemd/resolve/resolv.conf"

Both master and worker have the same Docker version (18.09), and their config files are identical:

~$ cat /etc/docker/daemon.json
{
 "exec-opts": ["native.cgroupdriver=systemd"],
 "data-root": "/opt/var/docker/"
}
-- lurscher
kubeadm
kubernetes

1 Answer

1/27/2020

I believe the kubelet service on the worker node failed to authenticate to the API server because of an expired bootstrap token. Can you regenerate the token on the master node and run the kubeadm join command on the worker node again?

CMD:  kubeadm token create --print-join-command
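
For example (a sketch; <token> and <hash> stand in for the real values the command prints):

master:~# kubeadm token list                          # the TTL column shows whether the old token expired
master:~# kubeadm token create --print-join-command
kubeadm join 192.168.1.198:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
worker:~# kubeadm join 192.168.1.198:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

By default, kubeadm bootstrap tokens expire after 24 hours, which would fit a join attempt made some time after kubeadm init.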
-- Subramanian Manickam
Source: StackOverflow