I have an issue with my kubernetes node: it doesn't register with the kubernetes master.
I have seen lots of issues similar to my problem, but most of them were bugs that have since been fixed. The prerequisites and the different kubernetes components seem operational. I most likely have a bad configuration, but copying configurations that work for others doesn't solve it for me.
I'm following the Step by Step tutorial from the CoreOS team.
My configuration:
My procedure:
- I boot a kubernetes master
- start etcd
- start flanneld
- start docker after flanneld
- start kubelet
- it starts the apiserver (as a container)
- it starts the controller-manager (as a container)
- it starts the scheduler (as a container)
- it starts the proxy (as a container)
- I start a kubernetes node
- start etcd
- start flanneld
- start docker after flanneld
- start the kubelet
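The "start docker after flanneld" ordering in the steps above is typically enforced with a systemd drop-in; a minimal sketch, assuming the unit names and drop-in path used by the CoreOS guide (adjust to your setup):

```ini
# /etc/systemd/system/docker.service.d/40-flannel.conf
# Make docker wait for flanneld so containers land on the overlay network.
[Unit]
Requires=flanneld.service
After=flanneld.service
```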
ETCD2:
FLANNELD:
MASTER KUBELET:
KUBERNETES SEEMS TO RUN:
NODE KUBELET:
Here are the logs:
$ journalctl -fu kubelet --since=2012-01-01
-- Logs begin at Thu 2015-09-17 09:38:17 UTC. --
Sep 17 09:39:37 node1 systemd[1]: Starting Kubernetes Kubelet for Node...
Sep 17 09:39:37 node1 systemd[1]: Started Kubernetes Kubelet for Node.
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.080731 1634 manager.go:127] cAdvisor running in container: "/system.slice/kubelet.service"
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.081391 1634 fs.go:93] Filesystem partitions: map[/dev/sda9:{mountpoint:/ major:8 minor:9} /dev/sda3:{mountpoint:/usr major:8 minor:3} /dev/sda6:{mountpoint:/usr/share/oem major:8 minor:6}]
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.083078 1634 manager.go:156] Machine: {NumCores:1 CpuFrequency:3403222 MemoryCapacity:4048441344 MachineID:1c0a9b68c0044cfdb5024dc80a5cdec2 SystemUUID:35A45175-4822-4FFA-9CBF-ECC10430ED28 BootID:18baf9ac-73a9-42f3-9bc5-2dca985d03e9 Filesystems:[{Device:/dev/sda6 Capacity:113229824} {Device:/dev/sda9 Capacity:16718393344} {Device:/dev/sda3 Capacity:1031946240}] DiskMap:map[8:0:{Name:sda Major:8 Minor:0 Size:19818086400 Scheduler:cfq}] NetworkDevices:[{Name:eth0 MacAddress:08:00:27:8c:0a:cd Speed:0 Mtu:1500} {Name:eth1 MacAddress:08:00:27:bc:e6:70 Speed:0 Mtu:1500} {Name:eth2 MacAddress:08:00:27:b9:33:63 Speed:0 Mtu:1500} {Name:flannel0 MacAddress: Speed:10 Mtu:1472}] Topology:[{Id:0 Memory:4048441344 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2}]}] Caches:[{Size:6291456 Type:Unified Level:3}]}]}
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.087467 1634 manager.go:163] Version: {KernelVersion:4.1.6-coreos-r2 ContainerOsVersion:CoreOS 801.0.0 DockerVersion:1.8.1 CadvisorVersion:0.15.1}
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.087674 1634 plugins.go:69] No cloud provider specified.
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.087698 1634 docker.go:295] Connecting to docker on unix:///var/run/docker.sock
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.088720 1634 server.go:663] Adding manifest file: /etc/kubernetes/manifests
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.088734 1634 server.go:673] Watching apiserver
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.110463 1634 reflector.go:136] Failed to list *api.Node: Get http://192.168.1.88:8080/api/v1/nodes?fieldSelector=metadata.name%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.111317 1634 reflector.go:136] Failed to list *api.Service: Get http://192.168.1.88:8080/api/v1/services: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.111641 1634 reflector.go:136] Failed to list *api.Pod: Get http://192.168.1.88:8080/api/v1/pods?fieldSelector=spec.nodeName%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.219264 1634 plugins.go:56] Registering credential provider: .dockercfg
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.221429 1634 server.go:635] Started kubelet
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.221752 1634 kubelet.go:682] Image garbage collection failed: unable to find data for container /
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.230631 1634 kubelet.go:702] Running in container "/kubelet"
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.235396 1634 server.go:63] Starting to listen on 0.0.0.0:10250
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.257384 1634 event.go:194] Unable to write event: 'Post http://192.168.1.88:8080/api/v1/namespaces/default/events: dial tcp 192.168.1.88:8080: connection refused' (may retry after sleeping)
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.368996 1634 factory.go:226] System is using systemd
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.369627 1634 factory.go:234] Registering Docker factory
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.370640 1634 factory.go:89] Registering Raw factory
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.490377 1634 manager.go:946] Started watching for new ooms in manager
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.490733 1634 oomparser.go:183] oomparser using systemd
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.491323 1634 manager.go:243] Starting recovery of all containers
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.647835 1634 manager.go:248] Recovery completed
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.702130 1634 status_manager.go:76] Starting to sync pod status with apiserver
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.702375 1634 kubelet.go:1725] Starting kubelet main sync loop.
Sep 17 09:39:37 node1 kubelet[1634]: E0917 09:39:37.712658 1634 kubelet.go:1641] error getting node: node 192.168.1.31 not found
Sep 17 09:39:37 node1 kubelet[1634]: I0917 09:39:37.736035 1634 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
Sep 17 09:39:37 node1 kubelet[1634]: W0917 09:39:37.743037 1634 status_manager.go:80] Failed to updated pod status: error updating status for pod "kube-proxy-192.168.1.31_default": Get http://192.168.1.88:8080/api/v1/namespaces/default/pods/kube-proxy-192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:38 node1 kubelet[1634]: E0917 09:39:38.113116 1634 reflector.go:136] Failed to list *api.Pod: Get http://192.168.1.88:8080/api/v1/pods?fieldSelector=spec.nodeName%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:38 node1 kubelet[1634]: E0917 09:39:38.113170 1634 reflector.go:136] Failed to list *api.Service: Get http://192.168.1.88:8080/api/v1/services: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:38 node1 kubelet[1634]: E0917 09:39:38.113191 1634 reflector.go:136] Failed to list *api.Node: Get http://192.168.1.88:8080/api/v1/nodes?fieldSelector=metadata.name%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:39 node1 kubelet[1634]: E0917 09:39:39.114141 1634 reflector.go:136] Failed to list *api.Node: Get http://192.168.1.88:8080/api/v1/nodes?fieldSelector=metadata.name%3D192.168.1.31: dial tcp 192.168.1.88:8080: connection refused
Sep 17 09:39:39 node1 kubelet[1634]: E0917 09:39:39.114207 1634 reflector.go:136] Failed to list *api.Service: Get http://192.168.1.88:8080/api/v1/services: dial tcp 192.168.1.88:8080: connection refused
There are lots of messages of that kind: 192.168.1.88:8080: connection refused
When I look at the registered nodes:
$ kubectl get nodes
NAME LABELS STATUS
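To see where the apiserver actually answers from the node, a quick curl check can help (the master IP 192.168.1.88 is the one from the logs above; no output is shown since it depends on the setup):

```shell
# The endpoint the kubelet is trying -- the insecure port:
curl -v http://192.168.1.88:8080/version
# The secure port; -k skips certificate verification for a quick reachability test:
curl -kv https://192.168.1.88/version
```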
My guess is that the apiserver credentials are not configured correctly, since the local kubelet can register but remote ones cannot.
So here is my apiserver configuration:
$ cat /etc/kubernetes/manifests/kube-apiserver.yml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: gcr.io/google_containers/hyperkube:v1.0.6
    command:
    - /hyperkube
    - apiserver
    - --bind-address=0.0.0.0
    - --etcd_servers=http://192.168.1.88:2379
    - --allow-privileged=true
    - --service-cluster-ip-range=10.3.0.0/24
    - --secure_port=443
    - --advertise-address=192.168.1.88
    - --admission-control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
    - --tls-cert-file=/etc/kubernetes/ssl/apiserver.pem
    - --tls-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --client-ca-file=/etc/kubernetes/ssl/ca.pem
    - --service-account-key-file=/etc/kubernetes/ssl/apiserver-key.pem
    - --cloud-provider=
    ports:
    - containerPort: 443
      hostPort: 443
      name: https
    - containerPort: 7080
      hostPort: 7080
      name: http
    - containerPort: 8080
      hostPort: 8080
      name: local
    volumeMounts:
    - mountPath: /etc/kubernetes/ssl
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host
The certificates are present:
core@master1 ~ $ ls -l /etc/kubernetes/ssl/
total 40
-rw-r--r-- 1 core core 1675 Sep 17 09:31 apiserver-key.pem
-rw-r--r-- 1 core core 1099 Sep 17 09:31 apiserver.pem
-rw-r--r-- 1 core core 1090 Sep 17 09:31 ca.pem
And the logs from the apiserver:
I0917 09:33:48.692147 1 plugins.go:69] No cloud provider specified.
I0917 09:33:49.049701 1 master.go:273] Node port range unspecified. Defaulting to 30000-32767.
E0917 09:33:49.080829 1 reflector.go:136] Failed to list *api.ResourceQuota: Get http://127.0.0.1:8080/api/v1/resourcequotas: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.080955 1 reflector.go:136] Failed to list *api.Secret: Get http://127.0.0.1:8080/api/v1/secrets?fieldSelector=type%3Dkubernetes.io%2Fservice-account-token: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081032 1 reflector.go:136] Failed to list *api.ServiceAccount: Get http://127.0.0.1:8080/api/v1/serviceaccounts: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081075 1 reflector.go:136] Failed to list *api.LimitRange: Get http://127.0.0.1:8080/api/v1/limitranges: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081141 1 reflector.go:136] Failed to list *api.Namespace: Get http://127.0.0.1:8080/api/v1/namespaces: dial tcp 127.0.0.1:8080: connection refused
E0917 09:33:49.081186 1 reflector.go:136] Failed to list *api.Namespace: Get http://127.0.0.1:8080/api/v1/namespaces: dial tcp 127.0.0.1:8080: connection refused
[restful] 2015/09/17 09:33:49 log.go:30: [restful/swagger] listing is available at https://192.168.1.88:443/swaggerapi/
[restful] 2015/09/17 09:33:49 log.go:30: [restful/swagger] https://192.168.1.88:443/swaggerui/ is mapped to folder /swagger-ui/
W0917 09:33:49.132239 1 controller.go:212] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default 0 0001-01-01 00:00:00 +0000 UTC <nil> map[] map[]} [{[{192.168.1.88 <nil>}] [{ 443 TCP}]}]}
I0917 09:33:49.148355 1 server.go:441] Serving securely on 0.0.0.0:443
I0917 09:33:49.148404 1 server.go:483] Serving insecurely on 127.0.0.1:8080
Per the last two lines of your apiserver log, it is listening on 0.0.0.0 (all interfaces) on port 443, and on 127.0.0.1 (localhost only) on port 8080.
From your kubelet's log output, it is trying to reach the apiserver on 192.168.1.88:8080, which the apiserver is not listening on.
Remote kubelets should use "https://192.168.1.88" (the public interface, via port 443) to connect to the apiserver.
Depending on your TLS configuration, you will likely also need to configure a kubeconfig for the kubelet that uses the proper TLS certificates; this is covered in: https://coreos.com/kubernetes/docs/latest/deploy-workers.html#set-up-kubeconfig
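As a sketch, a worker kubeconfig along the lines of the CoreOS guide could look like this (the worker.pem/worker-key.pem file names are assumptions; substitute whichever client certificates you generated for the node):

```yaml
# /etc/kubernetes/worker-kubeconfig.yaml -- sketch only; the client
# certificate file names below are assumptions, not taken from your setup.
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    server: https://192.168.1.88
    certificate-authority: /etc/kubernetes/ssl/ca.pem
users:
- name: kubelet
  user:
    client-certificate: /etc/kubernetes/ssl/worker.pem
    client-key: /etc/kubernetes/ssl/worker-key.pem
contexts:
- context:
    cluster: local
    user: kubelet
  name: kubelet-context
current-context: kubelet-context
```

The node's kubelet would then be pointed at the secure endpoint, e.g. --api-servers=https://192.168.1.88 together with --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml (flag names as of the v1.0.x kubelet).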