Kube-proxy fails to retrieve node info - invalid nodeIP

6/15/2017

I've been trying to set up a Kubernetes cluster for a few months now, but I've had no luck so far.

I'm trying to set it up on 4 bare-metal PCs running CoreOS. I've just clean-installed everything again, but I run into the same problem as before. I'm following this tutorial. I think I've configured everything correctly, but I'm not 100% sure. When I reboot any of the machines, the kubelet and flanneld services are running, but I see the following errors for them when checking service status with systemctl status:

kubelet error: Process: 1246 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/run/kubelet-pod.uuid (code=exited, status=254)

flanneld error: Process: 1057 ExecStartPre=/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid (code=exited, status=254)

If I restart both services, they work, or at least look like they work - I get no errors.
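In case it's relevant, this is roughly how I check those failed ExecStartPre steps and bring the services back up (standard systemd/journalctl commands; the unit names are the ones from the tutorial):

# why did the rkt rm pre-step exit with status 254 on boot?
journalctl -b -u kubelet --no-pager | tail -n 50
journalctl -b -u flanneld --no-pager | tail -n 50

# after a manual restart both units report no errors
sudo systemctl restart flanneld kubelet
systemctl status flanneld kubelet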

Everything else seems to work fine, so the only remaining problem (I think) is the kube-proxy service on all the nodes.

If I run kubectl get pods I see all pods running:

$ kubectl get pods
NAME                                   READY     STATUS    RESTARTS   AGE
kube-apiserver-kubernetes-4            1/1       Running   4          6m
kube-controller-manager-kubernetes-4   1/1       Running   6          6m
kube-proxy-kubernetes-1                1/1       Running   4          18h
kube-proxy-kubernetes-2                1/1       Running   5          26m
kube-proxy-kubernetes-3                1/1       Running   4          19m
kube-proxy-kubernetes-4                1/1       Running   4          18h
kube-scheduler-kubernetes-4            1/1       Running   6          18h

The answer to this question suggests checking whether kubectl get node returns the same names that are registered on the kubelet. As far as I can tell from the logs, the nodes are registered correctly, and this is the output of kubectl get node:

$ kubectl get node
NAME           STATUS                        AGE       VERSION
kubernetes-1   Ready                         18h       v1.6.1+coreos.0
kubernetes-2   Ready                         36m       v1.6.1+coreos.0
kubernetes-3   Ready                         29m       v1.6.1+coreos.0
kubernetes-4   Ready,SchedulingDisabled      18h       v1.6.1+coreos.0

The tutorial I've used (linked above) suggests using --hostname-override, but with it I couldn't get node info on the master node (kubernetes-4) when I tried to curl it locally. So I removed the flag, and now I can get node info normally.
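For completeness, this is the kind of check I mean by "getting node info locally" (same insecure port and URL that kube-proxy uses in the logs further down; swap in the appropriate node name):

# on the master, against the apiserver's insecure local port
curl http://127.0.0.1:8080/api/v1/nodes/kubernetes-4

# or the full node list
curl http://127.0.0.1:8080/api/v1/nodes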

Someone suggested it might be a flannel problem and that I should check the flannel ports. Using netstat -lntu I get the following output:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN     
tcp        0      0 MASTER_IP:2379          0.0.0.0:*               LISTEN     
tcp        0      0 MASTER_IP:2380          0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN     
tcp6       0      0 :::4194                 :::*                    LISTEN     
tcp6       0      0 :::10250                :::*                    LISTEN     
tcp6       0      0 :::10251                :::*                    LISTEN     
tcp6       0      0 :::10252                :::*                    LISTEN     
tcp6       0      0 :::10255                :::*                    LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN     
tcp6       0      0 :::443                  :::*                    LISTEN     
udp        0      0 0.0.0.0:8472            0.0.0.0:*                     

So I assume the ports are fine?

Also, etcd2 works: etcdctl cluster-health shows that all nodes are healthy.
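These are the etcd checks I run (plain etcdctl v2 commands; /coreos.com/network/config is the key the tutorial tells flannel to use, so treat that path as an assumption about my setup):

# cluster membership and health
etcdctl member list
etcdctl cluster-health

# the overlay network config flannel reads from etcd
etcdctl get /coreos.com/network/config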

This is the part of the cloud-config that starts etcd2 on reboot; besides that, it only contains SSH keys and the node username/password/groups:

#cloud-config

coreos:
  etcd2:
    name: "kubernetes-4"
    initial-advertise-peer-urls: "http://NODE_IP:2380"
    listen-peer-urls: "http://NODE_IP:2380"
    listen-client-urls: "http://NODE_IP:2379,http://127.0.0.1:2379"
    advertise-client-urls: "http://NODE_IP:2379"
    initial-cluster-token: "etcd-cluster-1"
    initial-cluster: "kubernetes-4=http://MASTER_IP:2380,kubernetes-1=http://WORKER_1_IP:2380,kubernetes-2=http://WORKER_2_IP:2380,kubernetes-3=http://WORKER_3_IP:2380"
    initial-cluster-state: "new"
  units:
    - name: etcd2.service
      command: start
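To make sure cloud-config actually turns that section into environment/flags for the unit, I check what systemd ends up running (generic systemctl commands, nothing tutorial-specific):

# show the etcd2 unit plus any generated drop-ins
systemctl cat etcd2.service

# the ETCD_* environment the unit runs with
systemctl show etcd2.service --property=Environment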

This is the content of the /etc/flannel/options.env file:

FLANNELD_IFACE=NODE_IP
FLANNELD_ETCD_ENDPOINTS=http://MASTER_IP:2379,http://WORKER_1_IP:2379,http://WORKER_2_IP:2379,http://WORKER_3_IP:2379

The same endpoints are set under --etcd-servers in the kube-apiserver.yaml file.
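To double-check that flanneld really picks up that FLANNELD_IFACE/endpoint configuration, I look at the VXLAN device it creates and at its subnet file (/run/flannel/subnet.env is flannel's default subnet file location, so that path is an assumption about my install):

# the VXLAN device flannel creates (matches the 8472/udp socket in the netstat output above)
ip -d link show flannel.1

# the subnet/MTU flannel assigned to this node
cat /run/flannel/subnet.env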

Any ideas/suggestions on what the problem could be? If any details are missing, let me know and I'll add them to the post.

Edit: I forgot to include the kube-proxy logs.

Master node kube-proxy log:

$ kubectl logs kube-proxy-kubernetes-4
I0615 07:47:45.250631       1 server.go:225] Using iptables Proxier.
W0615 07:47:45.286923       1 server.go:469] Failed to retrieve node info: Get http://127.0.0.1:8080/api/v1/nodes/kubernetes-4: dial tcp 127.0.0.1:8080: getsockopt: connection refused
W0615 07:47:45.303576       1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:45.303593       1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:45.303646       1 server.go:249] Tearing down userspace rules.
E0615 07:47:45.357276       1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get http://127.0.0.1:8080/api/v1/endpoints?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E0615 07:47:45.357278       1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused

Worker node kube-proxy log (kubernetes-1):

$ kubectl logs kube-proxy-kubernetes-1
I0615 07:47:33.667025       1 server.go:225] Using iptables Proxier.
W0615 07:47:33.697387       1 server.go:469] Failed to retrieve node info: Get https://MASTER_IP/api/v1/nodes/kubernetes-1: dial tcp MASTER_IP:443: getsockopt: connection refused
W0615 07:47:33.712718       1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0615 07:47:33.712734       1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0615 07:47:33.712773       1 server.go:249] Tearing down userspace rules.
E0615 07:47:33.787122       1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:49: Failed to list *api.Endpoints: Get https://MASTER_IP/api/v1/endpoints?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
E0615 07:47:33.787144       1 reflector.go:201] k8s.io/kubernetes/pkg/proxy/config/api.go:46: Failed to list *api.Service: Get https://MASTER_IP/api/v1/services?resourceVersion=0: dial tcp MASTER_IP:443: getsockopt: connection refused
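Since kube-proxy complains that it can't reach the apiserver, this is how I check whether the apiserver is actually up at that moment (the k8s_ container-name prefix is just how kubelet names Docker containers, and the manifest path is the tutorial's, so take both as assumptions):

# on the master: is the static pod manifest in place and the container running?
ls /etc/kubernetes/manifests/
docker ps --filter "name=k8s_kube-apiserver"

# from a worker: is the secure port reachable at all?
curl -k https://MASTER_IP/version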
-- mythic
coreos
kube-proxy
kubectl
kubernetes

1 Answer

7/6/2017

Did you try the scripts here? These are the condensed versions of the tutorial you used, for various platforms. The scripts worked perfectly for me on bare metal for k8s v1.6.4. I have a tweaked script with better encryption.

kube-apiserver isn't running, which explains the error dial tcp 127.0.0.1:8080: getsockopt: connection refused. When I was debugging kube-apiserver, this is what I would do on the node:

  1. Remove /etc/kubernetes/manifests/kube-apiserver.yaml.
  2. Manually run a hyperkube container. Depending on your config, you will have to mount additional volumes (i.e. -v) to expose files to the container. Update the image version to the one you use.

    docker run --net=host -it -v /etc/kubernetes/ssl:/etc/kubernetes/ssl quay.io/coreos/hyperkube:v1.6.2_coreos.0

  3. The above command will launch a shell in the hyperkube container. Now, launch kube-apiserver with the flags in your kube-apiserver.yaml manifest. It should look similar to this example:

    /hyperkube apiserver \
      --bind-address=0.0.0.0 \
      --etcd-cafile=/etc/kubernetes/ssl/apiserver/ca.pem \
      --etcd-certfile=/etc/kubernetes/ssl/apiserver/client.pem \
      --etcd-keyfile=/etc/kubernetes/ssl/apiserver/client-key.pem \
      --etcd-servers=https://10.246.40.20:2379,https://10.246.40.21:2379,https://10.246.40.22:2379 \
      ...
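If it starts cleanly, a quick sanity check from another shell on the master (assuming you kept the insecure port bound to localhost:8080, as in the stock manifest) confirms the apiserver is serving before you put the manifest back:

    curl http://127.0.0.1:8080/healthz
    curl http://127.0.0.1:8080/version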

In any case, I suggest that you tear down the cluster and try the scripts first. It might just work out of the box.

-- Eugene Chow
Source: StackOverflow