kube-controller-manager outputs an error "cannot change NodeName"

11/17/2016

I run Kubernetes on AWS with CoreOS and a flannel VXLAN network (I followed this guide: https://coreos.com/kubernetes/docs/latest/getting-started.html). The k8s version is 1.4.6.

I have the following node-exporter DaemonSet:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    app: node-exporter
    tier: monitor
    category: platform
spec:
  template:
    metadata:
      labels:
        app: node-exporter
        tier: monitor
        category: platform
      name: node-exporter
    spec:
      containers:
      - image: prom/node-exporter:0.12.0
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true
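
For reference, a rough way to deploy this and confirm a pod lands on each node (a sketch; the file name node-exporter-ds.yaml is just an example):

kubectl create -f node-exporter-ds.yaml
kubectl get daemonset node-exporter
kubectl get pods -l app=node-exporter -o wide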

When I run this, kube-controller-manager repeatedly logs the following error:

E1117 18:31:23.197206       1 endpoints_controller.go:513]
Endpoints "node-exporter" is invalid:
[subsets[0].addresses[0].nodeName: Forbidden: Cannot change NodeName for 172.17.64.5 to ip-172-17-64-5.ec2.internal,
subsets[0].addresses[1].nodeName: Forbidden: Cannot change NodeName for 172.17.64.6 to ip-172-17-64-6.ec2.internal,
subsets[0].addresses[2].nodeName: Forbidden: Cannot change NodeName for 172.17.80.5 to ip-172-17-80-5.ec2.internal,
subsets[0].addresses[3].nodeName: Forbidden: Cannot change NodeName for 172.17.80.6 to ip-172-17-80-6.ec2.internal,
subsets[0].addresses[4].nodeName: Forbidden: Cannot change NodeName for 172.17.96.6 to ip-172-17-96-6.ec2.internal]
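
The Endpoints object the controller is fighting over can be inspected like this (a sketch, assuming kubectl access to the cluster):

kubectl get endpoints node-exporter -o yaml

The subsets[].addresses[].nodeName fields it prints are exactly the ones named in the error.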

Just for information: despite this error message, node-exporter is still accessible on, e.g., 172.17.96.6:9100. My nodes, including the k8s master, are in a private network.

But these messages are logged so frequently that it becomes hard to spot other logs in our log console. How can I resolve this error?

Because I built my k8s cluster from scratch, the --cloud-provider=aws flag was not enabled at first; I turned it on recently, but I'm not sure whether that is related to this issue.
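
One quick way to check whether enabling the flag changed the node names themselves (a sketch; the expected names come from the error above):

kubectl get nodes

The NAME column shows either the IPs (172.17.64.5, ...) or the EC2 hostnames (ip-172-17-64-5.ec2.internal, ...).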

-- Norio Akagi
amazon-web-services
coreos
kubernetes

1 Answer

11/17/2016

It looks like this was caused by another of my manifest files:

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  labels:
    app: node-exporter
    tier: monitor
    category: platform
  annotations:
    prometheus.io/scrape: 'true'
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP

I thought this Service was necessary to expose the node-exporter DaemonSet above, but it seems to introduce some sort of conflict when hostNetwork: true is set in the DaemonSet (actually, pod) manifest. I'm not 100% certain, but after I deleted this Service the error disappeared, and I can still access 172.17.96.6:9100 from outside the k8s cluster.
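
Roughly, the check looks like this (a sketch; the IP is one of the node addresses from the error in the question, and /metrics is node-exporter's default metrics path):

kubectl delete service node-exporter
curl http://172.17.96.6:9100/metrics

After that, the "Cannot change NodeName" messages no longer appear in the kube-controller-manager logs.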

I just followed this post when setting up Prometheus and node-exporter: https://coreos.com/blog/prometheus-and-kubernetes-up-and-running.html

In case others run into the same problem, I'm leaving my comment here.

-- Norio Akagi
Source: StackOverflow