Using a previous version of Kubernetes (0.16.x), I was able to create a cluster of CoreOS-based VMs on GCE that could generate external network load balancers for services. With the release of Kubernetes v1, the configuration necessary for this functionality seems to have changed. Could anyone offer any advice, or point me toward documentation that might help me with this issue?
I suspect the problem is related to IP addressing/naming, as I was previously using kube-register to handle this, and that component no longer seems necessary. My current configuration creates internal service load balancers without issue, and will even create external service load balancers, but those are only visible through the gcloud UI and are not registered or displayed in kubectl output. Unfortunately, the external IPs generated do not actually proxy traffic through either.
The kube-controller-manager service log looks like this:
Aug 05 12:15:42 europe-west1-b-k8s-master.c.staging-infrastructure.internal hyperkube[1604]: I0805 12:15:42.516360 1604 gce.go:515] Firewall doesn't exist, moving on to deleting target pool.
Aug 05 12:15:42 europe-west1-b-k8s-master.c.staging-infrastructure.internal hyperkube[1604]: E0805 12:15:42.516492 1604 servicecontroller.go:171] Failed to process service delta. Retrying: googleapi: Error 404: The resource 'projects/staging-infrastructure/global/firewalls/k8s-fw-a4db9328c3b6b11e5ab9f42010af0397' was not found, notFound
Aug 05 12:15:42 europe-west1-b-k8s-master.c.staging-infrastructure.internal hyperkube[1604]: I0805 12:15:42.516539 1604 servicecontroller.go:601] Successfully updated 2 out of 2 external load balancers to direct traffic to the updated set of nodes
Aug 05 12:16:07 europe-west1-b-k8s-master.c.staging-infrastructure.internal hyperkube[1604]: E0805 12:16:07.620094 1604 servicecontroller.go:171] Failed to process service delta. Retrying: failed to create external load balancer for service default/autobot-cache-graph: googleapi: Error 400: Invalid value for field 'resource.targetTags[0]': 'europe-west1-b-k8s-node-0.c.staging-infrastructure.int'. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)', invalid
Aug 05 12:16:12 europe-west1-b-k8s-master.c.staging-infrastructure.internal hyperkube[1604]: I0805 12:16:12.804512 1604 servicecontroller.go:275] Deleting old LB for previously uncached service default/autobot-cache-graph whose endpoint &{[{146.148.114.97 }]} doesn't match the service's desired IPs []
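The 400 error is the telling one: the controller is handing GCE the node's fully qualified domain name as a firewall target tag, but GCE tags must match the regex quoted in the message (lowercase letters, digits, and hyphens only — no dots). A quick sketch to confirm which names pass (the pattern is copied from the log; the node names are the ones from this cluster):

```python
import re

# GCE firewall target tags must match this pattern (from the 400 error above):
# lowercase letters, digits and hyphens, starting with a letter, max 63 chars.
TAG_RE = re.compile(r'^[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?$')

# The FQDN the controller submitted is rejected because of the dots:
print(bool(TAG_RE.match('europe-west1-b-k8s-node-0.c.staging-infrastructure.int')))  # False

# The bare instance name would be accepted:
print(bool(TAG_RE.match('europe-west1-b-k8s-node-0')))  # True
```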
Here is the config I am using (download, chmod, etc. omitted for clarity).
On the master:
- name: kube-apiserver.service
  command: start
  content: |
    [Unit]
    Description=Kubernetes API Server
    Requires=setup-network-environment.service etcd.service generate-serviceaccount-key.service
    After=setup-network-environment.service etcd.service generate-serviceaccount-key.service
    [Service]
    EnvironmentFile=/etc/network-environment
    ExecStart=/opt/bin/hyperkube apiserver \
      --cloud-provider=gce \
      --service_account_key_file=/opt/bin/kube-serviceaccount.key \
      --service_account_lookup=false \
      --admission_control=NamespaceLifecycle,NamespaceAutoProvision,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota \
      --runtime_config=api/v1 \
      --allow_privileged=true \
      --insecure_bind_address=0.0.0.0 \
      --insecure_port=8080 \
      --kubelet_https=true \
      --secure_port=6443 \
      --service-cluster-ip-range=10.100.0.0/16 \
      --etcd_servers=http://127.0.0.1:2379 \
      --bind-address=${DEFAULT_IPV4} \
      --logtostderr=true
    Restart=always
    RestartSec=10
- name: kube-controller-manager.service
  command: start
  content: |
    [Unit]
    Description=Kubernetes Controller Manager
    Requires=kube-apiserver.service
    After=kube-apiserver.service
    [Service]
    ExecStart=/opt/bin/hyperkube controller-manager \
      --cloud-provider=gce \
      --service_account_private_key_file=/opt/bin/kube-serviceaccount.key \
      --master=127.0.0.1:8080 \
      --logtostderr=true
    Restart=always
    RestartSec=10
- name: kube-scheduler.service
  command: start
  content: |
    [Unit]
    Description=Kubernetes Scheduler
    Requires=kube-apiserver.service
    After=kube-apiserver.service
    [Service]
    ExecStart=/opt/bin/hyperkube scheduler --master=127.0.0.1:8080
    Restart=always
    RestartSec=10
And on the node:
- name: kubelet.service
  command: start
  content: |
    [Unit]
    Description=Kubernetes Kubelet
    Requires=setup-network-environment.service
    After=setup-network-environment.service
    [Service]
    EnvironmentFile=/etc/network-environment
    WorkingDirectory=/root
    ExecStart=/opt/bin/hyperkube kubelet \
      --cloud-provider=gce \
      --address=0.0.0.0 \
      --port=10250 \
      --api_servers=<master_ip>:8080 \
      --allow_privileged=true \
      --logtostderr=true \
      --cadvisor_port=4194 \
      --healthz_bind_address=0.0.0.0 \
      --healthz_port=10248
    Restart=always
    RestartSec=10
- name: kube-proxy.service
  command: start
  content: |
    [Unit]
    Description=Kubernetes Proxy
    Requires=setup-network-environment.service
    After=setup-network-environment.service
    [Service]
    ExecStart=/opt/bin/hyperkube proxy \
      --master=<master_ip>:8080 \
      --logtostderr=true
    Restart=always
    RestartSec=10
To me it looks like a mismatch between naming and IP, but I'm not sure how to adjust my config to resolve it. Any guidance greatly appreciated.
How did you create the nodes in your cluster? We've seen another instance of this issue caused by bugs in the cluster bootstrapping script that was used, which didn't apply the expected node names and tags.
If you recreate your cluster using the following two commands, as recommended on the issue linked above, creating load balancers should work for you:
export OS_DISTRIBUTION=coreos
cluster/kube-up.sh
Otherwise, you may need to wait for the issues to be fixed.
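For reference, the reason properly bootstrapped clusters avoid the 400 error in the question is that nodes register under their short instance names rather than their FQDNs, and the short name satisfies GCE's tag constraint. A minimal sketch of that difference (the truncation helper is illustrative, not kube-up.sh's actual code):

```python
import re

# GCE's tag pattern, as quoted in the 400 error in the question.
TAG_RE = re.compile(r'^[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?$')

def short_name(fqdn):
    """Keep only the host portion of an FQDN (everything before the first dot)."""
    return fqdn.split('.', 1)[0]

fqdn = 'europe-west1-b-k8s-node-0.c.staging-infrastructure.int'
print(bool(TAG_RE.match(fqdn)))              # False: dots are not allowed in tags
print(bool(TAG_RE.match(short_name(fqdn))))  # True: the bare instance name passes
```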