Flannel doesn't seem to be working while configuring a CoreOS + Kubernetes cluster

3/7/2017

I've installed CoreOS on bare metal on 4 PCs and set static IPs for all machines. I followed the official CoreOS tutorial for CoreOS + Kubernetes.

Since I have a static network configuration and will have a multi-node etcd cluster, I followed a separate tutorial for bootstrapping etcd. I've run the script on all PCs, and etcdctl member list shows that all nodes (PCs) are present in the etcd cluster.

Then I moved to step 2 (Deploy Kubernetes Master Node(s)) and I've followed the instructions step by step.

I ran into the problem here:

curl -X PUT -d "value={\"Network\":\"$POD_NETWORK\",\"Backend\":{\"Type\":\"vxlan\"}}" "$ETCD_SERVER/v2/keys/coreos.com/network/config"

I've used the default POD_NETWORK (as stated in step 1) and one of the ETCD_ENDPOINTS as ETCD_SERVER. However, when I run the curl command, the connection is established but the reply is 404 page not found.

I assume the problem is in either flannel or etcd (probably etcd). Even if I just curl $ETCD_SERVER, I get page not found. After a few days I'm at a loss; I really don't know what could be wrong or how to fix it. If you need more info, please let me know. If you can just point me in the right direction so I can begin solving this problem, I'd appreciate it. Thanks

Edit: I've found out that if I curl "${ETCD_SERVER}/version" I get the correct reply ({"etcdserver":"2.3.7","etcdcluster":"2.3.0"}), if that helps.

Update: I've found out that the curl command didn't work because I had set ETCD_SERVER to the wrong port (2380 instead of 2379). That works now. However, the flanneld service still doesn't start; it fails with the error "Job for flanneld.service failed because the control process exited with error code." Here is the output of journalctl -xe:

-- Subject: Unit flannel-docker-opts.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flannel-docker-opts.service has failed.
-- 
-- The result is failed.
Mar 08 08:43:26 kubernetes-4 systemd[1]: flannel-docker-opts.service: Unit entered failed state.
Mar 08 08:43:26 kubernetes-4 systemd[1]: flannel-docker-opts.service: Failed with result 'exit-code'.
Mar 08 08:43:30 kubernetes-4 sudo[27594]:      kub : TTY=pts/2 ; PWD=/home/kub ; USER=root ; COMMAND=/bin/systemctl start flanneld
Mar 08 08:43:30 kubernetes-4 sudo[27594]: pam_unix(sudo:session): session opened for user root by kub(uid=0)
Mar 08 08:43:30 kubernetes-4 sudo[27594]: pam_systemd(sudo:session): Cannot create session: Already running in a session
Mar 08 08:43:36 kubernetes-4 systemd[1]: flanneld.service: Service hold-off time over, scheduling restart.
Mar 08 08:43:36 kubernetes-4 systemd[1]: Stopped flannel - Network fabric for containers (System Application Container).
-- Subject: Unit flanneld.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has finished shutting down.
Mar 08 08:43:36 kubernetes-4 systemd[1]: Starting flannel - Network fabric for containers (System Application Container)...
-- Subject: Unit flanneld.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has begun starting up.
Mar 08 08:43:36 kubernetes-4 rkt[27608]: rm: unable to resolve UUID from file: open /var/lib/coreos/flannel-wrapper.uuid: no such file or directory
Mar 08 08:43:36 kubernetes-4 rkt[27608]: rm: failed to remove one or more pods
Mar 08 08:43:36 kubernetes-4 flannel-wrapper[27625]: + exec /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/flannel-wrapper.uuid --trust-keys-from-https --mount volume=notify,target=/run/systemd/notify 
Mar 08 08:43:36 kubernetes-4 flannel-wrapper[27625]: run: discovery failed
Mar 08 08:43:36 kubernetes-4 systemd[1]: flanneld.service: Main process exited, code=exited, status=254/n/a
Mar 08 08:43:36 kubernetes-4 rkt[27652]: stop: unable to resolve UUID from file: open /var/lib/coreos/flannel-wrapper.uuid: no such file or directory
Mar 08 08:43:36 kubernetes-4 systemd[1]: Failed to start flannel - Network fabric for containers (System Application Container).
-- Subject: Unit flanneld.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flanneld.service has failed.
-- 
-- The result is failed.
Mar 08 08:43:36 kubernetes-4 systemd[1]: flanneld.service: Unit entered failed state.
Mar 08 08:43:36 kubernetes-4 systemd[1]: flanneld.service: Failed with result 'exit-code'.
Mar 08 08:43:36 kubernetes-4 sudo[27594]: pam_unix(sudo:session): session closed for user root
Mar 08 08:43:36 kubernetes-4 systemd[1]: Starting flannel docker export service - Network fabric for containers (System Application Container)...
-- Subject: Unit flannel-docker-opts.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flannel-docker-opts.service has begun starting up.
Mar 08 08:43:36 kubernetes-4 rkt[27659]: rm: unable to resolve UUID from file: open /var/lib/coreos/flannel-wrapper2.uuid: no such file or directory
Mar 08 08:43:36 kubernetes-4 rkt[27659]: rm: failed to remove one or more pods
Mar 08 08:43:36 kubernetes-4 flannel-wrapper[27674]: + exec /usr/bin/rkt run --uuid-file-save=/var/lib/coreos/flannel-wrapper2.uuid --trust-keys-from-https --net=host --volume run-flannel,kind=host,source
Mar 08 08:43:38 kubernetes-4 flannel-wrapper[27674]: run: discovery failed
Mar 08 08:43:38 kubernetes-4 systemd[1]: flannel-docker-opts.service: Main process exited, code=exited, status=254/n/a
Mar 08 08:43:38 kubernetes-4 systemd[1]: Failed to start flannel docker export service - Network fabric for containers (System Application Container).
-- Subject: Unit flannel-docker-opts.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flannel-docker-opts.service has failed.
-- 
-- The result is failed.
Mar 08 08:43:38 kubernetes-4 systemd[1]: flannel-docker-opts.service: Unit entered failed state.
Mar 08 08:43:38 kubernetes-4 systemd[1]: flannel-docker-opts.service: Failed with result 'exit-code'.
Mar 08 08:43:39 kubernetes-4 sudo[27708]:      kub : TTY=pts/2 ; PWD=/home/kub ; USER=root ; COMMAND=/bin/journalctl -xe
Mar 08 08:43:39 kubernetes-4 sudo[27708]: pam_unix(sudo:session): session opened for user root by kub(uid=0)
Mar 08 08:43:39 kubernetes-4 sudo[27708]: pam_systemd(sudo:session): Cannot create session: Already running in a session
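
In output like the above, the lines that matter come from rkt and flannel-wrapper (for example "run: discovery failed"); the sudo and systemd bookkeeping lines are noise. A minimal sketch for narrowing the journal down to those lines follows. The function name flannel_wrapper_lines is hypothetical, and the pattern assumes the "process[pid]:" log format shown above.

```shell
# Keep only journal lines emitted by rkt or flannel-wrapper.
# Assumption: lines look like "<date> <host> <process>[<pid>]: <message>",
# as in the journalctl output quoted in this question.
flannel_wrapper_lines() {
  grep -E ' (rkt|flannel-wrapper)\[[0-9]+\]: '
}
```

It reads journal text on standard input, so it could be used as: journalctl -b --no-pager | flannel_wrapper_lines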

Update 2: (added systemctl output for flanneld and flannel-docker-opts service)

systemctl cat flannel-docker-opts output:

# /usr/lib/systemd/system/flannel-docker-opts.service
[Unit]
Description=flannel docker export service - Network fabric for containers (System Application Container)
Documentation=https://github.com/coreos/flannel
PartOf=flanneld.service
Before=docker.service

[Service]
Type=oneshot
TimeoutStartSec=60

Environment="FLANNEL_IMAGE_TAG=v0.6.2"
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/lib/coreos/flannel-wrapper2.uuid"
Environment="FLANNEL_IMAGE_ARGS=--exec=/opt/bin/mk-docker-opts.sh"

ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper2.uuid
ExecStart=/usr/lib/coreos/flannel-wrapper -d /run/flannel/flannel_docker_opts.env -i
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/lib/coreos/flannel-wrapper2.uuid

[Install]
WantedBy=multi-user.target

systemctl cat flanneld output:

# /usr/lib/systemd/system/flanneld.service
[Unit]
Description=flannel - Network fabric for containers (System Application Container)
Documentation=https://github.com/coreos/flannel
After=etcd.service etcd2.service etcd-member.service
Before=docker.service flannel-docker-opts.service
Requires=flannel-docker-opts.service

[Service]
Type=notify
Restart=always
RestartSec=10s
LimitNOFILE=40000
LimitNPROC=1048576

Environment="FLANNEL_IMAGE_TAG=v0.6.2"
Environment="FLANNEL_OPTS=--ip-masq=true"
Environment="RKT_RUN_ARGS=--uuid-file-save=/var/lib/coreos/flannel-wrapper.uuid"
EnvironmentFile=-/run/flannel/options.env

ExecStartPre=/sbin/modprobe ip_tables
ExecStartPre=/usr/bin/mkdir --parents /var/lib/coreos /run/flannel
ExecStartPre=-/usr/bin/rkt rm --uuid-file=/var/lib/coreos/flannel-wrapper.uuid
ExecStart=/usr/lib/coreos/flannel-wrapper $FLANNEL_OPTS
ExecStop=-/usr/bin/rkt stop --uuid-file=/var/lib/coreos/flannel-wrapper.uuid

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/flanneld.service.d/40-ExecStartPre-symlink.conf
[Service]
ExecStartPre=/usr/bin/ln -sf /etc/flannel/options.env /run/flannel/options.env

Update 3: Using journalctl -xer I get a new error, if this is helpful:

Mar 09 08:39:15 kubernetes-4 locksmithd[1147]: Unlocking old locks failed: [etcd.service etcd2.service] are inactive. Retrying in 5m0s.
Mar 09 08:39:15 kubernetes-4 locksmithd[1147]: [etcd.service etcd2.service] are inactive
-- mythic
cluster-computing
coreos
etcd
kubernetes

1 Answer

3/8/2017

From one of the etcd nodes (or from any node, if you are running etcd in proxy mode on them), try using the etcdctl binary included in CoreOS to check cluster health:

etcdctl cluster-health

should show something like:

member ce2a822cea30bfca is healthy: got healthy result from http://10.129.69.201:2379
cluster is healthy
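
Before sending requests with curl, it may also be worth confirming that the URL points at etcd's client port: the 404 in the question came from using port 2380 (the peer port) instead of 2379. The following is a minimal sketch of such a check. The function name check_etcd_client_url is hypothetical, and it assumes the conventional ports mentioned in the question; adjust it if your cluster is configured differently.

```shell
# Warn when an etcd URL points at the peer port instead of the client port.
# Assumption: clients use 2379 and peers use 2380, as in the question.
check_etcd_client_url() {
  url=$1
  hostport=${url#*://}      # drop the scheme, e.g. "http://"
  hostport=${hostport%%/*}  # drop any path after the host and port
  case $hostport in
    *:*) port=${hostport##*:} ;;  # text after the last colon
    *)   port= ;;                 # no explicit port in the URL
  esac
  case $port in
    2379)
      echo "ok: $url uses the client port"
      return 0 ;;
    2380)
      echo "error: $url uses peer port 2380; try 2379 for the keys API" >&2
      return 1 ;;
    *)
      echo "warning: $url has unexpected port '$port'" >&2
      return 2 ;;
  esac
}
```

For example, check_etcd_client_url "$ETCD_SERVER" returns a non-zero status for a URL ending in :2380.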

Try also:

etcdctl set /coreos.com/network/config "{\"Network\":\"$POD_NETWORK\",\"Backend\":{\"Type\":\"vxlan\"}}"
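
When writing this value by hand, keep in mind that the shell only expands variables such as $POD_NETWORK inside double quotes; inside single quotes the literal text $POD_NETWORK would be stored in etcd. One way to avoid the quoting problem is to build the JSON with printf. This is a minimal sketch: the function name flannel_network_config is hypothetical, and the CIDR 10.2.0.0/16 used in the usage note is only an assumed example value.

```shell
# Build the flannel network config JSON for a given pod network CIDR.
# The key names ("Network", "Backend", "Type") and the vxlan backend
# are taken from the curl command in the question.
flannel_network_config() {
  pod_network=$1
  printf '{"Network":"%s","Backend":{"Type":"vxlan"}}\n' "$pod_network"
}
```

With POD_NETWORK=10.2.0.0/16, the value could then be written with: etcdctl set /coreos.com/network/config "$(flannel_network_config "$POD_NETWORK")"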
-- Vincent De Smet
Source: StackOverflow