K3s node keeps failing and I’m not sure what is causing it

7/4/2021

So I have been running k3s on a stack of Raspberry Pi 4s and 3s for a while, but one node always fails. Here is my setup:

Servers: 1x Raspberry Pi 4 (8 GB) and 2x Raspberry Pi 4 (4 GB)
Workers: 3x Raspberry Pi 3B

My second Raspberry Pi 4 (4 GB) keeps failing. All 6 nodes are on gigabit Ethernet and have Samsung 500 GB SSDs.

I’m running an HA environment with an external SQL database hosted on my NAS (192.168.1.200), as shown below.
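
For reference, each server node was installed pointing at that external datastore, roughly like this (user/pass are placeholders, and the exact install command may have differed slightly):

    curl -sfL https://get.k3s.io | sh -s - server \
      --tls-san 192.168.1.100 \
      --datastore-endpoint "mysql://user:pass@tcp(192.168.1.200:3306)/k3s" \
      --disable traefik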

So here are some of the errors I’m getting.

When I run sudo systemctl status k3s I get (user/pass changed for privacy):

    k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Sun 2021-07-04 01:59:29 BST; 609ms ago
     Docs: https://k3s.io
  Process: 14637 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
  Process: 14639 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 14640 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 14641 ExecStart=/usr/local/bin/k3s server --tls-san 192.168.1.100 --datastore-endpoint mysql://user:pass@tcp(192.168.1.200:3306)/k3s --disable traefik (code=exited, status=1/FAILURE)
 Main PID: 14641 (code=exited, status=1/FAILURE)

When I run journalctl -xe I get:

-- Subject: A start job for unit k3s.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit k3s.service has begun execution.
-- 
-- The job identifier is 325190.
Jul 04 02:13:46 node50 sh[18386]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jul 04 02:13:46 node50 sh[18386]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.365627432+01:00" level=info msg="Starting k3s v1.19.12+k3s1 (559d0c47)"
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.366403382+01:00" level=info msg="Cluster bootstrap already complete"
Jul 04 02:13:47 node50 k3s[18390]: time="2021-07-04T02:13:47.418216205+01:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: Error 1129: Host is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'"
Jul 04 02:13:47 node50 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- An ExecStart= process belonging to unit k3s.service has exited.
-- 
-- The process' exit code is 'exited' and its exit status is 1.
Jul 04 02:13:47 node50 systemd[1]: k3s.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- The unit k3s.service has entered the 'failed' state with result 'exit-code'.
Jul 04 02:13:47 node50 systemd[1]: Failed to start Lightweight Kubernetes.
-- Subject: A start job for unit k3s.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
-- 
-- A start job for unit k3s.service has finished with a failure.

As per this error, I logged into the MySQL instance and ran mysqladmin flush-hosts, which fixes the issue for a few hours, and then it happens all over again. So I am a bit at a loss as to why this is only happening on one Pi. Otherwise the Pi works fine and can run Docker and other programs with no issues.
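
This is roughly what I run on the MySQL host each time it blocks the node (the exact invocation may vary depending on your MySQL auth setup):

    mysqladmin -u root -p flush-hosts
    # or, equivalently, from a mysql prompt:
    # FLUSH HOSTS;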

I’ve also bumped max_connections in my.cnf from 100 to 10000.
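
Something like this in the [mysqld] section (the exact file location may differ by distro, e.g. /etc/mysql/my.cnf, so treat this as a sketch):

    [mysqld]
    max_connections = 10000   # bumped from 100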

Anybody have any ideas?

-- SKS81
k3s
kubernetes
mysql
networking
raspberry-pi4

0 Answers