Kubelet on ARM failed to start: Failed to start ContainerManager system validation failed - Following Cgroup subsystem not mounted: [cpuset]

1/17/2019

I'm running Debian 9.6 with U-Boot on a BeagleBone Black, a 32-bit ARM system.

Image downloaded from: https://rcn-ee.com/rootfs/bb.org/testing/2019-01-06/stretch-console/bone-debian-9.6-console-armhf-2019-01-06-1gb.img.xz

Kernel command params:

Jan 17 14:19:19 bbb-test kernel: Kernel command line: console=ttyO0,115200n8 bone_capemgr.enable_partno=BB-UART1,BB-UART4,BB-UART5 bone_capemgr.uboot_capemgr_enabled=1 root=/dev/mmcblk1p1 ro rootfstype=ext4 rootwait coherent_pool=1M net.ifnames=0 quiet cape_universal=enable cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1
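
A quick way to confirm these parameters actually reached the running kernel is to compare them against /proc/cmdline (a standard kernel interface, nothing board-specific):

debian@bbb-test:~$ cat /proc/cmdline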

debian@bbb-test:~$ uname -a
Linux bbb-test 4.14.79-ti-r84 #1 SMP PREEMPT Tue Nov 13 20:11:18 UTC 2018 armv7l GNU/Linux

Docker is installed and working on this machine. Kubelet log:

debian@bbb-test:~$ sudo kubelet

I0117 14:31:25.972837    2841 server.go:407] Version: v1.13.2
I0117 14:31:25.982041    2841 plugins.go:103] No cloud provider specified.
W0117 14:31:25.995051    2841 server.go:552] standalone mode, no API client
E0117 14:31:26.621471    2841 machine.go:194] failed to get cache information for node 0: open /sys/devices/system/cpu/cpu0/cache: no such file or directory
W0117 14:31:26.655651    2841 server.go:464] No api server defined - no events will be sent to API server.
I0117 14:31:26.657953    2841 server.go:666] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
I0117 14:31:26.664398    2841 container_manager_linux.go:248] container manager verified user specified cgroup-root exists: []
I0117 14:31:26.666142    2841 container_manager_linux.go:253] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms}
I0117 14:31:26.675196    2841 container_manager_linux.go:272] Creating device plugin manager: true
I0117 14:31:26.676961    2841 state_mem.go:36] [cpumanager] initializing new in-memory state store
I0117 14:31:26.680692    2841 state_mem.go:84] [cpumanager] updated default cpuset: ""
I0117 14:31:26.682789    2841 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]"
I0117 14:31:26.760365    2841 client.go:75] Connecting to docker on unix:///var/run/docker.sock
I0117 14:31:26.762342    2841 client.go:104] Start docker client with request timeout=2m0s
W0117 14:31:26.820114    2841 docker_service.go:540] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
I0117 14:31:26.821450    2841 docker_service.go:236] Hairpin mode set to "hairpin-veth"
W0117 14:31:26.832217    2841 cni.go:203] Unable to update cni config: No networks found in /etc/cni/net.d
W0117 14:31:26.932660    2841 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup.
I0117 14:31:26.980052    2841 docker_service.go:251] Docker cri networking managed by kubernetes.io/no-op
I0117 14:31:27.241438    2841 docker_service.go:256] Docker Info: &{ID:YHXC:3CJD:MPM6:SQ6I:37PA:E7CY:YG32:EQZH:XKFS:5FLV:OHRN:UYQS Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:0 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:false IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:24 OomKillDisable:true NGoroutines:44 SystemTime:2019-01-17T14:31:27.033459842Z LoggingDriver:json-file CgroupDriver:cgroupfs NEventsListener:0 KernelVersion:4.14.79-ti-r84 OperatingSystem:Debian GNU/Linux 9 (stretch) OSType:linux Architecture:armv7l IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0x65be080 NCPU:1 MemTotal:506748928 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:bbb-test Labels:[] ExperimentalBuild:false ServerVersion:18.06.1-ce ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil>} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:468a545b9edcd5932818eb9de8e72413e616e86e Expected:468a545b9edcd5932818eb9de8e72413e616e86e} RuncCommit:{ID:69663f0bd4b60df09991c08812a60108003fa340 Expected:69663f0bd4b60df09991c08812a60108003fa340} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default]}
I0117 14:31:27.248202    2841 docker_service.go:269] Setting cgroupDriver to cgroupfs
I0117 14:31:27.558049    2841 kuberuntime_manager.go:198] Container runtime docker initialized, version: 18.06.1-ce, apiVersion: 1.38.0
I0117 14:31:27.611546    2841 server.go:999] Started kubelet
E0117 14:31:27.623092    2841 kubelet.go:1308] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
W0117 14:31:27.623253    2841 kubelet.go:1412] No api server defined - no node status update will be sent.
I0117 14:31:27.644045    2841 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer
I0117 14:31:27.650728    2841 status_manager.go:148] Kubernetes client is nil, not starting status manager.
I0117 14:31:27.651717    2841 kubelet.go:1829] Starting kubelet main sync loop.
I0117 14:31:27.652815    2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg has yet to be successful]
I0117 14:31:27.623347    2841 server.go:137] Starting to listen on 0.0.0.0:10250
I0117 14:31:27.671780    2841 server.go:333] Adding debug handlers to kubelet server.
I0117 14:31:27.684102    2841 volume_manager.go:248] Starting Kubelet Volume Manager
I0117 14:31:27.713834    2841 desired_state_of_world_populator.go:130] Desired state populator starts to run
I0117 14:31:27.808299    2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg has yet to be successful]
I0117 14:31:27.943782    2841 reconciler.go:154] Reconciler: start to sync state
I0117 14:31:28.051598    2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
W0117 14:31:28.415337    2841 nvidia.go:66] Error reading "/sys/bus/pci/devices/": open /sys/bus/pci/devices/: no such file or directory
I0117 14:31:28.496333    2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
I0117 14:31:29.362011    2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
W0117 14:31:29.477526    2841 container.go:409] Failed to create summary reader for "/system.slice/system-openvpn.slice/openvpn@bbb.service": none of the resources are being tracked.
I0117 14:31:30.999869    2841 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet]
I0117 14:31:31.131276    2841 kubelet_node_status.go:278] Setting node annotation to enable volume controller attach/detach
I0117 14:31:31.184490    2841 cpu_manager.go:155] [cpumanager] starting with none policy
I0117 14:31:31.195065    2841 cpu_manager.go:156] [cpumanager] reconciling every 10s
I0117 14:31:31.196076    2841 policy_none.go:42] [cpumanager] none policy: Start
F0117 14:31:31.199910    2841 kubelet.go:1384] Failed to start ContainerManager system validation failed - Following Cgroup subsystem not mounted: [cpuset]

The issue persists even with a valid config and a proper kubeconfig.
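
For context: on startup the kubelet's ContainerManager validates that the cpu, cpuacct, cpuset and memory cgroup subsystems are mounted, which is why the missing cpuset hierarchy is fatal here. Whether the kernel knows about the controller at all can be read from /proc/cgroups, where every controller is listed with an enabled flag:

debian@bbb-test:~$ grep cpuset /proc/cgroups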

EDIT: additional data

cgroups are mostly mounted, though cpuset is conspicuously absent from the list:

debian@bbb-test:~$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=220140k,nr_inodes=55035,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=49492k,mode=755)
/dev/mmcblk1p1 on / type ext4 (rw,noatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15992)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=49488k,mode=700,uid=1000,gid=1000)


debian@bbb-test:~$ systemctl status 'cg*'
debian@bbb-test:~$
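
If /proc/cgroups lists cpuset with enabled set to 1 but systemd never mounts it, a manual mount separates a kernel problem from an init problem (a diagnostic sketch using the conventional paths, not a permanent fix):

debian@bbb-test:~$ grep CONFIG_CPUSETS /boot/config-$(uname -r)
debian@bbb-test:~$ sudo mkdir -p /sys/fs/cgroup/cpuset
debian@bbb-test:~$ sudo mount -t cgroup -o cpuset cgroup /sys/fs/cgroup/cpuset

If the config grep comes back empty (assuming your image ships a /boot/config file at all) or the mount fails, the 4.14-ti kernel was built without CONFIG_CPUSETS, and no boot parameter can bring the controller back.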
-- nmiculinic
arm
beagleboneblack
debian
kubernetes

2 Answers

1/18/2019

I've updated to the 4.19 kernel and done a full system package upgrade, and that resolved the problem. That's good enough for me.
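
For anyone following along, the upgrade on these rcn-ee images is roughly the following (the update_kernel.sh path and the --lts-4_19 flag are assumptions based on the RobertCNelson boot scripts these images usually ship; check what your image provides):

debian@bbb-test:~$ sudo apt update && sudo apt full-upgrade
debian@bbb-test:~$ sudo /opt/scripts/tools/update_kernel.sh --lts-4_19
debian@bbb-test:~$ sudo reboot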

-- nmiculinic
Source: StackOverflow

1/17/2019

I suggest looking at https://unix.stackexchange.com/questions/427327/how-can-i-check-if-cgroups-are-available-on-my-linux-host to check whether your cgroup activation works as expected.

cgroup_enable=cpuset as a kernel parameter should be enough, unless something else is missing (e.g. maybe it's in the wrong place?).
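
On BeagleBone images booted through U-Boot, the kernel arguments are normally assembled from /boot/uEnv.txt, so that is the place to confirm the flags (the cmdline= variable below is what rcn-ee images use; other images may name it differently):

cmdline=coherent_pool=1M net.ifnames=0 quiet cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1

Also note that cgroup_enable= can only re-enable controllers the kernel was built with; it cannot add a controller that was compiled out, which would explain why the flag appears in the logged command line yet cpuset never shows up in the mounts.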

-- Andrei Dascalu
Source: StackOverflow