Getting errors on worker nodes: "Too many open files in the system"

7/2/2019

I am using kube-aws (version v0.12.3) to create a Kubernetes cluster on AWS. I frequently hit the error "too many open files in the system" on worker nodes when I try to ssh into them, and the nodes become unresponsive and get restarted.

Because of this, the pods running on those nodes are frequently rescheduled onto different nodes and the application goes down for some time.

How can I resolve this issue?

✗ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Worker node:

core@ip-10-0-214-11 ~ $ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 251640
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 251640
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

-- namrata
kube-aws
kubernetes

1 Answer

7/3/2019

As you can see, the maximum number of open files is set to a rather small value (1024). Perhaps this is inherited from the AWS template used for the worker node instance.

You should increase this value, but do so with a clear understanding of the level at which the limit should be set:

  • globally or for a specific security principal;
  • which exact principal the limit applies to: a user, system, or daemon account, or a group (a quick way to check what a process currently sees is shown below);
  • which login service is involved (su, ssh, telnet, etc.).
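
For instance, one way to see which limits a given process actually runs with is to read its limits file under /proc. This is only a sketch; the kubelet is used here as an illustrative target, so substitute whichever daemon is running out of file descriptors:

# Find a PID for the process of interest (kubelet is just an example here)
PID=$(pgrep -f kubelet | head -n1)

# The "Max open files" row shows the soft and hard nofile limits in effect for that process
grep "Max open files" /proc/${PID}/limits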

Also, be careful not to exceed the kernel-wide limit.
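
The kernel-wide ceiling can be inspected, and raised if needed, with sysctl. This is only a sketch; the value and file name below are illustrative, not a recommendation:

# Current system-wide maximum number of open file handles
sysctl fs.file-max

# Handles currently in use: allocated, unused, maximum
cat /proc/sys/fs/file-nr

# Example of raising the limit persistently (pick a value that suits your workload)
echo "fs.file-max = 500000" | sudo tee /etc/sysctl.d/90-file-max.conf
sudo sysctl --system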

For a simple case, just add two lines like the ones below to the end of the /etc/security/limits.conf file:

mike           soft    nofile          4096
mike           hard    nofile          65536

and then log in again, or restart the service that runs under the account you changed the limits for.
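
After logging in again you can verify that the new limits are in effect, for example:

# Soft limit for the current session
ulimit -Sn

# Hard limit (the ceiling the soft limit can be raised to without root)
ulimit -Hn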

You can find further explanations on the Internet; one of many is the Security and Hardening Guide.

To keep those settings applied to your AWS instance at launch, you could compose a simple script like this:

#!/bin/bash
cd /etc/security
# Keep a dated backup of the original file before appending to it
cp limits.conf limits.conf.$(date "+%Y%m%d")
# Append the desired soft and hard nofile limits (adjust user and values as needed)
cat <<EndOfMyStrings >> limits.conf
mike           soft    nofile          4096
mike           hard    nofile          65536
EndOfMyStrings

and then add it to the "User data" field of the Launch Instance Wizard, as described here: Running Commands on Your Linux Instance at Launch.

-- mebius99
Source: StackOverflow