I am using kube-aws (version v0.12.3) to create a Kubernetes cluster on AWS. The worker nodes frequently report "too many open files in the system" when I try to SSH into them; the nodes then become unresponsive and get restarted.
As a result, the pods running on those nodes are frequently rescheduled onto other nodes, and the application goes down for some time.
How can I resolve this issue?
✗ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Worker node:
node|k8s-- core@ip-10-0-214-11 ~ $ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 251640
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 251640
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
As you can see, the maximum number of open files is set to a fairly small value (1024). Perhaps this is inherited from the AWS template used for the worker node instances.
You should increase this value, but do so with a clear understanding of the level at which it should be set.
Also, be careful not to exceed the kernel limit.
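To see where the kernel limit currently stands, the values can be read from /proc. The snippet below is a minimal sketch, assuming a Linux node with /proc mounted; the variable names are only illustrative. It may also help to know that the "in system" wording of the error usually corresponds to the kernel-wide limit (fs.file-max) rather than the per-process one, so both are worth checking.

```shell
# Kernel-wide ceiling on open file handles.
file_max=$(cat /proc/sys/fs/file-max)

# file-nr holds three fields: allocated handles, allocated-but-unused
# handles, and the kernel-wide maximum.
read -r allocated unused maximum < /proc/sys/fs/file-nr

# Per-process soft and hard limits of the current shell.
soft_limit=$(ulimit -S -n)
hard_limit=$(ulimit -H -n)

echo "kernel max: $file_max, currently allocated: $allocated"
echo "soft nofile: $soft_limit, hard nofile: $hard_limit"
```

If the allocated count is close to the kernel maximum, raising only the per-user nofile value is unlikely to help on its own.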
For a simple case, just append two lines like the ones below to the end of the /etc/security/limits.conf file (mike is the example account name):
mike soft nofile 4096
mike hard nofile 65536
Then log in again, or restart the service that runs under the account you made the changes for.
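To confirm that the new values actually took effect, something along these lines could be run after logging in again. This is a sketch only: `$$` (the current shell) stands in for the PID of whatever service you care about, and the variable names are illustrative.

```shell
# Limits of the current shell after logging in again.
soft_limit=$(ulimit -S -n)
hard_limit=$(ulimit -H -n)
echo "soft: $soft_limit, hard: $hard_limit"

# Limits of an already running process. $$ (this shell) is a placeholder;
# substitute the PID of the service you changed the limits for.
pid=$$
proc_limit_line=$(grep 'Max open files' "/proc/$pid/limits")
echo "$proc_limit_line"
```

Note that limits.conf is applied through PAM at login, so on systemd-based hosts a service started by systemd may not pick it up; in that case the unit's LimitNOFILE setting is the usual place to look.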
You can find further explanations on the Internet; one of many is available here: Security and Hardening Guide
To have those settings applied to your AWS instance at launch, you might compose a simple script like this:
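#!/bin/bash
# Back up limits.conf, then append the new nofile limits for the example account
cd /etc/security
cp limits.conf limits.conf.$(date "+%Y%m%d")
cat <<EndOfMyStrings >> limits.conf
mike soft nofile 4096
mike hard nofile 65536
EndOfMyStrings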
Then add it to the "User data" field of the Launch Instance Wizard, as described here: Running Commands on Your Linux Instance at Launch