I'm running a Kubernetes cluster on Amazon EC2 with 1 master and 2 slaves, each with 8 GB RAM and 2 vCPUs. I observe unusually high disk usage, but only after running some batch jobs for about 40 minutes across both slaves. There are three instances of the job (kind: Job) running at a time on each slave. RAM usage on a slave is nominal at about 3 GB, with CPU below 50%.
Description of the job: it downloads some images from a server, does some image processing, and stores the result on an NFS share (shared by both slaves). I mounted the NFS volume on the slaves and then used that path as the volume mountPath in the k8s Job description; I did not use the nfs volume type provided by Kubernetes. The job doesn't explicitly read from or write to the local disk at all. The volume wiring is roughly as sketched below.
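For reference, here is a minimal sketch of how the Job mounts the NFS path, assuming a hostPath volume pointing at the directory where NFS is already mounted on the node (the names image-processor, /mnt/nfs/output, the container image, and the parallelism value are placeholders, not my real values):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: image-processor                      # placeholder name
spec:
  parallelism: 6                             # roughly three pods per slave across the two slaves (placeholder)
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example/image-processor:latest   # placeholder image
        volumeMounts:
        - name: nfs-output
          mountPath: /data/output            # path the job writes its results to
      volumes:
      - name: nfs-output
        hostPath:
          path: /mnt/nfs/output              # NFS share already mounted on the slave; not the k8s nfs volume type
```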
After about 40 minutes of normal operation, I noticed high disk usage (both IOPS and bandwidth) on all the slaves, along with high RAM consumption of almost 7.8 GB out of 8 GB, ultimately forcing the nodes into the NotReady state. The master node is not configured to run user jobs and is not affected at all.
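For completeness, the master is kept free of user workloads via the usual master taint (a sketch, assuming the kubeadm default; the exact key may differ in other setups):

```yaml
# Excerpt from the master node object (kubectl get node <master> -o yaml),
# assuming the default kubeadm taint that keeps user pods off the master.
spec:
  taints:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
```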
How do I fix this? Please let me know if any other information is required.