I've been running a large process that is supposed to use at most 64 GB of RAM. After it tries to allocate more than 30 GB, it gets OOM-killed by the kernel with this error:
./start.sh: line 23: 7 Killed
It's important to note that it's an Argo Workflow instantiated by Argo Events.
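One thing I still need to verify (the workflow name and namespace below are placeholders) is whether the Workflow's container template itself sets a memory limit I'm not aware of:

    # Show the resources block of each container template in the Workflow
    kubectl get workflow <workflow-name> -n <namespace> -o jsonpath='{.spec.templates[*].container.resources}'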
After reviewing the pod's memory usage in Grafana, I can see it hasn't crossed the 30 GB threshold:
Also, there is only one pod running on this node, which spins up solely to run this process (apart from the Prometheus and Loki DaemonSets).
Node Exporter graph:
So there are clearly still plenty of resources the pod could use, but it makes me think there is a 30 GB limit somewhere; it could be in the OS, in Docker, or in the Kubernetes kubelet.
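My plan for the next run (pod name and namespace are placeholders) is to check whether the container actually has a memory limit applied, both in the pod spec and in the cgroup it runs in:

    # Effective resources on the container as Kubernetes sees them
    kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'

    # Memory limit imposed by the container's cgroup (cgroup v1), while the pod is still running
    kubectl exec <pod-name> -n <namespace> -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes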
So, are there any default memory allocation constraints in Docker or Kubernetes, or more specifically in EKS 1.15?
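As far as I know Docker doesn't cap memory by default, but a LimitRange in the namespace could be injecting a default limit into pods that don't declare one; this is how I was thinking of checking for that (namespace is a placeholder):

    # List any LimitRange objects that might apply default memory limits to pods
    kubectl get limitrange -n <namespace> -o yaml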
If not, what is wrong here, and how can I debug further? (Note: the node where this process ran has already been deleted, so I cannot SSH in or cat any logs.)
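For the next run, the best I've come up with (pod name and namespace are placeholders) is to capture the termination reason and exit code before the node goes away:

    # Shows Last State / Reason (e.g. OOMKilled) and the exit code (137 for SIGKILL)
    kubectl describe pod <pod-name> -n <namespace>

    # Or just the terminated state of each container
    kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState.terminated}'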