I need to persist the heap dump when the java process gets OOM and the pod is restarted.
I have following added in the jvm args
-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dumps
...and emptydir is mounted on the same path.
But the issue is if the pod gets restarted and if it gets scheduled on a different node, then we are losing the heap dump. How do I persist the heap dump even if the pod is scheduled to a different node?
We are using AWS EKS and we are having more than 1 replica for the pod.
Could anyone help with this, please?
As writing to EFS is too slow in your case, there is another option for AWS EKS - awsElasticBlockStore
.
The contents of an EBS volume are persisted and the volume is unmounted when a pod is removed. This means that an EBS volume can be pre-populated with data, and that data can be shared between pods.
Note: You must create an EBS volume by using aws ec2 create-volume or the AWS API before you can use it.
There are some restrictions when using an awsElasticBlockStore volume:
Check the official k8s documentation page on this topic, please. And How to use persistent storage in EKS.
You will have to persists the heap dumps on a shared network location between the pods. In order to achieve this, you will need to provide persistent volume claims and in EKS, this could be achieved using an Elastic File System mounted on different availability zones. You can start learning about it by reading this guide about EFS-based PVCs.