Apache Ignite OutOfMemory under low usage

11/18/2019

I have deployed Apache Ignite as a StatefulSet in multiple testing Kubernetes clusters.

The current configuration passed my stress-testing phase. However, I am now seeing OutOfMemory errors in Apache Ignite in some new testing clusters with a much lower load.

Below is a log snapshot extracted from one of the Apache Ignite instances:

INFO: TCP discovery spawning a new thread for connection [rmtAddr=/10.254.174.226, rmtPort=45453]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Runtime error caught during grid runnable execution: GridWorker [name=tcp-disco-client-message-worker, igniteInstanceName=null, finished=false, heartbeatTs=1573779638619, hashCode=373238347, interrupted=true, runner=tcp-disco-client-message-worker-#109]
java.lang.OutOfMemoryError: Java heap space

Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-client-message-worker-#109]
java.lang.OutOfMemoryError: Java heap space
Exception in thread "tcp-disco-client-message-worker-#109" java.lang.OutOfMemoryError: Java heap space
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: TCP discovery accepted incoming connection [rmtAddr=/10.254.183.232, rmtPort=41313]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: TCP discovery spawning a new thread for connection [rmtAddr=/10.254.183.232, rmtPort=41313]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: Started serving remote node connection [rmtAddr=/10.254.174.226:45453, rmtPort=45453]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger warning
WARNING: New next node has connection to it's previous, trying previous again. [next=TcpDiscoveryNode [id=5cbb5f1c-ca74-4b2f-ba70-314f621ab997, addrs=[10.254.168.12, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ignite-sit-5.ignite-sit.sit.svc.cluster.local/10.254.168.12:47500], discPort=47500, order=3922, intOrder=2000, lastExchangeTime=1573779246139, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: New next node [newNext=TcpDiscoveryNode [id=6fcccf11-f903-4b4a-bbac-730ca0b80ce8, addrs=[10.254.169.217, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ignite-sit-4.ignite-sit.sit.svc.cluster.local/10.254.169.217:47500], discPort=47500, order=3912, intOrder=1993, lastExchangeTime=1573779190075, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: Finished serving remote node connection [rmtAddr=/10.254.174.226:45453, rmtPort=45453
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger error

Sorry for the bad log formatting.

I would like to know what causes the OutOfMemory error and how I can prevent it from happening again.

Help would be much appreciated.

Update: heap dump analysis result:

The thread org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader @ 0xd9fbe2d0 tcp-disco-sock-reader-#369 keeps local variables with total size 312,295,344 (48.95%) bytes.

It looks like the TCP SocketReader is holding a lot of heap memory.

-- ho wing kent
ignite
java
kubernetes
out-of-memory

2 Answers

11/18/2019

Tune some combination of your Java heap settings, pod resource requests, and pod resource limits. You probably need to set all three, but the specifics depend on your workload.
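
A minimal sketch of what that can look like in the StatefulSet spec. The image tag, heap sizes, and the JVM_OPTS variable are assumptions (the stock apacheignite/ignite image passes JVM_OPTS to the JVM; verify against your own entrypoint), so adjust the numbers to your workload and keep headroom between the heap and the container limit:

spec:
  template:
    spec:
      containers:
        - name: ignite
          image: apacheignite/ignite:2.7.5
          env:
            - name: JVM_OPTS            # assumed: read by the stock Ignite entrypoint
              value: "-Xms512m -Xmx2g"  # heap capped well below the container limit
          resources:
            requests:
              memory: "3Gi"
              cpu: "500m"
            limits:
              memory: "3Gi"             # heap + off-heap + metaspace must all fit under this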

-- coderanger
Source: StackOverflow

11/18/2019

You have to keep in mind that processes in containers don't see any resource limits by default, although you might have defined some in your pod templates.

This is a common issue with Java, where the JVM sizes its heap and off-heap space based on the node's resources (the node's memory instead of the container's memory).

You don't notice this when working on your laptop, but it shows up as soon as you run on a machine with larger specs.

This can be resolved with the -Xmx option when starting your Java process.

Note that -Xmx alone may not be enough on older Java versions, since it only caps the heap and does not take off-heap memory into account.
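
A hedged sketch of how this might look for the Ignite pods, again assuming the stock image reads the JVM_OPTS environment variable (flag availability depends on your JDK; -XX:+UseContainerSupport and -XX:MaxRAMPercentage exist from JDK 8u191 / JDK 10 onward):

env:
  - name: JVM_OPTS
    # Explicit heap cap; works on any JVM version
    value: "-Xms512m -Xmx2g"
    # On JDK 8u191+/10+ you could instead let the JVM size itself from the cgroup limit:
    # value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=70.0"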

-- Ezwig
Source: StackOverflow