I'm working with OpenJDK 11 and a very simple SpringBoot application that almost the only thing it has is the SpringBoot actuator enabled so I can call /actuator/health etc.
I also have a kubernetes cluster on GCE very simple with just a pod with a container (containing this app of course)
My configuration has some key points that I want to highlight, it has some requirements and limits
resources:
limits:
memory: 600Mi
requests:
memory: 128Mi
And it has a readiness probe
readinessProbe:
initialDelaySeconds: 30
periodSeconds: 30
httpGet:
path: /actuator/health
port: 8080
I'm also setting a JVM_OPTS like (that my program is using obviously)
env:
- name: JVM_OPTS
value: "-XX:MaxRAM=512m"
The problem
I launch this and it gets OOMKilled in about 3 hours every time!
I'm never calling anything myself the only call is the readiness probe each 30 seconds that kubernetes does, and that is enough to exhaust the memory ? I have also not implemented anything out of the ordinary, just a Get method that says hello world along all the SpringBoot imports to have the actuators
If I run kubectl top pod XXXXXX I actually see how gradually get bigger and bigger
I have tried a lot of different configurations, tips, etc, but anything seems to work with a basic SpringBoot app
Is there a way to actually hard limit the memory in a way that Java can raise a OutOfMemory exception ? or to prevent this from happening?
Thanks in advance
NAME READY STATUS RESTARTS AGE
pod/test-79fd5c5b59-56654 1/1 Running 4 15h
describe pod says...
State: Running
Started: Wed, 27 Feb 2019 10:29:09 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 27 Feb 2019 06:27:39 +0000
Finished: Wed, 27 Feb 2019 10:29:08 +0000
That last span of time is about 4 hours and only have 483 calls to /actuator/health, apparently that was enough to make java exceed the MaxRAM hint ?
its about to die again
$ kubectl top pod test-79fd5c5b59-56654
NAME CPU(cores) MEMORY(bytes)
test-79fd5c5b59-56654 43m 575Mi
NAME READY STATUS RESTARTS AGE
pod/test-79fd5c5b59-56654 1/1 Running 6 23h
describe pod:
State: Running
Started: Wed, 27 Feb 2019 18:01:45 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 27 Feb 2019 14:12:09 +0000
Finished: Wed, 27 Feb 2019 18:01:44 +0000
Yesterday night I was doing some interesting reading:
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/ https://banzaicloud.com/blog/java10-container-sizing/ https://medium.com/adorsys/jvm-memory-settings-in-a-container-environment-64b0840e1d9e
TL;DR I decided to remove the memory limit and start the process again, the result was quite interesting (after like 11 hours running)
NAME CPU(cores) MEMORY(bytes)
test-84ff9d9bd9-77xmh 218m 1122Mi
So... WTH with that CPU? I kind expecting a big number on memory usage but what happens with the CPU?
The one thing I can think is that the GC is running as crazy thinking that the MaxRAM is 512m and he is using more than 1G. I'm wondering, is Java detecting ergonomics correctly? (I'm starting to doubt it)
To test my theory I set a limit of 512m and deploy the app this way and I found that from the start there is a unusual CPU load that it has to be the GC running very frequently
kubectl create ...
limitrange/mem-limit-range created
pod/test created
kubectl exec -it test-64ccb87fd7-5ltb6 /usr/bin/free
total used free shared buff/cache available
Mem: 7658200 1141412 4132708 19948 2384080 6202496
Swap: 0 0 0
kubectl top pod ..
NAME CPU(cores) MEMORY(bytes)
test-64ccb87fd7-5ltb6 522m 283Mi
522m is too much vCPU, so my logical next step was to ensure I'm using the most appropriated GC for this case, I changed the JVM_OPTS this way:
env:
- name: JVM_OPTS
value: "-XX:MaxRAM=512m -Xmx128m -XX:+UseSerialGC"
...
resources:
requests:
memory: 256Mi
cpu: 0.15
limits:
memory: 700Mi
And thats bring the vCPU usage to a reasonable status again, after kubectl top pod
NAME CPU(cores) MEMORY(bytes)
test-84f4c7445f-kzvd5 13m 305Mi
Messing with Xmx having MaxRAM is obviously affecting the JVM but how is not possible to control the amount of memory we have on virtualized containers ? I know that free
command will report the host available RAM but OpenJDK should be using cgroups rihgt?.
I'm still monitoring the memory ...
I did two things, the first one was to remove again my container limit, I want to analyze how much it will grow, also I added a new flag to see how the process is using the native memory -XX:NativeMemoryTracking=summary
At the beginning every thing was normal, the process started consuming like 300MB via kubectl top pod
so I let it running about 4 hours and then ...
kubectl top pod
NAME CPU(cores) MEMORY(bytes)
test-646864bc48-69wm2 54m 645Mi
kind of expected, right ? but then I checked the native memory usage
jcmd <PID> VM.native_memory summary
Native Memory Tracking:
Total: reserved=2780631KB, committed=536883KB
- Java Heap (reserved=131072KB, committed=120896KB)
(mmap: reserved=131072KB, committed=120896KB)
- Class (reserved=203583KB, committed=92263KB)
(classes #17086)
( instance classes #15957, array classes #1129)
(malloc=2879KB #44797)
(mmap: reserved=200704KB, committed=89384KB)
( Metadata: )
( reserved=77824KB, committed=77480KB)
( used=76069KB)
( free=1411KB)
( waste=0KB =0.00%)
( Class space:)
( reserved=122880KB, committed=11904KB)
( used=10967KB)
( free=937KB)
( waste=0KB =0.00%)
- Thread (reserved=2126472KB, committed=222584KB)
(thread #2059)
(stack: reserved=2116644KB, committed=212756KB)
(malloc=7415KB #10299)
(arena=2413KB #4116)
- Code (reserved=249957KB, committed=31621KB)
(malloc=2269KB #9949)
(mmap: reserved=247688KB, committed=29352KB)
- GC (reserved=951KB, committed=923KB)
(malloc=519KB #1742)
(mmap: reserved=432KB, committed=404KB)
- Compiler (reserved=1913KB, committed=1913KB)
(malloc=1783KB #1343)
(arena=131KB #5)
- Internal (reserved=7798KB, committed=7798KB)
(malloc=7758KB #28415)
(mmap: reserved=40KB, committed=40KB)
- Other (reserved=32304KB, committed=32304KB)
(malloc=32304KB #3030)
- Symbol (reserved=20616KB, committed=20616KB)
(malloc=17475KB #212850)
(arena=3141KB #1)
- Native Memory Tracking (reserved=5417KB, committed=5417KB)
(malloc=347KB #4494)
(tracking overhead=5070KB)
- Arena Chunk (reserved=241KB, committed=241KB)
(malloc=241KB)
- Logging (reserved=4KB, committed=4KB)
(malloc=4KB #184)
- Arguments (reserved=17KB, committed=17KB)
(malloc=17KB #469)
- Module (reserved=286KB, committed=286KB)
(malloc=286KB #2704)
Wait, What ? 2.1 GB reserved for threads? and 222 MB being used, what is this ? I currently don't know, I just saw it...
I need time trying to understand why this is happening
I finally found my issue and I want to share it so others can benefit in some way from this.
As I found on my last edit I had a thread problem that was causing all the memory consumption over time, specifically we was using an asynchronous method from a third party library without properly taking care those resources (ensure those calls was ending correctly in this case).
I was able to detect the issue because I used a memory limit on my kubernete deployment from the beginning (which is a good practice on production environments) and then I monitored very closely my app memory consumption using tools like jstat, jcmd, visualvm, kill -3
and most importantly the -XX:NativeMemoryTracking=summary
flag that gave me so much detail in this regard.