GC - G1 Old Generation 'spike' - finding the cause

1/26/2017

The app in question is running on Kubernetes (Google Container Engine), processing a steady stream of Kafka data. The data is evenly divided among all of the pods (using a random key to prevent 'hot spots'¹).
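
The post doesn't show how the records are keyed, but a minimal sketch of the random-key approach it describes might look like the following (the broker address, topic name, and payload are placeholders, not taken from the app):

    import java.util.Properties;
    import java.util.UUID;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RandomKeyProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");   // placeholder broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A random key spreads records evenly across partitions, so each
                // pod's consumer gets a roughly equal share of the stream.
                String key = UUID.randomUUID().toString();
                producer.send(new ProducerRecord<>("events", key, "payload"));
            }
        }
    }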

80-90% of the processes will run fine indefinitely, but some show memory issues like the one below:

[screenshot: GC monitoring graph showing a sudden spike in G1 Old Gen usage]

What would cause particular instances of what should be identical processes to show these spikes? As I said, the data is homogeneous and evenly divided among all the consumers, so there is no reason for one instance to be doing more work than another.

Clearly something is different about these outliers: something is forcing allocations to be promoted to Old Gen. The other pods show a (roughly) even, sub-1% Young Gen GC pattern over this same period.
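
To narrow this down, one option (a sketch only; the post doesn't say which JVM version, so this assumes Java 8 HotSpot flag syntax) is to enable detailed G1 logging on an affected pod and compare it with a healthy one. The tenuring distribution and adaptive-size output should show whether the Old Gen growth comes from survivor-space overflow promotions or from humongous allocations:

    # hypothetical JVM arguments for an affected pod; log path is a placeholder
    -XX:+UseG1GC
    -Xloggc:/var/log/app/gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintTenuringDistribution
    -XX:+PrintAdaptiveSizePolicy
    -XX:+PrintGCApplicationStoppedTime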

Here is a second example from the same time span:

[screenshot: second GC graph from the same time span showing a similar Old Gen spike]

¹ Other monitoring shows this to be accurate.

-- ethrbunny
garbage-collection
java
kubernetes

0 Answers