High I/O Java process consistently gets signal 11 SIGSEGV in a JavaThread when run in a Docker container

1/3/2019

Has anyone else been able to consistently reproduce SIGSEGVs in the JRE across different hardware and different JRE versions? Note (potentially a big note): I am running the process in a Docker container deployed on Kubernetes.

Sample error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fea64dd9d01, pid=21, tid=0x00007fe8dfbfb700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_191-b12) (build 1.8.0_191-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 8706 C2 com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName()Ljava/lang/String; (493 bytes) @ 0x00007fea64dd9d01 [0x00007fea64dd9b60+0x1a1]

I'm currently managing a high-I/O process with many threads doing I/O and serialization: downloading CSV and JSON files, reading CSVs, converting JSON into CSVs, and loading CSVs into MySQL. This happens thousands of times during the application's run cycle. I use only commonly-used libraries (Jackson, jOOQ) and "normal" code; in particular, I have not written any custom code that uses JNI.
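To give a sense of the code involved, the JSON reads are plain Jackson databinding, roughly along these lines (simplified sketch; the class name, file name and target type are made up for illustration):

    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.io.FileReader;
    import java.io.Reader;
    import java.util.Map;

    // Simplified sketch of the JSON read path; names are illustrative, not the actual code.
    public class JsonRecordReader {
        private static final ObjectMapper MAPPER = new ObjectMapper();

        static Map<String, Object> readRecord(String path) throws Exception {
            // Parsing from a Reader goes through ReaderBasedJsonParser, and databinding
            // calls nextFieldName() for each object field -- the "problematic frame" above.
            try (Reader reader = new FileReader(path)) {
                return MAPPER.readValue(reader, new TypeReference<Map<String, Object>>() {});
            }
        }
    }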

Without fail, the JVM will SIGSEGV during each run cycle. It seems to SIGSEGV in various parts of the code base, but never in a GC thread or any other well-known thread. The "problematic frame" is always compiled code.

Testing specs:

  • Multiple different hardware instances in AWS.
  • Tested with Java 8u191 and 8u181, on Ubuntu 16.04.
  • This process is running in a container (Docker) and deployed on Kubernetes.
  • Docker version: 17.03.2-ce

Here's the full log: https://gist.github.com/navkast/9c95f56ce818d76276684fa5bb9a6864

-- navkast
Tags: docker, java, jvm, kubernetes

3 Answers

1/4/2019

Based on your comment, this is likely a case where your container memory limit is lower than your heap size plus the extra memory the JVM needs on top of the heap (GC and other native overhead).

There are some useful write-ups out there on how to run the JVM in a container; the short version is to size the heap from the container limit, not from the host's memory.

You didn't post any pod specs, but you can also take a look at setting limits on your Kubernetes pods.
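For what it's worth, 8u191 already ships the container-awareness flags (-XX:+UseContainerSupport, -XX:MaxRAMPercentage), so the heap can be sized against the container limit rather than the host's memory. A quick way to check what the JVM actually sees inside the pod (a minimal sketch, nothing specific to your setup):

    public class ContainerSizes {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // Max heap the JVM will try to grow to -- compare this against the pod's memory limit.
            System.out.printf("max heap:   %,d MB%n", rt.maxMemory() / (1024 * 1024));
            // CPUs the JVM sees; this drives GC and JIT thread counts, which also use native memory.
            System.out.printf("processors: %d%n", rt.availableProcessors());
        }
    }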

-- Rico
Source: StackOverflow

1/4/2019

From the full log:

siginfo: si_signo: 11 (SIGSEGV), si_code: 0 (SI_USER)

This means the signal was sent with kill() from user space rather than raised by a faulting memory access, so it is not a JVM bug. Something is deliberately killing the process, most likely because the machine is running out of memory.

-- navkast
Source: StackOverflow

1/4/2019

A big hint is here:

 Memory: 4k page, physical 33554432k(1020k free), swap 0k(0k free)

Out of 32 GB, only about 1 MB was free at the time of the crash. Most likely the process was killed because the system had run out of memory (a rough way to check where the memory is going from inside the JVM is sketched after this list). I suggest:

  • reducing the heap size significantly, e.g. to 2 - 8 GB;
  • increasing the available memory, e.g. to 4 - 16 GB;
  • adding some swap space, e.g. 8 - 32 GB. This doesn't fix the problem, but the system will handle running out of memory a little more gracefully.
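A minimal sketch of that check (the per-thread stack size is assumed to be the ~1 MB default; direct buffers and the OS page cache are not visible to it):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class MemoryBudget {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            long heap = mem.getHeapMemoryUsage().getCommitted();
            long nonHeap = mem.getNonHeapMemoryUsage().getCommitted(); // metaspace, code cache, ...
            int threads = ManagementFactory.getThreadMXBean().getThreadCount();
            long stacks = threads * 1024L * 1024L;                     // assumes ~1 MB default stack
            System.out.printf("heap committed:      %,d MB%n", heap / (1024 * 1024));
            System.out.printf("non-heap committed:  %,d MB%n", nonHeap / (1024 * 1024));
            System.out.printf("thread stacks (est): %,d MB (%d threads)%n",
                    stacks / (1024 * 1024), threads);
            // Direct ByteBuffers for I/O, GC bookkeeping and the OS page cache for all that
            // CSV/JSON file traffic come on top of these numbers.
        }
    }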
-- Peter Lawrey
Source: StackOverflow