Java fatal error when attempting to deploy Cassandra in Kubernetes

3/11/2019

I am trying to deploy a Cassandra Kubernetes pod as in here, except I am using my own Cassandra image, which deploys version 3.11.3 with JDK 8-201. The infrastructure is an AWS cluster composed of c4.2xlarge nodes.

The container launches successfully but the Cassandra deployment fails with the following error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f5c77622d84, pid=73, tid=0x00007f5c79e64700
#
# JRE version:  (8.0_201-b09) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x60bd84]  CodeHeap::allocate(unsigned long, bool)+0x2b4
#
# Core dump written. Default location: /opt/apache-cassandra-3.11.3/bin/core or core.73
#
# An error report file with more information is saved as:
# /opt/apache-cassandra-3.11.3/bin/hs_err_pid73.log

I am not sure if it helps, but here is the full log:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f5c77622d84, pid=73, tid=0x00007f5c79e64700
#
# JRE version:  (8.0_201-b09) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.201-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x60bd84]  CodeHeap::allocate(unsigned long, bool)+0x2b4
#
# Core dump written. Default location: /opt/apache-cassandra-3.11.3/bin/core or core.73
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007f5c76cd6400):  JavaThread "Unknown thread" [_thread_in_vm, id=74, stack(0x00007f5c79e24000,0x00007f5c79e65000)]

siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007f5c67200000

Registers:
RAX=0x00007f5c67200000, RBX=0x0000000000000100, RCX=0x0000000000000006, RDX=0x00007f5c66e40004
RSP=0x00007f5c79e63928, RBP=0x00007f5c79e63950, RSI=0x00007f5c66e40004, RDI=0x00007f5c780213c0
R8 =0x0000000000000004, R9 =0x0000000000000000, R10=0x0000000000000032, R11=0x0000000000000206
R12=0x0000000000000000, R13=0x000000000007d000, R14=0x00007f5c77f980a0, R15=0x00007f5c76c94e80
RIP=0x00007f5c77622d84, EFLAGS=0x0000000000010206, CSGSFS=0x0000000000000033, ERR=0x0000000000000006
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f5c79e63928)
0x00007f5c79e63928:   0000000000000090 0000000000000090
0x00007f5c79e63938:   0000000000000000 00007f5c77b6007e
0x00007f5c79e63948:   00007f5c76c94e80 00007f5c79e63980
0x00007f5c79e63958:   00007f5c7747127c 0000000000000090
0x00007f5c79e63968:   0000000000000060 0000000000000000
0x00007f5c79e63978:   00007f5c77b6007e 00007f5c79e639c0
0x00007f5c79e63988:   00007f5c7746af7a 0000000000000000
0x00007f5c79e63998:   00007f5c79e63a20 00007f5c79e639f0
0x00007f5c79e639a8:   00007f5c76cd6800 00007f5c76c13700
0x00007f5c79e639b8:   00007f5c76cd6810 00007f5c79e63be0
0x00007f5c79e639c8:   00007f5c7763bb66 00007f5c66e50000
0x00007f5c79e639d8:   00000000000003d8 00007f5c79e63a60
0x00007f5c79e639e8:   00007f5c76cd6be8 00007f5c79e63a30
0x00007f5c79e639f8:   00000000003c0000 0000000000000000
0x00007f5c79e63a08:   0000000000000000 0000000000000000
0x00007f5c79e63a18:   0000000000000000 0000000000000000
0x00007f5c79e63a28:   00007f5c79e63b60 00007f5c79e63aa0
0x00007f5c79e63a38:   0000000000000020 00007f5c79e63a60
0x00007f5c79e63a48:   00007f5c7791eed0 00007f5c78021430
0x00007f5c79e63a58:   5b2d86e9a1108f00 00007f5c79e63ab0
0x00007f5c79e63a68:   0000000000000070 00007f5c79e63ad0
0x00007f5c79e63a78:   0000000000000000 0000000000000007
0x00007f5c79e63a88:   5b2d86e9a1108f00 00007f5c77b8a14c
0x00007f5c79e63a98:   0000000000000068 00007f5c79e63b00
0x00007f5c79e63aa8:   0000000000000000 0000000000000007
0x00007f5c79e63ab8:   00007f5c79e63b40 0000000000000000
0x00007f5c79e63ac8:   00007f5c7791c4b5 00007f5c79e63b00
0x00007f5c79e63ad8:   0000000000000068 0000000000000068
0x00007f5c79e63ae8:   00007f5c79e63ddf 00007f5c76cd6400
0x00007f5c79e63af8:   0000000000000000 00007f5c79e63b30
0x00007f5c79e63b08:   00007f5c772ddf49 0000000000000000
0x00007f5c79e63b18:   0000000000000000 0000000000000000 

Instructions: (pc=0x00007f5c77622d84)
0x00007f5c77622d64:   e9 4d fe ff ff 0f 1f 80 00 00 00 00 48 8b 87 00
0x00007f5c77622d74:   01 00 00 8b 8f f8 00 00 00 48 d3 e0 48 03 47 10
0x00007f5c77622d84:   4c 89 00 c6 40 08 01 4c 01 87 00 01 00 00 e9 1b
0x00007f5c77622d94:   fe ff ff 66 0f 1f 84 00 00 00 00 00 48 39 d6 74 

Stack: [0x00007f5c79e24000,0x00007f5c79e65000],  sp=0x00007f5c79e63928,  free space=254k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x60bd84]  CodeHeap::allocate(unsigned long, bool)+0x2b4
V  [libjvm.so+0x45a27c]  CodeCache::allocate(int, bool)+0x5c
V  [libjvm.so+0x453f7a]  BufferBlob::create(char const*, int)+0x8a
V  [libjvm.so+0x624b66]  AbstractICache::initialize()+0x66
V  [libjvm.so+0x631fbc]  init_globals()+0x3c
V  [libjvm.so+0xa79b69]  Threads::create_vm(JavaVMInitArgs*, bool*)+0x409
V  [libjvm.so+0x6d7b4f]  JNI_CreateJavaVM+0x4f
C  [libjli.so+0x7ee4]  JavaMain+0x84
C  [libpthread.so.0+0x77fc]  start_thread+0xdc


---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )

Other Threads:

=>0x00007f5c76cd6400 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=74, stack(0x00007f5c79e24000,0x00007f5c79e65000)]

VM state:not at safepoint (not fully initialized)

VM Mutex/Monitor currently owned by a thread:  ([mutex/lock_event])
[0x00007f5c76c94e80] CodeCache_lock - owner thread: 0x00007f5c76cd6400

One thing that may be relevant (I am not sure it really is): I used this exact Docker image with Docker Swarm and did not have this problem. I can only reproduce it on Kubernetes. I also tried Cassandra 3.13.1 and JDK 8_152, with the same result.

Does anybody know what is causing this problem and how to fix it?

Thank you for your help.

-- João Matos
amazon-web-services
cassandra
java
kubernetes

1 Answer

3/11/2019

Looks like a SIGBUS error. There are multiple reasons why you may see this. For example:

  • The program instructs the CPU to read or write a specific physical memory address that is not valid, i.e. the requested physical address is not recognized by the computer system.
  • Unaligned memory access. For example, if multi-byte accesses must be 16-bit aligned, addresses (given in bytes) 0, 2, 4, 6, and so on are considered aligned and therefore accessible, while addresses 1, 3, 5, and so on are considered unaligned.
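
To make the alignment bullet concrete, here is a minimal Python sketch. The helper name is_aligned is made up for illustration; it only checks whether a byte address is a multiple of the required alignment. As a side note, the crash log in the question reports si_code: 2 (BUS_ADRERR) rather than an alignment code, and the faulting address si_addr is even, so for this particular crash the first cause looks more likely than the second. Treat that as a reading of the log, not a diagnosis.

```python
def is_aligned(address: int, alignment_bytes: int) -> bool:
    """Return True if a byte address is a multiple of alignment_bytes."""
    if alignment_bytes <= 0:
        raise ValueError("alignment_bytes must be positive")
    return address % alignment_bytes == 0


# 16-bit alignment means the address must be a multiple of 2 bytes.
for addr in (0, 1, 2, 3, 4, 5, 6):
    state = "aligned" if is_aligned(addr, 2) else "unaligned"
    print(f"address {addr}: {state}")

# si_addr taken from the crash log in the question.
fault_address = 0x00007F5C67200000
print(hex(fault_address), "16-bit aligned:", is_aligned(fault_address, 2))
```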

Are you setting any memory limits on your Pods? It could be that Cassandra is trying to access an address that doesn't exist (the JVM thinks it's there for some reason).

If you are using Docker in Kubernetes, a good approach is to look at the command line that Docker is using under Kubernetes and compare it to the docker command used for Docker Swarm. Another way is to run docker inspect on the K8s and Swarm containers and see the differences.
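
The docker inspect comparison can be sketched in Python as below. This is only an illustration under a few assumptions: the function names flatten and diff_inspect are made up, and the two JSON strings are invented stand-ins for real docker inspect output (which is a JSON array with one object per container). The HostConfig values shown are not taken from the question.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"A.B.C": value}.

    Lists and scalars are kept as leaf values; empty dicts produce no entry.
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    else:
        flat[prefix] = obj
    return flat


def diff_inspect(a, b):
    """Return {path: (value_in_a, value_in_b)} for every setting that differs."""
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        path: (flat_a.get(path), flat_b.get(path))
        for path in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(path) != flat_b.get(path)
    }


# Invented sample data standing in for saved `docker inspect <container>` output.
swarm_json = '[{"HostConfig": {"Memory": 0, "ShmSize": 67108864}}]'
k8s_json = '[{"HostConfig": {"Memory": 2147483648, "ShmSize": 67108864}}]'

# docker inspect prints a JSON array, so take the first element.
swarm = json.loads(swarm_json)[0]
k8s = json.loads(k8s_json)[0]

for path, (swarm_value, k8s_value) in diff_inspect(swarm, k8s).items():
    print(f"{path}: swarm={swarm_value!r} k8s={k8s_value!r}")
```

With real containers, the two strings would be replaced by the saved output of docker inspect for the Swarm container and for the Kubernetes container. Memory-related settings under HostConfig are a reasonable first place to look, given the question about memory limits.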

More on the causes of SIGBUS here.

-- Rico
Source: StackOverflow