I've got a problem getting an automatic heap dump written to a mounted persistent volume in Microsoft Azure AKS (Kubernetes).
The situation looks like this: the program runs out of memory as expected, but no heap dump file ever appears on the mounted volume.
What could be the reason for such behaviour?
My test program looks like this:
import java.io._

object Main {
  def main(args: Array[String]): Unit = {
    println("Before printing test info to file")
    val pw = new PrintWriter(new File("/borsuk_data/hello.txt"))
    pw.write("Hello, world")
    pw.close()
    println("Before allocating to big Array for current memory settings")
    val vectorOfDouble = Range(0, 50 * 1000 * 1000).map(x => 666.0).toArray
    println("After creating to big Array")
  }
}
My entrypoint.sh:
#!/bin/sh
java -jar /root/scala-heap-dump.jar -Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/scala-heap-dump.bin
My Dockerfile:
FROM openjdk:jdk-alpine
WORKDIR /root
ADD target/scala-2.12/scala-heap-dump.jar /root/scala-heap-dump.jar
ADD etc/entrypoint.sh /root/entrypoint.sh
ENTRYPOINT ["/bin/sh","/root/entrypoint.sh"]
My deployment yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: scala-heap-dump
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: scala-heap-dump
    spec:
      containers:
      - name: scala-heap-dump-container
        image: PRIVATE_REPO_ADDRESS/scala-heap-dump:latest
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 500m
            memory: "1Gi"
          limits:
            cpu: 500m
            memory: "1Gi"
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: dynamic-persistence-volume-claim
      dnsPolicy: ClusterFirst
      hostNetwork: false
      imagePullSecrets:
      - name: regsecret
UPDATE: As lawrencegripper pointed out, the first issue was that the pod was OOM killed because of the memory limits in the yaml. After raising the memory to 2560Mi or higher (I've tried values as generous as cpu: 1000m and memory: 5Gi), the pod is no longer reported as OOMKilled. However, no dump file is created, and a different message appears under lastState terminated: the reason is simply Error. Unfortunately that isn't very helpful. If anybody knows how to narrow it down, please help.
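These are the standard kubectl commands I know of for digging into a container that terminates with a bare Error (the pod name below is a placeholder); if anyone can suggest a better next step, please do:

# show state transitions, restart counts and recent events for the pod
kubectl describe pod <pod-name>

# stdout/stderr of the previous, terminated container instance
kubectl logs <pod-name> --previous

# cluster events recorded against the pod (failed mounts, image pulls, etc.)
kubectl get events --field-selector involvedObject.name=<pod-name>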
UPDATE 2: I've added some println calls to the code to get a better understanding of what's going on. The logs for the killed pod are:
Before printing test info to file
Before allocating to big Array for current memory settings
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:153)
at scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:151)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
at Main$.main(Main.scala:12)
at Main.main(Main.scala)
So as you can see, the program never reaches println("After creating to big Array").
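A small diagnostic like this (the object name is just for illustration) should show what the JVM actually received, which would confirm whether the -Xmx and heap-dump options are being applied at all:

import java.lang.management.ManagementFactory

object JvmArgsCheck {
  def main(args: Array[String]): Unit = {
    // Options the JVM itself was started with; -Xmx200m and the heap-dump
    // flags should appear here if they reached the JVM
    println("JVM input args: " + ManagementFactory.getRuntimeMXBean.getInputArguments)
    // Effective max heap in MB (roughly 200 when -Xmx200m is applied)
    println("Max heap (MB): " + Runtime.getRuntime.maxMemory / (1024 * 1024))
    // Anything placed after -jar on the command line ends up here instead
    println("Program args: " + args.mkString(" "))
  }
}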
I think the problem is the java command in your entrypoint.sh.
> java --help
Usage: java [options] <mainclass> [args...]
           (to execute a class)
   or  java [options] -jar <jarfile> [args...]
           (to execute a jar file)
Note that anything after -jar <jarfile> is passed as an argument to your application, not to the JVM, so your -Xmx and heap-dump options never take effect.
Try:
java -Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/scala-heap-dump.bin -jar /root/scala-heap-dump.jar
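As a quick sanity check (assuming a standard HotSpot JVM like the one in the openjdk image), you can ask the JVM to print its resolved flags without running the application and confirm the options were picked up:

# prints the final values of MaxHeapSize, HeapDumpOnOutOfMemoryError and HeapDumpPath
java -Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/scala-heap-dump.bin -XX:+PrintFlagsFinal -version | grep -E 'MaxHeapSize|HeapDumpOnOutOfMemoryError|HeapDumpPath'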
It's a long shot, but one possibility is that Kubernetes is killing the pod for breaching the memory limit set in the YAML while the JVM is building the dump, before it can write the file to disk.
Use kubectl get pod <yourPodNameHere> --output=yaml to get the pod information back, and look under lastState for a reason of OOMKilled.
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
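For example (assuming a single container in the pod), a jsonpath query like this should print just the last termination reason, e.g. OOMKilled:

# prints only the reason of the first container's last terminated state
kubectl get pod <yourPodNameHere> --output=jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'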