Running the latest Kafka Helm charts, https://github.com/confluentinc/cp-helm-charts, we are experiencing a severe, exponentially growing memory leak that causes our GCP clusters to crash after about two days of running. I have raised https://github.com/confluentinc/cp-helm-charts/issues/296, but to no avail.
I still have not been able to trace the source of the leak, but I suspect it may be the Prometheus JMX exporter sidecars.
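To rule that out, the exporter sidecars can be turned off through the chart values. A minimal sketch of the kind of override one could test is below; the key names are assumed from the component subcharts' values.yaml, so verify them against the chart version actually deployed:

```yaml
# values-override.yaml -- sketch only, keys assumed from the subcharts' values.yaml
cp-kafka:
  prometheus:
    jmx:
      enabled: false      # drop the prometheus-jmx-exporter sidecar from the broker pods
cp-zookeeper:
  prometheus:
    jmx:
      enabled: false      # same for the zookeeper pods
# repeat for any other components that expose a prometheus.jmx.enabled flag
```

Applied with something like `helm upgrade --install <release> confluentinc/cp-helm-charts -f values-override.yaml`.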
Are there any known fixes for this issue?
Update: I disabled JMX in the chart values, yet we are still experiencing massive memory growth, the majority of it coming from the control-center pod.
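As a stopgap while the leak is tracked down, one option is to bound the control-center JVM heap and set a container memory limit so the pod gets restarted rather than starving the node. A rough sketch, assuming the cp-control-center subchart's `heapOptions` and `resources` keys; the values here are purely illustrative:

```yaml
# values-override.yaml (sketch) -- cap control-center memory
cp-control-center:
  heapOptions: "-Xms512M -Xmx2G"   # explicit JVM heap bound
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 3Gi                  # leave headroom above -Xmx: Control Center keeps
                                   # Kafka Streams/RocksDB state outside the Java heap
```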
Sometimes a consumer that is stuck in a crash loop can build up ridiculous amounts of memory. We spent a long time tracking one down that filled terabytes of data rather quickly; it turned out it was submitting its crash log as a record. Double-check your consumers and producers just in case it is something similar.