I am developing a big data analysis system based on Hadoop, and I want to containerize it on Kubernetes. Currently I use the stable Hadoop Helm chart to set up Hadoop, but the data I analyze is usually up to 100 GB.
The problem is that Kubernetes only uses the capacity of the / directory; in other words, my Hadoop jobs cannot use the space on my other disks.
Is there another approach that lets Kubernetes use the other disks for container jobs (not as volumes)?
Or can I set up the Kubernetes worker nodes on a non-system disk?
> based on Hadoop
You can use a Hadoop-compatible filesystem with any system that speaks the Hadoop API, including Apache Spark on Kubernetes.
You don't need HDFS/YARN/MapReduce to be "based on Hadoop".
In other words, try storage that works properly in such an environment, such as the Rook project (Ceph) or MinIO (S3-compatible).
However, I would suggest not putting your data lake storage inside ephemeral containers.
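As a rough illustration of the first point, here is a minimal PySpark sketch that reads a dataset through the s3a connector from an S3-compatible store such as MinIO instead of HDFS. The endpoint, bucket name, column name, and credentials are placeholders you would replace with your own deployment's values, and the `hadoop-aws` jar matching your Spark build has to be on the classpath; this is just a sketch, not a drop-in configuration.

```python
from pyspark.sql import SparkSession

# Point the s3a connector at an S3-compatible object store.
# The endpoint assumes a hypothetical in-cluster MinIO service; adjust as needed.
spark = (
    SparkSession.builder
    .appName("hadoop-compatible-fs-example")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio.minio.svc.cluster.local:9000")
    .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>")        # placeholder
    .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>")        # placeholder
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .getOrCreate()
)

# Read the dataset from the object store the same way you would from HDFS.
# "datalake" and "event_type" are hypothetical bucket/column names.
df = spark.read.parquet("s3a://datalake/events/")
df.groupBy("event_type").count().show()

spark.stop()
```

Because the data lives in the object store rather than on the node's root disk, the pods themselves stay small and disposable, which is what Kubernetes expects.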