How can I read files from a MapR cluster using Go?

1/27/2022

I have a Go application running in a Kubernetes cluster which needs to read files from a large MapR cluster. The two clusters are separate and the Kubernetes cluster does not permit us to use the CSI driver. All I can do is run userspace apps in Docker containers inside Kubernetes pods and I am given maprtickets to connect to the MapR cluster.

I'm able to use the com.mapr.hadoop maprfs jar to write a Java app which is able to connect and read files using a maprticket, but we need to integrate this into a Go app, which, ideally, shouldn't require a Java sidecar process.

-- Mihai Todor
go
kubernetes
mapr

1 Answer

1/27/2022

This is a good question because it highlights the way that some environments impose limits that violate the assumptions external software may hold.

And just for reference, MapR was acquired by HPE so a MapR cluster is now an HPE Ezmeral Data Fabric cluster. I am still training myself to say that.

Anyway, the accepted method for a generic program in language X to communicate with the Ezmeral Data Fabric (the filesystem formerly known as MapR FS) is to mount the file system and just talk to it using file APIs like open/read/write and such. This applies to Go, Python, C, Julia or whatever. Inside Kubernetes, the normal way to do this mount is to use a CSI driver that has some kind of operator working in the background. That operator isn't particularly magical ... it just does what is needful. In the case of data fabric, the operator mounts the data fabric using NFS or FUSE and then bind mounts1 part of that into the pod's awareness.

But this question is cool because it precludes all of that. If you can't install an operator, then this other stuff is just a dead letter.

There are three alternative approaches that may work.

1) NFS mounts were included in Kubernetes as a native capability before the CSI plugin approach was standardized. It might still be possible to use that on a very vanilla Kubernetes cluster and that could give access to the data cluster.

2) It is possible to integrate a container into your pod that does the necessary FUSE mount in an unprivileged way. This will be kind of painful because you would have to tease apart the FUSE driver from the data fabric install and get it to work. That would let you see the data fabric inside the pod. Even then, there is no guarantee Kubernetes or the OS will allow this to work.

3) There is an unpublished Go file system client that users the low level data fabric API directly. We don't yet release that separately. For more information on that, folks should ping me directly (my contact info is everywhere ... email to ted.dunning <at> hpe.com or <at> gmail.com works)

4) The data fabric allows you to access data via S3. With the 7.0 release of Ezmeral Data Fabric, this capability is heavily revamped to give massive performance especially since you can scale up the number of gateways essentially without limit (I have heard numbers like 3-5GB/s per stateless connection to a gateway, but YMMV). This will require the least futzing and should give plenty of performance. You can even access files as if they were S3 objects.

1 https://unix.stackexchange.com/questions/198590/what-is-a-bind-mount#:~:text=A%20bind%20mount%20is%20an,the%20same%20as%20the%20original.

-- Ted Dunning
Source: StackOverflow