We are trying to run Snakemake workflows on Kubernetes on on-premise infrastructure. More precisely, we are using OpenShift OKD on top of a MapR filesystem.
We followed the official documentation command:
snakemake --kubernetes --use-conda --default-remote-provider $REMOTE --default-remote-prefix $PREFIX
But the command-line help for --default-remote-provider and --default-remote-prefix is not clear about how we should execute Snakemake pipelines on an on-premise Kubernetes or OpenShift cluster:
--default-remote-provider: choose from 'S3', 'GS', 'FTP', 'SFTP', 'S3Mocked', 'gfal', 'gridftp', 'iRODS'
Also, the official documentation states:
In this mode, Snakemake will assume all input and output files to be stored in a given remote location, configured by setting $REMOTE to your provider of choice (e.g. GS for Google cloud storage or S3 for Amazon S3) and $PREFIX to a bucket name or subfolder within that remote storage.
So I was wondering:
How should one proceed to deploy a Snakemake workflow to an on-premise OpenShift/Kubernetes installation?
Are there examples (such as a GitHub repo or blog post) of running Snakemake on on-premise clusters?
In particular, I am not sure which remote provider should be chosen, or how to provide the prefix (can it be linked to a Kubernetes Persistent Volume Claim?).
Thanks a lot for your help!
I'm not really familiar with on-premise Kubernetes setups, but this section of Snakemake's documentation on cluster execution may help.
The portion you've highlighted relates more to cloud implementations of compute clusters.
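One avenue worth exploring: if your cluster offers S3-compatible object storage (e.g. MinIO or Ceph RGW), you may be able to keep the S3 provider from that list and simply point it at your local endpoint instead of Amazon. Below is a sketch of a Snakefile header, assuming an in-cluster MinIO service at http://minio.example:9000 (the endpoint URL, bucket name, and credentials are placeholders, and you should check which keyword arguments your Snakemake version forwards to boto, e.g. endpoint_url vs. host):

```python
# Snakefile -- sketch only, not tested on OKD; endpoint and credentials are placeholders
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

# Point the S3 remote provider at an on-premise, S3-compatible store (e.g. MinIO)
# rather than Amazon S3. Keyword arguments are passed through to the boto client;
# verify the exact names accepted by your Snakemake version.
S3 = S3RemoteProvider(
    endpoint_url="http://minio.example:9000",  # assumption: in-cluster MinIO service
    access_key_id="MYACCESSKEY",
    secret_access_key="MYSECRET",
)

rule all:
    input:
        S3.remote("mybucket/results/summary.txt")
```

With a setup like this, --default-remote-provider S3 --default-remote-prefix mybucket on the command line would refer to the same store. Note that the remote provider exists because pods launched by the Kubernetes executor have no shared filesystem by default; a PVC mounted into every job pod would only be an alternative if the executor supports mounting it, which is worth checking for your Snakemake version.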