How to run Snakemake workflows on on-premise Kubernetes or OpenShift clusters?

11/1/2019

We are trying to run Snakemake workflows on on-premise Kubernetes infrastructure; more precisely, we are using OpenShift OKD on a MapR filesystem.

We followed the command from the official documentation:

snakemake --kubernetes --use-conda --default-remote-provider $REMOTE --default-remote-prefix $PREFIX

But the command-line help provided for --default-remote-provider and --default-remote-prefix is not clear about how we should execute Snakemake pipelines on an on-premise Kubernetes or OpenShift cluster:

--default-remote-provider: choose from 'S3', 'GS', 'FTP', 'SFTP', 'S3Mocked', 'gfal', 'gridftp', 'iRODS'
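For context, none of these providers is Kubernetes-specific; one common route for on-premise clusters is an S3-compatible object store (e.g. a MinIO deployment inside the cluster). Below is a minimal Snakefile sketch using Snakemake's S3 remote provider; the endpoint URL, bucket name, and credentials are assumptions, not values from our setup:

```python
# Sketch of a Snakefile targeting an on-premise S3-compatible store.
# All concrete values (endpoint, bucket, credentials) are hypothetical.
from snakemake.remote.S3 import RemoteProvider as S3RemoteProvider

S3 = S3RemoteProvider(
    access_key_id="MYKEY",            # hypothetical credentials
    secret_access_key="MYSECRET",
    # Extra keyword arguments are forwarded to boto3, so an in-cluster
    # endpoint can (reportedly) be pointed at like this:
    endpoint_url="http://minio.mynamespace.svc:9000",
)

rule all:
    input:
        # Paths are resolved relative to the bucket on the remote store.
        S3.remote("snakemake-data/results/summary.txt")
```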

Also, the official documentation states:

In this mode, Snakemake will assume all input and output files to be stored in a given remote location, configured by setting $REMOTE to your provider of choice (e.g. GS for Google cloud storage or S3 for Amazon S3) and $PREFIX to a bucket name or subfolder within that remote storage.
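If I read that correctly, with an S3-compatible store the invocation would look something like the following; the bucket name and credentials here are placeholders I made up, not a tested configuration:

```shell
# Hypothetical invocation against an S3-compatible on-premise store.
# Snakemake's S3 provider reads credentials from the standard AWS
# environment variables via boto3.
export AWS_ACCESS_KEY_ID="MYKEY"          # placeholder
export AWS_SECRET_ACCESS_KEY="MYSECRET"   # placeholder

snakemake --kubernetes --use-conda \
    --default-remote-provider S3 \
    --default-remote-prefix snakemake-data   # assumed bucket name
```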

So I was wondering:

  • How should one proceed to deploy a Snakemake workflow to an on-premise OpenShift/Kubernetes installation?

  • Is there an example (such as a GitHub repo or blog post) of running Snakemake on on-premise clusters?

  • In particular, I am not sure which remote provider should be chosen, or how to provide the prefix (can it be linked to a Kubernetes Persistent Volume Claim?)

Thanks a lot for your help!

-- vemonet
kubernetes
openshift
snakemake
workflow

1 Answer

12/13/2019

Not really familiar with on-premise Kubernetes setups, but this segment of Snakemake's documentation on cluster execution may help.

The portion you've highlighted relates more to cloud implementations of compute clusters.
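For cluster execution, Snakemake hands each job to a submission command you supply via --cluster. A minimal sketch, assuming a Slurm scheduler is reachable from wherever snakemake runs (the partition name is an assumption):

```shell
# Each rule's job is wrapped in the given submission command;
# {threads} is substituted per-rule by Snakemake.
snakemake --jobs 100 \
    --cluster "sbatch --partition=normal --cpus-per-task={threads}"
```

With this mode the shared filesystem of the cluster is used directly, so no remote provider or prefix is needed at all.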

-- MattMyint
Source: StackOverflow