Snakemake can't find GS remote file

12/13/2019

I'm trying to run a DNA sequence analysis on Google Cloud with Snakemake and Kubernetes, but Snakemake complains that the files in my GCP bucket can't be found.

This is the command I run (I've replaced my actual bucket path with 'bucketname'):

snakemake --kubernetes --use-conda --container-image 'https://hub.docker.com/r/mmyint93/snakemake_wes' --default-remote-provider GS --default-remote-prefix bucketname --debug

And I get this error message as a result:

FileNotFoundError in line 8 of /path/to/snakefile/rules/common.smk:
[Errno 2] No such file or directory: 'bucketname/LCC_one_lane_test.txt'
  File "/path/to/snakefile/Snakefile", line 1, in <module>
  File "/path/to/snakefile/rules/common.smk", line 8, in <module>
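
A note on reading this traceback: the message has the shape Python produces when a plain local open() is given a relative path that does not exist on disk. The sketch below reproduces that shape without Snakemake. It assumes that line 8 of common.smk reads the sample file with an ordinary local file call, which the post does not show; try_local_open is a hypothetical helper used only for illustration.

```python
import errno
import os
import tempfile


def try_local_open(path):
    """Try to open `path` as a local file; return the error, or None on success.

    Hypothetical helper for illustration only, not part of Snakemake.
    """
    try:
        with open(path) as handle:
            handle.read()
    except FileNotFoundError as exc:
        return exc
    return None


# Work inside an empty scratch directory so the relative path cannot exist.
with tempfile.TemporaryDirectory() as scratch:
    previous = os.getcwd()
    os.chdir(scratch)
    try:
        err = try_local_open("bucketname/LCC_one_lane_test.txt")
    finally:
        os.chdir(previous)

# The error carries errno 2 (ENOENT) and the bucket-style relative path.
print(err)
```

If that assumption holds, the error would be raised while the Snakefile is being parsed on the local machine, before any job is sent to Kubernetes.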

I've tried copying the offending file LCC_one_lane_test.txt locally and pointing to that instead of GS.remote("gs://bucketname/LCC_one_lane_test.txt"). While that resolves the issue for that one file, it causes subsequent remote files to throw errors:

Building DAG of jobs...
MissingInputException in line 1 of /path/to/snakefile/rules/mapping.smk:
Missing input files for rule bwaMEM:
bucketname/fastq/H1213_L1_2.fq.gz
bucketname/fastq/H1213_L1_1.fq.gz

I've also tried using --latency-wait 50 --wait-for-files 50 to see if the latency between my machine and the remote was the cause, but the same error is returned.

For reference, this is some sample code from my .smk rule file:

import os
import re
from itertools import chain, combinations

from snakemake.remote.GS import RemoteProvider as GSRemoteProvider

GS = GSRemoteProvider()

SAMPLE_FILE = GS.remote("gs://bucketname/LCC_one_lane_test.txt")
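
For comparison, the path in the error message ('bucketname/LCC_one_lane_test.txt') is exactly this URL with the gs:// scheme removed. The sketch below shows that relationship using only the standard library. strip_scheme is a hypothetical helper, not Snakemake's code, and it makes no claim about how Snakemake derives the path internally.

```python
from urllib.parse import urlsplit


def strip_scheme(url):
    """Return 'bucket/key' for a 'gs://bucket/key' URL.

    Hypothetical helper for illustration only, not a Snakemake API.
    """
    parts = urlsplit(url)
    # netloc holds the bucket name, path holds '/<object key>'.
    return parts.netloc + parts.path


local_style = strip_scheme("gs://bucketname/LCC_one_lane_test.txt")
print(local_style)
```

The result matches the string reported in the FileNotFoundError, which suggests the remote object is being treated as a bucket-relative path at the point where the error occurs.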

I've also tried dropping gs://bucketname from the path, but that returns the same error, just without the bucket name in the reported path. Am I overlooking something?

Extra info: my Snakemake version is 5.4.3.

-- MattMyint
google-kubernetes-engine
snakemake
