I'm trying to run a DNA sequence analysis on Google Cloud with Snakemake and Kubernetes. Snakemake complains that the files in my GCP bucket can't be found.
This is the command I run (I've replaced my actual bucket path with 'bucketname'):
snakemake --kubernetes --use-conda --container-image 'https://hub.docker.com/r/mmyint93/snakemake_wes' --default-remote-provider GS --default-remote-prefix bucketname --debug
And I get this error message as a result:
FileNotFoundError in line 8 of /path/to/snakefile/rules/common.smk:
[Errno 2] No such file or directory: 'bucketname/LCC_one_lane_test.txt'
File "/path/to/snakefile/Snakefile", line 1, in <module>
File "/path/to/snakefile/rules/common.smk", line 8, in <module>
I've tried copying the offending file, LCC_one_lane_test.txt, to my local machine and pointing to the local copy instead of GS.remote("gs://bucketname/LCC_one_lane_test.txt"). That resolves the issue for that one file, but the remote files needed later then throw errors:
Building DAG of jobs...
MissingInputException in line 1 of /path/to/snakefile/rules/mapping.smk:
Missing input files for rule bwaMEM:
bucketname/fastq/H1213_L1_2.fq.gz
bucketname/fastq/H1213_L1_1.fq.gz
I've also tried --latency-wait 50 --wait-for-files 50 to see whether latency between my machine and the remote was the cause, but the same error is returned.
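To double-check that the FASTQ objects really exist under the keys Snakemake is looking for, something like the following sketch could compare the expected keys against a bucket listing. It assumes the google-cloud-storage package is installed and application-default credentials are configured; the helper names and the fastq/{sample}_{lane}_{read}.fq.gz layout are hypothetical, inferred only from the paths in the error above.

```python
# Sketch: compare the FASTQ keys Snakemake reports as missing with what the
# bucket actually contains. Assumes `google-cloud-storage` is installed and
# application-default credentials are configured. All names are hypothetical.


def expected_fastq_keys(sample, lane, reads=(1, 2)):
    """Hypothetical naming scheme inferred from the error output:
    fastq/{sample}_{lane}_{read}.fq.gz"""
    return [f"fastq/{sample}_{lane}_{read}.fq.gz" for read in reads]


def missing_keys(expected, existing):
    """Return the expected keys that are absent from `existing`, in order."""
    existing = set(existing)
    return [key for key in expected if key not in existing]


def list_bucket_keys(bucket_name, prefix):
    """List object names under `prefix` in `bucket_name` (needs network access)."""
    # Imported lazily so the pure helpers above work without the GCS client.
    from google.cloud import storage

    client = storage.Client()
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

For example, missing_keys(expected_fastq_keys("H1213", "L1"), list_bucket_keys("bucketname", "fastq/")) should return an empty list if both reads are present. Note that GCS object names are relative to the bucket, so they do not include the bucketname/ prefix that appears in the error.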
For reference, this is the relevant code from my .smk rule file:
import os
import re
from itertools import chain, combinations

from snakemake.remote.GS import RemoteProvider as GSRemoteProvider

GS = GSRemoteProvider()

SAMPLE_FILE = GS.remote("gs://bucketname/LCC_one_lane_test.txt")
I've also tried dropping gs://bucketname from the path, but that returns the same error, just without the bucket name in it. Am I overlooking something?
Extra info: my Snakemake version is 5.4.3.
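If common.smk reads the sample sheet with plain Python while the Snakefile is being parsed (line 8 of common.smk suggests so, though that line isn't shown), one possible workaround, untested with Snakemake 5.4.3, is to download it with the google-cloud-storage client before any rules are defined, instead of wrapping it in GS.remote(...). This is a minimal sketch assuming google-cloud-storage is installed and credentials are available; split_gs_url and fetch_to_local are hypothetical helpers, not part of Snakemake.

```python
# Sketch: download the sample sheet before the rules are defined, so that
# parse-time code reads an ordinary local file. Assumes `google-cloud-storage`
# is installed and credentials are available; both helpers are hypothetical.
import os
from urllib.parse import urlparse


def split_gs_url(url):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_to_local(url, local_path):
    """Download a single GCS object to `local_path` and return that path."""
    # Imported lazily so split_gs_url can be used without the GCS client.
    from google.cloud import storage

    bucket_name, blob_name = split_gs_url(url)
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).download_to_filename(local_path)
    return local_path
```

In common.smk this would replace the GS.remote(...) line with something like SAMPLE_FILE = fetch_to_local("gs://bucketname/LCC_one_lane_test.txt", "LCC_one_lane_test.txt"). It only covers files read directly by Python code in the Snakefile; on its own it does not explain the MissingInputException for the bwaMEM FASTQ inputs.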