Below is the code I am running both on a local machine and remotely with Apache Beam on Dataflow. The job needs a lot of disk space, and it works well on the local machine once its resources are increased. On the Apache Beam workers I have tried several machine types and keep getting a root disk error. The steps for both setups are compared below; please let me know if you have any idea how to make this work.
Local machine:
n1-standard-8 (8 vCPUs, 30 GB memory)
60 GB persistent disk
Created a conda base environment with the kallisto package installed
Code:
from subprocess import Popen, PIPE, STDOUT

# conda's shell hook on the local machine; sourcing it defines `conda activate`
script = "/home/eila_orielresearch_org/etc/profile.d/conda.sh"
cmd1 = ". {}; env".format(script)  # source the hook and print the environment (debugging)
cmd2 = "echo finished kallisto"
cmd3 = "echo before init"
cmd4 = "conda init --all"
cmd5 = "conda activate"  # activate the base environment that contains kallisto
cmd6 = "kallisto quant -t 2 -i release-99_transcripts.idx --single -l 200 -s 20 -o srr SRR2144345.fastq"
cmd7 = "conda deactivate"
# All commands run in one shell, so the activation applies to the kallisto call
final = Popen("; ".join([cmd1, cmd2, cmd3, cmd4, cmd5, cmd6, cmd7]), shell=True,
              stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
stdout, _ = final.communicate()  # stderr is merged into stdout
print(stdout.decode())
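Joining the commands with `;` means the shell keeps going even if a step fails, so a kallisto error only shows up as text in the captured output. A minimal sketch, assuming the same single-shell approach, that joins the steps with `&&` and raises on a non-zero exit status (`run_chain` is a hypothetical helper name, not part of any library):

```python
import subprocess


def run_chain(commands, cwd=None):
    """Run shell commands in one shell, stopping at the first failing step.

    Joining with "&&" instead of ";" makes a failing step (for example
    `kallisto quant`) abort the chain, and its exit status is reported.
    """
    script = " && ".join(commands)
    result = subprocess.run(
        script,
        shell=True,                # one shell, so `conda activate` affects later steps
        cwd=cwd,                   # directory used to resolve relative input/output paths
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as the original Popen call does
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "command chain failed with exit code {}:\n{}".format(
                result.returncode, result.stdout))
    return result.stdout
```

Reusing the names from the code above, a local call could be `run_chain([". {}".format(script), "conda activate", cmd6])`. This sketch leaves out `conda init --all`, which edits shell startup files and should not be needed once `conda.sh` has been sourced.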
Output: kallisto runs successfully and the 3 expected output files are generated.
Apache Beam (Dataflow workers):
GoogleCloudOptions.worker_machine_type set to each of: 'n1-standard-4', 'm2-ultramem-208', 'n1-highcpu-96', 'n1-highmem-32'
Created a conda base environment with the kallisto package installed
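The machine types listed above differ in vCPUs and memory, but none of them changes the size of the worker's boot disk; Dataflow sets that separately through the `--disk_size_gb` pipeline option (`WorkerOptions.disk_size_gb` in the Python SDK). A minimal sketch of building the flags so both are set explicitly, assuming the job is launched from Python with `PipelineOptions`. The helper name, project, region, bucket and the 100 GB default are hypothetical placeholders, not values from this setup:

```python
def dataflow_worker_flags(project, region, temp_location,
                          machine_type="n1-standard-4", disk_size_gb=100):
    """Build Dataflow flags that set the worker machine type AND boot disk size.

    The machine type controls vCPUs and memory; the boot disk size is a
    separate option. All default values here are assumptions.
    """
    return [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--region={}".format(region),
        "--temp_location={}".format(temp_location),
        "--worker_machine_type={}".format(machine_type),
        "--disk_size_gb={}".format(disk_size_gb),
    ]


# Hypothetical placeholder values; replace with a real project, region and bucket.
flags = dataflow_worker_flags(
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

# With Apache Beam installed, the flags would be passed to the pipeline like:
#   from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions
#   options = PipelineOptions(flags)
#   print(options.view_as(WorkerOptions).disk_size_gb)
```

A disk comfortably larger than the 60 GB that worked locally would be a reasonable first thing to try, though whether it resolves this particular error is an assumption.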
Code:
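from subprocess import Popen, PIPE, STDOUT
import logging

# conda's shell hook on the Dataflow worker; sourcing it defines `conda activate`
script = "/opt/userowned/etc/profile.d/conda.sh"
cmd1 = ". {}; env".format(script)  # source the hook and print the environment (debugging)
cmd2 = "echo finished kallisto"
cmd3 = "echo before init"
cmd4 = "conda init --all"
cmd5 = "conda activate"  # activate the base environment that contains kallisto
cmd6 = "kallisto quant -t 2 -i release-99_transcripts.idx --single -l 200 -s 20 -o srr SRR2144345.fastq"
cmd7 = "conda deactivate"
# All commands run in one shell, so the activation applies to the kallisto call
final = Popen("; ".join([cmd1, cmd2, cmd3, cmd4, cmd5, cmd6, cmd7]), shell=True,
              stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
stdout, _ = final.communicate()  # stderr is merged into stdout
logging.info(stdout.decode())    # send the shell output to the worker logs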
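The error below points at a path under `/var/lib/docker/overlay2/`, which appears to be a Docker overlay layer, i.e. the container filesystem on the worker's boot disk. To see how much space is actually free where kallisto writes its `srr` output, a small diagnostic like this could be called on the worker before and after the `Popen` call. It is a sketch for investigation only, not a fix, and both function names are hypothetical:

```python
import logging
import os
import shutil


def log_disk_usage(path="."):
    """Log and return total/used/free space (in GB) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gb = 1024 ** 3
    stats = {
        "total_gb": usage.total / gb,
        "used_gb": usage.used / gb,
        "free_gb": usage.free / gb,
    }
    logging.info("disk usage for %s: %.1f GB total, %.1f GB used, %.1f GB free",
                 os.path.abspath(path), stats["total_gb"], stats["used_gb"],
                 stats["free_gb"])
    return stats


def dir_size_bytes(path):
    """Sum the sizes of regular files under `path` (e.g. the kallisto `srr` output dir)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

For example, `log_disk_usage(".")` before the kallisto step and `dir_size_bytes("srr")` after it would show in the worker logs whether the output is what fills the disk.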
Output (error reported on the worker):
failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay2/7c97f73cb854e5f5b092a0b58add63cd2f08475b0d487ca9ab50460b9c149f3b/diff with output stdout: 3757384 /var/lib/docker/overlay2/7c97f73cb854e5f5b092a0b58add63cd2f08475b
Thanks, eilalan