I am having trouble running a Snakemake rule that uses the sortmerna package.
I have 6 samples in fastq format. I want to process each sample with sortmerna.
The rule I am attempting to run is the following:
# Define the directory containing the FASTQ files
single_end_dir = "FASTQ/single_end"

# Define patterns to match specific files
sample_name = glob_wildcards(single_end_dir + "/{sample}.fastq").sample

rule all:
    input:
        expand("results/sortmerna_files/unpaired/rRNA/{sample}.log", sample = sample_name)

rule rna_filtering_not_paired:
    input:
        reads = "FASTQ/single_end/{sample}.fastq",
    output:
        aligned = "results/sortmerna_files/unpaired/rRNA/{sample}.log"
    params:
        aligned = "results/sortmerna_files/unpaired/rRNA/{sample}",
        other = "results/sortmerna_files/unpaired/rRNAf/{sample}",
        threads = 24
    conda:
        "../envs/fastqc.yaml"
    shell:
        """
        mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
        sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads {input.reads} --aligned {params.aligned} --other {params.other} --workdir /home/oscar/rnaseq --fastx -threads {params.threads} -v --idx-dir ./idx
        """
I then run the Snakefile with snakemake --use-conda -c 24.
I have tried moving the ./kvdb directory that sortmerna creates into a temporary directory via the params of the rule rna_filtering_not_paired, but this did not change the outcome: one job completes, but the rest fail.
I suspect that the jobs fail because of this directory, but I am unable to think of another solution.
If I run the shell commands that Snakemake prints one after another, they produce the expected files, so the problem appears to lie in the parallel use of kvdb.
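One way to test this hypothesis is to give every job its own working directory, so that no two jobs touch the same kvdb. The following is a minimal sketch, assuming that sortmerna creates kvdb beneath the directory passed to --workdir (as it appears to do here, since ./kvdb shows up in /home/oscar/rnaseq). The helper name sortmerna_command, the workdir_root location, and the default thread count are hypothetical and only serve the illustration; all flags are copied from the rule above, and the index directory is left shared as in the original command.

```python
import shlex
from pathlib import PurePosixPath

# Paths copied from the rule in the question.
REF = "/home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta"
OUT = "results/sortmerna_files/unpaired"


def sortmerna_command(sample, workdir_root="results/sortmerna_work", threads=4):
    """Build a sortmerna command line whose --workdir is unique to one sample.

    workdir_root and the threads default are hypothetical values chosen
    for illustration; they are not taken from the original rule.
    """
    # One directory per sample, so concurrent jobs do not share a kvdb.
    workdir = PurePosixPath(workdir_root) / sample
    args = [
        "sortmerna",
        "--ref", REF,
        "--reads", f"FASTQ/single_end/{sample}.fastq",
        "--aligned", f"{OUT}/rRNA/{sample}",
        "--other", f"{OUT}/rRNAf/{sample}",
        "--workdir", str(workdir),
        "--fastx",
        "-threads", str(threads),  # same flag spelling as in the original rule
        "-v",
        "--idx-dir", "./idx",  # still shared between jobs, as in the original rule
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Two of the six sample names from the log, to show that the paths differ.
    for sample in ["Zwt1_02158AAC_ATCACG", "Zcr1_02159AAC_CGATGT"]:
        print(sortmerna_command(sample))
```

In the Snakefile itself, the equivalent change would be to make the --workdir argument depend on {wildcards.sample} instead of pointing every job at /home/oscar/rnaseq. Whether this resolves the failures has not been verified here.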
The snakemake log outputs the following:
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 24
Rules claiming more threads will be scaled down.
Job stats:
job                         count
------------------------  -------
all                             1
rna_filtering_not_paired        6
total                           7
Select jobs to execute...
Execute 6 jobs...
[Mon Aug 5 10:53:53 2024]
localrule rna_filtering_not_paired:
input: FASTQ/single_end/Zwt3_02162AAC_GATCAG.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG.log
jobid: 2
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG.log
wildcards: sample=Zwt3_02162AAC_GATCAG
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/172f44aa594738803b665ab48840e734_
[Mon Aug 5 10:53:53 2024]
localrule rna_filtering_not_paired:
input: FASTQ/single_end/Zwt2_02160AAC_TTAGGC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC.log
jobid: 6
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC.log
wildcards: sample=Zwt2_02160AAC_TTAGGC
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/172f44aa594738803b665ab48840e734_
[Mon Aug 5 10:53:53 2024]
localrule rna_filtering_not_paired:
input: FASTQ/single_end/Zwt1_02158AAC_ATCACG.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG.log
jobid: 1
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG.log
wildcards: sample=Zwt1_02158AAC_ATCACG
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/172f44aa594738803b665ab48840e734_
[Mon Aug 5 10:53:53 2024]
localrule rna_filtering_not_paired:
input: FASTQ/single_end/Zcr2_02161AAC_CAGATC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC.log
jobid: 5
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC.log
wildcards: sample=Zcr2_02161AAC_CAGATC
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/172f44aa594738803b665ab48840e734_
[Mon Aug 5 10:53:53 2024]
localrule rna_filtering_not_paired:
input: FASTQ/single_end/Zcr1_02159AAC_CGATGT.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log
jobid: 4
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log
wildcards: sample=Zcr1_02159AAC_CGATGT
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/172f44aa594738803b665ab48840e734_
[Mon Aug 5 10:53:53 2024]
localrule rna_filtering_not_paired:
input: FASTQ/single_end/Zcr3_02163AAC_AGTTCC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC.log
jobid: 3
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC.log
wildcards: sample=Zcr3_02163AAC_AGTTCC
resources: tmpdir=/tmp
Activating conda environment: .snakemake/conda/172f44aa594738803b665ab48840e734_
[Mon Aug 5 10:53:53 2024]
Error in rule rna_filtering_not_paired:
jobid: 1
input: FASTQ/single_end/Zwt1_02158AAC_ATCACG.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG.log
conda-env: /home/oscar/rnaseq/.snakemake/conda/172f44aa594738803b665ab48840e734_
shell:
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zwt1_02158AAC_ATCACG.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG --other results/sortmerna_files/unpaired/rRNAf/Zwt1_02158AAC_ATCACG --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Mon Aug 5 10:53:53 2024]
Error in rule rna_filtering_not_paired:
jobid: 5
input: FASTQ/single_end/Zcr2_02161AAC_CAGATC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC.log
conda-env: /home/oscar/rnaseq/.snakemake/conda/172f44aa594738803b665ab48840e734_
shell:
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zcr2_02161AAC_CAGATC.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC --other results/sortmerna_files/unpaired/rRNAf/Zcr2_02161AAC_CAGATC --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Mon Aug 5 10:53:53 2024]
Error in rule rna_filtering_not_paired:
jobid: 2
input: FASTQ/single_end/Zwt3_02162AAC_GATCAG.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG.log
conda-env: /home/oscar/rnaseq/.snakemake/conda/172f44aa594738803b665ab48840e734_
shell:
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zwt3_02162AAC_GATCAG.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG --other results/sortmerna_files/unpaired/rRNAf/Zwt3_02162AAC_GATCAG --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Mon Aug 5 10:53:53 2024]
Error in rule rna_filtering_not_paired:
jobid: 6
input: FASTQ/single_end/Zwt2_02160AAC_TTAGGC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC.log
conda-env: /home/oscar/rnaseq/.snakemake/conda/172f44aa594738803b665ab48840e734_
shell:
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zwt2_02160AAC_TTAGGC.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC --other results/sortmerna_files/unpaired/rRNAf/Zwt2_02160AAC_TTAGGC --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Mon Aug 5 10:53:53 2024]
Error in rule rna_filtering_not_paired:
jobid: 3
input: FASTQ/single_end/Zcr3_02163AAC_AGTTCC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC.log
conda-env: /home/oscar/rnaseq/.snakemake/conda/172f44aa594738803b665ab48840e734_
shell:
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zcr3_02163AAC_AGTTCC.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC --other results/sortmerna_files/unpaired/rRNAf/Zcr3_02163AAC_AGTTCC --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Mon Aug 5 10:54:12 2024]
Error in rule rna_filtering_not_paired:
jobid: 4
input: FASTQ/single_end/Zcr1_02159AAC_CGATGT.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log
conda-env: /home/oscar/rnaseq/.snakemake/conda/172f44aa594738803b665ab48840e734_
shell:
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zcr1_02159AAC_CGATGT.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT --other results/sortmerna_files/unpaired/rRNAf/Zcr1_02159AAC_CGATGT --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job rna_filtering_not_paired since they might be corrupted:
results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-08-05T105352.555644.snakemake.log
WorkflowError:
At least one job did not complete successfully.
The dry run outputs the following:
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Job stats:
job                         count
------------------------  -------
all                             1
rna_filtering_not_paired        6
total                           7
Execute 6 jobs...
[Mon Aug 5 11:03:45 2024]
rule rna_filtering_not_paired:
input: FASTQ/single_end/Zcr2_02161AAC_CAGATC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC.log
jobid: 5
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC.log
wildcards: sample=Zcr2_02161AAC_CAGATC
resources: tmpdir=<TBD>
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zcr2_02161AAC_CAGATC.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC --other results/sortmerna_files/unpaired/rRNAf/Zcr2_02161AAC_CAGATC --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
[Mon Aug 5 11:03:45 2024]
rule rna_filtering_not_paired:
input: FASTQ/single_end/Zcr3_02163AAC_AGTTCC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC.log
jobid: 3
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC.log
wildcards: sample=Zcr3_02163AAC_AGTTCC
resources: tmpdir=<TBD>
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zcr3_02163AAC_AGTTCC.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC --other results/sortmerna_files/unpaired/rRNAf/Zcr3_02163AAC_AGTTCC --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
[Mon Aug 5 11:03:45 2024]
rule rna_filtering_not_paired:
input: FASTQ/single_end/Zwt3_02162AAC_GATCAG.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG.log
jobid: 2
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG.log
wildcards: sample=Zwt3_02162AAC_GATCAG
resources: tmpdir=<TBD>
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zwt3_02162AAC_GATCAG.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG --other results/sortmerna_files/unpaired/rRNAf/Zwt3_02162AAC_GATCAG --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
[Mon Aug 5 11:03:45 2024]
rule rna_filtering_not_paired:
input: FASTQ/single_end/Zwt2_02160AAC_TTAGGC.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC.log
jobid: 6
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC.log
wildcards: sample=Zwt2_02160AAC_TTAGGC
resources: tmpdir=<TBD>
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zwt2_02160AAC_TTAGGC.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC --other results/sortmerna_files/unpaired/rRNAf/Zwt2_02160AAC_TTAGGC --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
[Mon Aug 5 11:03:45 2024]
rule rna_filtering_not_paired:
input: FASTQ/single_end/Zwt1_02158AAC_ATCACG.fastq
output: results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG.log
jobid: 1
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG.log
wildcards: sample=Zwt1_02158AAC_ATCACG
resources: tmpdir=<TBD>
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zwt1_02158AAC_ATCACG.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG --other results/sortmerna_files/unpaired/rRNAf/Zwt1_02158AAC_ATCACG --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
[Mon Aug 5 11:03:45 2024]
rule rna_filtering_not_paired:
input: FASTQ/single_end/Zcr1_02159AAC_CGATGT.fastq
output: results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log
jobid: 4
reason: Missing output files: results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log
wildcards: sample=Zcr1_02159AAC_CGATGT
resources: tmpdir=<TBD>
mkdir -p results/sortmerna_files/unpaired/rRNA results/sortmerna_files/unpaired/rRNAf
sortmerna --ref /home/oscar/rnaseq/resources/rRNA_databases_v4/smr_v4.3_default_db.fasta --reads FASTQ/single_end/Zcr1_02159AAC_CGATGT.fastq --aligned results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT --other results/sortmerna_files/unpaired/rRNAf/Zcr1_02159AAC_CGATGT --workdir /home/oscar/rnaseq --fastx -threads 24 -v --idx-dir ./idx
rm -r ./kvdb
Execute 1 jobs...
[Mon Aug 5 11:03:45 2024]
rule all:
input: results/sortmerna_files/unpaired/rRNA/Zwt1_02158AAC_ATCACG.log, results/sortmerna_files/unpaired/rRNA/Zwt3_02162AAC_GATCAG.log, results/sortmerna_files/unpaired/rRNA/Zcr3_02163AAC_AGTTCC.log, results/sortmerna_files/unpaired/rRNA/Zcr1_02159AAC_CGATGT.log, results/sortmerna_files/unpaired/rRNA/Zcr2_02161AAC_CAGATC.log, results/sortmerna_files/unpaired/rRNA/Zwt2_02160AAC_TTAGGC.log
jobid: 0
reason: Rules with a run or shell declaration but no output are always executed.
resources: tmpdir=<TBD>
echo "I just run subrules!"
Job stats:
job                         count
------------------------  -------
all                             1
rna_filtering_not_paired        6
total                           7
Reasons:
(check individual jobs above for details)
input files updated by another job:
all
output files have to be generated:
rna_filtering_not_paired
run or shell but no output:
all
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.