Suppose I have the following input FASTQ files:
$ ls -1 inputs/
CM0009619-ABB_L01_Read1_Sample_Library_XY102.fastq.gz
CM0009619-ABB_L01_Read1_Sample_Library_XY110.fastq.gz
CM0009619-ABB_L01_Read1_Sample_Library_XY84.fastq.gz
CM0009619-ABB_L01_Read2_Sample_Library_XY102.fastq.gz
CM0009619-ABB_L01_Read2_Sample_Library_XY110.fastq.gz
CM0009619-ABB_L01_Read2_Sample_Library_XY84.fastq.gz
CM0009619-ABB_L02_Read1_Sample_Library_XY84.fastq.gz
CM0009619-ABB_L02_Read1_Sample_Library_XY88.fastq.gz
CM0009619-ABB_L02_Read2_Sample_Library_XY84.fastq.gz
CM0009619-ABB_L02_Read2_Sample_Library_XY88.fastq.gz
There are three variables: the lane identifier (L0*), the read identifier (Read*) and the sample identifier (XY*).
All samples have both Read1 and Read2. Most samples belong to just one lane, but some (e.g. XY84) belong to both. Therefore, the combination of {sample} and {lane} must be conditional.
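To make the naming scheme concrete, here is a minimal plain-Python sketch that recovers which lanes each sample appears in, directly from the file listing. The regular expression and the helper name lanes_per_sample are illustrative assumptions, not part of the original workflow.

```python
import re
from collections import defaultdict

# Assumed naming scheme, taken from the listing above:
# CM0009619-ABB_L0<lane>_Read<read>_Sample_Library_<sample>.fastq.gz
PATTERN = re.compile(
    r"^CM0009619-ABB_L0(?P<lane>\d)_Read(?P<read>[12])"
    r"_Sample_Library_(?P<sample>XY\d+)\.fastq\.gz$"
)

def lanes_per_sample(filenames):
    """Hypothetical helper: map each sample ID to the sorted lanes it appears in."""
    lanes = defaultdict(set)
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            continue  # ignore files that do not follow the naming scheme
        lanes[m["sample"]].add(int(m["lane"]))
    return {sample: sorted(found) for sample, found in lanes.items()}

files = [
    "CM0009619-ABB_L01_Read1_Sample_Library_XY102.fastq.gz",
    "CM0009619-ABB_L01_Read1_Sample_Library_XY110.fastq.gz",
    "CM0009619-ABB_L01_Read1_Sample_Library_XY84.fastq.gz",
    "CM0009619-ABB_L01_Read2_Sample_Library_XY102.fastq.gz",
    "CM0009619-ABB_L01_Read2_Sample_Library_XY110.fastq.gz",
    "CM0009619-ABB_L01_Read2_Sample_Library_XY84.fastq.gz",
    "CM0009619-ABB_L02_Read1_Sample_Library_XY84.fastq.gz",
    "CM0009619-ABB_L02_Read1_Sample_Library_XY88.fastq.gz",
    "CM0009619-ABB_L02_Read2_Sample_Library_XY84.fastq.gz",
    "CM0009619-ABB_L02_Read2_Sample_Library_XY88.fastq.gz",
]

print(lanes_per_sample(files))
# {'XY102': [1], 'XY110': [1], 'XY84': [1, 2], 'XY88': [2]}
```

Inside a Snakefile, Snakemake's own glob_wildcards() function performs a similar extraction, returning the matched values of each wildcard as lists.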
I’m trying to figure this out using a simple test workflow, but the solution has eluded me for many hours:
SAMPLES={"XY84": (1, 2),
         "XY88": 2,
         "XY102": 1,
         "XY110": 1
        }

rule all:
    input:
        expand("inputs/CM0009619-ABB_L0{lane}_Read{read}_Sample_Library_{sample}.fastq.gz", sample=SAMPLES.keys(), read=[1,2], lane=SAMPLES[sample] )

rule copy_files:
    input:
        expand("inputs/CM0009619-ABB_L0{lane}_Read{read}_Sample_Library_{sample}.fastq.gz", sample=SAMPLES.keys(), read=[1,2], lane=SAMPLES[sample] )
    output:
        expand("outputs/CM0009619-ABB_L0{lane}_Read{read}_Sample_Library_{sample}.fastq.gz", sample=SAMPLES.keys(), read=[1,2], lane=SAMPLES[sample] )
    shell:
        "cp {input} {output}"
However, I get the error:
$ snakemake --snakefile Snakefile --cores 1
NameError in line 11 of /path/to/smktests/Snakefile:
name 'sample' is not defined
File "/path/to/smktests/Snakefile", line 11, in <module>
Clearly, I cannot use the sample value inside the expand() call as a key into the SAMPLES dictionary to look up the matching lane value. However, simply using lane=[1,2] does not work either, because it generates invalid combinations (for example, XY88 in lane 1, which does not exist).
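The invalid combinations arise because expand() forms the Cartesian product of all the values it is given. The plain-Python sketch below reproduces that with itertools.product and then filters by lane. As an assumption for this sketch, every SAMPLES value is written as a tuple (unlike the Snakefile above, where single lanes are plain ints) so that the membership test works uniformly.

```python
from itertools import product

# Lanes per sample from the file listing; all values written as tuples
# (an adjustment from the question's Snakefile, made for this sketch).
SAMPLES = {"XY84": (1, 2), "XY88": (2,), "XY102": (1,), "XY110": (1,)}

PATTERN = "inputs/CM0009619-ABB_L0{lane}_Read{read}_Sample_Library_{sample}.fastq.gz"

# Every sample x read x lane=[1, 2], i.e. the full Cartesian product.
all_combos = [
    PATTERN.format(sample=s, read=r, lane=l)
    for s, r, l in product(SAMPLES, [1, 2], [1, 2])
]

# Same product, but keep only lanes that are listed for that sample.
valid = [
    PATTERN.format(sample=s, read=r, lane=l)
    for s, r, l in product(SAMPLES, [1, 2], [1, 2])
    if l in SAMPLES[s]
]

invalid = sorted(set(all_combos) - set(valid))
print(len(all_combos), len(valid), len(invalid))  # 16 10 6
```

The 10 valid names correspond to the 10 files in the listing; the other 6 are the spurious combinations.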
Is there a simple way to have the lane value depend on the sample value? Is Snakemake inappropriate for this scenario?
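One possible direction, sketched here as plain Python: because a Snakefile is Python, the list of valid file names could be built with a nested comprehension rather than with expand(), so the lane loop runs per sample. The helper names as_lanes and target_files are hypothetical, introduced only for this sketch, and the int-or-tuple handling mirrors the SAMPLES dictionary in the Snakefile above.

```python
# Same mapping as the question's Snakefile: values are either an int
# (single lane) or a tuple (several lanes).
SAMPLES = {"XY84": (1, 2), "XY88": 2, "XY102": 1, "XY110": 1}

PATTERN = "inputs/CM0009619-ABB_L0{lane}_Read{read}_Sample_Library_{sample}.fastq.gz"

def as_lanes(value):
    """Hypothetical helper: return the lanes as a tuple, whether given an int or a tuple."""
    return value if isinstance(value, tuple) else (value,)

def target_files(samples, reads=(1, 2)):
    """Hypothetical helper: emit only the (sample, lane, read) combinations that exist."""
    return [
        PATTERN.format(sample=sample, lane=lane, read=read)
        for sample, lanes in samples.items()
        for lane in as_lanes(lanes)
        for read in reads
    ]

for path in target_files(SAMPLES):
    print(path)
```

In a Snakefile, the returned list could then be given to rule all as input: target_files(SAMPLES) in place of the expand() call.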
I’m using Snakemake v5.26.1, but I could consider a more recent version if it gets me to a solution.