My workflow processes the following optional inputs from params.yaml:
- sample id and alignment (aln) file, OR
- sample id and variants (vcf) file, OR
- sample id and both alignment (aln) file and variants (vcf) file.
How do I pass the list of output files, which varies in number per sample depending on the optional inputs above (sample id + vcf only: 2 output files; sample id + aln only: 5 output files; sample id + vcf and aln: 7 output files), as a common input set to the process ga4gh_metrics? I'd appreciate any help.
params.yaml
samples:
  - biosample_id: NA12878
    aln: /data/NA12878.bam
    vcf: /data/NA12878.hard-filtered.vcf.gz
  - biosample_id: NA12877
    aln: /data/NA12877.bam
    vcf: /data/NA12877.hard-filtered.vcf.gz
OR
samples:
  - biosample_id: NA12878
    aln: /data/NA12878.bam
  - biosample_id: NA12877
    aln: /data/NA12877.bam
OR
samples:
  - biosample_id: NA12878
    vcf: /data/NA12878.hard-filtered.vcf.gz
  - biosample_id: NA12877
    vcf: /data/NA12877.hard-filtered.vcf.gz
…
Channel
samples.map { it.biosample_id }
    .set { sample_ids }
…
// channel of processed outputs for the sample list's vcf input files
Channel
    .empty()
sample_ids
    .join( count_variants.out )
    .join( bcftools_stats.out )
    .set { vcf_qc }
…
// aln can be in either bam or cram format
// channel of processed outputs for the sample list's bam input files
Channel
    .empty()
sample_ids
    .join( samtools_stats_bam.out.stats )
    .join( picard_collect_multiple_metrics_bam.out.insert_size )
    .join( picard_collect_multiple_metrics_bam.out.quality )
    .join( picard_collect_wgs_metrics_bam.out.wgs_coverage )
    .join( verifybamid2_bam.out.freemix, remainder: true )
    .set { ch_bam }
// channel of processed outputs for the sample list's cram input files
Channel
    .empty()
sample_ids
    .join( samtools_stats_cram.out.stats )
    .join( picard_collect_multiple_metrics_cram.out.insert_size )
    .join( picard_collect_multiple_metrics_cram.out.quality )
    .join( picard_collect_wgs_metrics_cram.out.wgs_coverage )
    .join( verifybamid2_cram.out.freemix, remainder: true )
    .set { ch_cram }
…
// channel to mix the bam/cram process outputs and/or the variants process outputs
ch_bam.mix(ch_cram) // .ifEmpty([]): the channel contains the sample id, so it is never fully empty even if the aln (bam/cram) inputs are not given
    //.combine(vcf_qc,by:0)
    .join(vcf_qc, remainder: true)
    .map { file -> file - null } //.map { it.minus(null) }
    //.map { sample, stats, insertsize, quality, wgs_coverage, freemix, count_variants, bcftools_stats -> [ sample, stats ?: [], insertsize ?: [], quality ?: [], wgs_coverage ?: [], freemix ?: [], count_variants ?: [], bcftools_stats ?: [] ] }
    .view()
    //.flatten()
    .set { ga4gh_metrics_in }
ga4gh_metrics( ga4gh_metrics_in )
I get the following lists.
If the input samples list (params.yaml) contains:
- only a vcf file, then the processed output to channel ga4gh_metrics is (sample id + 2 files in the list):
[NA12878, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.variant_counts.json, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.pass.stats]
- only an aln file, then the processed output to channel ga4gh_metrics is (sample id + 5 files in the list):
[NA12878, ../NPM-sample-qc/tests/NA12878_1000genomes-dragen-3.7.6/work/../NA12878.stats, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.insert_size_metrics.txt, /../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.quality_yield_metrics.txt, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878_wgs_metrics.txt, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.selfSM]
- both aln and vcf files, then the processed output to channel ga4gh_metrics is (sample id + 7 files in the list):
[NA12878, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.stats, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.insert_size_metrics.txt, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.quality_yield_metrics.txt, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878_wgs_metrics.txt, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.selfSM, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.variant_counts.json, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.pass.stats]
I filter out the null (.map { file -> file - null }) from the list if the input contains only alignment (aln) or only variants (vcf) files.
E.g. with the vcf-only input option, the list before filtering is:
[NA12878, null, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.variant_counts.json, ../NA12878_1000genomes-dragen-3.7.6/work/../NA12878.pass.stats]
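One possible way to give ga4gh_metrics a single input declaration is to collapse everything after the sample id into one list, so that every tuple has exactly two elements no matter how many files are present. The sketch below is plain Groovy and assumes the joined tuples look like the example above. packMetrics is a hypothetical helper name, and plain strings stand in for the file paths carried by the channel.

```groovy
// Hypothetical helper (not part of the original workflow):
// turn [sample, f1, null, f2, ...] into the fixed-size tuple [sample, [f1, f2, ...]].
def packMetrics = { List row ->
    def sample = row[0]
    // Skip the sample id, then drop the null placeholders left by join(remainder: true)
    def files = row.drop(1).findAll { it != null }
    [sample, files]
}

// Strings stand in for the Path objects emitted by the upstream processes
def vcfOnly = ['NA12878', null, 'NA12878.variant_counts.json', 'NA12878.pass.stats']
def packed = packMetrics(vcfOnly)
println packed
```

In the workflow, a closure like this could take the place of the null-filtering step, for example .map { row -> packMetrics(row) }. The process input could then be declared as tuple val(sample), path(metrics), since a path input accepts a list of files. The process script would then have to tell the files apart by name rather than by position.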
process ga4gh_metrics
process ga4gh_metrics {
    tag { sample }
    input:
    ??????
    //tuple val(sample), path('*')
    //tuple val(sample), path(stats), path(picard_insert_size), path(picard_quality), path(picard_wgs_coverage), path(verifybamid_freemix), path(count_variants), path(bcftools_stats) // both aln and vcf processed outputs
    //tuple val(sample), path(variantscounts), path(stats) // only vcf processed output
    //tuple val(sample), path(stats), path(picard_insert_size), path(picard_quality), path(picard_wgs_coverage), path(verifybamid_freemix) // only aln processed output