-
Notifications
You must be signed in to change notification settings - Fork 443
Addition of bbsplit for filtering of genomic contaminants #1850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Warning Newer version of the nf-core template is available 8000 . Your pipeline is using an old version of the nf-core template: 3.0.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
Tests are passing now! |
main.nf
Outdated
@@ -61,6 +61,9 @@ params.vep_genome = getGenomeAttribute('vep_genome') | |||
params.vep_species = getGenomeAttribute('vep_species') | |||
|
|||
aligner = params.aligner | |||
skip_bbsplit = params.skip_bbsplit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to stay aligned with the rest of the pipeline, can this be part of skip_tools
please. This sounds like bbsplit is always run. Given how unstable it is I would prefer it to be added to the tools that can be selected instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was introduced so bbsplit is not run by default, i.e. users have to specify --skip_bbsplit false
. I guess we should have named this flag --run_bbsplit
instead?
Ofc we could produce the same logic with skip_tools
as well and specify that if you provide bbsplit
to --skip_tools
it's actually not skipped but that seems counterintuitive.
main.nf
Outdated
bbsplit_fasta_list = params.bbsplit_fasta_list | ||
bbsplit_index = params.bbsplit_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please align these. Will the user need to provide the same fasta twice? ONce in --fasta and once here?
@@ -159,6 +165,8 @@ workflow NFCORE_SAREK { | |||
: PREPARE_GENOME.out.bwamem2 | |||
dragmap = params.dragmap ? Channel.fromPath(params.dragmap).map{ it -> [ [id:'dragmap'], it ] }.collect() | |||
: PREPARE_GENOME.out.hashtable | |||
// get index from bbsplit | |||
bbsplit_index = PREPARE_GENOME.out.bbsplit_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the indices are provided and not newly computed, should they not be assigned here?
@@ -127,6 +164,7 @@ workflow PREPARE_GENOME { | |||
known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi | |||
msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list | |||
pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi | |||
bbsplit_index = params.skip_bbsplit ? Channel.empty() : ch_bbsplit_index // Conditional emission // channel: path(bbsplit/index/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really be handled in the main workflow by only running prepare genomes if necessary
Thanks @FriederikeHanssen I'll try and get to this this week! |
} else { | ||
Channel | ||
.from(file(bbsplit_fasta_list)) | ||
.splitCsv() // Read in 2 column csv file: short_name,path_to_fasta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the samplesheet validation from nf-schema here? It comes with a bunch of neat utility functions and we have a schema to validate the file before the workflow starts
*/ | ||
|
||
// Check if file with list of fastas is provided when running BBSplit | ||
if (!params.skip_bbsplit && !params.bbsplit_index && params.bbsplit_fasta_list) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use nf-schema, we can outsource all of this to the plugin as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this!!
I added some comments in the code. In addition, before we can approve, can you please add:
- The tool to the readme overview
- Usage docs, if anything particular is to be said about that
- Metro map
- Overview map
- All new outputs should be described in the output.md
- workflow level nf-test with snapshots
Great, thanks for the helpful review! I'll try and get to most of those points this or next week.
Regarding the tests how do we approach that? is that just rerunning |
Please add a new test case. We can then trigger it only when relevant files fo bbsplit have changed + bbsplit is not part of the default execution. |
Co-authored-by: Friederike Hanssen <friederike.hanssen@seqera.io>
Addition of bbsplit for filtering of genomic contaminants. Current issues:
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).