8000 Addition of bbsplit for filtering of genomic contaminants by apsteinberg · Pull Request #1850 · nf-core/sarek · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Addition of bbsplit for filtering of genomic contaminants #1850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 51 commits into
base: dev
Choose a base branch
from

Conversation

apsteinberg
Copy link
@apsteinberg apsteinberg commented Mar 26, 2025

Addition of bbsplit for filtering of genomic contaminants. Current issues:

  • need help with nextflow_schema.json
  • fastq files get split up and it results in empty files after the FASTP. This creates an issue for bbsplit.
  • needs linting and documentation

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@nf-core-bot
Copy link
Member

Warning

Newer version of the nf-core template is available 8000 .

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@tobsecret
Copy link

Tests are passing now!

main.nf Outdated
@@ -61,6 +61,9 @@ params.vep_genome = getGenomeAttribute('vep_genome')
params.vep_species = getGenomeAttribute('vep_species')

aligner = params.aligner
skip_bbsplit = params.skip_bbsplit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to stay aligned with the rest of the pipeline, can this be part of skip_tools please. This sounds like bbsplit is always run. Given how unstable it is I would prefer it to be added to the tools that can be selected instead

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was introduced so bbsplit is not run by default, i.e. users have to specify --skip_bbsplit false. I guess we should have named this flag --run_bbsplit instead?
Ofc we could produce the same logic with skip_tools as well and specify that if you provide bbsplit to --skip_tools it's actually not skipped but that seems counterintuitive.

main.nf Outdated
Comment on lines 65 to 66
bbsplit_fasta_list = params.bbsplit_fasta_list
bbsplit_index = params.bbsplit_index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please align these. Will the user need to provide the same fasta twice? ONce in --fasta and once here?

@@ -159,6 +165,8 @@ workflow NFCORE_SAREK {
: PREPARE_GENOME.out.bwamem2
dragmap = params.dragmap ? Channel.fromPath(params.dragmap).map{ it -> [ [id:'dragmap'], it ] }.collect()
: PREPARE_GENOME.out.hashtable
// get index from bbsplit
bbsplit_index = PREPARE_GENOME.out.bbsplit_index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the indices are provided and not newly computed, should they not be assigned here?

@@ -127,6 +164,7 @@ workflow PREPARE_GENOME {
known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi
msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list
pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi
bbsplit_index = params.skip_bbsplit ? Channel.empty() : ch_bbsplit_index // Conditional emission // channel: path(bbsplit/index/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be handled in the main workflow by only running prepare genomes if necessary

@tobsecret
Copy link

Thanks @FriederikeHanssen I'll try and get to this this week!

} else {
Channel
.from(file(bbsplit_fasta_list))
.splitCsv() // Read in 2 column csv file: short_name,path_to_fasta
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the samplesheet validation from nf-schema here? It comes with a bunch of neat utility functions and we have a schema to validate the file before the workflow starts

*/

// Check if file with list of fastas is provided when running BBSplit
if (!params.skip_bbsplit && !params.bbsplit_index && params.bbsplit_fasta_list) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use nf-schema, we can outsource all of this to the plugin as well

Copy link
Contributor
@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this!!

I added some comments in the code. In addition, before we can approve, can you please add:

  • The tool to the readme overview
  • Usage docs, if anything particular is to be said about that
  • Metro map
  • Overview map
  • All new outputs should be described in the output.md
  • workflow level nf-test with snapshots

@tobsecret
Copy link

Great, thanks for the helpful review! I'll try and get to most of those points this or next week.

workflow level nf-test with snapshots

Regarding the tests how do we approach that? is that just rerunning nf-test test tests/main.nf.test --update-snapshot or do we have to actually write new test cases?

@FriederikeHanssen
Copy link
Contributor

is that just rerunning nf-test test tests/main.nf.test --update-snapshot or do we have to actually write new test cases?

Please add a new test case. We can then trigger it only when relevant files fo bbsplit have changed + bbsplit is not part of the default execution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0