-
Notifications
You must be signed in to change notification settings - Fork 859
Add new module called rundbcan and subcommands(database, cazymeannotation, easycgc, easysubstrate) #8216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I will say in future can you please make individual PRs for subtools, it will be easier and faster for you to get them reviewed one at a time than all together. |
I'm really sorry for the inconvenience. I'll split the tools next time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could each of the tools (except the database generation) create the files directly in the work directory, instead of inside a folder? You would then want to rename them with the prefix after creation, because it looks like it makes the files with identical filenames, but it'll mean that the output folders wouldn't be nested in the same way.
I'm sorry, but the run_dbcan tool won't give the output files individually; the entire output folder contains all outputs. This is an original design from a previous developer, and I kept the same idea from them. |
New rundbcan
fix github commit issue
@SPPearce Dear Simon, I've revised all issues based on your comments and passed all tests/lint. Please check it when you are available. Thank you so much for your help and time! Best Regards, |
This means that the files are not nested when they are output by the module. |
(also I accidentally closed the branch, that wasn't intentional) |
Thank you for your comment! I'm working on revising all commands following your suggestions. Best Regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at what I posted, this is what I'm suggesting. You make the files directly in the work directory, and then rename them so that they have prefix. Not into a nested subfolder, prefix/prefix_...
Hi Simon, sorry for the misunderstanding! Due to some past run_dbcan habits (adding a separate output_dir instead of "."), I didn't fully understand what you meant. I'm very sorry for that, and I will modify it according to the latest suggestion. Best Regards, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added detailed comments on cazymeannotation
.
If all the output files are consistent then please use the test assertion that I've given, that checks that everything is exactly the same rather than just the files exist.
It would also be good to add ontologies in the meta.yml files, as detailed here.
You can try using nf-core modules lint --fix
to add the structure for them if you are on v3.3.0.
tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv") , emit: dbcanhmm_results | ||
tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv"), emit: dbcansub_results | ||
tuple val(meta), path("${prefix}_diamond.out") , emit: dbcandiamond_results | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def VERSION = '5.0.4' | ||
|
||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SPPearce Hi Simon. I've revised all scripts based on your comments. I have some question for the database and easysubstrate module:
Thank you for your time and help! |
@SPPearce Hi Simon, could you please review the current package when you are available? Many thanks for considering my request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone through the first 3 subtools.
Almost there, mostly just whitespace and alignment.
If you can remove the single_end
part from the meta map details, that isn't relevant here (or basically anywhere).
modules/nf-core/rundbcan/cazymeannotation/tests/main.nf.test.snap
Outdated
Show resolved
Hide resolved
Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>
…nd from meta files
@SPPearce Hi Simon, thank you for your suggestions and help! Please review the current version and hope it looks good to you :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost there, just formatting I think.
You seem to have resolved a number of my suggested changes, without actually doing them.
We don't want lots of empty lines, which is what I'm trying to remove.
prefix = task.ext.prefix ?: "${meta.id}_dbcan_substrate" | ||
|
||
""" | ||
echo "CM000172.1|CGC122|EAL84470.1|TC|2.A.3 PUL0296_1:PUL0296::APC1503_1956:PKC87186.1:TC:gnl|TC-DB|P15993|2.A.3.1.3 30.8 439 256 8 8 411 38 463 9.08e-56 192 492 491" > PUL_blast.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this here for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the problem of the dbcan program itself, the next step file will not be generated when the result of the previous step is empty, and an error will be reported here. Therefore, a simulation file is generated to avoid program errors. The file will be overwritten. If an empty file is generated directly, it will fail the lint check due to md5 problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I still don't understand, how is this PUL_blast.out
file being used? It isn't referenced in the command in any way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the unclear description. Generally, PUL_blast.out
is an intermediate file, which is a file generated as an intermediate process of subsequent analysis. The general process of substrate prediction here is: CAZyme annotation->CGC annotation->apply CGC annotation result to do blast search againist database->substrate annotation.
Not all input files can have PUL_blast.out
results, which is related to the actual situation. But in the dbcan code (this part is not written by me, so it is difficult to change), once blast has no results, PUL_blast.out will not be generated, which will cause the subsequent program to interrupt, which will cause the test and lint of nf-core to fail, so I pre-generate a fake result to avoid this problem, and when it is actually run, the fake result will be overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
So does that happen with the test cases that you have here? If you remove that line and run the test, does it fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if I remove it, the test and lint will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. I wonder if instead we should catch the error gracefully.
Does this happen in general use too, or is it because the test data is not really suitable for the tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only for the test data. For normal usage, it would not be an issue except for some data that is similar to the test data. To avoid the potential rare issue, I added this mock file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why is it created in the main section of the module script? I would expect a mock file to be put in the test_datasets or be created in the nf.test file but not in the real module :)
@SPPearce Hi Simon, thank you again for your suggestions and help! I've fixed the issues you mentioned, and please review it when you are available :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good already just my two cents on removing some empty lines and a few minor things :)
|
||
input: | ||
tuple val(meta), path(input_raw_data) | ||
path dbcan_db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path dbcan_db | |
path dbcan_db |
tuple val(meta), path("${prefix}_overview.tsv") , emit: cazyme_annotation | ||
tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv") , emit: dbcanhmm_results | ||
tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv") , emit: dbcansub_results | ||
tuple val(meta), path("${prefix}_diamond.out") , emit: dbcandiamond_results | ||
path "versions.yml" , emit: versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tuple val(meta), path("${prefix}_overview.tsv") , emit: cazyme_annotation | |
tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv") , emit: dbcanhmm_results | |
tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv") , emit: dbcansub_results | |
tuple val(meta), path("${prefix}_diamond.out") , emit: dbcandiamond_results | |
path "versions.yml" , emit: versions | |
tuple val(meta), path("${prefix}_overview.tsv") , emit: cazyme_annotation | |
tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv") , emit: dbcanhmm_results | |
tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv"), emit: dbcansub_results | |
tuple val(meta), path("${prefix}_diamond.out") , emit: dbcandiamond_results | |
path "versions.yml" , emit: versions |
script: | ||
def args = task.ext.args ?: '' | ||
prefix = task.ext.prefix ?: "${meta.id}" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: file | ||
description: | | ||
TSV file containing the results of dbCAN CAZyme annotation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: file | ||
description: | | ||
TSV file containing the detailed dbCAN HMM results for CAZyme annotation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: file | ||
description: | | ||
TSV file containing the results of signaling transduction proteins (STP) annotation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: file | ||
description: | | ||
TSV file summarizing the total additional genes in the genome. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type: directory | ||
description: | | ||
Directory containing the synteny plots in PDF format for the CAZyme gene clusters (CGC) identified by dbCAN. This directory will contain one or more PDF files showing the syntenic regions of the CGC in the genome. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"13": [ | ||
[ | ||
{ | ||
"id": "stub" | ||
}, | ||
[ | ||
|
||
] | ||
] | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one file that is not being created!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would implement this in a way where the database name can also be defined with ext.prefix!
PR checklist
Closes #6047
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda