Add new module called rundbcan and subcommands(database, cazymeannotation, easycgc, easysubstrate) #8216

Xinpeng021001 · 2025-04-04T21:53:52Z

PR checklist

modules/nf-core/rundbcan/database/main.nf

SPPearce · 2025-04-07T08:13:03Z

I will say in future can you please make individual PRs for subtools, it will be easier and faster for you to get them reviewed one at a time than all together.

Xinpeng021001 · 2025-04-07T20:46:14Z

I will say in future can you please make individual PRs for subtools, it will be easier and faster for you to get them reviewed one at a time than all together.

I'm really sorry for the inconvenience. I'll split the tools next time.

modules/nf-core/rundbcan/easycgc/main.nf

modules/nf-core/rundbcan/easysubstrate/main.nf

SPPearce

Could each of the tools (except the database generation) create the files directly in the work directory, instead of inside a folder? You would then want to rename them with the prefix after creation, because it looks like it makes the files with identical filenames, but it'll mean that the output folders wouldn't be nested in the same way.

modules/nf-core/rundbcan/cazymeannotation/main.nf

Xinpeng021001 · 2025-04-10T03:02:06Z

Could each of the tools (except the database generation) create the files directly in the work directory, instead of inside a folder? You would then want to rename them with the prefix after creation, because it looks like it makes the files with identical filenames, but it'll mean that the output folders wouldn't be nested in the same way.

I'm sorry, but the run_dbcan tool won't give the output files individually; the entire output folder contains all outputs. This is an original design from a previous developer, and I kept the same idea from them.

New rundbcan

fix github commit issue

Xinpeng021001 · 2025-04-21T01:52:57Z

@SPPearce Dear Simon, I've revised all issues based on your comments and passed all tests/lint. Please check it when you are available. Thank you so much for your help and time!

Best Regards,
Xinpeng

SPPearce · 2025-04-21T10:33:51Z

This means that the files are not nested when they are output by the module.
You should also change the nf-test, to actually capture the snapshot to have { assert snapshot(process.out).match() }, to actually test the files are the same.
If I could push to your branch I would have just updated the files directly.

SPPearce · 2025-04-21T10:34:13Z

(also I accidentally closed the branch, that wasn't intentional)

Xinpeng021001 · 2025-04-21T13:29:25Z

This means that the files are not nested when they are output by the module. You should also change the nf-test, to actually capture the snapshot to have { assert snapshot(process.out).match() }, to actually test the files are the same. If I could push to your branch I would have just updated the files directly.

Thank you for your comment! I'm working on revising all commands following your suggestions.

Best Regards,
Xinpeng

…erlapping

SPPearce

If you look at what I posted, this is what I'm suggesting. You make the files directly in the work directory, and then rename them so that they have prefix. Not into a nested subfolder, prefix/prefix_...

modules/nf-core/rundbcan/cazymeannotation/main.nf

Xinpeng021001 · 2025-04-21T19:08:13Z

If you look at what I posted, this is what I'm suggesting. You make the files directly in the work directory, and then rename them so that they have prefix. Not into a nested subfolder, prefix/prefix_...

Hi Simon, sorry for the misunderstanding! Due to some past run_dbcan habits (adding a separate output_dir instead of "."), I didn't fully understand what you meant. I'm very sorry for that, and I will modify it according to the latest suggestion.

Best Regards,
Xinpeng

…issues)

SPPearce

I've added detailed comments on cazymeannotation.
If all the output files are consistent then please use the test assertion that I've given, that checks that everything is exactly the same rather than just the files exist.

It would also be good to add ontologies in the meta.yml files, as detailed here.
You can try using nf-core modules lint --fix to add the structure for them if you are on v3.3.0.

modules/nf-core/rundbcan/cazymeannotation/main.nf

SPPearce · 2025-04-22T06:09:16Z

modules/nf-core/rundbcan/cazymeannotation/main.nf

+    tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv")   , emit: dbcanhmm_results
+    tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv"), emit: dbcansub_results
+    tuple val(meta), path("${prefix}_diamond.out")             , emit: dbcandiamond_results
+


Suggested change

SPPearce · 2025-04-22T06:09:28Z

modules/nf-core/rundbcan/cazymeannotation/main.nf

+    def VERSION = '5.0.4'
+
+    """
+


Suggested change

modules/nf-core/rundbcan/cazymeannotation/main.nf

modules/nf-core/rundbcan/cazymeannotation/meta.yml

modules/nf-core/rundbcan/cazymeannotation/tests/main.nf.test

modules/nf-core/rundbcan/cazymeannotation/meta.yml

Xinpeng021001 · 2025-04-30T01:38:13Z

@SPPearce Hi Simon. I've revised all scripts based on your comments. I have some question for the database and easysubstrate module:

For the database, the action said during the nf-core lint, the snapshot file lacks version information, however, it did show in it.
For the easysubstrate, I create a mock file to pass the program, which may cause the empty md5 issue. However, it is needed to avoid issue because not all input file could generate that essential file.
Also when I run nf-core lint on my own sever, sometimes it said the "container" cannot be found (Unable to connect to container URL), but it could pass the CI test here.

Thank you for your time and help!

Xinpeng021001 · 2025-05-15T20:09:21Z

@SPPearce Hi Simon, could you please review the current package when you are available? Many thanks for considering my request.

SPPearce

Gone through the first 3 subtools.
Almost there, mostly just whitespace and alignment.
If you can remove the single_end part from the meta map details, that isn't relevant here (or basically anywhere).

modules/nf-core/rundbcan/cazymeannotation/tests/main.nf.test.snap

modules/nf-core/rundbcan/cazymeannotation/tests/main.nf.test

modules/nf-core/rundbcan/database/main.nf

modules/nf-core/rundbcan/easycgc/main.nf

modules/nf-core/rundbcan/easycgc/meta.yml

Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>

…nd from meta files

…into rundbcan

Xinpeng021001 · 2025-05-16T20:10:56Z

@SPPearce Hi Simon, thank you for your suggestions and help! Please review the current version and hope it looks good to you :)

SPPearce

This is almost there, just formatting I think.
You seem to have resolved a number of my suggested changes, without actually doing them.
We don't want lots of empty lines, which is what I'm trying to remove.

modules/nf-core/rundbcan/cazymeannotation/main.nf

modules/nf-core/rundbcan/cazymeannotation/tests/main.nf.test

modules/nf-core/rundbcan/easysubstrate/main.nf

SPPearce · 2025-05-18T06:22:08Z

modules/nf-core/rundbcan/easysubstrate/main.nf

+    prefix = task.ext.prefix ?: "${meta.id}_dbcan_substrate"
+
+    """
+    echo "CM000172.1|CGC122|EAL84470.1|TC|2.A.3	PUL0296_1:PUL0296::APC1503_1956:PKC87186.1:TC:gnl|TC-DB|P15993|2.A.3.1.3	30.8	439	256	8	8	411	38	463	9.08e-56	192	492	491" > PUL_blast.out


What is this here for?

Due to the problem of the dbcan program itself, the next step file will not be generated when the result of the previous step is empty, and an error will be reported here. Therefore, a simulation file is generated to avoid program errors. The file will be overwritten. If an empty file is generated directly, it will fail the lint check due to md5 problems.

Sorry, I still don't understand, how is this PUL_blast.out file being used? It isn't referenced in the command in any way.

Sorry for the unclear description. Generally, PUL_blast.out is an intermediate file, which is a file generated as an intermediate process of subsequent analysis. The general process of substrate prediction here is: CAZyme annotation->CGC annotation->apply CGC annotation result to do blast search againist database->substrate annotation.

Not all input files can have PUL_blast.out results, which is related to the actual situation. But in the dbcan code (this part is not written by me, so it is difficult to change), once blast has no results, PUL_blast.out will not be generated, which will cause the subsequent program to interrupt, which will cause the test and lint of nf-core to fail, so I pre-generate a fake result to avoid this problem, and when it is actually run, the fake result will be overwritten.

Right.
So does that happen with the test cases that you have here? If you remove that line and run the test, does it fail?

Yes, if I remove it, the test and lint will fail.

Hmmm. I wonder if instead we should catch the error gracefully.
Does this happen in general use too, or is it because the test data is not really suitable for the tool?

It's only for the test data. For normal usage, it would not be an issue except for some data that is similar to the test data. To avoid the potential rare issue, I added this mock file.

But why is it created in the main section of the module script? I would expect a mock file to be put in the test_datasets or be created in the nf.test file but not in the real module :)

modules/nf-core/rundbcan/easysubstrate/main.nf

modules/nf-core/rundbcan/cazymeannotation/main.nf

Xinpeng021001 · 2025-05-20T17:17:26Z

@SPPearce Hi Simon, thank you again for your suggestions and help! I've fixed the issues you mentioned, and please review it when you are available :).

famosab

Looks good already just my two cents on removing some empty lines and a few minor things :)

famosab · 2025-05-23T08:30:54Z

modules/nf-core/rundbcan/cazymeannotation/main.nf

+
+    input:
+    tuple val(meta), path(input_raw_data)
+    path  dbcan_db


Suggested change

path dbcan_db

path dbcan_db

famosab · 2025-05-23T08:31:28Z

modules/nf-core/rundbcan/cazymeannotation/main.nf

+    tuple val(meta), path("${prefix}_overview.tsv")                 , emit: cazyme_annotation
+    tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv")        , emit: dbcanhmm_results
+    tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv")     , emit: dbcansub_results
+    tuple val(meta), path("${prefix}_diamond.out")                  , emit: dbcandiamond_results
+    path  "versions.yml"                                            , emit: versions


Suggested change

tuple val(meta), path("${prefix}_overview.tsv") , emit: cazyme_annotation

tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv") , emit: dbcanhmm_results

tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv") , emit: dbcansub_results

tuple val(meta), path("${prefix}_diamond.out") , emit: dbcandiamond_results

path "versions.yml" , emit: versions

tuple val(meta), path("${prefix}_overview.tsv") , emit: cazyme_annotation

tuple val(meta), path("${prefix}_dbCAN_hmm_results.tsv") , emit: dbcanhmm_results

tuple val(meta), path("${prefix}_dbCANsub_hmm_results.tsv"), emit: dbcansub_results

tuple val(meta), path("${prefix}_diamond.out") , emit: dbcandiamond_results

path "versions.yml" , emit: versions

famosab · 2025-05-23T08:31:35Z

modules/nf-core/rundbcan/cazymeannotation/main.nf

+    script:
+    def args = task.ext.args ?: ''
+    prefix = task.ext.prefix ?: "${meta.id}"
+


Suggested change

famosab · 2025-05-23T08:32:09Z

modules/nf-core/rundbcan/cazymeannotation/meta.yml

+          type: file
+          description: |
+            TSV file containing the results of dbCAN CAZyme annotation.
+


Suggested change

famosab · 2025-05-23T08:32:13Z

modules/nf-core/rundbcan/cazymeannotation/meta.yml

+          type: file
+          description: |
+            TSV file containing the detailed dbCAN HMM results for CAZyme annotation.
+


Suggested change

famosab · 2025-05-23T08:38:34Z

modules/nf-core/rundbcan/easysubstrate/meta.yml

+          type: file
+          description: |
+            TSV file containing the results of signaling transduction proteins (STP) annotation.
+


Suggested change

famosab · 2025-05-23T08:38:39Z

modules/nf-core/rundbcan/easysubstrate/meta.yml

+          type: file
+          description: |
+            TSV file summarizing the total additional genes in the genome.
+


Suggested change

famosab · 2025-05-23T08:38:44Z

modules/nf-core/rundbcan/easysubstrate/meta.yml

+          type: directory
+          description: |
+            Directory containing the synteny plots in PDF format for the CAZyme gene clusters (CGC) identified by dbCAN. This directory will contain one or more PDF files showing the syntenic regions of the CGC in the genome.
+


Suggested change

famosab · 2025-05-23T08:39:06Z

modules/nf-core/rundbcan/easysubstrate/tests/main.nf.test.snap

+                "13": [
+                    [
+                        {
+                            "id": "stub"
+                        },
+                        [
+
+                        ]
+                    ]
+                ],


Here is one file that is not being created!

famosab · 2025-05-23T08:39:37Z

modules/nf-core/rundbcan/database/main.nf

I would implement this in a way where the database name can also be defined with ext.prefix!

Xinpeng021001 and others added 4 commits April 4, 2025 16:45

add rundbcan tool and subcommands

7d462a7

add rundbcan tool and subcommands

45b1648

Merge branch 'master' into rundbcan

d5e24cc

fix duplicated package

317146c

Xinpeng021001 closed this Apr 4, 2025

Delete modules/nf-core/rundbcan/rundbcan-nf-modules directory

401d675

Xinpeng021001 reopened this Apr 4, 2025

SPPearce reviewed Apr 6, 2025

View reviewed changes

modules/nf-core/rundbcan/database/main.nf Outdated Show resolved Hide resolved

SPPearce reviewed Apr 6, 2025

View reviewed changes

modules/nf-core/rundbcan/database/main.nf Outdated Show resolved Hide resolved

SPPearce reviewed Apr 8, 2025

View reviewed changes

modules/nf-core/rundbcan/easycgc/main.nf Outdated Show resolved Hide resolved

SPPearce reviewed Apr 8, 2025

View reviewed changes

modules/nf-core/rundbcan/easysubstrate/main.nf Outdated Show resolved Hide resolved

SPPearce reviewed Apr 8, 2025

View reviewed changes

modules/nf-core/rundbcan/cazymeannotation/main.nf Outdated Show resolved Hide resolved

SPPearce reviewed Apr 8, 2025

View reviewed changes

modules/nf-core/rundbcan/cazymeannotation/main.nf Show resolved Hide resolved

SPPearce reviewed Apr 8, 2025

View reviewed changes

modules/nf-core/rundbcan/cazymeannotation/main.nf Outdated Show resolved Hide resolved

Xinpeng021001 and others added 10 commits April 19, 2025 22:07

rewrite all scripts and update the dbcan version

1ce9695

Merge branch 'nf-core:master' into rundbcan

2d2840f

Merge branch 'nf-core:master' into new_rundbcan

14d2123

Delete modules/nf-core/rundbcan directory

0d7da69

Merge pull request #1 from bcb-unl/new_rundbcan

92e3695

New rundbcan

fix github commit issue

0769cdf

Delete modules/nf-core/rundbcan directory

e881f68

Merge pull request #2 from bcb-unl/new_rundbcan

5b9e24e

fix github commit issue

Delete modules/nf-core/rundbcan directory

532262d

fix meta.yml and process with lint

d8a489b

Xinpeng021001 enabled auto-merge April 21, 2025 01:50

revise test nf script; revise main.nf based on suggestion to avoid ov…

1bee314

…erlapping

SPPearce reviewed Apr 21, 2025

View reviewed changes

modules/nf-core/rundbcan/cazymeannotation/main.nf Outdated Show resolved Hide resolved

modules/nf-core/rundbcan/cazymeannotation/main.nf Outdated Show resolved Hide resolved

modules/nf-core/rundbcan/cazymeannotation/main.nf Outdated Show resolved Hide resolved

revise all scripts (change the output path, revise meta.yml, and fix …

e6ae0a0

…issues)

SPPearce reviewed Apr 22, 2025

View reviewed changes

Xinpeng021001 and others added 2 commits April 29, 2025 13:20

revise with all comments; add version function

594bf5c

Merge branch 'master' into rundbcan

9e04afb

Xinpeng021001 and others added 4 commits May 14, 2025 21:04

try to fix lint issue

10000

42f6256

Merge branch 'master' into rundbcan

ce9ad87

revise easysubstrate test

ef6a850

fix version issue

1d6da24

Xinpeng021001 enabled auto-merge May 15, 2025 20:08

SPPearce reviewed May 15, 2025

View reviewed changes

Update modules/nf-core/rundbcan/cazymeannotation/tests/main.nf.test

1b13383

Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>

Xinpeng021001 disabled auto-merge May 15, 2025 21:11

Xinpeng021001 added 2 commits May 15, 2025 16:20

fix whitespace and alignment issues; fix name issues; remove single_e…

2c0f040

…nd from meta files

Merge branch 'rundbcan' of https://github.com/bcb-unl/modules-rundbcan …

fc88e8e

…into rundbcan

Xinpeng021001 enabled auto-merge May 16, 2025 02:06

SPPearce reviewed May 18, 2025

View reviewed changes

fix issues;add version in cazymeannoation(missed before)

beadd8b

Xinpeng021001 disabled auto-merge May 20, 2025 19:46

SPPearce mentioned this pull request May 23, 2025

New pipeline: nf-core/dbcan nf-core/proposals#27

Open

9 tasks

famosab reviewed May 23, 2025

View reviewed changes

Add new module called rundbcan and subcommands(database, cazymeannotation, easycgc, easysubstrate) #8216

Are you sure you want to change the base?

Add new module called rundbcan and subcommands(database, cazymeannotation, easycgc, easysubstrate) #8216

Uh oh!

Conversation

Uh oh!

PR checklist

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!