8000 New bash files for Autometa workflow by samche42 · Pull Request #281 · KwanLab/Autometa · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

New bash files for Autometa workflow #281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Aug 5, 2022
Merged

New bash files for Autometa workflow #281

merged 14 commits into from
Aug 5, 2022

Conversation

samche42
Copy link
Collaborator

🐚 Add feature: New batch file scripts for documentation

One for when spades was used for the assembly, one where coverage is calculated using bwa/kart/samtools and one that uses the built in autometa coverages option with paired-end data. The spades one has been tested and ran successfully on the KwanLab server. A second test is ongoing on the agrp server in SA with a different, much larger dataset.

Note: For autometa_bwa_kart.sh to work, bwa kart and samtools will need to be added to the autometa conda env set up.

If you are happy with the scripts, I can create a QuickStart guide much as I did for the nextflow workflow.

PR checklist

  • [ x] This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • [ x] Have you followed the pipeline conventions in the contribution docs

@github-actions
Copy link
github-actions bot commented Jun 29, 2022

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit f5ad32f

+| ✅  62 tests passed       |+
#| ❔  34 tests were ignored |#
!| ❗   9 tests had warnings |!

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • readme - README did not have a Nextflow minimum version mentioned in Quick Start section.
  • schema_lint - Schema $id should be https://raw.githubusercontent.com/autometa/master/nextflow_schema.json
    Found https://raw.githubusercontent.com/autometa/main/nextflow_schema.json
  • schema_description - No description provided in schema for parameter: plaintext_email
  • schema_description - No description provided in schema for parameter: custom_config_version
  • schema_description - No description provided in schema for parameter: custom_config_base
  • schema_description - No description provided in schema for parameter: hostnames
  • schema_description - No description provided in schema for parameter: show_hidden_params
  • schema_description - No description provided in schema for parameter: singularity_pull_docker_container

❔ Tests ignored:

  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_exist - File is ignored: .github/workflows/branch.yml
  • files_exist - File is ignored: .github/workflows/ci.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: assets/nf-core-autometa_logo_light.png
  • files_exist - File is ignored: docs/usage.md
  • files_exist - File is ignored: docs/output.md
  • files_exist - File is ignored: docs/images/nf-core-autometa_logo.png
  • files_exist - File is ignored: docs/images/nf-core-autometa_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-autometa_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/bug_report.md
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/feature_request.md
  • nextflow_config - nextflow_config
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File does not exist: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File does not exist: assets/nf-core-autometa_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-autometa_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-autometa_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or foo
  • actions_ci - '.github/workflows/ci.yml' not found
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/Autometa/Autometa/.github/workflows/awstest.yml
  • template_strings - template_strings

✅ Tests passed:

Run details

  • nf-core/tools version 2.2
  • Run at 2022-08-05 14:24:28

Copy link
Collaborator
@evanroyrees evanroyrees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In each workflow's current state, required flags are not required and the script attempts to run each command even when no flags are supplied. There are also some commands that may cause unintended side-effects that should be removed.

Each of these workflows is mostly copy/paste with the exception of the coverage table generation sub-workflow. Do you think it would be more appropriate to mention coverage in the filenames? I'm not certain whether this is necessary, but curious to hear your thoughts.

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>
8000
samche42 added 4 commits July 13, 2022 11:08
Added features:
 - User can choose between calculating coverage from spades headers or from builtin paired-end coverage calcs.
 - Checks that all required flags are present
 - Checks that correct conda env is activated
  -Confirms that ncbi files can be found. If not provides command to download files but does not actually download the db for the user. 
 - Camel case has also been converted to snake case throughout.
 - Various steps are also now sent to the std out file to track and clarify where things may go awry for a user. 
 - Fiddling with automata config files removed

Example usage:

For Spades coverage:
sbatch autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Spades -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v spades

For paired-end cov:
sbatch autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.2.fastq.gz

Tests run to confirm errors pop up as expected:
#Return error that incorrect option provided
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v Paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.2.fastq.gz

#Return error that fwd reads not provided
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired


#Return error that rev reads not provided
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz

#Return error the fwd read file doesn't exist
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10fastq.gz -r /home/sam/test/10.2.fastq.gz

#Return error the rev read file doesn't exist
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.r.fastq.gz

#When in conda base env, return that env is incorrect
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.2.fastq.gz

All tests were repeated for spades coverage calc version. 

Similarly, the script was run using metagenomic sequence data from a Lagria beetle and successfully generated bins in both "spades" and "paired" mode.
Copy link
Collaborator Author
@samche42 samche42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added features:

  • User can choose between calculating coverage from spades headers or from builtin paired-end coverage calcs.
  • Checks that all required flags are present
  • Checks that correct conda env is activated
    -Confirms that ncbi diamond-formatted nr.dmnd file can be found. If not provides command to download files but does not actually download the db for the user.
  • Camel case has also been converted to snake case throughout.
  • Various steps are also now sent to the std out file to track and clarify where things may go awry for a user.
  • Fiddling with automata config files removed

Example usage:

For Spades coverage:
sbatch autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Spades -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v spades

For paired-end cov:
sbatch autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.2.fastq.gz

Tests run to confirm errors pop up as expected:
#Return error that incorrect option provided
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v Paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.2.fastq.gz

#Return error that fwd reads not provided
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired

#Return error that rev reads not provided
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz

#Return error the fwd read file doesn't exist
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10fastq.gz -r /home/sam/test/10.2.fastq.gz

#Return error the rev read file doesn't exist
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.r.fastq.gz

#When in conda base env, return that env is incorrect
bash autometa_flagged.sh -o /home/sam/test -a /home/sam/test/10_scaffolds.fasta -s 10_Paired_end -c 16 -n /media/bigdrive1/Databases/autometa_databases -m /home/sam/Tools/Autometa/autometa/databases/markers -l 1000 -v paired -f /home/sam/test/10.1.fastq.gz -r /home/sam/test/10.2.fastq.gz

All tests were repeated for spades coverage calc version.

Similarly, the script was run using metagenomic sequence data from a Lagria beetle and successfully generated bins in both "spades" and "paired" mode.

Copy link
Collaborator
@evanroyrees evanroyrees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few minor nitpicks and questions. This is looking great so far! 👷‍♀️ 👷‍♂️ 🛠️ 👍

Comment on lines +133 to +138
if [ -f "$ncbi/nr.dmnd" ]; then
echo "NCBI files are present where specified, moving on..."
else
echo "NCBI files not found where specified. Please run autometa-update-databases --update-ncbi to download"
exit
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodes.dmp, names.dmp, merged.dmp, delnodes.dmp (from taxdump.tar.gz) as well as prot.accession2taxid.gz are also used but not checked here. Should the echo statement only mention nr.dmnd or should this also check whether these other files exist?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually didn't know I needed to check for those. It was just a quick way to check if the diamond database was there, like a quick diagnostic - not meant to be a comprehensive check. Can add them in if you think necessary though?

Co-authored-by: Evan Rees <25933122+WiscEvan@users.noreply.github.com>
Copy link
Collaborator Author
@samche42 samche42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's everything you requested for change covered? Let me know if I missed anything :)

@evanroyrees evanroyrees mentioned this pull request Aug 1, 2022
3 tasks
Copy link
Collaborator
@evanroyrees evanroyrees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor nitpicks regarding variables and help text. I encountered one bug and some code was a bit out of order. I think most is covered with the code suggestions. I can try to get around to following up on the suggested edits. I think there are a few more entrypoints that may be echoed using the set -x then { set +x; } 2>/dev/null syntax. Examples are available in the suggestions. This is looking great!

echo "-n Path to NCBI databases directory"
echo "-m Path to marker files directory"
echo "-l Contig length cutoff in base pairs. (NOTE: Change this according to the metagenome assembly's N50)"
echo "-v Coverage calculation. Options are 'spades' or 'paired'. If you assembled your metagenome with spades, choose the spades option. If you used a different assembler, please use the 'paired' option for the coverage calculation. Paired reads used for assembly are required as input"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this help text be broken up across multiple lines or made more concise?

- 🐛 Fix kmers check prior to `autometa-binning`
- 📝🐚 echo autometa entrypoints using `set -x` and `{ set +x; } 2>/dev/null` syntax
- 🔥 Remove ORFs generation check
- prepend `set -e` to cause immediate exit of the workflow for non-zero exit code
- 📝🐚 Update FileNotFound errors to also emit the corresponding parameter
- 📝 Replace `densne` under choices for `embed_method` with `densmap`
- 🎨🐚 Move inputs echo to follow parameter checks
📝 Clarify parameter flag when user does not provide correct input
Copy link
Collaborator
@evanroyrees evanroyrees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. After merge, this will need to be updated in PR #284

@evanroyrees evanroyrees merged commit ef08847 into dev Aug 5, 2022
@evanroyrees evanroyrees deleted the new_bash_scripts branch August 5, 2022 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0