JacquelineAldridge/nanotax: a pipeline for the characterization of microbial communities with Nanopore

catg-umag/nanotax

Nextflow | run with conda | run with docker | run with singularity

Introduction

catg-umag/nanotax is a bioinformatics pipeline for the analysis of 16S rRNA gene sequencing data obtained by Nanopore sequencing. It takes as input a samplesheet listing POD5 or FASTQ files, with optional barcodes and groups. It performs basecalling and demultiplexing, quality control (QC), taxonomic assignment against reference databases, functional prediction, and alpha diversity analysis, and produces tables and plots of all results. Every step except taxonomic assignment is optional.

The pipeline runs the following steps:

  1. Basecalling and demultiplexing with Dorado.
  2. Read QC with FastQC and NanoQ.
  3. Quality and length filtering with NanoQ.
  4. Quality-based subsampling with Filtlong.
  5. QC report for raw reads with MultiQC.
  6. Taxonomic assignment of reads with MMseqs2 against the GenBank or SILVA database.
  7. (Optional) Alpha diversity metrics with Vegan (R).
  8. (Optional) Functional prediction with PICRUSt2.
  9. (Optional) Differential expression for functional prediction with LEfSe.

All plots and tables are generated using Python, through either pandas or polars.
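
The pipeline computes alpha diversity with Vegan in R. As a rough illustration of what such metrics measure, the Python sketch below computes the Shannon and Simpson indices from per-taxon read counts. The function names and the counts are hypothetical and are not taken from the pipeline's code.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson index in the 1 - sum(p_i^2) form."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical read counts per taxon for a single sample.
example_counts = [50, 30, 20]
print(shannon(example_counts), simpson(example_counts))
```

Both indices grow as reads are spread more evenly across more taxa; a sample with a single taxon scores 0 on both.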

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

The columns in the sample file will vary depending on the analyses to be performed. Each row represents a sample, specifying either its associated FASTQ file or the barcode used during sequencing.

By default, the pipeline performs quality control and taxonomic assignment. For these tasks, the sample file must include the columns "sample" and "fastq".

sample,fastq
sample_1,sample_1.fastq.gz
sample_2,sample_2.fastq.gz

If you wish to start from the basecalling step, replace the fastq column with a barcode column and specify the directory containing the POD5 files with the --basecalling.pod5_dir parameter.

sample,barcode
sample_1,barcode01
sample_2,barcode02

To perform diversity analyses or differential expression of metabolic pathways, the sample file must include a group column.

sample,fastq,group
sample_1,sample_1.fastq.gz,G1
sample_2,sample_2.fastq.gz,G2
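
The three samplesheet layouts above can be checked before launching a run. The following is a minimal sketch, not the pipeline's own validation: the function name is hypothetical, and two rules are assumptions (sample names must be unique, and fastq takes precedence if both fastq and barcode are present).

```python
import csv
import io

def inspect_samplesheet(text):
    """Return (input_mode, has_group) for a samplesheet given as CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = set(reader.fieldnames or [])
    if "sample" not in columns:
        raise ValueError("samplesheet needs a 'sample' column")
    # Assumption: if both columns are present, FASTQ input takes precedence.
    if "fastq" in columns:
        mode = "fastq"
    elif "barcode" in columns:
        mode = "barcode"
    else:
        raise ValueError("samplesheet needs a 'fastq' or 'barcode' column")
    names = [row["sample"] for row in reader]
    # Assumption: each sample name may appear only once.
    if len(names) != len(set(names)):
        raise ValueError("sample names must be unique")
    return mode, "group" in columns

sheet = "sample,fastq,group\nsample_1,sample_1.fastq.gz,G1\nsample_2,sample_2.fastq.gz,G2\n"
print(inspect_samplesheet(sheet))
```

The second element of the result says whether a group column is present, which is what the group-dependent analyses require.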

Now, you can run the pipeline using:

nextflow run catg-umag/nanotax \
   -profile <docker/apptainer/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
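
One way to follow this advice is to generate a JSON params file and pass it with -params-file. The sketch below assumes that dotted parameters such as qc.min_length map to nested keys in the file, which is how Nextflow usually treats nested parameters but is not confirmed for this pipeline. The file name params.json and the output directory results are placeholders.

```python
import json
import shlex

# Parameter names come from the Parameters section; the values are examples.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",  # placeholder output directory
    "qc": {"min_length": 1000, "max_length": 2000, "min_qscore": 15},
    "taxonomic_assignment": {"db_name": "silva"},
}

# Text to save as params.json (placeholder file name).
params_json = json.dumps(params, indent=2)

command = [
    "nextflow", "run", "catg-umag/nanotax",
    "-profile", "docker",
    "-params-file", "params.json",
]

print(params_json)
print(shlex.join(command))
```

Keeping parameters in a file makes a run easier to reproduce than a long list of command-line flags.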

Parameters

General parameters

The following parameters can be modified to enable or disable specific modules:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| basecalling.run | boolean | Run basecalling and demultiplexing | false |
| qc.run | boolean | Run quality control | true |
| diversity.run | boolean | Run the diversity analysis module | true if the samplesheet has groups |
| functional_pred.run | boolean | Run functional prediction | true if the samplesheet has groups |
| exclude | list | Samples to exclude from the analyses; their read quality and read counts are still reported | [] |
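
The defaults above depend on the samplesheet. The sketch below shows how they could be resolved. It is illustrative only: the function names are hypothetical, and the actual logic lives in the pipeline's Nextflow code.

```python
def resolve_modules(columns, overrides=None):
    """Return the run flag of each module for a samplesheet with these columns."""
    has_group = "group" in columns
    flags = {
        "basecalling.run": False,
        "qc.run": True,
        "diversity.run": has_group,
        "functional_pred.run": has_group,
    }
    # Explicit user settings take precedence over the defaults.
    flags.update(overrides or {})
    return flags

def samples_for_analysis(samples, exclude=()):
    """Drop excluded samples from downstream analyses, keeping input order."""
    excluded = set(exclude)
    return [name for name in samples if name not in excluded]

print(resolve_modules(["sample", "fastq", "group"]))
print(samples_for_analysis(["sample_1", "sample_2"], exclude=["sample_2"]))
```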

Each module has specific parameters that can be configured when enabled.

Basecalling module

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| basecalling.pod5_dir | string | Directory containing POD5 files | input/pod5 |
| basecalling.gpus | integer | Number of GPUs to use | 1 |
| basecalling.dorado_basecalling_model | string | Basecalling model to use (fast, hac, sup) | sup |
| basecalling.qscore_filter | integer | Q-score threshold for passing and failing reads | 10 |
| basecalling.barcoding_kit | string | Barcoding kit used for multiplexing | SQK-16S114-24 |
| basecalling.save_reads | boolean | Save reads after basecalling and demultiplexing in the results directory | false |

QC module

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| qc.subsampling | integer | Number of reads to keep when subsampling | 100000 |
| qc.min_length | integer | Minimum required length for a read | 1000 |
| qc.max_length | integer | Maximum allowed length for a read | 2000 |
| qc.min_qscore | integer | Minimum Q-score | 15 |
| qc.save_reads | boolean | Save reads after quality control in the results directory | false |
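
To make the length and Q-score thresholds concrete, here is a minimal Python sketch of a per-read filter using the default values above. It is not how NanoQ is implemented. Two things are assumptions: the mean Q-score is computed by averaging per-base error probabilities, and the length bounds are inclusive.

```python
import math

def mean_qscore(quality, offset=33):
    """Mean Q-score of a Phred+33 quality string, averaged in probability space."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))

def passes_qc(sequence, quality, min_length=1000, max_length=2000, min_qscore=15):
    """Apply the length filter first, then the mean Q-score filter."""
    if not (min_length <= len(sequence) <= max_length):
        return False
    return mean_qscore(quality) >= min_qscore

# Hypothetical 1500 bp read whose bases all have Q20 ('5' in Phred+33).
print(passes_qc("A" * 1500, "5" * 1500))
```

Averaging error probabilities rather than raw Q-scores penalizes reads that have a few very poor bases, which is why the two averages can differ noticeably on Nanopore data.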

Taxonomic assignment module

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| taxonomic_assignment.min_aln | integer | Minimum alignment length to retain an alignment | 1000 |
| taxonomic_assignment.min_identity | number | Minimum sequence identity between the read and database hit (range: 0–1) | 0.95 |
| taxonomic_assignment.download_db | boolean | Download the database from the internet | true |
| taxonomic_assignment.db_name | string | Database name to use (genbank or silva) | genbank |
| taxonomic_assignment.db_dir | string | Directory containing the database (required if download_db is false) | (empty) |
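
As an illustration of how min_aln and min_identity act on alignments, the sketch below filters hits in a 12-column tab-separated layout (query, target, fident, alnlen, and so on), which is assumed here to be the format of the MMseqs2 output. The function name and the example hits are hypothetical, and the pipeline's actual filtering may differ.

```python
import csv
import io

# Assumed column order of the tab-separated alignment output.
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]

def filter_hits(m8_text, min_aln=1000, min_identity=0.95):
    """Keep (query, target) pairs whose alignment meets both thresholds."""
    reader = csv.DictReader(io.StringIO(m8_text), fieldnames=M8_COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        # fident is a fraction in the range 0-1, matching min_identity.
        if int(row["alnlen"]) >= min_aln and float(row["fident"]) >= min_identity:
            kept.append((row["query"], row["target"]))
    return kept

# Hypothetical hits: only the first one passes both thresholds.
hits = (
    "read_1\tref_A\t0.980\t1450\t29\t0\t1\t1450\t1\t1450\t0.0\t2500\n"
    "read_2\tref_B\t0.900\t1500\t150\t0\t1\t1500\t1\t1500\t0.0\t2000\n"
    "read_3\tref_C\t0.990\t600\t6\t0\t1\t600\t1\t600\t0.0\t1100\n"
)
print(filter_hits(hits))
```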

The diversity and functional prediction modules do not have specific parameters associated with them.

Credits

catg-umag/nanotax was originally written by JacquelineAldridge.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
