USAGE
Nextflow workflows are generally portable because each process runs in a container. Grandeur therefore requires either Docker or Singularity.
Installation is covered on a different wiki page.
Grandeur can run as a standalone workflow starting from paired-end Illumina fastq files, or it can start from contig/fasta files from other sources, including:
- fasta/contig files from PHOENIX
- fasta files from Donut Falls
- fasta files downloaded from NCBI
We have created some examples for typical use-cases on a different wiki page. We welcome additional suggestions to this page, and communications with us may be anonymized and used to augment this wiki.
For simplicity, Grandeur has some profiles which should fit the majority of uses. The workflow is meant to work with containers and has basic profiles for both docker and singularity.
- singularity : use singularity to manage containers
- docker : use docker to manage containers
All test profiles download reads from SRA accessions with fasterq-dump. As such, if fasterq-dump fails due to authentication issues, the entire workflow will fail. Most information about the test subworkflow can be found on a different wiki page.
Selected test profiles:
- test0 : default values for fastq files
- test2 : default phylogenetic analysis
- msa : multiple sequence alignment of the input files with roary (all inputs should be related)
- just_msa : multiple sequence alignment of the input files with roary (all inputs should be related), with processes not directly used turned off
- uphl : the profile used at UPHL (is not intended to work on other systems)
WARNING: All input files for the *msa* profiles must be somewhat related (i.e., the same species) because they need to share enough genes in their core genome.
For all settings and inputs, the results are copied to the directory specified with the 'outdir' param. The default is 'grandeur'.
Place paired-end fastq reads (file names ending in 'fastq', 'fastq.gz', 'fq', or 'fq.gz') in a single directory, for example 'directory/reads'
(can be set in a config file with 'params.reads')
directory
└── reads
    └── *fastq.gz
Usage:
nextflow run UPHL-BioNGS/Grandeur -profile docker --reads directory/reads
When using a sample sheet, Grandeur expects a csv file with the columns 'sample', 'fastq_1', and 'fastq_2'.
- sample : value used in Grandeur for filenames
- fastq_1 : forward read or read 1 of a paired-end fastq file
- fastq_2 : reverse read or read 2 of a paired-end fastq file
Example sample sheet:
sample,fastq_1,fastq_2
SRR11725329,/home/eriny/sandbox/test_files/grandeur/reads/SRR11725329_1.fastq.gz,/home/eriny/sandbox/test_files/grandeur/reads/SRR11725329_2.fastq.gz
SRR13643280,/home/eriny/sandbox/test_files/grandeur/reads/SRR13643280_1.fastq.gz,/home/eriny/sandbox/test_files/grandeur/reads/SRR13643280_2.fastq.gz
SRR14436834,/home/eriny/sandbox/test_files/grandeur/reads/SRR14436834_1.fastq.gz,/home/eriny/sandbox/test_files/grandeur/reads/SRR14436834_2.fastq.gz
SRR14634837,/home/eriny/sandbox/test_files/grandeur/reads/SRR14634837_1.fastq.gz,/home/eriny/sandbox/test_files/grandeur/reads/SRR14634837_2.fastq.gz
SRR7738178,/home/eriny/sandbox/test_files/grandeur/reads/SRR7738178_1.fastq.gz,/home/eriny/sandbox/test_files/grandeur/reads/SRR7738178_2.fastq.gz
SRR7889058,/home/eriny/sandbox/test_files/grandeur/reads/SRR7889058_1.fastq.gz,/home/eriny/sandbox/test_files/grandeur/reads/SRR7889058_2.fastq.gz
Usage:
nextflow run UPHL-BioNGS/Grandeur -profile docker --sample_sheet sample_sheet.csv
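Writing a sample sheet by hand gets tedious for larger runs. The following is a minimal sketch, not part of Grandeur itself, that builds one from a directory of reads. It assumes files are named '<sample>_1.fastq.gz' and '<sample>_2.fastq.gz' (as in the example above); the function name make_sample_sheet is hypothetical, and other naming schemes would need a different pattern.

```shell
#!/usr/bin/env bash
# Sketch: build a sample sheet with the columns sample,fastq_1,fastq_2.
# Assumption (not from the Grandeur docs): reads are named
# <sample>_1.fastq.gz and <sample>_2.fastq.gz inside one directory.

make_sample_sheet() {
  local reads_dir r1 r2 sample
  # Resolve to an absolute path so the sheet works from any directory.
  reads_dir="$(cd "$1" && pwd)" || return 1

  echo "sample,fastq_1,fastq_2"
  for r1 in "$reads_dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue                 # no matches: skip the literal glob
    r2="${r1%_1.fastq.gz}_2.fastq.gz"        # expected mate of the forward read
    if [ ! -e "$r2" ]; then
      echo "warning: no reverse read for $r1, skipping" >&2
      continue
    fi
    sample="$(basename "$r1" _1.fastq.gz)"   # sample name = file name prefix
    echo "$sample,$r1,$r2"
  done
}

# Example use:
#   make_sample_sheet directory/reads > sample_sheet.csv
```

Forward reads without a matching reverse read are reported on stderr and left out of the sheet, so the resulting csv only lists complete pairs.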
Fasta files could be from prior versions of Grandeur, created by another workflow, or downloaded from NCBI. There are two ways to read fasta files into Grandeur.
Option 1 : Putting all the fasta files into a single directory (must end in '.fasta', '.fa', or '.fna')
directory
└── fastas
    └── *fasta
Then run the nextflow command:
nextflow run UPHL-BioNGS/Grandeur -profile docker --fastas directory/fastas
Option 2 : Listing the fasta files in a file and specifying via --fasta_list (which is similar to a sample sheet)
Example fasta list:
sample1.fasta
sample2.fasta
sample3.fasta
Then run the nextflow command:
nextflow run UPHL-BioNGS/Grandeur -profile docker --fasta_list fastas.txt
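A fasta list can also be generated rather than typed. The sketch below is illustrative only: the function name make_fasta_list is hypothetical, and it writes absolute paths on the assumption that Grandeur resolves each line of the list as a file path. It picks up the three extensions named in Option 1.

```shell
#!/usr/bin/env bash
# Sketch: list every fasta file in a directory, one path per line.
# Assumption (not confirmed by the Grandeur docs): absolute paths are
# accepted in the file passed to --fasta_list.

make_fasta_list() {
  local fasta_dir f
  fasta_dir="$(cd "$1" && pwd)" || return 1
  # Extensions taken from Option 1: .fasta, .fa, and .fna
  for f in "$fasta_dir"/*.fasta "$fasta_dir"/*.fa "$fasta_dir"/*.fna; do
    [ -e "$f" ] || continue      # skip patterns that matched nothing
    echo "$f"
  done
}

# Example use:
#   make_fasta_list directory/fastas > fastas.txt
```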
PHOENIX is a nextflow workflow developed for the identification of known antimicrobial resistance (AMR) genes, and one of its core features is de novo assembly to generate contig files. The authors of Grandeur do not see any real benefit in running a new de novo assembly on the same reads, so the resulting contig/fasta files from PHOENIX can be used as input instead of fastq files.
Copy the PHOENIX-generated contig files to a directory, and then specify that directory with '--fastas ' or set 'params.fastas = ' in a config file.
(can be set in a config file with 'params.fastas')
directory
└── fastas
    └── *fasta
Usage:
nextflow run UPHL-BioNGS/Grandeur -profile docker --fastas directory/fastas
In practice, Grandeur will attempt to run any fasta file it is given through its subworkflows and processes, so we request that users only post issues about using microbial sequence files. (Do not give Grandeur [Candida] auris files!)
Grandeur is a nextflow workflow that should work on:
- local linux instances (as long as the cpu and memory are sufficient for the tools used)
- HPC environments
- cloud-based systems that support nextflow workflows (such as AWS)
Each of these environments may need inputs from the user.
More information can be found in Nextflow's documentation. A highlighted list of pages that may be useful includes:
- information about setting an executor : https://www.nextflow.io/docs/latest/executor.html
- tips for hpc users : https://www.nextflow.io/blog/2021/5_tips_for_hpc_users.html
- using nextflow in AWS cloud : https://www.nextflow.io/docs/latest/awscloud.html
A copy of an editable config file with many of the params needed for some of these options can be obtained with the following command:
nextflow run UPHL-BioNGS/Grandeur --config_file true
More information about config files can be found on a different page of this wiki.
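For runs that reuse the same inputs, a small custom config can replace long command lines. The sketch below writes one with the 'params.reads' and 'outdir' params mentioned above; the function name write_grandeur_config and the file name my_grandeur.config are hypothetical, and the example assumes 'outdir' is set as 'params.outdir' in a config file. The generated template from '--config_file true' remains the more complete starting point.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Nextflow config for a Grandeur run.
# Only params named elsewhere on this page are used; the function and
# file names are illustrative, not part of Grandeur.

write_grandeur_config() {
  local config_path="$1" reads_dir="$2" out_dir="$3"
  cat > "$config_path" <<EOF
// Minimal config for Grandeur (illustrative sketch)
params.reads  = '$reads_dir'
params.outdir = '$out_dir'
EOF
}

# Example use:
#   write_grandeur_config my_grandeur.config directory/reads grandeur
#   nextflow run UPHL-BioNGS/Grandeur -profile docker -c my_grandeur.config
```

The '-c' option is Nextflow's standard way to add a config file on top of the workflow defaults.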
- amrfinderplus
- bbduk
- blastn
- blobtools_*
- core_genome_evaluation
- circulocov
- datasets_*
- drprg
- elgato
- emmtyper
- fastani
- fastp
- fastqc
- heatcluster
- iqtree2
- kaptive
- kleborate
- kraken2
- mash_*
- mashtree
- mlst
- multiqc
- mykrobe
- panaroo
- pbptyper
- phytreeviz
- plasmidfinder
- prokka
- quast
- seqsero2
- serotypefinder
- shigatyper
- snp_dists
- spades