FindFungi-v0.23.3

A pipeline for the identification of fungi in public metagenomics datasets. The FindFungi pipeline uses the metagenomics read-classifier Kraken with 32 custom fungal databases to generate 32 taxon predictions for a single read. These 32 predictions are combined to generate a consensus prediction. All reads are then BLASTed against their predicted genomes to generate read distribution skewness scores to select for the most likely true positives.

FindFungi-0.23 was built on an IBM platform load-sharing facility with 32 worker nodes. If you would like a SLURM version of this pipeline, or a version that can run on a standard Unix/Ubuntu machine, please navigate to the bottom of this README.

Quickstart

Download the pipeline, databases, associated scripts, prerequisites and other tools. Run the pipeline:

./FindFungi-0.23.3.sh /path/to/FASTQ-file.fastq Dataset-name

Getting Started

These instructions will hopefully allow you to get a copy of FindFungi up and running on your own compute-cluster/server for development or your own analyses. If using a non-IBM LSF compute cluster, change the 'bsub' commands to reflect your architecture. If using a single server, remove the 'bsub' commands.

Prerequisites

FindFungi v0.23 was built using the following:

gcc version 4.4.4 20100726 (Red Hat 4.4.4-13)
coreutils 8.27
python 2.7.13 (modules: sys, os, ete3, biopython (Bio), math, argparse, itertools, collections, re)
- Don't use conda to install these modules, please use pip
skewer 0.2.2
kraken 0.10.5-beta
ncbi blast 2.2.30
Rscript 3.3.3 (packages: wordcloud)
graphviz 2.40.1

Installing

Download all of the scripts from GitHub/GiantSpaceRobot and move to a directory (/your/directory/scripts). You may need to give these scripts more permissions (e.g. chmod 755 *).
In the FindFungi-v0.23.3 script, change the absolute paths of skewer, kraken, blast, the shell and python scripts to reflect your environment, or add these tools and scripts to you $PATH. You will also need to edit the LowestCommonAncestor.sh script to include the path to the downloaded scripts.
NOTE: It may be necessary for you to include the absolute paths for all of the scripts and tools within the FindFungi-0.23.sh master script, depending on the cluster node preferences (e.g. executing 'python' actually calls the node's version of python, not yours).
Download the Kraken and BLAST databases from this website (http://bioinformatics.czc.hokudai.ac.jp/findfungi/).
Uncompress these files and put them somewhere sensible:

tar -xvfz Kraken_*.tar.gz
mv Kraken_* Kraken_DB_Directory/

Testing the pipeline

Download the dataset ERR675624 from the European Nucleotide Archive database. This dataset contains fungal reads.

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR675/ERR675624/ERR675624_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR675/ERR675624/ERR675624_2.fastq.gz

Gunzip these files and concatenate them, as we have no need to read pair information.

gunzip ERR675624_*.fastq.gz
cat ERR675624_*.fastq > ERR675624_both-pairs.fastq

Execute the FindFungi pipeline on this FASTQ file. We use nohup here to allow the pipeline to run in the background.

nohup ./FindFungi-0.23.sh /path/to/ERR675624_both-pairs.fastq ERR675624

The first command line argument (/path/to/ERR675624_both-pairs.fastq) points to your FASTQ file. The second (ERR675624) is the name FindFungi will use for this dataset. This name should be informative.

The .csv results should show the following:

#Taxon name,Taxid,Reads mapping to taxid,Reads mapping to children taxids,Pearson skewness score,Percent of pseudo-chromosomes with read hits
Candida sp. LDI48194,1759314,671,0,0.524623062587,100.0
Malassezia restricta,76775,378,0,0.496034792692,100.0
Candida tropicalis MYA-3404,294747,265,0,-0.265788716977,100.0

SLURM Implementation

Dr Ali Snedden has kindly created a SLURM implementation of the pipeline: https://github.com/astrophys/FindFungi_adapted_for_slurm.

Standard Unix Implementation (not a compute cluster)

Please use this version if you intend to use FindFungi on a single Unix machine: https://github.com/GiantSpaceRobot/FindFungi_SingleServerVersion

Contributors

Paul Donovan, PhD (email: pauldonovandonegal@gmail.com)
Gabriel Gonzalez, PhD (e-mail: gagonzalez@czc.hokudai.ac.jp)

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 112 Commits
Database-setup		Database-setup
FindFungi-v0.23.1		FindFungi-v0.23.1
FindFungi-v0.23.2		FindFungi-v0.23.2
FindFungi-v0.23.3		FindFungi-v0.23.3
FindFungi-v0.23		FindFungi-v0.23
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

FindFungi-v0.23.3

Quickstart

Getting Started

Prerequisites

Installing

Testing the pipeline

SLURM Implementation

Standard Unix Implementation (not a compute cluster)

Contributors

License

About

Uh oh!

Releases

Packages

Languages

GiantSpaceRobot/FindFungi

Folders and files

Latest commit

History

Repository files navigation

FindFungi-v0.23.3

Quickstart

Getting Started

Prerequisites

Installing

Testing the pipeline

SLURM Implementation

Standard Unix Implementation (not a compute cluster)

Contributors

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages