Pan-genome pipeline

Installation

Dependencies:

bedtools
CD-HIT
BLAST
DIAMOND
MCL
pandas
Biopython (SeqIO)

The simplest method is to install via conda:

Download and install an appropriate conda distribution, such as Miniconda.
Create a conda environment with all the necessary dependencies. From the repository directory, run:

conda create -y -c conda-forge -c defaults --name panta python=3.10 mamba

conda activate panta

mamba install -y -c conda-forge -c bioconda -c anaconda -c defaults  --file requirements.txt

pip install .
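
After installation, it can help to confirm that the external tools are reachable on PATH. The sketch below is not part of panta: `check_tools` is a hypothetical helper, and the executable names (`bedtools`, `cd-hit`, `blastp`, `diamond`, `mcl`) are the usual command names for the dependencies listed above, assumed here rather than taken from the panta source.

```shell
# Hypothetical helper (not part of panta): report which tools are missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}

# Usual executable names for the listed dependencies (an assumption).
# `|| true` keeps a calling script going when something is missing.
check_tools bedtools cd-hit blastp diamond mcl || true
```

Run it inside the activated `panta` environment; any line starting with `missing:` points at a package that did not install as expected.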

Or via docker:

docker pull amromics/panta:latest

If you want to build the image yourself:

docker build -t amromics/panta:latest .

Usage

Activate the conda environment:

conda activate panta

Main pipeline: run pan-genome analysis for the first time

usage: panta main [-h] [-g [GFF ...]] [-f TSV] -o OUTDIR [-s] [-b {diamond,blast}] [-i IDENTITY] [--LD LD] [--AL AL] [--AS AS] [-e EVALUE]
                          [-t THREADS] [--table TABLE] [-a [{nucleotide,protein} ...]]

Main pipeline: run pan-genome analysis for the first time

options:
  -h, --help            show this help message and exit
  -g [GFF ...], --gff [GFF ...]
                        gff input files (default: None)
  -f TSV, --tsv TSV     tsv input file (default: None)
  -o OUTDIR, --outdir OUTDIR
                        output directory (default: None)
  -s, --dont-split      dont split paralog clusters (default: False)
  -b {diamond,blast}, --blast {diamond,blast}
                        method for all-against-all alignment (default: diamond)
  -i IDENTITY, --identity IDENTITY
                        minimum percentage identity (default: 0.7)
  --LD LD               length difference cutoff between two sequences (default: 0.7)
  --AL AL               alignment coverage for the longer sequence (default: 0)
  --AS AS               alignment coverage for the shorter sequence (default: 0)
  -e EVALUE, --evalue EVALUE
                        Blast evalue (default: 1e-06)
  -t THREADS, --threads THREADS
                        number of threads to use, 0 for all (default: 0)
  --table TABLE         codon table (default: 11)
  -a [{nucleotide,protein} ...], --alignment [{nucleotide,protein} ...]
                        run alignment for each gene cluster (default: None)
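
The options above can be combined in a single call. As a rough illustration, the sketch below prints a `panta main` command that switches the all-against-all step to BLAST, raises the identity cutoff, tightens the e-value, limits the thread count, and requests protein alignments. The flags are those in the help text; the function name `panta_main_cmd`, the values, and the file names are placeholders chosen for this example.

```shell
# Illustrative only: print a `panta main` command with non-default settings.
# Flags come from the help text above; values and paths are placeholders.
panta_main_cmd() {
    outdir=$1
    shift
    # -a is placed before -g so that the GFF paths are the last arguments.
    echo panta main -o "$outdir" -b blast -i 0.9 -e 1e-10 -t 8 -a protein -g "$@"
}

panta_main_cmd results sample1.gff sample2.gff
# prints: panta main -o results -b blast -i 0.9 -e 1e-10 -t 8 -a protein -g sample1.gff sample2.gff
```

Remove the `echo` inside the function to run the command instead of printing it.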

Add pipeline: add sample into previous collection

usage: panta add [-h] [-g [GFF ...]] [-f TSV] -c COLLECTION_DIR [-s] [-b {diamond,blast}] [-i IDENTITY] [--LD LD] [--AL AL] [--AS AS]
                         [-e EVALUE] [-t THREADS] [--table TABLE] [-a [{nucleotide,protein} ...]]

Add pipeline: add sample into previous collection

options:
  -h, --help            show this help message and exit
  -g [GFF ...], --gff [GFF ...]
                        gff input files (default: None)
  -f TSV, --tsv TSV     tsv input file (default: None)
  -c COLLECTION_DIR, --collection-dir COLLECTION_DIR
                        previous collection directory (default: None)
  -s, --dont-split      dont split paralog clusters (default: False)
  -b {diamond,blast}, --blast {diamond,blast}
                        method for all-against-all alignment (default: diamond)
  -i IDENTITY, --identity IDENTITY
                        minimum percentage identity (default: 0.7)
  --LD LD               length difference cutoff between two sequences (default: 0.7)
  --AL AL               alignment coverage for the longer sequence (default: 0)
  --AS AS               alignment coverage for the shorter sequence (default: 0)
  -e EVALUE, --evalue EVALUE
                        Blast evalue (default: 1e-06)
  -t THREADS, --threads THREADS
                        number of threads to use, 0 for all (default: 0)
  --table TABLE         codon table (default: 11)
  -a [{nucleotide,protein} ...], --alignment [{nucleotide,protein} ...]
                        run alignment for each gene cluster (default: None)

Example

Basic:

panta main -o examples/test/output -g examples/test/main/*.gff
panta add -c examples/test/output -g examples/test/add/*.gff
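
The two commands above can be chained when samples arrive in batches. The following is a minimal sketch, assuming each `panta add` call updates the collection directory in place (as the basic example suggests) and that every batch is a directory of `.gff` files. The function name `build_and_extend` and the `RUN` variable are hypothetical and not part of panta.

```shell
# Sketch: build a collection once, then add further batches one at a time.
# Set RUN=echo for a dry run that only prints the commands.
RUN=${RUN:-}

build_and_extend() {
    collection=$1
    first_batch=$2
    shift 2
    # The first batch creates the collection.
    $RUN panta main -o "$collection" -g "$first_batch"/*.gff
    # Each remaining batch is added to the existing collection.
    for batch in "$@"; do
        $RUN panta add -c "$collection" -g "$batch"/*.gff
    done
}
```

For example, `RUN=echo; build_and_extend examples/test/output examples/test/main examples/test/add` prints the same two commands as the basic example, with the globs expanded.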

Via docker, from the repository directory, run:

docker run -it --rm -v $PWD:/tmp amromics/panta:latest panta main -o examples/test/output -g examples/test/main/*.gff
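
The same `docker run` prefix should work for other subcommands such as `panta add`. The wrapper below is a sketch under that assumption: `panta_docker` and `RUN` are hypothetical names, and the mount and image tag are copied from the command above.

```shell
# Hypothetical wrapper: run any panta subcommand through the docker image.
# Set RUN=echo for a dry run that only prints the docker command.
RUN=${RUN:-}

panta_docker() {
    # Same mount and image as the docker example above; "$@" carries the
    # subcommand and its arguments.
    $RUN docker run -it --rm -v "$PWD":/tmp amromics/panta:latest panta "$@"
}
```

With this in place, `panta_docker add -c examples/test/output -g examples/test/add/*.gff` would mirror the basic `panta add` example.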
